
Action Configuration

Oozie supports providing default configuration for actions of a particular action type, as well as default configuration for all actions.

Hadoop Default Configuration Values

Oozie supports action configuration equivalent to the component’s *-site.xml and *.properties files.

The configuration property in oozie-site.xml is oozie.service.HadoopAccessorService.action.configurations, and its value must follow the pattern [AUTHORITY=ACTION-CONF-DIR,]*, where AUTHORITY is the HOST:PORT of the Hadoop service (JobTracker/ResourceManager or HDFS) and ACTION-CONF-DIR is the action configuration directory. If the specified directory is a relative path, it is looked up under the Oozie configuration directory. An absolute path can also be specified. Oozie loads and processes the action configuration files in the following order:

  1. All files in default/*.xml (sorted by lexical name, files with names lexically lower have lesser precedence than the following ones), if present.
  2. default.xml, if present.
  3. All supported files in actionname/*, e.g. actionname/*.xml and actionname/*.properties (based on filename extension, sorted by lexical name, files with names lexically lower have lesser precedence than the following ones), if present.
  4. actionname.xml, if present.

For example, for the Hive action (whose actionname is defined as hive), the list of files processed under the relevant ACTION-CONF-DIR would be:

  1. All files in default/*.xml, if present.
  2. default.xml, if present.
  3. All files in hive/*.xml and hive/*.properties, if present.
  4. hive.xml, if present.

Files processed earlier for an action have lower precedence: any configuration parameter they define can be redefined by a file processed later. All files and directories are relative to the ACTION-CONF-DIR directory.

In addition to explicit authorities, a ‘*’ wildcard is supported. The configuration directory associated with the wildcard is used as the default if there is no action configuration for the requested Hadoop service.

For example, the configuration in the oozie-site.xml would look like:

...
    <property>
        <name>oozie.service.HadoopAccessorService.action.configurations</name>
        <value>*=hadoop-conf,jt-bar:8021=bar-cluster,nn-bar:8020=bar-cluster</value>
    </property>
...

The action configuration files use the Hadoop configuration syntax.
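As a minimal sketch of that syntax and of the precedence rules described above, consider two files placed in an action configuration directory such as bar-cluster. The property chosen here (mapreduce.job.queuename) and both queue names are illustrative assumptions, not values that Oozie ships with.

```xml
<?xml version="1.0"?>
<!-- Hypothetical ACTION-CONF-DIR/default.xml: applies to all action types. -->
<!-- The queue name "general" is an assumed, illustrative value. -->
<configuration>
    <property>
        <name>mapreduce.job.queuename</name>
        <value>general</value>
    </property>
</configuration>
```

```xml
<?xml version="1.0"?>
<!-- Hypothetical ACTION-CONF-DIR/hive.xml: applies only to Hive actions. -->
<!-- It is processed after default.xml, so its value takes precedence. -->
<configuration>
    <property>
        <name>mapreduce.job.queuename</name>
        <value>hive-jobs</value>
    </property>
</configuration>
```

Under these assumptions, a Hive action would see the value from hive.xml, while actions of other types would see the value from default.xml.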

By default Oozie does not define any default action configurations.

Dependency deduplication

Using Oozie with Hadoop 3 may require dependency file names to be distinguishable: having two files with identical names, one in the sharelib and one in your application’s dependencies, leads to job submission failure. To avoid this, enable the deduplicator by setting oozie.action.dependency.deduplicate=true in oozie-site.xml (the default is false). Dependencies that are closer to your application have higher priority: action jar > user workflow libs > action libs > system lib. When file names collide, the dependency with the higher priority is used.
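Written out in the property format used in the oozie-site.xml example earlier on this page, enabling the deduplicator might look like the following sketch:

```xml
...
    <property>
        <!-- Enables dependency deduplication; false if not set -->
        <name>oozie.action.dependency.deduplicate</name>
        <value>true</value>
    </property>
...
```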

Real-world example: you have an application workflow uploaded to HDFS in the /apps/app directory, together with your app.jar and its dependency jars. You also define a spark action in your workflow and set it to use system libs. The HDFS tree is similar to this:

 + /apps/app/
   - app.jar
   - workflow.xml
   + libs
     - app.jar
     - jackson-annotations-1.0.jar
 + share/lib/
   + spark
     - app.jar
     - jackson-annotations-1.0.jar
   + oozie
     - jackson-annotations-1.0.jar

The deduplicator will create the following list of files:

/apps/app/app.jar,/apps/app/libs/jackson-annotations-1.0.jar

No other files will be passed at job submission.
