The goal of this document is to define a new Oozie abstraction, the bundle system, which specializes in submitting and maintaining a set of coordinator applications.
A bundle is a higher-level Oozie abstraction that batches a set of coordinator applications. The user can start, stop, suspend, resume, or rerun at the bundle level, which gives better and easier operational control.
More specifically, the Oozie bundle system allows the user to define and execute a set of coordinator applications, often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user could use the data dependencies of coordinator applications to create an implicit data application pipeline.
Kick-off-time: The time when a bundle should start and submit coordinator applications.
Bundle Application: A bundle application defines a set of coordinator applications and when to start those. Normally, bundle applications are parameterized. A bundle application is written in XML.
Bundle Job: A bundle job is an executable instance of a bundle application. A job submission is done by submitting a job configuration that resolves all parameters in the application definition.
Bundle Definition Language: The language used to describe bundle applications.
Bundle application definitions can be parameterized with variables.
At job submission time all the parameters are resolved into concrete values.
The parameterization of bundle definitions is done using JSP Expression Language syntax from the JSP 2.0 Specification (JSP.2.3), which supports not only variables as parameters but also complex expressions.
EL expressions can be used in XML attribute values and XML text element values. They cannot be used in XML element and XML attribute names.
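As a rough illustration of what resolving parameters at submission time means, the sketch below substitutes plain `${name}` references from a job configuration. It is a deliberately simplified assumption: it handles only bare variable names, not the complex expressions that the JSP Expression Language allows, and the function name `resolve_variables` and the sample path are hypothetical.

```python
import re

# Matches a bare ${name} reference only; full JSP EL expressions are out of scope.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def resolve_variables(text, job_conf):
    """Replace each ${name} in text with its value from job_conf.

    Raises KeyError if a referenced name has no value, mirroring the idea
    that all parameters must resolve to concrete values at submission time.
    """
    def _lookup(match):
        name = match.group(1)
        if name not in job_conf:
            raise KeyError(f"unresolved parameter: {name}")
        return job_conf[name]

    return _VAR.sub(_lookup, text)


if __name__ == "__main__":
    # Hypothetical job configuration value, for illustration only.
    conf = {"appPath": "hdfs://foo:8020/user/joe/coord1"}
    print(resolve_variables("<app-path>${appPath}</app-path>", conf))
```

Element and attribute names are never passed through such a substitution, which matches the rule that EL expressions are allowed only in attribute values and element text.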
At any time, a bundle job is in one of the following statuses: PREP, RUNNING, RUNNINGWITHERROR, SUSPENDED, PREPSUSPENDED, SUSPENDEDWITHERROR, PAUSED, PAUSEDWITHERROR, PREPPAUSED, SUCCEEDED, DONEWITHERROR, KILLED, FAILED.
Valid bundle job status transitions are:
When a bundle job is submitted, Oozie parses the bundle job XML. Oozie then creates a record for the bundle with status PREP and returns a unique ID.
When a user requests to suspend a bundle job that is in PREP state, Oozie puts the job in status PREPSUSPENDED. Similarly, when the pause time is reached for a bundle job with PREP status, Oozie puts the job in status PREPPAUSED.
Conversely, when a user requests to resume a PREPSUSPENDED bundle job, Oozie puts the job in status PREP. And when the pause time is reset for a bundle job that is in PREPPAUSED state, Oozie puts the job in status PREP.
There are two ways a bundle job can be started:
* When the kick-off-time (defined in the bundle XML) is reached. The default value is null, which means the coordinators are started immediately.
* When the user sends a request to start the bundle.
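The two start conditions can be sketched as a small predicate. This is only an illustration of the rule as stated; the function name `should_start` and its parameters are hypothetical and do not correspond to any Oozie API.

```python
from datetime import datetime, timezone


def should_start(now, kick_off_time=None, start_requested=False):
    """Return True if a PREP bundle job should start its coordinators.

    kick_off_time=None models the null default, meaning "start now".
    """
    if start_requested:
        return True  # an explicit start request from the user
    if kick_off_time is None:
        return True  # no kick-off-time defined: start immediately
    return now >= kick_off_time  # the kick-off-time has been reached


if __name__ == "__main__":
    now = datetime(2009, 1, 1, 1, 0, tzinfo=timezone.utc)
    later = datetime(2009, 1, 2, 1, 0, tzinfo=timezone.utc)
    print(should_start(now))                         # no kick-off-time
    print(should_start(now, kick_off_time=later))    # kick-off-time not reached
    print(should_start(now, later, start_requested=True))
```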
When a bundle job starts, Oozie puts the job in status RUNNING and submits all the coordinator jobs. If any coordinator job goes to FAILED, KILLED, or DONEWITHERROR state, the bundle job is put in RUNNINGWITHERROR.
When a user requests to kill a bundle job, Oozie puts the job in status KILLED and sends a kill request to all submitted coordinator jobs.
When a user requests to suspend a bundle job that is in RUNNING status, Oozie puts the job in status SUSPENDED and suspends all submitted coordinator jobs. Similarly, when a user requests to suspend a bundle job that is in RUNNINGWITHERROR status, Oozie puts the job in status SUSPENDEDWITHERROR and suspends all submitted coordinator jobs.
When the pause time is reached for a bundle job that is in RUNNING status, Oozie puts the job in status PAUSED. When the pause time is reached for a bundle job that is in RUNNINGWITHERROR status, Oozie puts the job in status PAUSEDWITHERROR.
Conversely, when a user requests to resume a SUSPENDED bundle job, Oozie puts the job in status RUNNING. Similarly, when a user requests to resume a SUSPENDEDWITHERROR bundle job, Oozie puts the job in status RUNNINGWITHERROR. And when the pause time is reset for a bundle job whose status is PAUSED, Oozie puts the job in status RUNNING. Similarly, when the pause time is reset for a bundle job whose status is PAUSEDWITHERROR, Oozie puts the job in status RUNNINGWITHERROR.
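The transitions described so far can be collected into a lookup table. The sketch below is transcribed from the prose only; the event names (`suspend`, `pause_time_reached`, and so on) are hypothetical labels, and two points are assumptions rather than statements from the text: that a kill request is honored from any non-terminal status, and that SUCCEEDED, DONEWITHERROR, KILLED and FAILED are terminal.

```python
# Assumption: these four statuses are treated as terminal.
TERMINAL = {"SUCCEEDED", "DONEWITHERROR", "KILLED", "FAILED"}

# (current status, event) -> next status, as described in the prose.
# Event names are hypothetical labels, not Oozie identifiers.
TRANSITIONS = {
    ("PREP", "suspend"): "PREPSUSPENDED",
    ("PREP", "pause_time_reached"): "PREPPAUSED",
    ("PREPSUSPENDED", "resume"): "PREP",
    ("PREPPAUSED", "pause_time_reset"): "PREP",
    ("PREP", "start"): "RUNNING",
    ("RUNNING", "coordinator_error"): "RUNNINGWITHERROR",
    ("RUNNING", "suspend"): "SUSPENDED",
    ("RUNNINGWITHERROR", "suspend"): "SUSPENDEDWITHERROR",
    ("RUNNING", "pause_time_reached"): "PAUSED",
    ("RUNNINGWITHERROR", "pause_time_reached"): "PAUSEDWITHERROR",
    ("SUSPENDED", "resume"): "RUNNING",
    ("SUSPENDEDWITHERROR", "resume"): "RUNNINGWITHERROR",
    ("PAUSED", "pause_time_reset"): "RUNNING",
    ("PAUSEDWITHERROR", "pause_time_reset"): "RUNNINGWITHERROR",
}


def next_status(status, event):
    """Return the bundle status after event, or raise ValueError."""
    # Assumption: kill is accepted from any non-terminal status.
    if event == "kill" and status not in TERMINAL:
        return "KILLED"
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(
            f"no transition described for {event!r} in status {status}"
        ) from None


if __name__ == "__main__":
    status = "PREP"
    for event in ("start", "coordinator_error", "suspend", "resume"):
        status = next_status(status, event)
        print(event, "->", status)
```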
When all the coordinator jobs finish, Oozie updates the bundle status accordingly. If all coordinators reach the same terminal state, the bundle job moves to that same status. For example, if all coordinators are SUCCEEDED, Oozie puts the bundle job into SUCCEEDED status. However, if the coordinator jobs do not all finish with the same status, Oozie puts the bundle job into DONEWITHERROR.
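The rule for deriving the final bundle status from its coordinators can be sketched as follows. The function name `bundle_final_status` is hypothetical, and the set of coordinator terminal statuses is an assumption based on the statuses named in this document.

```python
# Assumption: coordinator statuses considered terminal for this sketch.
COORD_TERMINAL = {"SUCCEEDED", "DONEWITHERROR", "KILLED", "FAILED"}


def bundle_final_status(coord_statuses):
    """Return the bundle's final status, or None if any coordinator is unfinished."""
    statuses = set(coord_statuses)
    if not statuses:
        raise ValueError("a bundle has at least one coordinator")
    if not statuses <= COORD_TERMINAL:
        return None  # at least one coordinator has not finished yet
    if len(statuses) == 1:
        return next(iter(statuses))  # all coordinators agree
    return "DONEWITHERROR"  # mixed terminal statuses


if __name__ == "__main__":
    print(bundle_final_status(["SUCCEEDED", "SUCCEEDED"]))
    print(bundle_final_status(["SUCCEEDED", "KILLED"]))
    print(bundle_final_status(["SUCCEEDED", "RUNNING"]))
```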
A bundle definition is defined in XML by a name, controls and one or more coordinator application specifications:
Syntax:
<bundle-app name=[NAME] xmlns='uri:oozie:bundle:0.1'>
  <controls>
    <kick-off-time>[DATETIME]</kick-off-time>
  </controls>
  <coordinator name=[NAME] >
    <app-path>[COORD-APPLICATION-PATH]</app-path>
    <configuration>
      <property>
        <name>[PROPERTY-NAME]</name>
        <value>[PROPERTY-VALUE]</value>
      </property>
      ...
    </configuration>
  </coordinator>
  ...
</bundle-app>
Examples:
A Bundle Job that maintains two coordinator applications:
<bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>
  <controls>
    <kick-off-time>${kickOffTime}</kick-off-time>
  </controls>
  <coordinator name='coordJobFromBundle1' >
    <app-path>${appPath}</app-path>
    <configuration>
      <property>
        <name>startTime1</name>
        <value>${START_TIME}</value>
      </property>
      <property>
        <name>endTime1</name>
        <value>${END_TIME}</value>
      </property>
    </configuration>
  </coordinator>
  <coordinator name='coordJobFromBundle2' >
    <app-path>${appPath2}</app-path>
    <configuration>
      <property>
        <name>startTime2</name>
        <value>${START_TIME2}</value>
      </property>
      <property>
        <name>endTime2</name>
        <value>${END_TIME2}</value>
      </property>
    </configuration>
  </coordinator>
</bundle-app>
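To make the structure of a bundle definition concrete, the following sketch reads a trimmed copy of the two-coordinator example with Python's standard `xml.etree.ElementTree` and lists each coordinator with its (still unresolved) application path. The helper name `list_coordinators` is hypothetical, and the embedded XML omits the per-coordinator configuration for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace of the 0.1 bundle schema, mapped to a local prefix for lookups.
NS = {"b": "uri:oozie:bundle:0.1"}

# Trimmed copy of the example; <configuration> blocks omitted for brevity.
BUNDLE_XML = """<bundle-app name='APPNAME' xmlns='uri:oozie:bundle:0.1'>
  <controls>
    <kick-off-time>${kickOffTime}</kick-off-time>
  </controls>
  <coordinator name='coordJobFromBundle1'>
    <app-path>${appPath}</app-path>
  </coordinator>
  <coordinator name='coordJobFromBundle2'>
    <app-path>${appPath2}</app-path>
  </coordinator>
</bundle-app>"""


def list_coordinators(xml_text):
    """Return (coordinator name, app-path text) pairs from a bundle definition."""
    root = ET.fromstring(xml_text)
    return [
        (coord.get("name"), coord.findtext("b:app-path", namespaces=NS))
        for coord in root.findall("b:coordinator", NS)
    ]


if __name__ == "__main__":
    for name, app_path in list_coordinators(BUNDLE_XML):
        print(name, app_path)
```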
As of schema 0.2, a list of formal parameters can be provided which will allow Oozie to verify, at submission time, that said properties are actually specified (i.e. before the job is executed and fails). Default values can also be provided.
Example:
The previous Bundle Job application definition with formal parameters:
<bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.2'>
  <parameters>
    <property>
      <name>appPath</name>
    </property>
    <property>
      <name>appPath2</name>
      <value>hdfs://foo:8020/user/joe/job/job.properties</value>
    </property>
  </parameters>
  <controls>
    <kick-off-time>${kickOffTime}</kick-off-time>
  </controls>
  <coordinator name='coordJobFromBundle1' >
    <app-path>${appPath}</app-path>
    <configuration>
      <property>
        <name>startTime1</name>
        <value>${START_TIME}</value>
      </property>
      <property>
        <name>endTime1</name>
        <value>${END_TIME}</value>
      </property>
    </configuration>
  </coordinator>
  <coordinator name='coordJobFromBundle2' >
    <app-path>${appPath2}</app-path>
    <configuration>
      <property>
        <name>startTime2</name>
        <value>${START_TIME2}</value>
      </property>
      <property>
        <name>endTime2</name>
        <value>${END_TIME2}</value>
      </property>
    </configuration>
  </coordinator>
</bundle-app>
In the above example, if appPath is not specified, Oozie will print an error message instead of submitting the job. If appPath2 is not specified, Oozie will use the default value, hdfs://foo:8020/user/joe/job/job.properties.
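The submission-time check for formal parameters can be sketched like this: a parameter without a default must be supplied, and a parameter with a default is filled in when absent. This is an illustration of the described behavior, not Oozie's implementation; `apply_formal_parameters` and the sample value for `appPath` are hypothetical.

```python
def apply_formal_parameters(formal, job_conf):
    """Validate job_conf against formal parameters and apply defaults.

    formal is a list of (name, default) pairs; default is None when the
    <parameters> entry has no <value> element.
    """
    resolved = dict(job_conf)
    missing = []
    for name, default in formal:
        if name in resolved:
            continue  # the submitter supplied a value
        if default is None:
            missing.append(name)  # required, and nothing to fall back on
        else:
            resolved[name] = default
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return resolved


if __name__ == "__main__":
    formal = [
        ("appPath", None),
        ("appPath2", "hdfs://foo:8020/user/joe/job/job.properties"),
    ]
    # Hypothetical submitted value for appPath; appPath2 falls back to its default.
    print(apply_formal_parameters(formal, {"appPath": "hdfs://foo:8020/user/joe/coord1"}))
```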
When submitting a bundle job, the configuration must contain a user.name property. If security is enabled, Oozie must ensure that the value of the user.name property in the configuration matches the user credentials present in the protocol (web services) request.
When submitting a bundle job, the configuration may contain the oozie.job.acl property (the group.name property has been deprecated). If authorization is enabled, this property is treated as the ACL for the job; it can contain user and group IDs separated by commas.
The specified user and ACL are assigned to the created bundle job.
Oozie must propagate the specified user and ACL to the system executing its children jobs (coordinator jobs).
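The user and ACL rules above can be illustrated with a small check. This is a sketch under assumptions: `check_submission` is a hypothetical helper, `authenticated_user` stands in for whatever identity the web services request carries, and the ACL is simply split on commas.

```python
def check_submission(job_conf, authenticated_user, security_enabled=True):
    """Return (user, acl_entries) for a submission, or raise on violations."""
    user = job_conf.get("user.name")
    if not user:
        raise ValueError("the configuration must contain user.name")
    if security_enabled and user != authenticated_user:
        raise PermissionError(
            f"user.name {user!r} does not match the request credentials"
        )
    # oozie.job.acl is optional; entries are user and group IDs separated by commas.
    acl = job_conf.get("oozie.job.acl", "")
    entries = [entry.strip() for entry in acl.split(",") if entry.strip()]
    return user, entries


if __name__ == "__main__":
    # Hypothetical ACL value, for illustration only.
    conf = {"user.name": "joe", "oozie.job.acl": "joe, analysts"}
    print(check_submission(conf, authenticated_user="joe"))
```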
A bundle application consists exclusively of a bundle application definition and the associated coordinator application specifications. They must be installed in an HDFS directory. To submit a job for a bundle application, the full HDFS path to the bundle application definition must be specified.
When a bundle job is submitted to Oozie, the submitter must specify all the required job properties plus the HDFS path to the bundle application definition for the job.
The bundle application definition HDFS path must be specified in the 'oozie.bundle.application.path' job property.
All the bundle job properties, the HDFS path for the bundle application, the 'user.name' and 'oozie.job.acl' must be submitted to Oozie using an XML configuration file (Hadoop XML configuration file).
Example:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>user.name</name>
    <value>joe</value>
  </property>
  <property>
    <name>oozie.bundle.application.path</name>
    <value>hdfs://foo:8020/user/joe/mybundles/hello-bundle1.xml</value>
  </property>
  ...
</configuration>
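A job configuration of this shape can be produced from a plain dictionary. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `build_job_configuration` is hypothetical, and the check for the two required properties simply mirrors the requirements stated in the text.

```python
import xml.etree.ElementTree as ET

# Properties the text says a bundle job submission must carry.
REQUIRED = ("user.name", "oozie.bundle.application.path")


def build_job_configuration(props):
    """Serialize props as a Hadoop-style <configuration> XML string."""
    for key in REQUIRED:
        if key not in props:
            raise ValueError(f"missing required job property: {key}")
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_job_configuration({
        "user.name": "joe",
        "oozie.bundle.application.path":
            "hdfs://foo:8020/user/joe/mybundles/hello-bundle1.xml",
    }))
```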
Oozie provides a way of rerunning a bundle job. The user can request to rerun a subset of the coordinators within a bundle by giving a list of coordinator names. In addition, a user can give a list of dates or date ranges (in UTC) so that only those time windows are rerun. The user can also choose whether to clean up all output directories before the rerun; by default, Oozie removes all output directories. Finally, an option lets the user ask Oozie to recalculate the dynamic input directories defined by the latest function in coordinators.
$ oozie job -rerun <bundle_Job_id> [-coordinator <comma-separated list of coordinator names>] [-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z] [-nocleanup] [-refresh]
After the command is executed, the rerun bundle job will be in RUNNING status.
Refer to Rerunning Coordinator Actions for details on rerunning a coordinator job.
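For scripted use, the rerun command line can be assembled from its options. The sketch below only builds the argument list using the flags shown in this document; `build_rerun_command` and the sample job ID are hypothetical, joining dates with a bare comma is an assumption made to keep them in one argument, and any server URL option a real installation needs is left out.

```python
def build_rerun_command(bundle_job_id, coordinators=None, dates=None,
                        cleanup=True, refresh=False):
    """Return the argv list for a bundle rerun, using flags from this document."""
    argv = ["oozie", "job", "-rerun", bundle_job_id]
    if coordinators:
        argv += ["-coordinator", ",".join(coordinators)]
    if dates:
        # Assumption: dates and ranges are joined with a bare comma.
        argv += ["-date", ",".join(dates)]
    if not cleanup:
        argv.append("-nocleanup")  # keep output directories
    if refresh:
        argv.append("-refresh")  # recalculate latest-based input directories
    return argv


if __name__ == "__main__":
    # Hypothetical bundle job ID, for illustration only.
    print(" ".join(build_rerun_command(
        "example-bundle-job-id",
        coordinators=["coordJobFromBundle1"],
        dates=["2009-01-01T01:00Z::2009-05-31T23:59Z", "2009-11-10T01:00Z"],
        cleanup=False,
    )))
```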
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:bundle="uri:oozie:bundle:0.1" elementFormDefault="qualified" targetNamespace="uri:oozie:bundle:0.1">
  <xs:element name="bundle-app" type="bundle:BUNDLE-APP"/>
  <xs:simpleType name="IDENTIFIER">
    <xs:restriction base="xs:string">
      <xs:pattern value="([a-zA-Z]([\-_a-zA-Z0-9])*){1,39}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="BUNDLE-APP">
    <xs:sequence>
      <xs:element name="controls" type="bundle:CONTROLS" minOccurs="0" maxOccurs="1"/>
      <xs:element name="coordinator" type="bundle:COORDINATOR" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="bundle:IDENTIFIER" use="required"/>
  </xs:complexType>
  <xs:complexType name="CONTROLS">
    <xs:sequence minOccurs="0" maxOccurs="1">
      <xs:element name="kick-off-time" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="COORDINATOR">
    <xs:sequence minOccurs="1" maxOccurs="1">
      <xs:element name="app-path" type="xs:string" minOccurs="1" maxOccurs="1"/>
      <xs:element name="configuration" type="bundle:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="name" type="bundle:IDENTIFIER" use="required"/>
    <xs:attribute name="critical" type="xs:string" use="optional"/>
  </xs:complexType>
  <xs:complexType name="CONFIGURATION">
    <xs:sequence>
      <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
            <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
            <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:bundle="uri:oozie:bundle:0.2" elementFormDefault="qualified" targetNamespace="uri:oozie:bundle:0.2">
  <xs:element name="bundle-app" type="bundle:BUNDLE-APP"/>
  <xs:simpleType name="IDENTIFIER">
    <xs:restriction base="xs:string">
      <xs:pattern value="([a-zA-Z]([\-_a-zA-Z0-9])*){1,39}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="BUNDLE-APP">
    <xs:sequence>
      <xs:element name="parameters" type="bundle:PARAMETERS" minOccurs="0" maxOccurs="1"/>
      <xs:element name="controls" type="bundle:CONTROLS" minOccurs="0" maxOccurs="1"/>
      <xs:element name="coordinator" type="bundle:COORDINATOR" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:complexType name="PARAMETERS">
    <xs:sequence>
      <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
            <xs:element name="value" minOccurs="0" maxOccurs="1" type="xs:string"/>
            <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CONTROLS">
    <xs:sequence minOccurs="0" maxOccurs="1">
      <xs:element name="kick-off-time" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="COORDINATOR">
    <xs:sequence minOccurs="1" maxOccurs="1">
      <xs:element name="app-path" type="xs:string" minOccurs="1" maxOccurs="1"/>
      <xs:element name="configuration" type="bundle:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="name" type="bundle:IDENTIFIER" use="required"/>
    <xs:attribute name="critical" type="xs:string" use="optional"/>
  </xs:complexType>
  <xs:complexType name="CONFIGURATION">
    <xs:sequence>
      <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
            <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
            <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
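The IDENTIFIER pattern from the schema can be tried out against candidate coordinator names. The sketch below applies the pattern with Python's `re.fullmatch`, on the assumption that this approximates the whole-value matching an XML Schema validator performs; `is_valid_identifier` is a hypothetical helper and this is not a substitute for validating against the schema itself.

```python
import re

# Pattern copied from the IDENTIFIER simple type in the schema.
IDENTIFIER = re.compile(r"([a-zA-Z]([\-_a-zA-Z0-9])*){1,39}")


def is_valid_identifier(name):
    """Return True if name matches the IDENTIFIER pattern as a whole."""
    return IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("coordJobFromBundle1", "my-coord_2", "1coord", "my coord"):
        print(candidate, is_valid_identifier(candidate))
```

The candidates are kept short on purpose: a pattern with nested quantifiers like this one can backtrack heavily on long non-matching input in a backtracking regex engine.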