Oozie provides a command line utility, oozie , to perform job and admin tasks. All operations are done via sub-commands of the oozie CLI.
The oozie CLI interacts with Oozie via its WS API.
usage:
      the env variable 'OOZIE_URL' is used as default value for the '-oozie' option
      the env variable 'OOZIE_TIMEZONE' is used as default value for the '-timezone' option
      the env variable 'OOZIE_AUTH' is used as default value for the '-auth' option
      custom headers for Oozie web services can be specified using '-Dheader:NAME=VALUE'

      oozie help : display usage

      oozie version : show client version

      oozie job <OPTIONS> : job operations
        -action <arg>           coordinator rerun/kill on action ids (requires -rerun/-kill);
                                coordinator log retrieval on action ids (requires -log)
        -allruns                Get workflow jobs corresponding to a coordinator action
                                including all the reruns
        -auth <arg>             select authentication type [SIMPLE|KERBEROS]
        -change <arg>           change a coordinator/bundle job
        -config <arg>           job configuration file '.xml' or '.properties'
        -D <property=value>     set/override value for given property
        -date <arg>             coordinator/bundle rerun on action dates (requires -rerun)
        -definition <arg>       job definition
        -doas <arg>             doAs user, impersonates as the specified user
        -dryrun                 Dryrun a workflow (since 3.3.2) or coordinator (since 2.0)
                                job without actually executing it
        -info <arg>             info of a job
        -kill <arg>             kill a job (coordinator requires -action or -date)
        -len <arg>              number of actions to be returned, used for pagination
                                (default 1000, requires -info)
        -filter <arg>           All coordinator actions satisfying the filter will be retrieved.
                                Filter is of the format <key><comparator><value>[;<key><comparator><value>]*
                                key: status or nominaltime
                                comparator: =, !=, <, <=, >, >=
                                value: valid status like SUCCEEDED, KILLED, RUNNING etc.
                                Only = and != apply for status
                                nominalTime is valid date of the format yyyy-MM-dd'T'HH:mm'Z'
                                (like 2014-06-01T00:00Z)
                                Filter with '=' is concatenated with 'OR' and other filters
                                are concatenated with 'AND'.
                                Currently, supported only for coordinator job.
        -order <arg>            order to show coordinator actions (default ascending order,
                                'desc' for descending order, requires -info)
                                Currently, only supported for coordinator job.
        -localtime              use local time (same as passing your time zone to -timezone).
                                Overrides -timezone option
        -log <arg>              job log
        -nocleanup              do not clean up output-events of the coordinator rerun
                                actions (requires -rerun)
        -offset <arg>           offset of actions returned relative to all actions matching
                                the filter criteria, used for pagination (default '1', requires -info)
        -oozie <arg>            Oozie URL
        -refresh                re-materialize the coordinator rerun actions (requires -rerun)
        -rerun <arg>            rerun a job (coordinator requires -action or -date;
                                bundle requires -coordinator or -date)
        -resume <arg>           resume a job
        -run                    run a job
        -start <arg>            start a job
        -submit                 submit a job
        -suspend <arg>          suspend a job
        -timezone <arg>         use time zone with the specified ID (default GMT).
                                See 'oozie info -timezones' for a list
        -value <arg>            new endtime/concurrency/pausetime value for changing a
                                coordinator job; new pausetime value for changing a bundle job
        -verbose                verbose mode
        -update                 Update coordinator definition and properties
        -logfilter              job log search parameter. Can be specified as
                                -logfilter opt1=val1;opt2=val1;opt3=val1. Supported options
                                are recent, start, end, loglevel, text, limit and debug.
        -ignore <arg>           ignore a coordinator job or action (requires '-action' to
                                ignore a coordinator action, if no option given, ignore a
                                coodinator job)

      oozie jobs <OPTIONS> : jobs status
        -auth <arg>             select authentication type [SIMPLE|KERBEROS]
        -doas <arg>             doAs user, impersonates as the specified user.
        -filter <arg>           user=<U>\;name=<N>\;group=<G>\;status=<S>\;...
        -jobtype <arg>          job type ('Supported in Oozie-2.0 or later versions ONLY -
                                coordinator' or 'wf' (default))
        -len <arg>              number of jobs (default '100')
        -localtime              use local time (same as passing your time zone to -timezone).
                                Overrides -timezone option
        -offset <arg>           jobs offset (default '1')
        -oozie <arg>            Oozie URL
        -timezone <arg>         use time zone with the specified ID (default GMT).
                                See 'oozie info -timezones' for a list
        -verbose                verbose mode

      oozie admin <OPTIONS> : admin operations
        -auth <arg>             select authentication type [SIMPLE|KERBEROS]
        -doas <arg>             doAs user, impersonates as the specified user.
        -oozie <arg>            Oozie URL
        -queuedump              show Oozie server queue elements
        -servers                list available Oozie servers (more than one only if HA is enabled)
        -status                 show the current system status
        -systemmode <arg>       Supported in Oozie-2.0 or later versions ONLY. Change oozie
                                system mode [NORMAL|NOWEBSERVICE|SAFEMODE]
        -version                show Oozie server build version
        -shareliblist           List available sharelib that can be specified in a workflow action
        -sharelibupdate         Update server to use a newer version of sharelib

      oozie validate <ARGS> : validate a workflow XML file

      oozie sla <OPTIONS> : sla operations (Deprecated as of Oozie 4.0)
        -auth <arg>             select authentication type [SIMPLE|KERBEROS]
        -len <arg>              number of results (default '100', max limited by oozie server
                                setting which defaults to '1000')
        -offset <arg>           start offset (default '0')
        -oozie <arg>            Oozie URL
        -filter <arg>           jobid=<JobID/ActionID>\;appname=<Application Name>

      oozie pig <OPTIONS> -X <ARGS> : submit a pig job, everything after '-X' are pass-through
                                      parameters to pig, any '-D' arguments after '-X' are put
                                      in <configuration>
        -auth <arg>             select authentication type [SIMPLE|KERBEROS]
        -doas <arg>             doAs user, impersonates as the specified user.
        -config <arg>           job configuration file '.properties'
        -D <property=value>     set/override value for given property
        -file <arg>             Pig script
        -oozie <arg>            Oozie URL
        -P <property=value>     set parameters for script

      oozie hive <OPTIONS> -X<ARGS> : submit a hive job, everything after '-X' are pass-through
                                      parameters to hive, any '-D' arguments after '-X' are put
                                      in <configuration>
        -auth <arg>             select authentication type [SIMPLE|KERBEROS]
        -config <arg>           job configuration file '.properties'
        -D <property=value>     set/override value for given property
        -doas <arg>             doAs user, impersonates as the specified user
        -file <arg>             hive script
        -oozie <arg>            Oozie URL
        -P <property=value>     set parameters for script

      oozie sqoop <OPTIONS> -X<ARGS> : submit a sqoop job, any '-D' arguments after '-X' are put
                                       in <configuration>
        -auth <arg>             select authentication type [SIMPLE|KERBEROS]
        -config <arg>           job configuration file '.properties'
        -D <property=value>     set/override value for given property
        -doas <arg>             doAs user, impersonates as the specified user
        -command <arg>          sqoop command
        -oozie <arg>            Oozie URL

      oozie info <OPTIONS> : get more detailed info about specific topics
        -timezones              display a list of available time zones

      oozie mapreduce <OPTIONS> : submit a mapreduce job
        -auth <arg>             select authentication type [SIMPLE|KERBEROS]
        -config <arg>           job configuration file '.properties'
        -D <property=value>     set/override value for given property
        -doas <arg>             doAs user, impersonates as the specified user
        -oozie <arg>            Oozie URL
The oozie CLI automatically performs authentication if the Oozie server requests it. By default it supports both pseudo/simple authentication and Kerberos HTTP SPNEGO authentication.
To force a specific mechanism, pass the -auth option with an authentication type; the Oozie client then runs only that mechanism. The client provides two types, simple and kerberos, which correspond to pseudo/simple and Kerberos authentication respectively.
For pseudo/simple authentication the oozie CLI uses the user name of the current OS user.
For Kerberos HTTP SPNEGO authentication the oozie CLI uses the default principal for the OS Kerberos cache (normally the principal that did kinit ).
Oozie uses the Apache Hadoop-Auth (Java HTTP SPNEGO) library for authentication. This library can be extended to support other authentication mechanisms.
Once authentication is performed successfully the received authentication token is cached in the user home directory in the .oozie-auth-token file with owner-only permissions. Subsequent requests reuse the cached token while valid.
The use of the cache file can be disabled by invoking the oozie CLI with the -Doozie.auth.token.cache=false option.
To use a custom authentication mechanism, a Hadoop-Auth Authenticator implementation must be specified with the -Dauthenticator.class=CLASS option.
The -doas option allows the current user to impersonate other users when interacting with the Oozie system. The current user must be configured as a proxyuser in the Oozie system. The proxyuser configuration may restrict from which hosts a user may impersonate users, as well as users of which groups can be impersonated.
All oozie CLI sub-commands expect the -oozie OOZIE_URL option indicating the URL of the Oozie system to run the command against.
If the -oozie option is not specified, the oozie CLI looks for the OOZIE_URL environment variable and uses it if set.
If the option is not provided and the environment variable is not set, the oozie CLI will fail.
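The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the precedence, not the CLI's actual code; the function name resolve_oozie_url and the use of ValueError for the failure case are assumptions made for the example.

```python
import os


def resolve_oozie_url(cli_value=None, env=None):
    """Pick the Oozie URL: the -oozie value first, then OOZIE_URL, else fail.

    resolve_oozie_url is a hypothetical helper, not part of Oozie.
    """
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    url = env.get("OOZIE_URL")
    if url:
        return url
    # The real CLI fails here; raising ValueError is an assumption of this sketch.
    raise ValueError("no -oozie option given and OOZIE_URL is not set")
```

Passing the environment as a plain dictionary keeps the sketch easy to try without changing the real shell environment.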
The -timezone TIME_ZONE_ID option in the job and jobs sub-commands allows you to specify the time zone to use in the output of those sub-commands. The TIME_ZONE_ID should be one of the standard Java Time Zone IDs. You can get a list of the available time zones with the command oozie info -timezones .
If the -localtime option is used, Oozie uses the time zone of the local machine. If both -localtime and -timezone TIME_ZONE_ID are used, the -localtime option overrides the -timezone TIME_ZONE_ID option. If neither option is given, Oozie looks for the OOZIE_TIMEZONE environment variable and uses it if set. If neither option is given and the environment variable is not set, or if Oozie is given an invalid time zone, it uses GMT.
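As a rough sketch of that precedence (not Oozie's implementation), the selection could be written as below. The function name, the machine_zone parameter and the known_ids set are hypothetical stand-ins for the machine's time zone and the list printed by oozie info -timezones.

```python
def resolve_time_zone(localtime=False, timezone=None, env=None,
                      machine_zone="America/Los_Angeles",
                      known_ids=frozenset({"GMT", "UTC", "America/Los_Angeles"})):
    """Apply the order: -localtime, then -timezone, then OOZIE_TIMEZONE, then GMT."""
    env = {} if env is None else env
    if localtime:
        chosen = machine_zone              # -localtime overrides -timezone
    elif timezone:
        chosen = timezone
    else:
        chosen = env.get("OOZIE_TIMEZONE")
    # An unset or invalid time zone falls back to GMT.
    if chosen is None or chosen not in known_ids:
        return "GMT"
    return chosen
```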
If you export OOZIE_DEBUG=1, the Oozie CLI will output the Web Services API details used by any commands you execute. This is useful for debugging or for seeing how the Oozie CLI works with the WS API.
The Oozie CLI retries connections to Oozie servers to provide transparent high-availability failover when one of the Oozie servers goes down. The CLI retries all commands on a ConnectException. On a SocketException, it retries all commands except PUT and POST. All job submissions are POST calls; examples of PUT and POST commands can be found in WebServicesAPI. The retry count can be configured with the system property oozie.connection.retry.count. The default count is 4.
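The retry rule can be illustrated with a small Python sketch. It is not the client's code: ConnectError and SocketError are stand-ins for the Java exceptions, call_with_retry is a hypothetical name, and reading the count as "retries after the first attempt" is an assumption.

```python
class ConnectError(Exception):
    """Stand-in for java.net.ConnectException (assumption of this sketch)."""


class SocketError(Exception):
    """Stand-in for java.net.SocketException (assumption of this sketch)."""


NO_RETRY_METHODS = {"PUT", "POST"}


def call_with_retry(send, method, retry_count=4):
    """Call send(); retry per the rule above, up to retry_count retries."""
    failures = 0
    while True:
        try:
            return send()
        except (ConnectError, SocketError) as error:
            # Connection failures are always retried; socket failures are
            # retried only for methods other than PUT and POST.
            retryable = (isinstance(error, ConnectError)
                         or method.upper() not in NO_RETRY_METHODS)
            failures += 1
            if not retryable or failures > retry_count:
                raise
```

Skipping retries for PUT and POST after a socket failure avoids repeating a request, such as a job submission, that the server may already have processed.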
* Submitting bundles is only supported in Oozie 3.0 or later. Similarly, all bundle operations below are only supported in Oozie 3.0 or later.
Example:
$ oozie job -oozie http://localhost:11000/oozie -config job.properties -submit
job: 14-20090525161321-oozie-joe
The parameters for the job must be provided in a file, either a Java Properties file (.properties) or a Hadoop XML Configuration file (.xml). This file must be specified with the -config option.
The workflow application path must be specified in the file with the oozie.wf.application.path property. The coordinator application path must be specified in the file with the oozie.coord.application.path property. The bundle application path must be specified in the file with the oozie.bundle.application.path property. The specified path must be an HDFS path.
The job will be created but not started; it will be in PREP status.
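When the CLI is driven from a script, the job ID can be picked out of the "job: ..." line shown in the example output. The following Python sketch does only that; submit_command and parse_job_id are hypothetical helper names, and the sketch assumes the output format is exactly as in the example above.

```python
def submit_command(oozie_url, config_file):
    """Build the argument list for the submit example shown above."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", config_file, "-submit"]


def parse_job_id(cli_output):
    """Return the ID from a 'job: <id>' line, or None if no such line exists."""
    for line in cli_output.splitlines():
        line = line.strip()
        if line.startswith("job:"):
            return line[len("job:"):].strip()
    return None
```

The argument list can be handed to subprocess.run, and the returned ID can then be passed to -start, since the submitted job stays in PREP status.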
Example:
$ oozie job -oozie http://localhost:11000/oozie -start 14-20090525161321-oozie-joe
The start option starts a previously submitted workflow job, coordinator job or bundle job that is in PREP status.
After the command is executed, the workflow job, coordinator job or bundle job will be in RUNNING status.
Example:
$ oozie job -oozie http://localhost:11000/oozie -config job.properties -run
job: 15-20090525161321-oozie-joe
The run option creates and starts a workflow job, coordinator job or bundle job.
The parameters for the job must be provided in a file, either a Java Properties file (.properties) or a Hadoop XML Configuration file (.xml). This file must be specified with the -config option.
The workflow application path must be specified in the file with the oozie.wf.application.path property. The coordinator application path must be specified in the file with the oozie.coord.application.path property. The bundle application path must be specified in the file with the oozie.bundle.application.path property. The specified path must be an HDFS path.
The job will be created and started; it will be in RUNNING status.
Example:
$ oozie job -oozie http://localhost:11000/oozie -suspend 14-20090525161321-oozie-joe
The suspend option suspends a workflow job in RUNNING status. After the command is executed the workflow job will be in SUSPENDED status.
The suspend option suspends a coordinator/bundle job in RUNNING , RUNNINGWITHERROR or PREP status. When the coordinator job is suspended, running coordinator actions will stay in running and the workflows will be suspended. If the coordinator job is in RUNNING status, it will transit to SUSPENDED status; if it is in RUNNINGWITHERROR status, it will transit to SUSPENDEDWITHERROR ; if it is in PREP status, it will transit to PREPSUSPENDED status.
When the bundle job is suspended, running coordinators will be suspended. If the bundle job is in RUNNING status, it will transit to SUSPENDED status; if it is in RUNNINGWITHERROR status, it will transit to SUSPENDEDWITHERROR ; if it is in PREP status, it will transit to PREPSUSPENDED status.
Example:
$ oozie job -oozie http://localhost:11000/oozie -resume 14-20090525161321-oozie-joe
The resume option resumes a workflow job in SUSPENDED status.
After the command is executed the workflow job will be in RUNNING status.
The resume option resumes a coordinator/bundle job in SUSPENDED , SUSPENDEDWITHERROR or PREPSUSPENDED status. If the coordinator job is in SUSPENDED status, it will transit to RUNNING status; if it is in SUSPENDEDWITHERROR status, it will transit to RUNNINGWITHERROR ; if it is in PREPSUSPENDED status, it will transit to PREP status.
When the coordinator job is resumed, it will create all the coordinator actions that should have been created while it was suspended. Actions will not be lost; they will be delayed.
When the bundle job is resumed, suspended coordinators will resume running. If the bundle job is in SUSPENDED status, it will transit to RUNNING status; if it is in SUSPENDEDWITHERROR status, it will transit to RUNNINGWITHERROR ; if it is in PREPSUSPENDED status, it will transit to PREP status.
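The suspend and resume transitions for coordinator and bundle jobs described above form a simple two-way mapping. The Python sketch below restates them as a lookup table; the names SUSPEND, RESUME and next_status are made up for the illustration.

```python
# Status transitions for coordinator/bundle jobs, as listed in the text above.
SUSPEND = {
    "RUNNING": "SUSPENDED",
    "RUNNINGWITHERROR": "SUSPENDEDWITHERROR",
    "PREP": "PREPSUSPENDED",
}
# Resuming is the exact inverse of suspending.
RESUME = {after: before for before, after in SUSPEND.items()}


def next_status(command, status):
    """Return the status after 'suspend' or 'resume', or None if not applicable."""
    table = {"suspend": SUSPEND, "resume": RESUME}[command]
    return table.get(status)
```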
Example:
$ oozie job -oozie http://localhost:11000/oozie -kill 14-20090525161321-oozie-joe
The kill option kills a workflow job in PREP , SUSPENDED or RUNNING status and a coordinator/bundle job in PREP , RUNNING , PREPSUSPENDED , SUSPENDED , PREPPAUSED , or PAUSED status.
After the command is executed the job will be in KILLED status.
Example:
$ oozie job -kill <coord_Job_id> [-action 1, 3-4, 7-40] [-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z]
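The -action value in the example mixes single action numbers and ranges. A minimal Python sketch of how such a list could be expanded is shown here; expand_actions is a hypothetical helper, and treating a range such as 3-4 as inclusive on both ends is an assumption.

```python
def expand_actions(spec):
    """Expand a list such as '1, 3-4, 7-40' into individual action numbers."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(piece) for piece in part.split("-", 1))
            if first > last:
                raise ValueError("range start is after range end: " + part)
            numbers.extend(range(first, last + 1))   # inclusive range (assumed)
        else:
            numbers.append(int(part))
    return numbers
```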
Example:
$ oozie job -oozie http://localhost:11000/oozie -change 14-20090525161321-oozie-joe -value endtime=2011-12-01T05:00Z\;concurrency=100\;pausetime=2011-10-01T05:00Z
$ oozie job -oozie http://localhost:11000/oozie -change 0000001-140321155112907-oozie-puru-C -value status=RUNNING
The change option changes a coordinator job that is not in KILLED status.
Valid value names, as used in the -value examples above, are endtime, concurrency, pausetime and status.
Conditions and usage:
After the command is executed the job's end time, concurrency or pause time should be changed. If an already-succeeded job changes its end time, its status will become running.
Example:
$ oozie job -oozie http://localhost:11000/oozie -change 14-20090525161321-oozie-joe -value pausetime=2011-12-01T05:00Z
The change option changes a bundle job that is not in KILLED status.
Valid value names are pausetime and endtime.
Repeated value names are not allowed. An empty string "" can be used to reset pause time to none. New end time should not be before job's kickoff time.
The bundle will execute the pause/end date change command on each coordinator job. Refer to the conditions and usage section of the Coordinator job change command for more details.
Example:
$ oozie job -oozie http://localhost:11000/oozie -config job.properties -rerun 14-20090525161321-oozie-joe
The rerun option reruns a completed ( SUCCEEDED , FAILED or KILLED ) job, skipping the specified nodes.
The parameters for the job must be provided in a file, either a Java Properties file (.properties) or a Hadoop XML Configuration file (.xml). This file must be specified with the -config option.
The workflow application path must be specified in the file with the oozie.wf.application.path property. The specified path must be an HDFS path.
The list of nodes to be skipped must be provided in the oozie.wf.rerun.skip.nodes property, separated by commas.
After the command is executed the job will be in RUNNING status.
Refer to the Rerunning Workflow Jobs for details on rerun.
Example:
$ oozie job -rerun <coord_Job_id> [-nocleanup] [-refresh]
  [-action 1, 3-4, 7-40] (-action or -date is required to rerun.)
  [-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z]
  (if neither -action nor -date is given, the exception will be thrown.)
The rerun option reruns a terminated ( TIMEDOUT , SUCCEEDED , KILLED , FAILED , IGNORED ) coordinator action when the coordinator job is not in FAILED or KILLED state.
After the command is executed the rerun coordinator action will be in WAITING status.
Refer to the Rerunning Coordinator Actions for details on rerun.
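The -date value in the rerun example combines date ranges (two dates joined by "::") with single dates. A short Python sketch of reading such a value follows; parse_dates is a hypothetical helper, and interpreting "::" as a start/end separator and a single date as a one-point range are assumptions drawn from the example, not from the Oozie source.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%MZ"   # e.g. 2009-01-01T01:00Z


def parse_dates(spec):
    """Turn a -date value into a list of (start, end) datetime pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, separator, end_text = part.partition("::")
        start = datetime.strptime(start_text.strip(), DATE_FORMAT)
        # A single date is treated as a range that starts and ends on that date.
        end = datetime.strptime(end_text.strip(), DATE_FORMAT) if separator else start
        ranges.append((start, end))
    return ranges
```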
Example:
$ oozie job -rerun <bundle_Job_id> [-nocleanup] [-refresh]
  [-coordinator c1, c3, c4] (-coordinator or -date is required to rerun.)
  [-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z]
  (if neither -coordinator nor -date is given, the exception will be thrown.)
The rerun option reruns coordinator actions belonging to specified coordinators within the specified date range.
After the command is executed the rerun coordinator action will be in WAITING status.
Example:
$ oozie job -oozie http://localhost:11000/oozie -info 14-20090525161321-oozie-joe
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : map-reduce-wf
App Path      : hdfs://localhost:8020/user/joe/workflows/map-reduce
Status        : SUCCEEDED
Run           : 0
User          : joe
Group         : users
Created       : 2009-05-26 05:01 +0000
Started       : 2009-05-26 05:01 +0000
Ended         : 2009-05-26 05:01 +0000
Actions
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Action Name  Type        Status  Transition  External Id            External Status  Error Code  Start                   End
----------------------------------------------------------------------------------------------------------------------------------------------------------------
hadoop1      map-reduce  OK      end         job_200904281535_0254  SUCCEEDED        -           2009-05-26 05:01 +0000  2009-05-26 05:01 +0000
----------------------------------------------------------------------------------------------------------------------------------------------------------------
The info option can display information about a workflow job, coordinator job or coordinator action. The info option for a Coordinator job retrieves the Coordinator actions ordered by nominal time. However, the info command may time out if the number of Coordinator actions is very high. In that case, info should be used with the offset and len options.
The offset and len options should be used for pagination. offset determines the start offset of the actions returned among all the actions that matched the filter criteria. len determines the number of actions to be returned.
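To make the pagination concrete, the Python sketch below produces the -offset and -len arguments for successive pages. It is an illustration only: page_options is a hypothetical helper, and it assumes offsets are 1-based because the usage text gives '1' as the default offset.

```python
def page_options(total_actions, page_len=1000, first_offset=1):
    """Return the -offset/-len arguments needed to page through all actions."""
    pages = []
    offset = first_offset
    last = first_offset + total_actions - 1
    while offset <= last:
        pages.append(["-offset", str(offset), "-len", str(page_len)])
        offset += page_len
    return pages
```

For a coordinator with 2500 actions and the default page length, this yields three pages starting at offsets 1, 1001 and 2001.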
The localtime option displays times in local time. If it is not specified, times are displayed in GMT.
The filter option can be used to filter coordinator actions based on some criteria.
The filter option syntax is: <key><comparator><value>[;<key><comparator><value>]*
Multiple values must be specified as separate key/value conditions. The query is formed by ANDing all conditions, with the exception of '=' conditions, which are ORed when there are multiple values for the same key. For example, the filter 'status=RUNNING;status=WAITING;nominalTime>=2014-06-01T00:00Z' maps to the query (status = RUNNING OR status = WAITING) AND nominalTime >= 2014-06-01T00:00Z, which returns all waiting or running actions with nominalTime >= 2014-06-01T00:00Z.
Currently, the filter option can be used only with an info option on Coordinator job.
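The way filter conditions combine can be sketched in Python. The function below only rewrites a filter string into the logical expression described above; describe_filter is a hypothetical name and the code is not how the Oozie server builds its query.

```python
import re

# key, comparator, value; two-character comparators are listed first so they win.
CONDITION = re.compile(r"(\w+)\s*(!=|<=|>=|=|<|>)\s*(.+)")


def describe_filter(spec):
    """Render a filter string as the AND/OR expression it stands for."""
    equals = {}   # key -> values compared with '=' (ORed together)
    terms = []    # ordered terms: ("eq", key) or ("cmp", text)
    for condition in spec.split(";"):
        condition = condition.strip()
        if not condition:
            continue
        match = CONDITION.fullmatch(condition)
        if match is None:
            raise ValueError("bad filter condition: " + condition)
        key, comparator, value = match.groups()
        if comparator == "=":
            if key not in equals:
                equals[key] = []
                terms.append(("eq", key))
            equals[key].append(value.strip())
        else:
            terms.append(("cmp", f"{key} {comparator} {value.strip()}"))
    parts = []
    for kind, payload in terms:
        if kind == "eq":
            options = [f"{payload} = {value}" for value in equals[payload]]
            parts.append(options[0] if len(options) == 1
                         else "(" + " OR ".join(options) + ")")
        else:
            parts.append(payload)
    return " AND ".join(parts)
```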
The verbose option gives more detailed information for all the actions, if checking for workflow job or coordinator job. An example below shows how the verbose option can be used to gather action statistics information for a job:
$ oozie job -oozie http://localhost:11000/oozie -info 0000001-111219170928042-oozie-para-W@mr-node -verbose
ID : 0000001-111219170928042-oozie-para-W@mr-node
------------------------------------------------------------------------------------------------------------------------------------
Console URL       : http://localhost:50030/jobdetails.jsp?jobid=job_201112191708_0006
Error Code        : -
Error Message     : -
External ID       : job_201112191708_0006
External Status   : SUCCEEDED
Name              : mr-node
Retries           : 0
Tracker URI       : localhost:8021
Type              : map-reduce
Started           : 2011-12-20 01:12
Status            : OK
Ended             : 2011-12-20 01:12
External Stats    : {"org.apache.hadoop.mapred.JobInProgress$Counter":{"TOTAL_LAUNCHED_REDUCES":1,"TOTAL_LAUNCHED_MAPS":1,"DATA_LOCAL_MAPS":1},"ACTION_TYPE":"MAP_REDUCE","FileSystemCounters":{"FILE_BYTES_READ":1746,"HDFS_BYTES_READ":1409,"FILE_BYTES_WRITTEN":3524,"HDFS_BYTES_WRITTEN":1547},"org.apache.hadoop.mapred.Task$Counter":{"REDUCE_INPUT_GROUPS":33,"COMBINE_OUTPUT_RECORDS":0,"MAP_INPUT_RECORDS":33,"REDUCE_SHUFFLE_BYTES":0,"REDUCE_OUTPUT_RECORDS":33,"SPILLED_RECORDS":66,"MAP_OUTPUT_BYTES":1674,"MAP_INPUT_BYTES":1409,"MAP_OUTPUT_RECORDS":33,"COMBINE_INPUT_RECORDS":0,"REDUCE_INPUT_RECORDS":33}}
External ChildIDs : null
------------------------------------------------------------------------------------------------------------------------------------
The two fields External Stats and External ChildIDs display the action statistics information (that includes counter information in case of MR action and PigStats information in case of a pig action) and child ids of the given job.
Note that the user can turn on/off External Stats by specifying the property oozie.action.external.stats.write as true or false in workflow.xml. By default, it is set to false (not to collect External Stats). External ChildIDs will always be stored.
A coordinator action kicks off different workflows for its original run and all subsequent reruns. Getting a list of those workflow ids is a useful tool to keep track of your actions' runs and to go debug the workflow job logs if required. Along with ids, it also lists their statuses, and start and end times for quick reference.
This is achieved by using the Coordinator Action info command and specifying the -allruns flag along with the info command.
$ oozie job -info 0000001-111219170928042-oozie-joe-C@1 -allruns -oozie http://localhost:11000/oozie
Job ID                               Status     Started               Ended
----------------------------------------------------------------------------------------------------
0000001-140324163709596-oozie-joe-W  SUCCEEDED  2014-03-24 23:40 GMT  2014-03-24 23:40 GMT
----------------------------------------------------------------------------------------------------
0000000-140324164318985-oozie-joe-W  SUCCEEDED  2014-03-24 23:44 GMT  2014-03-24 23:44 GMT
----------------------------------------------------------------------------------------------------
0000001-140324164318985-oozie-joe-W  SUCCEEDED  2014-03-24 23:44 GMT  2014-03-24 23:44 GMT
----------------------------------------------------------------------------------------------------
Example:
$ oozie job -oozie http://localhost:11000/oozie -definition 14-20090525161321-oozie-joe
<workflow-app xmlns="uri:oozie:workflow:0.2" name="sm3-segment-2413">
  <start to="p0"/>
  <action name="p0">
  </action>
  <end name="end"/>
</workflow-app>
Example:
$ oozie job -oozie http://localhost:11000/oozie -log 14-20090525161321-oozie-joe
Example:
$ oozie job -log <coord_job_id> [-action 1, 3-4, 7-40] (-action is optional.)
Users can provide multiple options to filter logs using -logfilter opt1=val1;opt2=val1;opt3=val1. This makes it possible to fetch only the logs of interest, and to fetch them faster, because fetching Oozie server logs is slow due to the overhead of pattern matching. Supported options are recent, start, end, loglevel, text, limit and debug.
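When the -logfilter value is assembled by a script, the options can be joined with semicolons and then quoted for the shell. The Python sketch below shows one way to do that; build_logfilter is a hypothetical helper, and only the option names come from the usage text.

```python
import shlex

SUPPORTED_OPTIONS = ("recent", "start", "end", "loglevel", "text", "limit", "debug")


def build_logfilter(**options):
    """Join option=value pairs with ';' after checking the option names."""
    unknown = [name for name in options if name not in SUPPORTED_OPTIONS]
    if unknown:
        raise ValueError("unsupported logfilter option(s): " + ", ".join(unknown))
    return ";".join(f"{name}={value}" for name, value in options.items())


# Quote the value for a shell, since ';' would otherwise end the command.
quoted = shlex.quote(build_logfilter(loglevel="WARN", limit=3))
```

Quoting the whole value has the same effect as escaping each semicolon with a backslash, which is the form used in the examples in this section.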
Examples. Searching the log with log level ERROR or WARN returns only error and warning entries (with stack traces). This is useful when a job has failed and the user wants to find the error logs with the exception.
$ ./oozie job -log 0000006-140319184715726-oozie-puru-W -logfilter loglevel=WARN\;limit=3 -oozie http://localhost:11000/oozie/
2014-03-20 10:01:52,977 WARN ActionStartXCommand:542 - SERVER[ ] USER[-] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000006-140319184715726-oozie-puru-W] ACTION[0000006-140319184715726-oozie-puru-W@:start:] [***0000006-140319184715726-oozie-puru-W@:start:***]Action status=DONE
2014-03-20 10:01:52,977 WARN ActionStartXCommand:542 - SERVER[ ] USER[-] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000006-140319184715726-oozie-puru-W] ACTION[0000006-140319184715726-oozie-puru-W@:start:] [***0000006-140319184715726-oozie-puru-W@:start:***]Action updated in DB!
2014-03-20 10:01:53,189 WARN ActionStartXCommand:542 - SERVER[ ] USER[-] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000006-140319184715726-oozie-puru-W] ACTION[0000006-140319184715726-oozie-puru-W@mr-node-1] Error starting action [mr-node-1]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: java.io.IOException: java.io.IOException: Queue "aaadefault" does not exist
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3615)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3561)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:394)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
Caused by: java.io.IOException: Queue "aaadefault" does not exist
    at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:433)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3613)
    ... 12 more
$
Search with specific text and recent option.
$ ./oozie job -log 0000003-140319184715726-oozie-puru-C -logfilter text=Missing\;limit=4\;recent=1h -oozie http://localhost:11000/oozie/
2014-03-20 09:59:50,329 INFO CoordActionInputCheckXCommand:539 - SERVER[ ] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000003-140319184715726-oozie-puru-C] ACTION[0000003-140319184715726-oozie-puru-C@1] [0000003-140319184715726-oozie-puru-C@1]::CoordActionInputCheck:: Missing deps:hdfs://localhost:9000/user/purushah/examples/input-data/rawLogs/
2014-03-20 09:59:50,330 INFO CoordActionInputCheckXCommand:539 - SERVER[ ] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000003-140319184715726-oozie-puru-C] ACTION[0000003-140319184715726-oozie-puru-C@1] [0000003-140319184715726-oozie-puru-C@1]::ActionInputCheck:: In checkListOfPaths: hdfs://localhost:9000/user/purushah/examples/input-data/rawLogs/ is Missing.
2014-03-20 10:02:19,087 INFO CoordActionInputCheckXCommand:539 - SERVER[ ] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000003-140319184715726-oozie-puru-C] ACTION[0000003-140319184715726-oozie-puru-C@2] [0000003-140319184715726-oozie-puru-C@2]::CoordActionInputCheck:: Missing deps:hdfs://localhost:9000/user/purushah/examples/input-data/rawLogs/
2014-03-20 10:02:19,088 INFO CoordActionInputCheckXCommand:539 - SERVER[ ] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000003-140319184715726-oozie-puru-C] ACTION[0000003-140319184715726-oozie-puru-C@2] [0000003-140319184715726-oozie-puru-C@2]::ActionInputCheck:: In checkListOfPaths: hdfs://localhost:9000/user/purushah/examples/input-data/rawLogs/ is Missing.
$
Search example with specific date range.
$ ./oozie job -log 0000003-140319184715726-oozie-puru-C -logfilter "start=2014-03-20 10:00:57,063;end=2014-03-20 10:10:57,063" -oozie http://localhost:11000/oozie/
2014-03-20 10:00:57,063 INFO CoordActionUpdateXCommand:539 - SERVER[ ] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000003-140319184715726-oozie-puru-C] ACTION[0000003-140319184715726-oozie-puru-C@1] Updating Coordintaor action id :0000003-140319184715726-oozie-puru-C@1 status to KILLED, pending 0
2014-03-20 10:02:18,967 INFO CoordMaterializeTransitionXCommand:539 - SERVER[ ] USER[-] GROUP[-] TOKEN[] APP[aggregator-coord] JOB[0000003-140319184715726-oozie-puru-C] ACTION[-] materialize actions for tz Coordinated Universal Time, start=Thu Dec 31 18:00:00 PST 2009, end=Thu Dec 31 19:00:00 PST 2009, timeUnit 12, frequency :60:MINUTE, lastActionNumber 1
2014-03-20 10:02:18,971 WARN CoordELFunctions:542 - SERVER[ ] USER[-] GROUP[-] TOKEN[] APP[aggregator-coord] JOB[0000003-140319184715726-oozie-puru-C] ACTION[-] If the initial instance of the dataset is later than the current-instance specified, such as coord:current(-200) in this case, an empty string is returned. This means that no data is available at the current-instance specified by the user and the user could try modifying his initial-instance to an earlier time.
2014-03-20 10:02:18,975 INFO CoordMaterializeTransitionXCommand:539 - SERVER[ ] USER[-] GROUP[-] TOKEN[] APP[aggregator-coord] JOB[0000003-140319184715726-oozie-puru-C] ACTION[-] [0000003-140319184715726-oozie-puru-C]: all actions have been materialized, set pending to true
2014-03-20 10:02:18,975 INFO CoordMaterializeTransitionXCommand:539 - SERVER[ ] USER[-] GROUP[-] TOKEN[] APP[aggregator-coord] JOB[0000003-140319184715726-oozie-puru-C] ACTION[-] Coord Job status updated to = RUNNING
* This feature is only supported in Oozie 2.0 or later.
Example:
$ oozie job -oozie http://localhost:11000/oozie -dryrun -config job.properties
***coordJob after parsing: ***
<coordinator-app xmlns="uri:oozie:coordinator:0.1" name="sla_coord" frequency="20" start="2009-03-06T010:00Z" end="2009-03-20T11:00Z" timezone="America/Los_Angeles">
  <output-events>
    <data-out name="Output" dataset="DayLogs">
      <dataset name="DayLogs" frequency="1440" initial-instance="2009-01-01T00:00Z" timezone="UTC" freq_timeunit="MINUTE" end_of_duration="NONE">
        <uri-template>hdfs://localhost:8020/user/angeloh/coord_examples/${YEAR}/${MONTH}/${DAY}</uri-template>
      </dataset>
      <instance>${coord:current(0)}</instance>
    </data-out>
  </output-events>
  <action>
  </action>
</coordinator-app>

***actions for instance***

***total coord actions is 1 ***
------------------------------------------------------------------------------------------------------------------------------------
coordAction instance: 1:
<coordinator-app xmlns="uri:oozie:coordinator:0.1" name="sla_coord" frequency="20" start="2009-03-06T010:00Z" end="2009-03-20T11:00Z" timezone="America/Los_Angeles">
  <output-events>
    <data-out name="Output" dataset="DayLogs">
      <uris>hdfs://localhost:8020/user/angeloh/coord_examples/2009/03/06</uris>
      <dataset name="DayLogs" frequency="1440" initial-instance="2009-01-01T00:00Z" timezone="UTC" freq_timeunit="MINUTE" end_of_duration="NONE">
        <uri-template>hdfs://localhost:8020/user/angeloh/coord_examples/${YEAR}/${MONTH}/${DAY}</uri-template>
      </dataset>
    </data-out>
  </output-events>
  <action>
  </action>
</coordinator-app>
------------------------------------------------------------------------------------------------------------------------------------
The dryrun option tests running a coordinator job with given job properties and does not create the job.
The parameters for the job must be provided in a file, either a Java Properties file (.properties) or a Hadoop XML Configuration file (.xml). This file must be specified with the -config option.
The coordinator application path must be specified in the file with the oozie.coord.application.path property. The specified path must be an HDFS path.
* This feature is only supported in Oozie 3.3.2 or later.
Example:
$ oozie job -oozie http://localhost:11000/oozie -dryrun -config job.properties
OK
The dryrun option tests running a workflow job with given job properties and does not create the job.
The parameters for the job must be provided in a file, either a Java Properties file (.properties) or a Hadoop XML Configuration file (.xml). This file must be specified with the -config option.
The workflow application path must be specified in the file with the oozie.wf.application.path property. The specified path must be an HDFS path.
If the workflow is accepted (i.e., Oozie is able to successfully read and parse it), the command returns "OK"; otherwise, it returns an error message describing why the workflow was rejected.
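The OK-or-error behaviour lends itself to scripting. The following is a minimal sketch, assuming the dryrun output has already been captured in a shell variable. The helper name `report_dryrun` and the sample error text are hypothetical; the actual error wording depends on the server.

```shell
# report_dryrun: classify captured dryrun output (hypothetical helper).
# $1 is the text printed by the -dryrun command.
report_dryrun() {
    if [ "$1" = "OK" ]; then
        echo "workflow accepted"
    else
        echo "workflow rejected: $1"
        return 1
    fi
}

# In practice the argument might come from something like:
#   out=$(oozie job -oozie http://localhost:11000/oozie -dryrun -config job.properties)
report_dryrun "OK"
report_dryrun "Error: sample message" || true
```

Returning a non-zero status for the rejected case lets a calling script stop before it attempts a real submission.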
The existing coordinator definition will be replaced by the new definition. The refreshed coordinator keeps the same coordinator ID, state, and coordinator actions. All coordinator actions that have already been created (including those in WAITING) will use the old configuration. Actions can be rerun with the -refresh option, which uses the new configuration to rerun the coordinator action.
Like the submit command, the update command also verifies the coordinator definition; if there is any issue with the definition, the update fails. The update command with -dryrun shows the differences in the coordinator definition and properties. The -config option is optional; if it is not specified, the existing coordinator property is used to find the coordinator path.
The update command doesn't allow changing the coordinator name, frequency, start time, end time, or timezone, and fails on an attempt to change any of them. To change the end time of a coordinator, use the -change command.
$ oozie job -oozie http://localhost:11000/oozie -config job.properties -update 0000005-140402104721140-oozie-puru-C -dryrun
**********Job definition changes**********
@@ -3,8 +3,8 @@
     <concurrency>1</concurrency>
   </controls>
   <input-events>
-    <data-in name="input" dataset="raw-logs">
-      <dataset name="raw-logs" frequency="20" initial-instance="2010-01-01T00:00Z" timezone="UTC" freq_timeunit="MINUTE" end_of_duration="NONE">
+    <data-in name="input" dataset="raw-logs-rename">
+      <dataset name="raw-logs-rename" frequency="20" initial-instance="2010-01-01T00:00Z" timezone="UTC" freq_timeunit="MINUTE" end_of_duration="NONE">
         <uri-template>hdfs://localhost:9000/user/purushah/examples/input-data/rawLogs/</uri-template>
         <done-flag />
       </dataset>
**********************************
**********Job conf changes**********
@@ -8,10 +8,6 @@
     <value>hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml</value>
   </property>
   <property>
-    <name>old</name>
-    <value>test</value>
-  </property>
-  <property>
     <name>user.name</name>
     <value>purushah</value>
   </property>
@@ -28,6 +24,10 @@
     <value>hdfs://localhost:9000</value>
   </property>
   <property>
+    <name>adding</name>
+    <value>new</value>
+  </property>
+  <property>
     <name>jobTracker</name>
     <value>localhost:9001</value>
   </property>
**********************************
Example:
$ oozie job -ignore <coord_Job_id>
The ignore option changes a coordinator job in the KILLED or FAILED state to the IGNORED state. When a coordinator job in a bundle is in the IGNORED state, the coordinator job doesn't impact the state of the bundle job. For example, when a coordinator job in a bundle has failed and is afterwards ignored, the bundle job becomes SUCCEEDED instead of DONEWITHERROR as long as the other coordinator jobs in the bundle succeeded. An ignored coordinator job can be changed to RUNNING using the -change command. Refer to the Coordinator job change command for details.
Example:
$ oozie job -ignore <coord_Job_id> -action 1,3-4,7-40
The ignore option changes coordinator action(s) in a terminal state (KILLED, FAILED, TIMEDOUT) to the IGNORED state, without changing the state of the coordinator job. When a coordinator action is in the IGNORED state, the action doesn't impact the state of the coordinator job. For example, when a coordinator action has failed and is afterwards ignored, the coordinator job becomes SUCCEEDED instead of DONEWITHERROR as long as the other coordinator actions succeeded.
An ignored coordinator action can be rerun using the -rerun command. Refer to Rerunning Coordinator Actions for details. When the workflow job of an ignored coordinator action is rerun, the coordinator action moves to the RUNNING state.
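The -action argument in the example mixes single action numbers and ranges. As an illustration of how such a list reads, the sketch below expands it into individual action numbers. The helper `expand_actions` is hypothetical and not part of the oozie CLI, and it assumes that ranges are inclusive at both ends.

```shell
# expand_actions: print one action number per line for a list such as 1,3-4,7-9.
# Hypothetical helper; assumes ranges are inclusive.
expand_actions() {
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the argument on commas
    IFS=$old_ifs
    for part in "$@"; do
        case $part in
            *-*)
                first=${part%-*}
                last=${part#*-}
                i=$first
                while [ "$i" -le "$last" ]; do
                    echo "$i"
                    i=$((i + 1))
                done
                ;;
            *)
                echo "$part"
                ;;
        esac
    done
}

expand_actions 1,3-4,7-9
```

Listing the expanded numbers before running -ignore can help confirm that a range covers only the intended actions.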
Example:
$ oozie jobs -oozie http://localhost:11000/oozie -localtime -len 2 -filter status=RUNNING
.
Job Id                          Workflow Name         Status     Run  User      Group     Created                Started                 Ended
.----------------------------------------------------------------------------------------------------------------------------------------------------------------
4-20090527151008-oozie-joe      hadoopel-wf           RUNNING    0    joe       other     2009-05-27 15:34 +0530 2009-05-27 15:34 +0530  -
0-20090527151008-oozie-joe      hadoopel-wf           RUNNING    0    joe       other     2009-05-27 15:15 +0530 2009-05-27 15:15 +0530  -
.----------------------------------------------------------------------------------------------------------------------------------------------------------------
The jobs sub-command will display information about multiple jobs.
The offset and len options specify the offset and number of jobs to display; the default values are 1 and 100, respectively.
The localtime option displays times in local time; if it is not specified, times are displayed in GMT.
The verbose option gives more detailed information for each job.
A filter can be specified after all options.
The filter option syntax is: [NAME=VALUE][;NAME=VALUE]*.
Valid filter names are:
The query will do an AND among all the filter names. The query will do an OR among all the filter values for the same name. Multiple values must be specified as different name value pairs.
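The AND/OR rule above means that repeating a name widens the match, while adding a different name narrows it. The following sketch builds such a filter string from separate NAME=VALUE pairs. The helper `join_filter` is hypothetical, and `user` is assumed here to be a valid filter name.

```shell
# join_filter: join NAME=VALUE arguments with ';' (hypothetical helper).
join_filter() {
    filter=""
    for pair in "$@"; do
        if [ -z "$filter" ]; then
            filter=$pair
        else
            filter="$filter;$pair"
        fi
    done
    echo "$filter"
}

# status appears twice, so its two values are ORed;
# user (an assumed filter name) is ANDed with that result.
join_filter status=RUNNING status=KILLED user=joe
```

When the resulting string is passed to -filter on a command line, it needs to be quoted, because an unquoted semicolon ends the shell command.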
* This feature is only supported in Oozie 2.0 or later.
Example:
$ oozie jobs -oozie http://localhost:11000/oozie -jobtype coordinator
.
Job ID                                   App Name                          Status     Freq  Unit    Started           Next Materialized
.----------------------------------------------------------------------------------------------------------------------------------------------------------------
0004165-100531045722929-oozie-wrkf-C     smaggs-xaggsptechno-coordinator   SUCCEEDED  1440  MINUTE  2010-05-27 00:00  2010-05-29 00:00
.----------------------------------------------------------------------------------------------------------------------------------------------------------------
0003823-100531045722929-oozie-wrkf-C     coordcal2880minutescurrent        SUCCEEDED  2880  MINUTE  2010-02-01 16:30  2010-02-05 16:30
.----------------------------------------------------------------------------------------------------------------------------------------------------------------
The jobtype option specifies the job type to display; the default value is 'wf'. To see coordinator jobs, use the value 'coordinator'.
* This feature is only supported in Oozie 3.0 or later.
Example:
$ oozie jobs -oozie http://localhost:11000/oozie -jobtype bundle
Job ID                                   Bundle Name   Status         Kickoff           Created           User  Group
.------------------------------------------------------------------------------------------------------------------------------------
0000027-110322105610515-oozie-chao-B     BUNDLE-TEST   RUNNING        2012-01-15 00:24  2011-03-22 18:07  joe   users
.------------------------------------------------------------------------------------------------------------------------------------
0000001-110322105610515-oozie-chao-B     BUNDLE-TEST   RUNNING        2012-01-15 00:24  2011-03-22 18:06  joe   users
.------------------------------------------------------------------------------------------------------------------------------------
0000000-110322105610515-oozie-chao-B     BUNDLE-TEST   DONEWITHERROR  2012-01-15 00:24  2011-03-22 17:58  joe   users
.------------------------------------------------------------------------------------------------------------------------------------
The jobtype option specifies the job type to display; the default value is 'wf'. To see bundle jobs, use the value 'bundle'.
This command-line query lets you query a large number of jobs directly, using a rich set of filters. The jobs need to have a parent Bundle, and the command performs a deep query to return all the workflows that satisfy your filters. The results display the relevant job IDs, app names, and error messages (if any), and are most helpful when you need to monitor many jobs and get a summary without typing multiple queries.
This feature is only supported in Oozie 3.3 or later.
Example 1:
$ oozie jobs -oozie http://localhost:11000/oozie -bulk bundle=bundle-app-1
.
Bundle Name   Bundle ID                             Coord Name  Coord Action ID                         External ID                           Status  Created Time          Error Message
.-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bundle-app-1  0000000-130408151805946-oozie-chit-B  coord-1     0000001-130408151805946-oozie-chit-C@1  0000002-130408151805946-oozie-chit-W  KILLED  2013-04-08 22:20 GMT  null
.-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Example 2: This example illustrates giving multiple arguments and -verbose option.
NOTE: The filter string after -bulk should be enclosed within quotes
.
$ oozie jobs -oozie http://localhost:11000/oozie -bulk 'bundle=bundle-app-2;actionstatus=SUCCEEDED' -verbose
.
Bundle Name : bundle-app-2
.------------------------------------------------------------------------------------------------------------------------------------
Bundle ID        : 0000000-130422184245158-oozie-chit-B
Coordinator Name : coord-1
Coord Action ID  : 0000001-130422184245158-oozie-chit-C@1
Action Status    : SUCCEEDED
External ID      : 0000002-130422184245158-oozie-chit-W
Created Time     : 2013-04-23 01:43 GMT
User             : user_xyz
Error Message    : -
.------------------------------------------------------------------------------------------------------------------------------------
You can type 'help' to view usage for the CLI options and filters available. Namely:
$ oozie jobs <OPTIONS> : jobs status
                 -bulk <arg>  key-value pairs to filter bulk jobs response. e.g.
                              bundle=<B>\;coordinators=<C>\;actionstatus=<S>\;
                              startcreatedtime=<SC>\;endcreatedtime=<EC>\;
                              startscheduledtime=<SS>\;endscheduledtime=<ES>\;
                              coordinators and actionstatus can be multiple comma
                              separated values. "bundle" and "coordinators" are 'names'
                              of those jobs. Bundle name is mandatory, other params
                              are optional
As with the usual jobs filter, different filter arguments here should be separated by a semicolon (;).
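The bulk filter combines two separators: semicolons between keys and commas between multiple values of one key. The sketch below assembles such a string and enforces the mandatory bundle name. The helper `bulk_filter` and the coordinator name `coord-2` are hypothetical.

```shell
# bulk_filter BUNDLE [COORDINATORS] [ACTIONSTATUSES]
# Hypothetical helper: COORDINATORS and ACTIONSTATUSES are comma-separated lists.
bulk_filter() {
    if [ -z "$1" ]; then
        echo "bundle name is mandatory" >&2
        return 1
    fi
    spec="bundle=$1"
    if [ -n "$2" ]; then spec="$spec;coordinators=$2"; fi
    if [ -n "$3" ]; then spec="$spec;actionstatus=$3"; fi
    echo "$spec"
}

bulk_filter bundle-app-2 coord-1,coord-2 SUCCEEDED,KILLED
```

The printed string can then be passed to -bulk inside quotes, in line with the NOTE above about enclosing the filter string within quotes.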
Example:
$ oozie admin -oozie http://localhost:11000/oozie -status
.
Safemode: OFF
It returns the current status of the Oozie system.
* This feature is only supported in Oozie 2.0 or later.
Example:
$ oozie admin -oozie http://localhost:11000/oozie -systemmode [NORMAL|NOWEBSERVICE|SAFEMODE]
.
Safemode: ON
It changes the system mode of the Oozie system and returns the resulting status.
Example:
$ oozie admin -oozie http://localhost:11000/oozie -version
.
Oozie server build version: 2.0.2.1-0.20.1.3092118008--
It returns the Oozie server build version.
* This feature is intended for administrator debugging purposes.
Example:
$ oozie admin -oozie http://localhost:11000/oozie -queuedump
[Server Queue Dump]:
(coord_action_start,1),(coord_action_start,1),(coord_action_start,1)
(coord_action_ready,1)
(action.check,2)
It returns the commands currently queued on the Oozie server.
Example:
$ oozie admin -oozie http://hostA:11000/oozie -servers
hostA : http://hostA:11000/oozie
hostB : http://hostB:11000/oozie
hostC : http://hostC:11000/oozie
It returns a list of available Oozie Servers. This is useful when Oozie is configured for High Availability; if not, it will simply return the one Oozie Server.
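In a High Availability setup, the server list may be useful as input to other scripts. The sketch below extracts only the URLs, assuming each output line has the `host : url` layout shown in the example. The helper `server_urls` is hypothetical, and the sample lines stand in for real command output.

```shell
# server_urls: read "-servers" style output on stdin and print only the URLs.
# Hypothetical helper; assumes one "host : url" pair per line.
server_urls() {
    sed -n 's/^[^:]* : \(.*\)$/\1/p'
}

# Sample lines copied from the example output above.
printf '%s\n' \
    'hostA : http://hostA:11000/oozie' \
    'hostB : http://hostB:11000/oozie' | server_urls
```

Each extracted URL could then be passed to the -oozie option of a later command.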
Example:
$ oozie validate myApp/workflow.xml
.
Error: Invalid workflow-app, org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'xend'. One of '{"uri:oozie:workflow:0.1":decision, "uri:oozie:workflow:0.1":fork, "uri:oozie:workflow:0.1":join, "uri:oozie:workflow:0.1":kill, "uri:oozie:workflow:0.1":action, "uri:oozie:workflow:0.1":end}' is expected.
It performs an XML Schema validation on the specified workflow XML file.
This command is used to get the list of available sharelibs. If the name of a sharelib is passed as an argument (regex supported), then all corresponding files are also listed.
$ oozie admin -oozie http://localhost:11000/oozie -shareliblist
[Available ShareLib]
    oozie
    hive
    distcp
    hcatalog
    sqoop
    mapreduce-streaming
    pig
$ oozie admin -oozie http://localhost:11000/oozie -sharelib pig*
[Available ShareLib]
    pig
        hdfs://localhost:9000/user/purushah/share/lib/lib_20131114095729/pig/pig.jar
        hdfs://localhost:9000/user/purushah/share/lib/lib_20131114095729/pig/piggybank.jar
This command makes the Oozie server(s) pick up the latest version of the sharelib present under the oozie.service.WorkflowAppService.system.libpath directory, based on the sharelib directory timestamp, or reload the sharelib metafile if one is configured. The main purpose is to update the sharelib on the Oozie server without restarting it.
$ oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
[ShareLib update status]
    host = host1:8080
    status = Successful
    sharelibDirOld = hdfs://localhost:9000/user/purushah/share/lib/lib_20131114095729
    sharelibDirNew = hdfs://localhost:9000/user/purushah/share/lib/lib_20131120163343
    host = host2:8080
    status = Successful
    sharelibDirOld = hdfs://localhost:9000/user/purushah/share/lib/lib_20131114095729
    sharelibDirNew = hdfs://localhost:9000/user/purushah/share/lib/lib_20131120163343
    host = host3:8080
    status = Server not found
Sharelib update for metafile configuration.
$ oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
[ShareLib update status]
    host = host1
    status = Successful
    sharelibMetaFile = hdfs://localhost:9000/user/purushah/sharelib_metafile.property
    sharelibMetaFileOldTimeStamp = Thu, 21 Nov 2013 00:40:04 GMT
    sharelibMetaFileNewTimeStamp = Thu, 21 Nov 2013 01:01:25 GMT
    host = host2
    status = Successful
    sharelibMetaFile = hdfs://localhost:9000/user/purushah/sharelib_metafile.property
    sharelibMetaFileOldTimeStamp = Thu, 21 Nov 2013 00:40:04 GMT
    sharelibMetaFileNewTimeStamp = Thu, 21 Nov 2013 01:01:25 GMT
This set of sla commands is deprecated as of Oozie 4.0, which has a newer SLA monitoring system.
Example:
$ oozie sla -oozie http://localhost:11000/oozie -len 3
.
<sla-message>
  <event>
    <sequence-id>1</sequence-id>
    <registration>
      <sla-id>0000000-130130150445097-oozie-joe-C@1</sla-id>
      <app-type>COORDINATOR_ACTION</app-type>
      <app-name>aggregator-sla-app</app-name>
      <user>joe</user>
      <group />
      <parent-sla-id>null</parent-sla-id>
      <expected-start>2013-01-30T23:00Z</expected-start>
      <expected-end>2013-01-30T23:30Z</expected-end>
      <status-timestamp>2013-02-08T18:51Z</status-timestamp>
      <notification-msg>Notifying User for 2013-01-30T23:00Z nominal time</notification-msg>
      <alert-contact>www@yahoo.com</alert-contact>
      <dev-contact>abc@yahoo.com</dev-contact>
      <qa-contact>abc@yahoo.com</qa-contact>
      <se-contact>abc@yahoo.com</se-contact>
      <alert-percentage>80</alert-percentage>
      <alert-frequency>LAST_HOUR</alert-frequency>
      <upstream-apps />
      <job-status>CREATED</job-status>
      <job-data />
    </registration>
  </event>
  <event>
    <sequence-id>2</sequence-id>
    <status>
      <sla-id>0000000-130130150445097-oozie-joe-C@1</sla-id>
      <status-timestamp>2013-01-30T23:05Z</status-timestamp>
      <job-status>STARTED</job-status>
      <job-data />
    </status>
  </event>
  <event>
    <sequence-id>3</sequence-id>
    <status>
      <sla-id>0000000-130130150445097-oozie-joe-C@1</sla-id>
      <status-timestamp>2013-01-30T23:30Z</status-timestamp>
      <job-status>SUCCEEDED</job-status>
      <job-data />
    </status>
  </event>
  <last-sequence-id>3</last-sequence-id>
</sla-message>
The offset and len options specify the offset and number of sla events to display; the default values are 1 and 100, respectively.
The offset corresponds to the sequence ID of an event.
The maximum value of len is limited by an Oozie server setting, which defaults to 1000. To get more than 1000 events, it is necessary to iterate over several requests, based on the number of records you want.
The return message is in XML format, which can be easily consumed by SLA users.
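When iterating over several requests, the `<last-sequence-id>` element of each response indicates how far the previous request got. The sketch below extracts that value from a response; the helper `last_sequence_id` is hypothetical and the sample is a shortened response. How the value maps onto the next -offset (inclusive or exclusive) is not stated here and should be checked against your server.

```shell
# last_sequence_id: read an sla response on stdin and print the value of
# its <last-sequence-id> element (hypothetical helper).
last_sequence_id() {
    sed -n 's:.*<last-sequence-id>\([0-9][0-9]*\)</last-sequence-id>.*:\1:p'
}

# Shortened sample response, used here instead of a live server.
sample='<sla-message> <event> <sequence-id>3</sequence-id> </event> <last-sequence-id>3</last-sequence-id> </sla-message>'
printf '%s\n' "$sample" | last_sequence_id
```

A loop could feed this value into the -offset option of the following request and stop once a response contains no further events.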
* This feature is only supported in Oozie 2.0 or later.
Example: Get the SLA event with sequenceID = 3 (Note that offset corresponds to sequence ID)
$ oozie sla -oozie http://localhost:11000/oozie -offset 2 -len 1
.
<sla-message>
  <event>
    <sequence-id>3</sequence-id>
    <status>
      <sla-id>0000000-130130150445097-oozie-joe-C@1</sla-id>
      <status-timestamp>2013-01-30T23:05Z</status-timestamp>
      <job-status>SUCCEEDED</job-status>
      <job-data />
    </status>
  </event>
  <last-sequence-id>3</last-sequence-id>
</sla-message>
* This feature is only supported in Oozie 2.0 or later.
Example:
$ oozie sla -filter jobid=0000000-130130150445097-oozie-joe-C@1\;appname=aggregator-sla-app -len 1 -oozie http://localhost:11000/oozie
<sla-message>
  <event>
    <sequence-id>1</sequence-id>
    <registration>
      <sla-id>0000000-130130150445097-oozie-joe-C@1</sla-id>
      <app-type>COORDINATOR_ACTION</app-type>
      <app-name>aggregator-sla-app</app-name>
      <user>joe</user>
      <group />
      <parent-sla-id>null</parent-sla-id>
      <expected-start>2010-01-01T02:00Z</expected-start>
      <expected-end>2010-01-01T03:00Z</expected-end>
      <status-timestamp>2013-01-30T23:05Z</status-timestamp>
      <notification-msg>Notifying User for 2010-01-01T01:00Z nominal time</notification-msg>
      <alert-contact>www@yahoo.com</alert-contact>
      <dev-contact>abc@yahoo.com</dev-contact>
      <qa-contact>abc@yahoo.com</qa-contact>
      <se-contact>abc@yahoo.com</se-contact>
      <alert-percentage>80</alert-percentage>
      <alert-frequency>LAST_HOUR</alert-frequency>
      <upstream-apps />
      <job-status>CREATED</job-status>
      <job-data />
    </registration>
  </event>
</sla-message>
A filter can be specified after all options.
The filter option syntax is: [NAME=VALUE][\;NAME=VALUE]*. Note that the \ before the semicolon escapes it, so that the shell does not treat the semicolon as a command separator.
Valid filter names are:
The query will do an AND among all the filter names. The query will do an OR among all the filter values for the same name. Multiple values must be specified as different name value pairs.
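The backslash in the filter syntax is only there for the shell; quoting the whole filter is an equivalent spelling. The short sketch below, which uses `printf` in place of the oozie command, shows that both forms produce the same single argument.

```shell
# Both spellings hand the same single argument to the command.
escaped=$(printf '%s' jobid=0000000-130130150445097-oozie-joe-C@1\;appname=aggregator-sla-app)
quoted=$(printf '%s' 'jobid=0000000-130130150445097-oozie-joe-C@1;appname=aggregator-sla-app')

if [ "$escaped" = "$quoted" ]; then
    echo "identical: $quoted"
fi
```

Without either the backslash or the quotes, the shell would end the command at the semicolon and try to run `appname=aggregator-sla-app` as a separate command.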
Syntax:
$ oozie pig -file PIG-SCRIPT -config OOZIE-CONFIG [-Dkey=value] [-Pkey=value] [-X [-Dkey=value opts for Launcher/Job configuration] [Other opts to pass to Pig]]
Example:
$ oozie pig -file pigScriptFile -config job.properties -Dfs.default.name=hdfs://localhost:8020 -PINPUT=/user/me/in -POUTPUT=/user/me/out -X -Dmapred.job.queue.name=UserQueue -param_file params
.
job: 14-20090525161321-oozie-joe-W
.
$ cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/pig/lib/
The parameters for the job must be provided in a Java Properties file (.properties). The jobtracker, namenode, and libpath must be
specified in this file. pigScriptFile is a local file. All jar files (including the pig jar file) and all other files needed by the pig
job (e.g., the parameter file in the above example) need to be uploaded onto HDFS under libpath beforehand. In addition to a parameter file,
script parameters can be specified via -Pkey=value. The workflow.xml will be created internally by the Oozie server. Users can get
the workflow.xml from the console or the command line (-definition). The -D options passed after the -X will be placed into the generated
workflow's
The job will be created and run right away.
Syntax:
$ oozie hive -file HIVE-SCRIPT -config OOZIE-CONFIG [-Dkey=value] [-Pkey=value] [-X [-Dkey=value opts for Launcher/Job configuration] [Other opts to pass to Hive]]
Example:
$ oozie hive -file hiveScriptFile -config job.properties -Dfs.default.name=hdfs://localhost:8020 -PINPUT=/user/me/in -POUTPUT=/user/me/out -X -Dmapred.job.queue.name=UserQueue -v
.
job: 14-20090525161321-oozie-joe-W
.
$ cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/hive/lib/
The parameters for the job must be provided in a Java Properties file (.properties). The jobtracker, namenode, and libpath must be
specified in this file. hiveScriptFile is a local file. All jar files (including the hive jar file) and all other files needed by the
hive job need to be uploaded onto HDFS under libpath beforehand. Script parameters can be specified via -Pkey=value. The
workflow.xml will be created internally by the Oozie server. Users can get the workflow.xml from the console or the command line (-definition).
The -D options passed after the -X will be placed into the generated workflow's
The job will be created and run right away.
Syntax:
$ oozie sqoop [-Dkey=value] -command completeSqoopCommand -config OOZIE-CONFIG [-X [-Dkey=value opts for Launcher/Job configuration]]
Example:
$ oozie sqoop -oozie http://localhost:11000/oozie -Dfs.default.name=hdfs://localhost:8020 -command import --connect jdbc:mysql://localhost:3306/oozie --username oozie --password oozie --table WF_JOBS --target-dir '/user/${wf:user()}/${examplesRoot}/output-data/sqoop' -m 1 -config job.properties -X -Dmapred.job.queue.name=default
.
job: 14-20090525161322-oozie-joe-W
.
Sqoop Freeform Example:
$ oozie sqoop -oozie http://localhost:11000/oozie -command import --connect jdbc:mysql://localhost:3306/oozie --username oozie --password oozie --query "SELECT a.id FROM WF_JOBS a WHERE \$CONDITIONS" --target-dir '/user/${wf:user()}/${examplesRoot}/output-data/sqoop' -m 1 -config job.properties -X -Dmapred.job.queue.name=default
.
job: 14-20090525161321-oozie-joe-W
.
$ cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/sqoop/lib/
The parameters for the job must be provided in a Java Properties file (.properties). The jobtracker, namenode, and
libpath must be specified in this file. All jar files (including the sqoop jar file) and all other files needed by the
sqoop job need to be uploaded onto HDFS under libpath beforehand. The workflow.xml will be created internally by the
Oozie server. Users can get the workflow.xml from the console or the command line (-definition).
The -D options passed after the -X will be placed into the generated workflow's
The job will be created and run right away.
Note: in the freeform query example, the "select" query itself must be double-quoted and the "$" sign in the query must be escaped with "\". All other variables containing "$" within the sqoop command are escaped by single-quoting the variable itself, like the value of "--target-dir". All "-D" arguments before "-X" that override a given property must be placed before the "-command" argument.
The Info sub-command provides a convenient place for Oozie to display miscellaneous information.
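The two quoting styles in the note can be checked in a plain shell before they are used in a sqoop command. This small sketch, with the illustrative variable names `query` and `target`, prints what the shell would actually pass along in each case.

```shell
# Double quotes with \$ keep a literal dollar sign in the query.
query="SELECT a.id FROM WF_JOBS a WHERE \$CONDITIONS"
# Single quotes stop the shell from interpreting ${...} at all.
target='/user/${wf:user()}/${examplesRoot}/output-data/sqoop'

printf '%s\n' "$query" "$target"
```

Both values come through with their `$` signs intact, which is what lets the later stages expand `$CONDITIONS` and the `${...}` expressions instead of the local shell.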
Example:
$ oozie info -timezones
.
The format is "SHORT_NAME (ID)"
Give the ID to the -timezone argument
GMT offsets can also be used (e.g. GMT-07:00, GMT-0700, GMT+05:30, GMT+0530)
Available Time Zones :
      SST (Pacific/Midway)
      NUT (Pacific/Niue)
      SST (Pacific/Pago_Pago)
      SST (Pacific/Samoa)
      SST (US/Samoa)
      HAST (America/Adak)
      HAST (America/Atka)
      HST (HST)
      ...
The -timezones option will print out a (long) list of all available time zones.
These IDs (the text in the parentheses) are what should be used for the -timezone TIME_ZONE_ID option in the job and jobs sub-commands.
Example:
$ oozie mapreduce -oozie http://localhost:11000/oozie -config job.properties
The parameters must be in a Java Properties file (.properties). This file must be specified for a map-reduce job. The properties file must specify the mapred.mapper.class, mapred.reducer.class, mapred.input.dir, mapred.output.dir, oozie.libpath, mapred.job.tracker, and fs.default.name properties.
The map-reduce job will be created and submitted. All jar files and all other files needed by the mapreduce job need to be uploaded onto HDFS under libpath beforehand. The workflow.xml will be created internally by the Oozie server. Users can get the workflow.xml from the console or the command line (-definition).