Saturday, February 18, 2012

Talend Open Studio: Scheduling and command line execution



In this tutorial we will look at how to export a Talend Open Studio ETL job as an autonomous job and schedule it via crontab. To follow along, you should be familiar with the basic functionality of Talend Open Studio for Data Integration.


How to export a job


Right-click on your job and choose Export job.


In the export settings:
  • define the export folder and file name
  • define the Job Version
  • set the Export type to Autonomous Job
  • tick Export dependencies
  • define the Context and tick Apply to children
Click on Finish and your job will be exported as a zip file.


How to execute the job from the command line


Navigate to the folder where the zip file was exported to and unzip it. Then navigate to:


<jobname>_<version>/<jobname>

Within this folder you will find an executable shell script (<jobname>_run.sh) and/or a batch file (<jobname>_run.bat).

Open this file in a text editor.


Note that the context is defined as a command line argument. It is currently set to the value which you specified on export, but you can change it any time to another value here.

To execute the job on the command line simply navigate to this folder and run:
sh ./<jobname>_run.sh
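If you run several exported jobs this way, the two steps above (change into the job folder, then call the run script) can be wrapped in a small helper. This is only a minimal sketch: the function name run_exported_job is made up for illustration, and it assumes the zip file has already been unzipped and follows the folder layout described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Talend): run an exported job from its
# unzipped folder.
# Usage: run_exported_job <unzip_dir> <jobname> <version> [extra args...]
run_exported_job() {
    base_dir=$1
    job_name=$2
    job_version=$3
    shift 3

    # Layout described above: <jobname>_<version>/<jobname>/<jobname>_run.sh
    job_dir="$base_dir/${job_name}_${job_version}/$job_name"
    run_script="$job_dir/${job_name}_run.sh"

    if [ ! -f "$run_script" ]; then
        echo "run script not found: $run_script" >&2
        return 1
    fi

    # Run in a subshell so the caller's working directory stays unchanged.
    # Any extra arguments are handed to the run script as they are.
    ( cd "$job_dir" && sh "./${job_name}_run.sh" "$@" )
}
```

Calling run_exported_job with the unzip folder, the job name and the version then does the same as navigating to the folder by hand and running the script there.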



How to execute a job with specific context variables

As you might have guessed, the approach is very similar to the one shown above. We just add command line arguments:

sh ./<jobname>_run.sh --context_param variable1=value1 --context_param variable2=value2
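When there are many variables, typing --context_param in front of each one gets tedious. The following sketch builds the argument list from plain name=value pairs. The function name run_with_params is hypothetical, and the script path is whatever your export produced; the only thing taken from the command above is the --context_param argument itself.

```shell
#!/bin/sh
# Hypothetical helper: call a run script with one --context_param argument
# per name=value pair.
# Usage: run_with_params <run_script> name1=value1 [name2=value2 ...]
run_with_params() {
    run_script=$1
    shift
    remaining=$#

    # Rebuild "$@" in place: append "--context_param <pair>" for the first
    # pair, then drop the original. Values containing spaces stay intact
    # because each pair remains a single quoted argument.
    while [ "$remaining" -gt 0 ]; do
        set -- "$@" --context_param "$1"
        shift
        remaining=$((remaining - 1))
    done

    sh "$run_script" "$@"
}
```

For example, run_with_params ./myjob_run.sh variable1=value1 "variable2=some value" calls the script with two --context_param arguments (myjob_run.sh is a placeholder name).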



How to change the default context variables

If you ever need to change the value of any of your context variables, you can find the property file for each context in:

<jobname>_<version>/<jobname>/<projectname>/<jobname>_<version>/contexts/

Which in my case is:

Open one of them to understand how they are structured:

As you can see it is extremely easy to change these values.
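If you would rather script such a change than edit the file by hand, something like the following could work. It is a sketch under one assumption: that the context file is a plain property file with one name=value pair per line. The function name set_context_value and the file names in the example are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: set one variable in a name=value property file.
# Replaces the line if the name exists, otherwise appends a new line.
# Usage: set_context_value <properties_file> <name> <value>
set_context_value() {
    ctx_file=$1
    ctx_key=$2
    ctx_value=$3
    ctx_tmp="$ctx_file.tmp.$$"

    # Note: awk -v interprets backslash escapes, so values containing
    # backslashes (e.g. Windows paths) need extra care.
    awk -v k="$ctx_key" -v v="$ctx_value" '
        BEGIN { FS = "="; found = 0 }
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$ctx_file" > "$ctx_tmp" && mv "$ctx_tmp" "$ctx_file"
}
```

A call such as set_context_value contexts/Default.properties db_host localhost would then update a single value and leave every other line untouched (both the file name and the variable name are placeholders).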


How to schedule a job

If you make use of context variables regularly, then it is best to include them directly in the *_run.sh or *_run.bat file. Just open the file with your favourite text editor and add the variables after the context argument similar to this one:

Ideally though, especially if you are dealing with dates, you want to make this more dynamic, like this one:
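One way to make a date dynamic is to let the shell compute it at run time. The sketch below prints a --context_param argument carrying today's date. The variable name run_date is an assumption, so use a context variable your job actually defines. In the run script itself, the same $(date +%Y-%m-%d) expression could be placed after the context argument.

```shell
#!/bin/sh
# Hypothetical helper: print a --context_param argument with today's date.
# run_date is a made-up context variable name.
dated_param() {
    # date +%Y-%m-%d is POSIX and prints the date as YYYY-MM-DD.
    printf '%s\n' "--context_param run_date=$(date +%Y-%m-%d)"
}

# Possible use (unquoted on purpose so it splits into two arguments):
# sh ./<jobname>_run.sh $(dated_param)
```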
On Linux, use crontab to schedule a job:

crontab -e

And then set it up similar to the one shown below:
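As a rough illustration, a crontab line consists of five time fields (minute, hour, day of month, month, day of week) followed by the command. The sketch below only prints such a line so you can paste it into the editor opened by crontab -e. The function name cron_entry and both paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line that runs a job script once a
# day and appends its output to a log file.
# Usage: cron_entry <hour> <minute> <run_script> <logfile>
cron_entry() {
    # crontab field order is: minute hour day-of-month month day-of-week
    printf '%s %s * * * sh %s >> %s 2>&1\n' "$2" "$1" "$3" "$4"
}

# Example with made-up paths: every day at 02:30.
cron_entry 2 30 /opt/etl/myjob_0.1/myjob/myjob_run.sh /var/log/etl/myjob.log
```

This prints 30 2 * * * sh /opt/etl/myjob_0.1/myjob/myjob_run.sh >> /var/log/etl/myjob.log 2>&1. Keep in mind that the % character has a special meaning inside a crontab line, so date formats such as +%Y-%m-%d are better kept in the run script than in the crontab entry.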

On Windows you can use the Windows Task Scheduler. As it has a GUI, setting it up is quite straightforward and hence will not be explained here.

11 comments:

  1. thats a nice detailed explanation, too bad Talend Open Studio does not not its own scheduler

    1. Thanks a lot for your feedback! Much appreciated!

  2. Thanks for the nice explanation Diethard

  3. Can you please provide how to work on windows.. i tried but its not working

  4. Nice documentation ! useful for beginners :)

  5. Excellent, thank you. I've been looking for this answer.

  6. I have a tMongoDBOutput on Talend it works well but on the .bat file it gives NULLPointerException

  7. Thanks! We found this very helpful!