Tuesday, September 22, 2009

Leveraging the Pentaho Data Integration Carte Server for EAI

The Carte Server that comes bundled with Pentaho Data Integration (PDI) lets external applications interface with Pentaho transformations and jobs. The operations exposed by Carte, though limited, can be used to integrate PDI capabilities into your business processes. This can be done using the HTTP endpoints provided by BPEL engines such as Intalio, IBM BPEL4J and Oracle BPEL Process Manager. You may use this for Enterprise Application Integration, to implement event-driven workflows, or to monitor your data integration jobs and transformations with your SOA monitoring service.

Carte consists of a set of servlets that run in the lightweight servlet container Jetty. A basic web-based console exposes the operations provided by Carte. You can reach it by starting the Carte Server and pointing your browser to http://<Carte Host>:<Carte Port>/kettle/status/. These are the same operations that the PDI ETL engine uses to execute clustered transformations.

The operations exposed by the Carte web server are listed below. This list was compiled by inspecting and debugging the Pentaho code and executing the requests. References to the Pentaho Java classes have been added for those who wish to explore further.

[pdi code reference : org.pentaho.di.www.WebServer]

For the purpose of this article, we have four transformations and one job that are configured to be executed remotely on a Carte server running at http://dataalp.com:8083


The following are the requests that the Carte web server handles:

Note : All the requests take a request parameter : xml. When this parameter is set to ‘Y’, the response is returned in XML format. This response format is particularly useful in EAI implementations.
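Since every request below follows the same pattern (an operation name under /kettle/, an optional name parameter and the optional xml=Y flag), the URLs can be built by a small helper. The following is only a sketch in Python; the function name carte_url and its arguments are made up for illustration and are not part of Carte or PDI.

```python
from urllib.parse import urlencode


def carte_url(base, operation, name=None, xml=True):
    """Build a Carte request URL such as <base>/kettle/status?xml=Y.

    `carte_url` is a hypothetical helper, not something Carte provides.
    """
    params = {}
    if name is not None:
        params["name"] = name  # urlencode turns spaces into '+'
    if xml:
        params["xml"] = "Y"
    query = urlencode(params)
    url = f"{base.rstrip('/')}/kettle/{operation}"
    return f"{url}?{query}" if query else url


print(carte_url("http://dataalp.com:8083", "status"))
print(carte_url("http://dataalp.com:8083", "startExec", name="Row generator test"))
```

The second call produces the same URL as the startExec sample later in this article, with the spaces in the transformation name encoded as '+'.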

1. Get the status of the Carte Server

Request parameters :

xml=Y (optional)

Sample URL : http://dataalp.com:8083/kettle/status?xml=Y

Response :

The status of all the transformations and jobs registered to run on this Carte server.

The response contains the following information for each transformation registered to run on the Carte server


<transstatus>
<transname>Row generator test</transname>
<status_desc>Running</status_desc>
<error_desc/>
<paused>N</paused>
<stepstatuslist>
</stepstatuslist>
<logging_string><![CDATA[]]></logging_string>
</transstatus>

[pdi code reference : GetStatusServlet]
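For an EAI client, the XML form of this response is easy to turn into plain data. The sketch below parses the sample shown above with Python's standard library. The element that wraps the list of transformations in the full server response is not shown in this article, so the code does not assume one and simply collects every transstatus element it finds. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The sample response from this article, used here as test input.
SAMPLE = """<transstatus>
<transname>Row generator test</transname>
<status_desc>Running</status_desc>
<error_desc/>
<paused>N</paused>
<stepstatuslist>
</stepstatuslist>
<logging_string><![CDATA[]]></logging_string>
</transstatus>"""


def parse_transstatus(elem):
    """Convert one <transstatus> element into a dictionary."""
    return {
        "name": elem.findtext("transname", default=""),
        "status": elem.findtext("status_desc", default=""),
        "error": elem.findtext("error_desc", default=""),
        "paused": elem.findtext("paused", default="N") == "Y",
    }


def parse_status_document(xml_text):
    """Return one dictionary per <transstatus> element in the document."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself, so a bare <transstatus> works too.
    return [parse_transstatus(e) for e in root.iter("transstatus")]


for trans in parse_status_document(SAMPLE):
    print(trans["name"], trans["status"], trans["paused"])
```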

2. Get the Status of a Transformation

Request parameters :

name = (mandatory)

xml=Y (optional)

Sample URL : http://dataalp.com:8083/kettle/transStatus/?name=Process+Customer+Addition

Response : The status of the transformation, which includes the status of each individual step of the transformation, the number of lines output, read, written, rejected, deleted or updated, the exit status, the timestamp of the run, etc.

[pdi code reference : GetTransStatusServlet]
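A workflow engine will typically call this operation repeatedly until the transformation reaches a state it cares about. The sketch below shows one way to do that in Python. It assumes that the XML response has a status_desc element directly under its root, as in the transstatus sample shown for the server status; the status strings to wait for are passed in by the caller, because this article only shows "Running". The function wait_for_status and its parameters are hypothetical.

```python
import time
import xml.etree.ElementTree as ET


def wait_for_status(fetch_status_xml, wanted, interval=2.0, max_polls=30,
                    sleep=time.sleep):
    """Poll until <status_desc> is one of `wanted`, then return it.

    fetch_status_xml: a callable returning the transStatus XML response,
    for example a function that requests .../kettle/transStatus/?name=...&xml=Y.
    """
    status = None
    for attempt in range(max_polls):
        root = ET.fromstring(fetch_status_xml())
        status = root.findtext("status_desc")
        if status in wanted:
            return status
        if attempt < max_polls - 1:
            sleep(interval)  # wait before asking Carte again
    raise TimeoutError(f"gave up waiting; last status was {status!r}")
```

Passing the fetch function in as an argument keeps the polling logic independent of the HTTP client you use, and makes it easy to test without a running Carte server.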

3. Prepare a Transformation for execution

This operation prepares a transformation for execution. Use this if the initialization of the transformation takes a long time. After the prepare operation completes, the state of the transformation is WAITING. The transformation starts executing when the startExec operation is requested.

Request Parameters :

name = (mandatory)

xml=Y (optional)

Sample URL : http://dataalp.com:8083/kettle/prepareExec?name=Process+Customer+Addition

Response:

Status indicating if the Transformation Prepare was successful


<webresult>
<result>OK</result>
<message/>
</webresult>

[pdi code reference : PrepareExecutionTransServlet]

4. Start the execution of a transformation

This operation executes a transformation that has already been prepared. It must be called after prepareExec; otherwise it results in an error. This operation is not exposed directly from the Carte Console. To prepare and execute a transformation in a single request, use startTrans.

Request Parameters:

name = (mandatory)

xml=Y (optional)

Sample URL: http://dataalp.com:8083/kettle/startExec?name=Row+generator+test&xml=Y

Response:

Status indicating if the transformation was successfully started


<webresult>
<result>OK</result>
<message/>
</webresult>

[pdi code reference : StartExecutionTransServlet]
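Because startExec only works after prepareExec, a client has to issue the two requests in order and check the webresult of each. The following Python sketch shows that sequence. The helper names (http_get, call, prepare_and_start) are made up for this example, and the code only assumes what the samples above show: a successful response contains OK in the result element. Anything else is treated as a failure.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get(url):
    """Fetch a URL and return the raw response body as bytes."""
    with urlopen(url) as resp:
        return resp.read()


def call(base, operation, name, fetch=http_get):
    """Invoke one Carte operation with xml=Y and return its <message> text."""
    query = urlencode({"name": name, "xml": "Y"})
    root = ET.fromstring(fetch(f"{base}/kettle/{operation}?{query}"))
    result = root.findtext("result", default="")
    message = root.findtext("message", default="")
    if result != "OK":
        raise RuntimeError(f"{operation} failed for {name!r}: {message}")
    return message


def prepare_and_start(base, name, fetch=http_get):
    """Run prepareExec, then startExec; stop at the first failure."""
    call(base, "prepareExec", name, fetch)
    return call(base, "startExec", name, fetch)
```

A call such as prepare_and_start("http://dataalp.com:8083", "Row generator test") would issue the two sample requests shown above, one after the other.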

5. Start a Transformation

It is important to note that only those transformations that are configured to run remotely on this Carte server can be executed. If Carte cannot find the transformation name in the list of registered transformations an error is thrown. Note that this operation is a sequential execution of prepareExec and startExec.

Request Parameters:

name = (mandatory)

xml=Y (optional)

Sample URL: http://dataalp.com:8083/kettle/startTrans?name=Row+generator+test

Response:

Status indicating if the transformation was successfully started


<webresult>
<result>OK</result>
<message>Transformation [Process Customer Addition] was started.</message>
</webresult>

[pdi code reference : StartTransServlet]

6. Pause or Resume a transformation

You can pause a running transformation or resume a paused transformation with this operation. It toggles the transformation between the paused and running states: if the transformation is running, it is paused; if it is paused, it starts running again.

Request Parameters:

name = (mandatory)

xml=Y (optional)

Sample URL: http://dataalp.com:8083/kettle/pauseTrans?name=Row+generator+test

Response:

Status indicating if the transformation was successfully paused / resumed.

<webresult>
<result>OK</result>
<message>Transformation [Row generator test] : pause requested.</message>
</webresult>
<webresult>
<result>OK</result>
<message>Transformation [Row generator test] : resume requested.</message>
</webresult>
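Since the same request both pauses and resumes, the caller only learns which of the two happened from the response message. The sketch below reads that from the webresult in Python. It relies purely on the message wording in the two samples above, which may differ in other PDI versions, so treat the string matching as an assumption. The function name toggle_outcome is hypothetical.

```python
import xml.etree.ElementTree as ET


def toggle_outcome(webresult_xml):
    """Classify a pauseTrans response as 'pausing', 'resuming', 'failed' or 'unknown'."""
    root = ET.fromstring(webresult_xml)
    if root.findtext("result") != "OK":
        return "failed"
    message = root.findtext("message", default="")
    # Wording taken from the sample responses in this article.
    if "pause requested" in message:
        return "pausing"
    if "resume requested" in message:
        return "resuming"
    return "unknown"
```

If you need to be sure of the state before toggling, you could instead read the paused flag from the transformation status first.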

7. Stop a transformation

This operation stops a running transformation.

Request Parameters:

name = (mandatory)

xml=Y (optional)

Sample URL: http://dataalp.com:8083/kettle/stopTrans?name=Row+generator+test

Response:

Status indicating if the transformation was successfully stopped

<webresult>
<result>OK</result>
<message>Transformation [Row generator test] stop requested.</message>
</webresult>

8. Add a transformation

This operation adds a transformation to be managed and executed by the Carte Server. This is the same operation that Kettle uses when you configure a transformation to be executed remotely on a Carte Server. I suggest running the transformation once on the remote Carte Server so that it is registered there and all the operations can be invoked on it. Note that you have to repeat this process each time the transformation is changed.

Request Parameters:

xml=Y (optional)

Request body :

The metadata of the transformation to be added (registered), in XML format.

You can also write a custom deployment application to deploy your transformations to the Carte Server. See Trans.sendXMLToSlaveServer(...) for more details.
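As a starting point for such a deployment tool, the sketch below builds (but does not send) the HTTP POST request in Python. Several things here are assumptions, not facts from this article: the path /kettle/addTrans is guessed by analogy with the other operations, the Content-Type header is a reasonable default, and the exact XML document Carte expects in the body should be checked against Trans.sendXMLToSlaveServer(...) before relying on this.

```python
from urllib.request import Request


def build_add_request(base, trans_xml, path="/kettle/addTrans"):
    """Build a POST request carrying the transformation XML in its body.

    `path` is an assumed endpoint; verify it against the PDI source.
    """
    return Request(
        f"{base}{path}?xml=Y",
        data=trans_xml.encode("utf-8"),                      # request body
        headers={"Content-Type": "text/xml; charset=UTF-8"},  # assumed header
        method="POST",
    )


req = build_add_request("http://dataalp.com:8083", "<transformation/>")
print(req.get_method(), req.full_url)
```

Sending the request is then a matter of passing it to urllib.request.urlopen and parsing the webresult that comes back.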

4 comments:

  1. Great summary. Thanks for putting this together.

  2. Agreed - a great summary which has helped crystalize the Carte concept. Thanks.

  3. Great article, thanks for helping.

  4. A very good summary.
    Thanks
