
Tutorial

Taverna tutorial 2014: Advanced Taverna features includes a tutorial exercise on Looping.

Looping in Taverna

Taverna workflows are inherently data-driven: data returned from one service is pushed directly to downstream services. A Taverna workflow definition does not specify a linear order in which services are invoked; instead it says where each service's data comes from and where it goes. This philosophy lets the user focus on how services are connected together, while the Taverna engine takes care of invoking each service as soon as its required inputs are ready.

In iterative and object-oriented programming languages one often needs to iterate over a set of numbers, objects or strings. Taverna performs these iterations implicitly: if you connect a service that outputs a list to a service whose input port expects a single item, implicit iteration will invoke the second service once for each element of the list and collect the results into new lists on its outputs.
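Implicit iteration behaves much like mapping a function over a list. The Java sketch below illustrates the idea only. Java syntax is close to that of the Beanshell scripts used in Taverna. The class ImplicitIteration and its iterate method are hypothetical names, not part of Taverna.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of implicit iteration (not Taverna's engine code).
class ImplicitIteration {
    // A "service" that expects a single item is invoked once per list element,
    // and the individual results are collected into a new output list.
    static <T, R> List<R> iterate(List<T> inputList, Function<T, R> service) {
        List<R> outputs = new ArrayList<>();
        for (T item : inputList) {
            outputs.add(service.apply(item));
        }
        return outputs;
    }

    public static void main(String[] args) {
        // A single-item service that upper-cases one string.
        Function<String, String> toUpper = s -> s.toUpperCase();
        System.out.println(iterate(List.of("a", "b", "c"), toUpper)); // [A, B, C]
    }
}
```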

There are however situations where you don't have or know the values to iterate over, but where you want some steps of your workflow to be repeated until a certain condition is true, like a do...while construct in programming. 

When could looping be required?

Before we enable looping, let's remind ourselves of one of the most important rules of recursion: we need a base case that ends the iteration, so that it does not go on forever. The base case is stated as a condition, something we want to become true (or false) in the end.

One typical use case for looping is invoking asynchronous services. These are web services (or similar) that follow a three-step pattern. First, you submit the job with its input parameters, which returns a job ID. Second, you check the status of the job using that ID, and keep checking as long as the job is in an active state (running). Finally, when the job has reached a final state, you retrieve the results for the given job ID. The EBI Interproscan example workflow shows how this can be used in practice, but below we'll use a dummy workflow to avoid depending on third-party services.
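The submit, poll and fetch pattern can be sketched in Java as a do...while loop whose base case is the job leaving the RUNNING state. DummyJobService and its methods are hypothetical stand-ins that mirror the createJob, checkStatus and getResults steps. They are not a real service API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an asynchronous service (not a real API).
// Each job needs a fixed number of status checks before it is COMPLETE.
class DummyJobService {
    private final Map<String, Integer> checksLeft = new HashMap<>();
    private int nextId = 1;

    // Step 1: submit the job and get a job ID back.
    String createJob(int checksUntilDone) {
        String jobId = "job-" + nextId++;
        checksLeft.put(jobId, checksUntilDone);
        return jobId;
    }

    // Step 2: each status check moves the job one step closer to completion.
    String checkStatus(String jobId) {
        int left = Math.max(checksLeft.get(jobId) - 1, 0);
        checksLeft.put(jobId, left);
        return left > 0 ? "RUNNING" : "COMPLETE";
    }

    // Step 3: results are only available once the job has finished.
    String getResults(String jobId) {
        if (checksLeft.get(jobId) > 0) {
            throw new IllegalStateException(jobId + " has not finished");
        }
        return "results of " + jobId;
    }
}

class AsyncPolling {
    // Sleep without forcing callers to handle InterruptedException.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag set
        }
    }

    // Submit, poll while RUNNING (a do...while: the status is checked at least once),
    // then fetch the results. The base case is the job leaving the RUNNING state.
    static String runJob(DummyJobService service, int checksUntilDone, long delayMillis) {
        String jobId = service.createJob(checksUntilDone);
        String state;
        do {
            state = service.checkStatus(jobId);
            if (state.equals("RUNNING")) {
                pause(delayMillis); // avoid hammering the service
            }
        } while (state.equals("RUNNING"));
        return service.getResults(jobId);
    }

    public static void main(String[] args) {
        System.out.println(runJob(new DummyJobService(), 3, 100)); // results of job-1
    }
}
```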

In this attached example workflow, the service createJob returns a job identifier, which can be used with checkStatus until the job is finished, and finally getResults retrieves the final result from the job. This mirrors how many real-life asynchronous web services work, but uses dummy Beanshell scripts instead of a real service.

If you run the workflow as it is, both firstValue and finalValue will return 0, and state will be RUNNING. Assume that we need to keep calling checkStatus until the state returned is COMPLETE, in which case we should get a higher value from getResults. (In a real-life service, the equivalent of getResults would typically not work before the job has finished.)

First of all we need to make sure getResults is not run until checkStatus has completed. We can't make a normal data link from state to getResults, as getResults only expects the jobId parameter; instead, we can add a control link.

To make sure getResults is called after checkStatus has finished:

  1. Right-click on the service getResults
  2. Select Run after -> checkStatus

A new control link should appear in the diagram.

The control link ensures that checkStatus has run before the result is retrieved, but the job is still RUNNING: we are not yet looping.

Next we'll enable looping for checkStatus so that we can reach that glorious result of 10 instead of 1.

Enable looping for service

To enable looping for checkStatus:

  1. Select the service checkStatus
  2. Click on Details in the Workflow explorer
  3. Go to the Advanced section
  4. Click the Add looping button.

Alternatively, you can right-click on the service and select Configure running -> Looping...

A configuration dialogue for the looping should appear.

Configure looping

To call the service repeatedly for as long as an output port returns a given value, here RUNNING:

  1. Ensure the output port is connected to a workflow output port or another service in the workflow. (See below)
  2. Enable or configure looping. (See above)
  3. In the top-left drop-down menu, select the correct port to check, in this case state.
  4. In the second drop-down, select the comparison is not equal to
  5. In the string field, type RUNNING
  6. Click OK to confirm the configuration.

In our example we could also have chosen to loop until the output state is equal to COMPLETE. This small difference matters when a service can also return other states. For instance, imagine that checkStatus could also return FAILED if the job stopped working. In that case a loop waiting for COMPLETE would never finish, whereas a loop that only continues while the status is RUNNING would stop as soon as FAILED is returned.

However, if the service in question could return both PENDING and RUNNING before finally returning COMPLETE, it would be better to loop until the state is equal to COMPLETE, since checking for not equal to RUNNING would stop the loop at PENDING. More complex scenarios can be covered with a custom looping condition (see Customizing the loop condition below).
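To see why the choice of condition matters, the two loop conditions can be written as Java predicates that return true when the loop should stop. This is a hypothetical sketch of the comparison logic, not Taverna's generated code. PENDING and FAILED are the imagined extra states from the paragraphs above.

```java
import java.util.List;

// Hypothetical sketch: each "loop until" condition returns true when the loop should stop.
class StopConditions {
    // Loop until state is not equal to RUNNING.
    static boolean untilNotRunning(String state) {
        return !state.equals("RUNNING");
    }

    // Loop until state is equal to COMPLETE.
    static boolean untilComplete(String state) {
        return state.equals("COMPLETE");
    }

    public static void main(String[] args) {
        for (String state : List.of("PENDING", "RUNNING", "FAILED", "COMPLETE")) {
            System.out.println(state + ": untilNotRunning stops = " + untilNotRunning(state)
                    + ", untilComplete stops = " + untilComplete(state));
        }
    }
}
```

For FAILED, untilNotRunning stops while untilComplete never would. For PENDING it is the other way round: untilNotRunning stops too early.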

Looping is shown in the diagram by double lines. The Advanced details should now show the looping configuration.

The delay adds a pause between iterations, which helps avoid making 10,000 status checks per second and overloading the web service.

To configure the delay between repeated calls to a service:

  1. Select the service to modify, in this case checkStatus
  2. Go to Details -> Advanced -> Loop
  3. Click Configure to modify the looping
  4. Set the adding a delay of field to the desired delay in seconds, e.g. 5.
  5. Click OK

To avoid any delay, set the delay to 0 or an empty string.
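The following sketch shows one way the delay field might be interpreted. The helper name delayMillis is hypothetical. Treating the field as a possibly fractional number of seconds is also an assumption. Only the rule that 0 or an empty string means no delay comes from the text above.

```java
// Hypothetical helper: converts the "adding a delay of" field (in seconds)
// to milliseconds. Accepting fractional seconds is an assumption.
class LoopDelay {
    static long delayMillis(String field) {
        if (field == null || field.trim().isEmpty()) {
            return 0L; // an empty string means no delay
        }
        double seconds = Double.parseDouble(field.trim());
        if (seconds < 0) {
            throw new IllegalArgumentException("delay must not be negative: " + field);
        }
        return Math.round(seconds * 1000); // 0 also means no delay
    }

    public static void main(String[] args) {
        for (String field : new String[] {"5", "0", "", "0.5"}) {
            System.out.println("'" + field + "' -> " + delayMillis(field) + " ms");
        }
    }
}
```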

Short delays should be avoided

Note that no delay, or a very short delay, could cause excessive CPU use and network traffic, both for Taverna and for the external service. Some service providers might block your IP address if you place too many calls in a short period of time.

Not all output ports are listed

Only service output ports returning single values (depth 0) are included in the port drop-down. If you want to compare output ports with lists, you will need to create a Customized loop condition to inspect the list.

You can download the finished example looping workflow from myExperiment. When you run it, you will see that checkStatus takes a while to finish, and that you get the outputs result: 10 and state: COMPLETE.

Output ports must be connected

In order for the looping mechanism to check an output port, you will need to connect the port to something in the workflow, like a new workflow output port or another service. (This is a known bug)

Comparing numbers

Looping can check the output value not just by string equality, but also by numerical value. This is useful if your service returns, say, a quality metric such as 0.43 or a number of discovered items such as 15, and you don't know the exact value to wait for, only a threshold.

In order to test this with our example workflow, we'll expose the current value from the checkStatus script.

Modify the checkStatus beanshell script:

  1. Right-click on checkStatus and select Edit beanshell script...
  2. Under Output ports, add a port called value with depth 0.
  3. Click OK
  4. If needed, right-click checkStatus in the diagram and select Show ports
  5. Right-click on the port value and connect it to a New workflow output port called intermediate_value

We'll now change our looping so that it finishes when the value is greater than 8.

To make the loop compare a numerical value:

  1. Select the service to modify, in this case checkStatus
  2. Go to Details -> Advanced -> Loop
  3. Click Configure to modify the looping
  4. In the top-left drop-down, select the port value
  5. Change the second drop-down box to is greater than
  6. Type in the number 8 in the text field
  7. Click OK

If you run this workflow now, you should find the output state to be RUNNING (as we did not reach 10), and the value to be 9, as 9 is greater than 8.
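The effect of the two loop configurations can be simulated with a hypothetical model of the dummy checkStatus script. The model is a counter that increases by one on each call and reports COMPLETE once it reaches 10. This behaviour is an assumption chosen to match the results described above. The real script in the example workflow may work differently.

```java
import java.util.function.BiPredicate;

// Hypothetical model of the dummy checkStatus script: each call increases
// the value by one, and the state becomes COMPLETE once the value reaches 10.
class DummyCheckStatus {
    private int value = 0;
    private String state = "RUNNING";

    void call() {
        value++;
        state = value >= 10 ? "COMPLETE" : "RUNNING";
    }

    String state() { return state; }
    int value() { return value; }
}

class LoopUntilDemo {
    // Call checkStatus at least once, then again until stop(state, value) is true.
    static DummyCheckStatus loopUntil(BiPredicate<String, Integer> stop) {
        DummyCheckStatus service = new DummyCheckStatus();
        do {
            service.call();
        } while (!stop.test(service.state(), service.value()));
        return service;
    }

    public static void main(String[] args) {
        // Loop until state is not equal to RUNNING.
        DummyCheckStatus byState = loopUntil((state, value) -> !state.equals("RUNNING"));
        System.out.println(byState.state() + " " + byState.value()); // COMPLETE 10
        // Loop until value is greater than 8.
        DummyCheckStatus byValue = loopUntil((state, value) -> value > 8);
        System.out.println(byValue.state() + " " + byValue.value()); // RUNNING 9
    }
}
```

With this model, a condition of value > 10 would never stop. This is the infinite loop described in the Infinite loops note.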

Comparing decimals and large numbers?

The is greater than and is less than comparisons can check outputs that can be parsed as java.lang.Double, which includes negative numbers like -521, decimals like 0.4412 and large numbers like 22e12.

Any non-number would cause the test to fail, and the looping to terminate.
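The following sketch shows how is greater than might treat the output. It assumes the output is parsed with Double.parseDouble and that an unparsable output ends the loop, as described above. The method loopStops is a hypothetical name.

```java
// Hypothetical sketch of the "is greater than" test as a stop condition.
// The output is parsed as java.lang.Double; an output that cannot be
// parsed makes the test fail, which ends the loop.
class NumericComparison {
    static boolean loopStops(String output, double threshold) {
        double number;
        try {
            number = Double.parseDouble(output);
        } catch (NumberFormatException e) {
            return true; // not a number: the test fails and looping terminates
        }
        return number > threshold; // loop until output is greater than threshold
    }

    public static void main(String[] args) {
        for (String output : new String[] {"9", "-521", "0.4412", "22e12", "RUNNING"}) {
            System.out.println(output + " -> stops: " + loopStops(output, 8));
        }
    }
}
```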

Infinite loops

Be careful not to use is greater than 10 in this example: as our Beanshell script never goes above 10, this would cause an infinite loop, possibly with excessive CPU use.

If this happens, click the Cancel button in the Results perspective to abort the workflow.

Using regular expressions

We can also compare the output value using regular expressions. If you select matches \d\d (or does not match \d), the service will loop until the output value matches the regular expression for two digits.
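The two regular expressions are only equivalent if the whole output value has to match, as Java's String.matches requires. That Taverna matches the whole value is an assumption here. Under this assumption, the two conditions agree for the values 1 to 99. RegexCondition is a hypothetical helper.

```java
// Hypothetical sketch, assuming the whole output must match the regular
// expression (Java's String.matches semantics).
class RegexCondition {
    // "matches" condition: loop until the output matches the regex.
    static boolean matches(String output, String regex) {
        return output.matches(regex);
    }

    // "does not match" condition: loop until the output no longer matches.
    static boolean doesNotMatch(String output, String regex) {
        return !output.matches(regex);
    }

    public static void main(String[] args) {
        for (int i = 8; i <= 11; i++) {
            String value = Integer.toString(i);
            System.out.println(value + ": matches \\d\\d = " + matches(value, "\\d\\d")
                    + ", does not match \\d = " + doesNotMatch(value, "\\d"));
        }
    }
}
```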

Port feedback

In our example the looped service knew on its own how to progress, so that the final condition was eventually reached. For an asynchronous service this would be when the job is finished.

What if we want a more iterative process that modifies its parameters between runs? This could be a nested workflow that performs some kind of analysis followed by a quality assessment. If the quality is at the required level, it should return; otherwise it should perform the analysis again with some modified parameters.

Taverna can do this by ticking Enable output port to input port feedback in the looping configuration.

In this example workflow, a nested workflow find_squared is called repeatedly until its output divided is less than 2.

On each repeated call, the value from the workflow output port root is given as the new input for the workflow input port root, instead of the value from initial_root. Inside the nested workflow, add1 prepares the next value (by adding 1), while divide uses the calculated square as a test.

Looping is configured on the find_square service, with the Enable output port to input port feedback tick box ticked.

In this case the value we get out from root is the same as the final output of the workflow. So when it is run with the number set to 16, we get the correct answer of 4, as 4*4=16.

Imagine that we were working on a more complicated example and wanted to know the value of root as it was used in the last execution of square. This value is not exposed by the nested workflow, because after the first iteration the workflow input root no longer comes from initial_root but from the previous run of the nested workflow. The easiest way to expose it is to edit the nested workflow, add a second workflow output port found_root, and connect it directly to the workflow input port root.
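Port feedback can be modelled as a do...while loop in which an output value becomes the next input. The Java sketch below is a hypothetical model of find_squared, including the extra found_root port. It makes three assumptions that the example workflow does not confirm. First, square uses the input root. Second, divide computes number divided by the square. Third, initial_root is 1.

```java
// Hypothetical model of find_squared with port feedback. Assumptions: square
// uses the input root, divide computes number / square, initial_root is 1.
class FeedbackDemo {
    // Output ports of one run of the nested workflow.
    static final class Outputs {
        final int root;       // add1: the next value to try
        final double divided; // divide: number / square
        final int foundRoot;  // found_root: the input root passed straight through

        Outputs(int root, double divided, int foundRoot) {
            this.root = root;
            this.divided = divided;
            this.foundRoot = foundRoot;
        }
    }

    // One run of the nested workflow for a given input root.
    static Outputs nestedWorkflow(int root, int number) {
        int square = root * root;
        return new Outputs(root + 1, (double) number / square, root);
    }

    // Loop until divided is less than 2, feeding the output root back as input root.
    static Outputs run(int initialRoot, int number) {
        int root = initialRoot; // the first run uses initial_root
        Outputs out;
        do {
            out = nestedWorkflow(root, number);
            root = out.root; // feedback: output port root -> input port root
        } while (!(out.divided < 2));
        return out;
    }

    public static void main(String[] args) {
        Outputs out = run(1, 16);
        System.out.println("root = " + out.root + ", found_root = " + out.foundRoot); // root = 4, found_root = 3
    }
}
```

With number 16 this model runs three iterations. It returns root 4, while found_root is 3, the value that square used last.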

All the workflow ports must be connected

If you are using port feedback, it is very important that all service inputs have an initial value, and that all service outputs are connected to something else in the workflow or to a workflow output port.

This is because the loop mechanism can only pick up values that are passed around in the workflow.

Check your depths

As no implicit iteration is performed on values from the feedback, the service (in this case nested workflow) outputs must match the inputs both in name and in depth when port feedback is enabled. That means that if a workflow expects a list of depth 1 at an input port and there is an output port of the same name, that output must also return a list of depth 1, and so on.
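The name-and-depth rule can be expressed as a simple check over port declarations. FeedbackDepthCheck and its maps from port name to depth are hypothetical. This is not a Taverna API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical check (not a Taverna API): with port feedback enabled, every
// output port that has the same name as an input port must have the same depth.
class FeedbackDepthCheck {
    static List<String> mismatches(Map<String, Integer> inputDepths, Map<String, Integer> outputDepths) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Integer> output : outputDepths.entrySet()) {
            Integer inputDepth = inputDepths.get(output.getKey());
            // Outputs without a matching input are not fed back, so they are ignored.
            if (inputDepth != null && !inputDepth.equals(output.getValue())) {
                problems.add(output.getKey() + ": input depth " + inputDepth
                        + ", output depth " + output.getValue());
            }
        }
        problems.sort(Comparator.naturalOrder()); // deterministic order
        return problems;
    }

    public static void main(String[] args) {
        // A list (depth 1) input fed back from a single-value (depth 0) output.
        System.out.println(mismatches(Map.of("list", 1, "root", 0), Map.of("list", 0, "root", 0)));
    }
}
```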

Customizing the loop condition

In some cases it can be necessary, or cleaner, to write the loop condition yourself. In the example workflow above, the nested workflow also has to perform the divide step in order to test whether the found root is big enough, because the standard loop mechanism can only test simple conditions, such as whether an output is larger or smaller than X.

To customize the loop condition based on an existing looping:

  1. Select the service to modify, in this case find_square
  2. Go to Details -> Advanced -> Loop
  3. Click Configure to modify the looping
  4. Click Customize loop condition
  5. Modify the Beanshell script
  6. Click Apply for the script
  7. Click OK for the loop configuration

After you click Customize loop condition, a Beanshell script editor should appear.

In this dialogue you will see the code generated from the selections in the loop configuration window.

If you inspect the Beanshell script inputs and outputs in the example workflow, you will find the inputs divided and root. These script inputs match the workflow outputs by name and depth, which means that the script is free to check any of the workflow outputs, not just a single port.

The magic output port of the Beanshell script is called loop. If the returned loop is a string equal to "true", Taverna will rerun the service.

Converting loop to a string

The expression "" + (boolean statement) converts the boolean true into the string "true"; this conversion is not required in newer versions of Taverna.

The Thread.sleep statement has been added to the script in this case because a delay was requested in the loop configuration.

The other script outputs, if they match the service inputs, will be used instead of the original inputs. As we had ticked feedback, all the matching ports have been automatically added to the script inputs and outputs, and are passed through unmodified. The script may, however, modify these values before passing them on, allowing you to move loop-related modifications out of the nested workflow.
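The shape of a customized loop condition for the find_squared example can be sketched in Java. This is not Taverna's generated code. Beanshell binds ports as script variables rather than a map. Treat the port map, the method name condition, and the choice to sleep only before another iteration as assumptions. The sketch shows the three points above: the loop output as a string, an optional Thread.sleep, and a feedback port passed through.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java version of a customized loop condition for find_squared.
// In Taverna the script is Beanshell and its ports are script variables;
// here the ports are modelled as a map from port name to string value.
class CustomLoopCondition {
    static Map<String, String> condition(Map<String, String> outputs, long delayMillis) {
        double divided = Double.parseDouble(outputs.get("divided"));
        boolean again = !(divided < 2); // loop until divided is less than 2

        Map<String, String> scriptOutputs = new HashMap<>();
        scriptOutputs.put("loop", "" + again); // boolean -> "true" or "false"
        // Feedback port passed through unmodified; a script could change it here.
        scriptOutputs.put("root", outputs.get("root"));

        if (again && delayMillis > 0) {
            try {
                Thread.sleep(delayMillis); // the requested delay before the next run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return scriptOutputs;
    }

    public static void main(String[] args) {
        System.out.println(condition(Map.of("divided", "4.0", "root", "3"), 0).get("loop"));  // true
        System.out.println(condition(Map.of("divided", "1.78", "root", "4"), 0).get("loop")); // false
    }
}
```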

Testing lists

If your loop condition needs to inspect a workflow output port that is a list, then you have two options:

  • Add a "should I loop" shim and test for the single value
  • Create a custom loop script that checks the list directly

To add a "Should I loop" shim:

  1. Inside the nested workflow, add a new Beanshell script or similar shim
  2. Create and connect required input ports (remember to set the list depth)
  3. Inside the script, determine if the workflow should loop by inspecting inputs
  4. Return a depth 0 output port called looping with the value true or false
  5. Connect the script to a workflow output port of the same name
  6. In the loop condition, test if the output port looping is equal to true
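A "should I loop" shim for a list can be as small as the hypothetical sketch below. The condition (fewer than a target number of items) and the method name are illustrative assumptions. The point is that a depth 1 input is reduced to a single depth 0 string that the loop condition can compare.

```java
import java.util.List;

// Hypothetical "should I loop" shim: takes a depth 1 input (a list) and
// returns a depth 0 output, named looping, as "true" or "false".
class ShouldILoopShim {
    // Example rule (an assumption): keep looping while the list is too short.
    static String looping(List<String> items, int targetSize) {
        return "" + (items.size() < targetSize);
    }

    public static void main(String[] args) {
        System.out.println(looping(List.of("a", "b"), 3));      // true
        System.out.println(looping(List.of("a", "b", "c"), 3)); // false
    }
}
```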

This example workflow shows how a nested workflow can be looped using this customized loop script.

Notice how this example workflow is also gradually building a list by feeding back the output port list to the input port list.
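Building a list through port feedback can be modelled as follows. This is again a hypothetical sketch. It assumes the nested workflow appends one item per run and returns the list on an output port with the same name and depth as its input. It also assumes that looping continues until the list reaches a target size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of building a list through port feedback: the output
// port list (depth 1) is fed back to the input port list of the same depth.
class ListFeedbackDemo {
    // One run of the nested workflow: append one item to the incoming list.
    static List<String> nestedWorkflow(List<String> list) {
        List<String> next = new ArrayList<>(list);
        next.add("item" + list.size());
        return next;
    }

    // Assumed condition: loop until the list has targetSize items.
    static List<String> run(List<String> initialList, int targetSize) {
        List<String> list = initialList;
        do {
            list = nestedWorkflow(list); // output list becomes the next input list
        } while (list.size() < targetSize);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(run(new ArrayList<>(), 3)); // [item0, item1, item2]
    }
}
```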
