Looping in Taverna

Taverna workflows are inherently data-driven workflows, where data returned from one service is pushed directly to downstream services. A Taverna workflow definition does not linearly say when a service should be invoked, but where its data should come from and go to. This philosophy lets the user focus on how services are connected together, and the Taverna execution takes care of invoking services as soon as the required inputs are ready.

In imperative and object-oriented programming languages one often needs to iterate over a set of numbers, objects or strings. Taverna does these iterations implicitly: if you connect a service which outputs a list to a service whose input port expects a single item, implicit iteration will invoke the second service for each element of the list, and collect the results in new lists on the outputs.
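As a rough analogy only, implicit iteration behaves like mapping a single-item function over a list. The sketch below is plain Java, not Taverna's API; the class `ImplicitIterationSketch` and the helper `implicitIterate` are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of implicit iteration; not part of Taverna.
class ImplicitIterationSketch {

    // Invokes a single-item "service" once per list element and
    // collects the results into a new list, in the same order.
    static <A, B> List<B> implicitIterate(List<A> items, Function<A, B> service) {
        List<B> outputs = new ArrayList<>();
        for (A item : items) {
            outputs.add(service.apply(item));
        }
        return outputs;
    }
}
```

Calling `implicitIterate` with a list of strings and `String::length` as the service would give a list of lengths, one per input string.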

There are however situations where you don't have or know the values to iterate over, but where you want some steps of your workflow to be repeated until a certain condition is true, like a do...while construct in programming. Taverna 2.0 added support for such looping, and 2.1.0 added a few more features to this looping.

What do we loop over?

Before we enable looping, we'll remind ourselves of one of the most important rules of recursion, which applies equally to loops: we need a base case to end the iteration. The base case is stated as a condition, something which we want to be true or false in the end.

One typical use case for looping is invoking asynchronous services, that is, web services or similar that follow a three-step pattern. First you submit the job with its input parameters, which returns a job ID. Secondly, you check the status of the job using that ID, and keep checking as long as the job is in an active state (running). Finally, when the job is in a final state, you get the results for the given job ID.

We'll start by creating some dummy services, that will pretend to work in such an asynchronous way. We'll create three Beanshell scripts corresponding to createJob, checkStatus and getResults. Instead of a server or database we'll just keep the state of the asynchronous service in a temporary file, and use the file name as the job identifier.

Create a Beanshell script called createJob with two output ports jobId and value (depth 0). Set the script to:

This would create a temporary file (so workflow runs are independent), and write the initial value "0" to the file. The returned job ID is the full path to the file.
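A minimal sketch of the behaviour just described, written as plain Java for illustration. It is not the script from the original page: the class name `CreateJobSketch`, the method name, the file prefix and suffix, and the use of `java.nio.file` are all assumptions. In a Taverna Beanshell the path and the string "0" would instead be assigned to the output port variables jobId and value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the createJob script, not the original listing.
class CreateJobSketch {

    // Creates a fresh temporary file holding the initial value "0" and
    // returns its full path, which plays the role of the job ID.
    static String createJob() throws IOException {
        Path file = Files.createTempFile("job", ".txt");
        Files.write(file, "0".getBytes(StandardCharsets.UTF_8));
        return file.toAbsolutePath().toString();
    }
}
```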

Right-click each of the Beanshell output ports and connect it to a new workflow output port, then run the workflow to check that you get 0 and a file name.

Secondly, add another script checkStatus, with an input port jobId and output port state. The script should be:

Again connect the ports, with jobId coming from createJob. This script reads back the file given as the job identifier and parses its content as an integer. As long as the integer (initially 0) is less than 10, we say the job is still in the state "RUNNING" (although nothing is in fact running in this example), in which case we increase the number by 1 and write the incremented number back to the same file. Thanks to Apache Commons-IO's FileUtils (which is on the Beanshell class path) this is quite straightforward. Finally, if the value is 10 or more, we don't increase the number, and the state is said to be "COMPLETE".
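The logic of checkStatus can be sketched in plain Java as follows. This is an illustration under assumptions, not the original script: the class and method names are hypothetical, and `java.nio.file` is used here in place of the Commons-IO FileUtils calls so that the example has no external dependencies.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the checkStatus script, not the original listing.
class CheckStatusSketch {

    // Reads the counter stored in the job file. While it is below 10 the
    // counter is incremented, written back, and the job counts as RUNNING;
    // once it has reached 10 the file is left alone and the job is COMPLETE.
    static String checkStatus(String jobId) throws IOException {
        Path file = Paths.get(jobId);
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        int value = Integer.parseInt(text.trim());
        if (value < 10) {
            String next = Integer.toString(value + 1);
            Files.write(file, next.getBytes(StandardCharsets.UTF_8));
            return "RUNNING";
        }
        return "COMPLETE";
    }
}
```

Starting from a file containing 0, the first ten calls return RUNNING and leave 10 in the file; the eleventh call returns COMPLETE.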

Finally add the beanshell script getResults with input jobId and output value.

If we connect all of this up and run the workflow as below, we would get semi-randomly 0 or 1 as the value: checkStatus and getResults both depend only on jobId, so Taverna may run them in either order.

We'll right-click getResults and select Run after -> checkStatus to enforce that getResults runs only after checkStatus has finished.

Notice the coordination link in the new diagram:

Although this enforces the single increment inside checkStatus, we are still not looping. So next we'll enable looping for checkStatus so that we can reach that glorious result 10 instead of 1. Select checkStatus and click on Details in the Workflow explorer. Go to the Advanced accordion and click the Add looping button.

This will enable the looping feature, but we have not yet configured what the final condition is. Click Configure to bring up the looping settings (or the little trash bin on the right if you no longer want looping).

In this pop-up, select the output port state, choose the condition is not equal to, and set the string to RUNNING. Set the delay to 0.5 s and click OK.

Why not check if status is COMPLETE?

In our example we could also have chosen to loop until the output state is equal to COMPLETE. This tiny difference can be important when a service can also return other states. For instance, imagine that checkStatus could also return FAILED if the job stopped working. In that case a loop waiting for COMPLETE would never finish, whereas a loop that continues only while the state is RUNNING would stop as soon as the state changes.
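The difference can be sketched in plain Java. The class `LoopConditionSketch` and the iterator of states are hypothetical stand-ins for repeated checkStatus calls; this is not how Taverna implements looping.

```java
import java.util.Iterator;

// Hypothetical illustration of "loop until state is not equal to RUNNING".
class LoopConditionSketch {

    // Keeps polling while the state is RUNNING and returns the first
    // other state, whether that is COMPLETE, FAILED or anything else.
    static String pollWhileRunning(Iterator<String> states) {
        String state;
        do {
            state = states.next();
        } while (state.equals("RUNNING"));
        return state;
    }
}
```

Given the states RUNNING, RUNNING, FAILED this returns FAILED after three polls. A loop written as "until the state equals COMPLETE" would keep polling a service that stays in FAILED.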

Our Advanced details should now show the looping condition. (It is a bug that this is not currently shown in the diagram).

The delay adds a sleep between iterations, which is useful to avoid making 10,000 status checks per second and overloading the web service. The delay is not applied once the looping has finished.
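The described behaviour (invoke once, then sleep only before each repeat) can be sketched as follows. This is an assumption-laden illustration in plain Java, not Taverna's implementation; `DelayedLoopSketch` and `loopUntil` are hypothetical names.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical illustration of a loop with a delay between iterations.
class DelayedLoopSketch {

    // Calls the service at least once, then repeats until the finished
    // condition holds, sleeping delayMillis before each repeat.
    // No sleep happens after the final call.
    static <T> T loopUntil(Supplier<T> service, Predicate<T> finished, long delayMillis)
            throws InterruptedException {
        T output = service.get();
        while (!finished.test(output)) {
            Thread.sleep(delayMillis);
            output = service.get();
        }
        return output;
    }
}
```

With the settings used above, the delay argument would be 500 milliseconds and the finished condition would test that the state is not equal to RUNNING.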

You can download the finished example looping workflow from myExperiment. When running you will see that checkStatus takes a while to finish, and that you get the outputs result: 10 and state: COMPLETE.

Other loop comparisons

We'll play around with the different settings in the loop configuration. First we'll modify the checkStatus Beanshell script so that it also has an output port value. Due to bug T2-641 you will also need to connect the value port to something in the workflow, like a new workflow output port.

Now modify the looping for checkStatus so that the condition is instead value is greater than the number 8.

If you run this workflow now, you should find the output state to be RUNNING (as we did not reach 10), and the value to be 9, as 9 is greater than 8. The is greater than and is less than conditions can compare any outputs that can be parsed as Doubles (so including 0.4412 and 22e12).
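To see which strings qualify, here is a small plain-Java sketch of such a numeric comparison. It assumes the port value is parsed with `Double.parseDouble`; the class and method names are hypothetical, and Taverna's own parsing may differ in detail.

```java
// Hypothetical illustration of an "is greater than" loop condition.
class NumericConditionSketch {

    // True when the port value, parsed as a double, exceeds the threshold.
    // Throws NumberFormatException if the value is not a number.
    static boolean isGreaterThan(String portValue, double threshold) {
        return Double.parseDouble(portValue.trim()) > threshold;
    }
}
```

Here `isGreaterThan("9", 8)` is true and `isGreaterThan("8", 8)` is false, which is why the loop above stops at 9. Strings such as "0.4412" and "22e12" parse without error.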

Infinite loops

Be careful not to put greater than 10: as our Beanshell script never goes above 10 this would make an infinite loop, and there is unfortunately not yet a way to stop a workflow except to quit Taverna (bug T2-414).

Finally we can also compare the output value by using regular expressions. If you select matches \d\d (or does not match \d), the service would loop until the output value matches the regular expression for two digits.
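The two regular expressions can be tried out in plain Java. This sketch assumes the whole output value must match the pattern (the semantics of `Pattern.matches`); whether Taverna uses whole-string or partial matching is an assumption here, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of a regular expression loop condition.
class RegexConditionSketch {

    // True when the entire port value matches the regular expression.
    static boolean matches(String portValue, String regex) {
        return Pattern.matches(regex, portValue);
    }
}
```

Under that assumption, the values 0 to 9 do not match \d\d but 10 does, and 10 is also the first value that does not match \d, so both conditions end the loop at the same point.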

Port feedback

In our example the looped service knew on its own how to progress, so that the final condition eventually is reached. For an asynchronous service this would be when the job is finished.

What if we want a more iterative process, in which we modify our parameters? Close the loop configuration, and edit the checkStatus Beanshell script. Add an input port called increase and an output port also called increase. (Due to bug T2-1127 Taverna will incorrectly stop you from creating a Beanshell script with an input and output port of the same name if you click Apply, but due to T2-1128 you can override this check by instead clicking Close and then Yes to save the configuration.)

Change the script to increment the number using increase instead.

Now connect a string constant to the input increase and set it to 1 (the initial value), and create a workflow output port connected to the beanshell script output increase. (T2-641 again)

Go to the Loop configuration again, and tick the box for Feed back matching ports.
 

 
If we run the workflow this time, every time the looping condition says that another iteration is needed, the matching input ports of the service will get their values from the previous output ports. So in our case the increase input will come from the previous iteration's increase output, except in the first iteration, when the string constant is used.
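The feedback rule can be sketched in plain Java, modelling ports as a map from port name to value. This is an illustration under assumptions and not Taverna's implementation; `FeedbackLoopSketch`, `loopWithFeedback` and the map-based port model are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration of "Feed back matching ports".
class FeedbackLoopSketch {

    // Runs the service once with the initial inputs. While the finished
    // condition does not hold, every output whose port name matches an
    // input port replaces that input, and the service is run again.
    static Map<String, String> loopWithFeedback(
            Function<Map<String, String>, Map<String, String>> service,
            Map<String, String> initialInputs,
            Predicate<Map<String, String>> finished) {
        Map<String, String> inputs = new HashMap<>(initialInputs);
        Map<String, String> outputs = service.apply(inputs);
        while (!finished.test(outputs)) {
            for (String port : initialInputs.keySet()) {
                if (outputs.containsKey(port)) {
                    inputs.put(port, outputs.get(port));
                }
            }
            outputs = service.apply(inputs);
        }
        return outputs;
    }
}
```

Outputs with no matching input port (such as state in our example) are never fed back; they are only visible to the loop condition and to the rest of the workflow.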

The checkStatus service is therefore taking control of its own parameters for the next iteration. For Beanshell scripts we admit this is a bit silly, as you could write that loop inside the script instead. But imagine that instead of that Beanshell script you have a nested workflow that does a complex image analysis, and the nested workflow is trying to tweak various parameters. Simply include output ports named the same as the input ports you want to replace, connect them up and tick the feedback box, remember to have a condition for when looping is to stop, and off you go. You could add a Beanshell script to your nested workflow that calculates a 'how good is it' score from various inputs, and use that score output in the loop condition.

Go to Loop

Email from Stian Soiland-Reyes to taverna-users on 2010-01-26:

See http://www.myexperiment.org/workflows/820 for an example of looping - click the processor checkStatus to see Details -> Advanced:

It is possible to set such looping on nested workflows as well, and as this example shows you no longer need the Nested workflow-Fail_if_false-Critical-Retry trick (which doesn't work in 2.1, as nested workflows no longer fail themselves; errors are delivered to output ports instead).

There's also an interesting checkbox in there for 'Feed back matching ports'. If you tick this, then for any output port of the service (most probably a nested workflow) whose name is the same as an input port of the same service, the service will, from the second iteration onwards, receive the previous output as that input, allowing it to iteratively modify its own inputs. You will have to provide the initial value from the mother workflow, though. Remember that the looping condition is always checked after the first execution, so this is more like a do...while construct (repeat...until in Pascal) than a while loop.

If you click 'Customize' inside Looping you can also write a more complex Beanshell script that checks the looping condition (whether it should rerun the service or not). In fact you can also modify port values here: any additional output ports from this Beanshell script that match an input port of the service will be passed along when looping (the second time and so on). Any Beanshell input ports matching the service output ports will be inputs to the script (so you can check several outputs). Finally, any leftover input ports that match the service input ports will be provided with the values as passed in from the workflow.

Notice that any output port you inspect from the looping needs to be connected to something in the parent workflow so that it will be extracted from the service. Similarly, if you want to create feedback-ports you would also need to connect these outputs to something below in the workflow (like a workflow output port), so that it will be delivered by the service, and can be picked up by the looping. This is partly a bug due to the way looping is implemented, and we've planned to remove the need to do these connections - see http://www.mygrid.org.uk/dev/issues/browse/T2-641
