Looping in Taverna
Taverna workflows are inherently data-driven workflows, where data returned from one service is pushed directly to downstream services. A Taverna workflow definition does not linearly say when a service should be invoked, but where its data should come from and go to. This philosophy lets the user focus on how services are connected together, and the Taverna execution takes care of invoking services as soon as the required inputs are ready.
In imperative and object-oriented programming languages one often needs to iterate over a set of numbers, objects or strings. Taverna does these iterations implicitly: if you connect a service which outputs a list to a service whose input port expects a single item, implicit iteration will invoke the second service for each element of the list, and create new lists on the outputs.
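As a rough illustration of implicit iteration, the following Python sketch models a list being pushed into a service that only accepts single items. The names to_upper and implicit_iteration are hypothetical stand-ins, not part of Taverna.

```python
def to_upper(item: str) -> str:
    """Hypothetical single-item service: takes one string, returns one string."""
    return item.upper()


def implicit_iteration(service, values):
    """Invoke a single-item service once per list element and collect a new list."""
    return [service(value) for value in values]


# A depth-1 list produced by an upstream service.
upstream_output = ["a", "b", "c"]

# The downstream service is invoked three times; its output becomes a list.
downstream_output = implicit_iteration(to_upper, upstream_output)
```

The point is that the workflow author never writes the loop: connecting a list to a single-item port is enough.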
There are, however, situations where you don't have or know the values to iterate over, but where you want some steps of your workflow to be repeated until a certain condition is true, like a do...while construct in programming.
When could looping be required?
Before we enable looping, let's remind ourselves of one of the most important rules of recursion: we need a base case to end the iteration, so that it does not go on forever. The base case is stated as a condition, something which we want to be true or false in the end.
One typical use case for looping is invoking asynchronous services, that is, a web service or similar where you first submit a job with input parameters and get back a job ID, then check the status of the job using that ID. You keep checking the status as long as the job is in an active state (running), and finally, when the job is in a final state, you get the results for the given job ID. The EBI Interproscan example workflow shows how this can be used in practice, but below we'll use a dummy workflow to avoid dependencies on third-party services.
In this attached example workflow, the service createJob returns a job identifier, which can be used with checkStatus until the job is finished, ending with getResults retrieving the final result from the job. This mirrors how many real-life asynchronous web services work, but with dummy Beanshell scripts.
If you run the workflow as it is, finalValue is returned while state is still RUNNING. Assume that we need to keep calling checkStatus until the state returned is COMPLETE, in which case we should get a higher value from getResults. (In a real-life service, the equivalent of getResults would typically not work before the job has finished.)
Add a control link
First of all we need to make sure getResults is not run until checkStatus is complete. We can't make a normal link to getResults, as it only expects the jobId parameter, but instead we can add a control link.
Adding a control link ensures that some processing is performed before the result is retrieved, but the job is still RUNNING because we are not yet looping.
Next we'll enable looping for checkStatus so that we can reach that glorious result 10.
Enable looping for service
As an alternative you can also right-click on the service and select Configure running -> Looping...
A configuration dialogue for the looping should appear.
In our example we could also have chosen to loop until the output state is equal to COMPLETE. This tiny difference can be important when a service can also return other states. For instance, imagine that checkStatus could also return FAILED if the job stopped working. In that case a loop waiting for COMPLETE would never finish, but a loop that continues while the status is RUNNING would give up.
However, if the service in question could return other active states besides RUNNING before finally returning COMPLETE, it would be better to check on is equal to COMPLETE. More complex scenarios can be covered with a custom looping condition. (See below)
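The difference between the two conditions can be made concrete with a small Python sketch. It replays a fixed sequence of states against each condition; the state name QUEUED and the helper count_calls are hypothetical and only serve this illustration.

```python
def loop_while_running(state):
    """Keep looping only while the state is exactly RUNNING."""
    return state == "RUNNING"


def loop_until_complete(state):
    """Keep looping until the state is exactly COMPLETE."""
    return state != "COMPLETE"


def count_calls(states, should_loop, max_calls=10):
    """Replay states (the last one repeats) and report when the loop stops.

    Returns (number of calls, final state), or (max_calls, None) if the
    condition never let the loop finish.
    """
    calls = 0
    for i in range(max_calls):
        state = states[min(i, len(states) - 1)]
        calls += 1
        if not should_loop(state):
            return calls, state
    return calls, None


failing_job = ["RUNNING", "FAILED"]
queued_job = ["QUEUED", "RUNNING", "COMPLETE"]  # QUEUED is a hypothetical extra state

failed_while_running = count_calls(failing_job, loop_while_running)
failed_until_complete = count_calls(failing_job, loop_until_complete)
queued_while_running = count_calls(queued_job, loop_while_running)
queued_until_complete = count_calls(queued_job, loop_until_complete)
```

With the failing job, only the "while RUNNING" condition terminates. With the extra active state, only the "until COMPLETE" condition waits long enough.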
The Advanced details should now show the looping configuration. (It is a known bug that looping is not shown in the diagram).
The delay adds a sleep between each iteration, which can be useful to avoid making 10,000 status checks per second and overwhelming the web service.
To avoid any delay, set the delay to 0 or an empty string.
|Short delays should be avoided|
Note that no delay, or a very short delay, could cause excessive CPU use and network traffic, both for the machine running Taverna and for the external service. Some service providers might block your IP address if you place too many calls in a short period of time.
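The effect of the delay can be pictured with the following Python sketch of a polling loop that sleeps between iterations. The function poll_with_delay is a hypothetical model of the behaviour described here, not Taverna's implementation, and the delay of 0.01 seconds is chosen only to keep the example fast.

```python
import time


def poll_with_delay(check, should_loop, delay_seconds):
    """Call check() repeatedly, sleeping between calls while should_loop holds."""
    calls = 0
    while True:
        state = check()
        calls += 1
        if not should_loop(state):
            return state, calls
        if delay_seconds > 0:
            # Sleeping here is what keeps the request rate down.
            time.sleep(delay_seconds)


# A scripted sequence of states standing in for a real status service.
states = iter(["RUNNING", "RUNNING", "COMPLETE"])

final_state, calls = poll_with_delay(
    check=lambda: next(states),
    should_loop=lambda state: state == "RUNNING",
    delay_seconds=0.01,
)
```

For a real remote service a delay of several seconds is usually a friendlier choice.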
|Not all output ports are listed|
Only service output ports returning single values (depth 0) are included in the port drop-down. If you want to compare output ports with lists, you will need to create a Customized loop condition to inspect the list.
You can download the finished example looping workflow from myExperiment. When running it you will see that checkStatus takes a while to finish, and that the outputs include result: 10.
|Output ports must be connected|
In order for the looping mechanism to check an output port, you will need to connect the port to something in the workflow, like a new workflow output port or another service. (This is a known bug)
Looping can check the output value not just by string equality, but also by numerical value. This can be useful if your service returns, say, a quality metric (0.43) or the number of discovered items (15), and you don't know exactly what value you need.
In order to test this with our example workflow, we'll expose the current value as an output.
We'll now change our looping so that we finish when the value is greater than 8.
If you run this workflow now, you should find the output state to be RUNNING (as we did not reach 10), and the value to be 9, as 9 is greater than 8.
|Comparing decimals and large numbers?|
The is greater than and is less than comparisons can check outputs that can be parsed as numbers.
Any non-number would cause the test to fail, and the looping to terminate.
Be careful not to put greater than 10 in this example: as our Beanshell script never goes above 10, this would cause an infinite loop, possibly causing excessive use of CPU.
If this happens, click the Cancel button in the Results perspective to abort the workflow.
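The numerical comparison, including the way a non-number ends the loop, can be modelled with a short Python sketch. The functions keep_looping and run are hypothetical, and the assumption that the dummy script counts up by one per call and is capped at 10 is made only for this illustration.

```python
def keep_looping(output: str, threshold: float) -> bool:
    """Return True to run the service again.

    Looping stops once the output is greater than the threshold, or when
    the output cannot be parsed as a number at all.
    """
    try:
        value = float(output)
    except ValueError:
        return False
    return not value > threshold


def run(threshold, max_iterations=50):
    """Simulate the looped service; returns (last value, iterations or None)."""
    value = 0
    for iteration in range(1, max_iterations + 1):
        value = min(value + 1, 10)  # assumed stand-in for the dummy script
        if not keep_looping(str(value), threshold):
            return value, iteration
    return value, None  # the condition was never met: an endless loop in practice


stops_at_nine = run(threshold=8)
never_stops = run(threshold=10)
```

The second call shows why a threshold the service can never exceed has to be avoided: only the iteration cap in this sketch ends the loop.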
Using regular expressions
We can also compare the output value by using regular expressions. If you select matches \d\d (or does not match \d) the service would loop until the output value matches the regular expression for two digits.
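A minimal Python sketch of the two-digit test follows. It assumes that the expression must match the whole output value, as Java's String.matches does; if the comparison accepted partial matches the results would differ. The function name is hypothetical.

```python
import re


def keep_looping_until_matches(output: str, pattern: str) -> bool:
    """Return True to run the service again, i.e. while the output does not match."""
    return re.fullmatch(pattern, output) is None


# Single-digit values keep the loop going; the first two-digit value ends it.
still_looping = keep_looping_until_matches("9", r"\d\d")
finished = not keep_looping_until_matches("10", r"\d\d")
```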
In our example the looped service knew on its own how to progress, so that the final condition eventually is reached. For an asynchronous service this would be when the job is finished.
What if we want to do a more iterative process, and modify our parameters? This could be a nested workflow that is doing some kind of analysis followed by a quality assessment. If the quality is at the required level, it is to return, otherwise it is to perform the analysis again with some modified parameters.
Taverna can do this by ticking Enable output port to input port feedback in the looping configuration.
In this example workflow, a nested workflow find_squared is called repeatedly until the output on divided is less than the number 2.
On each repeated call, the value from the workflow output port root is given as the new input for the workflow input root instead of the value from initial_root. Inside the nested workflow, add1 prepares the next value (by adding 1), while divide uses the calculated square as a test.
The looping is controlled on the find_square service, with the tick box Enable output port to input port feedback enabled.
In this case the value we get out from root is the same as the final output of the workflow. So when run with the number set to 16, the workflow returns the correct answer.
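Port feedback can be thought of along the lines of the following Python sketch: after each run, every output whose name matches an input replaces that input for the next run. The functions run_with_feedback and shrink are hypothetical and deliberately simpler than the example workflow.

```python
def run_with_feedback(service, inputs, should_loop, max_iterations=100):
    """Rerun service, feeding same-named outputs back into its inputs."""
    current = dict(inputs)
    for _ in range(max_iterations):
        outputs = service(**current)
        if not should_loop(outputs):
            return outputs
        for name in current:
            # Inputs without a matching output keep their initial value.
            if name in outputs:
                current[name] = outputs[name]
    raise RuntimeError("loop condition never became false")


def shrink(x, factor):
    """Hypothetical service: output 'x' feeds back, 'factor' has no matching output."""
    return {"x": x / factor}


final_outputs = run_with_feedback(
    shrink,
    inputs={"x": 16, "factor": 2},
    should_loop=lambda outputs: outputs["x"] >= 1,
)
```

Here x goes 8, 4, 2, 1, 0.5 across the runs, while factor stays at its initial value throughout.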
Imagine that we were doing a more complicated example, and we want to know the value of root as it was used in the last execution of square. This value is not exposed from this nested workflow, as the workflow input root will no longer be coming from initial_root, but from previous runs of the nested workflow. The easiest way to expose this is to edit the nested workflow, add a second workflow output port found_root, and connect it directly to that workflow input port.
|All the workflow ports must be connected|
If you are using port feedback, it is very important that all service inputs have an initial value, and that all service outputs are connected to something else in the workflow or to a workflow output port.
The reason for this is that the loop mechanism can only pick up values that are to be passed around in the workflow.
|Check your depths|
As no implicit iteration will be performed on the values from the feedback, the service (in this case the nested workflow) outputs must match the inputs both in name and in depth when port feedback is enabled. That means that if the workflow expects a list of depth 1 at an input port, an output port of the same name must also give a list of depth 1, and so on.
Customizing the loop condition
In some cases it can be necessary or cleaner to write the loop condition yourself. In the example workflow above, the nested workflow also has to perform the divide function in order to test whether the found root is big enough, as the standard loop mechanism can only test simple conditions, such as whether the output is larger or smaller than X.
In this dialogue you will see the code generated from the selections in the loop configuration window:
If you inspect the Beanshell script inputs and outputs from the example workflow, you will find inputs such as root. These script inputs should match the workflow outputs by name and depth, which means that the script is free to check any of the workflow outputs, not just a single port.
The magic output port of the Beanshell script is called loop. If the returned loop is a string equal to "true", Taverna will rerun the service.
|Converting loop to a string|
A Thread.sleep statement has been added to the script in this case because a delay was requested in the loop configuration.
The other script outputs, if matching the service inputs, will be used instead of the original inputs. As we had ticked feedback, all the matching ports have been automatically added to the script inputs and outputs, and are sent through unmodified. The script may however choose to modify these values in-place, allowing you to move loop-related modifications out of the nested workflow.
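The contract of such a condition script can be modelled in Python as below. This is only a sketch of the logic, not the Beanshell code that Taverna generates: the function custom_loop_condition and its treatment of a FAILED state are assumptions for illustration. What it does take from the description above is that the decision is returned as the string "true" on an output named loop, and that other values are passed through.

```python
def custom_loop_condition(outputs):
    """Model of a custom loop script.

    'outputs' maps the looped service's output port names to values.
    The returned dict models the script outputs: the matching ports are
    passed through unmodified, plus the special 'loop' output.
    """
    script_outputs = dict(outputs)
    finished = outputs["state"] in ("COMPLETE", "FAILED")
    # The decision is a string, not a boolean.
    script_outputs["loop"] = "false" if finished else "true"
    return script_outputs


running = custom_loop_condition({"state": "RUNNING", "value": "3"})
failed = custom_loop_condition({"state": "FAILED", "value": "3"})
```

Writing the condition by hand makes it easy to stop on several final states at once, which the simple drop-down comparison cannot express.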
If your loop condition needs to inspect a workflow output port that is a list, then you have two options:
- Add a "should I loop" shim and test for the single value
- Create a custom loop script that checks the list directly
This example workflow shows how a nested workflow can be looped using this customized loop script:
Notice how this example workflow is also gradually building a list by feeding the output port list back to the input port of the same name.
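Building up a list through feedback, with a condition that inspects the list directly, can be sketched in Python like this. The port is called items here rather than list to avoid shadowing Python's built-in; append_next, loop_script and the target length of 5 are hypothetical choices for the illustration.

```python
def append_next(items):
    """Hypothetical nested workflow: returns the list with one more element."""
    return {"items": items + [len(items) + 1]}


def loop_script(outputs):
    """Model of a custom loop script that checks the list itself."""
    keep_going = len(outputs["items"]) < 5
    return {"items": outputs["items"], "loop": "true" if keep_going else "false"}


current = {"items": []}
while True:
    outputs = append_next(**current)
    script_outputs = loop_script(outputs)
    if script_outputs["loop"] != "true":
        break
    # Feedback: the output list becomes the input list of the next run.
    current = {"items": script_outputs["items"]}

final_list = outputs["items"]
```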