There are often cases in workflow construction where the output of one service is not quite right for the input of the next. In such cases, Beanshell services come in handy to help you make the necessary data transformations and "shimming" between incompatible services. Beanshells in Taverna typically perform data manipulation, parsing and formatting functions, saving to a local hard disk, etc.
Beanshell is a Java scripting language. A Beanshell service in Taverna enables you to write Java code snippets and execute them as part of your workflows. For users who have never attempted Java programming we recommend the Java tutorial on Sun Microsystems' Web site. There are certain minor differences between the core language described there and the version used by Beanshell; these are further documented at the Beanshell Web site. The good news is that almost all of these differences make Beanshell easier to use than conventional Java, and a typical user is unlikely ever to encounter them.
Adding a Beanshell
To add a Beanshell to your workflow:
Note that you can add a Beanshell from the Insert menu as well.
The above has only added an "empty" Beanshell template to the workflow. You need to enter the actual Beanshell code/script to be executed in order for this service to make any sense, as explained in Configuring a Beanshell next.
Configuring a Beanshell
Configuring a Beanshell typically involves:
- Configuring the code/script to be executed
- Configuring the Beanshell's inputs and outputs so you can pass some values in and get some values out (optional)
- Configuring the dependencies of the Beanshell, if you need access to some external libraries in the code/script of the Beanshell (optional)
Configuring the Script
As an example of a simple script, let us create a Beanshell script that will print your full name given your first name and surname.
In the pop up window's Script tab enter the following code:
This script will create a variable called
To upload an existing script from a file, click the Load script button at the bottom of the Beanshell configuration dialog.
Configuring Input and Output Ports
Input and output ports are the connection points between the rest of the workflow and the Beanshell service. From a programming point of view, you can look at the input parameters as parameters to a function call, and at the output parameters as return values.
To add an input port:
Our example script that prints a full name given a first name and a surname requires two input ports:
mySurname. They are both of depth 0 (i.e. they are just simple strings and not lists).
You can add an output port in a similar manner to input ports but this time select the Output ports tab.
Input and output ports are available as variables within the Beanshell script, so the port names must match the variable names used in the script. Beanshell port and variable names are case sensitive (myName and myname are different)!
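To illustrate how port names map onto script variables, here is a minimal sketch in plain Java. It assumes the ports are named myName, mySurname and myFullName, as the surrounding text suggests. The wrapper class and method are hypothetical and exist only so the snippet can be compiled and run outside Taverna; inside a Beanshell service, roughly the single assignment shown in the comment should be enough.

```java
// Hypothetical stand-alone wrapper; not part of Taverna.
class FullNameSketch {

    // The parameters play the role of the input ports and the return
    // value plays the role of the output port.
    static String fullName(String myName, String mySurname) {
        // Inside a Beanshell service the script body would be roughly:
        //   myFullName = myName + " " + mySurname;
        String myFullName = myName + " " + mySurname;
        return myFullName;
    }

    public static void main(String[] args) {
        System.out.println(fullName("Ada", "Lovelace"));
    }
}
```

Note that renaming a port without renaming the matching variable in the script (or the other way round) would leave the script reading or writing a variable that is not connected to anything.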
To test the Beanshell service you have just created, create two workflow inputs (call them
mySurname to correspond to the Beanshell's input ports - although they do not have to be called that) and one workflow output (called
myFullName to correspond to the Beanshell's output - again this is not compulsory, the workflow ports can be called anything) and connect them to the appropriate ports of the Beanshell. Your workflow should look like the one in the figure below. Now run the workflow and look at the results.
Just like in Java, in Beanshell you are allowed to reference existing classes using import statements. By default you should have access to the full Java Platform API so you should have no problems using say a java.util.HashSet. However, it is often the case that you already have some library provided by you or some third party that does what you want. If these libraries are available as JARs you can make them accessible from within the Beanshell by clicking the Dependencies tab and configuring them from there.
The dialog gives you the location of the library folder into which you must copy the needed JARs.
After copying the JARs, close and open the Beanshell window again. The JARs you have copied should now appear in the dialog and you should be able to tick off the ones you require. Different services in the workflow, just as different workflows, can depend on different JAR files without getting conflicts.
Workflows with dependencies are inherently more difficult to share with other Taverna users, as those users also need to download and install the required JAR libraries.
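As a small illustration of the earlier point about importing Java Platform classes, the sketch below uses java.util.HashSet to remove duplicates from a list of strings. It is written as plain Java with a hypothetical wrapper class so that it can be run on its own, and it assumes that a list-valued input arrives as a java.util.List of strings. Older Beanshell versions may not accept generic type parameters such as `<String>`; if the script fails to parse, dropping them is worth trying.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone wrapper; not part of Taverna.
class DistinctValuesSketch {

    // Collapse repeated values by copying the list into a HashSet.
    static Set<String> distinct(List<String> values) {
        return new HashSet<String>(values);
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("P12345", "Q67890", "P12345");
        // Prints the number of distinct identifiers.
        System.out.println(distinct(ids).size());
    }
}
```

No entry on the Dependencies tab is needed for this, since java.util is part of the Java Platform API.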
A More Advanced Example
Let us now look at a more advanced Beanshell script example that takes a piece of XML text and an XPath expression as inputs, applies the XPath to the text and returns the result values either as a list of nodes or as XML text. This is actually the XPath From Text local service, which is just a pre-canned Beanshell.
Here is what the script looks like:
The input ports of the service look like:
The output ports of the service look like:
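As a rough illustration of the same idea, and not the script used by the XPath From Text service itself, the following sketch applies an XPath expression to XML text using only the javax.xml classes shipped with the JDK. The class and method names are hypothetical, and the actual local service may rely on a different XML library and different port names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-alone sketch; not the Taverna script.
class XPathFromTextSketch {

    // Parse the XML text and return the nodes matched by the XPath.
    static NodeList select(String xmlText, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlText)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    // Result as a list of text values, one per matched node.
    static List<String> nodeTexts(String xmlText, String xpath) {
        NodeList nodes = select(xmlText, xpath);
        List<String> texts = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    // Result as a list of XML fragments, one per matched node.
    static List<String> nodesAsXml(String xmlText, String xpath) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            NodeList nodes = select(xmlText, xpath);
            List<String> fragments = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
                fragments.add(out.toString());
            }
            return fragments;
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialise nodes", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<people><name>Ada</name><name>Grace</name></people>";
        System.out.println(nodeTexts(xml, "/people/name"));
        System.out.println(nodesAsXml(xml, "/people/name"));
    }
}
```

In a Beanshell service, the two String parameters would correspond to input ports of depth 0 and the returned lists to output ports of depth 1.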
Advanced - Dependency Classloaders
This section can be quite technical even for hard-core Java programmers.
You may have noticed the classloader persistence option on the Dependencies tab. This option gives you a certain level of control over how the classes of the libraries you depend on are loaded, in particular if such a library has complex initialisation routines or stores state in static variables. It also enables several Beanshells with dependencies to cooperate by sharing dependencies.
The possible classloader persistence options are to share the classloader for the whole workflow or to use the system classloader.
Shared for whole workflow
The dependency classes are loaded fresh for each workflow run, but are shared between all Beanshell services with this persistence option. The set of JAR files searched is the union of all the dependency selections of all Beanshells in the workflow. Normally this means that you only need to tick off the required JAR files in one of the Beanshell services, as long as all of them have the Shared for whole workflow option set. This option allows the dependency to share state through internal static members, and so the behaviour of one Beanshell might depend on the behaviour of another. This is not recommended for scientifically sound provenance, but the isolation level is still at the workflow run, so each workflow is run with fresh classes. This is the default option.
System classloader
The classes are loaded using the system classloader. This means they are only ever loaded once, even if you run several workflows or re-run a workflow. This option is generally only recommended as a last resort or if you are accessing JNI-based native libraries, which by their nature can only be loaded once. Notice that if you do not use the standard Taverna startup script you will have to add the JAR files to the -classpath. See the section on JNI-based libraries for more information.
In general we recommend using the Shared for whole workflow option.
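To make the remark about shared static state more concrete, here is a minimal sketch in plain Java. It does not involve Taverna's classloaders at all; it only shows how a static member is shared by every caller of a class loaded once, which is the situation two Beanshell services are in when they share a dependency classloader. All class and method names are hypothetical.

```java
// Hypothetical dependency class that keeps state in a static member.
class SharedCounter {
    private static int calls = 0;

    // Every caller that sees this loaded class sees the same counter.
    static synchronized int next() {
        calls = calls + 1;
        return calls;
    }
}

// Two hypothetical "services" that both use the same dependency.
class StaticStateSketch {

    static int serviceA() {
        return SharedCounter.next();
    }

    static int serviceB() {
        return SharedCounter.next();
    }

    public static void main(String[] args) {
        int a = serviceA();
        int b = serviceB();
        // The value seen by serviceB depends on serviceA having run first.
        System.out.println("A saw " + a + ", B saw " + b);
    }
}
```

If each service had its own freshly loaded copy of the class, both counters would start from zero independently; with a shared classloader the result of one service depends on what the other has already done.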
JNI-based Native Libraries
JNI-based libraries are a way for Java programs to access natively compiled code, typically written in languages such as C, C++ or Fortran. Even if you do not depend on such a library, one of your dependencies might.
A JNI-based library is normally identified by an extension such as .jnilib instead of .jar. Compiling and building JNI libraries is outside the scope of this documentation, but we will cover how to access such libraries from within Taverna.
In this section we will assume we have a Java library hello.jar that depends on some native functions in hello.jnilib. To complicate matters, our hello.jnilib again depends on the native dynamic library libfish.dylib (pick your favourite extension depending on your operating system).
First of all you need to decide where to install the libraries. We generally recommend installing the .jnilib files in the same location as the .jar files (i.e. in lib in your Taverna home directory), as described in the section Configure dependencies. However, since supporting JNI will require you to modify the Taverna startup scripts, you might want to install them in the lib directory in the Taverna installation/startup directory instead. Here we will use the Taverna home directory option.
In the Taverna installation directory, locate
taverna.sh, depending on your operating system. Open this file in a decent editor.
You need to add a few lines to set the library path so that the .jnilib can find its dependencies. This step might not be required if you have no .dylib files in addition to the .jnilib file, but it might be if you have more than one .jnilib file. Here we will set the dynamic library path to be the lib directory in your Taverna home directory.
In addition, we are going to modify the Java startup parameters to set the system property java.library.path, which tells Java where to look for .jnilib files. Since both paths and variable names vary with operating system, we will show the modifications for Windows, Linux and OS X.
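Whichever platform you are on, a quick way to check which directories Java will actually search is to print the java.library.path property from within the running JVM. The sketch below is a hypothetical diagnostic and not part of Taverna; it also uses System.mapLibraryName to show the file name the current platform expects for a library loaded under the name hello, which is an assumption based on the example library used in this section.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic; not part of Taverna.
class LibraryPathSketch {

    // Split java.library.path into its individual directories.
    static List<String> libraryPathEntries() {
        String path = System.getProperty("java.library.path", "");
        return Arrays.asList(path.split(Pattern.quote(File.pathSeparator)));
    }

    // File name the JVM looks for when asked to load the given library.
    static String platformFileName(String libraryName) {
        return System.mapLibraryName(libraryName);
    }

    public static void main(String[] args) {
        for (String entry : libraryPathEntries()) {
            System.out.println("searched directory: " + entry);
        }
        System.out.println("expected file name: " + platformFileName("hello"));
    }
}
```

If the directory holding your native libraries is not listed, the startup parameters have not taken effect for the JVM you are running.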
Windows
In the Taverna installation folder, find and edit taverna-debug.bat with your favourite editor, and add/modify the lines to reflect the following:
Linux
In the Taverna installation folder, find and edit taverna.sh with your favourite editor, and add/modify the lines to reflect the following:
Mac OS X
On Mac OS X, a startup script is not used, as Taverna is wrapped in an application bundle. A Mac OS X bundle is a kind of directory. If you depend on dynamic libraries, we recommend that you install the JNI libraries inside the Taverna bundle. However, the JAR files must still be configured as explained in the Configure dependencies section.
Technically, you could use almost the same solution as in Linux, but you would have to start Taverna using the command line:
Use the Terminal and change directory to inside the Taverna 2.2.0.app bundle. Navigate down to
Contents/MacOS. This is where we will copy our
dylib files, in this example
To do the same in the Finder, right-click (or control-click) on the Taverna icon in the Applications folder and select Show Package Contents.
Navigate down to
Contents/MacOS and copy in our
libfish.dylib) files there.
However, in order for libhello.jnilib to find its dependency libfish.dylib, we will have to use the Terminal and modify the library path using install_name_tool. We will use otool to inspect the paths.
Why does this work?
If you experience errors and want to check the console for debug messages from your library, you can start Taverna from the Terminal instead of double-clicking the Taverna icon.
The example below shows a typical message when
libhello.jnilib is located, but some of its dependencies (
libfish.dylib) cannot be found (e.g. because we did not run the