I’m currently looking into an old bug, TAV-480, that seems to have reappeared. The bug is in Raven, our Maven-based classloader system that enables plugins and updates over the network. I’ll follow up with another post about the bug itself, but first let’s introduce Raven.

The idea behind Raven is quite simple: we already use Maven to deploy all our modules, and all of our dependencies are available either from our own Maven repositories or from official third-party ones. One thing we decided to support from Taverna 1.5 onwards was plugins and dynamic updates over the web. How could we do this?

We decided to reuse the existing Maven infrastructure at runtime as well. Installing a plugin then simply becomes a matter of specifying which Maven artifacts are needed, and from which Maven repositories to fetch them, as in the plugin description for the t2 preview plugin. So we built Raven, which, although inspired by Maven, doesn’t use any Maven code.

Given a Maven artifact as described in the plugin file:

<artifact groupId="net.sf.taverna.t2"
          artifactId="biomart-activity" version="0.2.0"/>

Raven fetches the POM and JAR files for biomart-activity from the repository, parses the Maven artifact description file biomart-activity-0.2.0.pom, and then does the same for each of the listed dependencies, and their dependencies in turn, until everything required has been downloaded.
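The traversal can be sketched roughly as follows. This is only an illustration of the depth-first walk, not Raven’s actual code: the `POMS` map stands in for fetching and parsing real POM files over HTTP, and the names here are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of a transitive dependency walk.
// A real implementation would fetch and parse each artifact's POM
// from a Maven repository; here POM contents are simulated in a map.
public class ResolveSketch {

    // "groupId:artifactId:version" -> declared dependencies
    static final Map<String, List<String>> POMS = Map.of(
        "net.sf.taverna.t2:biomart-activity:0.2.0",
            List.of("org.jdom:jdom:1.0"),
        "org.jdom:jdom:1.0", List.of());

    // Depth-first walk collecting every required artifact exactly once.
    static Set<String> resolve(String artifact, Set<String> seen) {
        if (!seen.add(artifact)) return seen;      // already "downloaded"
        for (String dep : POMS.getOrDefault(artifact, List.of()))
            resolve(dep, seen);                    // recurse into dependencies
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(resolve(
            "net.sf.taverna.t2:biomart-activity:0.2.0",
            new LinkedHashSet<>()));
    }
}
```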

What remains is to make all these JARs available on the classpath. The normal way to do this is with a ClassLoader, such as URLClassLoader. You can’t normally modify the classloader Java gives you at startup: it is determined by the -classpath parameter, and there is no official way to add new URLs to it once it has been constructed. You are, however, free to construct a new classloader and load classes through that instead.
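A minimal sketch of that second route, constructing a fresh URLClassLoader over a downloaded JAR (the JAR path here is a hypothetical placeholder):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

public class LoaderDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical path to a downloaded artifact.
        URL jar = Paths.get("repository", "biomart-activity-0.2.0.jar")
                       .toUri().toURL();

        // A new classloader we control, with the system loader as parent.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { jar })) {
            // Classes inside the JAR become loadable through this loader;
            // everything else still delegates to the parent as usual.
            Class<?> cls = loader.loadClass("java.util.ArrayList"); // via parent
            System.out.println(cls.getName()); // prints java.util.ArrayList
        }
    }
}
```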

This is what the bootstrapper asks Raven to do. Since there can be several plugins, each plugin needs its own classloader. Plugins can come from different third parties (you can add plugin sites to Taverna), and they might depend on different versions of common artifacts such as jdom. In some cases those versions are not compatible.

The solution we ended up with is that each Maven artifact gets its own LocalArtifactClassLoader. This classloader also holds a list of the dependencies declared in the POM, and when asked to resolve a class name it will first search each of its dependencies. As the dependencies have the same kind of classloader, they apply the same logic, so the search is depth-first. If none of the dependencies have the class, perhaps the JAR file associated with this artifact does, so the classloader falls back to a normal search through its superclass URLClassLoader, which searches the single JAR file. If that fails too, a ClassNotFoundException is thrown, and whoever depended on this artifact moves on to check its next dependency.
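That search order can be sketched like this. This is a simplified illustration of the idea, not Raven’s actual LocalArtifactClassLoader; the class name and constructor are made up for the example:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Simplified illustration of a per-artifact classloader:
// declared dependencies are searched depth-first before this
// artifact's own JAR is consulted.
public class ArtifactLoader extends URLClassLoader {
    private final List<ArtifactLoader> dependencies;

    public ArtifactLoader(URL jar, List<ArtifactLoader> dependencies) {
        super(new URL[] { jar }, null); // no parent: keeps plugins isolated
        this.dependencies = dependencies;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        for (ArtifactLoader dep : dependencies) {
            try {
                return dep.loadClass(name);   // depth-first through the POM graph
            } catch (ClassNotFoundException e) {
                // not in that subtree; try the next dependency
            }
        }
        return super.findClass(name);         // finally, this artifact's own JAR
    }
}
```

Passing `null` as the parent means only bootstrap classes (java.*) are shared; everything else has to come from the artifact graph, which is what lets two plugins hold incompatible versions of the same library.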

When you download Taverna these days, what you get in the zip file is a tiny bootstrapper class together with a configuration file, raven.properties, that says which global Maven repositories to use and which profile the main program should run with. The profile essentially defines what Taverna is by listing all the required artifacts, very much like a plugin definition. For instance, you would find all the different processors listed there.
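A configuration of that shape might look something like this. Note that the key names and URLs below are illustrative stand-ins, not the actual raven.properties syntax:

```properties
# Illustrative only -- real key names and URLs differ
raven.repository.1 = http://repo1.maven.org/maven2/
raven.repository.2 = http://example.org/our-own-repository/
raven.profile = http://example.org/taverna/profiles/current-profile.xml
```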

If we decide we need to publish an update of a certain processor, we can simply deploy the new version to the Maven repositories, publish a new version of the profile, and list it in the profile list that Taverna checks on startup.

Although this solution came with a few quirks we had to work out (for instance, many of the official POMs didn’t correctly state the true dependencies of their libraries), in general it’s a nice solution with lots of possibilities. For instance, nothing prevents you from having two versions of the same artifact loaded at once, so you could theoretically build a workflow that used both an old and a new version of the wsdl-activity – say, if we had fixed a bug that you suspect might have affected some old workflow runs and you want to compare the outputs.

However, one of the difficulties with doing all of the ClassLoader work ourselves is that this is quite low-level, hard-core Java, with many tiny pitfalls to worry about and lots of concepts to understand. I’ll come back in the next post with the example related to our bug TAV-480.

If you are interested in using Raven in your own project, contact us on taverna-hackers!

Stand up and be counted

Taverna workflows are full of shims. That’s a fact. Shims are the little adapter services, mostly Beanshell scripts, which convert the output of one workflow processor before sending it to the input of another. They are constantly being re-invented, and 90% of the time they do the same things: concatenate strings, swap things around, and so on.
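A typical shim is only a line or two of Beanshell. In plain Java, a string-concatenation shim amounts to something like this (the class and method names are illustrative, not from any actual workflow):

```java
import java.util.List;

public class ConcatShim {
    // A string-concatenation shim: joins one processor's list output
    // into the single delimited string the next processor expects.
    public static String concatenate(List<String> items, String separator) {
        return String.join(separator, items);
    }

    public static void main(String[] args) {
        System.out.println(concatenate(List.of("a", "b", "c"), ","));
        // prints a,b,c
    }
}
```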

The problem is that these shims are designed to be used once and almost thrown away, so they are not annotated with any sort of helpful information. This becomes a problem with the scientist’s greatest challenge: provenance and data lineage. (OK, I might be being a bit melodramatic about it being the greatest challenge, but it is up there somewhere – maybe nearer to the challenge of making the perfect cup of tea, no mean feat.) Data goes into a shim and comes out the other side, but what happens inside? Shims are not the only processors guilty of this; there are plenty of black-box services out there in the world.

So, how can we address this problem? Well, we have started to collect the shims that people actually use in a myExperiment group. We will then figure out the similarities and come up with an annotated set which we can all use from Taverna 2 (T2). The current idea is for the T2 workbench to have an intelligent workflow designer which will recognise that you are trying to do some shim magic and suggest one to use. Maybe we will need a Taverna Clippy-style pop-up (think Word etc.) – ideas on a postcard…

So, if you are a shim, it’s time to stand up and announce to the world: “I’m a shim and I’m proud of it”.