These pages are outdated.

Users are advised to install and use the updated Taverna-PROV plugin, which produces PROV-O traces and includes the data values. The Taverna-PROV traces are more complete and more "correct", and they address many of the known issues in OPM/Janus.

This page explains how to populate and then access the provenance database for Taverna up to v.2.1.2.

For the impatient...

A few ready-made "packs" are available from the myExperiment site, containing sample workflows along with their provenance graphs (both Janus and OPM). These have been used in the context of a DataONE "summer project of code", and are:

  • A myExperiment pack. This is a version of (half of) the Provenance challenge 1 workflow, encoded in Taverna but executing processors locally, as a mockup version. This is useful as it does not depend on external services or plugins and is guaranteed to work, if all you need is to generate provenance graphs!

Generating provenance during workflow runs

Provenance is generated automatically every time a workflow is executed. However, you have control over:

  1. whether a database is used at all.
  2. which database is used, i.e., either Derby or mySQL

For (1): from the Taverna workbench, go to Preferences. Make sure "enable provenance capture" is ticked and that in-memory storage is not ticked.

For (2): the Derby back-end configuration is not supported. I (Paolo) will only support mySQL.
Follow the "For use with mySQL" instructions on this configuration page to edit a simple config file so that provenance is written to a mySQL DB. After you have edited the configuration file as explained, restart Taverna for the changes to take effect.

Important note: we do not supply a mySQL installation with Taverna. You are responsible for managing your own installation; once it is up and running, insert the appropriate connection string into the config file.
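
As a rough illustration of what such a connection string looks like, the sketch below assembles a MySQL JDBC URL of the standard form jdbc:mysql://host:port/database. The class name JdbcUrlSketch and the host, port and database values are hypothetical placeholders. The property under which Taverna expects the string is described on the configuration page and is not assumed here.

```java
public class JdbcUrlSketch {
    /** Builds a MySQL JDBC URL of the standard form jdbc:mysql://host:port/database. */
    public static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Placeholder values: replace them with those of your own mySQL installation.
        System.out.println(mysqlUrl("localhost", 3306, "provenance_db"));
    }
}
```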

Accessing provenance data from the mySQL database

You have the following options (from easy to expert):

  1. export a provenance trace for a workflow run as an RDF graph. The graph conforms to the Janus provenance ontology. This is done with a command line utility; see below.
  2. export a provenance trace for a workflow run as an OPM graph. This is done with a simple command line shell script; see below.
  3. access provenance data programmatically, using the Provenance Access API. In particular the API lets you query a workflow run. The query result is a native Java object, with the option to export to an OPM graph as well.
  4. inspect the mySQL provenance DB directly.

These options are described next.

Exporting the entire provenance graph as a Janus RDF graph

In addition to its native provenance, Taverna can produce RDF graphs that conform to the Janus provenance ontology.

The Janus OWL ontology for Taverna provenance is attached.

The export function is currently just a utility, implemented as another standalone API client. The easiest approach is to download the ProvenanceExportToJanus.jar stand-alone jar file from the SVN repository, and execute the command:

java -Dconf=<my config file> -jar ProvenanceExportToJanus.jar

where <my config file> minimally contains the mySQL connection parameters, in addition to other switches that can be used to configure the provenance API. An example can be found here: APIClient.properties.
In the config file you also specify where the RDF output file is located.
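
The -Dconf switch sets a Java system property named conf. The sketch below shows the general mechanism by which a client could pick it up and load the file with java.util.Properties. It is an illustration only, not the actual code of NativeToRDF, and the class name ConfLoaderSketch is hypothetical. It makes no assumption about which keys the file contains.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ConfLoaderSketch {
    /** Loads a Java properties file from the given path. */
    public static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Passing -Dconf=<my config file> on the command line sets this property.
        String confPath = System.getProperty("conf");
        if (confPath == null) {
            System.err.println("No config file given; pass -Dconf=<my config file>");
            return;
        }
        Properties props = load(confPath);
        // Print whatever keys were read, without assuming any particular names.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```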

The ProvenanceExportToJanus.jar bundle is simply a self-contained jar file for the main API client class NativeToRDF.java, which is in the same package as the sample query client. The source code is available here.
Follow the instructions in the next section to build the source and, more generally, to write your own client.

Writing your own provenance client to run specific provenance queries

Downloading and building the Maven provenance client project

All client code is available through the myGrid-labs SVN repository for the provenance client project. The code is managed through Maven 2. This means that after checking out the project, you build it from its root directory like so:

mvn install

Many people use Eclipse with a Maven plugin, which makes it easy to manage the project. If you follow this route, you will need to have Maven installed on your machine in addition to having the Maven plugin in Eclipse. In this case, you:

  1. check out the provenance-client code at URL above, as a new Java project
  2. enable maven on the project
  3. run as... maven install

Sample provenance client

The example client class ProvenanceAPISampleClient in package net.sf.taverna.t2.provenance.api.client takes a provenance query, specified in its config file, runs the query, and illustrates how to navigate the NativeAnswer Java data structure to display the query results.

Note that the client accesses a steady-state provenance DB, i.e., after the workflow run has completed. So the normal sequence is:

  1. run a Taverna workflow to completion. This will populate the database
  2. run your client on the DB.

Of course, the DB holds multiple runs for any number of workflows. The provenance query language lets you specify a particular workflow and run as the scope for the query. If none is specified, the scope defaults to the latest run that was added to the DB.
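
The scoping rule can be pictured with the following sketch. It is not the provenance API: the Run class, its fields and the resolveRunId method are hypothetical names introduced only to illustrate "use the requested run if there is one, otherwise the most recently recorded run".

```java
import java.util.Arrays;
import java.util.List;

public class RunScopeSketch {
    /** Hypothetical stand-in for a workflow run recorded in the provenance DB. */
    public static final class Run {
        final String id;
        final long recordedAtMillis;

        public Run(String id, long recordedAtMillis) {
            this.id = id;
            this.recordedAtMillis = recordedAtMillis;
        }
    }

    /** Returns the requested run id if given, otherwise the id of the latest recorded run. */
    public static String resolveRunId(String requestedRunId, List<Run> runs) {
        if (requestedRunId != null) {
            return requestedRunId;
        }
        Run latest = null;
        for (Run r : runs) {
            if (latest == null || r.recordedAtMillis > latest.recordedAtMillis) {
                latest = r;
            }
        }
        if (latest == null) {
            throw new IllegalStateException("no runs recorded");
        }
        return latest.id;
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(new Run("run-a", 1000L), new Run("run-b", 2000L));
        System.out.println(resolveRunId(null, runs));    // prints run-b
        System.out.println(resolveRunId("run-a", runs)); // prints run-a
    }
}
```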

Client as a command line util

For convenience, I have packaged up the generic client as a stand alone jar file that can be downloaded here: TavernaProvenanceClient.jar.

This can be executed from the command line, like so:
java -Dconf=<my config file> -jar TavernaProvenanceClient.jar
where you specify your own config file as indicated above.

API and query language documentation

Two pieces of documentation are needed to write a client:

  1. the ProvenanceAccess API javadoc, and
  2. the query language documentation

Javadoc for the API

The JavaDoc for the ProvenanceAccess API is here. The main classes of interest are:
  • class ProvenanceAccess in package net.sf.taverna.t2.provenance.api
  • class QueryAnswer in package net.sf.taverna.t2.provenance.api, which also encapsulates the OPM graph for the answer, in addition to a native Java object (class NativeAnswer)

Here is a visual description of the NativeAnswer object in action on a workflow trace:

This is a very partial fragment of the entire object. It shows the nested structure of a NativeAnswer that describes a path in a provenance graph.

The workflow that generates the trace is here along with sample inputs for it.

Creating provenance queries

The documentation for our simple provenance query language is in progress. The current version is here.

A few sample queries can be found in folder src/main/resources on the SVN.

In particular, you can write a really minimal query (see minimal.xml) which will compute the provenance of each of the workflow's final outputs only, with focus on the initial inputs only. The implicit scope in this case is the latest run recorded in the DB.

Queries can be much more selective and explicit as to which values they compute provenance for, and for which runs. The complete XML schema for the query language can be found in src/main/resources/pquery.xsd.

Exporting provenance traces as OPM graphs

In addition to Janus RDF graphs, Taverna provenance can export the result of a provenance query as an OPM graph, alongside the native provenance. This is specified using the property:
OPM.computeGraph=true
in the config file for any client that executes a query. In particular, if you want the OPM graph corresponding to the entire provenance trace, you can run a query that returns the entire graph, and enable OPM as above.

The graph is saved in a file in RDF/XML syntax.

An example of such a query is here: completeGraph.xml

So your config file should specify at least the following:
query.file=<path1>/completeGraph.xml
OPM.computeGraph=true
OPM.rdf.file=<path2>/OPMGraph.rdf

where <path1> and <path2> specify the locations of your query file and of your resulting OPM file.
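
Before launching a client, you may want to check that a config file carries these three properties. The sketch below is one way to do so. The property names are the ones listed above, while the class OpmConfigCheckSketch and the rule that only the value "true" enables the flag are assumptions made for illustration and are not part of the provenance API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class OpmConfigCheckSketch {
    /** Returns human-readable problems; an empty list means the OPM settings look complete. */
    public static List<String> problems(Properties props) {
        List<String> found = new ArrayList<>();
        if (isBlank(props.getProperty("query.file"))) {
            found.add("query.file is not set");
        }
        // Assumption: the flag counts as enabled only when its value is "true".
        String flag = props.getProperty("OPM.computeGraph");
        if (flag == null || !flag.trim().equalsIgnoreCase("true")) {
            found.add("OPM.computeGraph is not set to true");
        }
        if (isBlank(props.getProperty("OPM.rdf.file"))) {
            found.add("OPM.rdf.file is not set");
        }
        return found;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder paths: substitute your own <path1> and <path2>.
        props.setProperty("query.file", "completeGraph.xml");
        props.setProperty("OPM.computeGraph", "true");
        props.setProperty("OPM.rdf.file", "OPMGraph.rdf");
        System.out.println(problems(props)); // prints []
    }
}
```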

Accessing the database directly

You can of course get to the DB tables themselves, using your favourite mySQL client (from the GUI tools or programmatically, for example). This is not recommended, as the schema encodes the provenance information in a way that is designed for efficiency rather than for user access. However, for reference, a snapshot of the conceptual model for the schema is available here.

A more detailed schema, along with an explanation of most of the attributes in the table, is available here.

This will hopefully evolve into a full-fledged doc at some point.
