These pages are outdated.
Users are recommended to install and use the updated Taverna-PROV plugin, which produces PROV-O traces and includes the data values. The Taverna-PROV traces are more complete, more correct, and address many of the known issues in OPM/Janus.
This page explains how to populate and then access the provenance database for Taverna up to v.2.1.2.
A few ready-made "packs" are available from the myExperiment site containing sample workflows along with their provenance graphs (both Janus and OPM). These have been used in the context of a DataONE "summer of code" project, and are:
Provenance is generated automatically every time a workflow is executed. However, you have control over:
For (1): from the Taverna workbench, go to Preferences. Make sure "enable provenance capture" is ticked, and that in-memory storage is not ticked.
For (2): the Derby back-end configuration is not supported; I (Paolo) will only support mySQL.
Follow the instructions on the configuration page (section "For use with mySQL") to edit a simple config file that enables provenance to be written to a mySQL DB. After you have edited the configuration file as explained, restart Taverna for the changes to take effect.
Important note: we do not supply a mySQL installation with Taverna. You are responsible for managing your own installation; once it is up and running, insert the appropriate connection string into the config file.
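As a sketch, such a connection string typically follows the standard JDBC URL form for mySQL; the host, database name, and credentials below are placeholders, and the configuration page tells you which config key it belongs under:

```properties
# Placeholder values — substitute your own host, database name and credentials.
jdbc:mysql://localhost:3306/T2Provenance?user=taverna&password=secret
```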
You have the following options (from easy to expert):
These options are described next.
In addition to its native provenance, Taverna can produce RDF graphs that conform to the Janus provenance ontology.
The Janus OWL ontology for Taverna provenance is attached.
The export function is currently just a utility, implemented as another standalone API client. The easiest option is to download the ProvenanceExportToJanus.jar stand-alone jar file from the SVN repository and execute the command:
java -Dconf=<my config file> -jar ProvenanceExportToJanus.jar
where <my config file> minimally contains the mySQL connection parameters, in addition to other switches that can be used to configure the provenance API. An example can be found here: APIClient.properties.
In the config file you also specify where the RDF output file is located.
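As an illustration, such a config file might look like the sketch below; the key names here are hypothetical stand-ins, so use the APIClient.properties example linked above for the actual ones:

```properties
# Hypothetical key names — see APIClient.properties for the real ones.
dbhost = localhost
dbname = T2Provenance
dbuser = taverna
dbpassword = secret
# where the Janus RDF output file will be written:
rdf.output.file = /tmp/workflow-trace-janus.rdf
```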
The ProvenanceExportToJanus.jar bundle is simply a self-contained jar file for the main API client class NativeToRDF.java. The source code is available here.
Follow the instructions in the next section to build the source and, more generally, to write your own client.
NativeToRDF.java sits in the same package as the sample query client.
All client code is available through the myGrid-labs SVN repository for the provenance client project. The code is managed with Maven 2, which means that after checking out the project, you build it from its root directory like so:
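Assuming Maven 2 is installed, the standard build invocation is the usual Maven default (not anything specific to this project):

```shell
# from the root of the checked-out provenance client project:
mvn clean install
```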
Many people use eclipse with a Maven plugin, which makes it easy to manage the project. If you follow this route, you will need Maven installed on your machine in addition to the Maven plugin in eclipse.
The example client class ProvenanceAPISampleClient in package net.sf.taverna.t2.provenance.api.client takes a provenance query, specified in its config file, runs the query, and illustrates how to navigate the NativeAnswer Java data structure to display the query results.
Note that the client accesses a steady-state provenance DB, i.e., it queries after the workflow run has completed. So the normal sequence is: run the workflow to completion, then run the client to query the resulting provenance.
Of course, the DB holds multiple runs for any number of workflows. The provenance query language lets you specify a particular workflow and run as the scope for the query; if none is specified, it defaults to the latest run that was added to the DB.
For convenience, I have packaged up the generic client as a stand-alone jar file that can be downloaded here: TavernaProvenanceClient.jar.
This can be executed from the command line, like so:
java -Dconf=<my config file> -jar TavernaProvenanceClient.jar
where you specify your own config file as indicated above.
Two pieces of documentation are needed to write a client:
The JavaDoc for the ProvenanceAccess API is here. The main classes of interest are:
*class ProvenanceAccess in package net.sf.taverna.t2.provenance.api
*class QueryAnswer in package net.sf.taverna.t2.provenance.api, which also encapsulates the OPM graph for the answer, in addition to a native Java object (class NativeAnswer)
Here is a visual description of the NativeAnswer object in action on a workflow trace:
This is a very partial fragment of the entire object. It shows the nested structure of a NativeAnswer that describes a path in a provenance graph.
The workflow that generates the trace is here along with sample inputs for it.
The documentation for our simple provenance query language is in progress; the current version is here.
A few sample queries can be found in folder src/main/resources on the SVN.
In particular, you can write a really minimal query (see minimal.xml), which computes the provenance of each of the workflow's final outputs, restricted to the initial inputs. The implicit scope in this case is the latest run recorded in the DB.
Queries can be much more selective and explicit as to which values they compute provenance for, and for which runs. The complete XML schema for the query language can be found in src/main/resources/pquery.xsd.
In addition to Janus RDF graphs, Taverna provenance also offers the capability to export the result of a provenance query as an OPM graph, alongside the native answer. This is specified using the property:
in the config file for any client that executes a query. In particular, if you want the OPM graph corresponding to the entire provenance trace, you can run a query that returns the entire graph, and enable OPM as above.
The graph is saved in a file in RDF/XML syntax.
An example of such a query is here: completeGraph.xml
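Since the OPM output is plain RDF/XML, any XML- or RDF-aware tool can consume it. As a minimal, self-contained sketch using only the Python standard library (the two-node graph inlined below is a toy stand-in, not a real OPM trace):

```python
import xml.etree.ElementTree as ET

RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Toy RDF/XML stand-in for an exported graph (not a real OPM trace).
sample = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="urn:example:artifact1"/>
  <rdf:Description rdf:about="urn:example:process1"/>
</rdf:RDF>"""

root = ET.fromstring(sample)
# Collect the subjects (rdf:about URIs) of the top-level descriptions.
subjects = [d.get(RDF_NS + "about") for d in root.findall(RDF_NS + "Description")]
print(subjects)  # → ['urn:example:artifact1', 'urn:example:process1']
```

A real exported file would be loaded with `ET.parse(path)` instead of the inline string.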
So your config file should specify at least the following:
where <path1> and <path2> specify the locations of your query file and of your resulting OPM file.
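As a sketch only (the real property names are defined by the provenance API config file; the names below are hypothetical placeholders):

```properties
# Hypothetical property names — consult the sample config for the real ones.
query.file = <path1>/completeGraph.xml
opm.output.file = <path2>/trace-opm.rdf
```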
You can of course get at the DB tables themselves, using your favourite mySQL client (via the GUI tools or programmatically, for example). This is not recommended, as the schema encodes the provenance information in a way that is designed for efficiency rather than for user access. However, for reference, a snapshot of the conceptual model for the schema is available here.
A more detailed schema, along with an explanation of most of the attributes in the table, is available here.
This will hopefully evolve into a full-fledged doc at some point.