Skip to end of metadata
Go to start of metadata
You are viewing an old version of this page. View the current version. Compare with Current ·  View Page History

Taverna PROV Data Bundle (Taverna 2.x)

Taverna 2.4 with the Taverna-PROV plugin 2.1.5 or later can export Taverna workflow runs as a Data Bundle. The bundle can be saved from within the Workbench results (Save All) or from the command line. The Data Bundle contains the workflow input and output values, intermediate values, a provenance trace and a copy of the executed workflow definition.

Structure of exported provenance

The .bundle.zip file is a RO bundle, which species a structured ZIP file with a manifest (.ro/manifest.json). 

Mime type: 

application/vnd.wf4ever.robundle+zip

File extension:

.bundle.zip

You can explore the bundle by unzipping it or browse it with a program like 7-Zip.

The Taverna-PROV source code includes an example bundle and unzipped bundle as a folder. This data bundle has been saved after running a simple hello world workflow.

The remaining text of this section describes the content of the RO bundle, as if it was unpacked to a folder. Note that many programming frameworks include support for working with ZIP files, and so complete unpacking might not be necessary for your application. For Java, the Data bundle API gives a programmatic way to inspect and generate data bundles.

Inputs and outputs

The folders inputs/ and outputs/ contain files and folders corresponding to the input and output values of the executed workflow. Ports with multiple values are stored as a folder with numbered outputs, starting from 0. Values representing errors have extension .err, other values have an extension guessed by inspecting the value structure, e.g. .png. External references have the extension .url - these files can often be opened as "Internet shortcut" or similar, depending on your operating system.

Example listing:

c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>ls
inputs intermediates mimetype outputs workflow.wfbundle workflowrun.prov.ttl
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>ls outputs
greeting.txt
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat outputs/greeting.txt
Hello, John Doe

Workflow run provenance

The file workflowrun.prov.ttl contains the PROV-O export of the workflow run provenance (including nested workflows) in RDF Turtle format.

This log details every intermediate processor invocation in the workflow execution, and relates them to inputs, outputs and intermediate values.

Example listing:

c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat workflowrun.prov.ttl | head -n 40 | tail -n 8
<#taverna-prov-export>
rdf:type prov:Activity ;
prov:startedAtTime "2013-11-22T14:01:02.436Z"^^xsd:dateTime ;
prov:qualifiedCommunication _:b1 ;
prov:endedAtTime "2013-11-22T14:01:03.223Z"^^xsd:dateTime ;
rdfs:label "taverna-prov export of workflow run provenance"@en ;
prov:wasInformedBy <http://ns.taverna.org.uk/2011/run/385c794c-ba11-4007-a5b5-502ba8d14263/> ;

See the provenance graph for a complete example. The provenance uses the vocabularies W3C PROV-Owfprov and tavernaprov.

ns.taverna.org.uk URIs

Note that the URIs starting with 

http://ns.taverna.org.uk/2011/run/

http://ns.taverna.org.uk/2011/data/

http://ns.taverna.org.uk/2010/workflowBundle/

are not meant to be clickable (HTTP resolvable) and would currently give 404 Not Found.

The reason for this is that myGrid does not (and will not) store centrally any workflow run information, data values or workflow definitions. It is however still useful that each workflow definition, each workflow run and each produced data value can be uniquely identified, therefore we build these URIs using UUIDs that are generated within Taverna. It is possible that in the future these URIs could redirect to public search results, e.g. on myExperiment.

Intermediate values

Intermediate values are stored in the intermediates/ folder and referenced from workflowrun.prov.ttl

Intermediate value from the example provenance:

Here we see that the bundle file intermediates/d5/d588f6ab-122e-4788-ab12-8b6b66a67354.txt contains the output from the "hello" processor, which was also the input to the "Concatenate_two_strings" processor. Details about processor, ports and parameters can be found in the workflow definition.

Example listing:

c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>ls intermediates/d5
d588f6ab-122e-4788-ab12-8b6b66a67354.txt
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat intermediates/d5/d58*
Hello,

Note that "small" textual values are also included as cnt:chars in the graph, while the referenced intermediate file within the workflow bundle is always present.

<intermediates/d5/d588f6ab-122e-4788-ab12-8b6b66a67354.txt>
rdf:type cnt:ContentAsText ;
cnt:characterEncoding "UTF-8"^^xsd:string ;
cnt:chars "Hello, "^^xsd:string ;
tavernaprov:byteCount "7"^^xsd:long ;
tavernaprov:sha512 "cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e"^^xsd:string ;
tavernaprov:sha1 "f52ab57fa51dfa714505294444463ae5a009ae34"^^xsd:string ;
rdf:type tavernaprov:Content .
Workflow definition

The file workflow.wfbundle is a copy of the executed workflow in SCUFL2 workflow bundle format. This is the format which will be used by The file workflow.wfbundle contains the executed workflow in Taverna 3.

You can use the SCUFL2 API to inspect the workflow definition in detail.

The file .ro/annotations/workflow.wfdesc.ttl contains the abstract structure (but not all the implementation details) of the executed workflow, in RDF Turtle according to the wfdesc ontology.

Taverna 3 Data bundle

Taverna 3 uses the same Data Bundle format as Taverna-PROV plugin. Currently there are some differences due to the two different implementations for capturing provenance.

Taverna 3 does not yet export provenance trace to workflowrun.prov.ttl.

Taverna 3 introduces a new resource in the data bundle, workflowrun.json which is a much more Taverna-centric and it mirrors the actual execution state while running a workflow. This example excerpt shows the structure. (See also the full workflowrun.json)

The subject refers to the executed workflow/processor/activity, as identified within the SCUFL2 workflow.wfbundle

 




 

 

Labels
  • None