Taverna PROV Data Bundle (Taverna 2.x)
Taverna 2.4 with the Taverna-PROV plugin 2.1.5 or later can export Taverna workflow runs as a Data Bundle. The bundle can be saved from within the Workbench results (Save All) or from the command line. The Data Bundle contains the workflow input and output values, intermediate values, a provenance trace and a copy of the executed workflow definition.
Structure of exported provenance
The .bundle.zip file is an RO bundle, which specifies a structured ZIP file with a manifest (
You can explore the bundle by unzipping it or by browsing it with a program like 7-Zip.
The remaining text of this section describes the content of the RO bundle as if it were unpacked to a folder. Note that many programming frameworks include support for working with ZIP files, so complete unpacking might not be necessary for your application. For Java, the Data bundle API gives a programmatic way to inspect and generate data bundles.
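As a rough illustration of reading a bundle without unpacking it, the following Python sketch uses only the standard zipfile module. The function names are illustrative and not part of Taverna; the sketch relies only on the bundle being an ordinary ZIP file.

```python
import zipfile


def list_bundle_entries(bundle):
    """Return the sorted names of all entries in a bundle ZIP.

    `bundle` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(bundle) as zf:
        return sorted(zf.namelist())


def read_bundle_text(bundle, entry, encoding="utf-8"):
    """Read one entry as text without extracting the whole archive."""
    with zipfile.ZipFile(bundle) as zf:
        return zf.read(entry).decode(encoding)
```

For example, read_bundle_text(path, "workflowrun.prov.ttl") would return the provenance trace as a string, where path points at an exported .bundle.zip file.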
The inputs/ and outputs/ folders contain files and folders corresponding to the input and output values of the executed workflow. Ports with multiple values are stored as a folder with numbered outputs, starting from 0. Values representing errors have the extension .err; other values have an extension guessed by inspecting the value structure, e.g. .png. External references have the extension .url; these files can often be opened as an "Internet shortcut" or similar, depending on your operating system.
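The naming conventions above can be applied in code. The sketch below is a minimal Python illustration for an unpacked bundle, not part of Taverna. It assumes that each value in a multi-valued port folder is named by its integer position plus an extension (for example 0.txt, 1.txt), and it does not descend into nested lists. Both function names are hypothetical.

```python
from pathlib import Path


def classify_value(path):
    """Classify a value file by the extension conventions described above."""
    suffix = Path(path).suffix.lower()
    if suffix == ".err":
        return "error"
    if suffix == ".url":
        return "reference"
    return "value"


def read_port_list(port_dir):
    """Return the files of a multi-valued port, ordered by numeric index.

    Assumption: the file name without extension is the integer position.
    Sub-folders (nested lists) are skipped in this sketch.
    """
    entries = [
        p for p in Path(port_dir).iterdir()
        if p.is_file() and p.stem.isdigit()
    ]
    return sorted(entries, key=lambda p: int(p.stem))
```

Sorting by the integer value rather than by name keeps 10 after 2, which a plain alphabetical listing would not.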
inputs intermediates mimetype outputs workflow.wfbundle workflowrun.prov.ttl
Hello, John Doe
The provenance trace in workflowrun.prov.ttl details every intermediate processor invocation in the workflow execution, and relates them to inputs, outputs and intermediate values.
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat workflowrun.prov.ttl | head -n 40 | tail -n 8
rdf:type prov:Activity ;
prov:startedAtTime "2013-11-22T14:01:02.436Z"^^xsd:dateTime ;
prov:qualifiedCommunication _:b1 ;
prov:endedAtTime "2013-11-22T14:01:03.223Z"^^xsd:dateTime ;
rdfs:label "taverna-prov export of workflow run provenance"@en ;
prov:wasInformedBy <http://ns.taverna.org.uk/2011/run/385c794c-ba11-4007-a5b5-502ba8d14263/> ;
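The prov:startedAtTime and prov:endedAtTime literals in the excerpt above can be turned into a duration. The following Python sketch assumes the literals always use the UTC form shown in the excerpt (fractional seconds followed by Z); other xsd:dateTime variants would need a more general parser. The helper names are illustrative.

```python
from datetime import datetime, timezone

# Layout of the timestamps in the excerpt, e.g. 2013-11-22T14:01:02.436Z
XSD_UTC_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_utc(value):
    """Parse a timestamp of the form shown in the excerpt into an aware datetime."""
    return datetime.strptime(value, XSD_UTC_FORMAT).replace(tzinfo=timezone.utc)


def duration_seconds(started, ended):
    """Elapsed seconds between two such timestamps."""
    return (parse_utc(ended) - parse_utc(started)).total_seconds()


# Values copied from the activity in the excerpt.
print(duration_seconds("2013-11-22T14:01:02.436Z", "2013-11-22T14:01:03.223Z"))
# prints 0.787
```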
Note that the URIs starting with
Intermediate values are stored in the intermediates/ folder and referenced from
Intermediate value from the example provenance:
Here we see that the bundle file intermediates/d5/d588f6ab-122e-4788-ab12-8b6b66a67354.txt contains the output from the "hello" processor, which was also the input to the "Concatenate_two_strings" processor. Details about processor, ports and parameters can be found in the workflow definition.
Note that "small" textual values are also included as cnt:chars in the graph, while the referenced intermediate file within the workflow bundle is always present.
rdf:type cnt:ContentAsText ;
cnt:characterEncoding "UTF-8"^^xsd:string ;
cnt:chars "Hello, "^^xsd:string ;
tavernaprov:byteCount "7"^^xsd:long ;
tavernaprov:sha512 "cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e"^^xsd:string ;
tavernaprov:sha1 "f52ab57fa51dfa714505294444463ae5a009ae34"^^xsd:string ;
rdf:type tavernaprov:Content .
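The tavernaprov:byteCount, tavernaprov:sha1 and tavernaprov:sha512 properties suggest a simple integrity check of a value file against its recorded metadata. The Python sketch below assumes that the checksums are hexadecimal digests over the raw bytes of the file; that interpretation, and the function names, are assumptions rather than documented behaviour.

```python
import hashlib


def content_metadata(data):
    """Compute the byte count and digests for a value given as bytes."""
    return {
        "byteCount": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }


def matches_recorded(data, byte_count, sha1=None, sha512=None):
    """Compare bytes with recorded values; digests left as None are skipped."""
    actual = content_metadata(data)
    if actual["byteCount"] != int(byte_count):
        return False
    if sha1 is not None and actual["sha1"] != sha1.lower():
        return False
    if sha512 is not None and actual["sha512"] != sha512.lower():
        return False
    return True
```

For the example value, "Hello, " encoded as UTF-8 is 7 bytes long, which agrees with the recorded tavernaprov:byteCount.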
workflow.wfbundle is a copy of the executed workflow in SCUFL2 workflow bundle format. This is the format which will be used by Taverna 3.
You can use the SCUFL2 API to inspect the workflow definition in detail.
Taverna 3 Data bundle
Taverna 3 uses the same Data Bundle format as the Taverna-PROV plugin. Currently there are some differences because the two use different implementations for capturing provenance.
Taverna 3 does not yet export provenance trace to
Workflow report (workflowrun.json)
Taverna 3 introduces a new resource in the data bundle, workflowrun.json, which is more Taverna-centric and mirrors the actual execution state while running a workflow. This example shows an excerpt of a workflow run report (see also the full workflowrun.json):
Structure (optional means the property might not be present; properties marked final should be present after the workflow has finished):
- workflow report (top-level JSON Object)
  - subject: the URI identifying the executed workflow, as identified within the SCUFL2
  - state: the state of the last workflow run; one of
  - createdDate: Date/time (in ISO 8601 dateTime format) of creation of the workflow report, e.g. when execution of the top-level workflow was started.
  - startedDate: (final) Date/time this workflow initially executed
  - pausedDate: (optional) Date/time this workflow last entered the
  - pausedDates: (optional) A chronological JSON list of the date/times at which the workflow has entered the
  - resumedDate: (optional) Date/time this workflow last resumed from the
  - resumedDates: (optional) A chronological JSON list of the date/times at which the workflow has resumed from the
  - cancelledDate: (optional) Date/time this workflow entered the
  - failedDate: (optional) Date/time this workflow entered the
  - completedDate: (optional) Date/time this workflow entered the
  - invocations: JSON list of workflow invocations. For the top-level workflow, this list always contains exactly one item, which mirrors the information above.
    - id: An identifier for this invocation, unique within this workflow report
    - parent: (optional) The identifier of the parent invocation. When this invocation was a nested workflow run, this will be the identifier of the corresponding activity invocation within the parent workflow
    - name: A name for this invocation, unique within this list of invocations
    - index: (optional) List of JSON integers, indicating the iteration index within the executed workflow
    - state: the state of this workflow invocation; one of
    - startedDate: Date/time when this invocation started.
    - completedDate: (final) Date/time when this invocation ended.
    - inputs: A JSON Object of the input port values. The keys are port names, e.g. "name"; the values are relative URIs referring to resources within the Data Bundle, e.g.
    - outputs: A JSON Object of the output port values. The keys are port names, e.g. "greeting"; the values are relative URIs referring to resources within the Data Bundle, e.g.
  - processorReports: A list of processor reports, one per processor in the current workflow
    - subject: the URI identifying the executed processor, as identified within the SCUFL2
    - [remaining properties as in workflow report: state, createdDate, pausedDate, pausedDates, resumedDate, resumedDates, cancelledDate, failedDate, completedDate]
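A report with the structure listed above can be inspected with any JSON library. The following Python sketch is one possible way to summarise it. It uses only property names from the list (state, invocations, id, name, inputs, outputs, processorReports) and assumes that invocations and processorReports sit directly under the top-level object. The function names and the processorCount key are illustrative, not part of the format.

```python
import json


def load_report(path):
    """Parse a workflowrun.json file that has been extracted from a bundle."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_report(report):
    """Return the run state plus the port-to-resource mappings per invocation."""
    summary = {"state": report.get("state"), "invocations": []}
    for invocation in report.get("invocations", []):
        summary["invocations"].append({
            "id": invocation.get("id"),
            "name": invocation.get("name"),
            # Keys are port names; values are relative URIs into the bundle.
            "inputs": dict(invocation.get("inputs", {})),
            "outputs": dict(invocation.get("outputs", {})),
        })
    # Assumed nesting: processorReports directly under the workflow report.
    summary["processorCount"] = len(report.get("processorReports", []))
    return summary
```

Optional properties are read with get() so that a report from a workflow that is still running, and therefore lacks some final properties, does not raise an error.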