
Taverna PROV Data Bundle (Taverna 2.x)

Taverna 2.4 with the Taverna-PROV plugin 2.1.5 or later can export Taverna workflow runs as a Data Bundle. The bundle can be saved from within the Workbench results (Save All) or from the command line. The Data Bundle contains the workflow input and output values, intermediate values, a provenance trace and a copy of the executed workflow definition.

Structure of exported provenance

The .bundle.zip file is an RO bundle, which specifies a structured ZIP file with a manifest (.ro/manifest.json).

Mime type: 

application/vnd.wf4ever.robundle+zip

File extension:

.bundle.zip

You can explore the bundle by unzipping it or by browsing it with a program such as 7-Zip.

The Taverna-PROV source code includes an example bundle, together with the same bundle unzipped as a folder. This data bundle was saved after running a simple hello world workflow.

The remaining text of this section describes the content of the RO bundle as if it were unpacked to a folder. Note that many programming frameworks include support for working with ZIP files, so complete unpacking might not be necessary for your application. For Java, the Data bundle API gives a programmatic way to inspect and generate data bundles.
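As an illustration of reading a bundle without unpacking it, the following minimal Python sketch uses only the standard library zipfile module. It assumes that the mimetype entry holds the media type string given above and that the manifest is valid JSON; the function names and the demo bundle contents (including the "aggregates" key in the stand-in manifest) are made up for this example and are not part of Taverna.

```python
import io
import json
import zipfile

RO_BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def inspect_bundle(source):
    """Return (mimetype, manifest, names) for a bundle path or file-like object."""
    with zipfile.ZipFile(source) as bundle:
        names = bundle.namelist()
        # Assumption: the "mimetype" entry contains the media type as plain text.
        mimetype = bundle.read("mimetype").decode("ascii").strip()
        manifest = None
        if MANIFEST_PATH in names:
            manifest = json.loads(bundle.read(MANIFEST_PATH).decode("utf-8"))
    return mimetype, manifest, names


def build_demo_bundle():
    """Build a tiny in-memory stand-in for a real .bundle.zip (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        bundle.writestr("mimetype", RO_BUNDLE_MIMETYPE)
        bundle.writestr(MANIFEST_PATH, json.dumps({"aggregates": []}))
        bundle.writestr("outputs/greeting.txt", "Hello, John Doe")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    mimetype, manifest, names = inspect_bundle(build_demo_bundle())
    print(mimetype == RO_BUNDLE_MIMETYPE)
    print(sorted(names))
```

To try this on a real export, pass the path of a saved .bundle.zip file to inspect_bundle instead of the in-memory demo.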

Inputs and outputs

The folders inputs/ and outputs/ contain files and folders corresponding to the input and output values of the executed workflow. Ports with multiple values are stored as a folder with numbered entries, starting from 0. Values representing errors have the extension .err; other values have an extension guessed by inspecting the value structure, e.g. .png. External references have the extension .url; these files can often be opened as an "Internet shortcut" or similar, depending on your operating system.

Example listing:

c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>ls
inputs intermediates mimetype outputs workflow.wfbundle workflowrun.prov.ttl
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>ls outputs
greeting.txt
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat outputs/greeting.txt
Hello, John Doe
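The folder layout described above can be walked programmatically. The sketch below groups bundle entry names by port and classifies each value by its extension. It assumes that list items are named by their number plus an extension (for example results/0.txt), which is an assumption about the exact naming; the function names and the "results" port are hypothetical.

```python
from pathlib import PurePosixPath


def classify(name):
    """Classify a value by its extension: error, external reference or plain value."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix == ".err":
        return "error"
    if suffix == ".url":
        return "reference"
    return "value"


def port_values(names, folder="outputs"):
    """Map port name -> sorted list of (index, entry name, kind).

    Assumption: list items are stored as <port>/<number>.<ext>; a non-numeric
    item name raises ValueError.
    """
    ports = {}
    for name in names:
        parts = PurePosixPath(name).parts
        if len(parts) < 2 or parts[0] != folder or name.endswith("/"):
            continue  # another folder, or a directory entry
        if len(parts) == 2:
            # Single value stored directly as <folder>/<port>.<ext>
            port, index = PurePosixPath(parts[1]).stem, ()
        else:
            port = parts[1]
            index = tuple(int(PurePosixPath(p).stem) for p in parts[2:])
        ports.setdefault(port, []).append((index, name, classify(name)))
    for values in ports.values():
        values.sort()  # numeric order, so item 10 comes after item 2
    return ports


if __name__ == "__main__":
    entries = ["outputs/greeting.txt", "outputs/results/0.txt", "outputs/results/1.err"]
    for port, values in sorted(port_values(entries).items()):
        print(port, values)
```

The list of entry names can come from an unpacked folder or directly from a ZIP listing.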

Workflow run provenance

The file workflowrun.prov.ttl contains the PROV-O export of the workflow run provenance (including nested workflows) in RDF Turtle format.

This log details every intermediate processor invocation in the workflow execution, and relates them to inputs, outputs and intermediate values.

Example listing:

c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat workflowrun.prov.ttl | head -n 40 | tail -n 8
<#taverna-prov-export>
rdf:type prov:Activity ;
prov:startedAtTime "2013-11-22T14:01:02.436Z"^^xsd:dateTime ;
prov:qualifiedCommunication _:b1 ;
prov:endedAtTime "2013-11-22T14:01:03.223Z"^^xsd:dateTime ;
rdfs:label "taverna-prov export of workflow run provenance"@en ;
prov:wasInformedBy <http://ns.taverna.org.uk/2011/run/385c794c-ba11-4007-a5b5-502ba8d14263/> ;

See the provenance graph for a complete example. The provenance uses the vocabularies W3C PROV-O, wfprov and tavernaprov.
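As a small worked example, the start and end times in the listing above can be used to compute how long the export activity took. The sketch below uses a regular expression on the fragment rather than a real Turtle parser, so it only suits a fragment that describes a single activity; for the full file an RDF library would be the more robust choice. The function names are made up for this example.

```python
import re
from datetime import datetime

# Matches prov:startedAtTime / prov:endedAtTime literals typed as xsd:dateTime.
TIME_PATTERN = re.compile(
    r'prov:(startedAtTime|endedAtTime)\s+"([^"]+)"\^\^xsd:dateTime'
)

FRAGMENT = """
<#taverna-prov-export>
rdf:type prov:Activity ;
prov:startedAtTime "2013-11-22T14:01:02.436Z"^^xsd:dateTime ;
prov:qualifiedCommunication _:b1 ;
prov:endedAtTime "2013-11-22T14:01:03.223Z"^^xsd:dateTime ;
"""


def parse_time(text):
    """Parse an xsd:dateTime value; a trailing 'Z' is rewritten as a UTC offset."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def activity_duration(turtle_fragment):
    """Return seconds between startedAtTime and endedAtTime in the fragment."""
    times = {kind: parse_time(value)
             for kind, value in TIME_PATTERN.findall(turtle_fragment)}
    return (times["endedAtTime"] - times["startedAtTime"]).total_seconds()


if __name__ == "__main__":
    print(activity_duration(FRAGMENT))  # 0.787
```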

ns.taverna.org.uk URIs

Note that the URIs starting with 

http://ns.taverna.org.uk/2011/run/

http://ns.taverna.org.uk/2011/data/

http://ns.taverna.org.uk/2010/workflowBundle/

are not meant to be clickable (HTTP resolvable) and would currently give 404 Not Found.

The reason for this is that myGrid does not (and will not) centrally store any workflow run information, data values or workflow definitions. It is still useful, however, for each workflow definition, workflow run and produced data value to be uniquely identifiable, so we build these URIs using UUIDs that are generated within Taverna. It is possible that in the future these URIs could redirect to public search results, e.g. on myExperiment.
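Because these URIs are identifiers rather than links, a consumer would typically parse them instead of fetching them. The following sketch recognises the three prefixes listed above and extracts a UUID. It assumes the UUID is the first path segment after the prefix, as in the run URI shown in the example provenance; the layout of data and workflow bundle URIs is not spelled out here, so treat that part as an assumption. The function name is hypothetical.

```python
import uuid
from urllib.parse import urlparse

# Path prefixes from the list above, mapped to a short label.
PREFIXES = {
    "/2011/run/": "run",
    "/2011/data/": "data",
    "/2010/workflowBundle/": "workflowBundle",
}


def identify(uri):
    """Return (kind, UUID) for an ns.taverna.org.uk URI, or None if not recognised."""
    parsed = urlparse(uri)
    if parsed.netloc != "ns.taverna.org.uk":
        return None
    for prefix, kind in PREFIXES.items():
        if parsed.path.startswith(prefix):
            # Assumption: the first segment after the prefix is the UUID.
            first_segment = parsed.path[len(prefix):].split("/", 1)[0]
            try:
                return kind, uuid.UUID(first_segment)
            except ValueError:
                return None
    return None


if __name__ == "__main__":
    print(identify("http://ns.taverna.org.uk/2011/run/385c794c-ba11-4007-a5b5-502ba8d14263/"))
```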

Intermediate values

Intermediate values are stored in the intermediates/ folder and referenced from workflowrun.prov.ttl.

Intermediate value from the example provenance:

Here we see that the bundle file intermediates/d5/d588f6ab-122e-4788-ab12-8b6b66a67354.txt contains the output from the "hello" processor, which was also the input to the "Concatenate_two_strings" processor. Details about processor, ports and parameters can be found in the workflow definition.

Example listing:

c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>ls intermediates/d5
d588f6ab-122e-4788-ab12-8b6b66a67354.txt
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat intermediates/d5/d58*
Hello,

Note that "small" textual values are also included as cnt:chars in the graph, while the referenced intermediate file within the workflow bundle is always present.

<intermediates/d5/d588f6ab-122e-4788-ab12-8b6b66a67354.txt>
rdf:type cnt:ContentAsText ;
cnt:characterEncoding "UTF-8"^^xsd:string ;
cnt:chars "Hello, "^^xsd:string ;
tavernaprov:byteCount "7"^^xsd:long ;
tavernaprov:sha512 "cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e"^^xsd:string ;
tavernaprov:sha1 "f52ab57fa51dfa714505294444463ae5a009ae34"^^xsd:string ;
rdf:type tavernaprov:Content .
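The tavernaprov:byteCount, tavernaprov:sha1 and tavernaprov:sha512 properties make it possible to check a file in the bundle against what the provenance recorded. The sketch below recomputes these values with Python's hashlib and reports which recorded properties differ. It assumes the digests are lowercase hexadecimal strings computed over the raw file bytes; the function names are made up for this example.

```python
import hashlib


def describe_content(data):
    """Compute the size and digests for a value's raw bytes."""
    return {
        "byteCount": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }


def mismatches(data, recorded):
    """Return the sorted names of recorded properties that differ from the data.

    Values are compared as strings because literals read from the Turtle
    file (for example the byte count "7") arrive as text.
    """
    actual = describe_content(data)
    return sorted(
        key for key, value in recorded.items()
        if str(actual.get(key)) != str(value)
    )


if __name__ == "__main__":
    data = "Hello, ".encode("utf-8")
    print(describe_content(data)["byteCount"])  # 7
    print(mismatches(data, {"byteCount": "7"}))  # []
```

In practice, data would be the bytes read from the referenced file under intermediates/, and recorded would hold the values taken from workflowrun.prov.ttl.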

Workflow definition

The file workflow.wfbundle is a copy of the executed workflow in SCUFL2 workflow bundle format. This is the format which will be used by Taverna 3.

You can use the SCUFL2 API to inspect the workflow definition in detail.

The file .ro/annotations/workflow.wfdesc.ttl contains the abstract structure (but not all the implementation details) of the executed workflow, in RDF Turtle according to the wfdesc ontology.

Taverna 3 Data bundle

Taverna 3 uses the same Data Bundle format as the Taverna-PROV plugin. Currently there are some differences due to the two different implementations for capturing provenance.

Taverna 3 does not yet export a provenance trace to workflowrun.prov.ttl.

Workflow report (workflowrun.json)

Taverna 3 introduces a new resource in the data bundle, workflowrun.json, which is more Taverna-centric and mirrors the actual execution state while running a workflow. This example shows an excerpt of a workflow run report (see also the full workflowrun.json):

 

Structure (optional means the property might not be present; properties marked final should be present after the workflow has finished):

  • workflow report (top-level JSON Object)
    • subject the URI identifying the executed workflow, as identified within the SCUFL2 workflow.wfbundle
    • state of the last workflow run; one of CREATED, RUNNING, COMPLETED, CANCELLED, FAILED
    • createdDate Date/time (in ISO 8601 dateTime format) of creation of the workflow report, e.g. when execution of the top-level workflow was started.
    • startedDate (final) Date/time this workflow initially executed
    • pausedDate (optional) Date/time this workflow last entered the PAUSED state
    • pausedDates (optional) A chronological JSON list of Date/times of each time a workflow has entered the PAUSED state
    • resumedDate (optional) Date/time this workflow last resumed from the PAUSED state
    • resumedDates (optional) A chronological JSON List of Date/times of each time a workflow has resumed from the PAUSED state.
    • cancelledDate (optional) Date/time this workflow entered the CANCELLED state
    • failedDate (optional) Date/time this workflow entered the FAILED state
    • completedDate (optional) Date/time this workflow entered the COMPLETED state
    • invocations JSON List of workflow invocations. For the top-level workflow, this list always contains only one item, which mirrors the information above.
      • id An identifier for this invocation, unique within this workflow report
      • parent (optional) The identifier of the parent invocation. When this invocation was a nested workflow run, this will be the identifier of the corresponding activity invocation within the parent workflow
      • name A name for this invocation, unique within this list of invocations
      • index (optional) List of JSON integers, indicating the iteration index within the executed workflow
      • state of this workflow invocation; one of CREATED, RUNNING, COMPLETED, CANCELLED, FAILED
      • startedDate Date/time when this invocation started. 
      • completedDate (final) Date/time when this invocation ended. 
      • inputs A JSON Object of the input port values. The keys are port names, e.g. "name"; the values are relative URIs referring to resources within the Data Bundle, e.g. "/inputs/name.txt"
      • outputs A JSON Object of the output port values. The keys are port names, e.g. "greeting"; the values are relative URIs referring to resources within the Data Bundle, e.g. "/outputs/greeting.txt"
    • processorReports A list of processor reports, one per processor in the current workflow
      • subject the URI identifying the executed processor, as identified within the SCUFL2 workflow.wfbundle
      • [remaining properties as in workflow report: state, createdDate, pausedDate, pausedDates, resumedDate, resumedDates, cancelledDate, failedDate, completedDate]
      • invocations 
      • activityReports
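The structure above can be traversed with any JSON library. The sketch below prints a short summary of a workflow report, following the subject, state, invocations and processorReports properties from the list. The sample report is hypothetical: it is assembled from the property names above with placeholder subject URIs and is not an excerpt of a real workflowrun.json. The function name is likewise made up.

```python
import json

# Hypothetical sample built from the property names listed above.
SAMPLE = json.loads("""
{
  "subject": "urn:example:workflow",
  "state": "COMPLETED",
  "invocations": [
    {"id": "wf0", "name": "wf0", "state": "COMPLETED",
     "inputs": {"name": "/inputs/name.txt"},
     "outputs": {"greeting": "/outputs/greeting.txt"}}
  ],
  "processorReports": [
    {"subject": "urn:example:processor/hello", "state": "COMPLETED",
     "invocations": []}
  ]
}
""")


def summarize(report, indent=0):
    """Return summary lines for a report and, recursively, its processor reports."""
    pad = " " * indent
    lines = ["{}{} [{}]".format(pad, report.get("subject", "?"), report.get("state", "?"))]
    for invocation in report.get("invocations", []):
        lines.append("{}  invocation {}: {}".format(
            pad, invocation.get("id"), invocation.get("state")))
        for direction in ("inputs", "outputs"):
            for port, ref in sorted(invocation.get(direction, {}).items()):
                lines.append("{}    {} {} -> {}".format(pad, direction, port, ref))
    for child in report.get("processorReports", []):
        lines.extend(summarize(child, indent + 2))
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(SAMPLE)))
```

Optional properties are read with get(), so a report that lacks them (for example one taken while the workflow is still running) is summarised without errors.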
