Taverna 2.4 with the Taverna-PROV plugin 2.1.5 or later can export Taverna workflow runs as a Data Bundle. The bundle can be saved from within the Workbench results (Save All) or from the command line. The Data Bundle contains the workflow input and output values, intermediate values, a provenance trace and a copy of the executed workflow definition.
The .bundle.zip
file is an RO bundle, which specifies a structured ZIP file with a manifest (.ro/manifest.json
).
Mime type:
application/vnd.wf4ever.robundle+zip
File extension:
.bundle.zip
An RO Bundle is effectively a structured ZIP file, with a JSON-LD manifest that follows the Research Object data model, adding provisions for provenance and annotations of resources. These resources can be embedded within the ZIP file or aggregated from external sources by using URL references.
You can explore the bundle by unzipping it or browse it with a program like 7-Zip.
The Taverna-PROV source code includes an example bundle and unzipped bundle as a folder. This data bundle has been saved after running a simple hello world workflow.
The remaining text of this section describes the content of the RO bundle as if it were unpacked to a folder. Note that many programming frameworks include support for working with ZIP files, so complete unpacking might not be necessary for your application. For Java, the Data Bundle API gives a programmatic way to inspect and generate data bundles.
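As a rough illustration of inspecting a bundle without unpacking it, the following sketch lists the entries of a ZIP file using only the standard java.util.zip classes. The class name BundleEntries and the helper writeDemoZip() are hypothetical and not part of Taverna; the demo ZIP is a tiny stand-in with two entries, not a real Data Bundle.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class BundleEntries {

    /** Returns the names of all entries in the ZIP file, without unpacking it. */
    static List<String> listEntries(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            for (ZipEntry entry : Collections.list(zipFile.entries())) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Hypothetical helper: writes a tiny stand-in ZIP so the sketch runs without a real bundle. */
    static Path writeDemoZip() throws IOException {
        Path zip = Files.createTempFile("demo", ".bundle.zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry(".ro/manifest.json"));
            out.write("{}".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("outputs/greeting.txt"));
            out.write("Hello, John Doe".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        for (String name : listEntries(writeDemoZip())) {
            System.out.println(name);
        }
    }
}
```

With a real bundle you would pass the path of the saved .bundle.zip file to listEntries() instead of the demo file.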
The folders inputs/
and outputs/
contain files and folders corresponding to the input and output values of the executed workflow. Ports with multiple values are stored as a folder with numbered outputs, starting from 0
. Values representing errors have extension .err
, other values have an extension guessed by inspecting the value structure, e.g. .png
. External references have the extension .url
- these files can often be opened as "Internet shortcut" or similar, depending on your operating system.
Example listing:
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>ls
inputs intermediates mimetype outputs workflow.wfbundle workflowrun.prov.ttl
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>ls outputs
greeting.txt
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat outputs/greeting.txt
Hello, John Doe
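The extension conventions described above could be handled in client code roughly as follows. This is a minimal sketch: the class ValueKind and its Kind enum are hypothetical names invented for illustration, and the only facts taken from the text are that .err marks an error value and .url marks an external reference.

```java
public class ValueKind {

    /** Hypothetical classification of files found under inputs/ or outputs/. */
    enum Kind { ERROR, EXTERNAL_REFERENCE, VALUE }

    /** Classifies a bundle file by its extension, following the conventions described above. */
    static Kind classify(String fileName) {
        if (fileName.endsWith(".err")) {
            return Kind.ERROR;
        }
        if (fileName.endsWith(".url")) {
            return Kind.EXTERNAL_REFERENCE;
        }
        // Any other extension (e.g. .txt, .png) was guessed from the value itself
        return Kind.VALUE;
    }

    public static void main(String[] args) {
        System.out.println(classify("greeting.txt")); // VALUE
        System.out.println(classify("0.err"));        // ERROR
        System.out.println(classify("sequence.url")); // EXTERNAL_REFERENCE
    }
}
```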
The file workflowrun.prov.ttl
contains the PROV-O export of the workflow run provenance (including nested workflows) in RDF Turtle format.
This log details every intermediate processor invocation in the workflow execution, and relates them to inputs, outputs and intermediate values.
Example listing:
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat workflowrun.prov.ttl | head -n 40 | tail -n 8
<#taverna-prov-export>
rdf:type prov:Activity ;
prov:startedAtTime "2013-11-22T14:01:02.436Z"^^xsd:dateTime ;
prov:qualifiedCommunication _:b1 ;
prov:endedAtTime "2013-11-22T14:01:03.223Z"^^xsd:dateTime ;
rdfs:label "taverna-prov export of workflow run provenance"@en ;
prov:wasInformedBy <http://ns.taverna.org.uk/2011/run/385c794c-ba11-4007-a5b5-502ba8d14263/> ;
See the provenance graph for a complete example. The provenance uses the vocabularies W3C PROV-O, wfprov and tavernaprov.
Note that the URIs starting with http://ns.taverna.org.uk/2011/run/, http://ns.taverna.org.uk/2011/data/ and http://ns.taverna.org.uk/2010/workflowBundle/ are not meant to be clickable (HTTP resolvable) and would currently give 404 Not Found. The reason is that myGrid does not (and will not) centrally store any workflow run information, data values or workflow definitions. It is still useful for each workflow definition, each workflow run and each produced data value to be uniquely identifiable, so these URIs are built from UUIDs that are generated within Taverna. In the future these URIs might redirect to public search results, e.g. on myExperiment.
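To illustrate the shape of these identifiers, the sketch below builds a run URI from a UUID. It only mirrors the pattern visible in the example provenance; the class RunUris and the method runUri() are hypothetical, and how Taverna constructs the URIs internally is not shown here.

```java
import java.net.URI;
import java.util.UUID;

public class RunUris {

    // Base taken from the example provenance; the URI is an identifier, not a resolvable URL
    static final String RUN_BASE = "http://ns.taverna.org.uk/2011/run/";

    /** Builds an identifier of the form RUN_BASE + uuid + "/". */
    static URI runUri(UUID runId) {
        return URI.create(RUN_BASE + runId + "/");
    }

    public static void main(String[] args) {
        // UUID of the example run shown above
        UUID example = UUID.fromString("385c794c-ba11-4007-a5b5-502ba8d14263");
        System.out.println(runUri(example));
        // A freshly generated identifier for a hypothetical new run
        System.out.println(runUri(UUID.randomUUID()));
    }
}
```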
Intermediate values
Intermediate values are stored in the intermediates/
folder and referenced from workflowrun.prov.ttl.
Intermediate value from the example provenance:
<http://ns.taverna.org.uk/2011/data/385c794c-ba11-4007-a5b5-502ba8d14263/ref/d588f6ab-122e-4788-ab12-8b6b66a67354>
    tavernaprov:content <intermediates/d5/d588f6ab-122e-4788-ab12-8b6b66a67354.txt> ;
    wfprov:describedByParameter <http://ns.taverna.org.uk/2010/workflowBundle/01348671-5aaa-4cc2-84cc-477329b70b0d/workflow/Hello_Anyone/processor/Concatenate_two_strings/in/string1> ;
    wfprov:describedByParameter <http://ns.taverna.org.uk/2010/workflowBundle/01348671-5aaa-4cc2-84cc-477329b70b0d/workflow/Hello_Anyone/processor/hello/out/value> ;
    wfprov:wasOutputFrom <http://ns.taverna.org.uk/2011/run/385c794c-ba11-4007-a5b5-502ba8d14263/process/bbaedc02-896f-491e-88bc-8dd350fcc73b/> .
Here we see that the bundle file intermediates/d5/d588f6ab-122e-4788-ab12-8b6b66a67354.txt
contains the output from the "hello" processor, which was also the input to the "Concatenate_two_strings" processor. Details about processor, ports and parameters can be found in the workflow definition.
Example listing:
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>ls intermediates/d5
d588f6ab-122e-4788-ab12-8b6b66a67354.txt
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat intermediates/d5/d58*
Hello,
Note that "small" textual values are also included as cnt:chars
in the graph, while the referenced intermediate file within the data bundle is always present.
<intermediates/d5/d588f6ab-122e-4788-ab12-8b6b66a67354.txt>
rdf:type cnt:ContentAsText ;
cnt:characterEncoding "UTF-8"^^xsd:string ;
cnt:chars "Hello, "^^xsd:string ;
tavernaprov:byteCount "7"^^xsd:long ;
tavernaprov:sha512 "cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e"^^xsd:string ;
tavernaprov:sha1 "f52ab57fa51dfa714505294444463ae5a009ae34"^^xsd:string ;
rdf:type tavernaprov:Content .
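A consumer of the bundle may want to recompute the byte count and checksums of an intermediate value and compare them with the values recorded in the provenance graph. The following is a minimal sketch using the standard MessageDigest class; the class ContentChecksums and the method hexDigest() are hypothetical names, and the content is hard-coded here rather than read from a bundle.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentChecksums {

    /** Returns the lowercase hexadecimal digest of the content for the given algorithm. */
    static String hexDigest(String algorithm, byte[] content) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(content)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Stand-in for the bytes of intermediates/d5/d588f6ab-...txt
        byte[] content = "Hello, ".getBytes(StandardCharsets.UTF_8);
        System.out.println("byteCount: " + content.length);
        System.out.println("sha1: " + hexDigest("SHA-1", content));
        System.out.println("sha512: " + hexDigest("SHA-512", content));
    }
}
```

For large values it would be preferable to stream the file through MessageDigest.update() instead of loading it into memory.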
Workflow definition
The file workflow.wfbundle
is a copy of the executed workflow in SCUFL2 workflow bundle format. This is the format which will be used by Taverna 3.
You can use the SCUFL2 API to inspect the workflow definition in detail.
The file .ro/annotations/workflow.wfdesc.ttl
contains the abstract structure (but not all the implementation details) of the executed workflow, in RDF Turtle according to the wfdesc ontology.
c:\Users\stain\workspace\taverna-prov\example\helloanyone.bundle>cat .ro/annotations/workflow.wfdesc.ttl | head -n 20
@base <http://ns.taverna.org.uk/2010/workflowBundle/01348671-5aaa-4cc2-84cc-477329b70b0d/workflow/Hello_Anyone/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix wfdesc: <http://purl.org/wf4ever/wfdesc#> .
@prefix wf4ever: <http://purl.org/wf4ever/wf4ever#> .
@prefix roterms: <http://purl.org/wf4ever/roterms#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix comp: <http://purl.org/DP/components#> .
@prefix dep: <http://scape.keep.pt/vocab/dependencies#> .
@prefix biocat: <http://biocatalogue.org/attribute/> .
@prefix : <#> .

<processor/Concatenate_two_strings/> a wfdesc:Process , wfdesc:Description , owl:Thing , wf4ever:BeanshellScript ;
    rdfs:label "Concatenate_two_strings" ;
    wfdesc:hasInput <processor/Concatenate_two_strings/in/string1> , <processor/Concatenate_two_strings/in/string2> ;
    wfdesc:hasOutput <processor/Concatenate_two_strings/out/output> ;
    wf4ever:script "output = string1 + string2;" .
Taverna 3 uses the same Data Bundle format as Taverna-PROV plugin. Currently there are some differences due to the two different implementations for capturing provenance.
Taverna 3 adds the workflow report workflowrun.json
(see below).
Taverna 3 does not yet export provenance trace to workflowrun.prov.ttl
(Taverna-PROV issue 3), but as the workflow run report captures the same or similar information, a provenance trace equivalent to the Taverna 2 output can in theory be generated from the workflow report (T3-829, T3-970).
Taverna 3 can also open an existing data bundle and display it in the Results perspective. Opening existing data bundles created with the Taverna PROV plugin is currently not supported in Taverna 3, as the implementation assumes the workflow report is present in the bundle (T3-971).
Taverna 3 introduces a new resource in the data bundle, workflowrun.json
which is more Taverna-centric and mirrors the actual execution state while running a workflow. This example shows an excerpt of a workflow run report (see also the full workflowrun.json):
{
  "subject" : "http://ns.taverna.org.uk/2010/workflowBundle/01348671-5aaa-4cc2-84cc-477329b70b0d/workflow/Hello_Anyone/",
  "state" : "COMPLETED",
  "createdDate" : "2013-11-27T19:04:02.016+0000",
  "startedDate" : "2013-11-27T19:04:02.023+0000",
  "completedDate" : "2013-11-27T19:04:02.054+0000",
  "invocations" : [ {
    "id" : "Hello_Anyone",
    "state" : "COMPLETED",
    "inputs" : {
      "name" : "/inputs/name"
    },
    "outputs" : {
      "greeting" : "/outputs/greeting"
    }
  } ],
  "processorReports" : [ {
    "subject" : "http://ns.taverna.org.uk/2010/workflowBundle/01348671-5aaa-4cc2-84cc-477329b70b0d/workflow/Hello_Anyone/processor/Concatenate_two_strings/",
    "state" : "COMPLETED",
    "completedDate" : "2013-11-27T19:04:02.054+0000",
    "jobsCompleted" : 1,
    "invocations" : [ {
      "id" : "Hello_Anyone/Concatenate_two_strings",
      "parent" : "Hello_Anyone",
      "name" : "Concatenate_two_strings",
      "state" : "COMPLETED",
      "startedDate" : "2013-11-27T19:04:02.033+0000",
      "inputs" : {
        "string1" : "/intermediates/64/64140288-cf8b-4a47-99ae-b76cb4c531ad",
        "string2" : "/intermediates/3d/3d548b58-ec18-44ab-aeb6-7d9d5999ad21"
      },
      "outputs" : {
        "output" : "/intermediates/92/92721f0a-4fac-4aba-9a09-b2651f303577"
      }
    } ],
    "activityReports" : [ {
      "subject" : "http://ns.taverna.org.uk/2010/workflowBundle/01348671-5aaa-4cc2-84cc-477329b70b0d/profile/taverna-2.4.0/activity/Concatenate_two_strings/",
      "state" : "COMPLETED",
      "completedDate" : "2013-11-27T19:04:02.049+0000",
      "invocations" : [ {
        "id" : "Hello_Anyone/Concatenate_two_strings/invocation80",
        /* .. */
        "inputs" : {
          "string1" : "/intermediates/64/64140288-cf8b-4a47-99ae-b76cb4c531ad",
          "string2" : "/intermediates/3d/3d548b58-ec18-44ab-aeb6-7d9d5999ad21"
        },
        "outputs" : {
          "output" : "/intermediates/92/92721f0a-4fac-4aba-9a09-b2651f303577"
        }
      } ]
    } ]
  }, {
    "subject" : "http://ns.taverna.org.uk/2010/workflowBundle/01348671-5aaa-4cc2-84cc-477329b70b0d/workflow/Hello_Anyone/processor/hello/",
    "state" : "COMPLETED"
    /* .... */
  } ]
}
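The date values in the report use an ISO 8601 style with a numeric offset written without a colon (+0000), which the default java.time parsers do not accept. As a hedged sketch, the dates can be parsed with an explicit pattern, here used to compute the elapsed time between the top-level startedDate and completedDate of the excerpt. The class ReportDates and its methods are hypothetical, and java.time requires Java 8 or later.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class ReportDates {

    // Pattern letter Z matches an offset without a colon, such as +0000
    static final DateTimeFormatter REPORT_DATE =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSZ");

    /** Parses a date string as it appears in the workflow run report. */
    static OffsetDateTime parse(String reportDate) {
        return OffsetDateTime.parse(reportDate, REPORT_DATE);
    }

    /** Milliseconds elapsed between two report dates. */
    static long elapsedMillis(String started, String completed) {
        return Duration.between(parse(started), parse(completed)).toMillis();
    }

    public static void main(String[] args) {
        // Top-level startedDate and completedDate from the excerpt
        long millis = elapsedMillis("2013-11-27T19:04:02.023+0000",
                                    "2013-11-27T19:04:02.054+0000");
        System.out.println("Workflow ran for " + millis + " ms"); // Workflow ran for 31 ms
    }
}
```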
The structure of the workflow run report JSON is described below, where optional means the property might not be present, and properties marked final should be present after the workflow has finished:
workflow report (top-level JSON Object)
subject
the URI identifying the executed workflow, as identified within the SCUFL2 workflow.wfbundle
state
of the last workflow run; one of CREATED
, RUNNING
, COMPLETED
, CANCELLED
, FAILED
createdDate
Date/time (in ISO 8601 dateTime format) of creation of the workflow report, e.g. when execution of the top-level workflow was started.
startedDate
(final) Date/time this workflow initially executed
pausedDate
(optional) Date/time this workflow last entered the PAUSED state
pausedDates
(optional) A chronological JSON List of Date/times of each time a workflow has entered the PAUSED state
resumedDate
(optional) Date/time this workflow last resumed from the PAUSED state
resumedDates
(optional) A chronological JSON List of Date/times of each time a workflow has resumed from the PAUSED state
cancelledDate
(optional) Date/time this workflow entered the CANCELLED state
failedDate
(optional) Date/time this workflow entered the FAILED state. Note that a workflow does not normally fail in this way even though some of its outputs could be errors. A FAILED state indicates a workflow execution problem within the Taverna Platform.
completedDate
(optional) Date/time this workflow entered the COMPLETED state
invocations
JSON List of workflow invocations. For the top-level workflow, this list always contains only 1 item, which mirrors the information above.
    id
    An identifier for this workflow invocation, unique within this workflow report.
    parent
    (optional) The identifier (id) of the parent activity invocation. This property is only provided if this invocation was a nested workflow run, in which case it will be the identifier of the corresponding activity invocation within the parent workflow.
    name
    A name for this invocation, unique within this list of invocations. By convention the invocation of the top-level workflow has the same name as the Workflow within the Workflow Bundle, e.g. "Hello_Anyone", but this is subject to change.
    state
    of this workflow invocation; one of CREATED, RUNNING, COMPLETED, CANCELLED, FAILED
    startedDate
    Date/time when this invocation started.
    completedDate
    (final) Date/time when this invocation ended.
    inputs
    A JSON Object of the input port values. The keys are port names, e.g. "name", the values are relative URI references to resources within the Data Bundle, e.g. "/inputs/name.txt". Workflow inputs will normally be identified with the same relative URI reference where they are used as processor inputs.
    outputs
    A JSON Object of the output port values. The keys are port names, e.g. "greeting", the values are relative URI references to resources within the Data Bundle, e.g. "/outputs/greeting.txt" or "intermediates/16/160b64ba-c2b4-435b-8699-465e2d190994".
processorReports
A list of processor reports, one per processor in the current workflow
    subject
    the URI identifying the executed processor, as identified within the SCUFL2 workflow.wfbundle
    state, createdDate, pausedDate, pausedDates, resumedDate, resumedDates, cancelledDate, failedDate, completedDate
    invocations
    JSON List of processor invocations. The content of this list corresponds to iterations over this processor (or its containing nested workflow), and so might contain 0, 1 or more invocations depending on the workflow structure and execution.
        id
        An identifier for this processor invocation, unique within this workflow report. By convention this identifier is composed by concatenation of the parent, "/" and the name (e.g. "Hello_Anyone/Concatenate_two_strings"), but this is subject to change.
        parent
        The identifier (id) of the corresponding parent workflow invocation. When this processor is within a nested workflow, this will identify the particular invocation of the nested workflow.
        name
        A name for this invocation, unique within this list of invocations. Note that although this name might in some cases match the actual processor name, this will not be the case when there are iterations over this processor.
        index
        (optional) List of JSON integers, indicating the iteration index within the executed workflow invocation (parent), e.g. [0] (first position within a single list) or [3,7] (fourth position within outer list and eighth position within inner list)
        state, startedDate, completedDate
        inputs
        A JSON Object of the input port values. The keys are port names, e.g. "name", the values are relative URI references to resources within the Data Bundle, e.g. "intermediates/16/160b64ba-c2b4-435b-8699-465e2d190994".
        outputs
        A JSON Object of the output port values. The keys are port names, e.g. "greeting", the values are relative URI references to resources within the Data Bundle, e.g. "intermediates/16/160b64ba-c2b4-435b-8699-465e2d190994".
    activityReports
    JSON List of activity invocations. This list usually contains only 1 item, but might contain several reports if the workflow uses Looping, Retry or Failover.
        subject
        the URI identifying the executed activity, as identified within the SCUFL2 workflow.wfbundle
        state, createdDate, pausedDate, pausedDates, resumedDate, resumedDates, cancelledDate, failedDate, completedDate
        invocations
        JSON List of activity invocations. The content of this list corresponds to each invocation of the activity, and so may contain multiple invocations due to nested workflow invocations, processor iterations, looping, retry or failover.
        nestedWorkflowReport
        (optional) A nested workflow report, if this activity is a nested workflow.
            invocations
            as in top-level workflow, with their parent matching the corresponding activity invocation id
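The identifier and index conventions described above can be sketched as follows. This is only an illustration: the class InvocationIds and its methods are hypothetical, and since the text states that the id convention is subject to change, real code should treat identifiers as opaque and rely on the parent property instead.

```java
import java.util.Arrays;

public class InvocationIds {

    /** Composes an invocation id as parent + "/" + name, per the current convention. */
    static String invocationId(String parent, String name) {
        return parent == null ? name : parent + "/" + name;
    }

    /** Converts a zero-based iteration index such as [3,7] to one-based positions [4,8]. */
    static int[] oneBasedPositions(int[] index) {
        int[] positions = new int[index.length];
        for (int i = 0; i < index.length; i++) {
            positions[i] = index[i] + 1;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Prints Hello_Anyone/Concatenate_two_strings
        System.out.println(invocationId("Hello_Anyone", "Concatenate_two_strings"));
        // Prints [4, 8]: fourth item of the outer list, eighth item of the inner list
        System.out.println(Arrays.toString(oneBasedPositions(new int[] {3, 7})));
    }
}
```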
The Data Bundle can be processed using normal ZIP support, such as with the command line Info-ZIP tool unzip
, built-in operating system support or third-party programs like 7-Zip.
Additionally, programming languages will typically have API support or libraries for working with ZIP files, such as the Java 7 zipfs and Apache Commons Compress API, or Ruby's rubyzip gem.
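As a sketch of the zipfs approach, the following code mounts a ZIP file as a java.nio file system and reads outputs/greeting.txt from it. The class ZipFsGreeting and its helpers are hypothetical; writeDemoZip() only creates a small stand-in ZIP so the example runs without a real Data Bundle.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Map;

public class ZipFsGreeting {

    /** Opens the ZIP as a file system; create=true makes a new ZIP if it does not exist. */
    static FileSystem open(Path zip, boolean create) throws IOException {
        Map<String, String> env = Collections.singletonMap("create", Boolean.toString(create));
        return FileSystems.newFileSystem(URI.create("jar:" + zip.toUri()), env);
    }

    /** Reads outputs/greeting.txt from inside the ZIP without unpacking it. */
    static String readGreeting(Path zip) throws IOException {
        try (FileSystem fs = open(zip, false)) {
            byte[] bytes = Files.readAllBytes(fs.getPath("/outputs", "greeting.txt"));
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical helper: writes a stand-in ZIP containing only outputs/greeting.txt. */
    static Path writeDemoZip() throws IOException {
        Path zip = Files.createTempDirectory("bundle").resolve("demo.bundle.zip");
        try (FileSystem fs = open(zip, true)) {
            Path outputs = Files.createDirectory(fs.getPath("/outputs"));
            Files.write(outputs.resolve("greeting.txt"),
                    "Hello, John Doe".getBytes(StandardCharsets.UTF_8));
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readGreeting(writeDemoZip()));
    }
}
```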
To facilitate tighter integration with the Data Bundle format, we have developed the Java Data Bundle API, which provides higher-level access to reading, creating and modifying data bundles. Example:
// zip is a java.nio.file.Path pointing to a .bundle.zip file
try (Bundle dataBundle = DataBundles.openBundle(zip)) {
    Path outputs = DataBundles.getOutputs(dataBundle);
    Path greeting = DataBundles.getPort(outputs, "greeting");
    System.out.println(DataBundles.getStringValue(greeting));
}
The above code will print out the content of outputs/greeting.txt
. Regular Java 7 NIO Files operations can also be used with these Path
s, for instance for binary content or larger values that won't fit in memory.
The Data Bundle API also ties into the SCUFL2 API to inspect the executed workflow definition:
WorkflowBundle wfBundle = DataBundles.getWorkflowBundle(dataBundle);
for (Processor processor : wfBundle.getMainWorkflow().getProcessors()) {
    System.out.println("Processor " + processor);
}
In addition, you may retrieve the workflow run report as a Jackson JsonNode.
JsonNode runReport = DataBundles.getWorkflowRunReport(dataBundle);
for (JsonNode procReport : runReport.path("processorReports")) {
    // URI of the processor within the SCUFL2 workflow bundle
    URI subject = URI.create(procReport.path("subject").asText());
    for (JsonNode invocation : procReport.path("invocations")) {
        System.out.println("Invocation started: " + invocation.path("startedDate").asText());
    }
}
You can resolve the subject
to the corresponding SCUFL2 Processor using URITools and Scufl2Tools:
URITools uriTools = new URITools();
Processor proc = (Processor) uriTools.resolveBean(wfBundle, subject);
System.out.println("Execution of " + proc);
Scufl2Tools scufl2Tools = new Scufl2Tools();
Configuration activityConfig = scufl2Tools
        .configurationForActivityBoundToProcessor(proc);
System.out.println("Activity: " + activityConfig.getJsonAsString());
You can also print the intermediate outputs of a particular processor invocation
by looking up its bundle Path
:
for (Port outputPort : proc.getOutputPorts()) {
    System.out.println("Output " + outputPort);
    // Output values are listed under the "outputs" property of the invocation
    String output = invocation.path("outputs").path(outputPort.getName()).asText();
    Path outputPath = dataBundle.getRoot().resolve(output);
    System.out.println("Value: " + DataBundles.getStringValue(outputPath));
}