
PROV export

Export of OPM and Janus is being replaced by export to PROV.

Users are recommended to install and use the updated Taverna-PROV plugin, which produces PROV-O traces and includes the data values. The Taverna-PROV traces are more complete and more "correct", and address many of the known issues in OPM/Janus detailed on this page.

Documentation

See official documentation on Provenance export to OPM and Janus

Issues with current OPM format

OPM export is currently experimental. Please contact myGrid with any comments or bug reports.

These are the known issues:

  • Values are not shown, only Taverna's internal t2 references. T2-1866
  • Workflow outputs (here O1 and O2) are not shown. They could also be linked to the actual files stored in the output folder. T2-1937@Jira
  • OPM does not include provenance on non-output values T2-1934
  • Nested workflows are not exposed (they appear as a single processor) T2-1935
  • http://taverna.opm.org/iteration is an unauthorised URI namespace - FIXED - uses http://ns.taverna.org.uk/2011/provenance/opm/iteraton
  • The Process URI is for the definition (and hence the same for multiple wf runs) - but should be identifying the process execution according to OPM T2-1938@Jira
  • The Role should probably rather point to the port the value was generated from or used in  T2-1938@Jira
  • The http://ns.taverna.org.uk/2011/provenance/opm/iteraton property is attached to the Process (which is general across workflow runs) - but because of the above a Process has many iteration values T2-1950@Jira
  • Start/stop execution times are not captured for iterations, processors or workflow. T2-1936
  • The URIs for iteration roles are not unique if a nested workflow is included twice as two different processors or in two different nested workflows. (Not relevant until nested workflows are exposed) T2-1939@Jira
  • Collections and iterations are not explicitly represented - only individual data values T2-1951@Jira.
  • A (currently theoretical) service that returns an existing t2 reference unchanged (say "Pick first element" local worker) will appear as having 'generated' that artifact as well as the upstream service T2-1952@Jira.
  • No information about the structure of the workflow is exposed - http://ns.taverna.org.uk/2010/workflow/e024ea01-89fd-4f93-b6ae-ad6da4a6df08/ is not mentioned - only individual processors of the workflow like http://ns.taverna.org.uk/2010/workflow/e024ea01-89fd-4f93-b6ae-ad6da4a6df08/processor/P1/. Scufl2 tools can provide this information - but one would have to parse the URIs to retrieve the identifier of the workflow that was run T2-1953@Jira.
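
As a rough illustration of the URI parsing mentioned in the last item, the sketch below pulls the workflow identifier and the processor name out of a processor URI. It assumes the URIs always follow the pattern shown in the example above (a UUID directly after /2010/workflow/); the function name split_processor_uri is hypothetical and not part of Taverna or Scufl2.

```python
import re

# Assumption: processor URIs look like the example in the list above,
#   http://ns.taverna.org.uk/2010/workflow/<uuid>/processor/<name>/
# Other URI shapes are not handled by this sketch.
_WORKFLOW_URI = re.compile(
    r"^http://ns\.taverna\.org\.uk/2010/workflow/"
    r"(?P<workflow_id>[0-9a-fA-F-]{36})/"
    r"(?:processor/(?P<processor>[^/]+)/)?"
)


def split_processor_uri(uri):
    """Return (workflow_id, processor_name) parsed from a workflow or processor URI.

    processor_name is None when the URI identifies the workflow itself.
    Raises ValueError if the URI does not match the assumed pattern.
    """
    match = _WORKFLOW_URI.match(uri)
    if match is None:
        raise ValueError("Not a recognised workflow URI: %r" % uri)
    return match.group("workflow_id"), match.group("processor")


if __name__ == "__main__":
    uri = ("http://ns.taverna.org.uk/2010/workflow/"
           "e024ea01-89fd-4f93-b6ae-ad6da4a6df08/processor/P1/")
    workflow_id, processor = split_processor_uri(uri)
    print(workflow_id, processor)
```

The recovered identifier could then be used to look up the workflow definition with Scufl2 tools, which is the workaround this item describes rather than a fix for it.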

Known issues in Janus

  • has_processor_type is just a Java class name and does not include important configuration such as the WSDL location and method T2-1954@Jira
  • Use of rdfs:comment for values and names - should have separate properties :name and perhaps :stringValue T2-1955@Jira
  • has_port_order a bit vague - is this related to dot/cross products? (If so - it's not specific enough without the iteration strategy) T2-1956@Jira
  • has_value_binding links directly from (static) port to (runtime) value binding - should include the execution (workflow run) and the iteration. Given two provenance traces it is impossible to separate which bindings occurred in which run. (without parsing the data reference URI and guessing that its namespace part equals the run id) T2-1957@Jira
  • is_top_level true a bit vague - workflow seems to be modeled as a fake processor in itself with a name/URI which might be in conflict with processors T2-1958@Jira
  • are even large values and binaries included as rdfs:comments? T2-1959@Jira
  • links_from missing from workflow input ports T2-1960@Jira
  • ports are not declared as input or output ports (except outputs won't have links_from?) T2-1961@Jira
  • Start/stop execution times are not captured for iterations, processors or workflow T2-1962@Jira.
  • The URIs for iteration roles are not unique if a nested workflow is included twice as two different processors or in two different nested workflows. (Not relevant until nested workflows are exposed) T2-1963@Jira
  • Collections and iterations are not explicitly represented - only individual data values T2-1964@Jira.
  • A (currently theoretical) service that returns an existing t2 reference unchanged (say "Pick first element" local worker) will appear as having 'generated' that artifact as well as the upstream service T2-1965@Jira.
  • Data values are declared as port values - having iterations and port value order. With cross product and reference-outputting services a value will appear at several iteration positions - should probably be a port_value which has a data_value T2-1966.
  • Port 'value bindings' are done at the element of depth 0 items - even if in reality it was a list (collection_structure) T2-1967@Jira
  • Iterations are modeled as a plain-text string T2-1968@Jira
  • Values are declared as members of a collection, but not in which positions T2-1969@Jira
  • Empty lists do not appear T2-1970@Jira
  • has_iteration can sometimes be both 1 and 1,1 as one of them is for the output port and the other for the input port. Any value is obviously observed at both T2-1971@Jira.
  • Unsure if merges are handled properly (need to test) T2-1972@Jira
  • Where does the Janus ontology live? Seems to be a wiki attachment for now. http://purl.org/net/taverna/janus does not resolve properly T2-1973@Jira
  • Janus export uses invalid URIs from http://purl.org/taverna/janus# instead of http://purl.org/net/taverna/janus# as declared in janus.owl Fixed T2-1933
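
The has_value_binding item above notes that bindings from two traces can only be separated by parsing the data reference URI and guessing that its namespace part equals the run id. A minimal sketch of that guesswork follows. The reference shape t2:<type>//<namespace>?<local part> is an assumption made for illustration and should be checked against real traces; guess_run_id and group_by_run are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumption: a t2 reference has the shape  t2:<type>//<namespace>?<local part>
# This shape is not confirmed by this page; adjust the pattern to real traces.
_T2_REF = re.compile(r"^t2:(?P<kind>\w+)//(?P<namespace>[^?/]+)\?(?P<local>.+)$")


def guess_run_id(reference):
    """Return the namespace part of a t2 reference, or None if it does not parse.

    Treating the namespace as the run id is a guess, as the issue list says.
    """
    match = _T2_REF.match(reference)
    return match.group("namespace") if match else None


def group_by_run(references):
    """Group references by guessed run id; unparseable ones end up under None."""
    runs = defaultdict(list)
    for reference in references:
        runs[guess_run_id(reference)].append(reference)
    return dict(runs)


if __name__ == "__main__":
    # Made-up references, for illustration only.
    refs = ["t2:ref//run-a?0", "t2:ref//run-b?0", "t2:list//run-a?1"]
    for run_id, bound in group_by_run(refs).items():
        print(run_id, bound)
```

This only works around the missing link between value bindings and workflow runs; it breaks as soon as a value is reused across runs or the namespace stops matching the run id.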