These pages are outdated.
Users are advised to install and use the updated Taverna-PROV plugin, which produces PROV-O traces and includes the data values. The Taverna-PROV traces are more complete and more "correct", and they address many of the known issues in OPM/Janus.
This page describes the provenance query language through examples.
The provenance component supports a simple query language on provenance. The general architecture is the following:
In practice, the query engine accepts an XML-formatted query and responds with three types of query answers, which are documented separately:
- a Java structure (native provenance)
- an OPM graph
- an RDF graph, which complies with the Janus ontology.
This document focuses on the structure of the input XML query. The query language is based on the principle that the provenance of a data value V consists of all the values that appear on any path in the provenance graph that leads to V. Not all values V are interesting, though, so the user should be able to select which of the potentially very many data values in the provenance graph to target. Furthermore, users should also be able to focus on only some of the nodes, represented by workflow processors, in a path that leads to any of the selected V values. Thus, the query language accounts for the specification of two elements:
- target: the set of data values V (either atomic, or whole collections, or elements within nested collections) for which provenance is sought
- focus: the set of workflow processors whose output (and, optionally, also input) data values should be part of the provenance report if they happen to fall into the paths to any of the selected values
Additionally, the language allows for the specification of the query scope, i.e., the set of workflow runs on which provenance analysis is performed.
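The interplay of target and focus can be illustrated with a minimal sketch. This is not the query engine's implementation: the function names, the dictionary-based graph representation, and the value and processor names (`in`, `a`, `P1`, and so on) are all assumptions made for illustration. The sketch collects every value on any path leading to a target value, then keeps only the values produced by the processors in the focus.

```python
def upstream_values(derived_from, target):
    """Return all values on any path that leads to `target` (target excluded).

    `derived_from` maps a value to the values it was directly derived from
    (a hypothetical, simplified stand-in for the provenance graph).
    """
    seen = set()
    stack = [target]
    while stack:
        value = stack.pop()
        for parent in derived_from.get(value, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def query_provenance(derived_from, producer, targets, focus=None):
    """Report, for each target value, the upstream values that fall in the focus.

    `producer` maps a value to the processor that output it.
    `focus` is a set of processor names; None means "keep every value".
    """
    report = {}
    for target in targets:
        values = upstream_values(derived_from, target)
        if focus is not None:
            values = {v for v in values if producer.get(v) in focus}
        report[target] = values
    return report


# Hypothetical toy graph: in -> a -> (b1, b2) -> out
derived_from = {"out": ["b1", "b2"], "b1": ["a"], "b2": ["a"], "a": ["in"]}
producer = {"in": "workflow_input", "a": "P1", "b1": "P2", "b2": "P3", "out": "P4"}

everything = query_provenance(derived_from, producer, ["out"])
only_p1 = query_provenance(derived_from, producer, ["out"], focus={"P1"})
print(sorted(everything["out"]))  # ['a', 'b1', 'b2', 'in']
print(sorted(only_p1["out"]))     # ['a']
```

Without a focus, the report contains the full lineage of the target; with the focus set to `P1`, only the value that `P1` produced on a path to the target is reported.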
The language is designed according to the principle that the specification of each of the elements can range from the very detailed, to completely missing. In this spectrum, suitable defaults are applied whenever part of the specification is missing. Following this principle, we describe the language using a sequence of increasingly complex example queries.
The first query, Q1, is essentially empty, so every element takes its default value. The defaults are as follows:
- scope: use the latest run that was recorded in the provenance database
- target: includes all the values bound to the output ports of the top-level workflow
- focus: the input ports of the top-level workflow
Let us consider an example workflow:
Executing Q1 on one run of this workflow returns all the dependencies of each value bound to the atlasGraphic output port of the workflow on each of the values bound to the workflow's input ports, namely the values bound to anatomyInput.
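The way the defaults resolve an empty query for this workflow can be sketched as follows. The `ProvenanceQuery` class, the `apply_defaults` function, and the run identifiers `run-1` and `run-2` are hypothetical and are not part of the provenance component; only the port names atlasGraphic and anatomyInput come from the example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProvenanceQuery:
    """Hypothetical in-memory form of a query; None means "not specified"."""
    scope: Optional[List[str]] = None   # workflow runs to analyse
    target: Optional[List[str]] = None  # ports whose values are the targets
    focus: Optional[List[str]] = None   # ports or processors to report on


def apply_defaults(query, runs_latest_first, workflow_inputs, workflow_outputs):
    """Fill in each missing element with the default described in the text."""
    return ProvenanceQuery(
        # scope: the latest run recorded in the provenance database
        scope=query.scope if query.scope is not None else runs_latest_first[:1],
        # target: all values bound to the top-level workflow's output ports
        target=query.target if query.target is not None else list(workflow_outputs),
        # focus: the input ports of the top-level workflow
        focus=query.focus if query.focus is not None else list(workflow_inputs),
    )


# An empty query, as in Q1, resolved against the example workflow.
resolved = apply_defaults(
    ProvenanceQuery(),
    runs_latest_first=["run-2", "run-1"],  # hypothetical run identifiers
    workflow_inputs=["anatomyInput"],
    workflow_outputs=["atlasGraphic"],
)
print(resolved)
# ProvenanceQuery(scope=['run-2'], target=['atlasGraphic'], focus=['anatomyInput'])
```

Any element the user does specify is kept unchanged, which mirrors the principle that each part of the specification can range from fully detailed to completely missing.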