Highlights from the Open Provenance Model (OPM) Provenance Challenge 3 meeting (PC3), Amsterdam, 10-11 June 2009

The purpose of the challenge was to test the use of the OPM spec as a way to exchange process provenance information across different workflow models. In particular, each team had to perform the following tasks:

  1. Implement one challenge workflow (given)
  2. Produce and export OPM from one run of the workflow
  3. Answer a set of core (+optional) provenance queries
  4. Import and consume OPM that was produced by other teams

Day I: reports from challenge teams (16 teams), listed here

There was a broad variety of results, with virtually no team reporting complete success on importing other teams' graphs and answering queries on them, but all of them (including us, UoM) reporting some partial success.

A link to my presentation at the meeting is here

The challenge was a success, in that the reports helped generate a list of outstanding issues, current shortcomings and new requirements for OPM. A partial list of these follows here.

Some of these issues were then addressed directly during day II as indicated in the list.

Governance model for OPM

Perhaps most importantly, a governance model for further development of the OPM spec was agreed upon, in the form of an informal (i.e., no formal membership) and open collaboration within a community. The main points are:

  • involvement in the development process is manifested simply by registering on the challenge 3 wiki
  • anyone who has expressed involvement in this way has the right to vote on proposals
  • any participant can present proposals for changes and extensions to OPM core. These are subject to vote and are approved by a "qualified majority" (normally two thirds of voters, to be revised following an initial trial period).
  • extensions in the form of new profiles can also be put forward for discussion. As profiles do not affect OPM core, discussion and voting here are less formal, and the expectation is that interested groups will be able to converge on an agreed-upon document in this case.

The option to form a W3C incubator group was also discussed, but a decision was postponed while the level of financial commitment required is clarified – not all the participating organizations are currently W3C members.

Provenance challenge 4: "connect my provenance to yours" into a whole OPM provenance graph.

The idea is to describe a scenario where different groups collaborate indirectly on some project through data sharing: group B picks up data products generated by a process executed by group A, and in turn produces results that are used by a process in group C, and so on. The groups need not even be directly aware of this interaction. In this form of collaboration, we are interested in joining up the provenance graphs produced independently by each of the processes, to provide a global view of provenance through several stages of a data product's lifetime. The challenge is to use OPM as the common model by which an end-to-end view of provenance can be (automatically) created.
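To make the joining step concrete, here is a minimal sketch. It assumes that a shared data product carries the same identifier in both groups' graphs, which is exactly what a common identifier scheme would have to guarantee. The `OPMGraph` class, `join_graphs`, `upstream`, and the example URLs are all hypothetical names made up for illustration; they are not part of the OPM spec or of any OPM toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class OPMGraph:
    """Toy OPM-like graph: node ids plus (effect, edge_type, cause) edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, effect, edge_type, cause):
        self.nodes.update((effect, cause))
        self.edges.add((effect, edge_type, cause))


def join_graphs(*graphs):
    """Union several graphs; nodes with identical identifiers are merged."""
    merged = OPMGraph()
    for g in graphs:
        merged.nodes |= g.nodes
        merged.edges |= g.edges
    return merged


def upstream(graph, node):
    """Return every node reachable from `node` by following edges to causes."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for effect, _, cause in graph.edges:
            if effect == current and cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


# Group A's graph: procA reads a raw input and generates d1.
RAW = "http://example.org/data/raw"  # assumed, illustrative URLs
D1 = "http://example.org/data/d1"
D2 = "http://example.org/data/d2"

graph_a = OPMGraph()
graph_a.add_edge("groupA:procA", "used", RAW)
graph_a.add_edge(D1, "wasGeneratedBy", "groupA:procA")

# Group B's graph, recorded independently: procB picks up d1 and generates d2.
graph_b = OPMGraph()
graph_b.add_edge("groupB:procB", "used", D1)
graph_b.add_edge(D2, "wasGeneratedBy", "groupB:procB")

whole = join_graphs(graph_a, graph_b)
lineage_local = upstream(graph_b, D2)   # stops at d1
lineage_global = upstream(whole, D2)    # continues back to the raw input
```

Group B's graph alone can only trace `d2` back to `d1`; the joined graph continues through group A's process to the raw input. The join works only because both graphs name `d1` identically. Process identifiers are prefixed per group here to avoid accidental merges, which is itself an assumption about how a naming scheme might behave.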

Main points for discussion and OPM issues.

Points that are directly relevant to the successful completion of the challenge 3 efforts:

  • annotations on OPM graphs: agreement reached (see wiki [ ] for details)
  • ways to refer to artifact values: there is a need for a common identifier scheme, and artifacts must be identified by URLs that can be resolved by a service.
  • typing: a hierarchical type system (i.e., user-defined types and sub-types) can be used for artifacts and processes. This is not entirely clear to me (i.e., I thought we were referring to simple types only here). Types are annotations on graph nodes (to be clarified)
  • nesting and more generally relationships of accounts: required but not yet fully addressed
  • the nature of "wasDerivedFrom" edges: asserted vs inferred. Discussed but no final agreement reached.
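As an illustration of the asserted vs inferred distinction, the sketch below separates "wasDerivedFrom" edges that are explicitly recorded from candidate derivations computed from "used" and "wasGeneratedBy" edges. This is only one possible reading of "inferred", and it reflects no agreement from the meeting: a process may use an artifact that has no influence on a given output, so the inferred pairs are candidates at best. The function name and the triple layout are assumptions made for this example.

```python
def split_derivations(edges):
    """Split derivations into asserted and inferred candidate pairs.

    `edges` is an iterable of (effect, edge_type, cause) triples.
    Returns two sets of (derived_artifact, source_artifact) pairs.
    """
    edges = set(edges)

    # Derivations explicitly recorded in the graph.
    asserted = {(e, c) for e, t, c in edges if t == "wasDerivedFrom"}

    # Map each process to the artifacts it used.
    used = {}
    for process, edge_type, artifact in edges:
        if edge_type == "used":
            used.setdefault(process, set()).add(artifact)

    # Candidate derivations: output of a process paired with each of its inputs.
    inferred = set()
    for artifact, edge_type, process in edges:
        if edge_type == "wasGeneratedBy":
            for source in used.get(process, ()):
                if (artifact, source) not in asserted:
                    inferred.add((artifact, source))

    return asserted, inferred


example_edges = [
    ("p", "used", "a1"),
    ("p", "used", "a2"),
    ("a3", "wasGeneratedBy", "p"),
    ("a3", "wasDerivedFrom", "a1"),  # explicitly asserted
]
asserted, inferred = split_derivations(example_edges)
```

Here `a3` derived from `a1` is asserted, while `a3` derived from `a2` is only a candidate obtained by inference, since nothing in the graph says that `a2` actually contributed to `a3`.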

More general issues, which for the most part still await a discussion:

  • scope of OPM: there is a suggestion that OPM is as much about describing the flow of information, as it is about describing causality.
  • query language for OPM.
  • naming schemes and values versioning (reminds me of LSIDs)
  • OPM and persistent data (i.e., data in DBs)