
Taverna 2.0 state

Note: This page is not about detailed problems with the Taverna 2.0 data storage implementation.

In Taverna 2.0, the workbench and the enactment engine register data with, and retrieve data from, a ReferenceService.  When data is registered with a ReferenceService, a T2Reference is returned.  The T2Reference can be mapped to a (supposed) URI.  The actual data storage and retrieval is handled for the ReferenceService by a DAO (presumably a data access object).  There are currently two sets of DAOs (in-memory and Hibernate), with each set having three separate DAOs to handle error documents, lists and ordinary data.
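
The division of labour described above can be pictured with a minimal sketch. It is not the Taverna 2.0 API: every name here (SketchReference, SketchDao, InMemorySketchDao, SketchReferenceService), the counter-based identifiers and the "sketch:" URI layout are assumptions made up for illustration. It only shows a service that hands back a reference on registration, delegates storage to a DAO, and needs a type in order to retrieve.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a T2Reference; not the real Taverna class.
final class SketchReference {
    private final String namespace;
    private final String localId;

    SketchReference(String namespace, String localId) {
        this.namespace = namespace;
        this.localId = localId;
    }

    // URI-like rendering; the "sketch:" layout is invented for this example.
    String toUriString() {
        return "sketch://" + namespace + "/" + localId;
    }
}

// Hypothetical DAO contract: the service delegates all storage to it.
interface SketchDao {
    void store(SketchReference ref, Object value);

    Object load(SketchReference ref);
}

// In-memory variant: the map, and so the data, is gone when the JVM exits.
final class InMemorySketchDao implements SketchDao {
    private final Map<String, Object> values = new HashMap<>();

    @Override
    public void store(SketchReference ref, Object value) {
        values.put(ref.toUriString(), value);
    }

    @Override
    public Object load(SketchReference ref) {
        return values.get(ref.toUriString());
    }
}

// Hypothetical service: registers a value, returns a reference, retrieves by type.
final class SketchReferenceService {
    private final String namespace;
    private final SketchDao dao;
    private int counter = 0; // assumed identifier scheme, restarts with the service

    SketchReferenceService(String namespace, SketchDao dao) {
        this.namespace = namespace;
        this.dao = dao;
    }

    SketchReference register(Object value) {
        SketchReference ref = new SketchReference(namespace, "data" + counter++);
        dao.store(ref, value);
        return ref;
    }

    // The caller has to name the type it expects back.
    <T> T retrieve(SketchReference ref, Class<T> type) {
        return type.cast(dao.load(ref));
    }
}
```

A Hibernate-backed DAO would implement the same hypothetical SketchDao contract; only the storage behind it would change.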

If provenance capture is turned on, then information about a run of a workflow is captured within a provenance database, including information about the data passed to enactments of processors (sort of equivalent to calls of services).  The data itself is not stored within the provenance database; instead, the database stores the URI representation of the T2Reference.  The provenance information for a particular workflow run is organized so that it "knows" which run and which workflow it belongs to.  It also holds additional information about the type of data that is referenced.

The provenance database is not the same database as that which may be used to store data values.

Issues:

  1. The T2Reference (when represented as a URI) is only unique within a particular running Taverna.  When a Taverna is restarted, an equivalent T2Reference can be used to refer to different data.  Two different running Tavernas may also use equivalent T2References.
  2. The URIs for the T2Reference do not comply with IANA formatting norms.
  3. If in memory storage is used, then when the current Taverna is closed, the data is also deleted.  No warning is given to the user that the data will be lost.
  4. In the absence of the currently running Taverna, there is no way to easily access data, even if it has been stored via hibernate.
  5. It is impossible to load a previous run and examine it.
  6. In order to retrieve data from the ReferenceService it is necessary to know a data type to ask for.  (This is hacked for workflow outputs in Taverna 2.0 Workbench.  The provenance keeps the information.)  This requirement could prevent Taverna-independent browsing of data and/or provenance.
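
Issue 1 can be illustrated with a small sketch. This page does not say how Taverna 2.0 actually generates identifiers, so the per-session counter below is purely an assumption, and the names SessionCounterIds and CollisionDemo are hypothetical. The point is only that any scheme whose state restarts with the application hands out equivalent reference strings for different data.

```java
// Hypothetical identifier generator; assumes a counter that starts at zero
// every time the application starts.
final class SessionCounterIds {
    private final String namespace;
    private int next = 0;

    SessionCounterIds(String namespace) {
        this.namespace = namespace;
    }

    String nextReference() {
        return "sketch://" + namespace + "/data" + next++;
    }
}

final class CollisionDemo {
    // Returns true when two independent sessions (a restart, or two running
    // instances) hand out the same first reference string.
    static boolean firstReferencesCollide(String namespace) {
        SessionCounterIds firstSession = new SessionCounterIds(namespace);
        SessionCounterIds secondSession = new SessionCounterIds(namespace);
        return firstSession.nextReference().equals(secondSession.nextReference());
    }
}
```

Under this assumed scheme, a provenance record holding only the reference string cannot tell which session's data it points at.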

Prerequisites for future work:

  1. The string representation of a T2Reference (currently a supposed URI) must be unique.
  2. The URIs (if they are kept) for the T2Reference must comply with IANA norms.
  3. Data must not be lost when Taverna is closed (except by explicit user choice).  If the data for a run is not saved, then the data for the provenance (if generated) of that run can, and probably should, be deleted.  Data saving could be the default or by the user explicitly archiving the data.
  4. The string representation of a T2Reference (currently a URI) should preferably be of a form so that the workflow and the run can be readily determined.  (This does not necessarily imply they are within the representation.)
  5. The data for a run must be accessible in the absence of a running Taverna.
    • the ReferenceService could be available stand-alone outside Taverna
    • the data could be kept (or archived) in a way that is not dependent upon ReferenceService (LSID?)
  6. What about loading/viewing a previous run?
  7. The preferred/original data type must be determinable.  Possibly from the T2Reference.
    • It could be stored with the data
    • It could be part of the URI

Note that 4-7 are not immediate killers for Taverna 2.1 but need to be sorted out preferably as soon as possible.
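
One way prerequisites 1, 4 and 7 could be met together is sketched below, assuming the run is identified by a UUID and the data type travels as a query parameter. The "t2sketch" scheme, the path layout and the class RunScopedReference are hypothetical and are not a proposed syntax. The sketch also assumes that the local identifier and the type name contain only URI-safe characters.

```java
import java.net.URI;
import java.util.UUID;

// Hypothetical reference carrying the run and the data type in its string form.
final class RunScopedReference {
    final UUID runId;
    final String localId;
    final String typeName;

    RunScopedReference(UUID runId, String localId, String typeName) {
        this.runId = runId;
        this.localId = localId;
        this.typeName = typeName;
    }

    // Renders e.g. t2sketch:/runs/<run UUID>/data/<local id>?type=<type name>
    URI toUri() {
        return URI.create("t2sketch:/runs/" + runId + "/data/" + localId
                + "?type=" + typeName);
    }

    // Recovers the run, the local identifier and the type from the URI alone.
    static RunScopedReference parse(URI uri) {
        if (!"t2sketch".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Unexpected scheme: " + uri);
        }
        String path = uri.getPath();
        String query = uri.getQuery();
        if (path == null || query == null || !query.startsWith("type=")) {
            throw new IllegalArgumentException("Unexpected layout: " + uri);
        }
        // "/runs/<uuid>/data/<id>" splits into ["", "runs", uuid, "data", id].
        String[] parts = path.split("/");
        if (parts.length != 5 || !"runs".equals(parts[1]) || !"data".equals(parts[3])) {
            throw new IllegalArgumentException("Unexpected path: " + uri);
        }
        return new RunScopedReference(UUID.fromString(parts[2]), parts[4],
                query.substring("type=".length()));
    }
}
```

Because the run UUID is part of the string, two sessions that use UUID.randomUUID() for their run identifiers would be very unlikely to produce equivalent references, and a reader of the provenance database could find the run and the preferred type without a running ReferenceService.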

Immediate work:

  1. Describe some use cases for provenance and data use, both directly relating to Taverna and also in a wider context - needs non-myGrid/Manchester input
  2. Guided by (1), decide how T2References should be represented (e.g. Taverna-specific URI, just string, LSID, HTTP address, something else) - needs non-myGrid/Manchester input
  3. Decide exact syntax of T2Reference representation - needs non-myGrid/Manchester input
  4. Alter Taverna (including provenance) code for new T2Reference representation
  5. Decide way of saving run data
  6. Implement way of saving run data

1-4 can possibly be done at the same time as 5-6.

  1. 2009-03-06

    Issues/Prerequisites for future work: the T2Reference's string representation that is currently saved as part of provenance cannot be (easily) converted back to a T2Reference object.

    Something to think about - are we going to have a separate (third) lightweight app for viewing old results (in addition to the editor and player) that will just have the reference service and provenance? Or would users be able to do this in the player?

    1. 2009-03-11

      Why not in the player? This viewer can be a separate component, but bundling it as part of existing user environments reduces confusion, I think.

  2. 2009-03-11

    on Issues:
    Agree on all points. Please bear in mind that provenance always resolves a T2Reference URI using a run ID (which is a UUID).

    on Prerequisites:
    3. – provenance traces that refer to non-persistent data are indeed useless and must be deleted. This can be done cleanly at the granularity of an entire run (keep/delete a run).
    Does T2 make any effort to share data across runs? My guess is that it does not, but if it did, this would complicate the deletion logic.

    6 – this should be driven by the provenance DB. It contains the workflow structure (can be enriched if needed) but not enough info to run the same workflow again, i.e. all activities associated with processors, etc. There is no problem storing the actual serialised model, however, so that it can be executed again. But provenance at the moment is not a repository of workflows; it is a repository of past workflow runs.
    Xin will hopefully (!!) help with this part

    on Immediate work:
    2- not sure 2 is driven by (1). I expect use cases to be independent of the naming scheme for datarefs, which I see as internal to the architecture.

  3. 2009-03-11

    Quite an extensive set of use-cases were gathered and recorded on the old wiki here:

    http://www.mygrid.org.uk/wiki/Mygrid/ProvenancePhase1Documents

  4. 2009-03-11

    Exporting to OPM is also a feature in the TODO list. Not top priority for our users, however. It is mostly a feature to keep us in touch with the international workflow/provenance community. It's unlikely it will become important any time soon.