
This is a page for notes about Prototyping Provenance.

Known implementation issues 

Provenance Layer:

Incoming - going down the stack:


  • owningProcess, e.g: facade0:dataflow0:String_Constant
  • Job - contains the data reference for each and all input ports
  • list of potential activities
  • queue.index tells us whether this is an iteration: [] means no iteration, [n] means an iteration has started
    • subsequent iterations call eventAdded with the owningProcess - data for the next iteration is added to the queue.
  • What we lack:
    • information about the port
    • it is difficult to get the data items for the next iteration from the queue.
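The index convention above can be sketched as a small helper. This is a minimal illustration only: `IterationIndex` is a hypothetical class, not part of the Taverna API, and it assumes the index is available as an `int[]`.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; assumes the job index is an int[].
final class IterationIndex {
    private IterationIndex() {}

    /** An empty index ([]) means no iteration; a non-empty one ([n]) means an iteration has started. */
    static boolean isIteration(int[] index) {
        return index.length > 0;
    }

    /** Renders the index in the bracket notation used in these notes. */
    static String describe(int[] index) {
        return Arrays.toString(index);
    }
}
```

For example, `IterationIndex.isIteration(new int[0])` is false, while `IterationIndex.isIteration(new int[] {2})` is true.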

Outgoing - coming back up the stack:


  • owningProcess
  • iteration index, or [] if no iteration
  • data reference for each output port in a Map
  • whether streaming was used.


  • cause (TargetError)
  • failed Activity
  • failure type (??)
  • owningProcess
  • message
  • iteration index, or []

Invoke Layer:


  • owningProcess
  • input & output port info, including annotations when they exist
  • iteration index, or []
  • data references for each port
  • activityInvoked


  • owningProcess
  • result data reference for each port
  • iteration index, or []
  • all the error info received by receiveError

Data Gathered in Invoke Layer

  • parsed owningProcess: facadeID, dataflowInstanceID, ProcessorID
  • iteration id
    • Activity ID
      • for each input port:
        • port ID
        • input data reference - (resolved to inner references if a list in the Provenance Layer)
      • for each output port:
        • port ID
        • output data reference - (resolved to inner references if a list in the Provenance Layer)
      • any error information:
        • cause
        • message
        • error type

Class Diagram (rough sketch)


Example generated XML

Example XML for a simple iteration.

  1. 2008-03-10

    Since the ProvenanceConnector travels up and down the stack with the InvocationContext, we could populate the context with ProvenanceCollection objects, starting at the Invoke layer's receiveJob method: get the context, grab its List of ProvenanceCollection objects, and add a new one holding the Activities, iteration index, inputs and owning process. The owning process should uniquely identify the workflow facade run and would let us retrieve this object later to populate it with more provenance. It is not yet clear exactly what should be inside this object, or whether its data needs to be ordered by when things happened. For example, if retry events happen, what should it store?

    We also need to decide which layer should collect which provenance. What would be the job of a provenance layer if individual layers are collecting information as things happen?
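One way to picture the idea of attaching ProvenanceCollection objects to the context is sketched below. Everything here is an assumption made for illustration: `ContextSketch` is a stand-in and not the real InvocationContext, the fields of `ProvenanceCollection` are guesses, and events are simple strings kept in arrival order (one possible answer to the ordering question, so a retry would just be appended after the original attempt).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; the contents of ProvenanceCollection are still undecided in the notes.
final class ProvenanceCollection {
    final String owningProcess;
    final int[] iterationIndex;
    private final List<String> events = new ArrayList<>(); // kept in arrival order

    ProvenanceCollection(String owningProcess, int[] iterationIndex) {
        this.owningProcess = owningProcess;
        this.iterationIndex = iterationIndex.clone();
    }

    void addEvent(String description) {
        events.add(description);
    }

    List<String> events() {
        return List.copyOf(events);
    }
}

// Stand-in for the context that travels up and down the stack; not the real InvocationContext.
final class ContextSketch {
    private final List<ProvenanceCollection> collections = new ArrayList<>();

    /** Called where a job is first seen, e.g. in the Invoke layer. */
    ProvenanceCollection open(String owningProcess, int[] iterationIndex) {
        ProvenanceCollection collection = new ProvenanceCollection(owningProcess, iterationIndex);
        collections.add(collection);
        return collection;
    }

    /** Looks a collection up again later by owning process and iteration index. */
    Optional<ProvenanceCollection> find(String owningProcess, int[] iterationIndex) {
        return collections.stream()
                .filter(c -> c.owningProcess.equals(owningProcess)
                        && Arrays.equals(c.iterationIndex, iterationIndex))
                .findFirst();
    }
}
```

A layer that sees the job first would call `open(...)`, and any later layer could call `find(...)` with the same owning process and index to append further events.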