This page is for gathering Taverna Server requirements and also for noting useful contacts.  It is under development.

First Draft of Architecture

Presentation from 14 April 2010
Taverna Server Architecture

Key:

Blue refers to Taverna Software Components.
Green refers to hardware (a strongly-firewalled cluster).
Purple refers to external services.
Red refers to storage.
Light blue/cyan refers to Taverna Workflows.
White refers to small components that are induced by the rest of the diagram.
Grey is the firewall/physically-separated network that stops the back-end from communicating directly with the outside world.
Blue Arrows are messages/connections.
Red Arrows are conceptual information flows; actual information goes along message channels.

Architecture Notes

Generally, the concept is that the client (workbench, portal, ...) contacts the server and asks to make a workflow instance. The workflow instance is created (in a modified version of the command line workflow engine) in a Stopped state; the instance will be running potentially in another user account or virtual machine (or even on a private cluster node) which guarantees strong isolation. The client can then upload any input data so that it is available to the workflow, before causing the workflow to transition to the Running state. The workflow then runs, contacting the external services as needed (potentially via a network proxy) until it finishes, when it will have created such outputs as it wants. The client will then be able to download the outputs before relinquishing the worker back to the system (the end of their overall session).

Security

Caller needs to provide a security context that will describe:

  • What account to run the workflow in (if that matters)
  • What credentials to use when contacting other services
    • Might need to be given on a case-by-case basis, but a general delegatable identity context (e.g., VOMS) would be better where possible
  • What credentials are required to be able to upload data to or fetch data from the workflow instance.

Note that the processors will need to use the credential system and not have embedded credentials, or they will be wholly un-reusable.
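
As an illustration only, the security context described above could be carried as a small record. This is a minimal sketch: the class name, field names and the fallback rule in credential_for are assumptions made for the example, not part of any agreed Taverna Server interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SecurityContext:
    """Hypothetical container for what the caller supplies (all names assumed)."""

    # Account to run the workflow in; None if that does not matter.
    run_as_account: Optional[str] = None
    # Credentials given case by case, keyed by service URL.
    service_credentials: Dict[str, str] = field(default_factory=dict)
    # A general delegatable identity (e.g. a VOMS proxy), if one is available.
    delegated_identity: Optional[str] = None
    # Credential required to upload data to or fetch data from the instance.
    data_access_credential: Optional[str] = None

    def credential_for(self, service_url: str) -> Optional[str]:
        """Prefer a per-service credential, else fall back to the delegated identity."""
        return self.service_credentials.get(service_url, self.delegated_identity)
```

A processor would then ask the context for a credential at invocation time rather than embedding one, which is what keeps it reusable.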

Provenance

Provenance to go into a database. Location of database is non-trivial; keeping it with the worker has the advantage of being much easier to make robust against network troubles, but putting it on (or close to) the master makes it easier for querying and management (where not using VM images for workers). Provenance system must record reasoning for decisions in placement of service invocations.
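
To make the last requirement concrete, here is a minimal sketch of recording the reasoning behind a placement decision. It assumes SQLite purely for illustration; the table layout, column names and function names are hypothetical, and the choice of database and of its location remains open as described above.

```python
import sqlite3


def open_provenance_store(location=":memory:"):
    """Open (or create) a store; 'location' could be a file near worker or master."""
    conn = sqlite3.connect(location)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS placement ("
        " run_id TEXT NOT NULL,"
        " processor TEXT NOT NULL,"
        " chosen_service TEXT NOT NULL,"
        " reason TEXT NOT NULL)"
    )
    return conn


def record_placement(conn, run_id, processor, chosen_service, reason):
    """Store which service a processor invocation was placed on, and why."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO placement VALUES (?, ?, ?, ?)",
            (run_id, processor, chosen_service, reason),
        )


def placements_for(conn, run_id):
    """Return (processor, chosen_service, reason) rows for one run, in insert order."""
    cursor = conn.execute(
        "SELECT processor, chosen_service, reason FROM placement"
        " WHERE run_id = ? ORDER BY rowid",
        (run_id,),
    )
    return cursor.fetchall()
```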

Notification

State updates to be written to a "whiteboard"/tuple-space. Observers of that space can listen out for interesting things and react appropriately. Observers should be scriptable (this forces the use of Java 6, which provides general scripting support). Whiteboard to be (probably) kept in worker as that makes it easier to secure.
Notification Subsystem Architecture
It is expected that a user will use the same (or very similar) basic notification strategies (i.e., plugins) across many workflow runs, and probably many workflows.
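
The whiteboard idea can be sketched as below. This is an in-memory toy under stated assumptions: the (run_id, key, value) tuple shape, the use of None as a wildcard, and all names are invented for the example, and the real whiteboard would also need persistence and security.

```python
from typing import Callable, List, Optional, Tuple

StateUpdate = Tuple[str, str, str]  # assumed shape: (run_id, key, value)
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def _matches(pattern: Pattern, update: StateUpdate) -> bool:
    """A pattern field of None matches any value in that position."""
    return all(p is None or p == v for p, v in zip(pattern, update))


class Whiteboard:
    """Hypothetical tuple-space: writers post updates, observers react to matches."""

    def __init__(self) -> None:
        self._tuples: List[StateUpdate] = []
        self._observers: List[Tuple[Pattern, Callable[[StateUpdate], None]]] = []

    def observe(self, pattern: Pattern, callback: Callable[[StateUpdate], None]) -> None:
        """Register a callback for future updates that match the pattern."""
        self._observers.append((pattern, callback))

    def write(self, update: StateUpdate) -> None:
        """Record an update and notify every observer whose pattern matches."""
        self._tuples.append(update)
        for pattern, callback in self._observers:
            if _matches(pattern, update):
                callback(update)

    def read(self, pattern: Pattern) -> List[StateUpdate]:
        """Passive query: return all recorded updates that match the pattern."""
        return [t for t in self._tuples if _matches(pattern, t)]
```

An observer registered with the pattern (None, "state", "Finished") would be called whenever any run reports that state, which is the kind of reusable notification strategy a user might share across many workflow runs.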

Data Transfer

Workflow to be able to read/write the "working directory" on the worker node; WD will not persist past node relinquishment. Data format to be based on (but not repeating the mistakes of) Baclava. Must not require serialized Java objects! Would like to be able to transfer substantial chunks in one go (e.g., a ZIP file that is automatically packed/unpacked). Could do with limiting the size of local data store as a system deployment policy.
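
A minimal sketch of the bulk-transfer idea follows: pack a working directory into one ZIP payload, and unpack a payload subject to a size limit. The function names and the max_bytes parameter are assumptions for illustration. The limit here is checked against the sizes declared in the archive, so a real deployment policy would need a stricter check.

```python
import io
import zipfile
from pathlib import Path


def pack_directory(directory: Path) -> bytes:
    """Pack every file under 'directory' into a single in-memory ZIP payload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(directory.rglob("*")):
            if path.is_file():
                # Store paths relative to the working directory.
                archive.write(path, path.relative_to(directory).as_posix())
    return buffer.getvalue()


def unpack_into(directory: Path, payload: bytes, max_bytes: int) -> None:
    """Unpack a ZIP payload into 'directory', refusing oversized or unsafe archives."""
    root = directory.resolve()
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        # Deployment policy (assumed): cap the declared uncompressed size.
        total = sum(info.file_size for info in archive.infolist())
        if total > max_bytes:
            raise ValueError(f"archive expands to {total} bytes, limit is {max_bytes}")
        # Refuse entries that would land outside the working directory.
        for info in archive.infolist():
            target = (root / info.filename).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
        archive.extractall(root)
```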

Release Schedule

  • Alpha release to selected groups (e.g., HELIO project) in time for a demo; early-mid June 2010 (demo date: about 20 June)
  • Beta release to general developers/users who like early-access code; late July 2010
  • Full release to all; November 2010

Key Prerequisites

T2 Command-line Tools - this will be the core of the workflow execution system.
T2 Input/Output Data Format (Baclava2) - need to get data in and out somehow. (#62)

Planned Features for Alpha

Single Worker. Master and Worker colocated (shared filesystem). Single user account.
Workflow execution based on command-line Taverna. No firewall support. Security only so far as needed for demo.

Execution model/story:

  1. Contact Master, supplying workflow and security context (#63).
  2. Master spawns Worker (in "Initialized" state: #65) and returns address of worker to client.
  3. Client uploads extra files as necessary (#64).
  4. Client starts worker (transition to "Operating": #65).
  5. Worker processes until completion (current state available through some monitoring interface: #65).
  6. On completion, worker is stopped again ("Finished" but not all gone yet: #65).
  7. Client downloads result files from worker (#64).
  8. Client kills worker (which goes away; service interface no longer available at that point: #66).
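
The execution story above amounts to a small state machine. The following is a sketch only: the class and method names are hypothetical, the working directory is stood in for by a dictionary, and the rules that uploads are accepted only in "Initialized" and downloads only in "Finished" are assumptions made for the example rather than stated requirements.

```python
from enum import Enum


class RunState(Enum):
    INITIALIZED = "Initialized"
    OPERATING = "Operating"
    FINISHED = "Finished"


class WorkerRun:
    """Hypothetical model of one worker; not the Taverna Server API."""

    def __init__(self, workflow, security_context):
        self.workflow = workflow
        self.security_context = security_context
        self.state = RunState.INITIALIZED  # step 2: spawned but not started
        self.files = {}                    # stands in for the working directory
        self.destroyed = False

    def _require(self, expected):
        """Reject calls after kill() or in the wrong state."""
        if self.destroyed:
            raise RuntimeError("worker has been killed; interface unavailable")
        if self.state is not expected:
            raise RuntimeError(f"expected {expected.value}, was {self.state.value}")

    def upload(self, name, data):
        self._require(RunState.INITIALIZED)  # step 3 (restriction is assumed)
        self.files[name] = data

    def start(self):
        self._require(RunState.INITIALIZED)  # step 4
        self.state = RunState.OPERATING

    def complete(self, outputs):
        self._require(RunState.OPERATING)    # steps 5-6, called by the engine
        self.files.update(outputs)
        self.state = RunState.FINISHED

    def download(self, name):
        self._require(RunState.FINISHED)     # step 7 (restriction is assumed)
        return self.files[name]

    def kill(self):
        self.destroyed = True                # step 8: interface goes away
```

A client session would call upload, start, wait for the state to reach "Finished" through the monitoring interface, then download and kill.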

Notes: Can restrict exactly what processors are used to ones which are "well-behaved"; sandboxing is not a worry here.

Planned Features for Beta

(Note that the time between Alpha and Beta is fairly short.)

  1. Multiple Workers. (No load management. #67)
  2. Ability to separate Workers from Master (even if not actually implemented: #67).
  3. Whiteboard and passive baseline observer so client can see status when connected. (Requires monitoring port/operation. #68)
  4. Provenance core. (#69)

Planned Features for Final

  1. Client access to Worker via Master (#70)
    • Data upload and download through Master. No assumption that Master can see Workers' filesystem area.
    • Monitor via master
  2. Full whiteboard. (#71)
    • At least one demonstration of pushing notifications. Interacts with security context?
  3. Provenance, including support for database management. (#72)
    1. Placing the database with worker or with master.
    2. Download of provenance data to client.
    3. Lifetime management for provenance data.
  4. Multiple types of security context supported. (#73)
  5. Support for Workers using a proxy to access outside world. (#74)

Features for Future Releases

  1. "Connected" operation, e.g., via XMPP
  2. Service selection and discovery according to policy context.
  3. Multiple supported deployment configurations.