This page is for discussion of the model and XML syntax of SCUFL2, which will replace the current t2flow format.

Compare this with the t2flow XML schema - which has documentation about workflow elements as currently serialized.

Scufl2 has moved to Apache (incubator) 

Information in this section is out of date!

 

Excerpt

SCUFL2 is the proposed new mechanism for specifying Taverna workflows. SCUFL2 defines a model, a workflow bundle file format (.wfbundle), and a Java API for working with workflow structures. SCUFL2 is the workflow language for Taverna 3, and replaces Taverna 2's t2flow format.

Summary

SCUFL2 is the proposed new mechanism for specifying Taverna workflows. SCUFL2 adopts Linked Data technology and preservation methodologies to create a platform-independent workflow language that can be inspected, modified, created and executed.

SCUFL2 comes with a Java API that can be used for programmatic access to read and write SCUFL2 workflow bundles. A workflow bundle is a structured ZIP file with the workflow definitions included as XML documents. Those workflow documents are described by an XML Schema and are also valid RDF/XML. The XML Schema allows tools to read and write SCUFL2 workflow definitions as regular structured XML. The RDF allows RDF-enabled tools to link workflow definitions with external resources.
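
To illustrate what "a structured ZIP file with XML documents" means for a third-party tool, the following Python sketch builds a tiny bundle in memory and then lists and parses its XML entries using only the standard library. The entry names (`workflowBundle.rdf`, `workflow/HelloWorld.rdf`) and the element names are assumptions invented for this example; they are not taken from the SCUFL2 specification.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical bundle contents: entry names and XML vocabulary are
# placeholders for illustration, not the real SCUFL2 layout.
EXAMPLE_ENTRIES = {
    "workflowBundle.rdf": "<WorkflowBundle><mainWorkflow>HelloWorld</mainWorkflow></WorkflowBundle>",
    "workflow/HelloWorld.rdf": "<Workflow><name>HelloWorld</name></Workflow>",
    "README.txt": "not an XML document",
}


def build_example_bundle() -> bytes:
    """Create a small ZIP archive in memory that stands in for a bundle."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        for name, content in EXAMPLE_ENTRIES.items():
            bundle.writestr(name, content)
    return buffer.getvalue()


def read_xml_documents(bundle_bytes: bytes) -> dict:
    """Return {entry name: root element tag} for each .rdf/.xml entry."""
    roots = {}
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        for name in bundle.namelist():
            if name.endswith((".rdf", ".xml")):
                roots[name] = ET.fromstring(bundle.read(name)).tag
    return roots


if __name__ == "__main__":
    for entry, root in read_xml_documents(build_example_bundle()).items():
        print(entry, "->", root)
```

Because the documents are plain XML, a tool needs nothing beyond a ZIP reader and an XML parser for this kind of access. An RDF-aware tool could load the same entries into an RDF library instead.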

The workflow structure is defined using an OWL ontology and annotated with URIs so that third parties can form semantic statements about any component of a Scufl2 workflow, for example to state that a particular service produces outputs of a certain type, or that a data link was added by a specific researcher.

Semantic annotations and a manifest for the bundle declare the purpose of, and the links between, the different components forming a workflow. This allows third parties to extract and append annotations about data and services used by the workflow.

Motivation

The t2flow serialization format suffers from being very close to the Java object model, and contains various items that are simply Java beans serialized using XMLBeans. Because the t2flow format is so verbose, it is difficult for third-party software to perform inspection ("Which services does this workflow use?"), modification ("Change all calls to http://broken.com/ to http://fixed.com/") and generation ("Build a custom workflow from a button").

...

We have therefore decided to create a new serialisation format for workflows, called SCUFL2. This format will be accompanied by a UML model, with XML as the primary serialisation and JSON and RDF as possible secondary serialisations, all following the UML model. The model will also be reflected in a lightweight API which can deserialise and serialise these formats, in addition to .scufl and .t2flow, and which makes inspection, modification and generation of workflow structures easier.
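
The three third-party use cases named above (inspection, modification and generation) can be sketched against a toy object model. This is only an illustration in Python: the class names `Workflow`, `Processor` and `DataLink`, their fields, and the helper functions are hypothetical and do not describe the actual SCUFL2 UML model or Java API.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    service_url: str  # endpoint this processor calls (hypothetical field)


@dataclass
class DataLink:
    source: str  # "processor.port" the data comes from
    target: str  # "processor.port" the data goes to


@dataclass
class Workflow:
    name: str
    processors: list = field(default_factory=list)
    links: list = field(default_factory=list)


def services_used(workflow):
    """Inspection: which services does this workflow use?"""
    return sorted({p.service_url for p in workflow.processors})


def replace_service(workflow, old_prefix, new_prefix):
    """Modification: repoint calls from old_prefix to new_prefix.

    Returns the number of processors that were changed.
    """
    changed = 0
    for processor in workflow.processors:
        if processor.service_url.startswith(old_prefix):
            suffix = processor.service_url[len(old_prefix):]
            processor.service_url = new_prefix + suffix
            changed += 1
    return changed


def build_example():
    """Generation: build a small workflow from scratch."""
    workflow = Workflow("example")
    workflow.processors.append(Processor("fetch", "http://broken.com/fetch"))
    workflow.processors.append(Processor("align", "http://example.org/align"))
    workflow.links.append(DataLink("fetch.out", "align.in"))
    return workflow


if __name__ == "__main__":
    wf = build_example()
    print(services_used(wf))
    replace_service(wf, "http://broken.com/", "http://fixed.com/")
    print(services_used(wf))
```

The point of the sketch is that each operation is a few lines once the model is small and independent of the execution engine, which is the design goal stated for the lightweight API.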

Roadmap

As detailed in the Taverna roadmap, myGrid will be working on SCUFL2 during summer 2010 (subject to change):

June 2010 - SCUFL2 language specification draft
A new Taverna workflow language specification, closer to SCUFL from Taverna 1.7.x, to replace the current t2flow serialisation format

July 2010 - SCUFL2 tools Beta
Including conversion from t2flow to SCUFL2, SCUFL2 to t2flow and SCUFL to SCUFL2
The supplied compiler will convert the above formats into Taverna's internal workflow representation

September 2010 - SCUFL2 tools
A stabilised and fully tested version

See planned SCUFL2 tasks in myGrid's Jira.

Material

Warning: Preliminary work

This page reflects preliminary work, and these specifications are not yet at alpha level. Do not write any applications assuming the SCUFL2 format will stay as discussed on this page.

Material for iteration 2 of the UML Scufl model includes

  • native zargo file
  • nice pictures of the class diagrams

Here is an attempt at demonstrating the new proposed XML syntax for Scufl2: as.scufl2.xml - a translation of as.t2flow

This has been produced using the early scufl2 code from http://taverna.googlecode.com/svn/unsorted/scufl2/trunk/

Suggestion for identifiers in Taverna URI templates.

Introduction

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
To: List for general discussion and hacking of the Taverna project <taverna-hackers@lists.sourceforge.net>
Date: Wed, 6 Jan 2010 10:47:34 +0000
Subject: Scufl 2 workflow language
Message-ID: <a20e6fb11001060247k72e422b5v75f0a471ffa6896@mail.gmail.com>

Hi!

We're working on making the new SCUFL2 workflow language.

This will be a simplification of the current .t2flow serialisation, but will also come with an API.

We're basing this new workflow definition language on what we have learnt are the best features of Scufl (from Taverna 1) and .t2flow. Scufl was quite easy for third-party suppliers to generate or parse (for instance, myExperiment generates the Taverna 1 diagrams from scratch using Ruby code that parses the Scufl), while .t2flow made it possible to specify all the finer-grained details of the new Taverna 2 engine, but this also made it a bit too verbose.

These are early days, so we'll figure out what the language should be and what the API should look like. Paolo Missier has done good work in making a proposed UML model of the new language, which we can then use as the basis for figuring out the XML serialisation, but also the Java beans of the API, and possibly also RDF and JSON versions.

In my spare time I've tried to tie together some simple Java beans implementing this UML model, and I've now checked this into Subversion. These beans are not complete yet and have no integration with Taverna code, and the API can currently only serialise to RDF (to test out Sesame/Elmo annotations on beans).

I might come back with code examples so we can discuss what the API should look like. The current tests only build a workflow from scratch. Also note that scufl2-rdf is not yet connected to scufl2-api and can be considered an early version of the scufl2-api. (API-wise there are pulls in different directions: for instance, we want to make it easy to inspect a workflow, but also to construct one. If more information is needed for inspection, this could make it more tricky to construct.)

The API should minimally be able to:

  • Work independently, without any Taverna dependencies, runtime or plugin system
  • Load .t2flows
  • Save as .scufl2 (undetermined yet what this format is - most likely XML and/or RDF inside a Research Object .zip)
  • Inspect an existing workflow to tell:
    a) Processors
    b) Connections between processor/workflow ports (and conditional links)
    c) Activities/Services (ie. 'WSDL' method 'fish' from endpoint 'http://asdkljasdkjasdkj')
    d) Annotations
  • Allow modification and creation from scratch of such workflows

The API should be rich enough that you could use it to generate the workflow diagram, i.e. what myExperiment already does in Ruby.

Optionally:

  • Load scufl 1 .xml from Taverna 1
  • Save as backwards compatible .t2flow or even scufl 1 if possible
  • Exposed as a RESTful service

However, the API should also be lightweight, so it will not do tasks better done by the Taverna engine (t2core):

  • Determining if a workflow definition is valid (checking for loops, invalid iteration strategies etc)
  • Perform the actual execution of the workflow

Other tasks are also better suited for the main Taverna code base, as they require various plugins or other considerations:

  • Discovering available services/methods
  • Find input/output ports of a given service definition
  • Determining what configuration can be done for a given service
  • Merging workflows

If you talk about a client/server architecture, you can picture these (RESTful?) services:

  • Taverna engine: execute workflow and manage data/provenance
  • Taverna inspection: check workflow definition validity, calculate depths, etc
  • Taverna service descriptions: Find available services, specify possible service definition, determine ports for service definition
  • Taverna editing: Workbench-type activities, Undo/redo, merge workflows, workflow refactoring
  • Taverna diagram: Generate workflow diagram in various formats and configurations

(The last two of these should be possible to implement using mainly the Scufl2 API.)

A client could then use the Scufl2 API and a selection of these services - and still be able to implement what would look like the current Taverna workbench. The client could be written in a non-Java language, and use the Scufl2 serialisation schema/ontology directly with the help of whatever XML/RDF/JSON support is available for its language - this should give the same functionality but without a few convenience methods.

We're very interested in hearing about potential use cases for such a SCUFL2 language and API. Feel free to add your comments!

Overview

SCUFL2 consists of: