Scufl2 has moved to Apache (incubator) 

Information in this section is out of date!

This page describes a proposal to change the format/API of the SCUFL2 configuration of activities (Note: not the whole workflow definition) to JSON.

Workflow structure as JSON

If you are interested in inspecting the Taverna workflow structure using JSON, then you might want to have a look at the experimental jsonexport tool which can export JSON from any t2flow or wfbundle. Note that configuration is not included in that export.

Current implementation

In SCUFL2 <= 0.11.0, the configuration of activities and dispatch layers is implemented as follows:

  • A Configuration, of a given type, that configures a Configurable (Activity, DispatchLayer) of a corresponding type. A profile activates (and typically contains) a configuration.
  • A tree of PropertyObject instances (PropertyLiteral, PropertyList, PropertyResource, etc.) describes the properties of the specific configuration in a structure allowing any combination of "maps", "lists", "sets" and "primitives". The API has convenience methods such as getPropertyAsString() to avoid casting.
  • Properties (and types) are named by URIs (typically within an activity-specific namespace like http://ns.taverna.org.uk/2010/activity/rest#, but also using existing vocabularies like HTTP in RDF). Accessing properties through the SCUFL2 API would typically use ACTIVITY_URI.resolve("#propertyName") instead of full URIs.
  • A ConfigurationDefinition defines a kind of 'schema' for what the configuration should look like, by providing a tree of PropertyDefinitions - mirroring the desired structure of the PropertyObject tree, with details such as label, description, multiplicity, required/optional.
    • ConfigurationDefinitions are not currently saved to the workflow bundle - it was intended for them to be serialized as a kind of mini OWL/RDFS vocabulary and to be embedded in the bundle - thus providing self-validation of a syntax-valid configuration without any activity-specific code
  • In the workflow bundle serialization, the configurations are serialized as a constrained form of RDF/XML, embedded within the XML for the profile. The RDF nodes closely mirror the PropertyObject tree structure, so any odd RDF/XML won't necessarily be deserializable back to PropertyObjects.
  • In the Taverna 3 engine, activities have Taverna 2-like configuration Java beans (POJOs with getters and setters), which are described with the Java annotations @ConfigurationProperty and @ConfigurationBean
  • In the Taverna 3 platform, these annotations are converted to SCUFL2 PropertyDefinitions, exposed through the ActivityService.
    • ..and when running a workflow, used for mapping from the properties of SCUFL2 PropertyObjects to the corresponding methods in the activity configuration bean.
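
The URI-based property naming described above can be illustrated outside the SCUFL2 API. The following is a minimal sketch in Python, where urljoin stands in for Java's URI.resolve. The constant ACTIVITY_URI mirrors the name used above and the namespace is the REST activity namespace quoted above; the helper property_uri is a hypothetical name introduced only for this illustration.

```python
from urllib.parse import urljoin

# Namespace of the REST activity, as quoted in the list above.
ACTIVITY_URI = "http://ns.taverna.org.uk/2010/activity/rest#"


def property_uri(name: str) -> str:
    """Resolve a short property name against the activity namespace.

    Stand-in for ACTIVITY_URI.resolve("#" + name) in the Java API.
    """
    return urljoin(ACTIVITY_URI, "#" + name)


print(property_uri("outgoingDataFormat"))
```

Resolving a fragment-only reference replaces the (empty) fragment of the namespace, which is why the short form and the full property URI are interchangeable.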

Issues with the current approach

  • Activity developers need to work with many different forms of the same configuration - PropertyObjects in the SCUFL2 world, Java objects in the activity world, and (RDF)/XML in serialization.
  • Difficulty in ensuring the correct keys (URIs) and structure across code like t2flow conversion, activity bean annotations and service discovery (e.g. WSDLs in Available Services).
  • The PropertyObject API is too verbose and a bit cumbersome to use, as it is modelled closely on the RDF tree model
  • Difficult to have a fallback GUI for modifying configurations of types which don't have custom UI code (or whose plugin has not been installed)
  • Where do the definitions live? Requiring an activity bean with annotations seems too restrictive, as it would effectively require some access to the activity implementation in order to define a workflow using it
    • Exposed by the Taverna 3 platform ActivityService - but not serializable or disentangled from the activity plugin
  • ConfigurationDefinition instances are not saved to the bundle
  • How are upgrades/conflicts detected and solved? If a new property is required by an execution environment, or a configuration uses a property that is not supported by the environment (say because the activity plugin is newer/older) - what happens?

Example of current approach

The Configuration node always includes an rdf:type and what it configures. The remaining properties, with somewhat dubious auto-generated XML prefixes, are a serialization of the PropertyObject tree.

This example shows how activity-specific properties like outgoingDataFormat and absoluteURITemplate coexist with the existing vocabulary HTTP in RDF to describe the HTTP method mthd and the RequestHeaders.

Current API usage

From https://github.com/myGrid/scufl2-examples/blob/master/src/main/java/com/example/WorkflowMaker.java#L318
to make a Beanshell script configuration:

Proposed solution

  • Configurations are described in JSON, which is strongly supported on pretty much any platform
    • Not a requirement, but the default activity configurations happen to also be valid JSON-LD so that RDF (somewhat equivalent to the current RDF) can be extracted
  • The SCUFL2 API does not provide its own object structure for configuration, but just refers to a Jackson JsonNode, which has convenience accessor methods equivalent to those of the SCUFL2 PropertyResource.
  • JSON configurations are passed through "as is" (e.g. as JsonNode objects) to the Taverna 3 platform
    • No additional mapping required on the engine side - activity keeps JsonNode and extracts properties when needed
    • .. although activities MAY independently choose to use Jackson ObjectMapper and annotate a 'classic' configuration bean
  • A JSON Schema is required for each configuration (draft v3 has most support, v4 is under development).
  • The JSON of a particular configuration, and its schema, are stored as separate files within the workflow bundle
  • The JSON-LD context is published on the URI described by @context (TODO: Should this also be included in the bundle?). A minimal context just defines a @vocab binding to an activity-specific namespace. (TODO: Can this be assumed/implied based on the configuration type URI?)
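
The "pass-through" idea above can be sketched as follows. This is an illustration only, written in Python with a plain dict standing in for a Jackson JsonNode. The class RestActivitySketch, the "@context" URL, all values, and the layout of the "headers" entries are assumptions made for this example, not the actual REST activity configuration.

```python
import json

# Hypothetical configuration document; the "@context" URL and all values are
# placeholders, and the layout of the "headers" entries is an assumption.
CONFIG_JSON = """
{
  "@context": "http://example.com/rest-activity-context.json",
  "absoluteURITemplate": "http://example.com/items/{id}",
  "outgoingDataFormat": "String",
  "headers": [
    {"header": "Accept", "value": "application/json"}
  ]
}
"""


class RestActivitySketch:
    """Keeps the parsed JSON tree and reads properties only when needed."""

    def configure(self, config: dict) -> None:
        # No bean mapping: the tree is stored exactly as it was parsed.
        self.config = config

    def uri_template(self) -> str:
        return self.config["absoluteURITemplate"]

    def request_headers(self) -> dict:
        # Optional property: fall back to an empty list when it is absent.
        return {h["header"]: h["value"] for h in self.config.get("headers", [])}


activity = RestActivitySketch()
activity.configure(json.loads(CONFIG_JSON))
print(activity.uri_template())
print(activity.request_headers())
```

The activity never converts the configuration into another object model; whether a property is required or optional is left to the schema rather than to the accessor code.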

TODO:

  • How are user-defined ports (such as in a beanshell script) specified? Move to SCUFL2 and use arbitrary referencing to ports in config - port rename-problem.

Example of proposed solution

This example shows how the REST activity configuration from above could be represented as JSON, validated according to a JSON Schema and mapped to RDF using a JSON-LD context.

See also https://gist.github.com/stain/5763598

configuration.json

The only 'magic' bit here is the "@context" which relates the JSON to the JSON-LD context - this would typically be a constant per activity type.

TODO: Should the JSON also have a relative link to the schema?

schema.json

This is the JSON Schema for the REST activity. It includes human-readable descriptions which in theory could be used by a fall-back GUI for editing the configuration, and provides help for third-party clients who generate or parse the configurations.

Note that in JSON Schema a property is optional unless it is stated as required. Note also that JSON Schema v4 changes how required properties are listed: instead of a flag on each property, it uses an array "required": ["prop1", "prop2"] one level higher. We specify the $schema version and should probably restrict which schema versions are supported in SCUFL2.
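
The difference between the two drafts can be made concrete with a small sketch. The two schemas below are hypothetical fragments written for this illustration (only the property names come from this page), and the helpers required_properties and missing_required are not part of any library; a real implementation would delegate to a JSON Schema validator.

```python
# Hypothetical schema fragments for the same configuration, one per draft.
SCHEMA_V3 = {
    "$schema": "http://json-schema.org/draft-03/schema#",
    "type": "object",
    "properties": {
        # Draft v3: "required" is a boolean flag on the property itself.
        "absoluteURITemplate": {"type": "string", "required": True},
        "outgoingDataFormat": {"type": "string"},
    },
}
SCHEMA_V4 = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    # Draft v4: "required" is an array of names, one level higher.
    "required": ["absoluteURITemplate"],
    "properties": {
        "absoluteURITemplate": {"type": "string"},
        "outgoingDataFormat": {"type": "string"},
    },
}


def required_properties(schema: dict) -> list:
    """List required property names, honouring the declared $schema version."""
    version = schema.get("$schema", "")
    if "draft-03" in version:
        props = schema.get("properties", {})
        return sorted(n for n, p in props.items() if p.get("required") is True)
    if "draft-04" in version:
        return sorted(schema.get("required", []))
    # Restricting the supported versions, as suggested above.
    raise ValueError("Unsupported JSON Schema version: %r" % version)


def missing_required(config: dict, schema: dict) -> list:
    """Names of required properties that the configuration does not set."""
    return [n for n in required_properties(schema) if n not in config]


print(missing_required({"outgoingDataFormat": "String"}, SCHEMA_V3))
print(missing_required({"outgoingDataFormat": "String"}, SCHEMA_V4))
```

Both calls report the same missing property, which shows that the two notations carry the same information and differ only in where it is written.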

schema.json

context.json

The JSON-LD context provides a mapping to RDF properties. Basically, the context says that the default namespace is http://ns.taverna.org.uk/2010/activity/rest#, that httpv: is expanded to the HTTP in RDF vocabulary, and that the terms headers, header, etc. are mapped to particular properties from that vocabulary. The @container property makes the RDF use RDF Containers for the list of headers, according to the vocabulary. (The default would be multiple property statements.)
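
How a context turns short JSON keys into property URIs can be sketched as below. This is a deliberately simplified illustration and not a JSON-LD processor: it handles only @vocab, prefix definitions and simple term definitions, and ignores @container. The CONTEXT dict is a hypothetical reconstruction; in particular the mapping of headers to httpv:headers is an assumption, and http://www.w3.org/2011/http# is used as the namespace of the W3C HTTP Vocabulary in RDF.

```python
# Hypothetical context; only the @vocab namespace is quoted from this page.
CONTEXT = {
    "@vocab": "http://ns.taverna.org.uk/2010/activity/rest#",
    "httpv": "http://www.w3.org/2011/http#",
    "headers": {"@id": "httpv:headers"},  # assumed term mapping
}


def expand_iri(value: str, context: dict) -> str:
    """Expand a compact IRI such as 'httpv:mthd' using a prefix definition."""
    prefix, sep, suffix = value.partition(":")
    if sep and not suffix.startswith("//") and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    return value  # already absolute, or no matching prefix


def expand_term(term: str, context: dict) -> str:
    """Map a JSON key to the URI of the RDF property it stands for."""
    if term.startswith("@"):
        return term  # JSON-LD keywords are left untouched
    definition = context.get(term)
    if isinstance(definition, dict):
        return expand_iri(definition["@id"], context)
    if isinstance(definition, str):
        return expand_iri(definition, context)
    if ":" in term:
        return expand_iri(term, context)
    # Undefined terms fall back to the default vocabulary.
    return context["@vocab"] + term


for key in ("outgoingDataFormat", "headers", "httpv:mthd"):
    print(key, "->", expand_term(key, CONTEXT))
```

Activity-specific keys need no entry of their own, which is why a minimal context can consist of just the @vocab binding.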

context.json

rdr-from-json.ttl

This Turtle document shows the generated RDF after processing with the JSON-LD playground. Note that the RDF has been "prettified" using CWM.

rdr-from-json.ttl

This RDF corresponds closely to the original RDF/XML, but leaves out typing information such as httpv:RequestHeader (which can generally be inferred from the vocabularies used). Typing could be included by adding "@type": "httpv:RequestHeader" etc. - but this would make the JSON slightly more verbose without adding any configuration meaning, except where there are multiple subclasses available.

profile-linking-to-json.rdf

As the configuration is no longer RDF/XML, there is no particular advantage in embedding it within the profile document, which already suffers from becoming massive in size. Additionally, the schema needs to be included; as this would typically just be copied from the JAR classpath, it makes most sense for it to be a plain file in the bundle. Therefore the JSON configurations also become plain files in the workflow bundle.

The Configuration element in the profile therefore shrinks from the current approach to something like:

profile-linking-to-json.rdf

To be decided:

  • Path pattern for configurations
  • Can configurations be non-JSON?
  • Path pattern for schema
    • How to avoid multiple copies of the same schema - e.g. just one per used activity type
  • Should the URI for the configuration/REST_Service.json be the same as the URI for the Configuration object, or a separate property like schema? E.g. can the same JSON file be reused by multiple activities? (If so, why not reuse the same activity instead?)

Proposed API usage

This simply modifies the JSON ObjectNode directly.
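
As a language-neutral illustration of editing the tree in place, the sketch below builds a Beanshell-style configuration with a plain Python dict standing in for the Jackson ObjectNode. The property name "script" and the script text are assumptions made for this example.

```python
import json

# A plain dict stands in for the configuration's JSON object node.
config = {}

# Hypothetical Beanshell configuration: "script" is an assumed property name.
config["script"] = 'out = "Hello, " + name;'

# The tree can be modified again later without any intermediate object model.
config["script"] = config["script"].replace("Hello", "Goodbye")

# Serialized form, as it could be stored as a file in the workflow bundle.
serialized = json.dumps(config, indent=2, sort_keys=True)
print(serialized)
```

Compared with building a PropertyObject tree, there are no property URIs or wrapper types in the calling code; the keys are plain strings.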

Setting the schema is a bit trickier as one probably would not want to write them by hand in code - typically if you build workflows like this without being connected to the T3 platform (which can provide the schemas as part of the ActivityService), the schemas would be simply added to the JAR and loaded from the classpath.

An alternative is to make the JSON Schema optional - but we don't consider that a good idea at the moment.
