
The projects that myGrid developers are involved in have recently developed approaches that are similar in many ways, addressing what can be called minimal information models, checklists, profiles or schema validation.

This page is a summary of a meeting on 2013-05-15 that identified and compared the different approaches. A consensus was reached to try to integrate the solutions further in two directions:

  • Identify the immediate requirements of BioVel/Components and agree on which solution(s) can satisfy them.
  • Further mapping and integration to a common minimal information model

Approaches:

Minimal information model (MIM) - Matthew Gamble

This approach centres on the minimal information model itself: MUST/SHOULD levels, nesting of requirements, and restrictions.

MIM Presentation (PDF)

Overview:

Key points:

  • RDF-based model - inspired by paper checklists in biology, similar to OWL constraints
  • mim:RequirementSet mim:hasMust/Should/OptionalRequirement some mim:Requirement
  • Restrictions allow cardinalities (min 4, max 15) and conditions
  • RequirementSets can be chained and reused
  • A mim:ReportSet is produced by checking - bottom-up - whether a mim:RequirementSet is satisfied
  • Implemented by converting to SPIN rules
  • REST service for Validator/reporter (source)
  • Rules could in theory be built from a not-so-complicated UI
  • A form/UI could in theory be generated to collect the data that fulfils a given MIM
  • Very quick in execution (given the requirement tree is not too deep)
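
The bottom-up check can be illustrated with a minimal sketch in plain Python rather than RDF/SPIN. The class names, the dictionary-shaped report and the rule that a nested set must itself be satisfied are assumptions made for illustration, not the MIM vocabulary or its actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Requirement:
    """Hypothetical stand-in for a mim:Requirement with a cardinality restriction."""
    prop: str
    min_count: int = 1
    max_count: Optional[int] = None  # None means no upper bound

    def satisfied(self, data: Dict[str, list]) -> bool:
        n = len(data.get(self.prop, []))
        return n >= self.min_count and (self.max_count is None or n <= self.max_count)


@dataclass
class RequirementSet:
    """Hypothetical stand-in for a mim:RequirementSet that can nest other sets."""
    name: str
    must: List[Requirement] = field(default_factory=list)
    should: List[Requirement] = field(default_factory=list)
    children: List["RequirementSet"] = field(default_factory=list)


def check(rs: RequirementSet, data: Dict[str, list]) -> dict:
    """Check nested sets first (bottom-up), then this set's own requirements."""
    child_reports = [check(child, data) for child in rs.children]
    failed_must = [r.prop for r in rs.must if not r.satisfied(data)]
    failed_should = [r.prop for r in rs.should if not r.satisfied(data)]
    # Assumption: only MUST failures and failing nested sets make a set unsatisfied.
    ok = not failed_must and all(c["satisfied"] for c in child_reports)
    return {"set": rs.name, "satisfied": ok, "failed_must": failed_must,
            "failed_should": failed_should, "children": child_reports}


contact = RequirementSet("contact", must=[Requirement("email")],
                         should=[Requirement("phone")])
sample = RequirementSet("sample",
                        must=[Requirement("replicate", min_count=4, max_count=15)],
                        children=[contact])

data = {"email": ["a@example.org"], "replicate": [1, 2, 3]}
report = check(sample, data)
print(report["satisfied"], report["failed_must"])
```

Here the nested "contact" set passes with a SHOULD warning for the missing phone, while the outer "sample" set fails because only three of the required four to fifteen replicates are present.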

MINIM - Graham Klyne/Jun Zhao (Wf4Ever)

The Wf4Ever project built on MIM to produce minim, which adds a plugin-like data model that enables different types of checks, such as SPARQL queries and checking for availability of web resources. They built a web front-end with "traffic lights" (more like ticks).

This approach is more centered on the actual checking or validation - rules can be executed, and the report is available in HTML.

Key points:

  • RDF-based model
  • More implementation-centric model (procedural rules rather than a description of the desired state)
  • No RequirementSet grouping or recursion
  • Pluggable types of rules and queries
  • Property matching described as SPARQL - allows arbitrary complexity in query 
  • Making a model manually requires detailed understanding of MINIM
  • Model includes "on error" and "on ok" messages for user ("Does not have a phone number")
  • REST service
  • Simple traffic light display with checklist marks
  • Quick in execution
  • Does not lend itself easily to UI/form generation
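
The pluggable rule types and per-rule user messages can be sketched as a small registry. This is an assumption-laden illustration in plain Python: the registry, the rule dictionaries and the `has_property` check (a stand-in for a SPARQL query over an RDF graph) are hypothetical and do not reflect the actual MINIM model or its service.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Registry of rule types; a new kind of check is added by registering a function.
RULE_TYPES: Dict[str, Callable[[dict, List[Triple]], bool]] = {}


def rule_type(name: str):
    """Decorator that registers a checking function under a rule type name."""
    def register(fn):
        RULE_TYPES[name] = fn
        return fn
    return register


@rule_type("has_property")
def has_property(rule: dict, triples: List[Triple]) -> bool:
    # Stand-in for a SPARQL ASK query: is there any triple with this predicate?
    return any(p == rule["predicate"] for _, p, _ in triples)


def run_checklist(rules: List[dict], triples: List[Triple]) -> List[Tuple[bool, str]]:
    """Run each rule with its registered checker and pick the matching message."""
    report = []
    for rule in rules:
        passed = RULE_TYPES[rule["type"]](rule, triples)
        report.append((passed, rule["on_ok"] if passed else rule["on_error"]))
    return report


rules = [
    {"type": "has_property", "predicate": "foaf:phone",
     "on_ok": "Has a phone number", "on_error": "Does not have a phone number"},
    {"type": "has_property", "predicate": "foaf:name",
     "on_ok": "Has a name", "on_error": "Does not have a name"},
]
triples = [("ex:alice", "foaf:name", "Alice")]

report = run_checklist(rules, triples)
for passed, message in report:
    print("tick" if passed else "cross", message)
```

A check for the availability of a web resource would be registered the same way under another type name, which is the sense in which the rule types are pluggable.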

Open PHACTS validator - Christian Brenninkmeijer

Used by the Open PHACTS project. This solution uses regular OWL subclasses and constraints, with an OWL annotation with rdfs:comment of "MUST" or "SHOULD" to modulate the subclass relation.

This approach centres on the OWL ontology that is to be followed. It is nevertheless recommended to add these annotations to a secondary ontology used just for this purpose, as the SHOULD restrictions would not be desirable in the general OWL vocabulary, where they could interfere with reasoning.

This example looks like this in Protege:

And in OWL Turtle:

Key points:

  • OWL-based model
  • An RDF instance is checked per type of the resource - the marked statements must be present in the graph (not just inferrable)
  • "Checklist" ontology hardcoded or dropped into folder of validator
  • Easy to design model for anyone who knows OWL already
    • But this could also be confusing as normal OWL semantics and reasoning is more like "A person must exist, let's create it!"
    • 'Regular' OWL subclasses throw an exception - however code could be changed to then assume a MUST
  • .. but only a few required (typical) OWL constraints work (properties, cardinality, and/or) - bails out with "not implemented" if you go beyond
    • No reason code can not be extended to handle more
  • Hierarchy through classes - for instance, the example above has hasChild some Person, so the rules for Person are also checked.
  • No deeper grouping or structuring of requirements - all or nothing
  • Follow-your-nose HTTP crawling for more Linked Data (warning): for instance, given hasChild <http://example.com/child>, it would also download <http://example.com/child> as RDF if <http://example.com/child> does not already satisfy the Person requirements
    • Able to convert github uris to raw github uris
    • Can follow ftp links
    • Can add user name and password if configured with them
  • Runs as a standalone service
    • With a web service (json, xml or http forms)
    • with a web service client on top of service to get back same interface as the stand alone service
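
The per-type check with MUST/SHOULD levels, explicit presence and class hierarchy can be sketched as follows. This is a simplified illustration in plain Python, not the validator's code: the checklist table stands in for the annotated OWL ontology, the class is passed in explicitly instead of being read from rdf:type, and every value of a property is checked against the target class, which is stricter than OWL "some".

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class Constraint(NamedTuple):
    prop: str
    level: str                              # "MUST" or "SHOULD"
    some_values_from: Optional[str] = None  # class whose rules apply to the values


# Hypothetical checklist, standing in for the annotated secondary ontology.
CHECKLIST: Dict[str, List[Constraint]] = {
    "ex:Parent": [Constraint("ex:hasChild", "MUST", "ex:Person")],
    "ex:Person": [Constraint("foaf:name", "MUST"),
                  Constraint("foaf:mbox", "SHOULD")],
}


def validate(subject: str, cls: str, triples: List[Triple],
             checklist: Dict[str, List[Constraint]],
             seen: Optional[Set[Tuple[str, str]]] = None) -> List[Tuple[str, str, str]]:
    """Return (level, subject, property) for every statement missing from the graph."""
    seen = set() if seen is None else seen
    if (subject, cls) in seen:              # guard against cycles in the data
        return []
    seen.add((subject, cls))
    problems = []
    for c in checklist.get(cls, []):
        # Only explicitly stated triples count; nothing is inferred.
        objects = [o for s, p, o in triples if s == subject and p == c.prop]
        if not objects:
            problems.append((c.level, subject, c.prop))
        elif c.some_values_from:
            for o in objects:               # also check the rules of the target class
                problems += validate(o, c.some_values_from, triples, checklist, seen)
    return problems


triples = [("ex:bob", "ex:hasChild", "ex:carol"),
           ("ex:carol", "foaf:name", "Carol")]
problems = validate("ex:bob", "ex:Parent", triples, CHECKLIST)
print(problems)
```

In this example the parent passes its own MUST, and the only finding is a SHOULD-level gap on the child reached through hasChild.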

SysMO-JERM minimal information models (Stuart Owen / Katy Wolstencroft)

Details to be filled in by Stuart / Katy

SCAPE/BioVel Taverna component profiles (Alan Williams / Donal Fellows)

An XML schema for describing the expected annotations on Taverna components (workflow, ports and activities).

This approach is quite application-specific for Taverna workflows. For instance, it distinguishes between the built-in annotations in Taverna (Example, Description) and arbitrary semantic annotations (which are embedded in the t2flow using a special annotation plugin).

An XML file declares a profile that a component is meant to conform to, and it contains various sub-requirements for its ports and activities.

It can be slightly confusing that there is an XML schema for making profile XML documents, and that these documents themselves contain XML schema-like structures such as minOccurs and maxOccurs. OWL/RDFS ontologies are referenced from within the document with <ontology id="components">http://purl.org/DP/components</ontology> and can then be used later as ontology="components" predicate="http://purl.org/DP/components#portType".

Example:

Key points:

  • XML document for defining profile (minimal information model)
  • Used to drive UI generation for filling in annotations (together with domain/range from ontologies)
    • Guiding the UI by having more specific ranges in the XML, for instance xsd:dateTime on dct:created (ontology says xsd:string); or only presenting a subset of possible foaf: properties
  • Semantics of profile is unclear and confusing - as the validator has not been made yet these details have not been settled
    • E.g. can an input port match several <inputPort>?
    • Must all activities match an <activity> if it is the same type?
    • What if multiple input ports could match several inputPorts, but you don't know which one is meant for which (they don't have name matching), and this "uses up" the port for later rules?
    • What is the purpose of a minOccurs=0 maxOccurs=unbounded test?
    • AND or OR logic?
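
To make these open questions concrete, here is a sketch of one possible reading of minOccurs/maxOccurs, in which a port may match several <inputPort> declarations and is never "used up". The profile fragment, the <semanticAnnotation> element name and the port dictionary are hypothetical, and the reading itself is an assumption, since the profile semantics have not been settled.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile fragment; only the attribute names come from the text above.
PROFILE = """
<profile>
  <ontology id="components">http://purl.org/DP/components</ontology>
  <component>
    <inputPort minOccurs="1" maxOccurs="1">
      <semanticAnnotation ontology="components"
          predicate="http://purl.org/DP/components#portType"/>
    </inputPort>
    <inputPort minOccurs="0" maxOccurs="unbounded"/>
  </component>
</profile>
"""


def check_input_ports(profile_xml, ports):
    """For each <inputPort> declaration, count the ports carrying all its predicates."""
    root = ET.fromstring(profile_xml)
    results = []
    for decl in root.iter("inputPort"):
        required = {a.get("predicate") for a in decl.findall("semanticAnnotation")}
        # Assumed reading: a port matches if it has every required predicate,
        # and the same port may match more than one declaration.
        matching = sorted(name for name, preds in ports.items() if required <= preds)
        lo = int(decl.get("minOccurs", "1"))
        hi = decl.get("maxOccurs", "1")
        ok = len(matching) >= lo and (hi == "unbounded" or len(matching) <= int(hi))
        results.append((ok, matching))
    return results


# Ports of a hypothetical component, mapped to the predicates they are annotated with.
ports = {"in1": {"http://purl.org/DP/components#portType"}, "in2": set()}
results = check_input_ports(PROFILE, ports)
print(results)
```

Under this reading the port in1 matches both declarations, and the second declaration (minOccurs=0, maxOccurs=unbounded) can never fail, which is exactly the kind of ambiguity a validator would have to resolve.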

Related work

W3C RDF Validation workshop

The 2013 W3C RDF Validation workshop has published its position papers.

Dublin Core application profiles

As raised by Karen Coyle on the BIBFrame mailing list and the OpenAnnotation mailing list:

The Dublin Core community has been working on a concept of Application Profiles (also sometimes called Community Profiles) that would seem to fit the BIBFRAME/Open Annotation use case.

An Application Profile (AP) is a way for a particular community to define their use of an ontology or a standard in the case where they may be using only a portion of the standard, or may be extending it. The AP cannot change the underlying standard or model, but it can narrow or expand its usage. It should therefore be entirely compatible with the underlying model.

The purpose of an AP is 3-fold:

  1. It gives a community a view that makes sense for its use cases, and is therefore easier for its members to understand
  2. It can be used by targeted systems (such as the library ILS's) to integrate the aspects of the standard that will be used in the community's data, without having to program for the entire standard if it isn't needed
  3. The AP can be used to enforce constraints that are not part of RDF/OWL, or that would have a negative effect on the sharing of data in the open. An AP could define cardinality (repeatable, mandatory, etc.), and could constrain values (e.g. require controlled authority lists for certain statements). These constraints are not fully compatible with the open world assumption of the Semantic Web, but are often desired for quality control within a community at the points of creation and use.

A simple example of an AP in the library world would be a system designed for small libraries that uses only a portion of the RDA data elements. Another example would be a special library, like a film archive, that selects the elements it needs from RDA but extends them for its special needs.

Note that #3 above could be used by the Open Annotation community to implement constraints that are in its standard but that cannot be defined in RDF/OWL. This includes pretty much everything in that standard that uses terms like "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL". It is precisely these types of constraints that the Dublin Core AP work hopes to address.

There is a proposal for an AP structure, but as yet not a fully formed machine-actionable version. The Dublin Core annual meeting in September, 2013, will have working session on this concept, and we hope that we can get some consensus on how to make this concept into a usable, actionable standard. It would be wonderful to have folks there from the Open Annotation community to join in this discussion.

It would be worthwhile to contact Karen Coyle and DC about our approaches and see whether further alignment is possible.

The structure of an Application Profile is a chained set of annotation templates: (figure reproduced from http://dublincore.org/documents/dc-dsp/#sect-2)

The proposed encoding of this as XML (not RDF) is shown by example:

Key points:

  • XML-centric - unclear relation to RDF
  • Dublin Core - big community
  • Tool support: ?
  • Status: Working Drafts (2008/2009) - to be re-addressed in 2013-10