What exactly happened to LSID?

What exactly happened to LSID? It seemed a technically sound approach, and one whose failure we would do well to learn from.

Cite as:

Stian Soiland-Reyes, Alan R Williams (2016):
What exactly happened to LSID?
myGrid developer blog, 2016-02-26.
http://dev.mygrid.org.uk/blog/2016/02/what-exactly-happened-to-lsid/
doi:10.5281/zenodo.46804

Life Science Identifiers (LSID) was an identification scheme for Life Science information, e.g. describing genes, proteins and species. It was created by the bioinformatics community and standardized in 2004 as the LSID specification through the Object Management Group. As of 2016 it is no longer used, except within the biodiversity community for identifying species.

LSID overview

The LSID specification defines four aspects of LSIDs:

  • LSID Syntax specifying a URN scheme, e.g. URN:LSID:rcsb.org:PDB:1D4X:22
  • LSID Resolution Service, an API for retrieving data and metadata for a given LSID
  • LSID Resolution Discovery Service – an API for finding LSID Resolution Services for a given LSID namespace
  • LSID Assigning Service – an API for minting new LSIDs

The APIs were specified as Java interfaces and WSDL Web Services (SOAP, plus basic WSDL bindings for HTTP GET and FTP retrieval). The specification also suggests a method for registering LSID Resolution Services through SRV records in DNS, as sketched below.
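As an illustration (not part of the specification text), here is a minimal Node.js sketch of what that SRV-based discovery could look like from a client; the _lsid._tcp service name and the example authority are assumptions based on the spec's SRV registration approach:

const dns = require("dns").promises;

// Sketch only: find candidate LSID Resolution Services for an authority via a DNS SRV lookup.
// The "_lsid._tcp." service name is an assumption based on the spec's SRV registration approach.
async function findResolver(authority) {
  // e.g. authority "rcsb.org" taken from URN:LSID:rcsb.org:PDB:1D4X:22
  const records = await dns.resolveSrv("_lsid._tcp." + authority);
  // Each SRV record carries a host name and port for a candidate resolution service
  return records.map(r => r.name + ":" + r.port);
}

findResolver("rcsb.org").then(console.log, console.error);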

The aims of LSIDs in 2004 were certainly promising:

LSIDs are expressed as a URN namespace and share the following functional capabilities of URNs:

  • Global scope: A LSID is a name with global scope that does not imply a location. It has the same meaning everywhere.
  • Global uniqueness: The same LSID will never be assigned to two different objects
  • Persistence: It is intended that the lifetime of an LSID be permanent. That is, the LSID will be globally unique forever, and may be used as a reference to an object well beyond the lifetime of the object it identifies or of any naming authority involved in the assignment of its name.
  • Scalability: LSIDs can be assigned to any data element that might conceivably be available on the network, for hundreds of years
  • Legacy Support: The LSID naming scheme must permit the support of existing legacy naming systems, insofar as they meet the requirements specified below.
  • Extensibility: Any scheme for LSIDs must permit future extensions to the scheme.
  • Independence: It is solely the responsibility of a name issuing authority to determine conditions under which it will issue a name.
  • Resolution: A URN will not impede resolution (translation to a URL).

Source: Life Sciences Identifiers Specification, page 15

So what went wrong?

The short answer is that LSID did not receive enough uptake. But we think the real reason for that is more complicated.

Lack of support from main actors

Lack of uptake might partly be because the importance of global identifiers in Life Sciences, although well known at the time (and obviously the motivation for creating LSID), did not receive enough focus from upstream bodies like funders, institutions and even PIs, and so remained a niche concern for the Semantic Web branch of Life Science data management.

Many large data providers in life sciences did not adopt LSIDs, probably because it meant too many changes to their architecture. Thus exposure, knowledge and skills around LSIDs did not propagate to the masses.

Data Management Policies (which would require repositories with identifiers for deposits) were in their infancy at the time; now they are pretty much mandated by both funders and institutions.

Technically there are of course many other potential reasons why LSID failed, which would have influenced the socio-political attitudes.

New URI scheme

LSIDs use their own URN namespace, urn:lsid, which was not supported by browsers or operating systems.

Adding support for resolving a sub-scheme of urn: to browsers seemed difficult, as it meant handling the whole urn: scheme; some even used another URI scheme, lsidres:, as a workaround.

When LSIDs were linked to at all, it was most commonly through application-specific http:// links that embed the LSID somewhere in the path or parameters, e.g. http://ipni.org/urn:lsid:ipni.org:names:986604-1:1.1.2.1.1.2, which uses an HTTP 303 See Other redirect to an HTML representation at http://www.ipni.org/ipni/plantNameByVersion.do?id=986604-1&version=1.1.2.1.1.2&output_format=lsid-metadata&show_history=true (and thus is a Cool URI).

urn:lsid was never registered with IANA as an official URN namespace and so remained a non-standard scheme.

Dependency on DNS records

While an LSID promised to be location independent (allowing multiple sources to describe the same object), and had provision for multiple alternative LSID resolvers to be discovered via http://lsidauthority.org/, this never manifested, and in practice an LSID was bound to a DNS domain name.

For such an LSID to be resolvable without additional configuration, the issuing authority had to make technical changes to its DNS records. At the time, most Life Science web services were not even using DNS CNAMEs to provide service names (e.g. repository.example.com), and Web Service URLs like http://underthedesk18762.institute.example.edu:8081/~phdstudent5/service2.cgi were unfortunately very common.

Asking such service providers to modify (and maintain!) their DNS SRV records was probably asking for too much. (As a side-note, the required SRV service type was not registered with IANA either).

In practice the reference implementation of the LSID Resolver Service also needed to run on its own port, which meant working through firewall changes.

The end result was that most LSIDs you could find during the 2000s were not resolvable through the LSID Resolution mechanism unless you already knew (by hard-coding) where the LSID Resolution Service was hosted.

Difficult to resolve

While tools like BioJava included support for resolving LSIDs (downloading the data), in practice LSIDs were rarely resolved programmatically, at least not through the LSID Resolution Service.

This could be because these services were difficult to find (see the DNS section above), poorly maintained (running on a separate port, often down for weeks until someone noticed), or difficult to call (they required SOAP libraries, which in the early 2000s suffered from many incompatibility issues).

In practice LSIDs were resolved as in the ipni.org example, with simple HTTP redirects from plain HTTP URIs, but only by services which supported “their own” LSIDs. Such HTTP redirections are simple to support in pretty much any server framework or client library.

And so this raises the question – what makes an LSID different from a plain old HTTP URI?

So the LSID design suffered from resolution requirements that made it trickier (or at least gave the impression of being trickier) to use and mint, yet in the end only the identifier part of LSID was used.

Lack of metadata requirements

LSID had provision for asking for metadata in addition to the data, with two separate methods: getData() and getMetadata().

While keeping a clear distinction between data and metadata seems like a good idea, in practice it can be quite hard, as one researcher’s metadata is another researcher’s data. There is also the question of whether this is metadata about the identifier (e.g. allocated in April 2004), metadata about the record (e.g. last updated in 2014), or metadata about the thing itself (e.g. discovered in 1823).

Often the distinction between a thing and the description of the thing can be blurry, which is a well-known problem for the Semantic Web (httpRange-14). Additionally, many data formats include their own metadata mechanisms, e.g. FASTA headers, which could be hard to keep in sync with the metadata in the LSID record.

The reference implementation for the LSID Allocation Service did not require any particular metadata, which meant that in practice you could get away with no metadata at all.

LSID metadata was provided in RDF, which at the time had poor application library support, forcing many to hand-write metadata in the awkward RDF/XML format (since largely superseded by JSON-LD and Turtle).

The landscape of RDF vocabularies and ontologies for describing biological information was quite rough in the 2000s, with many incompatible or confusing approaches, often requiring a full “buy-in” to a particular model. Practically the only commonly used vocabularies at the time were Dublin Core Terms and FOAF – but the LSID specification did not provide any minimal metadata requirements or guidance on how to form the metadata.

Today the simplicity of DC Terms has evolved into schema.org, useful for all kinds of lightweight metadata; detailed provenance can be described with PROV, and collective efforts like the OBO Foundry create domain-specific ontologies.

Rather than distinguishing between data and metadata, today we do content negotiation between different formats/representations (e.g. HTML, JSON, XML, JSON-LD, RDF Turtle), or embed metadata inline in HTML using markup such as RDFa.

Non-distributed allocation

Allocating LSIDs centrally through an LSID Allocation Service was tricky for distributed architectures, which were popular at the time.

For instance, Taverna Workbench version 1, running on desktop computers, used LSIDs to identify every data item created during a workflow run. In theory LSIDs were a good fit – as such data do not have a “proper home” yet, it is still good to give them identifiers early on and carry them along in the provenance trace.

But in practice Taverna had to “call home” to mygrid.org.uk’s LSID Allocation Service to get new identifiers. This meant that this LSID service became a single point of failure for any Taverna installation (unless overridden in local configuration), and any downtime would stop every workflow run. Yet these LSIDs were allocated blindly and sequentially; we (deliberately, for privacy reasons) did not record any metadata beyond “producer=Taverna” and had no ability to resolve the data from the LSID server.

And so in a later version of Taverna we changed the identifier code to generate random UUIDs locally, and allocated LSIDs like:

urn:lsid:net.sf.taverna:DataThing:b1b5c94f-d54d-4039-901c-3ad022e5845f

Now what makes that LSID URI any different from a URI with the prefix urn:uuid: or http://ns.taverna.org.uk/ ?
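To illustrate just how little central machinery that needs, here is a rough sketch (not Taverna’s actual code) of minting such identifiers locally in Node.js; the HTTP path shown is hypothetical:

const { randomUUID } = require("crypto"); // crypto.randomUUID() needs Node 14.17 or later

// Mint an identifier entirely offline: the uniqueness comes from the UUID, not from any service
const uuid = randomUUID();
const lsid = "urn:lsid:net.sf.taverna:DataThing:" + uuid;
const urnUuid = "urn:uuid:" + uuid;
const httpUri = "http://ns.taverna.org.uk/data/" + uuid; // hypothetical path, for comparison

console.log(lsid);
console.log(urnUuid);
console.log(httpUri);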

HTTP took over – death of WSDL

LSIDs were born in the early WSDL days. WSDL and SOAP were a glorified XML-RPC message-passing protocol that just happened to run over HTTP – in theory you could even transport SOAP messages over email!

WSDL users did not easily agree on common XML schemas, and so each web service would have its own schema for its operations and results.

So an <id> XML field (or was that an attribute?) that could be shared across multiple WSDL services seemed useful – hence the need for LSID.

There was no need for the LSID to be directly downloadable – as WSDL services always ran through a single HTTP endpoint and just modified which XML message requests were POSTed.

But this brought size challenges. The LSID Resolution protocol is primarily WSDL-based, which proved tricky with the getData() method returning base64-encoded bytes; these would make the SOAP libraries eat memory – anything over 50 MB became “big data”!

In plain HTTP we have long known how to transfer largish files, while LSID had to fall back on a separate byte-range variant of getData() which, if supported at all, complicated downloads for clients.
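For comparison, a sketch of the plain-HTTP equivalent – an ordinary GET with a standard Range header – assuming a fetch-capable environment and a purely illustrative URL:

// Fetch just the first mebibyte of a large file over plain HTTP (URL is illustrative only)
fetch("http://example.org/data/large-dataset.xml", {
  headers: { Range: "bytes=0-1048575" }
}).then(response => {
  // 206 Partial Content if the server supports range requests, 200 with the full body otherwise
  console.log(response.status);
  return response.arrayBuffer();
}).then(bytes => console.log(bytes.byteLength + " bytes received"));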

In 2000, Roy Fielding published his PhD thesis Architectural Styles and the Design of Network-based Software Architectures – summarised by its mantra “hypermedia as the engine of application state” – effectively establishing Representational State Transfer (REST) as the software architectural style of the Web.

REST services are now ubiquitous on the web, and with JSON taking over as a much simpler data format than XML, REST now powers not just most bioinformatics web services but also today’s modern mobile apps and web applications, like Facebook, Twitter and Gmail.

So with REST, HTTP resources and their URIs become first-class citizens again (rather than WSDL’s lonely endpoints). HTTP URIs are resolvable directly to both human (HTML) and computational representations (JSON, XML, RDF) thanks to content negotiation.

So I can use http://purl.uniprot.org/uniprot/P99999 to refer to a protein, and even if you have no special code to resolve this, you can just paste the link into your browser and read about the Cytochrome c protein. But if you retrieve that UniProt URI as Linked Data, then you can programmatically access the data, or retrieve just the FASTA sequence. Even as a uniprot.org-assigned identifier, the URI works well with third-party APIs, e.g. with Open PHACTS to retrieve the related pathways.
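As a sketch of what that looks like programmatically (assuming a fetch-capable environment, and assuming the service honours an RDF Accept header through content negotiation):

const uri = "http://purl.uniprot.org/uniprot/P99999";

// Ask for an RDF (Turtle) representation of the same URI via content negotiation;
// a browser asking for text/html at the same URI gets the human-readable page instead.
fetch(uri, { headers: { Accept: "text/turtle" } })
  .then(response => response.text())
  .then(turtle => console.log(turtle));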

As a plain old URI, http://purl.uniprot.org/uniprot/P99999 can be added to web pages, publications and other repositories without any further explanation.

Centralization

Over the last decade we have moved back towards centralised architectures. Our architectures may consist of distributed REST services (e.g. Elasticsearch and Apache CouchDB) and horizontally scalable platforms (e.g. Apache Hadoop, Docker microservices), but they now run on centralised cloud services owned by a handful of companies like Amazon AWS, DigitalOcean and Microsoft Azure.

Large data integration efforts like Uniprot and ChEMBL have also effectively centralised ID allocations in bioinformatics. Yet after the advent of next-gen sequencing and robotic synthesis and analysis we are also producing more data than ever before.

Difficult to integrate

This is just speculation, but given the state of Web Services support in server frameworks at the time of LSID, it would have been very hard for actors like the EBI and UniProt to modify their existing web-based services to support the LSID protocol.

Thus they would have needed to set up a separate LSID Resolution Service, which would not have known anything about their existing ID schemes – schemes they understandably did not want to change overnight.

Done again today (with 20/20 hindsight) as a plain HTTP redirection-based service, LSID could easily be implemented even in modern frameworks like node.js or Ruby on Rails without needing any special libraries.
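As a rough sketch of that hindsight resolver – a plain HTTP redirection service in node.js with Express (the mapping and URLs are purely illustrative):

const express = require("express");
const app = express();

// Hypothetical mapping from LSIDs to the application URLs that actually serve the records
const records = {
  "urn:lsid:example.org:names:986604-1": "https://example.org/names/986604-1"
};

app.get("/:lsid", (req, res) => {
  const target = records[req.params.lsid];
  if (target) {
    res.redirect(303, target); // 303 See Other, Cool URI style, as in the ipni.org example
  } else {
    res.sendStatus(404);
  }
});

app.listen(8080);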

Conclusion

LSIDs were doing all the “right things” for their time: defining a location-independent URN scheme, using DNS SRV entries, providing WSDL services, separating data and metadata, using RDF, and providing discovery mechanisms and alternative resolution services.

Yet it can be argued that LSIDs, in many ways just like SOAP, were a complicated way to replicate something that could already do the job – the Web and plain old http:// URIs.

Perhaps we needed to go the long way around to figure it all out.

Ember JS Flash Message Alerts

There are lots of ways to do this, and some plugins out there (e.g. aexmachina’s ember-notify and poteto’s ember-cli-flash), but here is my take* on Rails-style flash messages for Ember, as seen in the Open PHACTS Explorer 2.

Use a simple array controller to keep track of the current messages, a model to hold the message and what type of alert it is, a view with an action to remove the alert, and a template to show it on the page.

The page needs a template containing the flash messages.

<script type="text/x-handlebars">
    {{#each flashMessage in controllers.flash.content}}
        {{view "flash" contentBinding="flashMessage"}}
    {{/each}}
</script>

It also needs a template for the flash message.

<script type='text/x-handlebars' id='flash'>
    {{#if flashMessage.isNotice}}
    <div class="alert notice">
        {{flashMessage.message}}
        <button type="button" class="right" {{action "click" flashMessage target=view}}><span>×</span></button>
    </div>
    {{/if}}
    {{#if flashMessage.isSuccess}}
    <div class="alert success">
        <button type="button" class="right" {{action "click" flashMessage target=view}}><span>×</span></button>
        {{flashMessage.message}}
    </div>
    {{/if}}
    {{#if flashMessage.isError}}
    <div class="alert error">
        <button type="button" class="right" {{action "click" flashMessage target=view}}><span>×</span></button>
        {{flashMessage.message}}
    </div>
    {{/if}}
</script>

We need a controller to keep track of all the flash messages.

App.FlashController = Ember.ArrayController.extend({
    createFlash: function(options) {
        if (options.type !== null && options.message !== null) {
            this.pushObject(this.get('store').createRecord(
                "flashMessage", {
                    type: options.type,
                    message: options.message
                }
            ));
        }
    }
});

We also need a model for a flash message.

App.FlashMessage = DS.Model.extend({
    type: DS.attr('string'),
    message: DS.attr('string'),
    isNotice: function() {
        return this.get("type") === "notice";
    }.property("type"),
    isSuccess: function() {
        return this.get("type") === "success";
    }.property("type"),
    isError: function() {
        return this.get("type") === "error";
    }.property("type")
});

If a controller wants to create a flash message then it ‘needs’ the Flash controller. Here the Application controller creates a flash message in response to an action.

App.ApplicationController = Ember.Controller.extend({
    needs: ['flash'],
    actions: {
        createFlashNotice: function() {
            this.get('controllers.flash').createFlash({
                type: "notice",
                message: "I'm a flash notice."
            });
        },
        createFlashError: function() {
            this.get('controllers.flash').createFlash({
                type: "error",
                message: "I'm a flash error."
            });
        },
        createFlashSuccess: function() {
            this.get('controllers.flash').createFlash({
                type: "success",
                message: "I'm a flash success."
            });
        }
    }
});

We need a view for a flash message which can remove it in response to a user action.

App.FlashView = Ember.View.extend({
    templateName: 'flash',
    classNames: ['hide'],
    didInsertElement: function() {
        this.$().fadeIn(1000);
    },
    actions: {
        click: function(alert) {
            this.get('controller').get(
                'controllers.flash').removeObject(
                this.get('content'));
            this.destroy();
        }
    }
});

JS Bin example. Click on one of the buttons to create a flash message and remove it by clicking on the ‘x’.

Thanks to Eric Berry for the ‘flash’ of inspiration.

* I originally wrote this for the Ember guides cookbook but they are moving away from this format and relying on blog posts etc instead.

Ember 1.10

(It’s been a while….)
So, a few months after updating to Ember 1.7 in the Open PHACTS Explorer 2 I had a look at what the Ember folks have been up to. The latest release is now 1.10 with 1.11 just around the corner. Wow. It’s not a straightforward update to 1.10 but not too hard either. Here are the headlines.

each helper

The {{#each}} helper now requires you to name the scope instead of inferring it from the controller. So you need to change

{{#each}}
{{this.title}}
{{/each}}

to

{{#each post in model}}
{{post.title}}
{{/each}}

view helper

Syntax like {{view Ember.TextArea}} is now {{textarea}}. If you keep the old form you will see an error mentioning the global lookup of views.

Template compiler

With the transition towards the HTMLBars templating engine you will need to include the ember template compiler in your code. You can get it from http://builds.emberjs.com/release/ember-template-compiler.js.

I have changed my old blog example to meet these new requirements.

Ember is also embracing NodeJS in how Ember apps are created and structured, and it is worth looking at Ember CLI to get yourself prepared. It is mentioned on the Ember homepage, but the documentation has not yet found its way into the main Ember guides.

You want to do what? There’s an app for that…….

tl;dr I don’t want to write web applications which have code to do everything the world could conceivably want to use them for. I want the web to tell me what other applications can do that thing. The Web Intents standard met that requirement before it was abandoned.

The problem

The Open PHACTS Explorer is a JavaScript-based web application which is used to browse the Open PHACTS Pharmacology API. As well as showing facts about compounds, proteins, pathways etc., it can also be used to view and edit compound structures. It uses Ketcher for this. In an iFrame. Ouch. Writing our own version of this would take a long time and isn’t really our core business. Another part of the application shows proteins, and a branch can render these in 3D using GLmol. This required some integration which took time we rarely have, and there is always the worry that introducing a third-party piece of code could break your application in ways you do not notice.
What I really want is for third-party ‘widgets’ to take over the heavy lifting for those bits of the application. If you click on a link to a PDF or on a mailto: link, the browser knows what to do with those things. How cool would it be if the same could be done for a link to a molecule in PDB or a compound SMILES property? The browser would know that the link was to a SMILES through some magic formatting and pop up a dialog box – “Do you want to go to the page for that compound or view it in the compound editor?”.

The solution

The Android mobile platform has the concept of “Intents“. When you click on an email link it asks you what application you want to use to fulfil this “Intent”. When you design an application for Android you tell the outside world what Intents your application can handle. This lets the device give you the option of using it when you want to do something that application can handle.
Imagine if you could do the same in a desktop web application.
Enter the “Web Intent”. This is a W3C specification which defines a method for client-side (i.e. browser) service discovery. An application registers its intents via an HTML tag like the one below, or in a web application manifest file.


<intent action="http://webintents.org/share"
type="image/*"
href="share.html"
disposition="window|inline"></intent>

A client application invokes an intent using some JavaScript:


var intent = new Intent("http://webintents.org/share",
"text/uri-list",
"http://news.bbc.co.uk");
window.navigator.startActivity(intent);

The browser then tries to find any installed apps which meet the intent. In this case a URL sharing application. Sounds perfect. But it was not to be.

Warning: Deprecated in Chrome 24. Web intents are no longer supported.

See here for details.

The future

Maybe there is some light at the end of the Web Intents tunnel though. This post implies that Mozilla might be actively looking at something, and Google may be willing to join them. Other people are thinking about specifications.

Putting all the pieces together

In parallel to the Open PHACTS Explorer we have been providing widgets to the BioJS registry. A BioJS widget is a snippet of HTML, CSS & JS with a documented API which generally does one life-science type of thing. For example, here is a compound info page. Click on the Edit button to see the re-usable code in JSBin. The registry is a little bit rough around the edges at the moment, but the idea seems pretty sound. Imagine if you could easily install one of these widgets in your browser just by clicking a button, which also meant that the widget registered its intents. Any web page you were viewing could then use that widget’s intents.
Imagine then if you clicked on a link on that page and your browser couldn’t handle that intent, but knew that there are some apps in the chrome/firefox/whatever store that could. It could then take you to a page allowing you to decide which of these you wanted to use. Maybe the intent information is actually embedded in the link’s href or another property and doesn’t even need any JavaScript like the Web Intents spec implied.
It is quite possible that in the future the BioJS widgets could also be packaged up as web components. This would seem to be a perfect match for Web Intents. Or is this another dead end?

Who has the influence?

As a final point I make the following observation.

There always seem to be several different complementary (or conflicting, depending on your point of view) things happening in the web standards world at any one time. How does anyone manage to keep track of them? How do they then manage to influence them while keeping their workload to a sensible level and sanity intact? Answers on a postcard……

IE6 is dead, maybe we can move on

Now that IE6, 7 & 8 have been sent to the heavenly software repository(1), it’s maybe time to stop relying on jQuery quite so much and have a look at some of the ‘new’ things which JavaScript ES5 can do natively. We’ve been clinging on to things like $.each and $.isArray when JavaScript and most browsers can do it natively using Array.prototype.forEach and Array.isArray (see the quick side-by-side below).
If you really, really, really have to support these old browsers then maybe you can just use the polyfills in the Mozilla reference; at least then you can definitely move on and use JavaScript as it should be.
As the ES5 reference shows there is widespread support for most of the features; unfortunately the same cannot yet be said for ES6.
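For the record, a quick side-by-side (assuming jQuery is loaded for the old-style lines):

var items = ["alpha", "beta", "gamma"];

// The jQuery style we have been clinging on to
$.each(items, function (i, item) { console.log(i, item); });
console.log($.isArray(items));

// Native ES5 equivalents, supported by everything newer than IE8
items.forEach(function (item, i) { console.log(i, item); });
console.log(Array.isArray(items));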

(1) Rumours of their complete death may be exaggerated but the downward trend gives us hope for the future.

Seeing Triple

The Semantic Web, Linked Data and their modelling language, the Resource Description Framework (RDF), have been around for over a decade now. It’s been a slow burner, and take-up by organizations outside of the media (BBC, Guardian) and academia has been slow. Why is this? Maybe it’s down to the perception that it is hard, and also our entrenchment in 40 years of RDBMS and SQL. I’ve been involved in several projects (e.g. Open PHACTS, GMDSP) where Linked Data has been used to free data from its silos and enrich its meaning and usefulness across and between disciplines and business domains. This post describes the basics behind RDF and gives a simple introduction to modelling a resource.

Linked Data is simply a collection of facts, expressed as triples written in RDF, in the form “something has a property with a value”. This bit is really important and underpins the whole concept. Remember and repeat: “Something has a property with a value…”

The something here is the resource in RDF. In RDF terms this translates as subject (something) has predicate (property) with object (value). You will hear the terms Subject, Predicate, Object a lot and will also come across their shorthand form S, P, O. Let’s try an example statement “Bob has blue eyes”. The subject/something is Bob. The property/predicate is eye colour. The object/value of the property is blue. We now have a triple:

Something: "Bob"
Property: "eye colour"
Value: "blue"

Someone else may want to talk about the properties of Bob, so how can we ensure that we are all talking about the same person? In Linked Data, if we want two things to be the same we need to use the same “Uniform Resource Identifier” or URI. You use URIs all the time when visiting websites, although you probably use the term URL. URI is just a fancier term where the things that they describe do not have to be browsable or even exist on the web, although it is great if they do. Let’s define a URI for Bob. We can use anything that uniquely identifies Bob. It could be a personal website, Google Plus page, ORCID identifier, GitHub ID etc. It doesn’t really matter as long as it is something that we know uniquely represents Bob. Let’s say it is a personal website, http://example.com/bob. So now we have

Subject/something: http://example.com/bob
Predicate/property: "eye colour"
Object/value: "Blue"

Let’s say we had all our friends represented in RDF and we wanted to find out who had blue eyes. We could search for the predicate “eye colour”. But what if someone used the US English spelling “color”? Or spelled something wrong? Or used a more elaborate phrase? Just like the subjects, the predicates and the objects can be represented by URIs. Then we know that other resources described with these URIs are representing the same things. You can google to find URIs to use in your RDF, and there are also lots of predefined ones from DBpedia, Dublin Core, RDF, RDFS, XSD, FOAF, vCard and others. We call these collections of properties ontologies. You can search for them here, with the top 100 searches at this site shown here. We will use two URIs from dbpedia.org, which gives us

Subject: http://example.com/bob
Predicate: http://dbpedia.org/ontology/eyeColor
Object: http://dbpedia.org/page/Blue

The predicate is http://dbpedia.org/ontology/eyeColor and the object is http://dbpedia.org/page/Blue. The part which describes the collection of properties is known as the prefix, e.g. http://dbpedia.org/ontology/. You can click on these links and look them up in your browser.

What about Bob’s name? We can’t expect humans or machines to know that it is Bob purely from the identifier http://example.com/bob, so we use another property from the FOAF ontology.

S: http://example.com/bob P: http://xmlns.com/foaf/0.1/name O: "Bob"

Let's add another property for Bob's age

S: http://example.com/bob P: http://xmlns.com/foaf/0.1/age O: 28

We can also let people know that Bob is a person, rather than a cat, dog, chair, bicycle etc. In RDF we should enclose the URIs in angle brackets, which would look like:

<http://example.com/bob> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>

It is more common to use the shorthand 'a' for type rather than http://www.w3.org/1999/02/22-rdf-syntax-ns#type like this:

<http://example.com/bob> a <http://xmlns.com/foaf/0.1/Person>

Now we have a few triples which describe ‘Bob’, but if we want to share them with someone else then we need them to understand our format. Luckily there are standard ways of ‘serialising’ RDF. We will use Turtle, but others are available.


<http://example.com/bob> <http://dbpedia.org/ontology/eyeColor> <http://dbpedia.org/page/Blue> .
<http://example.com/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
<http://example.com/bob> <http://xmlns.com/foaf/0.1/age> 28 .
<http://example.com/bob> a <http://xmlns.com/foaf/0.1/Person> .

The full stop ‘.’ denotes the end of one triple. There is a lot of repetition here, and Turtle allows us to group sets of triples about the same subject.


<http://example.com/bob> 
    <http://dbpedia.org/ontology/eyeColor> <http://dbpedia.org/page/Blue> ;
    <http://xmlns.com/foaf/0.1/name> "Bob" ;
    <http://xmlns.com/foaf/0.1/age> 28 ;
    a <http://xmlns.com/foaf/0.1/Person> .

The semi-colon ‘;’ says that the next predicate–object pair is about the subject above it. In our example there are 4 statements about the same subject. Just like our earlier example, the full stop ‘.’ says that that is the end of our statements about a particular subject.

There is still a lot of repetition here, and we can use prefixes as a shorthand for our URIs to make them easier to read.


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dbpedia:   <http://dbpedia.org/ontology/> .

<http://example.com/bob>
    dbpedia:eyeColor dbpedia:Blue ;
    foaf:name "Bob" ;
    foaf:age 28 ;
    a foaf:Person .

In this example, wherever we see, for example, foaf we mean http://xmlns.com/foaf/0.1/. So foaf:name means http://xmlns.com/foaf/0.1/name.

Coding without coding: announcing the Open PHACTS HTML widgets

What if you could have an element in your web page which is replaced by some HTML depicting some really cool chemistry stuff, without having to know a lot about HTML, CSS, JavaScript or how the Open PHACTS API works? Well, that’s our intention with the OPS HTML widgets. They are built using the ideas and templates we designed for the Explorer, but without the full Ember JS MVC stack. It’s early days for the library and there are only a couple of widgets available right now. We are also looking at aligning with BioJS.

Just by adding a div with a specific class and a data attribute you can get facts (phacts!) and images for chemical compounds and targets embedded in your page with no need to know anything about the Open PHACTS API.
You need to load the jQuery, Handlebars, ops.js and ops-html-widgets libraries in your page, and then you can either insert the info using a div like


<div id="compound-info-div" style="display: none;"></div>

or programmatically using some JavaScript like


var compoundWidget = new Openphacts.CompoundWidget(appUrl, appID, appKey);
compoundWidget.infoByURI("http://www.conceptwiki.org/concept/dd758846-1dac-4f0d-a329-06af9a7fa413", "compound-info-div");

You can style the divs and include as much or as little info as you want, using simple Handlebars tags like


<div id="compound-info-div" style="display: none;">
  <div>Preferred Label: {{prefLabel}}</div>
  <div>SMILES: {{smiles}}</div>
  <div>Inchi: {{inchi}}</div>
</div>

The results from one of these compound-info-divs have magically appeared below. Honest.




Ruby 1.9.3 on Scientific Linux (& Centos)

The default Ruby on Red Hat (Enterprise) based distros is still 1.8.7, which is almost completely unsupported. I don’t like using RVM on production machines since I don’t want to manually update versions. A search around the web revealed that RHEL provides extra packages as software collections. Scientific Linux also has this collection available here.
I installed this on a fresh machine with no Ruby installed; it might be a good idea to remove any 1.8.7 packages first with

yum remove ruby

I thought you could just run

yum install yum-conf-softwarecollections

but I couldn’t get it to work, so I just downloaded the rpm and installed from the local file with

rpm -ivh yum-conf-softwarecollections-1.0-1.el6.noarch.rpm

Then you can install ruby 1.9.3 with

yum install ruby193

However, Ruby probably won’t be on your path yet. It is installed in /opt/rh/ruby193/. Inside this directory is a file called enable, which has the commands to add the binaries to your path and also set up the shared libraries. Add these commands to your .bashrc (e.g. by sourcing the enable file), update your environment with

. .bashrc

and run

ruby -v

We also need to ensure that Apache can pick it up; I have added it to the top of /etc/init.d/httpd, and hopefully that will be enough. Another issue is Delayed Job: we need to ensure that the Delayed Job daemon is using the version of Ruby we installed. Perhaps adding it to the Delayed Job startup script will do the trick.

Here is some info about software collections for CentOS, which I have not had a chance to try but looks like a good bet.

Good luck.

Delayed Job missing method in production fixed

A few months ago I had an issue with Phusion Passenger not starting due to a missing Delayed Job method. Yesterday I finally got round to figuring out why. It turns out that the delayed job gem was being loaded in the Rails assets group in the Gemfile, which of course is not used in production. Usually it only takes an hour or two away from the keyboard to spot these things. This time it seems there were a lot of trees obscuring the wood.
