
Who is this guide for? Developers, webmasters, or managers of life science training websites.

Goal of guide - This guide explains how to structure the data in your website so that it can easily be included in TeSS, a registry of training materials and events.

Contact - Email niall.beard@manchester.ac.uk to report mistakes, request clarification, or ask for advice.

Content Aggregation in TeSS

The TeSS portal automatically aggregates training material and event metadata from many websites. To do this, we write scrapers and parsers that extract information about your resources every night and update the records in TeSS accordingly.

Minimum information needed to be included in TeSS

The minimum required fields for a Material to be entered into TeSS are:

  • Name
  • URL
  • Description

For an Event, the fields above are required as well as:

  • Start date
  • End date
  • Location
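
As a rough illustration, the minimum fields above can be checked before a record is published. The sketch below is in Python. The dictionary keys (`start_date`, `end_date`, and so on) and the example event are names invented for this example; they are not the official TeSS field names.

```python
# Field names are illustrative assumptions, not the official TeSS schema.
MATERIAL_REQUIRED = ("name", "url", "description")
EVENT_REQUIRED = MATERIAL_REQUIRED + ("start_date", "end_date", "location")


def missing_fields(record, required):
    """Return the required fields that are absent or blank in record."""
    return [field for field in required
            if not str(record.get(field) or "").strip()]


material = {
    "name": "De Novo Assembly @ TGAC 2015",
    "url": "http://mygoblet.org/training-portal/materials/de-novo-assembly-tgac-2015",
    "description": "",  # blank on purpose, to show a failing check
}

# Hypothetical event used only for this example.
event = {
    "name": "Example course",
    "url": "http://example.org/course",
    "description": "A hypothetical two-day course.",
    "start_date": "2016-05-01",
    "end_date": "2016-05-02",
    "location": "Manchester, UK",
}

print(missing_fields(material, MATERIAL_REQUIRED))  # ['description']
print(missing_fields(event, EVENT_REQUIRED))        # []
```

A check like this could run as part of whatever process publishes your pages or feed, so that incomplete records are caught before TeSS tries to read them.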

Further Fields

The full events specification table is here https://docs.google.com/document/d/12O8gsOuH2qHKpis6vTHxXExraNH5XZEnoilwNgWWgaQ/edit#heading=h.uc6mc8iukqe

The full materials specification can be found here https://docs.google.com/document/d/1HG2fEjCoDUE4tn1XZ_ZIeWLEFXnI3YtS_FRIIFIbv-s/edit#heading=h.mca2et5tnibo

We are working on clearer documentation, but for now note that the important fields are the ones with values in the last 3 columns. The rest are inherited and are unlikely to be of any use.

 

These tables list several mandatory fields in addition to those above. They are mandatory for Bioschemas-compliant content. Leaving them empty will not affect your inclusion in TeSS, but filling them in is highly recommended because it improves the findability of your materials.

For materials these are:

For events these are:

Motivation for structuring your data 

You can request that we write a scraper to extract information from your HTML pages, but this approach is fragile: any change to your HTML or CSS class names will break the scraper.
Instead, we ask you to structure your metadata first so that
  • we can produce a more robust extractor that will not break after HTML changes
  • search engines will be able to understand your content and, as a result, are likely to rank your pages higher in search results
  • we have fewer problems understanding ambiguous content
  • our jobs will be easier ;)

Methods of structuring

There are three main ways to structure your data.
1. Use schema.org types to mark up your HTML
2. Make an XML dump containing all your materials/events. For example, DTLS in the Netherlands does this: http://www.dtls.nl/courses/feed/?filter_course=active
3. Use our API

schema.org is our recommended method, because it means that search engines, as well as TeSS, will understand your content better. The benefit is that when people search for topics covered by your site, your pages are likely to rank higher in the results.

XML dumps are often quite simple to produce with modern CMSs and web frameworks. They have the advantage that our scrapers will fetch a single page once a day rather than every page of your site, so they will not skew your visitor statistics.

The TeSS API allows you to upload your training materials and events directly to TeSS. We currently have an API client in Ruby only. The benefit is that your statistics are not skewed; the downside is that you will have to maintain the code, run it regularly, and potentially update it if our API changes. Our API client can be found on GitHub.

schema.org

You can mark up the HTML pages in your site with schema.org types in either microdata or RDFa format.
The http://schema.org/CreativeWork type is used for training materials and the http://schema.org/Event type is used for events.

The methods you can use to mark up a site using schema.org are:
  • Manually add markup elements to HTML 
  • Use an extension/Module in a CMS 
  • Use a library in a framework 
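
To give a feel for what marked-up output looks like, here is a minimal Python sketch that renders an event as an HTML fragment with schema.org microdata. The `itemprop` names (`name`, `url`, `description`, `startDate`, `endDate`, `location`) are schema.org Event properties. The function name, the dictionary keys, and the example event are assumptions made for this illustration, and your CMS or framework will have its own templating mechanism.

```python
from html import escape


def event_to_microdata(event):
    """Render an event dict as an HTML fragment with schema.org microdata."""
    name = escape(event["name"])
    url = escape(event["url"])
    description = escape(event["description"])
    start = escape(event["start"])
    end = escape(event["end"])
    location = escape(event["location"])
    return (
        '<div itemscope itemtype="http://schema.org/Event">\n'
        f'  <a itemprop="url" href="{url}"><span itemprop="name">{name}</span></a>\n'
        f'  <p itemprop="description">{description}</p>\n'
        f'  <time itemprop="startDate" datetime="{start}">{start}</time>\n'
        f'  <time itemprop="endDate" datetime="{end}">{end}</time>\n'
        '  <span itemprop="location" itemscope itemtype="http://schema.org/Place">\n'
        f'    <span itemprop="name">{location}</span>\n'
        '  </span>\n'
        '</div>'
    )


# Hypothetical event used only for this example.
example_event = {
    "name": "Genomics & Assembly Workshop",
    "url": "http://example.org/events/assembly",
    "description": "A hypothetical two-day hands-on course.",
    "start": "2016-05-01",
    "end": "2016-05-02",
    "location": "Manchester, UK",
}

html_fragment = event_to_microdata(example_event)
print(html_fragment)
```

Values are passed through `html.escape` so that characters such as `&` in a title do not break the markup.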

An example of a material marked up with schema.org can be seen on the Goblet pages (which use Drupal 7).

This page has RDFa markup in the HTML (right click > inspect to view): http://mygoblet.org/training-portal/materials/de-novo-assembly-tgac-2015
schema.org RDFa snippet
<div id="node-716" class="node node-training-material clearfix" about="/training-portal/materials/de-novo-assembly-tgac-2015" typeof="schema:CreativeWork sioc:Item foaf:Document">
    ...........
    <meta content="De Novo Assembly @ TGAC 2015" about="/training-portal/materials/de-novo-assembly-tgac-2015" property="schema:name"/>
    .......
    <div class="field-item even" rel="schema:genre">
        <a href="/edam-topic/de-novo" typeof="skos:Concept" property="rdfs:label skos:prefLabel" datatype="">De Novo</a>
    </div>
    ..........
</div>

We recommend using http://linter.structured-data.org to verify your schema.org markup is correct.

When you run the Goblet page through this tool, you can see that the data can easily be extracted: http://linter.structured-data.org/?url=http:%2F%2Fmygoblet.org%2Ftraining-portal%2Fmaterials%2Fde-novo-assembly-tgac-2015
We use a similar tool for TeSS, so this nicely illustrates how we extract your structured data for uploading into TeSS.
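
The idea of extraction can be sketched in a few lines of Python using only the standard library. This is not the tool TeSS uses, and it is far from a full RDFa processor: it only collects `schema:` types from `typeof` attributes and `schema:` properties whose value is given in a `content` attribute, as in the Goblet snippet. The class name is hypothetical.

```python
from html.parser import HTMLParser

# Trimmed version of the Goblet markup, kept only to the RDFa-relevant parts.
SNIPPET = """
<div about="/training-portal/materials/de-novo-assembly-tgac-2015"
     typeof="schema:CreativeWork sioc:Item foaf:Document">
  <meta content="De Novo Assembly @ TGAC 2015"
        about="/training-portal/materials/de-novo-assembly-tgac-2015"
        property="schema:name"/>
</div>
"""


class SchemaMetaExtractor(HTMLParser):
    """Collect schema: types and schema: properties given via content attributes."""

    def __init__(self):
        super().__init__()
        self.types = []
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # typeof may list several space-separated types; keep the schema: ones.
        for type_name in (attrs.get("typeof") or "").split():
            if type_name.startswith("schema:"):
                self.types.append(type_name)
        # Only properties with an explicit content attribute are handled here.
        content = attrs.get("content")
        if content is not None:
            for prop in (attrs.get("property") or "").split():
                if prop.startswith("schema:"):
                    self.properties[prop] = content


extractor = SchemaMetaExtractor()
extractor.feed(SNIPPET)
print(extractor.types)       # ['schema:CreativeWork']
print(extractor.properties)  # {'schema:name': 'De Novo Assembly @ TGAC 2015'}
```

A real extractor also has to resolve prefixes, read values from element text, and follow `rel` links, which is why a dedicated structured-data library is the better choice in practice.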

schema.org on Drupal 

Drupal introduced native support for schema.org in Drupal 8. For Drupal 6 and 7 there is a robust schema.org module that you can install; it can be found, along with its documentation, here: https://www.drupal.org/project/schemaorg

schema.org on Joomla

There are two predominant ways; both require the Joomla Content Editor (JCE).

XML Dumps

An XML dump is a listing of all of your materials and/or events in one file. Each attribute describing a resource should be nested within a parent element and surrounded by its own opening and closing tags, e.g.

<resources>
    <resource>
        <name>blah</name>
        <url>http://example.org</url>
        <description>This training material guides you through using XML dumps to get your content in TeSS</description>
    </resource>
    <resource>
        <name>bloop</name>
        <url>http://exampiella.org</url>
        <description>This training material is intended to teach core schema.org concepts</description>
    </resource>

</resources>
 

This is an example of a real XML dump used by TeSS: http://www.dtls.nl/courses/feed/?filter_course=active
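
If your site does not already offer a feed, a dump in the `<resources>` layout shown earlier can be produced with very little code. The following is a minimal Python sketch using the standard library; the function name and the sample record are assumptions for illustration, and in practice the records would come from your CMS or database.

```python
import xml.etree.ElementTree as ET


def build_dump(records):
    """Serialise a list of dicts as <resources>, with one <resource> per dict."""
    root = ET.Element("resources")
    for record in records:
        resource = ET.SubElement(root, "resource")
        for field, value in record.items():
            # Each attribute becomes a child element with opening and closing tags.
            ET.SubElement(resource, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical record used only for this example.
records = [
    {
        "name": "blah",
        "url": "http://example.org",
        "description": "Using XML dumps & TeSS",
    },
]

xml_dump = build_dump(records)
print(xml_dump)
```

ElementTree escapes special characters such as `&` for you, which is the most common source of malformed hand-built XML.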

A dump does not need to conform to any specific schema or XSD. We can map your element names to fields in TeSS when we write our parser.
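
The mapping step can be pictured roughly as follows. This Python sketch is not the TeSS parser; the feed, its element names (`course`, `title`, `link`, `summary`), the mapping table, and the function name are all hypothetical, chosen to show that a provider's own element names can be translated to the minimum fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed from a provider that uses its own element names.
PROVIDER_DUMP = """<courses>
    <course>
        <title>Intro to XML dumps</title>
        <link>http://example.org/xml-dumps</link>
        <summary>How to expose training content as XML.</summary>
        <internal_id>42</internal_id>
    </course>
</courses>"""

# Hypothetical mapping from the provider's element names to target field names.
FIELD_MAP = {"title": "name", "link": "url", "summary": "description"}


def parse_dump(xml_text, item_tag, field_map):
    """Return one dict per <item_tag> element, renaming children via field_map."""
    parsed_records = []
    for item in ET.fromstring(xml_text).iter(item_tag):
        record = {}
        for child in item:
            # Elements without a mapping (e.g. internal_id) are ignored.
            if child.tag in field_map:
                record[field_map[child.tag]] = (child.text or "").strip()
        parsed_records.append(record)
    return parsed_records


parsed = parse_dump(PROVIDER_DUMP, "course", FIELD_MAP)
print(parsed)
```

Because only the mapping table changes from one provider to the next, consistent element names within your own dump matter more than matching any particular vocabulary.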

TeSS API

A generic overview of our API is documented on our GitHub page.

A more functional overview can be found on our SwaggerHub page

Authorization

To use our API you will need to create an account on TeSS and e-mail tess@elixir-uk.info or niall.beard@manchester.ac.uk to be given API authorization.

Once this has been done, navigate to your TeSS profile page and find your Authentication Token. This can then be used to upload materials via the API. 

TeSS API client

We have so far produced one API client library to help you upload your content to TeSS.

It is written in Ruby and can be installed as a gem. The code and documentation on how to use it can be found here: https://github.com/ElixirUK/TeSS_api_client

Examples of how to use our API can be found in any file ending with _scraper.rb here: https://github.com/ElixirUK/TeSS_scrapers

 

 
