Introduction

R is a popular scripting language oriented towards statistical computing. A large number of add-on packages ("modules") extend the functionality of R, for example those of the Bioconductor project, which are suited to biological data analysis.

The Rshell service in Taverna allows your workflow to include services that run R scripts on an R installation. The R installation can be on the same machine as you use to run the workflow, or on a different machine. To allow Taverna to talk to the R installation, Rserve must also be running on the same machine as R.

Users of Taverna Workbench do not need to install R - all they need is access to an Rserve installation.

Tutorial

The tutorial covers installation of Rserve and writing a Taverna workflow with an R script.

Adding an Rshell service to a workflow

To add an Rshell service to a workflow, either:

  • locate Rshell under Service templates in the Service panel and
    • drag the service to the Workflow explorer or Workflow diagram, or
    • right click and select Add to workflow
  • select Rshell from the Insert menu in the top menu bar
  • select Rshell from the Insert section of the pop-up menu when you right click on an empty area of the Workflow diagram, or on the workflow in the Workflow explorer
  • use the keyboard shortcut alt-shift-R

When you add an Rshell service to a workflow, the Rshell configuration dialog will automatically pop up.

Configuring an Rshell service

The configuration of the Rshell service is split into several tabs.  Each tab has Apply and Close buttons at the bottom.  Apply saves the configuration as shown in the tabs, and Close closes the configuration dialog.

Apply saves the whole of the configuration.  For example, if you change the script and then go to the Input ports tab and add a port, clicking Apply will save the altered script as well as the added port.

Rshell script

The first tab of the Rshell configuration is used to enter the R script that will be run.  Manuals on writing R can be found on the R project website.

As an example we will use a simple sin function for the R script.  The workflow is available on myExperiment.

You can also load scripts from an existing R script file, save the script or clear the contents of the script.

Rshell input and output ports

Input and output ports are the connection points between the rest of the workflow and the Rshell service.  The Rshell service makes each input port available as a variable named after the port, and each output port reads the variable of the same name after the script has executed. That is, the last value assigned to the variable is the one returned from the service. So for the example sin script to make sense, we have to create an input port x and an output port y.
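A minimal sketch of the example script, assuming an input port named x and an output port named y, both of type Numeric. The first assignment only stands in for the value that the Rshell service would bind to the input port; it is an assumption for running the sketch outside Taverna and would not be part of the script entered in the configuration dialog.

```r
# Stand-in for the value Taverna would bind to the input port x
# (an assumption so that the sketch runs outside Taverna).
x <- 0.45

# The script itself: the last value assigned to y is the value
# that the output port y returns.
y <- sin(x)
```

If the script assigned y more than once, only the final assignment would be visible on the output port.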

To add an input port:

  1. Select the Input ports tab from the Rshell configuration dialog.
  2. Click the Add port button.
  3. Enter the name of the input port (for this example x).
  4. Specify the input port type (for this example Numeric).

The input port type indicates the type this variable will have within the R script. R values are typed, so you need to specify that in this case x is to be parsed as a number, for example 0.45.

The possible types for R input ports are:

  • Logical
  • Numeric
  • Integer
  • String
  • Logical vector
  • Numeric vector
  • Integer vector
  • String vector
  • Text-file

Rshell services can also pass values whose type is not in the list above (see Other R types).

An output port can be added in a similar way:

  1. Select the Output ports tab from the Rshell configuration dialog.
  2. Click the Add port button.
  3. Enter the name of the output port (for this example y).
  4. Specify the output port type (for this example Numeric).


The output port type indicates the type this variable has within the R-script. The possible types for R output ports are:

  • Logical
  • Numeric
  • Integer
  • String
  • Logical vector
  • Numeric vector
  • Integer vector
  • String vector
  • PNG-image
  • Text-file

Rshell services can also pass values whose type is not in the list above (see Other R types).

Rshell connection settings

If your Rserve installation is on a different machine from the one where you run Taverna Workbench, is using a different port, or requires authentication (username and password), you can switch to the Connection settings tab to configure these connection parameters.

If you are using an Rserve on the same machine that you run Taverna Workbench on, then you probably do not need to change the connection settings.

In addition, you can tick Keep session alive, which re-uses the same connection each time you execute the script. This means that if the script assigns objects to other variable names, say z=x+1337, z will still be available in the R namespace for the next execution, for example the next step of an iteration. However, we generally recommend transferring such state through the workflow instead of keeping it in the R environment.
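The effect of Keep session alive can be pictured with plain R environments. This is only an analogy: run_script and session are hypothetical names for this sketch and are not part of Rshell or Rserve. Evaluating the script twice in one shared environment leaves z behind for the second run, whereas a fresh environment per run does not.

```r
# Hypothetical helper: evaluate the example script with input x in a
# given environment and report whether z was already defined there.
run_script <- function(env, x) {
  assign("x", x, envir = env)
  # exists() with inherits = FALSE looks only in this environment
  had_z <- exists("z", envir = env, inherits = FALSE)
  evalq(z <- x + 1337, envir = env)
  had_z
}

# Session kept alive: one environment is reused across executions.
session <- new.env()
first_shared  <- run_script(session, 1)  # z not yet defined
second_shared <- run_script(session, 2)  # z left over from the first run

# No kept session: a fresh environment for each execution.
first_fresh  <- run_script(new.env(), 1)
second_fresh <- run_script(new.env(), 2)
```

Leftover variables like this are easy to depend on by accident, which is one reason for passing state through workflow ports instead.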

Rshell input and output port types

Logical

For input ports, Taverna interprets what is passed to the service as an R logical value.  The values passed into the Rshell service should be a string containing one of

  • TRUE
  • FALSE

For output ports of type Logical, the string returned depends on the value of the corresponding variable:

  • TRUE - returns the string "TRUE"
  • FALSE - returns the string "FALSE"
  • NA (not available) - returns the string "NA"
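The three cases can be illustrated in base R. The sketch below uses illustrative variable names and shows the three outcomes together in one vector for brevity; a Logical output port would carry a single value. It does not show how Rshell performs the conversion internally.

```r
# Stand-ins for values that would arrive on input ports (assumption)
threshold <- 10
readings <- c(12, 3, NA)

# Each comparison gives TRUE, FALSE, or NA when the reading is missing
above <- readings > threshold

# paste() renders logical values, including NA, as plain strings
above_as_strings <- paste(above)
```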

An example workflow showing the interpretation of values on Logical ports is available on myExperiment.

Numeric

For input ports, Taverna interprets what is passed to the service as an R numeric value i.e. a floating point number.  Passing a non-numeric value will cause the service to throw an exception.

For output ports, a string is returned containing the numeric value of the corresponding R variable.  If the corresponding R variable does not contain a numeric value then the service will throw an exception.

An example workflow showing the use of numeric values is available on myExperiment.

Integer

For input ports, Taverna interprets what is passed to the service as an R integer value.  Passing a non-integer value will cause the service to throw an exception.

For output ports, a string is returned containing the integer value of the corresponding R variable.  If the corresponding R variable does not contain an integer value then the service will throw an exception.
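R distinguishes integer values from general numeric (floating point) values, which is why the choice between the Numeric and Integer port types matters. A short base R sketch of the difference, using illustrative variable names:

```r
count <- 42L   # the L suffix makes an integer
ratio <- 42    # without it, R stores a double (numeric)

count_is_integer <- is.integer(count)
ratio_is_integer <- is.integer(ratio)

# as.integer() truncates towards zero, so 2.9 becomes 2
truncated <- as.integer(2.9)
```

Converting explicitly with as.integer() before assigning to a variable read by an Integer output port is one way to make sure the variable really holds an integer value.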

An example workflow showing the use of integer values is available on myExperiment.

String

For input ports, Taverna interprets what is passed to the service as an R string (character vector).

For output ports, a string is returned containing the string value of the corresponding R variable.

Objects other than strings in R can still yield a string value. If an R variable has a non-string value it can often be exposed as a string.  For example, if an output fred is meant to be a string value but is assigned -2.3, then calling the Rshell service will yield [-2.3].

An example workflow showing the use of string values is available on myExperiment.

Logical vector

For input ports, Taverna takes a list of values and interprets each value as a Logical value.  The corresponding R variable is set to a vector of those Logical values.

For output ports, Taverna expects the R variable to be set to a vector of Logical values.  The service returns a list of strings where each element of the list corresponds to the equivalent vector element.

An example workflow showing the use of logical vector values is available on myExperiment.

Numeric vector

For input ports, Taverna takes a list of values and interprets each value as a Numeric value.  The corresponding R variable is set to a vector of those Numeric values.

For output ports, Taverna expects the R variable to be set to a vector of Numeric values.  The service returns a list of strings where each element of the list corresponds to the equivalent vector element.
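A sketch of a script body that works on a Numeric vector port, assuming an input port named values and output ports named scaled (Numeric vector) and total (Numeric). All three names are illustrative, and the first line only stands in for the vector Taverna would build from the incoming list.

```r
# Stand-in for the list arriving on the input port (assumption)
values <- c(1.5, 2.0, 4.5)

# Vectorised arithmetic applies to every element at once
scaled <- values / max(values)

# A single number computed from the whole vector
total <- sum(values)
```

Here scaled would be returned as a list with one element per vector element, while total would be returned as a single value.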

An example workflow showing the use of numeric vector values is available on myExperiment.

Integer vector

For input ports, Taverna takes a list of values and interprets each value as an Integer value.  The corresponding R variable is set to a vector of those Integer values.

For output ports, Taverna expects the R variable to be set to a vector of Integer values.  The service returns a list of strings where each element of the list corresponds to the equivalent vector element.

An example workflow showing the use of integer vector values is available on myExperiment.

String vector

For input ports, Taverna takes a list of values and interprets each value as a String value.  The corresponding R variable is set to a vector of those String values.

For output ports, Taverna expects the R variable to be set to a vector of String values.  The service returns a list of strings where each element of the list corresponds to the equivalent vector element.

An example workflow showing the use of string vector values is available on myExperiment.

PNG-image

PNG images can only be output from an Rshell service.

For an output port, an R variable with the name of the port is associated with the file used for output.  For example, if figure is an output port of type PNG-image, the script passes the figure variable to the png function to open the image file for writing.

Remember to close the writing device within the Rshell service.
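A minimal sketch of such a script, assuming an output port named figure of type PNG-image. The tempfile() line only stands in for the file name that Rshell would place in the figure variable, and the plotted data is arbitrary.

```r
# Stand-in for the file name Taverna would assign to the output port
# variable (an assumption so that the sketch runs outside Taverna).
figure <- tempfile(fileext = ".png")

# Open the PNG device on that file, draw, then close the device so
# that the image is written out completely.
png(figure)
plot(c(1, 2, 3, 4), main = "Example plot")
dev.off()
```

Without the call to dev.off() the file may be left empty or incomplete, because the device writes the image when it is closed.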

An example workflow showing the generation of a PNG image is available on myExperiment.

It is not currently possible to return a vector of PNG images.

Non-PNG images can also be passed, for example by using pdf rather than png as the R function.  The confusion in naming will be fixed in a future version of Taverna.

Text-file

For input ports, Taverna takes what is passed to the service and writes it to a file on the R server.  The R variable corresponding to the port is set to the name of the file.  For example, if input_file is an input port of the Rshell service of type Text-file, the script reads its data from the file named by the variable input_file.
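A sketch of reading a Text-file input, assuming an input port named input_file. The first two lines only stand in for the file that Taverna would write on the R server; the variable names lines and line_count are illustrative.

```r
# Stand-in for the file Taverna would write from the incoming value
# (an assumption so that the sketch runs outside Taverna).
input_file <- tempfile(fileext = ".txt")
writeLines(c("alpha", "beta", "gamma"), input_file)

# The script reads the file whose name is held in input_file
lines <- readLines(input_file)
line_count <- length(lines)
```

For tabular data, read.table(input_file) or read.csv(input_file) could be used in place of readLines().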

For an output port, an R variable with the name of the port is associated with the file used for output.  For example, if output_file is an output port of the Rshell service of type Text-file, the script writes its results to the file named by the variable output_file.
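A sketch of writing a Text-file output, assuming an output port named output_file. The tempfile() line only stands in for the file name that Rshell would place in the variable, and the results data frame is made-up illustration data.

```r
# Stand-in for the file name Taverna would assign to the output port
# variable (an assumption so that the sketch runs outside Taverna).
output_file <- tempfile(fileext = ".txt")

# Write a small tab-separated table to the file named by output_file
results <- data.frame(gene = c("g1", "g2"), score = c(0.8, 0.3))
write.table(results, file = output_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```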

An example workflow showing the use of text files is available on myExperiment.

Other R types

You can use Taverna to pass R values between Rshell services, even when the R value is not one of the explicitly supported types.  To do this, the producing script deparses the actual value into a String vector: actual_value is the R value you want to pass, and value_as_string_vector is an output port of the Rshell service of type String vector.

To read the value into an Rshell service, the consuming script parses the String vector back into an R value: actual_value is the R value you want to read, and value_as_string_vector is an input port of the Rshell service of type String vector.
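A sketch of the round trip using base R's deparse() and parse(). Both halves are shown in one script so that it runs on its own; in a workflow the first half would sit in the producing Rshell service and the second in the consuming one. The sample list and the name restored_value are illustrative, used here only to keep the two halves apart.

```r
# Producing script: turn an arbitrary R value into a character vector
actual_value <- list(name = "sample", counts = c(3L, 5L, 8L))
value_as_string_vector <- deparse(actual_value)

# Consuming script: rebuild the value from the character vector
restored_value <- eval(parse(text = value_as_string_vector))
```

Because eval() runs whatever code the strings contain, this approach is best kept to values produced by your own workflow.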

An example workflow showing the use of String vectors to pass R values is available on myExperiment.

In future versions of Taverna, a separate value type will be introduced so that you do not need to include the deparsing/parsing code in your scripts.

Citation

The Rshell service was mainly written by Ingo Wassink of the University of Twente, Netherlands.

Cite as:

Li, P.; Castrillo, J.; Velarde, G.; Wassink, I.; Soiland-Reyes, S.; Owen, S.; Withers, D.; Oinn, T.; Pocock, M.; Goble, C.; Oliver, S. & Kell, D. (2008), Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data, BMC Bioinformatics 9(334), doi: 10.1186/1471-2105-9-334
