Alex has read the comments by others and has nothing to add - they have covered everything :-).
On the week of Nov. 8 I went to visit Dr. Jacek Sroka at the University of Warsaw, Poland. Jacek and I had collaborated on the definition of a formal semantics for Taverna. This trip was to find suitable M.Sc. projects for students in his Faculty, centred on myGrid technologies and Taverna specifically.
Projects span the entire duration of the M.Sc. program (1.5 years); this means that students will start working on their project at the start of the program, alongside their course curriculum.
Some of you may have caught the reference to Krzysztof Kieslowski's Dekalog
Project ideas, in approximate order of perceived relevance
1. Automating the publication process for a workflow along with all its local dependencies.
Scenario: researcher has developed local tools (services, Java libs) along with workflows that use them. One specific instance is a PhD student at Warsaw who is using home-grown simulation methods to test his SBML models.
User then wants to publish the workflows to myExperiment.
What needs to happen is that all the local components that the workflow depends on, and which are not embedded in the workflow, be published into a suitable web-accessible space so that (a) they can execute efficiently, and (b) the workflow really becomes "public" as its local dependencies are resolved.
The idea is to automate the entire component publication process as much as possible, including the wrapping step (as a WS, if appropriate) and deployment in an infrastructure that may include various types of clouds or Grid environments.
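To make the scope concrete, here is a minimal sketch (in Python, with hypothetical file names) of just the bundling step: collecting a workflow and its local dependencies into one archive that a publication tool could then upload and deploy. The real project would also cover wrapping and deployment.

```python
import zipfile
from pathlib import Path

def bundle_workflow(workflow_file, dependency_files, bundle_path):
    """Package a workflow together with its local dependencies
    (scripts, jars, ...) into a single archive ready for upload."""
    with zipfile.ZipFile(bundle_path, "w") as bundle:
        bundle.write(workflow_file, arcname=Path(workflow_file).name)
        for dep in dependency_files:
            # Keep dependencies under a deps/ prefix so the unpacking
            # side knows what must be deployed before execution.
            bundle.write(dep, arcname=f"deps/{Path(dep).name}")
    return bundle_path
```

The interesting (and hard) part of the project is what happens on the other side of this archive: wrapping each `deps/` entry as a service and deploying it somewhere web-accessible.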
Alan says: It sounds as if there are three separate pieces of work here.
All of them would be a good thing.
Workflow bundles in Scufl2 sound like the most obvious approach. This task should be limited to a particular execution environment, like Perl scripts (which can be put in a temp folder) or WARs (deployed to Tomcat).
Packaging a workflow and its dependencies to be run standalone could be done with something like Vagrant (http://vagrantup.com/docs/getting-started/index.html). Similar tools exist for creating ec2 instances and the like.
I've also developed SBML model simulation workflows. Documentation can be found here. The Copasi simulation service I used was packaged up in a WAR file that was deployed in a Tomcat server.
2. Collaboration on the ongoing OSGi effort
A very good student, who also works for IBM in Poland and is the local reference person for a variety of Apache projects, is very interested in learning more about the effort to move from Raven to OSGi for Taverna.
Alan says: It is not clear what work this might involve. Does the person want to help? A possible project would be to look at all the other OSGi work "out there", make suggestions as to what could be leveraged/used, and implement a few of them.
Tycho aims to bridge the gap between the Maven and Eclipse mechanisms for handling dependencies and building source code. A couple of days' work has already gone into setting this up, but with no real success. A couple of show-stopping bugs were identified - https://issues.sonatype.org/browse/TYCHO-477 and https://issues.sonatype.org/browse/TYCHO-458. The student could iron out these bugs and produce a fully automated and robust build system.
This would be a project with direct and tangible positive effects on the next gen Taverna workbench project.
If this isn't substantial enough, the project could add bits to improve/extend Tycho itself to do much more advanced stuff.
Could try building a Taverna 2 workbench in OSGi (as a fallback option), but this would be very hard, and would also risk stepping on David's toes and getting lost in Raven.
3. Spreadsheets as Taverna workflows.
The Taverna LC project developed at Warsaw has shown that one can effectively embed a Taverna workflow in an OpenOffice Calc spreadsheet. This offers interesting opportunities: for example, the workflow inputs and outputs can be seamlessly pre- and post-processed, respectively, as part of the same spreadsheet. A demo video can be found here.
The project is no longer progressing but this is a chance for it to be extended and/or better integrated with the Taverna codebase.
Alan says: It would be very interesting to update this project. Also if it could be integrated somehow with RightField/Populous then lots of things would be possible.
Interesting - also: implicit workflow building from a spreadsheet, with Taverna components as Excel functions. Or just update the existing work to use the latest Taverna.
4. Development of a library for creating workflow editors on the Web.
This goes in the direction of Web-based workflow editing, but generalises to a framework that can be applied to editing Taverna workflows, along with other graphical workflow languages, such as Kepler, for example. The framework would be customised for specific graphical languages.
Alan says: I think this would be extremely useful. I would prefer though that it be done for Taverna and then, having done it well for one specific case, be generalized to others. To repeat what I have said several times before, something like wireit could do a lot.
Jits says: Agreed, this would be extremely useful. The student could build this using the Eclipse Rich AJAX Platform (RAP) (http://www.eclipse.org/rap/) which complements the Eclipse Rich Client Platform (RCP) that we are using to build the next gen workbench.
A lot of the pieces "under the hood" that we are building for the next gen workbench could be reused in an RAP application to provide a web based Taverna! Ideally, the two would live under the same umbrella project, so as to keep things in sync and support both at the same level.
Hard! Forget about "general". Taverna Server does not provide the backend support needed, e.g. "what are the ports on processor X?" or "which services are available?". The project could be limited to certain service types with no fancy iteration stuff, but a backend would still need to be built. A simple version could use dropdown boxes and generate the diagram on the server. Data input/output would be a challenge.
As an application, this leads to the idea of
5. Achieving some level of integration between Kepler and Taverna.
Integration amongst workflow systems has been shown to be useful in some contexts, e.g. Taverna <--> Galaxy.
Although we know that the Kepler and Taverna models are different, one can "mix and match" in a limited way, and at the granularity of whole workflows.
It would for example make sense to avoid replication of implementation effort for some types of components, by exposing Kepler-specific components through simple Kepler workflows that can be launched from Taverna, and vice versa.
Alan says: Antoon looked at this. Creating a Taverna activity to run Kepler workflows should be possible and vice versa.
This would need browsing of Kepler components/workflows, or vice versa. Executing Kepler from Taverna is easier for us than the other way around.
5.i Co-designing / editing Taverna and Kepler workflows, using an integrated Web-based editing environment.
This would be an application of the framework from the previous point. The syntax-aware editor would know about data types that are specific to each of the two systems, for example, and would provide integrated editing capabilities (e.g. for structural nesting of different workflows).
Suitable starting points include Yahoo! Pipes and the Google Web Toolkit.
Also look at recent thread with SurfNet.
Sorry, but.. this is not feasible.
Anything is feasible. What you probably mean is it is 'too hard' and would take a lot of time and major effort compared to its usefulness.
5.ii Launching entire Kepler workflows from Taverna, and vice versa.
This is presumably a simple matter of developing wrappers as Taverna processors and Kepler actors.
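As a sketch of what such a wrapper might do on the Taverna side, the following builds a command line for Kepler's headless runner. The `-runwf`/`-nogui` flags and the parameter-passing convention are assumptions that would need checking against the installed Kepler version's documentation.

```python
def kepler_command(workflow_path, params, kepler_bin="kepler"):
    """Build a command line for running a Kepler workflow headlessly.
    NOTE: the -runwf/-nogui flags and the per-parameter flag syntax
    are assumptions about the Kepler CLI, not a verified interface."""
    cmd = [kepler_bin, "-runwf", "-nogui", workflow_path]
    for name, value in params.items():
        # Assumed convention: each workflow parameter becomes a flag.
        cmd.append(f"-{name}")
        cmd.append(str(value))
    return cmd
```

A Taverna activity could then run this command via `subprocess.run(cmd, check=True)` (or the Java equivalent) and collect the outputs from files, which keeps the integration at whole-workflow granularity as described above.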
Isn't this the same as 5?
6. Mining workflows from myExperiments to extract patterns that can be used to help users during workflow design.
This is a bottom-up approach to assistive workflow design. It involves mining a catalogue of graph patterns for service usage from myExperiment, so that the workflow design environment can recommend the use of specific patterns to users when a specific service becomes part of the workflow.
The FU work is complementary to this. FUs are a top-down approach where recommendations for correct design are based on service annotations (e.g. in BioCatalogue), while design pattern mining is bottom-up, as it "discovers" usage patterns from a catalogue of published workflows.
Alan says: Antoon looked at workflow patterns. His thesis would be a good start.
I've got a PDF copy of Antoon's thesis.
Jits says: This would be very helpful to the workflow components work being done as part of the next gen workbench. These mined patterns could then be used to generate component definitions and published into the components library, making it available to all Taverna next gen workbench users.
We intend to do some basic mining of workflows for patterns too. So having a student do this would be very handy.
Can use the SPARQL endpoint at myExperiment, and/or SCUFL2 tools (and then SPARQL); BioCatalogue linking could also be very interesting.
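A minimal sketch of the bottom-up mining idea, assuming workflows have already been reduced (e.g. via SCUFL2 tools or the SPARQL endpoint) to lists of service-to-service dataflow links; the service names are made up for illustration:

```python
from collections import Counter

def mine_service_pairs(workflows, min_support=2):
    """Count how often service A is wired directly into service B
    across a corpus of workflows, and keep the frequent pairs.
    Each workflow is a list of (source, target) dataflow links."""
    counts = Counter()
    for links in workflows:
        # Count each pair once per workflow so a single workflow
        # cannot dominate the statistics.
        for pair in set(links):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

A real project would mine richer subgraph patterns than single edges, but even pair frequencies are enough for a "users who added X usually connect it to Y" recommendation.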
7. Android apps for the mobile researcher:
The general idea is to control Taverna from a mobile phone. There is much interest in Android programming amongst students at Warsaw but we need to consult with users (e.g. Paul) to make this concrete.
Alan says: We specified this up on the whiteboard. I don't know if Paul took pictures before it got cleaned. This was for finding workflows on myExperiment and then running them on a Taverna server. It appeared to be feasible for a good student.
Jits says: I think one of Carole's students is already doing the project Alan mentions.
Carole's student is doing annotation only, so it should be available. Paul wants to be able to check his workflow (running on Taverna Server) from the pub, and rerun it with any parameters.
It would be nice if this could be done via a web portal of some kind so that it would be platform agnostic. That would need a service to be stood up though, so not as practical in that respect.
It would need to be a very well-designed web portal to enable easy viewing on a tiny screen. As soon as you need to zoom/scroll etc. then it gets messy. I guess this is why there are 'apps' - they make best use of the phone's GUI paradigms rather than the web's.
8. Developing a front-end for the Taverna provenance component.
Simple part: point-and-click generation of provenance queries using the existing language (in XML syntax). This can be a workbench plugin, for example.
Less simple part: what is the proper way to convey the result of a provenance query to the user? A query essentially returns a slice of the entire provenance graph. One simple idea is to visualise such a slice as a (hopefully small) graph. However, is this the most appropriate paradigm? Shouldn't provenance graphs be as invisible as, say, a relational table? And how far does the graph approach scale on large slices?
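The "slice" notion can be made concrete with a tiny sketch that treats provenance as a plain derivation graph (a big simplification of Taverna's actual provenance model): the slice behind one output is that output plus everything it was transitively derived from.

```python
def provenance_slice(derived_from, output):
    """Return the provenance slice behind one output: the output plus
    everything it was (transitively) derived from. derived_from maps
    each data item to the items it directly depends on."""
    slice_nodes, stack = set(), [output]
    while stack:
        node = stack.pop()
        if node not in slice_nodes:
            slice_nodes.add(node)
            stack.extend(derived_from.get(node, []))
    return slice_nodes
```

The UI question above is then: given `slice_nodes`, is a node-link drawing really the right presentation, or should the front-end flatten it into something tabular?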
- Zoom work on views on workflows and their provenance
- The Kepler provenance browser
Alan says: Also look at David's work for Moby where he started work on this for Mark Wilkinson
Not as a provenance graph - that's not a user interface. Better to click on the existing workflow diagram and just show the values there (using some SVG tricks which David could help with).
Are there any OPM viewers? What does the current results view not do that users need?
9. Provenance mining.
One can mine provenance logs in the hope of understanding the latent (implicit) data types of processor ports during past runs, and thus help users in their next design.
Ref.: Jerzy's prototype BioCatalogue plugin. That piece of work can be continued; in fact it may be possible for the students to get in touch with Jerzy, who I think is in Warsaw.
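As a toy illustration of the type-mining idea (the recognisers below are illustrative guesses, not Jerzy's actual approach): infer a latent port type from the values a port was observed to carry in past runs, falling back to plain string when nothing more specific matches.

```python
import re

# Ordered map of hypothetical type recognisers; a real miner would
# learn these from provenance rather than hard-code them.
RECOGNISERS = {
    "integer": re.compile(r"-?\d+"),
    "dna_sequence": re.compile(r"[ACGTacgt]+"),
}

def infer_port_type(observed_values):
    """Guess a latent type for a processor port: if every observed
    value matches one recogniser, report that type."""
    for name, pattern in RECOGNISERS.items():
        if observed_values and all(pattern.fullmatch(v) for v in observed_values):
            return name
    return "string"  # fall back when nothing more specific matches
```

The design recommendation would then be e.g. "this port has always carried DNA sequences, so warn if it is wired to an integer-producing service".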
Also, some form of log mining (not provenance, though) can be used to predict future performance of workflows based on past execution, thereby helping to suggest suitable deployment strategies for services and whole workflows.
Not quite sure what concrete work is being proposed here.
10. Smart caching: explore the role of provenance in reproducibility of workflow results.
Context: this fits well with the Wf4Ever vision.
Reference: Pegasus exploits this basic idea for essentially the complementary purpose: avoid having to store large amounts of provenance, by repeating some of the workflow steps instead of storing all intermediate results.
But we're doing that..?
Could perhaps do an initial prototype as a dispatch layer, would probably still need a 'search by value' in the provenance API, plus a better way to track service info (Scufl2?)
This was always the end game for the provenance in T2. E.g. in case of service failure you could re-use old results; to increase speed you might be happy to use previous results for this service...
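The dispatch-layer prototype mentioned above could start as small as this sketch: a cache keyed on service name plus a hash of the inputs, so a repeated invocation reuses the stored result instead of re-running the service. The class and method names are made up, and the purity assumption (same inputs imply same output) is a simplification a real version would have to qualify per service.

```python
import hashlib
import json

class ResultCache:
    """Sketch of a dispatch-layer cache for smart re-execution.
    Assumes services are pure: identical inputs give identical outputs."""

    def __init__(self):
        self._store = {}

    def _key(self, service, inputs):
        # Canonical JSON gives a stable key for equal input dicts.
        blob = json.dumps([service, inputs], sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def invoke(self, service, inputs, run):
        """Return a cached result, or call run(inputs) on a miss."""
        key = self._key(service, inputs)
        if key not in self._store:
            self._store[key] = run(inputs)
        return self._store[key]
```

This is also where the provenance API's missing "search by value" would slot in: instead of an in-memory dict, the lookup would query stored provenance for a past run with matching inputs.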