July 7-10, 2009, UCSB, CA, USA
This was a fruitful event to attend.
A small number of non-Earth Science (ES) people were invited to this semi-annual meeting of the ES community. Being the only non-ES representative of a non-US organization made me feel doubly an alien, but this is a friendly and welcoming community, which seemed to appreciate our contribution to their workshops.
NASA and its contractors like JPL are amongst the largest players, at least in terms of project funding, but there were a fair number of academics. The full list of participants is available.
The full program is here: ESIP summer meeting
The topic of data preservation and curation, and of provenance management in particular, was on everybody's agenda.
This was a sizeable event: the myGrid and provenance talk was the fourth of four long talks in one of four parallel tracks. Because it was on day one, it allowed me to break the ice, and I subsequently found much interest in our technology.
A few of the concrete leads are described below. Other expressions of interest emerged.
I have additional notes on specific contributions (for the "data preservation" track, which I was able to follow; I missed Peter Fox's talks, for example, because of concurrent timing), but as I don't have much time to write them up, please ask for these if interested.
Meeting with Peter Fox
Oddly, we talked mostly about data quality.
It turns out they have been reading the Qurator papers and they used them to support a recently submitted NASA internal R&D funding proposal!
We had a very good chat, and their concrete problems in quality assessment for instrument data (in Earth science observations) resonate very much with our early Qurator idea. Peter concluded that Qurator was "ahead of its time", which I frankly find amazing.
- Specifically interested in the quality ontology idea, and in the use of provenance to establish quality (express quality judgments).
- Reconciliation of multiple independent quality assessments.
- Use of the quality ontology to organize the quality knowledge, share it, and provide explanation feedback to users.
- Help users set appropriate quality thresholds on data to balance the tolerance to errors with the "data yield", i.e., how much data you have left after filtering.
- Using provenance to explain these trade-offs (I forwarded our IPAW 08 paper on using provenance to do just that!)
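To make the threshold versus "data yield" trade-off concrete, here is a minimal Python sketch. It is not Qurator code and not anything Peter's group showed: the function names, the idea of a single numeric quality score per record, and the sample values are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def data_yield(scores: List[float], threshold: float) -> float:
    """Fraction of records whose (hypothetical) quality score meets the threshold."""
    if not scores:
        return 0.0
    kept = [s for s in scores if s >= threshold]
    return len(kept) / len(scores)


def yield_curve(scores: List[float], thresholds: List[float]) -> List[Tuple[float, float]]:
    """Pair each candidate threshold with the yield it would leave."""
    return [(t, data_yield(scores, t)) for t in thresholds]


def strictest_threshold(
    scores: List[float], thresholds: List[float], min_yield: float
) -> Optional[float]:
    """Highest threshold that still keeps at least min_yield of the data, or None."""
    acceptable = [t for t, y in yield_curve(scores, thresholds) if y >= min_yield]
    return max(acceptable) if acceptable else None


if __name__ == "__main__":
    # Made-up quality scores in [0, 1]; real scores would come from a quality assessment.
    scores = [0.95, 0.80, 0.65, 0.40, 0.90, 0.30, 0.75, 0.55]
    thresholds = [0.2, 0.4, 0.6, 0.8]
    for t, y in yield_curve(scores, thresholds):
        print(f"threshold={t:.1f}  yield={y:.3f}")
    print("strictest threshold keeping half the data:",
          strictest_threshold(scores, thresholds, 0.5))
```

A user who can tolerate more errors picks a lower threshold and keeps more data; the curve makes that choice visible, and provenance would be what explains why a given record scored as it did.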
SciFlo and provenance
Brian Wilson of JPL (NASA contractors, formally) described a data-driven workflow system they use internally, along with a provenance capture and representation model that is remarkably, if unsurprisingly, similar to ours. Brian was at my long talk on myGrid and provenance, and we are on the same wavelength in terms of provenance architecture.
They use provenance for instrument calibration and drift correction, and for data reproducibility.
Taverna used at JPL
Had a long chat with a chap from JPL (Hook Hua), an MTS who had been hacking away at T1.7 as part of his daily job in instrument design. These are measurement instruments that will eventually fly on space missions, and their design involves exploring a large space of parameter configurations: a typical parameter sweep use case from our perspective, only in the area of space instrument design, which I found quite new.
We got into the details of how to model these types of workflows. He told me about the large number of shims and wrappers they write to deal with legacy systems and data formats, praised Beanshell scripts as a godsend, and advocated a Jython plugin for Taverna. In turn, I had him listen to me explaining the wonders of T2, and got him to promise he would transition to the beta.
I think his workflows should go on myExperiment and would also make good use cases for "granular" provenance (exploration of the parameter space is done by proper implicit iteration...)
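For readers unfamiliar with the parameter sweep pattern, here is a rough Python sketch of the idea. It is not Taverna code and not Hook's workflow: the `sweep` helper, the toy model and the parameter names are hypothetical, and combining the input lists as a cross product is an assumption of this sketch.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Sequence


def sweep(step: Callable[..., Any], parameters: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Run `step` once per combination of parameter values (cross product).

    Each result keeps the exact inputs that produced it, which is the kind of
    per-element record that "granular" provenance is about.
    """
    names = list(parameters)
    results = []
    for values in product(*(parameters[name] for name in names)):
        config = dict(zip(names, values))
        results.append({"inputs": config, "output": step(**config)})
    return results


def toy_instrument_model(aperture_cm: float, integration_s: float) -> float:
    """Placeholder arithmetic standing in for a real design evaluation."""
    return aperture_cm ** 2 * integration_s


if __name__ == "__main__":
    runs = sweep(
        toy_instrument_model,
        {"aperture_cm": [10, 20], "integration_s": [1, 2, 3]},
    )
    for run in runs:
        print(run["inputs"], "->", run["output"])
```

The point is that the caller only supplies lists of values; the iteration over combinations, and the link from each output back to its inputs, come for free.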
Taverna used for teaching at U. of Arizona
Nirav Merchant is the director of the BioComputing facility at the Arizona Research Labs, part of U. Arizona.
Amazingly, he has been using Taverna as a teaching tool on his course in methods for bioinformatics and "fostering computational thinking in biology"!
He has invited us to do a remote WebEx or some other form of interactive hands-on web meeting on Taverna. Katy, are you there?
Having listened to my talk on collaborative science, he is also interested in talking to us in the context of an NSF-funded project called iPlant Collaborative, which deals with two of the current grand challenges in life sciences:
- genotype/phenotype association (Gen2Phen)
- phylogenetic relationships among species (iPToL)
We should definitely be in touch.