Here is a (probably partial) list of what may be meant by security with respect to Taverna 2.1:
- Gaining access to the description of services
- Calling a secured service
- Accessing secure data
- Accessing data from secure databases
- Accessing secure R servers
- Keeping the data secure
- Keeping provenance secure
- Keeping workflows secure
- Keeping runs (running workflows) secure
- Access to security credentials
- Security in what a service does
- Possible other issues
- Overall scenario
- NeISS scenario
Gaining access to the description of services.
For example, a WSDL description may be at an HTTPS URL. Ideally the WSDL itself would not be behind HTTPS even if the service is secure, but in practice it usually is.
Taverna (as of 2.1.2) can cope with this - if the service's certificate is signed by any of the Certificate Authorities Taverna already recognises (copied from Java the first time it starts up or explicitly added by the user using the Credential Manager) then Taverna will access such a WSDL in a normal way. Otherwise, Taverna will pop up a dialog warning the user that they are establishing a connection that is not trusted at the moment and asking the user if they want to trust such a service. This is what normally happens in Web browsers in similar situations. Taverna can cope with HTTPS Basic and Digest AuthN.
Calling a secured service.
The invocation of the service is secured.
The way in which the service is secured can vary a lot, e.g. WS-Security, HTTPS, signed HTTP. Taverna can currently do an invocation via HTTPS and implements the portion of WS-Security that relates to username and password authentication. WS-Security signing and encryption have not been implemented yet (we have not seen services that use them yet, apart from some OMII services). Taverna can also do HTTP Basic AuthN (with usernames and passwords); it cannot do Digest AuthN, but that would perhaps not be too difficult to add.
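As an illustration of the HTTP Basic AuthN mentioned above, here is a minimal sketch of how the Authorization header value is built. The class and method names are hypothetical and are not taken from the Taverna code base; the sketch only shows the standard encoding step.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Taverna: builds the value of the HTTP
// "Authorization" header used for Basic authentication.
final class BasicAuthSketch {

    // Joins username and password with ':' and Base64-encodes the result.
    static String basicAuthHeader(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Example credentials only.
        System.out.println(basicAuthHeader("alice", "s3cret"));
    }
}
```

Note that Base64 is an encoding, not encryption, so Basic AuthN only protects the password when the connection itself is HTTPS.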
Donal Fellows says:
There are use-cases for specifying credentials on a per-server basis; some data sources require a non-standard identity to be used when accessing them. And there's also the question of whether there should be identities attached to the workflow ("you can access my service, but only through this workflow") and use of the identity of the user calling/invoking the workflow. J2EE covers some of these things already IIRC, so maybe it's right to raid there for ideas.
The other question is how to stop different identities on the same server from getting conflated. One workflow run owned by one user should not (must not!) be able to steal the credentials of another in the same server, even though there are fairly complex components in the workflows (i.e., Beanshell). It would be very beneficial to the server to not have to spin up a whole new instance for every user who wants to use one!
Stian Soiland-Reyes says:
Without proper sandboxing we would need to do some complicated setup with automatically rolled-back virtual machines and/or Unix accounts with different VMs, i.e. typical grid job sandboxing - but as Donal points out that could be resource hungry. In the case of more machine resources than workflows to run, you could of course spin these up in advance so that they could quickly start running a workflow when it is queued for execution.
Alan Williams says:
A quick look at the code suggests that you could change the CredentialManager to use different KeyStores depending upon the owner of a workflow. However, we do know that there are nasty ways to hack a beanshell so that it gets at information which it shouldn't and the same probably applies to ApiConsumer.
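A minimal sketch of the "different KeyStores depending upon the owner" idea, assuming the owner can be identified by a plain string. The class PerOwnerKeyStores is hypothetical and does not reflect the actual CredentialManager API; only the java.security.KeyStore calls are real.

```java
import java.security.KeyStore;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the real CredentialManager: hands out one
// in-memory KeyStore per workflow owner so that credentials are not shared.
final class PerOwnerKeyStores {

    private final Map<String, KeyStore> stores = new ConcurrentHashMap<>();

    // Returns the owner's KeyStore, creating an empty one on first use.
    KeyStore keyStoreFor(String owner) {
        return stores.computeIfAbsent(owner, o -> newEmptyKeyStore());
    }

    private static KeyStore newEmptyKeyStore() {
        try {
            KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
            ks.load(null, null); // a null stream initialises an empty keystore
            return ks;
        } catch (Exception e) {
            throw new IllegalStateException("Could not create keystore", e);
        }
    }

    public static void main(String[] args) {
        PerOwnerKeyStores stores = new PerOwnerKeyStores();
        // Two different owners get two different KeyStore objects.
        System.out.println(stores.keyStoreFor("alice") != stores.keyStoreFor("bob"));
    }
}
```

This only separates the storage. It does nothing about the Beanshell and ApiConsumer problem: code running in the same JVM could still reach another owner's KeyStore.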
Alex Nenadic says:
One workflow may access several secure services, each of which may require a different type of security and a different credential from the same user. Running such a workflow on the Taverna Server requires the user to "surrender" all their credentials to the Server and tell the Server when to use which and how.
As Alan said - you need to identify the user (who wishes to upload and run their workflows) to the Server and manage their privileges (on the Server) and then you need to identify the user to the third party services invoked as part of their workflows. These two are totally separate from one another.
Accessing secure data.
At the moment Taverna can read files that are either local or specified by a URI. If the URI is secured then Taverna will pop up a dialog in a similar manner to calling a secured web service. Taverna can cope with HTTPS Basic and Digest AuthN.
Accessing data from secure databases.
Currently Taverna allows SQL username and password specification but that is kept "plain text" in the workflow, normally as string constants.
Alex Nenadic says:
The username and password could be handled by the Credential Manager so they would not be saved in the workflow file. If the database is remote, it depends on how it is being accessed: if it requires the username and password to be sent in plaintext there is nothing Taverna can do about it. This would require changes to the SQL local services.
Accessing secure R servers.
Again, Taverna allows username and password specification, but it is kept "plain text" as part of the configuration of the R service.
Alex Nenadic says:
We could try to save username and password in Credential Manager. Username and password pairs in the Credential Manager are identified by a service URL, so as long as we can identify an R server (or a database server) by some URI-like string this can be done.
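To make the "identified by some URI-like string" idea concrete, here is a minimal sketch of a lookup keyed by service URI. Everything in it is an assumption for illustration: the class UriKeyedCredentials, the rserve:// scheme and the host names are made up, and the real Credential Manager keeps its entries in encrypted keystores rather than a plain map.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not the real Credential Manager: username/password
// pairs are looked up by a URI-like string that identifies the server.
final class UriKeyedCredentials {

    static final class UsernamePassword {
        final String username;
        final char[] password;

        UsernamePassword(String username, char[] password) {
            this.username = username;
            this.password = password;
        }
    }

    private final Map<URI, UsernamePassword> byService = new HashMap<>();

    void put(String serviceUri, String username, char[] password) {
        byService.put(key(serviceUri), new UsernamePassword(username, password));
    }

    Optional<UsernamePassword> find(String serviceUri) {
        return Optional.ofNullable(byService.get(key(serviceUri)));
    }

    // Parses and normalises the string so that it can serve as a map key.
    private static URI key(String serviceUri) {
        return URI.create(serviceUri).normalize();
    }

    public static void main(String[] args) {
        UriKeyedCredentials creds = new UriKeyedCredentials();
        // Made-up identifiers for an R server and a database.
        creds.put("rserve://stats.example.org:6311", "alice", "r-pass".toCharArray());
        creds.put("jdbc:mysql://db.example.org:3306/results", "alice", "db-pass".toCharArray());
        System.out.println(creds.find("rserve://stats.example.org:6311").isPresent());
    }
}
```

With something along these lines, the workflow or service configuration would only need to carry the server identifier, and the username and password would be resolved when the workflow runs.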
Comment - the two above may be specific cases of a more general need to allow the secure specification of parts of a workflow so that they are not shared with the rest of the workflow and/or are user-specific.
Keeping the data secure.
Taverna can keep its data in a database, but that database is not secured.
Keeping provenance secure.
The provenance is kept in the same database as the data.
Stuart Owen says:
Regarding the security of the data & provenance, this is largely external to the Taverna application. The database can be secured using traditional mechanisms and is down to the database. For example you could be very strict about grant permissions and communicate via ssl. Stian already mentioned putting the database on a secure filesystem.
The DataSource is accessed via a JNDI lookup, and could be configured externally to the Taverna application as long as Taverna can access the JNDI instance. Particularly for the Taverna Server, the DataSource could be configured as part of the Application Server context.
Within the Taverna workbench, a simple JNDI instance is set up and configured upon startup - but everywhere else within Taverna the DataSource is looked up via its JNDI name.
Keeping workflows secure.
The workflows are kept (when not saved) internal to Taverna core. The set of open workflows is kept in the workbench. Security is currently implicit by the core being one-to-one with the workbench started by a user. Users of the workbench have upload/download from myExperiment based upon username and password. Previous runs along with the workflow that was run are kept in an unsecured provenance database.
Keeping runs (running workflows) secure.
The runs are resident in the core. The set of current runs is kept in the workbench. Previous runs are kept in the database. The current runs are secured by being one-to-one with the workbench. The previous runs are not secured.
Access to security credentials.
Taverna currently has a master password mechanism that users must enter via the workbench.
Security in what a service does.
At the moment a beanshell script could do anything. There is no protection from malicious/stupid services.
Possible other issues
The list of recent workflows and the list of known services are kept in plain text as configuration.
Alex Nenadic says:
Just a comment on multi-user credential management on the Taverna Server - which is one of the security aspects that needs to be sorted.
Not sure if this is what Donal and Stian meant, but if we had one VM per user/account then, as far as Credential Manager is concerned, the user would only need to upload the keystores behind their current Workbench Credential Manager and the Server would know which keystores to use for that user (i.e. fewer changes to the Credential Manager as it would be single-user as it is now).
Otherwise, Credential Manager would have to be changed to know which user the Server is executing a workflow for and then switch to the keystores belonging to that user (such keystores would have to be uploaded to the Server by the user beforehand, of course). So Credential Manager itself would become a multi-user utility. Perhaps this is a better solution as we would not depend on the underlying user-VM setup and we could have, say, 10 Server instances serving 100s of users.
Richard Holland says:
Have you looked at the mechanisms behind X509 certificate authentication and delegation?
I'm thinking along these lines:
- Have a certificate manager that users can securely store their details in, a bit like the Keyring on KDE/Gnome. This certificate manager can either run inside Taverna Workbench, or it can be on a Taverna Server that the user trusts. I'm going to call this a SecurityStore.
- When they first use the SecurityStore it gives them an X509 certificate which is unique to the user. That certificate gets saved inside their .t2workbench folder in their home directory.
- The user stores their user/pwd for various services inside the SecurityStore. In services that require user/pwd (or other auth forms, e.g. certificates) the workflow will include a reference to an entry in the SecurityStore and the IP address of the machine that the SecurityStore lives on.
- When the workflow executes, the user will send a delegate of their X509 certificate to the machine that is executing the workflow - could be their local Workbench, or a remote Server. They don't even have to send the delegate certificate itself - instead they could send it to MyProxy or something similar and ask the executor to retrieve it from there.
- The delegated certificate is not their original certificate, so they are not sending out their unrestricted credentials. Importantly, the delegated certificate is specific to the requested execution (some kind of hash of the workflow XML) and has an expiry time. Workbench/Server will refuse to accept it if it is used on any workflow other than the one it was delegated for, or if it has expired.
- The Workbench/Server contacts the IP address of the SecurityStore specified in the workflow and uses the delegated certificate to authenticate itself and retrieve the usernames/passwords/certificates it needs based on the lookup values specified in the workflow.
Now, all the above can be made to happen auto-magically and invisibly by default if it is coded in a user-friendly way. It can also be made to happen in a way that puts users right off by demanding password entry all over the place, so implementation has to be carefully thought out. But I think it could work?
Also, there is no getting away from the fact that for any remote execution of code that requires credentials to authenticate itself to something on your behalf then at some point you're going to have to transmit something to it to help it identify you, and it's always possible that that transmission could be intercepted. SSL is about the best you can hope for to encrypt it, and so the use of delegated certificates which are task-specific and have expiry times minimises the risk of someone being able to do anything bad with it should they manage to intercept it.
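The acceptance rule described above (specific to one workflow, with an expiry time) could be sketched roughly as follows. This is not a real X509 proxy certificate: the Delegation class is a hypothetical stand-in that only carries a SHA-256 digest of the workflow XML and an expiry instant, and it leaves out the signature that a real delegated certificate would need.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;

// Hypothetical sketch of the acceptance check only; a real delegated
// certificate would also be signed and verified against the user's certificate.
final class DelegationCheckSketch {

    static final class Delegation {
        final byte[] workflowDigest; // hash of the workflow it was issued for
        final Instant notAfter;      // expiry time

        Delegation(byte[] workflowDigest, Instant notAfter) {
            this.workflowDigest = workflowDigest;
            this.notAfter = notAfter;
        }
    }

    static byte[] digestOf(String workflowXml) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(workflowXml.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    static Delegation delegateFor(String workflowXml, Instant notAfter) {
        return new Delegation(digestOf(workflowXml), notAfter);
    }

    // Accept only for the workflow the delegation was issued for, and only
    // while it has not expired.
    static boolean accepts(Delegation d, String workflowXml, Instant now) {
        boolean sameWorkflow = MessageDigest.isEqual(d.workflowDigest, digestOf(workflowXml));
        boolean notExpired = !now.isAfter(d.notAfter);
        return sameWorkflow && notExpired;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Delegation d = delegateFor("<workflow id=\"demo\"/>", now.plusSeconds(3600));
        System.out.println(accepts(d, "<workflow id=\"demo\"/>", now));
    }
}
```

Hashing the raw XML means any edit to the workflow invalidates the delegation, which may or may not be what is wanted; that choice is an assumption of this sketch.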
Richard Holland says:
The SecurityStore I described would be capable of storing ALL kinds of security authentication mechanisms, whether username/password, or certificates, or passphrases, or proxy logins... whatever. The delegation method doesn't remove the need to have this authentication - it just secures the process of getting the per-service authentication from where it is defined (Workbench) to the place where it is needed (Server) and allows it to happen in a properly multi-user multi-workflow way which should be very hard to 'hack' from BeanShells.
When adding activities to the workflow they would notice if they hit a security barrier or be configured to expect one, then as part of the add-to-workflow process they would prompt for the relevant credentials and store them in the SecurityStore, making a note of the lookup key for the stored SecurityStore value in the activity entry in the workflow. This would happen at time of adding the activity to the workflow either within the Workbench, or programmatically via APIs. From the user's point of view it's almost exactly what happens already with the existing CredentialStore mechanism, except it would be able to store many more kinds of authentication than just usernames/passwords as it presently does, and it would be able to securely use these from anywhere that the Workflow is submitted to run.
When sharing workflows with others the workflow would forget any lookup keys to the SecurityStore (because the receiving user's own store would have a different signature from the one that the lookup keys related to) and Taverna would re-prompt the receiving user for relevant details when they first opened the workflow, then store them in the receiving user's own SecurityStore and update the lookup keys in the workflow accordingly. Again, from a user point of view, much the same as the existing CredentialStore.
By using delegation like this and not simply sending over copies of entire keystores from Workbench to Server, Server can run multiple users at once using different authentications against the same services/proxies. Also by abstracting the concept of SecurityStore out from both sides, the interaction with the SecurityStore service itself when executing workflows is identical at both Workbench and Server level - this fits neatly with your ideas about making the execution engine a separate entity.
Shibboleth then just becomes another authentication mechanism that SecurityStore understands.
Might be easier to explain as a diagram...?
Ravi Madduri says:
This is something we (at the caBIG project and Globus) are interested in and did put in some work in this direction. We did integrate the credential manager with the caGrid security infrastructure and we plan to extend it to the Taverna 2.1 final release. More details on caGrid security here: http://www.cagrid.org/display/gaards/Home. Alex and Stian helped us a great deal to customize the Taverna security manager with PKI/Delegation/X509 based caGrid security.