This is a list of examples of the different kinds of refactoring that have been done while developing Taverna, and in particular while developing Taverna 2.
Replacing whole modules (t2core/data -> t2reference)
- Data module supports the data storage/retrieval within the workflows
- Initial module t2data was developed with rough edges. It worked and did the job, but further development in other areas highlighted various issues. To solve these, we determined it was just as easy to build a new architecture from scratch (strongly inspired by the initial module) instead of refactoring.
- Replacement module t2reference was built from scratch with separate unit tests, disconnected from the rest of the code (standalone). Here the choice not to refactor made it easy to develop the new module in isolation, without being tied to the workings of the initial module.
- New module built on existing open source frameworks Spring and Hibernate
- Then we replaced the use of the t2data module in the core code with the replacement module t2reference
- For a couple of weeks this sent ripples all over the code base, as we had to update everything to work with the new module and the new frameworks. This was the downside of developing the new module in isolation: although we thought we had all the code ready, the big job of full integration still remained.
Simple internal refactoring for readability and changeability
When looking at bugs or behaviour in existing code, in particular code written by another developer or dating from some years ago, it can be difficult to understand the code or to feel confident about changing it. One way I personally deal with this is to do simple refactoring supported by the IDE (Eclipse), such as:
- Renaming variables/parameter names
- Converting anonymous inner classes to named inner classes
- Sorting members and reformatting code
After doing this it's easier to write unit tests, to reuse components, or to fix some old bugs. The exercise also gives a better understanding of what the code is meant to do than just reading it would.
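As an illustration of the second item in the list, here is a minimal sketch of converting an anonymous inner class to a named inner class and renaming terse variables. All class, method and variable names are hypothetical and are not taken from the Taverna code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example; none of these names come from the Taverna code base.
class PortSorter {

    // Before: the comparator is an anonymous inner class buried in the method,
    // and the variable names say little about what is being sorted.
    static List<String> sortBefore(List<String> l) {
        List<String> r = new ArrayList<>(l);
        r.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });
        return r;
    }

    // After: the comparator has a name, so it can be unit tested and reused.
    static class CaseInsensitiveNameComparator implements Comparator<String> {
        @Override
        public int compare(String first, String second) {
            return first.compareToIgnoreCase(second);
        }
    }

    static List<String> sortAfter(List<String> portNames) {
        List<String> sorted = new ArrayList<>(portNames);
        sorted.sort(new CaseInsensitiveNameComparator());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("out", "In", "error");
        // Behaviour is unchanged by the refactoring.
        System.out.println(sortBefore(names).equals(sortAfter(names)));
    }
}
```

Both methods behave identically. The gain is only that the comparison logic now has a name and can be tested without going through the sorting method.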
Splitting of API and implementation
Although old-school software system design says you should always build the APIs first using thousands of pages of Word documents, it's often more beneficial to develop the API by building the implementation and unit tests. In some cases we initially had only an implementation class, and extracted the Java interface later when we knew what shape the API would take. This kind of refactoring is supported by IDEs like Eclipse. By doing the split we can reduce coupling, because we put the interface and implementation into different JARs, which also gives us the freedom to upgrade the implementation without affecting 3rd party plugin developers.
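A minimal sketch of what such an extraction might look like follows. The names DataStore, InMemoryDataStore and PluginCode are hypothetical and are not the actual Taverna API. Everything is shown in one file for brevity, whereas in a real split the interface and the implementation would live in different JARs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not the actual Taverna API.

// The extracted interface: the only type that plugin code needs to see.
interface DataStore {
    String register(String value);

    String resolve(String id);
}

// The original implementation class, now implementing the extracted interface.
class InMemoryDataStore implements DataStore {
    private final Map<String, String> values = new HashMap<>();
    private int nextId = 0;

    @Override
    public String register(String value) {
        String id = "ref-" + nextId++;
        values.put(id, value);
        return id;
    }

    @Override
    public String resolve(String id) {
        // Returns null for an unknown id.
        return values.get(id);
    }
}

// Stand-in for 3rd party plugin code: it is written against the interface only.
class PluginCode {
    static String roundTrip(DataStore store, String value) {
        return store.resolve(store.register(value));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new InMemoryDataStore(), "hello"));
    }
}
```

Because PluginCode never names InMemoryDataStore outside of main, the implementation could be swapped for another one without recompiling the plugin side.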
Modularising the build structure
Although it often does not affect the code structure, modularising the code from the build point of view has been very beneficial for us. Taverna supports various types of services depending on how you interface with them, for instance using WSDL (SOAP), HTTP/REST, SQL connections, or even proprietary protocols. The implementations of these services had long been separated into different Java packages, but they still existed as one big Java module (a single JAR). We ran into problems because service A needed a special version of a 3rd party dependency such as Axis, while service B needed the off-the-shelf version. Having both on the same classpath meant we had to choose who "won", which led us to develop our plugin classloader system Raven.
Even before Raven, we had decided to build our system using Maven, and Maven encourages building different modules (JARs) separately. Each module can have its own list of dependencies and its own set of tests, and hence can be developed in isolation from its sibling modules. (These modules would all also depend on our common "core" modules.)
To modularise the code we basically had to create a separate Java project for each module and figure out its dependencies. By doing this we also formalised which 3rd party software was used, and made it possible to use Taverna without pulling in the various service implementations if they were not needed, saving download time and memory footprint, and reducing complexity when a bug arises.
When doing this modularisation we were also forced to define the internal coupling between the different modules, and this gave hints as to where new APIs needed to be extracted. In some cases we ran into circular dependencies, which typically meant that one or both of the modules were too "big", and we had to reduce the coupling by splitting them into several modules. This often also meant we had to introduce API interfaces as a separate module.
For Taverna 2 we used this experience to split every core module into an -api module and an -impl module. In theory, all third party modules, service implementations etc. need only depend on the -api modules, while the -impl modules are provided by the specific version that has been installed. (These can easily be upgraded in a Firefox-style way with a "New updates available" UI.)
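The way an API interface can break a circular dependency between two modules can be sketched as follows. The module names in the comments (engine-api, echo-service, engine-impl) and all type names are hypothetical, chosen only to mirror the -api/-impl naming described above. The three types are shown in one file, but each comment indicates which module it would belong to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical module "engine-api": contains only the interface.
interface ProgressListener {
    void progress(String message);
}

// Hypothetical module "echo-service": depends on engine-api only.
// Before the split it would have called the engine class directly,
// while the engine also called the service: a circular dependency.
class EchoService {
    String invoke(String input, ProgressListener listener) {
        listener.progress("invoking with " + input);
        return input.toUpperCase(Locale.ROOT);
    }
}

// Hypothetical module "engine-impl": depends on engine-api and echo-service.
class Engine implements ProgressListener {
    private final List<String> log = new ArrayList<>();

    @Override
    public void progress(String message) {
        log.add(message);
    }

    String run(String input) {
        // The engine passes itself to the service as a ProgressListener.
        return new EchoService().invoke(input, this);
    }

    List<String> getLog() {
        return log;
    }

    public static void main(String[] args) {
        Engine engine = new Engine();
        System.out.println(engine.run("abc"));
        System.out.println(engine.getLog());
    }
}
```

With this layout the dependencies point one way only: the service knows about the interface module, and the engine implementation knows about both, so neither needs the other's implementation at compile time.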