Refactoring scientific workflows for better (re)use

Scientific workflow management systems are increasingly used to specify and manage bioinformatics experiments. Their simple programming model appeals to bioinformaticians, who can use them to specify complex data processing pipelines easily. This model is underpinned by a graph structure, in which nodes represent bioinformatics tasks and links represent the dataflow. For several reasons, the complexity of such graph structures is increasing over time, which may hinder the reuse of scientific workflows.

We focus on effective methods for workflow design, with specific reference to the Taverna workflow model. We argue that one factor contributing to the difficulty of reuse is the presence of certain design "anti-patterns", a term broadly used in business process modelling and program design to denote idiomatic forms that lead to over-complicated designs and should therefore be avoided. Our analysis of a sizeable public collection of Taverna workflows from the myExperiment repository shows that several of them exhibit local redundancies: links and nodes that could be removed without altering the semantics of the workflow (e.g., a processor duplicated several times, each copy taking the same constant value as its only input).
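To make the duplicated-processor anti-pattern concrete, here is a minimal Python sketch of how such redundancy could be detected and merged. It is not DistillFlow's implementation. The workflow model is hypothetical (a dict mapping each processor name to an operation and an ordered tuple of inputs, each input being either a constant or a link from another processor), and the sketch assumes processors are deterministic and side-effect free, so that two processors applying the same operation to the same inputs are interchangeable.

```python
def _resolve(name, renaming):
    """Follow the renaming chain to the processor that finally replaced `name`."""
    while name in renaming:
        name = renaming[name]
    return name


def merge_duplicate_processors(workflow):
    """Merge processors that apply the same operation to the same inputs.

    Hypothetical model: `workflow` maps a processor name to (operation, inputs),
    where `inputs` is an ordered tuple whose items are ("const", value) or
    ("proc", upstream_name). Returns (merged_workflow, renaming); `renaming`
    maps each removed processor to the processor kept in its place.
    Assumes processors are deterministic and side-effect free.
    """
    wf = dict(workflow)
    renaming = {}
    changed = True
    while changed:
        changed = False
        representative = {}
        # Keep the first processor (in name order) for each (operation, inputs) key.
        for name in sorted(wf):
            key = wf[name]
            if key in representative:
                renaming[name] = representative[key]
                del wf[name]
                changed = True
            else:
                representative[key] = name
        # Redirect links that pointed at removed processors; this can expose
        # new duplicates downstream, hence the outer fixpoint loop.
        wf = {
            name: (op, tuple(
                ("proc", _resolve(src, renaming)) if kind == "proc" else (kind, src)
                for kind, src in inputs
            ))
            for name, (op, inputs) in wf.items()
        }
    return wf, {old: _resolve(old, renaming) for old in renaming}
```

For instance, two hypothetical `fetch_sequence` processors fed the same constant accession are merged in the first pass; their downstream consumers then receive identical inputs and are merged in the next pass. Repeating until nothing changes is what lets a local redundancy propagate through the graph.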

There is thus a crucial need for systems able to detect these anti-patterns and remove them automatically, yielding scientific workflows with simpler structures. This is the goal of the DistillFlow approach demonstrated here.

Here we introduce DistillFlow, a Java application that rewrites a complex workflow into a simpler one by removing redundancy (anti-patterns) [1]. A video demo of the tool is available.


DistillFlow detects redundancy (anti-patterns) and rewrites scientific workflows by removing it. To check whether the resulting workflow is a series-parallel (SP) workflow, you can use a separate tool, SPChecker. Alternatively, SPFlow rewrites non-SP scientific workflows into SP workflows while preserving provenance [3].
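As an illustration of the SP property, the sketch below applies the classical reduction test for two-terminal series-parallel graphs: repeatedly merge parallel links between the same pair of nodes and bypass internal nodes with exactly one input and one output; the graph is SP if and only if it collapses to a single link from the entry node to the exit node. It assumes the workflow is given as a DAG of (source, target) links with a single entry and a single exit. SPChecker's exact definition of an SP workflow may differ in details, so treat this as an illustration of the concept, not a reimplementation of that tool.

```python
from collections import Counter


def is_series_parallel(edges):
    """Return True if the DAG given as (u, v) pairs is two-terminal series-parallel.

    Assumes an acyclic graph; a multiset of edges is kept so that parallel
    links created during reduction are counted.
    """
    multi = Counter(edges)
    if not multi:
        return False
    nodes = {n for e in multi for n in e}
    sources = [n for n in nodes if all(v != n for _, v in multi)]
    sinks = [n for n in nodes if all(u != n for u, _ in multi)]
    if len(sources) != 1 or len(sinks) != 1:
        return False
    s, t = sources[0], sinks[0]

    changed = True
    while changed:
        changed = False
        # Parallel reduction: collapse duplicate links between the same pair.
        for e in list(multi):
            if multi[e] > 1:
                multi[e] = 1
                changed = True
        # Series reduction: bypass one internal node with a single input and
        # a single output, then restart so parallel links get merged first.
        for v in nodes - {s, t}:
            ins = [e for e in multi if e[1] == v]
            outs = [e for e in multi if e[0] == v]
            if len(ins) == 1 and len(outs) == 1:
                (u, _), (_, w) = ins[0], outs[0]
                del multi[ins[0]]
                del multi[outs[0]]
                multi[(u, w)] += 1
                nodes = nodes - {v}
                changed = True
                break
    return dict(multi) == {(s, t): 1}
```

A diamond (s to a and b, both to t) reduces to a single link and is SP, whereas the "bridge" graph obtained by adding a link from a to b cannot be reduced and is not.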

Functionalities available (since Aug. 2013):

Check whether or not a Taverna 2 workflow is an SP workflow.
Detect anti-patterns in Taverna 2 workflows that can be merged into a single processor.
Rewrite scientific workflows into new structures with less redundancy (removing as many unnecessary tasks as possible).
Export distilled Taverna 2 files that can be loaded and executed by the Taverna workbench (the current version supports Taverna 2).

Related publications

[1] Distilling structure in Taverna scientific workflows: A refactoring approach. (Sarah Cohen-Boulakia, Jiuqiang Chen, Carole Goble, Paolo Missier, Alan Williams, Christine Froidevaux) In BMC Bioinformatics, 2013.

[2] Distilling scientific workflow structure. (Sarah Cohen-Boulakia, Christine Froidevaux, Carole Goble, Alan Williams, Jiuqiang Chen) In EMBnet Journal, Proc. of the 12th International Workshop on Network Tools and Applications in Biology (NETTAB 2012) (poster), volume 18, 2012.

[3] Scientific workflow rewriting while preserving provenance. (Sarah Cohen-Boulakia, Christine Froidevaux, Jiuqiang Chen) In Proc. of the 8th IEEE International Conference on eScience, 2012.