Taverna workbench

Draft

Introduction

Taverna is a workflow toolkit that is mainly used in e-science projects, e.g. myExperiment. As of 2010, Taverna has mostly focused on supporting the Life Sciences community (biology, chemistry and medical imaging).

“Taverna allows for the automation of experimental methods through the use of a number of different services (such as Web services) from a very diverse set of domains – from biology, chemistry and medicine to music, meteorology and social sciences. Effectively Taverna allows a scientist with limited computing background and limited technical resources and support to construct highly complex analyses over public and private data and computational resources, all from a standard PC, UNIX box or Apple computer.” (What is Taverna?, retrieved 13:09, 17 June 2010 (UTC)).

“Taverna is an open source family of tools for designing and executing workflows, created by the myGrid project and funded by OMII UK, the EPSRC, BBRC, ESRC, JISC and Microsoft. The workbench includes contributions by the Moby Consortium and the University of Twente. The family consists of the Taverna Engine (the workhorse), and the Taverna Workbench (desktop client) that sits on top of the Engine.” (Workflow Type: Taverna 2, myExperiment, retrieved 17:11, 14 June 2010 (UTC))

See also myExperiment, a repository for scientific workflows and other objects that is integrated with the Taverna workbench.

Architecture and tools

Taverna explains its workflow architecture by first quoting the Free On-line Dictionary of Computing: “2. <job> The set of relationships between all the activities in a project, from start to finish. Activities are related by different types of trigger relation. Activities may be triggered by external events or by other activities.” It then translates this principle into an operational, computational definition: “the definition we adhere to is: the co-ordination of one or more services into a data analysis pipeline. This is treated as one entity, or a workflow.”

Each activity (also called an operation) is implemented as a service that takes inputs and produces outputs. An output can then be linked to the input of another service, and so forth. This principle is illustrated with the following example:

  1. Fetching a DNA sequence: the input is an identifier and the output is a string representing the nucleotide sequence
  2. Transcribing this output into an RNA sequence

The first input (the DNA sequence identifier in our case) is called the workflow input and the final output (the RNA sequence) is called the workflow output.
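The two-step example above can be sketched in a few lines of Python. This is only an illustration of chaining a workflow input through two services to a workflow output, not Taverna code: the `SEQUENCE_DB` lookup table, the identifier `demo-1`, and the function names are hypothetical, and transcription is simplified to replacing T with U on the coding strand.

```python
# Hypothetical in-memory stand-in for a remote sequence database.
SEQUENCE_DB = {"demo-1": "ATGGCCATTGTAATGGGCCGC"}


def fetch_dna(identifier: str) -> str:
    """Service 1: map a sequence identifier to a nucleotide string."""
    return SEQUENCE_DB[identifier]


def transcribe(dna: str) -> str:
    """Service 2: simplified transcription, replacing thymine (T) with uracil (U)."""
    return dna.upper().replace("T", "U")


def workflow(identifier: str) -> str:
    """Link the output of service 1 to the input of service 2."""
    return transcribe(fetch_dna(identifier))


if __name__ == "__main__":
    # "demo-1" is the workflow input; the printed RNA string is the workflow output.
    print(workflow("demo-1"))
```

In a real workflow each function would be a call to a remote service rather than a local lookup.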

Most operations run on other machines, typically as so-called web services. “The role of Taverna is to remove the tedious parts generally associated with general data analyses, e.g. removing the need to cut and paste data, press buttons on forms etc. Although Taverna does not access Web pages itself, you can behave as if it is using this approach. Each service (as a replacement for the Web page) is provided by an institute that has some code you can run, without actually having to own or possess the code. All you have to do is point Taverna to the location of the interface to the code. This is typically a URL that contains a WSDL (Web Service Description Language) file. A WSDL file describes how a client program (Taverna) can access a piece of code (located on remote machine).” (What is a workflow, retrieved 13:09, 17 June 2010 (UTC)).
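To give an idea of what a client learns from a WSDL file, the following sketch lists the operations declared in a WSDL 1.1 document using only the Python standard library. The embedded document is a hypothetical, heavily trimmed example (the service name, the `example.org` namespace and the operation names are invented for illustration); a real WSDL file also declares messages, bindings and service endpoints, and this is not how Taverna itself reads WSDL.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 elements.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, heavily trimmed WSDL 1.1 document (illustration only).
EXAMPLE_WSDL = """<?xml version="1.0"?>
<definitions name="SequenceService"
             targetNamespace="http://example.org/sequence"
             xmlns="http://schemas.xmlsoap.org/wsdl/">
  <portType name="SequencePortType">
    <operation name="fetchSequence"/>
    <operation name="transcribeSequence"/>
  </portType>
</definitions>
"""


def list_operations(wsdl_text: str) -> dict[str, list[str]]:
    """Return the operation names declared under each portType."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        names = [op.get("name") for op in port_type.findall("wsdl:operation", WSDL_NS)]
        operations[port_type.get("name")] = names
    return operations


if __name__ == "__main__":
    for port_type, names in list_operations(EXAMPLE_WSDL).items():
        print(port_type, names)
```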

Taverna defines its own workflow definition language. It “is designed to present a workflow model where entities within the workflow map as far as is possible onto entities within a scientist’s description of the eScience process the workflow defines. This leads to a dataflow view of the world – the workflow is constructed from data processing and data transport (processors and data links)”. In contrast, BPEL for example, “is a process-centric model where the nodes in the workflows are activities and the data passed between them form a control system rather than being a genuine flow of messages.” (Why Taverna does not use BPEL as the workflow definition language?, retrieved 17:11, 14 June 2010 (UTC))
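The dataflow view (processors connected by data links) can be illustrated with a small Python sketch. It is not Taverna's workflow language or engine: the `run_dataflow` function, the processor names and the link format are assumptions made for this example. The point is that execution order is not written down anywhere; it follows from the data links alone.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_dataflow(processors, links, workflow_inputs):
    """Run each processor once all of its incoming data links carry a value.

    processors: name -> function whose keyword arguments are its input ports
    links: (source, target_processor, target_port) tuples, where source is
           a workflow input name or a processor name (names assumed distinct)
    """
    # A processor depends on every processor that feeds one of its ports.
    graph = {name: set() for name in processors}
    for source, target, _port in links:
        if source in processors:
            graph[target].add(source)

    values = dict(workflow_inputs)
    for name in TopologicalSorter(graph).static_order():
        kwargs = {port: values[src] for src, tgt, port in links if tgt == name}
        values[name] = processors[name](**kwargs)
    return values


# Hypothetical processors: normalise a sequence, then compute its GC fraction.
PROCESSORS = {
    "to_upper": lambda text: text.upper(),
    "gc_count": lambda seq: seq.count("G") + seq.count("C"),
    "length": lambda seq: len(seq),
    "gc_fraction": lambda gc, total: gc / total,
}

# Data links: "raw" is a workflow input, all other sources are processors.
LINKS = [
    ("raw", "to_upper", "text"),
    ("to_upper", "gc_count", "seq"),
    ("to_upper", "length", "seq"),
    ("gc_count", "gc_fraction", "gc"),
    ("length", "gc_fraction", "total"),
]

if __name__ == "__main__":
    result = run_dataflow(PROCESSORS, LINKS, {"raw": "atgc"})
    print(result["gc_fraction"])
```

Here `gc_count` and `length` have no link between them, so either may run first; a process-centric language would require stating that order explicitly.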

Taverna Workbench provides a desktop authoring environment and execution engine for scientific workflows. Workflows can also be executed via the command line, by a remote server, as a service, etc.

Taverna 2.1.2 workbench screenshot. Source: Wikipedia, http://en.wikipedia.org/wiki/File:Dragon-workflow.png

Taverna basically includes two types of services:

  • Domain (scientific) services, mostly provided as third-party web services, either in the popular WSDL format or in non-standardized forms such as BioMoby or BioMart services. These services are typically provided by large government-funded institutions like EMBL-EBI in Europe, but many small services implemented by individuals or smaller labs also exist. To provide services, these smaller actors can use Soaplab, a set of Web Services for finding an analysis tool, discovering what data it requires and what data it produces, starting it, and obtaining results. Soaplab is especially well suited for sets of similar tools, such as the European Molecular Biology Open Software Suite (EMBOSS).
  • So-called shim services, used to connect the inputs and outputs of domain services. "Shim" refers to “a small piece of software that is added to an existing system program or protocol in order to provide some enhancement” (PCMag Encyclopedia). Within Taverna, a shim service typically translates between formats. Taverna includes many built-in shim services, but users may have to define their own.
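As an illustration of the kind of format translation a shim performs, the following Python sketch converts a single FASTA record into a bare sequence string, so that a service emitting FASTA can feed a service expecting plain sequence text. The function name `fasta_to_plain` is hypothetical and this is not one of Taverna's built-in shims; it assumes the input holds exactly one record.

```python
def fasta_to_plain(fasta_text: str) -> str:
    """Shim: drop FASTA header lines (starting with '>') and join the sequence lines."""
    lines = (line.strip() for line in fasta_text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


if __name__ == "__main__":
    record = ">seq1 demo record\nATGG\nCCAT\n"
    print(fasta_to_plain(record))
```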

Examples

Using BioMart and EMBOSS Soaplab services, this workflow retrieves a number of sequences from three species (mouse, human and rat), aligns them, and returns a plot of the alignment result. The corresponding sequence IDs are also returned. Source: http://www.myexperiment.org/workflows/821
This workflow uses the local Java plugins and some filtering operations to fetch the comic strip image from http://xkcd.com/. Source: http://www.myexperiment.org/workflows/824

Links

Main web site
Overviews
Tutorials

References

  • Missier, P., Turi, D., Goble, C., Oinn, T. and De Roure, D. (2007) Taverna Workflows: Syntax and Semantics. In: IEEE International Conference on e-Science and Grid Computing, December 2007, Bangalore, India. pp. 441-448.
  • Senger M., Rice P., Bleasby A., Uludag M., Soaplab: Open Source Web Services Framework for Bioinformatics Programs, The 10th Annual Bioinformatics Open Source Conference, 2009. PDF
  • Turi. D; P. Missier, C. Goble, D. De Roure, and T. Oinn, "Taverna Workflows: Syntax and Semantics," in IEEE International Conference on e-Science and Grid Computing, 2007, pp. 441-448. PDF
  • Oinn, Tom et al. (2000). Taverna: Lessons in creating a workflow environment for the life sciences, Concurrency Computat.: Pract. Exper. 1-7. PDF Reprint.
  • Wei Tan, Paolo Missier, Ravi Madduri and Ian Foster. Building Scientific Workflow with Taverna and BPEL: a Comparative Study in caGrid. Proc. 3rd e-Science Conference. 2007. PDF Preprint.