Taverna workbench



1 Introduction

Taverna is a workflow toolkit that is mainly used in e-science projects, e.g. myExperiment. As of 2010, Taverna has mostly focused on supporting the Life Sciences community (biology, chemistry and medical imaging).

“Taverna allows for the automation of experimental methods through the use of a number of different services (such as Web services) from a very diverse set of domains – from biology, chemistry and medicine to music, meteorology and social sciences. Effectively Taverna allows a scientist with limited computing background and limited technical resources and support to construct highly complex analyses over public and private data and computational resources, all from a standard PC, UNIX box or Apple computer.” (What is Taverna?, retrieved 13:14, 17 June 2010 (UTC)).

“Taverna is an open source family of tools for designing and executing workflows, created by the myGrid project and funded by OMII UK, the EPSRC, BBRC, ESRC, JISC and Microsoft. The workbench includes contributions by the Moby Consortium and the University of Twente. The family consists of the Taverna Engine (the workhorse), and the Taverna Workbench (desktop client) that sits on top of the Engine.” (Workflow Type: Taverna 2, myExperiment, retrieved 17:11, 14 June 2010 (UTC))

See also myExperiment, a repository for scientific workflows and other objects that is integrated with the Taverna workbench, and the Kepler workflow system, a different workflow system.

2 Architecture and tools

2.1 Workflows

Taverna explains its workflow architecture by first quoting the Free on-line dictionary of computing: “2. <job> The set of relationships between all the activities in a project, from start to finish. Activities are related by different types of trigger relation. Activities may be triggered by external events or by other activities.” and then translating this principle into an operational, computational definition: “the definition we adhere to is: the co-ordination of one or more services into a data analysis pipeline. This is treated as one entity, or a workflow.”

Each activity (also called an operation) is implemented as a service that can take inputs and produce outputs. An output can then be linked to the input of another service, and so forth. This principle is illustrated with the following example:

  1. Fetching a DNA sequence, with an identifier as input and a string representing the nucleotide sequence as output
  2. Transcribing this output into an RNA sequence

The first input (here, the DNA sequence identifier) is called the workflow input, and the final output (here, the RNA sequence) is called the workflow output.
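
The two-step example above can be sketched in Python as follows. This is only an illustration of linking one service's output to the next service's input, not Taverna code: the function names, the in-memory `SEQUENCE_DB` table and the identifier `demo:0001` are hypothetical stand-ins for a remote sequence service.

```python
# Hypothetical stand-in for a remote sequence database (assumption, not a real service).
SEQUENCE_DB = {"demo:0001": "ATGGCCATTGTAATGGGCCGC"}


def fetch_dna_sequence(identifier: str) -> str:
    """Service 1: take an identifier and return the nucleotide sequence as a string."""
    return SEQUENCE_DB[identifier]


def transcribe_to_rna(dna: str) -> str:
    """Service 2: transcribe a DNA coding strand into RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def run_workflow(workflow_input: str) -> str:
    """Link the output of service 1 to the input of service 2."""
    dna = fetch_dna_sequence(workflow_input)  # the workflow input enters here
    rna = transcribe_to_rna(dna)              # output of step 1 becomes input of step 2
    return rna                                # the workflow output


if __name__ == "__main__":
    print(run_workflow("demo:0001"))
```

In Taverna the two steps would normally be remote services rather than local functions, but the linking of outputs to inputs follows the same pattern.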

Most operations run on remote machines, typically as so-called web services. “The role of Taverna is to remove the tedious parts generally associated with general data analyses, e.g. removing the need to cut and paste data, press buttons on forms etc. Although Taverna does not access Web pages itself, you can behave as if it is using this approach. Each service (as a replacement for the Web page) is provided by an institute that has some code you can run, without actually having to own or possess the code. All you have to do is point Taverna to the location of the interface to the code. This is typically a URL that contains a WSDL (Web Service Description Language) file. A WSDL file describes how a client program (Taverna) can access a piece of code (located on remote machine).” (What is a workflow, retrieved 13:14, 17 June 2010 (UTC)).
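
To give an idea of what a client can learn from a WSDL file, the following minimal sketch lists the operations declared in a WSDL 1.1 document using only the Python standard library. The embedded WSDL text, the service name `SequenceService` and its operations are invented for illustration; a real WSDL file is fetched from the service's URL and is considerably longer. This is not how Taverna itself reads WSDL, merely a way to see the kind of information such a file carries.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 elements.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, heavily trimmed WSDL document (assumption for illustration only).
EXAMPLE_WSDL = """<definitions name="SequenceService"
             targetNamespace="http://example.org/sequence"
             xmlns="http://schemas.xmlsoap.org/wsdl/">
  <portType name="SequencePortType">
    <operation name="getSequence"/>
    <operation name="transcribe"/>
  </portType>
</definitions>
"""


def list_operations(wsdl_text: str) -> dict[str, list[str]]:
    """Return a mapping from each portType name to the names of its operations."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        names = [op.get("name") for op in port_type.findall("wsdl:operation", WSDL_NS)]
        operations[port_type.get("name")] = names
    return operations


if __name__ == "__main__":
    for port_type, names in list_operations(EXAMPLE_WSDL).items():
        print(port_type, names)
```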

Taverna defines its own workflow definition language. It “is designed to present a workflow model where entities within the workflow map as far as is possible onto entities within a scientist’s description of the eScience process the workflow defines. This leads to a dataflow view of the world – the workflow is constructed from data processing and data transport (processors and data links)”. In contrast, BPEL for example, “is a process-centric model where the nodes in the workflows are activities and the data passed between them form a control system rather than being a genuine flow of messages.” (Why Taverna does not use BPEL as the workflow definition language?, retrieved 17:11, 14 June 2010 (UTC))
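
The dataflow idea (processors connected by data links, with no explicit control flow) can be sketched as below. This is a toy model written for this page under simplifying assumptions, not Taverna's enactor: the function `run_dataflow`, the port names and the sample processors are all hypothetical, and workflow input names are assumed to differ from processor names. The execution order is derived purely from the data links.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(processors, links, workflow_inputs):
    """Run a toy dataflow.

    processors: name -> function taking one keyword argument per input port.
    links: list of (source, sink_processor, sink_port); a source is either
           a workflow input name or a processor name.
    Returns a dict holding the workflow inputs and every processor's output.
    """
    # A processor depends on every processor that feeds one of its ports.
    deps = {name: set() for name in processors}
    for source, sink, _port in links:
        if source in processors:
            deps[sink].add(source)

    values = dict(workflow_inputs)
    # static_order() yields each processor after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        kwargs = {port: values[source] for source, sink, port in links if sink == name}
        values[name] = processors[name](**kwargs)
    return values


# Hypothetical processors and links.
processors = {
    "fetch": lambda identifier: {"demo:0001": "ATGC"}[identifier],
    "transcribe": lambda dna: dna.replace("T", "U"),
    "length": lambda seq: len(seq),
}
links = [
    ("sequence_id", "fetch", "identifier"),  # workflow input -> processor port
    ("fetch", "transcribe", "dna"),          # data link between processors
    ("fetch", "length", "seq"),
]

if __name__ == "__main__":
    print(run_dataflow(processors, links, {"sequence_id": "demo:0001"}))
```

Nothing in the sketch states that `transcribe` runs after `fetch`; that ordering follows from the link between them, which is the point of a dataflow model as opposed to a process-centric one.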

2.2 Taverna tools overview

Taverna comprises three tools, plus a closely tied companion application:

(1) Taverna Workbench provides a desktop authoring environment for scientific workflows and access to the execution engine. The workbench can be called a superclient. In the same way that a web browser can access various contents from the web and a local PC, Taverna allows users to plumb together services available on the web and locally.

(2) Taverna Enactor, the execution engine that takes a Taverna workflow and executes it using the data provided by the user, over the services described within the workflow. Workflows can be executed through the workbench, via the command line, by a remote server, as a service, etc.

(3) Simple Conceptual Unified Flow Language (SCUFL), the Taverna dataflow language. It is encoded in XML and rendered graphically in the Taverna workbench.

(4) In addition, Taverna is strongly tied to myExperiment, a web-based application, built on the Ruby on Rails platform, for sharing Taverna workflows and other scientific objects. myExperiment may be accessed directly through the workbench.

2.3 Taverna workbench

Taverna 2.1.2 workbench screenshot. source: Wikipedia

2.4 The SCUFL language

According to Oinn et al. (2004), the principles behind the Scufl language are captured in the origin of the acronym: simple, conceptual, unified, flow language:

  • Simple. The language aims to be as simple as possible. The target user communities have domain expertise and domain problems to investigate; their interest in programming is secondary.
  • Conceptual. A Scufl workflow should match the users' conceptual model of their problem. Implementation detail should be kept to a minimum, by exploiting techniques such as introspection over service descriptions.
  • Unified. Coordinating web services was the initial motivation. However users want to treat these and other services, which are essentially the same from their perspective (e.g. grid services, local applications), in a unified manner.
  • Flow Language. The basic conceptual model is of a network of processing activities linked by data and control flows.

(Oinn et al., 2004: 438).

The end user (researcher) does not need to know SCUFL, since there is a graphical workflow design editor. Below is a sample SCUFL v0.2 file found in the Taverna Design Guide at cagrid.org (retrieved 14:22, 29 June 2010 (UTC)):

<?xml version="1.0" encoding="UTF-8"?>
 <s:scufl xmlns:s="http://org.embl.ebi.escience/xscufl/0.1alpha" version="0.2" log="0">
  <s:workflowdescription 
     lsid="urn:lsid:www.mygrid.org.uk:operation:W0TIAJ5S3K0"
     author="Tom Oinn" 
     title="Example of a conditional execution workflow">
     If the input is true then the string 'foo' is emited, if false then 'bar'. 
     Just a simple example to show how the monster works, so to speak.</s:workflowdescription>
  <s:processor name="Fail_if_false">
    <s:local>org.embl.ebi.escience.scuflworkers.java.FailIfFalse</s:local>
  </s:processor>
  <s:processor name="Fail_if_true">
    <s:local>org.embl.ebi.escience.scuflworkers.java.FailIfTrue</s:local>
  </s:processor>
  <s:processor name="foo">
    <s:stringconstant>Foo</s:stringconstant>
  </s:processor>
  <s:processor name="bar">
    <s:stringconstant>Bar</s:stringconstant>
  </s:processor>
  <s:processor name="Echo_list">
    <s:local>org.embl.ebi.escience.scuflworkers.java.EchoList</s:local>
  </s:processor>
  <s:link source="condition" sink="Fail_if_true:test" />
  <s:link source="condition" sink="Fail_if_false:test" />
  <s:link source="foo:value" sink="Echo_list:inputlist" />
  <s:link source="bar:value" sink="Echo_list:inputlist" />
  <s:link source="Echo_list:outputlist" sink="result" />
  <s:source name="condition">
    <s:metadata>
      <s:description>Enter the string 'true' or 'false' here to show the conditional branching</s:description>
    </s:metadata>
  </s:source>
  <s:sink name="result" />
  <s:coordination name="bar_BLOCKON_Fail_if_true">
    <s:condition>
      <s:state>Completed</s:state>
      <s:target>Fail_if_true</s:target>
    </s:condition>
    <s:action>
      <s:target>bar</s:target>
      <s:statechange>
        <s:from>Scheduled</s:from>
        <s:to>Running</s:to>
      </s:statechange>
    </s:action>
  </s:coordination>
  <s:coordination name="foo_BLOCKON_Fail_if_false">
    <s:condition>
      <s:state>Completed</s:state>
      <s:target>Fail_if_false</s:target>
    </s:condition>
    <s:action>
      <s:target>foo</s:target>
      <s:statechange>
        <s:from>Scheduled</s:from>
        <s:to>Running</s:to>
      </s:statechange>
    </s:action>
  </s:coordination>
</s:scufl>

The Scufl based workflow description file above is represented in the figure below:

Example Workflow as it would appear in Taverna Workbench, version ???
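
Because SCUFL is plain XML, its structure can be inspected with any XML library. The sketch below uses Python's standard `xml.etree.ElementTree` to list processors, data links and coordination constraints. The embedded text is a trimmed subset of the sample file above, and the function name `summarize_scufl` is made up for this illustration. The reading of a coordination block as "the action target waits until the condition target has completed" is our interpretation of the sample, not an official specification.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample file above.
NS = {"s": "http://org.embl.ebi.escience/xscufl/0.1alpha"}

# Trimmed subset of the sample workflow, for illustration.
SCUFL_FRAGMENT = """<s:scufl xmlns:s="http://org.embl.ebi.escience/xscufl/0.1alpha" version="0.2" log="0">
  <s:processor name="Fail_if_true">
    <s:local>org.embl.ebi.escience.scuflworkers.java.FailIfTrue</s:local>
  </s:processor>
  <s:processor name="bar">
    <s:stringconstant>Bar</s:stringconstant>
  </s:processor>
  <s:link source="condition" sink="Fail_if_true:test" />
  <s:source name="condition" />
  <s:sink name="result" />
  <s:coordination name="bar_BLOCKON_Fail_if_true">
    <s:condition>
      <s:state>Completed</s:state>
      <s:target>Fail_if_true</s:target>
    </s:condition>
    <s:action>
      <s:target>bar</s:target>
      <s:statechange>
        <s:from>Scheduled</s:from>
        <s:to>Running</s:to>
      </s:statechange>
    </s:action>
  </s:coordination>
</s:scufl>
"""


def summarize_scufl(text: str) -> dict:
    """Collect processor names, data links, workflow ports and coordination pairs."""
    root = ET.fromstring(text)
    return {
        "processors": [p.get("name") for p in root.findall("s:processor", NS)],
        "links": [(l.get("source"), l.get("sink")) for l in root.findall("s:link", NS)],
        "sources": [s.get("name") for s in root.findall("s:source", NS)],
        "sinks": [s.get("name") for s in root.findall("s:sink", NS)],
        # Maps each blocked processor to the processor it waits for.
        "blocked_on": {
            c.findtext("s:action/s:target", namespaces=NS):
                c.findtext("s:condition/s:target", namespaces=NS)
            for c in root.findall("s:coordination", NS)
        },
    }


if __name__ == "__main__":
    for key, value in summarize_scufl(SCUFL_FRAGMENT).items():
        print(key, value)
```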

2.5 Web services

Taverna basically includes two types of services:

  • Domain (scientific) services, mostly provided as third-party web services of various kinds, e.g. described in the popular W3C WSDL format or in non-standardized forms such as BioMoby or BioMart services. These services are typically provided by large government-funded institutions like EMBL-EBI in Europe. However, many small services are implemented by individuals or smaller labs. Such smaller providers can use Soaplab, a set of web services for finding an analysis tool, discovering what data it requires and what data it produces, starting it, and obtaining the results. Soaplab is especially well suited for sets of similar tools, such as the European Molecular Biology Open Software Suite (EMBOSS).
  • So-called shim services are used to connect the inputs and outputs of domain services. "Shim" refers to “a small piece of software that is added to an existing system program or protocol in order to provide some enhancement” (PCMag Encyclopedia). Within Taverna, a shim service typically translates between formats. Taverna includes many built-in shim services, but users may have to define their own.
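
As an illustration of the kind of format translation a shim performs, here is a minimal Python sketch that converts a FASTA record into a bare sequence string, so that a service producing FASTA can feed a service expecting plain sequences. The function name and the sample record are hypothetical. Taverna's own shims are configured inside the workbench rather than written like this.

```python
def fasta_to_plain_sequence(fasta_text: str) -> str:
    """Shim: drop FASTA header lines (starting with '>') and join the sequence lines."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


# Hypothetical output of an upstream service that returns FASTA.
upstream_output = ">demo:0001 hypothetical record\nATGGCC\nATTGTA\n"

if __name__ == "__main__":
    # The shim's result can now be passed to a service that expects a bare sequence.
    print(fasta_to_plain_sequence(upstream_output))
```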

3 Examples

Using BioMart and EMBOSS Soaplab services, this workflow retrieves a number of sequences from three species (mouse, human and rat), aligns them, and returns a plot of the alignment result. The corresponding sequence IDs are also returned. Source: http://www.myexperiment.org/workflows/821
This workflow uses the local Java plugins and some filtering operations to fetch the comic strip image from http://xkcd.com/. Source: http://www.myexperiment.org/workflows/824

4 Links

Main web site
Overviews
Tutorials

5 References

  • De Roure, D. and Goble, C. (2009) Software Design for Empowering Scientists. IEEE Software, 26 (1). pp. 88-95. ISSN 0740-7459, PDF Preprint
  • Hull D.; K. Wolstencroft, R. Stevens, C. Goble, M. Pocock, P. Li, and T. Oinn, "Taverna: a tool for building and running workflows of services," Nucleic Acids Research, vol. 34, iss. Web Server issue, pp. 729-732, 2006. PubMed link
  • Kuhn T, Willighagen EL, Zielesny A, Steinbeck C. (2010). CDK-Taverna: an open workflow environment for cheminformatics. BMC Bioinformatics. 11:159. DOI:10.1186/1471-2105-11-159 (open access).
  • Missier, P., Turi, D., Goble, C., Oinn, T. and De Roure, D. (2007) Taverna Workflows: Syntax and Semantics. In: IEEE International Conference on e-Science and Grid Computing, December 2007, Bangalore, India. pp. 441-448.
  • Senger M., Rice P., Bleasby A., Uludag M., Soaplab: Open Source Web Services Framework for Bioinformatics Programs, The 10th Annual Bioinformatics Open Source Conference, 2009. PDF
  • Taylor, I.J; Deelman, E.; Gannon, D.B.; Shields, M. (Eds.) 2007. Workflows for e-Science: Scientific Workflows for Grids
  • Turi. D; P. Missier, C. Goble, D. De Roure, and T. Oinn, "Taverna Workflows: Syntax and Semantics," in IEEE International Conference on e-Science and Grid Computing, 2007, pp. 441-448. PDF
  • Oinn, Tom et al. Delivering Web Service Coordination Capability to Users, WWW 2004, May 17-22, 2004, New York, New York, USA. ACM 1-58113-912-8/04/0005, Abstract/PDF
  • Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock MR, Wipat A, Li P (2004). Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 20(17):3045-3054 PubMed Abstract - Publisher Full Text
  • Oinn, Tom, Mark Greenwood, Matthew Addis, Justin Ferris, Kevin Glover, Carole Goble, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Anil Wipat and Chris Wroe, Taverna – an introduction, mygrid.org.uk
  • Oinn, Tom et al. (2006). Taverna: Lessons in creating a workflow environment for the life sciences, Concurrency and Computation: Practice and Experience 18 (10), p 1067-1100. DOI:10.1002/cpe.993 - PDF Reprint.
  • Wei Tan, Paolo Missier, Ravi Madduri and Ian Foster. Building Scientific Workflow with Taverna and BPEL: a Comparative Study in caGrid. Proc. 3rd e-Science Conference. 2007. PDF Preprint.