E-science

From EduTech Wiki

Draft

1 Introduction

e-Science or e-Research usually denotes data-intensive, IT-intensive and collaborative research, but it can also simply refer to research that uses explicitly defined, IT-supported research workflows.

Emerging e-science practices can be seen as the result of (a) the need to process huge amounts of diverse data (which led to grid computing), (b) increased collaboration between labs (and therefore various emerging forms of social computing and workflow modeling for researchers), and (c) what could be called the application of engineering principles to work organization.

One way of looking at e-science is as an effort to make scientific workflows easier to create and to share. “Scientific research is increasingly digital. Some activities, like data analysis, search and simulation, can be accelerated by enabling scientists to write workflows and scripts that automate routine activities. These capture pieces of scientific method that can be shared with others.” (De Roure and Goble, 2009). As Katy Börner puts it: “Let us build Scholarly Marketplaces that are as easy and fun to use as Flickr and YouTube. Instead of sharing photos and videos we will use them to share scientific datasets and algorithms.” [and workflows]. Finally, there is a confluence with science-of-science research.

Typically, eScience projects make use of grid computing technologies to implement workflows using web services. “The changing scale and scope of experimental science - with its need for accommodating the growing numbers of research coalitions with continuously changing partners and access to information - require a new research paradigm: (digitally) enhanced science or e-Science.” (Virtual Laboratory for e-Science, retrieved 12:15, 16 June 2010 (UTC)).

e-Science “is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology in 1999 and was used to describe a large funding initiative starting in November 2000. Examples of the kind of science include social simulations, particle physics, earth sciences and bio-informatics” (E-science, Wikipedia, retrieved 12:15, 16 June 2010 (UTC)).

“What is meant by e-Science? In the future, e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualization back to the individual user scientists.” (Defining e-Science, National e-Science Centre, retrieved 12:15, 16 June 2010 (UTC)).

e-Science projects usually refer to setups that include the following ingredients:

  • an environment (e.g. desktop software or a web application) that allows researchers to define a workflow
  • an environment (e.g. desktop software or a web application) that allows researchers to execute a workflow (can be the same as above). Execution includes tracking data flow and steps.
  • a repository for sharing and reusing workflows made by others
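The three ingredients listed above can be illustrated with a very small sketch. It is not taken from any real e-science system: the step names, the `STEP_LIBRARY` table and the dictionary used as a "repository" are all assumptions made for illustration. The sketch treats a workflow definition as plain data, executes it while recording each step's input and output, and shares it by storing a JSON copy.

```python
import json

# Hypothetical step functions; a real system would wrap tools or web services.
STEP_LIBRARY = {
    "strip": lambda values: [v.strip() for v in values],
    "upper": lambda values: [v.upper() for v in values],
    "sort": lambda values: sorted(values),
}


def define_workflow(name, step_names):
    """Ingredient 1: a workflow definition is plain, serializable data."""
    return {"name": name, "steps": list(step_names)}


def execute_workflow(workflow, data):
    """Ingredient 2: run the steps in order and track the data flow."""
    trace = []
    for step_name in workflow["steps"]:
        result = STEP_LIBRARY[step_name](data)
        trace.append({"step": step_name, "input": data, "output": result})
        data = result
    return data, trace


def publish(repository, workflow):
    """Ingredient 3: store the definition as JSON so that others can reuse it."""
    repository[workflow["name"]] = json.dumps(workflow)


def fetch(repository, name):
    """Load a shared workflow definition back from the repository."""
    return json.loads(repository[name])


# An in-memory dict stands in for a shared repository (assumption).
repository = {}
publish(repository, define_workflow("clean-names", ["strip", "upper", "sort"]))

shared = fetch(repository, "clean-names")
result, trace = execute_workflow(shared, [" kepler", "taverna ", "galaxy"])
print(result)
for entry in trace:
    print(entry["step"], "->", entry["output"])
```

Keeping the definition separate from the execution engine is what makes sharing possible in this sketch: only the definition is stored, and anyone with the same step library can rerun it.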

2 Scientific workflows

In the growing e-science literature, we can find many converging definitions of what "workflow" means in the context of e-science, and in particular in the life sciences. A good overview is provided by Tiwari and Sekhar (2007). E-science workflow modeling languages and systems for other disciplines may need some adaptation.

According to Wikipedia, retrieved 12:15, 16 June 2010 (UTC), “A Scientific Workflow Systems is a specialized form of a workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, in a scientific application. [...] The rising interest in scientific workflow systems has coincided with the rising interest in e-Science technologies and applications and also the rise of interest in Grid computing. The vision of e-Science is that of distributed scientists being able to collaborate on conducting large scale scientific experiments and knowledge discovery applications using distributed computing resources, distributed data sets and distributed devices. Scientific workflow systems play an important role in enabling this vision.” Put more simply: “A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions of a scientific problem. Scientific workflow systems often provide graphical user interfaces to combine different technologies along with efficient methods for using them, and thus increase the efficiency of the scientists.” (Kepler scientific workflow system, retrieved June 16 2010.)

“In e-Science environments, the support for scientific workflows emerges as a key service for managing experiment data and activities, for prototyping computing systems and for orchestrating the runtime system behaviour. Supporting domain specific applications via a common e-Science infrastructure enables knowledge sharing among different applications, and thus can broaden the range of the application and multiply the impact of scientific research.” (Zhao et al, 2005).

“Scientific workflows have become an increasingly popular paradigm for scientists to formalize and structure complex scientific processes to enable and accelerate many significant scientific discoveries. A scientific workflow is a formal specification of a scientific process, which represents, streamlines, and automates the analytical and computational steps that a scientist needs to go through from dataset selection and integration, computation and analysis, to final data product presentation and visualization. A scientific workflow management system (SWFMS) is a system that supports the specification, modification, execution, failure handling, and monitoring of a scientific workflow using the workflow logic to control the order of executing workflow tasks.” (IEEE 2010 Fourth International Workshop on Scientific Workflows (SWF 2010), Call for papers, retrieved 10:44, 16 June 2010 (UTC)).
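To make the role of a workflow management system more concrete, here is a minimal sketch of the responsibilities named in the quotation above: execution in an order controlled by the workflow logic, failure handling, and monitoring. It is an illustration under stated assumptions, not the design of any actual SWFMS. The task names, the retry policy (`max_attempts`) and the log format are invented; the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_attempts=2):
    """Run tasks in dependency order, retrying failures and logging status.

    tasks: task name -> callable taking a dict of upstream results
    dependencies: task name -> set of task names it depends on
    """
    results, log = {}, []
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        for attempt in range(1, max_attempts + 1):
            try:
                results[name] = tasks[name](upstream)
                log.append((name, attempt, "ok"))
                break
            except Exception as error:
                # Failure handling: record the problem, then retry.
                log.append((name, attempt, f"failed: {error}"))
        else:
            raise RuntimeError(f"task {name!r} failed {max_attempts} times")
    return results, log


# Hypothetical tasks: select data, analyze it, present the result.
attempts = {"count": 0}


def flaky_analyze(upstream):
    """Fails on the first call to simulate a temporarily unavailable service."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise IOError("service unavailable")
    values = upstream["select"]
    return sum(values) / len(values)


tasks = {
    "select": lambda upstream: [2.0, 4.0, 6.0],
    "analyze": flaky_analyze,
    "present": lambda upstream: f"mean = {upstream['analyze']:.1f}",
}
dependencies = {"select": set(), "analyze": {"select"}, "present": {"analyze"}}

results, log = run_workflow(tasks, dependencies)
print(results["present"])
for entry in log:
    print(entry)
```

The log doubles as a simple monitoring record: it shows that the analysis step failed once and succeeded on the second attempt before the presentation step ran.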

“Scientific workflows are proving to be the preferred vehicle for computational knowledge extraction and for enabling science at a large scale. Workflows provide a scientist with a useful and flexible method to author complex data analysis pipelines composed of heterogeneous steps ranging from data capture from sensors or computer simulations to data cleaning, to transport and storage, and provide a foundation upon which results can be analyzed and validated.” (Scientific Workflow Workbench for Oceanography, Microsoft, retrieved 10:44, 16 June 2010 (UTC))

“A workflow system (Hollingsworth, 1995) is a holistic unit that defines, manages, and executes workflow processes aided by software. The order of execution is defined by a computer representation of the workflow process logic. Internally, a workflow system uses a Workflow Language or Meta-Languages for process specification (Michael and Jörg, 1999) to define the workflow process logic, to be executed by workflow execution engine or workflow controller. Visual representation of the workflow process logic is generally carried out using a Graphical User Interface where different types of nodes (data transformation point) or software components are available for connection through edges or pipes that define the workflow process. Graphical User Interfaces provide drag and drop utility for creating an abstract workflow, also known as “visual programming”. The anatomy of a workflow node or component [...] is basically defined by three parameters: (1) input metadata, (2) transformation rules, algorithms or user parameters, (3) output metadata. Nodes can be plugged together only if the output of one, previous (set of) node(s) represents the mandatory input requirements of the following node.” Tiwari and Sekhar (2007)
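The node anatomy described by Tiwari and Sekhar can be sketched as a small data structure. The following is only an illustration: the class name `Node`, the function `can_connect` and the metadata labels (such as `dna_sequence`) are assumptions, and metadata is reduced to a set of type labels. The sketch shows the three parameters of a node and the rule that nodes can be plugged together only when the upstream outputs cover the mandatory inputs of the next node.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    name: str
    input_metadata: frozenset   # (1) data types the node requires
    transform: Callable         # (2) transformation rule or algorithm
    output_metadata: frozenset  # (3) data types the node produces


def can_connect(upstream, downstream):
    """True if the upstream outputs cover every mandatory downstream input."""
    provided = frozenset().union(*(node.output_metadata for node in upstream))
    return downstream.input_metadata <= provided


# Hypothetical nodes with made-up metadata labels.
fetch = Node(
    "fetch_sequence", frozenset(), lambda: "ATGC", frozenset({"dna_sequence"})
)
transcribe = Node(
    "transcribe",
    frozenset({"dna_sequence"}),
    lambda dna: dna.replace("T", "U"),
    frozenset({"rna_sequence"}),
)
plot = Node(
    "plot_expression",
    frozenset({"expression_table"}),
    lambda table: f"figure with {len(table)} rows",
    frozenset({"figure"}),
)

print(can_connect([fetch], transcribe))  # fetch provides the required DNA input
print(can_connect([fetch], plot))        # fetch provides no expression table
print(transcribe.transform(fetch.transform()))
```

A graphical editor would typically run a check of this kind when the user draws an edge between two nodes, so that incompatible connections are rejected at design time instead of failing during execution.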

There exist many variants of workflow systems. For example, Yu and Buyya (2005) presented a taxonomy of grid workflow systems based on workflow design (workflow structure, workflow model/specification, workflow composition systems), workflow scheduling (architecture, decision making, planning scheme, scheduling strategies), fault tolerance and data movement.

3 Tool components and standards

(This section is far from complete; see also workflow and business process modeling.)

According to Tiwari and Sekhar (2007), “Workflow systems can be data-intensive, computation-intensive, analysis intensive, visualization-intensive (e.g., visualization pipeline systems such as AVS, OpenDX, and SCIRun), process-intensive or a combination of one or more traits. There are many common desired traits (Ludäscher et al., 2005) of workflow systems like seamless access to resources and services, service composition and reuse, scalability, detached execution, reliability and fault-tolerance, semantic binding, process provenance and data provenance.”

Modeling
  • According to Yu and Buyya (2005), within language-based modeling, users can specify a workflow by using a markup language. Most markup languages are XML-based (e.g. GridAnt, WSFL, XLANG, BPEL4WS, W3C XML-Pipeline language, and Gridbus Workflow). In most grid systems, workflow languages also have a graphical form, mostly based on either Petri nets (e.g. Grid-Flow, FlowManage, and XRL/Flower) or UML models.
  • We also expect business process modeling languages such as BPMN to gain popularity in the future - Daniel K. Schneider 13:17, 29 June 2010 (UTC)
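As an illustration of language-based modeling, the sketch below reads a workflow written in a small XML vocabulary. The vocabulary is purely hypothetical: the element and attribute names (`workflow`, `task`, `depends`, `on`) are invented for this example and do not follow BPEL or any of the languages named above. Parsing uses `xml.etree.ElementTree` from the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: not the syntax of BPEL or of any real workflow language.
WORKFLOW_XML = """
<workflow name="sequence-stats">
  <task id="load"/>
  <task id="filter"><depends on="load"/></task>
  <task id="report"><depends on="filter"/></task>
</workflow>
"""


def parse_workflow(xml_text):
    """Return the workflow name and a mapping of task id -> list of dependencies."""
    root = ET.fromstring(xml_text)
    tasks = {}
    for task in root.findall("task"):
        tasks[task.get("id")] = [dep.get("on") for dep in task.findall("depends")]
    return root.get("name"), tasks


name, tasks = parse_workflow(WORKFLOW_XML)
print(name)
print(tasks)
```

The resulting dependency mapping is the kind of structure an execution engine could consume, while a graphical editor could render the same XML as nodes and edges.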
Grid middleware
  • Globus toolkit, UNICORE, Alchemi, ....
Services
  • Open Services Gateway Initiative Framework (OSGi), a module system and service platform for the Java programming language
  • Cyberinfrastructure Shell, an open source, community-driven platform for the integration and utilization of datasets, algorithms, tools, and computing resources.

4 In education

We don't know (yet) if and where tools like Taverna have been used as cognitive tools in educational settings, e.g. in high school biology classes. We imagine that some university students get exposed to such systems.

We also wonder whether Taverna could be adapted to implement educational workflows. Service architecture frameworks like the e-framework do mention this possibility: “Middleware and tools implementing this functionality [workflow management] include: Taverna, Kepler, Triana, DAGMan, GridANT” (Hunter and Dovey, Factoring and Mapping the Research Domain 2006).

5 Links and references

5.1 Introductions

5.2 R&D teams and other actors

  • myGrid home page. The team produces and uses a suite of tools designed to help e-Scientists get on with science and get on with scientists. The tools support the creation of e-laboratories and have been used in various domains. Tools and infrastructure include the Taverna workbench and myExperiment.

5.3 e-science projects and workflow systems

Repositories and infrastructures
Toolkits
  • Taverna workbench is a workflow toolkit that is mainly used for e-science projects in the Life Sciences community (biology, chemistry and medical imaging).
  • Kepler workflow system is an e-science tool designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines.
  • Galaxy
  • Triana, an open source problem solving environment developed at Cardiff University that combines an intuitive visual interface with data analysis tools (last updated in April 2007).
  • Kepler scientific workflow system (Wikipedia)
  • Discover on the Net (dead?) - Discovery Net (Wikipedia)
  • OMII-BPEL brings an industrial standard, BPEL (Business Process Execution Language), to scientific workflow modeling and Grid services orchestration. It provides an integrated environment in which to model, execute and monitor scientific workflows that are expressed in BPEL. OMII-BPEL comprises two parts: 'BPEL Designer' is used for process modelling, and 'ActiveBPEL' is used for process execution and monitoring. See also Modelling, monitoring, executing scientific workflows with BPEL (OMII-BPEL)
Multi-purpose
Domain-specific

5.4 Events

5.5 Journals and collections

5.6 Articles

  • Bell, Gordon; Tony Hey, Alex Szalay, Beyond the Data Deluge (2009). Science, Vol. 323. no. 5919, pp. 1297 – 1298. Abstract/HTML full text
  • Börner, Katy (2010). Plug-and-Play Macroscopes, Communications of the ACM. preprint.
  • Börner, Katy, Sanyal, Soma & Vespignani, Alessandro. (2007). Network Science. In Cronin, Blaise (Eds.), Annual Review of Information Science & Technology (Vol. 41, pp. 537-607), chapter 12, Medford, NJ: Information Today, Inc./American Society for Information Science and Technology. PDF preprint
  • Critchlow, Terence (undated), Scientific Process Automation Improves Data Interaction, Scientific Computing (white paper), retrieved June 2010.
  • Curcin, V., Ghanem, M. (2008). Scientific workflow systems - can one size fit all?, Biomedical Engineering Conference, 2008. CIBEC 2008. Cairo International, p 1 - 9 ((Access restricted) - preprint, describes six systems: Discovery Net, Taverna, Triana, Kepler, Yawl and BPEL.)
  • De Roure, D. and C. Goble (2009). "Software Design for Empowering Scientists," IEEE Software, vol. 26 (1), pp. 88-95, 2009. PDF Preprint
  • Hollingsworth, D. (1995). The workflow reference model. Technical Report (WFMC-TC00-1003), Workflow Management Coalition (http://www.wfmc.org/). See Business process modeling for a short description of this model.
  • Michael, M.Z. and B. Jörg (1999). Workflow process definition language - development and directions of a meta-language for workflow processes, Proceedings of the 1st KnowTech Forum.
  • Neerincx PB, Leunissen JA. (2005). Evolution of web services in bioinformatics. Brief Bioinform.6(2):178-88.
  • Taylor, I.J; Deelman, E.; Gannon, D.B.; Shields, M. (Eds.) 2007. Workflows for e-Science: Scientific Workflows for Grids
  • Tiwari, Abhishek and Arvind K.T. Sekhar (2007). Workflow based framework for life science informatics, Computational Biology and Chemistry, Volume 31, Issues 5-6, October 2007, Pages 305-319, ISSN 1476-9271, DOI:10.1016/j.compbiolchem.2007.08.009
  • Zhao, Zhiming; Adam Belloum, Peter Sloot and Bob Hertzberger (2005). Agent Technology and Generic Workflow Management in an e-Science Environment, in Zhuge, H. & Fox, G. (eds): Grid and Cooperative Computing, Springer, 480-485.DOI: 10.1007/11590354_61 (Access restricted) - PDF Preprint.
  • Yu, J. and Buyya, R. (2005a). A taxonomy of scientific workflow systems for grid computing. SIGMOD Rec. 34, 3 (Sep. 2005), 44-49. DOI:10.1145/1084805.1084814
  • Yu, Jia and Rajkumar Buyya (2005b). A Taxonomy of Workflow Management Systems for Grid Computing, Technical Report, GRIDS-TR-2005-1, Grid Computing and Distributed Systems Laboratory, University of Melbourne, Australia, March 10, 2005. PDF