E-science

Draft

Introduction

e-Science or e-Research denotes data-intensive and IT-intensive collaborative research. This emerging academic practice can be seen as a result of both the need to process huge amounts of diverse data (which led to grid computing) and an increased need for collaboration (and therefore various emerging forms of social computing that support researchers). As Katy Börner puts it: “Let us build Scholarly Marketplaces that are as easy and fun to use as Flickr and YouTube. Instead of sharing photos and videos we will use them to share scientific datasets and algorithms.” [and workflows]. Finally, there is a confluence with the science of science.

Typically, e-Science projects make use of grid computing technologies to implement workflows using web services. “The changing scale and scope of experimental science - with its need for accommodating the growing numbers of research coalitions with continuously changing partners and access to information - require a new research paradigm: (digitally) enhanced science or e-Science.” (Virtual Laboratory for e-Science, retrieved 12:15, 16 June 2010 (UTC)).

e-Science “is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology in 1999 and was used to describe a large funding initiative starting in November 2000. Examples of the kind of science include social simulations, particle physics, earth sciences and bio-informatics” (E-science, Wikipedia, retrieved 12:15, 16 June 2010 (UTC)).

“What is meant by e-Science? In the future, e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists.” (Defining e-Science, National e-Science Centre, retrieved 12:15, 16 June 2010 (UTC)).

e-Science projects usually refer to setups that include the following ingredients:

  • an environment (e.g. desktop software or a web application) that allows researchers to define a workflow
  • an environment (e.g. desktop software or a web application) that allows researchers to execute a workflow (can be the same as above). Execution includes tracking data flow and steps.
  • a repository for sharing and reusing workflows made by others
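The three ingredients above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular e-Science system works: the `Workflow` class, the `execute` and `publish` functions and the in-memory `repository` are all hypothetical names invented for this example. It defines a workflow as an ordered list of steps, executes it while recording what each step received and produced, and shares it through a dictionary that stands in for a repository.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Workflow:
    """A named, ordered list of steps; each step is a (label, function) pair."""
    name: str
    steps: List[tuple] = field(default_factory=list)

    def add_step(self, label: str, func: Callable[[Any], Any]) -> "Workflow":
        self.steps.append((label, func))
        return self  # returning self allows chained definitions


def execute(workflow: Workflow, data: Any) -> tuple:
    """Run the steps in order and record the data flowing through each one."""
    trace = []
    for label, func in workflow.steps:
        result = func(data)
        trace.append({"step": label, "input": data, "output": result})
        data = result
    return data, trace


# A toy in-memory stand-in for a shared workflow repository (assumption).
repository: Dict[str, Workflow] = {}


def publish(workflow: Workflow) -> None:
    repository[workflow.name] = workflow


# 1. Define a workflow.
clean_and_average = (
    Workflow("clean-and-average")
    .add_step("drop missing values", lambda xs: [x for x in xs if x is not None])
    .add_step("compute mean", lambda xs: sum(xs) / len(xs))
)

# 2. Share it, then 3. fetch it from the repository and execute it with tracking.
publish(clean_and_average)
result, trace = execute(repository["clean-and-average"], [1.0, None, 3.0])
print(result)
for entry in trace:
    print(entry["step"], "->", entry["output"])
```

Real systems replace the dictionary with a networked repository and the lambdas with calls to remote services, but the separation between definition, tracked execution and sharing stays the same.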

Scientific workflows

According to Wikipedia, retrieved 12:15, 16 June 2010 (UTC), “A Scientific Workflow System is a specialized form of a workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, in a scientific application. [...] The rising interest in scientific workflow systems has coincided with the rising interest in e-Science technologies and applications and also the rise of interest in Grid computing. The vision of e-Science is that of distributed scientists being able to collaborate on conducting large scale scientific experiments and knowledge discovery applications using distributed computing resources, distributed data sets and distributed devices. Scientific workflow systems play an important role in enabling this vision.”

Put more simply: “A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions of a scientific problem. Scientific workflow systems often provide graphical user interfaces to combine different technologies along with efficient methods for using them, and thus increase the efficiency of the scientists.” (Kepler scientific workflow system, retrieved June 16 2010.)

“In e-Science environments, the support for scientific workflows emerges as a key service for managing experiment data and activities, for prototyping computing systems and for orchestrating the runtime system behaviour. Supporting domain specific applications via a common e-Science infrastructure enables knowledge sharing among different applications, and thus can broaden the range of the application and multiply the impact of scientific research.” (Zhao et al., 2005).

“Scientific workflows have become an increasingly popular paradigm for scientists to formalize and structure complex scientific processes to enable and accelerate many significant scientific discoveries. A scientific workflow is a formal specification of a scientific process, which represents, streamlines, and automates the analytical and computational steps that a scientist needs to go through from dataset selection and integration, computation and analysis, to final data product presentation and visualization. A scientific workflow management system (SWFMS) is a system that supports the specification, modification, execution, failure handling, and monitoring of a scientific workflow using the workflow logic to control the order of executing workflow tasks.” (IEEE 2010 Fourth International Workshop on Scientific Workflows (SWF 2010), Call for papers, retrieved 10:44, 16 June 2010 (UTC)).
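To make the idea of "using the workflow logic to control the order of executing workflow tasks" more concrete, here is a minimal sketch of a scheduler with basic failure handling and monitoring. It is an illustration under simplifying assumptions, not a description of any existing SWFMS: the function `run_workflow`, the task names and the retry policy are hypothetical. Dependencies are ordered with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
    max_attempts: int = 2,
) -> List[tuple]:
    """Execute tasks in dependency order, retry failures, return an event log."""
    log = []
    failed: Set[str] = set()
    for name in TopologicalSorter(depends_on).static_order():
        # Skip a task if any of its prerequisites did not complete.
        if depends_on.get(name, set()) & failed:
            failed.add(name)
            log.append((name, "skipped"))
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                log.append((name, "ok"))
                break
            except Exception as exc:
                log.append((name, f"failed attempt {attempt}: {exc}"))
        else:
            failed.add(name)  # every attempt raised an exception
    return log


# Hypothetical three-step workflow: the analysis step fails once, then succeeds.
calls = {"analyse": 0}


def select_dataset() -> None:
    pass


def analyse() -> None:
    calls["analyse"] += 1
    if calls["analyse"] == 1:
        raise RuntimeError("transient error")


def visualise() -> None:
    pass


tasks = {"select": select_dataset, "analyse": analyse, "visualise": visualise}
depends_on = {"select": set(), "analyse": {"select"}, "visualise": {"analyse"}}

log = run_workflow(tasks, depends_on)
for event in log:
    print(event)
```

The returned log plays the role of monitoring output. A production system would typically add persistent state, parallel execution and richer recovery strategies, none of which are attempted here.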

“Scientific workflows are proving to be the preferred vehicle for computational knowledge extraction and for enabling science at a large scale. Workflows provide a scientist with a useful and flexible method to author complex data analysis pipelines composed of heterogeneous steps ranging from data capture from sensors or computer simulations to data cleaning, to transport and storage, and provide a foundation upon which results can be analyzed and validated.” (Scientific Workflow Workbench for Oceanography, Microsoft, retrieved 10:44, 16 June 2010 (UTC))

There exist many variants of workflow systems. For example, Yu and Buyya (2005) presented a taxonomy of grid workflow systems based on workflow design (workflow structure, workflow model/specification, workflow composition systems), workflow scheduling (architecture, decision making, planning scheme, scheduling strategies), fault tolerance and data movement.

Tool components and standards

(needs to be completed a lot ...)

Modeling
  • According to Yu and Buyya (2005), within language-based modeling, users can specify a workflow by using a markup language. Most markup languages are XML-based (e.g. GridAnt, WSFL, XLANG, BPEL4WS, W3C XML-Pipeline language, and Gridbus Workflow). In most grid systems, so-called workflow languages are also graphical, mostly using either Petri nets (e.g. Grid-Flow, FlowManage, and XRL/Flower) or UML models.
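As a rough illustration of the language-based approach, the sketch below parses a small XML workflow description with Python's standard `xml.etree.ElementTree` module. The XML vocabulary (`workflow`, `task`, `requires`) and the service names are made up for this example; they do not follow GridAnt, WSFL, XLANG, BPEL4WS or any other real workflow language.

```python
import xml.etree.ElementTree as ET

# A made-up XML vocabulary for illustration only; it is not taken from
# any real workflow language.
SPEC = """
<workflow name="sequence-analysis">
  <task id="fetch" service="fetch_sequences"/>
  <task id="align" service="align_sequences">
    <requires task="fetch"/>
  </task>
  <task id="report" service="render_report">
    <requires task="align"/>
  </task>
</workflow>
"""


def parse_workflow(xml_text: str) -> dict:
    """Turn the XML description into a dictionary of tasks and dependencies."""
    root = ET.fromstring(xml_text.strip())
    tasks = {}
    for task in root.findall("task"):
        tasks[task.get("id")] = {
            "service": task.get("service"),
            "requires": [r.get("task") for r in task.findall("requires")],
        }
    return {"name": root.get("name"), "tasks": tasks}


workflow = parse_workflow(SPEC)
print(workflow["name"])
for task_id, info in workflow["tasks"].items():
    print(task_id, info["service"], info["requires"])
```

The point of such a markup is that the workflow becomes data: it can be stored, exchanged and handed to an execution engine, which would read the `requires` entries to decide in which order to call the services.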
Grid middleware
  • Globus Toolkit, UNICORE, Alchemi, ...
Services
  • Open Services Gateway Initiative Framework (OSGi), a module system and service platform for the Java programming language
  • Cyberinfrastructure Shell, an open source, community-driven platform for the integration and utilization of datasets, algorithms, tools, and computing resources.
Specifications

In education

We don't know (yet) if and where tools like Taverna have been used as cognitive tools in educational settings, e.g. in high school biology classes. We imagine that some university students get exposed to such systems.

We also wonder whether Taverna could be adapted to implement educational workflows. Services architecture frameworks like the e-framework do mention this possibility: “Middleware and tools implementing this functionality [workflow management] include: Taverna, Kepler, Triana, DAGMan, GridANT” (Hunter and Dovey, Factoring and Mapping the Research Domain, 2006).

Links and references

General links

R&D teams and other actors

  • myGrid home page. The team produces and uses a suite of tools designed to help e-Scientists get on with science and get on with scientists. The tools support the creation of e-laboratories and have been used in various domains. Tools and infrastructure include the Taverna workbench and myExperiment.

e-Science projects and workflow systems

Repositories and infrastructures
Toolkits
Multi-purpose
Domain-specific

Events

Collections

Articles

  • Bell, Gordon, Tony Hey & Alex Szalay (2009). Beyond the Data Deluge. Science, Vol. 323, no. 5919, pp. 1297-1298.
  • Börner, Katy (2010). Plug-and-Play Macroscopes. Communications of the ACM (preprint).
  • Börner, Katy, Sanyal, Soma & Vespignani, Alessandro (2007). Network Science. In Cronin, Blaise (Ed.), Annual Review of Information Science & Technology (Vol. 41, pp. 537-607), chapter 12, Medford, NJ: Information Today, Inc./American Society for Information Science and Technology (preprint).
  • Critchlow, Terence (undated). Scientific Process Automation Improves Data Interaction. Scientific Computing (white paper), retrieved June 2010.
  • Curcin, V. & Ghanem, M. (2008). Scientific workflow systems - can one size fit all? Cairo International Biomedical Engineering Conference (CIBEC 2008), pp. 1-9. (Access restricted; describes six systems: Discovery Net, Taverna, Triana, Kepler, YAWL and BPEL.)
  • De Roure, D. and C. Goble, "Software Design for Empowering Scientists," IEEE Software, vol. 26, iss. 1, pp. 88-95, 2009.
  • Yu, J. and Buyya, R. (2005a). A taxonomy of scientific workflow systems for grid computing. SIGMOD Rec. 34, 3 (Sep. 2005), 44-49. DOI:10.1145/1084805.1084814
  • Yu, Jia and Rajkumar Buyya (2005b). A Taxonomy of Workflow Management Systems for Grid Computing, Technical Report, GRIDS-TR-2005-1, Grid Computing and Distributed Systems Laboratory, University of Melbourne, Australia, March 10, 2005.