Kepler workflow system

Draft

Introduction

Kepler is an e-science tool designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines. Users build workflows through Kepler's graphical interface: components are dragged and dropped onto a workflow canvas, where they can be connected, customized, and then executed.

“Kepler can operate on data stored in a variety of formats, locally and over the internet, and is an effective environment for integrating disparate software components, such as merging "R" scripts with compiled "C" code, or facilitating remote, distributed execution of models. Using Kepler's graphical user interface, users simply select and then connect pertinent analytical components and data sources to create a "scientific workflow"—an executable representation of the steps required to generate results. The Kepler software helps users share and reuse data, workflows, and components developed by the scientific community to address common needs.” (The Kepler Project, retrieved 15:25, 16 June 2010 (UTC))

The Kepler application

“Kepler is a software application for the analysis and modeling of scientific data. Kepler simplifies the effort required to create executable models by using a visual representation of these processes. These representations, or ‘scientific workflows,’ display the flow of data among discrete analysis and modeling components [...] Kepler allows scientists to create their own executable scientific workflows by simply dragging and dropping components onto a workflow creation area and connecting the components to construct a specific data flow, creating a visual model of the analytical portion of their research. Kepler represents the overall workflow visually so that it is easy to understand how data flow from one component to another. The resulting workflow can be saved in a text format, emailed to colleagues, and/or published for sharing with colleagues worldwide” (Getting Started with Kepler, retrieved 15:22, 29 June 2010 (UTC)).

The "Getting Started" guide defines scientific workflows as flexible tools for accessing scientific data (streaming sensor data, medical and satellite images, simulation output, observational data, etc.) and executing complex analysis on the retrieved data. In Kepler: “Each workflow consists of analytical steps that may involve database access and querying, data analysis and mining, and intensive computations performed on high performance cluster computers. Each workflow step is represented by an "actor", a processing component that can be dragged and dropped into a workflow via Kepler's visual interface. Connected actors [and some other components] form a workflow, allowing scientists to inspect and display data on the fly as it is computed, make parameter changes as necessary, and re-run and reproduce experimental results.

Workflows may represent theoretical models or observational analyses; they can be simple and linear, or complex and non-linear. One of the benefits of scientific workflows is that they can be nested, meaning that a workflow can contain ‘sub-workflows’ that perform embedded tasks. A nested workflow (also known as a composite actor) is a reusable component that performs a potentially complex task.” (Getting Started with Kepler, retrieved 15:22, 29 June 2010 (UTC))

Major components:

  • Actors are processing components; each actor represents one step of a workflow.
  • A director controls the execution of a workflow, i.e. it defines when an actor executes. There are several types of directors: for example, the SDF Director is used for sequential workflows, whereas the PN Director is used for parallel workflows.
  • Composite actors are collections or sets of bundled actors, i.e. they represent a sub-workflow.
  • Ports are the inputs and outputs of actors, i.e. they carry the data that actors receive and produce. Actors are connected in a workflow via their ports. There are three types: input ports, output ports and combined input/output ports.
  • The link that represents data flow between one actor port and another actor port is called a channel.
  • Relations allow branching, i.e. data can be sent to multiple actors in a workflow.
  • Parameters are configurable values that can be attached to a workflow or to individual directors or actors.
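
As a rough illustration of how these components fit together, the following MoML sketch wires one source actor to two displays through a single relation. The workflow name, the parameter name stepSize, and the choice of the Ptolemy II classes SDFDirector, Ramp and Display are assumptions made for this example; it is not taken from a Kepler distribution and has not been checked against a particular Kepler release.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE entity PUBLIC "-//UC Berkeley//DTD MoML 1//EN"
    "http://ptolemy.eecs.berkeley.edu/xml/dtd/MoML_1.dtd">
<!-- Hypothetical workflow; all names and class choices are illustrative assumptions. -->
<entity name="MinimalSketch" class="ptolemy.actor.TypedCompositeActor">

  <!-- Director: decides when each actor executes (here: SDF, i.e. sequential). -->
  <property name="SDF Director" class="ptolemy.domains.sdf.kernel.SDFDirector">
    <!-- Parameter attached to the director. -->
    <property name="iterations" value="5"/>
  </property>

  <!-- Parameter attached to the workflow as a whole. -->
  <property name="stepSize" class="ptolemy.data.expr.Parameter" value="2"/>

  <!-- Actor that produces data on its output port. -->
  <entity name="Ramp" class="ptolemy.actor.lib.Ramp">
    <!-- Parameter attached to an actor; it refers to the workflow parameter. -->
    <property name="step" value="stepSize"/>
  </entity>

  <!-- Two actors that consume data on their input ports. -->
  <entity name="Display" class="ptolemy.actor.lib.gui.Display"/>
  <entity name="Display2" class="ptolemy.actor.lib.gui.Display"/>

  <!-- Relation: one output is branched to both displays. -->
  <relation name="relation" class="ptolemy.actor.TypedIORelation"/>
  <link port="Ramp.output" relation="relation"/>
  <link port="Display.input" relation="relation"/>
  <link port="Display2.input" relation="relation"/>
</entity>
```

Each link element attaches one port to the relation, so adding a further consumer only requires one more entity and one more link.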

Ptolemy and the MoML language

“Kepler builds upon the mature Ptolemy II framework, developed at the University of California, Berkeley. Ptolemy II is a software framework developed as part of the Ptolemy project, which studies modeling, simulation, and design of concurrent, real-time, embedded systems.” (Kepler User Manual) Kepler workflows can be exchanged in XML using Ptolemy's Modeling Markup Language (MoML). Here is the DTD for MoML 1, which can be found on the Ptolemy website.

<!-- MoML DTD Version 1.4, UC Berkeley -->
<!-- $Id: MoML_1.dtd,v 1.18 2010/04/02 00:59:29 cxh Exp $ -->
<!-- If you update this file, please also update:  -->
<!--    ptweb/xml/dtd/MoML_1.dtd, -->
<!--    ptII/doc/design/src/MoML.fm, -->
<!--    the static field MoML_DTD_1 in MoMLParser.java. -->
<!-- The top-level element can be either model or class. -->
<!-- NOTE: To ensure backward compatibility with other version 1.x DTDs, -->
<!-- there are quite a few deprecated entries here. The documentation    -->
<!-- does not mention these deprecated entries. -->

<!-- The model element is deprecated.  Use entity instead. -->
<!ELEMENT model (class | configure | deleteEntity | deletePort |
		deleteRelation | director | display | doc | entity | group | import |
                input | link | property | relation | rename | rendition |
		unlink)*>
<!ATTLIST model name CDATA #REQUIRED
                class CDATA #IMPLIED>

<!ELEMENT class (class | configure | deleteEntity | deletePort |
		 deleteRelation | director | display | doc |
                 entity | group | import | input | link | port |
                 property | relation | rename | rendition | unlink)*>
<!ATTLIST class name CDATA #REQUIRED
                extends CDATA #IMPLIED
		source CDATA #IMPLIED>

<!ELEMENT configure (#PCDATA)>
<!ATTLIST configure source CDATA #IMPLIED>

<!ELEMENT deleteEntity EMPTY>
<!ATTLIST deleteEntity name CDATA #REQUIRED>

<!ELEMENT deletePort EMPTY>
<!ATTLIST deletePort name CDATA #REQUIRED>

<!ELEMENT deleteProperty EMPTY>
<!ATTLIST deleteProperty name CDATA #REQUIRED>

<!ELEMENT deleteRelation EMPTY>
<!ATTLIST deleteRelation name CDATA #REQUIRED>

<!-- NOTE: deprecated.  Use property instead. -->
<!ELEMENT director (configure | doc | property)*>
<!ATTLIST director name CDATA "director"
                   class CDATA #REQUIRED>

<!ELEMENT display EMPTY>
<!ATTLIST display name CDATA #REQUIRED>

<!ELEMENT doc (#PCDATA)>
<!ATTLIST doc name CDATA #IMPLIED>

<!ELEMENT entity (class | configure | deleteEntity | deletePort |
		  deleteRelation | director | display | doc |
                  entity | group | import | input | link | port |
                  property | relation | rename | rendition | unlink)*>
<!ATTLIST entity name CDATA #REQUIRED
                 class CDATA #IMPLIED
		 source CDATA #IMPLIED>

<!ELEMENT group ANY>
<!ATTLIST group name CDATA #IMPLIED>

<!-- The import element is deprecated.  Use the source attribute instead. -->
<!ELEMENT import EMPTY>
<!ATTLIST import source CDATA #REQUIRED
                 base CDATA #IMPLIED>

<!-- The base attribute is deprecated. -->
<!ELEMENT input EMPTY>
<!ATTLIST input source CDATA #REQUIRED
                base CDATA #IMPLIED>

<!ELEMENT link EMPTY>
<!ATTLIST link insertAt CDATA #IMPLIED
               insertInsideAt CDATA #IMPLIED
               port CDATA #IMPLIED
               relation CDATA #IMPLIED
               relation1 CDATA #IMPLIED
               relation2 CDATA #IMPLIED
               vertex CDATA #IMPLIED>

<!-- Deprecated.  Use a property instead. -->
<!ELEMENT location EMPTY>
<!ATTLIST location value CDATA #REQUIRED>

<!ELEMENT port (configure | display | doc | property | rename)*>
<!ATTLIST port class CDATA #IMPLIED
               name CDATA #REQUIRED>

<!ELEMENT property (configure | display | doc | property | rename)*>
<!ATTLIST property class CDATA #IMPLIED
                    name CDATA #REQUIRED
                    value CDATA #IMPLIED>

<!ELEMENT relation (configure | display | doc | property | rename | vertex)*>
<!ATTLIST relation name CDATA #REQUIRED
                   class CDATA #IMPLIED>

<!ELEMENT rename EMPTY>
<!ATTLIST rename name CDATA #REQUIRED>

<!-- Deprecated.  Use a property instead. -->
<!ELEMENT rendition (configure | location | property)*>
<!ATTLIST rendition class CDATA #REQUIRED>

<!ELEMENT unlink EMPTY>
<!ATTLIST unlink index CDATA #IMPLIED
                 insideIndex CDATA #IMPLIED
                 port CDATA #REQUIRED
                 relation CDATA #IMPLIED>

<!ELEMENT vertex (configure | display | doc | location | property | rename)*>
<!ATTLIST vertex name CDATA #REQUIRED
                 pathTo CDATA #IMPLIED
                 value CDATA #IMPLIED>
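
Read against the content models above, a composite actor is simply an entity that contains its own port, entity, relation and link elements. The fragment below is a minimal sketch of such nesting, not an excerpt from a real workflow: the names Outer, Doubler and Gain are hypothetical, and the use of the Ptolemy II classes TypedIOPort and Scale (with its factor parameter) is an assumption made for illustration.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE entity PUBLIC "-//UC Berkeley//DTD MoML 1//EN"
    "http://ptolemy.eecs.berkeley.edu/xml/dtd/MoML_1.dtd">
<!-- Hypothetical nested workflow; names and class choices are illustrative assumptions. -->
<entity name="Outer" class="ptolemy.actor.TypedCompositeActor">

  <!-- Composite actor: an entity nested inside another entity (a sub-workflow). -->
  <entity name="Doubler" class="ptolemy.actor.TypedCompositeActor">

    <!-- Ports of the composite; a nested property marks the direction. -->
    <port name="in" class="ptolemy.actor.TypedIOPort">
      <property name="input"/>
    </port>
    <port name="out" class="ptolemy.actor.TypedIOPort">
      <property name="output"/>
    </port>

    <!-- Inner actor that does the actual processing. -->
    <entity name="Gain" class="ptolemy.actor.lib.Scale">
      <property name="factor" value="2"/>
    </entity>

    <!-- Inner relations connect the composite's ports to the inner actor. -->
    <relation name="r1" class="ptolemy.actor.TypedIORelation"/>
    <relation name="r2" class="ptolemy.actor.TypedIORelation"/>
    <link port="in" relation="r1"/>
    <link port="Gain.input" relation="r1"/>
    <link port="Gain.output" relation="r2"/>
    <link port="out" relation="r2"/>
  </entity>
</entity>
```

From the outside, Doubler would be linked like any other actor, through Doubler.in and Doubler.out.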

Examples

View of the workbench
Kepler 1.0 screen capture - Lotka-Volterra predator-prey model
Annotated view of workbench panel
Kepler 1.0 screen capture - Lotka-Volterra predator-prey model
Results (plotted with R)
Kepler 1.0 screen capture - Lotka-Volterra predator-prey model
XML model file
(just the start, the file is much bigger)
<?xml version="1.0" standalone="no"?>
<!DOCTYPE entity PUBLIC "-//UC Berkeley//DTD MoML 1//EN"
"http://ptolemy.eecs.berkeley.edu/xml/dtd/MoML_1.dtd">
<entity name="02-LotkaVolterraPredatorPrey" class="ptolemy.actor.TypedCompositeActor">
  <property name="_createdBy" class="ptolemy.kernel.attributes.VersionAttribute" value="7.0.beta">
  </property>
  <property name="r" class="ptolemy.data.expr.Parameter" value="2">
    <property name="_hideName" class="ptolemy.kernel.util.SingletonAttribute">
    </property>
    <property name="_icon" class="ptolemy.vergil.icon.ValueIcon">
    </property>
    <property name="_smallIconDescription" class="ptolemy.kernel.util.SingletonConfigurableAttribute">
      <configure>
        <svg>
          <text x="20" style="font-size:14; font-family:SansSerif; fill:blue" y="20">-P-</text>
        </svg>
      </configure>
    </property>
    <property name="_editorFactory" class="ptolemy.vergil.toolbox.VisibleParameterEditorFactory">
    </property>
    <property name="_location" class="ptolemy.kernel.util.Location" value="410.0, 50.0">
    </property>
  ........

Links and bibliography

Bibliography

  • Altintas I., Berkley C., Jaeger E., Jones M., Ludäscher B., Mock S. 2004. Kepler: An Extensible System for Design and Execution of Scientific Workflows. System demonstration, 16th Intl. Conf. on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
  • Ludäscher B., Altintas I., Berkley C., Higgins D., Jaeger-Frank E., Jones M., Lee E., Tao J., Zhao Y. 2006. Scientific Workflow Management and the Kepler System. Special Issue: Workflow in Grid Systems. Concurrency and Computation: Practice & Experience 18(10): 1039-1065.
  • Kepler User Manual (version 2). PDF
  • Getting Started with Kepler (version 2) PDF