The Workflow Runner API is a way to expose a service that can run workflows of a particular type. In short, a workflow run is exposed as an RO following a subset of the RODL API, but with a formalized structure to organize things such as workflow inputs, outputs and run status.

Status

The first implementation of this API is made to interface the Taverna Server, and is under development. A deployment of the latest snapshot is available at http://sandbox.wf4ever-project.org/runner/default/ - which accesses http://sandbox.wf4ever-project.org/taverna-server/

API function overview

Research Objects, for the purpose of Wf4Ever, will generally contain workflows. In order to assess if a workflow is functional, it is generally useful to be able to (re)-execute a workflow.

Different workflow systems have different ways of running a workflow. For instance, Taverna has the Taverna Server, while Wings has a portal and a Pegasus/Condor engine in the backend. This API intends to provide a common lightweight interface within Wf4Ever for features such as "Run this workflow please" and "Show me the data from that workflow run".

At its heart, this API mirrors the RODL API, but the ROs exposed by this service each represent a particular workflow run, structured to show inputs, outputs, console logs, provenance and annotations containing wfprov and wfdesc mappings. It should therefore be possible to use existing RODL-compatible tools with this service, for instance adding from the RO command line tool, browsing with the Portal or transforming to wfdesc using the Workflow Transformer service.

API usage

Accessing the root of the service, in this specification exemplified as http://example.com/runner, SHOULD redirect to a default server runs resource. From here the client may either:

  • POST a new workflow run, providing as a minimum the workflow definition
  • GET a list of existing workflow runs
  • DELETE existing workflow runs

Navigating the workflow runs would allow inspection of workflow status, outputs and other resources exposed by the underlying workflow server.

A client may also create a new run by uploading a workflow definition, providing inputs and initiating the workflow run.

See Resources and formats below for details.

Link relations

Resources are located using specific properties in the RO manifest for the workflow run.

Property

Description

runner:workflow

Used in the workflow run description to link the workflow run with the main workflow to run, such as uploaded on RO creation. It is a subproperty of ore:aggregates.

runner:inputs

Used in the workflow run description to link the workflow run with the list of required workflow inputs, if any. It is a subproperty of ore:aggregates.

runner:outputs

Used in the workflow run description to link the workflow run with the list of (expected or actual) workflow outputs, if any. It is a subproperty of ore:aggregates.

runner:logs

Used in the workflow run description to link the workflow run with the list of logs, such as stdout, if any. It is a subproperty of ore:aggregates.

runner:provenance

Used in the workflow run description to link the workflow run with the list of provenance related resources, if any. It is a subproperty of ore:aggregates.

runner:workingDirectory

Used in the workflow run description to link the workflow run with the list of working directory and its files, if any. It is a subproperty of ore:aggregates.

Resources and formats

All formats are based on RDF in text/turtle and application/rdf+xml (by content negotiation) unless noted otherwise.

The resource types are listed below. Specifically, a compliant implementation of the Workflow runner API SHOULD support:

Resource type

Description

Workspace

Represents a list of workflow runs, similar to how an RO service specifies a list of research objects. The only format available is text/uri-list, which returns a list of URIs that SHOULD point to research objects representing workflow runs.

Workflow run

A workflow run is represented as a research object and as such it shares the format of the research object as defined in the RO API. The preferred format is RDF; the support for ZIP and HTML formats is optional. The RDF format may be subject to content negotiation.

Workflow

The workflow as posted by the creator. It may be a workflow description as an RDF file (format subject to content negotiation) or the actual workflow file, such as application/vnd.taverna.t2flow+xml in case of a Taverna 2 workflow.

Workflow status

A one-element list of URIs, in which the URI is one of predefined values indicating the status of the workflow run. The format is text/uri-list.

Inputs

Any resource that has been submitted as an input to the workflow run. When submitting an input, it is possible to specify an external reference by using a "text/uri-list" format.

Outputs

Any outputs generated by the workflow run. Special formats can be used to indicate an error in generating the specific output, such as application/vnd.wf4ever.runner.error.

Provenance

An ro:Folder aggregating provenance resources.

Working Directory

An ro:Folder whose content will be (or was) the current directory (./) when running the workflow.

Logs

An ro:Folder aggregating the log files.

Finding default workspace

HEAD or GET on this entry point SHOULD redirect to a workspace of workflow runs on the default server:

The returned location MUST point to a workspace (see Retrieve runs in workspace below).

The service MAY return 405 Method Not Allowed if it has no default server, in which case it MUST support browsing of explicit servers (see below).

Browsing other workflow servers

The service MAY support browsing workflow servers other than the default, by way of POSTing a text/uri-list specifying the service.

The returned location MUST point to a workspace of workflow runs.

The service SHOULD return 400 Bad Request if more than one URI was included, or the URI was malformed.

This specification does not require any particular URI templates for the redirection. It is an implementation detail how the Workflow Runner service relates the request to the actual, underlying workflow execution service.

Clients MUST ensure that the submitted URI is encoded according to RFC 3986, for instance http://example.net/fred%20and%20me/ rather than http://example.net/fred and me/.

Servers MAY use the submitted URI as a basis for constructing the returned URL, but MUST then ensure that it is likewise properly escaped.
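
The encoding requirement above can be sketched in Python. This is a minimal illustration, not part of the specification: the helper names `encode_uri` and `uri_list_body` are hypothetical, and the sketch assumes that any `%` already in the input starts a valid escape sequence (a literal `%` would need to be escaped by the caller first).

```python
from urllib.parse import quote

# Characters kept unescaped: RFC 3986 reserved characters plus "%", so that
# escape sequences already present in the input are not encoded twice.
_SAFE = ":/?#[]@!$&'()*+,;=%~"


def encode_uri(raw: str) -> str:
    """Percent-encode characters that may not appear unescaped in a URI."""
    return quote(raw, safe=_SAFE)


def uri_list_body(uris) -> bytes:
    """Build a text/uri-list body: one encoded URI per CRLF-terminated line."""
    return "".join(encode_uri(u) + "\r\n" for u in uris).encode("ascii")


if __name__ == "__main__":
    print(encode_uri("http://example.net/fred and me/"))
```

Keeping `%` in the safe set makes the helper idempotent, so a server can apply it to a URI that the client has already encoded without changing it.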

Retrieve runs in workspace

The list of server runs is represented as a RODL workspace, where each RO represents a run.

Each URI returned, if any, SHOULD point to a research object representing a workflow run.

Submit new run to workspace

Creating a new run is similar to creating a new research object, but requires the content-type text/uri-list to include the URL for the workflow definition to run.

The returned location refers to a research object representing the run.

The client MAY provide the Slug: header to suggest a name to include in the created run, which the service MAY support. The service SHOULD ensure the returned run URI is unique, even if multiple POSTs submit the same workflow URL.
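
As an illustration of the request described above, the following sketch builds (but does not send) such a POST with the Python standard library. The function name `new_run_request` and the example URLs are assumptions made for this sketch; the workflow URL is assumed to be already encoded according to RFC 3986.

```python
from urllib.request import Request


def new_run_request(workspace_url, workflow_url, slug=None):
    """Build a POST that asks the workspace to create a run for workflow_url."""
    headers = {"Content-Type": "text/uri-list"}
    if slug is not None:
        # Optional name suggestion; the service may ignore it.
        headers["Slug"] = slug
    body = (workflow_url + "\r\n").encode("ascii")
    return Request(workspace_url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = new_run_request(
        "http://example.com/runner/default/",          # hypothetical workspace
        "http://example.org/workflows/hello.t2flow",   # hypothetical workflow
        slug="hello",
    )
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would submit it; the `Location` header of the response then refers to the research object representing the run.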

The service SHOULD attempt to retrieve the provided workflow definition before responding to the request.

The service SHOULD NOT start running the workflow immediately, but wait for the client to modify its status. (See below).

The service SHOULD fail with 502 Bad Gateway if it is unable to retrieve the submitted workflow definition due to network issues or HTTP errors (including 404), or 504 Gateway Timeout if the request for the definition timed out. The service SHOULD include an error message in the response body to indicate the nature of this failure.

The service SHOULD fail with 501 Not Implemented if the service did successfully retrieve the workflow definition, but the underlying workflow server does not support its format. The server MAY include an error message in the response body to indicate supported workflow definition formats and/or media types.
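
A service implementation might centralise the failure cases above in one lookup. The sketch below is only one possible arrangement; the `FetchFailure` enum and `failure_response` function are hypothetical names, and the message texts are placeholders rather than wording required by this specification.

```python
from enum import Enum


class FetchFailure(Enum):
    """Ways in which handling the submitted workflow definition can fail."""
    UNREACHABLE = "could not retrieve the workflow definition"
    TIMED_OUT = "timed out while retrieving the workflow definition"
    UNSUPPORTED_FORMAT = "workflow definition format is not supported"


# Status codes as recommended in the text above.
_STATUS = {
    FetchFailure.UNREACHABLE: (502, "Bad Gateway"),
    FetchFailure.TIMED_OUT: (504, "Gateway Timeout"),
    FetchFailure.UNSUPPORTED_FORMAT: (501, "Not Implemented"),
}


def failure_response(failure: FetchFailure):
    """Return (status code, reason phrase, body message) for a failure."""
    code, reason = _STATUS[failure]
    return code, reason, failure.value
```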

Retrieve run

A workflow run is represented as a research object, thus retrieving it will redirect to a manifest listing its constituent resources.

The manifest MUST include Workflow Runner specific extensions to indicate the corresponding Workflow Runner API specific resources that are supported by the service. These are declared in the namespace http://purl.org/wf4ever/runner# (prefix runner: below) and associated with the research object, which MUST be of the type runner:WorkflowRun.

Supported properties and types:

Property

Type

Superclass

Description

 

runner:WorkflowRun

wf4ever:WorkflowResearchObject

A research object that represents a particular workflow run

runner:workflow

runner:Workflow

wfdesc:Workflow

(Required) The main workflow to run, such as uploaded on RO creation

runner:status

runner:Status

ro:Resource

(Required) The status of the workflow, such as 'Running' or 'Finished'

runner:inputs

runner:Inputs

ro:Folder

List of required workflow inputs, if any

runner:outputs

runner:Outputs

ro:Folder

List of (expected or actual) workflow outputs, if any

runner:logs

runner:Logs

ro:Folder

List of logs, such as stdout, if any

runner:provenance

runner:Provenance

ro:Folder

List of provenance related resources, if any

runner:workingDirectory

runner:WorkingDirectory

ro:Folder

List of working directory and its files, if any

Each of these properties is a subproperty of ore:aggregates and has the domain runner:WorkflowRun.

See resources below for details of each type.

Retrieving the workflow

Retrieving the resource indicated with runner:workflow in the manifest SHOULD return the workflow definition originally posted.

The service MAY return the workflow definition directly (as in the example above), or MAY redirect with 303 See Other to the URI originally submitted when creating the RO.

The service MAY support replacing the workflow definition with PUT, but this is not covered by this specification, as it has ramifications for the other resources of the research object.

Retrieving the workflow description

The manifest SHOULD include an annotation on the native runner:Workflow to provide a wfdesc:Workflow description of the workflow structure:

The Research Object model intends to move from using AO to the unified Open Annotation Model, where the above annotation is better rendered as:

Retrieving the wfdesc:

As the service might be needing to use the Wf-RO transformation service to create the wfdesc, this resource might not be available within a reasonable amount of time. The service SHOULD in this case respond with 504 Gateway Timeout. The client MAY then try retrieving the resource again after a small delay.

Retrieving the workflow status

Retrieving the resource indicated with runner:status in the manifest MUST return the current status of the workflow run.

The returned URI list MUST include one and only one of these URIs:

URI

Label

Description

http://purl.org/wf4ever/runner#Initialized

Initialized

The research object has been created (the RO is considered an roevo:LiveRO)

http://purl.org/wf4ever/runner#Ready

Ready

All required inputs and resources are provided, the workflow is ready to run (i.e. the RO is a wfdesc:WorkflowInstance)

http://purl.org/wf4ever/runner#Queued

Queued

The workflow is in the queue, waiting to be run by the underlying workflow server

http://purl.org/wf4ever/runner#Running

Running

The workflow is actively running on the workflow server

http://purl.org/wf4ever/runner#Failed

Failed

The workflow could not run, or failed while running

http://purl.org/wf4ever/runner#Finished

Finished

The workflow completed running

http://purl.org/wf4ever/runner#Cancelled

Cancelled

The workflow run was cancelled, for instance by the client or by a server time out

http://purl.org/wf4ever/runner#Archived

Archived

The workflow runner service has finished post-run processing (the RO is now considered an roevo:ArchivedRO)

The service might also include other, third-party specific URIs, like http://api.example.com/status/StartingCloudInstance.

The service might not support all of the above status types, but MUST support Initialized, Running and Archived.
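
A client reading the status resource has to pick the single Workflow Runner status out of a URI list that may also contain third-party URIs. The following is a minimal sketch of that step; the function names are hypothetical, and the parsing assumes the usual text/uri-list conventions (one URI per line, lines starting with `#` are comments).

```python
RUNNER_NS = "http://purl.org/wf4ever/runner#"
RUNNER_STATUSES = {
    "Initialized", "Ready", "Queued", "Running",
    "Failed", "Finished", "Cancelled", "Archived",
}


def parse_uri_list(body: str):
    """Return the URIs of a text/uri-list body, skipping blanks and comments."""
    lines = (line.strip() for line in body.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def runner_status(body: str) -> str:
    """Return the label of the one Workflow Runner status in the list."""
    found = [
        uri[len(RUNNER_NS):]
        for uri in parse_uri_list(body)
        if uri.startswith(RUNNER_NS) and uri[len(RUNNER_NS):] in RUNNER_STATUSES
    ]
    if len(found) != 1:
        raise ValueError("expected exactly one runner status, got %d" % len(found))
    return found[0]
```

Third-party URIs such as http://api.example.com/status/StartingCloudInstance are ignored by `runner_status` rather than treated as errors.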

The service MAY do its own state transitions, like Initialized to Ready or Finished to Archived, but SHOULD NOT start the workflow as Running unless the client has requested Queued or Running. (See below).

The state Archived means that the Research Object has a complete view of the workflow run. Until the workflow run is in this state, requests for resources such as outputs, the manifest and provenance MAY give incomplete results or 404 Not Found. The workflow service SHOULD automatically transition from Finished to Archived, but SHOULD NOT do this transition from failure states such as Failed or Cancelled.

Should Archived be an additional state, to keep the final state of Finished, Cancelled or Failed?

Changing the workflow status

The client can request a desired state transition by PUT-ing to the status resource:

The client MUST include one and only one of the above listed Workflow Runner statuses, but MAY also include third-party statuses.

The service SHOULD ignore third-party statuses if it does not support them. The service MAY respond with an error if it understands the third-party status, but refuses to fulfill the request.

The service MUST return the current status, which MAY be different from the requested status (as in the above example).

The service MAY respond with 202 Accepted if transitioning to the new state is not immediate, for instance if it takes a while to cancel a workflow run. (Note however that the state Queued is intended for the state transition to Running).

If the state transition is not valid according to the current state, like from Failed to Finished, the service MUST fail with 409 Conflict.

If the client requests change to a state that is not supported by the service, like Cancelled, the service SHOULD fail with 501 Not Implemented. Alternatively, the service MAY change the status to a similar status (like Finished instead of Cancelled) and return 200 OK.

The service MUST support the following state transitions:

Current status

Client requests

Possible returns

Initialized

Ready

Initialized (it was not ready), Ready, 501 Not Implemented (service can't check readiness without running)

Initialized

Running

Initialized (it was not ready), Queued, Running, Failed, Finished, Cancelled, Archived

Ready

Running

Queued, Running, Failed, Finished, Cancelled, Archived

Finished

Archived

Archived, 202 Accepted
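
The required transitions in the table above can be captured as a lookup, which a client could use to check a service response, or a service to drive its tests. This is a sketch only; the names `REQUIRED_TRANSITIONS`, `possible_returns` and `is_expected_result` are hypothetical, and transitions not listed in the table are treated as unknown rather than invalid.

```python
# (current status, requested status) -> results the table lists as possible.
REQUIRED_TRANSITIONS = {
    ("Initialized", "Ready"): (
        "Initialized", "Ready", "501 Not Implemented",
    ),
    ("Initialized", "Running"): (
        "Initialized", "Queued", "Running", "Failed",
        "Finished", "Cancelled", "Archived",
    ),
    ("Ready", "Running"): (
        "Queued", "Running", "Failed", "Finished", "Cancelled", "Archived",
    ),
    ("Finished", "Archived"): ("Archived", "202 Accepted"),
}


def possible_returns(current: str, requested: str):
    """Return the listed results, or None if the table has no such row."""
    return REQUIRED_TRANSITIONS.get((current, requested))


def is_expected_result(current: str, requested: str, result: str) -> bool:
    """True if result is listed for this required transition."""
    outcomes = possible_returns(current, requested)
    return outcomes is not None and result in outcomes
```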

Retrieving the inputs

Retrieving the folder identified using runner:inputs in the manifest:

Expected inputs

Note that the expected inputs listed might not yet exist, so a GET on <in1> above would then give a 404 until it has been uploaded with PUT.

The service MAY provide a list of expected inputs (such as in the example above). An attempt by the client to retrieve these before PUTing them SHOULD give a 404 error unless they contain a default from the workflow definition.

The service MAY expect nested inputs (ie. a list of values, or list of lists of values, etc). Such inputs are indicated by being a ro:Folder rather than ro:Resource.

Providing inputs

The client SHOULD provide inputs for all input resources:

The service MAY respond with 415 Unsupported Media Type if the content type is not supported, for instance because it requires an input to be a URI or a file, as shown below.

The client MAY attempt to change the state to Ready to see if the inputs provided are sufficient to run the workflow.

Input from an URI

The client MAY provide input to be retrieved from an URI:

The service SHOULD respond with a 415 Unsupported Media Type if it does not support input from an URI (that is the URL would be interpreted as a literal by the workflow system).

The service SHOULD respond with a 400 Bad Request if the URI given is not valid or not supported, for instance ftp://example.com/file.txt.

Input from a file

The client MAY provide input to be retrieved from a file uploaded to the working directory (See below):

The service MAY in this case recognize the prefix for the working directory as given by runner:workingDirectory in the manifest, and replace the URL with the relative file path uploaded.txt when running the workflow.

The client is not required to have already uploaded the file, for instance this file could be written by the workflow itself or by the client at a later stage. However the service MAY in this case refuse to run the workflow if it does not support this feature.
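
The prefix recognition described above might look like the following on the service side. This is a sketch under assumptions: the function name `relative_input_path` and the example URLs are hypothetical, and rejecting `..` segments is an extra precaution not required by this specification.

```python
from urllib.parse import unquote


def relative_input_path(working_dir_uri: str, input_uri: str):
    """Map an input URI inside the working directory to a relative file path.

    Returns None if the URI is not located under the working directory.
    """
    prefix = working_dir_uri if working_dir_uri.endswith("/") else working_dir_uri + "/"
    if not input_uri.startswith(prefix):
        return None
    relative = unquote(input_uri[len(prefix):])
    # Refuse empty paths and paths that would escape the working directory.
    if not relative or ".." in relative.split("/"):
        return None
    return relative


if __name__ == "__main__":
    wd = "http://example.com/runner/default/run1/workingDirectory/"  # hypothetical
    print(relative_input_path(wd, wd + "uploaded.txt"))
```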

Retrieving the outputs

Outputs are shown as a folder structure, similar to inputs, by following the runner:outputs link in the manifest.

The service MAY show expected output resources before the workflow has been in state Running, but attempting to resolve any of the resources at that stage SHOULD give a 404 Not Found.

If the service does not support expected outputs, it SHOULD give a 404 Not Found on attempt to resolve the runner:Outputs folder, as indicating an empty folder would wrongly suggest that the workflow is predicted to have no outputs.

As for inputs, outputs MAY be nested. The extent of the nesting might not be known at the Initialized state, so an output previously indicated as a ro:Resource might be a ro:Folder at the time the workflow is Finished or Archived.

The service MAY expose outputs before the workflow has reached the Finished state, for instance if the workflow engine provides partial outputs before completion, or some outputs were produced even though the workflow was Cancelled.

The client can retrieve outputs by following the links:

If the service does not know the correct content type of the output, it SHOULD fall back to text/plain; charset="utf-8" or application/octet-stream accordingly.
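
One way a service could choose between the two fallback types is to test whether the output bytes decode as UTF-8. That heuristic is an assumption of this sketch and is not mandated by the specification; the function name `fallback_content_type` is hypothetical.

```python
def fallback_content_type(data: bytes) -> str:
    """Pick a fallback media type for an output of unknown content type."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 text, so treat it as opaque binary data.
        return "application/octet-stream"
    return 'text/plain; charset="utf-8"'
```

For large outputs an implementation would probably inspect only a leading chunk of the data instead of decoding all of it.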

Error outputs

Some workflow systems can indicate a (partial) error on a particular output. For instance, out1 might be produced fine, while out2 contains an error rather than a value. It is currently out of scope of this specification how to indicate such errors to clients of the Workflow Runner, but it is recommended to use a custom media type in the response, like application/vnd.wf4ever.runner.error, rather than a HTTP error.

Nested outputs

Retrieving a nested output yields another folder:

Name of nested outputs

This specification does not put any requirements on the file names of nested output entry names (beyond them being unique within the folder). Server implementations might however have particular naming schemes such as increasing integers with gaps, including gaps for missing values.

Retrieving the provenance

%% To be done (also a folder - but with annotation to wfprov)

Retrieving the working directory

%% To be done, folder

Retrieving the logs

%% To be done, folder - some standards for stdout/stderr

Cache considerations

The service SHOULD include appropriate cache control/expiry headers when such are available. For instance, if a workflow is Running and it is not possible to change the inputs after this state, then the Inputs resources can be given a long cache life time.

Some resources are transient in their nature, such as the Status. The service SHOULD provide cache headers for the status where appropriate; for instance, if it only checks the underlying server status every 5 s, the status resource should have a similar expiry time set.

When the research object is in status Archived, then the cache headers SHOULD show a long expiration time for all resources.
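
The caching guidance above could be expressed as a small policy function that returns a Cache-Control header value. The concrete numbers (one year for long-lived resources, a 5 second default poll interval), the resource names and the `no-cache` default are assumptions of this sketch, not values taken from the specification.

```python
LONG_LIVED = 365 * 24 * 60 * 60  # one year in seconds, an assumed value


def cache_control(resource: str, run_status: str,
                  poll_seconds: int = 5, inputs_frozen: bool = True) -> str:
    """Return a Cache-Control value for a resource of a workflow run."""
    if run_status == "Archived":
        # The research object is complete, so all resources are stable.
        return "max-age=%d" % LONG_LIVED
    if resource == "status":
        # Match the interval at which the underlying server is polled.
        return "max-age=%d" % poll_seconds
    if resource == "inputs" and run_status == "Running" and inputs_frozen:
        # Inputs can no longer change once the workflow is running.
        return "max-age=%d" % LONG_LIVED
    return "no-cache"
```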

The service MAY expire research objects from any state after a reasonable or configured period of time (like 48 hours). The service SHOULD respond 410 Gone for requests to an expired RO or any of its resources.

Security considerations

This specification does not specify the authentication mechanism for accessing the service or the underlying workflow system. It is envisioned that a system using OAuth 2.0 with common users on both the Workflow Runner API and the underlying workflow system would provide reasonable authentication measures.

The service SHOULD NOT expose workflow runs or their data that the authenticated user should not have access to.

This service, by its nature, allows execution of arbitrary workflows of the supported workflow system. Depending on the workflow system, this might give the client execution rights on the underlying workflow server, which might be used to expose the data of other users of the service, in addition to being a platform for further exploits. This service might allow uploading of arbitrary data as workflows, workflow inputs and files in a working directory, which could be used by attackers for hosting unwanted content such as spam links and pornographic content. Even pre-approved workflows might in some cases be the subject of abuse if the service allows execution with arbitrary content, for instance to cause out of memory exceptions or SQL injections.

Implementations should ensure that the underlying workflow server is subject to additional security constraints, such as firewalls, user isolation (sudo) and use of virtual machines with snapshot rollback. Implementations should deny workflow executions access to any security tokens needed by the Workflow Runner Service, for instance to prevent a malicious workflow from submitting additional workflow runs.

This service should preferably only allow execution by a pre-approved list of accountable users, i.e. users who could otherwise be given direct execution rights on the underlying execution platform, although it may allow unauthenticated execution on third-party workflow systems whose authentication details are provided by the client. It is outside the scope of this specification how to provide these details to the service.

References

The first implementation of this service will interface the Taverna Server using its REST API and will be made available on the Wf4Ever sandbox.
