
Proposed Final list

1. Make a sketch workflow

How?

  • In Taverna using empty beanshells
  • In PowerPoint
  • In a sketch book

Why?

  • Provides a reference point of the main task(s) of the workflow through the implementation process
  • Promotes sharing between computer and workflow systems due to its non-explicit nature
  • Helps design the experiment
  • Helps communication (supervisors, colleagues)

Examples

  • Concept mining (beanshells), Eleni (ppt), sketch on paper (todo)
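
The same idea as a sketch built from empty beanshells can be illustrated outside Taverna. The following is a minimal Python sketch, not a Taverna feature: the step names (`retrieve_data`, `analyse_data`, `write_report`) and the `run_sketch` helper are hypothetical. Each step is a stub that passes its input through unchanged, so the sketch can already be run and discussed before any step is implemented.

```python
from typing import Callable, List, Tuple

# Hypothetical step names. Each stub passes its input through unchanged,
# much like an empty beanshell that only marks where a task will go.
def retrieve_data(value: str) -> str:
    return value

def analyse_data(value: str) -> str:
    return value

def write_report(value: str) -> str:
    return value

# The sketch itself: the main tasks in the order they will run.
SKETCH: List[Callable[[str], str]] = [retrieve_data, analyse_data, write_report]

def run_sketch(value: str, steps: List[Callable[[str], str]]) -> Tuple[str, List[str]]:
    """Run every step in order and record the names of the steps visited."""
    trace = []
    for step in steps:
        value = step(value)
        trace.append(step.__name__)
    return value, trace

result, trace = run_sketch("example input", SKETCH)
print(" -> ".join(trace))
```

Replacing one stub at a time with a real implementation keeps the overall shape of the workflow visible throughout development.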

2. Use modules

How?

  • Describe and implement each of the executable processes in a workflow individually and independently
  • In Taverna this can be done through nested workflows

Why?

  • Facilitates independent testing and validation of the execution of each of the individual modules
  • Encourages re-use

Note: Make sure that you publish the separate modules as well as the final nested workflow (facilitates re-use)
Note: myExperiment does not support this very well.

Examples

  • Protein discovery workflow (workflow 74 on myExperiment)
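
As a rough analogy to nested workflows, the sketch below treats each module as a small function that is tested on its own before the modules are combined. It is only an illustration in Python, assuming a hypothetical identifier-cleaning task; the function names are invented for this example and do not come from workflow 74.

```python
from typing import List

def normalise_ids(ids: List[str]) -> List[str]:
    """Module 1: trim whitespace, upper-case, and drop empty entries."""
    return [i.strip().upper() for i in ids if i.strip()]

def remove_duplicates(ids: List[str]) -> List[str]:
    """Module 2: drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(ids))

def clean_ids(ids: List[str]) -> List[str]:
    """Nested workflow: a composition of the two modules above."""
    return remove_duplicates(normalise_ids(ids))

# Each module is validated independently with its own data ...
assert normalise_ids([" p1 ", "", "q2"]) == ["P1", "Q2"]
assert remove_duplicates(["P1", "Q2", "P1"]) == ["P1", "Q2"]

# ... and the nested workflow is validated as a whole.
assert clean_ids([" p1", "P1 ", "q2"]) == ["P1", "Q2"]
```

Because both modules remain usable on their own, they can be published and reused separately from the combined workflow.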

3. Think about the output

How?

  • Consider if you want to populate data models/databases or create outputs of disconnected collections of files
  • Consider who the results are for (overview for users, or the next workflow component)
    • General advice: at least have a report as an output (provenance will have the separate parts anyway)
  • Use Taverna for provenance collection

Why?

  • Easier to think about this at the design stage than trying to adjust a ready workflow
  • Structure potentially large output data

Examples

4. Provide example inputs and outputs

How?

  • Example inputs and outputs can be provided in Taverna
  • Alternatively: add input or output files to a pack containing the workflow

Why?

  • To help understand the workflow
  • For validation
  • For maintenance

Notes:

  1. Make sure that the input and the output examples are coupled. Keep in mind that the output has a timestamp. It may change due to changes in underlying databases.
  2. Make sure that input and output examples are real inputs/outputs.

Examples

  • Protein Discovery (workflow 74)
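
One possible way to keep example inputs and outputs coupled, and to record when the output was produced, is to store them together in a single file that travels with the workflow. The sketch below assumes a JSON layout and a helper name (`make_example_record`) invented for this illustration. The values shown are placeholders; a real record should hold real inputs and outputs, as note 2 above requires.

```python
import json
from datetime import datetime, timezone
from typing import Optional

def make_example_record(inputs: dict, outputs: dict,
                        recorded_at: Optional[datetime] = None) -> dict:
    """Couple example inputs with the outputs they produced and a timestamp."""
    if recorded_at is None:
        recorded_at = datetime.now(timezone.utc)
    return {
        "recorded_at": recorded_at.isoformat(),
        "inputs": inputs,
        "outputs": outputs,
    }

# Placeholder values for illustration only.
record = make_example_record(
    inputs={"protein_id": "EXAMPLE_ID"},
    outputs={"report": "example report text"},
    recorded_at=datetime(2012, 1, 1, tzinfo=timezone.utc),
)

# The serialised record can be added to a pack next to the workflow.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)
```

The timestamp lets a later reader judge whether a differing output points to a broken workflow or merely to a change in an underlying database.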

5. Annotate

How?

  • Choose meaningful names for the workflow title, inputs, outputs, and for the processes that constitute the workflow. Focus on 'how' a component is used in this workflow and 'why' it is in there.
  • If it exists, reference information about what the component does in general (e.g. by referencing a service on BioCatalogue). Assume that a referenced resource may disappear or change at some time in the future.
  • Use Taverna description fields and example fields. Taverna keeps them with the workflow and myExperiment uses this information.
  • Keep any notes that are related to the workflow, but not part of it, linked to it
  • Example of useful "extra" information: execution time, keywords, contact information, attribution
    • myExperiment offers some of this, but best to put it in the workflow descriptions

Why?

  • Doing good science
  • Record what is needed for a publication later on
  • Increase re-usability

Examples

  • No perfect example
    • Alan's BioVEL example?
    • SCAPE examples?
    • Workflow 74 does its best, but is not perfect
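
A quick check for uninformative names can be automated. The sketch below flags input and output names that look like placeholders, such as in0 or out1. The pattern and the function name are assumptions made for this illustration; neither Taverna nor myExperiment is known to provide such a check.

```python
import re
from typing import List

# Assumed pattern for placeholder names: "in", "out", "input" or "output",
# optionally followed by an underscore and digits (e.g. in0, out_1, input).
PLACEHOLDER = re.compile(r"(in|out|input|output)_?\d*", re.IGNORECASE)

def uninformative_names(names: List[str]) -> List[str]:
    """Return the names that consist only of a placeholder pattern."""
    return [name for name in names if PLACEHOLDER.fullmatch(name)]

flagged = uninformative_names(["in0", "protein_sequence", "out1", "report"])
print(flagged)
```

A name that passes this check is not necessarily meaningful; the check only catches the most obvious placeholders.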

6. Make it executable from outside the local environment

How?

  • Use Web Services or any Taverna widget except 'external tool'; use 'external tool' only when it runs over ssh on a publicly accessible server
  • Use Taverna with local tools, but installed on a publicly accessible server with the Taverna server
  • Use local tools from an easy to set up environment such as biolinux (only for a certain niche of users)
  • TRY IT!!

Why?

  • Others will be able to run the workflow
  • Proof of reproducibility

Examples

  • positive examples: plenty
  • negative examples
    • LibSBML is a negative example (requires a plugin)
    • possibly create one if we have a local tool and a web service for it (Magnus/Yassene)
    • possibly look at Soaplab examples (e.g. EMBOSS)

7. Choose services carefully

How?

Choose the service that is reliable based on:

  • BioCatalogue reliability statistics
  • How often it is used in other workflows
  • Contact with service providers. Communicate!
  • The reputation of the institution providing the service
    • check trustworthiness of service provider (can also be a person, of whom you can check if they will remain at an institution to maintain the service)
  • In practice: check on BioCatalogue if it has a green light (currently there is not much more you can do)
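
When several services could fill the same role, the criteria above can be written down and applied consistently. The sketch below is only an illustration: the `ServiceCandidate` fields, the ordering of the criteria, and the example values are all assumptions, and the data would have to be collected by hand (for instance from BioCatalogue and myExperiment) rather than through any API shown here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ServiceCandidate:
    name: str
    monitoring_ok: bool       # e.g. a "green light" on a monitoring page
    workflows_using_it: int   # how many other workflows use the service
    provider_contacted: bool  # whether the provider answered an enquiry

def rank(candidates: List[ServiceCandidate]) -> List[ServiceCandidate]:
    """Order candidates: monitoring status first, then contact, then usage."""
    return sorted(
        candidates,
        key=lambda c: (c.monitoring_ok, c.provider_contacted, c.workflows_using_it),
        reverse=True,
    )

# Hypothetical candidates with made-up values.
candidates = [
    ServiceCandidate("service_a", True, 3, False),
    ServiceCandidate("service_b", True, 10, False),
    ServiceCandidate("service_c", False, 50, True),
]
ranked_names = [c.name for c in rank(candidates)]
print(ranked_names)
```

The ordering of the criteria is a judgment call; a different project might weigh contact with the provider above the monitoring status.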

Why?

  • Prevent workflow decay, prolong the life of the workflow

Note to service developers: Many workarounds and ugly workflow practices come from having to deal with badly behaved services!

  • example: workflow 1767
    • asynchronous
    • lots of XML splitters

Examples

  • [ask biocatalogue-friends]

8. Reuse existing workflows

How?

  • Make your own workflows modular since this promotes reuse
  • Search myExperiment and filter on most downloaded or most viewed
  • Check if it has been used in a publication (e.g. in description)
  • Use your contacts: maybe someone has tried to solve something similar before using a workflow?
  • Try (hard) and talk to the author if it doesn't work (they typically like you for doing that)

Why?

  • Another user who is familiar with one of your workflows is more likely to understand another workflow that you designed
  • Beneficial when repairing workflows: repairing a given workflow may also repair the workflows in which it is used as a subworkflow
  • Fights redundancy
  • Better workflows, because people repair it upon feedback
  • Get ideas on methods and workflow patterns

Note: don't forget the attribution if you use someone else's workflow

Examples

9. Advertise

How?

  • Share your workflow on myExperiment, other social media, by e-mailing it around to colleagues. Be sure to provide contact information!
  • Cite your workflow when publishing, using a stable identifier like myExperiment
  • Make use of the pack functionality in myExperiment to bundle your workflow with other important documents such as a publication

Why?

  • Good science - share your results
  • Get cited – fame!
  • Progress, let others build on your work without reinventing it.

Examples

  • Pique's example: myExperiment pack
  • Paul's example: pack made for NAR: pack 55

10. Maintain

How?

  • Act on information about services that are deprecated, either by changing services or by adding a note that the specific process in the workflow is not executable anymore
  • Put your services on BioCatalogue (don't have to be the owner) and your workflows on myExperiment (notification is planned)
  • Regularly test the workflow (like 'unit tests')

Why?

  • Good practice – this is already demanded for some types of publications, like an application note in Bioinformatics
  • Fight workflow decay, prolong the life of the workflow

Examples

  • Workflow 74, notifications by BioCatalogue
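
Regular testing "like unit tests" can be sketched as re-running the workflow on a stored example and comparing the result with the stored output. The Python below is an illustration under assumptions: `check_against_example` and the stand-in workflow are hypothetical, and a real check would invoke the actual workflow instead. Differences are reported rather than raised as errors, because an output may legitimately change when an underlying database changes.

```python
from typing import Any, Callable, Dict, Tuple

def check_against_example(
    run_workflow: Callable[[Dict[str, Any]], Dict[str, Any]],
    example: Dict[str, Dict[str, Any]],
) -> Dict[str, Tuple[Any, Any]]:
    """Re-run the workflow on the example inputs and report the differences.

    Returns a mapping of output name to (expected, actual) for every
    expected output that no longer matches; an empty mapping means no change.
    """
    actual = run_workflow(example["inputs"])
    expected = example["outputs"]
    return {
        name: (expected[name], actual.get(name))
        for name in expected
        if actual.get(name) != expected[name]
    }

# Stand-in for a real workflow: reports the length of a sequence.
def stand_in_workflow(inputs: Dict[str, Any]) -> Dict[str, Any]:
    return {"length": len(inputs["sequence"])}

current = {"inputs": {"sequence": "ACGT"}, "outputs": {"length": 4}}
stale = {"inputs": {"sequence": "ACGT"}, "outputs": {"length": 5}}

print(check_against_example(stand_in_workflow, current))
print(check_against_example(stand_in_workflow, stale))
```

A non-empty result is a prompt to investigate: either the workflow has decayed or the stored example needs to be refreshed.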

Khalid's original list, extended by Kristina, Marco, Katy and Carole

Following today's telecon, in which Kristina noted that the lack of answers from users is blocking progress on the Best Practice document for workflow design, I thought it would be a good start to distill the elements that we already know about, either because we interacted with users in the past or simply because we have been users ourselves. So here are the practices that I believe would make workflow reuse and preservation a much easier task. The order of the elements below is significant: it starts with the practices that are easiest to implement.

Comment Marco: designing a workflow is a combination of best practices experiment design + best practices software design.

Comments and additions from Kristina are provided in purple below.

  1. Make an abstract workflow. An abstract workflow is analogous to an abstract class in object-oriented programming or pseudo-code in procedural programming. It is a workflow where the component services are NOT explicitly declared. This abstract workflow will serve many purposes, including providing a reference point of the main task(s) of the workflow through the implementation process, and promoting sharing between computer and workflow systems due to its non-explicit nature. Katy says: maybe a bit too technical. Better to say "sketch out your workflow". Might also be a simple design on a piece of paper.
  2. Make use of modules (or nodes). Describe and implement each of the executable processes in a workflow individually and independently as a module. This facilitates the independent testing and validation of the execution of each of the individual modules using appropriate data, and encourages re-use. When the final workflow is put together, it will be nested. Make sure that you also publish the separate modules as well as the final nested workflow. Katy says: Assembly part and glue part. Marco says: provide example, there are many names for the same thing.
  3. Provide example inputs and outputs that can be used by others to enact your workflow or by yourself for maintenance purposes. Users tend to understand the workflow functionality by executing it. In many cases, however, it is difficult to determine example inputs that can be used to run the workflow, e.g., because the inputs of the workflow have meaningless names, such as in0, in1, or because it is not obvious which format the input data should follow. Make sure that the input and the output examples are coupled. Marco says: also for validation. Katy says: Be careful that the output has a timestamp. It may be changed due to changes in underlying databases.
  4. Annotate the workflow, its modules, and related data. Choose meaningful names for the workflow title, inputs, outputs, and for the processes that constitute the workflow. It is always difficult to understand what the workflow does, when the processes are named in an arbitrary manner. Of course, the user may gain more information by looking at the services that implement the processes in question. However, in practice, the names chosen for the services are not helpful either, and the majority of them are not annotated. Provide contact information, keywords, execution time. Marco says: related to wet lab best practices. Doing good science and record what is needed for a publication later on.
  5. Comment Marco and Katy: Rename this point to "Make sure that your workflow is reusable". Use services or scripts instead of local processes when possible. This is to provide others with the ability to enact your workflow. Marco says: if you use local processes put it all on a server because then it will be usable by others. Do this in an early stage of development. In summary: Reusable = (1) Web Services, any Taverna widget except 'external tool', 'external tool' only when it runs over ssh on publicly accessible server; (2) Taverna with local tools, but installed on a publicly accessible server with the Taverna server; (3) local tools from an easy to set up environment such as biolinux (only for a certain niche of users). Kristina says: maybe what we mean is "make it executable from outside local environment"?
  6. When you publish your workflow make sure that you use a stable identifier. Use stable identifiers for workflows and related document URLs to prolong workflow longevity long after publication. Comment Marco and Katy: for the talk, only talk about a stable identifier for workflows. Refer to myExp and Galaxy. This might be merged with point 5.
  7. Be careful when choosing the services that implement the workflow processes. If more than one service can implement a given process, then choose the service that is reliable, e.g., based on the reputation of the institution providing the service. A service that is provided by a reputable institute is more likely to be available in the long run. Comment Katy: could also look on BioCatalogue for reliability statistics. Look how often it is used in other workflows. Comment Marco: could also contact providers and ask about the service. Communicate!
  8. Reuse existing workflows whenever possible. Try to reuse the workflows that you or others have designed in the past, provided that they are still up and running. This way, a user who is familiar with one of your workflows is more likely to understand another workflow that you designed. Also, it may be beneficial when repairing workflows: repairing a given workflow wf may also repair the workflows in which wf is used as a subworkflow. It also fights redundancy. Important: do not forget attribution. Comment Marco: Communicate! Comment Katy: mention reputation, that you might pick the one that has been most downloaded or used in publications.
  9. Make use of the pack functionality (myExp-specific). Bundle your workflow with other important documents such as a publication. Comment Marco: maybe put under make your workflow reusable. Comment Katy: or under advertise.
  10. Advertise. Share your workflow on myExperiment, through other social media, or just by e-mailing it around to colleagues. Cite your workflow when publishing. Comment Katy: using a stable identifier like myExperiment.
  11. Maintain the workflow. Act on information about services that are deprecated, either by changing services or by adding a note that the specific process in the workflow is not executable anymore. Comment Marco: maybe we could merge some points with this one into a point about making it a "first class publishable object". Katy says: this is already demanded for some types of publications, like an application note in Bioinformatics.
  12. Think about what to do with your output data (manage, visualization). Comment Carole: Designing workflows to populate data models/databases vs outputs of disconnected collections of files (like most of Paul's) + Workflow design for provenance collection (using the provenance of Taverna). Comment Katy: think about how to understand the results.

Some useful publications

Not much - could be a paper opportunity!

Possibly of interest:

Business workflow Guidelines

Best Practices in Designing BPM Workflows http://docs.oracle.com/cd/E13214_01/wli/docs70/bestprgd/bestprga.htm

Designing Workflow Components http://msdn.microsoft.com/en-us/library/ee658122.aspx#Step1

Guidelines for Workflow Systems

Timothy McPhillips, Shawn Bowers, Daniel Zinn, Bertram Ludäscher, Scientific workflow design for mere mortals, Future Generation Computer Systems, Volume 25, Issue 5, May 2009, Pages 541-551, ISSN 0167-739X, 10.1016/j.future.2008.06.013.

http://www.sciencedirect.com/science/article/pii/S0167739X08000873
