Preservation of workflows in Biology
One of the main challenges for biomedical research lies in the integrative study of large and increasingly complex combinations of data. Diseases that are caused by a single gene and a single cascade of molecular events are rare. Complex integrative analyses are therefore necessary to understand the mechanisms that explain the onset and progression of a disease. At the same time, we need to uphold the principles of reproducible science, which enable researchers to understand how their predecessors reached their conclusions. Traditionally, biologists achieve this by publishing their methods together with their experimental results, in line with best practices supported by most journals. Computational biology currently lacks a widely accepted best practice of its own. Our aim is to provide new tooling for preserving computational methods. Because certain steps can be preserved automatically, we have an opportunity to outperform traditional practices when it comes to reproducing and reusing previous work.
We use the workflow paradigm to define our experiments. Another critical concept is the 'Research Object': a structured digital aggregate of the 'materials and methods' of a computational experiment. Semantic Web technology is used to obtain meaningful, citable, and machine-readable references for these aggregates and their contents. The aggregates support communication with supervisors, collaborators, and ultimately the scientific community (analogous to the publication of traditional papers). We aim to make the creation of Research Objects fit into a researcher's routine, extending their possibilities wherever we can.
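To make the idea of a structured aggregate more concrete, the following is a minimal sketch in Python. It is not the wf4ever Research Object model or any of its tooling; the class names, the manifest layout, and all example.org URIs are hypothetical and chosen only for illustration. It shows the two ingredients described above: resources that are aggregated under one citable identifier, and annotations that say something machine-readable about those resources.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One aggregated item, identified by a URI (hypothetical structure)."""
    uri: str
    role: str  # free-text role, e.g. "workflow", "input data", "result"


@dataclass
class ResearchObject:
    """Illustrative aggregate of the 'materials and methods' of an experiment."""
    uri: str
    title: str
    resources: list = field(default_factory=list)
    annotations: list = field(default_factory=list)  # (subject, property, value)

    def aggregate(self, uri, role):
        """Add a resource to the aggregate and return it."""
        resource = Resource(uri, role)
        self.resources.append(resource)
        return resource

    def annotate(self, subject_uri, prop, value):
        """Attach a statement to the aggregate itself or to one of its resources."""
        known = {self.uri} | {r.uri for r in self.resources}
        if subject_uri not in known:
            raise ValueError(f"{subject_uri} is not part of this aggregate")
        self.annotations.append((subject_uri, prop, value))

    def to_manifest(self):
        """Return a plain dictionary that can be serialised, e.g. as JSON."""
        return {
            "id": self.uri,
            "title": self.title,
            "aggregates": [{"uri": r.uri, "role": r.role} for r in self.resources],
            "annotations": [
                {"about": s, "property": p, "value": v}
                for s, p, v in self.annotations
            ],
        }


# Hypothetical example: every URI and description below is made up.
ro = ResearchObject("http://example.org/ro/demo-study/", "Demo analysis")
wf = ro.aggregate("http://example.org/ro/demo-study/workflow", "workflow")
ro.aggregate("http://example.org/ro/demo-study/input-data", "input data")
ro.annotate(wf.uri, "description", "Hypothetical gene annotation workflow")

manifest_json = json.dumps(ro.to_manifest(), indent=2)
print(manifest_json)
```

Because every resource and annotation hangs off a URI, the aggregate as a whole, or any single part of it, can be referenced from a publication or by other software.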
Case studies at the Human Genetics Department of the Leiden University Medical Centre help us fine-tune the standards and tools. We focus on two studies: a genotype-phenotype study to unravel the genetic cause of Metabolic Syndrome, and a study to reveal the role of epigenetic factors in Huntington's Disease. While working towards publishable results, we explore the emerging wf4ever technologies. For instance, we have created our first Research Object aggregates. We do not advertise them publicly yet, because they contain unpublished results of collaborating scientists.*
* A predecessor of the Research Object is the myExperiment pack. Research Objects are more structured and are enriched by annotations, but some of these packs can serve as an example, e.g. http://www.myexperiment.org/packs/58. It shows how such an aggregate of digital objects can be used as a reference for your publications. It also shows how important wf4ever is for increasing the quality of these digital objects: not all parts of the pack are fully functional anymore, and you may still have to guess how they were used in the original experiment. Gradually, wf4ever tooling will enrich myExperiment and thereby increase the quality of part of its content.