
This document presents an initial model that can be used for specifying research objects.

http://purl.org/wf4ever/ro

http://purl.org/wf4ever/wfdesc

http://purl.org/wf4ever/wfprov

Research Object = Aggregation + Annotation + RO Vocabularies

A research object can be viewed as an artifact that aggregates a number of resources that are used and/or produced in a given scientific investigation. The figure below illustrates a high level description of the elements that are needed to specify a research object.

A research object aggregates a number of resources. A resource can be a workflow, web service, document, data item, data set, workflow run, software or another research object. Aggregated resources may be related to each other and to other resources; these relationships are also part of the research object. (Are relationships also identified and aggregated? Are annotations aggregated? -Stian) The following are examples of relationships that were mined from user requirements.

  • A web service is used in a workflow
  • A given workflow wf1 is a subworkflow of another workflow wf2
  • A given workflow run is an instance of a given workflow
  • A given dataset (data item) was used as input to a given workflow run
  • A given dataset (data item) was produced by a given workflow run

Resources and their relationships can be the subject of different kinds of annotations.

From the above description, it follows that two core concepts are needed for specifying research objects, namely aggregation and annotation. Instead of building a new model that caters for these concepts, we decided to adapt existing models. Specifically, we are investigating the use of Object Reuse and Exchange (OAI-ORE) for specifying the aggregation of resources, and the Annotation Ontology (AO) for their annotations.

On top of this construct, annotations that use the Research Object vocabulary and domain-specific vocabularies describe the roles of the individual aggregated resources.

ORE for Specifying Aggregation of Resources in a Research Object

ORE defines standards for the description and exchange of aggregations of Web resources. The figure below depicts an RDF graph illustrating the elements that compose the ORE model. To be able to refer to an aggregation of web resources, a new resource named Aggregation is introduced to specify the resources that constitute the aggregation. Additionally, a new resource, called Resource Map, is introduced to describe the aggregation, which is designated using the property ore:describes. A resource map allows describing the aggregation as well as the resources that constitute that aggregation. Multiple resource maps might describe the same aggregation, for instance in other formats like Atom.


Figure extracted from http://www.openarchives.org/ore/1.0/primer.html

The concepts of aggregation and resource map just described seem to fit our purposes. A Research Object can be defined as an ore:Aggregation, and an ore:ResourceMap can be used to describe the research object and its constituent resources. Specifically, our Research Object class can be defined as follows:

An aggregated resource is anything that can be referenced with a URI. It might be stored as part of the research object within an RO service or offline archive, or be a freestanding web resource. (Note: For external resources it might also be required to capture metadata about how and when the resource was accessed. Describing this is supported by AO and PAV, but might have additional versioning requirements.)

Using ORE vocabulary, the manifest describing a research object corresponds to a resource map, and the research object itself can be specified as an aggregation. To illustrate this, below is a simple example of a manifest and a research object:

The research object :ro is an aggregation of the resources identified by <http://example.com/workflow.scufl2>, <input.txt> and <output.txt>. It was created by :stian at 2011-07-14T15:01:13. The research object :ro is described by the resource map <manifest>, which was generated by the agent :roservice at 2011-07-14T15:01:14.
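
The manifest described above can be sketched as a set of subject-predicate-object triples. The following is a minimal Python sketch using only the standard library, not the original excerpt. The prefixes ore: and dct: stand for the ORE and Dublin Core Terms vocabularies, and the use of dct:creator and dct:created for the authorship and timestamp statements is an assumption made for illustration.

```python
# Minimal sketch: a research object manifest as plain (subject, predicate, object) triples.
# Assumption: dct:creator / dct:created carry the authorship and timestamp statements.

RO = ":ro"
MANIFEST = "<manifest>"

triples = {
    # The manifest is an ORE resource map describing the research object.
    (MANIFEST, "rdf:type", "ore:ResourceMap"),
    (MANIFEST, "ore:describes", RO),
    (MANIFEST, "dct:creator", ":roservice"),
    (MANIFEST, "dct:created", '"2011-07-14T15:01:14"'),
    # The research object is an ORE aggregation of three resources.
    (RO, "rdf:type", "ore:Aggregation"),
    (RO, "dct:creator", ":stian"),
    (RO, "dct:created", '"2011-07-14T15:01:13"'),
    (RO, "ore:aggregates", "<http://example.com/workflow.scufl2>"),
    (RO, "ore:aggregates", "<input.txt>"),
    (RO, "ore:aggregates", "<output.txt>"),
}


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


aggregated = objects(triples, RO, "ore:aggregates")
print(aggregated)
```

Keeping the resource map and the aggregation as distinct subjects mirrors the ORE split between the thing being described (:ro) and the document describing it (<manifest>).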

Authors and agents are described using FOAF (not shown in the above excerpt), but the challenge of identifying and unifying these authors across multiple research objects and stores is out of scope for this document.

As well as resource map and aggregation, ORE introduces the concept of a proxy resource. A proxy resource provides the means for denoting a resource in the context of a specific aggregation. The figure below describes this concept.


Figure extracted from http://www.openarchives.org/ore/1.0/datamodel.

Proxy resources may prove useful when specifying research objects. Specifically, they can be used to describe resources that play different roles in different research objects and that may, consequently, have different annotations depending on the research object they are used within. For instance, a data file might be annotated as a workflow output in one research object, but as a workflow input in another.

The following example illustrates how resources can be described as proxies within a research object. <http://example.com/workflow.scufl2>, <input.txt> and <output.txt> have been assigned proxies which, as we will show later on, can be associated with annotations that apply in the context of the research object to which they belong.
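
As a rough illustration of the proxy pattern, the sketch below creates one proxy per aggregated resource and links it with ore:proxyFor and ore:proxyIn, the two properties ORE defines for this purpose. The proxy identifiers (:proxy1, :proxy2, ...) and the helper name make_proxies are hypothetical and chosen only for this example.

```python
# Minimal sketch: ORE proxies for the resources aggregated by :ro.
# Assumption: the proxy identifiers (":proxy1", ...) are hypothetical names.

RO = ":ro"
resources = ["<http://example.com/workflow.scufl2>", "<input.txt>", "<output.txt>"]


def make_proxies(aggregation, aggregated):
    """Create one proxy per aggregated resource, scoped to the given aggregation."""
    graph = set()
    proxy_of = {}
    for index, resource in enumerate(aggregated, start=1):
        proxy = f":proxy{index}"
        proxy_of[resource] = proxy
        graph.add((proxy, "rdf:type", "ore:Proxy"))
        graph.add((proxy, "ore:proxyFor", resource))    # the resource it stands for
        graph.add((proxy, "ore:proxyIn", aggregation))  # the context in which it applies
    return graph, proxy_of


proxy_graph, proxy_of = make_proxies(RO, resources)
print(proxy_of["<input.txt>"])
```

Because each proxy is tied to exactly one aggregation, statements made about the proxy hold only within that research object, while statements about the resource itself hold everywhere.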

Notice that so far we did not specify how to describe relationships between the resources or their proxies. We will show below how these can be specified using annotations.


Example RDF graph of Research Object in ORE. Also available as PDF, OmniGraffle

Annotating Research objects

Instead of designing a new model for annotating research objects and their constituent resources, we investigated two RDF-based models, namely the Open Annotation and Collaboration (OAC) model and the Annotation Ontology (AO) [http://www.jbiomedsem.com/content/2/S2/S4]. Following several considerations, we decided to use the Annotation Ontology. Technically the two models compare fairly equally, so this choice was made mainly for political and social reasons; for instance, myGrid has a product called Utopia, which can use AO. The way we intend to use the annotation model should be fairly interchangeable with OAC should we change our mind. In addition, both teams are actively working on consolidating the two models.

In Wf4Ever Research Objects, annotations are therefore specified using the Annotation Ontology. The Annotation Ontology provides a common model for document metadata derived from text mining and manual annotation of scientific papers. Specifically, it provides the means for annotating electronic documents or parts of electronic documents. Different kinds of annotations can be classified, e.g. comments, notes, examples and errata.

Annotation Ontology overview

This section gives an overview of how AO is generally used elsewhere and is not normative for Wf4Ever research objects.

Most examples of AO are shown using annotations of the aot:Qualified subclass, where the annotation creates a link between an annotated document and an annotation topic, expressing that the document (or a selection of the document) is talking about or describing the given topic:

From http://code.google.com/p/annotation-ontology/wiki/Annotation, an annotated document has been annotated to have the topic of the enzyme beta-secretase 1. Additional metadata about when the document was retrieved is provided using ao:onSourceDocument and expressed with the PAV ontology. In addition, pav:createdOn and pav:createdBy show when and by whom this annotation was created.

Note that the statements within the "Annotation Topic" box (i.e. name) are not part of this particular annotation; they are just additional information about a term from a controlled vocabulary. The annotation here is the dotted circle in the middle, which is of type ao:Annotation and simply links the document with a topic. So the Annotation Ontology tells us what this connection is (http://tinyul.com/... talks about PRO:00004615) and the nature of the annotation (aot:Qualifier), in addition to when and by whom the annotation was created.

It is possible to specify that an annotation has a context, thus instead of claiming that the annotation applies to the whole document, it is related to a selector, which highlights a particular bit of the document or resource. Various standard selectors are provided, such as xpointer, text prefix-and-postfix selection, image selection by rectangle and video selection. 


From http://code.google.com/p/annotation-ontology/wiki/Selectors#Examples - the exact word "BACE" is selected when it has a certain text before and after.

AO encourages specialisation through subclassing ao:Annotation. For instance, an aot:Note relates an ann:body as a free-text HTML note describing (a sub-selection of) the annotated document:

From http://code.google.com/p/annotation-ontology/wiki/AnnotationTypes#Annotation_Type:_Note - an aot:Note signifies that the ann:body is a free-text HTML note. This example also shows how the context is an aos:ImageSelector, relating the note to that particular section of the image.

AO allows the use of named graphs or graph literals, at least for the ao:hasTopic property. In Wf4Ever, however, we feel that we do not primarily need aot:Qualifier-style tagging using ao:hasTopic, but rather a named-graph variant of aot:Note using ann:body. After discussions with Paolo Ciccarese it was agreed that although hasTopic could be used for our purposes, it could be slightly misleading due to its name. We are therefore opting for the solution described below:

Annotation Ontology used in Research Objects

Although it may be useful in Wf4Ever to be able to use AO features such as qualifiers and selectors, the main motivation for allowing RO annotation is to let users describe aggregated resources and the research object itself using standard and domain-specific vocabularies. We therefore do not prescribe anything for or against the other uses of AO, but for this specific need we introduce a subclass of ao:Annotation called ro:GraphAnnotation.

Using the ro:GraphAnnotation subclass signifies that the attached ao:body is a machine-readable RDF graph with a structured aot:Note-like annotation. This means that whoever created this annotation stated what is expressed by the graph, and the graph somehow is about or describes the annotated resource. The annotated resource would typically be aggregated in the same research object that contains the annotation (See ORE above).

In this style, ao:annotatesResource is used to indicate the annotated resource, and ao:body is used to indicate the content of the annotation, as a named graph or a separate resource identified by a URI. The use of a separate resource for the ao:body means that the asserter is allowed to use existing vocabularies to describe the aggregated resources directly, and multiple annotation bodies are observable as separate graphs which are not required to be non-contradictory.
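
To make the relationship between an annotation and its body concrete, here is a minimal Python sketch that keeps the annotation statements and the body graph apart and then looks up what has been said about a resource. The identifiers :ann1 and <ann1-body>, the creator :stian, the dct:title statement in the body and the helper name describe are all hypothetical; only ro:GraphAnnotation, ao:annotatesResource, ao:body and pav:createdBy come from the text above.

```python
# Minimal sketch: an ro:GraphAnnotation whose body is a separate graph.
# Assumptions: ":ann1", "<ann1-body>", the creator and the body statement are hypothetical.

annotation = {
    (":ann1", "rdf:type", "ro:GraphAnnotation"),
    (":ann1", "ao:annotatesResource", "<output.txt>"),
    (":ann1", "ao:body", "<ann1-body>"),
    (":ann1", "pav:createdBy", ":stian"),
}

# Graphs keyed by identifier; each body holds the statements its annotator asserts.
bodies = {
    "<ann1-body>": {
        ("<output.txt>", "dct:title", '"Workflow output"'),
    },
}


def describe(resource, annotations, graphs):
    """Collect the statements from every annotation body attached to a resource."""
    annotators = {
        s for s, p, o in annotations
        if p == "ao:annotatesResource" and o == resource
    }
    body_ids = {
        o for s, p, o in annotations
        if p == "ao:body" and s in annotators
    }
    statements = set()
    for body_id in body_ids:
        statements |= graphs.get(body_id, set())
    return statements


print(describe("<output.txt>", annotation, bodies))
```

The annotation carries the provenance (who said it), while the body carries the claim itself, so a client can decide whether to trust a body before reading it.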

AO example assuming ro:GraphAnnotation using named graphs in TriG format:


Example RDF Graph of Research Object using AO annotations as nested graphs. Also available as PDF, OmniGraffle

Todo: This illustration needs to be updated to use ro:GraphAnnotation rather than ro:DataAnnotation

As named graph representations such as TriG are still not generally supported by most RDF toolkits, and not yet standardised by W3C, we recommend that annotation bodies are addressed as, and accessible as, separate (HTTP) resources. Clients are then free to choose whether to follow the links, depending on which resource is annotated or who made the annotation, and can then either merge multiple annotations locally into a flat graph (losing knowledge of who said what) or keep them as named graphs in a local quad store.
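
The trade-off between the two client-side options can be sketched in a few lines of Python. The graph names and statements below are hypothetical, as is the rel:inputTo property; rel:outputFrom is borrowed from the relationship example used elsewhere in this document.

```python
# Minimal sketch: two ways a client might hold fetched annotation bodies.
# Assumption: the graph names, statements and rel:inputTo are hypothetical.

named_graphs = {
    "<ann1-body>": {
        ("<output.txt>", "rel:outputFrom", "<workflow1>"),
    },
    "<ann2-body>": {
        ("<output.txt>", "rel:outputFrom", "<workflow1>"),
        ("<input.txt>", "rel:inputTo", "<workflow1>"),
    },
}


def flatten(graphs):
    """Merge all bodies into one graph; which body said what is lost."""
    merged = set()
    for statements in graphs.values():
        merged |= statements
    return merged


def to_quads(graphs):
    """Keep the graph name as a fourth element, as a quad store would."""
    return {
        (s, p, o, name)
        for name, statements in graphs.items()
        for s, p, o in statements
    }


flat = flatten(named_graphs)
quads = to_quads(named_graphs)
print(len(flat), len(quads))
```

In the flat graph the statement made by both annotators collapses into a single triple, whereas the quad form keeps one entry per annotation body and so preserves the attribution.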

For efficiency reasons, the service providing these annotations could also provide an additional resource for a given RO which, using content negotiation, would return either a named graph in TriG format, a flattened RDF/XML graph, or even an N3 graph with graph literals. However, when annotations are posted it should be recorded who is providing which new statements, and as such only the individual new RDF statements should be posted. The individual resources also allow PUT and DELETE to be used to modify or delete old annotations.

Versioning of annotations is assumed to be done similarly to versioning of ROs, so the Annotation Ontology does not by itself cover disputes or multiple edits; only the "current" annotations are shown.

Core RO Vocabulary

The following suggests a vocabulary that can be used for describing research objects, their resources, and their relationships. This vocabulary is not yet complete: it needs to be extended and is not yet in line with the terms used in the examples.

In particular, the core vocabulary allows statements about which resources of the Research Object are workflows, input data, hypotheses, etc., and allows the description of relationships, such as <outputB> rel:outputFrom <workflow1>.
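
As an illustration of how such statements could be queried, the following Python sketch matches a simple triple pattern against a small graph. Only rel:outputFrom is taken from the text above; the class name ro:Workflow, the property rel:inputTo, the resource names and the helper match are hypothetical and may not correspond to the terms the vocabulary ends up defining.

```python
# Minimal sketch: answering "which resources are outputs of <workflow1>?"
# Assumption: ro:Workflow and rel:inputTo are hypothetical; rel:outputFrom is from the text.

graph = {
    ("<workflow1>", "rdf:type", "ro:Workflow"),
    ("<outputA>", "rel:outputFrom", "<workflow1>"),
    ("<outputB>", "rel:outputFrom", "<workflow1>"),
    ("<inputA>", "rel:inputTo", "<workflow1>"),
}


def match(graph, s=None, p=None, o=None):
    """Return the sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


outputs = [s for s, _, _ in match(graph, p="rel:outputFrom", o="<workflow1>")]
print(outputs)
```

This is the same kind of single-pattern lookup that a SPARQL query with one triple pattern would perform against an RDF store.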

Examples

The following is an example of a single annotation body of the kind described above, describing how three aggregated resources (through their ORE proxies) are related by a workflow run.

Taken from https://github.com/wf4ever/ro/blob/master/examples/annotations-ao.trig

The above and/or https://github.com/wf4ever/ro/blob/master/ro-vocab.ttl - needs to be consolidated.

Complete examples

These examples were created by Stian. They show how ORE together with AO can be used to specify research objects. The examples also include SPARQL queries that can be issued against the specified research object.

SPARQL queries:

Using Existing Vocabularies for Describing Research Objects

In addition to the core vocabulary, users can make use of vocabularies defined by existing (domain) ontologies. For example, they can use the Gene Ontology to describe the products of their in silico experiments. Users can also use well known ontologies such as the FOAF ontology to provide information about the authors (contributors) of research objects, and the Dublin Core ontology and the Provenance Authoring and Versioning ontology to provide information about the provenance of resources in the research object.