This document presents an initial model that can be used for specifying research objects.
Research Object = Aggregation + Annotation + RO Vocabularies
A research object can be viewed as an artifact that aggregates a number of resources that are used and/or produced in a given scientific investigation. The figure below illustrates a high-level description of the elements that are needed to specify a research object.
A research object aggregates a number of resources. A resource can be a workflow, web service, document, data item, data set, workflow run, software or a research object. Aggregated resources may be related to each other and to other resources; these relationships are also part of the research object. (Are relationships also identified and aggregated? Are annotations aggregated? -Stian) In what follows, we present examples of relationships that were mined from user requirements.
- A web service is used in a workflow
- A given workflow wf1 is a subworkflow of another workflow wf2
- A given workflow run is an instance of a given workflow
- A given dataset (data item) was used as input to a given workflow run
- A given dataset (data item) was produced by a given workflow run
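The relationships listed above can be pictured as subject-predicate-object statements. The following is a minimal sketch in plain Python tuples (no RDF library is used); every identifier and predicate name in it (service1, isUsedIn, and so on) is a hypothetical placeholder, not a term from the research object vocabulary.

```python
# Each relationship is a (subject, predicate, object) triple.
# All identifiers and predicate names are hypothetical placeholders.
relationships = [
    ("service1", "isUsedIn", "wf2"),
    ("wf1", "isSubworkflowOf", "wf2"),
    ("run1", "isInstanceOf", "wf2"),
    ("input.txt", "wasInputTo", "run1"),
    ("output.txt", "wasProducedBy", "run1"),
]


def related(resource, triples):
    """Return every triple in which the resource is the subject or the object."""
    return [t for t in triples if resource in (t[0], t[2])]


# List everything that is known about the workflow run.
for triple in related("run1", relationships):
    print(triple)
```

Asking for the statements that mention run1 returns its workflow, its input and its output, which is the kind of navigation these relationships are meant to support.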
Resources and their relationships can be the subject of different kinds of annotations.
From the above description, it follows that two core concepts are needed for specifying research objects, namely aggregation and annotation. Instead of building a new model that caters for these concepts, we decided to adapt existing models. Specifically, we are investigating the use of Object Reuse and Exchange (OAI-ORE) for specifying the aggregation of resources, and Open Annotation Collaboration (OAC) and Annotation Ontology (AO) for their annotations.
On top of this construct, annotations that use Research Object and domain-specific vocabularies allow the roles of the individual aggregated resources to be described.
ORE for Specifying Aggregation of Resources in a Research Object
ORE defines standards for the description and exchange of aggregations of Web resources. The figure below depicts an RDF graph illustrating the elements that compose the ORE model. To be able to refer to an aggregation of web resources, a new resource named Aggregation is introduced to specify the resources that constitute the aggregation. Additionally, a new resource, called Resource Map, is introduced to describe the aggregation, which is designated using the property ore:describes. A resource map allows describing the aggregation as well as the resources that constitute that aggregation. Multiple resource maps might describe the same aggregation, for instance in other formats like Atom.
Figure extracted from http://www.openarchives.org/ore/1.0/primer.html
The concepts of aggregation and resource map just described seem to fit our purposes. A Research Object can be defined as an ore:Aggregation, and an ore:ResourceMap can be used to describe the research object and its constituent resources. Specifically, our Research Object class can be defined as follows:
An aggregated resource is anything that can be referenced with a URI. It might be stored as part of the research object within an RO service or offline archive, or be a freestanding web resource. (Note: For external resources it might also be required to capture metadata about how and when the resource was accessed. Describing this is supported by OAC, AO and PAV but might have additional versioning requirements.)
Using ORE vocabulary, the manifest describing a research object corresponds to a resource map, and the research object itself can be specified as an aggregation. To illustrate this, below is a simple example of a manifest and a research object:
The research object :ro is an aggregation of the resources identified by: <output.txt>. It was created by :stian at 2011-07-14T15:01:13. The research object :ro is described by the resource map, <manifest>, which was generated by the agent :roservice, at the date 2011-07-14T15:01:14.
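The description above can be sketched as a handful of statements. This is a minimal illustration in plain Python tuples, not the manifest itself: ore:Aggregation, ore:ResourceMap, ore:aggregates and ore:describes are ORE terms, while the use of dcterms:creator and dcterms:created for the authorship and date statements is an assumption made for this sketch.

```python
# Statements as (subject, predicate, object) tuples.
# The ore: terms come from the ORE vocabulary; the dcterms: properties
# are an assumption made for this sketch.
manifest_triples = [
    (":ro", "rdf:type", "ore:Aggregation"),
    (":ro", "ore:aggregates", "<output.txt>"),
    (":ro", "dcterms:creator", ":stian"),
    (":ro", "dcterms:created", "2011-07-14T15:01:13"),
    ("<manifest>", "rdf:type", "ore:ResourceMap"),
    ("<manifest>", "ore:describes", ":ro"),
    ("<manifest>", "dcterms:creator", ":roservice"),
    ("<manifest>", "dcterms:created", "2011-07-14T15:01:14"),
]


def objects(subject, predicate, triples=manifest_triples):
    """Return all objects of the statements matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(":ro", "ore:aggregates"))        # resources aggregated by :ro
print(objects("<manifest>", "ore:describes"))  # aggregation described by the manifest
```

Keeping the statements about :ro apart from the statements about <manifest> mirrors the ORE distinction between the aggregation and the resource map that describes it.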
Authors and agents are described using FOAF (not shown in the above excerpt), but the challenge of identifying and unifying these authors across multiple research objects and stores is out of scope for this document.
As well as resource map and aggregation, ORE introduces the concept of a proxy resource. A proxy resource provides the means for denoting a resource in the context of a specific aggregation. The figure below describes this concept.
Figure extracted from http://www.openarchives.org/ore/1.0/datamodel.
Proxy resources may prove useful when specifying research objects. Specifically, they can be used to describe resources that play different roles in different research objects, and that may, consequently, have different annotations depending on the research object they are used within. For instance, a data file might be annotated as a workflow output in one research object, but a workflow input in another.
The following example illustrates how resources can be described as proxies within a research object.
<output.txt> have been assigned proxies, which, as we will show later on, can be associated with annotations that apply in the context of the research object to which they belong.
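The role-dependent use of proxies described above can be sketched as follows. This is an illustration under stated assumptions: ore:proxyFor and ore:proxyIn are ORE terms, but the two research objects :ro1 and :ro2, the proxy names and the ex:role property are hypothetical.

```python
# One data file, two research objects, one proxy per research object.
# ore:proxyFor / ore:proxyIn are ORE terms; :ro1, :ro2, the proxy names
# and ex:role are hypothetical.
triples = [
    ("proxyA", "ore:proxyFor", "<output.txt>"),
    ("proxyA", "ore:proxyIn", ":ro1"),
    ("proxyA", "ex:role", "workflow output"),
    ("proxyB", "ore:proxyFor", "<output.txt>"),
    ("proxyB", "ore:proxyIn", ":ro2"),
    ("proxyB", "ex:role", "workflow input"),
]


def roles_in(resource, research_object, triples):
    """Return the roles attached to the resource's proxy in one research object."""
    facts = set(triples)
    return [
        o
        for s, p, o in triples
        if p == "ex:role"
        and (s, "ore:proxyFor", resource) in facts
        and (s, "ore:proxyIn", research_object) in facts
    ]


print(roles_in("<output.txt>", ":ro1", triples))
print(roles_in("<output.txt>", ":ro2", triples))
```

Because the role is attached to the proxy and not to the file itself, the same file can carry a different role in each research object without the two statements contradicting each other.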
Notice that so far we have not specified how to describe relationships between the resources or their proxies. We will show below how these can be specified using annotations.
Annotating Research Objects
Instead of designing a new model for annotating research objects and their constituent resources, we are investigating two RDF-based models, namely the Open Annotation Collaboration (OAC) and the Annotation Ontology (AO).
The description below mainly focuses on OAC as this is the model we investigated first. A comparison with a similar approach using AO follows.
OAC specifies an approach for associating resources with annotations. The annotation model adopted by OAC is illustrated below. An annotation is defined as a document, identified by a URI, which describes the association created between two resources, a body and a target. The annotation body is a resource that "is somehow about" the resource designated by the annotation target.
Figure extracted from http://www.openannotation.org/spec/beta/. A1 is an annotation linking the annotation body B1 with the annotation target T1, meaning that B1 is describing T1.
The OAC specification places no requirements on the body; it can be any kind of resource (such as a text document or a video) that in some way describes or talks about the target resource.
Using OAC, one can annotate the research object and its content. In particular, such annotations can be used to describe the aggregated resources as well as to specify relationships between those resources.
In the beta OAC specification, the subclass oac:DataAnnotation is introduced to indicate that the annotation body is a structured data annotation meant for computer consumption. As research objects require structured annotations with relationships and use of controlled vocabularies, we utilize OAC data annotations for denoting annotation bodies containing RDF graphs.
As an example of a data annotation, the following illustrates how the research object itself can be associated with an annotation body specifying a title. This resource also includes metadata about who created the annotation body, i.e. who gave the title.
Note that the body of the annotation is here a named graph containing two statements used to associate the research object with a title and a description. Using a named graph representation like TriG allows us to include the annotation body directly, while other representations like RDF/XML would require a separate retrieval of the annotation body resource, RDF reification or Content in RDF.
This separation of annotation metadata and annotation bodies allows multiple structured (and potentially inconsistent) annotations to be asserted about resources in the research object. The annotation bodies are not limited to describing aggregated resources, but may also relate them to other resources and controlled vocabularies.
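The separation between annotation metadata and annotation body can be sketched with quads, where the first element names the graph a statement belongs to. This is a minimal illustration, not an OAC implementation: oac:DataAnnotation, oac:hasBody and oac:hasTarget are taken from the OAC model, while the identifiers :ann1 and :body1, the literal values and the use of dcterms properties are assumptions.

```python
# Quads are (graph, subject, predicate, object). DEFAULT holds the annotation
# metadata; the named graph :body1 holds the annotation body.
# :ann1, :body1, the literals and the dcterms: properties are assumptions.
DEFAULT = None
quads = [
    (DEFAULT, ":ann1", "rdf:type", "oac:DataAnnotation"),
    (DEFAULT, ":ann1", "oac:hasTarget", ":ro"),
    (DEFAULT, ":ann1", "oac:hasBody", ":body1"),
    (DEFAULT, ":body1", "dcterms:creator", ":stian"),
    (":body1", ":ro", "dcterms:title", "Example research object"),
    (":body1", ":ro", "dcterms:description", "A hypothetical description"),
]


def statements_about(target, quads):
    """Return (predicate, object, body creators) for each body statement about target."""
    results = []
    annotations = [
        s for g, s, p, o in quads
        if g is DEFAULT and p == "oac:hasTarget" and o == target
    ]
    for ann in annotations:
        bodies = [
            o for g, s, p, o in quads
            if g is DEFAULT and s == ann and p == "oac:hasBody"
        ]
        for body in bodies:
            creators = [
                o for g, s, p, o in quads
                if g is DEFAULT and s == body and p == "dcterms:creator"
            ]
            for g, s, p, o in quads:
                if g == body and s == target:
                    results.append((p, o, creators))
    return results


for row in statements_about(":ro", quads):
    print(row)
```

Each returned row pairs a statement from the body with the creator recorded for that body, so two annotations giving conflicting titles would remain distinguishable by author.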
OAC allows multiple targets of an annotation; this enables us to specify relationships between resources that compose the research object. As an example, the following specifies the relationship between the proxies manifest:workflowProxy. It specifies that there exists a workflow run that is an instance of manifest:workflowProxy, that consumed manifest:inputProxy and produced manifest:outputProxy. In this example we don't know much more about the workflow run itself (it might be described in depth using provenance ontologies), so it is given here only as an anonymous node within the annotation body ("there exists a run such that ...").
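A multi-target annotation with an anonymous workflow run can be sketched as below. The sketch assumes that the produced resource has its own proxy, here called manifest:outputProxy, and the predicate names ex:instanceOf, ex:consumed and ex:produced are hypothetical stand-ins for vocabulary terms.

```python
# Targets of a single annotation, and the statements in its body.
# "_:run" is an anonymous (blank) node: "there exists a run such that ...".
# manifest:outputProxy and the ex: predicates are assumptions.
targets = [
    "manifest:workflowProxy",
    "manifest:inputProxy",
    "manifest:outputProxy",
]
body = [
    ("_:run", "ex:instanceOf", "manifest:workflowProxy"),
    ("_:run", "ex:consumed", "manifest:inputProxy"),
    ("_:run", "ex:produced", "manifest:outputProxy"),
]


def describe_run(body, run="_:run"):
    """Collect what the body says about the anonymous run, keyed by predicate."""
    return {p: o for s, p, o in body if s == run}


def untargeted(body, targets):
    """Return named resources mentioned in the body that are not annotation targets."""
    mentioned = {n for s, _, o in body for n in (s, o) if not n.startswith("_:")}
    return sorted(mentioned - set(targets))


print(describe_run(body))
print(untargeted(body, targets))
```

The second function checks that every named resource the body talks about is also listed as a target of the annotation; the anonymous run is deliberately excluded because it has no identifier of its own.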
Annotation Ontology (AO)
As an alternative to OAC, annotations could be specified using the Annotation Ontology. The Annotation Ontology provides a common model for document metadata derived from text mining and manual annotation of scientific papers. Specifically, it provides the means for annotating electronic documents or parts of electronic documents. Different kinds of annotations can be classified, e.g., comments, notes, examples, errata, etc.
AO is normally applied such that an annotation is the creation of a link between an annotated document and an annotation topic.
From http://code.google.com/p/annotation-ontology/wiki/Annotation, an annotated document has been annotated to have the topic of the enzyme beta-secretase 1. Additional metadata about when the document was retrieved is provided using PAV.
Note that the statements within the "annotation topic" box (i.e. name) are not part of the annotation; they are just additional information about a term from a controlled vocabulary. The annotation is in AO viewed as a "yellow marker"-type entity that links a (previously known) topic with a document.
From the example above (using aot:Qualifier) one could strictly argue that for our annotation bodies, AO should be applied 'opposite' to how we used OAC, as the annotation bodies have the aggregated resources as their topics. We feel that this is somewhat counter-intuitive, as our motivation was to find a mechanism for attaching rich descriptions to aggregated resources. However, AO encourages specialisation through subclassing ao:Annotation; for instance, an aot:Note relates an ann:body as a free-text note describing (a sub-selection of) the annotated document.
AO can therefore be used in a manner similar to OAC data annotations, by introducing a new subclass ro:DataAnnotation. In this style, aof:annotatesResource (rather than aof:annotatesDocument, as we're not sure if our resource can be considered a foaf:Document) is used to indicate the annotated resource, and ao:body is used to indicate the content of the annotation, here as well referenced as a named graph or separate resource.
AO example assuming ro:DataAnnotation using named graphs in TriG format:
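As a rough illustration of how the AO style lines up with the OAC style, the sketch below rewrites OAC-style annotation metadata into the AO terms named in the text. The term mapping is only an informal correspondence assumed for this example, not a formal equivalence between the two ontologies, and the identifiers :ann1 and :body1 are hypothetical.

```python
# Informal correspondence between the OAC terms and the AO-style terms
# discussed in the text. This mapping is an assumption for illustration.
OAC_TO_AO = {
    "oac:DataAnnotation": "ro:DataAnnotation",
    "oac:hasTarget": "aof:annotatesResource",
    "oac:hasBody": "ao:body",
}


def to_ao(triples):
    """Rewrite predicates and objects that have an AO-style counterpart."""
    return [(s, OAC_TO_AO.get(p, p), OAC_TO_AO.get(o, o)) for s, p, o in triples]


# Hypothetical annotation metadata in the OAC style.
oac_annotation = [
    (":ann1", "rdf:type", "oac:DataAnnotation"),
    (":ann1", "oac:hasTarget", ":ro"),
    (":ann1", "oac:hasBody", ":body1"),
]

for triple in to_ao(oac_annotation):
    print(triple)
```

Only the annotation metadata changes between the two styles; the annotation body, whether a named graph or a separate resource, stays the same.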
The following suggests a vocabulary that can be used for describing research objects, their resources, and their relationships. This vocabulary is not yet complete: it needs to be extended and is not yet in line with the terms used in the examples.
These examples were created by Stian. They show how ORE together with OAC or AO can be used to specify research objects. The examples also include SPARQL queries that can be issued against the specified research object.
- The ORE aggregation: https://github.com/wf4ever/ro/blob/master/examples/ore.n3
- Annotations using OAC: https://github.com/wf4ever/ro/blob/master/examples/annotations-oac.trig
- Annotations using AO: https://github.com/wf4ever/ro/blob/master/examples/annotations-ao.trig
- Return all aggregated resources and their proxy: https://github.com/wf4ever/ro/blob/master/examples/ro-aggregations.sparql
- Return dc:title on a resource, and the author of the title annotation: https://github.com/wf4ever/ro/blob/master/examples/titles.sparql
- Return any statements with the resource as a subject, and the author of those statements: https://github.com/wf4ever/ro/blob/master/examples/direct-annotations.sparql
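The intent of the first query in the list above, returning aggregated resources together with their proxy, can be sketched in plain Python. This is not a transcription of the linked query file; the sample data, including <input.txt> and :inputProxy, is hypothetical, and treating the proxy as optional is a design choice of this sketch.

```python
# Hypothetical sample data: two aggregated resources, only one with a proxy.
triples = [
    (":ro", "ore:aggregates", "<input.txt>"),
    (":ro", "ore:aggregates", "<output.txt>"),
    (":inputProxy", "ore:proxyFor", "<input.txt>"),
    (":inputProxy", "ore:proxyIn", ":ro"),
]


def aggregated_with_proxy(research_object, triples):
    """Return (resource, proxy) pairs; proxy is None when none is declared."""
    facts = set(triples)
    rows = []
    for s, p, resource in triples:
        if s == research_object and p == "ore:aggregates":
            proxies = [
                ps for ps, pp, po in triples
                if pp == "ore:proxyFor"
                and po == resource
                and (ps, "ore:proxyIn", research_object) in facts
            ]
            rows.append((resource, proxies[0] if proxies else None))
    return rows


for row in aggregated_with_proxy(":ro", triples):
    print(row)
```

Requiring both ore:proxyFor and ore:proxyIn ensures that a proxy created for the same resource in a different research object is not reported here.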
Using Existing Vocabularies for Describing Research Objects
In addition to the core vocabulary, users can make use of vocabularies defined by existing (domain) ontologies. For example, they can use the Gene Ontology to describe the products of their in silico experiments. Users can also use well known ontologies such as the FOAF ontology to provide information about the authors (contributors) of research objects, and the Dublin Core ontology and the Provenance Authoring and Versioning ontology to provide information about the provenance of resources in the research object.