Annotating Research objects
Instead of designing a new model for annotating research objects and their constituent resources, we are investigating two existing RDF-based models: the Open Annotation Collaboration (OAC) model and the Annotation Ontology (AO).
Open Annotation Collaboration (OAC)
The description below mainly focuses on OAC as this is the model we investigated first. A comparison with a similar approach using AO follows.
OAC specifies an approach for associating resources with annotations. The annotation model adopted by OAC is illustrated below. An annotation is defined as a document, identified by a URI, which describes the association created between two resources, a body and a target. The annotation body is a resource that "is somehow about" the resource designated by the annotation target.
Figure extracted from http://www.openannotation.org/spec/beta/. A1 is an annotation linking the annotation body B1 with the annotation target T1, meaning that B1 is describing T1.
The OAC specification places no requirements on the body: it can be any kind of resource (such as a text document or a video) that in some way describes or discusses the target resource.
Using OAC, one can annotate the research object and its content. In particular, such annotations can be used to describe the aggregated resources and to specify relationships between those resources.
In the beta OAC specification, the subclass oac:DataAnnotation is introduced to indicate that the annotation body is a structured data annotation meant for computer consumption. As research objects require structured annotations with relationships and use of controlled vocabularies, we utilize OAC data annotations for denoting annotation bodies containing RDF graphs.
As an example of a data annotation, the following illustrates how the research object itself can be associated with an annotation body specifying a title. This resource also includes metadata about who created the annotation body, i.e. who gave the title.
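A minimal sketch of such a data annotation in TriG. The example.com URIs, the literal values and the creator ex:alice are placeholders, and the oac: namespace URI is assumed to be the one used by the beta specification.

```trig
# Sketch only: every http://example.com/ URI and every literal is hypothetical.
@prefix oac:     <http://www.openannotation.org/ns/> .   # assumed OAC beta namespace
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/ro/> .

# Default graph: the annotation document and the metadata about its body
{
    ex:annotation1 a oac:DataAnnotation ;
        oac:hasTarget ex:researchObject ;
        oac:hasBody ex:body1 .

    # Who created the annotation body, i.e. who gave the title
    ex:body1 dcterms:creator ex:alice .
}

# Named graph: the annotation body, holding the statements about the target
ex:body1 {
    ex:researchObject dcterms:title "An example research object" ;
        dcterms:description "A placeholder description." .
}
```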
Note that the body of the annotation is here a named graph containing two statements used to associate the research object with a title and a description. Using a named graph representation like TriG allows us to include the annotation body directly, while other representations like RDF/XML would require a separate retrieval of the annotation body resource, RDF reification or Content in RDF.
This separation of annotation metadata and annotation bodies allows multiple structured (and potentially inconsistent) annotations to be asserted about resources in the research object. The annotation bodies are not limited to describing aggregated resources, but may also relate them to other resources and controlled vocabularies.
OAC allows an annotation to have multiple targets, which enables us to specify relationships between the resources that compose the research object. As an example, the following relates manifest:workflowProxy to the proxies for its input and output. It specifies that there exists a workflow run that is an instance of manifest:workflowProxy, that consumed manifest:inputProxy and produced manifest:outputProxy. In this example we don't know much more about the workflow run itself (it might be described in depth using provenance ontologies), so it is given here only as an anonymous node within the annotation body ("there exists a run such that ...").
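A hedged sketch of how such a multi-target annotation could be written in TriG. The ex: vocabulary (ex:WorkflowRun, ex:instanceOf, ex:consumed, ex:produced), the manifest: namespace URI and the annotation and body names are hypothetical; manifest:outputProxy is an assumed name for the proxy of the produced output, and the oac: namespace URI is assumed from the beta specification.

```trig
# Sketch only: the ex: terms and the manifest: namespace are hypothetical.
@prefix oac:      <http://www.openannotation.org/ns/> .   # assumed OAC beta namespace
@prefix manifest: <http://example.com/ro/manifest#> .
@prefix ex:       <http://example.com/vocab#> .

# Default graph: one annotation with three targets
{
    manifest:annotation2 a oac:DataAnnotation ;
        oac:hasTarget manifest:workflowProxy ,
                      manifest:inputProxy ,
                      manifest:outputProxy ;
        oac:hasBody manifest:body2 .
}

# Named graph: the body relates the targets through an anonymous workflow run
manifest:body2 {
    [] a ex:WorkflowRun ;
        ex:instanceOf manifest:workflowProxy ;
        ex:consumed manifest:inputProxy ;
        ex:produced manifest:outputProxy .
}
```

The blank node `[]` carries the "there exists a run" reading: the run has no URI of its own, so nothing outside the body graph can refer to it.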
Annotation Ontology (AO)
As an alternative to OAC, annotations could be specified using the Annotation Ontology. The Annotation Ontology provides a common model for document metadata derived from text mining and manual annotation of scientific papers. Specifically, it provides the means for annotating electronic documents or parts of electronic documents. Different kinds of annotations can be classified, e.g. comments, notes, examples and errata.
AO is normally applied such that an annotation is the creation of a link between an annotated document and an annotation topic.
Figure from http://code.google.com/p/annotation-ontology/wiki/Annotation. An annotated document has been annotated as having the enzyme beta-secretase 1 as its topic. Additional metadata about when the document was retrieved is provided using PAV.
Note that the statements within the "annotation topic" box (i.e. name) are not part of the annotation; they are just additional information about a term from a controlled vocabulary. In AO, the annotation is viewed as a "yellow marker"-type entity that links a (previously known) topic with a document.
From the example above (using aot:Qualifier) one could argue that, strictly speaking, AO should be applied for our annotation bodies in the 'opposite' direction to how we used OAC, as the annotation bodies have the aggregated resources as their topics. We feel that this is somewhat counter-intuitive, as our motivation was to find a mechanism for attaching rich descriptions to aggregated resources. However, AO encourages specialisation through subclassing ao:Annotation; for instance, an aot:Note has an ann:body holding a free-text note that describes (a sub-selection of) the annotated document.
AO can therefore be used in a manner similar to OAC data annotations, by introducing a new subclass ro:DataAnnotation. In this style, ao:annotatesResource is used to indicate the annotated resource (rather than aof:annotatesDocument, as we are not sure whether our resources can be considered a foaf:Document), and ao:body is used to indicate the content of the annotation, here again referenced as a named graph or a separate resource.
AO example assuming ro:DataAnnotation using named graphs in TriG format:
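One way this could look, as a hedged sketch rather than the definitive form: the example.com URIs and the literal value are placeholders, the ao: and ro: namespace URIs are assumptions, and ro:DataAnnotation is the proposed subclass discussed above.

```trig
# Sketch only: namespace URIs for ao: and ro: are assumed, ex: is hypothetical.
@prefix ao:      <http://purl.org/ao/> .
@prefix ro:      <http://purl.org/wf4ever/ro#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/ro/> .

# Default graph: the proposed subclass and one annotation using it
{
    ro:DataAnnotation rdfs:subClassOf ao:Annotation .

    ex:annotation1 a ro:DataAnnotation ;
        ao:annotatesResource ex:researchObject ;
        ao:body ex:body1 .
}

# Named graph: the annotation body
ex:body1 {
    ex:researchObject dcterms:title "An example research object" .
}
```

Compared with the OAC style, only the annotation class and the two linking properties differ; the body graph itself is unchanged.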
2012-03-15 Open Annotation
An effort to merge the AO and OAC models into the "Open Annotation" model has started as part of the W3C Open Annotation Community Group.
On 2012-03-15 the AO and OAC communities met in Boston for an Open Annotation Technical Meeting. Many agreements were reached on merging the two models.
Here is a picture of Robert Sanderson presenting the outcome of the meeting, the "OA" model:
- A1 is the oa:Annotation. It has at least one oa:hasTarget, the resource(s) this annotation is about.
- oa:hasBody points to an (optional) resource which is somehow about the target(s). Thus this annotation says that S1 is about ST. S1 might be a retrievable web resource, or have a UUID with the (typically lightweight) body embedded using Content in RDF.
- The annotation can indicate a series of oa:hasSemanticTag. This is a scruffy way of saying that an ontological term is related to the target.
- The annotation can be an instance of the subclass oa:DataAnnotation, which indicates that the (now required) body is intended for computational processing.
- oa:GraphAnnotation is a subclass of oa:DataAnnotation, indicating that the body can be seen as an RDF graph. The graph can either be retrieved from the URI of the body, be embedded as a known RDF serialisation using Content in RDF, or be a named graph (if the annotation is within a quad serialisation or quad store).
- The target can be represented as any resource directly (oa:hasTarget <http://example.com/>) or by an intermediate oa:SpecificTarget node, which indicates the resource using the required oa:hasSource. The annotation (and its body) can then talk about the specific target rather than the whole resource, for instance a part of an image instead of the whole image.
- oa:hasSetup can specify 0 or more alternate oa:Setups for specifying how the resource is to be retrieved and prepared before applying the selectors. For instance, a subclass of oa:Setup could specify HTTP accept and language headers for retrieval, while a more specific subclass could specify how to rotate a 3D molecule. Multiple setups indicate alternatives: applications SHOULD try to apply one of the setups, but may choose between them according to their own preferences.
- oa:hasSelector indicates 0 or more alternate oa:Selectors for specifying a subset of the resource, for instance a paragraph in an HTML page or a rectangular cutout from an image. This is an extension point: the OA core does not specify any selectors, except possibly the oa:AllSelector (?). Multiple selectors indicate alternative ways to select the same intended selection, although they are not required to match one-to-one (for instance, a rectangle selector vs. a polygon selector). As an extension point, different domains will define different selectors, and applications can pick the selectors they understand (if any), applying their own preferences when several can be applied.
- oa:hasStyle indicates 0 or more alternate oa:Styles for specifying how the annotator would prefer the target to be rendered or shown. For instance, a CSS-based style specifying border: red can make a red border appear around the selection.
- The application is free to find which combination of setup, selector and style it can apply, in particular for rendering an annotation.
- Annotation metadata should be standardized to specify the generator, creator and created date, but no decision was made on whether these should be taken from existing vocabularies like Dublin Core Terms or PAV. The distinction between creator and generator is that the generator is what made the RDF (typically software), while the creator is who made the annotation (typically a person). These could also be specified on the body if they differ from the creator, generator or created date of the annotation.
- The oa:modelVersion (called 'version' in picture) specifies the version of OA used.
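The points above can be sketched as a single graph annotation on part of an image. This is only an illustration of the structure described in the list: the oa: namespace URI is a placeholder (none is given here), ex:RectangleSelector is a hypothetical domain-specific selector class, and all example.com URIs and literals are invented. A1, S1 and ST follow the names used in the list.

```trig
# Sketch only: the oa: namespace URI is a placeholder, not an agreed namespace.
@prefix oa:      <http://example.org/oa#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/ro/> .

# Default graph: the annotation, its specific target and one selector
{
    ex:A1 a oa:GraphAnnotation ;
        oa:hasTarget ex:ST ;
        oa:hasBody ex:S1 .

    # Intermediate node: a part of the image rather than the whole image
    ex:ST a oa:SpecificTarget ;
        oa:hasSource ex:image ;
        oa:hasSelector ex:selector1 .

    # Hypothetical selector type; the OA core leaves selectors to extensions
    ex:selector1 a ex:RectangleSelector .
}

# Named graph: the body, which talks about the specific target
ex:S1 {
    ex:ST dcterms:description "A placeholder description of the selected region." .
}
```

An application that does not understand ex:RectangleSelector can still recover the source image through oa:hasSource and treat the annotation as being about the image as a whole.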
Overall the new proposed model should be a good fit for our case, in particular because oa:GraphAnnotation matches our ro:GraphAnnotation.
Only minimal changes to our current use of AO would be needed to adopt the new OA model instead.
For example, our current RO 0.2 approach using AO 2:
But the devil is in the detail, and the DC/PAV style metadata (creator, date) has not yet been agreed on.