Related Vocabularies

A list of vocabularies and models that may be of relevance or use in describing Research Objects. This is largely an unstructured brain dump at present.

Models

OAIS

Open Archival Information System. The OAIS Reference Model provides a description of the functionality and components expected in a system intended to support the preservation of information for a community.

Overview by Brian Lavoie: http://www.dpconline.org/docs/lavoie_OAIS.pdf

OAIS describes "open archival information systems", which are concerned with preserving information for the benefit of a community. The prime focus of OAIS appears to be preservation (securing long-term persistence) rather than the uses to which that information will later be put, although it does consider providing sufficient contextual information about the preserved resources to ensure that they remain understandable and usable.

OAIS considers three external entities or actors that interact with the system:

  • Producers: those that transfer information to the system for preservation.
  • Management: those that formulate and enforce high-level policies (planning, defining scope, providing "guarantees").
  • Consumers: those that are expected to use the information.

It also explicitly defines the notion of a Designated Community: a subset of consumers that are expected to understand the archived information. There is a notion of content/information being independently understandable, and the Designated Community are the users at whom this is targeted. This links to the requirements that Paolo has expressed, and that we discussed in the eScience paper, as the ability to re-use content without resorting to "back-channels": I shouldn't need to pick up the phone in order to make use of or understand a data set/RO/whatever. The Designated Community impacts the content and forms of information stored in the OAIS. This could be a useful concept/term to use in our description of users and scenarios.

The OAIS Functional Model describes a core set of mechanisms, including Ingest, Storage and Access, along with Planning, Data Management and Administration.

There is also a separation between:

  • Submission Information Package (SIP): the form in which content is submitted for ingest by a Producer;
  • Archival Information Package (AIP): the version stored by the system. The AIP includes information relating to preservation, such as provenance and fixity (checksums, watermarks etc.), which again resonates with some of our earlier discussions;
  • Dissemination Information Package (DIP): the version delivered to a Consumer.

Again, this is a useful explicit separation (and naming) of the various stages of the RO, and it captures the fact that the mechanism I use to construct/ingest an RO (e.g. the ROBox prototype) may differ from the mechanism used to deliver it to a consumer (e.g. a bundle including OAI-ORE and links to resources).
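
As a concreteness check, here is a minimal sketch of the SIP/AIP/DIP separation in Python. The class and field names are illustrative inventions, not terms from the OAIS specification, and real packages would carry far richer representation information.

    from dataclasses import dataclass, field
    from hashlib import sha256

    @dataclass
    class SubmissionInformationPackage:
        """What a Producer hands over for ingest (hypothetical shape)."""
        producer: str
        content: bytes

    @dataclass
    class ArchivalInformationPackage:
        """What the archive stores, with preservation metadata."""
        content: bytes
        provenance: list = field(default_factory=list)  # who produced/ingested it
        fixity: str = ""                                # e.g. a checksum

    @dataclass
    class DisseminationInformationPackage:
        """What a Consumer receives, with context for the Designated Community."""
        content: bytes
        description: str

    def ingest(sip: SubmissionInformationPackage) -> ArchivalInformationPackage:
        # Turn a SIP into an AIP, recording provenance and fixity on the way in.
        return ArchivalInformationPackage(
            content=sip.content,
            provenance=[f"submitted by {sip.producer}"],
            fixity=sha256(sip.content).hexdigest(),
        )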

Some questions that this brings up and I'd like us to consider:

  • How do the proposed stereotypes <http://www.wf4ever-project.org/wiki/display/docs/RO+Definition%2C+Properties+and+Stereotypes> relate to this architecture? Are there differences in Dissemination Packages for e.g. Archived, Exposing or Live Objects?
  • What's the interplay in Wf4Ever between preservation, curation and reuse, and is considering these aspects separately useful for our requirements? Our key aim is to support the long-term preservation of workflows, but that is clearly with the intention of supporting reuse of those objects in the future.

Vocabularies

Structural

Vocabularies that are concerned with describing structure.

OAI-ORE http://www.openarchives.org/ore/

A vocabulary for describing aggregations of resources. Domain agnostic. Provides facilities (Proxies) for identifying occurrences of resources within the context of an aggregation.
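
A minimal sketch of an ORE aggregation using Python's rdflib (an implementation choice, nothing ORE mandates). The example.org URIs are hypothetical; ore:ResourceMap, ore:describes, ore:Aggregation and ore:aggregates are terms from the ORE vocabulary.

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    ORE = Namespace("http://www.openarchives.org/ore/terms/")

    g = Graph()
    g.bind("ore", ORE)

    # Hypothetical URIs for a Research Object and its parts.
    rem = URIRef("http://example.org/ro/1")               # the resource map
    agg = URIRef("http://example.org/ro/1/aggregation")   # the aggregation

    g.add((rem, RDF.type, ORE.ResourceMap))
    g.add((rem, ORE.describes, agg))
    g.add((agg, RDF.type, ORE.Aggregation))
    g.add((agg, ORE.aggregates, URIRef("http://example.org/ro/1/workflow.t2flow")))
    g.add((agg, ORE.aggregates, URIRef("http://example.org/ro/1/results.csv")))

    print(g.serialize(format="turtle"))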

Dublin Core http://dublincore.org/documents/dcmi-terms/

Dublin Core metadata terms provide a collection of terms for describing rights, policies, authorship, ownership, formats etc.
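
For example, a few Dublin Core terms attached to a hypothetical Research Object (again a sketch using rdflib; the URIs are made up):

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS

    g = Graph()
    g.bind("dcterms", DCTERMS)

    ro = URIRef("http://example.org/ro/1")  # hypothetical Research Object URI

    g.add((ro, DCTERMS.title, Literal("Example Research Object")))
    g.add((ro, DCTERMS.creator, Literal("A. Researcher")))
    g.add((ro, DCTERMS.created, Literal("2011-05-01")))
    g.add((ro, DCTERMS.license, URIRef("http://creativecommons.org/licenses/by/3.0/")))

    print(g.serialize(format="turtle"))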

Memento http://www.mementoweb.org/

Memento wants to make it as straightforward to access the Web of the past as it is to access the current Web.

If you know the URI of a Web resource, the technical framework proposed by Memento allows you to see a version of that resource as it existed at some date in the past, by entering that URI in your browser like you always do and by specifying the desired date in a browser plug-in. Or you can actually browse the Web of the past by selecting a date and clicking away. Whatever you land upon will be versions of Web resources as they were around the selected date. Obviously, this will only work if previous versions are available somewhere on the Web. But if they are, and if they are on servers that support the Memento framework, you will get to them.
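
Under the hood this is HTTP content negotiation in the datetime dimension: a client sends an Accept-Datetime header to a TimeGate, which redirects to the archived version (memento) closest to that date. A sketch using Python's requests library; the TimeGate URI pattern below is that of the public Time Travel aggregator and is an assumption here, so substitute any Memento-compliant TimeGate.

    import requests

    # Ask a TimeGate for the version of a page closest to a given date.
    uri = "http://www.mementoweb.org/"
    timegate = "http://timetravel.mementoweb.org/timegate/" + uri

    resp = requests.get(
        timegate,
        headers={"Accept-Datetime": "Thu, 01 Apr 2010 00:00:00 GMT"},
    )

    print(resp.url)                               # the memento we were redirected to
    print(resp.headers.get("Memento-Datetime"))   # when that version was archived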

Liquid Pubs http://liquidpub.org/

The world of scientific publications has been largely oblivious to the advent of the Web and to advances in ICT. Even more surprisingly, this is the case even for research in the ICT area: ICT researchers have been able to exploit the Web to improve the (production) process in almost all areas, but not their own. We are producing scientific knowledge (and publications in particular) essentially following the very same approach we followed before the Web. Scientific knowledge dissemination is still based on the traditional notion of "paper" publication and on peer review as quality assessment method. The current approach encourages authors to write many (possibly incremental) papers to get more "tokens of credit", generating often unnecessary dissemination overhead for themselves and for the community of reviewers. Furthermore, it does not encourage or support reuse and evolution of publications: whenever a (possibly small) progress is made on a certain subject, a new paper is written, reviewed, and published, often after several months. The situation is analogous if not worse for textbooks.

The LiquidPub project proposes a paradigm shift in the way scientific knowledge is created, disseminated, evaluated and maintained. This shift is enabled by the notion of Liquid Publications, which are evolutionary, collaborative, and composable scientific contributions. Many Liquid Publication concepts are based on a parallel between scientific knowledge artifacts and software artifacts, and hence on lessons learned in (agile, collaborative, open source) software development, as well as on lessons learned from Web 2.0 in terms of collaborative evaluation of knowledge artifacts.

SPAR http://purl.org/spar/

The Semantic Publishing and Referencing Ontologies (SPAR) form a suite of orthogonal and complementary ontology modules for creating comprehensive machine-readable RDF metadata for all aspects of semantic publishing and referencing. The ontologies can be used either individually or in conjunction, as need dictates. Each is encoded in the Web Ontology Language OWL 2 DL. Together, they provide the ability to describe far more than simply bibliographic entities such as books and journal articles, by enabling RDF metadata to be created to relate these entities to reference citations, to bibliographic records, to the component parts of documents, and to various aspects of the scholarly publication process.

All eight SPAR ontologies – FaBiO, CiTO, BiRO, C4O, DoCO, PRO, PSO and PWO – are available for inspection, comment and use. They are useful for describing bibliographic objects, bibliographic records and references, citations, citation counts, citation contexts and their relationships to relevant sections of cited papers, the organization of bibliographic records and references into bibliographies, ordered reference lists and library catalogues, document components, publishing roles, publishing status and publishing workflows.
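
A small sketch of how two SPAR modules might be combined, using rdflib. The paper URIs are hypothetical, and the CiTO/FaBiO terms used here (fabio:JournalArticle, cito:cites, cito:usesMethodIn) should be checked against the current ontology releases.

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    CITO = Namespace("http://purl.org/spar/cito/")
    FABIO = Namespace("http://purl.org/spar/fabio/")

    g = Graph()
    g.bind("cito", CITO)
    g.bind("fabio", FABIO)

    # Hypothetical article URIs.
    paper = URIRef("http://example.org/papers/wf4ever-ro")
    cited = URIRef("http://example.org/papers/earlier-work")

    g.add((paper, RDF.type, FABIO.JournalArticle))
    g.add((paper, CITO.cites, cited))
    g.add((paper, CITO.usesMethodIn, cited))  # CiTO lets you say *why* you cite

    print(g.serialize(format="turtle"))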

FRBR http://www.ifla.org/VII/s13/frbr/frbr.pdf

Functional Requirements for Bibliographic Records. FRBR is a conceptual model of the bibliographic universe outlined in a 1998 report from the International Federation of Library Associations and Institutions (IFLA). The report uses entity-relationship analysis to "provide a clearly defined, structured framework for relating the data that are recorded in bibliographic records to the needs of the users of those records." (FRBR Report, p. 7) The most influential parts of the FRBR report are the definitions of user tasks and bibliographic entities.

See also http://techessence.info/frbr

Tasks:

  • to find entities that correspond to the user's stated search criteria (i.e., to locate either a single entity or a set of entities in a file or database as the result of a search using an attribute or relationship of the entity);
  • to identify an entity (i.e., to confirm that the entity described corresponds to the entity sought, or to distinguish between two or more entities with similar characteristics);
  • to select an entity that is appropriate to the user's needs (i.e., to choose an entity that meets the user's requirements with respect to content, physical format, etc., or to reject an entity as being inappropriate to the user's needs);
  • to acquire or obtain access to the entity described (i.e., to acquire an entity through purchase, loan, etc., or to access an entity electronically through an online connection to a remote computer).

Conceptual entities:

  • Work: a distinct intellectual or artistic creation
  • Expression: the intellectual or artistic realization of a Work
  • Manifestation: the physical embodiment of an Expression of a Work
  • Item: a single exemplar of a Manifestation
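
To make the four-level model concrete for our setting: a workflow could be treated as a Work, a specific version of it as an Expression, a serialization of that version as a Manifestation, and my local copy as an Item. A sketch using rdflib and the FRBR Core RDF vocabulary (an assumption: that vocabulary is a third-party RDF expression of FRBR, not part of the IFLA report itself).

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    FRBR = Namespace("http://purl.org/vocab/frbr/core#")

    g = Graph()
    g.bind("frbr", FRBR)

    # Hypothetical URIs for the four levels.
    work = URIRef("http://example.org/workflow")                    # abstract workflow
    expr = URIRef("http://example.org/workflow/v2")                 # a concrete version
    manif = URIRef("http://example.org/workflow/v2.t2flow")         # a serialization
    item = URIRef("http://example.org/myfiles/workflow-v2.t2flow")  # my copy

    g.add((work, RDF.type, FRBR.Work))
    g.add((work, FRBR.realization, expr))
    g.add((expr, FRBR.embodiment, manif))
    g.add((manif, FRBR.exemplar, item))

    print(g.serialize(format="turtle"))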

SWAN http://swan.mindinformatics.org/ontology.html

The SWAN project makes use of the "SWAN ontology". This ontology, which was originally organized in a single block, has been modularized to foster reusability and integration with other existing ontologies in a software ecosystem. Thus, when we refer to the SWAN ontology, we refer to the collection of ontologies used by the SWAN software ecosystem to create, manage and share SWAN knowledge bases. In general, an ontology, together with a set of instances of its classes, constitutes a knowledge base.

Includes a number of ontologies representing provenance, collections, discourse relationships etc.

Nanopublications http://www.w3.org/wiki/HCLSIG/SWANSIOC/Nanopublications-Subtask

What is a nanopublication? A nanopublication is constructed from the following elements:

  • Concepts: unitary elements of knowledge
  • Triples: tuples of three concepts
  • Named Graph: a set of interconnected Triples
  • Statements: originally defined as a triple that is uniquely identifiable, but now extended to be a named graph that is uniquely identifiable
  • Annotation: a triple whose subject is a statement
  • Nanopublication: a set of annotations that refer to the same statement and contains a minimum set of annotations

All concepts, statements, and nanopublications must be uniquely identifiable.
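
A sketch of this structure using rdflib named graphs, following the definitions above: the statement is a uniquely identified named graph, and the annotations are triples whose subject is that graph's identifier. All example.org names are hypothetical, and this ignores any more specific nanopublication schema.

    from rdflib import Dataset, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS

    EX = Namespace("http://example.org/")

    ds = Dataset()

    # The statement: a uniquely identifiable named graph holding one triple.
    stmt = URIRef("http://example.org/statements/1")
    g_stmt = ds.graph(stmt)
    g_stmt.add((EX.geneA, EX.regulates, EX.geneB))

    # Annotations: triples whose subject is the statement graph itself.
    g_ann = ds.graph(URIRef("http://example.org/annotations/1"))
    g_ann.add((stmt, DCTERMS.creator, Literal("A. Researcher")))
    g_ann.add((stmt, DCTERMS.created, Literal("2011-05-01")))

    print(ds.serialize(format="nquads"))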

Annotation Ontology http://code.google.com/p/annotation-ontology/

The Annotation Ontology is a vocabulary for making several kinds of annotation (comments, entity annotations or semantic tags, textual annotations or classic tags, notes, examples, errata...) on any kind of electronic document (text, images, audio, tables...) and on document parts. AO does not provide any domain ontology itself; rather, it fosters reuse of existing ones, so as not to break the scalability principle of the Semantic Web.
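
A tentative rdflib sketch of an AO-style annotation. The namespace URI and term names (ao:Annotation, ao:annotatesResource, ao:body) are assumptions from my reading of the AO documentation and should be verified against the current spec.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    # Assumed namespace; check against the published AO ontology.
    AO = Namespace("http://purl.org/ao/")

    g = Graph()
    g.bind("ao", AO)

    ann = URIRef("http://example.org/annotations/42")   # hypothetical
    doc = URIRef("http://example.org/papers/1.pdf")     # hypothetical

    g.add((ann, RDF.type, AO.Annotation))
    g.add((ann, AO.annotatesResource, doc))
    g.add((ann, AO.body, Literal("This figure appears inconsistent with Table 2.")))

    print(g.serialize(format="turtle"))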

Nepomuk Annotation Ontology http://www.semanticdesktop.org/ontologies/nao/

The annotation ontology provides vocabulary that enables users to attach custom descriptions, identifiers, tags and ratings to resources on their desktop. Via other properties, the user is also able to make generic relationships between related resources explicit. Some relationships between resources are too general to be included at the domain ontology level; instead, these properties are also defined in the annotation ontology. Given the high-level status of this ontology, these properties can be used to link any related resources on the user's desktop, as well as to provide custom human-readable textual annotations.
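
A tentative sketch, again in rdflib. The NAO namespace URI and the terms used (nao:Tag, nao:hasTag, nao:numericRating, nao:description) are assumptions to be checked against the published ontology.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    # Assumed namespace; check against the published NAO ontology.
    NAO = Namespace("http://www.semanticdesktop.org/ontologies/2007/08/15/nao#")

    g = Graph()
    g.bind("nao", NAO)

    doc = URIRef("file:///home/me/results.csv")        # hypothetical desktop resource
    tag = URIRef("file:///home/me/tags/important")     # hypothetical tag

    g.add((tag, RDF.type, NAO.Tag))
    g.add((doc, NAO.hasTag, tag))
    g.add((doc, NAO.numericRating, Literal(5)))
    g.add((doc, NAO.description, Literal("Final results from the April run.")))

    print(g.serialize(format="turtle"))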

Experimental Metadata

EXPO http://expo.sourceforge.net/

EXPO defines over 200 concepts for creating semantic markup about scientific experiments, using the Web Ontology Language OWL.

We propose the ontology EXPO to formalise generic knowledge about scientific experimental design, methodology, and results representation. Such a common ontology is both feasible and desirable because all the sciences follow the same experimental principles. The formal description of experiments for efficient analysis, annotation, and sharing of results is a fundamental objective of science.

OBI http://obi-ontology.org/page/Main_Page

The Ontology for Biomedical Investigations (OBI) project is developing an integrated ontology for the description of life-science and clinical investigations. This includes a set of 'universal' terms that are applicable across various biological and technological domains, and domain-specific terms relevant only to a given domain. The ontology will support the consistent annotation of biomedical investigations, regardless of the particular field of study, representing the design of an investigation, the protocols and instrumentation used, the material used, the data generated and the type of analysis performed on it. OBI is currently being built under the Basic Formal Ontology (BFO).

MGED http://mged.sourceforge.net/ontologies/index.php

The primary purpose of the MGED Ontology is to provide standard terms for the annotation of microarray experiments. These terms will enable structured queries of elements of the experiments. Furthermore, the terms will also enable unambiguous descriptions of how the experiment was performed. The terms will be provided in the form of an ontology, which means that the terms will be organized into classes with properties and will be defined. A standard ontology format will be used. For descriptions of biological material (biomaterial) and certain treatments used in the experiment, terms may come from external resources that are specified in the Ontology. Software programs utilizing the Ontology are expected to generate forms for annotation, populate databases directly, or generate files in the established MAGE-ML format. Thus, the Ontology will be used directly by investigators annotating their microarray experiments as well as by software and database developers, and therefore will be developed with these very practical applications in mind.

ISA http://isatab.sourceforge.net/

The Investigation/Study/Assay (ISA) infrastructure is the first general-purpose format and freely available desktop software suite targeted at experimentalists, curators and developers, which:

  • assists in the reporting and local management of experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) from studies employing one or a combination of technologies;
  • empowers users to take up community-defined minimum information checklists and ontologies, where required;
  • formats studies for submission to a growing number of international public repositories endorsing the tools, currently ENA (genomics), PRIDE (proteomics) and ArrayExpress (transcriptomics).

JERM http://www.sysmo-db.org/jerm

The JERM (Just Enough Results Model) allows us to exchange, interpret and compare between different types of data and results files across SysMO.

The JERM is being developed to model experiments, data and results, and the relationships between these and other assets such as SOPs, protocols and models.

The JERM addresses the questions:

  • What is the minimum information required to find the data?
  • What is the minimum amount of information required to interpret the data?

Different types of data will require different JERMs. The minimum information required to describe a microarray experiment, for example, is not the same as the minimum information required to describe a proteomics experiment using NMR. For some experiment types (predominantly in omics), the research community has been working to define "Just Enough" models for publishing data. SysMO-DB leverages these minimum models wherever possible, enabling easy export and publishing of SysMO data to public repositories.

Actors

SIOC http://sioc-project.org/ontology

The SIOC (Semantically-Interlinked Online Communities) Core Ontology provides the main concepts and properties required to describe information from online communities (e.g., message boards, wikis, weblogs, etc.) on the Semantic Web. This document contains a detailed description of the SIOC Core Ontology.
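
A minimal rdflib sketch of a post in a forum using SIOC Core terms; the example.org URIs are hypothetical.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    SIOC = Namespace("http://rdfs.org/sioc/ns#")

    g = Graph()
    g.bind("sioc", SIOC)

    forum = URIRef("http://example.org/forum/workflows")
    post = URIRef("http://example.org/forum/workflows/post/1")
    user = URIRef("http://example.org/users/alice")

    g.add((forum, RDF.type, SIOC.Forum))
    g.add((post, RDF.type, SIOC.Post))
    g.add((post, SIOC.has_container, forum))
    g.add((post, SIOC.has_creator, user))
    g.add((post, SIOC.content, Literal("Has anyone tried rerunning this workflow?")))

    print(g.serialize(format="turtle"))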

foaf http://xmlns.com/foaf/spec/

FOAF is about your place in the Web, and the Web's place in our world. FOAF is a simple technology that makes it easier to share and use information about people and their activities (eg. photos, calendars, weblogs), to transfer information between Web sites, and to automatically extend, merge and re-use it online.
The Friend of a Friend (FOAF) project is creating a Web of machine-readable pages describing people, the links between them and the things they create and do.

FOAF is a project devoted to linking people and information using the Web. Regardless of whether information is in people's heads, in physical or digital documents, or in the form of factual data, it can be linked. FOAF integrates three kinds of network: social networks of human collaboration, friendship and association; representational networks that describe a simplified view of a cartoon universe in factual terms, and information networks that use Web-based linking to share independently published descriptions of this inter-connected world. FOAF does not compete with socially-oriented Web sites; rather it provides an approach in which different sites can tell different parts of the larger story, and by which users can retain some control over their information in a non-proprietary format.
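
A minimal FOAF description in rdflib, with hypothetical people and URIs (rdflib ships a FOAF namespace binding):

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import FOAF, RDF

    g = Graph()
    g.bind("foaf", FOAF)

    alice = URIRef("http://example.org/people/alice#me")
    bob = URIRef("http://example.org/people/bob#me")

    g.add((alice, RDF.type, FOAF.Person))
    g.add((alice, FOAF.name, Literal("Alice Example")))
    g.add((alice, FOAF.mbox, URIRef("mailto:alice@example.org")))
    g.add((alice, FOAF.knows, bob))

    print(g.serialize(format="turtle"))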

SWAN/SIOC http://www.w3.org/TR/hcls-swansioc/

This note describes the alignment between the SWAN (Semantic Web Applications in Neuromedicine) and SIOC (Semantically-Interlinked Online Communities) ontologies, providing a complete model for representing Scientific Discourse in online communities at different levels of granularity (discourse elements and content items). The goal of this alignment is to make the discourse structure and component relationships much more accessible to computation, so that information can be navigated, compared and understood in context far better than at present, across and within domains.

myExperiment Ontology http://rdf.myexperiment.org/ontologies/

myExperiment is a collaborative environment where scientists can safely publish their workflows and experiment plans, share them with groups and find those of others. (Please see the myExperiment Wiki for more detailed information). This results in the myExperiment data model having three main underlying features:

  • Content Management
  • Social Networking
  • Object Annotation

VIVOWeb http://vivoweb.org/ontology/core

The National Network enables the discovery of researchers across institutions. Participants in the network include institutions with local installations of VIVO or those with research discovery and profiling applications that can provide semantic web-compliant data. The information accessible through VIVO's search and browse capability will reside and be controlled locally, within institutional VIVOs or other semantic web-compliant applications.

VIVO is an open source semantic web application originally developed and implemented at Cornell. When installed and populated with researcher interests, activities, and accomplishments, it enables the discovery of research and scholarship across disciplines at that institution and beyond. VIVO supports browsing and a search function which returns faceted results for rapid retrieval of desired information. Content in any local VIVO installation may be maintained manually, brought into VIVO in automated ways from local systems of record, such as HR, grants, course, and faculty activity databases, or from database providers such as publication aggregators and funding agencies.

Provenance

OPM http://openprovenance.org/

The Open Provenance Model (OPM) is the result of the Provenance Challenge series that was initiated in May 2006, at the first IPAW workshop. OPM was originally crafted in a meeting held in Salt Lake City in August 2007. OPM v1.00 was released to the community in December 2007. The first OPM workshop in June 2008 involved some twenty participants discussing issues related to this specification, and led to a revised specification, referred to as OPM v1.01. From the outset, the original authors' intent has been to define a data model that is open from an inter-operability viewpoint but also with respect to the community of its contributors, reviewers and users. To ensure that these principles are adhered to, an "open source like" governance model for OPM was adopted in June 2009, which led to the development of OPM v1.1, the most recent version of the model, which underwent a public revision process.
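
OPM's core model distinguishes Artifacts, Processes and Agents, linked by causal edges such as used, wasGeneratedBy, wasControlledBy and wasDerivedFrom. Below is a sketch of a workflow run expressed with the OPM Vocabulary (OPMV), an RDF rendering of OPM; the OPMV namespace and term names are assumptions to check against the vocabulary itself, and all example.org URIs are hypothetical.

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    # Assumed OPMV namespace; verify against the published vocabulary.
    OPMV = Namespace("http://purl.org/net/opmv/ns#")

    g = Graph()
    g.bind("opmv", OPMV)

    run = URIRef("http://example.org/runs/1")           # a workflow run (Process)
    inputs = URIRef("http://example.org/data/in.csv")   # an Artifact
    outputs = URIRef("http://example.org/data/out.csv") # an Artifact
    alice = URIRef("http://example.org/people/alice")   # an Agent

    g.add((run, RDF.type, OPMV.Process))
    g.add((run, OPMV.used, inputs))
    g.add((run, OPMV.wasControlledBy, alice))
    g.add((outputs, OPMV.wasGeneratedBy, run))
    g.add((outputs, OPMV.wasDerivedFrom, inputs))

    print(g.serialize(format="turtle"))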

Janus http://dx.doi.org/10.1109/MIC.2011.7A

Enhanced Semantic Provenance
