The concept of enhanced publications was defined in the context of the DRIVER II project, which aims to investigate how the availability of research data can be used to enhance the traditional academic publication. Although the core of this type of compound object is an electronic publication, the general concept shares similarities with the Research Objects (ROs) we are dealing with. On this page, we summarize the characteristics of enhanced publications, including the requirements that such objects should comply with, as identified in the D4.2 Report on Object Models and Functionalities [1]. Some of these requirements may be useful for the characterization of ROs, and some may also be useful for other WPs. For instance, the discussion of versioning of these compound objects may be of interest in the context of WP3.
Enhanced publications can be defined as compound digital objects which combine ePrints with one or more metadata records, one or more research data objects, or any combination of these. The term ePrint refers to an electronic version of an academic research paper, e.g., dissertations, journal articles, working papers, book chapters or reports. Similarly, the term research data may refer to any of the following types of objects:
- Data collections containing, e.g., the results of experiments, measurements performed by technical instruments or the results of surveys
- Data visualisations, e.g., graphs, diagrams, tables, or 3D models
- Machine readable chemical structures
- Multimedia files, e.g., images, video files or audio recordings
- Mathematical formulae, e.g., expressed in MathML, or algorithms
- Text documents that form part of a corpus created for research purposes
- Software, which may be provided as source code, or implemented as web services
- Commentaries and annotations made by agents who have consulted digital objects
- Specifications of instruments or other hardware
- Digital certificates for research instruments
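To make the definition above concrete, here is a minimal Python sketch, assuming a simple in-memory representation. The class names (`DataKind`, `ResearchData`, `EnhancedPublication`) and the enum labels are hypothetical and are not part of the DRIVER II model. The sketch encodes the definition literally: an ePrint combined with at least one metadata record, at least one research data object, or both.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class DataKind(Enum):
    # Hypothetical labels for the research data categories listed above.
    DATA_COLLECTION = auto()
    VISUALISATION = auto()
    CHEMICAL_STRUCTURE = auto()
    MULTIMEDIA = auto()
    FORMULA_OR_ALGORITHM = auto()
    RESEARCH_CORPUS_TEXT = auto()
    SOFTWARE = auto()
    ANNOTATION = auto()
    INSTRUMENT_SPECIFICATION = auto()
    INSTRUMENT_CERTIFICATE = auto()


@dataclass
class ResearchData:
    uri: str
    kind: DataKind


@dataclass
class EnhancedPublication:
    eprint_uri: str                                              # the ePrint at the core
    metadata_records: list[str] = field(default_factory=list)  # URIs of metadata records
    research_data: list[ResearchData] = field(default_factory=list)

    def is_enhanced(self) -> bool:
        # An ePrint plus one or more metadata records, one or more
        # research data objects, or any combination of these.
        return bool(self.eprint_uri) and bool(self.metadata_records or self.research_data)


ep = EnhancedPublication(
    "https://example.org/eprint/42",
    research_data=[ResearchData("https://example.org/data/7", DataKind.DATA_COLLECTION)],
)
print(ep.is_enhanced())  # True
```

A bare ePrint with no attached metadata records or data objects would return `False`, since it is then an ordinary publication rather than an enhanced one.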
Requirements and recommendations for enhanced publications (EP)
The core of an enhanced publication is an ePrint, whereas the core of a Research Object is a scientific workflow. Nevertheless, many of the following requirements for enhanced publications can be applied to ROs, or are at least useful for identifying requirements for ROs.
- Wrt the specification of the structure of enhanced publications
o It must be possible at any moment to specify the component parts of an enhanced publication: Furthermore, it should be possible to refer not only to resources in their entirety but, under certain conditions, also to specific locations within these resources, e.g., a specific table, or even a specific record or group of records within a database.
o Both the enhanced publication and its components must be available as web resources that can be referenced via URIs: It is advised to separate the URI of a resource from its location, especially as components of an enhanced publication do not necessarily have to be stored in a single repository. They may be distributed over different network locations.
- Wrt Compound Object properties
o It must be possible to add compound digital objects to the publication: Enhanced publications can be highly complex and multi-tiered objects, e.g., one enhanced publication may also wholly aggregate a second enhanced publication.
- Wrt versioning
o It must be possible to keep track of the different versions of both the enhanced publication as a whole and its constituent parts: Enhanced publications are potentially very dynamic resources, and changes to them may invalidate applications that were built on them. Hence, it is important to ensure that agents who make use of an enhanced publication can refer to specific versions of the compound object. Versioning is also important at the level of individual components, as ePrints and research data can themselves be very dynamic. A version is defined as "a digital object (in whatever format) that exists in time and place and has a context within a larger body of work". Versions can be identified by recording the date of the last modification, a version identifier, or a textual description of the version.
- Wrt basic properties
o It must be possible to record basic properties of the publication and of the resources that are added to it: these properties should be described using standardised and controlled vocabularies as much as possible. For instance, each component should be typed semantically to make clear what kind of resource is being referred to; ePrints can have a title; atomic or compound datasets can be given a brief description; for enhanced publications as a whole, it will be useful to record the date of the last modification; and the technical format of each resource can be recorded, e.g., using the IANA registry of Internet Media Types (MIME types), which can also be used to specify the format of metadata records.
o It must be possible to record the authorship of the enhanced publication and that of its component parts: this is especially important as e-science projects are increasingly collaborative and interdisciplinary. Capturing the provenance may also help clients to establish the trustworthiness of the resource.
- Wrt long-term preservation
o It must be possible to secure the long-term preservation of enhanced publications: it must be possible to harvest the document that serialises the enhanced publication from local repositories and to ingest it into digital archiving systems. It must also be possible to harvest representations of the web resources that are referred to in the enhanced publication. Here it is especially important to capture versioning information, so that one specific version of the enhanced publication can be selected for preservation instead of having to wait until the entire enhanced publication is complete.
- Wrt relations
o It must be possible to record the relations between the web resources that are part of an enhanced publication: Relations between the various component parts need to be described and classified using a standard and generic vocabulary as much as possible. They clarify why these resources were added to the collection. Typical relations include containment relations, sequential relations, versioning information, lineage relations, manifestations and bibliographic citations. Relations can be unidirectional or bidirectional.
- Wrt discovery
o Institutions that offer access to enhanced publications must make sure that they can be discovered, e.g., by web crawlers, citation analysis tools, harvesters and data mining applications. Processes of locating, retrieving and promulgating enhanced publications can be based on a wide range of techniques, including site maps, syndication or OAI-PMH.
- Wrt OAI-ORE
o Institutions that provide access to enhanced publications must ensure that these are available as documents based on the OAI-ORE model. This does not impose any requirements on how the resource is stored internally: data may be stored, e.g., in relational databases, or in XML formats such as MPEG-21 DIDL or METS.
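OAI-PMH, listed above as one discovery technique, is also a natural route for the harvesting that long-term preservation requires. The Python sketch below builds OAI-PMH 2.0 request URLs and rejects illegal argument combinations. The verbs and argument names (`verb`, `metadataPrefix`, `from`, `until`, `set`, `identifier`, `resumptionToken`) come from the OAI-PMH specification; the function name `oai_pmh_url` and the endpoint URL are hypothetical. Which metadata formats a repository exposes (for instance, an ORE-based one) varies per repository and should be checked with the `ListMetadataFormats` verb.

```python
from typing import Optional
from urllib.parse import urlencode

# Arguments permitted for each OAI-PMH 2.0 verb.
ALLOWED = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "ListIdentifiers": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "ListRecords": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def oai_pmh_url(base_url: str, verb: str, params: Optional[dict[str, str]] = None) -> str:
    """Build an OAI-PMH GET request URL for a (hypothetical) repository endpoint."""
    params = dict(params or {})
    if verb not in ALLOWED:
        raise ValueError(f"unknown verb: {verb}")
    illegal = set(params) - ALLOWED[verb]
    if illegal:
        raise ValueError(f"illegal arguments for {verb}: {sorted(illegal)}")
    # resumptionToken continues an earlier list request and must stand alone.
    if "resumptionToken" in params and len(params) > 1:
        raise ValueError("resumptionToken is an exclusive argument")
    if verb in ("ListIdentifiers", "ListRecords") and "resumptionToken" not in params:
        if "metadataPrefix" not in params:
            raise ValueError(f"{verb} requires metadataPrefix")
    if verb == "GetRecord" and set(params) != ALLOWED["GetRecord"]:
        raise ValueError("GetRecord requires identifier and metadataPrefix")
    return f"{base_url}?{urlencode({'verb': verb, **params})}"


url = oai_pmh_url(
    "https://repository.example.org/oai",
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2010-01-01"},
)
print(url)  # https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2010-01-01
```

The `from` argument lets a harvester collect only records whose datestamp falls on or after a given date, which suits incremental preservation of enhanced publications that keep changing.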
The D4.2 report also defines an abstract data model that can represent these requirements. This model considers five main entities (ePrints, data objects, metadata, compound datasets and enhanced publications) and their key properties. It also makes it possible to keep track of the different versions of both the enhanced publication as a whole and its constituent parts, to capture the provenance of the enhanced publication and of the various resources that it combines, and to describe relations between resources.
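A minimal Python sketch of such a model follows, assuming a single generic `Resource` class tagged with one of the five entity kinds. The names `EntityKind`, `Version`, `Resource` and `versioned_components` are hypothetical, and only a subset of the properties discussed above is shown. Every resource carries a version (identifier, last-modification date, optional note) and its creators; only compound datasets and enhanced publications may aggregate parts; and a nested publication can be flattened into URI-to-version pairs so that a citation can pin exact versions.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class EntityKind(Enum):
    # The five main entities of the abstract model.
    EPRINT = "ePrint"
    DATA_OBJECT = "data object"
    METADATA = "metadata"
    COMPOUND_DATASET = "compound dataset"
    ENHANCED_PUBLICATION = "enhanced publication"


COMPOUND_KINDS = {EntityKind.COMPOUND_DATASET, EntityKind.ENHANCED_PUBLICATION}


@dataclass(frozen=True)
class Version:
    identifier: str               # version identification, e.g. "v2"
    modified: date                # date of the last modification
    note: Optional[str] = None    # textual description of the version


@dataclass
class Resource:
    uri: str
    kind: EntityKind
    version: Version
    creators: list[str] = field(default_factory=list)      # authorship / provenance
    media_type: Optional[str] = None                        # IANA media type
    parts: list["Resource"] = field(default_factory=list)  # components, if compound

    def add_part(self, part: "Resource") -> None:
        if self.kind not in COMPOUND_KINDS:
            raise TypeError(f"a {self.kind.value} cannot aggregate other resources")
        self.parts.append(part)

    def versioned_components(self) -> dict[str, str]:
        # Walk the (possibly nested) aggregation and map each URI to the
        # version identifier in use.
        pinned = {self.uri: self.version.identifier}
        for part in self.parts:
            pinned.update(part.versioned_components())
        return pinned


paper = Resource("https://example.org/eprint/1", EntityKind.EPRINT,
                 Version("v3", date(2009, 12, 1)), creators=["A. Author"],
                 media_type="application/pdf")
inner = Resource("https://example.org/ep/1", EntityKind.ENHANCED_PUBLICATION,
                 Version("v1", date(2010, 1, 1)))
inner.add_part(paper)
outer = Resource("https://example.org/ep/2", EntityKind.ENHANCED_PUBLICATION,
                 Version("v2", date(2010, 6, 1), note="aggregates a second EP"))
outer.add_part(inner)  # one EP wholly aggregating another
print(outer.versioned_components())
```

Using one class with a kind tag keeps the sketch short; a fuller implementation might give each of the five entities its own type with its own required properties.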
The report also proposes a number of vocabularies that can be used to make enhanced publications semantically interoperable, i.e., by using standardised and controlled vocabularies as much as possible. For instance, the Dublin Core Metadata Initiative (DCMI) Type Vocabulary provides a number of terms that may be used to describe the semantic type of a resource; containment relations, relations between different versions, digital manifestations, bibliographic references and usage rights can be stated explicitly using DCMI metadata terms. To describe lineage relations, the ABC ontology may be used.
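The Python sketch below illustrates how these vocabularies might be combined: DCMI Type terms for semantic typing, and Dublin Core relation terms paired with their inverses, so that a bidirectional relation stated in one direction can be materialised in both. The namespace URIs and term names are DCMI's; the mapping from component kinds to types and the function `with_inverses` are illustrative assumptions, not a mapping prescribed by D4.2. Lineage relations (ABC) and usage rights are not covered.

```python
DCMITYPE = "http://purl.org/dc/dcmitype/"
DCTERMS = "http://purl.org/dc/terms/"

# Illustrative (not normative) choice of DCMI Type terms for some components.
SEMANTIC_TYPE = {
    "eprint": DCMITYPE + "Text",
    "data collection": DCMITYPE + "Dataset",
    "visualisation": DCMITYPE + "StillImage",
    "video": DCMITYPE + "MovingImage",
    "audio recording": DCMITYPE + "Sound",
    "software": DCMITYPE + "Software",
    "web service": DCMITYPE + "Service",
    "enhanced publication": DCMITYPE + "Collection",
}

# Dublin Core relation terms paired with their inverses.
INVERSE = {
    "hasPart": "isPartOf",           # containment
    "hasVersion": "isVersionOf",     # versioning
    "hasFormat": "isFormatOf",       # manifestations
    "references": "isReferencedBy",  # bibliographic references
}
INVERSE.update({inv: term for term, inv in INVERSE.items()})

Triple = tuple[str, str, str]


def with_inverses(triples: set[Triple]) -> set[Triple]:
    """Return the triples plus the inverse of every invertible dcterms relation."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate.startswith(DCTERMS):
            term = predicate[len(DCTERMS):]
            if term in INVERSE:
                result.add((obj, DCTERMS + INVERSE[term], subject))
    return result


ep, table = "https://example.org/ep/1", "https://example.org/data/7"
for triple in sorted(with_inverses({(ep, DCTERMS + "hasPart", table)})):
    print(triple)
```

Predicates outside the Dublin Core namespace are passed through unchanged, so relations from other vocabularies can coexist in the same set.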
Finally, the report explains how the OAI-ORE data model can be applied to exchange information about enhanced publications. The OAI-ORE vocabulary can be used in RDF statements to specify that a collection of URI-identified resources together forms a compound object. The OAI-ORE documentation also contains guidelines for serialising the model in RDF/XML and in Atom.
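As a minimal, dependency-free sketch, the Python function below serialises an OAI-ORE Resource Map as N-Triples rather than in the RDF/XML or Atom serialisations mentioned above. The ORE terms (`ResourceMap`, `Aggregation`, `describes`, `aggregates`) and the Dublin Core terms are real; the function name `resource_map` and all URIs are hypothetical. A production implementation would more likely use an RDF library and one of the documented serialisations.

```python
from datetime import datetime, timezone

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(uri: str) -> str:
    return f"<{uri}>"


def resource_map(rem_uri: str, aggregation_uri: str, aggregated: list[str],
                 creator_uri: str, modified: datetime) -> str:
    """Serialise a minimal ORE Resource Map as N-Triples."""
    stamp = f'"{modified.isoformat()}"^^<{XSD_DATETIME}>'
    triples = [
        # The Resource Map is the document that describes the Aggregation.
        (iri(rem_uri), iri(RDF_TYPE), iri(ORE + "ResourceMap")),
        (iri(rem_uri), iri(ORE + "describes"), iri(aggregation_uri)),
        (iri(rem_uri), iri(DCTERMS + "creator"), iri(creator_uri)),
        (iri(rem_uri), iri(DCTERMS + "modified"), stamp),
        # The Aggregation stands for the enhanced publication itself.
        (iri(aggregation_uri), iri(RDF_TYPE), iri(ORE + "Aggregation")),
    ]
    # Each component (ePrint, data object, metadata record, nested EP)
    # is an aggregated resource identified by its own URI.
    triples += [(iri(aggregation_uri), iri(ORE + "aggregates"), iri(u)) for u in aggregated]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


doc = resource_map(
    "https://example.org/rem/ep1",
    "https://example.org/rem/ep1#aggregation",
    ["https://example.org/eprint/1", "https://example.org/data/7"],
    "https://example.org/people/a-author",
    datetime(2010, 5, 1, 12, 0, tzinfo=timezone.utc),
)
print(doc)
```

Keeping the Resource Map URI distinct from the Aggregation URI mirrors the separation, required above, between an enhanced publication and the document that serialises it.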
1. D4.2 Report on Object Models and Functionalities. DRIVER II deliverable. Available online at http://wiki.surffoundation.nl/display/standards/Objectmodel+Enhanced+Publications