This concept has been defined in the context of the DRIVER II project, which aims to investigate how the availability of research data can be used to enhance the traditional academic publication. Even though the core of this type of compound object is an electronic publication, the general concept shares similarities with the Research Objects (ROs) we are dealing with. On this page, we summarize the characteristics of enhanced publications, including the requirements identified in the D4.2 Report on Object Models and Functionalities that those objects should comply with. Some of those requirements may be useful for the characterization of ROs, and some may also be useful for other WPs. For instance, the discussion regarding versioning of these compound objects may be interesting in the context of WP3.
Enhanced publications can be defined as compound digital objects which combine ePrints with one or more metadata records, one or more research data objects, or any combination of these. The term ePrint is used to refer to an electronic version of an academic research paper (as proposed by Foulonneau and André), e.g., dissertations, journal articles, working papers, book chapters or reports. Similarly, the term research data may refer to any of the following types of objects:
- Data collections containing, e.g., the results of experiments, measurements performed by technical instruments or the results of surveys
- Data visualisations, e.g., graphs, diagrams, tables, or 3D models
- Machine readable chemical structures
- Multimedia files, e.g., images, video files or audio recordings
- Mathematical formulae, e.g., expressed in MathML, or algorithms
- Text documents that form part of a corpus created for research purposes
- Software, which may be provided as source code, or implemented as web services
- Commentaries and annotations made by agents who have consulted digital objects
- Specifications of instruments or other hardware
- Digital certificates for research instruments
Requirements and recommendations for enhanced publications (EP)
Although the core of an enhanced publication is an ePrint, whereas that of a Research Object is a scientific workflow, many of the following requirements for enhanced publications can be applied to ROs, or can at least be useful for identifying requirements for them.
- Wrt specification of the structure of enhanced publications
o It must be possible at any moment to specify the component parts of an enhanced publication: Furthermore, it should be possible to refer not only to resources in their entirety but, under certain conditions, also to specific locations within these resources, e.g., a specific table, or even a specific record or group of records within a database.
o Both the enhanced publication and its components must be available as web resources that can be referenced via URIs: It is advised to separate the URI of a resource from its location, especially as components of an enhanced publication do not necessarily have to be stored in a single repository. They may be distributed over different network locations.
- Wrt Compound Object properties
o It must be possible to add compound digital objects to the publication: Enhanced publications can be highly complex and multi-tiered objects, e.g., one enhanced publication may also wholly aggregate a second enhanced publication.
- Wrt versioning
o It must be possible to keep track of the different versions of both the enhanced publication as a whole, and of its constituent parts: Enhanced publications are potentially very dynamic resources, which may result in invalidating applications that were based on them. Hence, it is important to ensure that agents who make use of an enhanced publication can refer to specific versions of the compound object. The versioning issue is also important on the level of individual components, i.e., ePrints and research data can be very dynamic. A version is defined as "a digital object (in whatever format) that exists in time and place and has a context within a larger body of work". Versions can be identified by recording the date of the last modification, a version identification, or a textual description of the version.
- Wrt basic properties
o It must be possible to record basic properties of the publication, and of the resources that are added to it: These properties should be described using standardised and controlled vocabularies as much as possible. For instance, each component should be typed semantically to make it clear what kind of resource is being referred to; ePrints can have a title; for atomic or compound datasets, a brief description may be given; for enhanced publications as a whole, it is useful to record the date of the last modification. The technical format of a resource can be recorded, e.g., using the IANA registered list of Internet Media Types; the MIME type can also be specified for metadata.
o It must be possible to record the authorship of the enhanced publication and that of its component parts: This is especially relevant as e-science projects are increasingly collaborative and interdisciplinary. Capturing the provenance may also help clients to establish the trustworthiness of the resource.
- Wrt long-term preservation
o It must be possible to secure the long-term preservation of enhanced publications: It must be possible to harvest the document that serialises the enhanced publication from local repositories and to ingest it into digital archiving systems. It must also be possible to harvest representations of the web resources that are referred to in the scientific package. Here it is especially important to capture versioning information, so that one specific version of the enhanced publication can be selected for preservation instead of having to wait until the entire enhanced publication is complete.
- Wrt relations
o It must be possible to record the relations between the web resources that are part of an enhanced publication: Relations between the various component parts need to be described and classified using a standard and generic vocabulary, as much as possible. They clarify the reasons why these resources were added to the collection. Typical relations are: containment relations, sequential relations, versioning information, lineage relations, manifestations and bibliographic citations. Relations can be unidirectional or bidirectional.
- Wrt discovery
o Institutions that offer access to enhanced publications must make sure that they can be discovered, e.g., by web crawlers, citation analysis tools, harvesters and data mining applications. Processes of locating, retrieving and promulgating enhanced publications can be based on a wide range of techniques, including site maps, syndication or OAI-PMH.
- Wrt OAI-ORE
o Institutions that provide access to enhanced publications must ensure that these are available as documents based on the OAI-ORE model. This does not imply any subsequent prescriptions for the internal storage of the resource. Data may be stored, e.g., in relational databases, or in other XML formats such as MPEG-DIDL or METS
The D4.2 report also defines an abstract data model that can represent these requirements. This model considers five main entities (ePrints, data objects, metadata, compound datasets and enhanced publications) and their key properties. It also makes it possible to keep track of the different versions of both the enhanced publication as a whole and of its constituent parts, to capture the provenance of the enhanced publication and of the various resources that it combines, and to describe relations between resources.
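The entities and properties listed above could be sketched roughly as follows. This is a minimal illustration, not the data model from the report itself: the class names, field names, relation name and example.org URIs are all assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Resource:
    """Any URI-identified component; all field names are illustrative."""
    uri: str
    semantic_type: str               # e.g. "ePrint", "DataObject", "Metadata"
    title: Optional[str] = None
    mime_type: Optional[str] = None  # technical format
    creators: List[str] = field(default_factory=list)  # provenance
    modified: Optional[str] = None   # date of last modification
    version: Optional[str] = None    # version identification


@dataclass
class CompoundObject(Resource):
    """A compound dataset or enhanced publication that aggregates resources."""
    parts: List[Resource] = field(default_factory=list)
    # Each relation is (subject URI, relation name, object URI).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def all_parts(self) -> List[Resource]:
        """Return every component, descending into nested compound objects."""
        found: List[Resource] = []
        for part in self.parts:
            found.append(part)
            if isinstance(part, CompoundObject):
                found.extend(part.all_parts())
        return found


# Hypothetical example: an ePrint plus a compound dataset holding one table.
paper = Resource("http://example.org/eprint/1", "ePrint",
                 title="A paper", mime_type="application/pdf", version="2")
table = Resource("http://example.org/data/1", "DataObject", mime_type="text/csv")
dataset = CompoundObject("http://example.org/dataset/1", "CompoundDataset",
                         parts=[table])
ep = CompoundObject("http://example.org/ep/1", "EnhancedPublication",
                    parts=[paper, dataset],
                    relations=[(table.uri, "isReferencedBy", paper.uri)])

print([p.uri for p in ep.all_parts()])
```

Treating the enhanced publication and the compound dataset as the same kind of aggregating object is a design choice of this sketch; it keeps nested aggregation (one compound object inside another) simple to express.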
The report also proposes a number of vocabularies that can be used to make enhanced publications semantically interoperable, i.e., using standardised and controlled vocabularies as much as possible. For instance, the DCMI Type Vocabulary provides a number of terms that may be used to describe the semantic type. Containment relations, relations between different versions, digital manifestations, bibliographic references and usage rights can be stated explicitly by making use of the terms defined by the Dublin Core Metadata Initiative. To describe lineage relations, the ABC ontology may be used.
Finally, the report explains how the OAI-ORE data model can be applied to exchange information about enhanced publications. The OAI-ORE vocabulary can be used in RDF statements to specify that a collection of URI-identified resources together form a compound object. The OAI-ORE documentation also contains guidelines for serialisations of the model in RDF/XML and in Atom.
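As a rough illustration of such RDF statements, the sketch below emits N-Triples lines saying that a resource map describes an aggregation, and that the aggregation aggregates a set of resources. The terms `ResourceMap`, `Aggregation`, `describes` and `aggregates` come from the OAI-ORE vocabulary; the helper function, the choice of N-Triples as output, and the example.org URIs are assumptions made for this example only.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri, aggregation_uri, part_uris):
    """Return N-Triples lines stating that part_uris form one compound object."""
    def triple(s, p, o):
        return f"<{s}> <{p}> <{o}> ."

    lines = [
        triple(rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        triple(rem_uri, ORE + "describes", aggregation_uri),
        triple(aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # One ore:aggregates statement per component of the compound object.
    lines += [triple(aggregation_uri, ORE + "aggregates", u) for u in part_uris]
    return lines


# Hypothetical enhanced publication with one ePrint and one data object.
lines = resource_map_triples(
    "http://example.org/ep/1.rdf",
    "http://example.org/ep/1",
    ["http://example.org/eprint/1", "http://example.org/data/1"],
)
print("\n".join(lines))
```

A real deployment would more likely use an RDF library and one of the serialisations mentioned above; plain strings are used here only to keep the example self-contained.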
Scientific Publication Packages
This concept has been defined by Hunter to provide a method for encapsulating expert knowledge; for publishing and sharing scientific process and results; for teaching complex scientific concepts; and for the selective archival, curation and preservation of scientific data and output.
Scientific publication packages (SPP) are defined as compound digital objects that encapsulate and relate the raw data to its derived products, publications and the associated contextual, provenance and administrative metadata. A variety of heterogeneous components can be contained or referenced in the SPP, such as:
- Pre-existing data, models, hypotheses or publications
- Large datasets generated from experiments, observations and instruments
- Experimental and instrumental conditions, settings and parametric ranges or constraints
- Assumptions made and criteria applied
- Formulas, rules, hypotheses, numerical models, mathematical functions
- Conceptual models
- Software tools and services
- Hardware specifications
Conceptual model and representation
The author proposes to use the ABC ontology to describe scientific models, extending it in order to capture the provenance or lineage of scientific output. ABC is also used as a top-level ontology for defining the classes and properties associated with scientific outputs and their components.
The descriptive metadata for the SPP is envisaged to be based on the extensible CCLRC Scientific metadata model, which includes Identifier, Title, Research focus/Topic, Study, Model type (drawn from a hierarchical thesaurus), Creator/Investigator (name, contact details, organization, etc.), Date Created and Date Published.
RDF is used for representing scientific model packages and for recording the relationships between the components. RDF instance data provides XML-based descriptions of both the complete set of components (uniquely identified via URIs) within a scientific model package as well as the lineage (e.g., derivation, temporal, spatial, containment) and semantic relationships between these components.
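To show how lineage relationships recorded in this way could be used, the following sketch walks a small set of subject-predicate-object statements to find everything a given component was derived from. The predicate names `derivedFrom` and `contains`, the function name and the example.org URIs are hypothetical; they are not terms from the ABC ontology or from Hunter's paper.

```python
def lineage(triples, start, predicate="derivedFrom"):
    """Return every resource that `start` was directly or indirectly derived from."""
    seen = []
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            # Follow only lineage statements, and visit each resource once.
            if s == current and p == predicate and o not in seen:
                seen.append(o)
                frontier.append(o)
    return seen


# Hypothetical package: a paper derived from a figure, which was derived
# from processed data, which was derived from raw instrument output.
BASE = "http://example.org/spp/"
triples = [
    (BASE + "package", "contains", BASE + "paper"),
    (BASE + "paper", "derivedFrom", BASE + "figure"),
    (BASE + "figure", "derivedFrom", BASE + "processed"),
    (BASE + "processed", "derivedFrom", BASE + "raw"),
]

print(lineage(triples, BASE + "paper"))
```

The containment statement is ignored by the traversal, which illustrates why relations need to be typed: the same graph can answer "what is in this package" and "where did this result come from" separately.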
For the preservation of SPP, the system PANIC is used. PANIC comprises three main software components:
- Preservation Metadata Capture: Enables the generation of preservation metadata for either atomic or composite mixed-media digital objects.
- Obsolescence Detection and Notification: Periodically compares each object's/sub-object's preservation metadata with software and format registries (e.g., PRONOM), which store information about the latest available authoring, rendering or viewing software and recommended formats.
- Preservation Service Discovery and Invocation: Discovers the most appropriate preservation service by matching the specified attributes against descriptions of available preservation services. It then selects (and possibly composes) and invokes the most appropriate preservation services for a sub-object, and updates the provenance metadata.
The system considers the accessibility and preservation of each of the atomic sub-objects prior to monitoring and processing the composite object.
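The obsolescence detection step, and the rule that atomic sub-objects are handled before the composite that contains them, can be illustrated with a short sketch. It is not PANIC's implementation: the registry is a hypothetical in-memory stand-in for a service such as PRONOM, and the class, function and format names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for a format registry: maps a format considered
# obsolete to the format currently recommended in its place.
REGISTRY: Dict[str, str] = {
    "image/x-pict": "image/png",
    "application/msword": "application/pdf",
}


@dataclass
class DigitalObject:
    name: str
    fmt: str  # format recorded in the object's preservation metadata
    children: List["DigitalObject"] = field(default_factory=list)


def check(obj: DigitalObject, registry: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (name, current format, recommended format) for obsolete objects.

    Sub-objects are checked first, so each one is reported before the
    composite object that contains it.
    """
    report: List[Tuple[str, str, str]] = []
    for child in obj.children:
        report.extend(check(child, registry))
    if obj.fmt in registry:
        report.append((obj.name, obj.fmt, registry[obj.fmt]))
    return report


# Hypothetical composite: a report that embeds a figure and a data table.
figure = DigitalObject("figure", "image/x-pict")
table = DigitalObject("table", "text/csv")
report_doc = DigitalObject("report", "application/msword", [figure, table])

for name, old, new in check(report_doc, REGISTRY):
    print(f"{name}: {old} -> {new}")
```

In this sketch the notification is just a printed line; selecting and invoking an actual preservation service for each reported sub-object is left out.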
1. D4.2 Report on Object Models and Functionalities. DRIVER II deliverable. Available online at http://wiki.surffoundation.nl/display/standards/Objectmodel+Enhanced+Publications
2. Hunter, J., "Scientific Publication Packages – A Selective Approach to the Communication and Archival of Scientific Output", International Journal of Digital Curation 1(1), 2006. http://www.ijdc.net/index.php/ijdc/article/viewFile/8/4
3. Foulonneau, Muriel and Francis André (2008). Investigative Study of Standards for Digital Repositories and Related Services, Amsterdam: Amsterdam University Press