
Warning

This document is outdated; it presents the state of the art at M6 of the project.

Sandbox

http://sandbox.wf4ever-project.org/rosrs5

http://sandbox.wf4ever-project.org/portal

Research Objects Digital Library

Introduction

This document specifies the services offered by the Research Object Digital Library (RODL). The main one is the Research Object Storage and Retrieval Service (ROSRS), which allows its users to store and retrieve Research Objects. Other services include the User Management Service.

Implementation

The RODL is built on top of dLibra (see figure 3). dLibra provides file storage and retrieval functionality, including file versioning and consistency checking. It has a built-in text search engine fed by its own flexible metadata system, and it manages users and controls their access rights. In addition, dLibra can organise stored objects into hierarchical structures and associate metadata with object aggregations.

A dedicated Semantic Metadata Service has been included in RODL to support the Research Object model. This service manages an RDF triplestore and allows storage and retrieval of any type of RO metadata, in particular structured semantic annotations, classes of resources and relations between them. The metadata can refer to any object identifiable by a URI, which, apart from workflows and other resources, may include parts of workflows or external resources (e.g., web services, data sources).

RODL offers preservation services for workflows. These services take into account the decay of workflows caused by changes in the external resources they depend on: data sources or web services can disappear, malfunction or change their interface. A number of such services, for example research object completeness and stability evaluation, have already been implemented.

RODL services expose their functionality through a REST API. The API is used by software clients, including applications that help register users and manage their access rights, and applications that support browsing the contents of RODL and connecting it with other services. In particular, RODL is being used to extend the workflow preservation capabilities of myExperiment, where users can export their content to RODL as Research Objects. An interface for RODL is being developed in myExperiment so that users can navigate through their Research Objects preserved in RODL and take advantage of its functionality.

REST interface specifications


Additional technical information

Handling of Modifications in manifest.rdf File

Users are expected to modify only the elements corresponding to the Research Object metadata. The list of resources (the ore:aggregates elements) is automatically regenerated every time any other file in the Research Object directory is created, modified or removed, so any changes users make to the resource list will be ignored.
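The regeneration rule can be illustrated with a minimal sketch. It is written in Python with the standard library rather than with Jena, and it is not the ROSRS implementation: it assumes manifest.rdf is RDF/XML with a single rdf:Description element describing the RO, and the function name regenerate_resource_list is hypothetical. User edits to ore:aggregates are discarded and rebuilt from the list of resources actually present, while the descriptive metadata is left untouched.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"

# Keep the conventional prefixes when the manifest is serialized again.
for prefix, uri in (("rdf", RDF), ("ore", ORE), ("dcterms", DCTERMS)):
    ET.register_namespace(prefix, uri)


def regenerate_resource_list(manifest_xml: str, resource_uris: list[str]) -> str:
    """Rebuild ore:aggregates from resource_uris and keep all other metadata.

    Hypothetical helper; assumes a single rdf:Description describing the RO.
    """
    root = ET.fromstring(manifest_xml)
    description = root.find(f"{{{RDF}}}Description")
    if description is None:
        raise ValueError("manifest has no rdf:Description element")

    # Drop every ore:aggregates entry, including any written by users.
    # findall() returns a list, so removing while looping over it is safe.
    for element in description.findall(f"{{{ORE}}}aggregates"):
        description.remove(element)

    # Regenerate the resource list from the resources actually present.
    for uri in sorted(set(resource_uris)):
        element = ET.SubElement(description, f"{{{ORE}}}aggregates")
        element.set(f"{{{RDF}}}resource", uri)

    return ET.tostring(root, encoding="unicode")
```

In this sketch a hand-added ore:aggregates entry simply disappears on the next regeneration, which mirrors the statement that user changes to the resource list are ignored.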

Handling Remote Resources

One may want to modify the resource list manually in order to link to a remote resource that cannot be put directly in the Research Object directory. As stated above, this approach does not work. Instead, create an internet shortcut pointing to the remote resource (a file with the .url extension on Windows; how shortcuts are handled on other systems still needs to be checked) and place it directly in the Research Object directory. ROSRS will automatically parse such a link and add its URI to the resource list in the manifest.rdf file.
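A Windows internet shortcut is a small INI-style text file whose [InternetShortcut] section stores the target in a URL= entry. The sketch below shows how shortcuts placed in the Research Object directory could be turned into remote resource URIs using Python's configparser. The function names are hypothetical and this is not the ROSRS code; it also assumes the shortcut files are UTF-8 text, although real shortcuts may use a Windows code page.

```python
import configparser
from pathlib import Path


def read_internet_shortcut(text: str) -> str:
    """Return the target URL stored in the text of a Windows .url file."""
    # interpolation=None keeps '%' characters of encoded URLs untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Section names are case-sensitive; option names such as URL are not.
    return parser["InternetShortcut"]["URL"]


def remote_resources(ro_directory: Path) -> list[str]:
    """Collect the targets of .url shortcuts placed directly in the RO directory."""
    return sorted(
        # Assumption: UTF-8 encoding ("utf-8-sig" also strips a leading BOM).
        read_internet_shortcut(path.read_text(encoding="utf-8-sig"))
        for path in ro_directory.glob("*.url")  # non-recursive: direct children only
    )
```

The URIs returned by remote_resources would then be added to the resource list alongside the local files.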

Another, possibly less user-friendly, option is to include the URIs of remote resources in the descriptive metadata, for example in a dcterms:hasPart element. However, such values are not transformed into structural metadata; they remain in the descriptive metadata only.

Discussion about Conflicts

If two or more users edit the contents of the same file at the same time, conflicts arise. In Dropbox, each conflict results in a conflicted copy: a new file is added, with information about the author and the current date appended to its name. As we have no means to resolve such conflicts, conflicted copies will not be treated in any special way; they will be added to ROSRS simply as separate files. When users resolve the conflicts manually, the conflicted copies are removed, and the corresponding files are subsequently removed from the prototype.

Conflicts on the manifest.rdf file require special consideration, as this file is used to keep track of changes in metadata and resources. Furthermore, a conflict may be triggered by the Connector itself, as it updates the file contents after every change in resources. However, to keep the design of the first prototype simple, we decided not to address this issue at the moment. We assume that manifest.rdf is relatively small, so Dropbox will synchronize it quickly between users. Also, in normal use users rarely modify manifest.rdf (only when metadata changes). As a result, the risk of conflicts on this file is minimal.

Mapping between RO structure and internal dLibra data model

General structure

Research Object | dLibra
Research Object | Group publication
Version of Research Object | Publication with a single edition, whose content is modified every time the data or metadata of the Research Object changes
Resource file | File. A new File Version is created for each modification of the file. The deletion of files will be handled by excluding them from the "edition" in dLibra.
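The correspondence between file operations and dLibra objects in the table above can be illustrated with a small in-memory model. The class and method names (File, Edition, on_file_written, on_file_removed) are hypothetical and do not come from the dLibra API; the sketch only shows that every write adds a File Version and that a deletion excludes the file from the edition while its version history is kept.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    """Hypothetical model of a dLibra File: each modification appends a version."""
    path: str
    versions: list[bytes] = field(default_factory=list)


@dataclass
class Edition:
    """Hypothetical model of the single edition representing one RO version."""
    files: dict[str, File] = field(default_factory=dict)
    included: set[str] = field(default_factory=set)

    def on_file_written(self, path: str, content: bytes) -> int:
        """Handle file creation or modification; return the new version number."""
        file = self.files.setdefault(path, File(path))
        file.versions.append(content)
        self.included.add(path)
        return len(file.versions)

    def on_file_removed(self, path: str) -> None:
        """Deletion only excludes the file from the edition; versions are kept."""
        self.included.discard(path)

    def current_content(self) -> dict[str, bytes]:
        """Latest version of every file currently assigned to the edition."""
        return {p: self.files[p].versions[-1] for p in sorted(self.included)}
```

Keeping removed files in `files` reflects the choice of excluding deleted files from the edition rather than erasing them.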

dLibra mapping

manifest.rdf | RO information | dLibra mapping
<dcterms:description> | a simple textual description of the RO | edition attribute: Description
<dcterms:title> | RO title | edition attribute: Title
<dcterms:creator> | RO creator | edition attribute: Creator
<dcterms:identifier> | RO name | group publication name
<dcterms:source rdf:resource="http://some/url"/> | reference to sources such as other ROs | edition attribute: Source
<rdf:type> | RO schema element | none
<dcterms:created> | date the RO version was created in dLibra | edition attribute: Created
<dcterms:modified> | date the RO version was last modified in dLibra | edition attribute: Modified
<ore:aggregates rdf:resource="http://some/url"/> | enumeration of the URIs of the resources in the RO | assignment of file versions to the edition
<dcterms:hasVersion rdf:resource="http://some/url"/> | other RO versions | all publications in the given group publication
<oxds:currentVersion> | RO version | publication name

Any other descriptive metadata (tags from any vocabulary) can be stored in dLibra as long as it does not contain nested tags.
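The rows of the manifest.rdf table that map to edition attributes can be read as a simple lookup. The sketch below (Python standard library instead of Jena; the function name edition_attributes is hypothetical) extracts those elements from the RO's rdf:Description and rejects elements with nested children, following the restriction above. Rows that map to structural dLibra objects (identifier, aggregates, hasVersion, currentVersion) and rdf:type are skipped, and the single-rdf:Description layout is an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"

# manifest.rdf element -> dLibra edition attribute (rows of the table above)
EDITION_ATTRIBUTES = {
    f"{{{DCTERMS}}}description": "Description",
    f"{{{DCTERMS}}}title": "Title",
    f"{{{DCTERMS}}}creator": "Creator",
    f"{{{DCTERMS}}}source": "Source",
    f"{{{DCTERMS}}}created": "Created",
    f"{{{DCTERMS}}}modified": "Modified",
}


def edition_attributes(manifest_xml: str) -> dict[str, list[str]]:
    """Collect edition attribute values from a manifest (hypothetical helper)."""
    root = ET.fromstring(manifest_xml)
    description = root.find(f"{{{RDF}}}Description")
    if description is None:
        raise ValueError("manifest has no rdf:Description element")
    attributes: dict[str, list[str]] = {}
    for element in description:
        name = EDITION_ATTRIBUTES.get(element.tag)
        if name is None:
            continue  # structural or unmapped element, handled elsewhere
        if len(element):
            raise ValueError(f"nested tags are not supported: {element.tag}")
        # Literal values are element text; references use rdf:resource.
        value = element.get(f"{{{RDF}}}resource") or (element.text or "").strip()
        attributes.setdefault(name, []).append(value)
    return attributes
```

Values are collected into lists because an element such as dcterms:creator may occur more than once.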

Technical implementation details

The service is a servlet-based application built with the Jersey framework.

The Jena framework is used to handle RDF files, and additional XML transformations are performed using the Xalan library.

Deployment instructions

The source code is available at https://github.com/wf4ever/prototype1-dlibra.

The dLibra server location and the directory used for storing workspaces should be configured in the src\main\resources\connection.properties file. A dLibra instance for demonstration purposes is available at host sandbox.wf4ever-project.org (port 10051, directory 3, as originally configured).

Once this is configured, the project can be built (mvn package) and deployed (rosrs5 servlet).

TBD:

  • authorization and access control
  • output format of the list of links to research objects
  • The MD5 checksum and the last modification time will be stored as additional attributes of the tag (details TBD).
  • Switching to older versions
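How the MD5 checksum and the last modification time of a resource file might be computed is sketched below. Since the way they will be stored is still to be decided, the hypothetical function file_fingerprint only returns the two values; the ISO 8601 UTC timestamp format is an assumption.

```python
import hashlib
import os
from datetime import datetime, timezone


def file_fingerprint(path: str, chunk_size: int = 65536) -> tuple[str, str]:
    """Return (MD5 hex digest, ISO 8601 UTC modification time) for a file."""
    md5 = hashlib.md5()
    with open(path, "rb") as stream:
        # Read in chunks so that large resource files need not fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            md5.update(chunk)
    modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return md5.hexdigest(), modified.isoformat()
```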
