Warning

This document is outdated; it presents the state of the art as of M6 of the project.

Sandbox

http://sandbox.wf4ever-project.org/portal

Research Objects Store and Retrieve Service (RO SRS) - Initial specification

Introduction

This document contains the initial specification of the RO SRS. The main functionality of the service will be to allow its users to store and retrieve Research Objects.

The RO SRS will be accessible via a REST interface. It will be implemented on top of a digital library system. The dLibra system was chosen for the prototype implementation.

The RO SRS will be part of the first prototype developed in the Wf4Ever project. The aim of this prototype is to handle automated publication and synchronization of Research Objects between a DropBox folder and the dLibra system. This will be achieved with an intermediate component, called the DropBox Connector, acting as a client of both the DropBox service and the RO SRS. The DropBox Connector observes changes in the DropBox directory dedicated to shared Research Objects and requests the relevant operations in the RO SRS.

The structure of the Research Object that we are using in this prototype is based on the ADMIRAL data package.

The specification consists of the following sections:

  • The description of the Research Object structure (based on ADMIRAL Data Package)
  • Use case scenario of interaction between the end user (using the DropBox service with a special intermediate connector) and the RO SRS
  • REST interface specification
  • Description of interaction between prototype components in the example scenario
  • Additional technical information
  • Mapping between RO structure and internal dLibra data model

The description of the Research Object structure

From the user's point of view, a Research Object in this prototype is a directory in a file system. All files included in the directory or any of its subdirectories are considered to be the Resources of the Research Object, except for the file named manifest.rdf placed directly in the RO directory. The manifest.rdf file is an RDF/XML file containing metadata associated with the Research Object as well as a list of all its resources. This file contains:

  • descriptive metadata (it can be edited by the user):
    • in the dcterms namespace;
    • in the oxds namespace (ADMIRAL specific metadata);
      • oxds:currentVersion tag contains the identifier of the current version of the RO
  • structural metadata in the ore namespace, which should be generated automatically by the RO SRS on the basis of the contents of the RO.

In addition to the definitions of the ADMIRAL Data Package, we make the following assumptions:

  • If several versions of an RO have been defined, information about all the versions will be automatically listed in manifest.rdf (in the dcterms:hasVersion tag).
  • For each local file listed in the ore:aggregates tag, the MD5 checksum and the last modification time will be stored as additional attributes of the tag (details TBD).
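
To illustrate how the resources, their MD5 checksums and their last modification times could be collected, here is a minimal Python sketch. It is not the service's implementation: the function name list_resources, the returned dictionary layout and the ISO 8601 time format are assumptions made for illustration, since the attribute format is still TBD.

```python
import hashlib
import os
from datetime import datetime, timezone

MANIFEST_NAME = "manifest.rdf"


def list_resources(ro_dir):
    """Return {relative_path: (md5_hex, last_modified_iso)} for each resource.

    Hypothetical helper: every file under ro_dir counts as a resource,
    except the manifest.rdf placed directly in the RO directory.
    """
    resources = {}
    for root, _dirs, files in os.walk(ro_dir):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, ro_dir).replace(os.sep, "/")
            # Only the top-level manifest is excluded; a manifest.rdf in a
            # subdirectory is treated as an ordinary resource.
            if rel_path == MANIFEST_NAME:
                continue
            with open(full_path, "rb") as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            mtime = datetime.fromtimestamp(
                os.path.getmtime(full_path), tz=timezone.utc
            )
            resources[rel_path] = (digest, mtime.isoformat())
    return resources
```

Reading each file fully into memory keeps the sketch short; a real implementation would probably hash large files in chunks.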

Example

The following example shows the structure of the manifest.rdf file:

Issues to be discussed:

  • What is the meaning of <oxds:isEmbargoed> and <oxds:embargoedUntil>? Should they be included in manifest.rdf?
  • What should be the value of rdf:type?
  • Should rdf:about contain the version number?

Metadata

| manifest.rdf | ADMIRAL Data Package Information | RO Information | Source | Mandatory/Optional |
|---|---|---|---|---|
| <dcterms:description> | a simple textual description of the dataset | a simple textual description of the RO | ROBox | M |
| <dcterms:title> | a one-line title of the dataset | RO title | ROBox | M |
| <dcterms:creator> | username of creator | RO creator | ROBox | M |
| <dcterms:identifier> | a dataset local identifier | RO name | ROBox | M |
| <dcterms:source rdf:resource="http://some/url"/> | reference to any resource from which the data package has been derived | reference to sources such as other ROs | ROBox | O |
| <rdf:type> | ? | RO schema element | ROBox | O |
| <dcterms:created> | date the package was created (submitted) | date the RO version was created in dLibra | dLibra | M |
| <dcterms:modified> | date the package was last modified | date the RO version was last modified in dLibra | dLibra | M |
| <ore:aggregates rdf:resource="http://some/url"/> | enumeration of the URIs of the resources in the data package | enumeration of the URIs of the resources in RO | dLibra | O |
| <dcterms:hasVersion rdf:resource="http://some/url"/> | | other RO versions | dLibra | O |
| <oxds:currentVersion> | a version number for the data package | RO version | dLibra/ROBox | M |
| <oxds:isEmbargoed>, <oxds:embargoedUntil> | embargo status and date | ? | dLibra | O |
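
As a rough illustration of how the fields listed above could fit together, the following Python sketch builds a skeleton manifest with the standard library. Several points are assumptions rather than part of this specification: the use of rdf:Description as the wrapping element (the value of rdf:type is still an open issue), the oxds namespace URI (taken from the ADMIRAL vocabulary and to be verified), and the function name build_manifest. The dates generated by dLibra are left out.

```python
import xml.etree.ElementTree as ET

# rdf, dcterms and ore are the standard namespace URIs. The oxds URI is an
# assumption based on the ADMIRAL vocabulary and should be verified.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "ore": "http://www.openarchives.org/ore/terms/",
    "oxds": "http://vocab.ox.ac.uk/dataset/schema#",
}
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)


def q(prefix, local):
    """Build an ElementTree qualified name such as {uri}title."""
    return "{%s}%s" % (NS[prefix], local)


def build_manifest(ro_uri, identifier, version, resource_uris=()):
    """Return a skeleton manifest as an RDF/XML string (hypothetical helper)."""
    root = ET.Element(q("rdf", "RDF"))
    desc = ET.SubElement(
        root, q("rdf", "Description"), {q("rdf", "about"): ro_uri}
    )
    # Mandatory fields edited by the user start out empty.
    for field in ("title", "description", "creator"):
        ET.SubElement(desc, q("dcterms", field))
    ET.SubElement(desc, q("dcterms", "identifier")).text = identifier
    ET.SubElement(desc, q("oxds", "currentVersion")).text = version
    # Structural metadata: one ore:aggregates element per resource URI.
    for uri in resource_uris:
        ET.SubElement(desc, q("ore", "aggregates"), {q("rdf", "resource"): uri})
    return ET.tostring(root, encoding="unicode")
```

A call such as build_manifest("http://example.org/ros/demo", "demo", "v1") would yield a manifest with empty title, description and creator elements; the example.org URI is a placeholder.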

Use case scenario

Actors

  • Users: Researchers collaborating on a Research Object. In this scenario they interact with the prototype using only the DropBox interface and a shared directory in it.
    • Creator: A special user who has created a particular Research Object
  • Prototype: The system under design. It interacts with the Users as another DropBox user that has access to their shared directory.

Main scenario

  1. Creator creates a new directory for the Research Object inside the shared DropBox directory.
  2. Prototype creates a manifest.rdf file inside the directory, with a basic structure and empty description fields.
  3. Users then make changes to the Research Object. If one of the Users:
    1. Modifies descriptive metadata in the manifest.rdf file
      1. If the Research Object Version is modified (the oxds:currentVersion tag):
        • If the provided Research Object Version has not been used before, Prototype creates a new version of the RO: within the Prototype, all files and metadata from the previous version are copied and become the new version. In the shared DropBox directory, the only file updated is manifest.rdf, because it stores the versioning information.
        • If the provided Research Object Version has been used before, Prototype restores the directory's state to the given Research Object Version (TBD, input from Jits required).
      2. Prototype updates the metadata of the Research Object Version stored in the RO SRS according to the changes in manifest.rdf.
    2. Adds a Research Object related file to the directory
      1. Prototype stores the contents of the new file.
      2. Prototype updates the manifest.rdf file by adding a corresponding resource element.
    3. Modifies Research Object related files in the directory
      1. Prototype stores the updated contents of the file.
      2. Prototype updates the manifest.rdf file by updating the attributes of the corresponding resource element.
    4. Removes Research Object related files from the directory
      1. Prototype deletes the file from the internal storage.
      2. Prototype updates the manifest.rdf file by removing the corresponding resource element.
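
The version handling in step 3.1.1 and the file operations in steps 3.2 to 3.4 can be sketched as follows. This is an in-memory stand-in written for illustration only: the class RoStore and its methods are hypothetical and do not correspond to the RO SRS interface, and restoring an older version is reduced to switching the current version, since that behaviour is still TBD.

```python
import copy


class RoStore:
    """Hypothetical in-memory stand-in for the storage of one Research Object."""

    def __init__(self, initial_version):
        self.versions = {initial_version: {"metadata": {}, "files": {}}}
        self.current = initial_version

    def set_version(self, version):
        """Handle a change of oxds:currentVersion; return 'created' or 'restored'."""
        if version not in self.versions:
            # Unused version identifier: files and metadata of the previous
            # version are copied and become the new version.
            self.versions[version] = copy.deepcopy(self.versions[self.current])
            outcome = "created"
        else:
            # Known version identifier: switch back to the stored state.
            outcome = "restored"
        self.current = version
        return outcome

    def update_metadata(self, metadata):
        """Store descriptive metadata taken from manifest.rdf."""
        self.versions[self.current]["metadata"].update(metadata)

    def put_file(self, path, content):
        """Store the contents of a new or modified file."""
        self.versions[self.current]["files"][path] = content

    def delete_file(self, path):
        """Remove a file from the current version only."""
        self.versions[self.current]["files"].pop(path, None)
```

Because a new version starts as a deep copy, later changes to it leave the files of earlier versions untouched, which is what makes restoring them possible.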

REST interface specification

The REST interface is specified on a separate page, "REST Interface specifications", which could not be rendered in this version of the document.

Additional technical information

Handling of Modifications in manifest.rdf File

Users are expected to modify only the elements corresponding to the Research Object metadata. The list of resources is automatically regenerated every time any other file in the Research Object directory is created, modified or removed. Therefore, any changes users make to the resource list will be ignored.
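
The regeneration rule can be sketched in Python as below. The sketch assumes that the resource entries are ore:aggregates children of the first rdf:Description element; the function name regenerate_resource_list is hypothetical, and the actual service handles RDF with Jena rather than with this code.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
# Registering prefixes only keeps the serialized output readable.
ET.register_namespace("rdf", RDF)
ET.register_namespace("ore", ORE)
ET.register_namespace("dcterms", DCTERMS)


def regenerate_resource_list(manifest_xml, resource_uris):
    """Replace all ore:aggregates entries with one entry per given URI."""
    root = ET.fromstring(manifest_xml)
    desc = root.find("{%s}Description" % RDF)
    if desc is None:
        raise ValueError("manifest has no rdf:Description element")
    aggregates = "{%s}aggregates" % ORE
    # Drop every existing entry, including any that a user added by hand.
    for stale in desc.findall(aggregates):
        desc.remove(stale)
    # Rebuild the list from the resources actually present in the RO.
    for uri in sorted(resource_uris):
        ET.SubElement(desc, aggregates, {"{%s}resource" % RDF: uri})
    return ET.tostring(root, encoding="unicode")
```

Descriptive elements such as dcterms:title are left untouched, so user edits to the metadata survive while manual edits to the resource list do not.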

Handling Remote Resources

One may want to manually modify the resource list in order to link to a remote resource that cannot be put directly in the Research Object directory. As stated earlier, this is not a valid approach. The correct way to do this is to create an internet shortcut pointing to the remote resource (a file with the .url extension on Windows; how this is handled on other systems still needs to be checked). The shortcut should be placed directly in the Research Object directory. Such a link will be automatically parsed by the RO SRS and the URI will be placed in the resource list in the manifest.rdf file.
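
Windows .url shortcuts are INI-style text files with an [InternetShortcut] section and a URL key, so extracting the target could look like the Python sketch below. The function name read_url_shortcut and the UTF-8 encoding are assumptions; shortcuts created on other systems may use a different format.

```python
import configparser


def read_url_shortcut(path):
    """Return the target URL of a Windows .url shortcut, or None if absent."""
    # Interpolation is disabled because URLs often contain '%' characters;
    # strict=False tolerates duplicated keys that some tools write.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(path, encoding="utf-8")
    # Option names are matched case-insensitively by ConfigParser.
    return parser.get("InternetShortcut", "URL", fallback=None)
```

Returning None instead of raising lets the caller decide whether a malformed shortcut should be stored as an ordinary file.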

Another, possibly less user-friendly, option is to include the URIs of remote resources in the descriptive metadata, for example in the dcterms:hasPart element. However, this value will not be transformed into structural metadata; it will remain in the descriptive metadata.

Discussion about Conflicts

If two or more users edit the contents of the same file at the same time, conflicts arise. In DropBox, this results in the creation of conflicted copies: for each conflict a new file is added, with information about the author and the current date appended to the file name. As we have no means to resolve such conflicts, the conflicted copies will not be treated in any special way. They will simply be added to the RO SRS as separate files. When users resolve the conflicts manually, the conflicted copies are removed from the directory and the corresponding files are subsequently removed from the prototype.

Conflicts on the manifest.rdf file require special consideration, as this file is used to keep track of changes in metadata and resources. Furthermore, a conflict may be triggered by the DropBox Connector itself, as it updates the file contents after every change in resources. We decided not to address this issue for now, in order to keep the design of the first prototype simple. We assume that manifest.rdf is relatively small, so DropBox will manage to synchronize it quickly between users. Also, in normal use manifest.rdf is rarely modified by users (only when metadata changes). We therefore expect the risk of conflicts on this file to be low.

Mapping between RO structure and internal dLibra data model

General structure

| Research Object | dLibra |
|---|---|
| Research Object | Group publication |
| Version of Research Object | Publication with a single edition whose content is modified every time the data or metadata of the Research Object changes |
| Resource file | File. For each modification of the file a new File Version is created. The deletion of files will be handled by excluding them from the "edition" in dLibra. |

dLibra mapping

| manifest.rdf | RO Information | dLibra mapping |
|---|---|---|
| <dcterms:description> | a simple textual description of the RO | edition attribute: Description |
| <dcterms:title> | RO title | edition attribute: Title |
| <dcterms:creator> | RO creator | edition attribute: Creator |
| <dcterms:identifier> | RO name | group publication name |
| <dcterms:source rdf:resource="http://some/url"/> | reference to sources such as other ROs | edition attribute: Source |
| <rdf:type> | RO schema element | none |
| <dcterms:created> | date the RO version was created in dLibra | edition attribute: Created |
| <dcterms:modified> | date the RO version was last modified in dLibra | edition attribute: Modified |
| <ore:aggregates rdf:resource="http://some/url"/> | enumeration of the URIs of the resources in RO | assignment of file versions to edition |
| <dcterms:hasVersion rdf:resource="http://some/url"/> | other RO versions | all publications in given group publication |
| <oxds:currentVersion> | RO version | publication name |

Any other descriptive metadata (tags from any vocabulary) can be stored in dLibra as long as it does not contain nested tags.
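
The "no nested tags" restriction amounts to accepting only elements without child elements. A minimal Python sketch of such a check is given below; the function name split_storable is hypothetical, and how dLibra itself validates attributes is not covered by this document.

```python
import xml.etree.ElementTree as ET


def split_storable(description):
    """Split the children of a description element into flat and nested tags.

    Returns two lists of qualified tag names: those that could be stored as
    plain attributes and those rejected because they contain nested tags.
    """
    storable, rejected = [], []
    for child in description:
        # len() of an Element is its number of direct child elements.
        if len(child) == 0:
            storable.append(child.tag)
        else:
            rejected.append(child.tag)
    return storable, rejected
```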

Technical implementation details

The service is a servlet-based application built using the Jersey framework.

The Jena framework is used to handle RDF files, and additional XML transformations are performed using the Xalan library.

Deployment instructions

The source code is available at https://github.com/wf4ever/prototype1-dlibra.

The dLibra server location and the directory used for storing workspaces should be configured in the src\main\resources\connection.properties file. A dLibra instance used for demonstration purposes is available at host sandbox.wf4ever-project.org (port number 10051 and directory 3, as originally configured).

Once this is configured, the project can be built (mvn package) and deployed (rosrs2 servlet).

TBD:

  • authorization and access control
  • output format of list of links to research objects
  • format of the additional ore:aggregates attributes that store the MD5 checksum and the last modification time of each local file
  • switching to older versions
