Research Objects Store and Retrieve Service (RO SRS) - Initial specification

Introduction

This document contains the initial specification of the RO SRS. The main function of the service is to let its users store and retrieve Research Objects.

The RO SRS will be accessible via a REST interface. It will be implemented on top of a digital library system. The dLibra system was chosen for the prototype implementation.

The RO SRS will be part of the first prototype developed in the Wf4Ever project. The aim of this prototype is to handle automated publication and synchronization of Research Objects between a DropBox folder and the dLibra system. This will be achieved with an intermediate component, called the DropBox Connector, acting as a client of both the DropBox service and the RO SRS. The DropBox Connector observes changes in the DropBox directory dedicated to shared Research Objects and requests the relevant operations in the RO SRS.

The structure of the Research Object that we are using in this prototype is based on the ADMIRAL data package.

The specification consists of the following sections:

  • The description of the Research Object structure (based on ADMIRAL Data Package)
  • Use case scenario of interaction between end user (using DropBox service with a special intermediate connector) and RO SRS
  • REST interface specification
  • Description of interaction between prototype components in the example scenario
  • Additional technical information
  • Mapping between RO structure and internal dLibra data model

The description of the Research Object structure

From the users' point of view, a Research Object in this prototype is a directory in a file system. All files included in the directory or any of its subdirectories are considered to be the Resources of the Research Object, except for the file named manifest.rdf (placed directly in the RO directory). The manifest.rdf file is an XML file containing metadata associated with the Research Object as well as a list of all resources. This file contains:

  • descriptive metadata (it can be edited by the user):
    • in the dcterms namespace;
    • in the oxds namespace (ADMIRAL specific metadata);
      • oxds:currentVersion tag contains the identifier of the current version of the RO
  • structural metadata in the ore namespace, which should be generated automatically by the RO SRS on the basis of the contents of the RO.

In addition to the definitions of the ADMIRAL Data Package, we make the following assumptions:

  • If several versions of an RO have been defined, information about all the versions will be automatically listed in manifest.rdf (in the dcterms:hasVersion tag).
  • For each local file listed in the ore:aggregates tag, the MD5 checksum and the last modification time will be stored as additional attributes of the tag (details TBD).
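
The two per-file values mentioned above can be computed with the Python standard library. The following is a minimal sketch, not the prototype's implementation: the function name describe_resource and the ISO 8601 UTC time format are assumptions, since the attribute format is still TBD.

```python
import hashlib
import os
from datetime import datetime, timezone


def describe_resource(path):
    """Return (md5_hex, last_modified_iso) for a local file.

    Hypothetical helper: the spec leaves the attribute format open, so the
    hex digest and ISO 8601 UTC timestamp are illustrative choices.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large resources are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return digest.hexdigest(), modified.isoformat()
```

Comparing the stored checksum with a freshly computed one is one way a client could tell whether a file really changed, independently of its modification time.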

The following example shows the structure of the manifest.rdf file:
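
As an illustration only, here is a sketch of what such a manifest might look like, embedded in Python so that it can be parsed and inspected. All values, the RO URI, and the oxds namespace URI are assumptions made for this example; the MD5 and modification time attributes are left out because their format is still TBD.

```python
import xml.etree.ElementTree as ET

# rdf, dcterms and ore use their standard namespace URIs; the oxds URI is an
# assumption about the ADMIRAL vocabulary.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "ore": "http://www.openarchives.org/ore/terms/",
    "oxds": "http://vocab.ox.ac.uk/dataset/schema#",
}

# Hypothetical manifest for version "v2" of an RO with two resources.
EXAMPLE_MANIFEST = """\
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:ore="http://www.openarchives.org/ore/terms/"
    xmlns:oxds="http://vocab.ox.ac.uk/dataset/schema#">
  <rdf:Description rdf:about="http://example.org/wf4ever/research_objects/ro1/v2">
    <dcterms:identifier>my-ro</dcterms:identifier>
    <dcterms:creator>jdoe</dcterms:creator>
    <dcterms:title>Example Research Object</dcterms:title>
    <dcterms:description>Illustrative description</dcterms:description>
    <oxds:currentVersion>v2</oxds:currentVersion>
    <dcterms:hasVersion rdf:resource="http://example.org/wf4ever/research_objects/ro1/v1"/>
    <dcterms:hasVersion rdf:resource="http://example.org/wf4ever/research_objects/ro1/v2"/>
    <ore:aggregates rdf:resource="http://example.org/wf4ever/research_objects/ro1/v2/data/input.csv"/>
    <ore:aggregates rdf:resource="http://example.org/wf4ever/research_objects/ro1/v2/workflow.t2flow"/>
  </rdf:Description>
</rdf:RDF>
"""


def current_version(manifest_xml):
    """Return the text of the oxds:currentVersion tag, or None if absent."""
    root = ET.fromstring(manifest_xml)
    return root.findtext(".//oxds:currentVersion", namespaces=NS)


def aggregated_resources(manifest_xml):
    """Return the rdf:resource value of every ore:aggregates tag, in order."""
    root = ET.fromstring(manifest_xml)
    key = "{%s}resource" % NS["rdf"]
    return [element.get(key) for element in root.findall(".//ore:aggregates", NS)]
```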

Use case scenario

Actors

  • Users: Researchers collaborating on a Research Object. In this scenario they interact with the prototype only through the DropBox interface and a shared directory in it.
    • Creator: A special user who created a particular Research Object
  • Prototype: The system under design. It interacts with the Users as another DropBox user, having access to their shared directory.

Main scenario

  1. Creator creates a new directory for the Research Object inside the shared DropBox directory.
  2. Prototype creates manifest.rdf file inside the directory, with a basic structure having empty description fields.
  3. While Users make changes in the Research Object - if one of the Users:
    1. Modifies descriptive metadata in the manifest.rdf file
      1. If Research Object Version is modified (the oxds:currentVersion tag):
        • If provided Research Object Version has not been used before, Prototype creates a new version of the RO - in the Prototype all files and metadata from the previous version are copied and become a new version of the RO. In the shared DropBox directory the only updated file is the manifest.rdf, because of the versioning information stored in it.
        • If provided Research Object Version has been used before, Prototype restores the directory's state to the given Research Object Version (TBD, input from Jits required).
      2. Prototype updates metadata of the Research Object Version stored in the RO SRS, according to changes in manifest.rdf
    2. Adds a Research Object related file to the directory
      1. Prototype stores the contents of the new file.
      2. Prototype updates the manifest.rdf file by adding a corresponding resource element.
    3. Modifies a Research Object related file in the directory
      1. Prototype stores the updated contents of the file.
      2. Prototype updates the manifest.rdf file by updating the attributes of the corresponding resource element.
    4. Removes a Research Object related file from the directory
      1. Prototype marks the file as deleted (but does not actually delete the content from internal storage).
      2. Prototype updates the manifest.rdf file by removing the corresponding resource element.
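
The scenario above can be read as a mapping from observed directory changes to RO SRS requests. The sketch below shows one possible shape for that mapping inside the DropBox Connector; the function name plan_requests and the change labels ("ro_created", "file_added" and so on) are hypothetical, and the returned paths are relative to BASE_URI.

```python
def plan_requests(change, ro_id=None, version_id=None, path=None, known_versions=()):
    """Map one observed change in the shared directory to RO SRS requests.

    Returns a list of (HTTP method, URI path relative to BASE_URI) tuples.
    The change labels are hypothetical names for the scenario steps.
    """
    version_uri = f"research_objects/{ro_id}/{version_id}"
    if change == "ro_created":
        # Step 1: a new RO directory appeared; ask the service for a new RO.
        return [("POST", "research_objects")]
    if change == "version_changed":
        if version_id in known_versions:
            # Restoring an already used version is still TBD in the scenario.
            return []
        # New version: create it, then push the edited manifest.
        return [("POST", version_uri), ("POST", f"{version_uri}/manifest.rdf")]
    if change == "metadata_modified":
        return [("POST", f"{version_uri}/manifest.rdf")]
    if change in ("file_added", "file_modified"):
        return [("POST", f"{version_uri}/{path}")]
    if change == "file_removed":
        return [("DELETE", f"{version_uri}/{path}")]
    raise ValueError(f"unknown change type: {change}")
```

Keeping the planning step free of network calls makes it easy to check the mapping against the scenario before any request is sent.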

REST interface specification

Vocabulary:

  • BASE_URI - base URI of service, for example http://example.org/wf4ever/
  • RO_ID - RO identifier - assigned automatically by the service
  • RO_VERSION_ID - identifier of version of RO - defined by the user
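
A small helper can assemble resource URIs from these three terms. This is a sketch under the assumption that identifiers and file paths are percent-encoded when placed in the URI; the function name ro_uri is hypothetical.

```python
from urllib.parse import quote


def ro_uri(base_uri, ro_id=None, version_id=None, path=None):
    """Build a RO SRS resource URI from BASE_URI, RO_ID, RO_VERSION_ID and a file path."""
    segments = ["research_objects"]
    if ro_id is not None:
        segments.append(quote(ro_id, safe=""))
        if version_id is not None:
            segments.append(quote(version_id, safe=""))
            if path is not None:
                # quote() keeps "/" by default, so subdirectories stay intact.
                segments.append(quote(path.strip("/")))
    return base_uri.rstrip("/") + "/" + "/".join(segments)
```

For example, ro_uri("http://example.org/wf4ever/", "ro1", "v1", "manifest.rdf") yields the manifest URI of version v1 of the RO with identifier ro1.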

Please note:

  • Error codes 401 (Unauthorized), 404 (Not Found) and 500 (Internal Server Error) can be returned in response to any request described below. Because they carry their standard HTTP meaning, they are omitted from the interface description below.

Interface:

BASE_URI/research_objects

  • GET - returns list of links to research objects. Output format is TBD.
  • PUT - not allowed
  • POST - requests a URI for a new RO; it should be called after a user creates a new RO directory in their workspace.
    • input: optionally, the name of the RO (for example, the name of the directory created for the RO in the shared DropBox folder). This name will be the default value of the dcterms:identifier element of the descriptive metadata.
    • output: 200 (OK) response code with a Content-Location header containing the URI of the new RO (with the assigned RO_ID)
  • DELETE - not allowed
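
A client-side sketch of the POST request on this resource, using only the Python standard library. The specification does not say how the optional RO name is transmitted, so sending it as a plain-text request body is an assumption, as are the function names. Authentication is omitted because it is still TBD.

```python
import urllib.request


def build_create_ro_request(base_uri, name=None):
    """Prepare (but do not send) the POST that asks for a new RO URI."""
    url = base_uri.rstrip("/") + "/research_objects"
    # Assumption: the optional RO name travels as a plain-text request body.
    data = name.encode("utf-8") if name is not None else b""
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def create_ro(base_uri, name=None):
    """Send the request and return the new RO URI from Content-Location."""
    request = build_create_ro_request(base_uri, name)
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Location")
```

Separating request construction from sending lets the request be inspected or logged before it reaches the service.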

BASE_URI/research_objects/RO_ID

  • GET - returns the list of versions of this research object.
    • output: 200 (OK) response code with an RDF file in the response body containing OAI-ORE aggregates tags.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - deletes the research object.

BASE_URI/research_objects/RO_ID/RO_VERSION_ID

  • GET - returns the specified version of the RO. The output format is chosen by content negotiation and is either a ZIP archive with all files from the RO or an RDF file with the list of files in the RO.
  • PUT - not allowed
  • POST - creates new version
    • input: optionally - URI of the base version that should be used to create a new version
    • output: 201 (Created) if the version was created
    • possible errors: 409 (Conflict) if a version with the given RO_VERSION_ID already exists
  • DELETE - deletes this version of research object
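
The version creation rules above (201 on success, 409 for a duplicate identifier, optional base version) can be illustrated with an in-memory model. This is a sketch of the intended semantics only, not the dLibra-backed implementation; the class name VersionStore and the choice of 404 for an unknown base version are assumptions.

```python
class VersionStore:
    """In-memory stand-in for the versions of a single Research Object."""

    def __init__(self):
        self._versions = {}  # version id -> {file path: content}

    def create_version(self, version_id, base_version_id=None):
        """Return the HTTP status a POST on RO_ID/RO_VERSION_ID would yield."""
        if version_id in self._versions:
            return 409  # Conflict: this RO_VERSION_ID is already taken
        if base_version_id is None:
            self._versions[version_id] = {}
            return 201
        if base_version_id not in self._versions:
            return 404  # assumption: an unknown base version is Not Found
        # Copy the file table so later edits do not leak into the base version.
        self._versions[version_id] = dict(self._versions[base_version_id])
        return 201

    def put_file(self, version_id, path, content):
        self._versions[version_id][path] = content

    def files(self, version_id):
        return dict(self._versions[version_id])
```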

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/manifest.rdf

  • GET - returns manifest.rdf of given version of RO.
  • PUT - not allowed
  • POST - used for updating manifest.rdf file.
    • input: manifest.rdf in request body
    • output: 200 (OK) response code if the descriptive metadata was successfully updated
    • possible errors: 400 (Bad Request) if manifest.rdf is not well-formed, 409 (Conflict) if manifest.rdf contains incorrect data (for example, one of the required tags is missing).
  • DELETE - not allowed
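
The distinction between the 400 and 409 responses can be sketched as a two-stage check: first well-formedness, then required content. The list of required tags is not defined in this specification, so REQUIRED_TAGS below is a hypothetical placeholder containing only dcterms:identifier.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical: the specification does not enumerate the required tags.
REQUIRED_TAGS = (f"{{{DCTERMS}}}identifier",)


def validate_manifest(body):
    """Return the status code a POST of this manifest.rdf body would yield."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return 400  # Bad Request: the XML is not well-formed
    present = {element.tag for element in root.iter()}
    if any(tag not in present for tag in REQUIRED_TAGS):
        return 409  # Conflict: a required tag is missing
    return 200
```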

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/any/other/file

  • GET - returns the requested file. If the requested URI leads to a directory, returns an RDF file with the list of files in this directory.
  • PUT - not allowed
  • POST - used for adding and updating files; also results in modification of manifest.rdf (the part with the structural metadata).
    • input: file in request body
    • output: 200 (OK) response code
    • possible errors: 400 (Bad Request) if the request has an empty body - this means that empty directories are not supported
  • DELETE - used for removing files from the research object; also results in modification of manifest.rdf (the part with the structural metadata).
    • output: 200 (OK) response code
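
The file operations above, together with the rule from the main scenario that removed files are only marked as deleted, can be modelled as follows. This is an illustrative in-memory sketch; the class name RoVersionFiles is hypothetical, and the real storage is dLibra.

```python
class RoVersionFiles:
    """In-memory stand-in for the files of one Research Object version."""

    def __init__(self):
        self._content = {}     # path -> bytes, kept even after deletion
        self._deleted = set()  # paths currently marked as deleted

    def post(self, path, body):
        """Add or update a file; an empty body is rejected with 400."""
        if not body:
            return 400
        self._content[path] = body
        self._deleted.discard(path)  # re-adding a file revives it
        return 200

    def delete(self, path):
        """Mark a file as deleted without discarding its stored content."""
        if path not in self._content or path in self._deleted:
            return 404
        self._deleted.add(path)
        return 200

    def resource_list(self):
        """Paths that belong in the structural part of manifest.rdf."""
        return sorted(p for p in self._content if p not in self._deleted)
```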

Description of interaction between prototype components

Coming soon: sequence diagrams.

Additional technical information

Handling of Modifications in manifest.rdf File

Users are expected to modify only the tags corresponding to the Research Object metadata. The list of resources is automatically regenerated every time any other file in the Research Object directory is created, modified or removed. Therefore any changes made to the resource list by users will be ignored.
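
One way to realize this rule is to discard every ore:aggregates tag found in the manifest and rebuild the list from the files actually present, leaving the descriptive metadata untouched. The sketch below assumes the metadata sits in a single rdf:Description element directly under the root; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"


def regenerate_resource_list(manifest_xml, resource_uris):
    """Return manifest XML whose ore:aggregates tags match resource_uris."""
    # Register prefixes so the serialized output keeps readable tag names.
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("ore", ORE)
    ET.register_namespace("dcterms", DCTERMS)
    root = ET.fromstring(manifest_xml)
    description = root.find(f"{{{RDF}}}Description")
    if description is None:
        raise ValueError("manifest has no rdf:Description element")
    # Drop the existing resource list, including any manual user edits.
    for stale in description.findall(f"{{{ORE}}}aggregates"):
        description.remove(stale)
    for uri in sorted(resource_uris):
        ET.SubElement(description, f"{{{ORE}}}aggregates", {f"{{{RDF}}}resource": uri})
    return ET.tostring(root, encoding="unicode")
```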

Handling Remote Resources

One may want to manually modify the resource list in order to link to a remote resource that cannot be put directly in the Research Object directory. As stated earlier, this is not a valid approach, because manual changes to the resource list are ignored. The correct way to do this is to create an internet shortcut (a file with the .url extension in Windows; we need to check how it is handled in other systems) pointing to the remote resource. The shortcut should be placed directly in the Research Object directory.
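
Windows .url shortcuts are INI-style text files with an [InternetShortcut] section and a URL key, so the target of a shortcut can be read with the standard configparser module. A minimal sketch, with a hypothetical function name:

```python
import configparser


def read_internet_shortcut(text):
    """Return the target URL stored in the text of a Windows .url file."""
    # Disable interpolation so "%" characters in URLs are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Option names are matched case-insensitively, so "URL" finds "url" too.
    return parser.get("InternetShortcut", "URL")
```

Whether the RO SRS should record the shortcut file itself or the URL it points to is not decided in this specification.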

Discussion about Conflicts

If two or more users edit the contents of the same file at the same time, conflicts arise. In DropBox, this results in the creation of conflicted copies: for each conflict a new file is added, with information about the author and the current date appended to the file name. As we have no means to resolve such conflicts, the conflicted copies will not be treated in any special way. They will simply be added to the RO SRS as separate files. When users resolve the conflicts manually, the conflicted copies are removed from the directory, and the corresponding files are subsequently removed from the prototype.

Conflicts on the manifest.rdf file require special consideration, as this file is used to keep track of changes in metadata and resources. Furthermore, a conflict may be triggered by the DropBox Connector itself, as it updates the file contents after every change in resources. We decided not to address this issue at the moment, in order to keep the first prototype design simple. We assume that manifest.rdf is relatively small, so DropBox will manage to synchronize it quickly between users. Also, in normal use manifest.rdf is rarely modified by users (only when metadata changes). As a result, we expect the risk of conflicts on this file to be low.

Mapping between RO structure and internal dLibra data model

General structure

| Research Object | dLibra |
| --- | --- |
| Research Object | Group publication |
| Version of Research Object | Publication with a single edition whose content is modified every time the data or metadata of the Research Object changes |
| Resource file | File. For each modification of the file a new File Version is created. The deletion of files will be handled by excluding them from the "edition" in dLibra. |

Metadata

| ADMIRAL Data Package Information | manifest.rdf | dLibra |
| --- | --- | --- |
| a dataset local identifier | <dcterms:identifier> | edition name |
| username of creator | <dcterms:creator> | edition attribute: Creator |
| a one-line title of the dataset | <dcterms:title> | edition attribute: Title |
| a simple textual description of the dataset | <dcterms:description> | edition attribute: Description |
| enumeration of the URIs of the resources in the data package | <ore:aggregates rdf:resource="http://some/url"/> | assignment of file versions to edition |
| a version number for the data package | <oxds:currentVersion> | publication name |
| embargo status and date | <oxds:isEmbargoed>, <oxds:embargoedUntil> | TBD (what does this information mean?) |
| date the package was created (submitted) | <dcterms:created> | group publication attribute: Created |
| date the package was last modified | <dcterms:modified> | group publication attribute: Modified |
| reference to any resource from which the data package has been derived | <dcterms:source> | group publication attribute: Source |

Any other descriptive metadata (tags from any vocabulary) can be stored in dLibra as long as it does not contain nested tags.
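
The "no nested tags" condition can be checked mechanically: a metadata tag is storable if it has no child elements. The sketch below splits the children of a parsed rdf:Description element accordingly; the function name split_storable is hypothetical, and how nested tags are reported back to the user is not covered by this specification.

```python
import xml.etree.ElementTree as ET


def split_storable(description):
    """Split the children of an rdf:Description element.

    Returns (flat, nested): flat maps each tag without child elements to the
    list of its text values; nested lists the tags that contain child elements.
    """
    flat, nested = {}, []
    for child in description:
        if len(child) == 0:  # an Element's length is its number of children
            flat.setdefault(child.tag, []).append((child.text or "").strip())
        else:
            nested.append(child.tag)
    return flat, nested
```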

TBD:

  • authorization and access control
  • output format of list of links to research objects
  • format of the MD5 checksum and last modification time attributes on the ore:aggregates tag
  • switching to older versions