Research Objects Store and Retrieve Service (RO SRS) - Initial specification

Introduction

This document contains the initial specification of the RO SRS. The main function of the service is to allow its users to store and retrieve Research Objects.

The RO SRS will be accessible via a REST interface. It will be implemented on top of a digital library system. The dLibra system was chosen for the prototype implementation.

The RO SRS will be part of the first prototype developed in the Wf4Ever project. The aim of this prototype is to handle automated publication and synchronization of Research Objects between a DropBox folder and the dLibra system. This will be achieved with an intermediate component, called the DropBox Connector, acting as a client of both the DropBox service and the RO SRS. The DropBox Connector observes changes in the DropBox directory dedicated to shared Research Objects and requests the relevant operations in the RO SRS.

The structure of the Research Object used in this prototype is based on the ADMIRAL Data Package.

The specification consists of the following sections:

  • The description of the Research Object structure (based on ADMIRAL Data Package)
  • Use case scenario of interaction between end user (using DropBox service with a special intermediate connector) and RO SRS
  • REST interface specification
  • Description of interaction between prototype components in the example scenario
  • Additional technical information
  • Mapping between RO structure and internal dLibra data model

The description of the Research Object structure

From the users' point of view, a Research Object in this prototype is a directory in a file system. All files in the directory or any of its subdirectories are considered Resources of the Research Object, except for the file named manifest.rdf (placed directly in the RO directory). The manifest.rdf is an XML file containing the metadata associated with the Research Object as well as a list of all resources.

In addition to the definitions of the ADMIRAL Data Package, we make the following assumptions:

  • If several versions of a Research Object have been defined, information about all the versions will be listed in manifest.rdf (specific tags TBA).
  • For each resource, an MD5 checksum or the last modification time will be stored in manifest.rdf (specifics TBA). This will help track changes in resources.
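
As a rough illustration of the second assumption, the checksum and modification time of each resource could be collected as sketched below. The function name `describe_resources` and the layout of the returned dictionary are hypothetical; how these values are written into manifest.rdf is still TBA.

```python
import hashlib
from pathlib import Path


def describe_resources(ro_dir):
    """Return {relative path: (md5 hex digest, mtime)} for every resource.

    The manifest.rdf placed directly in the RO directory is skipped, because
    it is not a resource of the Research Object.
    """
    ro_dir = Path(ro_dir)
    resources = {}
    for path in sorted(ro_dir.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(ro_dir).as_posix()
        if relative == "manifest.rdf":
            continue
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        resources[relative] = (digest, path.stat().st_mtime)
    return resources
```

Comparing two such snapshots taken at different times is enough to tell which resources were added, modified or removed.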

The following example shows the structure of the manifest.rdf file:
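
The sketch below generates a skeleton manifest of this kind. It is an illustration under assumptions, not the project's actual manifest: the tags are taken from the metadata mapping later in this document, while the enclosing rdf:RDF / rdf:Description layout and the function name `build_manifest` are assumed. Version and checksum tags are left out because they are still TBA.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"
ORE_NS = "http://www.openarchives.org/ore/terms/"


def build_manifest(ro_uri, identifier, creator, resource_uris):
    """Return a skeleton manifest.rdf as a string (assumed layout)."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dcterms", DCTERMS_NS)
    ET.register_namespace("ore", ORE_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ro_uri}
    )
    ET.SubElement(description, f"{{{DCTERMS_NS}}}identifier").text = identifier
    ET.SubElement(description, f"{{{DCTERMS_NS}}}creator").text = creator
    # Empty description fields, to be filled in by the user.
    ET.SubElement(description, f"{{{DCTERMS_NS}}}title")
    ET.SubElement(description, f"{{{DCTERMS_NS}}}description")
    # One ore:aggregates element per resource of the Research Object.
    for uri in resource_uris:
        ET.SubElement(
            description, f"{{{ORE_NS}}}aggregates", {f"{{{RDF_NS}}}resource": uri}
        )
    return ET.tostring(root, encoding="unicode")
```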

Use case scenario

Actors

  • Users: Researchers collaborating on a Research Object. In this scenario they interact with the prototype using only the DropBox interface and a shared directory in it.
    • Creator: A special user who has created a particular Research Object.
  • Prototype: The system under design. It interacts with the Users as another DropBox user that has access to their shared directory.

Main scenario

  1. Creator creates a new directory for the Research Object inside the shared DropBox directory.
  2. Prototype creates a manifest.rdf file inside the directory, with a basic structure and empty description fields.
  3. While Users make changes to the Research Object, if one of the Users:
    1. Modifies the manifest.rdf file (in the Prototype only modifications of description fields are allowed)
      1. If the Research Object Version is modified:
        • If the provided Research Object Version has not been used before, Prototype creates a new version of the RO: all files and metadata from the previous version are copied and become the new version of the RO.
        • If the provided Research Object Version has been used before, Prototype restores the directory's state to the given Research Object Version.
      2. Prototype updates the metadata of the Research Object Version according to the changes in manifest.rdf.
    2. Adds a Research Object related file to the directory
      1. Prototype stores the contents of the new file.
      2. Prototype updates the manifest.rdf file by adding a corresponding resource element.
    3. Modifies a Research Object related file in the directory
      1. Prototype stores the updated contents of the file.
      2. Prototype updates the manifest.rdf file by updating the attributes of the corresponding resource element.
    4. Removes a Research Object related file from the directory
      1. Prototype marks the file as deleted (but does not actually delete the content from internal storage).
      2. Prototype updates the manifest.rdf file by removing the corresponding resource element.
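
Steps 3.2 to 3.4 of the scenario above require the Prototype to tell added, modified and removed files apart. A minimal sketch, assuming the Prototype keeps a snapshot that maps each resource path to a checksum (the function name and the snapshot format are hypothetical):

```python
def classify_changes(previous, current):
    """Compare two {path: checksum} snapshots of an RO directory.

    Returns the paths that were added, modified or removed between the
    previous and the current snapshot, each as a sorted list.
    """
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "modified": modified, "removed": removed}
```

Changes to manifest.rdf itself (step 3.1) are handled separately, so the snapshots passed in would not include it.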

REST interface specification

Vocabulary:

  • BASE_URI - base URI of service, for example http://example.org/wf4ever/
  • RO - research object
  • RO_ID - research object identifier
  • RO_VERSION_ID - identifier of version of RO
  • RO_EDITION_ID - identifier of edition of RO.
  • DropBox client - software that observes changes in the shared Research Object directory and requests the relevant operations in the dLibra system via the REST interface. Developed by Manchester.
  • REST interface - dLibra module that performs the operations requested by the DropBox client. Developed by PSNC.
  • Connector - software that handles automatic publication and synchronization of a Research Object within a Digital Library. It consists of two components: the DropBox client and the REST interface. These components are separated (and should be loosely coupled) so that adding new clients (for example, a WebDAV client) is simple.

Please note:

  • Error codes 401 (Unauthorized), 404 (Not Found) and 500 (Internal Server Error) are omitted from the interface description unless they are used for special cases.
  • After receiving manifest.rdf in a response from the REST interface, the Client should immediately place it in the DropBox directory.

Interface:

BASE_URI/research_objects

  • GET - returns a list of links to research objects. The output format is TBD.
  • PUT - not allowed
  • POST - used to request a URI for a new RO; it should be called after the user creates a new RO directory in their workspace.
    • input: empty body
    • output: 200 (OK) response code with manifest.rdf in the response body and a Content-Location header with the URI of the new RO
    • The REST interface should create and return the manifest.rdf file. This file should contain the tags whose values are to be filled in by the user.
  • DELETE - not allowed
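
A client-side sketch of the POST call described above, using only the Python standard library. The helper names are hypothetical and BASE_URI is the example value from the vocabulary; the request is only built here, not sent.

```python
import urllib.request

BASE_URI = "http://example.org/wf4ever/"  # example value from the vocabulary


def build_create_ro_request(base_uri=BASE_URI):
    """Build the POST request (empty body) that asks for a new RO URI."""
    url = base_uri.rstrip("/") + "/research_objects"
    return urllib.request.Request(url, data=b"", method="POST")


def read_create_ro_response(status, headers, body):
    """Return (URI of the new RO, initial manifest.rdf) from a response.

    `headers` is any mapping of header names to values; `body` is the
    manifest.rdf content returned by the service.
    """
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return headers["Content-Location"], body
```

The request can be sent with `urllib.request.urlopen(request)`; the client would then write the returned manifest.rdf into the new RO directory.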

BASE_URI/research_objects/RO_ID

  • GET - returns the list of versions of this research object.
    • output: 200 (OK) response code with an RDF file in the response body containing OAI-ORE aggregates tags.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - deletes the research object.

BASE_URI/research_objects/RO_ID/current

  • GET - returns the most recent edition of the most recent version of the RO. The output format is chosen by content negotiation and is either a zip archive with all files from the RO or an RDF file with the list of files in the RO.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - not allowed
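
Content negotiation of this kind is normally driven by the Accept request header. The sketch below shows how a client might ask for either representation; the media types `application/zip` and `application/rdf+xml` are assumptions, since this specification does not yet name the types the service will use, and the helper name is hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Assumed media types for the two output formats.
ACCEPT_BY_FORMAT = {
    "zip": "application/zip",
    "rdf": "application/rdf+xml",
}


def build_current_ro_request(base_uri, ro_id, output_format):
    """Build a GET request for BASE_URI/research_objects/RO_ID/current."""
    if output_format not in ACCEPT_BY_FORMAT:
        raise ValueError(f"unsupported output format: {output_format}")
    url = f"{base_uri.rstrip('/')}/research_objects/{quote(ro_id, safe='')}/current"
    headers = {"Accept": ACCEPT_BY_FORMAT[output_format]}
    return urllib.request.Request(url, headers=headers, method="GET")
```

The same header would apply to the other endpoints that offer a choice between a zip archive and an RDF file list.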

BASE_URI/research_objects/RO_ID/RO_VERSION_ID

  • GET - returns the list of editions of this version of the RO.
    • output: 200 (OK) response code with an RDF file in the response body containing OAI-ORE aggregates tags.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - deletes this version of research object

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/current

  • GET - returns the most recent edition of this version of the RO. The output format is chosen by content negotiation and is either a zip archive with all files from the RO or an RDF file with the list of files in the RO.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - not allowed

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/RO_EDITION_ID

  • GET - returns the given edition of this version of the RO. The output format is chosen by content negotiation and is either a zip archive with all files from the RO or an RDF file with the list of files in the RO.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - not allowed

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/RO_EDITION_ID/any/file

  • GET - returns the requested file from the RO. If the requested URI leads to a directory, returns an RDF file with the list of files in this directory.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - not allowed

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/current/manifest.rdf

  • GET - returns most recent edition of manifest.rdf from given version of RO.
  • PUT - not allowed
  • POST - used for updating manifest.rdf file.
    • input: manifest.rdf in request body
    • output: 200 (OK) response code, Content-Location header with the URI of the new edition of the RO, manifest.rdf in the response body.
    • changes in research object: a new edition of the RO is created
    • possible errors: 400 (Bad Request) if manifest.rdf is not well-formed, 409 (Conflict) if manifest.rdf contains incorrect data (for example, one of the required tags is missing).
    • if RO_VERSION_ID is changed in manifest.rdf, the system creates a new version of the RO. The Content-Location header in the response then points to the new version instead of an edition. TBD: what if the new value of RO_VERSION_ID is already in use?
  • DELETE - not allowed
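
The status codes listed for the manifest.rdf POST could be handled on the client side roughly as follows. This is a sketch with hypothetical function names and result labels; the local well-formedness check only helps to avoid a 400 response and does not replace the server-side validation behind 409.

```python
import xml.etree.ElementTree as ET


def is_well_formed(manifest_text):
    """Return True if the manifest parses as XML, False otherwise."""
    try:
        ET.fromstring(manifest_text)
    except ET.ParseError:
        return False
    return True


def interpret_manifest_update(status, headers):
    """Map a response to (outcome, URI of the new edition or version)."""
    if status == 200:
        return ("updated", headers.get("Content-Location"))
    if status == 400:
        return ("not_well_formed", None)
    if status == 409:
        return ("incorrect_data", None)
    return ("unexpected", None)
```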

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/current/any/other/file

  • GET - returns the most recent edition of the requested file. If the requested URI leads to a directory, returns an RDF file with the list of files in this directory.
  • PUT - not allowed
  • POST - used for adding and updating files.
    • input: file in request body
    • output: 200 (OK) response code, Content-Location header with the URI of the new edition of the RO, updated manifest.rdf in the response body.
    • changes in research object: a new edition of the RO is created, manifest.rdf is updated to reflect the changes in the RO structure.
  • DELETE - used for removing files from the research object.
    • output: 200 (OK) response code, Content-Location header with the URI of the new edition of the RO, updated manifest.rdf in the response body.
    • changes in research object: a new edition of the RO is created, manifest.rdf is updated to reflect the changes in the RO structure.

BASE_URI/research_objects/RO_ID/current/manifest.rdf

  • GET - not allowed
  • PUT - not allowed
  • POST - used for updating manifest.rdf until the first version of the RO is created. This URI should not be used after the first version of the RO has been created.
    • input: file in request body
    • output: 200 (OK) response code, manifest.rdf in response body
    • possible errors: 403 (Forbidden) if RO already has a version.
  • DELETE - not allowed

BASE_URI/research_objects/RO_ID/current/any/other/file

  • GET - not allowed
  • PUT - not allowed
  • POST - used for adding the first file (other than the manifest) to the RO. Adding the file results in the creation of the first version of the RO. RO_VERSION_ID is taken from manifest.rdf; the default value is used if the user did not change it. This URI should not be used after the first version of the RO has been created.
    • input: file in request body
    • output: 200 (OK) response code, Content-Location header with the URI of the new version of the RO, updated manifest.rdf in the response body
    • possible errors: 403 (Forbidden) if RO already has a version.
  • DELETE - not allowed
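
The interface above uses two URI families for uploads: one without RO_VERSION_ID, valid only until the first version exists, and one with it afterwards. A sketch of how a client might pick the right one (the function name and the `version_id=None` convention are hypothetical):

```python
from urllib.parse import quote


def resource_upload_uri(base_uri, ro_id, file_path, version_id=None):
    """Return the URI to POST a resource file to.

    version_id=None means the RO has no version yet, so the URI without
    RO_VERSION_ID is used; otherwise the versioned URI is returned.
    """
    parts = ["research_objects", quote(ro_id, safe="")]
    if version_id is not None:
        parts.append(quote(version_id, safe=""))
    parts.append("current")
    # quote() keeps "/" by default, so subdirectories stay path segments.
    parts.append(quote(file_path))
    return base_uri.rstrip("/") + "/" + "/".join(parts)
```

A client that receives 403 (Forbidden) from the unversioned URI would switch to the versioned form.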

Description of interaction between prototype components

  1. User creates a new directory for the Research Object inside the shared DropBox directory.
  2. Prototype creates a manifest.rdf file inside the directory, containing a basic file structure with empty description fields.
  3. Prototype creates a group publication for the Research Object in dLibra.
  4. While Users make changes to the Research Object, if Users:
    1. Modify the manifest.rdf file (only modifications of description fields are allowed)
      1. If Research Object Version is modified:
        • If provided Research Object Version has not been used before, Connector creates a new publication and its first edition in dLibra for the new Research Object Version.
        • If provided Research Object Version has been used before, Connector restores the directory's state to the last edition associated with the Research Object Version.
      2. Connector updates metadata of the associated group publication, according to changes in manifest.rdf
    2. Add some Research Object related file to the directory
      1. If it is the first file in the Research Object, Connector creates a publication in dLibra for the first Research Object Version.
      2. Connector creates a corresponding resource element inside manifest.rdf file.
      3. Connector uploads the file to dLibra.
      4. Connector creates a new edition of the corresponding publication, containing the added file.
    3. Modify some Research Object related files in the directory
      1. Connector updates the corresponding element inside manifest.rdf file.
      2. Connector creates a new version of the corresponding file in dLibra.
      3. Connector creates a new edition of the corresponding publication, associating it with the new version of file.
    4. Remove some Research Object related files in the directory
      1. Connector removes the corresponding resource element from manifest.rdf
      2. Connector creates a new edition of the corresponding publication in dLibra, without the removed file.
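
The bookkeeping in steps 4.2 to 4.4 above can be pictured with a small in-memory model. This is not the dLibra API: the class names, fields and methods are hypothetical stand-ins that only mirror the group publication / publication / edition / file version vocabulary used in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    """Stand-in for a dLibra publication holding one RO version.

    Each edition is a dict mapping a file path to its file version number.
    """
    version_id: str
    editions: list = field(default_factory=list)

    def _latest(self):
        # Copy the latest edition so that earlier editions stay unchanged.
        return dict(self.editions[-1]) if self.editions else {}

    def add_file(self, path):
        edition = self._latest()
        edition[path] = 1
        self.editions.append(edition)

    def modify_file(self, path):
        edition = self._latest()
        edition[path] += 1  # new file version
        self.editions.append(edition)

    def remove_file(self, path):
        edition = self._latest()
        del edition[path]
        self.editions.append(edition)


@dataclass
class GroupPublication:
    """Stand-in for the dLibra group publication of one Research Object."""
    ro_id: str
    publications: dict = field(default_factory=dict)

    def publication_for(self, version_id):
        # Created on first use, for example when the first file is added.
        return self.publications.setdefault(version_id, Publication(version_id))
```

Every add, modify or remove operation appends a new edition, so earlier states of the Research Object Version remain available.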

Additional technical information

Handling of Modifications in manifest.rdf File

Users are expected to modify only the elements corresponding to the Research Object metadata. The list of resources is automatically regenerated every time any other file in the Research Object directory is created, modified or removed. Therefore, any changes made to the resource list by users will be ignored.

Handling Remote Resources

One may want to manually modify the resource list in order to link to a remote resource that cannot be put directly in the Research Object directory. As stated earlier, this is not a valid approach. The correct way to do this is to create an internet shortcut pointing to the remote resource (a file with the .url extension in Windows; how other systems handle this remains to be checked). The shortcut should be placed directly in the Research Object directory.
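
A Windows .url file is a small INI-style text file with an `[InternetShortcut]` section and a `URL` key. The sketch below writes such a file; the function names are hypothetical, and only the Windows format is covered because the handling on other systems is still open.

```python
def internet_shortcut(url):
    """Return the text of a Windows .url shortcut pointing to `url`."""
    # CRLF line endings are used because the format originates on Windows.
    return "[InternetShortcut]\r\nURL=" + url + "\r\n"


def write_internet_shortcut(path, url):
    """Write the shortcut to `path` (expected to end with .url)."""
    # newline="" stops Python from translating the explicit CRLF endings.
    with open(path, "w", newline="") as handle:
        handle.write(internet_shortcut(url))
```

For example, `write_internet_shortcut("dataset.url", "http://example.org/data.csv")` would place a shortcut named dataset.url in the current directory.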

Discussion about Conflicts

If two or more users edit the contents of the same file at the same time, conflicts arise. In DropBox, this results in the creation of conflicted copies: for each conflict a new file is added, with information about the author and the current date appended to the file name. As we have no means to resolve such conflicts, the conflicted copies will not be treated in any special way. They will simply be added to dLibra as separate files. When users resolve the conflicts manually, the conflicted copies are removed, and the corresponding files are subsequently removed from dLibra.

Conflicts on the manifest.rdf file require special consideration, as this file is used to keep track of changes in metadata and resources. Furthermore, the conflict may be triggered by Connector itself as it updates the file contents after every change in resources. We decided not to address this issue at the moment though, in order to keep the first prototype design simple. We assume that manifest.rdf is relatively small, so DropBox will manage to synchronize it quickly between users. Also, in normal use manifest.rdf is rarely modified by users (only when metadata changes). As a result, the risk of conflicts occurring for this file is minimal.

Mapping between RO structure and internal dLibra data model

General structure

| Research Object | dLibra |
|---|---|
| Research Object | Group publication |
| Version of Research Object | Publication with a single edition whose content is modified every time the data or metadata of the Research Object changes |
| Resource file | File. For each modification of the file a new File Version is created |

Metadata

| ADMIRAL Data Package Information | manifest.rdf | dLibra |
|---|---|---|
| a dataset local identifier | <dcterms:identifier> | group publication name |
| username of creator | <dcterms:creator> | group publication attribute: Creator |
| a one-line title of the dataset | <dcterms:title> | group publication attribute: Title |
| a simple textual description of the dataset | <dcterms:description> | group publication attribute: Description |
| enumeration of the URIs of the resources in the data package | <ore:aggregates rdf:resource="http://some/url"/> | assignment of file versions to edition |
| a version number for the data package | <oxds:currentVersion> | publication name |
| embargo status and date | <oxds:isEmbargoed>, <oxds:embargoedUntil> | TBD (what does this information mean?) |
| date the package was created (submitted) | <dcterms:created> | group publication attribute: Created |
| date the package was last modified | <dcterms:modified> | group publication attribute: Modified |
| reference to any resource from which the data package has been derived | <dcterms:source> | group publication attribute: Source |


TBD:

  • authorization and access control
  • output format of list of links to research objects
  • default file format for RO (zip or rdf with file list)
  • in which XML tag should we store RO_VERSION_ID in manifest.rdf?
  • maybe RO_ID should be specified by the user?
  • dLibra Edition/File Version Creation Frequency Issue
    It would not be reasonable to create a new file version, and immediately afterwards a new publication edition, every time a single file in the Research Object is modified. This would consume a lot of resources for a negligible benefit. A configurable time interval should therefore control how often new editions are created. For example, if this interval is set to one hour, new file versions and a new edition are created once one hour has passed since the last change. This way users can conveniently work on the research object, and the results are imported into dLibra after they finish.
  • how should we handle the situation when a user creates a copy of a directory with multiple files?
  • ADMIRAL specification: Any additional descriptive information may be added to the manifest. Any such value is identified by an arbitrary URI, and may have a value that is one of the XML schema built-in "anySimpleType" datatypes. How should such descriptive information be mapped into dLibra?
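
The edition creation interval raised in the list above could work as sketched here: a new edition becomes due only once the configured interval has passed since the last recorded change. The class and method names are hypothetical, and times are plain seconds supplied by the caller so that the logic stays independent of any particular clock.

```python
class EditionScheduler:
    """Decide when pending changes should be turned into a new edition."""

    def __init__(self, interval_seconds=3600.0):
        self.interval_seconds = interval_seconds
        self._last_change = None  # time of the latest unpublished change

    def record_change(self, now):
        """Remember that a file changed at time `now`."""
        self._last_change = now

    def edition_due(self, now):
        """True if there are pending changes and the interval has elapsed."""
        if self._last_change is None:
            return False
        return now - self._last_change >= self.interval_seconds

    def mark_edition_created(self):
        """Clear the pending state after an edition has been created."""
        self._last_change = None
```

Each new change restarts the waiting period, so a burst of edits results in a single edition after the users have finished.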