Use Case: Synchronizing Research Object with dLibra Digital Library

Scope

This use case describes the behavior of the Connector between a DropBox folder and a dLibra system. The Connector handles automatic publication and synchronization of a Research Object within a Digital Library.

Actors

  • User A, User B: Researchers collaborating on a Research Object. They interact with the rest of the system using only the DropBox interface.
  • Connector: The system under design. It interacts with the Researchers as another DropBox user, having access to their shared directory. It interacts with the dLibra system as a redactor user, having rights to add and modify publications.

Main Scenario

  1. User A creates a new directory for the Research Object inside the shared DropBox directory.
  2. Connector creates a manifest.rdf file inside the directory, containing the basic file structure and a description stub with empty fields.
  3. Connector creates a group publication for the Research Object in dLibra.
  4. User A fills in description fields in manifest.rdf.
  5. Connector updates the metadata of the associated group publication according to the changes in manifest.rdf.
  6. User A adds some research-related files to the directory.
  7. After the first file is added, Connector creates a publication in dLibra for the first Research Object Version.
  8. For each file added by User A, Connector:
    1. creates a corresponding resource element inside manifest.rdf file,
    2. uploads the file to dLibra,
    3. creates a new edition of the corresponding publication, containing the added file.
  9. User B modifies a file in the directory.
  10. Connector updates the corresponding element inside manifest.rdf file.
  11. Connector creates a new version of the corresponding file in dLibra.
  12. Connector creates a new edition of the corresponding publication, associating it with the new version of the file.
  13. User B changes the description of Research Object in manifest.rdf, changing the Research Object Version.
  14. Connector creates a new publication and its first edition in dLibra for the new Research Object Version.
  15. User A removes a file from the directory.
  16. Connector removes the corresponding resource element from manifest.rdf.
  17. Connector creates a new edition of the corresponding publication in dLibra, without the removed file.
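
The scenario above can be read as a mapping from directory changes to ordered Connector actions. The following is only a minimal sketch of that mapping; the event names ("dir_created", "file_added", and so on), the action labels, and the function plan_actions are hypothetical and are not part of any DropBox or dLibra API.

```python
from typing import List

MANIFEST = "manifest.rdf"


def plan_actions(kind: str, path: str = "",
                 has_version: bool = False,
                 version_changed: bool = False) -> List[str]:
    """Return the Connector actions for one change, in scenario order.

    kind            -- hypothetical event name for the observed change
    path            -- path relative to the Research Object directory
    has_version     -- True once the first RO Version publication exists
    version_changed -- True if a manifest edit changed the RO Version
    """
    if kind == "dir_created":  # steps 1-3
        return ["create manifest.rdf stub", "create group publication"]
    if path == MANIFEST and kind == "file_modified":
        if version_changed:  # steps 13-14
            return ["create publication and first edition for new version"]
        return ["update group publication metadata"]  # steps 4-5
    if kind == "file_added":  # steps 6-8
        actions = [] if has_version else ["create publication for first version"]
        return actions + ["add resource element to manifest.rdf",
                          "upload file",
                          "create new edition"]
    if kind == "file_modified":  # steps 9-12
        return ["update resource element in manifest.rdf",
                "create new file version",
                "create new edition"]
    if kind == "file_removed":  # steps 15-17
        return ["remove resource element from manifest.rdf",
                "create new edition without the file"]
    raise ValueError(f"unknown event kind: {kind}")
```

For example, plan_actions("file_added", "data.csv") starts with the creation of the first-version publication, while the same call with has_version=True goes straight to the manifest update.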

Discussion about Conflicts

If two or more users edit the contents of the same file at the same time, a conflict occurs. In DropBox, this results in the creation of conflicted copies: for each conflict a new file is added, with information about the author and the current date appended to the file name. As we have no means to resolve such conflicts, the conflicted copies will not be treated in any special way; they will simply be added to dLibra as separate files. When users resolve the conflicts manually, the conflicted copies are removed, and the corresponding files are subsequently removed from dLibra.

Conflicts on the manifest.rdf file require special consideration, as this file is used to keep track of changes in metadata and resources. Furthermore, the conflict may be triggered by Connector itself as it updates the file contents after every change in resources. We decided not to address this issue at the moment though, in order to keep the first prototype design simple. We assume that manifest.rdf is relatively small, so DropBox will manage to synchronize it quickly between users. Also, in normal use manifest.rdf is rarely modified by users (only when metadata changes). As a result, the risk of conflicts occurring for this file is minimal.

Handling of Modifications in manifest.rdf File

Users are expected to modify only the tokens corresponding to the Research Object metadata. The list of resources is automatically regenerated every time any other file in the Research Object directory is created, modified or removed. Therefore, any changes that users make to the resource list will be ignored.

One may want to manually modify the resource list in order to link to a remote resource that cannot be put directly in the Research Object directory. The correct way to do this is to create an internet shortcut pointing to the remote resource (a file with the .url extension on Windows; how this is handled on other systems still needs to be checked). The shortcut should be placed directly in the Research Object directory.
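
As an illustration of how a shortcut could be resolved to its target: Windows .url files are INI-style text with an [InternetShortcut] section and a URL key, so Python's standard configparser can read them. This is a sketch covering the Windows format only; the function name read_url_shortcut and the sample URL are made up for the example.

```python
import configparser


def read_url_shortcut(text: str) -> str:
    """Return the target URL stored in the text of a Windows .url file."""
    # Interpolation is disabled so that '%' in percent-encoded URLs
    # is kept literally instead of being treated as a placeholder.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser["InternetShortcut"]["URL"]


# Hypothetical shortcut contents pointing at a remote resource.
sample = "[InternetShortcut]\nURL=http://example.org/data/100%25.csv\n"
target = read_url_shortcut(sample)
```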

Connector architecture

The Connector consists of two components:

  • DropBox client that observes changes in the shared Research Object directory and requests relevant operations in the dLibra system. Developed by ???
  • dLibra module that performs operations requested by DropBox client. Developed by PSNC.

Communication between components: the dLibra module will provide a REST interface to the DropBox client.

Mapping from Research Object structure to dLibra objects

Research Object → dLibra

  • Research Object → Group publication
    • Research Object Identifier → Group publication name
  • Version of Research Object → Publication
    • Version identifier (TBD: where to define it? dcterms:identifier?) → Publication name
  • Edition of Research Object → Edition
    • For each modification of the Research Object metadata and/or resource files, a new Edition is created.
  • Resource file → File
    • For each modification of the file, a new File Version is created.
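
To make the nesting in the mapping above concrete, here is a small in-memory model. It is a sketch under assumptions: the class names mirror the dLibra terms but are not dLibra classes, file versions are represented as plain integers, and store_file is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Edition:
    # Maps a resource file path to the File Version included in this edition.
    files: Dict[str, int]


@dataclass
class Publication:
    name: str  # version identifier of the Research Object
    editions: List[Edition] = field(default_factory=list)


@dataclass
class GroupPublication:
    name: str  # Research Object identifier
    publications: Dict[str, Publication] = field(default_factory=dict)


def store_file(publication: Publication, path: str) -> Edition:
    """Record an added or modified file.

    The file gets a new File Version, and a new Edition is appended that
    carries over all other files from the previous edition unchanged.
    """
    previous = publication.editions[-1].files if publication.editions else {}
    files = dict(previous)
    files[path] = files.get(path, 0) + 1
    edition = Edition(files)
    publication.editions.append(edition)
    return edition


ro = GroupPublication("ro1")
ro.publications["v1"] = Publication("v1")
store_file(ro.publications["v1"], "a.txt")  # add a.txt
store_file(ro.publications["v1"], "a.txt")  # modify a.txt
store_file(ro.publications["v1"], "b.txt")  # add b.txt
```

After these three calls the publication has three editions, and the latest one refers to version 2 of a.txt and version 1 of b.txt.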

REST interface description

Vocabulary:

  • BASE_URI - base URI of service, for example http://example.org/wf4ever/
  • RO - research object
  • RO_ID - research object identifier
  • RO_VERSION_ID - identifier of version of RO
  • RO_EDITION_ID - identifier of edition of RO.
  • Dropbox client - software that observes changes in the shared Research Object directory and requests relevant operations in the dLibra system via the REST interface. Developed by ???
  • REST interface – dLibra module that performs operations requested by DropBox client. Developed by PSNC.
  • Connector - software that handles automatic publication and synchronization of a Research Object within a Digital Library. Consists of two components: Dropbox client and REST interface. These components are separated (and should be loosely coupled) so that adding new clients (for example, a WebDAV client) is simple.

Please note:

  • Error codes 401 (Unauthorized), 404 (Not Found) and 500 (Internal Server Error) are omitted from the interface description unless there are special cases in which they are used.
  • After receiving manifest.rdf in response from REST interface, Client should immediately place it in DropBox directory.

Interface:

BASE_URI/research_objects

  • GET - returns list of links to research objects. Output format is TBD.
  • PUT - not allowed
  • POST - this method is used to request a URI for a new RO and should be called after the user creates a new RO directory in their workspace.
    • input: empty body
    • output: 200 (OK) response code with manifest.rdf in response body and Content-Location header with the URI of the new RO
    • REST interface should create and return the manifest.rdf file. This file should contain the tags whose values are to be changed by the user.
  • DELETE - not allowed
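
A client-side sketch of the POST call above, using only Python's standard library. The function names are hypothetical, the base URI is the example value from the vocabulary, and the request is only built here, not sent; sending it would be a call to urllib.request.urlopen.

```python
import urllib.request
from typing import Mapping, Tuple

BASE_URI = "http://example.org/wf4ever/"  # example value from the vocabulary


def build_create_ro_request(base_uri: str = BASE_URI) -> urllib.request.Request:
    """Build the POST request that asks for the URI of a new RO."""
    url = base_uri.rstrip("/") + "/research_objects"
    # The interface expects an empty request body.
    return urllib.request.Request(url, data=b"", method="POST")


def parse_create_ro_response(status: int, headers: Mapping[str, str],
                             body: bytes) -> Tuple[str, bytes]:
    """Return (URI of the new RO, manifest.rdf contents) from a response."""
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    # The client should write the returned manifest.rdf into the
    # DropBox directory immediately.
    return headers["Content-Location"], body


request = build_create_ro_request()
```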

BASE_URI/research_objects/RO_ID

  • GET - returns list of versions of this research object.
    • output: 200 (OK) response code with an RDF file in response body containing OAI-ORE aggregates tags.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - deletes the research object.

BASE_URI/research_objects/RO_ID/current

  • GET - returns most recent edition of the most recent version of RO. Output format is chosen by content negotiation and is either: zip archive with all files from RO or rdf file with list of files in RO.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - not allowed

BASE_URI/research_objects/RO_ID/RO_VERSION_ID

  • GET - returns list of editions of this version of RO
    • output: 200 (OK) response code with an RDF file in response body containing OAI-ORE aggregates tags.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - deletes this version of research object

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/current

  • GET - returns most recent edition of this version of RO. Output format is chosen by content negotiation and is either: zip archive with all files from RO or rdf file with list of files in RO.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - not allowed

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/RO_EDITION_ID

  • GET - returns given edition of this version of RO. Output format is chosen by content negotiation and is either: zip archive with all files from RO or rdf file with list of files in RO.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - not allowed
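
The GET operations above that choose their output by content negotiation could be requested as sketched below. The media types application/zip and application/rdf+xml are an assumption, since this page does not name them, and build_get_ro_request is a hypothetical helper.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote

# Assumed media types for the two output formats.
ZIP = "application/zip"
RDF = "application/rdf+xml"


def build_get_ro_request(base_uri: str, ro_id: str,
                         version_id: Optional[str] = None,
                         edition_id: str = "current",
                         as_zip: bool = True) -> urllib.request.Request:
    """Build a GET request for an RO edition in the preferred format."""
    if version_id is None and edition_id != "current":
        raise ValueError("a specific edition requires a version identifier")
    segments = [ro_id]
    if version_id is not None:
        segments.append(version_id)
    segments.append(edition_id)
    path = "/".join(quote(segment, safe="") for segment in segments)
    url = base_uri.rstrip("/") + "/research_objects/" + path
    accept = ZIP if as_zip else RDF
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


latest = build_get_ro_request("http://example.org/wf4ever/", "ro1")
listing = build_get_ro_request("http://example.org/wf4ever/", "ro1",
                               "v1", "e2", as_zip=False)
```

Here latest asks for the most recent edition of the most recent version as a zip archive, and listing asks for the file list of one specific edition.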

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/RO_EDITION_ID/any/file

  • GET - returns requested file from RO. If requested URI leads to a directory, returns rdf file with list of files in this directory.
  • PUT - not allowed
  • POST - not allowed
  • DELETE - not allowed

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/current/manifest.rdf

  • GET - returns most recent edition of manifest.rdf from given version of RO.
  • PUT - not allowed
  • POST - used for updating manifest.rdf file.
    • input: manifest.rdf in request body
    • output: 200 (OK) response code, Content-Location header with the URI of the new edition of RO, manifest.rdf in response body.
    • changes in research object: new edition of RO is created
    • possible errors: 400 (Bad Request) if manifest.rdf is not well-formed, 409 (Conflict) if manifest.rdf contains incorrect data (for example, one of required tags is missing).
    • if RO_VERSION_ID is changed in manifest.rdf, the system creates a new version of RO. In that case the Content-Location header in the response points to the new version instead of an edition. TBD: what if the new value of RO_VERSION_ID is already in use?
  • DELETE - not allowed
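
On the client side, the outcomes of the manifest.rdf POST described above could be handled roughly as follows. This is a sketch; ManifestUpdate, ManifestRejected and handle_manifest_response are hypothetical names.

```python
from typing import Mapping, NamedTuple


class ManifestUpdate(NamedTuple):
    location: str    # URI of the new edition (or new version) of the RO
    manifest: bytes  # manifest.rdf as returned by the REST interface


class ManifestRejected(Exception):
    """The REST interface refused the submitted manifest.rdf."""


def handle_manifest_response(status: int, headers: Mapping[str, str],
                             body: bytes) -> ManifestUpdate:
    """Translate the response to a manifest.rdf POST into a result or error."""
    if status == 200:
        return ManifestUpdate(headers["Content-Location"], body)
    if status == 400:
        raise ManifestRejected("manifest.rdf is not well-formed")
    if status == 409:
        raise ManifestRejected("manifest.rdf contains incorrect data")
    raise RuntimeError(f"unexpected status {status}")
```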

BASE_URI/research_objects/RO_ID/RO_VERSION_ID/current/any/other/file

  • GET - returns most recent edition of requested file. If requested URI leads to a directory, returns rdf file with list of files in this directory.
  • PUT - not allowed
  • POST - used for adding and updating files.
    • input: file in request body
    • output: 200 (OK) response code, Content-Location header with the URI of the new edition of RO, updated manifest.rdf in response body
    • changes in research object: new edition of RO is created, manifest.rdf is updated to reflect changes in RO structure.
  • DELETE - used for removing files from research object.
    • output: 200 (OK) response code, Content-Location header with the URI of the new edition of RO, updated manifest.rdf in response body.
    • changes in research object: new edition of RO is created, manifest.rdf is updated to reflect changes in RO structure.

BASE_URI/research_objects/RO_ID/current/manifest.rdf

  • GET - not allowed
  • PUT - not allowed
  • POST - used for updating manifest.rdf until the first version of RO is created. This URI should not be used after the first version of RO is created.
    • input: file in request body
    • output: 200 (OK) response code, manifest.rdf in response body
    • possible errors: 403 (Forbidden) if RO already has a version.
  • DELETE - not allowed

BASE_URI/research_objects/RO_ID/current/any/other/file

  • GET - not allowed
  • PUT - not allowed
  • POST - used for adding the first file (other than the manifest) to RO. Adding the file results in the creation of the first version of RO. RO_VERSION_ID is taken from manifest.rdf; the default value is used if the user did not change it. This URI should not be used after the first version of RO is created.
    • input: file in request body
    • output: 200 (OK) response code, Content-Location header with the URI of the new version of RO, updated manifest.rdf in response body
    • possible errors: 403 (Forbidden) if RO already has a version.
  • DELETE - not allowed
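
Because POST targets differ before and after the first version of RO exists, a client has to pick the URI accordingly. A minimal sketch, with upload_uri as a hypothetical helper:

```python
from typing import Optional
from urllib.parse import quote


def upload_uri(base_uri: str, ro_id: str, path: str,
               version_id: Optional[str] = None) -> str:
    """Return the URI to POST a file (or manifest.rdf) to.

    version_id is None until the first version of the RO has been created;
    afterwards the version-specific "current" URI must be used, since the
    version-less one answers with 403 (Forbidden).
    """
    segments = [ro_id]
    if version_id is not None:
        segments.append(version_id)
    segments.append("current")
    prefix = "/".join(quote(segment, safe="") for segment in segments)
    # quote() leaves "/" unescaped by default, so sub-directories survive.
    return (base_uri.rstrip("/") + "/research_objects/"
            + prefix + "/" + quote(path))


first = upload_uri("http://example.org/wf4ever/", "ro1", "data/a b.csv")
later = upload_uri("http://example.org/wf4ever/", "ro1", "manifest.rdf", "v1")
```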

TBD:

  • authorization and access control
  • output format of list of links to research objects
  • default file format for RO (zip or rdf with file list)
  • in which xml tag should we store RO_VERSION_ID in manifest.rdf?
  • maybe RO_ID should be specified by user?
  • dLibra Edition/File Version Creation Frequency Issue
    It would not be reasonable to create a new file version and immediately a new publication edition every time any single file in the Research Object is modified. It would consume a lot of resources and the advantages would be negligible. So a parametrized time interval should be defined to control how often new editions are created. For example, if this interval is set to one hour, then new file versions and a new edition are created once one hour has passed since the last change. This way users can conveniently work on the research object and the results are imported to dLibra after they finish.
  • how should we handle the situation when user creates a copy of directory with multiple files?
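
The time interval proposed in the Edition/File Version Creation Frequency item could work as sketched below: changes are collected, and a single new edition is requested only once the interval has passed since the last change. The class EditionScheduler and its methods are hypothetical; timestamps are passed in explicitly (in seconds) so that the logic does not depend on a particular clock.

```python
from typing import List, Optional, Set


class EditionScheduler:
    """Collect file changes and release them once the RO has been quiet."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval = interval_seconds
        self.pending: Set[str] = set()
        self.last_change: Optional[float] = None

    def record_change(self, path: str, now: float) -> None:
        """Remember a changed file and restart the quiet period."""
        self.pending.add(path)
        self.last_change = now

    def flush(self, now: float) -> List[str]:
        """Return the files for the next edition, or [] if it is too early."""
        if not self.pending or self.last_change is None:
            return []
        if now - self.last_change < self.interval:
            return []
        paths = sorted(self.pending)
        self.pending.clear()
        return paths


scheduler = EditionScheduler(interval_seconds=3600)
scheduler.record_change("a.txt", now=0)
scheduler.record_change("b.txt", now=1800)
too_early = scheduler.flush(now=3600)  # only 1800 s since the last change
batch = scheduler.flush(now=5400)      # a full hour since the last change
```

With a one-hour interval, both changes end up in a single edition instead of two.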