
This page describes the use case of connecting a Dropbox account to a Wf4Ever Research Object service, so that a research object (RO) can be populated and manipulated by editing files and folders synchronized by Dropbox.

Background

When a user creates a new Research Object, the first phase is generally to start populating it with actual files and resources.

For a typical scientist, these files would be stored across a few folders on a laptop or USB stick; more advanced users might use a file server, Dropbox or SVN.

These files can for instance be a selection of:

  • Experiment Protocol (as a Word document)
  • Results from previous experiments
  • New experiment data (from instruments, measurements, etc.)
  • Discarded experiment data (failed calibrations, etc.)
  • Reference data sets (downloaded from official and unofficial sources)
  • Scripts/tools/workflows for processing data
  • Compiled libraries and binaries needed by scripts
  • Raw outputs and logs from running tools over data (unsuccessful ones hidden in a deeper folder)
  • Results compiled from output (Excel)
  • Interpretations and summary tables (Word)
  • Draft 12-page paper publicising the results (LaTeX, Word)
  • Referenced papers (PDFs)
  • Published workshop paper (6 pages, Word and PDF)
  • Presentation done at workshop (Powerpoint)
  • Notes made after talking to people at workshop (OneNote, Notepad, Word)

As users are generally confident working with files and folders, and increasingly use services like Dropbox (http://dropbox.com) for replication, sharing and archiving/backup of their data, the idea arose that Wf4Ever could tap into a user's Dropbox account and use it as an interface for populating research objects. Users would keep working with their files much as before, while Wf4Ever tools add value by letting them add further annotations and templates, and by guiding them towards building a fully fledged research object.

Scenario

A simple scenario of this use case (as seen by the user):

The user already has Dropbox installed and configured on their desktop computer.

  1. User registers their Dropbox account with the service website. They do not need to use the website after this.
  2. Service creates a designated folder, My Research Objects, in the user's Dropbox, into which the user will deposit research objects.
  3. Folder appears on the desktop computer (courtesy of Dropbox)
  4. User creates a subfolder Cool Experiment within this folder
  5. Service recognises the new folder as a new research object called "Cool Experiment"
  6. User adds files and folders to Cool Experiment, for instance data/measurements_2043.csv, results.xls and analysis.doc
  7. Service discovers the files and adds them to the Research Object manifest, which is written to the Dropbox as My Research Objects/Cool Experiment/manifest.rdf (that is, one top-level manifest per research object)
  8. The manifest.rdf file appears. 
    The user has now created a minimal and self-contained research object. They may place this folder structure wherever they want, say on a USB stick or zipped up on their web site. The manifest contains identifiers referring back to the service, and provides the starting point for other Wf4Ever services to annotate the captured resources and the research object itself.
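The mapping in steps 4 to 7 from Dropbox paths to research objects can be sketched as below. This is a minimal illustration, assuming the container folder sits at /My Research Objects in the Dropbox and that every top-level sub-folder is one research object; the function name classify is hypothetical and not part of the service.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Assumed location of the container folder created in step 2.
CONTAINER = "/My Research Objects"


def classify(path: str, container: str = CONTAINER) -> Optional[Tuple[str, str]]:
    """Map a Dropbox path to (research object name, path inside the RO).

    Returns None for paths outside the container and for the container itself.
    An empty inner path means the path is the RO folder itself.
    """
    try:
        rel = PurePosixPath(path).relative_to(container)
    except ValueError:  # not below the container folder
        return None
    if not rel.parts:  # the container folder itself
        return None
    ro_name, *inner = rel.parts
    return ro_name, "/".join(inner)
```

For example, /My Research Objects/Cool Experiment/data/measurements_2043.csv maps to the research object "Cool Experiment" with the inner path data/measurements_2043.csv. Dropbox treats paths case-insensitively, which this sketch ignores.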

One can easily imagine a few interesting extensions to this scenario:

  • A UI (web) for browsing the research objects, their resources and metadata
  • A UI for adding additional metadata: title, description, intended usage
  • A UI for making links between resources, such as "X was produced by Y using Z". (Rudimentary provenance)
  • A template system: the user creates an RO folder AstroExperiment X, and the service can create additional folder structures (data, scripts, outputs) and automatically annotate the resources according to their placement in this structure
  • "RO-aware" software (say Taverna with a Wf4Ever plugin) recognizes the manifest.rdf and updates it to provide additional information about files it is saving to the RO folder

Project focus

This use case focuses on the initial phase of populating the research object. It will give users a taste of what making research objects means, and help the Wf4Ever project identify the properties and requirements of research objects and their preservation. Implementing the use case will require a rudimentary RO model, touch on issues such as versioning and sharing, and form parts of the intended architecture. As this is the project's first prototype, it is important to stress that the use case is not meant to cover the full life-cycle of research objects, nor is the plan to keep the Dropbox integration as the main way to interact with Wf4Ever. It is, however, an interesting yet fairly straightforward scenario that will help further planning and development of the Wf4Ever reference implementation.

Implementation

This use case was implemented by developing two new components, ROBox and ROSRS, reusing our existing product dLibra, and using the external Dropbox API.

Signing up

ROBox is the user's first point of contact with the service, and guides the user step by step through signing up. The first step is to authenticate via Dropbox's web site, which returns an authentication token. (ROBox therefore never stores, or even has access to, the user's Dropbox credentials.)

Second, the user is asked to provide the name of the desired "container" folder in Dropbox. ROBox creates this folder using the Dropbox API and the received authentication token.

Next, ROBox communicates with the REST API of ROSRS, the Research Object Storage and Retrieval Service. ROSRS can be thought of as independent of Dropbox; ROBox is just one of many potential clients of the ROSRS API.

Using an admin credential, ROBox creates a ROSRS workspace with a randomized ID and an internally stored password, both needed later for accessing the workspace through the API. This workspace can be thought of as the user's home directory of research objects in ROSRS, and corresponds to the container folder in Dropbox.
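Generating a randomized workspace ID and password could look like the following sketch. The page does not say how ROBox generates these values; using Python's secrets module, the lengths chosen and the WorkspaceCredentials name are assumptions, and the REST call that actually creates the workspace in ROSRS is left out.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceCredentials:
    """Hypothetical holder for the ROSRS workspace ID and password."""
    workspace_id: str
    password: str


def new_workspace_credentials() -> WorkspaceCredentials:
    # secrets (unlike random) draws from the OS's cryptographically secure source.
    return WorkspaceCredentials(
        workspace_id=secrets.token_hex(16),  # 16 random bytes -> 32 hex characters
        password=secrets.token_urlsafe(24),  # 24 random bytes -> 32 URL-safe characters
    )
```

A hard-to-guess ID keeps workspaces from being enumerable through the API, while the password stays internal to ROBox and is never shown to the user.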

ROSRS communicates with a running dLibra instance using Java RMI, through which a new dLibra user is created whose username and password correspond to the workspace credentials.

ROBox stores the Dropbox authentication token and workspace ID/password in its internal database.
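A minimal sketch of a table holding these values, using SQLite for illustration. The page does not specify ROBox's database or schema, so the table, column and function names are hypothetical; a production deployment would also want to protect the stored token and password rather than keep them in plain text.

```python
import sqlite3

# Hypothetical schema: one row per signed-up user.
SCHEMA = """
CREATE TABLE IF NOT EXISTS account (
    dropbox_uid   TEXT PRIMARY KEY,  -- Dropbox user ID
    dropbox_token TEXT NOT NULL,     -- token from the Dropbox authentication step
    container     TEXT NOT NULL,     -- name of the container folder in Dropbox
    workspace_id  TEXT NOT NULL,     -- randomized ROSRS workspace ID
    workspace_pw  TEXT NOT NULL      -- internally stored workspace password
)
"""


def open_db(filename: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(filename)
    conn.execute(SCHEMA)
    return conn


def register_account(conn: sqlite3.Connection, uid: str, token: str,
                     container: str, workspace_id: str, workspace_pw: str) -> None:
    # "with conn" commits on success and rolls back if the insert fails.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO account VALUES (?, ?, ?, ?, ?)",
            (uid, token, container, workspace_id, workspace_pw),
        )


def workspace_for(conn: sqlite3.Connection, uid: str):
    """Return (workspace_id, workspace_pw) for a user, or None if unknown."""
    return conn.execute(
        "SELECT workspace_id, workspace_pw FROM account WHERE dropbox_uid = ?", (uid,)
    ).fetchone()
```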

The user has now completed the sign-up process and can start using the freshly created Dropbox folder on their desktop computer.

Adding files

ROBox now starts a background job that repeatedly checks this user's Dropbox folder for changes.

When the user creates a new sub-folder within the selected container folder, the ROBox background job picks this up. It accesses the ROSRS workspace and creates a new research object whose name corresponds to the folder name.
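The folder-detection part of each polling round can be sketched as follows. It assumes the listing of the container's immediate sub-folders is available as a callable and that ROBox remembers which folders it has already turned into research objects; both callables and all names are hypothetical placeholders for the real Dropbox and ROSRS calls, and the polling interval is not given on this page.

```python
from typing import Callable, Iterable, Set


def detect_new_research_objects(
    list_subfolders: Callable[[], Iterable[str]],
    known: Set[str],
    create_research_object: Callable[[str], None],
) -> Set[str]:
    """One polling round: create a research object for each unseen sub-folder.

    `known` holds the folder names already mapped to ROs and is updated in place.
    """
    new = set(list_subfolders()) - known
    for name in sorted(new):          # deterministic order
        create_research_object(name)  # placeholder for the ROSRS "create RO" request
        known.add(name)               # only marked as known once creation succeeded
    return new
```

Calling this repeatedly with the same listing is harmless: folders already in `known` are skipped, so each sub-folder becomes a research object exactly once.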

The ROSRS will connect to dLibra and create a new Group publication corresponding to the RO.

ROBox then creates a version of the research object in which files can be stored. ROSRS creates a corresponding dLibra publication, so a version is a higher-level concept than an SVN revision; it is more like an edition of a book. ROBox therefore currently creates only one version: using the Dropbox interface can be seen as a way to build and prepare the very first version (publication) of the research object, even if this involves multiple additions, modifications and removals of the files contained in the RO.

As the user adds files to this research object folder in Dropbox, the background job will proceed by uploading each of the files to ROSRS, which stores them as Files of the publication in dLibra. 

Generating a manifest

After completing a round of synchronisation for a user, ROBox requests the manifest of the version from ROSRS.

The manifest is an RDF file modelled on the ADMIRAL data package information, realising the first minimal Research Object model. ROSRS builds this manifest by combining metadata stored in dLibra (such as the Creator of the Publication) with the listing of aggregated files in the research object.
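A sketch of how such a manifest could be assembled with Python's standard library. The RDF/XML shape, the use of the OAI-ORE ore:aggregates and Dublin Core dcterms:creator terms, the function name and the example URI are assumptions for illustration; the page does not show the actual ADMIRAL-based vocabulary that ROSRS emits.

```python
import xml.etree.ElementTree as ET
from typing import Iterable
from urllib.parse import quote

# Vocabularies assumed for illustration; the real manifest follows the ADMIRAL-based model.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
DCT = "http://purl.org/dc/terms/"
for prefix, uri in (("rdf", RDF), ("ore", ORE), ("dcterms", DCT)):
    ET.register_namespace(prefix, uri)


def build_manifest(ro_uri: str, creator: str, files: Iterable[str]) -> bytes:
    """Combine publication metadata (the creator) with the list of aggregated files."""
    root = ET.Element(f"{{{RDF}}}RDF")
    ro = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ro_uri})
    ET.SubElement(ro, f"{{{DCT}}}creator").text = creator
    base = ro_uri.rstrip("/") + "/"
    for path in sorted(files):
        # Percent-encode spaces etc., keeping "/" so sub-folders stay visible in the URI.
        ET.SubElement(ro, f"{{{ORE}}}aggregates", {f"{{{RDF}}}resource": base + quote(path)})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```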

ROBox uploads manifest.rdf to the Dropbox folder of that particular research object, and Dropbox ensures this file appears on the user's desktop computer.

Modifications to the research object files

The user continues to add, modify or remove files in the research object's folder. Each of these files is considered contained in, or aggregated by, the research object, so the ROBox background job continually monitors the Dropbox folder for such changes.

Internally the ROBox job has stored the synchronisation state in its database, and by comparing Dropbox revision numbers it can detect if a file or folder has changed. It proceeds by synchronizing these files to ROSRS as it did previously. New files are simply added.
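The revision comparison can be sketched as a diff between the stored synchronisation state and the current Dropbox listing, both represented as maps from path to revision. The names are hypothetical, paths are assumed to be relative to the research object folder, and manifest.rdf is skipped because ROBox writes that file itself rather than synchronising it back (see Modifications to the manifest).

```python
from typing import Dict, List, NamedTuple

MANIFEST = "manifest.rdf"  # written by ROBox, never uploaded from Dropbox


class Changes(NamedTuple):
    added: List[str]
    modified: List[str]
    removed: List[str]


def diff_revisions(stored: Dict[str, str], current: Dict[str, str]) -> Changes:
    """Compare path -> revision maps from the last round and from Dropbox now."""
    old = {p: r for p, r in stored.items() if p != MANIFEST}
    new = {p: r for p, r in current.items() if p != MANIFEST}
    added = sorted(p for p in new if p not in old)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    removed = sorted(p for p in old if p not in new)
    return Changes(added, modified, removed)
```

The three lists correspond to the three cases handled below: added files are uploaded, modified files are uploaded as new versions, and removed files trigger a deletion request.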

Modifications to an existing file are handled by ROBox uploading the new content to ROSRS, which creates a new dLibra File Version and updates the Publication edition to refer to the new version instead of the old one.

If a previously synchronized file no longer appears in the Dropbox folder, ROBox will request its deletion from ROSRS, which will proceed to remove the link to the File Version from the publication. (The actual file content is still stored in dLibra, but no longer included in the RO).
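The resulting behaviour can be modelled in a few lines. This in-memory sketch only mirrors the semantics described above and is not dLibra's API: each upload adds a new file version and repoints the edition at it, while a deletion removes the link but keeps the stored content. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PublicationStore:
    """Toy in-memory model of one RO version (a dLibra publication edition)."""
    # Every uploaded File Version per path; nothing here is ever deleted.
    versions: Dict[str, List[bytes]] = field(default_factory=dict)
    # What the edition currently aggregates: path -> index into versions[path].
    links: Dict[str, int] = field(default_factory=dict)

    def upload(self, path: str, content: bytes) -> None:
        history = self.versions.setdefault(path, [])
        history.append(content)              # new File Version
        self.links[path] = len(history) - 1  # edition now points at the newest one

    def delete(self, path: str) -> None:
        self.links.pop(path, None)           # unlink only; content stays in versions

    def aggregated(self) -> List[str]:
        return sorted(self.links)

    def current(self, path: str) -> bytes:
        return self.versions[path][self.links[path]]
```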

Finally ROBox pulls back the modified manifest to replace the manifest.rdf in the Dropbox folder.
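Putting the steps together, one synchronisation round for a single research object could be ordered as in the following sketch. The callables stand in for the Dropbox and ROSRS calls, which are not specified on this page; the function and parameter names are hypothetical.

```python
from typing import Callable, Iterable


def sync_round(
    added: Iterable[str],
    modified: Iterable[str],
    removed: Iterable[str],
    upload: Callable[[str], None],
    delete: Callable[[str], None],
    fetch_manifest: Callable[[], bytes],
    write_manifest: Callable[[bytes], None],
) -> None:
    """Push one round of changes to ROSRS, then refresh manifest.rdf in Dropbox."""
    for path in [*added, *modified]:
        upload(path)  # a new file, or a new dLibra File Version for a changed one
    for path in removed:
        delete(path)  # ROSRS unlinks the File Version; the content is retained
    # Only once the round is complete: pull the regenerated manifest back.
    write_manifest(fetch_manifest())
```

Fetching the manifest only at the end of the round means it is requested once per round rather than once per changed file.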

Modifications to the manifest

Although the ROSRS API supports user modifications to the manifest, such as adding custom annotations or modifying the Research Object metadata, ROBox does not currently synchronize Dropbox changes to manifest.rdf back to ROSRS. The reason is that doing so can quickly introduce conflicts, as part of the manifest is the aggregation of files. In an early test, a user edited the manifest file manually but produced invalid XML, which immediately required complex error and conflict management.

Later iterations of this prototype are planned to provide a web user interface instead, presenting the user with a structured way to provide metadata about the RO and its resources. This metadata can then be stored directly in the manifest.

Architecture

The architecture diagram is also available as PDF, or as the original OmniGraffle file in Dropbox at Wf4Ever/M6 Deliverables/d1.2-robox-architecture.graffle.
