In this document, we have compiled the discussion (mainly Graham's comments, along with Kevin's and ours) that followed the implementation of the first version of the RO SRS. The output of this discussion will be used for the next iteration of the service. We will first create a new page for the second version and then continue with the development tasks.
In "description of RO structure":
- - manifest.rdf is an RDF/XML file (not just XML).
PSNC: Sure, we will update it.
- - For use with a web interface, I would make a bigger distinction between the manifest submitted with a RO and the manifest presented by the RO SRS. I've updated the page on ADMIRAL data packages, and added examples there, to explicate this distinction: http://www.wf4ever-project.org/wiki/display/docs/Data+packages+in+ADMIRAL
PSNC: Check the tags in our manifest. Is there a particular reason such a distinction is needed? We don't want to focus much on the manifest details as it is not a final implementation.
GK: not especially - I was mainly trying to underscore that the manifest was composed from information supplied by both the user and the repository.
In "Main scenario":
- - What happens if the user deletes an entire directory containing a manifest.rdf file? (I guess, for now, the answer is "don't do that".)
PSNC: Depends on the connector – we assume one manifest per RO and deleting a folder containing the manifest is equivalent to deleting the RO. RO can be deleted from the ROSRS.
GK: I think that's OK, but I'd like to review it in light of clarification of what the URI is presumed to identify. (See also comment towards the end about enumerating resources and URIs)
Before "REST Interface specification":
- - I think it would be useful to have an additional section that clarifies the linkage between the dropbox directory and the REST interface; i.e. to describe the prototype service. Maybe by reference to http://www.wf4ever-project.org/wiki/download/attachments/1179685/20110207072.jpg.
PSNC: We created some diagrams illustrating the interaction between the Dropbox connector and RO SRS. However, they became outdated after the last changes in the interface. We will work on this next week. Additionally, JITS will need to clarify, in the specification of the Dropbox connector, how Dropbox directories interact with the connector.
- - check: does the term "Prototype" refer to the entire system under test, or just the dLibra service and REST interface? I've assumed the latter.
PSNC: We assumed the former, as it is explained at the beginning of the document. We will check the rest of the document to avoid confusion.
- (This raises an issue for me: should I be implementing tests against the dropbox interface or against the REST interface? In the long run, the answer is probably "both", but for this first round I think the appropriate focus is on the interface that is closest to the user - i.e. the DropBox. But I think it would be easiest to create tests against the REST interface. I'll return to this in another message.)
PSNC: Basic REST interface tests are already implemented by PSNC. We assumed the tests you refer to will be for the entire prototype, mainly against the dropbox interface, but you can create additional tests against the REST interface if you want.
GK: it turns out what you suggest is what I've done. (I also like to have tests at the interface between developers that serve to clarify and codify the necessary common understanding - I guess your tests can serve this purpose.)
- - BASE_URI/workspaces/WORKSPACE_ID
The semantics described for PUT are, I think, slightly at odds with HTTP, specifically in the use of the 409 response. Normal HTTP expectations would be that a PUT operation replaces the current state for that resource if it already exists.
- More broadly, for a REST interface, I think we need some clarity about what resource is represented by BASE_URI/workspaces/WORKSPACE_ID: as described for PUT, it seems that the resource here is the password for the identified workspace, rather than the workspace as a whole, which would seem more intuitive, and more in line with the use of DELETE.
(KEVIN +1 - Agreed, with the proviso that we shouldn't get too hung up on what a workspace is until we're clearer what an RO is, and the social sharing aspects from WP3)
- I would suggest a possible alternative for consideration would be:
(a) use POST to BASE_URI/workspaces/ rather than PUT to BASE_URI/workspaces/WORKSPACE_ID, with input containing WORKSPACE_ID and user and password and maybe other metadata. The use here of a 409 response would be more normal.
PSNC: Good point (PUT will be changed to POST).
- There's a potential security problem here: the PUT/POST containing a password clearly must be on an HTTPS rather than HTTP connection. A common way to force HTTPS is to use HTTP redirect. But if the original request is sent via HTTP prior to the redirect, the password leakage has already happened. So I think that any request that contains confidential/credential material should be sent only to a URI provided by the service.
PSNC: We will move the entire interface to HTTPS.
(b) use GET to BASE_URI/workspaces/ to retrieve a list of workspace IDs.
PSNC: ??? – you should know the workspace id and its password to access it.
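To make the PUT/POST distinction discussed above concrete, here is a minimal Python sketch. It is not the RO SRS implementation: `WorkspaceStore`, its method names and the in-memory dictionary are hypothetical stand-ins, and the choice of 201 and 204 as success codes is an assumption. It only shows why a 409 response fits a POST to the collection URI better than a PUT to the item URI.

```python
from http import HTTPStatus


class WorkspaceStore:
    """In-memory stand-in for BASE_URI/workspaces/ (hypothetical, illustration only)."""

    def __init__(self):
        self._workspaces = {}

    def post(self, workspace_id, password):
        """POST to the collection: create a workspace, or report a conflict."""
        if workspace_id in self._workspaces:
            # The ID is already taken, so creation is refused.
            return HTTPStatus.CONFLICT
        self._workspaces[workspace_id] = {"password": password}
        return HTTPStatus.CREATED

    def put(self, workspace_id, password):
        """PUT to the item URI: create the workspace or replace its state."""
        created = workspace_id not in self._workspaces
        self._workspaces[workspace_id] = {"password": password}
        return HTTPStatus.CREATED if created else HTTPStatus.NO_CONTENT


store = WorkspaceStore()
print(store.post("ws1", "secret"))   # first creation succeeds
print(store.post("ws1", "other"))    # second creation conflicts
print(store.put("ws1", "replaced"))  # PUT replaces instead of conflicting
```

With PUT, a repeated request silently replaces the existing state, which is the usual HTTP expectation; with POST, refusing the duplicate is the natural behaviour.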
- - BASE_URI/workspaces/WORKSPACE_ID/ROs/RO_ID similar comments to above
PSNC: Agree. Move to POST
- The use of GET to retrieve a list of versions seems particularly surprising to me. Further, I don't think that returning a list of ore:aggregates values is the same as returning a list of versions. I don't think it is the intent of ORE that an object is an aggregation of versions, but rather an aggregation of component elements; Cf. http://www.openarchives.org/ore/1.0/datamodel.html#Introduction:
"But frequently a logical unit of web information is actually an aggregation of Resources."
The ORE spec goes on to list here examples of aggregations, none of which are collections of versions.
(KEVIN +1 - I suspect you're right regarding ORE. I think the wider issue (which you picked up on below) is whether the resource at this URI is a RO as to be defined by the model from the RO TF, and how this will handle versioning (through an ore:aggregate and extensions, or something completely different?))
PSNC: Agree. Do you have any suggestion for the format to be used to return the list of versions? Similarly, any suggestion about the format of the response to a GET sent to BASE_URI/workspaces/WORKSPACE_ID/ROs to return the list of ROs?
- - BASE_URI/workspaces/WORKSPACE_ID/ROs/RO_ID/RO_VERSION_ID
For GET, the indicated use of content negotiation here seems potentially problematic. I think it is not the intent of HTTP content negotiation to select fundamentally different information about a resource (here, a ZIP file or RDF description of contents is suggested). I can't find a definitive normative statement about this, but all the normative discussion of content negotiation talks about variations of format or representation rather than fundamental variations of content.
(KEVIN +1 - Content negotiation should return different representations of the same information resource (i.e. the "content" of the resource should be the same))
In this case, I think it would be better to have distinct URIs for the RDF description of contents and the contents themselves. By my reading, the OAI/ORE spec says something similar:
(KEVIN +1 - Although this doesn't rule out the possibility of also having a genuinely common information resource that is returned by content negotiation (e.g. an HTML version equivalent to the RDF rendered for humans).)
PSNC: Agree. We suggest adding a query parameter for accessing the ZIP file (as for accessing contents of individual files), i.e., BASE_URI/workspaces/WORKSPACE_ID/ROs/RO_ID/RO_VERSION_ID?content
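A rough Python sketch of the `?content` proposal follows. The function name `select_representation`, the example.org host and the media types chosen for each case are assumptions for illustration; the discussion only fixes that the plain URI yields the RDF description and the `?content` URI yields the ZIP.

```python
from urllib.parse import parse_qs, urlsplit


def select_representation(uri):
    """Return the media type a server might serve for an RO version URI.

    Hypothetical helper: a bare URI is taken to mean the RDF description,
    and a URI carrying the `content` query parameter the ZIP archive.
    """
    query = urlsplit(uri).query
    # keep_blank_values=True so that a bare "?content" is still seen as a key.
    params = parse_qs(query, keep_blank_values=True)
    if "content" in params:
        return "application/zip"
    return "application/rdf+xml"


version_uri = "https://example.org/workspaces/ws1/ROs/ro1/v1"  # placeholder URI
print(select_representation(version_uri))
print(select_representation(version_uri + "?content"))
```

Because the two URIs differ, each one identifies a single resource, and content negotiation remains available for alternative formats of the same description.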
- Using POST to a version-specific URI to create a new version is surprising to me. I'd suggest POST to a version-agnostic URI for the RO for this.
PSNC: Agreed. It will be moved one level up.
- It seems to me that the use of POST here is not consistent with previous examples that use PUT for similar types of operation (e.g. BASE_URI/workspaces/WORKSPACE_ID, etc.), though I can see that PUT would not be appropriate in this case.
PSNC: After changes mentioned earlier it will be consistent (POST).
- - BASE_URI/workspaces/WORKSPACE_ID/ROs/RO_ID/RO_VERSION_ID/manifest.rdf
At first encounter, it seems that PUT would more closely match the intended semantics than POST; i.e. supplying new values that replace the current values. Of course, the SRS does supply and return additional values, which is at odds with some readings of PUT.
The more general problem here, I think, is the extent to which the manifest is considered to be a resource in its own right, or part of the state of the RO resource, which might more closely reflect its actual use. (This doesn't exclude having a read-only URI for accessing a representation of the manifest, IMO.) If the manifest were updated by posting values to the RO URI, I think this might better match expectation about the use of PUT, POST, etc.
(KEVIN - It may come down to the mechanics of how we need to use the manifest for the local dropbox client, where it may be useful to always have an explicit manifest resource. Otherwise, I would hope much of the manifest would end up in the RO resource)
PSNC: +1. TODO: remove manifest.rdf as a special URL. Move up one level meaning of the URLs:
- Accessing/modifying the metadata:
Creating new RO versions (POST):
Creating new RO (POST):
Creating new Workspace (POST):
- - BASE_URI/workspaces/WORKSPACE_ID/ROs/RO_ID/RO_VERSION_ID/any/other/file
In this case, I really expect the URI to denote the resource itself, not the metadata about it. I think the operation used to replace the file contents really should be an HTTP PUT. Then GET should return the content, not metadata. To access the metadata, one can use a different URI (e.g. with a query parameter).
KEVIN probably. Pragmatically, I completely agree. The files themselves are clearly information resources, and the behavior should be as you say. I suspect there's agreement here that we need to be careful about what metadata you get back when you dereference a URI, and identifying information resources and non-information resources is probably part of this -- again, this comes back to what a "file" is in the context of a RO, and how different copies and/or versions are linked.
PSNC: We propose to leave it like this at this stage of development. We decided to follow this approach after taking into account Jun's feedback. Besides, we are being consistent and plan to adopt the same approach for getting ZIP with the research object. I.e.,
- RO (content as ZIP):
- file (content):
KEVIN - My reservations (more of a wait-and-see) are:
a) at what level we care about versions? If we were to care about versions of these constituent files, I think we should handle versions in the same manner as ROs, which implies RDF metadata here rather than the files. I also think I recall that the dLibra model can have file versions independent of aggregations (e.g. an explicit file version included in several aggregations) so the above may not be sufficient for that.
b) how the wf4ever RO model handles versions of resources within an RO; we may end up needing file versioning because we're implementing this (and again, your thoughts are equally useful input to working out the model).
PSNC: See comments above
GK: I must confess that my own view on this may be changing in light of some discussions we've been having around the ADMIRAL system. Roughly, when accessing the RO, if content negotiating for HTML returns a description page, or landing page, then asking for RDF naturally returns the manifest. If the URI denotes the RO itself, these can be viewed as (partial) representations of the state of the RO. I'd find it easier to think about this if we went through an exercise of defining what resources we deal with, and their corresponding URIs.
GK: But a paper Carole mentioned (http://home.badc.rl.ac.uk/lawrence/blog/2011/01/07/citation,_digital_object_identifiers,_persistence,_correction_and_metadata) implies that the "landing page" should be distinct from data+metadata+.... I'm still digesting this viewpoint.
- - BASE_URI/workspaces/WORKSPACE_ID/ROs/RO_ID/RO_VERSION_ID
reading this, I better understand some of the comments that have been made about URIs for RO components vs URIs for ROs. It seems to me that these concerns would be substantially alleviated if the URI query syntax were used for this rather than the URI hierarchy syntax. E.g.
By simply dropping the version= parameter, the latest version could be assumed.
Suppose I work with a base URI relating to particular RO version:
Does it make sense to use the relative URI "../RO_VERSION_OTHER" to access another version of the same RO? Marginally, maybe. But then consider that the working base URI is:
Can I easily construct a relative URI to access "file" from a different version of the RO? Not quite so easy. But with this base URI:
I can use:
But if I use just:
as a relative URI, the version id from the base URI is not carried over.
(a) using query parameters for version identification alleviates the concerns about distinguishing URIs of resources within ROs from the URIs of ROs in the face of version identification.
(b) using URI path elements for versioning makes it easier to have relative references within a nominated RO version.
At this stage, I'm not sure what is most important.
PSNC: Let's wait for more user requirements related to versioning. At this stage we want to avoid the problem of "latest version" identification.
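The trade-off between points (a) and (b) can be checked with standard relative-reference resolution. The Python sketch below uses `urllib.parse.urljoin`; the example.org host, the identifiers `ws1`, `ro1`, `v1`, `v2` and the query parameter name `version` are placeholders chosen for illustration, not URIs taken from the specification.

```python
from urllib.parse import urljoin

# Hypothetical RO URI; host and identifiers are placeholders.
RO = "http://example.org/workspaces/ws1/ROs/ro1"

# Version as a path element: a plain relative reference stays in the same version.
path_base = RO + "/v1/dir/file"
same_version = urljoin(path_base, "other")
# Reaching another version means climbing back up the hierarchy.
other_version = urljoin(path_base, "../../v2/dir/file")

# Version as a query parameter: a plain relative reference drops the query.
query_base = RO + "/dir/file?version=v1"
version_lost = urljoin(query_base, "other")
# A query-only reference keeps the path and swaps the version.
version_swapped = urljoin(query_base, "?version=v2")

for label, uri in [
    ("path, same version:  ", same_version),
    ("path, other version: ", other_version),
    ("query, version lost: ", version_lost),
    ("query, version swap: ", version_swapped),
]:
    print(label, uri)
```

With path-based versions the relative reference to a sibling file keeps the version, but switching version depends on how deep the file sits in the hierarchy. With query-based versions the switch is a fixed `?version=...` reference, but sibling references lose the version.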
- (Almost) separately from the above considerations, at what level do we need the SRS to apply versioning: at the level of ROs, or within ROs? I.e. does it make sense to talk of (say)
My view is not. That is, if the service versions ROs, then file versions can be associated with RO versions (e.g. like a Subversion code repository). This does not preclude the possibility of associating version information with individual files via metadata in the manifest: this is a separate issue from having file versioning supported by the SRS interface.
(As an implementation matter, I'd expect the underlying repository to optimize the use of RO storage when (say) only one file changes between versions, but that optimization doesn't matter at the interface.)
But if it is desired to expose file versioning separately from RO versioning, then I'd suggest that using URI query parameters would be a cleaner way to do this; e.g. BASE_URI/workspaces/WORKSPACE_ID/ROs/RO_ID/any/other/file?RO_version=RO_VERSION_ID&File_version=FILE_VERSION_ID
PSNC: Let's wait for more user requirements related to versioning. The mentioned storage optimization can be implemented with dLibra mechanisms.
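If file versioning were exposed through query parameters as suggested, client code could build and read such URIs along the following lines. This is only a sketch: the helper names `file_uri` and `versions_of` are hypothetical, the parameter names `RO_version` and `File_version` are taken from the example URI above, and how the service would treat a missing parameter is left open, as in the discussion.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def file_uri(workspace_uri, ro_id, path, ro_version=None, file_version=None):
    """Build a file URI, adding only the version parameters that are given."""
    params = {}
    if ro_version is not None:
        params["RO_version"] = ro_version
    if file_version is not None:
        params["File_version"] = file_version
    uri = f"{workspace_uri}/ROs/{ro_id}/{path}"
    return f"{uri}?{urlencode(params)}" if params else uri


def versions_of(uri):
    """Extract (RO_version, File_version) from a URI; None where absent."""
    params = parse_qs(urlsplit(uri).query)
    ro = params.get("RO_version", [None])[0]
    fv = params.get("File_version", [None])[0]
    return ro, fv


# Placeholder workspace URI and identifiers.
uri = file_uri("https://example.org/workspaces/ws1", "ro1", "any/other/file",
               ro_version="2", file_version="5")
print(uri)
print(versions_of(uri))
```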