Gap Analysis between astro needs and the current RO model
Right now the RO captures the provenance of the workflow, but not the provenance of the RO itself or of its components. This issue has already been raised in several email discussions this week:
The creation, editing, deletion, etc. of an RO should be recorded automatically by the portal rather than annotated manually by the users.
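A minimal sketch of what automatic recording could look like, assuming every change to an RO goes through the portal: the user only adds, updates or deletes resources, and the corresponding event is logged as a side effect instead of being annotated by hand. The classes `RoEvent` and `ResearchObject` and their methods are hypothetical; they are not part of Piotr's portal or of any existing RO API, and mapping these events onto OPM or another provenance vocabulary is left open.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set


@dataclass(frozen=True)
class RoEvent:
    """One provenance record, written by the portal rather than by the user."""
    timestamp: datetime
    agent: str     # user acting through the portal
    action: str    # "create", "add", "update" or "delete"
    target: str    # the RO itself or one of its aggregated resources


@dataclass
class ResearchObject:
    """Hypothetical RO whose mutating methods log an event automatically."""
    uri: str
    creator: str
    resources: Set[str] = field(default_factory=set)
    history: List[RoEvent] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Creating the RO is itself the first recorded event.
        self._record(self.creator, "create", self.uri)

    def _record(self, agent: str, action: str, target: str) -> None:
        self.history.append(RoEvent(datetime.now(timezone.utc), agent, action, target))

    def add(self, agent: str, resource: str) -> None:
        self.resources.add(resource)
        self._record(agent, "add", resource)

    def update(self, agent: str, resource: str) -> None:
        if resource not in self.resources:
            raise KeyError(resource)
        self._record(agent, "update", resource)

    def delete(self, agent: str, resource: str) -> None:
        self.resources.remove(resource)  # raises KeyError if the resource is absent
        self._record(agent, "delete", resource)
```

For example, after `ro = ResearchObject("ro:example", "alice")`, `ro.add("alice", "data/input.fits")` and `ro.delete("alice", "data/input.fits")`, `ro.history` holds three events (create, add, delete) even though nobody annotated them.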
Linkage of the RO resources with the entities of the workflow. Looking at the Taverna export, several questions were raised:
Is the export going to be an RDF file, or is the tool going to include some functionality to communicate with Piotr's portal?
If we choose the former, we will need a platform for uploading the RDF content itself, not just the file. Also, how are we going to connect the entities in the RDF to the resources already available in the RO?
If we choose the latter, would Taverna automatically upload all the input files, output files and code of the workflow to the RO through the portal? This does not seem to be within the scope of the project, because we would be building everything around Taverna.
Additional comment by Pique: it should be possible to import the code/scripts of parts of the RO. Ideally, the RO would contain all the components of the workflows (Wfs) involved in the experiment, not only the digital Wfs built in Taverna. So far we have considered adding to the RO the data files, the digital Wfs and some other files such as bibliography. It would be good to also consider how to add the processes inside a Wf, such as Web services or Python scripts, in order to link and visualize all the minimal components of the experiment in the overall RO picture.
Capturing dependencies between parts of the RO:
How can I state that I used some bibliography to build the workflow, or specific parts of it?
How can I annotate the dependencies of the workflow on external services, tools or libraries (including their version, URL, etc.)?
Other dependencies useful to build the workflow
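As a sketch of how these questions could be answered with plain statements, assuming a triple-style annotation model: bibliography can be attached to the whole workflow or to a specific part of it, and each external dependency carries its kind, version and URL. The `ex:` predicates, the `wf:`/`dep:`/`ref:` identifiers and the class names are hypothetical placeholders, not terms of an existing RO vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Dependency:
    """An external service, tool or library that a workflow (or part of it) relies on."""
    kind: str                     # e.g. "service", "tool", "library"
    name: str
    version: Optional[str] = None
    url: Optional[str] = None


@dataclass
class DependencyAnnotations:
    """Collects hypothetical 'used bibliography' and 'depends on' statements for one RO."""
    triples: List[Triple] = field(default_factory=list)

    def used_bibliography(self, target: str, reference: str) -> None:
        # target may be the whole workflow or an explicit part of it (e.g. one step)
        self.triples.append((target, "ex:usedBibliography", reference))

    def depends_on(self, target: str, dep: Dependency) -> None:
        dep_id = f"dep:{dep.kind}/{dep.name}"
        self.triples.append((target, "ex:dependsOn", dep_id))
        self.triples.append((dep_id, "ex:kind", dep.kind))
        # version and URL are optional, so only record them when known
        if dep.version is not None:
            self.triples.append((dep_id, "ex:version", dep.version))
        if dep.url is not None:
            self.triples.append((dep_id, "ex:url", dep.url))

    def dependencies_of(self, target: str) -> List[str]:
        """Return the identifiers of everything the target depends on."""
        return [o for s, p, o in self.triples if s == target and p == "ex:dependsOn"]
```

For example, `ann.depends_on("wf:main/fit_step", Dependency("library", "example-lib", version="1.0"))` records that one step of the workflow relies on a specific version of a (hypothetical) library, which can later be retrieved with `dependencies_of("wf:main/fit_step")`.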
According to Pique, an experiment is normally not a single workflow but a group of different workflows. The golden exemplar in D5.3 v1 (page 10) includes a figure showing a schema of the dependencies and decisions to be made in the experiment. These decisions should also be recorded in the RO model somehow, because they are key to understanding the experiment.
When do we know that an RO is a new version of another RO? Is this a subjective judgement made by the scientist?
According to Pique, a Research Object is a living experiment. If we make minor changes to the workflows, we are still in the same living RO. However, if we change some of the main decisions or add further workflows, we are dealing with another version of the experiment.
Is this the view we are going to adopt when capturing the provenance of an RO? In Esteban's work reusing OPM, each addition and deletion of a resource changes the RO state. We cannot call each of these states a version, so should we keep an aggregation of the RO's states (i.e. its timeline)?
Privacy and restricted access to parts of the RO have not been discussed yet. Should they be recorded as part of the RO model?
Feedback from Pique: versioning and RO identity relate more to the re-use of a published/archived RO. In the case of a living (not yet published) RO, I would say we are always dealing with the same RO as long as the final purpose of the experiment (e.g. the problem we want to solve) stays unchanged. But when a published RO is re-used, the issue of attribution arises. My position at this point is that only a different picture of the RO schema can be considered a different RO. Changes in the data, switching from services to scripts, changes in the scripts, new thresholds for decisions in the forks, etc. will produce a new version of the same RO.
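Pique's position on RO identity can be read as a simple rule over two snapshots of an RO: if the schema (the workflows and the decision points that connect them) changes, it is a different RO; if only the details change (data, services versus scripts, script contents, fork thresholds), it is a new version of the same RO. The sketch below assumes this split and encodes the schema as a set of edges between hypothetical workflow and decision identifiers; which parts of a real experiment belong to the schema would still have to be agreed.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Edge = Tuple[str, str]


@dataclass
class RoSnapshot:
    # "Picture" of the RO schema: edges between workflows and decision points (forks).
    schema: FrozenSet[Edge]
    # Everything else: data, service or script behind each step, fork thresholds, ...
    details: Dict[str, str] = field(default_factory=dict)


def classify_change(old: RoSnapshot, new: RoSnapshot) -> str:
    """Compare two snapshots following the schema-versus-details rule."""
    if old.schema != new.schema:
        return "different RO"
    if old.details != new.details:
        return "new version of the same RO"
    return "unchanged"
```

Under this rule, adding a new workflow behind a fork changes the schema and yields a different RO, while changing a fork threshold or replacing a Web service with a script only yields a new version.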
Should the RO model provide a means to annotate that a workflow is failing, or to record the current issues of an experiment (at least as an internal view for the group of scientists working on the experiment)?
Should the workflow system be present separately from the workflow engine?