
Blog RO provenance querying by user Marco


Marco is the type of bioinformatician who can write SPARQL. A note about target users: we should expect two types of users in the genomics domain: (i) genomics users who use what bioinformaticians provide them, and (ii) bioinformaticians who can write SPARQL or are willing to learn. At present, the majority of bioinformaticians in the genomics field cannot write SPARQL, but their number is growing.

The test

Example question from the user view on provenance table:
As a researcher working on a Live RO, when I click on a workflow I would like to see:
Previous runs, when they were run, who ran them, summary of results and comments on the run.

Can I do this with SPARQL?

Part I - Querying the Allegrograph/WINGS Knowledge Base ("naive" validation)

In this part I am using the WINGS example RDF provided via the endpoint http://wind.isi.edu:10035/catalogs/java-catalog/repositories/WINGSTemplatesAndResults, because it was the first endpoint reference I found on the Showcase 22 page. Initially, I looked only at what I could find at this SPARQL endpoint and did not consult further documentation on the Showcase 22 page or elsewhere. In Part II I did consult more information; for example, I found out that Showcase 22 uses a second endpoint, which contains the wf4ever model artefacts. This makes questions 1, 2, 4 and 5 in my notes below obsolete.

Objective 1: Find previous runs of a workflow.

Assuming that the sample data will contain one workflow, I expect to find the URL for one run of one workflow.

My first step is to find the reference for the workflow run. I could not find 'run' directly anywhere, so I started from a result (which must be the result of a run). I found an instance of the class SMAPComparisonResults and used that as my starting point. It wasGeneratedBy the ProcessInstance COMPARELIGANDBINDINGSITESV211332778615941, which hasProcessTemplate ABSTRACTSUBWFLIGANDBINDINGSITESCOMPARISON_COMPARELIGANDBINDINGSITESV21. This is an instance of ProcessTemplate with the label "Process template CompareLigandBindingSitesV21".
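The query I used for this walk is not preserved on this page. As a minimal sketch, assuming the repository's class and predicate IRIs end in the local names shown above (their namespaces are not recorded here), the hypothetical helper below builds one SPARQL query that follows the same path from a SMAPComparisonResults instance to its ProcessInstance and ProcessTemplate. Matching on local names relies on the SPARQL 1.1 STRENDS function, which older endpoints may not support, and the label property is assumed to be rdfs:label.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def result_to_template_query(result_class: str = "SMAPComparisonResults") -> str:
    """Build a query: result -wasGeneratedBy-> ProcessInstance -hasProcessTemplate-> ProcessTemplate.

    The class and predicates are matched by the end of their IRI because their
    namespaces are not known here (an assumption of this sketch).
    """
    return f"""SELECT ?result ?instance ?template ?label WHERE {{
  ?result a ?cls .
  FILTER(STRENDS(STR(?cls), "{result_class}"))
  ?result ?generatedBy ?instance .
  FILTER(STRENDS(STR(?generatedBy), "wasGeneratedBy"))
  ?instance ?hasTemplate ?template .
  FILTER(STRENDS(STR(?hasTemplate), "hasProcessTemplate"))
  OPTIONAL {{ ?template <{RDFS_LABEL}> ?label }}
}}
"""


if __name__ == "__main__":
    print(result_to_template_query())
```

Matching by local name is a convenience for exploring an unfamiliar store; once the real namespaces are known, full IRIs are faster and more precise.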

Questions:
1. The names of the Process Template instances suggest that these are (sub)workflows. Do classes for workflow templates/instances exist?
2. It seems that I cannot directly distinguish a subworkflow from a workflow. (NB: this can perhaps be established logically. Arguably, in an open world a workflow can always be a subworkflow unless we explicitly specify that it cannot be.)

Now I want to find out which workflow template ProcessTemplate ABSTRACTSUBWFLIGANDBINDINGSITESCOMPARISON_COMPARELIGANDBINDINGSITESV21 is a component of, so that I can find the overall workflow.

I ran a query to find the subject that hasTemplateComponent ABSTRACTSUBWFLIGANDBINDINGSITESCOMPARISON_COMPONENT2.

Result:
ABSTRACTSUBWFLIGANDBINDINGSITESCOMPARISON_COMPARELIGANDBINDINGSITESV21

The name suggests that we are again dealing with a subworkflow, so I tried the same query again for this template:

Results:
No results

Apparently, ABSTRACTSUBWFLIGANDBINDINGSITESCOMPARISON_COMPARELIGANDBINDINGSITESV21 is not a TemplateComponent of anything. I conclude that this must have been the workflow that was run to produce the SMAPComparisonResults that I started with.
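The "try the above again" step is really a loop: ask which template lists the current one via hasTemplateComponent, move up, and stop when nothing comes back. Below is a sketch of that loop, again assuming the predicate can be matched by local name. `find_parents` stands for whatever code sends the query and returns the parent IRIs; both function names are hypothetical, and the lookup table in the usage example is a toy stand-in for the endpoint, not repository data.

```python
from typing import Callable, List


def parent_component_query(component_iri: str) -> str:
    """Query for the template(s) that list `component_iri` via hasTemplateComponent."""
    return f"""SELECT DISTINCT ?parent WHERE {{
  ?parent ?p <{component_iri}> .
  FILTER(STRENDS(STR(?p), "hasTemplateComponent"))
}}
"""


def climb_to_top(start: str, find_parents: Callable[[str], List[str]],
                 max_steps: int = 20) -> List[str]:
    """Follow 'is a component of' links upward until no parent is found.

    Only the first parent is followed at each step; a template reused in
    several workflows would need a breadth-first variant instead.
    """
    path = [start]
    current = start
    for _ in range(max_steps):
        parents = find_parents(current)
        if not parents:
            break  # nothing lists `current` as a component: top level reached
        current = parents[0]
        if current in path:
            break  # cycle in the data; stop rather than loop forever
        path.append(current)
    return path


if __name__ == "__main__":
    # Toy lookup standing in for the endpoint (hypothetical shortened names).
    toy = {"COMPONENT2": ["COMPARELIGANDBINDINGSITESV21"]}
    print(climb_to_top("COMPONENT2", lambda iri: toy.get(iri, [])))
```

The last element of the returned path is the candidate top-level workflow, which is the conclusion drawn above by hand.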

Questions
3. wasGeneratedBy is apparently used as a predicate of ProcessInstances as well as ProcessTemplates. Is this correct?
4. I find that Results are results of ProcessInstances, which are instantiations (~runs?) of ProcessTemplates. Is it stated explicitly anywhere that something is a run of something?

I guess that 'runs' can be found by selecting instances of process (workflow) templates. To select all runs in a particular repository (query1: csv, xml):

This produces a list of 38 items. A stricter query requires that all results are indeed ProcessInstances (query2: csv, xml):

This produces the same list of 38 items, showing that in this repository all subjects of hasProcessTemplate are ProcessInstances (i.e. in practice its domain is limited to ProcessInstance).
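Query 1 and query 2 are not preserved either. A sketch of the pair, under the same local-name assumption: the strict variant only adds a type constraint, so equal result counts (38 and 38 here) mean that every subject of hasProcessTemplate is typed as a ProcessInstance. The function name is hypothetical.

```python
def runs_query(strict: bool = False) -> str:
    """Subjects of hasProcessTemplate ('runs'); strict=True also requires type ProcessInstance."""
    lines = [
        "SELECT DISTINCT ?run WHERE {",
        "  ?run ?p ?template .",
        '  FILTER(STRENDS(STR(?p), "hasProcessTemplate"))',
    ]
    if strict:
        # Extra constraint: the subject must also be typed ...ProcessInstance.
        lines += [
            "  ?run a ?cls .",
            '  FILTER(STRENDS(STR(?cls), "ProcessInstance"))',
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(runs_query())             # loose variant
    print(runs_query(strict=True))  # strict variant
```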

Questions
5. Apparently, the content of the Allegrograph repository has not been mapped to wfprov/wfdesc.

Part II - Querying the Wf4ever Knowledge Base

In search of how to query the wf4ever RO model, I now consulted the Showcase 22 documentation at http://www.wf4ever-project.org/wiki/display/docs/Showcase+22+Querying+workflow+execution+provenance

I found a reference to another SPARQL endpoint on the Showcase 22 wiki page: http://test-wf4ever.isoco.com/test/
The example queries do indeed seem to run there. Unfortunately, the endpoint does not seem to have a simple UI for its results, so I saved and displayed each query result manually.

Querying the wfruns, cf. demo 1 (query3: RDF):

This gives me these results:

http://wings.isi.edu/opmexport/resource/Account/ACCOUNT1332778615941
http://wings.isi.edu/opmexport/resource/Account/ACCOUNT1332778606534
http://ns.taverna.org.uk/2011/run/479c9612-4862-4bcb-ad09-315b7b864260/
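For the wf4ever endpoint, query 3 presumably selects instances of wfprov:WorkflowRun. A minimal stdlib sketch follows, assuming the endpoint follows the SPARQL 1.1 Protocol (query sent as the HTTP GET `query` parameter) and can return the SPARQL JSON results format; the 2012 test endpoint may have returned only RDF/XML and may no longer be online. `run_query` and `column` are hypothetical helper names, and the exact query path under http://test-wf4ever.isoco.com/test/ is not given here.

```python
import json
import urllib.parse
import urllib.request

WFRUNS_QUERY = """PREFIX wfprov: <http://purl.org/wf4ever/wfprov#>
SELECT DISTINCT ?run WHERE { ?run a wfprov:WorkflowRun }
"""


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send a SELECT query via HTTP GET (SPARQL 1.1 Protocol) and parse the JSON results.

    Assumes the endpoint supports application/sparql-results+json.
    """
    sep = "&" if "?" in endpoint else "?"
    url = endpoint + sep + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def column(results: dict, var: str) -> list:
    """Values bound to `var` in a SPARQL JSON results document (unbound rows skipped)."""
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]


if __name__ == "__main__":
    # No network call here; pass the real endpoint URL to run_query yourself.
    print(WFRUNS_QUERY)
```

Reading results as JSON avoids saving and inspecting each RDF result file by hand.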

Two queries to learn about the predicates associated with the wfruns (query4: RDF):

and (query5: RDF):
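Queries 4 and 5 list the predicates used with workflow runs. A sketch covering both directions, assuming the standard wfprov namespace (http://purl.org/wf4ever/wfprov#); the `direction` parameter and the function name are illustration choices, not taken from the original queries.

```python
WFPROV_PREFIX = "PREFIX wfprov: <http://purl.org/wf4ever/wfprov#>\n"


def run_predicates_query(direction: str = "out") -> str:
    """Distinct predicates used with wfprov:WorkflowRun instances.

    direction="out": predicates where the run is the subject;
    direction="in":  predicates where the run is the object.
    """
    patterns = {"out": "?run ?p ?other .", "in": "?other ?p ?run ."}
    if direction not in patterns:
        raise ValueError("direction must be 'out' or 'in'")
    return (
        WFPROV_PREFIX
        + "SELECT DISTINCT ?p WHERE {\n"
        + "  ?run a wfprov:WorkflowRun .\n"
        + "  " + patterns[direction] + "\n"
        + "}\n"
    )


if __name__ == "__main__":
    print(run_predicates_query("out"))
    print(run_predicates_query("in"))
```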

The results are saved in the files associated with this blog. At first glance, the repository indeed contains the results of the Allegrograph repository. I would like to check that more specifically: can I find Process Instance COMPARELIGANDBINDINGSITESV211332778615941 and its Process Template?

Query to find the URI of COMPARELIGANDBINDINGSITESV211332778615941, the ProcessInstance I found in the Allegrograph repository:

Among the results is:
<http://wings.isi.edu/opmexport/resource/ProcessInstance/COMPARELIGANDBINDINGSITESV211332778615941>
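The name-based lookup can be written as a text filter over subject IRIs. A sketch, assuming SPARQL 1.1 CONTAINS is available; this pattern scans every triple, so on a large store it is slow and a LIMIT is advisable. The helper name is hypothetical.

```python
def find_iri_by_name_query(name: str, limit: int = 50) -> str:
    """Subject IRIs whose text contains `name` (full scan; can be slow on big stores)."""
    # Escape backslashes and double quotes so `name` stays inside the string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?s WHERE {\n"
        "  ?s ?p ?o .\n"
        f'  FILTER(isIRI(?s) && CONTAINS(STR(?s), "{escaped}"))\n'
        "}\n"
        f"LIMIT {int(limit)}\n"
    )


if __name__ == "__main__":
    print(find_iri_by_name_query("COMPARELIGANDBINDINGSITESV211332778615941"))
```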

All its predicates and objects, where it is the subject (query6: RDF):

and its subjects and predicates, where it is the object (query7: RDF):
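Queries 6 and 7 are the two halves of a "describe" for one resource. A sketch that builds both, with a crude check that the IRI contains no characters that would break the `<...>` syntax; the function name is hypothetical. A SPARQL DESCRIBE query would be an alternative, but what it returns is left to the implementation, so two explicit SELECTs give more predictable output.

```python
def describe_queries(iri: str) -> tuple:
    """Return (outgoing, incoming) SELECT queries for one resource.

    outgoing: everything where `iri` is the subject;
    incoming: everything where `iri` is the object.
    """
    if any(c in iri for c in '<>" {}|\\^`'):
        raise ValueError(f"not a safe IRI for <...> syntax: {iri!r}")
    outgoing = f"SELECT ?p ?o WHERE {{ <{iri}> ?p ?o }} ORDER BY ?p"
    incoming = f"SELECT ?s ?p WHERE {{ ?s ?p <{iri}> }} ORDER BY ?p"
    return outgoing, incoming


if __name__ == "__main__":
    out_q, in_q = describe_queries(
        "http://wings.isi.edu/opmexport/resource/ProcessInstance/"
        "COMPARELIGANDBINDINGSITESV211332778615941")
    print(out_q)
    print(in_q)
```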

Query 6 gives a fairly extensive report of what is associated with this ProcessInstance, including the relation that this ProcessInstance wfprov:wasPartOfWorkflowRun http://wings.isi.edu/opmexport/resource/Account/ACCOUNT1332778615941 (one of the wfRuns reported by query 3).
Query 7's results are sparser and only provide references to outputs. Together, queries 6 and 7 seem to give a comprehensive 'report' on COMPARELIGANDBINDINGSITESV211332778615941.

The type information from query 6 tells me that COMPARELIGANDBINDINGSITESV211332778615941 is a ro:Resource, wfprov:Process, and wfprov:Artifact. This seems a little high level.

Questions
6. Is ro:Resource, wfprov:Process, and wfprov:Artifact sufficient type information for something that has predicate wfprov:wasPartOfWorkflowRun?

My starting point should really be a workflow. Let me see if I can find one, using http://wings.isi.edu/opmexport/resource/Account/ACCOUNT1332778615941 as the starting point (query8: RDF):

To my surprise, I find three workflows:
ABSTRACTSUBWFLIGANDBINDINGSITESCOMPARISON, ABSTRACTSUBWFDOCKING, ABSTRACTGLOBALWORKFLOW2
Looking at the names, I suspect that ABSTRACTGLOBALWORKFLOW2 has ABSTRACTSUBWFLIGANDBINDINGSITESCOMPARISON and ABSTRACTSUBWFDOCKING as components. Possibly all three were returned by query 8 through inference. ABSTRACTGLOBALWORKFLOW2 should be the overall workflow that I was looking for.
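Query 8 is not preserved. One way to ask which workflows describe a run is sketched below, assuming the run is linked to its workflow descriptions with wfprov:describedByWorkflow; whether the export uses that property, or reached the three workflows through process runs and inference, cannot be told from this page. The helper name is hypothetical.

```python
WFPROV = "http://purl.org/wf4ever/wfprov#"


def workflows_for_run_query(run_iri: str) -> str:
    """Workflows linked to a run via wfprov:describedByWorkflow (assumed property)."""
    return (
        f"PREFIX wfprov: <{WFPROV}>\n"
        "SELECT DISTINCT ?workflow WHERE {\n"
        f"  <{run_iri}> wfprov:describedByWorkflow ?workflow .\n"
        "}\n"
    )


if __name__ == "__main__":
    print(workflows_for_run_query(
        "http://wings.isi.edu/opmexport/resource/Account/ACCOUNT1332778615941"))
```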

Questions
7. Can one workflow run be described by multiple workflow templates? (I assume: if and only if all workflows except the master workflow are part of the master workflow.)

Back to the example user query:
"As a researcher working on a Live RO, when I click on a workflow I would like to see previous runs, when they were run, who ran them, summary of results and comments on the run."

"Click on a workflow": Assume this is ABSTRACTGLOBALWORKFLOW2

"see previous runs" (query9: RDF):

Result:
ACCOUNT1332778615941, ACCOUNT1332778606534
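The whole user question (runs, when, who) could be asked of one workflow in a single query. A sketch, assuming runs point to their workflow with wfprov:describedByWorkflow and that times and people would be recorded with the Dublin Core terms suggested by the RO vocabulary; the OPTIONAL blocks keep a run in the result even when those properties are missing. The full IRI of ABSTRACTGLOBALWORKFLOW2 is not given here, so it is a parameter, and the example IRI is hypothetical.

```python
def runs_of_workflow_query(workflow_iri: str) -> str:
    """Runs of one workflow, with optional creation time and creator.

    Assumes wfprov:describedByWorkflow links run -> workflow and that
    dcterms:created / dcterms:creator would hold the 'when' and 'who'.
    """
    return (
        "PREFIX wfprov: <http://purl.org/wf4ever/wfprov#>\n"
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?run ?created ?creator WHERE {\n"
        "  ?run a wfprov:WorkflowRun ;\n"
        f"       wfprov:describedByWorkflow <{workflow_iri}> .\n"
        "  OPTIONAL { ?run dcterms:created ?created }\n"
        "  OPTIONAL { ?run dcterms:creator ?creator }\n"
        "}\n"
        "ORDER BY DESC(?created)\n"
    )


if __name__ == "__main__":
    # Hypothetical IRI; the real one for ABSTRACTGLOBALWORKFLOW2 is not on this page.
    print(runs_of_workflow_query("http://example.org/ABSTRACTGLOBALWORKFLOW2"))
```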

"when they were run", "who ran them":
I first looked at all predicates of ACCOUNT1332778615941 (query10: RDF, query 11: RDF)

Unfortunately, I found neither timestamps nor relations pointing to timestamps. Similarly, I saw no relations to people. NB: to my surprise, the workflow run is typed both wfprov:WorkflowRun and wfdesc:Workflow.
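The double typing can be checked directly with an ASK query. A sketch, assuming the standard wfprov and wfdesc namespaces and the SPARQL JSON results format for the answer (`{"head": {}, "boolean": true}`); the helper names are hypothetical.

```python
PREFIXES = (
    "PREFIX wfprov: <http://purl.org/wf4ever/wfprov#>\n"
    "PREFIX wfdesc: <http://purl.org/wf4ever/wfdesc#>\n"
)


def both_types_query(iri: str) -> str:
    """ASK whether `iri` is typed both wfprov:WorkflowRun and wfdesc:Workflow."""
    return PREFIXES + f"ASK {{ <{iri}> a wfprov:WorkflowRun, wfdesc:Workflow }}\n"


def ask_answer(results: dict) -> bool:
    """Read the answer from a SPARQL JSON results document for an ASK query."""
    return bool(results["boolean"])


if __name__ == "__main__":
    print(both_types_query(
        "http://wings.isi.edu/opmexport/resource/Account/ACCOUNT1332778615941"))
```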

Questions
8. Can a workflow run (type wfprov:WorkflowRun) have type wfdesc:Workflow?
9. Where are the timestamp and credit properties? Which wf4ever RO model artefacts should I look for?

I remember seeing 'Daniel' in the Allegrograph triple store. In queries 12 and 13 I requested all triples for <http://wings.isi.edu/opmexport/resource/Agent/DANIEL>, but none were found (query12: RDF, query13: RDF).
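Queries 12 and 13 amount to asking whether the agent IRI occurs anywhere. A single ASK with a UNION covers both positions. A false answer can also mean that the wf4ever export used a different IRI for the same agent, so a text search on 'DANIEL' is a useful follow-up. The helper name is hypothetical.

```python
def mentions_query(iri: str) -> str:
    """ASK whether `iri` occurs as a subject or an object anywhere in the store."""
    return f"ASK {{ {{ <{iri}> ?p ?o }} UNION {{ ?s ?p <{iri}> }} }}"


if __name__ == "__main__":
    print(mentions_query("http://wings.isi.edu/opmexport/resource/Agent/DANIEL"))
```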

Next I looked at the RO vocabulary specification v0.2 and found that it suggests the Dublin Core terms 'created' and 'creator'. So I probed the repository for dcterms:created values (query14: RDF) and dcterms:creator values (query15: RDF):

Result: https://raw.github.com/wf4ever/ro-catalogue/master/v0.1/wf74/ created 2012-03-26T16:41:29

Result: https://raw.github.com/wf4ever/ro-catalogue/master/v0.1/wf74/ creator "Test User"

It seems that for the WINGS workflow these annotations were not applied.
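Queries 14 and 15 differ only in the Dublin Core term. A sketch that builds either one from the dcterms namespace (http://purl.org/dc/terms/); restricting it to the two terms discussed here is a choice of this example, and the function name is hypothetical.

```python
DCTERMS = "http://purl.org/dc/terms/"


def dcterms_query(term: str) -> str:
    """All (subject, value) pairs for dcterms:created or dcterms:creator."""
    if term not in ("created", "creator"):
        raise ValueError("this sketch only handles 'created' and 'creator'")
    return f"SELECT ?s ?value WHERE {{ ?s <{DCTERMS}{term}> ?value }}"


if __name__ == "__main__":
    print(dcterms_query("created"))
    print(dcterms_query("creator"))
```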

Finally, out of curiosity, I checked whether anything was aggregated for the WINGS workflow (query16: RDF):

Results:
Many references to wf74, but no reference with 'wings' in the uri.
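Query 16 looks for resources aggregated with ore:aggregates, which the RO model takes over from OAI-ORE. A sketch with an optional case-insensitive filter on the IRI text (for example 'wings'); the function name and the filter are illustrative assumptions, and LCASE/CONTAINS require SPARQL 1.1.

```python
ORE = "http://www.openarchives.org/ore/terms/"


def aggregated_query(iri_fragment: str = "") -> str:
    """Resources aggregated by any aggregation (e.g. an RO), optionally filtered by IRI text."""
    filt = ""
    if iri_fragment:
        safe = iri_fragment.replace("\\", "\\\\").replace('"', '\\"')
        filt = f'  FILTER(CONTAINS(LCASE(STR(?res)), LCASE("{safe}")))\n'
    return (
        f"PREFIX ore: <{ORE}>\n"
        "SELECT DISTINCT ?ro ?res WHERE {\n"
        "  ?ro ore:aggregates ?res .\n"
        + filt
        + "}\n"
    )


if __name__ == "__main__":
    print(aggregated_query("wings"))
```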

The same for annotations (query17: RDF):

To my surprise it appears that no resources in the repository are annotated with the Annotation Ontology.
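Query 17 looks for Annotation Ontology annotations. A sketch, assuming the AO namespace http://purl.org/ao/ with ao:annotatesResource and ao:body, and the RO class ro:AggregatedAnnotation in http://purl.org/wf4ever/ro#. The UNION matters on stores without RDFS inference, where instances typed only as ro:AggregatedAnnotation would otherwise be missed. The function name is hypothetical.

```python
def annotations_query() -> str:
    """Annotations (AO or RO-typed) with their target resource and body, if present."""
    return (
        "PREFIX ao: <http://purl.org/ao/>\n"
        "PREFIX ro: <http://purl.org/wf4ever/ro#>\n"
        "SELECT DISTINCT ?ann ?target ?body WHERE {\n"
        "  { ?ann a ao:Annotation } UNION { ?ann a ro:AggregatedAnnotation }\n"
        "  OPTIONAL { ?ann ao:annotatesResource ?target }\n"
        "  OPTIONAL { ?ann ao:body ?body }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(annotations_query())
```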

Question:
10. Where are the Annotation Ontology annotations? Who creates them, where, and how? Could this relate to Annotation=Notebook keeping, and checklist showcases?

My conclusions

  • In general I am happy with the results so far. The RDF of the workflows that I looked at seems fine, both in the Allegrograph KB and in the wf4ever KB. The mapping of the RDF produced from the WINGS workflow to the wf4ever models seems to have worked. Getting to workflow templates, their results and runs or 'process instances' via their interrelationships was not difficult using the lists of predicates and classes provided by the endpoints (as self-explanatory as one may expect from RDF triples).
  • Some information appears to be missing, or items may not have been mapped to the wf4ever models yet (see questions below).
  • I could not fully answer the first question from the user view on provenance (timestamps, credit).
  • I did not check how exhaustive the annotations are, e.g. with respect to the user view on provenance (wiki/display/docs/User+view+on+Provenance). This would require going through all the examples there.
  • For testing/validating the KB, it would be convenient if the wf4ever KB had a more feature-rich user interface (Sesame?, Allegrograph?).
  • The knowledge base endpoints were not very visible in the Showcase 22 report. Because I wanted to do the validation as naively as possible, going straight from the driving user question to querying the endpoint, I started on the wrong KB.

Highlighted questions copied from above (omitted most questions from part I):

  1. wasGeneratedBy is apparently used as a predicate of ProcessInstances as well as ProcessTemplates. Is this correct? Reply: I don't see where that relation is... not sure what you are referring to.
  2. Is ro:Resource, wfprov:Process, and wfprov:Artifact sufficient type information for something that has predicate wfprov:wasPartOfWorkflowRun?
  3. Can one workflow run be described by multiple workflow templates?
  4. Can a workflow run have type wfdesc:Workflow?
  5. Where are the timestamp and credit properties of the WINGS workflow? Are there RO model artefacts other than the Dublin Core terms 'created' and 'creator' that I should look for? Reply: Timestamps are not covered by wfprov right now, and the creator is given by wfprov:workflowEngine.
  6. Where are the Annotation Ontology annotations? Who creates them, where, and how? Could this relate to Annotation=Notebook keeping, and checklist showcases? Reply: IMO, as far as the processes are referable (the WINGS case), they can be annotated using AO as for other RO resources.