
Esteban text

Introduction

The main goal of showcase 44 is to give scientists a way to search for workflows by their functionality, properties, or other conceptualizations, so that the workflows are easy to find and access.

...

  1. Taken from Dave's mail (full reference at [1]): I would very much like Wf4ever to make this process "assistive" so that when a workflow/pack/RO is uploaded the system provides recommendations for the provenance metadata.
    1. I am missing what the recommendation would be in this case. --khalid
  2. Taken from Pique's comments at showcase 45: In ROs composed of several Wfs, when comparing several similar ROs, the workflows that are unusual might be very relevant to a user: they are a distinguishing feature of the RO. Conversely, Wfs that occur together in different ROs represent a pattern that can be of interest to the users. This use case may be expanded to the comparison of several Wfs that share unusual or common patterns such as scripts, web services or modules.
    1. Sorry to be picky :-) What does Pique mean by an unusual workflow? (If I remember well, it is related to the discovery of parts of a workflow which are characteristic of that workflow and therefore identify, in some sense, what it does.)
    2. ^^ I guess unusual means that there are not many of the same kind. What is important to find out is how it would be useful. As a recommendation? Over the templates? Over the runs?
  3. Also there is a description of the problem and some comments from Marco at [2], where he presents some queries that he would like to
  4. During the sprint planning meeting, Pinar also highlighted the workflow repair scenario and the preservation of workflows as use cases. "Yes, I think it is more relevant to the focus of the Workflow for Ever project, which is about preservation. Repair is an activity performed as part of the conservation of workflows. Therefore I believe it is important to support such use cases. Related work in this area exists, mainly based on Case-Based Reasoning [9]." --pinar
  5. If we are targeting workflow similarity, then rather than relying on structural or semantic based similarity, I would ask the users to provide examples of similar (and different) workflows, and ask them why they think that they are similar or different. This may give us better clues on when two workflows are similar or not. --khalid
    1. Warning: I think Antoon tried to create a gold standard for comparing workflows by asking different authors whether given workflows were similar and what similar workflows would look like, and he failed because they would not agree. Part of this showcase could be to explore alternatives and then try to confirm them with the users. --dani
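
The distinction in item 2 between unusual workflows and workflows that occur together can be made concrete by counting, over a set of ROs, how many ROs contain each workflow and each pair of workflows. The following is a minimal sketch, assuming each RO is represented simply as a set of workflow identifiers. The function names, the sample data and the `max_count`/`min_count` cut-offs are hypothetical and not part of any Wf4ever tool.

```python
from collections import Counter
from itertools import combinations


def workflow_frequencies(ros):
    """Count in how many ROs each workflow identifier occurs."""
    counts = Counter()
    for workflows in ros.values():
        counts.update(set(workflows))
    return counts


def unusual_workflows(ros, max_count=1):
    """Workflows found in at most `max_count` ROs (hypothetical cut-off)."""
    counts = workflow_frequencies(ros)
    return {wf for wf, n in counts.items() if n <= max_count}


def co_occurring_pairs(ros, min_count=2):
    """Pairs of workflows that appear together in at least `min_count` ROs."""
    pair_counts = Counter()
    for workflows in ros.values():
        # sort so that each pair has one canonical order
        for pair in combinations(sorted(set(workflows)), 2):
            pair_counts[pair] += 1
    return {pair for pair, n in pair_counts.items() if n >= min_count}


# Hypothetical sample data: RO identifier -> workflows it aggregates
ros = {
    "ro1": {"fetch", "align", "plot"},
    "ro2": {"fetch", "align", "cluster"},
    "ro3": {"fetch", "align"},
}

print(sorted(unusual_workflows(ros)))
print(sorted(co_occurring_pairs(ros)))
```

Whether such counts are useful as recommendations, and whether they should be computed over templates or over runs, remains the open question raised in the comments above.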

...

select ?service_name ?service_uri (count(?workflow) as ?number_of_workflows)
where {
?processor comp:belongs-to-workflow ?workflow .
?processor a comp:WSDLProcessor .
?processor comp:processor-uri ?service_uri .
?processor comp:service-name ?service_name .
}
group by ?service_name ?service_uri
order by desc(?number_of_workflows)
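
For readers less familiar with SPARQL aggregation, the query above can be read as a group-and-count over (processor, workflow, service) rows. The sketch below mirrors that logic in Python, assuming results are grouped by service name and URI. The rows and the `example.org` URIs are invented for illustration and are not myExperiment data. Note that `count(?workflow)` counts one row per matching processor, so a workflow that calls the same service twice is counted twice. The `distinct=True` variant corresponds to `count(distinct ?workflow)`.

```python
from collections import defaultdict

# Hypothetical rows: (processor, workflow, service_name, service_uri)
rows = [
    ("p1", "wf1", "run_eSearch", "http://example.org/eutils.wsdl"),
    ("p2", "wf1", "run_eSearch", "http://example.org/eutils.wsdl"),
    ("p3", "wf2", "run_eSearch", "http://example.org/eutils.wsdl"),
    ("p4", "wf2", "blast", "http://example.org/blast.wsdl"),
]


def service_usage(rows, distinct=False):
    """Group rows by (service_name, service_uri) and count workflows.

    distinct=False counts one per processor row, like count(?workflow);
    distinct=True counts each workflow once, like count(distinct ?workflow).
    """
    groups = defaultdict(list)
    for _processor, workflow, name, uri in rows:
        groups[(name, uri)].append(workflow)
    usage = [
        (name, uri, len(set(wfs)) if distinct else len(wfs))
        for (name, uri), wfs in groups.items()
    ]
    # most used services first, as in "order by desc(...)"
    return sorted(usage, key=lambda row: row[2], reverse=True)


for name, uri, n in service_usage(rows, distinct=True):
    print(name, uri, n)
```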

Query2:

PREFIX comp:<http://rdf.myexperiment.org/ontologies/components/>

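select ?service_name ?service_uri ?workflow
where {
     ?processor comp:belongs-to-workflow ?workflow .
     ?processor a comp:WSDLProcessor .
     ?processor comp:processor-uri ?service_uri .
     ?processor comp:service-name ?service_name .
     # keep only the run_eSearch operation of the NCBI eUtils service
     FILTER regex(?service_name, 'run_eSearch')
     # str() lets the comparison work whether the URI is stored as a literal or as an IRI
     FILTER (str(?service_uri) = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/soap/eutils.wsdl')
}
# every selected variable must appear in GROUP BY (this also removes duplicate rows)
GROUP BY ?service_name ?service_uri ?workflow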

Process syntactic similarity

(Daniel has written this) During the last plenary, we spent some time trying to detect whether two different processors were of the same type.

For our analysis we used the SILK framework, which computes the similarity between entities from two datasets, in our case processors. We did so because we thought that the same kind of processor could have different names in different templates.

We compared the processors using their dct:title property. This is the Interlinks section of the resulting configuration file:

<Interlinks>
    <Interlink id="aemet-geo">
      <LinkType>owl:sameAs</LinkType>
      <SourceDataset dataSource="myExp1" var="a">
        <RestrictTo> ?a rdf:type myExp:Processor . </RestrictTo>
      </SourceDataset>
      <TargetDataset dataSource="myExp2" var="b">
        <RestrictTo> ?b rdf:type myExp:Processor . </RestrictTo>
      </TargetDataset>
      <LinkageRule>
        <Compare weight="1" threshold="0.1" required="true" metric="levenshteinDistance" id="unnamed_5">
          <TransformInput function="lowerCase" id="unnamed_3">
            <Input path="?a/dct:title" id="unnamed_1"></Input>
          </TransformInput>
          <TransformInput function="lowerCase" id="unnamed_4">
            <Input path="?b/dct:title" id="unnamed_2"></Input>
          </TransformInput>
          <Param name="minChar" value="0"></Param>
          <Param name="maxChar" value="z"></Param>
        </Compare>
      </LinkageRule>
      <Filter></Filter>
      <Outputs>
        <Output type="file" minConfidence="0.1">
          <Param name="file" value="C:\DOld\SILK\silk_2.5.3\silk_2.5.3\MyExperimentProcessors.nt"/>
          <Param name="format" value="ntriples"/>
        </Output>
      </Outputs>
    </Interlink>
</Interlinks>
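
The comparison in this linkage rule boils down to lowercasing both titles and measuring the edit distance between them. The sketch below illustrates that idea in Python. It is not SILK's implementation: the function names, the sample titles and the `max_distance` cut-off are assumptions made for illustration, and the way SILK interprets `threshold="0.1"` is not reproduced here.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def same_processor(title_a, title_b, max_distance=1):
    """Treat two processors as the same if their lowercased titles are close.

    max_distance is a hypothetical cut-off, not a value taken from the SILK configuration.
    """
    return levenshtein(title_a.lower(), title_b.lower()) <= max_distance


# Hypothetical processor titles
print(same_processor("Run_eSearch", "run_esearch"))
print(same_processor("run_eSearch", "run_eFetch"))
```

A purely title-based comparison like this only captures syntactic similarity: two processors with unrelated names that call the same service would not be matched.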

...

[7] http://140.115.80.66/data%20mining%20paper%20databases/Data%20and%20Knowledge%20Engineering/Workflow%20mining%20A%20survey.pdf

[8] Khalid Belhajjame, Carole A. Goble, Stian Soiland-Reyes, and David De Roure. Fostering Scientific Workflow Preservation Through Discovery of Substitute Services. In Proceedings of the IEEE eScience Conference (eScience 2011), IEEE CS, Stockholm, Sweden, 2011.

[9] Mirjam Minor, Ralph Bergmann, Sebastian Görg, and Kirstin Walter. Towards Case-Based Adaptation of Workflows.