Document the provenance of the results
From Geoscience Paper of the Future
* Type: low
* Progress: 40%
* Start date: 7th Mar 2015
* Target date: 4th Apr 2015
* Participants: Not defined
* Expertise: open science
    [[Category:Task]]
     
== What This Task Involves ==

The training session and training materials indicate how to:

# Capture the provenance of the results in a paper
# Develop a workflow sketch, a formal workflow, or a provenance record that represents, to varying degrees of accuracy, the provenance of the results
# Publish the provenance and make it part of a publication

== Training Materials ==

This training session was held on March 6, 2015:

* '''[https://www.dropbox.com/s/dlbpkdjque54oni/GPF-Provenance-6March2015.pdf?dl=0 Presentation]'''

=== Suggested Readings ===

* [http://ijdc.net/index.php/ijdc/article/view/203 "Requirements for Provenance on the Web."] Paul Groth, Yolanda Gil, James Cheney, Simon Miles. International Journal of Digital Curation, 7(1), 2012.
** ''A general overview of provenance''

* [http://www.w3.org/TR/prov-primer/ "A Primer for the PROV Provenance Model."] Yolanda Gil, Simon Miles, Khalid Belhajjame, Helena Deus, Daniel Garijo, Graham Klyne, Paolo Missier, Stian Soiland-Reyes, and Stephan Zednik. Published as a W3C Working Group Note on 30 April 2013.
** ''A brief and practical introduction to the PROV standard for provenance, showing examples of how to represent the provenance record in RDF through a simple notation called Turtle''

* [http://www.iemss.org/sites/iemss2014/papers/iemss2014_submission_384.pdf "Intelligent Workflow Systems and Provenance-Aware Software."] Yolanda Gil. In Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, 2014.
** ''A brief introduction to workflows for scientists, giving examples and explanations of their benefits''

== What To Do ==

We described many options in the training. Here is a sketch of the most common approach:
# At the very minimum, describe the workflow in the text (a "Methods" section) or in an appendix
#* Mention the datasets used, the software, and the data flow across the software components
#* Specify unique identifiers for data and software, mention the versions used, and credit all the sources
# Develop a workflow sketch and show it in a figure or in an appendix
#* Capture the high-level data flow across components
# To capture the full provenance, specify a formal workflow or provenance record
#* The formal workflow shows all data flow across components, corresponding to the detailed command-line invocations and parameter values used
#* Options:
#*# Describe it as a graph where the nodes are computations and the links show data and parameters
#*# Use the PROV provenance standard (start with a result and trace back how it was generated)
#*# Use a workflow system (e.g. [http://www.wings-workflows.org WINGS]) to create the data flow graph
#* Publish the formal workflow or provenance record, and assign it a unique identifier
#** Cite it in the paper
#** Show the provenance graph

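The options above can be illustrated with a small sketch. It is not the training's method or a WINGS export, only a minimal Python illustration that records a workflow as a graph (computations as nodes, data as links), starts from a result, and traces back how it was generated. It then prints statements written in the style of the PROV-N notation. All step names, software versions, data identifiers, and the <code>ex:</code> prefix are hypothetical placeholders; a real record would use the published identifiers of your own data and software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One computation in the workflow (a node in the graph)."""
    name: str               # identifier of the computation
    software: str           # software name and version, as credited in the paper
    inputs: tuple           # identifiers of the data consumed
    outputs: tuple          # identifiers of the data produced
    parameters: tuple = ()  # (name, value) pairs taken from the command line


def trace_back(result, steps):
    """Return the steps that contributed to `result`, nearest step first."""
    producers = {out: step for step in steps for out in step.outputs}
    lineage, pending, seen = [], [result], set()
    while pending:
        data_id = pending.pop()
        step = producers.get(data_id)
        if step is None or step.name in seen:
            continue  # raw input data, or a step that is already recorded
        seen.add(step.name)
        lineage.append(step)
        pending.extend(step.inputs)
    return lineage


def to_prov_n(lineage):
    """Write the lineage as PROV-N-style generation and usage statements."""
    lines = []
    for step in lineage:
        for out in step.outputs:
            lines.append(f"wasGeneratedBy({out}, {step.name}, -)")
        for inp in step.inputs:
            lines.append(f"used({step.name}, {inp}, -)")
    return lines


# Hypothetical three-step workflow, plus one step unrelated to the figure.
steps = [
    Step("ex:clean", "cleaner 1.2", ("ex:raw_data",), ("ex:clean_data",),
         (("threshold", "0.5"),)),
    Step("ex:model", "model 2.0", ("ex:clean_data", "ex:config"),
         ("ex:model_output",)),
    Step("ex:plot", "plotter 0.9", ("ex:model_output",), ("ex:figure_3",)),
    Step("ex:unrelated", "other 0.1", ("ex:other_data",), ("ex:other_output",)),
]

lineage = trace_back("ex:figure_3", steps)
print([step.name for step in lineage])  # ['ex:plot', 'ex:model', 'ex:clean']
print("\n".join(to_prov_n(lineage)))
```

Tracing back from the result keeps only the computations that actually contributed to it, so the unrelated step is left out of the record. The <code>software</code> and <code>parameters</code> fields are carried along so that the versions and command-line values can be reported in the paper alongside the graph.
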
=== Using the WINGS Workflow System to Document Provenance ===

Documentation on how to use the [http://www.wings-workflows.org/ WINGS workflow system]:
* [http://www.wings-workflows.org/tutorial A tutorial]
* [http://www.wings-workflows.org/node/43 Introductory papers]
* [http://www.wings-workflows.org/wings-portal/ A portal where you can get an account and try it out]
* [http://www.wings-workflows.org/download-wings-portal Code download for setting up WINGS on a local machine]

Some examples of workflows created with WINGS for GPF papers:
* [[Document_provenance_of_results_by_Kyo_Lee | Kyo's provenance]]

<!-- Add any wiki Text above this Line -->
<!-- Do NOT Edit below this Line -->
{{#set:
Expertise=Open_science|
Owner=Yolanda_Gil|
Progress=40|
StartDate=2015-03-07|
TargetDate=2015-04-04|
Type=Low}}

    Latest revision as of 05:28, 17 April 2015


