Difference between revisions of "Document the provenance of the results"

From Geoscience Paper of the Future
[[Category:Task]]
== What This Task Involves ==

The training session and training materials show how to:

# Capture the provenance of the results in a paper
# Develop a workflow sketch, a formal workflow, or a provenance record that represents, to different degrees of accuracy, the provenance of the results
# Publish the provenance and make it part of a publication

== Training Materials ==

This training session was held on March 6, 2015:

* '''[https://www.dropbox.com/s/dlbpkdjque54oni/GPF-Provenance-6March2015.pdf?dl=0 Presentation]'''

=== Suggested Readings ===

* [http://ijdc.net/index.php/ijdc/article/view/203 "Requirements for Provenance on the Web."] Paul Groth, Yolanda Gil, James Cheney, Simon Miles. International Journal of Digital Curation, 7(1), 2012.
** ''A general overview of provenance''

* [http://www.w3.org/TR/prov-primer/ "A Primer for the PROV Provenance Model."] Yolanda Gil, Simon Miles, Khalid Belhajjame, Helena Deus, Daniel Garijo, Graham Klyne, Paolo Missier, Stian Soiland-Reyes, and Stephan Zednik. Published as a W3C Working Group Note on 30 April 2013.
** ''A brief and practical introduction to the PROV standard for provenance, showing examples of how to represent the provenance record in RDF through a simple notation called Turtle''

* [http://www.iemss.org/sites/iemss2014/papers/iemss2014_submission_384.pdf "Intelligent Workflow Systems and Provenance-Aware Software."] Gil, Y. In Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, 2014.
** ''A brief introduction to workflows for scientists, giving examples and explanations of their benefits''
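
To make the Turtle notation mentioned in the PROV primer entry more concrete, here is a minimal Python sketch that writes a few PROV statements as Turtle text. The <code>prov:</code> terms come from the W3C PROV vocabulary; the <code>ex:</code> namespace, the function name <code>prov_turtle</code>, and the dataset and activity names are placeholders invented for illustration. The sketch also assumes that every output is derived from every input, which may not hold for a real analysis.

```python
PROV_PREFIX = "@prefix prov: <http://www.w3.org/ns/prov#> ."
# Placeholder namespace for illustration only; use your own identifiers.
EX_PREFIX = "@prefix ex: <http://example.org/> ."


def prov_turtle(activity, used, generated):
    """Return Turtle text describing one activity, its inputs, and its outputs."""
    lines = [PROV_PREFIX, EX_PREFIX, ""]
    lines.append(f"ex:{activity} a prov:Activity .")
    for entity in used:
        lines.append(f"ex:{entity} a prov:Entity .")
        lines.append(f"ex:{activity} prov:used ex:{entity} .")
    for entity in generated:
        lines.append(f"ex:{entity} a prov:Entity .")
        lines.append(f"ex:{entity} prov:wasGeneratedBy ex:{activity} .")
        # Assumption: each output is derived from every input of the activity.
        for source in used:
            lines.append(f"ex:{entity} prov:wasDerivedFrom ex:{source} .")
    return "\n".join(lines)


# Hypothetical step: a regridding run that turns raw data into gridded data.
record = prov_turtle("regrid", ["rawTemps"], ["griddedTemps"])
print(record)
```

The printed text can be saved as a <code>.ttl</code> file and published alongside the paper. A real record would normally also name the software and the people involved, which this sketch leaves out.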

== What To Do ==

We described many options in the training. Here is a sketch of the most common approach:
# At the very minimum, describe the workflow in the text (a "Methods" section) or in an appendix
#* Mention the datasets used, the software, and the data flow across the software components
#* Specify unique identifiers for data and software, mention the versions used, and credit all sources
# Develop a workflow sketch and show it in a figure or in an appendix
#* Capture the high-level data flow across components
# To capture the full provenance, specify the formal workflow or provenance record
#* The formal workflow shows all data flow across components, corresponding to the detailed command-line invocations and parameter values used
#* Options:
#*# Describe it as a graph where the nodes are computations and the links show data and parameters
#*# Use the PROV provenance standard (start with a result and trace back how it was generated)
#*# Use a workflow system (e.g. [http://www.wings-workflows.org WINGS]) to create the data flow graph
#* Publish the formal workflow or provenance record, and assign a unique identifier
#** Cite it in the paper
#** Show the provenance graph
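
The graph option and the "start with a result and trace back" idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the <code>Step</code> class, the <code>trace_back</code> function, and every dataset, software, and parameter name below are hypothetical, and a real record would use the actual identifiers and versions from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One computation (a graph node) with the data and parameters on its links."""
    name: str               # label of the computation
    software: str           # software name and version used
    inputs: tuple           # identifiers of the datasets consumed
    outputs: tuple          # identifiers of the datasets produced
    parameters: tuple = ()  # (name, value) pairs from the invocation


def trace_back(result, steps):
    """Collect the steps that contributed to `result`, starting from the result."""
    produced_by = {out: step for step in steps for out in step.outputs}
    lineage, pending, seen = [], [result], set()
    while pending:
        data_id = pending.pop()
        step = produced_by.get(data_id)
        if step is None or step.name in seen:
            continue  # an original input dataset, or a step already recorded
        seen.add(step.name)
        lineage.append(step)
        pending.extend(step.inputs)
    return lineage


# Hypothetical workflow; all identifiers and versions are made up.
STEPS = [
    Step("regrid", "regrid-tool 1.2", ("example:raw-temps-v3",),
         ("example:gridded-temps",), (("resolution", "0.5deg"),)),
    Step("plot", "plot-tool 0.9", ("example:gridded-temps",),
         ("example:figure-3",)),
    Step("unrelated", "other-tool 2.0", ("example:raw-precip",),
         ("example:precip-summary",)),
]

lineage = trace_back("example:figure-3", STEPS)
for step in lineage:
    params = ", ".join(f"{k}={v}" for k, v in step.parameters) or "none"
    print(f"{step.name} [{step.software}] inputs={list(step.inputs)} params={params}")
```

Tracing back from the figure lists only the steps that produced it and skips the unrelated one. The same listing can serve as a starting point for the "Methods" text or for a workflow sketch.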

=== Using the WINGS Workflow System to Document Provenance ===

Documentation on how to use the [http://www.wings-workflows.org/ WINGS workflow system]:
* [http://www.wings-workflows.org/tutorial A tutorial]
* [http://www.wings-workflows.org/node/43 Introductory papers]
* [http://www.wings-workflows.org/wings-portal/ A portal where you can get an account and try it out]
* [http://www.wings-workflows.org/download-wings-portal Downloading the code to set up on a local machine]

Some examples of workflows created with WINGS for GPF papers:
* [[Document_provenance_of_results_by_Kyo_Lee | Kyo's provenance]]
 
<!-- Add any wiki Text above this Line -->
<!-- Do NOT Edit below this Line -->
Expertise=Open_science|
Owner=Yolanda_Gil|
Progress=40|
StartDate=2015-03-07|
TargetDate=2015-04-04|
Type=Low}}

Latest revision as of 22:28, 16 April 2015

