Difference between revisions of "Document the provenance of the results"
From Geoscience Paper of the Future
(Set PropertyValue: TargetDate = 2015-03-06) |
(Set PropertyValue: TargetDate = 2015-04-04) |
||
| (16 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
[[Category:Task]] | [[Category:Task]] | ||
| + | |||
| + | == What This Task Involves == | ||
| + | |||
| + | The training session and training materials indicate how to: | ||
| + | |||
| + | # Capture the provenance of the results in a paper | ||
| + | # Develop a workflow sketch, a formal workflow, or a provenance record that represents the provenance of the results to varying degrees of accuracy | ||
| + | # Publish the provenance and make it part of a publication | ||
| + | |||
| + | == Training Materials == | ||
| + | |||
| + | This training session was held on March 6, 2015: | ||
| + | |||
| + | * '''[https://www.dropbox.com/s/dlbpkdjque54oni/GPF-Provenance-6March2015.pdf?dl=0 Presentation]''' | ||
| + | |||
| + | === Suggested Readings === | ||
| + | |||
| + | * [http://ijdc.net/index.php/ijdc/article/view/203 “Requirements for Provenance on the Web.”] Paul Groth, Yolanda Gil, James Cheney, Simon Miles. International Journal of Digital Curation, 7(1), 2012. | ||
| + | ** ''A general overview of provenance'' | ||
| + | |||
| + | * [http://www.w3.org/TR/prov-primer/ "A Primer for the PROV Provenance Model."] Yolanda Gil, Simon Miles, Khalid Belhajjame, Helena Deus, Daniel Garijo, Graham Klyne, Paolo Missier, Stian Soiland-Reyes, and Stephan Zednik. Published as a W3C Working Group Note on 30 April 2013. | ||
| + | ** ''A brief and practical introduction to the PROV standard for provenance, showing examples of how to represent the provenance record in RDF through a simple notation called Turtle'' | ||
| + | |||
| + | * [http://www.iemss.org/sites/iemss2014/papers/iemss2014_submission_384.pdf "Intelligent Workflow Systems and Provenance-Aware Software."] Gil, Y. In Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, 2014. | ||
| + | ** ''A brief introduction to workflows for scientists, giving examples and explanations of their benefits'' | ||
| + | |||
| + | == What To Do == | ||
| + | |||
| + | We described many options in the training. Here is a sketch of the most common approach: | ||
| + | # At the very minimum, describe the workflow in the text (a "Methods" section) or in an appendix | ||
| + | #* Mention the datasets used, the software, and the data flow across the software components | ||
| + | #* Specify unique identifiers for data and software, state the version used, and credit all sources | ||
| + | # Develop a workflow sketch and show it in a figure or in an appendix | ||
| + | #* Capture high-level dataflow across components | ||
| + | # To capture the full provenance, specify the formal workflow or provenance record | ||
| + | #* The formal workflow shows all data flow across components, corresponding to the detailed command-line invocations and parameter values used | ||
| + | #* Options: | ||
| + | #*# Describe it as a graph where the nodes are computations and the links show data and parameters | ||
| + | #*# Use the PROV provenance standard (start with a result and trace back how it was generated) | ||
| + | #*# Use a workflow system (e.g. [http://www.wings-workflows.org WINGS]) to create the data flow graph | ||
| + | #* Publish the formal workflow or provenance record, and assign a unique identifier | ||
| + | #** Cite it in the paper | ||
| + | #** Show the provenance graph | ||
| + | |||
| + | === Using the WINGS Workflow System to Document Provenance === | ||
| + | |||
| + | Documentation on how to use the [http://www.wings-workflows.org/ WINGS workflow system]: | ||
| + | * [http://www.wings-workflows.org/tutorial A tutorial] | ||
| + | * [http://www.wings-workflows.org/node/43 Introductory papers] | ||
| + | * [http://www.wings-workflows.org/wings-portal/ A portal where you can get an account and try it out] | ||
| + | * [http://www.wings-workflows.org/download-wings-portal Code download for setting up WINGS on a local machine] | ||
| + | |||
| + | Some examples of workflows created with WINGS for GPF papers: | ||
| + | * [[Document_provenance_of_results_by_Kyo_Lee | Kyo's provenance]] | ||
| + | |||
| + | |||
<!-- Add any wiki Text above this Line --> | <!-- Add any wiki Text above this Line --> | ||
<!-- Do NOT Edit below this Line --> | <!-- Do NOT Edit below this Line --> | ||
{{#set: | {{#set: | ||
| − | Progress= | + | Expertise=Open_science| |
| − | StartDate=2015- | + | Owner=Yolanda_Gil| |
| − | TargetDate=2015- | + | Progress=40| |
| + | StartDate=2015-03-07| | ||
| + | TargetDate=2015-04-04| | ||
Type=Low}} | Type=Low}} | ||
Latest revision as of 05:28, 17 April 2015
What This Task Involves
The training session and training materials indicate how to:
- Capture the provenance of the results in a paper
- Develop a workflow sketch, a formal workflow, or a provenance record that represents the provenance of the results to varying degrees of accuracy
- Publish the provenance and make it part of a publication
Training Materials
This training session was held on March 6, 2015:
- Presentation
Suggested Readings
- “Requirements for Provenance on the Web.” Paul Groth, Yolanda Gil, James Cheney, Simon Miles. International Journal of Digital Curation, 7(1), 2012.
- A general overview of provenance
- "A Primer for the PROV Provenance Model." Yolanda Gil, Simon Miles, Khalid Belhajjame, Helena Deus, Daniel Garijo, Graham Klyne, Paolo Missier, Stian Soiland-Reyes, and Stephan Zednik. Published as a W3C Working Group Note on 30 April 2013.
- A brief and practical introduction to the PROV standard for provenance, showing examples of how to represent the provenance record in RDF through a simple notation called Turtle
- "Intelligent Workflow Systems and Provenance-Aware Software." Gil, Y. In Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, 2014.
- A brief introduction to workflows for scientists, giving examples and explanations of their benefits
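The PROV primer listed above expresses provenance records in Turtle. As a rough illustration of what such a record can look like, the following Python sketch writes a few PROV statements as Turtle text. The `prov:` terms (`prov:Entity`, `prov:Activity`, `prov:used`, `prov:wasGeneratedBy`) come from the PROV standard; the `ex:` namespace, the function name `to_turtle`, and the names `rawData`, `runModel`, and `figure1` are hypothetical placeholders, not taken from any GPF paper.

```python
PROV_NS = "http://www.w3.org/ns/prov#"
EX_NS = "http://example.org/"  # hypothetical namespace for this sketch


def to_turtle(entities, activities, used, generated):
    """Serialize a tiny provenance record as PROV statements in Turtle."""
    lines = [
        f"@prefix prov: <{PROV_NS}> .",
        f"@prefix ex: <{EX_NS}> .",
        "",
    ]
    for name in entities:
        lines.append(f"ex:{name} a prov:Entity .")
    for name in activities:
        lines.append(f"ex:{name} a prov:Activity .")
    # Each pair is (activity, entity the activity consumed).
    for activity, entity in used:
        lines.append(f"ex:{activity} prov:used ex:{entity} .")
    # Each pair is (entity, activity that produced it).
    for entity, activity in generated:
        lines.append(f"ex:{entity} prov:wasGeneratedBy ex:{activity} .")
    return "\n".join(lines)


# Hypothetical record: one model run turns raw data into a figure.
turtle = to_turtle(
    entities=["rawData", "figure1"],
    activities=["runModel"],
    used=[("runModel", "rawData")],
    generated=[("figure1", "runModel")],
)
print(turtle)
```

Building the text by hand keeps the sketch free of dependencies; for a real record, an RDF library would be a more robust choice.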
What To Do
We described many options in the training. Here is a sketch of the most common approach:
- At the very minimum, describe the workflow in the text (a "Methods" section) or in an appendix
- Mention the datasets used, the software, and the data flow across the software components
- Specify unique identifiers for data and software, state the version used, and credit all sources
- Develop a workflow sketch and show it in a figure or in an appendix
- Capture high-level dataflow across components
- To capture the full provenance, specify the formal workflow or provenance record
- The formal workflow shows all data flow across components, corresponding to the detailed command-line invocations and parameter values used
- Options:
- Describe it as a graph where the nodes are computations and the links show data and parameters
- Use the PROV provenance standard (start with a result and trace back how it was generated)
- Use a workflow system (e.g. WINGS) to create the data flow graph
- Publish the formal workflow or provenance record, and assign a unique identifier
- Cite it in the paper
- Show the provenance graph
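The graph option above (nodes are computations, links are data and parameters) and the trace-back idea (start with a result and follow how it was generated) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the WINGS data model or the PROV standard itself: the `Step` class, the `trace_back` function, and all file names and commands are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One computation node: a command invocation with its data links."""
    name: str
    command: str
    inputs: list
    outputs: list
    parameters: dict = field(default_factory=dict)


def trace_back(result, steps):
    """Return the steps that contributed to `result`, earliest first."""
    produced_by = {out: step for step in steps for out in step.outputs}
    ordered, seen = [], set()

    def visit(item):
        step = produced_by.get(item)
        if step is None or step.name in seen:
            return  # a raw input, or a step that was already visited
        seen.add(step.name)
        for inp in step.inputs:
            visit(inp)
        ordered.append(step)  # appended only after everything it depends on

    visit(result)
    return ordered


# Hypothetical workflow: the file names and commands are placeholders.
steps = [
    Step("clean", "clean.py --drop-missing", ["raw.csv"], ["clean.csv"],
         {"drop_missing": True}),
    Step("fit", "fit_model.py --degree 2", ["clean.csv"], ["model.json"],
         {"degree": 2}),
    Step("plot", "plot.py", ["model.json", "clean.csv"], ["figure1.png"]),
    Step("unused", "other.py", ["raw.csv"], ["other.csv"]),
]

lineage = trace_back("figure1.png", steps)
for step in lineage:
    print(step.name, step.command, step.inputs, "->", step.outputs)
```

Recording the command and parameter values on each node is what separates a formal workflow from a high-level sketch: the lineage of a figure can then be listed, and in principle rerun, step by step.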
Using the WINGS Workflow System to Document Provenance
Documentation on how to use the WINGS workflow system:
- A tutorial
- Introductory papers
- A portal where you can get an account and try it out
- Code download for setting up WINGS on a local machine
Some examples of workflows created with WINGS for GPF papers:
- Kyo's provenance