https://www.organicdatascience.org/gpf/api.php?action=feedcontributions&user=Mimi&feedformat=atomGeoscience Paper of the Future - User contributions [en]2024-03-29T13:05:11ZUser contributionsMediaWiki 1.24.1https://www.organicdatascience.org/gpf/index.php?title=Plan_timeline_for_all_submissions&diff=12176Plan timeline for all submissions2015-08-19T17:36:06Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br />
'''[[Main_Page#Roster_of_GeoSoft_GPF_Papers | Details about the papers are here]]'''<br />
<br />
Submissions will be staggered, probably with two target dates:<br />
<br />
* Submissions targeting July 10: David, Fulweiler, Pope, Tzeng, Yu, and the joint paper.<br />
* Submissions targeting August 31: Goodall/Essawy, Lee, Mills, Oh, Villamizar, and probably also Karlstrom and Pierce.<br />
<br />
Target submission dates and peer review assignments are below. In bold are target dates confirmed by authors (please update as needed). Check with your peer reviewer a few days before you finish your draft to give them some warning that your paper is coming.<br />
<br />
{| class="wikitable" style="color:black; background-color:#ffffcc;" cellpadding="10"<br />
|style="width: 20%" |'''Name'''<br />
|style="width: 20%" |'''First Full Draft'''<br />
|style="width: 20%" |'''Peer Reviewer'''<br />
|style="width: 20%" |'''Peer Review Completed'''<br />
|style="width: 20%" |'''Final Draft'''<br />
|-<br />
| Cedric David<br />
| June 20<br />
| Villamizar<br />
| June 25<br />
| '''June 30'''<br />
|-<br />
| Demir et al<br />
| July 15<br />
| Yu<br />
| July 22<br />
| '''July 31'''<br />
|-<br />
| Wally Fulweiler et al<br />
| June 20<br />
| Mills<br />
| June 25<br />
| '''June 30'''<br />
|-<br />
| Bakinam Essawy, Jon Goodall et al <br />
| July 20<br />
| Pierce<br />
| July 25<br />
| '''July 31'''<br />
|-<br />
|Leif Karlstrom & Lay Kuan Loh<br />
| June 20<br />
| Pope<br />
| June 25<br />
| '''June 30'''<br />
|-<br />
| Kyo Lee et al<br />
| Aug 20<br />
| Oh<br />
| Aug 25<br />
| '''Aug 31'''<br />
|-<br />
| Heath Mills et al<br />
| Aug 20<br />
| Fulweiler<br />
| Aug 25<br />
| '''Aug 31'''<br />
|-<br />
| Ji-Hyun Oh<br />
| Aug 20<br />
| Tzeng<br />
| Aug 25<br />
| '''Aug 31'''<br />
|-<br />
| Suzanne Pierce et al<br />
| June 20<br />
| Essawy<br />
| June 25<br />
| June 30<br />
|-<br />
| Allen Pope<br />
| June 23<br />
| Karlstrom<br />
| June 27<br />
| '''July 3'''<br />
|-<br />
| Mimi Tzeng et al<br />
| Aug 20<br />
| Lee<br />
| Aug 25<br />
| '''Aug 31'''<br />
|-<br />
| Sandra Villamizar et al<br />
| '''June 30'''<br />
| David<br />
| '''July 15'''<br />
| '''July 31'''<br />
|-<br />
| Xuan Yu et al<br />
| June 25<br />
| Demir<br />
| June 30<br />
| '''July 5'''<br />
|}<br />
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Owner=Yolanda_Gil|<br />
Progress=90|<br />
StartDate=2015-06-05|<br />
TargetDate=2015-07-31|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_software_by_specifying_metadata_by_Mimi_Tzeng&diff=12172Document software by specifying metadata by Mimi Tzeng2015-07-24T00:31:09Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br />
Tried out the GeoSoft Portal to document [http://www.geosoft-earthcube.org/portal/#browse/Software-uruml4oqtqp2 mooring_all.pl]. Information entered so far is preliminary. I assume that I will need to write metadata for all of the following scripts:<br />
<br />
<ol><br />
<li>mooring_all.pl</li><br />
<li>FindMoorEnds.m</li><br />
<li>moorburst.m</li><br />
<li>MOORprocess_all.m</li><br />
<li>FindADCPendpoints.m</li><br />
<li>stickplot.m (this one isn't actually mine; will have to figure out where I got it from)</li><br />
<li>MoorADCP.m</li><br />
</ol><br />
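As a sketch of the kind of metadata to collect for each script, a minimal machine-readable record might look like the following. This is illustrative only: the field names are hypothetical and are not the actual OntoSoft/GeoSoft schema, and the description strings are my own summary.

```python
import json

# Illustrative metadata record for one processing script.
# Field names are hypothetical, NOT the actual OntoSoft/GeoSoft schema.
record = {
    "name": "mooring_all.pl",
    "language": "Perl",
    "purpose": "Combine preliminary sensor output files into one dataset",
    "inputs": ["sensor export files", "timestamps.txt"],
    "outputs": ["combined data file for Matlab processing"],
    "author": "Mimi Tzeng",
}

# A record like this could be saved alongside the script and later
# merged into the zip file by hand or by a small collection script.
print(json.dumps(record, indent=2))
```

Having records in a plain format like this would also sidestep the portal's save bugs, since the files could be edited locally and uploaded when the interface cooperates.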
<br />
Update 7/23/2015: I decided that I would only write metadata for the three main scripts: mooring_all.pl, MOORprocess_all.m, and MoorADCP.m. I used the OntoSoft portal. Unfortunately, the interface has some serious bugs when it comes to saving entered information, so when the time comes to collect copies of the completed metadata for my future zip file, I will probably have to enter the information manually.<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Progress=90}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Prepare_the_article_for_publication_by_Mimi_Tzeng&diff=12170Prepare the article for publication by Mimi Tzeng2015-07-24T00:28:50Z<p>Mimi: Set PropertyValue: Progress = 75</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Prepare the article for publication]]<br/><br/><br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=75|<br />
StartDate=2015-05-16|<br />
TargetDate=2015-05-29|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Make_data_accessible_by_Mimi_Tzeng&diff=12168Make data accessible by Mimi Tzeng2015-07-14T19:58:30Z<p>Mimi: Set PropertyValue: Progress = 100</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Make data accessible]]<br/><br/><br />
<br />
So far I've signed up for FigShare and obtained explicit permission from the PI to upload the data to it. I am now waiting for the PI to reinstall Matlab so I can rerun the processing.<br />
<br />
As I recall, we are supposed to also make available all of the original raw data and intermediate files. There are many of these; should I also include a README.txt in the ultimate zip file that explains what all of these are?<br />
<br />
Answer from telecon: include just the intermediate files that might be useful to someone else, such as the *.mat files. No need to include every single raw and intermediate file for this task. <br />
<br />
The question then becomes: which intermediate files should be included? I think I'll probably omit most of the pre-processing files and start with the ones that go into the perl script. Then I'll also skip a lot of the intermediate files that come out of Matlab and just go with the combined figure PDFs, especially for the ADCP where there are a lot.<br />
<br />
Also, new plan: going to use Zenodo instead of FigShare because it's run by CERN. The organization does matter for lending weight to legitimacy; CERN is a well-known, well-established science research institution, and FigShare seems to be a random startup... <br />
<br />
<hr noshade size=2><br />
<br />
'''Files to include:'''<br />
<br />
# From MOOR: the initial data files after preliminary processing through the proprietary software that came with the sensors, before the perl script<br />
# From MOOR: timestamps.txt<br />
# From MOOR: the Matlab data file that contains all the variables, generated by MOORprocess_all.m<br />
# From MOOR: everything generated by MOORprocess_all.m (after PDFs have been concatenated)<br />
# From ADCP: the Matlab data file exported from WinADCP<br />
# From ADCP: endpoints.txt<br />
# From ADCP: the Matlab data file generated by MoorADCP.m<br />
# From ADCP: everything generated by MoorADCP.m (after PDFs have been concatenated)<br />
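The bundling step for the list above could be scripted. Here is a hedged Python sketch: the directory layout and filenames are placeholders (only timestamps.txt and endpoints.txt appear in the list by name), and the README text would be written separately.

```python
import zipfile
from pathlib import Path

# Hypothetical layout: the selected MOOR and ADCP files are assumed to
# live in these folders; the .mat filenames are placeholders.
files_to_include = [
    "MOOR/timestamps.txt",
    "MOOR/moor_variables.mat",
    "ADCP/endpoints.txt",
    "ADCP/adcp_variables.mat",
]

def build_archive(archive_name, files, readme_text):
    """Write a README plus the selected data files into one zip."""
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.txt", readme_text)  # explains what each file is
        for f in files:
            if Path(f).exists():  # skip files not present in this checkout
                zf.write(f)
    return archive_name
```

Keeping the selection in one list like this makes it easy to rerun the packaging after the Matlab reprocessing, instead of reassembling the zip by hand.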
<br />
7/1/2015 update: The data are all accessible on Zenodo now. Minor issue: four of the text files are not formatted correctly, and I'll need to track down in the code why it's not outputting the way it's supposed to. I don't know if I'm going to get to this or not, or if I will just put a note somewhere mentioning it.<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=100|<br />
StartDate=2015-02-21|<br />
TargetDate=2015-03-06|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_domain_characteristics_by_Mimi_Tzeng&diff=12162Document domain characteristics by Mimi Tzeng2015-07-01T21:16:08Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Document domain characteristics]]<br/><br/><br />
<br />
As I understand this task, we are to take all of the variables used in the project and find the corresponding variable names from the [http://csdms.colorado.edu/wiki/CSN_Searchable_List CSDMS list]. So to start, here are lists of all of my variables. I'm just going to directly copy-paste from my scripts (MOORprocess_all.m and MoorADCP.m) for starters. I imagine the publication-ready version of this will be in three columns: the "common" name of the variable as oceanographers would understand it, the name of the variable in my code, and the name of the variable in the CSDMS.<br />
<br />
<pre><br />
YSI<br />
<ul><br />
ctddepth{3} = YSI(:,6);<br />
ctdsal{3} = YSI(:,7);<br />
ctdtemp{3} = YSI(:,8);<br />
ctdpress{3} = YSI(:,9);<br />
%ctdpress{3}(1:length(ctdtimestamp{3})) = NaN;<br />
ctdcond{3} = YSI(:,10);<br />
ctdspcond{3} = YSI(:,11);<br />
ctdoxy{3} = YSI(:,13);<br />
ctdoxysat{3} = YSI(:,14);<br />
%ctdchla{3} = YSI(:,?);<br />
%ctdturb{3} = YSI(:,15);<br />
ctdturb{3}(1:length(ctdtimestamp{3})) = NaN; % a kludge for nonexistent variables in an existing package<br />
ctdtds{3} = YSI(:,15);<br />
</ul><br />
<br />
CTDs<br />
<ul><br />
ctdtemp{r} = CTD(:,1);<br />
ctdcond{r} = CTD(:,2);<br />
ctdpress{r} = CTD(:,3);<br />
ctdsal{r} = CTD(:,4);<br />
ctdsound{r} = CTD(:,5);<br />
ctdjday = CTD(:,6); %coming in, jday is Jan 1 = Day 1<br />
ctddens{r} = CTD(:,8);<br />
ctddepth{r} = CTD(:,9);<br />
ctdspcond{r} = CTD(:,10);<br />
</ul><br />
<br />
Thermistors<br />
<ul><br />
tempT{r} = dataT(:,1);<br />
pressT = dataT(:,2);<br />
depthT = cnv(:,2);<br />
</ul><br />
<br />
CTD cast<br />
<ul><br />
depth=datablock(:,3);<br />
sal=datablock(:,4);<br />
temp=datablock(:,5);<br />
dens=datablock(:,6);<br />
press=datablock(:,7);<br />
cond=datablock(:,8);<br />
spcond=datablock(:,9);<br />
oxy=datablock(:,10);<br />
oxysat=datablock(:,11);<br />
ph=datablock(:,12);<br />
chla=datablock(:,13);<br />
par=datablock(:,14);<br />
%cdom=datablock(:,15);<br />
cdom(1:length(scan)) = NaN;<br />
batt=datablock(:,15);<br />
xmiss=datablock(:,16);<br />
</ul><br />
<br />
ADCP<br />
<ul><br />
Pitch = AnP100thDeg *.01;<br />
Roll = AnR100thDeg *.01;<br />
East_u = SerEmmpersec * 0.1; % convert mm per sec to cm per sec<br />
North_v = SerNmmpersec * 0.1;<br />
Vert_w = SerVmmpersec * 0.1;<br />
VelError = SerErmmpersec * 0.1;<br />
Mag = SerMagmmpersec * 0.1;<br />
DIR = SerDir10thDeg * 0.1;<br />
</ul><br />
</pre><br />
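The ADCP block above is just rescaling the raw integer fields exported by WinADCP. As a rough Python equivalent of those Matlab lines (variable names mirror the WinADCP export; only a subset of the fields is shown, and the sample values in the usage note are made up):

```python
# WinADCP exports pitch/roll in hundredths of a degree, velocities in
# mm/s, and direction in tenths of a degree; rescale to degrees and cm/s.
def convert_adcp(raw):
    return {
        "pitch_deg":    raw["AnP100thDeg"] * 0.01,
        "roll_deg":     raw["AnR100thDeg"] * 0.01,
        "east_u_cmps":  raw["SerEmmpersec"] * 0.1,  # mm/s -> cm/s
        "north_v_cmps": raw["SerNmmpersec"] * 0.1,
        "vert_w_cmps":  raw["SerVmmpersec"] * 0.1,
        "dir_deg":      raw["SerDir10thDeg"] * 0.1, # tenths of deg -> deg
    }
```

For example, a raw pitch of 150 hundredths of a degree converts to 1.5 degrees, and a raw east velocity of 123 mm/s converts to 12.3 cm/s.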
<br />
Update: I've started working on matching them up with the CSDMS variables via a [https://docs.google.com/spreadsheets/d/1IUTSWAHs2Rr5Fuuo0POPF1dQcB_Uhty4p1ZFURMpVE8/edit?usp=sharing Google spreadsheet]. It turns out that I don't know what specifically some of my variables are (conductivity, for example: electrical or thermal?). I also don't see anything in there about light attenuation (PAR, turbidity). There's chlorophyll, but apparently only as measured via diatoms and not all phytoplankton. Dissolved oxygen seems to be missing, as are nutrients in general (nitrogen, phosphates), as is anything involving particulates (colored dissolved organic matter, gelbstoff, total suspended solids, etc.). I might be looking in the wrong part of the list. There also seems to be both a sea_water_flow__speed and a sea_water__flow_speed, and I'm not sure what the difference is.<br />
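The three-column mapping described above could start life as a simple lookup table. A sketch of what the publication-ready version might contain (the CSDMS names shown are my tentative matches, not confirmed against the list, per the uncertainties just noted):

```python
# Columns: common oceanographic name, name in my code, candidate CSDMS name.
# The CSDMS entries are tentative guesses, not confirmed matches.
variable_map = [
    ("temperature", "ctdtemp",  "sea_water__temperature"),
    ("salinity",    "ctdsal",   "sea_water__salinity"),
    ("pressure",    "ctdpress", "sea_water__pressure"),
    ("depth",       "ctddepth", "sea_water__depth"),
]

def lookup(code_name):
    """Return the (common, CSDMS) pair for a variable name used in my code."""
    for common, code, csdms in variable_map:
        if code == code_name:
            return common, csdms
    return None
```

A table in this form could be exported straight to the spreadsheet, and the unresolved variables (conductivity, PAR, dissolved oxygen, etc.) left with a blank third column until the CSDMS questions are answered.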
<br />
In summary: it sounds like a simple task, but actually involves looking a lot of things up in detail, and can potentially be time-consuming.<br />
<br />
Whatever the final released version of the CSDMS dictionary looks like, each item in the list needs a link explaining what it means. <br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=60|<br />
StartDate=2015-04-18|<br />
TargetDate=2015-05-01|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Make_software_accessible_by_Mimi_Tzeng&diff=12161Make software accessible by Mimi Tzeng2015-07-01T21:14:21Z<p>Mimi: Set PropertyValue: Progress = 100</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Make software accessible]]<br/><br/><br />
<br />
So far I've created an account on GitHub and plan to put the software there when it's ready. The account is located at: https://github.com/KayarBlue/GeoSoftGPF_1<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=100|<br />
StartDate=2015-04-04|<br />
TargetDate=2015-04-17|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Ensure_software_is_usable_by_Mimi_Tzeng&diff=12159Ensure software is usable by Mimi Tzeng2015-07-01T21:13:21Z<p>Mimi: Set PropertyValue: Progress = 100</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Ensure software is usable]]<br/><br/><br />
<br />
<ul><br />
<li>Data to be used: FOCAL mooring from 20110413_20110518</li><br />
<br />
<li>Software for CTDs and Thermistors:<br />
<ul><li>SeaTerm</li><br />
<li>SBE Data Processing</li><br />
<li>EcoWatch (for YSI)</li><br />
<li>perl</li><br />
<li>any text editor</li><br />
<li>Matlab</li><br />
<li>pdfsam (or other PDF concatenator)</li><br />
<li>MS Word</li></ul></li><br />
<br />
<li>Software for ADCP:<br />
<ul><li>WinADCP</li><br />
<li>Matlab</li><br />
<li>any text editor</li><br />
<li>pdfsam (or other PDF concatenator)</li><br />
<li>MS Word</li></ul></li><br />
</ul><br />
<br />
Software seems to come in two forms:<br><br />
A) the off-the-shelf professionally-produced kind that most people think of upon seeing the word "software"<br><br />
B) the highly specific custom-made scripts that automate particular tasks for a particular lab or other working group, generally intended for internal use.<br />
<br><br><br />
The list above is for software of type A.<br><br />
Software of type B for this project takes the form of several Matlab and perl scripts. These are gathered together in a folder; they don't have names that will mean anything to anyone else, so they aren't listed here. <br />
<br><br><br />
Task completion: I am waiting for Matlab to be reinstalled on the PI's lab computer so that I can rerun the entire processing to make sure that everything still works as intended. I am 95% confident that all will be fine, but won't claim that the "ensure software is usable" task is 100% complete until I can do this.<br />
<br />
Update 20 Mar 2015: Today the PI got around to putting Matlab on his lab computer. I'm now much less confident that all will be fine, because the version of Matlab I used originally was (probably) from 2010, and the current version on the PI's lab computer is 2015. Matlab has an annoying habit of changing things around between versions so that all previous scripts break in really stupid and time-consuming ways. I've already found two subtle differences (one of them will require a minor change in my perl script), and will have to look through the rest in more detail. <br />
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=100|<br />
StartDate=2015-02-07|<br />
TargetDate=2015-02-20|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=File:20150320_DocumentProvenance-MWTc.png&diff=12158File:20150320 DocumentProvenance-MWTc.png2015-06-27T01:37:53Z<p>Mimi: Mimi uploaded a new version of File:20150320 DocumentProvenance-MWTc.png</p>
<hr />
<div>third draft of workflow diagram</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Prepare_the_article_for_publication_by_Mimi_Tzeng&diff=12000Prepare the article for publication by Mimi Tzeng2015-06-03T22:42:08Z<p>Mimi: Set PropertyValue: Progress = 10</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Prepare the article for publication]]<br/><br/><br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=10|<br />
StartDate=2015-05-16|<br />
TargetDate=2015-05-29|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_domain_characteristics_by_Mimi_Tzeng&diff=11999Document domain characteristics by Mimi Tzeng2015-06-03T19:16:43Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Document domain characteristics]]<br/><br/><br />
<br />
As I understand this task, we are to take all of the variables used in the project and find the corresponding variable names from the [http://csdms.colorado.edu/wiki/CSN_Searchable_List CSDMS list]. So to start, here are lists of all of my variables. I'm just going to directly copy-paste from my scripts (MOORprocess_all.m and MoorADCP.m) for starters. I imagine the publication-ready version of this will be in three columns: the "common" name of the variable as oceanographers would understand it, the name of the variable in my code, and the name of the variable in the CSDMS.<br />
<br />
<pre><br />
YSI<br />
<ul><br />
ctddepth{3} = YSI(:,6);<br />
ctdsal{3} = YSI(:,7);<br />
ctdtemp{3} = YSI(:,8);<br />
ctdpress{3} = YSI(:,9);<br />
%ctdpress{3}(1:length(ctdtimestamp{3})) = NaN;<br />
ctdcond{3} = YSI(:,10);<br />
ctdspcond{3} = YSI(:,11);<br />
ctdoxy{3} = YSI(:,13);<br />
ctdoxysat{3} = YSI(:,14);<br />
%ctdchla{3} = YSI(:,?);<br />
%ctdturb{3} = YSI(:,15);<br />
ctdturb{3}(1:length(ctdtimestamp{3})) = NaN; % a kludge for nonexistent variables in an existing package<br />
ctdtds{3} = YSI(:,15);<br />
</ul><br />
<br />
CTDs<br />
<ul><br />
ctdtemp{r} = CTD(:,1);<br />
ctdcond{r} = CTD(:,2);<br />
ctdpress{r} = CTD(:,3);<br />
ctdsal{r} = CTD(:,4);<br />
ctdsound{r} = CTD(:,5);<br />
ctdjday = CTD(:,6); %coming in, jday is Jan 1 = Day 1<br />
ctddens{r} = CTD(:,8);<br />
ctddepth{r} = CTD(:,9);<br />
ctdspcond{r} = CTD(:,10);<br />
</ul><br />
<br />
Thermistors<br />
<ul><br />
tempT{r} = dataT(:,1);<br />
pressT = dataT(:,2);<br />
depthT = cnv(:,2);<br />
</ul><br />
<br />
CTD cast<br />
<ul><br />
depth=datablock(:,3);<br />
sal=datablock(:,4);<br />
temp=datablock(:,5);<br />
dens=datablock(:,6);<br />
press=datablock(:,7);<br />
cond=datablock(:,8);<br />
spcond=datablock(:,9);<br />
oxy=datablock(:,10);<br />
oxysat=datablock(:,11);<br />
ph=datablock(:,12);<br />
chla=datablock(:,13);<br />
par=datablock(:,14);<br />
%cdom=datablock(:,15);<br />
cdom(1:length(scan)) = NaN;<br />
batt=datablock(:,15);<br />
xmiss=datablock(:,16);<br />
</ul><br />
<br />
ADCP<br />
<ul><br />
Pitch = AnP100thDeg *.01;<br />
Roll = AnR100thDeg *.01;<br />
East_u = SerEmmpersec * 0.1; % convert mm per sec to cm per sec<br />
North_v = SerNmmpersec * 0.1;<br />
Vert_w = SerVmmpersec * 0.1;<br />
VelError = SerErmmpersec * 0.1;<br />
Mag = SerMagmmpersec * 0.1;<br />
DIR = SerDir10thDeg * 0.1;<br />
</ul><br />
</pre><br />
<br />
Update: I've started working on matching them up with the CSDMS variables via a [https://docs.google.com/spreadsheets/d/1IUTSWAHs2Rr5Fuuo0POPF1dQcB_Uhty4p1ZFURMpVE8/edit?usp=sharing Google spreadsheet]. It turns out that I don't know what specifically some of my variables are (conductivity for example: electrical or thermal?). I also don't see anything in there about light attenuation (PAR, turbidity). There's chlorophyll, but apparently only as measured via diatoms and not all phytoplankton. Dissolved oxygen seems to be missing, as are nutrients in general (nitrogen, phosphates) as is anything involving particulates (colored dissolved organic matter, gelbstoff, total suspended solids, etc). I might be looking in the wrong part of the list. Then there seems to be both a sea_water_flow__speed and a sea_water__flow_speed, and I'm not sure what the difference is.<br />
<br />
In summary: it sounds like a simple task, but actually involves looking a lot of things up in detail, and can potentially be time-consuming.<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=60|<br />
StartDate=2015-04-18|<br />
TargetDate=2015-05-01|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_domain_characteristics_by_Mimi_Tzeng&diff=11998Document domain characteristics by Mimi Tzeng2015-06-03T19:14:54Z<p>Mimi: Set PropertyValue: Progress = 60</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Document domain characteristics]]<br/><br/><br />
<br />
As I understand this task, we are to take all of the variables used in the project and find the corresponding variable names from the [http://csdms.colorado.edu/wiki/CSN_Searchable_List CSDMS list]. So to start, here are lists of all of my variables. I'm just going to directly copy-paste from my scripts (MOORprocess_all.m and MoorADCP.m) for starters. I imagine the publication-ready version of this will be in three columns: the "common" name of the variable as oceanographers would understand it, the name of the variable in my code, and the name of the variable in the CSDMS.<br />
<br />
<pre><br />
YSI<br />
<ul><br />
ctddepth{3} = YSI(:,6);<br />
ctdsal{3} = YSI(:,7);<br />
ctdtemp{3} = YSI(:,8);<br />
ctdpress{3} = YSI(:,9);<br />
%ctdpress{3}(1:length(ctdtimestamp{3})) = NaN;<br />
ctdcond{3} = YSI(:,10);<br />
ctdspcond{3} = YSI(:,11);<br />
ctdoxy{3} = YSI(:,13);<br />
ctdoxysat{3} = YSI(:,14);<br />
%ctdchla{3} = YSI(:,?);<br />
%ctdturb{3} = YSI(:,15);<br />
ctdturb{3}(1:length(ctdtimestamp{3})) = NaN; % a kludge for nonexistent variables in an existing package<br />
ctdtds{3} = YSI(:,15);<br />
</ul><br />
<br />
CTDs<br />
<ul><br />
ctdtemp{r} = CTD(:,1);<br />
ctdcond{r} = CTD(:,2);<br />
ctdpress{r} = CTD(:,3);<br />
ctdsal{r} = CTD(:,4);<br />
ctdsound{r} = CTD(:,5);<br />
ctdjday = CTD(:,6); %coming in, jday is Jan 1 = Day 1<br />
ctddens{r} = CTD(:,8);<br />
ctddepth{r} = CTD(:,9);<br />
ctdspcond{r} = CTD(:,10);<br />
</ul><br />
<br />
Thermistors<br />
<ul><br />
tempT{r} = dataT(:,1);<br />
pressT = dataT(:,2);<br />
depthT = cnv(:,2);<br />
</ul><br />
<br />
CTD cast<br />
<ul><br />
depth=datablock(:,3);<br />
sal=datablock(:,4);<br />
temp=datablock(:,5);<br />
dens=datablock(:,6);<br />
press=datablock(:,7);<br />
cond=datablock(:,8);<br />
spcond=datablock(:,9);<br />
oxy=datablock(:,10);<br />
oxysat=datablock(:,11);<br />
ph=datablock(:,12);<br />
chla=datablock(:,13);<br />
par=datablock(:,14);<br />
%cdom=datablock(:,15);<br />
cdom(1:length(scan)) = NaN;<br />
batt=datablock(:,15);<br />
xmiss=datablock(:,16);<br />
</ul><br />
<br />
ADCP<br />
<ul><br />
Pitch = AnP100thDeg *.01;<br />
Roll = AnR100thDeg *.01;<br />
East_u = SerEmmpersec * 0.1; % convert mm per sec to cm per sec<br />
North_v = SerNmmpersec * 0.1;<br />
Vert_w = SerVmmpersec * 0.1;<br />
VelError = SerErmmpersec * 0.1;<br />
Mag = SerMagmmpersec * 0.1;<br />
DIR = SerDir10thDeg * 0.1;<br />
</ul><br />
</pre><br />
<br />
Update: I've started working on matching them up with the CSDMS variables via a [https://docs.google.com/spreadsheets/d/1IUTSWAHs2Rr5Fuuo0POPF1dQcB_Uhty4p1ZFURMpVE8/edit?usp=sharing Google spreadsheet]. It turns out that I don't know specifically what some of my variables are (conductivity, for example: electrical or thermal?). I also don't see anything in there about light attenuation (PAR, turbidity). There's chlorophyll, but apparently only as measured via diatoms and not all phytoplankton. Dissolved oxygen seems to be missing, as are nutrients in general (nitrogen, phosphates), as is anything involving particulates (colored dissolved organic matter, gelbstoff, total suspended solids, etc.). I might be looking in the wrong part of the list.<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=60|<br />
StartDate=2015-04-18|<br />
TargetDate=2015-05-01|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_provenance_of_results_by_Mimi_Tzeng&diff=11996Document provenance of results by Mimi Tzeng2015-06-03T18:21:25Z<p>Mimi: Set PropertyValue: Progress = 100</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Document the provenance of the results]]<br/><br/><br />
<br />
I am using [https://www.literatureandlatte.com/scapple.php Scapple] to make my workflow diagram. It was intended for diagramming story structure in fiction, but I've found it to be a useful tool for making entity-relationship diagrams (ERDs) for relational databases, and I wanted to try it out for workflows. <br />
<br />
20 Mar 2015: Diagram is also available at Zenodo: http://dx.doi.org/10.5281/zenodo.16243 (but thanks to Suzanne, I now know I can just upload these directly to the wiki!)<br />
<br />
[[File:20150320 DocumentProvenance-MWTc.png]]<br />
<br />
Note from Telecon 3/20: I need to include more information about what specific processing each of the software components did to the data files. Perhaps by adding text to the actual diagram boxes? That would make those boxes very large. Or would it work to give them all numbers, and then make a separate page with notes in outline form?<br />
<br />
Note from Telecon 4/3: use footnotes on a separate page, since the workflow diagram is already pretty complex.<br />
<br />
11 Apr 2015: updated version of workflow diagram. Coming soon: a separate page keyed to the blue letters.<br />
<br />
3 Jun 2015: I've decided that the blue letters will be explained in detail in the main text instead. The only other change that needs to be made to the diagram is to remove the little note at the upper left saying that I need to make a mooring diagram for the paper; that diagram is done now.<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=100|<br />
StartDate=2015-03-07|<br />
TargetDate=2015-03-20|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_software_by_specifying_metadata_by_Mimi_Tzeng&diff=11992Document software by specifying metadata by Mimi Tzeng2015-06-03T17:58:56Z<p>Mimi: Set PropertyValue: Progress = 12</p>
<hr />
<div>[[Category:Task]]<br />
<br />
Tried out the GeoSoft Portal to document [http://www.geosoft-earthcube.org/portal/#browse/Software-uruml4oqtqp2 mooring_all.pl]. Information entered so far is preliminary. I assume that I will need to write metadata for all of the following scripts:<br />
<br />
<ol><br />
<li>mooring_all.pl</li><br />
<li>FindMoorEnds.m</li><br />
<li>moorburst.m</li><br />
<li>MOORprocess_all.m</li><br />
<li>FindADCPendpoints.m</li><br />
<li>stickplot.m (this one isn't actually mine; will have to figure out where I got it from)</li><br />
<li>MoorADCP.m</li><br />
</ol><br />
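As a sketch of what the per-script metadata might contain, the record below is written in Python for illustration; the field names are my own guesses, not the GeoSoft Portal's actual schema, and the purpose text is a placeholder to be filled in per script.<br />
<br />
```python
# Hypothetical per-script metadata record; field names are illustrative guesses,
# not the GeoSoft Portal's actual schema.
def script_metadata(name, language, purpose, inputs, outputs, author="Mimi Tzeng"):
    """Bundle the minimum descriptive metadata for one processing script."""
    return {
        "name": name,
        "language": language,
        "purpose": purpose,   # placeholder: to be written per script
        "inputs": inputs,     # files or variables the script consumes
        "outputs": outputs,   # files or variables the script produces
        "author": author,
    }

record = script_metadata(
    "mooring_all.pl", "Perl",
    "TODO: describe what this script does",
    inputs=["raw instrument files"],
    outputs=["processed files"],
)
```
<br />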
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:|<br />
Progress=12}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Make_software_accessible_by_Mimi_Tzeng&diff=11990Make software accessible by Mimi Tzeng2015-06-03T17:51:04Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Make software accessible]]<br/><br/><br />
<br />
So far I've created an account on GitHub and plan to put the software there when it's ready. The repository is located at: https://github.com/KayarBlue/GeoSoftGPF_1<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=5|<br />
StartDate=2015-04-04|<br />
TargetDate=2015-04-17|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Develop_proposal_for_special_issue&diff=11932Develop proposal for special issue2015-04-17T21:01:48Z<p>Mimi: /* Tzeng: [Tzeng and Park 2015] */ added Brian Dz to author line</p>
<hr />
<div>[[Category:Task]]<br />
<br />
== Background: Why a Special Issue on Geoscience Papers of the Future? ==<br />
<br />
[[Discuss_what_we_will_consider_a_GPF#The_Vision | Include here our discussion for the vision]]<br />
<br />
=== Motivation: The EarthCube Initiative and the GeoSoft Project ===<br />
<br />
[http://www.geosoft-earthcube.org/about Include here background about EarthCube and GeoSoft from the project web site]<br />
<br />
=== What is a GPF ===<br />
<br />
[[Discuss_what_we_will_consider_a_GPF#What_is_a_Geoscience_Paper_of_the_Future.3F | Include here our discussion of what is a GPF]]<br />
<br />
=== The challenges of creating GPFs ===<br />
<br />
The articles in this special issue will reflect the current best practice for generating a Geoscience Paper of the Future. The authors will discuss the challenges that they have encountered, including limitations and availability of data publishing repositories, difficulties in describing software infrastructure, constraints posed by projects and collaborators on the release of data and software, and open questions about what aspects of the research process should be published.<br />
<br />
=== Related work ===<br />
<br />
[[Discuss_what_we_will_consider_a_GPF#New_Frameworks_to_Create_a_New_Generation_of_Scientific_Articles | Include here the related work we have discussed]]<br />
<br />
== Papers to be included ==<br />
<br />
The papers included will be by invitation only. This is analogous to special issues of journals that are based on a conference or event, where all the papers presented are invited from among the event participants. The Geoscience Paper of the Future activity is of a similar nature, and the papers we will invite come from the participants in that activity. This will make the special issue more appealing, as there will be a consistent structure across the papers.<br />
<br />
For each planned submission, we describe here:<br />
<br />
* '''Authors and affiliations'''<br />
* '''Keywords of research area'''<br />
* '''Tentative title'''<br />
* '''Short abstract'''<br />
* '''Challenge''': this can be ''Reproducibility'' (i.e., documenting and reproducing previously published results), ''Dark Code'' (i.e., describing and sharing code integral to the presented results), ''Sharing Big Data'' (i.e. making available large datasets), and ''Transferability'' (i.e., updating a previously-used method to a new version of software, etc.).<br />
* '''Relationship to other publications''': is the article based on a previously published article? is it new content? If previously published, this also indicates the percentage of new work presented. <br />
* '''Pointer to the wiki page that documents the article'''<br />
* '''Expected submission date'''<br />
<br />
=== [David 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Cedric David]]<br />
* '''Keywords of research area:''' Hydrology, Rivers, Modeling, Testing, Reproducibility. <br />
* '''Tentative title:''' Going beyond triple-checking, allowing for peace of mind in community model development.<br />
* '''Short abstract:''' The development of computer models in the general field of geoscience is often made incrementally over many years. Endeavors that generally start on a single researcher's own machine evolve over time into software that is often much larger than was initially anticipated. Looking back at years of building on their computer code, sometimes without much training in computer science, geoscience software developers can easily experience an overwhelming sense of incompetence when contemplating ways to further community usage of their software. How does one allow others to use their code? How can one foster the survival of their tool? How could one possibly ensure the scientific integrity of ongoing developments, including those made by others? Common issues faced by geoscience developers include selecting a license, learning how to track and document past and ongoing changes, choosing a software repository, and allowing for community development. This paper provides a brief summary of experience with the first three of these steps of software growth, focusing on the almost decade-long code development of a river routing model. The core of this study, however, focuses on reproducing previously published experiments. This step is highly repetitive and can therefore benefit greatly from automation. Additionally, enabling automated software testing can arguably be considered the final step for sustainable software sharing, by allowing the main software developer to let go of a mental block concerning scientific integrity. Creating tools to automatically compare the results of an updated version of a software with those of previous studies can not only save the main developer's own time, but also empower other researchers to check and demonstrate that their potential additions have retained scientific integrity. <br />
* '''Challenge:''' Reproducibility; Sharing Big Data. Ensure that updates to an existing model are able to reproduce a series of simulations published previously.<br />
* '''Relationship to other publications:''' This research is related to past and ongoing development of the Routing Application for Parallel computatIon of Discharge (RAPID). The primary focus of this paper is to allow automated reproducibility of at least the [http://dx.doi.org/10.1175/2011JHM1345.1 first RAPID publication]. The scientific subject of this GPF differs from the article(s) to be reproduced as its focus is on development of automatic testing methods. In that regard, the paper is expected to be 95% new. <br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Cedric_David | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Demir 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Ibrahim Demir]]<br />
* '''Keywords of research area:''' hydrological network, optimization, network representation, database query<br />
* '''Tentative title:''' Analysis and Optimization of Hydrological Network Database Representation Methods for Fast Access and Query in Web-based System<br />
* '''Short abstract:''' Web-based systems allow users to delineate watersheds on interactive map environments using server-side processing. With the increasing resolution of hydrological networks, optimized methods for storing the network representation in databases, and efficient queries and actions on the river network structure, become critical. This paper presents a detailed analysis of widely used methods for representing hydrological networks in relational databases, and benchmarks common queries and modifications on the network structure using these methods. The analysis has been applied to the hydrological network of Iowa, utilizing a 90 m DEM and 600,000 network nodes. The results indicate that the representation methods provide substantial improvements in query times and in storage of the network structure in the database. The suggested method allows watershed delineation tools to run on the client side with desktop-like performance. <br />
* '''Challenge:''' Reproducibility, Transferability; Some of the internal steps to prepare data might require long computation time and different software environments.<br />
* '''Relationship to other publications:''' The article is based on a new study<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Ibrahim_Demir | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== Fulweiler: [Fulweiler, Emery, and Maguire 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Wally Fulweiler | Robinson W. Fulweiler]]<sup>1,2</sup>, Hollie E. Emery<sup>1</sup>, and Timothy J. Maguire<sup>2</sup> (1: Department of Earth and Environment, Boston University; 2: Department of Biology, Boston University.)<br />
* '''Keywords of research area:''' biogeochemistry, climate change, benthic-pelagic coupling, long-term data, reproducibility<br />
* '''Tentative title:''' What can we learn from a decade of directly measured sediment di-nitrogen gas fluxes?<br />
* '''Short abstract:''' Long-term data sets provide unique opportunities to examine temporal variability of important ecosystem processes. Unfortunately, these data sets are rare and curating them is a real challenge. Additionally, it can be difficult to publish them in a timely manner. However, if we wish to make our data available for interested parties (e.g., students, scientists, managers, etc.), then we need to provide mechanisms that allow others to access the data, reproduce the results, and see updates as they become available. Here we will use a long-term data set of directly measured net sediment N2 fluxes to assess how a temperate estuary changes over time. Specifically, we will address how environmental factors alter the balance between sediment denitrification (nitrogen removal) and sediment nitrogen fixation (nitrogen addition). Understanding this balance is essential if we wish to better manage coastal systems and the anthropogenic nitrogen loads they receive. <br />
* '''Challenge:''' Reproducibility and efficient ways to update long-term data. This paper will address how to reproduce the key figures from the paper that was published in Oceanography in 2014. I also want to address how we deal with long-term data sets and the intermittent collection of data. For example, in the paper we build on here, we published nine years' worth of data – a collection of previously published data plus data from additional measurements over seven years. Now I have another year – do I publish another paper? Do I wait another seven years? The former seems too short an interval and the latter too long. But having a way to update the data and the figures would be really powerful, yielding a much more timely and relevant set of information. <br />
* '''Relationship to other publications:''' The majority of these data were previously published (Fulweiler and Heiss 2014), and a small amount of additional previously unpublished data will be included here. The point of this paper is to develop a framework describing how we analyzed the data and to provide code that will allow others to reproduce our results using our data or data that they acquire. Additionally, if possible, we will share these data and have them updated as we collect new data, so that interested parties can see how the system we are studying is changing over time. <br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Wally_Fulweiler | Page]]<br />
* '''Expected submission date:''' June 2015.<br />
<br />
=== Goodall: [Essawy, Goodall, Billah, and Xu 2015] ===<br />
<br />
* '''Authors and affiliations:''' Bakinam T. Essawy, [[Jon Goodall | Jonathan L. Goodall]], Mirza Billah, and Hao Xu, Department of Civil and Environmental Engineering, University of Virginia.<br />
* '''Keywords of research area:''' Hydrology, Automating workflows, Reproducibility, sharing<br />
* '''Tentative title:''' Post-processing Workflows Using Data Grids to Support Hydrologic Modeling<br />
* '''Short abstract:''' Data grids are architectures that allow scientists to access and share large data sets that are geographically distributed on the Internet, but appear to the scientist as a single file management system. Data grids are useful for scientific communities, like hydrology, that rely on multiple resource providers and data resources that are distributed across the Internet. One data grid technology is the integrated Rule-Oriented Data System (iRODS). This paper leverages iRODS and demonstrates how it can be used to access distributed data, encapsulate hydrological modeling knowledge as workflows, and interoperate with other community-driven cyberinfrastructures. Included within iRODS is the concept of Workflow Structured Objects (WSOs) that can be used to automate data processing using data collections stored within iRODS. A use case is presented that demonstrates creating WSOs that automate the creation of data visualizations from large model output collections. By co-locating the workflow used to create the visualization with the data collection, the use case demonstrates how data grid technology aids in reuse, reproducibility, and sharing of workflows within scientific communities. The use case leverages output from a hydrologic model (the Variable Infiltration Capacity model) for the Carolinas region of the US, and is part of a larger effort under the DataNet Federation Consortium (DFC) project that aims to demonstrate data and computational interoperability across scientific communities. <br />
* '''Challenge:''' This paper discusses how to automate workflows so that they are reproducible and shareable across disciplines.<br />
* '''Relationship to other publications:''' This article is an extension of another article that is currently under review. <br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Jon_Goodall | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== Karlstrom: [Loh and Karlstrom 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Lay Kuan Loh]]<sup>1</sup> and [[Leif Karlstrom]]<sup>2</sup> (1: Department of Electrical and Computer Engineering, Carnegie Mellon University; 2: Department of Geological Sciences, University of Oregon)<br />
* '''Keywords of research area:''' Spatial clustering, Eigenvector selection, Entropy Ranking, Cascades Volcanic Region, [http://geosphere.gsapubs.org/content/3/3/152.abstract Afar Depression], [http://astrogeology.usgs.gov/search/details/Mars/Research/Volcanic/TharsisVents/zip Tharsis province]<br />
* '''Tentative title:''' Characterization of volcanic vent distributions using spectral clustering with eigenvector selection and entropy ranking<br />
* '''Short abstract:''' Volcanic vents on the surface of Earth and other planets often appear in groups that exhibit spatial patterning. Such vent distributions reflect complex interplay between time-evolving mechanical controls on the pathways of magma ascent, background tectonic stresses, and unsteady supply of rising magma. With the ultimate aim of connecting surface vent distributions with the dynamics of magma ascent, we have developed a clustering method to quantify spatial patterns in vents. Clustering is typically used in exploratory data analysis to identify groups with similar behavior by partitioning a dataset into clusters that share similar attributes. Traditional clustering algorithms that work well on simple point-cloud type synthetic datasets generally do not scale well to the real-world data we are interested in, where there are poor boundaries between clusters and much ambiguity in cluster assignments. We instead use a spectral clustering algorithm with eigenvector selection based on entropy ranking, building on work by [http://www.sciencedirect.com/science/article/pii/S0925231210001311 Zhao et al 2010], which outperforms traditional spectral clustering algorithms in choosing the right number of clusters for point data. We benchmark this algorithm on synthetic vent data with increasingly complex spatial distributions, to test its ability to accurately cluster vent data with variable spatial density, skewness, number of clusters, and proximity of clusters. We then apply our algorithm to several real-world datasets from the Cascades, Afar Depression, and Mars. <br />
* '''Challenge:''' Reproducibility (i.e., quantifying clustering); We plan to study how varying the statistical distribution, density, skewness, background noise, number of clusters, proximity of clusters, and combinations of these factors affects the performance of our algorithm. We test it against synthetic and real-world datasets.<br />
* '''Relationship to other publications:''' New content, but one of the databases we are studying in the paper (Cascades Volcanic Range) is based on a different paper that we are preparing and plan to submit earlier. <br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Leif_Karlstrom | Page]]<br />
* '''Expected submission date:''' June 2015<br />
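The approach described above can be sketched in outline. The following is a minimal illustration, assuming a Gaussian affinity matrix, a symmetric normalized Laplacian, and histogram-based entropy scoring of eigenvectors; it is not the authors' implementation, and all names and parameter choices are placeholders:

```python
import numpy as np

def spectral_embed(points, sigma=1.0):
    """Gaussian affinity matrix, then eigenpairs of the symmetric
    normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(points)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)  # ascending eigenvalues
    return vals, vecs

def eigenvector_entropy(vec, bins=10):
    """Shannon entropy of an eigenvector's value distribution; a
    low-entropy (strongly bimodal) vector suggests a clean partition."""
    hist, _ = np.histogram(vec, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Two well-separated synthetic "vent clusters"
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                 rng.normal(3.0, 0.1, (20, 2))])
vals, vecs = spectral_embed(pts)
# Rank nontrivial leading eigenvectors by entropy (skip the constant-like
# first eigenvector associated with eigenvalue ~0)
ranked = sorted(range(1, 5), key=lambda i: eigenvector_entropy(vecs[:, i]))
# The second eigenvector separates the two groups by sign
labels = (vecs[:, 1] > 0).astype(int)
```

On synthetic data like this, the sign of the second eigenvector recovers the two groups; the entropy ranking is what generalizes the idea to ambiguous real-world vent fields.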
<br />
=== Lee: [Lee, Boustani, and Mattmann 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Kyo Lee]], Maziyar Boustani and Chris Mattmann, Jet Propulsion Laboratory<br />
* '''Keywords of research area:''' North American regional climate, regional climate model evaluation system, Open Climate Workbench<br />
* '''Tentative title:''' Evaluation of simulated temperature, precipitation, cloud fraction and insolation over the conterminous United States using Regional Climate Model Evaluation System<br />
* '''Short abstract:''' This study describes the detailed process of evaluating model fidelity in simulating four key climate variables (surface air temperature, precipitation, cloud fraction, and insolation) and their covariability over the conterminous United States. The Regional Climate Model Evaluation System (RCMES), a suite of public databases and open-source software packages, provides both observational datasets and data processors useful for evaluating climate models. In this paper, we provide a clear, easy-to-follow RCMES workflow that replicates published evaluations of North American Regional Climate Change Assessment Program (NARCCAP) regional climate model (RCM) hindcast simulations using observations from a variety of sources. <br />
* '''Challenge:''' Big Data Sharing, Dark Code; Sharing big data, better documenting source code, and encouraging the climate science community to use RCMES <br />
* '''Relationship to other publications:''' [http://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-12-00452.1 Kim et al. 2013], [http://link.springer.com/article/10.1007/s00382-014-2253-y Lee et al. 2014]<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Kyo_Lee | Page]]<br />
* '''Expected submission date:''' End of June 2015<br />
<br />
=== Mills: [Reese, Mills, Witmer, and Morse 2015] ===<br />
<br />
* '''Authors and affiliations:''' Brandi Kiel Reese<sup>1</sup>, [[Heath Mills | Heath J. Mills]]<sup>2</sup>, Angela D. Witmer<sup>3</sup>, John W. Morse<sup>4</sup> (1: Texas A&M University Corpus Christi, Department of Life Sciences, Corpus Christi, TX 78412; 2: University of Houston Clear Lake, Division of Natural Sciences, Houston, TX 77058; 3: Georgia Southern University, Department of Biology, Statesboro, GA 30458; 4: Texas A&M University, Department of Oceanography, College Station, TX 77843)<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:''' Iron and Sulfur Cycling Biogeography Using Advanced Geochemical and Molecular Analyses<br />
* '''Short abstract:'''<br />
Most biogeochemical studies describe microbial ecology using only aqueous geochemistry. However, these studies neglect to characterize the bioavailable, solid-phase fraction as it relates to the local environment. Solid-phase species of key elements, including sulfur and iron, are typically underestimated sources for microbial activity. The objective of this study was to determine spatial and temporal variability of the benthic ecosystem through biogeochemical and molecular analysis while focusing on the sulfur and iron cycles. During this study, multiple 20 cm sediment cores were collected from three northern Gulf of Mexico hypoxic zone locations, each in 20 m water depth. Aqueous and solid-phase iron and sulfur compounds were analyzed in combination with molecular microbial characterization, and the degree of pyritization was determined. Sediments within this study had geochemically distinct profiles at all three locations in the forms of iron or sulfur species, despite bulk sediment chemistry appearing functionally similar (i.e., the concentrations of general chemical species did not vary greatly). Variations in iron and sulfur bioavailability altered the microbial ecology, both in terms of structure and function. In turn, community activity can contribute to small spatial-scale changes in the geochemistry, providing a potential for geographic and geochemical isolation. Therefore, localized feedback loops between available geochemistry and the microbial community can result in population divergence, exhibiting biogeography within sediments.<br />
* '''Challenge:''' Reproducibility; Dark Code. This paper will develop and document a new pipeline to analyze a combined and robust genetic and geochemical data set. New, reproducible methods will be highlighted in this manuscript to help others better analyze similar data sets. There is a general lack of guidance within this field for such challenges. This manuscript will be unique and helpful from an analysis standpoint as well as for the science being presented.<br />
* '''Relationship to other publications:''' Original Manuscript<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Heath_Mills | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== Oh: [Oh 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Ji-Hyun Oh]], Jet Propulsion Laboratory/University of Southern California<br />
* '''Keywords of research area:''' Tropical Meteorology, Madden-Julian Oscillation, Momentum budget analysis <br />
* '''Tentative title:''' Tools for computing momentum budget for the westerly wind event associated with the Madden-Julian Oscillation<br />
* '''Short abstract:''' As one of the most pronounced modes of tropical intraseasonal variability, the Madden-Julian Oscillation (MJO) prominently connects global weather and climate and serves as one of the critical sources of predictability for extended-range forecasting. The zonal circulation of the MJO is characterized by low-level westerlies (easterlies) in and to the west (east) of the convective center, respectively. The direction of zonal winds in the upper troposphere is opposite to that in the lower troposphere. In addition to the convective signal as an identifier of MJO initiation, certain characteristics of the zonal circulation have been used as a standard metric for monitoring the state of the MJO and for investigating features of the MJO and its impact on other atmospheric phenomena. This paper documents a tool for investigating the generation of low-level westerly winds during the MJO life cycle. The tool performs momentum budget analysis to quantify the respective contributions of the various processes involved in the wind evolution associated with the MJO, using European Centre for Medium-Range Weather Forecasts operational analyses from the Dynamics of the Madden–Julian Oscillation field campaign.<br />
* '''Challenge:''' Reproducibility, Dark Code; This paper will cover how to reproduce two key figures from a paper I recently submitted to the Journal of the Atmospheric Sciences. This will include detailed procedures for generating the figures, such as how and where to download the data and how to transform its format for use as input to my codes. <br />
* '''Relationship to other publications:''' This article is related to part of a paper submitted to the Journal of the Atmospheric Sciences. <br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Ji-Hyun_Oh | Page]]<br />
* '''Expected submission date:'''<br />
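A zonal-momentum budget of the kind described above can be sketched with a standard textbook decomposition (local tendency balanced by horizontal advection, the pressure-gradient force, and the Coriolis term). The grid, variable names, and omission of vertical advection below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def zonal_momentum_budget(u, v, phi, x, y, f):
    """Return horizontal advection, pressure-gradient, and Coriolis
    contributions (m s^-2) to du/dt on a 2-D horizontal slice.
    u, v: wind components (m/s); phi: geopotential (m^2 s^-2);
    x, y: 1-D coordinate arrays (m); f: Coriolis parameter (s^-1)."""
    dudx = np.gradient(u, x, axis=1)
    dudy = np.gradient(u, y, axis=0)
    dphidx = np.gradient(phi, x, axis=1)
    adv = -(u * dudx + v * dudy)   # horizontal advection of zonal momentum
    pgf = -dphidx                  # zonal pressure-gradient force
    cor = f * v                    # Coriolis acceleration of zonal wind
    return adv, pgf, cor

# Toy check: uniform zonal flow, geopotential increasing linearly eastward
x = np.linspace(0.0, 4.0e5, 5)
y = np.linspace(0.0, 4.0e5, 5)
u = np.full((5, 5), 10.0)
v = np.zeros((5, 5))
phi = np.tile(2.0e-3 * x, (5, 1))
adv, pgf, cor = zonal_momentum_budget(u, v, phi, x, y, 1.0e-4)
```

In a real analysis the residual between the observed tendency and the sum of these terms is attributed to unresolved processes, which is the substance of the budget diagnosis.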
<br />
=== Pierce: [Pierce, Gentle, and Noll 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Suzanne Pierce]], John Gentle, and Daniel Noll (Texas Advanced Computing Center and Jackson School of Geosciences, The University of Texas at Austin; US Department of Energy)<br />
<br />
* '''Keywords of research area:''' Decision Support Systems, Hydrogeology, Participatory Modeling, Data Fusion <br />
* '''Tentative title:''' MCSDSS: An accessible platform and application to enable data fusion and interactive visualization for the Geosciences<br />
* '''Short abstract:''' The MCSDSS application is an advanced example of interactive design that can lead to data fusion for science visualization, decision support applications, and education. What sets the tool apart is its firm underpinning in data, innovative forms of interface design, and a reusable platform. A key advance is the creation of a framework that can be used to feed new data, videos, maps, images, or other formats of information into the application with relative ease. <br />
<br />
* '''Challenge:''' Reproducibility, Dark Code; Fully document a new software application and framework using example case study data and tutorials; Creation of an interface that enables non-programmers to build out interactive visualizations for their data<br />
* '''Relationship to other publications:''' This article is new content. The proof-of-concept idea was developed with DOE funding for a student competition and resulted in an initial implementation that was reported in the DOE competition report and in a master's thesis by co-author Daniel Noll.<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Suzanne_Pierce | Page]]<br />
* '''Expected submission date:''' mid- to late June 2015<br />
<br />
=== Pope: [Pope 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Allen Pope]], National Snow and Ice Data Center, University of Colorado, Boulder<br />
* '''Keywords of research area:''' Glaciology, Remote Sensing, Landsat 8, Polar Science<br />
* '''Tentative title:''' Data and Code for Estimating and Evaluating Supraglacial Lake Depth With Landsat 8 and other Multispectral Sensors<br />
* '''Short abstract:''' Supraglacial lakes play a significant role in glacial hydrological systems – for example, transporting water to the glacier bed in Greenland or leading to ice shelf fracture and disintegration in Antarctica. To investigate these important processes, multispectral remote sensing provides multiple methods for estimating supraglacial lake depth – either through single-band or band-ratio methods, both empirical and physically-based. Landsat 8 is the newest satellite in the Landsat series. With new bands, higher dynamic range, and higher radiometric resolution, the Operational Land Imager (OLI) aboard Landsat 8 holds substantial promise for supraglacial lake studies. <br />
<br />
: This paper will document the data and code used in processing in situ reflectance spectra and depth measurements to investigate the ability of Landsat 8 to estimate lake depths using multiple methods, as well as quantify improvements over Landsat 7’s ETM+. A workflow, data, and code are provided to detail promising methods as applied to Landsat 8 OLI imagery of case study areas in Greenland, allowing calculation of regional volume estimates using 2013 and 2014 summer-season imagery. Altimetry from WorldView DEMs is used to validate lake depth estimates. The optimal method for supraglacial lake depth estimation with Landsat 8 is shown to be an average of single-band depths from the red and panchromatic bands. With this best method, preliminary investigation of seasonal behavior and elevation distribution of lakes is also discussed and documented.<br />
* '''Challenge:''' Reproducibility, Dark Code<br />
* '''Relationship to other publications:''' Documenting and explaining the data and code behind the analysis and results presented in another paper.<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Allen_Pope | Page]]<br />
* '''Expected submission date:''' Late June 2015<br />
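As a rough illustration of the single-band depth retrieval described above, the sketch below uses a generic physically based model, z = [ln(Ad - Rinf) - ln(Rw - Rinf)] / g, where Ad is the bottom albedo, Rinf the reflectance of optically deep water, Rw the observed reflectance, and g a band-specific attenuation coefficient, then averages the red and panchromatic estimates. All coefficient values here are hypothetical and would in practice be fit from in situ spectra; this is not the paper's actual code:

```python
import numpy as np

def single_band_depth(Rw, Ad, Rinf, g):
    """Physically based single-band lake depth (m): darker water
    (lower Rw) implies deeper water for Rinf < Rw < Ad."""
    return (np.log(Ad - Rinf) - np.log(Rw - Rinf)) / g

# Hypothetical top-of-lake reflectances for three pixels (shallow -> deep)
Rw_red = np.array([0.30, 0.15, 0.06])
Rw_pan = np.array([0.28, 0.14, 0.07])

# Hypothetical per-band coefficients (would be calibrated from field data)
z_red = single_band_depth(Rw_red, Ad=0.45, Rinf=0.04, g=0.80)
z_pan = single_band_depth(Rw_pan, Ad=0.40, Rinf=0.05, g=0.65)

# Average of the red and panchromatic single-band depths
z = 0.5 * (z_red + z_pan)
```

The averaging step mirrors the "best method" conclusion stated in the abstract; everything else is a stand-in for the calibrated processing chain the paper documents.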
<br />
=== Tzeng: [Tzeng, Park, and Dzwonkowski 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Mimi Tzeng]], Brian Dzwonkowski (DISL); Kyeong Park (TAMU Galveston)<br />
* '''Keywords of research area:''' physical oceanography, remote sensing<br />
* '''Tentative title:''' Fisheries Oceanography of Coastal Alabama (FOCAL): A Subset of a Time-Series of Hydrographic and Current Data from a Permanent Moored Station Outside Mobile Bay (27 Jan to 18 May 2011)<br />
* '''Short abstract:''' The Fisheries Oceanography in Coastal Alabama (FOCAL) program began in 2006 as a way for scientists at Dauphin Island Sea Lab (DISL) to study the natural variability of Alabama's nearshore environment as it relates to fisheries production. FOCAL provided a long-term baseline data set that included time-series hydrographic data from a permanent offshore mooring (ADCP, vertical thermistor array, and CTDs at surface and bottom) and shipboard surveys (vertical CTD profiles and water sampling), as well as monthly ichthyoplankton and zooplankton (depth-discrete) sample collections at FOCAL sites. The subset of data presented here is from the mooring and includes a vertical array of thermistors, CTDs at surface and bottom, an ADCP at the bottom, and vertical CTD profiles collected at the mooring during maintenance surveys. The mooring is located at 30 05.410'N 88 12.694'W, 25 km southwest of the entrance to Mobile Bay. Temperature, salinity, density, depth, and current velocity data were collected at 20-minute intervals from 2006 to 2012. Other parameters, such as dissolved oxygen, are available for portions of the time series depending on which instruments were deployed at the time.<br />
* '''Challenge:''' Dark Code, Reproducibility; My paper will be about the processing of data in a larger dataset from which peer-reviewed papers have been written. The processing I did was not specific to any particular paper. I can point to an example paper that used some of the data I processed from this dataset; however, all of the figures in that paper are composites that also include other data from elsewhere that I had nothing to do with (and it would not be feasible to obtain that other data within our timeframe).<br />
* '''Relationship to other publications:''' A recent paper that used the part of the FOCAL data I'm documenting as the sample from the larger dataset: Dzwonkowski, Brian, Kyeong Park, Jungwoo Lee, Bret M. Webb, and Arnoldo Valle-Levinson. 2014. "Spatial variability of flow over a river-influenced inner shelf in coastal Alabama during spring." Continental Shelf Research 74:25-34.<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Mimi_Tzeng | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== Villamizar: [Villamizar, Pai, and Harmon 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Sandra Villamizar]], Henry Pai and Thomas Harmon, University of California, Merced<br />
* '''Keywords of research area:''' River Ecohydrology<br />
* '''Tentative title:''' Producing long-term series of whole-stream metabolism using readily available data. <br />
* '''Short abstract:''' Continuous water quality and river discharge data that are readily available through government websites may be used to produce valuable information about key processes within a river ecosystem. In this technical note, I describe in detail the steps for acquisition and processing of river flow, dissolved oxygen, temperature, and specific conductance data that, combined with atmospheric data and physical properties of the river reach of interest, allow for the production of a long-term series of whole-stream metabolism, an important piece of information for understanding the structure and function of river ecosystems. The restoration reach of the San Joaquin River in California (USA) has been intensively instrumented since 2010 and serves as an ideal case for testing this tool. The set of scripts, written in R, can be used immediately for any other river for which the key parameters (river flow, dissolved oxygen, temperature, and specific conductance) are available, and can be modified by new users to fit their particular site conditions.<br />
<br />
* '''Challenge:''' Reproducibility; Dark Code; Document new software/applications. This set of scripts arose from the need to generate daily estimates of metabolic rates over long periods of time and at various sites within the San Joaquin River. <br />
* '''Relationship to other publications:''' This will be a new publication - Potentially a Technical Note<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Sandra_Villamizar | Page]]<br />
* '''Expected submission date:''' June 2015<br />
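The kind of computation such scripts perform can be sketched as follows. This is a simplified single-station, open-channel estimate (net metabolism as the oxygen tendency minus reaeration, scaled by depth), written in Python rather than R purely for illustration; the saturation polynomial is a standard empirical approximation, and all parameter values and function names are assumptions, not the authors' code:

```python
import numpy as np

def oxygen_saturation(temp_C):
    """Empirical freshwater O2 saturation (mg/L) as a function of
    temperature (Elmore & Hayes-type cubic polynomial approximation)."""
    t = np.asarray(temp_C, dtype=float)
    return 14.652 - 0.41022 * t + 0.007991 * t**2 - 7.7774e-5 * t**3

def net_metabolism(do_mgL, temp_C, k_per_day, dt_days, depth_m):
    """Single-station open-channel estimate of net ecosystem production:
    NEP = (dO/dt - K * (Osat - O)) * depth, in g O2 m^-2 d^-1
    (mg/L is equivalent to g/m^3, so multiplying by depth converts
    a volumetric rate to an areal one)."""
    osat = oxygen_saturation(temp_C)
    dodt = np.gradient(do_mgL, dt_days)        # O2 tendency, mg/L per day
    reaeration = k_per_day * (osat - do_mgL)   # air-water exchange term
    return (dodt - reaeration) * depth_m
```

In practice the diel NEP curve is then partitioned into gross primary production (daylight) and ecosystem respiration (night), which is where the site-specific atmospheric and reach-geometry inputs mentioned above come in.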
<br />
=== Yu: [Yu, Bhatt, Rousseau, Pardo-Alvarez, and Duffy 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Xuan Yu]], Department of Geological Sciences, University of Delaware. <br/><br />
Gopal Bhatt, Department of Civil & Environmental Engineering, Pennsylvania State University. <br/><br />
Alain N. Rousseau, Institut National de la Recherche Scientifique (Centre Eau, Terre et Environnement), Université du Québec, 490 rue de la Couronne, Québec City, QC, Canada, G1K 9A9. <br/><br />
Alvaro Pardo-Alvarez, Institut National de la Recherche Scientifique (Centre Eau, Terre et Environnement), Université du Québec, 490 rue de la Couronne, Québec City, QC, Canada, G1K 9A9. <br/><br />
Chris Duffy, Department of Civil & Environmental Engineering, Pennsylvania State University.<br />
* '''Keywords of research area:''' coupled processes, integrated hydrologic modeling, PIHM, surface flow, subsurface flow, open science<br />
* '''Tentative title:''' Learning integrated modeling of coupled surface and subsurface hydrology from scratch<br />
* '''Short abstract:''' Integrated modeling of coupled surface and subsurface flow for diverse earth system processes is of current interest to researchers, not only to establish the interconnectedness of hydrological and atmospheric processes, but also to understand local-scale features of land-surface energy balances, biogeochemical and ecological processes, geochemical weathering, and landscape evolution dynamics. A growing number of complex hydrologic models have been used for resolving environmental processes, hypothesis testing, and hydrologic prediction for effective watershed management, though very few of these resources have been made accessible to the potentially large group of model users. Users must invest an extraordinary amount of time and effort to reproduce and understand the workflow of a hydrologic simulation in a modeling paper. To provide a challenging and stimulating use case focusing on integrated modeling of coupled surface and subsurface flow, we describe the experience of a new user of the integrated process model PIHM (Penn State Integrated Hydrologic Model). The user is guided through the data and model development process by reproducing a numerical benchmarking example and a real watershed application. Specifically, we document PIHM and its modeling workflow to enable a basic understanding of simulating coupled surface and subsurface flow processes. We detail the strategy of a new user attempting to implement a community model with national geospatial and data services. In addition, we describe the user experience as an important dimension of the modeling workflow, enabling a clear strategy for provenance and deeper communication between model developers and users. The workflow has important implications for smoothing, accelerating, and automating open scientific collaboration in geosciences research.<br />
* '''Challenge:''' Reproducibility; Reproduce published simulations from an existing model using its latest version; benchmark the modeling application against a numerical experiment and field data.<br />
* '''Relationship to other publications:''' The article is based on a previously published article. <br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Xuan_Yu | Page]]<br />
* '''Expected submission date:''' End of June 2015<br />
<br />
== Special Issue Editors ==<br />
<br />
Chris Duffy, Scott Peckham, Cedric David, and Karan Venayagamoorthy.<br />
<br />
The editors will only accept submissions that follow the [[Develop_proposal_for_special_issue#Special_Issue_Review_Criteria | special issue review criteria]].<br />
<br />
The editors will select a set of reviewers to handle the submissions. Reviewers will include experts in geosciences, computer science, and library sciences.<br />
<br />
== Special Issue Review Criteria ==<br />
<br />
The reviewers will be asked to provide feedback on the papers according to the following criteria. Note that some papers will have good reasons for limiting the information (e.g. the data is from third parties and not openly available, etc), and in that case they would document those reasons.<br />
<br />
* Documentation of the datasets: descriptions of datasets, unique identifiers, repositories.<br />
* Documentation of software: description of all software used (including pre-processing of data, visualization steps, etc), unique identifiers, repositories.<br />
* Documentation of the provenance of results: provenance for each figure or result, such as the workflow or the provenance record.<br />
<br />
Submissions will be in two categories: <br />
<br />
# Technical papers: articles that have a novel contribution in some technical area of geosciences<br />
# Technical notes: articles that do not have a significant novel technical contribution, and whose major contribution is to reproduce a previously published result or illustrate the computational aspects of research published elsewhere<br />
<br />
== Tentative Timeline ==<br />
<br />
* Journal committed to special issue: April 30, 2015<br />
* Submissions due to editors: June 30, 2015<br />
* Reviews due: Sept 15, 2015<br />
* Decisions out to authors: Sept 30, 2015<br />
* Revisions due: October 31, 2015<br />
* Final versions due November 15, 2015<br />
* Issue published December 31, 2015<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Owner=Chris_Duffy|<br />
Participants=Scott_Peckham|<br />
Participants=Cedric_David|<br />
Participants=Ibrahim_Demir|<br />
Participants=Wally_Fulweiler|<br />
Participants=Leif_Karlstrom|<br />
Participants=Kyo_Lee|<br />
Participants=Heath_Mills|<br />
Participants=Ji-Hyun_Oh|<br />
Participants=Suzanne_Pierce|<br />
Participants=Allen_Pope|<br />
Participants=Mimi_Tzeng|<br />
Participants=Sandra_Villamizar|<br />
Participants=Xuan_Yu|<br />
Participants=Yolanda_Gil|<br />
Progress=80|<br />
StartDate=2015-03-10|<br />
TargetDate=2015-03-16|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_provenance_of_results_by_Mimi_Tzeng&diff=11785Document provenance of results by Mimi Tzeng2015-04-11T16:07:53Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Document the provenance of the results]]<br/><br/><br />
<br />
I am using [https://www.literatureandlatte.com/scapple.php Scapple] to make my workflow diagram. It was intended for diagramming story structure in fiction, but I've found this to be a useful tool for making entity relationship diagrams (ERDs) for relational databases and wanted to try it out for workflows. <br />
<br />
20 Mar 2015: Diagram is also available at Zenodo: http://dx.doi.org/10.5281/zenodo.16243 (but thanks to Suzanne, I now know I can just upload these directly to the wiki!)<br />
<br />
[[File:20150320 DocumentProvenance-MWTc.png]]<br />
<br />
Note from Telecon 3/20: I need to include more information about what specific processing each of the software components did to the data files. Perhaps by adding text to the actual diagram boxes? That would make those boxes very large. Or would it work to give them all numbers, and then make a separate page with notes in outline form?<br />
<br />
Note from Telecon 4/3: use footnotes on a separate page, since the workflow diagram is already pretty complex.<br />
<br />
11 Apr 2015: updated version of workflow diagram. Coming soon: a separate page keyed to the blue letters.<br />
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=95|<br />
StartDate=2015-03-07|<br />
TargetDate=2015-03-20|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=File:20150320_DocumentProvenance-MWTc.png&diff=11783File:20150320 DocumentProvenance-MWTc.png2015-04-11T16:05:35Z<p>Mimi: third draft of workflow diagram</p>
<hr />
<div>third draft of workflow diagram</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Ensure_software_is_usable_by_Mimi_Tzeng&diff=11624Ensure software is usable by Mimi Tzeng2015-03-31T15:07:10Z<p>Mimi: Set PropertyValue: Progress = 90</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Ensure software is usable]]<br/><br/><br />
<br />
<ul><br />
<li>Data to be used: FOCAL mooring from 20110413_20110518</li><br />
<br />
<li>Software for CTDs and Thermistors:</li><br />
<ul><li>SeaTerm</li><br />
<li>SBE Data Processing</li><br />
<li>EcoWatch (for YSI)</li><br />
<li>perl</li><br />
<li>any text editor</li><br />
<li>Matlab</li><br />
<li>pdfsam (or other PDF concatenator)</li><br />
<li>MS Word</li></ul><br />
<br />
<li>Software for ADCP:</li><br />
<ul><li>WinADCP</li><br />
<li>Matlab</li><br />
<li>any text editor</li><br />
<li>pdfsam (or other PDF concatenator)</li><br />
<li>MS Word</li></ul><br />
</ul><br />
<br />
Software seems to come in two forms:<br><br />
A) the off-the-shelf professionally-produced kind that most people think of upon seeing the word "software"<br><br />
B) the highly specific custom-made scripts that automate particular tasks for a particular lab or other working group, generally intended for internal use.<br />
<br><br><br />
The list above is for software of type A.<br><br />
Software of Type B for this project consists of several Matlab and perl scripts. These are gathered together in a folder; they don't have names that will mean anything to anyone else, and so they aren't listed here. <br />
<br><br><br />
Task completion: I am waiting for Matlab to be reinstalled on the PI's lab computer so that I can rerun the entire processing to make sure that everything still works as intended. I am 95% confident that all will be fine, but won't claim that the "ensure software is usable" task is 100% complete until I can do this.<br />
<br />
Update 20 Mar 2015: Today the PI got around to putting Matlab on his lab computer. I'm now much less confident that all will be fine, because the version of Matlab I used originally was (probably) from 2010, and the current version on the new PI's lab computer is 2015. Matlab has an annoying habit of changing things around between versions so that all previous scripts break in really stupid and time-consuming ways. I've already found two subtle differences (one of them will require a minor change in my perl script), and will have to look through the rest in more detail. <br />
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=90|<br />
StartDate=2015-02-07|<br />
TargetDate=2015-02-20|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Make_data_accessible_by_Mimi_Tzeng&diff=11555Make data accessible by Mimi Tzeng2015-03-21T17:41:28Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Make data accessible]]<br/><br/><br />
<br />
So far I've signed up to FigShare and obtained explicit permission from the PI to upload the data to it. I am now waiting for the PI to reinstall Matlab so I can rerun the processing. <br />
<br />
As I recall, we are also supposed to make available all of the original raw data and intermediate files. There are many of these; should I also include a README.txt in the final zip file that explains what all of them are?<br />
<br />
Answer from telecon: include just the intermediate files that might be useful to someone else, such as the *.mat files. No need to include every single raw and intermediate file for this task. <br />
<br />
The question then becomes: which intermediate files should be included? I think I'll probably omit most of the pre-processing files and start with the ones that go into the perl script. Then I'll also skip a lot of the intermediate files that come out of Matlab and just go with the combined figure PDFs, especially for the ADCP where there are a lot.<br />
<br />
Also, new plan: I'm going to use Zenodo instead of FigShare because it's run by CERN. The organization behind a repository matters for lending it legitimacy; CERN is a well-known, well-established scientific research institution, whereas FigShare seems to be a random startup... <br />
<br />
<hr noshade size=2><br />
<br />
'''Files to include:'''<br />
<br />
# From MOOR: the initial data files after preliminary processing through the proprietary software that came with the sensors, before the perl script<br />
# From MOOR: timestamps.txt<br />
# From MOOR: the Matlab data file that contains all the variables, generated by MOORprocess_all.m<br />
# From MOOR: everything generated by MOORprocess_all.m (after PDFs have been concatenated)<br />
# From ADCP: the Matlab data file exported from WinADCP<br />
# From ADCP: endpoints.txt<br />
# From ADCP: the Matlab data file generated by MoorADCP.m<br />
# From ADCP: everything generated by MoorADCP.m (after PDFs have been concatenated)<br />
<br />
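Bundling the selected files together with a README.txt (as discussed above) into a single archive for upload could look like the following sketch; the function and file names here are placeholders, not the project's actual ones:<br />

```python
import io
import zipfile

def package_for_upload(files, readme_text):
    """Bundle named data payloads plus a README.txt that explains
    them into one in-memory zip archive (returned as bytes), ready
    to write to disk or deposit in a repository such as Zenodo.
    `files` maps archive member names to their text content."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.txt", readme_text)
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()
```

For real use, the text payloads would be read from the intermediate files on disk; keeping the README inside the same zip means the file descriptions travel with the data.<br />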
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=10|<br />
StartDate=2015-02-21|<br />
TargetDate=2015-03-06|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Select_target_article_by_Mimi_Tzeng&diff=11554Select target article by Mimi Tzeng2015-03-21T17:24:11Z<p>Mimi: Deleted PropertyValue: Expertise = ocean science</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Select target article]]<br/><br/><br />
'''Article:'''<br />
<br />
Fisheries Oceanography of Coastal Alabama (FOCAL): A Subset of a Time-Series of Hydrographic and Current Data from a Permanent Moored Station Outside Mobile Bay (27 Jan to 18 May 2011)<br />
<br />
[http://www.ncddc.noaa.gov/approved_recs/org/disl/FOCAL/Park/FOCAL-data/Mooring-FOCAL.html FGDC Metadata Record] - effectively the "paper" that functions as my target article for the purposes of this project.<br />
<br />
<hr><br />
'''Paper Introduction'''<br />
<br />
The purpose of the Fisheries Oceanography in Coastal Alabama (FOCAL) program, which ran from 2006 to 2012, was to capture and understand the highly variable nature of Alabama's nearshore environment as it relates to fisheries production. FOCAL provided a long-term baseline data set that included time-series hydrographic data from a permanent offshore mooring and shipboard surveys (vertical CTD profiles and water sampling), as well as monthly depth-discrete ichthyoplankton and zooplankton sample collections. The subset of data presented here is from the mooring, and includes a vertical array of thermistors, CTDs at the surface and bottom, an ADCP at the bottom, and vertical CTD profiles collected at the mooring during maintenance surveys. The mooring is located at 30°05.410'N, 88°12.694'W, 25 km southwest of the entrance to Mobile Bay. Temperature, salinity, density, depth, and current velocity data were collected at 20-minute intervals from 2006 to 2012. Other parameters, such as dissolved oxygen, are available for portions of the time series, depending on which instruments were deployed at the time.<br />
<br />
Figure 1: Diagram of the mooring sensor array, as of 2011, with an inset of Mobile Bay showing the location of the mooring. (need to make this)<br />
<br />
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Geosciences|<br />
Expertise=Open_science|<br />
Owner=Mimi_Tzeng|<br />
Progress=100|<br />
StartDate=2015-02-01|<br />
TargetDate=2015-02-06|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Mimi_Tzeng_should_make_software_executable_by_others&diff=11549Mimi Tzeng should make software executable by others2015-03-21T02:08:55Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Make software executable by others]]<br/><br/><br />
<br />
I can tell just by reading the instructions that this is going to be a major pain, because: Matlab.<br />
<br />
First, as noted in "make sure software is usable", the version of Matlab is hugely important, and I should have noted which one I was using when I did the original processing; I think it was 2010b or thereabouts. I am curious whether something like Docker or Vagrant is feasible for past versions of Matlab, given that Matlab is proprietary, expensive, and extremely restrictively licensed. If it were, I could check that the software works in the original version and then simply note which version to use. Failing that, it will probably take an enormous amount of time and effort to get the scripts to run correctly in the 2015 version of Matlab. <br />
<br />
Other concerns: the Matlab scripts are not as automated as they might first appear, because every batch of data has some issue that requires adjusting the code. I've automated handling for the most common problems as much as possible: a sensor or sensor package losing battery power halfway through a deployment; a sensor or sensor package missing entirely (due to malfunction, or because it was simply not deployed); variables not always arriving in the same order; a variable missing from a particular sensor package because the sensor that measures it malfunctioned or was removed; new variables appearing as new sensors are added; and so on. There is also the case where one of the ten thermistors also has a pressure sensor, and it's present at every other deployment. The project as a whole started out in 2004 with 20 thermistors, 10 at a time spaced equally through the water column; by 2011 it was down to 10, with 5 at a time placed at strategic depths of interest to physical oceanographers. As of the end of 2014, I think they're down to 7-8; a number of them failed in 2014.<br />
<br />
And that's the core problem about having this software be executable by others. The scripts are highly specific to that particular mooring in that particular place with the particular sensors in their particular deployment plan. Nobody else will have the exact same set of sensors doing this exact thing. Also, each PI will be interested in seeing different types and formats of preliminary figures and data files from any other PIs, so the outputs won't necessarily make everyone equally happy either.<br />
<br />
So what should I adjust to make the scripts more broadly useful to others? I can add code that asks a long series of questions like "does X sensor have Y variable this time? If so, at which position in the input file?" That would get very annoying to answer every single run, which is why I instead made a note in my processing-steps instructions to check and adjust the variable order in the code directly. Could I just add to my processing-steps instructions and say "check and adjust these line numbers in the input file against these line numbers in the Matlab script"?<br />
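A possible middle ground between interactive prompts and hand-editing the code is a small per-deployment configuration file that maps each variable name to its position in the input file; the processing script then reads the mapping instead of hard-coding it. A hypothetical sketch in Python (the format and variable names are invented for illustration, not taken from the actual scripts):<br />

```python
def parse_deployment_config(text):
    """Parse a tiny per-deployment config of 'variable = position'
    lines into a dict, so a processing script can look up where each
    sensor variable sits in this batch's input file. Blank lines and
    lines starting with '#' are ignored; a sensor that wasn't
    deployed is simply absent from the result."""
    positions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, pos = line.partition("=")
        positions[name.strip()] = int(pos)
    return positions
```

The config file stays next to each deployment's data, so the "which variable is where" decision is recorded once per batch instead of being re-answered interactively or patched into the code.<br />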
<br />
<hr noshade size=2><br />
<br />
Making the perl script executable by others was fairly simple, by comparison. It only needs to be placed in the same directory as the input files. I have added the following to the top of the file:<br />
<br />
<pre><br />
#Step D in the Workflow Diagram<br />
#This perl script should be placed in the same directory as the data files to be<br />
#processed. It takes the CTD, YSI, and thermistor files after they have been<br />
#initially processed with the proprietary software that came with the sensors,<br />
#and creates versions that will auto-open in Matlab, by stripping off the <br />
#headers (which often have a variable and unpredictable number of lines). <br />
#It also creates a file called moor-timestamps.txt to tell Matlab the input <br />
#variables of importance, that either came from the stripped off headers<br />
#(station names, starting date, starting time) or are found manually (starting<br />
#and ending scan numbers for the "good" data). <br />
</pre><br />
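The Perl script itself isn't reproduced here, but the header-stripping idea it describes can be sketched. This hypothetical Python version assumes a header terminated by a sentinel line (Seabird-style files use "*END*") and a "key = value" header entry for the start time; the real files' conventions may differ:<br />

```python
def strip_header(lines, end_marker="*END*"):
    """Split a sensor file's lines into (header, data) at the line
    that terminates the header, so the data rows load cleanly into
    Matlab. Headers have a variable, unpredictable number of lines,
    which is why we scan for the marker rather than counting.
    If the marker is absent, everything is treated as header."""
    for i, line in enumerate(lines):
        if line.strip() == end_marker:
            return lines[:i + 1], lines[i + 1:]
    return lines, []

def extract_header_value(header, key):
    """Pull a 'key = value' entry out of the stripped-off header
    lines (e.g. a station name or deployment start time that a
    timestamps file would record for Matlab)."""
    for line in header:
        if key in line and "=" in line:
            return line.split("=", 1)[1].strip()
    return None
```

A driver script would apply these to every CTD, YSI, and thermistor file in the directory, writing the data rows to new files and the extracted values to a timestamps file.<br />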
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=10|<br />
StartDate=2015-03-21|<br />
TargetDate=2015-04-03|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Make_software_accessible_by_Mimi_Tzeng&diff=11544Make software accessible by Mimi Tzeng2015-03-21T01:35:51Z<p>Mimi: Set PropertyValue: Progress = 5</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Make software accessible]]<br/><br/><br />
<br />
So far I've created an account on GitHub and plan to put the software there when it's ready.<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=5|<br />
StartDate=2015-04-04|<br />
TargetDate=2015-04-17|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_provenance_of_results_by_Mimi_Tzeng&diff=11542Document provenance of results by Mimi Tzeng2015-03-21T01:33:46Z<p>Mimi: Set PropertyValue: Progress = 95</p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Document the provenance of the results]]<br/><br/><br />
<br />
I am using [https://www.literatureandlatte.com/scapple.php Scapple] to make my workflow diagram. It was intended for diagramming story structure in fiction, but I've found it to be a useful tool for making entity-relationship diagrams (ERDs) for relational databases, and wanted to try it out for workflows. <br />
<br />
20 Mar 2015: Diagram is also available at Zenodo: http://dx.doi.org/10.5281/zenodo.16243 (but thanks to Suzanne, I now know I can just upload these directly to the wiki!)<br />
<br />
[[File:20150320 DocumentProvenance-MWTb.png]]<br />
<br />
Note from Telecon: I need to include more information about what specific processing each of the software components did to the data files. Perhaps by adding text to the actual diagram boxes? That would make those boxes very large. Or would it work to give them all numbers, and then make a separate page with notes in outline form?<br />
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=95|<br />
StartDate=2015-03-07|<br />
TargetDate=2015-03-20|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=File:20150320_DocumentProvenance-MWTb.png&diff=11540File:20150320 DocumentProvenance-MWTb.png2015-03-21T01:31:04Z<p>Mimi: First draft version of workflow diagram</p>
<hr />
<div>First draft version of workflow diagram</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_provenance_of_results_by_Mimi_Tzeng&diff=11539Document provenance of results by Mimi Tzeng2015-03-21T01:27:51Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Document the provenance of the results]]<br/><br/><br />
<br />
I am using [https://www.literatureandlatte.com/scapple.php Scapple] to make my workflow diagram. It was intended for diagramming story structure in fiction, but I've found this to be a useful tool for making entity relationship diagrams (ERDs) for relational databases and wanted to try it out for workflows. <br />
<br />
20 Mar 2015: Diagram is available at Zenodo: http://dx.doi.org/10.5281/zenodo.16243<br />
<br />
Note from Telecon: I need to include more information about what specific processing each of the software components did to the data files. Perhaps by adding text to the actual diagram boxes? That would make those boxes very large. Or would it work to give them all numbers, and then make a separate page with notes in outline form?<br />
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=100|<br />
StartDate=2015-03-07|<br />
TargetDate=2015-03-20|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Mimi_Tzeng_should_make_software_executable_by_others&diff=11538Mimi Tzeng should make software executable by others2015-03-21T00:41:27Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Make software executable by others]]<br/><br/><br />
<br />
I can tell just by reading the instructions that this is going to be a major pain, because: Matlab.<br />
<br />
First, as noted in "make sure software is usable", the version of Matlab is hugely important and I should've noted which one I was using when I did the original processing. I think it was 2010b or something. I am kind of interested whether something like Docker or Vagrant is possible for past versions of Matlab, when Matlab is proprietary, expensive, and has extremely restrictive licensing. If it were, I would be able to check that the software works in the previous version and then just note what version to use. Failing that: it will probably take an enormous amount of time and effort to get the scripts to run correctly in the 2015 version of Matlab. <br />
<br />
Other concerns: the Matlab scripts are not as automated as they might first appear, because every single batch of data has some sort of issue with it that requires adjusting things in the code. I've automated as much as possible for the most common problems, such as sensor/sensor package lost battery power halfway through deployment, sensor/sensor package completely missing due to malfunction or just not deployed, variables not always in the same order, variable missing from a particular sensor package because the sensor that measures it malfunctioned or was removed, new variables due to new sensors added, etc. There is also the case where one of the ten thermistors also has a pressure sensor and it's present at every other deployment. The project as a whole started out in 2004 with 20 thermistors, 10 at a time spaced equally through the water column; in 2011 it was down to 10, with 5 of them at a time placed at strategic depths of interest to physical oceanographers. <br />
<br />
And that's the core problem with making this software executable by others. The scripts are highly specific to the particular mooring with the particular sensors in their particular deployment plan. Nobody else will have the exact same set of sensors doing this exact thing. Also, each PI is interested in different types and formats of preliminary figures and data files than any other PI, so the outputs won't necessarily make everyone equally happy either.<br />
<br />
So what should I adjust to make it more broadly useful to others? I could add code that asks a whole lot of questions like "does X sensor have Y variable this time? If so, which number is it in the input file?" This would get very annoying to answer each and every time, which is why I just made a note in my processing steps instructions to check and adjust the variable order in the code directly. Can I just add to my processing steps instructions instead, and say "check and adjust these line numbers in the input file against these line numbers in the Matlab script"?<br />
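One middle ground between interactive prompts and hand-editing the code would be a small per-deployment configuration that maps each sensor's variables to their positions in the input file, so only the config changes between batches. A hedged sketch in Python rather than Matlab (the sensor names, variables, and column positions below are invented for illustration):<br />

```python
import csv
import io

# Hypothetical per-deployment config: which column each variable
# occupies in the input file, per sensor package. In practice this
# could live in a small text file edited once per batch.
DEPLOYMENT_CONFIG = {
    "ctd_surface": {"temperature": 0, "salinity": 1, "pressure": 2},
    "ctd_bottom": {"temperature": 0, "salinity": 1},  # no pressure sensor
}

def read_deployment(raw_text, config):
    """Parse raw sensor rows into {sensor: [{variable: value}, ...]},
    tolerating sensors whose variables differ between deployments."""
    out = {name: [] for name in config}
    reader = csv.reader(io.StringIO(raw_text))
    for row in reader:
        sensor, values = row[0], row[1:]
        mapping = config.get(sensor)
        if mapping is None:
            continue  # sensor not deployed this time
        out[sensor].append({var: float(values[col]) for var, col in mapping.items()})
    return out

raw = "ctd_surface,21.3,35.1,1.2\nctd_bottom,18.9,35.6\n"
data = read_deployment(raw, DEPLOYMENT_CONFIG)
print(data["ctd_bottom"][0]["salinity"])  # → 35.6
```

The same idea carries over to Matlab: the script reads the mapping instead of hard-coding variable order, and the per-batch adjustment becomes editing one config file rather than the code.<br />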
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=0|<br />
StartDate=2015-03-21|<br />
TargetDate=2015-04-03|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Make_data_accessible_by_Mimi_Tzeng&diff=11537Make data accessible by Mimi Tzeng2015-03-21T00:07:09Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Make data accessible]]<br/><br/><br />
<br />
So far I've signed up to FigShare and obtained explicit permission from the PI to upload the data to it. I am now waiting for the PI to reinstall Matlab so I can rerun the processing. <br />
<br />
As I recall, we are supposed to also make available all of the original raw data and intermediate files. There are many of these; should I also include a README.txt in the final zip file that explains what all of them are?<br />
<br />
Answer from telecon: include just the intermediate files that might be useful to someone else, such as the *.mat files. No need to include every single raw and intermediate file for this task. <br />
<br />
The question then becomes: which intermediate files should be included? I think I'll probably omit most of the pre-processing files and start with the ones that go into the perl script. Then I'll also skip a lot of the intermediate files that come out of Matlab and just go with the combined figure PDFs, especially for the ADCP where there are a lot.<br />
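Once the set of files to share is decided, assembling the archive can itself be scripted, which keeps the bundle reproducible if a file changes later. A minimal sketch (the file names below are placeholders, not the actual FOCAL outputs):<br />

```python
import os
import zipfile

def bundle(files, readme_text, zip_path="dataset.zip"):
    """Pack the selected data files plus a README describing them
    into a single zip archive for upload to a repository."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.txt", readme_text)
        for path in files:
            zf.write(path, arcname=os.path.basename(path))
    return zip_path

# Hypothetical example with placeholder files.
for name in ("mooring_2011.mat", "adcp_figures.pdf"):
    with open(name, "w") as f:
        f.write("placeholder\n")

readme = ("mooring_2011.mat: processed mooring data\n"
          "adcp_figures.pdf: combined ADCP figures\n")
archive = bundle(["mooring_2011.mat", "adcp_figures.pdf"], readme)
print(zipfile.ZipFile(archive).namelist())
```

Keeping the file list and README text in one script also doubles as documentation of exactly which intermediate files were deemed worth sharing.<br />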
<br />
Also, new plan: going to use Zenodo instead of FigShare because it's run by CERN. The organization does matter for lending weight to legitimacy; CERN is a well-known, well-established science research institution, and FigShare seems to be a random startup... <br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=10|<br />
StartDate=2015-02-21|<br />
TargetDate=2015-03-06|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Document_provenance_of_results_by_Mimi_Tzeng&diff=11530Document provenance of results by Mimi Tzeng2015-03-20T20:58:36Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Document the provenance of the results]]<br/><br/><br />
<br />
I am using [https://www.literatureandlatte.com/scapple.php Scapple] to make my workflow diagram. It was intended for diagramming story structure in fiction, but I've found it to be a useful tool for making entity relationship diagrams (ERDs) for relational databases and wanted to try it out for workflows. <br />
<br />
20 Mar 2015: Diagram is available at Zenodo: http://dx.doi.org/10.5281/zenodo.16243<br />
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=100|<br />
StartDate=2015-03-07|<br />
TargetDate=2015-03-20|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Ensure_software_is_usable_by_Mimi_Tzeng&diff=11529Ensure software is usable by Mimi Tzeng2015-03-20T20:57:03Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Ensure software is usable]]<br/><br/><br />
<br />
<ul><li>Data to be used: FOCAL mooring from 20110413_20110518</li><br />
<br />
<li>Software for CTDs and Thermistors:<br />
<ul><li>SeaTerm</li><br />
<li>SBE Data Processing</li><br />
<li>EcoWatch (for YSI)</li><br />
<li>perl</li><br />
<li>any text editor</li><br />
<li>Matlab</li><br />
<li>pdfsam (or other PDF concatenator)</li><br />
<li>MS Word</li></ul></li><br />
<br />
<li>Software for ADCP:<br />
<ul><li>WinADCP</li><br />
<li>Matlab</li><br />
<li>any text editor</li><br />
<li>pdfsam (or other PDF concatenator)</li><br />
<li>MS Word</li></ul></li></ul><br />
<br />
Software seems to come in two forms:<br><br />
A) the off-the-shelf professionally-produced kind that most people think of upon seeing the word "software"<br><br />
B) the highly specific custom-made scripts that automate particular tasks for a particular lab or other working group, generally intended for internal use.<br />
<br><br><br />
The list above is for software of type A.<br><br />
Software of type B for this project takes the form of several Matlab and perl scripts. These are gathered together in a folder; they don't have names that would mean anything to anyone else, so they aren't listed here. <br />
<br><br><br />
Task completion: I am waiting for Matlab to be reinstalled on the PI's lab computer so that I can rerun the entire processing to make sure that everything still works as intended. I am 95% confident that all will be fine, but won't claim that the "ensure software is usable" task is 100% complete until I can do this.<br />
<br />
Update 20 Mar 2015: Today the PI got around to putting Matlab on his lab computer. I'm now much less confident that all will be fine, because the version of Matlab I used originally was (probably) from 2010, and the current version on the PI's new lab computer is 2015. Matlab has an annoying habit of changing things around between versions so that all previous scripts break in really stupid and time-consuming ways. I've already found two subtle differences (one of them will require a minor change in my perl script), and will have to look through the rest in more detail. <br />
<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Open_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=80|<br />
StartDate=2015-02-07|<br />
TargetDate=2015-02-20|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Develop_proposal_for_special_issue&diff=11523Develop proposal for special issue2015-03-20T18:08:16Z<p>Mimi: /* [Tzeng 2015] */</p>
<hr />
<div>[[Category:Task]]<br />
<br />
== Background: Why a Special Issue on Geoscience Papers of the Future? ==<br />
<br />
[[Discuss_what_we_will_consider_a_GPF#The_Vision | Include here our discussion for the vision]]<br />
<br />
Background should be 1-2 pages.<br />
<br />
Motivated by need to fully document and make research accessible and reproducible. <br />
<br />
=== Motivation: The EarthCube Initiative and the GeoSoft Project ===<br />
<br />
[http://www.geosoft-earthcube.org/about Include here background about GeoSoft from the web site]<br />
<br />
OSTP memo. EarthCube reports.<br />
Other reports that talk about the need for new approaches to editing.<br />
<br />
It's possible that small or very large contributions are not well captured in the current publishing paradigms. Nanopublications.<br />
<br />
For example, nano-publications are a possible way to reflect advances in a research process that may not merit a full publication but are useful to share with the community. A challenge here is that there is a stigma against publishing units that are very small. <br />
<br />
Alternatively, a very large piece of research or work with many parts may be better suited to a GPF style publication.<br />
<br />
<br />
Perhaps the concept of a 'paper' would be better reflected in the concept of a 'wrapper': a collection of materials and resources. The purpose is to assure that publications are representative of the work, effort, and results achieved in the research process.<br />
<br />
=== What is a GPF ===<br />
<br />
[[Discuss_what_we_will_consider_a_GPF#What_is_a_Geoscience_Paper_of_the_Future.3F | Include here our discussion of what is a GPF]]<br />
<br />
=== The challenges of creating GPFs ===<br />
<br />
The articles in this issue reflect the current best practice for generating a Geoscience Paper of the Future.<br />
<br />
'''Figure discussions''': Do we want to reproduce exactly the same figure automatically? Figures in the paper may be cleaned-up versions of an image generated by software. To the extent possible, authors have included clear delineations of provenance. The goal is to assure that readers may regenerate the figures using documented workflows, data, and codes. An important note (Allen, Sandra) is that figures are frequently generated by code, scripts, etc., yet the actual figure is finalized with user adjustments. Mimi is trying to say: is it really worth belaboring the point about how the prettified version of the figure is made? If it is: both of the visualization programs I've used (Matlab and SigmaPlot) have actual code in the background that specifies how to set up the prettification, and this code can be found, copied out, and rerun to generate the exact same figure with all of the prettification in the same place. SigmaPlot uses Visual Basic (I think) in its macros. If explicit code is the important point, this should be doable. But I'm not sure it's strictly necessary to specify exactly where all the prettifications are to get the gist across.<br />
<br />
How much of your experimental history does one include? (Ibrahim). The experimental process often ends up nowhere. Should we document all the failed experiments? Get one DOI for the results of the successful experiment? Another for failed trials?<br />
<br />
<br />
'''''Documenting: Timing and Intermediate Processes'''''<br />
When should we document and what are the bounds on what we document?<br />
For example, should we document and include data and workflows for 'failed' experiments? Or should we assign datasets DOIs before we know the results from using them? <br />
The group thinks that good practices may include documenting and sharing data once you have a clear understanding of the outcomes worth reporting. For example, successful experiments should have clear, clean data documented and shared, whereas one strategy for 'failed' experiments could be bundling the intermediate datasets under one DOI with a more general discussion of the process/methods.<br />
<br />
=== Related work ===<br />
<br />
[[Discuss_what_we_will_consider_a_GPF#New_Frameworks_to_Create_a_New_Generation_of_Scientific_Articles | Include here the related work we have discussed]]<br />
<br />
== Papers to be included ==<br />
<br />
Would it be worthwhile to group the papers into broader categories rather than giving specifics about every single paper?<br />
<br />
For each submission, we describe:<br />
<br />
* '''Authors and affiliations'''<br />
* '''Keywords of research area'''<br />
* '''Tentative title'''<br />
* '''Short abstract'''<br />
* '''Challenge'''<br />
* '''Relationship to other publications''' (is the article based on a previously published article? is it new content?)<br />
* '''Pointer to the wiki page that documents the article'''<br />
* '''Expected submission date'''<br />
<br />
=== [David 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Cedric David]]<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:''' Going beyond triple-checking, allowing for peace of mind in model development.<br />
* '''Short abstract:'''<br />
* '''Challenge:''' Ensure that updates to an existing model are able to reproduce a series of simulations published previously.<br />
* '''Relationship to other publications:''' <br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Cedric_David | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Demir 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Ibrahim Demir]]<br />
* '''Keywords of research area:''' hydrologic network, optimization, network representation, database query<br />
* '''Tentative title:''' Optimization of hydrological network representation for fast access and query in web-based system<br />
* '''Short abstract:''' The article is about benchmarking various network representation techniques for optimization of hydrological network access and query. <br />
* '''Challenge:''' <br />
* '''Relationship to other publications:''' The article is based on a new study<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Ibrahim_Demir | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Fulweiler 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Wally Fulweiler]]<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:'''<br />
* '''Short abstract:'''<br />
* '''Challenge:''' <br />
* '''Relationship to other publications:''' (is the article based on a previously published article? is it new content?)<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Wally_Fulweiler | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Karlstrom and Lay 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Leif Karlstrom]] and [[Lay Kuan Loh]]<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:'''<br />
* '''Short abstract:'''<br />
* '''Challenge:''' <br />
* '''Relationship to other publications:''' (is the article based on a previously published article? is it new content?)<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Leif_Karlstrom | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Lee 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Kyo Lee]]<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:'''<br />
* '''Short abstract:'''<br />
* '''Challenge:''' <br />
* '''Relationship to other publications:''' (is the article based on a previously published article? is it new content?)<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Kyo_Lee | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Miller 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Kim Miller]]<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:'''<br />
* '''Short abstract:'''<br />
* '''Challenge:''' <br />
* '''Relationship to other publications:''' (is the article based on a previously published article? is it new content?)<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Kim_Miller | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Mills 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Heath Mills]], University of Houston Clear Lake; Brandi Kiel Reese, Texas A&M Corpus Christi<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:'''Iron and Sulfur Cycling Biogeography Using Advanced Geochemical and Molecular Analyses<br />
* '''Short abstract:'''<br />
* '''Challenge:''' My paper will develop and document a new pipeline to analyze a combined and robust genetic and geochemical data set. New, reproducible methods will be highlighted in this manuscript to help others better analyze similar data sets. There is a general lack of guidance within my field for such challenges. This manuscript will be unique and helpful from an analysis standpoint as well as for the science being presented.<br />
* '''Relationship to other publications:''' Original Manuscript<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Heith_Mills | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Oh 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Ji-Hyun Oh]]<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:'''<br />
* '''Short abstract:'''<br />
* '''Challenge:''' This paper will cover how to reproduce two key figures from the paper that I recently submitted to the Journal of the Atmospheric Sciences. This will include detailed procedures for generating the figures, such as how/where to download the data, how to transform the format of the data to be used as input for my codes, and so on. <br />
* '''Relationship to other publications:''' (is the article based on a previously published article? is it new content?)<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Ji_Hyun | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Pierce 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Suzanne Pierce]]<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:'''<br />
* '''Short abstract:'''<br />
* '''Challenge:''' Fully document a new software application and framework using example case study data and tutorials.<br />
* '''Relationship to other publications:''' (is the article based on a previously published article? is it new content?)<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Suzanne_Pierce | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Pope 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Allen Pope]], National Snow and Ice Data Center, University of Colorado, Boulder<br />
* '''Keywords of research area:''' Glaciology, Remote Sensing, Landsat 8, Polar Science<br />
* '''Tentative title:''' Data and Code for Estimating and Evaluating Supraglacial Lake Depth With Landsat 8 and other Multispectral Sensors<br />
* '''Short abstract:''' Supraglacial lakes play a significant role in glacial hydrological systems – for example, transporting water to the glacier bed in Greenland or leading to ice shelf fracture and disintegration in Antarctica. To investigate these important processes, multispectral remote sensing provides multiple methods for estimating supraglacial lake depth – either through single-band or band-ratio methods, both empirical and physically-based. Landsat 8 is the newest satellite in the Landsat series. With new bands, higher dynamic range, and higher radiometric resolution, the Operational Land Imager (OLI) aboard Landsat 8 has a lot of potential. <br />
<br />
This paper will document the data and code used in processing in situ reflectance spectra and depth measurements to investigate the ability of Landsat 8 to estimate lake depths using multiple methods, as well as quantify improvements over Landsat 7’s ETM+. A workflow, data, and code are provided to detail promising methods as applied to Landsat 8 OLI imagery of case study areas in Greenland, allowing calculation of regional volume estimates using 2013 and 2014 summer-season imagery. Altimetry from WorldView DEMs is used to validate lake depth estimates. The optimal method for supraglacial lake depth estimation with Landsat 8 is shown to be an average of single-band depths from the red and panchromatic bands. With this best method, preliminary investigation of seasonal behavior and elevation distribution of lakes is also discussed and documented.<br />
* '''Challenge:''' Reproducibility, Dark Code<br />
* '''Relationship to other publications:''' Documenting and explaining the data and code behind the analysis and results presented in another paper.<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Allen_Pope | Page]]<br />
* '''Expected submission date:''' Late June 2015<br />
<br />
=== [Read and Winslow 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Jordan Read]] and [[Luke Winslow]]<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:'''<br />
* '''Short abstract:'''<br />
* '''Challenge:''' <br />
* '''Relationship to other publications:''' (is the article based on a previously published article? is it new content?)<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Jordan_Read | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Tzeng 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Mimi Tzeng]], Brian Dzwonkowski (DISL); Kyeong Park (TAMU Galveston)<br />
* '''Keywords of research area:'''physical oceanography, remote sensing<br />
* '''Tentative title:''' Fisheries Oceanography of Coastal Alabama (FOCAL): A Subset of a Time-Series of Hydrographic and Current Data from a Permanent Moored Station Outside Mobile Bay (27 Jan to 18 May 2011)<br />
* '''Short abstract:''' The Fisheries Oceanography in Coastal Alabama (FOCAL) program began in 2006 as a way for scientists at Dauphin Island Sea Lab (DISL) to study the natural variability of Alabama's nearshore environment as it relates to fisheries production. FOCAL provided a long-term baseline data set that included time-series hydrographic data from a permanent offshore mooring (ADCP, vertical thermistor array, and CTDs at surface and bottom) and shipboard surveys (vertical CTD profiles and water sampling), as well as monthly ichthyoplankton and zooplankton (depth-discrete) sample collections at FOCAL sites. The subset of data presented here is from the mooring, and includes a vertical array of thermistors, CTDs at surface and bottom, an ADCP at the bottom, and vertical CTD profiles collected at the mooring during maintenance surveys. The mooring is located at 30 05.410'N 88 12.694'W, 25 km southwest of the entrance to Mobile Bay. Temperature, salinity, density, depth, and current velocity data were collected at 20-minute intervals from 2006 to 2012. Other parameters, such as dissolved oxygen, are available for portions of the time series depending on which instruments were deployed at the time.<br />
* '''Challenge:''' My paper will be about the processing of data in a larger dataset, from which peer-reviewed papers have been written. The processing I did was not specific to any particular paper. I can point to an example paper that used some of the data I processed from this dataset; however, all of the figures in that paper are composites that also include other data from elsewhere that I had nothing to do with (and it wouldn't be feasible to try to get hold of the other data within our timeframe).<br />
* '''Relationship to other publications:''' A recent paper that used the part of the FOCAL data I'm documenting as the sample from the larger dataset: Dzwonkowski, Brian, Kyeong Park, Jungwoo Lee, Bret M. Webb, and Arnoldo Valle-Levinson. 2014. "Spatial variability of flow over a river-influenced inner shelf in coastal Alabama during spring." Continental Shelf Research 74:25-34.<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Mimi_Tzeng | Page]]<br />
* '''Expected submission date:'''<br />
<br />
=== [Villamizar 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Sandra Villamizar]], University of California, Merced<br />
* '''Keywords of research area:''' river ecohydrology<br />
* '''Tentative title:''' Producing long-term series of whole-stream metabolism using readily available data. <br />
* '''Short abstract:''' Continuous water quality and discharge data that are readily available through government websites may be used to produce useful information about the processes within a river ecosystem. This paper will provide a detailed description of how to produce a long-term series of whole-stream metabolism for the case of the restoration reach of the San Joaquin River in California. <br />
* '''Challenge:''' Document new software/applications<br />
* '''Relationship to other publications:''' This will be a new publication<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Sandra_Villamizar | Page]]<br />
* '''Expected submission date:''' To be defined<br />
<br />
=== [Yu 2015] ===<br />
<br />
* '''Authors and affiliations:''' [[Xuan Yu]]<br />
* '''Keywords of research area:'''<br />
* '''Tentative title:'''<br />
* '''Short abstract:'''<br />
* '''Challenge:''' Reproduce published simulations from an existing model using its latest version. Benchmark the modeling application against numerical experiments and field data.<br />
* '''Relationship to other publications:''' (is the article based on a previously published article? is it new content?)<br />
* '''Pointer to the wiki page that documents the article:''' [[Document_GPF_activities_by_Xuan_Yu | Page]]<br />
* '''Expected submission date:'''<br />
<br />
== Special Issue Editors ==<br />
<br />
* Co-editor: Chris Duffy and/or Scott Peckham<br />
* Co-editor: Cedric David<br />
* Co-editor: possibly Karan Venayagamoorthy<br />
<br />
The editors will only accept submissions that follow the [[Develop_proposal_for_special_issue#Special_Issue_Review_Criteria | special issue review criteria]].<br />
<br />
The editors will select a set of reviewers to handle the submissions. Reviewers will include computer scientists, library scientists, and geoscientists.<br />
<br />
== Special Issue Review Criteria ==<br />
<br />
The reviewers will be asked to provide feedback on the papers according to the following criteria. Note that some papers will have good reasons for limiting the information (e.g. the data is from third parties and not openly available, etc), and in that case they would document those reasons.<br />
<br />
* Documentation of the datasets: descriptions of datasets, unique identifiers, repositories.<br />
* Documentation of software: description of all software used (including pre-processing of data, visualization steps, etc), unique identifiers, repositories.<br />
* Documentation of the provenance of results: provenance for each figure or result, such as the workflow or the provenance record.<br />
<br />
== Tentative Timeline ==<br />
<br />
* Journal committed to special issue: April 15, 2015<br />
* Submissions due to editors: June 30, 2015<br />
* Reviews due: Sept 15, 2015<br />
* Decisions out to authors: Sept 30, 2015<br />
* Revisions due: October 31, 2015<br />
* Final versions due November 15, 2015<br />
* Issue published December 31, 2015<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Owner=Chris_Duffy|<br />
Participants=Yolanda_Gil|<br />
Participants=Scott_Peckham|<br />
Participants=Cedric_David|<br />
Participants=Ibrahim_Demir|<br />
Participants=Wally_Fulweiler|<br />
Participants=Leif_Karlstrom|<br />
Participants=Kyo_Lee|<br />
Participants=Kim_Miller|<br />
Participants=Heath_Mills|<br />
Participants=Ji-Hyun_Oh|<br />
Participants=Suzanne_Pierce|<br />
Participants=Allen_Pope|<br />
Participants=Jordan_Read|<br />
Participants=Mimi_Tzeng|<br />
Participants=Sandra_Villamizar|<br />
Participants=Xuan_Yu|<br />
Progress=20|<br />
StartDate=2015-03-10|<br />
TargetDate=2015-03-16|<br />
Type=Low}}</div>Mimihttps://www.organicdatascience.org/gpf/index.php?title=Select_target_article_by_Mimi_Tzeng&diff=11522Select target article by Mimi Tzeng2015-03-20T18:02:34Z<p>Mimi: </p>
<hr />
<div>[[Category:Task]]<br />
<br/><b>Details on how to do this task:</b> [[Select target article]]<br/><br/><br />
'''Article:'''<br />
<br />
Fisheries Oceanography of Coastal Alabama (FOCAL): A Subset of a Time-Series of Hydrographic and Current Data from a Permanent Moored Station Outside Mobile Bay (27 Jan to 18 May 2011)<br />
<br />
[http://www.ncddc.noaa.gov/approved_recs/org/disl/FOCAL/Park/FOCAL-data/Mooring-FOCAL.html FGDC Metadata Record]<br />
<br />
<!-- Add any wiki Text above this Line --><br />
<!-- Do NOT Edit below this Line --><br />
{{#set:<br />
Expertise=Ocean_science|<br />
Expertise=Geosciences|<br />
Owner=Mimi_Tzeng|<br />
Progress=100|<br />
StartDate=2015-02-01|<br />
TargetDate=2015-02-06|<br />
Type=Low}}</div>Mimi