Make data accessible

Has No Subtasks

Type: low
Progress: 90%
Start date: 21st Feb 2015
Target date: 6th Mar 2015
Owner: Yolanda Gil
Participants: Erin Robinson
Expertise: open science

What This Task Involves

The training session and training materials indicate how to:

Get a permanent unique identifier for your dataset in a public repository
Specify general metadata (creator, license, version) and domain metadata (categories, tags)
Upload or specify a pointer to the dataset

Training Materials

This training session will be held on February 20, 2015:

Presentation

What To Do

We described many options in the training. Here is a sketch of the most common approach:

Create a public entry for your dataset with a permanent unique identifier.
1. Select a repository
  - Option 1: Find a repository that your community uses
  - Option 2: Go to figshare.com, zenodo.org (supported by CERN), or a similar free service, and create an account. Figshare has a 250MB per-file limit and 1GB of private storage, but unlimited open storage. Zenodo allows files up to 2GB (potentially larger if you talk to the site managers) and currently has no total storage limit.
2. Create an entry for each of your datasets
  1. Specify the metadata
  2. Include license information: choose from Creative Commons, for example CC-BY or CC0.
  3. Upload or point to the data
3. The repository should give you a unique identifier (a DOI)
Create a data citation for each of your datasets
- Include: authors, date of publication, dataset name, repository name, permanent unique identifier, timestamp of retrieval.
- Specify the data citation in the repository entry for each dataset, so others can use it
Include the data citations in the GPF
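
The citation fields listed above can be assembled mechanically. The following is a minimal Python sketch, not a prescribed format: the class name, field names, and output layout are assumptions made for illustration, and the names and DOI are placeholders. Adapt the layout to whatever citation style your journal or repository expects.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataCitation:
    """The citation fields listed above (field names are illustrative)."""
    authors: list[str]
    published: date      # date of publication
    dataset_name: str
    repository: str
    identifier: str      # permanent unique identifier, e.g. a DOI
    retrieved: date      # when the data was retrieved

    def to_text(self) -> str:
        # The layout below is an assumption, not a required citation style.
        authors = ", ".join(self.authors)
        return (
            f"{authors} ({self.published.year}). {self.dataset_name}. "
            f"{self.repository}. {self.identifier}. "
            f"Retrieved {self.retrieved.isoformat()}."
        )


citation = DataCitation(
    authors=["A. Researcher", "B. Colleague"],  # placeholder names
    published=date(2015, 2, 21),
    dataset_name="Example stream temperature dataset",
    repository="Example Repository",
    identifier="doi:10.0000/example.12345",     # placeholder, not a real DOI
    retrieved=date(2015, 3, 1),
)
print(citation.to_text())
```

The resulting string can be pasted into the repository entry for the dataset and reused in the GPF.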

Some interesting cases that you may run into:

I have several related datasets in several files (e.g., each file has data for a time period)
- Create a DOI for each file and a DOI for the whole set. If there are too many files (dozens or hundreds), it may be best to create a single DOI for the whole set.
My data is in a public repository, but it is not my data
- Create a DOI for the slice of data that you use. Describe the data by specifying the query that you ran against the repository, and include a pointer to the repository so that others can retrieve it too.
My data is from a database
- Ask for permission to publish the data that you extracted, and mention that you will give appropriate credit. Get an understanding of the appropriate license to use. Put the data in a file and publish it.
Some of the data that I use is from a colleague
- Encourage them to make the data public in Figshare or any public repository, and offer to help. Explain to them how the license works. If they do not want to make the data public, that is ok. In that case, you should create an entry that does not have the data but at least describes it with all the metadata, which would include information about your colleague as the data creator and other information about how to get the data from them.
My data comes from many sources
- Credit each source, and create repository entries as needed
- Another option is to include in the paper a table of “microattributions” that summarizes each data source
My data has many versions (e.g., sensors that collect more data over time)
- Create an entry for either each slice or each snapshot
My datasets are very large
- Leave the datasets in a repository that can hold data of that size, or put the data at a publicly accessible URL. Then get a PURL at [1], and create an entry in Figshare or a similar service pointing to that PURL.
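
For the many-sources case above, a "microattributions" table can be generated from a simple list of sources. This is only a sketch: the three columns (source, creator, identifier) and the function name are assumptions, since the training does not prescribe a layout, and all entries are placeholders.

```python
def microattribution_table(sources):
    """Render one row per data source as a plain-text table."""
    header = ("Source", "Creator", "Identifier")  # assumed columns
    rows = [header] + [
        (s["source"], s["creator"], s["identifier"]) for s in sources
    ]
    # Pad each column to the width of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]
    # Separator line between the header and the data rows.
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)


sources = [
    {"source": "Gauge records", "creator": "Agency X",
     "identifier": "doi:10.0000/example.1"},        # placeholder DOI
    {"source": "Field samples", "creator": "C. Colleague",
     "identifier": "not public; contact creator"},  # metadata-only entry
]
print(microattribution_table(sources))
```

The second row shows how a colleague's unpublished data can still be credited when only a metadata entry exists.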


Credits

Users who have contributed to this Task, its SubTasks and Answers:

Yolanda (18 Edits)
Suzanne (2 Edits)
Allen (1 Edit)
