Metadata

Overview

DataLad has built-in, modular, and extensible support for metadata in various formats.

Supported metadata formats

The following sections provide an overview of the supported metadata formats.

RFC822-compliant metadata

This is a custom metadata format, inspired by the format used for Debian software packages, that is particularly suited for manual entry. It is a good choice when metadata describing a dataset as a whole cannot be obtained from some other structured format. The syntax is RFC 822-compliant; in other words, it is a text-based format that uses the syntax of email headers. Metadata in this format must be placed in DATASETROOT/.datalad/meta.rfc822.

Here is an example:

Name: myamazingdataset
Version: 1.0.0-rc3
Description: Basic summary
 A text with arbitrary length and content that can span multiple
 .
 paragraphs (this is a new one)
License: CC0
 The person who associated a work with this deed has dedicated the work to the
 public domain by waiving all of his or her rights to the work worldwide under
 copyright law, including all related and neighboring rights, to the extent
 allowed by law.
 .
 You can copy, modify, distribute and perform the work, even for commercial
 purposes, all without asking permission.
Homepage: http://example.com
Funding: Grandma's and Grandpa's support
Issue-Tracker: https://github.com/datalad/datalad/issues
Cite-As: Mike Author (2016). We made it. The breakthrough journal of unlikely
  events. 1, 23-453.
DOI: 10.0000/nothere.48421
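Because the file uses email-header syntax, Python's standard `email` package can read it. The following is a minimal sketch, not DataLad's own parser: the helper name `parse_meta_rfc822` is hypothetical, and it assumes the conventions visible in the example above (continuation lines indented by a space, a line holding only `.` marking a paragraph break).

```python
import email


def parse_meta_rfc822(text):
    """Hypothetical helper (not part of DataLad): parse meta.rfc822 content.

    Returns a dict mapping lower-cased field names to unfolded values.
    """
    # With the default (compat32) policy, folded header values are returned
    # verbatim: each continuation line keeps its newline and leading space.
    msg = email.message_from_string(text)
    fields = {}
    for name, raw in msg.items():
        first, *rest = raw.split("\n")
        lines = [first.strip()]
        for line in rest:
            # Continuation lines start with whitespace; drop one leading space.
            if line.startswith(" "):
                line = line[1:]
            # A line containing only "." stands for an empty line (new paragraph).
            lines.append("" if line.strip() == "." else line)
        # RFC 822 field names are case-insensitive ("Cite-As" == "Cite-as").
        fields[name.lower()] = "\n".join(lines).strip()
    return fields


sample = (
    "Name: myamazingdataset\n"
    "Version: 1.0.0-rc3\n"
    "Description: Basic summary\n"
    " A text with arbitrary length and content that can span multiple\n"
    " .\n"
    " paragraphs (this is a new one)\n"
)
meta = parse_meta_rfc822(sample)
print(meta["name"])  # myamazingdataset
```

Lower-casing the keys lets callers look up a field regardless of whether the file spells it `Cite-As` or `Cite-as`.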

The following fields are supported:

Audience:
A description of the target audience of the dataset.
Author:
A comma-delimited list of authors of the dataset, preferably in the format: Firstname Lastname <Email Address>
Cite-as:
Instructions on how to cite the dataset, or a structured citation.
Description:
Description of the dataset as a whole. The first line should represent a compact short description with no more than 6-8 words.
DOI:
A digital object identifier for the dataset.
Funding:
Information on potential funding for the creation of the dataset and/or its content. This field can also be used to acknowledge non-monetary support.
Homepage:
A URL to a project website for the dataset.
Issue-tracker:
A URL to an issue tracker where known problems are documented and/or new reports can be submitted.
License:
A description of the license or terms of use for the dataset. If possible, the first lines should contain a list of license labels (e.g. CC0, PDDL) for standard licenses. Full license texts or descriptions of the terms can be included.
Maintainer:
Can be used in addition to, and analogous to, Author when the authors (creators of the data) need to be distinguished from the maintainers of the dataset.
Name:
A short name for the dataset. It may be beneficial to avoid special characters, umlauts, spaces, etc., so that the name can be used in unmodified form in URLs, catalog keys, etc.
Version:
A version for the dataset. This should be in a format that is alphanumerically sortable, such that an update of a dataset leads to a “greater” version.
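The Author and Version conventions above can be checked with the Python standard library. This is an illustrative sketch only: `split_authors` and `is_newer_version` are hypothetical names, and reading "alphanumerically sortable" as plain string comparison is an assumption. Since `Firstname Lastname <Email Address>` is RFC 822 address syntax, `email.utils.getaddresses` can split an Author or Maintainer value.

```python
from email.utils import getaddresses


def split_authors(value):
    """Hypothetical helper: split an Author/Maintainer value into (name, email) pairs."""
    # getaddresses parses comma-separated RFC 822 addresses such as
    # "Jane Doe <jane@example.com>, John Roe <john@example.com>".
    return getaddresses([value])


def is_newer_version(old, new):
    """Hypothetical check (assumption): does `new` sort after `old` as a plain string?"""
    return new > old


print(split_authors("Jane Doe <jane@example.com>, John Roe <john@example.com>"))
# [('Jane Doe', 'jane@example.com'), ('John Roe', 'john@example.com')]

print(is_newer_version("1.0.0-rc3", "1.0.1"))  # True
# Plain string comparison is not numeric: "1.10.0" sorts before "1.9.0".
print(is_newer_version("1.9.0", "1.10.0"))  # False
```

The last comparison shows why, under a string-based reading, multi-digit version components need care (for example, zero-padding) to keep updates sorting as “greater”.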

Brain Imaging Data Structure (BIDS)

DataLad has basic support for extraction of metadata from the BIDS dataset_description.json file.
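As a rough illustration of what such extraction involves, the sketch below reads a dataset_description.json document and maps a few standard BIDS keys (Name, Authors, License, DatasetDOI, Funding) onto field names of the RFC822 format described above. The function name and the mapping are assumptions for illustration; they do not reproduce DataLad's BIDS extractor.

```python
import json

# Hypothetical mapping from BIDS dataset_description.json keys to the
# field names of the meta.rfc822 vocabulary; not DataLad's own mapping.
BIDS_TO_RFC822 = {
    "Name": "name",
    "Authors": "author",
    "License": "license",
    "DatasetDOI": "doi",
    "Funding": "funding",
}


def extract_bids_description(text):
    """Return the mapped subset of a dataset_description.json document."""
    desc = json.loads(text)
    meta = {}
    for bids_key, field in BIDS_TO_RFC822.items():
        value = desc.get(bids_key)
        if value is None:
            continue  # key absent: nothing to report
        # Authors (and Funding, in recent BIDS versions) are lists of strings.
        if isinstance(value, list):
            value = ", ".join(value)
        meta[field] = value
    return meta


example = json.dumps({
    "Name": "My study",
    "BIDSVersion": "1.8.0",
    "Authors": ["Jane Doe", "John Roe"],
    "License": "CC0",
})
print(extract_bids_description(example))
# {'name': 'My study', 'author': 'Jane Doe, John Roe', 'license': 'CC0'}
```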

Frictionless data packages

DataLad has basic support for extraction of metadata from Frictionless Data packages (datapackage.json files).
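The sketch below shows, under stated assumptions, how a few top-level properties of a datapackage.json file (name, version, description, homepage, licenses, as defined by the Frictionless Data Package specification) could be mapped onto RFC822-style field names. The helper `extract_datapackage` and the mapping itself are hypothetical and not taken from DataLad.

```python
import json


def extract_datapackage(text):
    """Hypothetical helper: map a few datapackage.json properties to RFC822-style fields."""
    pkg = json.loads(text)
    meta = {}
    # Scalar properties assumed to carry over one-to-one.
    for key in ("name", "version", "description", "homepage"):
        if key in pkg:
            meta[key] = pkg[key]
    # "licenses" is a list of objects; keep their short identifiers.
    licenses = [lic["name"] for lic in pkg.get("licenses", []) if "name" in lic]
    if licenses:
        meta["license"] = ", ".join(licenses)
    return meta


example = json.dumps({
    "name": "my-dataset",
    "version": "1.0.0",
    "licenses": [{"name": "ODC-PDDL-1.0"}],
    "resources": [{"path": "data.csv"}],
})
print(extract_datapackage(example))
# {'name': 'my-dataset', 'version': '1.0.0', 'license': 'ODC-PDDL-1.0'}
```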

Vocabulary

The following sections describe details of, and changes to, the metadata specifications implemented in DataLad.

v0.1

  • Original implementation