DataLad extension for semantic metadata handling¶
This software is a DataLad extension that equips DataLad with an alternative command suite for metadata handling (extraction, aggregation, reporting). It is backward-compatible with the metadata storage format in DataLad proper, while being substantially more performant (especially on large dataset hierarchies). Additionally, it provides new metadata extractors, as well as improved variants of DataLad’s own extractors that are tuned for better performance and richer, JSON-LD compliant metadata reports.
API¶
High-level API commands¶
These commands provide an improved and extended equivalent to the metadata and aggregate_metadata commands (and the primitive extract-metadata plugin) that ship with the DataLad core package.
meta_extract (extractorname, path, dataset, …) | Run a metadata extractor on a dataset or file.
meta_aggregate ([dataset, path]) | Aggregate metadata of one or more sub-datasets for later reporting.
meta_dump ([dataset, path, recursive]) | Dump a dataset’s aggregated dataset-level and file-level metadata.
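As a rough illustration of how meta_extract might be driven from Python, the sketch below wraps a single call. It uses only the parameter names listed above (extractorname, path, dataset). The helper name run_extractor and the idea of passing the API object in as an argument are assumptions made for this example, not part of the extension; the type and shape of the return value are not specified here and are passed through unchanged.

```python
from typing import Any, Optional


def run_extractor(api: Any, extractorname: str, dataset: str,
                  path: Optional[str] = None) -> Any:
    """Call `api.meta_extract` with the parameters named in the table above.

    `api` is any object exposing a `meta_extract` callable. With the
    extension installed this would presumably be the `datalad.api` module
    (an assumption; this helper itself is hypothetical).
    """
    kwargs = {"extractorname": extractorname, "dataset": dataset}
    # Only pass `path` when a file-level extraction is requested.
    if path is not None:
        kwargs["path"] = path
    # Return whatever the command returns, without interpreting it.
    return api.meta_extract(**kwargs)
```

Passing the API object explicitly keeps the helper importable and testable on machines where DataLad is not installed; in real use one would hand it the imported API module and an extractor name carrying the metalad_ prefix.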
MetadataRecord extractors¶
To use any of the contained extractors, their names need to be prefixed with metalad_. For example, the runprov extractor is effectively named metalad_runprov.
core | MetadataRecord extractor for DataLad’s own core storage
annex | MetadataRecord extractor for Git-annex metadata
custom | MetadataRecord extractor for custom (JSON-LD) metadata contained in a dataset
runprov | MetadataRecord extractor for provenance information in DataLad’s run records
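The naming rule for the extractors listed above can be sketched in a few lines of Python. The helper qualified_extractor_name and the constants are hypothetical names introduced only for this illustration; the extension itself is not known to provide such a function.

```python
# Short names of the extractors listed in the table above.
SHORT_NAMES = ("core", "annex", "custom", "runprov")

# Prefix that has to be put in front of a short name.
EXTRACTOR_PREFIX = "metalad_"


def qualified_extractor_name(short_name: str) -> str:
    """Return the effective extractor name for a short name from the table.

    Hypothetical helper: it only applies the documented prefix rule and
    rejects names that are not listed above.
    """
    if short_name not in SHORT_NAMES:
        raise ValueError(f"unknown extractor short name: {short_name!r}")
    return EXTRACTOR_PREFIX + short_name


print(qualified_extractor_name("runprov"))  # prints "metalad_runprov"
```

The resulting string is what would be supplied as the extractorname argument of meta_extract.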
Acknowledgments¶
DataLad development is being performed as part of a US-German collaboration in computational neuroscience (CRCNS) project “DataGit: converging catalogues, warehouses, and deployment logistics into a federated ‘data distribution’” (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411). Additional support is provided by the German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences, Imaging Platform.
DataLad is built atop the git-annex software that is being developed and maintained by Joey Hess.