datalad.api.catalog

datalad.api.catalog()

Generate a user-friendly web-based data catalog from structured metadata.

datalad catalog can be used to -create a new catalog, to -add and -remove metadata entries to/from an existing catalog, and to start a local HTTP server to -serve an existing catalog. It can also -validate a metadata entry (validation is also performed implicitly when adding), -set dataset properties such as the home page to be shown by default, and -get dataset properties such as the config, specific metadata, or the home page.

Metadata can be provided to DataLad Catalog from any number of arbitrary metadata sources, as an aggregated set or as individual metadata items. DataLad Catalog has a dedicated schema (using the JSON Schema vocabulary) against which incoming metadata items are validated. This schema allows for standard metadata fields as one would expect for datasets of any kind (such as name, doi, url, description, license, authors, and more), as well as fields that support identification, versioning, dataset context and linkage, and file tree specification.
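As a rough illustration of what an individual metadata item in a line-delimited file might look like, the sketch below writes one record to a .jsonl file and reads it back. The field names are taken from the list above where possible; the keys "type", "dataset_id" and "dataset_version", the nesting of "license" and "authors", and all values are assumptions made for illustration. The catalog's own JSON Schema is the authoritative definition of required fields and their structure.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical record: key names and nesting are assumptions, not the
# verified catalog schema. All values are placeholders.
record = {
    "type": "dataset",
    "dataset_id": "abcd",
    "dataset_version": "1234",
    "name": "Example dataset",
    "description": "A placeholder description.",
    "url": "https://example.org/example-dataset",
    "license": {"name": "CC0"},
    "authors": [{"name": "Jane Doe"}],
}

metadata_file = Path(tempfile.mkdtemp()) / "metadata.jsonl"
write_jsonl([record], metadata_file)
print(read_jsonl(metadata_file)[0]["name"])
```

A file written this way has one metadata item per line, so an aggregated set is simply several such lines in the same file.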

The output is a set of structured metadata files, as well as a Vue.js-based browser interface that understands how to render this metadata in the browser. These can be hosted on a platform of choice as a static webpage.
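Because the output is a set of static files, any static file server should be able to host it. The sketch below is not how DataLad Catalog serves content; it only uses Python's standard http.server module to show the idea, with a temporary directory and a placeholder index.html standing in for a generated catalog. The helper names serve_static and fetch are hypothetical.

```python
import functools
import tempfile
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def serve_static(directory, port=0):
    """Serve `directory` over HTTP on 127.0.0.1 in a background thread.

    port=0 asks the OS for any free port; the chosen port is available
    as server.server_address[1].
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path="/"):
    """Fetch a path from the running server, bypassing any configured proxy."""
    host, port = server.server_address
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}{path}") as resp:
        return resp.read().decode("utf-8")


# Stand-in for a generated catalog directory (placeholder content only).
site = Path(tempfile.mkdtemp())
(site / "index.html").write_text(
    "<html><body>catalog placeholder</body></html>", encoding="utf-8"
)

server = serve_static(site)
try:
    page = fetch(server, "/index.html")
finally:
    # Stop the serve_forever loop and release the socket.
    server.shutdown()
    server.server_close()
```

The same directory could equally be copied to any web host that serves static files.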

Note: on the catalog website, each dataset entry is displayed under <main page>/#/dataset/<dataset_id>/<dataset_version>. The main page of the catalog displays a 404 error until a default (home) dataset is configured with datalad catalog-set home.
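The URL pattern in the note can be sketched as a small helper. The function name dataset_url is hypothetical and not part of DataLad Catalog; the identifier and version values are the placeholders used in the examples on this page.

```python
def dataset_url(main_page, dataset_id, dataset_version):
    """Build <main page>/#/dataset/<dataset_id>/<dataset_version>."""
    base = main_page.rstrip("/")  # avoid a double slash before '#'
    return f"{base}/#/dataset/{dataset_id}/{dataset_version}"


print(dataset_url("http://localhost:8001", "abcd", "1234"))
# prints http://localhost:8001/#/dataset/abcd/1234
```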

Examples

CREATE a new catalog from scratch:

> catalog_create(catalog='/tmp/my-cat')

ADD metadata to an existing catalog:

> catalog_add(catalog='/tmp/my-cat', metadata='path/to/metadata.jsonl')

SET a property of an existing catalog, such as its home page, i.e. the first dataset displayed when navigating to the root URL of the catalog:

> catalog_set(property='home', catalog='/tmp/my-cat', dataset_id='abcd', dataset_version='1234')
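The three calls above can be chained into one setup step. The following is a minimal sketch, assuming that `api` is the datalad.api module with DataLad Catalog installed and that the calls accept the keyword arguments shown in the examples on this page; the wrapper build_catalog is hypothetical. The module is passed in as an argument rather than imported so that the call sequence can be checked without DataLad.

```python
def build_catalog(api, catalog, metadata, dataset_id, dataset_version):
    """Create a catalog, add metadata to it, then set its home page.

    `api` is assumed to expose catalog_create, catalog_add and
    catalog_set with the keyword arguments used in the examples above.
    """
    api.catalog_create(catalog=catalog)
    # Adding also validates the metadata implicitly.
    api.catalog_add(catalog=catalog, metadata=metadata)
    api.catalog_set(
        property="home",
        catalog=catalog,
        dataset_id=dataset_id,
        dataset_version=dataset_version,
    )


# Usage (requires datalad with DataLad Catalog installed):
# import datalad.api as dl
# build_catalog(dl, "/tmp/my-cat", "path/to/metadata.jsonl", "abcd", "1234")
```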

SERVE the content of the catalog via a local HTTP server at http://localhost:8001:

> catalog_serve(catalog='/tmp/my-cat/', port=8001)

VALIDATE metadata against a catalog schema without adding it to the catalog:

> catalog_validate(catalog='/tmp/my-cat/', metadata='path/to/metadata.jsonl')

GET a property of an existing catalog, such as the catalog configuration:

> catalog_get(property='config', catalog='/tmp/my-cat/')

REMOVE a specific metadata record from an existing catalog:

> catalog_remove(catalog='/tmp/my-cat', dataset_id='efgh', dataset_version='5678')

TRANSLATE a metalad-extracted metadata item from a particular source structure into the catalog schema. A dedicated translator should be provided and exposed as an entry point (e.g. via a DataLad extension) as part of the 'datalad.metadata.translators' group:

> catalog_translate(catalog='/tmp/my-cat', metadata='path/to/metadata.jsonl')

RUN A WORKFLOW for recursive metadata extraction (using datalad-metalad), translating metadata to the catalog schema, and adding the translated metadata to a new catalog:

> catalog_workflow(mode='new', catalog='/tmp/my-cat/', dataset='path/to/superdataset', extractor='metalad_core')

RUN A WORKFLOW for updating a catalog after registering a subdataset to the superdataset which the catalog represents. This workflow includes extraction (using datalad-metalad), translating metadata to the catalog schema, and adding the translated metadata to the existing catalog:

> catalog_workflow(mode='update', catalog='/tmp/my-cat/', dataset='path/to/superdataset', subdataset='path/to/subdataset', extractor='metalad_core')