datalad.api.catalog
- datalad.api.catalog()
Generate a user-friendly web-based data catalog from structured metadata.
The datalad catalog commands can be used to create a new catalog (catalog-create), add and remove metadata entries to/from an existing catalog (catalog-add, catalog-remove), and start a local HTTP server to serve an existing catalog locally (catalog-serve). They can also validate a metadata entry (catalog-validate; validation is also performed implicitly when adding), set dataset properties such as the home page to be shown by default (catalog-set), and get dataset properties such as the config, specific metadata, or the home page (catalog-get).
Metadata can be provided to DataLad Catalog from any number of arbitrary metadata sources, as an aggregated set or as individual metadata items. DataLad Catalog has a dedicated schema (using the JSON Schema vocabulary) against which incoming metadata items are validated. This schema allows for standard metadata fields as one would expect for datasets of any kind (such as name, doi, url, description, license, authors, and more), as well as fields that support identification, versioning, dataset context and linkage, and file tree specification.
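The examples further down pass metadata as a .jsonl file, i.e. one JSON object per line. A minimal sketch of preparing such a file with the Python standard library is shown here. The item is hypothetical: its field names are taken from the description above and from the dataset_id/dataset_version parameters used in the examples, and the catalog schema itself remains the authority on which fields are required and how they are structured. The helpers write_jsonl and read_jsonl are illustrative and not part of DataLad Catalog.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(items, path):
    """Write each metadata item as one JSON object per line (JSON Lines)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for item in items:
            fh.write(json.dumps(item) + "\n")
    return path


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical metadata item; the real catalog schema may require
# additional fields or different value structures.
item = {
    "dataset_id": "abcd",
    "dataset_version": "1234",
    "name": "Example dataset",
    "description": "A placeholder description.",
    "url": "https://example.com/dataset",
}

out = write_jsonl([item], Path(tempfile.mkdtemp()) / "metadata.jsonl")
```

A file written this way could then be checked against the catalog schema with the validate command before it is added.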
The output is a set of structured metadata files, as well as a Vue.js-based browser interface that understands how to render this metadata in the browser. These can be hosted on a platform of choice as a static webpage.
Note: in the catalog website, each dataset entry is displayed under <main page>/#/dataset/<dataset_id>/<dataset_version>. By default, the main page of the catalog will display a 404 error, unless the default dataset is configured with datalad catalog-set home.
Examples
CREATE a new catalog from scratch:
> catalog_create(catalog='/tmp/my-cat')
ADD metadata to an existing catalog:
> catalog_add(catalog='/tmp/my-cat', metadata='path/to/metadata.jsonl')
SET a property of an existing catalog, such as the home page, i.e. the first dataset displayed when navigating to the root URL of the catalog:
> catalog_set(property='home', catalog='/tmp/my-cat', dataset_id='abcd', dataset_version='1234')
SERVE the content of the catalog via a local HTTP server at http://localhost:8001:
> catalog_serve(catalog='/tmp/my-cat/', port=8001)
VALIDATE metadata against a catalog schema without adding it to the catalog:
> catalog_validate(catalog='/tmp/my-cat/', metadata='path/to/metadata.jsonl')
GET a property of an existing catalog, such as the catalog configuration:
> catalog_get(property='config', catalog='/tmp/my-cat/')
REMOVE a specific metadata record from an existing catalog:
> catalog_remove(catalog='/tmp/my-cat', dataset_id='efgh', dataset_version='5678')
TRANSLATE a metalad-extracted metadata item from a particular source structure into the catalog schema. A dedicated translator should be provided and exposed as an entry point (e.g. via a DataLad extension) as part of the 'datalad.metadata.translators' group:
> catalog_translate(catalog='/tmp/my-cat', metadata='path/to/metadata.jsonl')
RUN A WORKFLOW for recursive metadata extraction (using datalad-metalad), translating metadata to the catalog schema, and adding the translated metadata to a new catalog:
> catalog_workflow(mode='new', catalog='/tmp/my-cat/', dataset='path/to/superdataset', extractor='metalad_core')
RUN A WORKFLOW for updating a catalog after registering a subdataset to the superdataset which the catalog represents. This workflow includes extraction (using datalad-metalad), translating metadata to the catalog schema, and adding the translated metadata to the existing catalog:
> catalog_workflow(mode='update', catalog='/tmp/my-cat/', dataset='path/to/superdataset', subdataset='path/to/subdataset', extractor='metalad_core')
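The note above states that each dataset entry is displayed under <main page>/#/dataset/<dataset_id>/<dataset_version>. As a small sketch of that pattern, the helper below builds such a URL from its parts. The function name dataset_url is hypothetical and not part of DataLad Catalog, and the sketch assumes the identifiers need no additional escaping.

```python
def dataset_url(main_page, dataset_id, dataset_version):
    """Build the URL of a dataset entry from the pattern in the note above."""
    # Drop any trailing slash so the result contains exactly one before '#'.
    base = main_page.rstrip("/")
    return f"{base}/#/dataset/{dataset_id}/{dataset_version}"


# Using the port from the SERVE example and the id/version from the SET example.
print(dataset_url("http://localhost:8001", "abcd", "1234"))
# prints: http://localhost:8001/#/dataset/abcd/1234
```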