datalad.api.catalog
- datalad.api.catalog()
Generate a user-friendly web-based data catalog from structured metadata.
datalad catalog can be used to -create a new catalog, -add and -remove metadata entries to/from an existing catalog, and start a local HTTP server to -serve an existing catalog locally. It can also -validate a metadata entry (validation is also performed implicitly when adding), -set dataset properties such as the home page to be shown by default, and -get dataset properties such as the config, specific metadata, or the home page.
Metadata can be provided to DataLad Catalog from any number of arbitrary metadata sources, as an aggregated set or as individual metadata items. DataLad Catalog has a dedicated schema (using the JSON Schema vocabulary) against which incoming metadata items are validated. This schema allows for standard metadata fields as one would expect for datasets of any kind (such as name, doi, url, description, license, authors, and more), as well as fields that support identification, versioning, dataset context and linkage, and file tree specification.
The output is a set of structured metadata files, as well as a Vue.js-based browser interface that understands how to render this metadata in the browser. These can be hosted on a platform of choice as a static webpage.
Note: in the catalog website, each dataset entry is displayed under
<main page>/#/dataset/<dataset_id>/<dataset_version>. By default, the main page of the catalog will display a 404 error, unless the default dataset is configured with datalad catalog-set home.
Examples
CREATE a new catalog from scratch:
> catalog_create(catalog='/tmp/my-cat')
ADD metadata to an existing catalog:
> catalog_add(catalog='/tmp/my-cat', metadata='path/to/metadata.jsonl')
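Judging by its .jsonl extension, the metadata file in the call above is a JSON Lines file holding one metadata item per line. The following is a minimal sketch of writing and reading such a file with the Python standard library. The helper names write_jsonl and read_jsonl, the key names, and all values are assumptions made for illustration; the catalog schema defines which fields are actually required and how they are typed.

```python
import json
from pathlib import Path


def write_jsonl(items, path):
    """Write one JSON object per line (JSON Lines) and return the path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps(item) + "\n")
    return path


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dictionaries."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical metadata item. The key names and values are assumptions;
# the real catalog schema may require more or differently typed fields.
example_item = {
    "dataset_id": "abcd",
    "dataset_version": "1234",
    "name": "Example dataset",
    "description": "A placeholder description.",
    "url": "https://example.com/dataset",
}
```

A file written this way could then be checked with catalog_validate before adding it, since validation reports schema problems without changing the catalog.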
SET a property of an existing catalog, such as its home page, i.e. the first dataset displayed when navigating to the root URL of the catalog:
> catalog_set(property='home', catalog='/tmp/my-cat', dataset_id='abcd', dataset_version='1234')
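The dataset_id and dataset_version values used here also identify the dataset's page in the catalog website, following the pattern <main page>/#/dataset/<dataset_id>/<dataset_version> given in the note above. As a small sketch, the address can be assembled like this; the function name dataset_url and the main page address are hypothetical.

```python
def dataset_url(main_page, dataset_id, dataset_version):
    """Build '<main page>/#/dataset/<dataset_id>/<dataset_version>'."""
    # Drop a trailing slash so the result never contains '//#/'.
    return f"{main_page.rstrip('/')}/#/dataset/{dataset_id}/{dataset_version}"


# 'https://example.org/catalog' is a placeholder, not a real catalog.
print(dataset_url("https://example.org/catalog", "abcd", "1234"))
# prints https://example.org/catalog/#/dataset/abcd/1234
```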
SERVE the content of the catalog via a local HTTP server at http://localhost:8001:
> catalog_serve(catalog='/tmp/my-cat/', port=8001)
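Because the catalog output is described above as a set of static files, any static file server should be able to host it. The sketch below uses Python's built-in http.server as a stand-in to illustrate this; it is not how catalog_serve is implemented, and the function name make_static_server is hypothetical.

```python
import functools
import http.server


def make_static_server(directory, host="127.0.0.1", port=0):
    """Return an HTTP server that serves the files found in `directory`.

    port=0 asks the operating system for any free port; the chosen
    port is available afterwards as server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Calling make_static_server('/tmp/my-cat', port=8001).serve_forever() would block and serve the directory until interrupted. Since dataset pages are addressed after the # in the URL, that part is handled by the browser and the server needs no special routing rules.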
VALIDATE metadata against a catalog schema without adding it to the catalog:
> catalog_validate(catalog='/tmp/my-cat/', metadata='path/to/metadata.jsonl')
GET a property of an existing catalog, such as the catalog configuration:
> catalog_get(property='config', catalog='/tmp/my-cat/')
REMOVE a specific metadata record from an existing catalog:
> catalog_remove(catalog='/tmp/my-cat', dataset_id='efgh', dataset_version='5678')
TRANSLATE a metalad-extracted metadata item from a particular source structure into the catalog schema. A dedicated translator should be provided and exposed as an entry point (e.g. via a DataLad extension) as part of the 'datalad.metadata.translators' group:
> catalog_translate(catalog='/tmp/my-cat', metadata='path/to/metadata.jsonl')
RUN A WORKFLOW for recursive metadata extraction (using datalad-metalad), translating metadata to the catalog schema, and adding the translated metadata to a new catalog:
> catalog_workflow(mode='new', catalog='/tmp/my-cat/', dataset='path/to/superdataset', extractor='metalad_core')
RUN A WORKFLOW for updating a catalog after registering a subdataset to the superdataset which the catalog represents. This workflow includes extracting metadata (using datalad-metalad), translating it to the catalog schema, and adding the translated metadata to the existing catalog:
> catalog_workflow(mode='update', catalog='/tmp/my-cat/', dataset='path/to/superdataset', subdataset='path/to/subdataset', extractor='metalad_core')