datalad.api.metadata

datalad.api.metadata(path=None, dataset=None, get_aggregates=False, reporton='all', recursive=False)

Metadata reporting for files and entire datasets

Two types of metadata are supported:

  1. metadata describing a dataset as a whole (dataset-global metadata), and
  2. metadata for files in a dataset (content metadata).

Both types can be accessed with this command.

Examples

Report the metadata of a single file, as aggregated into the closest locally available dataset that contains the query path:

% datalad metadata somedir/subdir/thisfile.dat

Sometimes it is helpful to get metadata records formatted in a more accessible form, here as pretty-printed JSON:

% datalad -f json_pp metadata somedir/subdir/thisfile.dat

Same query as above, but specify which dataset to query (it must contain the query path):

% datalad metadata -d . somedir/subdir/thisfile.dat

Report any metadata record of any dataset known to the queried dataset:

% datalad metadata --recursive --reporton datasets

Get a JSON-formatted report of aggregated metadata in a dataset, including information on enabled metadata extractors, dataset versions, dataset IDs, and dataset paths:

% datalad -f json metadata --get-aggregates
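
The command-line examples above have direct counterparts in the Python API. The following is a minimal sketch, assuming DataLad is installed and the working directory is inside a dataset with aggregated metadata. The wrapper function names are hypothetical and exist only for illustration; only the `metadata()` keyword arguments are taken from the signature documented on this page. Passing `dataset` explicitly in every call is an assumption made for clarity.

```python
def report_file_metadata(path, dataset=None):
    """Counterpart of `datalad metadata [-d DATASET] PATH`."""
    # Imported lazily so the sketch can be loaded without DataLad installed.
    import datalad.api as dl
    return dl.metadata(path=path, dataset=dataset)


def report_known_datasets(dataset="."):
    """Counterpart of `datalad metadata --recursive --reporton datasets`."""
    import datalad.api as dl
    return dl.metadata(dataset=dataset, recursive=True, reporton="datasets")


def list_aggregates(dataset="."):
    """Counterpart of `datalad metadata --get-aggregates`."""
    import datalad.api as dl
    return dl.metadata(dataset=dataset, get_aggregates=True)
```

With the default `return_type`, each call returns a list of result dictionaries, so a call such as `report_file_metadata("somedir/subdir/thisfile.dat", dataset=".")` can be iterated over and inspected record by record.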

Parameters:
  • path (sequence of str or None, optional) – path(s) to query metadata for. [Default: None]
  • dataset (Dataset or None, optional) – dataset to query. If given, metadata will be reported as stored in this dataset. Otherwise, the closest available dataset containing a query path will be consulted. [Default: None]
  • get_aggregates (bool, optional) – if set, yields all (sub)datasets for which aggregate metadata are available in the dataset. No other action is performed, even if other arguments are given. The reported results contain a dataset’s ID, the commit hash at which metadata aggregation was performed, and the location of the object file(s) containing the aggregated metadata. [Default: False]
  • reporton ({'all', 'datasets', 'files', 'none'}, optional) – choose which type of result to report on: ‘datasets’, ‘files’, ‘all’ (both datasets and files), or ‘none’ (no report). [Default: ‘all’]
  • recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]
  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure. ‘ignore’: any failure is reported, but does not cause an exception; ‘continue’: if any failure occurs, an exception will be raised at the end, but processing of other actions will continue for as long as possible; ‘stop’: processing will stop on the first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False and no ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
  • result_renderer ({'default', 'json', 'json_pp', 'tailored'} or None, optional) – format of return value rendering on stdout. [Default: ‘tailored’]
  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’, a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: ‘list’]
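
The behavior of result_filter and of the ‘item-or-list’ return type can be illustrated without a dataset at hand. The following is a minimal sketch, not DataLad's implementation: `only_ok_files` is a hypothetical filter callable, while `apply_filter` and `item_or_list` are hypothetical stand-ins that mimic the rules described in the parameter list. The sample records are hand-written, and their `type` and `status` keys are assumed to follow the usual layout of DataLad result dictionaries.

```python
def only_ok_files(res, **kwargs):
    """Hypothetical result_filter: keep successful file records only."""
    # **kwargs receives the keyword arguments of the original API call.
    return res.get("type") == "file" and res.get("status") == "ok"


def apply_filter(results, result_filter, **call_kwargs):
    """Illustrative stand-in for the filtering step (not DataLad code)."""
    kept = []
    for res in results:
        try:
            if result_filter(res, **call_kwargs):
                kept.append(res)
        except ValueError:
            # Assumption: a ValueError raised by the filter excludes the record.
            continue
    return kept


def item_or_list(results):
    """Mimic return_type='item-or-list' as described in the parameter list."""
    if not results:
        return None
    return results[0] if len(results) == 1 else results


sample = [
    {"path": "/ds", "type": "dataset", "status": "ok"},
    {"path": "/ds/a.dat", "type": "file", "status": "ok"},
    {"path": "/ds/b.dat", "type": "file", "status": "impossible"},
]

kept = apply_filter(sample, only_ok_files, reporton="all")
print([r["path"] for r in kept])   # ['/ds/a.dat']
print(item_or_list(kept)["path"])  # /ds/a.dat
```

With DataLad installed, the same callable could be passed directly, for example as `metadata(..., result_filter=only_ok_files)`.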