datalad.api.meta_extract

datalad.api.meta_extract(extractorname: str, path: Optional[str] = None, dataset: Union[datalad.distribution.dataset.Dataset, str, None] = None, context: Union[str, Dict[str, str], None] = None, get_context: bool = False, extractorargs: Optional[List[str]] = None)

Run a metadata extractor on a dataset or file.

This command distinguishes between dataset-level extraction and file-level extraction.

If no “path” argument is given, the command assumes that the given extractor is a dataset-level extractor and executes it on the dataset given by the current working directory or by the “-d” argument.

If a path is given, the command assumes that it identifies a file and that the given extractor is a file-level extractor, which is then executed on the specified file. If the file-level extractor requests the content of a file that is not present locally, the command may “get” the file content to make it available. The path must not refer to a sub-dataset, and it must not be a directory.
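The two modes described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `extraction_level` is a hypothetical helper that only restates the rule (the empty-string case comes from the description of the `path` parameter), the extractor name `metalad_core` and the file `README.md` are assumed to exist in your installation and dataset, and `run_examples` needs datalad with the datalad-metalad extension installed and must be run from the root of a dataset.

```python
from typing import Optional


def extraction_level(path: Optional[str]) -> str:
    """Hypothetical helper restating the documented rule: no path, or the
    dataset root given as "", selects a dataset-level extractor; any other
    path selects a file-level extractor."""
    return "dataset" if path in (None, "") else "file"


def run_examples() -> None:
    """Call meta_extract once per mode (requires datalad-metalad).

    Run from the root directory of a DataLad dataset, so that the dataset
    is determined by the current working directory."""
    # Imported here so the helper above works without datalad installed.
    from datalad.api import meta_extract

    # Dataset-level extraction: no path is given.
    # "metalad_core" is an assumed extractor name.
    for result in meta_extract("metalad_core"):
        print(result["status"])

    # File-level extraction: "README.md" is a hypothetical file at the
    # dataset root.
    for result in meta_extract("metalad_core", path="README.md"):
        print(result["status"])


if __name__ == "__main__":
    print(extraction_level(None))         # dataset
    print(extraction_level("README.md"))  # file
```

Keeping the datalad import inside `run_examples` means the rule itself can be checked without a DataLad installation.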

Note

If you want to insert sub-dataset metadata into the super-dataset’s metadata, you currently have to do the following: first, extract the dataset metadata of the sub-dataset using a dataset-level extractor; second, add the extracted metadata, together with the sub-dataset information (i.e. dataset_path, root_dataset_id, root_dataset_version), to the metadata of the super-dataset.
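The second step of the note amounts to attaching the sub-dataset information to the record produced by the first step. The sketch below is hypothetical: `as_subdataset_record` is not part of datalad-metalad, the extracted record is treated as a plain dictionary, and the exact key spellings are an assumption based on the note. Adding the annotated record to the super-dataset’s metadata is not shown.

```python
import copy
from typing import Any, Dict


def as_subdataset_record(
    record: Dict[str, Any],
    dataset_path: str,
    root_dataset_id: str,
    root_dataset_version: str,
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of a sub-dataset's dataset-level
    metadata record, annotated with the information that locates it inside
    the super-dataset. Key names are assumed from the note."""
    annotated = copy.deepcopy(record)  # leave the extracted record untouched
    annotated.update(
        dataset_path=dataset_path,                  # sub-dataset path within the super-dataset
        root_dataset_id=root_dataset_id,            # id of the super-dataset
        root_dataset_version=root_dataset_version,  # version of the super-dataset
    )
    return annotated
```

Returning a copy keeps the original extraction result reusable, for example if the same sub-dataset is registered in more than one super-dataset.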

The extractor configuration can be parameterized with key-value pairs given as additional arguments. Each key-value pair consists of two arguments: first the key, then the value. If no path is given and you want to provide key-value pairs, you have to give “++” as the path, to prevent the first key from being interpreted as a path.
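The argument convention above can be modelled with a small parser. `split_cli_args` is a hypothetical helper, not the parser datalad uses; it only illustrates how the positional arguments after the extractor name are read: an optional path first, then alternating keys and values, with “++” standing for “no path”.

```python
from typing import Dict, List, Optional, Tuple


def split_cli_args(args: List[str]) -> Tuple[Optional[str], Dict[str, str]]:
    """Hypothetical model of the documented convention: split the arguments
    following the extractor name into (path, extractor configuration)."""
    if not args:
        return None, {}
    path, rest = args[0], args[1:]
    if path == "++":
        # "++" is a placeholder meaning "no path, dataset-level extraction".
        path = None
    if len(rest) % 2 != 0:
        raise ValueError("extractor arguments must come in key-value pairs")
    # Pair up [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}.
    config = dict(zip(rest[0::2], rest[1::2]))
    return path, config
```

Without the “++” placeholder, `["key", "value"]` would be read as the path `key` followed by a dangling `value`, which is exactly the ambiguity the placeholder avoids. In the Python API the path is a separate keyword, so the pairs would presumably be passed as a flat list, e.g. `extractorargs=["key", "value"]` (an assumption based on the `extractorargs` type in the signature).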

The results are written into the repository of the source dataset or into the repository of the dataset given by the “-i” parameter. If the same extractor is executed on the same element (dataset or file) with the same configuration, any existing results will be overwritten.

The command can also run legacy datalad-metalad extractors, executing them in “content” mode if a path is given and in “dataset” mode otherwise.


Parameters:
  • extractorname (str) – Name of the metadata extractor to execute.
  • path (str or None, optional) – Path of the file or dataset to extract metadata from. If a path is provided, a file-level extractor is assumed. If no path is given, or if the path identifies the root of a dataset (i.e. “”), a dataset-level extractor is assumed. [Default: None]
  • dataset (Dataset or None, optional) – Dataset to extract metadata from. If no dataset is given, the dataset is determined by the current working directory. [Default: None]
  • context (str, dict, or None, optional) – Context: a dictionary, or its JSON serialization, that provides constant data gathered beforehand, so that meta-extract does not have to re-gather this data. Keys and values are strings. meta-extract looks for the key ‘dataset_version’. [Default: None]
  • get_context (bool, optional) – Show the context that meta-extract determines with the given parameters, and exit. The context can be used in subsequent calls to meta-extract with identical parameters (except for --get-context) to speed up its execution. [Default: False]
  • extractorargs (sequence of str or None, optional) – Extractor arguments, given as alternating keys and values. [Default: None]
  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’: any failure is reported, but does not cause an exception; ‘continue’: if any failure occurs, an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on the first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable and is returned only if the callable’s return value does not evaluate to False; a result for which the callable raises a ValueError is dropped as well. If the given callable supports **kwargs, it will additionally be passed the keyword arguments of the original API call. [Default: None]
  • result_renderer ({'default', 'json', 'json_pp', 'tailored'} or None, optional) – format of return value rendering on stdout. [Default: ‘tailored’]
  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’, a single value is returned instead of a one-item return value list, a list is returned in case of multiple return values, and None is returned in case of an empty list. [Default: ‘list’]
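The `context` parameter lets several calls share data that meta-extract would otherwise determine on every call. The sketch below is a minimal illustration under assumptions: `make_context` and `extract_files` are hypothetical helpers, `metalad_core` is an assumed extractor name, and the caller is assumed to already know the dataset version to pass in. `extract_files` requires datalad-metalad.

```python
import json
from typing import Dict, Iterable, List


def make_context(dataset_version: str) -> str:
    """Serialize a context for meta_extract's `context` parameter.

    Keys and values are strings; 'dataset_version' is the key that
    meta_extract is documented to look for."""
    context: Dict[str, str] = {"dataset_version": dataset_version}
    return json.dumps(context)


def extract_files(paths: Iterable[str], dataset_version: str) -> List[dict]:
    """Run a file-level extractor on several files, reusing one context.

    Requires datalad-metalad; "metalad_core" is an assumed extractor name."""
    from datalad.api import meta_extract

    context = make_context(dataset_version)
    results: List[dict] = []
    for path in paths:
        # With the default return_type ('list'), each call returns a list.
        results.extend(meta_extract("metalad_core", path=path, context=context))
    return results
```

To see which context meta-extract would determine on its own for a given set of parameters, call it once with `get_context=True`; the exact shape of that output is not assumed here.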
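The result-handling parameters accept plain Python callables. The sketch below shows one possible `result_filter`, one possible `result_xfm`, and a small model of the ‘item-or-list’ return behavior. All three functions are hypothetical illustrations; `item_or_list` restates the documented behavior and is not datalad’s implementation.

```python
from typing import Any, Dict, List, Optional, Union


def only_ok(result: Dict[str, Any], **kwargs: Any) -> bool:
    """result_filter: keep only results whose status is 'ok'.

    Because it accepts **kwargs, datalad will also pass the keyword
    arguments of the original API call; they are ignored here."""
    return result.get("status") == "ok"


def status_and_path(result: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """result_xfm: reduce each result dictionary to two fields."""
    return {"status": result.get("status"), "path": result.get("path")}


def item_or_list(results: List[Any]) -> Union[None, Any, List[Any]]:
    """Model of the documented 'item-or-list' return_type: None for no
    results, the bare item for one result, the full list otherwise."""
    if not results:
        return None
    if len(results) == 1:
        return results[0]
    return results
```

With datalad-metalad installed, the first two could be passed as `meta_extract("metalad_core", result_filter=only_ok, result_xfm=status_and_path, return_type="item-or-list")`, where the extractor name is again an assumption.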
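The three `on_failure` policies can be modelled in a few lines. This is a toy model of the documented behavior, not datalad’s implementation: `process` is hypothetical, and `IncompleteResults` is a local stand-in for datalad’s `IncompleteResultsError` so that the sketch runs without datalad.

```python
from typing import Any, Dict, Iterable, List

# Statuses the documentation counts as failures.
FAILURE_STATUSES = {"impossible", "error"}


class IncompleteResults(Exception):
    """Local stand-in for datalad's IncompleteResultsError."""

    def __init__(self, failed: List[Dict[str, Any]]) -> None:
        super().__init__(f"{len(failed)} failed result(s)")
        self.failed = failed  # result dictionaries of the failures


def process(
    results: Iterable[Dict[str, Any]], on_failure: str = "continue"
) -> List[Dict[str, Any]]:
    """Toy model of the 'ignore', 'continue', and 'stop' policies."""
    if on_failure not in {"ignore", "continue", "stop"}:
        raise ValueError(f"unknown on_failure value: {on_failure!r}")
    reported: List[Dict[str, Any]] = []
    failed: List[Dict[str, Any]] = []
    for res in results:
        reported.append(res)  # every result is reported, failures included
        if res.get("status") in FAILURE_STATUSES:
            failed.append(res)
            if on_failure == "stop":
                # Stop processing at the first failure.
                raise IncompleteResults(failed)
    if failed and on_failure == "continue":
        # Everything was processed; now signal the collected failures.
        raise IncompleteResults(failed)
    return reported
```

Because `process` consumes its input lazily, ‘stop’ never touches results after the first failure, while ‘continue’ drains the whole input before raising.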