datalad.api.meta_extract

datalad.api.meta_extract(extractorname: str, path: Optional[str] = None, dataset: Union[datalad.distribution.dataset.Dataset, str, None] = None, context: Union[str, Dict[str, str], None] = None, get_context: bool = False, extractorargs: Optional[List[str]] = None)

Run a metadata extractor on a dataset or file.

This command distinguishes between dataset-level extraction and file-level extraction.

If no “path” argument is given, the command assumes that the given extractor is a dataset-level extractor and executes it on the dataset that is given by the current working directory or by the “-d” argument (the dataset parameter in the Python API).

If a path is given, the command assumes that the path identifies a file and that the given extractor is a file-level extractor, which will then be executed on the specified file. If the file-level extractor requests the content of a file that is not present, the command might “get” the file content to make it locally available. The path must not refer to a sub-dataset or to a directory.

Note

If you want to insert sub-dataset metadata into the super-dataset’s metadata, you currently have to do the following: first, extract dataset metadata of the sub-dataset using a dataset-level extractor; second, add the extracted metadata, together with sub-dataset information (i.e. dataset_path, root_dataset_id, root_dataset_version), to the metadata of the super-dataset.
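The second step of the note can be sketched as a small helper that attaches the three sub-dataset keys to an already extracted record. This is a minimal sketch, assuming the extracted metadata is available as a plain dictionary; the helper name with_subdataset_info and all values are hypothetical, and the sketch does not perform the actual add to the super-dataset.

```python
from typing import Any, Dict


def with_subdataset_info(
    record: Dict[str, Any],
    dataset_path: str,
    root_dataset_id: str,
    root_dataset_version: str,
) -> Dict[str, Any]:
    """Return a copy of `record` extended with the sub-dataset keys.

    Hypothetical helper: it only merges dictionaries and leaves the
    original record unchanged.
    """
    extended = dict(record)
    extended.update(
        {
            "dataset_path": dataset_path,
            "root_dataset_id": root_dataset_id,
            "root_dataset_version": root_dataset_version,
        }
    )
    return extended


# Placeholder values, for illustration only.
sub_record = {"example_key": "example_value"}
super_record = with_subdataset_info(
    sub_record,
    dataset_path="sub/dataset",
    root_dataset_id="<id of the super-dataset>",
    root_dataset_version="<version of the super-dataset>",
)
```

The resulting record would then be added to the metadata of the super-dataset in a separate step.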

The extractor configuration can be parameterized with key-value pairs given as additional arguments. Each key-value pair consists of two arguments: first the key, then the value. If no path is given and you want to provide key-value pairs, you have to give the path “++” to prevent the first key from being interpreted as the path.
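Because each key-value pair is passed as two consecutive arguments, a configuration dictionary has to be flattened before it is handed over. The following is a minimal sketch; the helper name to_extractor_args and the keys are hypothetical, and it assumes the flattened list is what the extractorargs parameter expects.

```python
from typing import Dict, List


def to_extractor_args(parameters: Dict[str, str]) -> List[str]:
    """Flatten {"key": "value"} pairs into ["key", "value", ...]."""
    args: List[str] = []
    for key, value in parameters.items():
        # First the key, then the value, as described above.
        args.extend([str(key), str(value)])
    return args


flat = to_extractor_args({"k1": "v1", "k2": "v2"})
print(flat)  # ['k1', 'v1', 'k2', 'v2']
```

In the Python API the list can be passed by keyword, so the “++” placeholder should only matter when arguments are given positionally, as on the command line.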

The command can also take legacy datalad-metalad extractors and will execute them in either “content” or “dataset” mode, depending on the presence of the “path”-parameter.

Examples
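The two modes described above differ only in whether a path is passed. The following is a minimal sketch, not an official example: the helper names are hypothetical, the extractor name and paths in the usage comments are placeholders, and the import is done lazily because it assumes that datalad and the datalad-metalad extension are installed.

```python
from typing import Any, Dict, Optional


def build_extract_kwargs(
    extractorname: str, dataset: str, path: Optional[str] = None
) -> Dict[str, Any]:
    """Assemble keyword arguments for meta_extract.

    Without `path` the call is a dataset-level extraction;
    with `path` it is a file-level extraction.
    """
    kwargs: Dict[str, Any] = {"extractorname": extractorname, "dataset": dataset}
    if path is not None:
        kwargs["path"] = path
    return kwargs


def run_extraction(extractorname: str, dataset: str, path: Optional[str] = None):
    """Run the extractor and return the results as a list."""
    # Lazy import: requires datalad with the metalad extension.
    from datalad.api import meta_extract

    return meta_extract(
        **build_extract_kwargs(extractorname, dataset, path),
        return_type="list",
        result_renderer="disabled",
    )


# Usage (placeholders, not executed here):
#   run_extraction("<dataset-level extractor>", "/path/to/dataset")
#   run_extraction("<file-level extractor>", "/path/to/dataset", path="some/file")
```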

Parameters:
  • extractorname (str) – Name of a metadata extractor to be executed.
  • path (str or None, optional) – Path of a file or dataset to extract metadata from. If this argument is provided, we assume that a file-level extractor is requested. If the path is not given, or if it identifies the root of a dataset, i.e. “”, we assume that a dataset-level extractor is specified. [Default: None]
  • dataset (Dataset or str or None, optional) – Dataset to extract metadata from. If no dataset is given, the dataset is determined by the current working directory. [Default: None]
  • context (str or dict or None, optional) – Context, a JSON-serialized dictionary that provides constant data which has been gathered before, so that meta-extract does not have to re-gather this data. Keys and values are strings. meta-extract will look for the following key: ‘dataset_version’. [Default: None]
  • get_context (bool, optional) – Show the context that meta-extract determines with the given parameters and exit. The context can be used in subsequent calls to meta-extract with identical parameters, except for --get-context, to speed up the execution of meta-extract. [Default: False]
  • extractorargs (sequence of str or None, optional) – Extractor arguments given as string arguments to the extractor. If dataset-level extraction is performed, i.e. no path is required, specify ‘++’ as path to prevent interpretation of the first extractor argument as path. [Default: None]
  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’ any failure is reported, but does not cause an exception; ‘continue’ if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. Raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False or a ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
  • result_renderer – select the rendering mode for command results. ‘tailored’ enables a command-specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message; ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]
  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’, a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: ‘list’]
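
Two of the parameters above take values that are easy to prepare in plain Python: context expects a JSON-serialized dictionary of strings, and result_filter expects a callable that receives each result dictionary. The following is a minimal sketch under those assumptions; the helper names make_context and no_failures and the version string are hypothetical.

```python
import json
from typing import Any, Dict


def make_context(dataset_version: str) -> str:
    """Serialize a context dictionary holding the 'dataset_version' key."""
    return json.dumps({"dataset_version": dataset_version})


def no_failures(result: Dict[str, Any], **kwargs: Any) -> bool:
    """Keep a result unless its status marks a failure.

    Accepts **kwargs so that the keyword arguments of the original
    API call can be passed in as well.
    """
    return result.get("status") not in ("impossible", "error")


context = make_context("<dataset version>")  # placeholder version string

# Applied by hand to some made-up result records:
results = [{"status": "error"}, {"status": "impossible"}, {"path": "a"}]
kept = [r for r in results if no_failures(r)]
print(kept)  # [{'path': 'a'}]
```

Both values would then be passed by keyword, i.e. as context=context and result_filter=no_failures.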