datalad.api.meta_conduct

datalad.api.meta_conduct(configuration: Union[str, int, float, bool, None, Dict[str, Any], List[Any]], arguments: List[str], max_workers: Optional[int] = None, processing_mode: str = 'process', pipeline_help: bool = False)

Conduct the execution of a processing pipeline

A processing pipeline is a metalad-specific application of the Unix philosophy: have a number of small programs that each do one thing, and do that one thing very well.

Processing pipelines consist of:

  • A provider, which provides the data that should be processed.
  • A list of processors. A processor reads data, either from the previous processor or from the provider, performs computations on the data, and returns a result that is processed by the next processor. The computation may have side effects, e.g. storing metadata.
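
The provider/processor structure described above can be sketched in plain Python. This is a minimal illustration only and does not use the metalad API; the names provider, add_size, add_label, and run_pipeline are hypothetical and exist solely for this example.

```python
from typing import Callable, Iterable, Iterator, List


def provider() -> Iterator[dict]:
    """Hypothetical provider: yields the data items to be processed."""
    for path in ["a.txt", "b.txt"]:
        yield {"path": path}


def add_size(item: dict) -> dict:
    """Hypothetical first processor: reads an item from the provider."""
    return {**item, "size": len(item["path"])}


def add_label(item: dict) -> dict:
    """Hypothetical second processor: reads the previous processor's result."""
    return {**item, "label": f"{item['path']}:{item['size']}"}


def run_pipeline(
    source: Iterable[dict],
    processors: List[Callable[[dict], dict]],
) -> List[dict]:
    """Pass every provided item through the processors, in list order."""
    results = []
    for item in source:
        for process in processors:
            # The output of one processor is the input of the next one.
            item = process(item)
        results.append(item)
    return results


results = run_pipeline(provider(), [add_size, add_label])
print(results)
```

Each processor here is a pure function; a processor with side effects, such as one that stores metadata, would fit the same call shape.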

The provider is usually executed in the main thread of the current process. Processors are usually executed in concurrent processes, i.e. workers. The maximum number of workers is given by the parameter max_workers.
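
The division of labor between the provider and the workers can be sketched with Python's standard concurrent.futures module. This is an illustration under assumptions, not metalad's implementation: it uses a thread pool instead of a process pool to stay self-contained, and the names provide, process, and conduct are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Optional


def provide() -> Iterator[int]:
    """Hypothetical provider; it is iterated in the caller's thread."""
    yield from range(5)


def process(item: int) -> int:
    """Hypothetical processor; it runs in a worker."""
    return item * item


def conduct(max_workers: Optional[int] = None) -> List[int]:
    # max_workers=None lets the executor pick its own default limit.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The provider is consumed here, in the calling thread; every
        # provided item is handed to the pool for processing.
        futures = [pool.submit(process, item) for item in provide()]
        # Collect results in submission order.
        return [future.result() for future in futures]


squares = conduct(max_workers=2)
print(squares)
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor would move the processors into separate processes, at the cost of requiring picklable functions and data.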

Which provider and which processors are used is defined in a “configuration”, which is given as a JSON-serialized dictionary.
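
As a rough sketch of what "JSON-serialized dictionary" means in practice, the following code serializes a dictionary and reads it back, treating the path “-” as standard input. The keys and values in example_configuration are hypothetical placeholders and are not metalad's actual configuration schema; read_configuration is likewise an illustrative helper, not part of the API.

```python
import io
import json
import sys
from typing import Any, Dict, TextIO

# Hypothetical placeholder structure: one provider entry and a list of
# processor entries. The real key names are defined by metalad.
example_configuration: Dict[str, Any] = {
    "provider": {"name": "source", "arguments": {}},
    "processors": [
        {"name": "step1", "arguments": {}},
        {"name": "step2", "arguments": {}},
    ],
}


def read_configuration(path: str, stdin: TextIO = sys.stdin) -> Dict[str, Any]:
    """Load a JSON-serialized configuration; "-" means standard input."""
    if path == "-":
        return json.load(stdin)
    with open(path, encoding="utf-8") as config_file:
        return json.load(config_file)


serialized = json.dumps(example_configuration)
# A StringIO object stands in for standard input in this sketch.
loaded = read_configuration("-", stdin=io.StringIO(serialized))
print(loaded["provider"]["name"], [p["name"] for p in loaded["processors"]])
```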

Parameters:
  • configuration – Path to a file that contains the pipeline configuration as a JSON-serialized object. If the path is “-”, the configuration is read from standard input.
  • arguments – Constructor arguments for pipeline elements, i.e. provider, processors, and consumer. Each argument has to be prefixed with the name of the pipeline element, followed by “.”, the key name, “=”, and the value, i.e. it follows the pattern “<name>.<key>=<value>”.
  • max_workers (int or None, optional) – maximum number of workers. [Default: None]
  • processing_mode ({'process', 'thread', 'sequential'}, optional) – Specify how elements are executed: in subprocesses (“process”), in threads (“thread”), or sequentially in the main thread (“sequential”). [Default: ‘process’]
  • pipeline_help (bool, optional) – Show documentation for the elements in the pipeline and exit. [Default: False]
  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’: any failure is reported, but does not cause an exception; ‘continue’: if any failure occurs, an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on the first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False and the callable does not raise a ValueError exception. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
  • result_renderer – select the rendering mode for command results. ‘tailored’ enables a command-specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message; ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]
  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’, a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: ‘list’]
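
The “<name>.<key>=<value>” pattern used by the arguments parameter can be illustrated with a small parser. This is a sketch of one plausible reading of the pattern, not metalad's own parsing code: parse_element_arguments is a hypothetical helper, the element names and keys in the usage line are invented, and splitting at the first “.” and the first “=” is an assumption made here.

```python
from typing import Dict, List


def parse_element_arguments(arguments: List[str]) -> Dict[str, Dict[str, str]]:
    """Group "<name>.<key>=<value>" strings by pipeline element name."""
    parsed: Dict[str, Dict[str, str]] = {}
    for argument in arguments:
        # Split at the first "=" so that the value may itself contain "=".
        target, separator, value = argument.partition("=")
        # Split at the first "." (an assumption of this sketch).
        name, dot, key = target.partition(".")
        if not (separator and dot and name and key):
            raise ValueError(
                f"expected '<name>.<key>=<value>', got {argument!r}"
            )
        parsed.setdefault(name, {})[key] = value
    return parsed


# Hypothetical element names ("trav", "ext") and keys, for illustration only.
parsed = parse_element_arguments(["trav.top_dir=/data", "ext.mode=a=b"])
print(parsed)
```

Grouping by element name yields one keyword dictionary per pipeline element, which is the shape a constructor call would need.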