datalad.api.subdatasets

datalad.api.subdatasets(path=None, *, dataset=None, state='any', fulfilled=None(DEPRECATED), recursive=False, recursion_limit=None, contains=None, bottomup=False, set_property=None, delete_property=None)

Report subdatasets and their properties.

The following properties are reported (if possible) for each matching subdataset record.

“name”

Name of the subdataset in the parent (often identical with the relative path in the parent dataset)

“path”

Absolute path to the subdataset

“parentds”

Absolute path to the parent dataset

“gitshasum”

SHA1 of the subdataset commit recorded in the parent dataset

“state”

Condition of the subdataset: ‘absent’, ‘present’

“gitmodule_url”

URL of the subdataset recorded in the parent

“gitmodule_name”

Name of the subdataset recorded in the parent

“gitmodule_<label>”

Any additional configuration property on record.

Performance note: Property modification, requesting bottomup reporting order, or a particular numerical recursion_limit implies an internal switch to an alternative query implementation for recursive query that is more flexible, but also notably slower (performs one call to Git per dataset versus a single call for all combined).

The following properties for subdatasets are recognized by DataLad (without the ‘gitmodule_’ prefix that is used in the query results):

“datalad-recursiveinstall”

If set to ‘skip’, the respective subdataset is skipped when DataLad is recursively installing its superdataset. However, the subdataset remains installable when explicitly requested, and no other features are impaired.

“datalad-url”

If a subdataset was originally established by cloning, ‘datalad-url’ records the URL that was used to do so. This might be different from ‘url’ if the URL contains datalad specific pieces like any URL of the form “ria+<some protocol>…”.

Parameters:
  • path (sequence of str or None, optional) – path/name to query for subdatasets. Defaults to the current directory, or the entire dataset if called as a dataset method. [Default: None]

  • dataset (Dataset or None, optional) – specify the dataset to query. If no dataset is given, an attempt is made to identify the dataset based on the input and/or the current working directory. [Default: None]

  • state ({'present', 'absent', 'any'}, optional) – indicate which (sub)datasets to consider: either only locally present, absent, or any of those two kinds. [Default: ‘any’]

  • fulfilled (bool or None, optional) – DEPRECATED: use state instead. If given, must be a boolean flag indicating whether to consider either only locally present or absent datasets. By default all subdatasets are considered regardless of their status. [Default: None(DEPRECATED)]

  • recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]

  • recursion_limit (int or None, optional) – limit recursion into subdatasets to the given number of levels. [Default: None]

  • contains (list of str or None, optional) – limit to the subdatasets containing the given path. If a root path of a subdataset is given, the last considered dataset will be the subdataset itself. Can be a list with multiple paths, in which case datasets that contain any of the given paths will be considered. [Default: None]

  • bottomup (bool, optional) – whether to report subdatasets in bottom-up order along each branch in the dataset tree, and not top-down. [Default: False]

  • set_property (list of 2-item sequence of str or None, optional) – Name and value of one or more subdataset properties to be set in the parent dataset’s .gitmodules file. The property name is case- insensitive, must start with a letter, and consist only of alphanumeric characters. The value can be a Python format() template string wrapped in ‘<>’ (e.g. ‘<{gitmodule_name}>’). Supported keywords are any item reported in the result properties of this command, plus ‘refds_relpath’ and ‘refds_relname’: the relative path of a subdataset with respect to the base dataset of the command call, and, in the latter case, the same string with all directory separators replaced by dashes. [Default: None]

  • delete_property (list of str or None, optional) – Name of one or more subdataset properties to be removed from the parent dataset’s .gitmodules file. [Default: None]

  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’ any failure is reported, but does not cause an exception; ‘continue’ if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. Raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]

  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False or a ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]

  • result_renderer – select rendering mode command results. ‘tailored’ enables a command- specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message); ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]

  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top- level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]

  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’ a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is return in case of an empty list. [Default: ‘list’]