datalad.api.subdatasets

datalad.api.subdatasets(path=None, *, dataset=None, state='any', fulfilled=None(DEPRECATED), recursive=False, recursion_limit=None, contains=None, bottomup=False, set_property=None, delete_property=None)

Report subdatasets and their properties.

The following properties are reported (if possible) for each matching subdataset record.

“name”: Name of the subdataset in the parent (often identical with the relative path in the parent dataset)
“path”: Absolute path to the subdataset
“parentds”: Absolute path to the parent dataset
“gitshasum”: SHA1 of the subdataset commit recorded in the parent dataset
“state”: Condition of the subdataset: ‘absent’, ‘present’
“gitmodule_url”: URL of the subdataset recorded in the parent
“gitmodule_name”: Name of the subdataset recorded in the parent
“gitmodule_<label>”: Any additional configuration property on record.

Performance note: Property modification, requesting bottomup reporting order, or a particular numerical recursion_limit implies an internal switch to an alternative query implementation for recursive query that is more flexible, but also notably slower (performs one call to Git per dataset versus a single call for all combined).

The following properties for subdatasets are recognized by DataLad (without the ‘gitmodule_’ prefix that is used in the query results):

“datalad-recursiveinstall”: If set to ‘skip’, the respective subdataset is skipped when DataLad is recursively installing its superdataset. However, the subdataset remains installable when explicitly requested, and no other features are impaired.
“datalad-url”: If a subdataset was originally established by cloning, ‘datalad-url’ records the URL that was used to do so. This might be different from ‘url’ if the URL contains datalad specific pieces like any URL of the form “ria+<some protocol>…”.

Parameters:

path (sequence of str or None, optional) – path/name to query for subdatasets. Defaults to the current directory, or the entire dataset if called as a dataset method. [Default: None]
dataset (Dataset or None, optional) – specify the dataset to query. If no dataset is given, an attempt is made to identify the dataset based on the input and/or the current working directory. [Default: None]
state ({'present', 'absent', 'any'}, optional) – indicate which (sub)datasets to consider: either only locally present, absent, or any of those two kinds. [Default: ‘any’]
fulfilled (bool or None, optional) – DEPRECATED: use state instead. If given, must be a boolean flag indicating whether to consider either only locally present or absent datasets. By default all subdatasets are considered regardless of their status. [Default: None(DEPRECATED)]
recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]
recursion_limit (int or None, optional) – limit recursion into subdatasets to the given number of levels. [Default: None]
contains (list of str or None, optional) – limit to the subdatasets containing the given path. If a root path of a subdataset is given, the last considered dataset will be the subdataset itself. Can be a list with multiple paths, in which case datasets that contain any of the given paths will be considered. [Default: None]
bottomup (bool, optional) – whether to report subdatasets in bottom-up order along each branch in the dataset tree, and not top-down. [Default: False]
set_property (list of 2-item sequence of str or None, optional) – Name and value of one or more subdataset properties to be set in the parent dataset’s .gitmodules file. The property name is case- insensitive, must start with a letter, and consist only of alphanumeric characters. The value can be a Python format() template string wrapped in ‘<>’ (e.g. ‘<{gitmodule_name}>’). Supported keywords are any item reported in the result properties of this command, plus ‘refds_relpath’ and ‘refds_relname’: the relative path of a subdataset with respect to the base dataset of the command call, and, in the latter case, the same string with all directory separators replaced by dashes. [Default: None]
delete_property (list of str or None, optional) – Name of one or more subdataset properties to be removed from the parent dataset’s .gitmodules file. [Default: None]
on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’ any failure is reported, but does not cause an exception; ‘continue’ if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. Raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False or a ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
result_renderer – select rendering mode command results. ‘tailored’ enables a command- specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message); ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]
result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top- level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’ a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is return in case of an empty list. [Default: ‘list’]