datalad.api.tree

datalad.api.tree(path='.', *, depth=None, recursive=False, recursion_limit=None, include_files=False, include_hidden=False)

Visualize directory and dataset hierarchies

This command mimics the UNIX/MS-DOS 'tree' utility to generate and display a directory tree, with DataLad-specific enhancements.

It can serve the following purposes:

  1. Glorified 'tree' command

  2. Dataset discovery

  3. Programmatic directory traversal

Glorified 'tree' command

The rendered command output uses 'tree'-style visualization:

/tmp/mydir
├── [DS~0] ds_A/
│   └── [DS~1] subds_A/
└── [DS~0] ds_B/
    ├── dir_B/
    │   ├── file.txt
    │   ├── subdir_B/
    │   └── [DS~1] subds_B0/
    └── [DS~1] (not installed) subds_B1/

5 datasets, 2 directories, 1 file

Dataset paths are prefixed by a marker indicating subdataset hierarchy level, like [DS~1]. This is the absolute subdataset level, meaning it may also take into account superdatasets located above the tree root and thus not included in the output. If a subdataset is registered but not installed (such as after a non-recursive datalad clone), it will be prefixed by (not installed). Only DataLad datasets are considered, not pure git/git-annex repositories.
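The marker can be derived from the result-record fields described under 'Programmatic directory traversal'. The helper below is a hypothetical sketch (not part of DataLad's API) that rebuilds the rendered prefix from those fields; the assumption that a missing dataset_is_installed key means the dataset is installed is mine:

```python
# Hypothetical helper (not part of DataLad): rebuild the dataset marker
# shown in the rendered tree from documented tree() result-record fields.
def dataset_marker(record):
    # 'dataset_abs_depth' is the absolute subdataset level, e.g. 1 -> [DS~1]
    marker = "[DS~%d]" % record["dataset_abs_depth"]
    # assumption: a missing 'dataset_is_installed' key means installed
    if not record.get("dataset_is_installed", True):
        marker += " (not installed)"
    return marker

print(dataset_marker({"type": "dataset", "dataset_abs_depth": 1,
                      "dataset_is_installed": False}))
# -> [DS~1] (not installed)
```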

The 'report line' at the bottom of the output shows the count of displayed datasets, in addition to the count of directories and files. In this context, datasets and directories are mutually exclusive categories.

By default, only directories (no files) are included in the tree, and hidden directories are skipped. Both behaviours can be changed using command options.

Symbolic links are always followed. This means that a symlink pointing to a directory is traversed and counted as a directory (unless it potentially creates a loop in the tree).

Dataset discovery

Using the recursive or recursion_limit option, this command generates the layout of dataset hierarchies based on subdataset nesting level, regardless of their location in the filesystem.

In this case, tree depth is determined by subdataset depth. This mode is thus suited for discovering available datasets when their location is not known in advance.

By default, only datasets are listed, without their contents. If depth is additionally specified, the contents of each dataset are included up to depth directory levels (excluding subdirectories that are themselves datasets).

Tree filtering options such as include_hidden only affect which directories are reported as dataset contents, not which directories are traversed to find datasets.

Performance note: since no assumption is made on the location of datasets, running this command with the recursive or recursion_limit option does a full scan of the whole directory tree. As such, it can be significantly slower than a call with an equivalent output that uses depth to limit the tree instead.
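One way to mitigate the cost of a full scan is to consume results lazily and stop early. The sketch below assumes that tree(..., return_type='generator', result_renderer='disabled') streams result records as they are produced; a stand-in generator (fake_tree, my own placeholder) replaces the real call so the snippet is self-contained:

```python
# Stand-in for tree('/tmp', recursion_limit=0, return_type='generator',
# result_renderer='disabled'); yields records shaped like documented results.
def fake_tree():
    yield {"path": "/tmp/ds1", "type": "dataset"}
    yield {"path": "/tmp/dir", "type": "directory"}
    yield {"path": "/tmp/ds2", "type": "dataset"}

found = []
for res in fake_tree():
    if res["type"] == "dataset":
        found.append(res["path"])
    if len(found) == 2:
        break  # stop early instead of scanning the whole directory tree
print(found)  # -> ['/tmp/ds1', '/tmp/ds2']
```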

Programmatic directory traversal

The command yields a result record for each tree node (dataset, directory or file). The following properties are reported, where available:

"path"

Absolute path of the tree node

"type"

Type of tree node: "dataset", "directory" or "file"

"depth"

Directory depth of node relative to the tree root

"exhausted_levels"

Depth levels for which no nodes are left to be generated (the respective subtrees have been 'exhausted')

"count"

Dict with cumulative counts of datasets, directories and files in the tree up until the current node. File count is only included if the command is run with the include_files option.

"dataset_depth"

Subdataset depth level relative to the tree root. Only included for node type "dataset".

"dataset_abs_depth"

Absolute subdataset depth level. Only included for node type "dataset".

"dataset_is_installed"

Whether the registered subdataset is installed. Only included for node type "dataset".

"symlink_target"

If the tree node is a symlink, the path to the link target

"is_broken_symlink"

If the tree node is a symlink, whether it is a broken symlink
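These records make the tree easy to post-process in Python. The sketch below filters records (plain dicts shaped as documented above) down to the paths of installed datasets; in practice the records would come from tree(..., return_type='generator', result_renderer='disabled'), and the assumption that a missing dataset_is_installed key means 'installed' is mine:

```python
# Collect paths of installed datasets from tree() result records.
def installed_dataset_paths(records):
    return [
        r["path"]
        for r in records
        if r["type"] == "dataset"
        # assumption: a missing 'dataset_is_installed' key means installed
        and r.get("dataset_is_installed", True)
    ]

# Hypothetical records mirroring the documented schema:
records = [
    {"path": "/tmp/mydir/ds_A", "type": "dataset",
     "dataset_depth": 0, "dataset_is_installed": True},
    {"path": "/tmp/mydir/ds_A/subds_A", "type": "dataset",
     "dataset_depth": 1, "dataset_is_installed": False},
    {"path": "/tmp/mydir/dir_B", "type": "directory", "depth": 1},
]
print(installed_dataset_paths(records))  # -> ['/tmp/mydir/ds_A']
```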

Examples

Show up to 3 levels of subdirectories below the current directory, including files and hidden contents:

> tree(depth=3, include_files=True, include_hidden=True)

Find all top-level datasets located anywhere under /tmp:

> tree('/tmp', recursion_limit=0)

Report all subdatasets recursively and their directory contents, up to 1 subdirectory deep within each dataset:

> tree(recursive=True, depth=1)
Parameters:
  • path -- path to directory from which to generate the tree. Defaults to the current directory. [Default: '.']

  • depth -- limit the tree to a maximum level of subdirectories. If not specified, the full tree is generated with no depth constraint. If paired with recursive or recursion_limit, refers to the maximum directory level to output below each dataset. [Default: None]

  • recursive (bool, optional) -- produce a dataset tree of the full hierarchy of nested subdatasets. Note: may have slow performance on large directory trees. [Default: False]

  • recursion_limit -- limit the dataset tree to a maximum level of nested subdatasets. 0 means include only top-level datasets, 1 means top-level datasets and their immediate subdatasets, etc. Note: may have slow performance on large directory trees. [Default: None]

  • include_files (bool, optional) -- include files in the tree. [Default: False]

  • include_hidden (bool, optional) -- include hidden files/directories in the tree. This option does not affect which directories will be searched for datasets when specifying recursive or recursion_limit. For example, datasets located underneath the hidden folder .datalad will be reported even if include_hidden is omitted. [Default: False]

  • on_failure ({'ignore', 'continue', 'stop'}, optional) -- behavior to perform on failure: 'ignore': any failure is reported, but does not cause an exception; 'continue': if any failure occurs, an exception will be raised at the end, but processing of other actions will continue for as long as possible; 'stop': processing will stop on first failure and an exception is raised. A failure is any result with status 'impossible' or 'error'. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: 'continue']

  • result_filter (callable or None, optional) -- if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable's return value evaluates to True or the callable raises a ValueError exception. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]

  • result_renderer -- select rendering mode for command results. 'tailored' enables a command-specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the 'generic' result renderer; 'generic' renders each result in one line with key info like action, status, path, and an optional message; 'json' a complete JSON line serialization of the full result record; 'json_pp' like 'json', but pretty-printed spanning multiple lines; 'disabled' turns off result rendering entirely; '<template>' reports any value(s) of any result properties in any format indicated by the template (e.g. '{path}', compare with JSON output for all key-value choices). The template syntax follows the Python "format() language". It is possible to report individual dictionary values, e.g. '{metadata[name]}'. If a 2nd-level key contains a colon, e.g. 'music:Genre', ':' must be substituted by '#' in the template, like so: '{metadata[music#Genre]}'. [Default: 'tailored']

  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) -- if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]

  • return_type ({'generator', 'list', 'item-or-list'}, optional) -- return value behavior switch. If 'item-or-list', a single value is returned instead of a one-item list, or a list in case of multiple return values; None is returned in case of an empty list. [Default: 'list']