datalad tree

Synopsis

datalad tree [-h] [-L DEPTH] [-r] [-R LEVELS] [--include-files] [--include-hidden] [--version] [path]

Description

Visualize directory and dataset hierarchies

This command mimics the UNIX/MS-DOS 'tree' utility to generate and display a directory tree, with DataLad-specific enhancements.

It can serve the following purposes:

  1. Glorified 'tree' command

  2. Dataset discovery

  3. Programmatic directory traversal

Glorified 'tree' command

The rendered command output uses 'tree'-style visualization:

/tmp/mydir
├── [DS~0] ds_A/
│   └── [DS~1] subds_A/
└── [DS~0] ds_B/
    ├── dir_B/
    │   ├── file.txt
    │   ├── subdir_B/
    │   └── [DS~1] subds_B0/
    └── [DS~1] (not installed) subds_B1/

5 datasets, 2 directories, 1 file

Dataset paths are prefixed by a marker indicating subdataset hierarchy level, like [DS~1]. This is the absolute subdataset level, meaning it may also take into account superdatasets located above the tree root and thus not included in the output. If a subdataset is registered but not installed (such as after a non-recursive datalad clone), it will be prefixed by (not installed). Only DataLad datasets are considered, not pure git/git-annex repositories.

The 'report line' at the bottom of the output shows the count of displayed datasets, in addition to the count of directories and files. In this context, datasets and directories are mutually exclusive categories.
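
As a rough illustration of how such a report line could be assembled from per-category counts, consider the following Python sketch. The function name report_line and the dict keys are assumptions made for this example; they are not taken from DataLad itself.

```python
def report_line(counts: dict) -> str:
    """Format counts like the tree footer, e.g. '5 datasets, 2 directories, 1 file'."""
    # (key in the counts dict, singular label, plural label).
    # The key names are assumptions made for this sketch.
    categories = [
        ("datasets", "dataset", "datasets"),
        ("directories", "directory", "directories"),
        ("files", "file", "files"),
    ]
    parts = []
    for key, singular, plural in categories:
        if key not in counts:
            # A category may be absent, e.g. files when they are not in the tree.
            continue
        n = counts[key]
        parts.append(f"{n} {singular if n == 1 else plural}")
    return ", ".join(parts)


print(report_line({"datasets": 5, "directories": 2, "files": 1}))
# prints: 5 datasets, 2 directories, 1 file
```

Because datasets and directories are counted separately, a dataset never contributes to the directory count in this sketch either.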

By default, only directories (no files) are included in the tree, and hidden directories are skipped. Both behaviours can be changed using command options.

Symbolic links are always followed. This means that a symlink pointing to a directory is traversed and counted as a directory, unless following it could create a loop in the tree.
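
One way to picture the loop condition is a symlink whose target is one of its own ancestor directories. The Python sketch below checks for exactly that case. It is a simplified illustration, not DataLad's implementation, and the function name symlink_creates_loop is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def symlink_creates_loop(link: Path) -> bool:
    """Return True if `link` is a symlink pointing at one of its own ancestors."""
    if not link.is_symlink():
        return False
    target = link.resolve()
    container = link.parent.resolve()
    # Descending into such a link would lead back to a directory that is
    # already on the current branch of the tree.
    return target == container or target in container.parents


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "raw").mkdir(parents=True)
    # Link that points back up to an ancestor directory.
    os.symlink(root / "data", root / "data" / "raw" / "back")
    # Link that points sideways into another branch.
    os.symlink(root / "data" / "raw", root / "shortcut")
    print(symlink_creates_loop(root / "data" / "raw" / "back"))  # True
    print(symlink_creates_loop(root / "shortcut"))               # False
```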

Dataset discovery

Using the --recursive or --recursion-limit option, this command generates the layout of dataset hierarchies based on subdataset nesting level, regardless of their location in the filesystem.

In this case, tree depth is determined by subdataset depth. This mode is thus suited for discovering available datasets when their location is not known in advance.

By default, only datasets are listed, without their contents. If --depth is also specified, the contents of each dataset are included up to --depth directory levels (excluding subdirectories that are themselves datasets).

Tree filtering options such as --include-hidden only affect which directories are reported as dataset contents, not which directories are traversed to find datasets.

Performance note: because no assumption is made about where datasets are located, running this command with the --recursive or --recursion-limit option performs a full scan of the directory tree below the given path. It can therefore be significantly slower than a call that produces equivalent output by using --depth to limit the tree instead.
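
The following Python sketch illustrates why discovery by nesting level requires a full scan: every directory, hidden or not, has to be entered before it is known whether a dataset lies beneath it. It is not DataLad's implementation. The dataset check (presence of a .datalad/config file) is a simplifying assumption, symlinks are skipped for brevity, and the names looks_like_dataset, discover and make_dataset are hypothetical.

```python
import tempfile
from pathlib import Path


def looks_like_dataset(path: Path) -> bool:
    # Simplifying assumption: a ".datalad/config" file marks a dataset.
    return (path / ".datalad" / "config").is_file()


def discover(root: Path, recursion_limit: int):
    """Yield (nesting_level, path) for datasets below `root`, up to the limit."""

    def visit(directory: Path, parent_level: int):
        for child in sorted(directory.iterdir()):
            # Symlinks are skipped here to keep the sketch free of loops.
            if child.is_symlink() or not child.is_dir():
                continue
            level = parent_level
            if looks_like_dataset(child):
                level = parent_level + 1
                if level <= recursion_limit:
                    yield level, child
            # Every directory is entered, hidden or not: this is the full scan.
            yield from visit(child, level)

    yield from visit(root, -1)


def make_dataset(path: Path) -> None:
    """Create a fake dataset marker for demonstration purposes."""
    (path / ".datalad").mkdir(parents=True)
    (path / ".datalad" / "config").touch()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_dataset(root / "projects" / "ds_A")
    make_dataset(root / "projects" / "ds_A" / "code" / "subds_A")
    make_dataset(root / ".cache" / "ds_B")
    for level, path in discover(root, recursion_limit=0):
        print(level, path.relative_to(root).as_posix())
```

With a limit of 0, only ds_B and ds_A are reported. The dataset under the hidden .cache directory is still found, and subds_A is visited but not reported because its nesting level exceeds the limit.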

Programmatic directory traversal

The command yields a result record for each tree node (dataset, directory or file). The following properties are reported, where available:

"path"

Absolute path of the tree node

"type"

Type of tree node: "dataset", "directory" or "file"

"depth"

Directory depth of node relative to the tree root

"exhausted_levels"

Depth levels for which no nodes are left to be generated (the respective subtrees have been 'exhausted')

"count"

Dict with cumulative counts of datasets, directories and files in the tree up until the current node. File count is only included if the command is run with the --include-files option.

"dataset_depth"

Subdataset depth level relative to the tree root. Only included for node type "dataset".

"dataset_abs_depth"

Absolute subdataset depth level. Only included for node type "dataset".

"dataset_is_installed"

Whether the registered subdataset is installed. Only included for node type "dataset".

"symlink_target"

If the tree node is a symlink, the path to the link target

"is_broken_symlink"

If the tree node is a symlink, whether it is a broken symlink
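
A minimal Python sketch of how such result records might be post-processed is given below. The sample records are hand-written after the property list above and are not actual command output; the helper names missing_subdatasets and outline are hypothetical.

```python
from pathlib import PurePosixPath

# Hand-written sample records shaped after the documented properties.
# Real records may carry additional keys.
records = [
    {"path": "/tmp/mydir", "type": "directory", "depth": 0},
    {"path": "/tmp/mydir/ds_B", "type": "dataset", "depth": 1,
     "dataset_depth": 0, "dataset_abs_depth": 0, "dataset_is_installed": True},
    {"path": "/tmp/mydir/ds_B/dir_B", "type": "directory", "depth": 2},
    {"path": "/tmp/mydir/ds_B/subds_B1", "type": "dataset", "depth": 2,
     "dataset_depth": 1, "dataset_abs_depth": 1, "dataset_is_installed": False},
]


def missing_subdatasets(results):
    """Return the paths of registered subdatasets that are not installed."""
    return [
        r["path"]
        for r in results
        if r.get("type") == "dataset" and not r.get("dataset_is_installed", True)
    ]


def outline(results):
    """Return one indented line per node, based on the reported depth."""
    lines = []
    for r in results:
        # Show the full path for the root, the last path component otherwise.
        name = r["path"] if r["depth"] == 0 else PurePosixPath(r["path"]).name
        marker = f"[DS~{r['dataset_abs_depth']}] " if r["type"] == "dataset" else ""
        lines.append("    " * r["depth"] + marker + name)
    return lines


print(missing_subdatasets(records))
print("\n".join(outline(records)))
```

Properties that exist only for datasets are read only after checking the node type, since they are reported where available rather than for every node.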

Examples

Show up to 3 levels of subdirectories below the current directory, including files and hidden contents:

% datalad tree -L 3 --include-files --include-hidden

Find all top-level datasets located anywhere under /tmp:

% datalad tree /tmp -R 0

Report all subdatasets recursively and their directory contents, up to 1 subdirectory deep within each dataset:

% datalad tree -r -L 1
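
When the command is called from a script, the same invocations can be assembled from the documented long options. The Python sketch below builds the argument list and, optionally, runs it with the standard subprocess module. The helper names tree_command and run_tree are hypothetical, and running the command requires datalad with this command available on the PATH.

```python
import subprocess


def tree_command(path=".", depth=None, recursive=False, recursion_limit=None,
                 include_files=False, include_hidden=False):
    """Build the argument list for a `datalad tree` call from the documented options."""
    cmd = ["datalad", "tree"]
    if depth is not None:
        cmd += ["--depth", str(depth)]
    if recursive:
        cmd.append("--recursive")
    if recursion_limit is not None:
        cmd += ["--recursion-limit", str(recursion_limit)]
    if include_files:
        cmd.append("--include-files")
    if include_hidden:
        cmd.append("--include-hidden")
    cmd.append(path)
    return cmd


def run_tree(**options):
    """Run the command and return its text output (needs datalad installed)."""
    completed = subprocess.run(tree_command(**options), capture_output=True,
                               text=True, check=True)
    return completed.stdout


# Equivalent of the first example above; only the argument list is printed.
print(tree_command(depth=3, include_files=True, include_hidden=True))
```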

Options

path

path to directory from which to generate the tree. Defaults to the current directory. [Default: '.']

-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-L DEPTH, --depth DEPTH

limit the tree to a maximum level of subdirectories. If not specified, the full tree is generated with no depth constraint. If paired with --recursive or --recursion-limit, refers to the maximum directory level to output below each dataset.

-r, --recursive

produce a dataset tree of the full hierarchy of nested subdatasets. Note: may have slow performance on large directory trees.

-R LEVELS, --recursion-limit LEVELS

limit the dataset tree to a maximum level of nested subdatasets. 0 means include only top-level datasets, 1 means top-level datasets and their immediate subdatasets, etc. Note: may have slow performance on large directory trees.

--include-files

include files in the tree.

--include-hidden

include hidden files/directories in the tree. This option does not affect which directories will be searched for datasets when specifying --recursive or --recursion-limit. For example, datasets located underneath the hidden folder .datalad will be reported even if --include-hidden is omitted.

--version

show the name and version of the module that provides the command

Authors

datalad is developed by The DataLad Team and Contributors <team@datalad.org>.