datalad tree
Synopsis
datalad tree [-h] [-L DEPTH] [-r] [-R LEVELS] [--include-files] [--include-hidden] [--version] [path]
Description
Visualize directory and dataset hierarchies
This command mimics the UNIX/MS-DOS 'tree' utility to generate and display a directory tree, with DataLad-specific enhancements.
It can serve the following purposes:
Glorified 'tree' command
Dataset discovery
Programmatic directory traversal
Glorified 'tree' command
The rendered command output uses 'tree'-style visualization:
/tmp/mydir
├── [DS~0] ds_A/
│   └── [DS~1] subds_A/
└── [DS~0] ds_B/
    ├── dir_B/
    │   ├── file.txt
    │   ├── subdir_B/
    │   └── [DS~1] subds_B0/
    └── [DS~1] (not installed) subds_B1/
5 datasets, 2 directories, 1 file
Dataset paths are prefixed by a marker indicating subdataset hierarchy level, like [DS~1]. This is the absolute subdataset level, meaning it may also take into account superdatasets located above the tree root and thus not included in the output.
If a subdataset is registered but not installed (such as after a non-recursive datalad clone), it will be prefixed by (not installed). Only DataLad datasets are considered, not pure git/git-annex repositories.
The 'report line' at the bottom of the output shows the count of displayed datasets, in addition to the count of directories and files. In this context, datasets and directories are mutually exclusive categories.
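As an illustration of how such a report line could be assembled from a set of counts, here is a minimal Python sketch. The function name `report_line` and the dict keys `datasets`, `directories` and `files` are assumptions made for this example; they are not taken from the command's implementation.

```python
def report_line(count):
    """Format a dict of counts as a 'tree'-style report line.

    The keys "datasets", "directories" and "files" are assumed for
    illustration; a key that is absent is simply left out of the line.
    """
    # (plural key, singular label) in the order used by the report line
    labels = [
        ("datasets", "dataset"),
        ("directories", "directory"),
        ("files", "file"),
    ]
    parts = []
    for key, singular in labels:
        if key not in count:
            continue  # e.g. no file count when files are not included
        n = count[key]
        parts.append(f"{n} {singular if n == 1 else key}")
    return ", ".join(parts)


print(report_line({"datasets": 5, "directories": 2, "files": 1}))
# prints: 5 datasets, 2 directories, 1 file
```

Because datasets and directories are mutually exclusive categories, each node contributes to exactly one of the counts.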
By default, only directories (no files) are included in the tree, and hidden directories are skipped. Both behaviours can be changed using command options.
Symbolic links are always followed. This means that a symlink pointing to a directory is traversed and counted as a directory (unless it potentially creates a loop in the tree).
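One common way to follow directory symlinks while avoiding loops is to track the resolved paths of the directories currently being descended into. The sketch below shows that general technique in Python; it is not the command's actual implementation, and the names `walk_following_symlinks` and `demo` are hypothetical.

```python
import os
import tempfile


def walk_following_symlinks(root):
    """Yield directories under root, following symlinks but skipping loops."""

    def _walk(path, ancestors):
        real = os.path.realpath(path)
        if real in ancestors:
            return  # link resolves to a directory we are already inside
        yield path
        with os.scandir(path) as entries:
            # is_dir(follow_symlinks=True) also matches links to directories
            children = sorted(
                e.path for e in entries if e.is_dir(follow_symlinks=True)
            )
        for child in children:
            yield from _walk(child, ancestors | {real})

    yield from _walk(root, frozenset())


def demo():
    """Build a small tree containing a symlink loop and walk it."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "a"))
        # "a/loop" points back at the tree root, which would recurse forever
        os.symlink(tmp, os.path.join(tmp, "a", "loop"))
        return [os.path.relpath(p, tmp) for p in walk_following_symlinks(tmp)]


print(demo())
```

In this sketch the link `a/loop` is neither traversed nor reported, since it resolves to a directory that is already an ancestor of the current node.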
Dataset discovery
Using the --recursive or --recursion-limit option, this command generates the layout of dataset hierarchies based on subdataset nesting level, regardless of their location in the filesystem.
In this case, tree depth is determined by subdataset depth. This mode is thus suited for discovering available datasets when their location is not known in advance.
By default, only datasets are listed, without their contents. If --depth is specified additionally, the contents of each dataset will be included up to --depth directory levels (excluding subdirectories that are themselves datasets).
Tree filtering options such as --include-hidden only affect which directories are reported as dataset contents, not which directories are traversed to find datasets.
Performance note: since no assumption is made on the location of datasets, running this command with the --recursive or --recursion-limit option does a full scan of the whole directory tree. As such, it can be significantly slower than a call with an equivalent output that uses --depth to limit the tree instead.
Programmatic directory traversal
The command yields a result record for each tree node (dataset, directory or file). The following properties are reported, where available:
- "path"
Absolute path of the tree node
- "type"
Type of tree node: "dataset", "directory" or "file"
- "depth"
Directory depth of node relative to the tree root
- "exhausted_levels"
Depth levels for which no nodes are left to be generated (the respective subtrees have been 'exhausted')
- "count"
Dict with cumulative counts of datasets, directories and files in the tree up until the current node. File count is only included if the command is run with the --include-files option.
- "dataset_depth"
Subdataset depth level relative to the tree root. Only included for node type "dataset".
- "dataset_abs_depth"
Absolute subdataset depth level. Only included for node type "dataset".
- "dataset_is_installed"
Whether the registered subdataset is installed. Only included for node type "dataset".
- "symlink_target"
If the tree node is a symlink, the path to the link target
- "is_broken_symlink"
If the tree node is a symlink, whether it is a broken symlink
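The properties above lend themselves to simple post-processing. The following Python sketch works on a hand-written list of records shaped like the ones described here; the sample values, and the helper names `count_by_type` and `uninstalled_datasets`, are made up for illustration and do not come from an actual run of the command.

```python
from collections import Counter

# Hypothetical records, modelled on the example tree in the description.
# Only a subset of the documented properties is filled in.
records = [
    {"path": "/tmp/mydir/ds_B", "type": "dataset", "depth": 1,
     "dataset_is_installed": True},
    {"path": "/tmp/mydir/ds_B/dir_B", "type": "directory", "depth": 2},
    {"path": "/tmp/mydir/ds_B/dir_B/file.txt", "type": "file", "depth": 3},
    {"path": "/tmp/mydir/ds_B/subds_B1", "type": "dataset", "depth": 2,
     "dataset_is_installed": False},
]


def count_by_type(records):
    """Count records per node type ("dataset", "directory", "file")."""
    return dict(Counter(r["type"] for r in records))


def uninstalled_datasets(records):
    """Return paths of datasets that are registered but not installed."""
    return [
        r["path"]
        for r in records
        # "dataset_is_installed" is only present for dataset nodes
        if r["type"] == "dataset" and not r.get("dataset_is_installed", True)
    ]


print(count_by_type(records))
print(uninstalled_datasets(records))
```

Reading dataset-only properties with `.get()` keeps the helpers safe for directory and file records, where those properties are not reported.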
Examples
Show up to 3 levels of subdirectories below the current directory, including files and hidden contents:
% datalad tree -L 3 --include-files --include-hidden
Find all top-level datasets located anywhere under /tmp:
% datalad tree /tmp -R 0
Report all subdatasets recursively and their directory contents, up to 1 subdirectory deep within each dataset:
% datalad tree -r -L 1
Options
path
path to directory from which to generate the tree. Defaults to the current directory. [Default: '.']
-h, --help, --help-np
show this help message. --help-np forcefully disables the use of a pager for displaying the help message
-L DEPTH, --depth DEPTH
limit the tree to maximum level of subdirectories. If not specified, will generate the full tree with no depth constraint. If paired with --recursive or --recursion-limit, refers to the maximum directory level to output below each dataset.
-r, --recursive
produce a dataset tree of the full hierarchy of nested subdatasets. Note: may have slow performance on large directory trees.
-R LEVELS, --recursion-limit LEVELS
limit the dataset tree to maximum level of nested subdatasets. 0 means include only top-level datasets, 1 means top-level datasets and their immediate subdatasets, etc. Note: may have slow performance on large directory trees.
--include-files
include files in the tree.
--include-hidden
include hidden files and directories in the tree. This option only affects which directories are reported, not which directories are traversed to find datasets when --recursive or --recursion-limit is used.
--version
show the module and its version which provides the command