DataLad purposefully uses terminology that differs from that of its technological foundations, Git and git-annex. This glossary defines the terms used in the DataLad documentation and API, and relates them to the corresponding Git/git-annex concepts.
- annex
- Extension to a Git repository, provided and managed by git-annex, as a means to track and distribute large (and small) files without having to inject them directly into a Git repository (which would slow Git operations significantly and impair handling of such repositories in general).
- DataLad extension
- A Python package, developed outside of the core DataLad codebase, which (when installed) typically provides additional top-level datalad commands, additional metadata extractors, or both. See the Handbook, Ch. 2, "DataLad’s extensions", for a representative list of extensions and instructions on how to install them.
- dataset
- A regular Git repository with an (optional) annex.
- sibling
- A dataset (location) that is related to a particular dataset by sharing content and history. In Git terminology, this is a clone of a dataset that is configured as a remote.
- subdataset
- A dataset that is part of another dataset by virtue of being tracked as a Git submodule. As such, a subdataset is also a complete dataset and does not differ from a standalone dataset.
- superdataset
- A dataset that contains at least one subdataset.
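
The mapping from these terms to Git/git-annex concepts can be illustrated with a small sketch. It is not part of DataLad: the function name `describe_dataset` and the returned keys are hypothetical, and it only inspects the on-disk layout, assuming that a git-annex-enabled repository has a `.git/annex` directory and that subdatasets are registered as submodules in `.gitmodules`.

```python
import re
from pathlib import Path


def describe_dataset(path):
    """Classify a directory using the glossary's terms (illustrative only).

    Hypothetical helper, not a DataLad API. It checks the file layout and
    does not call Git, git-annex, or DataLad.
    """
    root = Path(path)
    git_dir = root / ".git"

    # A dataset is a regular Git repository ...
    is_repository = git_dir.exists()
    # ... with an optional annex (assumed here to live in .git/annex).
    # Limitation: if .git is a pointer file rather than a directory,
    # this check reports no annex.
    has_annex = (git_dir / "annex").is_dir()

    # Subdatasets are tracked as Git submodules, which are recorded
    # as "path = ..." entries in the .gitmodules file.
    subdatasets = []
    gitmodules = root / ".gitmodules"
    if gitmodules.is_file():
        for line in gitmodules.read_text().splitlines():
            match = re.match(r"^\s*path\s*=\s*(.+?)\s*$", line)
            if match:
                subdatasets.append(match.group(1))

    return {
        "is_repository": is_repository,
        "has_annex": has_annex,
        "subdatasets": subdatasets,
        # A superdataset contains at least one subdataset.
        "is_superdataset": bool(subdatasets),
    }
```

Calling `describe_dataset(".")` inside a dataset returns a dictionary with these four keys. Siblings are not covered by the sketch, since they correspond to Git remotes stored in the repository configuration. In practice, DataLad's own `subdatasets` and `siblings` commands are the appropriate way to query this information.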