datalad clone


datalad clone [-h] [-d DATASET] [-D DESCRIPTION] [--reckless [{auto}]] SOURCE [PATH]


Obtain a dataset (copy) from a URL or local directory

The purpose of this command is to obtain a new clone (copy) of a dataset and place it into a not-yet-existing or empty directory. As such, CLONE provides a strict subset of the functionality offered by INSTALL. Only a single dataset can be obtained, and immediate recursive installation of subdatasets is not supported. However, once a (super)dataset is installed via CLONE, any content, including subdatasets, can be obtained by a subsequent GET command.
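The clone-then-get workflow described above can be sketched as a small shell function. This is only an illustration: the function name clone_then_get and its three arguments are hypothetical, and any source location and content path passed to it are placeholders.

```shell
# Hypothetical helper: clone a single dataset, then fetch one item
# (a file, a directory, or a subdataset) with a follow-up `get`.
#   $1 = SOURCE, $2 = PATH to clone into, $3 = path inside the clone
clone_then_get() {
    src=$1; dest=$2; item=$3
    # Step 1: obtain the dataset itself (no subdatasets, no file content)
    datalad clone "$src" "$dest" || return 1
    # Step 2: run `get` from inside the new clone; the subshell keeps the
    # caller's working directory unchanged
    ( cd "$dest" && datalad get "$item" )
}
```

A call such as clone_then_get SOURCE myds some/path would first create myds and then retrieve some/path within it.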

Primary differences over a direct git clone call are: 1) the automatic initialization of a dataset annex (pure Git repositories are equally supported); 2) automatic registration of the newly obtained dataset as a subdataset (submodule), if a parent dataset is specified; 3) support for additional resource identifiers (DataLad resource identifiers and RIA store URLs; see examples); and 4) automatic, configurable generation of alternative access URLs for common cases (such as appending ‘.git’ to the URL if accessing the base URL failed).
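To make point 4 concrete, the following sketch lists the URLs that might be tried for a given source: the URL as given, followed by a variant with ‘.git’ appended. The function candidate_urls is hypothetical and covers only this one case; the actual candidate generation in DataLad is configurable and may differ.

```shell
# Hypothetical sketch: print candidate access URLs, one per line.
# The original URL comes first; a ".git" variant is added only when
# the URL does not already end in ".git".
candidate_urls() {
    url=$1
    printf '%s\n' "$url"
    case $url in
        *.git|*.git/) ;;                        # already a ".git" URL
        *) printf '%s\n' "${url%/}.git" ;;      # drop one trailing "/" and append
    esac
}
```

A caller could then attempt each candidate in order and stop at the first one that can be cloned.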

More information on Remote Indexed Archive (RIA) stores


Install a dataset from GitHub into the current directory:

% datalad clone

Install a dataset into a specific directory:

% datalad clone myfavpodcasts

Install a dataset as a subdataset into the current dataset:

% datalad clone -d .

Install the main superdataset from

% datalad clone ///

Install a dataset identified by its ID from

% datalad clone ria+



SOURCE

URL, DataLad resource identifier, local path or instance of dataset to be cloned. Constraints: value must be a string


PATH

path to clone into. If no PATH is provided, a destination path will be derived from the source URL, similar to git clone.
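As a rough illustration of a git-clone-style derivation, the sketch below takes the last path component of the source and drops a trailing ‘.git’. The function derive_dest is hypothetical and is not DataLad's implementation; special sources such as ‘///’ or ‘ria+’ URLs are not handled here.

```shell
# Hypothetical sketch: derive a destination directory name from a source.
derive_dest() {
    src=$1
    # remove any trailing slashes
    while [ "${src%/}" != "$src" ]; do src=${src%/}; done
    src=${src%/.git}    # source pointed at the .git directory itself
    src=${src%.git}     # conventional ".git" suffix
    src=${src##*/}      # keep the last path component
    src=${src##*:}      # scp-like "host:name" sources
    printf '%s\n' "$src"
}
```

With these rules, a source ending in /lab/myds.git and a local path /data/studies/myds/ would both yield myds.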

-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-d DATASET, --dataset DATASET

(parent) dataset to clone into. If given, the newly cloned dataset is registered as a subdataset of the parent. Also, if given, relative paths are interpreted as being relative to the parent dataset, and not relative to the working directory. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path)
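A minimal sketch of cloning into a parent dataset and then listing its registered subdatasets follows. The function name clone_as_subdataset and its arguments are hypothetical; the relative path is passed as-is and, per the description above, is taken relative to the parent dataset.

```shell
# Hypothetical helper: register a clone as a subdataset of a parent dataset.
#   $1 = parent dataset path, $2 = SOURCE, $3 = path relative to the parent
clone_as_subdataset() {
    parent=$1; src=$2; relpath=$3
    # With -d, the new clone is recorded as a subdataset (submodule) of $parent
    datalad clone -d "$parent" "$src" "$relpath" || return 1
    # List the subdatasets now known to the parent
    datalad subdatasets -d "$parent"
}
```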


-D DESCRIPTION, --description DESCRIPTION

short description to use for a dataset location. Its primary purpose is to help humans to identify a dataset copy (e.g., “mike’s dataset on lab server”). Note that when a dataset is published, this information becomes available on the remote side. Constraints: value must be a string

--reckless [{auto}]

Set up the dataset to be able to obtain content in the cheapest/fastest possible way, even if this poses a potential risk to data integrity (e.g. hardlinking files from a local clone of the dataset). Use with care, and limit to “read-only” use cases. With this flag, the installed dataset will be marked as untrusted. The reckless mode is stored in a dataset’s local configuration under ‘datalad.clone.reckless’, and will be inherited by any of its subdatasets. Constraints: value must be one of (None, True, False, ‘auto’)
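The sketch below clones in reckless mode and then reads back the recorded setting from the clone's local Git configuration. The function name clone_reckless is hypothetical, and the example assumes the setting can be queried with a plain git config call on the key named above.

```shell
# Hypothetical helper: make a "read-only" style clone and show its reckless mode.
#   $1 = SOURCE (typically a local dataset), $2 = PATH to clone into
clone_reckless() {
    src=$1; dest=$2
    datalad clone --reckless auto "$src" "$dest" || return 1
    # Query the clone's local configuration for the stored mode
    git -C "$dest" config --local --get datalad.clone.reckless
}
```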


datalad is developed by The DataLad Team and Contributors.