datalad install
Synopsis
datalad install [-h] [-s URL-OR-PATH] [-d DATASET] [-g] [-D DESCRIPTION] [-r]
[-R LEVELS] [--reckless [auto|ephemeral|shared-...]] [-J NJOBS]
[--branch BRANCH] [--version] [URL-OR-PATH ...]
Description
Install one or many datasets from remote URL(s) or local PATH source(s).
This command creates local sibling(s) of existing dataset(s) from (remote) locations specified as URL(s) or path(s). Optional recursion into potential subdatasets and download of all referenced data are supported. The new dataset(s) can optionally be registered in an existing superdataset by identifying it via the DATASET argument (for that, the new dataset’s path needs to be located within the superdataset).
If no explicit -s|--source option is specified, then all positional URL-OR-PATH arguments are considered to be “sources” if they are URLs, or target locations if they are paths. If a target location path corresponds to a submodule, its source location is determined from its record in .gitmodules. If -s|--source is specified, then a single optional positional PATH is taken as the destination path for that dataset.
It is possible to provide a brief description to label the dataset’s nature and location, e.g. “Michael’s music on black laptop”. This helps humans to identify data locations in distributed scenarios. By default, an identifier composed of the user name, the machine name, and the path is generated.
When only part of a dataset’s content is needed, it is recommended to use this command without the --get-data flag, followed by a get operation to obtain the desired data.
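The two-step pattern described above might look like the following sketch. The function name install_then_get and its argument order are hypothetical and exist only for illustration; datalad get is DataLad's command for retrieving file content.

```shell
# Hypothetical helper: install a dataset without its data, then fetch selected paths.
# Usage: install_then_get SOURCE_URL TARGET_DIR PATH [PATH ...]
install_then_get() {
    src=$1
    target=$2
    shift 2
    # Step 1: clone the dataset only (no --get-data), so no file content is downloaded.
    datalad install -s "$src" "$target" || return 1
    # Step 2: from inside the new dataset, retrieve content for just the requested paths.
    (cd "$target" && datalad get "$@")
}
```

For instance, calling install_then_get with a source URL, the target directory podcasts, and one or more file paths would leave all other files in the dataset without local content.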
NOTE: Power-user info: This command uses git clone and git annex init to prepare the dataset. Registering to a superdataset is performed via a git submodule add operation in the discovered superdataset.
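To make the order of those underlying operations concrete, here is a rough sketch that chains the same three commands by hand. It is an assumption-laden approximation, not DataLad's actual implementation: the function name manual_install and its arguments are hypothetical, and DataLad performs additional bookkeeping that is not reproduced here.

```shell
# Hypothetical approximation of the steps named in the note above.
# NOT DataLad's implementation; it only shows the order of the underlying commands.
# Usage: manual_install SOURCE_URL SUPERDATASET_DIR SUBDIR [DESCRIPTION]
manual_install() {
    src=$1
    super=$2
    sub=$3
    desc=${4:-}
    # Clone the source into a path located inside the superdataset.
    git clone "$src" "$super/$sub" || return 1
    # Initialize git-annex in the clone; an optional description labels this location.
    if [ -n "$desc" ]; then
        git -C "$super/$sub" annex init "$desc" || return 1
    else
        git -C "$super/$sub" annex init || return 1
    fi
    # Register the already existing clone as a submodule of the superdataset.
    git -C "$super" submodule add "$src" "$sub"
}
```

In practice, datalad install -d with a source is the supported route; the sketch is only meant to show what kind of Git state results.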
Examples
Install a dataset from GitHub into the current directory:
% datalad install https://github.com/datalad-datasets/longnow-podcasts.git
Install a dataset as a subdataset into the current dataset:
% datalad install -d . \
--source='https://github.com/datalad-datasets/longnow-podcasts.git'
Install a dataset into ‘podcasts’ (not ‘longnow-podcasts’) directory, and get all content right away:
% datalad install --get-data \
-s https://github.com/datalad-datasets/longnow-podcasts.git podcasts
Install a dataset with all its subdatasets:
% datalad install -r \
https://github.com/datalad-datasets/longnow-podcasts.git
Options
URL-OR-PATH
path/name of the installation target. If no PATH is provided, a destination path is derived from the source URL, similar to git clone.
-h, --help, --help-np
show this help message. --help-np forcefully disables the use of a pager for displaying the help message
-s URL-OR-PATH, --source URL-OR-PATH
URL or local path of the installation source. Constraints: value must be a string or value must be NONE
-d DATASET, --dataset DATASET
specify the dataset to perform the install operation on. If no dataset is given, an attempt is made to identify the dataset in a parent directory of the current working directory and/or the PATH given. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path) or value must be NONE
-g, --get-data
if given, obtain all data content too.
-D DESCRIPTION, --description DESCRIPTION
short description to use for a dataset location. Its primary purpose is to help humans to identify a dataset copy (e.g., “mike’s dataset on lab server”). Note that when a dataset is published, this information becomes available on the remote side. Constraints: value must be a string or value must be NONE
-r, --recursive
if set, recurse into potential subdatasets.
-R LEVELS, --recursion-limit LEVELS
limit recursion into subdatasets to the given number of levels. Constraints: value must be convertible to type ‘int’ or value must be NONE
-J NJOBS, --jobs NJOBS
how many parallel jobs (where possible) to use. “auto” corresponds to the number defined by the ‘datalad.runtime.max-annex-jobs’ configuration item. NOTE: This option can only parallelize input retrieval (get) and output recording (save). DataLad does NOT parallelize your scripts for you. Constraints: value must be convertible to type ‘int’ or value must be NONE or value must be one of (‘auto’,) [Default: ‘auto’]
--branch BRANCH
Clone source at this branch or tag. This option applies only to the top-level dataset, not to any subdatasets that may be cloned when installing recursively. Note that if the source is a RIA URL with a version, it takes precedence over this option. Constraints: value must be a string or value must be NONE
--version
show the module and its version which provides the command
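Several of the options above can be combined in a single call. The following sketch wraps such a call in a shell function; the function name install_limited and its argument order are hypothetical, while the flags are the ones listed above.

```shell
# Hypothetical wrapper combining --get-data, -r, -R, -J and --branch.
# Usage: install_limited SOURCE_URL BRANCH LEVELS NJOBS
install_limited() {
    src=$1
    branch=$2
    levels=$3
    njobs=$4
    # --branch affects only the top-level dataset; -R caps how deep -r recurses;
    # -J sets the number of parallel jobs used where possible (e.g. data retrieval).
    datalad install --get-data -r -R "$levels" -J "$njobs" \
        --branch "$branch" "$src"
}
```

Because no -s option is given, the positional URL is treated as the source and the destination path is derived from it.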