When there isn’t anything more convenient
Unless system packages are available for your operating system (see below), DataLad can be installed via pip (Pip Installs Python). To automatically install DataLad with all its Python dependencies, run:
pip install datalad
In addition, a current version of git-annex must be installed. The pip method does not set up git-annex automatically.
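Because pip leaves git-annex out, it can help to check whether git-annex is already reachable before relying on DataLad. The following is a minimal sketch in POSIX shell. The helper name have_git_annex is hypothetical. command -v is a standard shell builtin, and git-annex version is git-annex's own version report. The sketch only checks that the program is present, not whether it is recent enough.

```shell
# Minimal sketch (POSIX sh). have_git_annex is a hypothetical helper name.
# It only checks that a git-annex executable is on PATH, not how recent it is.
have_git_annex() {
  command -v git-annex >/dev/null 2>&1
}

if have_git_annex; then
  # git-annex's own version report; its first line names the installed version
  git-annex version | sed -n 1p
else
  echo "git-annex not found on PATH; install it separately (pip does not)" >&2
fi
```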
If you do not have admin powers…
pip supports installation into a user’s home directory with
pip install --user datalad
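A user installation places the datalad command in a per-user scripts directory, which is not always on PATH. The sketch below assumes a POSIX shell on Linux or macOS and that python3 is the interpreter whose pip performed the install. It asks Python for the per-user base directory (python3 -m site --user-base) and prepends that directory's bin subdirectory to PATH for the current session only.

```shell
# Minimal sketch (POSIX sh, Linux/macOS). Assumes python3 is the interpreter
# whose pip performed the --user install.
user_bin=""
# Prints the per-user base directory; exits non-zero when user installs are
# disabled, e.g. inside a virtualenv
if user_base=$(python3 -m site --user-base 2>/dev/null); then
  user_bin="$user_base/bin"
  case ":$PATH:" in
    *":$user_bin:"*) ;;                        # already on PATH
    *) PATH="$user_bin:$PATH"; export PATH ;;  # current session only
  esac
else
  echo "could not use a per-user base for python3 (missing, or user installs disabled)" >&2
fi
```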
Git-annex can be deployed by extracting pre-built binaries from a tarball (which also includes an up-to-date Git installation). Obtain the tarball, extract it, and set the PATH environment variable to include the root of the extracted tarball. Fingers crossed and good luck!
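The tarball steps above can be scripted roughly as follows. This is a sketch under stated assumptions: the tarball has already been downloaded, it is gzip-compressed, and it unpacks into a single top-level directory that contains the git-annex program. The function name install_annex_tarball and the default destination $HOME/software are hypothetical choices. Like the manual procedure, the function changes PATH only for the current shell session.

```shell
# Minimal sketch (POSIX sh). ASSUMPTIONS: the tarball is already downloaded,
# gzip-compressed, and unpacks into one top-level directory that contains the
# git-annex program. install_annex_tarball and $HOME/software are hypothetical.
install_annex_tarball() {
  tarball=$1
  dest=${2:-"$HOME/software"}
  # The archive's root directory is the first path component of its first entry
  root=$(tar -tzf "$tarball" | awk -F/ 'NR == 1 { print $1 }')
  [ -n "$root" ] || { echo "cannot list $tarball" >&2; return 1; }
  mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest" || return 1
  annex_dir="$dest/$root"
  # Do not touch PATH unless the expected executable is really there
  [ -x "$annex_dir/git-annex" ] || { echo "no git-annex in $annex_dir" >&2; return 1; }
  # Prepend the extracted root to PATH (current session only) unless present
  case ":$PATH:" in
    *":$annex_dir:"*) ;;
    *) PATH="$annex_dir:$PATH"; export PATH ;;
  esac
}
```

To keep the change across sessions, the same PATH assignment would go into your shell's startup file (for example ~/.bashrc for bash).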
Advanced users can choose from several installation schemes:
pip install datalad[SCHEME]
where SCHEME could be
- crawl, to also install scrapy, which is used in some crawling constructs
- tests, to also install the dependencies used by DataLad's battery of unit tests
- full, to install all dependencies
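The scheme name ends up in square brackets after the package name. A small wrapper, sketched here under the hypothetical name install_datalad_scheme, checks the name against the three schemes listed above and quotes the requirement. Quoting matters in shells such as zsh, which would otherwise treat the brackets as a glob pattern and refuse to run the command. The PIP variable is an illustrative hook for this sketch (for example PIP=echo for a dry run), not a pip or DataLad feature.

```shell
# Minimal sketch (POSIX sh). install_datalad_scheme is a hypothetical wrapper;
# the scheme names are the ones listed in this document.
install_datalad_scheme() {
  case "$1" in
    crawl|tests|full) ;;
    *) echo "unknown scheme '$1' (expected crawl, tests or full)" >&2; return 1 ;;
  esac
  # Quoting keeps zsh and similar shells from treating [...] as a glob.
  # PIP is left unquoted on purpose so a value like "python3 -m pip" splits
  # into words; it defaults to plain pip. PIP=echo gives a dry run.
  ${PIP:-pip} install "datalad[$1]"
}
```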
(Neuro)Debian, Ubuntu, and similar systems
For Debian-based operating systems, the most convenient installation method is to enable the NeuroDebian repository. Once it is enabled, the following command installs DataLad and all its software dependencies (including the git-annex-standalone package):
sudo apt-get install datalad
Mac OS X
A simple way to get things installed is the Homebrew package manager, which is itself fairly easy to install. Git-annex is installed with the command:
brew install git-annex
Once git-annex is available, DataLad can be installed via pip as described above. pip comes with Python distributions such as Anaconda.
HPC environments or any system with Singularity installed
If you want to use DataLad in a high-performance computing (HPC) environment, such as a computer cluster or a similar multi-user machine where you don’t have admin privileges, chances are that Singularity is installed. And if it isn’t, running DataLad this way gives you a solid case for asking your admin to install it.
On any system with Singularity installed, you can pull a container with a full installation of DataLad (~300 MB) straight from Singularity Hub. The following command pulls the latest container for the DataLad development version (check on Singularity Hub for alternative container variants):
singularity pull shub://datalad/datalad:fullmaster
This will produce an executable image file. You can rename this image to datalad and put the directory it is located in into your PATH environment variable. From then on, you will have a datalad command on the command line that transparently executes all DataLad functionality in the container.
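Renaming the pulled image and placing it in a directory on PATH can be sketched as a small POSIX shell function. The function name install_datalad_image and the default directory $HOME/bin are hypothetical choices. The file name that singularity pull writes depends on your Singularity version, so the sketch takes it as an argument rather than guessing it, and it only warns (instead of editing startup files) when the target directory is not yet on PATH.

```shell
# Minimal sketch (POSIX sh). install_datalad_image and the default target
# directory $HOME/bin are hypothetical choices, not part of DataLad.
install_datalad_image() {
  image=$1                     # image file written by `singularity pull`
  bindir=${2:-"$HOME/bin"}     # directory that should hold the datalad command
  [ -f "$image" ] || { echo "image not found: $image" >&2; return 1; }
  mkdir -p "$bindir" || return 1
  mv "$image" "$bindir/datalad" || return 1
  chmod +x "$bindir/datalad"   # the pulled image is normally executable already
  # Warn rather than edit shell startup files if bindir is not on PATH
  case ":$PATH:" in
    *":$bindir:"*) ;;
    *) echo "add $bindir to PATH to use the datalad command" >&2 ;;
  esac
}
```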
With Singularity version 2.4.2 you can choose the image name directly in the download command:
singularity pull --name datalad shub://datalad/datalad:fullmaster
DataLad can be queried for information about known datasets. When you run your first search query, DataLad automatically offers to install a superdataset first. The superdataset is a lightweight container that holds metadata about known datasets but does not contain the actual data itself.
For example, we might want to look for datasets that were funded by, or acknowledge, the US National Science Foundation (NSF):
~ % datalad search NSF
No DataLad dataset found at current location
Would you like to install the DataLad superdataset at '~/datalad'? (choices: yes, no): yes
2016-10-24 09:13:32,414 [INFO ] Installing dataset at ~/datalad from http://datasets.datalad.org/
From now on you can refer to this dataset using the label '///'
2016-10-24 09:13:39,072 [INFO ] Performing search using DataLad superdataset '~/datalad'
2016-10-24 09:13:39,086 [INFO ] Loading and caching local meta-data... might take a few seconds
~/datalad/openfmri/ds000001
~/datalad/openfmri/ds000002
~/datalad/openfmri/ds000003
...
Any known dataset can now be installed inside the local superdataset with a command like this:
datalad install ///openfmri/ds000002