Getting started

Installation

When there isn’t anything more convenient

Unless system packages are available for your operating system (see below), DataLad can be installed via pip (Pip Installs Packages). To automatically install datalad with all its Python dependencies, type:

pip install datalad

In addition, a current version of git-annex must be installed; the pip method does not set it up automatically.
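A quick way to confirm that git-annex is reachable before using datalad is sketched below. It only assumes a POSIX shell; the variable name ANNEX_STATUS is illustrative and not part of DataLad or git-annex.

```shell
# Check whether git-annex is reachable on PATH and report its version.
if command -v git-annex >/dev/null 2>&1; then
    # The first line of the version output names the installed version
    git-annex version | head -n 1
    ANNEX_STATUS="found"
else
    echo "git-annex not found on PATH; install it before using datalad"
    ANNEX_STATUS="missing"
fi
```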

Note

If you do not have admin powers…

pip supports installation into a user’s home directory with --user. Git-annex can be deployed by extracting pre-built binaries from a tarball (that also includes an up-to-date Git installation). Obtain the tarball, extract it, and set the PATH environment variable to include the root of the extracted tarball. Fingers crossed and good luck!
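As a minimal sketch of the PATH step, assuming the tarball was extracted to a directory named git-annex.linux in your home directory (adjust ANNEX_DIR, an illustrative variable name, to wherever you actually extracted it):

```shell
# Assumed location of the extracted tarball; change this to match your system.
ANNEX_DIR="$HOME/git-annex.linux"

# Prepend the directory so its git and git-annex take precedence
# over any older system-wide versions.
export PATH="$ANNEX_DIR:$PATH"

# Report whether git-annex can now be found.
if command -v git-annex >/dev/null 2>&1; then
    echo "git-annex found at: $(command -v git-annex)"
else
    echo "git-annex not found; check that ANNEX_DIR points to the extracted tarball"
fi
```

To make the change permanent, the export line can be added to your shell's startup file.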

Advanced users can choose from several installation schemes (e.g. publish, metadata, tests or crawl):

pip install datalad[SCHEME]

where SCHEME could be

  • crawl to also install Scrapy, which is used in some crawling constructs
  • tests to also install dependencies used by DataLad’s battery of unit tests
  • full to install all dependencies
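Some shells (zsh, for example) treat unquoted square brackets as glob characters, so it is safer to quote the requirement. The sketch below builds the requirement string for the full scheme; the variable names SCHEME and REQUIREMENT are illustrative only, and the install command is left commented out so that nothing is installed by accident.

```shell
# Pick one of the schemes listed above.
SCHEME="full"

# Build the requirement string; keep it quoted wherever it is used so the
# shell does not try to expand the square brackets as a glob pattern.
REQUIREMENT="datalad[${SCHEME}]"
echo "Requirement: ${REQUIREMENT}"

# Uncomment to run the installation:
# pip install "${REQUIREMENT}"
```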

(Neuro)Debian, Ubuntu, and similar systems

For Debian-based operating systems the most convenient installation method is to enable the NeuroDebian repository. The following command installs datalad and all its software dependencies (including the git-annex-standalone package):

sudo apt-get install datalad

MacOSX

A simple way to get things installed is the Homebrew package manager, which is itself fairly easy to install. Git-annex is installed with the command:

brew install git-annex

Once Git-annex is available, datalad can be installed via pip as described above. pip comes with Python distributions such as Anaconda.

HPC environments or any system with singularity installed

If you want to use DataLad in a high-performance computing (HPC) environment, such as a computer cluster or a similar multi-user machine where you don’t have admin privileges, chances are that Singularity is installed. Even if it isn’t installed, Singularity helps you make a solid case for why your admin might want to install it.

On any system with Singularity installed, you can pull a container with a full installation of DataLad (~300 MB) straight from Singularity Hub. The following command pulls the latest container for the DataLad development version (check on Singularity Hub for alternative container variants):

singularity pull shub://datalad/datalad:fullmaster

This will produce an executable image file. You can rename this image to datalad, and add the directory containing it to your PATH environment variable. From then on, you will have a datalad command on the command line that transparently executes all DataLad functionality in the container.
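The rename-and-PATH step might look like the following sketch. The image file name and the target directory are assumptions (the name of the pulled file depends on your Singularity version), so set IMAGE and BIN_DIR, both illustrative variable names, to match your system.

```shell
# Assumed name of the pulled image and assumed target directory; adjust both.
IMAGE="${IMAGE:-./datalad-image.simg}"
BIN_DIR="${BIN_DIR:-$HOME/bin}"

# Make sure the target directory exists.
mkdir -p "$BIN_DIR"

# Move the image into place under the name "datalad" and make it executable.
if [ -f "$IMAGE" ]; then
    mv "$IMAGE" "$BIN_DIR/datalad"
    chmod +x "$BIN_DIR/datalad"
else
    echo "Image $IMAGE not found; set IMAGE to the file produced by singularity pull"
fi

# Put the directory on PATH so that "datalad" resolves to the image.
export PATH="$BIN_DIR:$PATH"
```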

With Singularity version 2.4.2 you can choose the image name directly in the download command:

singularity pull --name datalad shub://datalad/datalad:fullmaster

First steps

DataLad can be queried for information about known datasets. On the first search query, datalad automatically offers assistance in obtaining a superdataset. The superdataset is a lightweight container that holds meta information about known datasets but does not contain the actual data itself.

For example, we might want to look for datasets that were funded by, or acknowledge, the US National Science Foundation (NSF):

~ % datalad search NSF
No DataLad dataset found at current location
Would you like to install the DataLad superdataset at '~/datalad'? (choices: yes, no): yes
2016-10-24 09:13:32,414 [INFO   ] Installing dataset at ~/datalad from http://datasets.datalad.org/
From now on you can refer to this dataset using the label '///'
2016-10-24 09:13:39,072 [INFO   ] Performing search using DataLad superdataset '~/datalad'
2016-10-24 09:13:39,086 [INFO   ] Loading and caching local meta-data... might take a few seconds
~/datalad/openfmri/ds000001
~/datalad/openfmri/ds000002
~/datalad/openfmri/ds000003
...

Any known dataset can now be installed inside the local superdataset with a command like this:

datalad install ///openfmri/ds000002

Now, have a look at the demos on the DataLad website, some common data management scenarios, and a bit of background info on the fundamental concepts the DataLad API(s) are built on.