DataLad XNAT: Track and retrieve XNAT projects with DataLad

This is documentation for the DataLad extension, datalad-xnat, that equips DataLad with additional functionality to work with XNAT servers. Use it to [COMPLETE ME]

The extension was created during the Juelich Brain Hackathon 2021 and wouldn't have been possible without a dedicated team of volunteers. If you want to get in touch or on board as well, please see our contributing guidelines.

Documentation overview

Introduction

XNAT

XNAT is an open source platform purpose-built for imaging data and data associated with it. In addition to hosting and cataloging the data, XNAT can assist with triggering quality assurance tasks and other workflows.

Goal of the extension

[COMPLETE ME] What did we set out to do?

What can I use this extension for?

[COMPLETE ME] Use cases for the extension

What can I not use this extension for?

The list of valid use cases for this extension is much shorter than the list of invalid use cases. For example, you will not be able to open a bottle of your favourite beverage with it (sorry). But here is a list of invalid use cases that may be most relevant to know about:

  • You cannot use this extension as a replacement for an XNAT instance. It requires access to one, and tracks the data available on this instance.

  • You should not use this extension to publicly share data that you retrieved from an XNAT server. While DataLad datasets are ideal for sharing large amounts of data publicly, data from an XNAT server is usually not meant to be shared publicly, for example (but not only) because of privacy concerns. Please think twice before you attempt to do this.

Quickstart

Requirements

DataLad and datalad-xnat are available for all major operating systems (Linux, macOS, Windows 10 [1]). The relevant requirements are listed below.

An XNAT account with appropriate permissions

You need access to an XNAT server to be able to interact with it, and appropriate permissions to access the projects you are interested in. Keep your XNAT instance's URL and the user name and password of your account close by.

DataLad

If you don't have DataLad and its underlying tools (git, git-annex) installed yet, please follow the instructions from the DataLad Handbook.

Installation

datalad-xnat is a Python package available on PyPI and installable via pip.

# create and enter a new virtual environment (optional)
$ virtualenv --python=python3 ~/env/dl-xnat
$ . ~/env/dl-xnat/bin/activate
# install from PyPI
$ pip install datalad-xnat

Getting started

Here's the gist of some of this extension's functionality. Check out the Tutorial for more detailed demonstrations.

Start by creating and initializing a new DataLad dataset to track a specific XNAT project. This example uses the XNAT central instance with anonymous credentials for the project DCMPHANTOM.

$ datalad create dcm_phantom
$ cd dcm_phantom
$ datalad xnat-init https://central.xnat.org --credential anonymous --project DCMPHANTOM

After initialization, run xnat-update to download files from the project. The example below restricts the download to a single subject.

$ datalad xnat-update --credential anonymous --subject CENTRAL_S01742

HELP! I'm new to this!

If you are confused about the words DataLad dataset, please head over to the DataLad Handbook for an introduction to DataLad.

If you are confused about the words project, experiment, or session in the context of XNAT, take a look at the Glossary or in the XNAT documentation.

Footnotes

Tutorial

Authentication

The authentication process

Typical interactions with an XNAT instance require a user name and a password. When you initialize a project using datalad xnat-init for a given XNAT URL you will thus be prompted to supply those credentials in the command line:

$ datalad xnat-init <myxnatinstance>
  You need to authenticate with '<myxnatinstance>' credentials. <myxnatinstance>/app/template/Register.vm provides information on how to gain access
  user: <myusername>
  password: <mypassword>
  password (repeat): <mypassword>

Afterwards, these credentials are stored in your system's keyring under the credential name datalad-<myxnatinstance>, and subsequent interactions with this XNAT instance will authenticate automatically using the stored credentials.

Multiple different credentials

If you work with multiple XNAT instances that use different user names or passwords, the credential name, which is based on the XNAT URL, should ensure that you are automatically authenticated with the correct user name and password combination. If you nevertheless want to enforce a specific credential, you can supply the --credential <name> parameter to xnat-init. If <name> matches an existing credential in your keyring, that credential will be used for authentication. If <name> does not match an existing credential, you will be prompted for a user name and password, and the supplied credentials will be saved under the <name> you specified.
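The following is a minimal sketch of how a credential name could be chosen, assuming the behavior described in the --credential help text of the command line reference (when no name is given, the host part of the XNAT URL is used). The function names are hypothetical and are not part of datalad-xnat.

```python
from urllib.parse import urlparse


def default_credential_name(xnat_url):
    """Return the host part of an XNAT URL.

    Assumption: this mirrors the documented default, e.g.
    'https://central.xnat.org' -> 'central.xnat.org'.
    """
    host = urlparse(xnat_url).hostname
    if host is None:
        raise ValueError(f"no host found in URL: {xnat_url!r}")
    return host


def resolve_credential_name(xnat_url, credential=None):
    """Prefer an explicitly given name, else fall back to the URL host."""
    if credential is not None:
        return credential
    return default_credential_name(xnat_url)
```

With this sketch, two XNAT instances on different hosts end up with different credential names, while an explicit name such as anonymous always takes precedence.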

Authenticating as anonymous

Typical interactions with an XNAT instance require a user name and a password. Some XNAT instances, however, allow anonymous access, such as XNAT central. In order to authenticate as an anonymous user, supply the special value anonymous to the --credential parameter.

$ datalad xnat-init --credential anonymous https://central.xnat.org
  [INFO   ] Querying https://central.xnat.org for projects available to user anonymous
  No project name specified. The following projects are available on https://central.xnat.org for user anonymous:
  [...]

If things go wrong during authentication

Unauthorized Errors: If the authentication process fails, datalad xnat-init will throw an error:

xnat_init(error): . (dataset) [Request to XNAT server failed: Unauthorized]

In this case, read on in the last paragraph on how to update your credential.

Faulty XNAT URLs: If the provided XNAT URL is faulty and cannot be reached, you may see an error like this:

xnat_init(error): . (dataset) [During authentication the XNAT server sent MissingSchema(Invalid URL 'myxnatinstance/data/JSESSION': No schema supplied. Perhaps you meant http://<wrongurl>/data/JSESSION?)]

In this case, double check the URL you provided. Open an issue if you need help or think that you found a bug.
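A quick way to catch the most common mistake, a URL without a scheme such as https://, is to check the URL before passing it to xnat-init. The helper below is a hypothetical sketch using only the Python standard library; it is not part of datalad-xnat and does not test whether the server is actually reachable.

```python
from urllib.parse import urlparse


def check_xnat_url(url):
    """Return the URL unchanged if it has an http(s) scheme and a host.

    Raises ValueError otherwise, e.g. for 'myxnatinstance'.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(
            f"{url!r} does not look like a full URL; "
            "did you forget the leading 'https://'?"
        )
    return url
```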

Updating credentials

"Oh no, I accidentally mistyped my password!" If you supplied wrong credentials, or previously working credentials expired and stopped working, you can re-enter new credentials with the configuration datalad.credentials.force-ask=1:

$ datalad -c datalad.credentials.force-ask=1 xnat-init <url>
You need to authenticate with [...]
user: <user>
password: <password>

Alternatively, find your system's secure keyring (your system's credential store) and remove or replace your password in there.

Tracking a project

Internals: Understanding xnat-init

Understanding the dataset configurations

One of the main functions of the datalad xnat-init command is to create a dataset-internal configuration with information about the XNAT instance, the directory structure for downloaded files, and the project to track. This configuration is what determines the look and feel of the final dataset, in particular the presence and location of subdatasets, and the imaging files you will be able to retrieve afterwards.

The individual components of these configurations are spread over two different places in your dataset:

  1. a DataLad configuration within .datalad/config

  2. a provider configuration within .datalad/providers/

Both of these configurations are dataset-specific, i.e., configure only the behavior of this particular dataset, not other datasets you may have on your system.

The provider configuration

The first configuration created by datalad xnat-init is a so-called "provider configuration". Provider configurations are small, plain text files that configure how a specific service provider shall be accessed. You can find general information on them in the DataLad handbook.

The provider configuration created by datalad xnat-init lives in .datalad/providers/xnat-<name>.cfg, where <name> is a placeholder for an arbitrary identifier. The default file is .datalad/providers/xnat-default.cfg, but for datasets that track projects from multiple different XNAT instances, the identifier allows you to differentiate between them.

We can take a look into an exemplary configuration file:

[provider:xnat-default]
url_re = https://xnat.kube.fz-juelich.de/.*
credential = xnat.kube.fz-juelich.de
authentication_type = http_basic_auth

[credential:xnat.kube.fz-juelich.de]
type = user_password

It specifies the URL of the XNAT instance supplied during datalad xnat-init, and determines the credential type (such as authentication with a user name and a password) and the authentication type (such as HTTP Basic Authentication).
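To illustrate how such a file ties a URL pattern to a credential, the sketch below reads the example configuration with Python's configparser and looks up the provider whose url_re matches a given URL. This is only an illustration of the file's structure, not DataLad's actual provider lookup; the function name and the URL path /data/projects are hypothetical.

```python
import configparser
import re

# The example provider configuration shown above.
PROVIDER_CFG = """\
[provider:xnat-default]
url_re = https://xnat.kube.fz-juelich.de/.*
credential = xnat.kube.fz-juelich.de
authentication_type = http_basic_auth

[credential:xnat.kube.fz-juelich.de]
type = user_password
"""


def matching_provider(cfg_text, url):
    """Return details of the first provider whose url_re matches url."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    for section in parser.sections():
        if not section.startswith("provider:"):
            continue
        if re.match(parser[section]["url_re"], url):
            cred = parser[section]["credential"]
            return {
                "provider": section.split(":", 1)[1],
                "authentication_type": parser[section]["authentication_type"],
                "credential": cred,
                # the credential section describes what kind of secret is needed
                "credential_type": parser[f"credential:{cred}"]["type"],
            }
    return None


info = matching_provider(
    PROVIDER_CFG, "https://xnat.kube.fz-juelich.de/data/projects"
)
```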

The DataLad configuration

The second configuration created by datalad xnat-init is a DataLad configuration that configures the dataset to use a given provider configuration. It is included in the .datalad/config file and looks like this:

[datalad "xnat.default"]
        url = https://xnat.kube.fz-juelich.de
        project = phantoms
        path = {subject}/{session}/{scan}/
        credential-name = xnat.kube.fz-juelich.de

Note how it identifies a provider configuration via its <name>, and how it includes the configuration about which XNAT project to track.
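The path entry is a format string. As a rough illustration of how it could map XNAT identifiers to directories, the sketch below expands such a string and, following the description of the -F/--pathfmt option in the command line reference, treats '//' as the place where a subdataset is inserted. The function and the identifiers sub01, ses01, and 1 are hypothetical; the extension's own implementation may differ.

```python
def expand_pathfmt(pathfmt, **ids):
    """Expand a path format string.

    Returns a tuple (subdataset, directory): subdataset is the path of the
    subdataset marked by '//', or None if the format contains no '//'.
    """
    if not pathfmt.endswith("/"):
        raise ValueError("path format must end with a slash ('/')")
    expanded = pathfmt.format(**ids)
    if "//" in expanded:
        # everything before the first '//' is the subdataset location
        subdataset, _, rest = expanded.partition("//")
        return subdataset, f"{subdataset}/{rest}"
    return None, expanded


plain = expand_pathfmt(
    "{subject}/{session}/{scan}/", subject="sub01", session="ses01", scan="1"
)
nested = expand_pathfmt(
    "{subject}/{session}//{scan}/", subject="sub01", session="ses01", scan="1"
)
```

In this sketch, the default format yields a single directory tree without subdatasets, while the second format places one subdataset at the session level.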

Contributing

If you have any questions, comments, bug fixes or improvement suggestions, feel free to contact us via our GitHub page. Before contributing, be sure to read the contributing guidelines.

Acknowledgments

DataLad development is being performed as part of a US-German collaboration in computational neuroscience (CRCNS) project "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1912266) and the German Federal Ministry of Education and Research (BMBF 01GQ1905). Additional support is provided by the German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences, Imaging Platform.

DataLad is built atop the git-annex software that is being developed and maintained by Joey Hess.

The extension was created during the Juelich Brain Hackathon 2021 and wouldn't have been possible without a dedicated team of volunteers.

Glossary

credentials

Something used to authenticate to a server, typically a user name and a password. datalad-xnat may prompt you for your credentials or use ones that have been saved previously (in the DataLad configuration or the system keyring, see the DataLad docs on credential management). Some XNAT servers allow anonymous access (without checking credentials).

experiment

In XNAT terms, an experiment is an event by which data is acquired. This data can be imaging data, or non-imaging data. It exists within the context of a project, but can be registered into multiple projects. Most experiments will be imaging sessions.

project

A project is used to define a collection of data stored in XNAT. These often correlate directly to an IRB approved study, or a multi-site data acquisition program. Within XNAT, the project is used to define a security structure for data. Users are given certain permissions for data within certain projects -- thus, as a user you may not have permissions for all projects on a given XNAT instance.

session

In XNAT, an image session is a specific kind of an experiment which contains image data. A session groups together multiple scans, where a scan corresponds to a DICOM series or BIDS data acquisition / run.

subdataset

A DataLad dataset contained within a different DataLad dataset (the parent or DataLad superdataset).

subject

A subject is anyone who participates in a study, and exists within the context of a project. Subjects can be registered in multiple projects (e.g., to capture longitudinal data from various studies).

superdataset

A DataLad dataset that contains one or more levels of other DataLad datasets (DataLad subdatasets).

XNAT

XNAT is an open source imaging informatics platform developed by the Neuroinformatics Research Group at Washington University. It facilitates common management, productivity, and quality assurance tasks for imaging and associated data. Imaging centers can operate an XNAT instance to manage their imaging acquisitions. Typically, they require a user name and password to gain access.

Walk-through ConnectomeDB

This walk-through tutorial shows how to obtain the subjects in a project of the ConnectomeDB. It is assumed that you have a working installation of datalad-xnat and that you have accepted the data user agreement in ConnectomeDB.

Create a DataLad dataset. Here we'll call the dataset hcp.

datalad create hcp

Move into the hcp dataset.

cd hcp

Initialize the dataset to track a project in ConnectomeDB. Without --project, xnat-init lists all the projects available in ConnectomeDB. We will use the project WU_L1A_Subj.

datalad xnat-init https://db.humanconnectome.org --project WU_L1A_Subj

If a dataset was already initialized before, you will need to force the initialization.

datalad xnat-init https://db.humanconnectome.org --project MGH_DIFF --force

Obtain all subjects within the project.

datalad xnat-update

Download all data that belongs to a subject, here subject ConnectomeDB_S01439. Make sure that you have enough free space on your disk.

datalad xnat-update -s ConnectomeDB_S01439

Now the data should start to download.
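The steps of this walk-through can also be scripted. The sketch below only assembles the commands shown above and runs them with Python's subprocess module; the function names are hypothetical, and running it assumes that datalad and datalad-xnat are installed and that credentials for ConnectomeDB are available.

```python
import subprocess


def walkthrough_commands(dataset, url, project, subject):
    """Return (argv, working directory) pairs for the walk-through steps."""
    return [
        # create the dataset in the current directory
        (["datalad", "create", dataset], "."),
        # the remaining commands run inside the new dataset
        (["datalad", "xnat-init", url, "--project", project], dataset),
        (["datalad", "xnat-update"], dataset),
        (["datalad", "xnat-update", "-s", subject], dataset),
    ]


def run_walkthrough(dataset, url, project, subject):
    """Run all steps in order and stop at the first failing command."""
    for argv, cwd in walkthrough_commands(dataset, url, project, subject):
        subprocess.run(argv, cwd=cwd, check=True)


commands = walkthrough_commands(
    "hcp", "https://db.humanconnectome.org", "WU_L1A_Subj", "ConnectomeDB_S01439"
)
```

Calling run_walkthrough with the same arguments would execute the four commands; because of check=True, a failed authentication stops the script before any download is attempted.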

Command line reference

datalad-xnat has three main commands: xnat-init for configuring a dataset to track XNAT projects, xnat-update for updating and retrieving files from tracked XNAT projects, and xnat-query-files for querying available files on an XNAT server. Find out more about each command below.

datalad xnat-init

Synopsis
datalad xnat-init [-h] [-F PATHFMT] [-p ID] [-s ID] [-e ID] [-c LABEL] [--credential NAME] [-f] [--interactive] [-d DATASET] [--version] url
Description

Initialize an existing dataset to track an XNAT project

Examples

Initialize a dataset in the current directory:

% datalad xnat-init http://central.xnat.org:8080

Initialize with anonymous access (no credentials used):

% datalad xnat-init https://central.xnat.org --credential anonymous

Use credentials previously stored as <NAME>:

% datalad xnat-init https://central.xnat.org --credential <NAME>

Track a specific XNAT project, without credentials:

% datalad xnat-init https://central.xnat.org --project Sample_DICOM --credential anonymous
Options
url

XNAT instance URL.

-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-F PATHFMT, --pathfmt PATHFMT

Specify the directory structure for the downloaded files, and if/where a subdataset should be created. The format string must use POSIX notation and must end with a slash ('/'). To include the subject, session, or scan values, use the following format: {subject}/{session}/{scan}/ To insert a subdataset at a specific directory level use '//': {subject}/{session}//{scan}/. [Default: '{subject}/{session}/{scan}/']

-p ID, --project ID

accession ID of a single XNAT project to track.

-s ID, --subject ID

accession ID of a single subject to track.

-e ID, --experiment ID

accession ID of a single experiment to track.

-c LABEL, --collection LABEL

limit updates to a specific collection/resource. Can be given multiple times.

--credential NAME

name of the credential providing a user/password combination to be used for authentication. The special value 'anonymous' will cause no credentials to be used, and all XNAT requests to be performed anonymously. The credential can be supplied via configuration settings 'datalad.credential.<name>.{user|password}', or environment variables DATALAD_CREDENTIAL_<NAME>_{USER|PASSWORD}, or will be queried from the active credential store using the provided name. If none is provided, the host-part of the XNAT URL is used as a name (e.g. 'https://central.xnat.org' -> 'central.xnat.org'). Constraints: value must be a string or value must be NONE

-f, --force

force (re-)initialization.

--interactive

enables interactive configuration based on XNAT queries. Default: enabled in interactive sessions.

-d DATASET, --dataset DATASET

specify the dataset to perform the initialization on. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path) or value must be NONE

--version

show the module and its version which provides the command

Authors

datalad is developed by Michael Hanke <michael.hanke@gmail.com>.

datalad xnat-query-files

Synopsis
datalad xnat-query-files [-h] [-p ID] [-e ID] [-s ID] [--credential NAME] [--version] url
Description

Query an XNAT server for projects, or an XNAT project for subjects

Use this command to get a list of available projects at an XNAT instance for a given URL, or to get a list of subjects inside a specific project at the given XNAT instance.

Examples

Get a list of projects for a given XNAT instance:

% datalad xnat-query-files http://central.xnat.org:8080

Get a list of subjects for a given XNAT project:

% datalad xnat-query-files http://central.xnat.org:8080 -p myproject
Options
url

XNAT instance URL to query.

-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-p ID, --project ID

accession ID of a single XNAT project to track.

-e ID, --experiment ID

accession ID of a single experiment to track.

-s ID, --subject ID

accession ID of a single subject to track.

--credential NAME

name of the credential providing a user/password combination to be used for authentication. The special value 'anonymous' will cause no credentials to be used, and all XNAT requests to be performed anonymously. The credential can be supplied via configuration settings 'datalad.credential.<name>.{user|password}', or environment variables DATALAD_CREDENTIAL_<NAME>_{USER|PASSWORD}, or will be queried from the active credential store using the provided name. If none is provided, the host-part of the XNAT URL is used as a name (e.g. 'https://central.xnat.org' -> 'central.xnat.org'). Constraints: value must be a string or value must be NONE

--version

show the module and its version which provides the command

Authors

datalad is developed by Michael Hanke <michael.hanke@gmail.com>.

datalad xnat-update

Synopsis
datalad xnat-update [-h] [-p ID] [-s ID] [-e ID] [-c LABEL] [--credential NAME] [-f] [--reckless fast] [--ifexists {overwrite|skip}] [-J NJOBS] [-d DATASET] [--version]
Description

Update files for one or more subjects of an XNAT project.

This command expects an xnat-init initialized DataLad dataset. The dataset may or may not have existing content already.

Options
-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-p ID, --project ID

accession ID of a single XNAT project to track.

-s ID, --subject ID

accession ID of a single subject to track.

-e ID, --experiment ID

accession ID of a single experiment to track.

-c LABEL, --collection LABEL

limit updates to a specific collection/resource. Can be given multiple times.

--credential NAME

name of the credential providing a user/password combination to be used for authentication. The special value 'anonymous' will cause no credentials to be used, and all XNAT requests to be performed anonymously. The credential can be supplied via configuration settings 'datalad.credential.<name>.{user|password}', or environment variables DATALAD_CREDENTIAL_<NAME>_{USER|PASSWORD}, or will be queried from the active credential store using the provided name. If none is provided, the host-part of the XNAT URL is used as a name (e.g. 'https://central.xnat.org' -> 'central.xnat.org'). Constraints: value must be a string or value must be NONE

-f, --force

force (re-)building the addurl tables.

--reckless fast

Update the files in a potentially unsafe way. Supported modes are: ["fast"]: No content verification or download. Will only register the urls. Constraints: value must be one of ('fast',)

--ifexists {overwrite|skip}

Flag for addurls. Constraints: value must be one of ('overwrite', 'skip')

-J NJOBS, --jobs NJOBS

how many parallel jobs (where possible) to use. "auto" corresponds to the number defined by the 'datalad.runtime.max-annex-jobs' configuration item. NOTE: This option can only parallelize input retrieval (get) and output recording (save). DataLad does NOT parallelize your scripts for you. Constraints: value must be convertible to type 'int' or value must be NONE or value must be one of ('auto',) [Default: 'auto']

-d DATASET, --dataset DATASET

specify the dataset to perform the update on. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path) or value must be NONE

--version

show the module and its version which provides the command

Authors

datalad is developed by Michael Hanke <michael.hanke@gmail.com>.

Python API

datalad-xnat has three main commands that are exposed as functions via datalad.api and as methods of the Dataset class: xnat_init for configuring a dataset to track XNAT projects, xnat_update for updating and retrieving files from tracked XNAT projects, and xnat_query_files for querying available files on an XNAT server. Find out more about each command below.

xnat_init(url[, pathfmt, project, subject, ...])

Initialize an existing dataset to track an XNAT project

xnat_query_files(url[, project, experiment, ...])

Query an XNAT server for projects, or an XNAT project for subjects

xnat_update([project, subject, experiment, ...])

Update files for one or more subjects of an XNAT project.
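As a hedged sketch of how these functions might be combined, the example below wraps them in two small helpers. It assumes that the keyword arguments project, subject, and credential correspond to the command line options of the same name, and that the functions are available as Dataset methods once datalad-xnat is installed; the helper names are hypothetical.

```python
def track_xnat_project(path, url, project, credential=None):
    """Create a dataset at path and initialize it to track an XNAT project."""
    # Imported lazily so this module loads even where DataLad is missing.
    import datalad.api as dl

    ds = dl.create(path=path)
    # Assumption: keyword names mirror the --project/--credential options.
    ds.xnat_init(url, project=project, credential=credential)
    return ds


def fetch_subject(ds, subject, credential=None):
    """Download the files of one subject into an initialized dataset."""
    # Assumption: keyword names mirror the --subject/--credential options.
    return ds.xnat_update(subject=subject, credential=credential)
```

A call such as track_xnat_project("dcm_phantom", "https://central.xnat.org", "DCMPHANTOM", credential="anonymous"), followed by fetch_subject(ds, "CENTRAL_S01742", credential="anonymous"), would correspond to the command line steps in the Quickstart.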
