Concepts & Terms
The extension operates on a single UK Biobank subject and acts as a wrapper
around the ukbfetch tool to retrieve, ingest, restructure, and update data.
The first command, ukb-init, initializes a dataset for a given UK Biobank
participant and data field(s). The second command, ukb-update, updates the
dataset for the subject and data field(s) it was initialized with.
datalad-ukbiobank allows only one subject per dataset. Tailored or
comprehensive superdatasets can then be created to link the desired subject
datasets as subdatasets. This structure keeps each dataset lightweight and
makes it easy to download multiple subjects in parallel.
Data can be viewed in different layouts by checking out layout-specific branches:
- the unextracted archives, as downloaded from the UK Biobank (e.g. zip files)
- the extracted files in the original layout provided by the UK Biobank
- if enabled, the extracted files converted to a BIDS(-like) layout
The required bulk file lists all participant IDs and data field IDs that are
available for download for an approved application. These participant IDs and
data field IDs are then used as input for the
To generate a bulk file, follow the UK Biobank accessing data guide: first download the main dataset, then generate the bulk file from it. Section 3.2.2 of that guide explains how to create modality-specific bulk files (e.g. the participant IDs of everyone with T1 structural brain images).
Once a bulk file is created, it can be parsed to extract the desired participant
and data field IDs for download with
Snippet of a bulk file:
    1002532 20227_2_0
    1002532 20227_3_0
    1002532 20249_2_0
    1002532 20249_3_0
    1002532 20250_2_0
    1002532 20250_3_0
    1003339 20251_2_0
    1003339 20251_3_0
    1003339 20252_2_0
    1003339 20252_3_0
    1003339 20253_2_0
    1003339 20253_3_0
- Participant ID: unique to each application/project (e.g. 1002532).
- Data field ID: has the form <data field>_<instance index>_<array index> and indicates the data type (e.g. 20227 = NIfTI resting-state functional image), the instance index (e.g. 2 = first imaging visit), and the array index (e.g. 0). The instance index distinguishes data that were gathered at different times (sessions); the array index indicates whether multiple pieces of data were gathered at the same time. These fields are explained in more detail in section 2.8 of the UK Biobank accessing data guide.
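Because each bulk-file line is just a participant ID followed by a data field ID, parsing it takes only a few lines. The Python sketch below is not part of datalad-ukbiobank, and all class and function names in it (`BulkRecord`, `parse_bulk_file`, `group_by_participant`) are hypothetical. It assumes whitespace-separated, two-column lines as in the snippet above, splits each data field ID into data type, instance index, and array index, and groups field IDs by participant, mirroring the one-subject-per-dataset structure.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class BulkRecord(NamedTuple):
    """One bulk-file entry, with its data field ID split into parts."""
    participant: str  # participant ID, e.g. "1002532"
    field_id: str     # full data field ID, e.g. "20227_2_0"
    field: int        # data type, e.g. 20227
    instance: int     # instance index (visit/session), e.g. 2
    array: int        # array index, e.g. 0


def parse_field_id(field_id: str) -> tuple:
    """Split '<field>_<instance>_<array>' into three integers."""
    field, instance, array = field_id.split("_")
    return int(field), int(instance), int(array)


def parse_bulk_file(text: str) -> list:
    """Parse bulk-file text, assuming one '<participant> <field_id>' pair per line."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        participant, field_id = line.split()  # raises ValueError on malformed lines
        records.append(BulkRecord(participant, field_id, *parse_field_id(field_id)))
    return records


def group_by_participant(records: Iterable[BulkRecord],
                         fields: Optional[set] = None) -> dict:
    """Map participant ID -> list of field IDs, optionally keeping only some data types."""
    grouped = defaultdict(list)
    for rec in records:
        if fields is None or rec.field in fields:
            grouped[rec.participant].append(rec.field_id)
    return dict(grouped)


if __name__ == "__main__":
    snippet = "1002532 20227_2_0\n1002532 20227_3_0\n1002532 20249_2_0\n1003339 20251_2_0\n"
    records = parse_bulk_file(snippet)
    print(records[1].instance)  # 3
    print(group_by_participant(records, fields={20227}))
    # {'1002532': ['20227_2_0', '20227_3_0']}
```

Filtering on the integer data type (the `fields` argument) is one way to build a modality-specific download list from a general bulk file; each key of the resulting dictionary corresponds to one subject dataset.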
datalad-ukbiobank downloads data with the
ukbfetch tool (which must be
The UK Biobank allows multiple downloads in parallel, but limits each application to 10 concurrent downloads.
If you already have UK Biobank archives downloaded and want to use
datalad-ukbiobank without re-downloading everything, you can replace
ukbfetch with a script
that obtains the relevant files from where they are located.
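A minimal Python sketch of the lookup such a replacement script might perform. It assumes your existing archives sit in one directory and are named `<participant>_<field_id>.zip` (for example `1002532_20227_2_0.zip`); that naming scheme and the function name `fetch_from_local` are assumptions for illustration. The sketch does not reproduce ukbfetch's command-line interface: a real drop-in replacement has to accept the arguments that datalad-ukbiobank passes to ukbfetch, so check the extension's source before wiring this up.

```python
import shutil
from pathlib import Path


def fetch_from_local(participant: str, field_id: str,
                     archive_dir, dest_dir) -> Path:
    """Copy an already-downloaded archive into dest_dir instead of downloading it.

    Assumes (hypothetically) that archives are stored flat in archive_dir and
    named '<participant>_<field_id>.zip'.
    """
    name = f"{participant}_{field_id}.zip"
    source = Path(archive_dir) / name
    if not source.is_file():
        # Fail loudly so a missing archive is not silently skipped
        raise FileNotFoundError(f"no local archive for {participant} {field_id}: {source}")
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    shutil.copy2(source, target)  # copies contents and file metadata; original stays untouched
    return target
```

For example, `fetch_from_local("1002532", "20227_2_0", "/data/ukb_archives", ".")` would copy the matching archive from a hypothetical `/data/ukb_archives` directory into the current directory. Copying rather than moving leaves your existing archive collection intact.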