Configuration

Datalad uses the same configuration mechanism and syntax as Git itself. Consequently, datalad can be configured using the git config command. Both a global user configuration (typically at ~/.gitconfig) and a local repository-specific configuration (.git/config) are inspected.
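
Because the syntax is Git's, a dotted variable name corresponds to a section, an optional subsection, and a key in the configuration file. The following is a minimal sketch of that correspondence in Python. The helper names split_config_name and render_config_entry are hypothetical and not part of datalad, and the value debug is only illustrative.

```python
def split_config_name(name):
    """Split a dotted Git config name into (section, subsection, key).

    Git treats the first component as the section, the last as the key,
    and everything in between (if anything) as the subsection.
    """
    section, _, rest = name.partition(".")
    subsection, _, key = rest.rpartition(".")
    return section, subsection or None, key


def render_config_entry(name, value):
    """Render one setting the way it would appear in a Git config file."""
    section, subsection, key = split_config_name(name)
    if subsection:
        header = f'[{section} "{subsection}"]'
    else:
        header = f"[{section}]"
    return f"{header}\n\t{key} = {value}"


print(render_config_entry("datalad.log.level", "debug"))
```

Under these assumptions, the printed entry has the same shape as what running git config datalad.log.level debug would write to .git/config.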

In addition, datalad supports a persistent dataset-specific configuration. This configuration is stored at .datalad/config in any dataset. As it is part of a dataset, settings stored there will also be in effect for any consumer of such a dataset. Both global and local settings on a particular machine always override configuration shipped with a dataset.
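
The override order described above can be sketched as a layered lookup. This is an illustration only, not datalad's implementation: the function name effective_config and the sample values are hypothetical, and the relative order of local over global follows Git's usual behavior rather than anything stated here.

```python
from collections import ChainMap


def effective_config(dataset, global_, local):
    """Combine three config layers into one read-only view.

    ChainMap searches its mappings from left to right, so a key found in
    `local` or `global_` hides the same key shipped in `dataset`.
    """
    return ChainMap(local, global_, dataset)


# Illustrative values only.
shipped = {"datalad.log.level": "debug", "datalad.crawl.default_backend": "MD5E"}
user = {"datalad.log.level": "info"}
repo = {}

cfg = effective_config(shipped, user, repo)
print(cfg["datalad.log.level"])              # taken from the user's global config
print(cfg["datalad.crawl.default_backend"])  # falls through to the dataset config
```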

All datalad-specific configuration variables are prefixed with datalad..

It is possible to override or amend the configuration using environment variables. Any environment variable whose name starts with DATALAD_ is made available as the corresponding datalad. configuration variable: every _ in the name is replaced with a dot, and all letters are converted to lower case (for example, DATALAD_LOG_LEVEL becomes datalad.log.level). Values from environment variables take precedence over configuration file settings.
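
The name mapping can be sketched in a few lines of Python. This only illustrates the rule as stated; the function names env_to_config_name and config_from_environ are hypothetical and not part of datalad.

```python
PREFIX = "DATALAD_"


def env_to_config_name(env_name):
    """Return the datalad. config name for an environment variable.

    Returns None for variables that do not start with DATALAD_.
    """
    if not env_name.startswith(PREFIX):
        return None
    # Replace every underscore with a dot, then lower-case everything.
    return env_name.replace("_", ".").lower()


def config_from_environ(environ):
    """Collect all DATALAD_* variables from a mapping such as os.environ."""
    return {
        env_to_config_name(name): value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }


print(env_to_config_name("DATALAD_TESTS_TEMP_DIR"))
```

Passing os.environ to config_from_environ would give the set of overrides that apply on top of the configuration files.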

The following sections provide a (non-exhaustive) list of settings honored by datalad. They are categorized according to the scope they are typically associated with.

Global user configuration

datalad.crawl.init_direct
Default annex repository mode: Should datasets be initialized in direct mode?
datalad.crawl.pipeline.housekeeping

Crawler pipeline housekeeping: Should the crawler tidy up datasets (git gc, repack, clean)?

[value must be convertible to type bool]

datalad.externals.nda.dbserver
NDA database server: Hostname of the database server
datalad.locations.cache
Cache directory: Where should datalad cache files? Default: ~/.cache/datalad

Local repository configuration

datalad.crawl.cache

Crawler download caching: Should the crawler cache downloaded files?

[value must be convertible to type bool]

datalad.crawl.dryrun

Crawler dry-run: Should the crawler ... I AM NOT QUITE SURE WHAT?

[value must be convertible to type bool]

Sticky dataset configuration

datalad.crawl.default_backend
Default annex backend: Content hashing method to be used by git-annex

Miscellaneous configuration

datalad.cmd.protocol
Specifies the protocol used by the Runner to record the call times of shell commands or Python functions; it also allows for dry runs. Valid values are “externals-time” for ExecutionTimeExternalsProtocol, “time” for ExecutionTimeProtocol, and “null” for NullProtocol. Any new protocol selected via DATALAD_CMD_PROTOCOL has to implement datalad.support.protocol.ProtocolInterface.
datalad.cmd.protocol.prefix
Sets a prefix that is added before the command call times noted by DATALAD_CMD_PROTOCOL.
datalad.exc.str.tblimit
Used by the datalad extract_tb function, which extracts and formats stack traces. It caps the number of lines of pre-processed traceback entries at DATALAD_EXC_STR_TBLIMIT.
datalad.log.level
Controls the verbosity of logs printed to stdout while running datalad commands or debugging.
datalad.log.name
Include the name of the log target in the log line.
datalad.log.names
Comma-separated list of names to print log lines for.
datalad.log.namesre
Regular expression selecting the names to print log lines for.
datalad.log.outputs
Controls whether both stdout and stderr of external command executions are logged in detail (at DEBUG level).
datalad.log.timestamp

Used to add a timestamp to datalad logs. Default: False

[value must be convertible to type bool]

datalad.log.traceback
If this flag is set to “collide”, the TraceBack function is run with collide set to True. This replaces any common prefix between the current traceback log and the previous invocation with “...”.
datalad.repo.direct

Direct Mode for git-annex repositories: Set this flag to create annex repositories in direct mode by default

[value must be convertible to type bool]

datalad.repo.version

git-annex repository version: Specifies the repository version for git-annex to be used by default

[value must be convertible to type int]

datalad.tests.dataladremote

Binary flag specifying whether each annex repository should get the datalad special remote in every test repository.

[value must be convertible to type bool]

datalad.tests.nonetwork

Skips network tests completely if this flag is set. Examples include tests for s3, git_repositories, openfmri, etc.

[value must be convertible to type bool]

datalad.tests.nonlo
Specifies network interfaces to bring down/up for testing. Currently used by Travis.
datalad.tests.noteardown

If this flag is set, teardown_package, which cleans up temporary files and directories created by tests, is not executed.

[value must be convertible to type bool]

datalad.tests.protocolremote

Binary flag specifying whether to test protocol interactions of the custom remote with annex.

[value must be convertible to type bool]

datalad.tests.runcmdline

Binary flag specifying whether shell testing using shunit2 is to be carried out.

[value must be convertible to type bool]

datalad.tests.ssh

Skips SSH tests if this flag is not set.

[value must be convertible to type bool]

datalad.tests.temp.dir
Creates a temporary directory at the location specified by this flag. It is used by tests to create a temporary git directory while testing git annex archives, etc.
datalad.tests.temp.fs
Specifies the temporary file system to use as a loop device for testing DATALAD_TESTS_TEMP_DIR creation.
datalad.tests.temp.fssize
Specifies the size of the temporary file system to use as a loop device for testing DATALAD_TESTS_TEMP_DIR creation.
datalad.tests.temp.keep

If this flag is set, the rmtemp function will not remove temporary files or directories created for testing.

[value must be convertible to type bool]

datalad.tests.ui.backend
Tests UI backend: Which UI backend to use? Default: tests-noninteractive
datalad.tests.usecassette
Specifies the location of the file used by the VCR module to record network transactions. Currently used when testing custom special remotes.