Configuration
DataLad uses the same configuration mechanism and syntax as Git itself.
Consequently, datalad can be configured using the git config command.
Both a global user configuration (typically at ~/.gitconfig) and a local
repository-specific configuration (.git/config) are inspected.
In addition, datalad supports a persistent dataset-specific configuration.
This configuration is stored at .datalad/config in any dataset. As it
is part of a dataset, settings stored there will also be in effect for any
consumer of such a dataset. Both global and local settings on a particular
machine always override configuration shipped with a dataset.
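The precedence described above can be pictured as a layered lookup. The following is a minimal sketch, not DataLad's implementation: the three dictionaries and their values are hypothetical stand-ins for the settings found in each scope, and placing the local scope ahead of the global one follows Git's usual behavior, which is an assumption here because the text only states that both override the dataset configuration.

```python
from collections import ChainMap

# Hypothetical settings as they might appear in each scope.
dataset_cfg = {"datalad.log.level": "info", "datalad.repo.backend": "SHA256E"}
global_cfg = {"datalad.log.level": "warning"}   # e.g. from ~/.gitconfig
local_cfg = {"datalad.log.level": "debug"}      # e.g. from .git/config

# ChainMap consults its mappings from left to right, so scopes listed
# first win. Local before global is Git's usual order (assumption).
effective = ChainMap(local_cfg, global_cfg, dataset_cfg)

print(effective["datalad.log.level"])     # debug
print(effective["datalad.repo.backend"])  # SHA256E
```

A setting that only the dataset provides stays in effect, while any machine-level value for the same name replaces it.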
All datalad-specific configuration variables are prefixed with “datalad.” (including the dot).
It is possible to override or amend the configuration using environment
variables. Any variable with a name that starts with DATALAD_ will
be available as the corresponding datalad. configuration variable,
replacing any __ (two underscores) with a hyphen, then any _
(single underscore) with a dot, and finally converting all letters to
lower case. Values from environment variables take precedence over
configuration file settings.
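The name translation described above can be sketched in a few lines of Python. This is an illustration of the stated rules only; the function name env_to_config_name is hypothetical and not part of DataLad's API.

```python
def env_to_config_name(env_name: str) -> str:
    """Translate a DATALAD_* environment variable name into a config name."""
    if not env_name.startswith("DATALAD_"):
        raise ValueError(f"not a DataLad variable: {env_name!r}")
    # Order matters: double underscores become hyphens first, then the
    # remaining single underscores become dots, then everything is lowercased.
    return env_name.replace("__", "-").replace("_", ".").lower()


print(env_to_config_name("DATALAD_LOG_LEVEL"))          # datalad.log.level
print(env_to_config_name("DATALAD_RUNTIME_MAX__JOBS"))  # datalad.runtime.max-jobs
```

Because every single underscore becomes a dot, an option name that itself contains an underscore cannot be expressed through this scheme.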
In addition, the DATALAD_CONFIG_OVERRIDES_JSON environment variable can
be set to a JSON record with configuration values. This is
particularly useful for options that aren’t accessible through the
naming scheme described above (e.g., an option name that includes an
underscore).
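As a rough sketch, such a JSON record could be prepared in Python before launching a datalad process. The option name datalad.example.option_name is hypothetical and only chosen because it contains an underscore; the environment is built as a copy so the current process is left untouched.

```python
import json
import os

# Hypothetical option whose name contains an underscore and therefore
# cannot be expressed via the DATALAD_* naming scheme.
overrides = {"datalad.example.option_name": "some value"}

# Build an environment for a child process; the JSON record maps
# configuration names to their values.
env = dict(os.environ)
env["DATALAD_CONFIG_OVERRIDES_JSON"] = json.dumps(overrides)

print(env["DATALAD_CONFIG_OVERRIDES_JSON"])
```

The resulting mapping could then be passed as the env argument of subprocess.run when starting a datalad command.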
The following sections provide a (non-exhaustive) list of settings honored by datalad. They are categorized according to the scope they are typically associated with.
Global user configuration
- datalad.clone.url-substitute.github
GitHub URL substitution rule: Mangling for GitHub-related URLs. A substitution specification is a string with a match and substitution expression, each following Python’s regular expression syntax. Both expressions are concatenated to a single string with an arbitrary delimiter character. The delimiter is defined by prefixing the string with the delimiter. Prefix and delimiter are stripped from the expressions (Example: “,^http://(.*)$,https://\1”). This setting can be defined multiple times. Substitutions will be applied incrementally, in order of their definition. The first substitution in such a series must match, otherwise no further substitutions in a series will be considered. However, following the first match all further substitutions in a series are processed, regardless of whether intermediate expressions match or not. Default: (‘,https?://github.com/([^/]+)/(.*)$,\1###\2’, ‘,[/\\]+(?!$),-’, ‘,\s+|(%2520)+|(%20)+,_’, ‘,([^#]+)###(.*),https://github.com/\1/\2’)
- datalad.clone.url-substitute.osf
Open Science Framework URL substitution rule: Mangling for OSF-related URLs. A substitution specification is a string with a match and substitution expression, each following Python’s regular expression syntax. Both expressions are concatenated to a single string with an arbitrary delimiter character. The delimiter is defined by prefixing the string with the delimiter. Prefix and delimiter are stripped from the expressions (Example: “,^http://(.*)$,https://\1”). This setting can be defined multiple times. Substitutions will be applied incrementally, in order of their definition. The first substitution in such a series must match, otherwise no further substitutions in a series will be considered. However, following the first match all further substitutions in a series are processed, regardless of whether intermediate expressions match or not. Default: (‘,^https://osf.io/([^/]+)[/]*$,osf://\1’,)
- datalad.extensions.load
DataLad extension packages to load: Indicate which extension packages should be loaded unconditionally on CLI startup or on importing ‘datalad.[core]api’. This enables the respective extensions to customize DataLad with functionality and configurability outside the scope of extension commands. For merely running extension commands it is not necessary to load them specifically. Default: None
- datalad.externals.nda.dbserver
NDA database server: Hostname of the database server Default: https://nda.nih.gov/DataManager/dataManager
- datalad.locations.cache
Cache directory: Where should datalad cache files? Default: ~/.cache/datalad
- datalad.locations.default-dataset
Default dataset path: Where should datalad look for (or install) a default dataset? Default: ~/datalad
- datalad.locations.extra-procedures
Extra procedure directory: Where should datalad search for some additional procedures?
- datalad.locations.locks
Lockfile directory: Where should datalad store lock files? Default: ~/.cache/datalad/locks
- datalad.locations.sockets
Socket directory: Where should datalad store socket files? Default: ~/.cache/datalad/sockets
- datalad.locations.system-procedures
System procedure directory: Where should datalad search for system procedures? Default: /etc/xdg/datalad/procedures
- datalad.locations.user-procedures
User procedure directory: Where should datalad search for user procedures? Default: ~/.config/datalad/procedures
- datalad.ssh.executable
Name of ssh executable for ‘datalad sshrun’: Specifies the name of the ssh-client executable that datalad will use. This might be an absolute path. On Windows systems it is currently by default set to point to the ssh executable of OpenSSH for Windows, if OpenSSH for Windows is installed. On other systems it defaults to ‘ssh’. Default: ssh
[value must be a string]
- datalad.ssh.identityfile
If set, pass this file as ssh’s -i option.: Default: None
- datalad.ssh.multiplex-connections
Whether to use a single shared connection for multiple SSH processes aiming at the same target.: Default: True
[value must be convertible to type bool]
- datalad.ssh.try-use-annex-bundled-git
Whether to attempt adjusting the PATH in a remote shell to include Git binaries located in a detected git-annex bundle: If enabled, this will be a ‘best-effort’ attempt that only supports remote hosts with a Bourne shell and the which command available. The remote PATH must already contain a git-annex installation. If git-annex is not found, or the detected git-annex does not have a bundled Git installation, detection failure will not result in an error, but only slow remote execution by one-time sensing overhead per each opened connection. Default: False
[value must be convertible to type bool]
- datalad.tests.cache
Cache directory for tests: Where should datalad cache test files? Default: ~/.cache/datalad/tests
- datalad.tests.credentials
Credentials to use during tests: Which credentials should be available while running tests? If “plaintext” (default), a new plaintext keyring is created in the tests’ temporary HOME. If “system”, no custom configuration is passed to keyring, and credentials known to the system can be used. Default: plaintext
[value must be one of (‘plaintext’, ‘system’)]
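The substitution rules described for the datalad.clone.url-substitute.* settings can be illustrated with a small Python sketch. It follows the description given above (the first character is the delimiter, the first rule of a series must match, later rules are always processed) and is not DataLad's actual implementation; the function name apply_substitutions and the example URLs are hypothetical. GITHUB_RULES reproduces the default listed for datalad.clone.url-substitute.github.

```python
import re


def apply_substitutions(url, specs):
    """Apply one series of substitution specifications to a URL."""
    result = url
    for index, spec in enumerate(specs):
        # The first character defines the delimiter that separates the
        # match expression from the substitution expression.
        delimiter = spec[0]
        match_expr, subst_expr = spec[1:].split(delimiter)
        result, count = re.subn(match_expr, subst_expr, result)
        # The first rule of a series must match; otherwise the whole
        # series is skipped and the URL is returned unchanged.
        if index == 0 and count == 0:
            return url
    return result


GITHUB_RULES = (
    r",https?://github.com/([^/]+)/(.*)$,\1###\2",
    r",[/\\]+(?!$),-",
    r",\s+|(%2520)+|(%20)+,_",
    r",([^#]+)###(.*),https://github.com/\1/\2",
)

print(apply_substitutions("http://example.com/ds", [r",^http://(.*)$,https://\1"]))
# https://example.com/ds
print(apply_substitutions("https://github.com/myorg/my/nested repo", GITHUB_RULES))
# https://github.com/myorg/my-nested_repo
```

In the second call, the rules first split the URL into organization and identifier, replace the inner slash with a dash and the space with an underscore, and finally rebuild a functional project URL. A URL that the first rule does not match is returned as given.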
Local repository configuration
Sticky dataset configuration
- datalad.locations.dataset-procedures
Dataset procedure directory: Where should datalad search for dataset procedures (relative to a dataset root)? Default: .datalad/procedures
Miscellaneous configuration
- datalad.annex.retry
Value for annex.retry to use for git-annex calls: On transfer failure, annex.retry (sans “datalad.”) controls the number of times that git-annex retries. DataLad will call git-annex with annex.retry set to the value here unless annex.retry is explicitly configured. Default: 3
[value must be convertible to type ‘int’]
- datalad.credentials.force-ask
Force (re-)entry of credentials: Should DataLad prompt for credential (re-)entry? This can be used to update previously stored credentials. Default: False
[value must be convertible to type bool]
- datalad.credentials.githelper.noninteractive
Non-interactive mode for git-credential helper: Should git-credential-datalad operate in non-interactive mode? This would mean to not ask for user confirmation when storing new credentials/provider configs. Default: False
[bool]
- datalad.exc.str.tblimit
This flag is used by datalad to cap the number of traceback steps included in exception logging and result reporting to DATALAD_EXC_STR_TBLIMIT of pre-processed entries from traceback.:
- datalad.fake-dates-start
Initial fake date: When faking dates and there are no commits in any local branches, generate the date by adding one second to this value (Unix epoch time). The value must be positive. Default: 1112911993
[value must be convertible to type ‘int’]
- datalad.github.token-note
GitHub token note: Description for a Personal access token to generate. Default: DataLad
- datalad.install.inherit-local-origin
Inherit local origin of dataset source: If enabled, a local ‘origin’ remote of a local dataset clone source is configured as an ‘origin-2’ remote to make its annex automatically available. The process is repeated recursively for any further qualifying ‘origin’ dataset thereof. Note that if clone.defaultRemoteName is configured to use a name other than ‘origin’, that name will be used instead. Default: True
[value must be convertible to type bool]
- datalad.log.level
Used to control the verbosity of logs printed to stdout while running datalad commands/debugging:
- datalad.log.name
Include name of the log target in the log line:
- datalad.log.names
Which names (,-separated) to print log lines for:
- datalad.log.namesre
Regular expression for which names to print log lines for:
- datalad.log.outputs
Whether to log stdout and stderr for executed commands: When enabled, setting the log level to 5 should catch all execution output, though some output may be logged at higher levels Default: False
[value must be convertible to type bool]
- datalad.log.result-level
Log level for command result messages: If ‘match-status’, it will log ‘impossible’ results as a warning, ‘error’ results as errors, and everything else as ‘debug’. Otherwise the indicated log-level will be used for all such messages Default: debug
[value must be one of (‘debug’, ‘info’, ‘warning’, ‘error’, ‘match-status’)]
- datalad.log.timestamp
Used to add timestamp to datalad logs: Default: False
[value must be convertible to type bool]
- datalad.log.traceback
Includes a compact traceback in a log message, with generic components removed. This setting is only in effect when given as an environment variable DATALAD_LOG_TRACEBACK. An integer value specifies the maximum traceback depth to be considered. If set to “collide”, a common traceback prefix between a current traceback and a previously logged traceback is replaced with “…” (maximum depth 100).:
- datalad.repo.backend
git-annex backend: Backend to use when creating git-annex repositories Default: MD5E
- datalad.repo.direct
Direct Mode for git-annex repositories: Set this flag to create annex repositories in direct mode by default Default: False
[value must be convertible to type bool]
- datalad.repo.version
git-annex repository version: Specifies the repository version for git-annex to be used by default Default: 8
[value must be convertible to type ‘int’]
- datalad.runtime.max-annex-jobs
Maximum number of git-annex jobs to request when the “jobs” option is set to “auto” (default): Set this value to enable parallel annex jobs that may speed up certain operations (e.g. get file content). The effective number of jobs will not exceed the number of available CPU cores (or 3 if there are fewer than 3 cores). Default: 1
[value must be convertible to type ‘int’]
- datalad.runtime.max-batched
Maximum number of batched commands to run in parallel: Automatic cleanup of batched commands will try to keep at most this many commands running. Default: 20
[value must be convertible to type ‘int’]
- datalad.runtime.max-inactive-age
Maximum time (in seconds) a batched command can be inactive before it is eligible for cleanup: Automatic cleanup of batched commands will consider an inactive command eligible for cleanup if more than this many seconds have transpired since the command’s last activity. Default: 60
[value must be convertible to type ‘int’]
- datalad.runtime.max-jobs
Maximum number of jobs DataLad can run in “parallel”: Set this value to enable parallel multi-threaded DataLad jobs that may speed up certain operations, in particular operation across multiple datasets (e.g., install multiple subdatasets, etc). Default: 1
[value must be convertible to type ‘int’]
- datalad.runtime.pathspec-from-file
Provide list of files to git commands via --pathspec-from-file: Controls when DataLad provides the list of paths to ‘git’ commands that support the --pathspec-from-file option via a temporary file. If set to ‘multi-chunk’, this is done only if multiple invocations of the command on chunks of the file list are needed. If set to ‘always’, DataLad will always use --pathspec-from-file. Default: multi-chunk
[value must be one of (‘multi-chunk’, ‘always’)]
- datalad.runtime.raiseonerror
Error behavior: Set this flag to cause DataLad to raise an exception on errors that would otherwise just be logged. Default: False
[value must be convertible to type bool]
- datalad.runtime.report-status
Command line result reporting behavior: If set (to other than ‘all’), constrains command result report to records matching the given status. ‘success’ is a synonym for ‘ok’ OR ‘notneeded’, ‘failure’ stands for ‘impossible’ OR ‘error’ Default: None
[value must be one of (‘all’, ‘success’, ‘failure’, ‘ok’, ‘notneeded’, ‘impossible’, ‘error’)]
- datalad.runtime.stalled-external
Behavior for handling external processes: What to do with external processes if they do not finish in some minimal reasonable time. If “abandon”, datalad proceeds without waiting for the external process to exit. At the moment this applies only to batched git-annex processes. Should be changed with caution. Default: wait
[value must be one of (‘wait’, ‘abandon’)]
- datalad.save.no-message
Commit message handling: When no commit message was provided: attempt to obtain one interactively (interactive); or use a generic commit message (generic). NOTE: The interactive option is experimental. The behavior may change in backwards-incompatible ways. Default: generic
[value must be one of (‘interactive’, ‘generic’)]
- datalad.save.windows-compat-warning
Action when Windows-incompatible file names are saved: Certain characters or names can make file names incompatible with Windows. If such files are saved ‘warning’ will alert users with a log message, ‘error’ will yield an ‘impossible’ result, and ‘none’ will ignore the incompatibility. Default: warning
[value must be one of (‘warning’, ‘error’, ‘none’)]
- datalad.source.epoch
Datetime epoch to use for dates in built materials: Datetime to use for reproducible builds. Originally introduced for Debian packages to interface with SOURCE_DATE_EPOCH, described at https://reproducible-builds.org/docs/source-date-epoch/. By default, the current time is used. Default: 1734018945.878176
[value must be convertible to type ‘float’]
- datalad.tests.dataladremote
Binary flag to specify whether each annex repository should get datalad special remote in every test repository:
[value must be convertible to type bool]
- datalad.tests.knownfailures.probe
Probes tests that are known to fail on whether or not they are actually still failing: Default: False
[value must be convertible to type bool]
- datalad.tests.knownfailures.skip
Skips tests that are known to currently fail: Default: True
[value must be convertible to type bool]
- datalad.tests.nonetwork
Skips network tests completely if this flag is set. Examples include tests for S3, git_repositories, OpenfMRI, etc:
[value must be convertible to type bool]
- datalad.tests.nonlo
Specifies network interfaces to bring down/up for testing. Currently used by Travis CI.:
- datalad.tests.noteardown
Does not execute teardown_package which cleans up temp files and directories created by tests if this flag is set:
[value must be convertible to type bool]
- datalad.tests.runcmdline
Binary flag to specify whether shell testing using shunit2 is to be carried out:
[value must be convertible to type bool]
- datalad.tests.setup.testrepos
Pre-creates repositories for @with_testrepos within setup_package: Default: False
[value must be convertible to type bool]
- datalad.tests.ssh
Skips SSH tests if this flag is not set:
[value must be convertible to type bool]
- datalad.tests.temp.dir
Create a temporary directory at location specified by this flag. It is used by tests to create a temporary git directory while testing git annex archives etc: Default: None
[value must be a string]
- datalad.tests.temp.fs
Specify the temporary file system to use as loop device for testing DATALAD_TESTS_TEMP_DIR creation:
- datalad.tests.temp.fssize
Specify the size of temporary file system to use as loop device for testing DATALAD_TESTS_TEMP_DIR creation:
- datalad.tests.temp.keep
Function rmtemp will not remove temporary file/directory created for testing if this flag is set:
[value must be convertible to type bool]
- datalad.tests.ui.backend
Tests UI backend: Which UI backend to use Default: tests-noninteractive
- datalad.tests.usecassette
Specifies the location of the file to record network transactions by the VCR module. Currently used when testing custom special remotes:
- datalad.ui.color
Colored terminal output: Enable or disable ANSI color codes in outputs; “on” overrides NO_COLOR environment variable Default: auto
[value must be one of (‘on’, ‘off’, ‘auto’)]
- datalad.ui.progressbar
UI progress bars: Default backend for progress reporting Default: None
[value must be one of (‘tqdm’, ‘tqdm-ipython’, ‘log’, ‘none’)]
- datalad.ui.suppress-similar-results
Suppress rendering of similar repetitive results: If enabled, after a certain number of subsequent results that are identical regarding key properties, such as ‘status’, ‘action’, and ‘type’, additional similar results are not rendered by the common result renderer anymore. Instead, a count of suppressed results is displayed. If disabled, or when not running in an interactive terminal, all results are rendered. Default: True
[value must be convertible to type bool]
- datalad.ui.suppress-similar-results-threshold
Threshold for suppressing similar repetitive results: Minimum number of similar results to occur before suppression is considered. See ‘datalad.ui.suppress-similar-results’ for more information. Default: 10
[value must be convertible to type ‘int’]
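To illustrate how the values of datalad.runtime.report-status relate to one another, the following Python sketch filters a list of result records by status. It only mirrors the description of that setting (‘success’ stands for ‘ok’ or ‘notneeded’, ‘failure’ for ‘impossible’ or ‘error’) and is not DataLad's implementation; the function name filter_results and the example records are hypothetical.

```python
# Synonyms as described for datalad.runtime.report-status.
STATUS_GROUPS = {
    "success": {"ok", "notneeded"},
    "failure": {"impossible", "error"},
}


def filter_results(results, report_status=None):
    """Keep only result records matching the requested status."""
    # An unset value or 'all' leaves the report unconstrained.
    if report_status is None or report_status == "all":
        return list(results)
    # A group name expands to its members; any other value stands for itself.
    wanted = STATUS_GROUPS.get(report_status, {report_status})
    return [r for r in results if r.get("status") in wanted]


# Hypothetical result records.
results = [
    {"action": "get", "path": "a.dat", "status": "ok"},
    {"action": "get", "path": "b.dat", "status": "notneeded"},
    {"action": "get", "path": "c.dat", "status": "error"},
]

print([r["path"] for r in filter_results(results, "success")])  # ['a.dat', 'b.dat']
print([r["path"] for r in filter_results(results, "failure")])  # ['c.dat']
```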