Configuration
DataLad uses the same configuration mechanism and syntax as Git itself.
Consequently, datalad can be configured using the git config command. Both a
global user configuration (typically at ~/.gitconfig) and a local
repository-specific configuration (.git/config) are inspected.
In addition, datalad supports a persistent dataset-specific configuration.
This configuration is stored at .datalad/config
in any dataset. As it
is part of a dataset, settings stored there will also be in effect for any
consumer of such a dataset. Both global and local settings on a particular
machine always override configuration shipped with a dataset.
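The precedence among the three configuration files can be pictured with a small Python sketch. It is only an illustration, not DataLad's implementation: the dictionaries stand in for parsed files, the sample values are made up, and ranking local above global follows Git's usual behavior rather than anything stated above.

```python
from collections import ChainMap

# Hypothetical parsed settings; the values are made up for illustration.
dataset_cfg = {  # stands in for .datalad/config (shipped with the dataset)
    "datalad.repo.backend": "MD5E",
    "datalad.log.level": "info",
}
global_cfg = {"datalad.log.level": "debug"}      # stands in for ~/.gitconfig
local_cfg = {"datalad.repo.backend": "SHA256E"}  # stands in for .git/config

# ChainMap returns the value from the first mapping that contains the key,
# so the mappings are listed from highest to lowest precedence.
effective = ChainMap(local_cfg, global_cfg, dataset_cfg)

print(effective["datalad.repo.backend"])  # taken from local_cfg
print(effective["datalad.log.level"])     # taken from global_cfg
```

Both lookups ignore the value shipped with the dataset because a machine-specific file also defines the key.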
All datalad-specific configuration variables carry the prefix “datalad.” (including the trailing dot).
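Because these variables live in ordinary Git configuration files, a dotted name maps onto a file stanza by Git's usual rule: the first component is the section, the last is the variable, and anything in between forms the subsection. The helper below is a hypothetical illustration of that rule and is not part of DataLad; it does no quoting or escaping of values, and the cache path is made up.

```python
def to_git_stanza(name: str, value: str) -> str:
    """Render a dotted config name and its value as a Git config file stanza."""
    parts = name.split(".")
    if len(parts) < 2:
        raise ValueError(f"need at least section.variable, got {name!r}")
    section, variable = parts[0], parts[-1]
    # Everything between the first and last component is the subsection.
    subsection = ".".join(parts[1:-1])
    header = f'[{section} "{subsection}"]' if subsection else f"[{section}]"
    return f"{header}\n\t{variable} = {value}"


print(to_git_stanza("datalad.locations.cache", "/tmp/datalad-cache"))
```

In practice the git config command writes such stanzas itself, so a helper like this is rarely needed outside of an explanation.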
It is possible to override or amend the configuration using environment
variables. Any variable whose name starts with DATALAD_ is made available as
the corresponding “datalad.” configuration variable: each __ (two underscores)
is replaced with a hyphen, then each _ (single underscore) with a dot, and
finally all letters are converted to lower case. For example,
DATALAD_LOCATIONS_DEFAULT__DATASET becomes datalad.locations.default-dataset.
Values from environment variables take precedence over configuration file
settings.
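The name translation described above can be sketched in a few lines of Python. The function name is hypothetical and this is not DataLad's own code; it simply applies the three steps in the stated order.

```python
def env_to_config_name(env_name: str) -> str:
    """Translate a DATALAD_* environment variable name into a config name."""
    if not env_name.startswith("DATALAD_"):
        raise ValueError(f"not a DataLad environment variable: {env_name!r}")
    # Order matters: double underscores must become hyphens before the
    # remaining single underscores are turned into dots.
    return env_name.replace("__", "-").replace("_", ".").lower()


print(env_to_config_name("DATALAD_LOG_LEVEL"))
print(env_to_config_name("DATALAD_LOCATIONS_DEFAULT__DATASET"))
```

Swapping the two replacements would turn every double underscore into two dots, which is why the hyphen step comes first.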
The following sections provide a (non-exhaustive) list of settings honored by datalad. They are categorized according to the scope they are typically associated with.
Global user configuration
- datalad.externals.nda.dbserver
- NDA database server: Hostname of the database server Default: https://nda.nih.gov/DataManager/dataManager
- datalad.locations.cache
- Cache directory: Where should datalad cache files? Default: ~/.cache/datalad
- datalad.locations.default-dataset
- Default dataset path: Where should datalad look for (or install) a default dataset? Default: ~/datalad
- datalad.locations.extra-procedures
- Extra procedure directory: Where should datalad search for some additional procedures?
- datalad.locations.sockets
- Socket directory: Where should datalad store socket files? Default: ~/.cache/datalad/sockets
- datalad.locations.system-plugins
- System plugin directory: Where should datalad search for system plugins? Default: /etc/xdg/datalad/plugins
- datalad.locations.system-procedures
- System procedure directory: Where should datalad search for system procedures? Default: /etc/xdg/datalad/procedures
- datalad.locations.user-plugins
- User plugin directory: Where should datalad search for user plugins? Default: ~/.config/datalad/plugins
- datalad.locations.user-procedures
- User procedure directory: Where should datalad search for user procedures? Default: ~/.config/datalad/procedures
- datalad.ssh.identityfile
- If set, pass this file as ssh’s -i option. Default: None
- datalad.ssh.multiplex-connections
Whether to use a single shared connection for multiple SSH processes aiming at the same target. Default: True
[value must be convertible to type bool]
- datalad.tests.cache
- Cache directory for tests: Where should datalad cache test files? Default: ~/.cache/datalad/tests
Local repository configuration
- datalad.crawl.cache
Crawler download caching: Should the crawler cache downloaded files?
[bool]
- datalad.fake-dates
Fake (anonymize) dates: Should the dates in the logs be faked? Default: False
[value must be convertible to type bool]
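Several settings require a value that is convertible to bool. As a rough sketch of what such a conversion might look like, the function below accepts the spellings that Git itself recognizes for booleans. That choice is an assumption made for illustration; DataLad's own conversion may accept a different set, and the function name is hypothetical.

```python
# Spellings Git accepts for boolean values, compared case-insensitively.
_TRUE = {"true", "yes", "on", "1"}
_FALSE = {"false", "no", "off", "0"}


def to_bool(value: str) -> bool:
    """Convert a configuration string to a bool, or raise ValueError."""
    normalized = value.strip().lower()
    if normalized in _TRUE:
        return True
    if normalized in _FALSE:
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


print(to_bool("yes"), to_bool("0"))
```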
Sticky dataset configuration
- datalad.locations.dataset-procedures
- Dataset procedure directory: Where should datalad search for dataset procedures (relative to a dataset root)? Default: .datalad/procedures
Miscellaneous configuration
- datalad.annex.retry
Value for annex.retry to use for git-annex calls: On transfer failure, annex.retry (sans “datalad.”) controls the number of times that git-annex retries. DataLad will call git-annex with annex.retry set to the value here unless annex.retry is explicitly configured. Default: 3
[value must be convertible to type ‘int’]
- datalad.exc.str.tblimit
- This flag is used by the datalad extract_tb function, which extracts and formats stack traces. It caps the number of pre-processed traceback entries at the value of DATALAD_EXC_STR_TBLIMIT.
- datalad.fake-dates-start
Initial fake date: When faking dates and there are no commits in any local branches, generate the date by adding one second to this value (Unix epoch time). The value must be positive. Default: 1112911993
[value must be convertible to type ‘int’]
- datalad.github.token-note
- GitHub token note: Description for the personal access token to generate. Default: DataLad
- datalad.install.inherit-local-origin
Inherit local origin of dataset source: If enabled, a local ‘origin’ remote of a local dataset clone source is configured as an ‘origin-2’ remote to make its annex automatically available. The process is repeated recursively for any further qualifying ‘origin’ dataset thereof. Default: True
[value must be convertible to type bool]
- datalad.log.level
- Controls the verbosity of logs printed to stdout while running datalad commands or debugging.
- datalad.log.name
- Include the name of the log target in the log line.
- datalad.log.names
- Which names (comma-separated) to print log lines for.
- datalad.log.namesre
- Regular expression for which names to print log lines for.
- datalad.log.outputs
Whether to log stdout and stderr for executed commands: When enabled, setting the log level to 5 should catch all execution output, though some output may be logged at higher levels. Default: False
[value must be convertible to type bool]
- datalad.log.result-level
Log level for command result messages: Overrides the default behavior of logging ‘impossible’ results as a warning, ‘error’ results as errors, and everything else as ‘debug’ with a single alternative log level. Default: None
[value must be one of (‘debug’, ‘info’, ‘warning’, ‘error’)]
- datalad.log.timestamp
Adds a timestamp to datalad logs. Default: False
[value must be convertible to type bool]
- datalad.log.traceback
- Runs the TraceBack function with collide set to True if this flag is set to “collide”. This replaces any common prefix between the current traceback log and that of the previous invocation with “…”.
- datalad.metadata.create-aggregate-annex-limit
- Limit configuration annexing aggregated metadata in new dataset: Git-annex large files expression (see https://git-annex.branchable.com/tips/largefiles; given expression will be wrapped in parentheses) Default: anything
- datalad.metadata.maxfieldsize
Maximum metadata field size: Metadata fields exceeding this size (in bytes/chars) are excluded from metadata extraction. Default: 100000
[value must be convertible to type ‘int’]
- datalad.metadata.nativetype
- Native dataset metadata scheme: Set this label to engage a particular metadata extraction parser
- datalad.metadata.store-aggregate-content
Aggregated content metadata storage: If this flag is enabled, content metadata is aggregated into the superdataset to allow for discovery of individual files. If disabled, unique content metadata values are still aggregated to enable dataset discovery. Default: True
[value must be convertible to type bool]
- datalad.repo.backend
- git-annex backend: Backend to use when creating git-annex repositories Default: MD5E
- datalad.repo.direct
Direct Mode for git-annex repositories: Set this flag to create annex repositories in direct mode by default Default: False
[value must be convertible to type bool]
- datalad.repo.version
git-annex repository version: Specifies the repository version for git-annex to be used by default Default: 5
[value must be convertible to type ‘int’]
- datalad.runtime.max-annex-jobs
Maximum number of git-annex jobs to request when the “jobs” option is set to “auto” (default): Set this value to enable parallel annex jobs that may speed up certain operations (e.g. get file content). The effective number of jobs will not exceed the number of available CPU cores (or 3 if there are fewer than 3 cores). Default: 1
[value must be convertible to type ‘int’]
- datalad.runtime.max-jobs
Maximum number of jobs DataLad can run in “parallel”: Set this value to enable parallel multi-threaded DataLad jobs that may speed up certain operations, in particular operations across multiple datasets (e.g., installing multiple subdatasets). Default: 1
[value must be convertible to type ‘int’]
- datalad.runtime.raiseonerror
Error behavior: Set this flag to cause DataLad to raise an exception on errors that would otherwise just be logged. Default: False
[value must be convertible to type bool]
- datalad.runtime.report-status
Command line result reporting behavior: If set (to other than ‘all’), constrains command result report to records matching the given status. ‘success’ is a synonym for ‘ok’ OR ‘notneeded’, ‘failure’ stands for ‘impossible’ OR ‘error’ Default: None
[value must be one of (‘all’, ‘success’, ‘failure’, ‘ok’, ‘notneeded’, ‘impossible’, ‘error’)]
- datalad.runtime.stalled-external
Behavior for handling external processes: What to do with external processes if they do not finish in some minimal reasonable time. If “abandon”, datalad proceeds without waiting for the external process to exit. Currently applies only to batched git-annex processes. Should be changed with caution. Default: wait
[value must be one of (‘wait’, ‘abandon’)]
- datalad.save.no-message
Commit message handling: When no commit message was provided: attempt to obtain one interactively (interactive); or use a generic commit message (generic). NOTE: The interactive option is experimental. The behavior may change in backwards-incompatible ways. Default: generic
[value must be one of (‘interactive’, ‘generic’)]
- datalad.search.default-mode
Default search mode: Label of the mode to be used by default Default: egrep
[value must be one of (‘egrep’, ‘textblob’, ‘autofield’)]
- datalad.search.index-default-documenttype
Type of search index documents: Labels of document types to include in a default search index Default: datasets
[value must be one of (‘all’, ‘datasets’, ‘files’)]
- datalad.search.indexercachesize
Maximum cache size for search index (per process): Actual memory consumption can be twice as high as this value in MB (one process per CPU is used) Default: 256
[value must be convertible to type ‘int’]
- datalad.tests.dataladremote
Binary flag to specify whether each annex repository should get the datalad special remote in every test repository.
[value must be convertible to type bool]
- datalad.tests.knownfailures.probe
Probes tests that are known to fail to check whether they are actually still failing. Default: False
[value must be convertible to type bool]
- datalad.tests.knownfailures.skip
Skips tests that are known to currently fail. Default: True
[value must be convertible to type bool]
- datalad.tests.nonetwork
Skips network tests completely if this flag is set. Examples include tests for s3, git_repositories, openfmri, etc.
[value must be convertible to type bool]
- datalad.tests.nonlo
- Specifies network interfaces to bring down/up for testing. Currently used by Travis.
- datalad.tests.noteardown
Does not execute teardown_package, which cleans up temp files and directories created by tests, if this flag is set.
[value must be convertible to type bool]
- datalad.tests.runcmdline
Binary flag to specify whether shell testing using shunit2 is to be carried out.
[value must be convertible to type bool]
- datalad.tests.setup.testrepos
Pre-creates repositories for @with_testrepos within setup_package. Default: False
[value must be convertible to type bool]
- datalad.tests.ssh
Skips SSH tests if this flag is not set.
[value must be convertible to type bool]
- datalad.tests.temp.dir
Create a temporary directory at the location specified by this flag. It is used by tests to create a temporary git directory while testing git-annex archives, etc. Default: None
[value must be a string]
- datalad.tests.temp.fs
- Specify the temporary file system to use as loop device for testing DATALAD_TESTS_TEMP_DIR creation.
- datalad.tests.temp.fssize
- Specify the size of the temporary file system to use as loop device for testing DATALAD_TESTS_TEMP_DIR creation.
- datalad.tests.temp.keep
The rmtemp function will not remove temporary files/directories created for testing if this flag is set.
[value must be convertible to type bool]
- datalad.tests.ui.backend
- Tests UI backend: Which UI backend to use Default: tests-noninteractive
- datalad.tests.usecassette
- Specifies the location of the file to record network transactions by the VCR module. Currently used when testing custom special remotes.
- datalad.ui.color
Colored terminal output: Enable or disable ANSI color codes in outputs; “on” overrides NO_COLOR environment variable Default: auto
[value must be one of (‘on’, ‘off’, ‘auto’)]
- datalad.ui.progressbar
UI progress bars: Default backend for progress reporting Default: None
[value must be one of (‘tqdm’, ‘tqdm-ipython’, ‘log’, ‘none’)]
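The bracketed notes in the list above describe constraints on each value: either a type the value must be convertible to, or a fixed set of allowed choices. The following sketch shows one way such constraints could be checked. The CONSTRAINTS table and the validate function are hypothetical and cover only a handful of the settings listed; they do not reflect how DataLad implements its checks.

```python
# Hypothetical constraint table, copied from a few entries in the list above.
CONSTRAINTS = {
    "datalad.annex.retry": int,
    "datalad.runtime.max-jobs": int,
    "datalad.search.default-mode": ("egrep", "textblob", "autofield"),
    "datalad.ui.color": ("on", "off", "auto"),
}


def validate(name: str, raw: str):
    """Return the converted value for a setting, or raise ValueError."""
    rule = CONSTRAINTS[name]
    if rule is int:
        # int() raises ValueError itself for non-numeric input.
        return int(raw)
    if raw not in rule:
        raise ValueError(f"{name} must be one of {rule}, got {raw!r}")
    return raw


print(validate("datalad.annex.retry", "3"))
print(validate("datalad.ui.color", "auto"))
```

Keeping the constraints in a single table makes it easy to see at a glance which settings are numeric and which are restricted to a set of labels.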