Credential management

Various components of DataLad need to be passed credentials to interact with services that require authentication. This includes downloading files, but also things like REST API usage or authenticated cloning. Key components of DataLad’s credential management are credentials types, providers, authenticators and downloaders.

Credentials

Supported credential types include basic user/password combinations, access tokens, and a range of tailored solutions for particular services. All credential type implementations are derived from a common Credential base class. A mapping from string labels to credential classes is defined in datalad.downloaders.CREDENTIAL_TYPES.

Importantly, credentials must be identified by a name. This name is a label that is often hard-coded in the program code of DataLad, any of its extensions, or specified in a dataset or in provider configurations (see below).

Given a credential name, one or more credential component(s) (e.g., token, username, or password) can be looked up by DataLad in at least two different locations. These locations are tried in the following order, and the first successful lookup yields the final value.

  1. A configuration item datalad.credential.<name>.<component>. Such configuration items can be defined in any location supported by DataLad’s configuration system. As with any other specification of configuration items, environment variables can be used to set or override credentials. Variable names take the form of DATALAD_CREDENTIAL_<NAME>_<COMPONENT>, and standard replacement rules into configuration variable names apply.

  2. DataLad uses the keyring package https://pypi.org/project/keyring to connect to any of its supported back-ends for setting or getting credentials, via a wrapper in keyring_. This provides support for credential storage on all major platforms, but also extensibility, providing 3rd-parties to implement and use specialized solutions.

When a credential is required for operation, but could not be obtained via any of the above approaches, DataLad can prompt for credentials in interactive terminal sessions. Interactively entered credentials will be stored in the active credential store available via the keyring package. Note, however, that the keyring approach is somewhat abused by datalad. The wrapper only uses get_/set_password of keyring with the credential’s FIELDS as the name to query (essentially turning the keyring into a plain key-value store) and “datalad-<CREDENTIAL-LABEL>” as the “service name”. With this approach it’s not possible to use credentials in a system’s keyring that were defined by other, datalad unaware software (or users).

When a credential value is known but invalid, the invalid value must be removed or replaced in the active credential store. By setting the configuration flag datalad.credentials.force-ask, DataLad can be instructed to force interactive credential re-entry to effectively override any store credential with a new value.

Providers

Providers are associating credentials with a context for using them and are defined by configuration files. A single provider is represented by Provider object and the list of available providers is represented by the Providers class. A provider is identified by a label and stored in a dedicated config file per provider named LABEL.cfg. Such a file can reside in a dataset (under .datalad/providers/), at the user level (under {user_config_dir}/providers), at the system level (under {site_config_dir}/providers) or come packaged with the datalad distribution (in directory configs next to providers.py). Such a provider specifies a regular expression to match URLs against and assigns authenticator abd credentials to be used for a match. Credentials are referenced by their label, which in turn is the name of another section in such a file specifying the type of the credential. References to credential and authenticator types are strings that are mapped to classes by the following dict definitions:

  • datalad.downloaders.AUTHENTICATION_TYPES

  • datalad.downloaders.CREDENTIAL_TYPES

Available providers can be loaded by Providers.from_config_files and Providers.get_provider(url) will match a given URL against them and return the appropriate Provider instance. A Provider object will determine a downloader to use (derived from BaseDownloader), based on the URL’s protocol.

Note, that the provider config files are not currently following datalad’s general config approach. Instead they are special config files, read by configparser.ConfigParser that are not compatible with git-config and hence the ConfigManager.

There are currently two ways of storing a provider and thus creating its config file: Providers.enter_new and Providers._store_new. The former will only work interactively and provide the user with options to choose from, while the latter is non-interactive and can therefore only be used, when all properties of the provider config are known and passed to it. There’s no way at the moment to store an existing Provider object directly.

Integration with Git

In addition, there’s a special case for interfacing git-credential: A dedicated GitCredential class is used to talk to Git’s git-credential command instead of the keyring wrapper. This class has identical fields to the UserPassword class and thus can be used by the same authenticators. Since Git’s way to deal with credentials doesn’t involve labels but only matching URLs, it is - in some sense - the equivalent of datalad’s provider layer. However, providers don’t talk to a backend, credentials do. Hence, a more seamless integration requires some changes in the design of datalad’s credential system as a whole.

In the opposite direction - making Git aware of datalad’s credentials, there’s no special casing, though. DataLad comes with a git-credential-datalad executable. Whenever Git is configured to use it by setting credential.helper=datalad, it will be able to query datalad’s credential system for a provider matching the URL in question and retrieve the referenced by this provider credentials. This helper can also store a new provider+credentials when asked to do so by Git. It can do this interactively, asking a user to confirm/change that config or - if credential.helper=’datalad –non-interactive’ - try to non-interactively store with its defaults.

Authenticators

Authenticators are used by downloaders to issue authenticated requests. They are not easily available to directly be applied to requests being made outside of the downloaders.