class datalad.utils.ArgSpecFake(args, varargs, keywords, defaults)[source]

Bases: NamedTuple

args: list[str]

Alias for field number 0

defaults: tuple[Any, ...] | None

Alias for field number 3

keywords: str | None

Alias for field number 2

varargs: str | None

Alias for field number 1

class datalad.utils.File(name: str, executable: bool = False)[source]

Bases: object

Helper for a file entry in the create_tree/@with_tree

It allows defining additional settings for entries

class datalad.utils.SequenceFormatter(separator: str = ' ', element_formatter: ~string.Formatter = <string.Formatter object>, *args: ~typing.Any, **kwargs: ~typing.Any)[source]

Bases: Formatter

string.Formatter subclass with special behavior for sequences.

This class delegates formatting of individual elements to another formatter object. Non-list objects are formatted by calling the delegate formatter’s “format_field” method. List-like objects (list, tuple, set, frozenset) are formatted by formatting each element of the list according to the specified format spec using the delegate formatter and then joining the resulting strings with a separator (space by default).
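The delegation described above can be sketched with a minimal stdlib-only `string.Formatter` subclass (hypothetical name `SeqFormatter`; a simplified sketch, not datalad's actual implementation):

```python
import string

class SeqFormatter(string.Formatter):
    """Sketch: join list-like values element-wise with a separator,
    delegating element formatting to another Formatter."""

    def __init__(self, separator=" ", element_formatter=string.Formatter()):
        super().__init__()
        self.separator = separator
        self.element_formatter = element_formatter

    def format_field(self, value, format_spec):
        # Sequences: format each element and join with the separator
        if isinstance(value, (list, tuple, set, frozenset)):
            return self.separator.join(
                self.element_formatter.format_field(v, format_spec)
                for v in value)
        # Anything else: delegate formatting of the whole value
        return self.element_formatter.format_field(value, format_spec)
```

For example, `SeqFormatter().format("{0}", [1, 2, 3])` yields `"1 2 3"`, while a scalar passes through unchanged.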

format_element(elem: Any, format_spec: str) Any[source]

Format a single element

For sequences, this is called once for each element in a sequence. For anything else, it is called on the entire object. It is intended to be overridden in subclasses.

format_field(value: Any, format_spec: str) Any[source]

class datalad.utils.SwallowLogsAdapter(file_: str | Path | None)[source]

Bases: object

Little adapter to help getting out values

And to stay consistent with how swallow_outputs behaves

assert_logged(msg: str | None = None, level: str | None = None, regex: bool = True, **kwargs: Any) None[source]

Provide assertion on whether a msg was logged at a given level

If neither msg nor level provided, checks if anything was logged at all.

  • msg (str, optional) – Message (as a regular expression, if regex) to be searched. If no msg provided, checks if anything was logged at a given level.

  • level (str, optional) – String representing the level to be logged

  • regex (bool, optional) – If False, regular assert_in is used

  • **kwargs (str, optional) – Passed to assert_re_in or assert_in

cleanup() None[source]
property handle: IO[str]
property lines: list[str]
property out: str
class datalad.utils.SwallowOutputsAdapter[source]

Bases: object

Little adapter to help getting out/err values

cleanup() None[source]
property err: str
property handles: tuple[TextIO, TextIO]
property out: str

Return if any of regexes (list or str) searches successfully for value

datalad.utils.assert_no_open_files(path: str | Path) None[source]
datalad.utils.assure_bool(s: Any) bool

Note: This function is deprecated. Use ensure_bool instead.

datalad.utils.assure_bytes(s: str | bytes, encoding: str = 'utf-8') bytes

Note: This function is deprecated. Use ensure_bytes instead.

datalad.utils.assure_dict_from_str(s: str | dict[K, V], sep: str = '\n') dict[str, str] | None | dict[K, V] | None

Note: This function is deprecated. Use ensure_dict_from_str instead.

datalad.utils.assure_dir(*args: str) str

Note: This function is deprecated. Use ensure_dir instead.

datalad.utils.assure_iter(s: Any, cls: type[ListOrSet], copy: bool = False, iterate: bool = True) ListOrSet

Note: This function is deprecated. Use ensure_iter instead.

datalad.utils.assure_list(s: Any, copy: bool = False, iterate: bool = True) list

Note: This function is deprecated. Use ensure_list instead.

datalad.utils.assure_list_from_str(s: str | list[T], sep: str = '\n') list[str] | None | list[T] | None

Note: This function is deprecated. Use ensure_list_from_str instead.

datalad.utils.assure_tuple_or_list(obj: Any) list | tuple

Note: This function is deprecated. Use ensure_tuple_or_list instead.

datalad.utils.assure_unicode(s: str | bytes, encoding: str | None = None, confidence: float | None = None) str

Note: This function is deprecated. Use ensure_unicode instead.

datalad.utils.auto_repr(cls: type[T], short: bool = True) type[T][source]

Decorator for a class to assign it an automagic quick and dirty __repr__

It uses public class attributes to prepare repr of a class

Original idea:

datalad.utils.bytes2human(n: int | float, format: str = '%(value).1f %(symbol)sB') str[source]

Convert n bytes into a human-readable string based on format. Symbols can be either “customary”, “customary_ext”, “iec” or “iec_ext”, see:

>>> from datalad.utils import bytes2human
>>> bytes2human(1)
'1.0 B'
>>> bytes2human(1024)
'1.0 KB'
>>> bytes2human(1048576)
'1.0 MB'
>>> bytes2human(1099511627776127398123789121)
'909.5 YB'
>>> bytes2human(10000, "%(value).1f %(symbol)s/sec")
'9.8 K/sec'
>>> # precision can be adjusted by playing with %f operator
>>> bytes2human(10000, format="%(value).5f %(symbol)s")
'9.76562 K'

Taken from: and subsequently simplified. Original author: Giampaolo Rodola’ <g.rodola [AT] gmail [DOT] com>. License: MIT

helper similar to datalad.tests.utils_pytest.has_symlink_capability

However, for use in a datalad command context, we shouldn’t assume to be able to write to tmpfile and also shouldn’t import a whole lot from datalad’s test machinery. Finally, we want to know whether we can create a symlink at a specific location, not just somewhere. Therefore use an arbitrary path to test-build a symlink and delete it afterwards. A suitable location can therefore be determined by higher level code.

  • path (Path) –

  • target (Path) –

Return type:


class datalad.utils.chpwd(path: str | Path | None, mkdir: bool = False, logsuffix: str = '')[source]

Bases: object

Wrapper around os.chdir which also adjusts environ[‘PWD’]

The reason is that otherwise PWD is simply inherited from the shell and we have no ability to assess directory path without dereferencing symlinks.

If used as a context manager it allows to temporarily change directory to the given path
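The basic mechanics can be sketched with a stdlib-only context manager (hypothetical name `chpwd_sketch`; simplified relative to datalad's class, which also supports mkdir and logging):

```python
import os
from contextlib import contextmanager

@contextmanager
def chpwd_sketch(path):
    """Sketch: chdir to path while keeping environ['PWD'] in sync,
    restoring both the directory and PWD on exit."""
    prev_cwd = os.getcwd()
    prev_pwd = os.environ.get('PWD')
    os.chdir(path)
    os.environ['PWD'] = path  # keep PWD honest, without dereferencing symlinks
    try:
        yield
    finally:
        os.chdir(prev_cwd)
        if prev_pwd is None:
            os.environ.pop('PWD', None)
        else:
            os.environ['PWD'] = prev_pwd
```

Keeping PWD in sync is what lets later calls like getpwd() report the symlink-preserving path.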

datalad.utils.collect_method_callstats(func: Callable[[P], T]) Callable[[P], T][source]

Figure out methods which call the method repeatedly on the same instance

Use case(s):
  • .repo is expensive since does all kinds of checks.

  • .config is expensive transitively since it calls .repo each time


  • TODO: a fancy version could look through the stack for the same id(self) to see if that location is already in memo. That would hint at the cases where the object is not passed into underlying functions, causing them to redo the same work over and over again

  • ATM might flood with all “1 lines” calls which are not that informative. The underlying possibly suboptimal use might be coming from their callers. It might or might not relate to the previous TODO

datalad.utils.create_tree(path: str, tree: Tuple[Tuple[str | File, str | bytes | TreeSpec], ...] | List[Tuple[str | File, str | bytes | TreeSpec]] | Dict[str | File, str | bytes | TreeSpec], archives_leading_dir: bool = True, remove_existing: bool = False) None[source]

Given a list of tuples (name, load) create such a tree

If load is itself a tuple, it creates either a subtree or, if name ends with .tar.gz, an archive with that content, and places it into the tree
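The core recursion (without the archive handling) can be sketched as follows (hypothetical name `create_tree_sketch`; a simplified, stdlib-only sketch of the idea):

```python
import os

def create_tree_sketch(path, tree):
    """Sketch: given (name, load) pairs, write files for string/bytes
    loads and recurse into nested specs as subdirectories."""
    os.makedirs(path, exist_ok=True)
    items = tree.items() if isinstance(tree, dict) else tree
    for name, load in items:
        full = os.path.join(path, name)
        if isinstance(load, (tuple, list, dict)):
            create_tree_sketch(full, load)  # nested spec -> subtree
        else:
            mode = 'wb' if isinstance(load, bytes) else 'w'
            with open(full, mode) as f:
                f.write(load)
```

For example, `create_tree_sketch(root, {"a.txt": "hi", "sub": {"b.txt": "yo"}})` produces a file and a subdirectory with one file.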

datalad.utils.create_tree_archive(path: str, name: str, load: Tuple[Tuple[str | File, str | bytes | TreeSpec], ...] | List[Tuple[str | File, str | bytes | TreeSpec]] | Dict[str | File, str | bytes | TreeSpec], overwrite: bool = False, archives_leading_dir: bool = True) None[source]

Given an archive name, create it under path with the specified load tree

datalad.utils.decode_input(s: str | bytes) str[source]

Given input string/bytes, decode according to the stdin codepage (or UTF-8 if it is not defined)

If decoding fails, issue a warning and decode again with errors being replaced

datalad.utils.disable_logger(logger: Logger | None = None) Iterator[Logger][source]

context manager to temporarily disable logging

This is to provide one of swallow_logs’ purposes without unnecessarily creating temp files (see gh-1865)


logger (Logger) – Logger whose handlers will be ordered to not log anything. Default: datalad’s topmost Logger (‘datalad’)

datalad.utils.dlabspath(path: str | Path, norm: bool = False) str[source]

Symlinks-in-the-cwd aware abspath

os.path.abspath relies on os.getcwd() which would not know about symlinks in the path

TODO: we might want norm=True by default to match the behavior of os.path.abspath?

datalad.utils.encode_filename(filename: str | bytes) bytes[source]

Encode unicode filename

datalad.utils.ensure_bool(s: Any) bool[source]

Convert value into boolean following convention for strings

recognizing on, True, yes as True, and off, False, no as False
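The convention can be sketched as follows (hypothetical name `ensure_bool_sketch`; the exact set of recognized strings is an assumption based on the description above):

```python
def ensure_bool_sketch(s):
    """Sketch: map conventional truthy/falsy strings to bool."""
    if isinstance(s, bool):
        return s
    if isinstance(s, str):
        low = s.lower()
        if low in ('on', 'yes', 'true', '1'):
            return True
        if low in ('off', 'no', 'false', '0'):
            return False
        raise ValueError(f"Cannot interpret {s!r} as a boolean")
    return bool(s)
```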

datalad.utils.ensure_bytes(s: str | bytes, encoding: str = 'utf-8') bytes[source]

Convert/encode unicode string to bytes.

If s isn’t a string, return it as is.


encoding (str, optional) – Encoding to use. “utf-8” is the default

datalad.utils.ensure_dict_from_str(s: str, sep: str = '\n') dict[str, str] | None[source]
datalad.utils.ensure_dict_from_str(s: dict[K, V], sep: str = '\n') dict[K, V] | None

Given a multiline string with key=value items convert it to a dictionary

  • s (str or dict) –

Returns None if input s is empty.
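The parsing can be sketched as follows (hypothetical name `ensure_dict_from_str_sketch`; whitespace handling here is an assumption, simplified from the described behavior):

```python
def ensure_dict_from_str_sketch(s, sep='\n'):
    """Sketch: parse 'key=value' items separated by sep into a dict;
    pass dicts through; return None for empty input."""
    if isinstance(s, dict):
        return s or None
    if not s:
        return None
    out = {}
    for item in s.split(sep):
        if not item.strip():
            continue  # skip blank lines
        key, value = item.split('=', 1)
        out[key.strip()] = value.strip()
    return out or None
```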

datalad.utils.ensure_dir(*args: str) str[source]

Make sure directory exists.

Joins the list of arguments into an os-specific path to the desired directory and creates it if it does not exist yet.

datalad.utils.ensure_iter(s: Any, cls: type[ListOrSet], copy: bool = False, iterate: bool = True) ListOrSet[source]

Given a value that is not already of type cls, place it into one. If None, an empty cls instance is returned.

  • s (list or anything) –

  • cls (class) – Which iterable class to ensure

  • copy (bool, optional) – If correct iterable is passed, it would generate its shallow copy

  • iterate (bool, optional) – If it is not a list, but something iterable (but not a str) iterate over it.

datalad.utils.ensure_list(s: Any, copy: bool = False, iterate: bool = True) list[source]

Given a value that is not a list, place it into a list. If None, an empty list is returned.

  • s (list or anything) –

  • copy (bool, optional) – If list is passed, it would generate a shallow copy of the list

  • iterate (bool, optional) – If it is not a list, but something iterable (but not a str) iterate over it.
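These semantics can be sketched as follows (hypothetical name `ensure_list_sketch`; a simplified sketch of the described behavior):

```python
def ensure_list_sketch(s, copy=False, iterate=True):
    """Sketch: None -> [], list -> (optionally copied) itself,
    non-str iterables -> list(s) when iterate is True, else [s]."""
    if s is None:
        return []
    if isinstance(s, list):
        return list(s) if copy else s
    if iterate and hasattr(s, '__iter__') and not isinstance(s, (str, bytes)):
        return list(s)
    return [s]
```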

datalad.utils.ensure_list_from_str(s: str, sep: str = '\n') list[str] | None[source]
datalad.utils.ensure_list_from_str(s: list[T], sep: str = '\n') list[T] | None

Given a multiline string, convert it to a list; return None if empty


s (str or list) –

datalad.utils.ensure_result_list(r: Any) list[source]

Return a list of result records

Largely the same as ensure_list, but special-casing a single dict being passed in, which a plain ensure_list would iterate over. Hence, this deals with the three ways datalad commands return results: a single dict, a list of dicts, or a generator.

Used for result assertion helpers.
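The special-casing can be sketched in a few lines (hypothetical name `ensure_result_list_sketch`; a simplified sketch of the described behavior):

```python
def ensure_result_list_sketch(r):
    """Sketch: a single result dict becomes a one-element list;
    lists and generators of records are materialized as a list."""
    if isinstance(r, dict):
        return [r]
    return list(r)
```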

datalad.utils.ensure_tuple_or_list(obj: Any) list | tuple[source]

Given an object, wrap into a tuple if not list or tuple

datalad.utils.ensure_unicode(s: str | bytes, encoding: str | None = None, confidence: float | None = None) str[source]

Convert/decode bytestring to unicode.

If s isn’t a bytestring, return it as is.

  • encoding (str, optional) – Encoding to use. If None, “utf-8” is tried, and then if not a valid UTF-8, encoding will be guessed

  • confidence (float, optional) – A value between 0 and 1, so if guessing of encoding is of lower than specified confidence, ValueError is raised

datalad.utils.ensure_write_permission(path: Path) Iterator[None][source]

Context manager to get write permission on path and restore original mode afterwards.


path (Path) – path to the target file


PermissionError – if write permission could not be obtained

datalad.utils.escape_filename(filename: str) str[source]

Surround filename in double quotes and escape any double quotes within it

datalad.utils.expandpath(path: str | Path, force_absolute: bool = True) str[source]

Expand all variables and user handles in a path.

By default return an absolute path

datalad.utils.file_basename(name: str | Path, return_ext: Literal[True]) tuple[str, str][source]
datalad.utils.file_basename(name: str | Path, return_ext: Literal[False] = False) str

Strips up to 2 extensions, each up to 4 characters long and starting with a letter (not a digit), so we can get rid of .tar.gz etc.
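The stripping rule can be sketched with a small regex loop (hypothetical name `file_basename_sketch`; the exact format of the returned extension is an assumption):

```python
import re

def file_basename_sketch(name, return_ext=False):
    """Sketch: strip up to two trailing extensions of 1-4 characters
    that start with a letter (so '.tar.gz' goes, but '.0' stays)."""
    path, ext = name, ''
    for _ in range(2):
        m = re.search(r'\.([a-zA-Z][a-zA-Z0-9]{0,3})$', path)
        if not m:
            break
        ext = path[m.start():] + ext  # accumulate stripped extensions
        path = path[:m.start()]
    return (path, ext.lstrip('.')) if return_ext else path
```

Note that version-like suffixes such as `v1.0` are preserved because `.0` starts with a digit.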

datalad.utils.find_files(regex: str, topdir: str | Path = '.', exclude: str | None = None, exclude_vcs: bool = True, exclude_datalad: bool = False, dirs: bool = False) Iterator[str][source]

Generator to find files matching regex

  • regex (string) –

  • exclude (string, optional) – Matches to exclude

  • exclude_vcs – If True, excludes commonly known VCS subdirectories. If string, used as regex to exclude those files (regex: ‘/\.(?:git|gitattributes|svn|bzr|hg)(?:/|$)’)

  • exclude_datalad – If True, excludes files known to be datalad meta-data files (e.g. under .datalad/ subdirectory) (regex: ‘/\.(?:datalad)(?:/|$)’)

  • topdir (string, optional) – Directory where to search

  • dirs (bool, optional) – Whether to match directories as well as files

datalad.utils.generate_chunks(container: list[T], size: int) Iterator[list[T]][source]

Given a container, generate chunks from it with size up to size
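The chunking can be sketched with a simple slicing generator (hypothetical name `generate_chunks_sketch`; a minimal sketch of the described behavior):

```python
def generate_chunks_sketch(container, size):
    """Sketch: yield consecutive slices of at most `size` elements."""
    for i in range(0, len(container), size):
        yield container[i:i + size]
```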

datalad.utils.generate_file_chunks(files: list[str], cmd: str | list[str] | None = None) Iterator[list[str]][source]

Given a list of files, generate chunks of them to avoid exceeding cmdline length

  • files (list of str) –

  • cmd (str or list of str, optional) – Command to account for as well

datalad.utils.get_dataset_root(path: str | Path) str | None[source]

Return the root of an existing dataset containing a given path

The root path is returned in the same absolute or relative form as the input argument. If no associated dataset exists, or the input path doesn’t exist, None is returned.

If path is a symlink or anything other than a directory, the root of the dataset containing its parent directory will be reported. If none can be found, and a symlink at path points to a dataset, path itself will be reported as the root.


path (Path-like) –

Return type:

str or None

datalad.utils.get_encoding_info() dict[str, str][source]

Return a dictionary with various encoding/locale information

datalad.utils.get_envvars_info() dict[str, str][source]
datalad.utils.get_home_envvars(new_home: str | Path) dict[str, str][source]

Return dict with env variables to be adjusted for a new HOME

Only variables found in current os.environ are adjusted.


new_home (str or Path) – New home path, in native to OS “schema”

datalad.utils.get_ipython_shell() Any | None[source]

Detect if running within IPython and return its ip (shell) object

Returns None if not under ipython (no get_ipython function)

datalad.utils.get_linux_distribution() tuple[str, str, str][source]

Compatibility wrapper for {platform,distro}.linux_distribution().

datalad.utils.get_logfilename(dspath: str | Path, cmd: str = 'datalad') str[source]

Return a filename to use for logging under a dataset/repository

The directory will be created if it doesn’t exist, but dspath must exist and be a directory

datalad.utils.get_open_files(path: str | Path, log_open: int = False) dict[str, Any][source]

Get open files under a path

Note: This function is very slow on Windows.

  • path (str) – File or directory to check for open files under

  • log_open (bool or int) – If set - logger level to use


A dict of path: pid entries for open files

Return type:


datalad.utils.get_path_prefix(path: str | Path, pwd: str | None = None) str[source]

Get path prefix (for current directory)

Returns the relative path to topdir if we are under topdir, and the absolute path to topdir otherwise. If pwd is not specified, the current directory is assumed.

datalad.utils.get_sig_param_names(f: Callable[[...], Any], kinds: tuple[str, ...]) tuple[list[str], ...][source]

A helper to selectively return parameters from inspect.signature.

inspect.signature is the ultimate way for introspecting callables. But its interface is not so convenient for a quick selection of parameters (AKA arguments) of desired type or combinations of such. This helper should make it easier to retrieve desired collections of parameters.

Since often it is desired to get information about multiple specific types of parameters, kinds is a list, so in a single invocation of signature and looping through the results we can obtain all information.

  • f (callable) –

  • kinds (tuple with values from {'pos_any', 'pos_only', 'kw_any', 'kw_only', 'any'}) – A list of which kinds of args to return in the result (a tuple). Each element should be one of: ‘pos_any’ – positional or keyword which could be used positionally; ‘kw_only’ – keyword-only arguments (cannot be used positionally); ‘kw_any’ – any keyword (including a positional which could be used as a keyword); ‘any’ – any type from the above.


Each element is a list of parameters (names only) of that “kind”.

Return type:


datalad.utils.get_suggestions_msg(values: str | Iterable[str] | None, known: str, sep: str = '\n        ') str[source]

Return a formatted string with suggestions for values given the known ones

datalad.utils.get_tempfile_kwargs(tkwargs: dict[str, Any] | None = None, prefix: str = '', wrapped: Callable | None = None) dict[str, Any][source]

Updates kwargs to be passed to tempfile.* calls depending on env vars

datalad.utils.get_timestamp_suffix(time_: int | time.struct_time | None = None, prefix: str = '-') str[source]

Return a time stamp (full date and time up to second)

primarily to be used for generation of log files names

datalad.utils.get_trace(edges: Sequence[tuple[T, T]], start: T, end: T, trace: list[T] | None = None) list[T] | None[source]

Return the trace/path to reach a node in a tree.

  • edges (sequence(2-tuple)) – The tree given by a sequence of edges (parent, child) tuples. The nodes can be identified by any value and data type that supports the ‘==’ operation.

  • start – Identifier of the start node. Must be present as a value in the parent location of an edge tuple in order to be found.

  • end – Identifier of the target/end node. Must be present as a value in the child location of an edge tuple in order to be found.

  • trace (list) – Mostly useful for recursive calls, and used internally.


Returns a list with the trace to the target (the start and the target are not included in the trace, hence if start and end are directly connected an empty list is returned), or None when no trace to the target can be found, or start and end are identical.

Return type:

None or list
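The described recursion can be sketched as a depth-first search over the edge list (hypothetical name `get_trace_sketch`; a simplified sketch matching the documented return conventions):

```python
def get_trace_sketch(edges, start, end, trace=None):
    """Sketch: follow (parent, child) edges from start toward end.
    Returns only the intermediate nodes; None if unreachable or
    start == end; [] if start and end are directly connected."""
    if start == end:
        return None
    trace = trace or []
    for parent, child in edges:
        if parent != start or child in trace:
            continue  # wrong branch, or already visited (cycle guard)
        if child == end:
            return trace
        sub = get_trace_sketch(edges, child, end, trace + [child])
        if sub is not None:
            return sub
    return None
```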

datalad.utils.get_wrapped_class(wrapped: Callable) type[source]

Determine the command class a wrapped __call__ belongs to

datalad.utils.getargspec(func: Callable[[...], Any], *, include_kwonlyargs: bool = False) ArgSpecFake[source]

Compat shim for getargspec deprecated in python 3.

The main difference from inspect.getargspec (and inspect.getfullargspec for that matter) is that by using inspect.signature we are providing correct args/defaults for functools.wraps’ed functions.

include_kwonlyargs option was added to centralize getting all args, even the ones which are kwonly (follow the *,).

For internal use and not advised for use in 3rd party code. Please use inspect.signature directly.

datalad.utils.getpwd() str[source]

Try to return a CWD without dereferencing possible symlinks

This function will try to use the PWD environment variable to provide a current working directory, possibly with some directories along the path being symlinks to other directories. Unfortunately, PWD is used/set only by the shell, and such functions as os.chdir and os.getcwd do not use or modify it in any way, thus os.getcwd() returns the path with links dereferenced.

While returning current working directory based on PWD env variable we verify that the directory is the same as os.getcwd() after resolving all symlinks. If that verification fails, we fall back to always use os.getcwd().

Initial decision to either use PWD env variable or os.getcwd() is done upon the first call of this function.

datalad.utils.guard_for_format(arg: str) str[source]

Replace { and } with {{ and }}

To be used in cases where arg is not expected to contain user-provided .format() placeholders, but ‘arg’ might become a part of a composite passed to .format(), e.g. via ‘Run’
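The escaping is just brace doubling, as in this sketch (hypothetical name `guard_for_format_sketch`):

```python
def guard_for_format_sketch(arg):
    """Sketch: double the braces so .format() treats them literally."""
    return arg.replace('{', '{{').replace('}', '}}')
```

After guarding, the string survives a later `.format()` call unchanged: `("pre " + guard_for_format_sketch("{v}")).format()` yields `"pre {v}"`.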

datalad.utils.import_module_from_file(modpath: str, pkg: module | None = None, log: Callable[[str], Any] = <bound method Logger.debug of <Logger datalad.utils (INFO)>>) module[source]

Import provided module given a path

TODO: - RF/make use of it in which has similar logic - join with import_modules above?


pkg (module, optional) – If provided, and modpath is under pkg.__path__, relative import will be used

datalad.utils.import_modules(modnames: list[str], pkg: str, msg: str = 'Failed to import {module}', log: Callable[[str], Any] = <bound method Logger.debug of <Logger datalad.utils (INFO)>>) list[module][source]

Helper to import a list of modules without failing if N/A

  • modnames (list of str) – List of module names to import

  • pkg (str) – Package under which to import

  • msg (str, optional) – Message template for .format() to log at DEBUG level if import fails. Keys {module} and {package} will be provided and ‘: {exception}’ appended

  • log (callable, optional) – Logger call to use for logging messages

datalad.utils.is_explicit_path(path: str | Path) bool[source]

Return whether a path explicitly points to a location

Any absolute path, or relative path starting with either ‘../’ or ‘./’ is assumed to indicate a location on the filesystem. Any other path format is not considered explicit.
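The rule can be sketched with stdlib path predicates (hypothetical name `is_explicit_path_sketch`; os-specific separators are handled via os.curdir/os.pardir, a simplification of the actual implementation):

```python
import os

def is_explicit_path_sketch(path):
    """Sketch: absolute paths, and relative paths starting with
    './' or '../', count as explicit locations."""
    return (os.path.isabs(path)
            or path.startswith(os.curdir + os.sep)
            or path.startswith(os.pardir + os.sep))
```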

datalad.utils.is_interactive() bool[source]

Return True if all in/outs are open and tty.

Note that in a somewhat abnormal case where e.g. stdin is explicitly closed, and any operation on it would raise a ValueError(“I/O operation on closed file”) exception, this function would just return False, since the session cannot be used interactively.

datalad.utils.join_cmdline(args: Iterable[str]) str[source]

Join command line args into a string using quote_cmdlinearg

datalad.utils.knows_annex(path: str | Path) bool[source]

Returns whether at a given path there is information about an annex

It is just a thin wrapper around GitRepo.is_with_annex() classmethod which also checks for path to exist first.

This includes actually present annexes, but also uninitialized ones, or even the presence of a remote annex branch.

datalad.utils.line_profile(func: Callable[[P], T]) Callable[[P], T][source]

Q&D helper to line profile the function and spit out stats

datalad.utils.lmtime(filepath: str | Path, mtime: int | float) None[source]

Set mtime for files, while not de-referencing symlinks.

To overcome absence of os.lutime

Works only on linux and OSX ATM

datalad.utils.lock_if_required(lock_required: bool, lock: allocate_lock) Iterator[allocate_lock][source]

Acquire and release the provided lock if indicated by the flag

datalad.utils.make_tempfile(content: str | bytes | None = None, wrapped: Callable[..., Any] | None = None, **tkwargs: Any) Iterator[str][source]

Helper class to provide a temporary file name and remove it at the end (context manager)

  • mkdir (bool, optional (default: False)) – If True, temporary directory created using tempfile.mkdtemp()

  • content (str or bytes, optional) – Content to be stored in the file created

  • wrapped (function, optional) – If set, function name used to prefix temporary file name

  • **tkwargs – All other arguments are passed into the call to tempfile.mk{s,d}temp(), and the resultant temporary filename is passed as the first argument into the wrapped function. If no ‘prefix’ argument is provided, it will be constructed using module and function names (‘.’ replaced with ‘_’).

  • To change the used directory without providing keyword argument ‘dir’, set the DATALAD_TESTS_TEMP_DIR environment variable.



>>> from os.path import exists
>>> from datalad.utils import make_tempfile
>>> with make_tempfile() as fname:
...    k = open(fname, 'w').write('silly test')
>>> assert not exists(fname)  # was removed
>>> with make_tempfile(content="blah") as fname:
...    assert open(fname).read() == "blah"
datalad.utils.map_items(func, v)[source]

A helper to apply func to all elements (keys and values) within dict

No type checking of values passed to func is done, so func should be resilient to values which it should not handle

Initial use case: apply_recursive(url_fragment, ensure_unicode)

datalad.utils.md5sum(filename: str | Path) str[source]

Compute an MD5 sum for the given file

datalad.utils.never_fail(f: Callable[[P], T]) Callable[[P], T | None][source]

Assure that function never fails – all exceptions are caught

Returns None if function fails internally.

datalad.utils.not_supported_on_windows(msg: str | None = None) None[source]

A little helper to be invoked to consistently fail whenever functionality is not supported (yet) on Windows

datalad.utils.nothing_cm() Iterator[None][source]

Just a dummy context manager to programmatically switch context managers

datalad.utils.obtain_write_permission(path: Path) int | None[source]

Obtains write permission for path and returns previous mode if a change was actually made.


path (Path) – path to try to obtain write permission for


Previous mode of path as returned by stat().st_mode if a change in permission was actually necessary, None otherwise.

Return type:

int or None

datalad.utils.open_r_encdetect(fname: str | Path, readahead: int = 1000) IO[str][source]

Return a file object in read mode with auto-detected encoding

This is helpful when dealing with files of unknown encoding.


readahead (int, optional) – How many bytes to read for guessing the encoding type. If negative, the full file will be read


allows a decorator to take optional positional and keyword arguments. Assumes that taking a single, callable, positional argument means that it is decorating a function, i.e. something like this:

def function(): pass

Calls decorator with decorator(f, *args, **kwargs)

datalad.utils.partition(items: Iterable[T], predicate: Callable[[T], Any] = <class 'bool'>) tuple[Iterator[T], Iterator[T]][source]

Partition items by predicate.

  • items (iterable) –

  • predicate (callable) – A function that will be mapped over each element in items. The elements will partitioned based on whether the return value is false or true.


  • A tuple with two generators, the first for ‘false’ items and the second for ‘true’ ones.


Taken from Peter Otten’s snippet posted at
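The classic recipe can be sketched with itertools.tee (hypothetical name `partition_sketch`; a sketch of the described behavior, not necessarily datalad's exact code):

```python
from itertools import tee

def partition_sketch(items, predicate=bool):
    """Sketch: return two generators over items, the first yielding
    elements where predicate is falsey, the second where it is truthy."""
    a, b = tee((predicate(item), item) for item in items)
    return ((item for pred, item in a if not pred),
            (item for pred, item in b if pred))
```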

datalad.utils.path_is_subpath(path: str, prefix: str) bool[source]

Return True if path is a subpath of prefix

It will return False if path == prefix.

  • path (str) –

  • prefix (str) –

datalad.utils.path_startswith(path: str, prefix: str) bool[source]

Return True if path starts with prefix path

  • path (str) –

  • prefix (str) –

datalad.utils.posix_relpath(path: str | Path, start: str | Path | None = None) str[source]

Behave like os.path.relpath, but always return POSIX paths…

on any platform.

datalad.utils.quote_cmdlinearg(arg: str) str[source]

Perform platform-appropriate argument quoting

datalad.utils.read_csv_lines(fname: str | Path, dialect: str | None = None, readahead: int = 16384, **kwargs: Any) Iterator[dict[str, str]][source]

A generator of dict records from a CSV/TSV

Automatically guesses the encoding for each record to convert to UTF-8

  • fname (str) – Filename

  • dialect (str, optional) – Dialect to specify to csv.reader. If not specified – guessed from the file, if fails to guess, “excel-tab” is assumed

  • readahead (int, optional) – How many bytes to read from the file to guess the type

  • **kwargs – Passed to csv.reader

datalad.utils.read_file(fname: str | Path, decode: Literal[True] = True) str[source]
datalad.utils.read_file(fname: str | Path, decode: Literal[False]) bytes

A helper to read file passing content via ensure_unicode


decode (bool, optional) – if False, no ensure_unicode and file content returned as bytes

datalad.utils.rmdir(path: str | Path, *args: Any, **kwargs: Any) None[source]

os.rmdir with our optional checking for open files

datalad.utils.rmtemp(f: str | Path, *args: Any, **kwargs: Any) None[source]

Wrapper to centralize removing of temp files so we could keep them around

It will not remove the temporary file/directory if DATALAD_TESTS_TEMP_KEEP environment variable is defined

datalad.utils.rmtree(path: str | Path, chmod_files: bool | Literal['auto'] = 'auto', children_only: bool = False, *args: Any, **kwargs: Any) None[source]

To remove git-annex .git it is needed to make all files and directories writable again first

  • path (Path or str) – Path to remove

  • chmod_files (string or bool, optional) – Whether to make files writable also before removal. Usually it is just a matter of directories to have write permissions. If ‘auto’ it would chmod files on windows by default

  • children_only (bool, optional) – If set, all files and subdirectories would be removed while the path itself (must be a directory) would be preserved

  • *args

  • **kwargs – Passed into shutil.rmtree call

datalad.utils.rotree(path: str | Path, ro: bool = True, chmod_files: bool = True) None[source]

To make tree read-only or writable

  • path (string) – Path to the tree/directory to chmod

  • ro (bool, optional) – Whether to make it R/O (default) or RW

  • chmod_files (bool, optional) – Whether to operate also on files (not just directories)

datalad.utils.saved_generator(gen: Iterable[T]) tuple[Iterator[T], Iterator[T]][source]

Given a generator, returns two generators, where the 2nd one just replays

So the first one would be going through the generated items and 2nd one would be yielding saved items
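The save-and-replay idea can be sketched as follows (hypothetical name `saved_generator_sketch`; a simplified sketch of the described behavior):

```python
def saved_generator_sketch(gen):
    """Sketch: the first generator consumes gen and records items;
    the second replays whatever has been saved so far."""
    saved = []

    def consume():
        for item in gen:
            saved.append(item)
            yield item

    def replay():
        for item in saved:
            yield item

    return consume(), replay()
```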

datalad.utils.shortened_repr(value: Any, l: int = 30) str[source]
datalad.utils.slash_join(base: str | None, extension: str | None) str | None[source]

Join two strings with a ‘/’, avoiding duplicate slashes

If any of the strings is None the other is returned as is.
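A sketch of this joining logic (hypothetical name `slash_join_sketch`):

```python
def slash_join_sketch(base, extension):
    """Sketch: join with a single '/', passing through a None partner."""
    if base is None:
        return extension
    if extension is None:
        return base
    return base.rstrip('/') + '/' + extension.lstrip('/')
```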

datalad.utils.split_cmdline(s: str) list[str][source]

Perform platform-appropriate command line splitting.

Identical to shlex.split() on non-windows platforms.

Modified from

datalad.utils.swallow_logs(new_level: str | int | None = None, file_: str | Path | None = None, name: str = 'datalad') Iterator[SwallowLogsAdapter][source]

Context manager to consume all logs.

datalad.utils.swallow_outputs() Iterator[SwallowOutputsAdapter][source]

Context manager to help consuming both stdout and stderr, and print()

stdout is available as cm.out and stderr as cm.err whenever cm is the yielded context manager. Internally uses temporary files to guarantee absent side-effects of swallowing into StringIO which lacks .fileno.

print mocking is necessary for some uses where sys.stdout was already bound to the original sys.stdout, thus mocking it later had no effect. Overriding the print function had the desired effect

datalad.utils.todo_interface_for_extensions(f: T) T[source]
datalad.utils.try_multiple(ntrials: int, exception: type[BaseException], base: float, f: Callable[P, T], *args: P.args, **kwargs: P.kwargs) T[source]

Call f multiple times making exponentially growing delay between the calls
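The retry-with-backoff pattern can be sketched as follows (hypothetical name `try_multiple_sketch`; the exact delay schedule, `base ** trial`, is an assumption):

```python
import time

def try_multiple_sketch(ntrials, exception, base, f, *args, **kwargs):
    """Sketch: retry f up to ntrials times, sleeping base ** trial
    seconds between attempts; re-raise after the last failure."""
    for trial in range(1, ntrials + 1):
        try:
            return f(*args, **kwargs)
        except exception:
            if trial == ntrials:
                raise
            time.sleep(base ** trial)
```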

datalad.utils.try_multiple_dec(f: Callable[P, T], ntrials: int | None = None, duration: float = 0.1, exceptions: type[BaseException] | tuple[type[BaseException], ...] | None = None, increment_type: Literal['exponential'] | None = None, exceptions_filter: Callable[[BaseException], Any] | None = None, logger: Callable | None = None) Callable[P, T][source]

Decorator to try function multiple times.

Main purpose is to decorate functions dealing with removal of files/directories and which might need a few seconds to work correctly on Windows which takes its time to release files/directories.

  • ntrials (int, optional) –

  • duration (float, optional) – Seconds to sleep before retrying.

  • increment_type ({None, 'exponential'}) – Note that if it is exponential, duration should typically be > 1.0 so it grows with higher power

  • exceptions (Exception or tuple of Exceptions, optional) – Exception or a tuple of multiple exceptions, on which to retry

  • exceptions_filter (callable, optional) – If provided, this function will be called with a caught exception instance. If function returns True - we will re-try, if False - exception will be re-raised without retrying.

  • logger (callable, optional) – Logger to log upon failure. If not provided, will use stock logger at the level of 5 (heavy debug).

datalad.utils.unique(seq: Sequence[T], key: Callable[[T], Any] | None = None, reverse: bool = False) list[T][source]

Given a sequence return a list only with unique elements while maintaining order

This is the fastest solution. Enhancement: added the ability to compare for uniqueness using a key function.

  • seq – Sequence to analyze

  • key (callable, optional) – Function to call on each element so we could decide not on a full element, but on its member etc

  • reverse (bool, optional) – If True, uniqueness is checked in reverse order, so that later duplicates take precedence in the resulting order
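The order-preserving dedup can be sketched with a seen-set (hypothetical name `unique_sketch`; a simplified sketch of the described behavior):

```python
def unique_sketch(seq, key=None, reverse=False):
    """Sketch: order-preserving dedup via a seen-set, optionally
    keyed; with reverse=True the later duplicates win."""
    seen = set()
    items = reversed(seq) if reverse else seq
    out = []
    for item in items:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            out.append(item)
    return out[::-1] if reverse else out
```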

‘Robust’ unlink. Would try multiple times

On Windows boxes there is evidence for a latency of more than a second until a file is considered no longer “in-use”. WindowsError is not defined on Linux, so an except clause that mentions WindowsError would itself raise NameError there; also see gh-2533

datalad.utils.updated(d: dict[K, V], update: dict[K, V]) dict[K, V][source]

Return a copy of the input with ‘update’ applied

Primarily for updating dictionaries
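A sketch of this non-mutating update (hypothetical name `updated_sketch`; equivalent to `{**d, **update}`):

```python
def updated_sketch(d, update):
    """Sketch: return a new dict with update applied, leaving d intact."""
    out = dict(d)
    out.update(update)
    return out
```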

datalad.utils.with_pathsep(path: str) str[source]

Little helper to guarantee that path ends with /