datalad.utils

class datalad.utils.ArgSpecFake(args, varargs, keywords, defaults)[source]

Bases: NamedTuple

args: list[str]: Alias for field number 0

defaults: Optional[tuple[Any, ...]]: Alias for field number 3

keywords: Optional[str]: Alias for field number 2

varargs: Optional[str]: Alias for field number 1

class datalad.utils.File(name, executable=False)[source]

Bases: object

Helper for a file entry in the create_tree/@with_tree

It allows defining additional settings for entries.

Parameters:

name (str)
executable (bool)

class datalad.utils.SequenceFormatter(separator=' ', element_formatter=<string.Formatter object>, *args, **kwargs)[source]

Bases: Formatter

string.Formatter subclass with special behavior for sequences.

This class delegates formatting of individual elements to another formatter object. Non-list objects are formatted by calling the delegate formatter’s “format_field” method. List-like objects (list, tuple, set, frozenset) are formatted by formatting each element of the list according to the specified format spec using the delegate formatter and then joining the resulting strings with a separator (space by default).

Parameters:

separator (str)
element_formatter (Formatter)
args (Any)
kwargs (Any)

format_element(elem, format_spec)[source]

Format a single element

For sequences, this is called once for each element in a sequence. For anything else, it is called on the entire object. It is intended to be overridden in subclasses.

Parameters:

elem (Any)
format_spec (str)

Return type:

Any

format_field(value, format_spec)[source]

Parameters:

value (Any)
format_spec (str)

Return type:

Any
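The delegation described for SequenceFormatter can be illustrated with a small standalone class. This is only a sketch built on the standard library's string.Formatter; the name SequenceFormatterSketch is hypothetical and the code is not datalad's implementation.

```python
import string
from typing import Any


class SequenceFormatterSketch(string.Formatter):
    """Hypothetical stand-in illustrating the delegation described above."""

    def __init__(self, separator: str = " ",
                 element_formatter: string.Formatter = string.Formatter()):
        super().__init__()
        self.separator = separator
        self.element_formatter = element_formatter

    def format_element(self, elem: Any, format_spec: str) -> Any:
        # Delegate a single element to the wrapped formatter.
        return self.element_formatter.format_field(elem, format_spec)

    def format_field(self, value: Any, format_spec: str) -> Any:
        # List-like values are formatted element by element, then joined.
        if isinstance(value, (list, tuple, set, frozenset)):
            return self.separator.join(
                self.format_element(v, format_spec) for v in value
            )
        # Anything else is formatted as a whole.
        return self.format_element(value, format_spec)


fmt = SequenceFormatterSketch(separator=", ")
joined = fmt.format("{0:>3}", [1, 22, 333])  # spec applied per element
single = fmt.format("{0:>3}", 7)             # spec applied to the whole value
```

Overriding format_element in a subclass would change how each element is rendered without touching the joining logic.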

class datalad.utils.SwallowLogsAdapter(file_)[source]

Bases: object

Little adapter to help getting out values

And to stay consistent with how swallow_outputs behaves

Parameters:: file_ (str | Path | None)

assert_logged(msg=None, level=None, regex=True, **kwargs)[source]

Provide assertion on whether a msg was logged at a given level

If neither msg nor level provided, checks if anything was logged at all.

Parameters:

msg (str, optional) – Message (as a regular expression, if regex) to be searched. If no msg provided, checks if anything was logged at a given level.
level (str, optional) – String representing the level to be logged
regex (bool, optional) – If False, regular assert_in is used
**kwargs (str, optional) – Passed to assert_re_in or assert_in

Return type:

None

cleanup()[source]

Return type:: None

property handle: IO[str]

property lines: list[str]

property out: str

class datalad.utils.SwallowOutputsAdapter[source]

Bases: object

Little adapter to help getting out/err values

cleanup()[source]

Return type:: None

property err: str

property handles: tuple[TextIO, TextIO]

property out: str

datalad.utils.any_re_search(regexes, value)[source]

Return whether any of regexes (a list or a str) searches successfully for value

Parameters:

regexes (str | list[str])
value (str)

Return type:

bool

datalad.utils.assert_no_open_files(path)[source]

Parameters:: path (str | Path)
Return type:: None

datalad.utils.assure_bool(s)

Note: This function is deprecated. Use ensure_bool instead.

Parameters:: s (Any)
Return type:: bool

datalad.utils.assure_bytes(s, encoding='utf-8')

Note: This function is deprecated. Use ensure_bytes instead.

Parameters:

s (str | bytes)
encoding (str)

Return type:

bytes

datalad.utils.assure_dict_from_str(s, sep='\\n')

Note: This function is deprecated. Use ensure_dict_from_str instead.

Parameters:

s (str | dict[TypeVar(K), TypeVar(V)])
sep (str)

Return type:

Union[dict[str, str], None, dict[TypeVar(K), TypeVar(V)]]

datalad.utils.assure_dir(*args)

Note: This function is deprecated. Use ensure_dir instead.

Parameters:: args (str)
Return type:: str

datalad.utils.assure_iter(s, cls, copy=False, iterate=True)

Note: This function is deprecated. Use ensure_iter instead.

Parameters:

s (Any)
cls (type[TypeVar(ListOrSet, list, set)])
copy (bool)
iterate (bool)

Return type:

TypeVar(ListOrSet, list, set)

datalad.utils.assure_list(s, copy=False, iterate=True)

Note: This function is deprecated. Use ensure_list instead.

Parameters:

s (Any)
copy (bool)
iterate (bool)

Return type:

list

datalad.utils.assure_list_from_str(s, sep='\\n')

Note: This function is deprecated. Use ensure_list_from_str instead.

Parameters:

s (str | list[TypeVar(T)])
sep (str)

Return type:

Union[list[str], None, list[TypeVar(T)]]

datalad.utils.assure_tuple_or_list(obj)

Note: This function is deprecated. Use ensure_tuple_or_list instead.

Parameters:: obj (Any)
Return type:: list | tuple

datalad.utils.assure_unicode(s, encoding=None, confidence=None)

Note: This function is deprecated. Use ensure_unicode instead.

Parameters:

s (str | bytes)
encoding (Optional[str])
confidence (Optional[float])

Return type:

str

datalad.utils.auto_repr(cls, short=True)[source]

Decorator for a class to assign it an automagic quick and dirty __repr__

It uses public class attributes to prepare repr of a class

Original idea: http://stackoverflow.com/a/27799004/1265472

Parameters:

cls (type[TypeVar(T)])
short (bool)

Return type:

type[TypeVar(T)]

datalad.utils.bytes2human(n, format='%(value).1f %(symbol)sB')[source]

Convert n bytes into a human readable string based on format. symbols can be either “customary”, “customary_ext”, “iec” or “iec_ext”, see: http://goo.gl/kTQMs

>>> from datalad.utils import bytes2human
>>> bytes2human(1)
'1.0 B'
>>> bytes2human(1024)
'1.0 KB'
>>> bytes2human(1048576)
'1.0 MB'
>>> bytes2human(1099511627776127398123789121)
'909.5 YB'

>>> bytes2human(10000, "%(value).1f %(symbol)s/sec")
'9.8 K/sec'

>>> # precision can be adjusted by playing with %f operator
>>> bytes2human(10000, format="%(value).5f %(symbol)s")
'9.76562 K'

Taken from http://goo.gl/kTQMs and subsequently simplified.

Original Author: Giampaolo Rodola’ <g.rodola [AT] gmail [DOT] com>

License: MIT

Parameters:

n (int | float)
format (str)

Return type:

str

datalad.utils.check_symlink_capability(path, target)[source]

helper similar to datalad.tests.utils_pytest.has_symlink_capability

However, for use in a datalad command context, we shouldn’t assume that we are able to write to a tmpfile, nor import a whole lot from datalad’s test machinery. Finally, we want to know whether we can create a symlink at a specific location, not just somewhere. Therefore, an arbitrary path is used to test-build a symlink, which is deleted afterwards. A suitable location can therefore be determined by higher-level code.

Parameters:

path (Path)
target (Path)

Return type:

bool

class datalad.utils.chpwd(path, mkdir=False, logsuffix='')[source]

Bases: object

Wrapper around os.chdir which also adjusts environ[‘PWD’]

The reason is that otherwise PWD is simply inherited from the shell and we have no ability to assess directory path without dereferencing symlinks.

If used as a context manager, it temporarily changes directory to the given path.

Parameters:

path (str | Path | None)
mkdir (bool)
logsuffix (str)
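The context-manager behaviour can be sketched with the standard library alone. The helper name chpwd_sketch is hypothetical, and the sketch omits the mkdir and logsuffix options; it only shows the idea of keeping os.environ['PWD'] in sync with os.chdir and restoring both afterwards.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def chpwd_sketch(path):
    """Hypothetical helper: chdir to path and keep os.environ['PWD'] in sync."""
    old_cwd = os.getcwd()
    old_pwd = os.environ.get("PWD")
    os.chdir(path)
    os.environ["PWD"] = str(path)
    try:
        yield
    finally:
        # Restore both the working directory and the PWD variable.
        os.chdir(old_cwd)
        if old_pwd is None:
            os.environ.pop("PWD", None)
        else:
            os.environ["PWD"] = old_pwd


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    expected_cwd = os.path.realpath(tmp)
    with chpwd_sketch(tmp):
        inside_pwd = os.environ["PWD"]
        inside_cwd = os.path.realpath(os.getcwd())
    restored = os.getcwd()
```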

datalad.utils.collect_method_callstats(func)[source]

Figure out methods which call the method repeatedly on the same instance

Use case(s):

.repo is expensive since it does all kinds of checks.
.config is expensive transitively since it calls .repo each time.

Todo

A fancier version could look through the stack for the same id(self) to see if that location is already in the memo. That would hint at cases where the object is not passed into underlying functions, causing them to redo the same work over and over again.
ATM it might flood with all “1 lines” calls, which are not that informative. The underlying, possibly suboptimal use might be coming from their callers. It might or might not relate to the previous TODO.

Parameters:: func (Callable[[ParamSpec(P)], TypeVar(T)])
Return type:: Callable[[ParamSpec(P)], TypeVar(T)]

datalad.utils.create_tree(path, tree, archives_leading_dir=True, remove_existing=False)[source]

Given a list of tuples (name, load) create such a tree

If load is itself a tuple, either a subtree is created or, if name ends with .tar.gz, an archive with that content is created and placed into the tree.

Parameters:

path (str)
tree (Union[Tuple[Tuple[Union[str, File], Union[str, bytes, TreeSpec]], ...], List[Tuple[Union[str, File], Union[str, bytes, TreeSpec]]], Dict[Union[str, File], Union[str, bytes, TreeSpec]]])
archives_leading_dir (bool)
remove_existing (bool)

Return type:

None

datalad.utils.create_tree_archive(path, name, load, overwrite=False, archives_leading_dir=True)[source]

Given an archive name, create it under path with the specified load tree

Parameters:

path (str)
name (str)
load (Union[Tuple[Tuple[Union[str, File], Union[str, bytes, TreeSpec]], ...], List[Tuple[Union[str, File], Union[str, bytes, TreeSpec]]], Dict[Union[str, File], Union[str, bytes, TreeSpec]]])
overwrite (bool)
archives_leading_dir (bool)

Return type:

None

datalad.utils.decode_input(s)[source]

Given input string/bytes, decode according to the stdin codepage (or UTF-8 if not defined)

If decoding fails, issue a warning and decode with errors being replaced

Parameters:: s (str | bytes)
Return type:: str

datalad.utils.disable_logger(logger=None)[source]

context manager to temporarily disable logging

This is to provide one of swallow_logs’ purposes without unnecessarily creating temp files (see gh-1865)

Parameters:: logger (Logger) – Logger whose handlers will be ordered to not log anything. Default: datalad’s topmost Logger (‘datalad’)
Return type:: Iterator[Logger]

datalad.utils.dlabspath(path, norm=False)[source]

Symlinks-in-the-cwd aware abspath

os.path.abspath relies on os.getcwd() which would not know about symlinks in the path

TODO: we might want to norm=True by default to match behavior of os.path.abspath?

Parameters:

path (str | Path)
norm (bool)

Return type:

str

datalad.utils.encode_filename(filename)[source]

Encode unicode filename

Parameters:: filename (str | bytes)
Return type:: bytes

datalad.utils.ensure_bool(s)[source]

Convert value into boolean, following the convention for strings: recognize on, True, yes as True, and off, False, no as False

Parameters:: s (Any)
Return type:: bool
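A minimal sketch of the string convention follows. The function name ensure_bool_sketch is hypothetical. The case-insensitive comparison and the ValueError for unrecognized strings are assumptions of this sketch rather than documented behaviour.

```python
def ensure_bool_sketch(s):
    """Hypothetical illustration of the on/True/yes vs off/False/no convention."""
    if isinstance(s, str):
        lowered = s.strip().lower()  # assumption: matching ignores case
        if lowered in {"on", "true", "yes"}:
            return True
        if lowered in {"off", "false", "no"}:
            return False
        # Assumption: unknown strings are rejected rather than guessed at.
        raise ValueError(f"Do not know how to treat {s!r} as a boolean")
    # Non-string values fall back to ordinary truthiness.
    return bool(s)
```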

datalad.utils.ensure_bytes(s, encoding='utf-8')[source]

Convert/encode unicode string to bytes.

If s isn’t a string, return it as is.

Parameters:

encoding (str, optional) – Encoding to use. “utf-8” is the default
s (str | bytes)

Return type:

bytes

datalad.utils.ensure_dict_from_str(s, sep='\\n')[source]

Given a multiline string with key=value items convert it to a dictionary

Parameters:

s (str or dict) – Returns None if input s is empty
sep (str)

Return type:

Union[dict[str, str], None, dict[TypeVar(K), TypeVar(V)]]
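The conversion can be sketched as below. The name dict_from_str_sketch is hypothetical. Splitting on the first "=" only, skipping blank items and raising ValueError for items without "=" are assumptions of the sketch.

```python
def dict_from_str_sketch(s, sep="\n"):
    """Hypothetical sketch: turn 'key=value' items into a dict."""
    if not s:
        return None          # empty input yields None
    if isinstance(s, dict):
        return s             # dicts are passed through unchanged
    out = {}
    for item in s.split(sep):
        if not item.strip():
            continue         # assumption: blank items are ignored
        if "=" not in item:
            raise ValueError(f"{item!r} is not in key=value format")
        key, value = item.split("=", 1)  # assumption: split on the first '='
        out[key] = value
    return out
```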

datalad.utils.ensure_dir(*args)[source]

Make sure directory exists.

Joins the list of arguments to an os-specific path to the desired directory and creates it if it does not exist yet.

Parameters:: args (str)
Return type:: str

datalad.utils.ensure_iter(s, cls, copy=False, iterate=True)[source]

If s is not already a list, place it into a list. If s is None, an empty list is returned

Parameters:

s (list or anything)
cls (class) – Which iterable class to ensure
copy (bool, optional) – If correct iterable is passed, it would generate its shallow copy
iterate (bool, optional) – If it is not a list, but something iterable (but not a str) iterate over it.

Return type:

TypeVar(ListOrSet, list, set)

datalad.utils.ensure_list(s, copy=False, iterate=True)[source]

If s is not already a list, place it into a list. If s is None, an empty list is returned

Parameters:

s (list or anything)
copy (bool, optional) – If list is passed, it would generate a shallow copy of the list
iterate (bool, optional) – If it is not a list, but something iterable (but not a str) iterate over it.

Return type:

list
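The parameter descriptions for ensure_list suggest roughly the following logic. This is a standalone sketch with the hypothetical name ensure_list_sketch, not the library code.

```python
def ensure_list_sketch(s, copy=False, iterate=True):
    """Hypothetical sketch of the list-wrapping behaviour described above."""
    if isinstance(s, list):
        # A list is returned as is, or as a shallow copy if requested.
        return s[:] if copy else s
    if isinstance(s, str):
        return [s]           # strings are never iterated over
    if iterate and hasattr(s, "__iter__"):
        return list(s)       # other iterables are expanded
    if s is None:
        return []
    return [s]
```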

datalad.utils.ensure_list_from_str(s, sep='\\n')[source]

Given a multiline string, convert it to a list, or return None if empty

Parameters:

s (str or list)
sep (str)

Return type:

Union[list[str], None, list[TypeVar(T)]]

datalad.utils.ensure_result_list(r)[source]

Return a list of result records

Largely the same as ensure_list, but special-casing a single dict being passed in, which a plain ensure_list would iterate over. Hence, this deals with the three ways datalad commands return results:

single dict
list of dicts
generator

Used for result assertion helpers.

Parameters:: r (Any)
Return type:: list

datalad.utils.ensure_tuple_or_list(obj)[source]

Given an object, wrap into a tuple if not list or tuple

Parameters:: obj (Any)
Return type:: list | tuple

datalad.utils.ensure_unicode(s, encoding=None, confidence=None)[source]

Convert/decode bytestring to unicode.

If s isn’t a bytestring, return it as is.

Parameters:

encoding (str, optional) – Encoding to use. If None, “utf-8” is tried, and then if not a valid UTF-8, encoding will be guessed
confidence (float, optional) – A value between 0 and 1, so if guessing of encoding is of lower than specified confidence, ValueError is raised
s (str | bytes)

Return type:

str

datalad.utils.ensure_write_permission(path)[source]

Context manager to get write permission on path and restore original mode afterwards.

Parameters:: path (Path) – path to the target file
Raises:: PermissionError – if write permission could not be obtained
Return type:: Iterator[None]

datalad.utils.escape_filename(filename)[source]

Surround filename in “” and escape “ in the filename

Parameters:: filename (str)
Return type:: str

datalad.utils.expandpath(path, force_absolute=True)[source]

Expand all variables and user handles in a path.

By default return an absolute path

Parameters:

path (str | Path)
force_absolute (bool)

Return type:

str

datalad.utils.file_basename(name, return_ext=False)[source]

Strips up to 2 extensions, each up to 4 characters long and starting with a letter rather than a digit, so we can get rid of .tar.gz etc.

Parameters:

name (str | Path)
return_ext (bool)

Return type:

str | tuple[str, str]

datalad.utils.find_files(regex, topdir='.', exclude=None, exclude_vcs=True, exclude_datalad=False, dirs=False)[source]

Generator to find files matching regex

Parameters:

regex (string)
exclude (string, optional) – Matches to exclude
exclude_vcs (bool) – If True, excludes commonly known VCS subdirectories. If string, used as regex to exclude those files (regex: ‘/\.(?:git|gitattributes|svn|bzr|hg)(?:/|$)’)
exclude_datalad (bool) – If True, excludes files known to be datalad meta-data files (e.g. under .datalad/ subdirectory) (regex: ‘/\.(?:datalad)(?:/|$)’)
topdir (string, optional) – Directory where to search
dirs (bool, optional) – Whether to match directories as well as files

Return type:

Iterator[str]

datalad.utils.generate_chunks(container, size)[source]

Given a container, generate chunks from it with size up to size

Parameters:

container (list[TypeVar(T)])
size (int)

Return type:

Iterator[list[TypeVar(T)]]
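Chunking a list by slices is one straightforward way to get this behaviour. The sketch below uses the hypothetical name generate_chunks_sketch and assumes that a non-positive size should be rejected.

```python
def generate_chunks_sketch(container, size):
    """Hypothetical sketch: yield consecutive slices of at most `size` items."""
    if size <= 0:
        # Assumption of this sketch: a non-positive size is an error.
        raise ValueError("size must be positive")
    for start in range(0, len(container), size):
        yield container[start:start + size]


chunks = list(generate_chunks_sketch([1, 2, 3, 4, 5], 2))
```

The last chunk may be shorter than size, which is what "up to size" implies.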

datalad.utils.generate_file_chunks(files, cmd=None)[source]

Given a list of files, generate chunks of them to avoid exceeding cmdline length

Parameters:

files (list of str)
cmd (str or list of str, optional) – Command to account for as well

Return type:

Iterator[list[str]]

datalad.utils.get_dataset_root(path)[source]

Return the root of an existent dataset containing a given path

The root path is returned in the same absolute or relative form as the input argument. If no associated dataset exists, or the input path doesn’t exist, None is returned.

If path is a symlink or something other than a directory, the root dataset containing its parent directory will be reported. If none can be found and a symlink at path points to a dataset, path itself will be reported as the root.

Parameters:: path (Path-like)
Return type:: str or None

datalad.utils.get_encoding_info()[source]

Return a dictionary with various encoding/locale information

Return type:: dict[str, str]

datalad.utils.get_envvars_info()[source]

Return type:: dict[str, str]

datalad.utils.get_home_envvars(new_home)[source]

Return dict with env variables to be adjusted for a new HOME

Only variables found in current os.environ are adjusted.

Parameters:: new_home (str or Path) – New home path, in the “schema” native to the OS
Return type:: dict[str, str]

datalad.utils.get_ipython_shell()[source]

Detect if running within IPython and return its ip (shell) object

Returns None if not under ipython (no get_ipython function)

Return type:: Optional[Any]

datalad.utils.get_linux_distribution()[source]

Compatibility wrapper for {platform,distro}.linux_distribution().

Return type:: tuple[str, str, str]

datalad.utils.get_logfilename(dspath, cmd='datalad')[source]

Return a filename to use for logging under a dataset/repository

The directory is created if it doesn’t exist, but dspath must exist and be a directory

Parameters:

dspath (str | Path)
cmd (str)

Return type:

str

datalad.utils.get_open_files(path, log_open=False)[source]

Get open files under a path

Note: This function is very slow on Windows.

Parameters:

path (str) – File or directory to check for open files under
log_open (bool or int) – If set - logger level to use

Returns:

path : pid

Return type:

dict

datalad.utils.get_path_prefix(path, pwd=None)[source]

Get path prefix (for current directory)

Returns the relative path to the topdir if we are under topdir, and the absolute path to topdir otherwise. If pwd is not specified, the current directory is assumed

Parameters:

path (str | Path)
pwd (Optional[str])

Return type:

str

datalad.utils.get_sig_param_names(f, kinds)[source]

A helper to selectively return parameters from inspect.signature.

inspect.signature is the ultimate way for introspecting callables. But its interface is not so convenient for a quick selection of parameters (AKA arguments) of desired type or combinations of such. This helper should make it easier to retrieve desired collections of parameters.

Since often it is desired to get information about multiple specific types of parameters, kinds is a list, so in a single invocation of signature and looping through the results we can obtain all information.

Parameters:

f (callable)
kinds (tuple with values from {'pos_any', 'pos_only', 'kw_any', 'kw_only', 'any'}) – A tuple of the kinds of args to return in the result (tuple). Each element should be one of: ‘pos_any’ - positional or keyword which could be used positionally; ‘kw_only’ - keyword only (cannot be used positionally) arguments; ‘kw_any’ - any keyword (could be a positional which could be used as a keyword); ‘any’ - any type from the above.

Returns:

Each element is a list of parameters (names only) of that “kind”.

Return type:

tuple

datalad.utils.get_suggestions_msg(values, known, sep='\\n ')[source]

Return a formatted string with suggestions for values given the known ones

Parameters:

values (Union[str, Iterable[str], None])
known (str)
sep (str)

Return type:

str

datalad.utils.get_tempfile_kwargs(tkwargs=None, prefix='', wrapped=None)[source]

Updates kwargs to be passed to tempfile calls, depending on env vars

Parameters:

tkwargs (Optional[dict[str, Any]])
prefix (str)
wrapped (Optional[Callable])

Return type:

dict[str, Any]

datalad.utils.get_timestamp_suffix(time_=None, prefix='-')[source]

Return a time stamp (full date and time up to second)

primarily to be used for generation of log file names

Parameters:

time_ (Union[int, struct_time, None])
prefix (str)

Return type:

str

datalad.utils.get_trace(edges, start, end, trace=None)[source]

Return the trace/path to reach a node in a tree.

Parameters:

edges (sequence(2-tuple)) – The tree given by a sequence of edges (parent, child) tuples. The nodes can be identified by any value and data type that supports the ‘==’ operation.
start (TypeVar(T)) – Identifier of the start node. Must be present as a value in the parent location of an edge tuple in order to be found.
end (TypeVar(T)) – Identifier of the target/end node. Must be present as a value in the child location of an edge tuple in order to be found.
trace (list) – Mostly useful for recursive calls, and used internally.

Returns:

Returns a list with the trace to the target (the start and the target are not included in the trace, hence if start and end are directly connected an empty list is returned), or None when no trace to the target can be found, or start and end are identical.

Return type:

None or list
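The return conventions above (empty list for directly connected nodes, None when nothing is found or when start equals end) are easy to mix up, so here is a small recursive sketch. The name get_trace_sketch is hypothetical and the depth-first strategy is an assumption, not necessarily how the library searches.

```python
def get_trace_sketch(edges, start, end, trace=None):
    """Hypothetical depth-first sketch over (parent, child) edge tuples."""
    if trace is None:
        trace = []
    if start == end:
        return None
    # Follow only edges that leave the current end of the trace.
    current = trace[-1] if trace else start
    for parent, child in edges:
        if parent != current or child in trace:
            continue  # unrelated edge, or one that would form a cycle
        if child == end:
            return trace  # intermediate nodes only
        found = get_trace_sketch(edges, start, end, trace + [child])
        if found is not None:
            return found
    return None


edges = [(1, 2), (2, 3), (3, 4), (1, 5)]
direct = get_trace_sketch(edges, 1, 2)     # directly connected
indirect = get_trace_sketch(edges, 1, 4)   # via nodes 2 and 3
missing = get_trace_sketch(edges, 4, 1)    # edges are directed
```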

datalad.utils.get_wrapped_class(wrapped)[source]

Determine the command class a wrapped __call__ belongs to

Parameters:: wrapped (Callable)
Return type:: type

datalad.utils.getargspec(func, *, include_kwonlyargs=False)[source]

Compat shim for getargspec deprecated in python 3.

The main difference from inspect.getargspec (and inspect.getfullargspec for that matter) is that by using inspect.signature we are providing correct args/defaults for functools.wraps’ed functions.

include_kwonlyargs option was added to centralize getting all args, even the ones which are kwonly (follow the *,).

For internal use and not advised for use in 3rd party code. Please use inspect.signature directly.

Parameters:

func (Callable[..., Any])
include_kwonlyargs (bool)

Return type:

ArgSpecFake

datalad.utils.getpwd()[source]

Try to return a CWD without dereferencing possible symlinks

This function will try to use the PWD environment variable to provide a current working directory, possibly with some directories along the path being symlinks to other directories. Unfortunately, PWD is used/set only by the shell, and such functions as os.chdir and os.getcwd neither use nor modify it, thus os.getcwd() returns the path with links dereferenced.

While returning current working directory based on PWD env variable we verify that the directory is the same as os.getcwd() after resolving all symlinks. If that verification fails, we fall back to always use os.getcwd().

Initial decision to either use PWD env variable or os.getcwd() is done upon the first call of this function.

Return type:: str

datalad.utils.guard_for_format(arg)[source]

Replace { and } with {{ and }}

To be used in cases where arg is not expected to contain user-provided .format() placeholders, but ‘arg’ might become part of a composite passed to .format(), e.g. via ‘Run’

Parameters:: arg (str)
Return type:: str
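The effect of doubling the braces is easiest to see when the guarded string is embedded in a larger template. The sketch below uses the hypothetical name guard_for_format_sketch; the command string is made up for illustration.

```python
def guard_for_format_sketch(arg):
    """Hypothetical sketch: double the braces so .format() leaves them alone."""
    return arg.replace("{", "{{").replace("}", "}}")


guarded = guard_for_format_sketch("echo {not_a_placeholder}")
# The guarded text survives a later .format() call on the composite string.
command = ("{prefix} " + guarded).format(prefix="run:")
```

Without the guard, the .format() call would raise a KeyError for the unknown placeholder.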

datalad.utils.import_module_from_file(modpath, pkg=None, log=<bound method Logger.debug of <Logger datalad.utils (INFO)>>)[source]

Import provided module given a path

TODO:

RF/make use of it in pipeline.py which has similar logic
join with import_modules above?

Parameters:

pkg (module, optional) – If provided, and modpath is under pkg.__path__, relative import will be used
modpath (str)
log (Callable[[str], Any])

Return type:

ModuleType

datalad.utils.import_modules(modnames, pkg, msg='Failed to import {module}', log=<bound method Logger.debug of <Logger datalad.utils (INFO)>>)[source]

Helper to import a list of modules without failing if N/A

Parameters:

modnames (list of str) – List of module names to import
pkg (str) – Package under which to import
msg (str, optional) – Message template for .format() to log at DEBUG level if import fails. Keys {module} and {package} will be provided and ‘: {exception}’ appended
log (callable, optional) – Logger call to use for logging messages

Return type:

list[ModuleType]

datalad.utils.is_explicit_path(path)[source]

Return whether a path explicitly points to a location

Any absolute path, or relative path starting with either ‘../’ or ‘./’ is assumed to indicate a location on the filesystem. Any other path format is not considered explicit.

Parameters:: path (str | Path)
Return type:: bool
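The rule can be sketched for plain strings as follows. The name is_explicit_path_sketch is hypothetical, and the sketch assumes POSIX-style separators and str input only.

```python
import os.path


def is_explicit_path_sketch(path):
    """Hypothetical sketch: absolute, or relative starting with './' or '../'."""
    # Assumption: `path` is a str using '/' separators.
    return os.path.isabs(path) or path.startswith(("./", "../"))
```

Under this rule a bare name such as "data/file.txt" is not explicit, while "./data/file.txt" is.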

datalad.utils.is_interactive()[source]

Return True if all in/outs are open and tty.

Note that in a somewhat abnormal case where e.g. stdin is explicitly closed, and any operation on it would raise a ValueError(“I/O operation on closed file”) exception, this function would just return False, since the session cannot be used interactively.

Return type:: bool

datalad.utils.join_cmdline(args)[source]

Join command line args into a string using quote_cmdlinearg

Parameters:: args (Iterable[str])
Return type:: str

datalad.utils.knows_annex(path)[source]

Returns whether at a given path there is information about an annex

It is just a thin wrapper around GitRepo.is_with_annex() classmethod which also checks for path to exist first.

This includes actually present annexes, but also uninitialized ones, or even the presence of a remote annex branch.

Parameters:: path (str | Path)
Return type:: bool

datalad.utils.line_profile(func)[source]

Q&D helper to line profile the function and spit out stats

Parameters:: func (Callable[[ParamSpec(P)], TypeVar(T)])
Return type:: Callable[[ParamSpec(P)], TypeVar(T)]

datalad.utils.lmtime(filepath, mtime)[source]

Set mtime for files, while not de-referencing symlinks.

To overcome absence of os.lutime

Works only on linux and OSX ATM

Parameters:

filepath (str | Path)
mtime (int | float)

Return type:

None

datalad.utils.lock_if_required(lock_required, lock)[source]

Acquire and release the provided lock if indicated by a flag

Parameters:

lock_required (bool)
lock (allocate_lock)

Return type:

Iterator[allocate_lock]
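A conditional-locking context manager of this kind might look like the sketch below. The name lock_if_required_sketch is hypothetical, and a threading.Lock stands in for the lock object.

```python
import threading
from contextlib import contextmanager


@contextmanager
def lock_if_required_sketch(lock_required, lock):
    """Hypothetical sketch: hold `lock` for the block only if the flag is set."""
    if lock_required:
        lock.acquire()
    try:
        yield lock
    finally:
        if lock_required:
            lock.release()


lock = threading.Lock()
with lock_if_required_sketch(True, lock):
    held_when_required = lock.locked()
with lock_if_required_sketch(False, lock):
    held_when_not_required = lock.locked()
released_afterwards = not lock.locked()
```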

datalad.utils.make_tempfile(content=None, wrapped=None, **tkwargs)[source]

Helper class to provide a temporary file name and remove it at the end (context manager)

Parameters:

mkdir (bool, optional (default: False)) – If True, temporary directory created using tempfile.mkdtemp()
content (str or bytes, optional) – Content to be stored in the file created
wrapped (function, optional) – If set, function name used to prefix temporary file name
**tkwargs – All other arguments are passed into the call to tempfile.mk{,d}temp(), and the resultant temporary filename is yielded by the context manager. If no ‘prefix’ argument is provided, it will be constructed using module and function names (‘.’ replaced with ‘_’).

To change the used directory without providing keyword argument 'dir', set DATALAD_TESTS_TEMP_DIR.

Return type:

Iterator[str]

Examples

>>> from os.path import exists
>>> from datalad.utils import make_tempfile
>>> with make_tempfile() as fname:
...    k = open(fname, 'w').write('silly test')
>>> assert not exists(fname)  # was removed

>>> with make_tempfile(content="blah") as fname:
...    assert open(fname).read() == "blah"


datalad.utils.map_items(func, v)[source]

A helper to apply func to all elements (keys and values) within dict

No type checking of values passed to func is done, so func should be resilient to values which it should not handle

Initial usecase - apply_recursive(url_fragment, ensure_unicode)

datalad.utils.md5sum(filename)[source]

Compute an MD5 sum for the given file

Parameters:: filename (str | Path)
Return type:: str

datalad.utils.never_fail(f)[source]

Assure that function never fails – all exceptions are caught

Returns None if function fails internally.

Parameters:: f (Callable[[ParamSpec(P)], TypeVar(T)])
Return type:: Callable[[ParamSpec(P)], Optional[TypeVar(T)]]
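A decorator with this contract can be sketched in a few lines. The names never_fail_sketch and parse_int are hypothetical; catching Exception (rather than BaseException) is an assumption of the sketch.

```python
import functools


def never_fail_sketch(f):
    """Hypothetical sketch: swallow exceptions from f and return None instead."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Exception:  # assumption: KeyboardInterrupt etc. still propagate
            return None
    return wrapper


@never_fail_sketch
def parse_int(text):
    return int(text)
```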

datalad.utils.not_supported_on_windows(msg=None)[source]

A little helper to be invoked to consistently fail whenever functionality is not supported (yet) on Windows

Parameters:: msg (Optional[str])
Return type:: None

datalad.utils.nothing_cm()[source]

Just a dummy cm to programmatically switch context managers

Return type:: Iterator[None]

datalad.utils.obtain_write_permission(path)[source]

Obtains write permission for path and returns previous mode if a change was actually made.

Parameters:: path (Path) – path to try to obtain write permission for
Returns:: previous mode of path as returned by stat().st_mode if a change in permission was actually necessary, None otherwise.
Return type:: int or None

datalad.utils.open_r_encdetect(fname, readahead=1000)[source]

Return a file object in read mode with auto-detected encoding

This is helpful when dealing with files of unknown encoding.

Parameters:

readahead (int, optional) – How many bytes to read for guessing the encoding type. If negative - full file will be read
fname (str | Path)

Return type:

IO[str]

datalad.utils.optional_args(decorator)[source]

Allows a decorator to take optional positional and keyword arguments. Assumes that taking a single, callable, positional argument means that it is decorating a function, i.e. something like this:

@my_decorator
def function(): pass

Calls decorator with decorator(f, *args, **kwargs)
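The dispatch between the bare form and the parametrized form can be sketched as follows. optional_args_sketch and the tagged decorator are hypothetical names used only for illustration.

```python
import functools


def optional_args_sketch(decorator):
    """Hypothetical sketch: let `decorator` be used with or without arguments."""
    @functools.wraps(decorator)
    def wrapper(*args, **kwargs):
        # A single callable positional argument means "@decorator" was used bare.
        if len(args) == 1 and not kwargs and callable(args[0]):
            return decorator(args[0])

        # Otherwise the arguments are options; return the real decorator.
        def apply(f):
            return decorator(f, *args, **kwargs)
        return apply
    return wrapper


@optional_args_sketch
def tagged(f, tag="default"):
    f.tag = tag
    return f


@tagged
def plain():
    pass


@tagged(tag="custom")
def custom():
    pass
```

A consequence of this heuristic is that such a decorator cannot take a single callable as its only option.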

datalad.utils.partition(items, predicate=<class 'bool'>)[source]

Partition items by predicate.

Parameters:

items (iterable)
predicate (callable) – A function that will be mapped over each element in items. The elements will be partitioned based on whether the return value is false or true.

Return type:

tuple[Iterator[TypeVar(T)], Iterator[TypeVar(T)]]

Returns:

A tuple with two generators, the first for ‘false’ items and the second for ‘true’ ones.

Notes

Taken from Peter Otten’s snippet posted at https://nedbatchelder.com/blog/201306/filter_a_list_into_two_parts.html
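One common way to build such a pair of lazy generators is itertools.tee. The sketch below uses the hypothetical name partition_sketch and is not claimed to match the library code line for line.

```python
from itertools import tee


def partition_sketch(items, predicate=bool):
    """Hypothetical sketch: split items into (false_items, true_items)."""
    # Evaluate the predicate once per item and share the results.
    a, b = tee((predicate(item), item) for item in items)
    return ((item for pred, item in a if not pred),
            (item for pred, item in b if pred))


falsy, truthy = partition_sketch(range(6), lambda n: n % 2 == 0)
odds = list(falsy)    # predicate returned False
evens = list(truthy)  # predicate returned True
```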

datalad.utils.path_is_subpath(path, prefix)[source]

Return True if path is a subpath of prefix

It will return False if path == prefix.

Parameters:

path (str)
prefix (str)

Return type:

bool

datalad.utils.path_startswith(path, prefix)[source]

Return True if path starts with prefix path

Parameters:

path (str)
prefix (str)

Return type:

bool
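The difference between path_is_subpath and path_startswith shows up when path equals prefix and when one name is merely a string prefix of another. The sketch below uses hypothetical names and assumes '/'-separated paths. That path_startswith accepts path == prefix is also an assumption, since the description above does not say.

```python
def path_startswith_sketch(path, prefix):
    """Hypothetical sketch: prefix match on whole path components."""
    path, prefix = path.rstrip("/"), prefix.rstrip("/")
    # Assumption: equal paths count as "starts with".
    return path == prefix or path.startswith(prefix + "/")


def path_is_subpath_sketch(path, prefix):
    """Hypothetical sketch: like the above, but False when path == prefix."""
    path, prefix = path.rstrip("/"), prefix.rstrip("/")
    return len(path) > len(prefix) and path.startswith(prefix + "/")
```

Comparing whole components keeps "/ab" from being treated as lying under "/a".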

datalad.utils.posix_relpath(path, start=None)[source]

Behave like os.path.relpath, but always return POSIX paths on any platform.

Parameters:

path (str | Path)
start (Union[str, Path, None])

Return type:

str

datalad.utils.quote_cmdlinearg(arg)[source]

Perform platform-appropriate argument quoting

Parameters:: arg (str)
Return type:: str

datalad.utils.read_csv_lines(fname, dialect=None, readahead=16384, **kwargs)[source]

A generator of dict records from a CSV/TSV

Automatically guesses the encoding for each record to convert to UTF-8

Parameters:

fname (str) – Filename
dialect (str, optional) – Dialect to specify to csv.reader. If not specified – guessed from the file, if fails to guess, “excel-tab” is assumed
readahead (int, optional) – How many bytes to read from the file to guess the type
**kwargs (Any) – Passed to csv.reader

Return type:

Iterator[dict[str, str]]

datalad.utils.read_file(fname, decode=True)[source]

A helper to read file passing content via ensure_unicode

Parameters:

decode (bool, optional) – if False, no ensure_unicode and file content returned as bytes
fname (str | Path)

Return type:

str | bytes

datalad.utils.rmdir(path, *args, **kwargs)[source]

os.rmdir with our optional checking for open files

Parameters:

path (str | Path)
args (Any)
kwargs (Any)

Return type:

None

datalad.utils.rmtemp(f, *args, **kwargs)[source]

Wrapper to centralize removing of temp files so we could keep them around

It will not remove the temporary file/directory if DATALAD_TESTS_TEMP_KEEP environment variable is defined

Parameters:

f (str | Path)
args (Any)
kwargs (Any)

Return type:

None

datalad.utils.rmtree(path, chmod_files='auto', children_only=False, *args, **kwargs)[source]

Removing a git-annex .git directory requires making all files and directories writable again first

Parameters:

path (Path or str) – Path to remove
chmod_files (string or bool, optional) – Whether to make files writable also before removal. Usually it is just a matter of directories to have write permissions. If ‘auto’ it would chmod files on windows by default
children_only (bool, optional) – If set, all files and subdirectories would be removed while the path itself (must be a directory) would be preserved
*args (Any), **kwargs (Any) – Passed into shutil.rmtree call

Return type:

None

datalad.utils.rotree(path, ro=True, chmod_files=True)[source]

To make tree read-only or writable

Parameters:

path (string) – Path to the tree/directory to chmod
ro (bool, optional) – Whether to make it R/O (default) or RW
chmod_files (bool, optional) – Whether to operate also on files (not just directories)

Return type:

None

datalad.utils.saved_generator(gen)[source]

Given a generator, returns two generators, where the second one just replays the items produced by the first

The first one goes through the generated items, and the second one yields the saved items

Parameters:: gen (Iterable[TypeVar(T)])
Return type:: tuple[Iterator[TypeVar(T)], Iterator[TypeVar(T)]]
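The replay behavior can be sketched as follows. The name saved_generator_sketch and the use of a plain list as the store are assumptions for illustration, not necessarily how datalad implements it.

```python
def saved_generator_sketch(gen):
    """Return (live, replay): live records each item, replay yields the recorded ones."""
    saved = []

    def live():
        for item in gen:
            saved.append(item)  # remember the item before handing it out
            yield item

    def replay():
        # Yields whatever the live generator has produced so far.
        yield from saved

    return live(), replay()


first, second = saved_generator_sketch(iter(range(3)))
consumed = list(first)   # drives the underlying generator
replayed = list(second)  # replays the saved items without touching the source
```

The second generator only has something to yield once the first has been consumed, so the order in which they are iterated matters.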

datalad.utils.shortened_repr(value, l=30)[source]

Parameters:

value (Any)
l (int)

Return type:

str

datalad.utils.slash_join(base, extension)[source]

Join two strings with a ‘/’, avoiding duplicate slashes

If either of the strings is None, the other is returned as is.

Parameters:

base (Optional[str])
extension (Optional[str])

Return type:

Optional[str]
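A minimal sketch of the described behavior, assuming that "avoiding duplicate slashes" means stripping slashes at the joint. The name slash_join_sketch is hypothetical, and details may differ from the actual function.

```python
from typing import Optional


def slash_join_sketch(base: Optional[str], extension: Optional[str]) -> Optional[str]:
    """Join two strings with a single '/' at the joint (illustrative only)."""
    if extension is None:
        return base
    if base is None:
        return extension
    # Drop slashes on both sides of the joint so exactly one remains.
    return base.rstrip("/") + "/" + extension.lstrip("/")
```

For instance, joining "http://example.com/" and "/path" under this sketch gives "http://example.com/path", while a None on either side returns the other argument unchanged.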

datalad.utils.split_cmdline(s)[source]

Perform platform-appropriate command line splitting.

Identical to shlex.split() on non-Windows platforms.

Modified from https://stackoverflow.com/a/35900070

Parameters:: s (str)
Return type:: list[str]
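Since the behavior on non-Windows platforms is that of shlex.split(), the standard-library call itself shows what to expect there. This example uses shlex directly and does not cover the Windows-specific splitting.

```python
import shlex

# POSIX-style splitting: quotes group words and are removed from the result.
parts = shlex.split('git commit -m "initial import"')
print(parts)  # ['git', 'commit', '-m', 'initial import']
```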

datalad.utils.swallow_logs(new_level=None, file_=None, name='datalad')[source]

Context manager to consume all logs.

Parameters:

new_level (Union[str, int, None])
file_ (Union[str, Path, None])
name (str)

Return type:

Iterator[SwallowLogsAdapter]

datalad.utils.swallow_outputs()[source]

Context manager to help consume stdout, stderr, and print() output

stdout is available as cm.out and stderr as cm.err, where cm is the yielded context manager. Internally, temporary files are used to avoid the side effects of swallowing into StringIO, which lacks .fileno.

Mocking print is necessary for some uses where sys.stdout was already bound to the original sys.stdout, so mocking it later had no effect. Overriding the print function had the desired effect

Return type:: Iterator[SwallowOutputsAdapter]

datalad.utils.todo_interface_for_extensions(f)[source]

Parameters:: f (TypeVar(T))
Return type:: TypeVar(T)

datalad.utils.try_multiple(ntrials, exception, base, f, *args, **kwargs)[source]

Call f multiple times, with an exponentially growing delay between the calls

Parameters:

ntrials (int)
exception (type[BaseException])
base (float)
f (Callable[[ParamSpec(P)], TypeVar(T)])
args (ParamSpecArgs)
kwargs (ParamSpecKwargs)

Return type:

TypeVar(T)
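The retry pattern can be sketched as below. The delay formula base ** trial and the choice to re-raise on the last trial are assumptions made for illustration, and try_multiple_sketch is a hypothetical name, not the library function.

```python
import time


def try_multiple_sketch(ntrials, exception, base, f, *args, **kwargs):
    """Call f up to ntrials times, sleeping base ** trial between attempts."""
    for trial in range(1, ntrials + 1):
        try:
            return f(*args, **kwargs)
        except exception:
            if trial == ntrials:
                raise  # out of attempts: let the last error propagate
            # Assumed schedule: base, base**2, base**3, ... seconds.
            time.sleep(base ** trial)
```

With base=2.0 the waits under this sketch would be 2, 4, 8, ... seconds; a base at or below 1.0 would not produce a growing delay.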

datalad.utils.try_multiple_dec(f, ntrials=None, duration=0.1, exceptions=None, increment_type=None, exceptions_filter=None, logger=None)[source]

Decorator to try function multiple times.

The main purpose is to decorate functions that remove files or directories and that might need a few seconds to work correctly on Windows, which takes its time to release files and directories.

Parameters:

ntrials (int, optional)
duration (float, optional) – Seconds to sleep before retrying.
increment_type ({None, 'exponential'}) – Note that if it is 'exponential', duration should typically be > 1.0 so that the delay grows with higher powers
exceptions (Exception or tuple of Exceptions, optional) – Exception or a tuple of multiple exceptions, on which to retry
exceptions_filter (callable, optional) – If provided, this function will be called with a caught exception instance. If the function returns True, the call is retried; if it returns False, the exception is re-raised without retrying.
logger (callable, optional) – Logger to log upon failure. If not provided, will use stock logger at the level of 5 (heavy debug).
f (Callable[[ParamSpec(P)], TypeVar(T)])

Return type:

Callable[[ParamSpec(P)], TypeVar(T)]

datalad.utils.unique(seq, key=None, reverse=False)[source]

Given a sequence, return a list of only its unique elements while maintaining order

This approach is reported to be the fastest among the benchmarked alternatives. See https://www.peterbe.com/plog/uniqifiers-benchmark and http://stackoverflow.com/a/480227/1265472 for more information. Enhancement: added the ability to compare for uniqueness using a key function

Parameters:

seq (Sequence[TypeVar(T)]) – Sequence to analyze
key (callable, optional) – Function to call on each element so that uniqueness is decided not on the full element but, for example, on one of its members
reverse (bool, optional) – If True, uniqueness is checked in reverse order, so that the later occurrences determine the position in the result

Return type:

list[TypeVar(T)]
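The effect of key and reverse can be illustrated with a short sketch. It assumes elements (or their keys) are hashable, and unique_sketch is a hypothetical name rather than the library function.

```python
def unique_sketch(seq, key=None, reverse=False):
    """Return the unique elements of seq, preserving order (illustrative only)."""
    seen = set()
    out = []
    items = reversed(seq) if reverse else seq
    for item in items:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            out.append(item)
    if reverse:
        out.reverse()  # restore the original direction
    return out


print(unique_sketch([1, 2, 1, 3, 2]))                # [1, 2, 3]
print(unique_sketch([1, 2, 1, 3, 2], reverse=True))  # [1, 3, 2]
print(unique_sketch(["a", "A", "b"], key=str.lower)) # ['a', 'b']
```

With reverse=True the last occurrence of each element is the one that survives, which is why 1 and 2 swap places relative to 3 in the second call.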

datalad.utils.unlink(f)[source]

‘Robust’ unlink that tries multiple times

On Windows boxes there is evidence for a latency of more than a second until a file is considered no longer “in-use”. WindowsError is not known on Linux, so if an except statement names WindowsError and an IOError or any other exception is thrown, a NameError results. Also see gh-2533

Parameters:: f (str | Path)
Return type:: None

datalad.utils.updated(d, update)[source]

Return a copy of the input with the ‘update’ applied

Primarily for updating dictionaries

Parameters:

d (dict[TypeVar(K), TypeVar(V)])
update (dict[TypeVar(K), TypeVar(V)])

Return type:

dict[TypeVar(K), TypeVar(V)]
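The copy-then-update pattern can be sketched as follows, assuming a shallow copy is sufficient. The name updated_sketch is hypothetical.

```python
def updated_sketch(d, update):
    """Return a shallow copy of d with update applied; d itself is untouched."""
    result = d.copy()
    result.update(update)
    return result


defaults = {"jobs": 1, "verbose": False}
custom = updated_sketch(defaults, {"jobs": 4})
print(custom)    # {'jobs': 4, 'verbose': False}
print(defaults)  # {'jobs': 1, 'verbose': False}
```

This is convenient for deriving a variant of a settings dictionary inside an expression without mutating the original.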

datalad.utils.with_pathsep(path)[source]

Little helper to guarantee that path ends with /

Parameters:: path (str)
Return type:: str