datalad.support.gitrepo

Internal low-level interface to Git repositories

class datalad.support.gitrepo.FetchInfo[source]

Bases: dict

dict that carries results of a fetch operation of a single head

Reduced variant of GitPython’s FetchInfo class

Original copyright:

Copyright (C) 2008, 2009 Michael Trier and contributors

Original license:

BSD 3-Clause “New” or “Revised” License

ERROR = 128
FAST_FORWARD = 64
FORCED_UPDATE = 32
HEAD_UPTODATE = 4
NEW_HEAD = 2
NEW_TAG = 1
REJECTED = 16
TAG_UPDATE = 8
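
The constants above are distinct powers of two, which suggests they are meant to be combined and tested as bit flags. The following is a minimal sketch under that assumption; the helper name `decode_fetch_flags` and the lookup table are illustrative and not part of DataLad, and this page does not state which key of a FetchInfo dict carries the flags.

```python
# Flag values copied from the FetchInfo constants listed above.
FETCH_FLAGS = {
    "NEW_TAG": 1,
    "NEW_HEAD": 2,
    "HEAD_UPTODATE": 4,
    "TAG_UPDATE": 8,
    "REJECTED": 16,
    "FORCED_UPDATE": 32,
    "FAST_FORWARD": 64,
    "ERROR": 128,
}


def decode_fetch_flags(flags: int) -> list[str]:
    """Return the names of all flag bits set in ``flags`` (lowest bit first)."""
    return [name for name, bit in FETCH_FLAGS.items() if flags & bit]


print(decode_fetch_flags(64))        # a fast-forward update
print(decode_fetch_flags(32 | 128))  # hypothetical combination of two bits
```
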
class datalad.support.gitrepo.GitAddOutput[source]

Bases: TypedDict

file: str
success: bool
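
As a rough illustration of how records with these two fields might be consumed, the sketch below defines a local stand-in TypedDict and collects the paths that were not added successfully. The names `AddRecord` and `failed_files` and the sample records are hypothetical; real records would come from GitRepo.add_().

```python
from typing import Iterable, Iterator, TypedDict


class AddRecord(TypedDict):
    """Local stand-in mirroring the two documented GitAddOutput fields."""
    file: str
    success: bool


def failed_files(records: Iterable[AddRecord]) -> Iterator[str]:
    """Yield the paths of all records that report success=False."""
    for rec in records:
        if not rec["success"]:
            yield rec["file"]


# Hand-written sample records; real ones would come from GitRepo.add_().
sample = [
    {"file": "README.md", "success": True},
    {"file": "data/raw.bin", "success": False},
]
print(list(failed_files(sample)))
```
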
class datalad.support.gitrepo.GitProgress(done_future=None, encoding=None)[source]

Bases: WitlessProtocol

Reduced variant of GitPython’s RemoteProgress class

Original copyright:

Copyright (C) 2008, 2009 Michael Trier and contributors

Original license:

BSD 3-Clause “New” or “Revised” License

Parameters:
  • done_future (Optional[Any]) –

  • encoding (Optional[str]) –

BEGIN = 1
CHECKING_OUT = 256
COMPRESSING = 8
COUNTING = 4
DONE_TOKEN = 'done.'
END = 2
ENUMERATING = 512
FINDING_SOURCES = 128
OP_MASK = -4
RECEIVING = 32
RESOLVING = 64
STAGE_MASK = 3
TOKEN_SEPARATOR = ', '
WRITING = 16
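
STAGE_MASK and OP_MASK are bitwise complements of each other. Assuming, as in GitPython's RemoteProgress, that an operation constant is OR-ed with a stage bit (BEGIN or END) into one code, the two masks separate the parts again. The helper `split_op_code` below is a hypothetical name used only for this sketch.

```python
# Values copied from the GitProgress constants listed above.
BEGIN, END = 1, 2
RECEIVING = 32
STAGE_MASK = 3
OP_MASK = -4  # equals ~STAGE_MASK: every bit except the two stage bits


def split_op_code(op_code: int) -> tuple[int, int]:
    """Split a combined code into (operation, stage) using the two masks."""
    return op_code & OP_MASK, op_code & STAGE_MASK


operation, stage = split_op_code(RECEIVING | BEGIN)
print(operation == RECEIVING, stage == BEGIN)
```
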
connection_made(transport)[source]
Parameters:

transport (Popen) –

Return type:

None

fd_infos: dict[int, tuple[str, Optional[bytearray]]]
pipe_data_received(fd, byts)[source]
Parameters:
  • fd (int) –

  • byts (bytes) –

Return type:

None

proc_err = True
process: Optional[Popen]
process_exited()[source]
Return type:

None

re_op_absolute = re.compile('(remote: )?([\\w\\s]+):\\s+()(\\d+)()(.*)')
re_op_relative = re.compile('(remote: )?([\\w\\s]+):\\s+(\\d+)% \\((\\d+)/(\\d+)\\)(.*)')
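
The two patterns differ in whether a progress line carries a percentage with a current/total pair or only an absolute count. The sketch below copies both patterns and applies them to sample lines written in the style of git's progress output. The function `parse_progress` and the sample lines are illustrative assumptions, not DataLad's own parsing code. The relative pattern is tried first because the absolute one is more general.

```python
import re

# Patterns copied verbatim from the class attributes above.
re_op_absolute = re.compile(r"(remote: )?([\w\s]+):\s+()(\d+)()(.*)")
re_op_relative = re.compile(r"(remote: )?([\w\s]+):\s+(\d+)% \((\d+)/(\d+)\)(.*)")


def parse_progress(line: str):
    """Return (operation, current, total, message); total is None for absolute counts."""
    match = re_op_relative.match(line)
    if match is None:
        match = re_op_absolute.match(line)
    if match is None:
        return None
    _remote, op_name, _percent, cur, total, message = match.groups()
    return op_name, int(cur), int(total) if total else None, message


# Sample lines are illustrative, written in the style of git's progress output.
print(parse_progress("Receiving objects:  45% (9/20), done."))
print(parse_progress("remote: Enumerating objects: 12, done."))
```
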
class datalad.support.gitrepo.GitRepo(path, runner=None, create=True, git_opts=None, repo=None, fake_dates=False, create_sanity_checks=True, **kwargs)[source]

Bases: GitRepo

Representation of a git repository

Parameters:
  • path (str) –

  • runner (Optional[Any]) –

  • create (bool) –

  • git_opts (Optional[dict[str, Any]]) –

  • repo (Optional[Any]) –

  • fake_dates (bool) –

  • create_sanity_checks (bool) –

  • kwargs (Any) –

GIT_MIN_VERSION = '2.19.1'
add(files, git=True, git_options=None, update=False)[source]

Adds file(s) to the repository.

Parameters:
  • files (list) – list of paths to add

  • git (bool) – exists only for compatibility with AnnexRepo.add(); must always be True.

  • update (bool) –

    --update option for git-add. From git’s manpage:

    Update the index just where it already has an entry matching <pathspec>. This removes as well as modifies index entries to match the working tree, but adds no new files.

    If no <pathspec> is given when --update option is used, all tracked files in the entire working tree are updated (old versions of Git used to limit the update to the current directory and its subdirectories).

  • git_options (Optional[list[str]]) –

Returns:

List of status dicts.

Return type:

list

add_(files, git=True, git_options=None, update=False)[source]

Like add, but returns a generator

Parameters:
  • files (list[str]) –

  • git (bool) –

  • git_options (Optional[list[str]]) –

  • update (bool) –

Return type:

Iterator[GitAddOutput]

add_fake_dates(env)[source]
add_remote(name, url, options=None)[source]

Register remote pointing to a url

Parameters:
  • name (str) –

  • url (str) –

  • options (Optional[list[str]]) –

Return type:

tuple[str, str]

property bare: bool

Returns a bool indicating whether the repository is bare

Importantly, this does not report the configuration value of ‘core.bare’, so that it is usable at a stage where a Repo instance is not yet equipped with a ConfigManager. Instead, it tests whether the repository path and its “dot_git” are identical. The value of ‘core.bare’ can be queried from the ConfigManager in a fully initialized instance.

checkout(name, options=None)[source]
Parameters:
  • name (str) –

  • options (Optional[list[str]]) –

Return type:

None

cherry_pick(commit)[source]

Cherry pick commit to the current branch.

Parameters:

commit (str) – A single commit.

Return type:

None

classmethod clone(url, path, *args, clone_options=None, **kwargs)[source]

Clone url into path

Provides workarounds for known issues (e.g. https://github.com/datalad/datalad/issues/785)

Parameters:
  • url (str) –

  • path (str) –

  • clone_options (dict or list) – Arbitrary options that will be passed on to the underlying call to git-clone. This may be a list of plain options or key-value pairs that will be converted to a list of plain options with to_options.

  • expect_fail (bool) – Whether the command is expected to possibly fail, in which case errors are logged at DEBUG level instead of ERROR

  • kwargs (Any) – Passed to the Repo class constructor.

  • args (Any) –

Return type:

Self

commit(msg=None, options=None, _datalad_msg=False, careless=True, files=None, date=None, index_file=None)[source]

Commit changes to git.

Parameters:
  • msg (str, optional) – commit-message

  • options (list of str, optional) – cmdline options for git-commit

  • _datalad_msg (bool, optional) – Signals that the commit is an automated commit by datalad, so that the message carries the [DATALAD] prefix

  • careless (bool, optional) – if False, raise when there’s nothing actually committed; if True, don’t care

  • files (list of str, optional) – path(s) to commit

  • date (str, optional) – Date in one of the formats git understands

  • index_file (str, optional) – An alternative index to use

Return type:

None

commit_exists(commitish)[source]

Does commitish exist in the repo?

Parameters:

commitish (str) – A commit or an object that can be dereferenced to one.

Return type:

bool

property config
configure_fake_dates()[source]

Configure repository to use fake dates.

Return type:

None

property count_objects: dict[str, int]

Returns a dictionary with count and size (in KiB) information about git objects

describe(commitish=None, **kwargs)[source]

Quick and dirty implementation to call git-describe

Parameters:
  • kwargs (Union[str, bool, None, List[Union[str, bool, None]], Tuple[Union[str, bool, None], ...]]) – transformed to cmdline options for git-describe; see __init__ for description of the transformation

  • commitish (Optional[str]) –

Return type:

Optional[str]

diff(fr, to, paths=None, untracked='all', eval_submodule_state='full')[source]

Like status(), but reports changes between two arbitrary revisions

Parameters:
  • fr (str or None) – Revision specification (anything that Git understands). Passing None considers anything in the target state as new.

  • to (str or None) – Revision specification (anything that Git understands), or None to compare to the state of the work tree.

  • paths (list or None) – If given, limits the query to the specified paths. To query all paths specify None, not an empty list.

  • untracked ({'no', 'normal', 'all'}) – If and how untracked content is reported when to is None: ‘no’: no untracked files are reported; ‘normal’: untracked files and entire untracked directories are reported as such; ‘all’: report individual files even in fully untracked directories.

  • eval_submodule_state ({'full', 'commit', 'no'}) – If ‘full’ (the default), the state of a submodule is evaluated by considering all modifications, with the treatment of untracked files determined by untracked. If ‘commit’, the modification check is restricted to comparing the submodule’s HEAD commit to the one recorded in the superdataset. If ‘no’, the state of the subdataset is not evaluated.

Returns:

Each content item has an entry under a pathlib Path object instance pointing to its absolute path inside the repository (this path is guaranteed to be underneath Repo.path). Each value is a dictionary with properties:

type

Can be ‘file’, ‘symlink’, ‘dataset’, ‘directory’

state

Can be ‘added’, ‘untracked’, ‘clean’, ‘deleted’, ‘modified’.

Return type:

dict

diffstatus(fr, to, paths=None, untracked='all', eval_submodule_state='full', _cache=None)[source]

Like diff(), but reports the status of ‘clean’ content too.

It supports an additional submodule evaluation state ‘global’. If given, it will return a single ‘modified’ (vs. ‘clean’) state label for the entire repository, as soon as it can.

Parameters:
  • fr (Optional[str]) –

  • to (Optional[str]) –

  • paths (Optional[Sequence[str | PathLike[str]]]) –

  • untracked (str) –

  • eval_submodule_state (str) –

  • _cache (Optional[dict]) –

Return type:

dict[Path, dict[str, str]] | str
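
Because the return value is either a per-path mapping or a single label, callers need to handle both shapes. The sketch below assumes the string form is the ‘modified’ or ‘clean’ label described above for the ‘global’ mode. The helper `has_modifications` and the sample inputs are hypothetical.

```python
from pathlib import Path
from typing import Union

DiffStatus = Union[dict[Path, dict[str, str]], str]


def has_modifications(result: DiffStatus) -> bool:
    """Interpret either return shape of diffstatus() as a single yes/no answer."""
    if isinstance(result, str):
        # Assumed 'global' mode: one label for the whole repository.
        return result != "clean"
    # Regular mode: one property dict per path.
    return any(props.get("state") != "clean" for props in result.values())


print(has_modifications("modified"))
print(has_modifications({Path("/ds/a.txt"): {"type": "file", "state": "clean"}}))
```
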

property dirty: bool

Is the repository dirty?

Note: This provides a quick answer when you simply want to know if there are any untracked changes or modifications in this repository or its submodules. For finer-grained control and more detailed reporting, use status() instead.

property fake_dates_enabled: bool

Is the repository configured to use fake dates?

fetch(remote=None, refspec=None, all_=False, git_options=None, **kwargs)[source]

Fetches changes from a remote (or all remotes).

Parameters:
  • remote (str, optional) – name of the remote to fetch from. If no remote is given and all_ is not set, the tracking branch is fetched.

  • refspec (str or list, optional) – refspec(s) to fetch.

  • all_ (bool, optional) – fetch all remotes (and all of their branches). Fails if remote was given.

  • git_options (list, optional) – Additional command line options for git-fetch.

  • kwargs (Option) – Deprecated. GitPython-style keyword argument for git-fetch. Will be appended to any git_options.

Return type:

list[FetchInfo]

fetch_(remote=None, refspec=None, all_=False, git_options=None)[source]

Like fetch, but returns a generator

Parameters:
  • remote (Optional[str]) –

  • refspec (str | list[str] | None) –

  • all_ (bool) –

  • git_options (Optional[list[str]]) –

Return type:

Iterator[FetchInfo]

format_commit(fmt, commitish=None)[source]

Return git show output for commitish.

Parameters:
  • fmt (str) – A format string accepted by git show.

  • commitish (str, optional) – Any commit identifier (defaults to “HEAD”).

Return type:

str or, if there are no commits yet, None.

gc(allow_background=False, auto=False)[source]

Perform housekeeping (garbage collection, repacking)

Parameters:
  • allow_background (bool) –

  • auto (bool) –

Return type:

None

get_active_branch()[source]

Get the name of the active branch

Returns:

Returns None if there is no active branch, i.e. detached HEAD, and the branch name otherwise.

Return type:

str or None

get_branch_commits_(branch=None, limit=None, stop=None)[source]

Return commit hexshas for a branch

Parameters:
  • branch (str, optional) – If not provided, assumes current branch

  • limit (None | 'left-only', optional) – Limit which commits to report. If None – all commits (merged or not), if ‘left-only’ – only the commits from the left side of the tree upon merges

  • stop (str, optional) – hexsha of the commit at which to stop reporting (the matched commit itself is not reported)

Yields:

str

Return type:

Iterator[str]
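
The stop parameter excludes the matching commit itself. The following sketch reproduces only that cut-off rule on a plain list of made-up hexshas; `until_stop` is a hypothetical helper and does not query any repository.

```python
from itertools import takewhile
from typing import Iterable, Iterator, Optional


def until_stop(hexshas: Iterable[str], stop: Optional[str] = None) -> Iterator[str]:
    """Yield hexshas in order, ending before ``stop`` (which is not yielded)."""
    if stop is None:
        return iter(hexshas)
    return takewhile(lambda sha: sha != stop, hexshas)


# Made-up, shortened hexshas, newest first.
history = ["c3d4", "b2c3", "a1b2", "0f9e"]
print(list(until_stop(history, stop="a1b2")))
```
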

get_branches()[source]

Get all branches of the repo.

Returns:

Names of all branches of this repository.

Return type:

[str]

get_commit_date(branch=None, date='authored')[source]

Get the date stamp of the last commit (in a branch or head otherwise)

Parameters:
  • date ({'authored', 'committed'}) – Which date to return. “authored” will be the date shown by “git show” and the one possibly specified via --date to git commit

  • branch (Optional[str]) –

Returns:

None if no commit

Return type:

int or None

get_content_info(paths=None, ref=None, untracked='all')[source]

Get identifier and type information from repository content.

This is a simplified front-end for git ls-files/tree.

Both commands differ in their behavior when queried about subdataset paths. ls-files will not report anything, ls-tree will report on the subdataset record. This function uniformly follows the behavior of ls-tree (report on the respective subdataset mount).

Parameters:
  • paths (list(pathlib.PurePath) or None) – Specific paths, relative to the resolved repository root, to query info for. Paths must be normed to match the reporting done by Git, i.e. no parent dir components (ala “some/../this”). If None, info is reported for all content.

  • ref (gitref or None) – If given, content information is retrieved for this Git reference (via ls-tree), otherwise content information is produced for the present work tree (via ls-files). With a given reference, the reported content properties also contain a ‘bytesize’ record, stating the size of a file in bytes.

  • untracked ({'no', 'normal', 'all'}) – If and how untracked content is reported when no ref was given: ‘no’: no untracked files are reported; ‘normal’: untracked files and entire untracked directories are reported as such; ‘all’: report individual files even in fully untracked directories.

Returns:

Each content item has an entry under a pathlib Path object instance pointing to its absolute path inside the repository (this path is guaranteed to be underneath Repo.path). Each value is a dictionary with properties:

type

Can be ‘file’, ‘symlink’, ‘dataset’, ‘directory’

gitshasum

SHASUM of the item as tracked by Git, or None, if not tracked. This could be different from the SHASUM of the file in the worktree, if it was modified.

Return type:

dict

Raises:

ValueError – In case of an invalid Git reference (e.g. ‘HEAD’ in an empty repository)

get_corresponding_branch(branch=None)[source]

Always returns None, a plain GitRepo has no managed branches

Parameters:

branch (Optional[Any]) –

Return type:

Optional[str]

get_files(branch=None)[source]

Get a list of files in git.

Lists the files in the (remote) branch.

Parameters:

branch (str) – Name of the branch to query. Default: active branch.

Returns:

list of files.

Return type:

[str]

get_git_attributes()[source]

Query gitattributes which apply to top level directory

It is a thin compatibility/shortcut wrapper around the more versatile get_gitattributes, which operates on a list of paths and returns a dictionary for each path

Returns:

a dictionary with attribute name and value items relevant for the top (‘.’) directory of the repository, and thus most likely the default ones (if not overwritten with more rules) for all files within repo.

Return type:

dict

static get_git_dir(repo)[source]

figure out a repo’s gitdir

‘.git’ might be a directory, a symlink or a file

Note

This method is likely to be deprecated; please use GitRepo.dot_git instead. That property is not static, but it is cheaper, and you should avoid working on a repo without having an instance of it anyway. Note that, unlike this method, the property returns an absolute path.

Parameters:

repo (path or Repo instance) – currently expected to be the repo’s base dir

Returns:

Relative path to the repo’s git dir; by default this is “.git”

Return type:

str

get_gitattributes(path, index_only=False)[source]

Query gitattributes for one or more paths

Parameters:
  • path (path or list) – Path(s) to query. Paths may be relative or absolute.

  • index_only (bool) – Flag whether to consider only gitattribute settings that are reflected in the repository index, not just in the work tree content.

Returns:

Each key is a queried path (always relative to the repository root), each value is a dictionary with attribute name and value items. Attribute values are either True or False, for set and unset attributes, or are the literal attribute value.

Return type:

dict

get_hexsha(commitish=None, short=False)[source]

Return a hexsha for a given commitish.

Parameters:
  • commitish (str, optional) – Any identifier that refers to a commit (defaults to “HEAD”).

  • short (bool, optional) – Return the abbreviated form of the hexsha.

Return type:

str or, if no commitish was given and there are no commits yet, None.

Raises:

ValueError – If a commitish was given, but no corresponding commit could be determined.

get_indexed_files()[source]

Get a list of files in git’s index

Returns:

list of paths rooting in git’s base dir

Return type:

list

get_last_commit_hexsha(files)[source]

Return the hash of the last commit that modified any of the given paths

Parameters:

files (list[str]) –

Return type:

Optional[str]

get_merge_base(commitishes)[source]

Get a merge base hexsha

Parameters:

commitishes (str or list of str) – List of commitishes (branches, hexshas, etc.) to determine the merge base of. If a single value is provided, the merge base with the current branch is returned.

Returns:

None is returned if there is no merge base for the given commits, or if a specified treeish doesn’t exist.

Return type:

str or None

get_remote_branches()[source]

Get all branches of all remotes of the repo.

Returns:

Names of all remote branches.

Return type:

[str]

get_remote_url(name, push=False)[source]

Get the url of a remote.

Reads the configuration of remote name and returns its url or None, if there is no url configured.

Parameters:
  • name (str) – name of the remote

  • push (bool) – if True, get the pushurl instead of the fetch url.

Return type:

Optional[str]

get_remotes(with_urls_only=False)[source]

Get known remotes of the repository

Parameters:

with_urls_only (bool, optional) – return only remotes which have urls

Returns:

remotes – List of names of the remotes

Return type:

list of str

get_revisions(revrange=None, fmt='%H', options=None)[source]

Return list of revisions in revrange.

Parameters:
  • revrange (str or list of str or None, optional) – Revisions or revision ranges to walk. If None, revision defaults to HEAD unless a revision-modifying option like --all or --branches is included in options.

  • fmt (string, optional) – Format accepted by --format option of git log. This should not contain new lines because the output is split on new lines.

  • options (list of str, optional) – Options to pass to git log. This should not include --format.

Return type:

List of revisions (str), formatted according to fmt.

get_staged_paths()[source]

Returns a list of any staged repository path(s)

This is a rather fast call, as it will not depend on what is going on in the worktree.

Return type:

list[str]

get_submodules(sorted_=True, paths=None)[source]

Return list of submodules.

Parameters:
  • sorted_ (bool, optional) – Sort submodules by path name.

  • paths (list(pathlib.PurePath), optional) – Restrict submodules to those under paths.

Return type:

list[dict]

Returns:

List of submodule namedtuples if compat is true or otherwise a list of dictionaries as returned by get_submodules_.

get_submodules_(paths=None)[source]

Yield submodules in this repository.

Parameters:

paths (list(pathlib.PurePath), optional) – Restrict submodules to those under paths. Paths must be relative to the resolved repository root, and must be normed to match the reporting done by Git, i.e. no parent dir components (ala “some/../this”).

Return type:

Iterator[dict]

Returns:

A generator that yields a dictionary with information for each submodule.

get_tags(output=None)[source]

Get list of tags

Parameters:

output (str, optional) – If given, limit the return value to a list of values matching that particular key of the tag properties.

Returns:

Each item is a dictionary with information on a tag. At present this includes ‘hexsha’, and ‘name’, where the latter is the string label of the tag, and the former the hexsha of the object the tag is attached to. The list is sorted by the creator date (committer date for lightweight tags and tagger date for annotated tags), with the most recent commit being the last element.

Return type:

list
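
Since the list is ordered from oldest to newest, the most recent tag is the last element. The sketch below works on hand-written records shaped like the documented return value; `latest_tag_name` and the tag values are hypothetical.

```python
from typing import Optional

# Hand-written records shaped like the documented get_tags() return value,
# ordered oldest to newest as described above.
tags = [
    {"name": "0.1.0", "hexsha": "1111111"},
    {"name": "0.2.0", "hexsha": "2222222"},
]


def latest_tag_name(tag_records: list[dict]) -> Optional[str]:
    """Return the label of the most recent tag, or None if there are no tags."""
    return tag_records[-1]["name"] if tag_records else None


print(latest_tag_name(tags))
print([t["name"] for t in tags])  # the kind of list output='name' is described to give
```
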

classmethod get_toppath(path, follow_up=True, git_options=None)[source]

Return top-level of a repository given the path.

Parameters:
  • follow_up (bool) – If path contains symlinks, they get resolved by git. If follow_up is True, the original path is followed upward until it hits the same resolved path. If no such path is found, the resolved one is returned.

  • git_options (list of str) – options to be passed to the git rev-parse call

  • path (str) –

Returns:

None if no parent directory contains a git repository.

Return type:

Optional[str]

get_tracking_branch(branch=None, remote_only=False)[source]

Get the tracking branch for branch if there is any.

Parameters:
  • branch (str) – local branch to look up. If none is given, active branch is used.

  • remote_only (bool) – Don’t return a value if the upstream remote is set to “.” (meaning this repository).

Returns:

(remote or None, refspec or None) of the tracking branch

Return type:

tuple

git_version = None
is_ancestor(reva, revb)[source]

Is reva an ancestor of revb?

Parameters:
  • reva (str) – Revisions.

  • revb (str) – Revisions.

Return type:

bool

is_valid_git()[source]

Returns whether the underlying repository appears to be still valid

Note that this is almost identical to the classmethod is_valid_repo(). However, if we are testing an existing instance, we can save Path object creations. Since this testing is done a lot, this is relevant. Creation of the Path objects in is_valid_repo() takes nearly half the time of the entire function.

Also note, that this method is bound to an instance but still class-dependent, meaning that a subclass cannot simply overwrite it. This is particularly important for the call from within __init__(), which in turn is called by the subclasses’ __init__. Using an overwrite would lead to the wrong thing being called.

Return type:

bool

classmethod is_valid_repo(path)[source]

Returns if a given path points to a git repository

Parameters:

path (str) –

Return type:

bool

is_with_annex()[source]

Report if GitRepo (assumed) has (remotes with) a git-annex branch

Return type:

bool

merge(name, options=None, msg=None, allow_unrelated=False, **kwargs)[source]
Parameters:
  • name (str) –

  • options (Optional[list[str]]) –

  • msg (Optional[str]) –

  • allow_unrelated (bool) –

  • kwargs (Any) –

Return type:

None

precommit()[source]

Perform pre-commit maintenance tasks

Return type:

None

push(remote=None, refspec=None, all_remotes=False, all_=False, git_options=None, **kwargs)[source]

Push changes to a remote (or all remotes).

If remote and refspec are specified, and the remote has the remote.{remote}.datalad-push-default-first configuration variable set (e.g. by create-sibling-github), the first refspec is pushed separately first, so that the remote is more likely to choose it as the “default branch”. See https://github.com/datalad/datalad/issues/4997. Upon a successful push, if this variable was set in the local git config, it is unset so that subsequent pushes proceed normally.

Parameters:
  • remote (str, optional) – name of the remote to push to. If no remote is given and all_ is not set, the tracking branch is pushed.

  • refspec (str or list, optional) – refspec(s) to push.

  • all_ (bool, optional) – push to all remotes. Fails if remote was given.

  • git_options (list, optional) – Additional command line options for git-push.

  • kwargs (Option) – Deprecated. GitPython-style keyword argument for git-push. Will be appended to any git_options.

  • all_remotes (bool) –

Return type:

list[PushInfo]

push_(remote=None, refspec=None, all_=False, git_options=None)[source]

Like push, but returns a generator

Parameters:
  • remote (Optional[str]) –

  • refspec (str | list[str] | None) –

  • all_ (bool) –

  • git_options (Optional[list[str]]) –

Return type:

Iterator[PushInfo]

remove(files, recursive=False, **kwargs)[source]

Remove files.

Calls git-rm.

Parameters:
  • files (list of str) – list of paths to remove

  • recursive (bool) – whether to allow recursive removal from subdirectories (default: False)

  • kwargs (Union[str, bool, None, List[Union[str, bool, None]], Tuple[Union[str, bool, None], ...]]) – see __init__

Returns:

list of successfully removed files.

Return type:

[str]

remove_branch(branch)[source]
Parameters:

branch (str) –

Return type:

None

remove_remote(name)[source]

Remove existing remote

Parameters:

name (str) –

Return type:

None

save(message=None, paths=None, _status=None, **kwargs)[source]

Save dataset content.

Parameters:
  • message (str or None) – A message to accompany the changeset in the log. If None, a default message is used.

  • paths (list or None) – Any content with path matching any of the paths given in this list will be saved. Matching will be performed against the dataset status (GitRepo.status()), or a custom status provided via _status. If no paths are provided, ALL non-clean paths present in the repo status or _status will be saved.

  • _status (dict or None) – If None, Repo.status() will be queried for the given ds. If a dict is given, its content will be used as a constraint. For example, to save only modified content, but no untracked content, set paths to None and provide a _status that has no entries for untracked content.

  • **kwargs (Any) –

    Additional arguments that are passed to underlying Repo methods. Supported:

    • git : bool (passed to Repo.add())

    • eval_submodule_state : {‘full’, ‘commit’, ‘no’} (passed to Repo.status())

    • untracked : {‘no’, ‘normal’, ‘all’} (passed to Repo.status())

    • amend : bool (passed to GitRepo.commit())

Return type:

list[dict]
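
The _status constraint described above can be built by filtering a status mapping before passing it on. The sketch below uses a hand-written stand-in for a status() result; `without_untracked` is a hypothetical helper, and the actual save() call is shown only as a comment because it needs a real repository.

```python
from pathlib import Path

# Hand-written stand-in for what GitRepo.status() is documented to return.
status = {
    Path("/ds/code/run.py"): {"type": "file", "state": "modified"},
    Path("/ds/notes.txt"): {"type": "file", "state": "untracked"},
    Path("/ds/README"): {"type": "file", "state": "clean"},
}


def without_untracked(status_map: dict) -> dict:
    """Return a copy of the status mapping that omits untracked entries."""
    return {
        path: props
        for path, props in status_map.items()
        if props.get("state") != "untracked"
    }


constrained = without_untracked(status)
print(sorted(p.name for p in constrained))
# With a real repository this would be passed along as:
#   repo.save(message="update tracked content", paths=None, _status=constrained)
```
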

save_(message=None, paths=None, _status=None, **kwargs)[source]

Like save() but working as a generator.

Parameters:
  • message (Optional[str]) –

  • paths (Optional[list[Path]]) –

  • _status (Optional[dict[Path, dict[str, str]]]) –

  • kwargs (Any) –

Return type:

Iterator[dict]

set_gitattributes(attrs, attrfile='.gitattributes', mode='a')[source]

Set gitattributes

By default appends additional lines to attrfile. Note, that later lines in attrfile overrule earlier ones, which may or may not be what you want. Set mode to ‘w’ to replace the entire file by what you provided in attrs.

Parameters:
  • attrs (list) – Each item is a 2-tuple, where the first element is a path pattern, and the second element is a dictionary with attribute key/value pairs. The attribute dictionary must use the same semantics as those returned by get_gitattributes(). Path patterns can use absolute paths, in which case they will be normalized relative to the directory that contains the target .gitattributes file (see attrfile).

  • attrfile (path) – Path relative to the repository root of the .gitattributes file the attributes shall be set in.

  • mode (str) – ‘a’ to append .gitattributes, ‘w’ to replace it

Return type:

None
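
To make the shape of attrs concrete, the sketch below renders such 2-tuples into lines using plain .gitattributes syntax (a bare name for a set attribute, a leading dash for an unset one, name=value otherwise). It is an illustration of the input format only. `format_gitattributes` is a hypothetical helper, and how DataLad itself writes the file is not shown on this page.

```python
def format_gitattributes(attrs):
    """Render (pattern, {attribute: value}) pairs as .gitattributes lines."""
    lines = []
    for pattern, attributes in attrs:
        parts = [pattern]
        for name, value in attributes.items():
            if value is True:
                parts.append(name)               # attribute is set
            elif value is False:
                parts.append(f"-{name}")         # attribute is unset
            else:
                parts.append(f"{name}={value}")  # attribute carries a value
        lines.append(" ".join(parts))
    return lines


attrs = [
    ("*.dat", {"binary": True, "text": False}),
    ("*.txt", {"eol": "lf"}),
]
print("\n".join(format_gitattributes(attrs)))
```
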

set_remote_url(name, url, push=False)[source]

Set the URL a remote is pointing to

Sets the URL of the remote name. Requires the remote to already exist.

Parameters:
  • name (str) – name of the remote

  • url (str) –

  • push (bool) – if True, set the push URL, otherwise the fetch URL

Return type:

None

status(paths=None, untracked='all', eval_submodule_state='full')[source]

Simplified git status equivalent.

Parameters:
  • paths (list or None) – If given, limits the query to the specified paths. To query all paths specify None, not an empty list. If a query path points into a subdataset, a report is made on the subdataset record within the queried dataset only (no recursion).

  • untracked ({'no', 'normal', 'all'}) – If and how untracked content is reported: ‘no’: no untracked files are reported; ‘normal’: untracked files and entire untracked directories are reported as such; ‘all’: report individual files even in fully untracked directories.

  • eval_submodule_state ({'full', 'commit', 'no'}) – If ‘full’ (the default), the state of a submodule is evaluated by considering all modifications, with the treatment of untracked files determined by untracked. If ‘commit’, the modification check is restricted to comparing the submodule’s HEAD commit to the one recorded in the superdataset. If ‘no’, the state of the subdataset is not evaluated.

Returns:

Each content item has an entry under a pathlib Path object instance pointing to its absolute path inside the repository (this path is guaranteed to be underneath Repo.path). Each value is a dictionary with properties:

type

Can be ‘file’, ‘symlink’, ‘dataset’, ‘directory’

state

Can be ‘added’, ‘untracked’, ‘clean’, ‘deleted’, ‘modified’.

Return type:

dict
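
A returned mapping of this shape is often easier to read once grouped by state. The sketch below does that for a hand-written stand-in result; `group_by_state` and the sample paths are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Hand-written stand-in for a status() result.
status = {
    Path("/ds/a.txt"): {"type": "file", "state": "modified"},
    Path("/ds/b.txt"): {"type": "file", "state": "untracked"},
    Path("/ds/sub"): {"type": "dataset", "state": "clean"},
    Path("/ds/c.txt"): {"type": "file", "state": "modified"},
}


def group_by_state(status_map):
    """Map each state label to the sorted list of paths reported with it."""
    grouped = defaultdict(list)
    for path, props in status_map.items():
        grouped[props["state"]].append(path)
    return {state: sorted(paths) for state, paths in grouped.items()}


for state, paths in group_by_state(status).items():
    print(state, [p.name for p in paths])
```
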

tag(tag, message=None, commit=None, options=None)[source]

Tag a commit

Parameters:
  • tag (str) – Custom tag label. Must be a valid tag name.

  • message (str, optional) – If provided, adds [‘-m’, <message>] to the list of git tag arguments.

  • commit (str, optional) – If provided, will be appended as last argument to the git tag call, and can be used to identify the commit that shall be tagged, if not HEAD.

  • options (list, optional) – Additional command options, inserted prior to a potential commit argument.

Return type:

None

property untracked_files: list[str]

Legacy interface, do not use! Use the status() method instead.

Despite its name, it also reports on untracked datasets, and yields their names with trailing path separators.

update_ref(ref, value, oldvalue=None, symbolic=False)[source]

Update the object name stored in a ref “safely”.

Just a shim for git update-ref call if not symbolic, and git symbolic-ref if symbolic

Parameters:
  • ref (str) – Reference, such as refs/heads/BRANCHNAME or HEAD.

  • value (str) – Value to update to, e.g. hexsha of a commit when updating for a branch ref, or branch ref if updating HEAD

  • oldvalue (str) – Value to update from. Safeguard to be verified by git. This is only valid if symbolic is not True.

  • symbolic (bool) – Whether ref is symbolic, e.g. should be set to True in case of ref=HEAD

Return type:

None

update_remote(name=None, verbose=False)[source]
Parameters:
  • name (Optional[str]) –

  • verbose (bool) –

Return type:

None

class datalad.support.gitrepo.PushInfo[source]

Bases: dict

dict that carries results of a push operation of a single head

Reduced variant of GitPython’s PushInfo class

Original copyright:

Copyright (C) 2008, 2009 Michael Trier and contributors

Original license:

BSD 3-Clause “New” or “Revised” License

DELETED = 64
ERROR = 1024
FAST_FORWARD = 256
FORCED_UPDATE = 128
NEW_HEAD = 2
NEW_TAG = 1
NO_MATCH = 4
REJECTED = 8
REMOTE_FAILURE = 32
REMOTE_REJECTED = 16
UP_TO_DATE = 512
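
One way to work with these constants is to mirror them in an enum.IntFlag and test for groups of bits. In the sketch below, the class `PushFlag`, the helper `push_failed`, and in particular the choice of which bits count as a failure are assumptions made for illustration, not DataLad behavior.

```python
from enum import IntFlag


class PushFlag(IntFlag):
    """Local mirror of the PushInfo constants listed above."""
    NEW_TAG = 1
    NEW_HEAD = 2
    NO_MATCH = 4
    REJECTED = 8
    REMOTE_REJECTED = 16
    REMOTE_FAILURE = 32
    DELETED = 64
    FORCED_UPDATE = 128
    FAST_FORWARD = 256
    UP_TO_DATE = 512
    ERROR = 1024


# Assumption: these are the bits a caller would treat as a failed push.
FAILURE_BITS = (
    PushFlag.ERROR
    | PushFlag.REJECTED
    | PushFlag.REMOTE_REJECTED
    | PushFlag.REMOTE_FAILURE
)


def push_failed(flags: int) -> bool:
    """Return True if any failure bit is set in ``flags``."""
    return bool(PushFlag(flags) & FAILURE_BITS)


print(push_failed(PushFlag.FAST_FORWARD))
print(push_failed(PushFlag.REJECTED | PushFlag.ERROR))
```
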
class datalad.support.gitrepo.StdOutCaptureWithGitProgress(done_future=None, encoding=None)[source]

Bases: GitProgress

Parameters:
  • done_future (Optional[Any]) –

  • encoding (Optional[str]) –

fd_infos: dict[int, tuple[str, Optional[bytearray]]]
proc_out = True
process: Optional[Popen]
datalad.support.gitrepo.normalize_path(func)[source]

Decorator to provide unified path conversion for a single file

Unlike normalize_paths, intended to be used for functions dealing with a single filename at a time

Note

This is intended to be used within the repository classes and therefore returns a class method!

The decorated function is expected to take a path at first positional argument (after ‘self’). Additionally the class func is a member of, is expected to have an attribute ‘path’.

Parameters:

func (Callable[[_WithPath, str, ParamSpec(P)], TypeVar(T)]) –

Return type:

Callable[[_WithPath, str, ParamSpec(P)], TypeVar(T)]

datalad.support.gitrepo.normalize_paths(func, match_return_type=True, map_filenames_back=False, serialize=False)[source]

Decorator to provide unified path conversions.

Note

This is intended to be used within the repository classes and therefore returns a class method!

The decorated function is expected to take a path or a list of paths at first positional argument (after ‘self’). Additionally the class func is a member of, is expected to have an attribute ‘path’.

Accepts either a list of paths or a single path in a str. Passes a list to decorated function either way, but would return based on the value of match_return_type and possibly input argument.

If a call to the wrapped function includes normalize_path and it is False, no normalization happens for that function call (used for calls to wrapped functions within wrapped functions, while the CWD is possibly within a repository).

Parameters:
  • match_return_type (bool, optional) – If True, and a single string was passed in, it would return the first element of the output (after verifying that it is a list of length 1). This makes it easier to work with single-file input.

  • map_filenames_back (bool, optional) – If True and the returned value is a dictionary, it is assumed to carry one entry per file, and the filenames are mapped back from the normalized (relative to the root of the repo) paths to the form in which they were provided

  • serialize (bool, optional) – Loop through files giving only a single one to the function one at a time. This allows to simplify implementation and interface to annex commands which do not take multiple args in the same call (e.g. checkpresentkey)
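
The match_return_type behavior can be illustrated with a much-reduced decorator. The sketch below performs no path normalization at all and ignores map_filenames_back and serialize. `accept_single_path` and `DemoRepo` are hypothetical names, and this is not DataLad's implementation.

```python
from functools import wraps


def accept_single_path(func):
    """Simplified stand-in: always pass a list, unwrap the result for a single str."""

    @wraps(func)
    def wrapper(self, files, *args, **kwargs):
        single = isinstance(files, str)
        result = func(self, [files] if single else list(files), *args, **kwargs)
        if single and isinstance(result, list) and len(result) == 1:
            return result[0]
        return result

    return wrapper


class DemoRepo:
    """Hypothetical class used only to exercise the decorator."""

    path = "/tmp/demo"

    @accept_single_path
    def sizes(self, files):
        # The wrapped method always receives a list of paths.
        return [len(name) for name in files]


repo = DemoRepo()
print(repo.sizes("a.txt"))
print(repo.sizes(["a.txt", "bb.txt"]))
```
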

datalad.support.gitrepo.to_options(split_single_char_options=True, **kwargs)[source]

Transform keyword arguments into a list of cmdline options

Imported from GitPython.

Original copyright:

Copyright (C) 2008, 2009 Michael Trier and contributors

Original license:

BSD 3-Clause “New” or “Revised” License

Parameters:
  • split_single_char_options (bool) –

  • kwargs (Union[str, bool, None, List[Union[str, bool, None]], Tuple[Union[str, bool, None], ...]]) –

Return type:

list
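
A rough sketch of a GitPython-style keyword-to-option transformation follows. It is an independent re-implementation for illustration, assuming that single-character names become short options, longer names become long options with underscores turned into dashes, and None or False values are dropped. The function name `to_options_sketch` and the alphabetical ordering are assumptions; the real function may differ in details.

```python
def to_options_sketch(split_single_char_options=True, **kwargs):
    """Illustrative re-implementation of the GitPython-style transformation."""

    def one(name, value):
        if value is None or value is False:
            return []                               # option is omitted
        if len(name) == 1:
            if value is True:
                return [f"-{name}"]
            if split_single_char_options:
                return [f"-{name}", str(value)]
            return [f"-{name}{value}"]
        flag = "--" + name.replace("_", "-")
        return [flag] if value is True else [f"{flag}={value}"]

    options = []
    for name in sorted(kwargs):                     # sorting is an assumption
        values = kwargs[name]
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            options.extend(one(name, value))
    return options


print(to_options_sketch(depth=1, no_checkout=True, b="main"))
```
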