datalad.support.gitrepo

Interface to Git via GitPython

For further information on GitPython see http://gitpython.readthedocs.org/

class datalad.support.gitrepo.GitPythonProgressBar(action)[source]

Bases: git.util.RemoteProgress

A handler for Git commands interfaced by GitPython which report progress

close()[source]
update(op_code, cur_count, max_count=None, message='')[source]

Called whenever the progress changes

Parameters:
  • op_code

    Integer that can be compared against Operation IDs and stage IDs.

    Stage IDs are BEGIN and END. Each of them is set only once per Operation ID. BEGIN and END may be set in the same call if the operation was fast enough that only one progress message was emitted. Between BEGIN and END, neither flag is set.

    Operation IDs are all held within the OP_MASK. Only one Operation ID will be active per call.

  • cur_count – Current absolute count of items
  • max_count – The maximum count of items we expect. It may be None in case there is no maximum number of items or if it is (yet) unknown.
  • message – In case of the ‘WRITING’ operation, it contains the amount of bytes transferred. It may possibly be used for other purposes as well.

You may read the contents of the current line in self._cur_line
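
The flag layout described for op_code can be decoded with plain bit operations. The sketch below is self-contained and defines its own constants, on the assumption that they mirror the layout used by git.util.RemoteProgress (stage flags in the two lowest bits, one higher bit per operation). The helper name decode_op_code is hypothetical and not part of datalad or GitPython.

```python
# Assumed flag layout, mirroring git.util.RemoteProgress: stage flags occupy
# the two lowest bits, each operation ID is a single higher bit.
BEGIN, END, COUNTING, COMPRESSING, WRITING = (1 << i for i in range(5))
STAGE_MASK = BEGIN | END
OP_MASK = ~STAGE_MASK

OP_NAMES = {COUNTING: "COUNTING", COMPRESSING: "COMPRESSING", WRITING: "WRITING"}


def decode_op_code(op_code):
    """Split op_code into (operation name, began, ended)."""
    operation = op_code & OP_MASK  # strip the stage flags
    return (
        OP_NAMES.get(operation, "UNKNOWN"),
        bool(op_code & BEGIN),
        bool(op_code & END),
    )


# A fast operation may report BEGIN and END in a single call.
print(decode_op_code(WRITING | BEGIN | END))
# Between BEGIN and END, neither stage flag is set.
print(decode_op_code(COUNTING))
```

An update() override could use the same masking to decide when to open and close a progress bar.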

class datalad.support.gitrepo.GitRepo(path, url=None, runner=None, create=True, git_opts=None, repo=None, fake_dates=False, create_sanity_checks=True, **kwargs)[source]

Bases: datalad.support.repo.RepoInterface

Representation of a git repository

GIT_SSH_ENV = {'GIT_SSH_COMMAND': 'datalad sshrun', 'GIT_SSH_VARIANT': 'ssh'}
add(files, git=True, git_options=None, update=False)[source]

Adds file(s) to the repository.

Parameters:
  • files (list) – list of paths to add
  • git (bool) – somewhat ugly construction for compatibility with AnnexRepo.add(); must always be True.
  • update (bool) –
    --update option for git-add. From git’s manpage:
    Update the index just where it already has an entry matching <pathspec>. This removes as well as modifies index entries to match the working tree, but adds no new files.

    If no <pathspec> is given when --update option is used, all tracked files in the entire working tree are updated (old versions of Git used to limit the update to the current directory and its subdirectories).

Returns:

List of status dicts.

Return type:

list

add_(files, git=True, git_options=None, update=False)[source]

Like add, but returns a generator

add_fake_dates(env)[source]

Add fake dates to env.

Parameters:env (dict or None) – Environment variables.
Returns:A dict (copied from env), with date-related environment variables for git and git-annex set.
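
The copy-then-set behavior described above can be sketched as follows. This is only an illustration: the function name with_fake_dates and the default timestamp are hypothetical, and only git's standard GIT_AUTHOR_DATE and GIT_COMMITTER_DATE variables are set here (whatever git-annex variable datalad sets is left out).

```python
def with_fake_dates(env, seconds=1112911993):
    """Return a copy of env with git's date variables pinned to `seconds`."""
    env = dict(env or {})  # copy; the caller's mapping is left untouched
    # git's internal date format: "<unix timestamp> <time zone offset>"
    stamp = f"{seconds} +0000"
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env


original = {"PATH": "/usr/bin"}
faked = with_fake_dates(original)
print(sorted(faked))
```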
add_remote(name, url, options=None)[source]

Register remote pointing to a url

add_submodule(path, name=None, url=None, branch=None)[source]

Add a new submodule to the repository.

This will alter the index as well as the .gitmodules file, but will not create a new commit. If the submodule already exists, no matter if the configuration differs from the one provided, the existing submodule is considered as already added and no further action is performed.

Parameters:
  • path (str) – repository-relative path at which the submodule should be located, and which will be created as required during the repository initialization.
  • name (str or None) – name/identifier for the submodule. If None, the path will be used as name.
  • url (str or None) – git-clone compatible URL. If None, the repository is assumed to exist, and the url of the first remote is taken instead. This is useful if you want to make an existing repository a submodule of another one.
  • branch (str or None) – name of branch to be checked out in the submodule. The given branch must exist in the remote repository, and will be checked out locally as a tracking branch. If None, remote HEAD will be checked out.
call_git(args, files=None, expect_stderr=False, expect_fail=False)[source]

Call git and return standard output.

Parameters:
  • args (list of str) – Arguments to pass to git.
  • files (list of str, optional) – File arguments to pass to git. The advantage of passing these here rather than as part of args is that the call will be split into multiple calls to avoid exceeding the maximum command line length.
  • expect_stderr (bool, optional) – Standard error is expected and should not be elevated above the DEBUG level.
  • expect_fail (bool, optional) – A non-zero exit is expected and should not be elevated above the DEBUG level.
Returns:

Standard output of the call.

Return type:

str

Raises:

CommandError if the call exits with a non-zero status.
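
The benefit of passing paths via files (the call is split so that no single command line grows too long) can be pictured with a small chunking helper. This is a minimal sketch, not datalad's implementation: chunk_files and its max_chars limit are hypothetical names, and the real limit depends on the platform.

```python
def chunk_files(files, max_chars):
    """Yield lists of files whose joined length stays within max_chars.

    A single file longer than max_chars still gets a chunk of its own.
    """
    chunk, size = [], 0
    for name in files:
        cost = len(name) + 1  # +1 for the separating space
        if chunk and size + cost > max_chars:
            yield chunk
            chunk, size = [], 0
        chunk.append(name)
        size += cost
    if chunk:
        yield chunk


files = ["a.txt", "bb.txt", "ccc.txt", "d.txt"]
for part in chunk_files(files, max_chars=14):
    # Each chunk would become one separate git invocation.
    print(["git", "add", "--"] + part)
```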

call_git_items_(args, files=None, expect_stderr=False, sep=None)[source]

Call git, splitting output on sep.

Parameters:
  • args (list of str) – Arguments to pass to git.
  • files (list of str, optional) – File arguments to pass to git. The advantage of passing these here rather than as part of args is that the call will be split into multiple calls to avoid exceeding the maximum command line length.
  • expect_stderr (bool, optional) – Standard error is expected and should not be elevated above the DEBUG level.
  • sep (str, optional) – Split the output by str.split(sep) rather than str.splitlines.
Returns:

Output items.

Return type:

generator

Raises:

CommandError if the call exits with a non-zero status.
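
The difference between the two splitting modes matters mostly for NUL-separated git output. The sketch below imitates the splitting rule on a plain string; iter_items is a hypothetical stand-in and runs no git command.

```python
def iter_items(output, sep=None):
    """Yield items from output, split on sep or, if sep is None, on line breaks."""
    items = output.split(sep) if sep is not None else output.splitlines()
    for item in items:
        yield item


# Default: split on line breaks.
print(list(iter_items("a\nb\n")))
# With sep: paths containing spaces or newlines survive intact, but
# str.split keeps a trailing empty item that a caller may want to drop.
print(list(iter_items("a b\0c\0", sep="\0")))
```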

call_git_oneline(args, files=None, expect_stderr=False)[source]

Call git for a single line of output.

Parameters:
  • args (list of str) – Arguments to pass to git.
  • files (list of str, optional) – File arguments to pass to git. The advantage of passing these here rather than as part of args is that the call will be split into multiple calls to avoid exceeding the maximum command line length.
  • expect_stderr (bool, optional) – Standard error is expected and should not be elevated above the DEBUG level.
  • sep (str, optional) – Split the output by str.split(sep) rather than str.splitlines.
Raises:
  • CommandError if the call exits with a non-zero status.
  • AssertionError if there is more than one line of output.
call_git_success(args, files=None, expect_stderr=False)[source]

Call git and return True if the call exits with code 0.

Parameters:
  • args (list of str) – Arguments to pass to git.
  • files (list of str, optional) – File arguments to pass to git. The advantage of passing these here rather than as part of args is that the call will be split into multiple calls to avoid exceeding the maximum command line length.
  • expect_stderr (bool, optional) – Standard error is expected and should not be elevated above the DEBUG level.
Returns:

Return type:

bool

checkout(name, options=None)[source]
cherry_pick(commit)[source]

Cherry pick commit to the current branch.

Parameters:commit (str) – A single commit.
classmethod clone(url, path, *args, **kwargs)[source]

Clone url into path

Provides workarounds for known issues (e.g. https://github.com/datalad/datalad/issues/785)

Parameters:
  • url (str) –
  • path (str) –
  • expect_fail (bool) – Whether the command is expected to possibly fail, in which case the error is logged at DEBUG level instead of ERROR
commit(msg=None, options=None, _datalad_msg=False, careless=True, files=None, date=None, index_file=None)[source]

Commit changes to git.

Parameters:
  • msg (str, optional) – commit-message
  • options (list of str, optional) – cmdline options for git-commit
  • _datalad_msg (bool, optional) – Signals that the commit is an automated commit by datalad, so that the message carries the [DATALAD] prefix
  • careless (bool, optional) – if False, raise when nothing was actually committed; if True, ignore that case
  • files (list of str, optional) – path(s) to commit
  • date (str, optional) – Date in one of the formats git understands
  • index_file (str, optional) – An alternative index to use
commit_exists(commitish)[source]

Does commitish exist in the repo?

Parameters:commitish (str) – A commit or an object that can be dereferenced to one.
Returns:
Return type:bool
config

Get an instance of the parser for the persistent repository configuration.

Note: This also allows reading and writing .datalad/config, not just .git/config

Returns:
Return type:ConfigManager
configure_fake_dates()[source]

Configure repository to use fake dates.

count_objects

Return a dictionary with count and size (in KiB) information on git objects

deinit_submodule(path, **kwargs)[source]

Deinit a submodule

Parameters:
  • path (str) – path to the submodule; relative to self.path
  • kwargs – see __init__
describe(commitish=None, **kwargs)[source]

Quick and dirty implementation to call git-describe

Parameters:kwargs – transformed to cmdline options for git-describe; see __init__ for description of the transformation
diff(fr, to, paths=None, untracked='all', eval_submodule_state='full')[source]

Like status(), but reports changes between two arbitrary revisions

Parameters:
  • fr (str or None) – Revision specification (anything that Git understands). Passing None considers anything in the target state as new.
  • to (str or None) – Revision specification (anything that Git understands), or None to compare to the state of the work tree.
  • paths (list or None) – If given, limits the query to the specified paths. To query all paths specify None, not an empty list.
  • untracked ({'no', 'normal', 'all'}) – If and how untracked content is reported when to is None: ‘no’: no untracked files are reported; ‘normal’: untracked files and entire untracked directories are reported as such; ‘all’: report individual files even in fully untracked directories.
  • eval_submodule_state ({'full', 'commit', 'no'}) – If ‘full’ (the default), the state of a submodule is evaluated by considering all modifications, with the treatment of untracked files determined by untracked. If ‘commit’, the modification check is restricted to comparing the submodule’s HEAD commit to the one recorded in the superdataset. If ‘no’, the state of the subdataset is not evaluated.
Returns:

Each content item has an entry under a pathlib Path object instance pointing to its absolute path inside the repository (this path is guaranteed to be underneath Repo.path). Each value is a dictionary with properties:

type

Can be ‘file’, ‘symlink’, ‘dataset’, ‘directory’

state

Can be ‘added’, ‘untracked’, ‘clean’, ‘deleted’, ‘modified’.

Return type:

dict

diffstatus(fr, to, paths=None, untracked='all', eval_submodule_state='full', eval_file_type=True, _cache=None)[source]

Like diff(), but reports the status of ‘clean’ content too

dirty

Is the repository dirty?

Note: This provides a quick answer when you simply want to know if there are any untracked changes or modifications in this repository or its submodules. For finer-grained control and more detailed reporting, use status() instead.

fake_dates_enabled

Is the repository configured to use fake dates?

fetch(remote=None, refspec=None, all_=False, **kwargs)[source]

Fetches changes from a remote (or all remotes).

Parameters:
  • remote (str, optional) – name of the remote to fetch from. If no remote is given and all_ is not set, the tracking branch is fetched.
  • refspec (str, optional) – refspec to fetch.
  • all_ (bool, optional) – fetch all remotes (and all of their branches). Fails if remote was given.
  • kwargs – passed to gitpython. TODO: Figure it out, make consistent use of it and document it.
Returns:

FetchInfo objects of the items fetched from remote

Return type:

list

for_each_ref_(fields=('objectname', 'objecttype', 'refname'), pattern=None, points_at=None, sort=None, count=None, contains=None)[source]

Wrapper for git for-each-ref

Please see manual page git-for-each-ref(1) for a complete overview of its functionality. Only a subset of it is supported by this wrapper.

Parameters:
  • fields (iterable or str) – Used to compose a NUL-delimited specification for for-each-ref’s --format option. The default field list reflects the standard behavior of for-each-ref when the --format option is not given.
  • pattern (list or str, optional) – If provided, report only refs that match at least one of the given patterns.
  • points_at (str, optional) – Only list refs which point at the given object.
  • sort (list or str, optional) – Field name(s) to sort by. If multiple fields are given, the last one becomes the primary key. Prefix any field name with ‘-’ to sort in descending order.
  • count (int, optional) – Stop iteration after the given number of matches.
  • contains (str, optional) – Only list refs which contain the specified commit.
Yields:

dict with items matching the given fields

Raises:
  • ValueError – if no fields are given
  • RuntimeError – if git for-each-ref returns a record where the number of properties does not match the number of fields
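
How a field list turns into a delimited format string, and how one output record maps back onto the fields, can be sketched as below. The helper names build_format and parse_record are hypothetical; the sketch relies only on the fact that git for-each-ref expands %00 in a format string to a NUL byte, and it mirrors the two documented error conditions.

```python
def build_format(fields):
    """Return (fields, format string) for a NUL-delimited --format value."""
    if isinstance(fields, str):
        fields = [fields]
    fields = list(fields)
    if not fields:
        raise ValueError("no fields given")
    # %00 is for-each-ref's hex escape for a NUL byte
    return fields, "%00".join(f"%({name})" for name in fields)


def parse_record(line, fields):
    """Turn one NUL-delimited output line into a dict keyed by field name."""
    values = line.split("\0")
    if len(values) != len(fields):
        raise RuntimeError(
            f"expected {len(fields)} properties, got {len(values)}")
    return dict(zip(fields, values))


fields, fmt = build_format(("objectname", "objecttype", "refname"))
print(fmt)
print(parse_record("abc123\0commit\0refs/heads/main", fields))
```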
format_commit(fmt, commitish=None)[source]

Return git show output for commitish.

Parameters:
  • fmt (str) – A format string accepted by git show.
  • commitish (str, optional) – Any commit identifier (defaults to “HEAD”).
Returns:

Return type:

str or, if there are no commits yet, None.

gc(allow_background=False, auto=False)[source]

Perform house keeping (garbage collection, repacking)

get_active_branch()[source]

Get the name of the active branch

Returns:Returns None if there is no active branch, i.e. detached HEAD, and the branch name otherwise.
Return type:str or None
get_branch_commits(branch=None, limit=None, stop=None, value=None)[source]

Return GitPython’s commits for the branch

Similar to what ‘git log <branch>’ does. It is a generator that yields top commits first

Parameters:
  • branch (str, optional) – If not provided, assumes current branch
  • limit (None | 'left-only', optional) – Limit which commits to report. If None – all commits (merged or not), if ‘left-only’ – only the commits from the left side of the tree upon merges
  • stop (str, optional) – hexsha of the commit at which to stop reporting (the matched commit is not reported either)
  • value (None | 'hexsha', optional) – What to yield. If None - entire commit object is yielded, if ‘hexsha’ only its hexsha
get_branches()[source]

Get all branches of the repo.

Returns:Names of all branches of this repository.
Return type:[str]
get_changed_files(staged=False, diff_filter='', index_file=None, files=None)[source]

Return files that have changed between the index and working tree.

Parameters:
  • staged (bool, optional) – Consider changes between HEAD and the index instead of changes between the index and the working tree.
  • diff_filter (str, optional) – Any value accepted by the --diff-filter option of git diff. Common ones include “A”, “D”, “M” for added, deleted, and modified files, respectively.
  • index_file (str, optional) – Alternative index file for git to use
get_commit_date(branch=None, date='authored')[source]

Get the date stamp of the last commit (in a branch or head otherwise)

Parameters:date ({'authored', 'committed'}) – Which date to return. “authored” will be the date shown by “git show” and the one possibly specified via --date to git commit
Returns:None if no commit
Return type:int or None
get_content_info(paths=None, ref=None, untracked='all', eval_file_type=True)[source]

Get identifier and type information from repository content.

This is a simplified front-end for git ls-files/ls-tree.

Both commands differ in their behavior when queried about subdataset paths. ls-files will not report anything, ls-tree will report on the subdataset record. This function uniformly follows the behavior of ls-tree (report on the respective subdataset mount).

Parameters:
  • paths (list(pathlib.PurePath)) – Specific paths, relative to the resolved repository root, to query info for. Paths must be normed to match the reporting done by Git, i.e. no parent dir components (such as “some/../this”). If none are given, info is reported for all content.
  • ref (gitref or None) – If given, content information is retrieved for this Git reference (via ls-tree), otherwise content information is produced for the present work tree (via ls-files). With a given reference, the reported content properties also contain a ‘bytesize’ record, stating the size of a file in bytes.
  • untracked ({'no', 'normal', 'all'}) – If and how untracked content is reported when no ref was given: ‘no’: no untracked files are reported; ‘normal’: untracked files and entire untracked directories are reported as such; ‘all’: report individual files even in fully untracked directories.
  • eval_file_type (bool) – If True, inspect file type of untracked files, and report annex symlink pointers as type ‘file’. This convenience comes with a cost; disable to get faster performance if this information is not needed.
Returns:

Each content item has an entry under a pathlib Path object instance pointing to its absolute path inside the repository (this path is guaranteed to be underneath Repo.path). Each value is a dictionary with properties:

type

Can be ‘file’, ‘symlink’, ‘dataset’, ‘directory’

Note that the reported type will not always match the type of content committed to Git; rather, it will reflect the nature of the content minus platform/mode-specifics. For example, a symlink to a locked annexed file on Unix will have a type ‘file’ reported, while a symlink to a file in Git or directory will be of type ‘symlink’.

gitshasum

SHASUM of the item as tracked by Git, or None, if not tracked. This could be different from the SHASUM of the file in the worktree, if it was modified.

Return type:

dict

Raises:

ValueError – In case of an invalid Git reference (e.g. ‘HEAD’ in an empty repository)

get_deleted_files()[source]

Return a list of paths with deleted files (staged deletion)

get_files(branch=None)[source]

Get a list of files in git.

Lists the files in the (remote) branch.

Parameters:branch (str) – Name of the branch to query. Default: active branch.
Returns:list of files.
Return type:[str]
get_git_attributes()[source]

Query gitattributes which apply to top level directory

It is a thin compatibility/shortcut wrapper around the more versatile get_gitattributes, which operates on a list of paths and returns a dictionary for each path

Returns:a dictionary with attribute name and value items relevant for the top (‘.’) directory of the repository, and thus most likely the default ones (if not overwritten with more rules) for all files within repo.
Return type:dict
static get_git_dir(repo)[source]

figure out a repo’s gitdir

‘.git’ might be a directory, a symlink or a file

Note

Please try using GitRepo.dot_git instead! It is not static, but it is cheaper, and you should have an instance of the repo you are working on anyway. Note that the property, in contrast to this method, returns an absolute path.

Parameters:repo (path or Repo instance) – currently expected to be the repos base dir
Returns:relative path to the repo’s git dir; So, default would be “.git”
Return type:str
get_gitattributes(path, index_only=False)[source]

Query gitattributes for one or more paths

Parameters:
  • path (path or list) – Path(s) to query. Paths may be relative or absolute.
  • index_only (bool) – Flag whether to consider only gitattribute setting that are reflected in the repository index, not just in the work tree content.
Returns:

Each key is a queried path (always relative to the repository root), each value is a dictionary with attribute name and value items. Attribute values are either True or False, for set and unset attributes, or are the literal attribute value.

Return type:

dict

get_hexsha(commitish=None, short=False)[source]

Return a hexsha for a given commitish.

Parameters:
  • commitish (str, optional) – Any identifier that refers to a commit (defaults to “HEAD”).
  • short (bool, optional) – Return the abbreviated form of the hexsha.
Returns:

Return type:

str or, if there are no commits yet, None.

get_indexed_files()[source]

Get a list of files in git’s index

Returns:list of paths rooting in git’s base dir
Return type:list
get_last_commit_hexsha(files)[source]

Return the hash of the last commit that modified any of the given paths

get_merge_base(commitishes)[source]

Get a merge base hexsha

Parameters:commitishes (str or list of str) – List of commitishes (branches, hexshas, etc) to determine the merge base of. If a single value provided, returns merge_base with the current branch.
Returns:None if there is no merge base for the given commits, or if the specified treeish doesn’t exist
Return type:str or None
get_missing_files()[source]

Return a list of paths with missing files (and no staged deletion)

get_remote_branches()[source]

Get all branches of all remotes of the repo.

Returns:Names of all remote branches.
Return type:[str]
get_remote_url(name, push=False)[source]

Get the url of a remote.

Reads the configuration of remote name and returns its url or None, if there is no url configured.

Parameters:
  • name (str) – name of the remote
  • push (bool) – if True, get the pushurl instead of the fetch url.
get_remotes(with_urls_only=False)[source]

Get known remotes of the repository

Parameters:with_urls_only (bool, optional) – return only remotes which have urls
Returns:remotes – List of names of the remotes
Return type:list of str
get_revisions(revrange=None, fmt='%H', options=None)[source]

Return list of revisions in revrange.

Parameters:
  • revrange (str or list of str or None, optional) – Revisions or revision ranges to walk. If None, revision defaults to HEAD unless a revision-modifying option like --all or --branches is included in options.
  • fmt (string, optional) – Format accepted by --format option of git log. This should not contain new lines because the output is split on new lines.
  • options (list of str, optional) – Options to pass to git log. This should not include --format.
Returns:

Revisions, formatted according to fmt.

Return type:

list of str

get_staged_paths()[source]

Returns a list of any staged repository path(s)

This is a rather fast call, as it will not depend on what is going on in the worktree.

get_submodules(sorted_=True, paths=None, compat=True)[source]

Return list of submodules.

Parameters:
  • sorted_ (bool, optional) – Sort submodules by path name.
  • paths (list(pathlib.PurePath), optional) – Restrict submodules to those under paths.
  • compat (bool, optional) – If true, return a namedtuple that incompletely mimics the attributes of GitPython’s Submodule object in hope of backwards compatibility with previous callers. Note that this form should be considered temporary and callers should be updated; this flag will be removed in a future release.
Returns:

List of submodule namedtuples if compat is true, or otherwise a list of dictionaries as returned by get_submodules_.

get_submodules_(paths=None)[source]

Yield submodules in this repository.

Parameters:paths (list(pathlib.PurePath), optional) – Restrict submodules to those under paths.
Returns:A generator that yields a dictionary with information for each submodule.
get_tags(output=None)[source]

Get list of tags

Parameters:output (str, optional) – If given, limit the return value to a list of values matching that particular key of the tag properties.
Returns:Each item is a dictionary with information on a tag. At present this includes ‘hexsha’, and ‘name’, where the latter is the string label of the tag, and the former the hexsha of the object the tag is attached to. The list is sorted by the creator date (committer date for lightweight tags and tagger date for annotated tags), with the most recent commit being the last element.
Return type:list
classmethod get_toppath(path, follow_up=True, git_options=None)[source]

Return top-level of a repository given the path.

Parameters:
  • follow_up (bool) – If path has symlinks – they get resolved by git. If follow_up is True, we will follow original path up until we hit the same resolved path. If no such path found, resolved one would be returned.
  • git_options (list of str) – options to be passed to the git rev-parse call
Returns:None if no parent directory contains a git repository.
get_tracking_branch(branch=None)[source]

Get the tracking branch for branch if there is any.

Parameters:branch (str) – local branch to look up. If none is given, active branch is used.
Returns:(remote or None, refspec or None) of the tracking branch
Return type:tuple
is_ancestor(reva, revb)[source]

Is reva an ancestor of revb?

Parameters:reva, revb (str) – Revisions.
Returns:
Return type:bool
is_valid_git()[source]

Returns whether the underlying repository appears to be still valid

Note that this is almost identical to the classmethod is_valid_repo(). However, if we are testing an existing instance, we can save Path object creations. Since this testing is done a lot, this is relevant. Creation of the Path objects in is_valid_repo() takes nearly half the time of the entire function.

Also note, that this method is bound to an instance but still class-dependent, meaning that a subclass cannot simply overwrite it. This is particularly important for the call from within __init__(), which in turn is called by the subclasses’ __init__. Using an overwrite would lead to the wrong thing being called.

classmethod is_valid_repo(path)[source]

Returns whether a given path points to a git repository

is_with_annex()[source]

Report if GitRepo (assumed) has (remotes with) a git-annex branch

merge(name, options=None, msg=None, allow_unrelated=False, **kwargs)[source]
precommit()[source]

Perform pre-commit maintenance tasks

pull(remote=None, refspec=None, **kwargs)[source]

See fetch

push(remote=None, refspec=None, all_remotes=False, **kwargs)[source]

Push to remote repository

Parameters:
  • remote (str) – name of the remote to push to
  • refspec (str) – specify what to push
  • all_remotes (bool) – if set to True push to all remotes. Conflicts with remote not being None.
  • kwargs (dict) – options to pass to git push
Returns:

PushInfo objects of the items pushed to remote

Return type:

list

remove(files, recursive=False, **kwargs)[source]

Remove files.

Calls git-rm.

Parameters:
  • files (str) – list of paths to remove
  • recursive (bool) – whether to allow recursive removal from subdirectories
  • kwargs – see __init__
Returns:

list of successfully removed files.

Return type:

[str]

remove_branch(branch)[source]
remove_remote(name)[source]

Remove existing remote

repo
save(message=None, paths=None, _status=None, **kwargs)[source]

Save dataset content.

Parameters:
  • message (str or None) – A message to accompany the changeset in the log. If None, a default message is used.
  • paths (list or None) – Any content with path matching any of the paths given in this list will be saved. Matching will be performed against the dataset status (GitRepo.status()), or a custom status provided via _status. If no paths are provided, ALL non-clean paths present in the repo status or _status will be saved.
  • _status (dict or None) – If None, Repo.status() will be queried for the given ds. If a dict is given, its content will be used as a constraint. For example, to save only modified content, but no untracked content, set paths to None and provide a _status that has no entries for untracked content.
  • **kwargs

    Additional arguments that are passed to underlying Repo methods. Supported:

    • git : bool (passed to Repo.add())
    • eval_submodule_state : {‘full’, ‘commit’, ‘no’} passed to Repo.status()
    • untracked : {‘no’, ‘normal’, ‘all’} - passed to Repo.status()
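
The _status constraint mentioned above (save modified content but no untracked content) amounts to filtering a status dict before handing it over. A minimal sketch, using a hand-written status dict in the documented shape; the helper name without_untracked and the sample paths are hypothetical.

```python
from pathlib import Path


def without_untracked(status):
    """Return a copy of a status dict that has no entries for untracked content."""
    return {
        path: props
        for path, props in status.items()
        if props["state"] != "untracked"
    }


# Hypothetical status report: absolute Path -> properties.
status = {
    Path("/tmp/ds/analysis.py"): {"type": "file", "state": "modified"},
    Path("/tmp/ds/scratch.txt"): {"type": "file", "state": "untracked"},
}
constraint = without_untracked(status)
print(sorted(constraint))
```

Per the parameter description, such a dict would be passed as _status with paths left as None.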
save_(message=None, paths=None, _status=None, **kwargs)[source]

Like save() but working as a generator.

set_gitattributes(attrs, attrfile='.gitattributes', mode='a')[source]

Set gitattributes

By default appends additional lines to attrfile. Note, that later lines in attrfile overrule earlier ones, which may or may not be what you want. Set mode to ‘w’ to replace the entire file by what you provided in attrs.

Parameters:
  • attrs (list) – Each item is a 2-tuple, where the first element is a path pattern, and the second element is a dictionary with attribute key/value pairs. The attribute dictionary must use the same semantics as those returned by get_gitattributes(). Path patterns can use absolute paths, in which case they will be normalized relative to the directory that contains the target .gitattributes file (see attrfile).
  • attrfile (path) – Path relative to the repository root of the .gitattributes file the attributes shall be set in.
  • mode (str) – ‘a’ to append .gitattributes, ‘w’ to replace it
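
The attribute semantics shared with get_gitattributes() (True for set, False for unset, anything else as a literal value) map onto .gitattributes syntax as sketched below. The helper render_attr_line is hypothetical and only shows how one (pattern, attributes) item could be written as a line; it does not touch any file.

```python
def render_attr_line(pattern, attrs):
    """Render one (pattern, attribute dict) item as a .gitattributes line."""
    parts = [pattern]
    for name, value in sorted(attrs.items()):
        if value is True:
            parts.append(name)  # set attribute
        elif value is False:
            parts.append(f"-{name}")  # unset attribute
        else:
            parts.append(f"{name}={value}")  # literal value
    return " ".join(parts)


attrs = [
    ("*.png", {"binary": True, "text": False}),
    ("*", {"annex.largefiles": "anything"}),
]
for pattern, values in attrs:
    print(render_attr_line(pattern, values))
```

Because later lines overrule earlier ones, the order of items in attrs matters when appending.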
set_remote_url(name, url, push=False)[source]

Set the URL a remote is pointing to

Sets the URL of the remote name. Requires the remote to already exist.

Parameters:
  • name (str) – name of the remote
  • url (str) –
  • push (bool) – if True, set the push URL, otherwise the fetch URL
status(paths=None, untracked='all', eval_submodule_state='full')[source]

Simplified git status equivalent.

Parameters:
  • paths (list or None) – If given, limits the query to the specified paths. To query all paths specify None, not an empty list. If a query path points into a subdataset, a report is made on the subdataset record within the queried dataset only (no recursion).
  • untracked ({'no', 'normal', 'all'}) – If and how untracked content is reported: ‘no’: no untracked files are reported; ‘normal’: untracked files and entire untracked directories are reported as such; ‘all’: report individual files even in fully untracked directories.
  • eval_submodule_state ({'full', 'commit', 'no'}) – If ‘full’ (the default), the state of a submodule is evaluated by considering all modifications, with the treatment of untracked files determined by untracked. If ‘commit’, the modification check is restricted to comparing the submodule’s HEAD commit to the one recorded in the superdataset. If ‘no’, the state of the subdataset is not evaluated.
Returns:

Each content item has an entry under a pathlib Path object instance pointing to its absolute path inside the repository (this path is guaranteed to be underneath Repo.path). Each value is a dictionary with properties:

type

Can be ‘file’, ‘symlink’, ‘dataset’, ‘directory’

state

Can be ‘added’, ‘untracked’, ‘clean’, ‘deleted’, ‘modified’.

Return type:

dict
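
A report in this shape is easy to post-process with ordinary dict operations. The sketch below groups paths by their state; the report is a hand-written, hypothetical example rather than real output, and group_by_state is not part of datalad.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical report in the documented shape: absolute Path -> properties.
report = {
    Path("/tmp/ds/README.md"): {"type": "file", "state": "modified"},
    Path("/tmp/ds/data/raw.csv"): {"type": "file", "state": "untracked"},
    Path("/tmp/ds/data/new.csv"): {"type": "file", "state": "untracked"},
    Path("/tmp/ds/sub"): {"type": "dataset", "state": "clean"},
}


def group_by_state(report):
    """Map each state to the sorted list of paths reported in that state."""
    groups = defaultdict(list)
    for path, props in report.items():
        groups[props["state"]].append(path)
    return {state: sorted(paths) for state, paths in groups.items()}


for state, paths in sorted(group_by_state(report).items()):
    print(state, [str(p) for p in paths])
```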

tag(tag, message=None, commit=None, options=None)[source]

Tag a commit

Parameters:
  • tag (str) – Custom tag label. Must be a valid tag name.
  • message (str, optional) – If provided, adds [‘-m’, <message>] to the list of git tag arguments.
  • commit (str, optional) – If provided, will be appended as last argument to the git tag call, and can be used to identify the commit that shall be tagged, if not HEAD.
  • options (list, optional) – Additional command options, inserted prior to a potential commit argument.
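
The way these parameters combine into a git tag command line can be sketched as follows. The function tag_args is hypothetical, and the relative order of the message and the extra options is an assumption; the parameter descriptions only state that options come before the commit and that the commit is the last argument.

```python
def tag_args(tag, message=None, commit=None, options=None):
    """Compose an argument list for git tag (illustrative ordering)."""
    args = ["git", "tag"]
    if message:
        args += ["-m", message]
    if options:
        args += list(options)  # placed before a potential commit argument
    args.append(tag)
    if commit:
        args.append(commit)  # last argument; HEAD is tagged if omitted
    return args


print(tag_args("v1.0"))
print(tag_args("v1.0", message="first release", commit="abc123", options=["-f"]))
```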
untracked_files

Legacy interface, do not use! Use the status() method instead.

Despite its name, it also reports on untracked datasets, and yields their names with trailing path separators.

update_ref(ref, value, symbolic=False)[source]

Update the object name stored in a ref “safely”.

Just a shim for git update-ref call if not symbolic, and git symbolic-ref if symbolic

Parameters:
  • ref (str) – Reference, such as refs/heads/BRANCHNAME or HEAD.
  • value (str) – Value to update to, e.g. hexsha of a commit when updating for a branch ref, or branch ref if updating HEAD
  • symbolic (bool) – Whether ref is symbolic; should be set, e.g., in case of ref=HEAD
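
The choice between the two underlying git commands can be sketched as a small argument builder. The function update_ref_args is hypothetical and only composes the command line; it does not run git.

```python
def update_ref_args(ref, value, symbolic=False):
    """Pick git symbolic-ref or git update-ref depending on `symbolic`."""
    command = "symbolic-ref" if symbolic else "update-ref"
    return ["git", command, ref, value]


# Point a branch at a commit.
print(update_ref_args("refs/heads/main", "abc123"))
# Point HEAD at a branch.
print(update_ref_args("HEAD", "refs/heads/main", symbolic=True))
```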
update_remote(name=None, verbose=False)[source]
update_submodule(path, mode='checkout', init=False)[source]

Update a registered submodule.

This will make the submodule match what the superproject expects by cloning missing submodules and updating the working tree of the submodules. The “updating” can be done in several ways depending on the value of submodule.<name>.update configuration variable, or the mode argument.

Parameters:
  • path (str) – Identifies which submodule to operate on by its repository-relative path.
  • mode ({checkout, rebase, merge}) – Update procedure to perform. ‘checkout’: the commit recorded in the superproject will be checked out in the submodule on a detached HEAD; ‘rebase’: the current branch of the submodule will be rebased onto the commit recorded in the superproject; ‘merge’: the commit recorded in the superproject will be merged into the current branch in the submodule.
  • init (bool) – If True, initialize all submodules for which “git submodule init” has not been called so far before updating. Primarily provided for internal purposes and should not be used directly, since it would result in not so annex-friendly .git symlinks/references instead of full featured .git/ directories in the submodules
datalad.support.gitrepo.Repo(*args, **kwargs)[source]

Factory function around gitpy.Repo for consistent instantiation with different backends

class datalad.support.gitrepo.Submodule(name, path, url)

Bases: tuple

name

Alias for field number 0

path

Alias for field number 1

url

Alias for field number 2
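
The "Alias for field number N" entries are what a namedtuple produces: each field is reachable by name and by position. The sketch below rebuilds an equivalent type with the standard library instead of importing datalad; the example values are made up.

```python
from collections import namedtuple

# Stand-in with the same fields as datalad.support.gitrepo.Submodule.
Submodule = namedtuple("Submodule", ["name", "path", "url"])

sm = Submodule(name="sub", path="code/sub", url="https://example.com/sub.git")
print(sm.name, sm[0])  # field 0 by name and by index
name, path, url = sm   # unpacks like any tuple
print(path, url)
```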

datalad.support.gitrepo.guard_BadName(func)[source]

A helper to guard against BadName exception

Workaround for https://github.com/gitpython-developers/GitPython/issues/768; see also https://github.com/datalad/datalad/issues/2550. On failure, precommit (to flush anything flushable) and try again.

datalad.support.gitrepo.to_options(**kwargs)[source]

Transform keyword arguments into a list of cmdline options

Parameters:
  • split_single_char_options (bool) –
  • kwargs
Returns:

Return type:

list
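
A transformation of this kind can be sketched as below. This is an illustration under assumed rules, not datalad's implementation: single-character names become short options, longer names become long options with underscores turned into dashes, and False or None values are dropped. The name kwargs_to_options is hypothetical.

```python
def kwargs_to_options(split_single_char_options=True, **kwargs):
    """Turn keyword arguments into a list of command-line options."""
    options = []
    for name, value in kwargs.items():
        if value is None or value is False:
            continue  # unset options are skipped
        if len(name) == 1:
            if value is True:
                options.append(f"-{name}")
            elif split_single_char_options:
                options += [f"-{name}", str(value)]
            else:
                options.append(f"-{name}{value}")
        else:
            flag = "--" + name.replace("_", "-")
            options.append(flag if value is True else f"{flag}={value}")
    return options


print(kwargs_to_options(n=3, abbrev=7, all=True, dry_run=False))
print(kwargs_to_options(False, n=3))
```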