Interface to git-annex by Joey Hess.

For further information on git-annex see


Bases: datalad.cmd.WitlessProtocol

pipe_data_received(fd, byts)[source]

Called when the subprocess writes data into stdout/stderr pipe.

fd is int file descriptor. data is bytes object.

proc_err = True
proc_out = True

Bases: datalad.cmd.WitlessProtocol

Subprocess communication protocol for annex … --json commands

Importantly, parsed JSON content is returned as a result, not string output.

This protocol also handles git-annex’s JSON-style progress reporting.
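
As a rough illustration of what such a protocol has to do, the sketch below buffers incoming bytes, parses each complete line as JSON, and keeps progress records apart from result records. The class name JsonLineCollector is hypothetical, and treating any record that carries a 'byte-progress' key as a progress report is an assumption about git-annex's output, not a description of datalad's implementation.

```python
import json


class JsonLineCollector:
    """Hypothetical collector for line-delimited JSON arriving in chunks."""

    def __init__(self):
        self._buffer = b""
        self.results = []   # parsed result records
        self.progress = []  # parsed progress records (assumed marker key)

    def feed(self, data):
        """Accept a bytes chunk; parse every line that is now complete."""
        self._buffer += data
        # Everything before the last newline is complete; the rest stays buffered.
        *complete, self._buffer = self._buffer.split(b"\n")
        for line in complete:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Assumption: progress reports are marked by a 'byte-progress' key.
            if "byte-progress" in record:
                self.progress.append(record)
            else:
                self.results.append(record)


collector = JsonLineCollector()
collector.feed(b'{"command": "add", "success": tr')    # incomplete line, kept in buffer
collector.feed(b'ue}\n{"byte-progress": 10}\n')        # completes it, adds a progress record
```

Buffering by line matters because a pipe may deliver a JSON record split across several pipe_data_received() calls.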


Called when a connection is made.

The argument is the transport representing the pipe connection. To receive data, wait for data_received() calls. When the connection is closed, connection_lost() is called.

pipe_data_received(fd, data)[source]

Called when the subprocess writes data into stdout/stderr pipe.

fd is int file descriptor. data is bytes object.

proc_err = True
proc_out = True

Called when subprocess has exited.

total_nbytes = None
class, url=None, runner=None, backend=None, always_commit=True, create=True, create_sanity_checks=True, init=False, batch_size=None, version=None, description=None, git_opts=None, annex_opts=None, annex_init_opts=None, repo=None, fake_dates=False)[source]


Representation of a git-annex repository.

Paths given to any of the class methods will be interpreted as relative to PWD, in case this is currently beneath AnnexRepo’s base dir (self.path). If PWD is outside of the repository, relative paths will be interpreted as relative to self.path. Absolute paths will be accepted either way.
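
The path rule above can be sketched as a small standalone function. The helper name interpret_path and its arguments are hypothetical; this is only a model of the described behaviour, not the code AnnexRepo uses.

```python
from pathlib import PurePosixPath


def interpret_path(path, pwd, repo_path):
    """Hypothetical model of how a path argument is interpreted.

    path      -- path as passed to a method
    pwd       -- current working directory
    repo_path -- base directory of the repository (self.path)
    """
    path = PurePosixPath(path)
    pwd = PurePosixPath(pwd)
    repo_path = PurePosixPath(repo_path)
    if path.is_absolute():
        # Absolute paths are accepted either way.
        return path
    # PWD counts as "beneath" the base dir if it is that dir or a subdirectory.
    inside_repo = pwd == repo_path or repo_path in pwd.parents
    base = pwd if inside_repo else repo_path
    return base / path


inside = interpret_path("data.bin", pwd="/repo/sub", repo_path="/repo")
outside = interpret_path("data.bin", pwd="/tmp", repo_path="/repo")
```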

GIT_ANNEX_MIN_VERSION = '7.20190503'
WEB_UUID = '00000000-0000-0000-0000-000000000001'
add(files, git=None, backend=None, options=None, jobs=None, git_options=None, annex_options=None, update=False)[source]

Add file(s) to the repository.

  • files (list of str) – list of paths to add to the annex
  • git (bool) – if True, add to git instead of annex.
  • backend
  • options
  • update (bool) –
    --update option for git-add. From git’s manpage:
    Update the index just where it already has an entry matching <pathspec>. This removes as well as modifies index entries to match the working tree, but adds no new files.

    If no <pathspec> is given when --update option is used, all tracked files in the entire working tree are updated (old versions of Git used to limit the update to the current directory and its subdirectories).

    Note: Used only if git-add is called instead of git-annex-add.


Return type:

list of dict or dict

add_(files, git=None, backend=None, options=None, jobs=None, git_options=None, annex_options=None, update=False)[source]

Like add, but returns a generator

add_remote(name, url, options=None)[source]

Overrides method from GitRepo in order to set remote.<name>.annex-ssh-options in case of a SSH remote.

add_url_to_file(file_, url, options=None, backend=None, batch=False, git_options=None, annex_options=None, unlink_existing=False)[source]

Add file from url to the annex.

Downloads the file from url and adds it to the annex. If annex already knows the file, records that it can be downloaded from url.

Note: Consider using the higher-level download_url instead.

  • file (str) –
  • url (str) –
  • options (list) – options to the annex command
  • batch (bool, optional) – initiate or continue with a batched run of annex addurl, instead of just calling a single git annex addurl command
  • unlink_existing (bool, optional) – by default, fails if the file already exists and is under git. If set to True, the existing file is removed first.

Currently, a dict representation of the JSON output returned by annex is returned only in batch mode.

Return type:


add_urls(urls, options=None, backend=None, cwd=None, jobs=None, git_options=None, annex_options=None)[source]

Downloads each url to its own file, which is added to the annex.

  • urls (list of str) –
  • options (list, optional) – options to the annex command
  • cwd (string, optional) – working directory from within which to invoke git-annex

enter an adjusted branch

This command is only available in a v6+ git-annex repository.

Parameters:options (list of str) – currently requires ‘--unlock’ or ‘--fix’; default: --unlock
annexstatus(paths=None, untracked='all')[source]
classmethod check_direct_mode_support()[source]

Does git-annex version support direct mode?

The result is cached at cls.supports_direct_mode.

Return type:bool
classmethod check_repository_versions()[source]

Get information on supported and upgradable repository versions.

The result is cached at cls.repository_versions.

Returns:supported -> list of supported versions (int); upgradable -> list of upgradable versions (int)
Return type:dict
copy_to(files, remote, options=None, jobs=None)[source]

Copy the actual content of files to remote

  • files (str or list of str) – path(s) to copy
  • remote (str) – name of remote to copy files to

files successfully copied

Return type:

list of str

drop(files, options=None, key=False, jobs=None)[source]

Drops the content of annexed files from this repository.

Drops only if possible with respect to required minimal number of available copies.

  • files (list of str) – paths to drop
  • options (list of str, optional) – commandline options for the git annex drop command
  • jobs (int, optional) – how many jobs to run in parallel (passed to git-annex call)

‘success’ item in each object indicates failure/success per file path.

Return type:

list(JSON objects)
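
A caller would typically inspect the ‘success’ item of each returned object. The sketch below shows one way to do that; the helper name split_by_success and the presence of a 'file' field in each record are assumptions made for illustration.

```python
def split_by_success(records):
    """Hypothetical helper: separate paths by the 'success' item of each record."""
    succeeded = [r["file"] for r in records if r.get("success")]
    failed = [r["file"] for r in records if not r.get("success")]
    return succeeded, failed


# Made-up records shaped like per-file JSON objects (field names are assumed).
sample = [
    {"file": "a.dat", "success": True},
    {"file": "b.dat", "success": False},
]
dropped, kept = split_by_success(sample)
```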

drop_key(keys, options=None, batch=False)[source]

Drops the content of annexed files from this repository referenced by keys

Dangerous: it drops without checking for required minimal number of available copies.

  • keys (list of str, str) –
  • batch (bool, optional) – initiate or continue with a batched run of annex dropkey, instead of just calling a single git annex dropkey command
enable_remote(name, options=None, env=None)[source]

Enables use of an existing special remote

  • name (str) – name, the special remote was created with
  • options (list, optional) –
file_has_content(files, allow_quick=False, batch=False)[source]

Check whether files have their content present under annex.

  • files (list of str) – file(s) to check for being actually present.
  • allow_quick (bool, optional) – This is no longer supported.

For each input file states whether file has content locally

Return type:

list of bool

find(files, batch=False)[source]

Run git annex find on file(s).

  • files (list of str) – files to find under annex
  • batch (bool, optional) – initiate or continue with a batched run of annex find, instead of just calling a single git annex find command. If any items in files are directories, this value is treated as False.

A dictionary that maps each item in files to its git annex find result. Items without a successful result will be an empty string, and multi-item results (which can occur if files includes a directory) will be returned as a list.
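
The shape of the returned dictionary can be modelled as follows. The function shape_find_results and its input format (a mapping from each queried item to the list of matching paths) are hypothetical, and returning a plain string for exactly one match is an assumption of this sketch.

```python
def shape_find_results(raw):
    """Hypothetical model of the return shape described above.

    raw maps each queried item to the list of paths reported for it.
    """
    shaped = {}
    for item, matches in raw.items():
        if not matches:
            shaped[item] = ""            # no successful result
        elif len(matches) == 1:
            shaped[item] = matches[0]    # single result (assumed to be a str)
        else:
            shaped[item] = list(matches) # multi-item result, e.g. a directory
    return shaped


shaped = shape_find_results({
    "a.dat": ["a.dat"],
    "missing.dat": [],
    "subdir": ["subdir/x.dat", "subdir/y.dat"],
})
```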

fsck(paths=None, remote=None, fast=False, annex_options=None, git_options=None)[source]

Front-end for git-annex fsck

  • paths (list) – Limit operation to specific paths.
  • remote (str) – If given, the identified remote will be fsck’ed instead of the local repository.
  • fast (bool) – If True, typically means that no actual content is being verified, but tests are limited to the presence of files.
get(files, remote=None, options=None, jobs=None, key=False)[source]

Get the actual content of files

  • files (list of str) – paths to get
  • remote (str, optional) – from which remote to fetch content
  • options (list of str, optional) – commandline options for the git annex get command
  • jobs (int or None, optional) – how many jobs to run in parallel (passed to git-annex call). If not specified (None), then
  • key (bool, optional) – If provided file value is actually a key


Return type:

list of dict

get_annexed_files(with_content_only=False, patterns=None)[source]

Get a list of files in annex

  • with_content_only (bool, optional) – Only list files whose content is present.
  • patterns (list, optional) – Globs to pass to annex’s --include=. Files that match any of these will be returned (i.e., they’ll be separated by --or).
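
To make the "separated by --or" remark concrete, here is a minimal sketch of how a list of globs could be turned into git-annex matching options. The function name build_include_options is hypothetical; the real method may assemble its options differently.

```python
def build_include_options(patterns):
    """Hypothetical helper: join --include=<glob> options with --or."""
    options = []
    for pattern in patterns or []:
        if options:
            # Any one of the globs may match, so the options are OR-ed together.
            options.append("--or")
        options.append(f"--include={pattern}")
    return options


opts = build_include_options(["*.dat", "*.txt"])
```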

Return type:

A list of POSIX file names

get_content_annexinfo(paths=None, init='git', ref=None, eval_availability=False, key_prefix='', **kwargs)[source]
  • paths (list) – Specific paths to query info for. If none are given, info is reported for all content.
  • init ('git' or dict-like or None) – If set to ‘git’ annex content info will amend the output of GitRepo.get_content_info(), otherwise the dict-like object supplied will receive this information and the present keys will limit the report of annex properties. Alternatively, if None is given, no initialization is done, and no limit is in effect.
  • ref (gitref or None) – If not None, annex content info for this Git reference will be produced, otherwise for the content of the present worktree.
  • eval_availability (bool) – If this flag is given, evaluate whether the content of any annex’ed file is present in the local annex.
  • **kwargs – Additional arguments for GitRepo.get_content_info(), if init is set to ‘git’.

Each content item has an entry under its relative path within the repository. Each value is a dictionary with properties:


Can be ‘file’, ‘symlink’, ‘dataset’, ‘directory’


SHASUM is last commit affecting the item, or None, if not tracked.


Annex key of a file (if an annex’ed file)


Size of an annexed file in bytes.


Bool whether a content object for this key exists in the local annex (with eval_availability)


pathlib.Path of the content object in the local annex, if one is available (with eval_availability)

Return type:


get_contentlocation(key, batch=False)[source]

Get location of the key content

Normally under .git/annex objects in indirect mode and within file tree in direct mode.

Unfortunately there is no (easy) way to discriminate situations when given key is simply incorrect (not known to annex) or its content not currently present – in both cases annex just silently exits with -1

  • key (str) – key
  • batch (bool, optional) – initiate or continue with a batched run of annex contentlocation

path relative to the top directory of the repository. If no content is present, empty string is returned

Return type:



Get the name of a potential corresponding branch.

Parameters:branch (str, optional) – Name of the branch to report a corresponding branch for; defaults to active branch
Returns:Name of the corresponding branch, or None if there is no corresponding branch.
Return type:str or None

Get annex repository description

Parameters:uuid (str, optional) – For which remote (based on uuid) to report description for
Returns:None returned if not found
Return type:str or None

Get the backend currently used for file(s).

Parameters:files (list of str) –
Returns:For each file in input list indicates the used backend by a str like “SHA256E” or “MD5”.
Return type:list of str
get_file_key(files, batch=None)[source]

Get key of an annexed file.

  • files (str or list) – file(s) to look up
  • batch (None or bool, optional) – If True, lookupkey --batch process will be used, which would not crash even if provided file is not under annex (but directly under git), but rather just return an empty string. If False, invokes without --batch. If None, use batch mode if more than a single file is provided.

keys used by git-annex for each of the files; in case of a list an empty string is returned if there was no key for that file

Return type:

str or list

  • FileInGitError – If running in non-batch mode and a file is under git, not annex
  • FileNotInAnnexError – If running in non-batch mode and a file is not under git at all

Get groupwanted expression for a group name

Parameters:name (str) – Name of the groupwanted group
classmethod get_key_backend(key)[source]

Get the backend from a given key

get_metadata(files, timestamps=False, batch=False)[source]

Query git-annex file metadata

  • files (str or iterable(str)) – One or more paths for which metadata is to be queried. If one or more paths could be directories, batch=False must be given to prevent git-annex from raising an error. Due to technical limitations, such an error will lead to a hanging process.
  • timestamps (bool, optional) – If True, the output contains a ‘<metadatakey>-lastchanged’ key for every metadata item, reflecting the modification time, as well as a ‘lastchanged’ key with the most recent modification time of any metadata item.
  • batch (bool, optional) – If True, a metadata --batch process will be used, and only confirmed annex’ed files can be queried (else query will hang indefinitely). If False, invokes without --batch, and gives all files as arguments (this can be problematic with a large number of files).

One tuple per file (could be more items than input arguments when directories are given). First tuple item is the filename, second item is a dictionary with metadata key/value pairs. Note that annex metadata tags are stored under the key ‘tag’, which is a regular metadata item that can be manipulated like any other.

Return type:


get_preferred_content(property, remote=None)[source]

Get preferred content configuration of a repository or remote

  • property ({'wanted', 'required', 'group'}) – Type of property to query
  • remote (str, optional) – If not specified (None), returns the property for the local repository.

The setting, or an empty string if there is none.

Return type:


  • ValueError – If an unknown property label is given.
  • CommandError – If the annex call errors.
get_remotes(with_urls_only=False, exclude_special_remotes=False)[source]

Get known (special-) remotes of the repository

  • exclude_special_remotes (bool, optional) – if True, don’t return annex special remotes
  • with_urls_only (bool, optional) – return only remotes which have urls

remotes – List of names of the remotes

Return type:

list of str

static get_size_from_key(key)[source]

A little helper to obtain size encoded in a key

Returns:size of the file or None if either no size is encoded in the key or key was None itself
Return type:int or None
Raises:ValueError – if key is considered invalid (at least its size-related part)
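
The behaviour described above can be sketched without datalad. The function size_from_key below is a hypothetical stand-in that assumes the usual git-annex key layout of BACKEND, optional fields such as -s<size>, then "--" and the name; it is not datalad's actual implementation and may treat edge cases differently.

```python
def size_from_key(key):
    """Hypothetical sketch: extract the size encoded in a git-annex key."""
    if key is None:
        return None
    if "--" not in key:
        # Assumed layout requires a '--' between the fields and the name.
        raise ValueError(f"invalid key: {key!r}")
    header = key.split("--", 1)[0]
    # First element is the backend name, the rest are optional fields.
    for field in header.split("-")[1:]:
        if field.startswith("s"):
            digits = field[1:]
            if not digits.isdecimal():
                raise ValueError(f"invalid size field in key: {key!r}")
            return int(digits)
    return None  # no size encoded in the key


size = size_from_key("SHA256E-s1024--0123abcd.dat")
```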

Get info about all known (not just enabled) special remotes.

Returns:Keys are special remote UUIDs. Each value is a dictionary with configuration information git-annex has for the remote. This should include the ‘type’ and ‘name’ as well as any initremote parameters that git-annex stores.

Note: This is a faithful translation of git-annex:remote.log with one exception. For a special remote initialized with the --sameas flag, git-annex stores the special remote name under the “sameas-name” key; we copy this value under the “name” key so that callers don’t have to check two places for the name. If you need to detect whether you’re working with a sameas remote, the presence of either “sameas-name” or “sameas-uuid” is a reliable indicator.

Return type:dict
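
The sameas handling in the note can be illustrated on plain dictionaries. Both helper names below are hypothetical, and the sample record is made up; only the key names “name”, “sameas-name” and “sameas-uuid” come from the note above.

```python
def normalize_special_remote(record):
    """Hypothetical helper: expose the name of a sameas remote under 'name'."""
    normalized = dict(record)  # leave the caller's record untouched
    if "sameas-name" in normalized:
        normalized["name"] = normalized["sameas-name"]
    return normalized


def is_sameas(record):
    """A record describes a sameas remote if either sameas key is present."""
    return "sameas-name" in record or "sameas-uuid" in record


# Made-up record for illustration only.
raw = {"type": "directory", "sameas-name": "backup", "sameas-uuid": "1234"}
info = normalize_special_remote(raw)
```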
classmethod get_toppath(path, follow_up=True, git_options=None)[source]

Return top-level of a repository given the path.

  • follow_up (bool) – If path has symlinks – they get resolved by git. If follow_up is True, we will follow original path up until we hit the same resolved path. If no such path found, resolved one would be returned.
  • git_options (list of str) – options to be passed to the git rev-parse call
Returns:None if no parent directory contains a git repository.
get_tracking_branch(branch=None, remote_only=False, corresponding=True)[source]

Get the tracking branch for branch if there is any.

By default returns the tracking branch of the corresponding branch if branch is a managed branch.

  • branch (str) – local branch to look up. If none is given, active branch is used.
  • remote_only (bool) – Don’t return a value if the upstream remote is set to “.” (meaning this repository).
  • corresponding (bool) – If True actually look up the corresponding branch of branch (also if branch isn’t explicitly given)

(remote or None, refspec or None) of the tracking branch

Return type:


get_urls(file_, key=False, batch=False)[source]

Get URLs for a file/key

  • file (str) –
  • key (bool, optional) – Whether provided files are actually annex keys

Return type:

A list of URLs

git_annex_version = None
info(files, batch=False, fast=False)[source]

Provide annex info for file(s).

Parameters:files (list of str) – files to look for
Returns:Info for each file
Return type:dict
init_remote(name, options)[source]

Creates a new special remote

Parameters:name (str) – name of the special remote
is_available(file_, remote=None, key=False, batch=False)[source]

Check if file or key is available (from a remote)

If key or remote is misspecified, this does not fail but keeps returning False, possibly while also complaining out loud ;)

  • file (str) – Filename or a key
  • remote (str, optional) – Remote which to check. If None, possibly multiple remotes are checked before positive result is reported
  • key (bool, optional) – Whether provided files are actually annex keys
  • batch (bool, optional) – Initiate or continue with a batched run of annex checkpresentkey

with True indicating that file/key is available from (the) remote

Return type:



Return True if git-annex considers current filesystem ‘crippled’.

Return type:True if on crippled filesystem, False otherwise

Return True if annex is in direct mode

Return type:True if in direct mode, False otherwise.

quick check whether this appears to be an annex-init’ed repo


Whether branch is managed by git-annex.

ATM this returns true in direct mode (branch ‘annex/direct/my_branch’) and if on an adjusted branch (annex v6+ repository: either ‘adjusted/my_branch(unlocked)’ or ‘adjusted/my_branch(fixed)’).

Note: The term ‘managed branch’ is used to make clear it’s meant to be more general than the v6+ ‘adjusted branch’.

Parameters:branch (str) – name of the branch; default: active branch
Returns:True if on a managed branch, False otherwise
Return type:bool
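
Going only by the branch names quoted above, a name-based check could look like the sketch below. The function looks_managed is hypothetical and relies purely on the ‘annex/direct/’ and ‘adjusted/’ prefixes; the real method may use other information.

```python
def looks_managed(branch):
    """Hypothetical check based solely on the branch name prefixes shown above."""
    return branch.startswith(("annex/direct/", "adjusted/"))


checks = {
    name: looks_managed(name)
    for name in (
        "annex/direct/my_branch",
        "adjusted/my_branch(unlocked)",
        "adjusted/my_branch(fixed)",
        "my_branch",
    )
}
```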

Return True if remote is explicitly ignored

is_special_annex_remote(remote, check_if_known=True)[source]

Return whether remote is a special annex remote

Decides based on the presence of an annex- option and lack of a configured URL for the remote.

is_under_annex(files, allow_quick=False, batch=False)[source]

Check whether files are under annex control

  • files (list of str) – file(s) to check for being under annex
  • allow_quick (bool, optional) – This is no longer supported.

For each input file states whether file is under annex

Return type:

list of bool

is_valid_annex(allow_noninitialized=False, check_git=True)[source]

Returns whether the underlying repository appears to be still valid

Note that this is almost identical to the classmethod is_valid_repo(). However, if we are testing an existing instance, we can save Path object creations. Since this testing is done a lot, this is relevant. Creation of the Path objects in is_valid_repo() takes nearly half the time of the entire function.

Also note that this method is bound to an instance but still class-dependent, meaning that a subclass cannot simply overwrite it. This is particularly important for the call from within __init__(), which in turn is called by the subclasses’ __init__. Using an overwrite would lead to the wrong thing being called.

classmethod is_valid_repo(path, allow_noninitialized=False)[source]

Return True if given path points to an annex repository

localsync(remote=None, managed_only=False)[source]

Consolidate the local git-annex branch and/or managed branches.

This method calls git annex sync to perform purely local operations that:

  1. Update the corresponding branch of any managed branch.
  2. Synchronize the local ‘git-annex’ branch with respect to particular or all remotes (as currently reflected in the local state of their remote ‘git-annex’ branches).

If a repository has git-annex’s ‘synced/…’ branches these will be updated. Otherwise, such branches that are created by git annex sync are removed again after the sync is complete.

  • remote (str or list, optional) – If given, specifies the name of one or more remotes to sync against. If not given, all remotes are considered.
  • managed_only (bool, optional) – Only perform a sync if a managed branch with a corresponding branch is detected. By default, a sync is always performed.
migrate_backend(files, backend=None)[source]

Changes the backend used for file.

The backend used for the key-value of files. Only files currently present are migrated. Note: There will be no notification if migrating fails due to the absence of a file’s content!

  • files (list) – files to migrate.
  • backend (str) – specify the backend to migrate to. If none is given, the default backend of this instance will be used.

Perform pre-commit maintenance tasks, such as closing all batched annexes since they might still need to flush their changes into index

remove(files, force=False, **kwargs)[source]

Remove files from git/annex

  • files
  • force (bool, optional) –
repo_info(fast=False, merge_annex_branches=True)[source]

Provide annex info for the entire repository.

  • fast (bool, optional) – Pass --fast to git annex info.
  • merge_annex_branches (bool, optional) – Whether to allow git-annex to merge annex branches if needed, e.g. to make sure descriptions of git-annex remotes are up to date

Info for the repository, with keys matching the ones returned by annex

Return type:


repository_versions = None
rm_url(file_, url)[source]

Record that the file is no longer available at the url.

  • file (str) –
  • url (str) –
set_default_backend(backend, persistent=True, commit=True)[source]

Set default backend

  • backend (str) –
  • persistent (bool, optional) – If persistent, would add/commit to .gitattributes. If not – would set within .git/config
set_groupwanted(name, expr)[source]

Set expr for the name groupwanted

set_metadata(files, reset=None, add=None, init=None, remove=None, purge=None, recursive=False)[source]

Manipulate git-annex file-metadata

  • files (str or list(str)) – One or more paths for which metadata is to be manipulated. The changes applied to each file item are uniform. However, the result may not be uniform across files, depending on the actual operation.
  • reset (dict, optional) – Metadata items matching keys in the given dict are (re)set to the respective values.
  • add (dict, optional) – The values of matching keys in the given dict are appended to any possibly existing values. The metadata keys need not necessarily exist before.
  • init (dict, optional) – Metadata items for the keys in the given dict are set to the respective values, if the key is not yet present in a file’s metadata.
  • remove (dict, optional) – Values in the given dict are removed from the metadata items matching the respective key, if they exist in a file’s metadata. Non-existing values, or keys do not lead to failure.
  • purge (list, optional) – Any metadata item with a key matching an entry in the given list is removed from the metadata.
  • recursive (bool, optional) – If False, fail (with CommandError) when directory paths are given as files.
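
The effect of the individual operations can be modelled on a plain dictionary. The function apply_metadata_ops below is a hypothetical, in-memory model that stores each metadata item as a list of values; the order in which it applies several operations at once, and dropping items whose last value was removed, are arbitrary choices of this sketch rather than documented behaviour.

```python
def apply_metadata_ops(current, reset=None, add=None, init=None,
                       remove=None, purge=None):
    """Hypothetical in-memory model of the operations described above."""
    meta = {key: list(values) for key, values in current.items()}
    for key in purge or []:
        meta.pop(key, None)                   # purge: drop the whole item
    for key, values in (reset or {}).items():
        meta[key] = list(values)              # reset: overwrite values
    for key, values in (add or {}).items():
        existing = meta.setdefault(key, [])   # add: key need not exist before
        for value in values:
            if value not in existing:
                existing.append(value)
    for key, values in (init or {}).items():
        if key not in meta:                   # init: only if key not yet present
            meta[key] = list(values)
    for key, values in (remove or {}).items():
        if key in meta:                       # remove: unknown keys are ignored
            meta[key] = [v for v in meta[key] if v not in values]
            if not meta[key]:
                del meta[key]                 # sketch choice: drop empty items
    return meta


before = {"tag": ["a"], "author": ["x"]}
after_add = apply_metadata_ops(before, add={"tag": ["b"]})
```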

JSON obj per modified file

Return type:


set_metadata_(files, reset=None, add=None, init=None, remove=None, purge=None, recursive=False)[source]

Like set_metadata() but returns a generator

set_preferred_content(property, expr, remote=None)[source]

Set preferred content configuration of a repository or remote

  • property ({'wanted', 'required', 'group'}) – Type of property to query
  • expr (str) – Any expression or label supported by git-annex for the given property.
  • remote (str, optional) – If not specified (None), sets the property for the local repository.

Raw git-annex output in response to the set command.

Return type:


  • ValueError – If an unknown property label is given.
  • CommandError – If the annex call errors.

Announce to annex that remote is “dead”

set_remote_url(name, url, push=False)[source]

Set the URL a remote is pointing to

Sets the URL of the remote name. Requires the remote to already exist.

  • name (str) – name of the remote
  • url (str) –
  • push (bool) – if True, set the push URL, otherwise the fetch URL; if True, additionally set annexurl to url, to make sure annex uses it to talk to the remote, since access via fetch URL might be restricted.
supports_direct_mode = None

Return True if repository version supports unlocked pointers.

sync(remotes=None, push=True, pull=True, commit=True, content=False, all=False, fast=False)[source]

Synchronize local repository with remotes

Use this command when you want to synchronize the local repository with one or more of its remotes. You can specify the remotes (or remote groups) to sync with by name; the default if none are specified is to sync with all remotes.

  • remotes (str, list(str), optional) – Name of one or more remotes to be sync’ed.
  • push (bool) – By default, git pushes to remotes.
  • pull (bool) – By default, git pulls from remotes
  • commit (bool) – A commit is done by default. Disable to avoid committing local changes.
  • content (bool) – Normally, syncing does not transfer the contents of annexed files. This option causes the content of files in the work tree to also be uploaded and downloaded as necessary.
  • all (bool) – This option, when combined with content, makes all available versions of all files be synced, when preferred content settings allow
  • fast (bool) – Only sync with the remotes with the lowest annex-cost value configured
unannex(files, options=None)[source]

undo accidental add command

Use this to undo an accidental git annex add command. Note that for safety, the content of the file remains in the annex, until you use git annex unused and git annex dropunused.

  • files (list of str) –
  • options (list of str) –

successfully unannexed files

Return type:

list of str


unlock files for modification

Note: This method is silent about errors in unlocking a file (e.g., the file has no content). Use the higher-level interface.unlock to get more informative reporting.

Parameters:files (list of str) –
Returns:successfully unlocked files
Return type:list of str

Annex UUID

Returns:Returns the annex UUID, if there is any, or None otherwise.
Return type:str
whereis(files, output='uuids', key=False, options=None, batch=False)[source]

Lists repositories that have actual content of file(s).

  • files (list of str) – files to look for
  • output ({'descriptions', 'uuids', 'full'}, optional) – If ‘descriptions’, a list of remote descriptions is returned for each file. If ‘full’, a dictionary of all fields, as returned by annex, is returned for each file.
  • key (bool, optional) – Whether provided files are actually annex keys
  • options (list, optional) – Options to pass into git-annex call

if output == ‘descriptions’, contains, for each input file, a list with the description of each remote found by git-annex whereis, like:

u'me@mycomputer:~/where/my/repo/is [origin]' or
u'web' or

if output == ‘uuids’, returns a list of uuids. if output == ‘full’, returns a dictionary with filenames as keys and values a detailed record, e.g.:

{'00000000-0000-0000-0000-000000000001': {
  'description': 'web',
  'here': False,
  'urls': ['', '']
}}

Return type:

list of list of unicode or dict
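
How the three output modes relate to each other can be sketched on plain data. The function summarize_whereis, the file name and the example URL below are hypothetical; the sketch assumes the ‘full’ structure described above (file name mapped to a per-UUID record) and derives the other two forms from it.

```python
WEB_UUID = "00000000-0000-0000-0000-000000000001"


def summarize_whereis(full, output="uuids"):
    """Hypothetical reduction of a 'full' result to the other output modes."""
    if output == "full":
        return full
    if output == "uuids":
        return [list(remotes) for remotes in full.values()]
    if output == "descriptions":
        return [
            [info["description"] for info in remotes.values()]
            for remotes in full.values()
        ]
    raise ValueError(f"unknown output mode: {output!r}")


# Made-up 'full' result for a single file.
full = {
    "file.dat": {
        WEB_UUID: {
            "description": "web",
            "here": False,
            "urls": ["http://example.com/file.dat"],
        }
    }
}
uuids = summarize_whereis(full, "uuids")
descriptions = summarize_whereis(full, "descriptions")
```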

class, git_options=None, annex_options=None, path=None, json=False, output_proc=None)[source]

Bases: datalad.cmd.BatchedCommand

Container for an annex process which would allow for persistent communication

class, git_options=None)[source]

Bases: datalad.cmd.SafeDelCloseMixin, dict

Class to contain the registry of active batch’ed instances of annex for a repository


Override just to make sure we don’t rely on __del__ to close all the pipes


Close communication to all the batched annexes

It does not remove them from the dictionary though

get(codename, annex_cmd=None, **kwargs)[source]

Return the value for key if key is in the dictionary, else default.


Bases: object

‘Filter’ for annex –json output to react to progress indicators

An instance of this beast should be passed to the log_stdout option of the runner for git-annex commands

start()[source][source], maxlines=100)[source]

Read stdout until line ends with ok or failed
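
A reading loop of this kind might look like the sketch below. The function read_until_done is hypothetical; in particular, simply stopping once maxlines lines were read without seeing a terminating line is an assumption of this sketch.

```python
import io


def read_until_done(stream, maxlines=100):
    """Hypothetical sketch: read lines until one ends with 'ok' or 'failed'."""
    lines = []
    for _ in range(maxlines):
        line = stream.readline()
        if not line:
            break  # end of stream reached
        line = line.rstrip("\n")
        lines.append(line)
        if line.endswith(("ok", "failed")):
            break  # terminating line found
    return lines


stream = io.StringIO("add a.txt\n100%\nadd a.txt ok\nleftover\n")
collected = read_until_done(stream)
```

Anything after the terminating line stays in the stream for the next read.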