datalad.customremotes.archives

Custom remote for retrieving the load (content) of files from archives that are present under annex

class datalad.customremotes.archives.ArchiveAnnexCustomRemote(annex, path=None, persistent_cache=True, **kwargs)[source]

Bases: AnnexCustomRemote

Special custom remote that allows obtaining files from archives.

The archives themselves must be annexed.

COST = 500
CUSTOM_REMOTE_NAME = 'archive'
SUPPORTED_SCHEMES = ('dl+archive',)
URL_PREFIX = 'dl+archive:'
URL_SCHEME = 'dl+archive'
property cache
checkpresent(key)[source]

Requests the remote to check if a key is present in it.

Parameters:

key (str) –

Returns:

True if the key is present in the remote. False if the key is not present.

Return type:

bool

Raises:

RemoteError – If the presence of the key couldn’t be determined, e.g., in case of a connection error.

checkurl(url)[source]

Asks the remote to check whether the URL’s content can currently be downloaded (without downloading it). The remote can optionally provide additional information about the file.

Parameters:

url (str) –

Returns:

True if the url’s content can currently be downloaded and no additional information can be provided. False if it can’t currently be downloaded.

To provide additional information, a list of dictionaries can be returned. The dictionaries can have 3 keys: {'url': str, 'size': int, 'filename': str}. All of them are optional.

If there is only one file to be downloaded, we could return: [{'size': 512, 'filename': 'example_file.txt'}]

Other examples:

{'url': "https://example.com", 'size': 512, 'filename': "example_file.txt"}

[{'url': "Url1", 'size': 512, 'filename': "Filename1"}, {'url': "Url2", 'filename': "Filename2"}]

Return type:

Union[bool, List[Dict]]
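The return shapes described for checkurl can be illustrated with a small sketch. `normalize_checkurl_result` is a hypothetical helper, not part of datalad or annexremote; it only assumes the documented return values (a bool, a single dictionary, or a list of dictionaries) and turns them into one uniform list.

```python
from typing import Dict, List, Union

CheckUrlResult = Union[bool, Dict, List[Dict]]


def normalize_checkurl_result(result: CheckUrlResult, url: str) -> List[Dict]:
    """Turn a documented checkurl() return value into a list of dicts.

    Hypothetical helper for illustration only.
    """
    if result is False:
        return []                  # content cannot be downloaded right now
    if result is True:
        return [{"url": url}]      # downloadable, no additional information
    if isinstance(result, dict):
        result = [result]          # a single dictionary, as in one of the examples
    # All keys are optional, so fall back to the queried URL when 'url' is absent.
    return [{"url": url, **entry} for entry in result]
```

A caller could then iterate over the result without caring which of the documented shapes the remote returned.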

claimurl(url)[source]

Asks the remote if it wishes to claim responsibility for downloading a URL.

Parameters:

url (str) –

Returns:

True if it wants to claim this url. False if it doesn’t.

Return type:

bool
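As a rough sketch of how such a claim decision could be made, the function below accepts a URL when its scheme is listed in SUPPORTED_SCHEMES (the value is copied from the class attributes above). `would_claim` is a hypothetical name, and the actual check performed by ArchiveAnnexCustomRemote may differ.

```python
# Value taken from the class attributes documented above.
SUPPORTED_SCHEMES = ("dl+archive",)


def would_claim(url: str) -> bool:
    """Return True if the URL's scheme is one of SUPPORTED_SCHEMES.

    Hypothetical sketch; not the library's implementation.
    """
    scheme, sep, _rest = url.partition(":")
    # Without a ':' separator there is no scheme to compare.
    return bool(sep) and scheme in SUPPORTED_SCHEMES
```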

get_contentlocation(key, absolute=False, verify_exists=True)[source]

Return (relative to top or absolute) path to the file containing the key

This is a wrapper around AnnexRepo.get_contentlocation that caches the result, because the location of the same archive key is requested often.
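The caching idea can be sketched as follows. `CachedLocator` and its `lookup` callable are hypothetical stand-ins (the callable plays the role of AnnexRepo.get_contentlocation); how the real class stores or invalidates cached results is not stated here, so this is only one plausible arrangement.

```python
from typing import Callable, Dict, Tuple


class CachedLocator:
    """Memoize an expensive key -> path lookup (illustrative only)."""

    def __init__(self, lookup: Callable[[str, bool], str]):
        self._lookup = lookup                       # stand-in for the real lookup
        self._cache: Dict[Tuple[str, bool], str] = {}

    def get_contentlocation(self, key: str, absolute: bool = False) -> str:
        cache_key = (key, absolute)
        if cache_key not in self._cache:
            # Only the first request for a given (key, absolute) pair
            # reaches the underlying lookup.
            self._cache[cache_key] = self._lookup(key, absolute)
        return self._cache[cache_key]
```

Keying the cache on both the key and the `absolute` flag keeps relative and absolute answers from being mixed up.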

get_file_url(archive_file=None, archive_key=None, file=None, size=None)[source]

Given an archive (file or key) and a file, compose a URL for access

Examples

dl+archive:SHA256E-s176--69…3e.tar.gz#path=1/d2/2d&size=123

when the size of the file within the archive is known to be 123

dl+archive:SHA256E-s176--69…3e.tar.gz#path=1/d2/2d

when the size of the file within the archive was not provided

Parameters:

size (int, optional) – Size of the file. If not provided, the size is left out of the URL
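The URL layout shown in the examples can be reproduced with a minimal sketch. `compose_archive_url` is a hypothetical function, and `KEY.tar.gz` in the usage below is a placeholder rather than a real annex key; the sketch performs no percent-encoding, which the actual get_file_url may well do.

```python
# Value taken from the class attributes documented above.
URL_PREFIX = "dl+archive:"


def compose_archive_url(archive_key: str, file: str, size=None) -> str:
    """Build '<prefix><archive key>#path=<file>[&size=<size>]'.

    Illustrative only: assumes the key and path need no escaping.
    """
    fragment = "path=" + file
    if size is not None:
        # The size is appended only when it is known.
        fragment += "&size=" + str(size)
    return URL_PREFIX + archive_key + "#" + fragment


# Placeholder key, with and without a known size.
with_size = compose_archive_url("KEY.tar.gz", "1/d2/2d", size=123)
without_size = compose_archive_url("KEY.tar.gz", "1/d2/2d")
```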

remove(key)[source]

Requests the remote to remove a key’s contents.

Parameters:

key (str) –

Raises:

RemoteError – If the key couldn’t be deleted from the remote.

stop(*args)[source]

Stop communication with annex

transfer_retrieve(key, file)[source]

Get the file identified by key from the remote and store it in file.

While the transfer is running, the remote can repeatedly call annex.progress(size) to indicate the number of bytes already stored. This will influence the progress shown to the user.

Parameters:
  • key (str) – The key to get from the remote.

  • file (str) – Path where the file should be stored. Note that in some cases this path may contain whitespace.

Raises:

RemoteError – If the file could not be received from the remote.
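The progress reporting described for transfer_retrieve can be sketched with a chunked copy. `copy_with_progress` is a hypothetical helper and `report` stands in for annex.progress; the in-memory streams are placeholders for the archive member and the local file, and nothing here reflects how datalad actually extracts content.

```python
import io
from typing import BinaryIO, Callable


def copy_with_progress(src: BinaryIO, dst: BinaryIO,
                       report: Callable[[int], None],
                       chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in chunks, reporting cumulative bytes stored."""
    stored = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break                      # end of input
        dst.write(chunk)
        stored += len(chunk)
        report(stored)                 # total bytes stored so far, not the delta
    return stored


# Usage with in-memory streams; 'reports' collects what would go to annex.progress.
source = io.BytesIO(b"x" * 10)
target = io.BytesIO()
reports = []
total = copy_with_progress(source, target, reports.append, chunk_size=4)
```

Reporting the running total rather than the size of each chunk matches the description that the value indicates the number of bytes already stored.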

whereis(key)[source]

Asks the remote to provide additional information about ways to access the content of a key stored in it, such as public URLs. This will be displayed to the user by, e.g., git annex whereis. Note that users expect git annex whereis to run fast, without, e.g., network access.

Parameters:

key (str) –

Returns:

Information about the location of the key, e.g., public URLs.

Return type:

str

Just a little helper to hardlink files’ load

datalad.customremotes.archives.main()[source]

Command-line entry point