datalad.customremotes.archives
Custom remote to retrieve content from archives present under annex
- class datalad.customremotes.archives.ArchiveAnnexCustomRemote(annex, path=None, persistent_cache=True, **kwargs)
Bases: AnnexCustomRemote
Special custom remote that allows obtaining files from archives.
The archives must themselves be under annex.
- COST = 500
- CUSTOM_REMOTE_NAME = 'archive'
- SUPPORTED_SCHEMES = ('dl+archive',)
- URL_PREFIX = 'dl+archive:'
- URL_SCHEME = 'dl+archive'
- property cache
- checkpresent(key)
Requests the remote to check if a key is present in it.
- Parameters:
key (str)
- Returns:
True if the key is present in the remote. False if the key is not present.
- Return type:
bool
- Raises:
RemoteError – If the presence of the key couldn’t be determined, e.g. in case of a connection error.
- checkurl(url)
Asks the remote to check if the url’s content can currently be downloaded (without downloading it). The remote can optionally provide additional information about the file.
- Parameters:
url (str)
- Returns:
True if the url’s content can currently be downloaded and no additional information can be provided. False if it can’t currently be downloaded.
To provide additional information, a list of dictionaries can be returned instead. Each dictionary can have up to 3 keys, all of them optional: {'url': str, 'size': int, 'filename': str}.
If there is only one file to be downloaded, we could return:
[{'size': 512, 'filename': 'example_file.txt'}]
Other examples:
{'url': 'https://example.com', 'size': 512, 'filename': 'example_file.txt'}
[{'url': 'Url1', 'size': 512, 'filename': 'Filename1'}, {'url': 'Url2', 'filename': 'Filename2'}]
- Return type:
Union(bool, List(Dict))
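Because checkurl can return either a bool or a list of dictionaries, a caller has to handle both shapes. The following is a minimal sketch of one way to do that. `normalize_checkurl_result` is a hypothetical helper, not part of datalad, and filling in a missing 'url' entry with the queried url is an assumption made for illustration.

```python
from typing import Dict, List, Union


def normalize_checkurl_result(result: Union[bool, List[Dict]], url: str) -> List[Dict]:
    """Hypothetical helper: turn a checkurl() return value into a list of dicts."""
    if result is False:
        # The content cannot currently be downloaded.
        return []
    if result is True:
        # Downloadable, but no additional information is available.
        return [{"url": url}]
    # A list of dicts: use the queried url wherever an entry has no 'url'
    # (an assumption of this sketch; an entry's own 'url' takes precedence).
    return [{"url": url, **entry} for entry in result]
```

With this helper, downstream code can iterate over the result without first checking its type.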
- claimurl(url)
Asks the remote if it wishes to claim responsibility for downloading a URL.
- Parameters:
url (str)
- Returns:
True if it wants to claim this url. False if it doesn’t.
- Return type:
bool
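Given the URL_PREFIX attribute listed above, a claim check can be as simple as a prefix test. This is only a sketch under that assumption: `claims` is a hypothetical stand-in, and the actual claimurl method may apply different or additional checks.

```python
URL_PREFIX = "dl+archive:"  # same value as the class attribute documented above


def claims(url: str) -> bool:
    """Hypothetical stand-in for claimurl(): claim only dl+archive URLs."""
    return url.startswith(URL_PREFIX)
```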
- get_contentlocation(key, absolute=False, verify_exists=True)
Return the path (relative to the top of the repository, or absolute) to the file containing the key.
This is a wrapper around AnnexRepo.get_contentlocation that caches the result, because the location of the same archive key is requested often.
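The caching idea can be illustrated with a small memoizing wrapper. This is a sketch, not datalad's implementation: `CachedLocator`, its `lookup` callable, and the `calls` counter are hypothetical names, and the sketch ignores the verify_exists parameter.

```python
from typing import Callable, Dict, Tuple


class CachedLocator:
    """Hypothetical wrapper that remembers the result of a location lookup."""

    def __init__(self, lookup: Callable[[str, bool], str]):
        self._lookup = lookup  # stands in for the underlying, slower lookup
        self._cache: Dict[Tuple[str, bool], str] = {}
        self.calls = 0  # number of times the underlying lookup actually ran

    def get(self, key: str, absolute: bool = False) -> str:
        cache_key = (key, absolute)
        if cache_key not in self._cache:
            self.calls += 1
            self._cache[cache_key] = self._lookup(key, absolute)
        return self._cache[cache_key]
```

Repeated requests for the same key and the same `absolute` flag are then served from the dictionary rather than by running the lookup again.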
- get_file_url(archive_file=None, archive_key=None, file=None, size=None)
Given an archive (as a file or a key) and a file within it, compose a URL for access.
Examples
- dl+archive:SHA256E-s176--69…3e.tar.gz#path=1/d2/2d&size=123
when the size of the file within the archive was known to be 123
- dl+archive:SHA256E-s176--69…3e.tar.gz#path=1/d2/2d
when the size of the file within the archive was not provided
- Parameters:
size (int, optional) – Size of the file. If not provided, the size is left out of the URL.
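The URL layout shown in the examples can be reproduced and parsed with the standard library. The following is a sketch under stated assumptions, not the method's actual code: `compose_archive_url` and `parse_archive_url` are hypothetical helpers, percent-quoting of the path is an assumption, and the archive key used in the usage note is made up.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, quote

URL_PREFIX = "dl+archive:"  # same value as the class attribute documented above


def compose_archive_url(archive_key: str, file: str, size: Optional[int] = None) -> str:
    """Hypothetical helper: build a dl+archive URL for a file inside an archive."""
    fragment = "path=" + quote(file)  # quoting the path is an assumption
    if size is not None:
        fragment += "&size=%d" % size
    return f"{URL_PREFIX}{archive_key}#{fragment}"


def parse_archive_url(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Hypothetical helper: split a dl+archive URL into (key, path, size)."""
    if not url.startswith(URL_PREFIX):
        raise ValueError("not a dl+archive URL: %r" % url)
    key, _, fragment = url[len(URL_PREFIX):].partition("#")
    params = dict(parse_qsl(fragment))
    size = int(params["size"]) if "size" in params else None
    return key, params.get("path"), size
```

For instance, `compose_archive_url("SHA256E-s176--abc.tar.gz", "1/d2/2d", size=123)` yields a URL of the first form listed above, and omitting `size` yields the second form.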
- remove(key)
Requests the remote to remove a key’s contents.
- Parameters:
key (str)
- Raises:
RemoteError – If the key couldn’t be deleted from the remote.
- transfer_retrieve(key, file)
Get the file identified by key from the remote and store it in file.
While the transfer is running, the remote can repeatedly call annex.progress(size) to indicate the number of bytes already stored. This will influence the progress shown to the user.
- Parameters:
key (str) – The key to get from the remote.
file (str) – Path where to store the file. Note that in some cases, the path may contain whitespace.
- Raises:
RemoteError – If the file could not be received from the remote.
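The progress reporting described for transfer_retrieve can be sketched as a chunked copy that reports the running byte count after each write. This is an illustration only: `copy_with_progress` and its `report` callback are hypothetical, with `report` standing in for annex.progress(size), and it does not show how the archive remote actually extracts files.

```python
import io
from typing import BinaryIO, Callable


def copy_with_progress(src: BinaryIO, dst: BinaryIO,
                       report: Callable[[int], None],
                       chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in chunks, reporting the total bytes stored so far."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        report(copied)  # hypothetical stand-in for annex.progress(copied)
    return copied


# Usage with in-memory streams; a list collects the reported byte counts.
reported = []
total = copy_with_progress(io.BytesIO(b"x" * 10), io.BytesIO(),
                           reported.append, chunk_size=4)
```

Reporting the cumulative count, rather than the size of each chunk, matches the description of the argument as the number of bytes already stored.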
- whereis(key)
Asks the remote to provide additional information about ways to access the content of a key stored in it, such as public URLs. This will be displayed to the user by, e.g., git annex whereis. Note that users expect git annex whereis to run fast, without network access or other slow operations.
- Parameters:
key (str)
- Returns:
Information about the location of the key, e.g. public URLs.
- Return type:
str