datalad.customremotes.archives
Custom remote to retrieve content from archives present under annex
- class datalad.customremotes.archives.ArchiveAnnexCustomRemote(annex, path=None, persistent_cache=True, **kwargs)
Bases: datalad.customremotes.base.AnnexCustomRemote
Special custom remote that allows files to be obtained from archives.
Archives must themselves be under annex.
- AVAILABILITY = 'local'
- COST = 500
- CUSTOM_REMOTE_NAME = 'archive'
- SUPPORTED_SCHEMES = ('dl+archive',)
- URL_PREFIX = 'dl+archive:'
- URL_SCHEME = 'dl+archive'
- property cache
- checkpresent(key)
Requests the remote to check if a key is present in it.
- Parameters
key (str) –
- Returns
True if the key is present in the remote. False if the key is not present.
- Return type
bool
- Raises
RemoteError – If the presence of the key couldn’t be determined, e.g. in case of a connection error.
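The three outcomes described for checkpresent (present, absent, undeterminable) can be sketched as follows. This is only an illustration under stated assumptions: `check_present`, `known_keys`, and `reachable` are hypothetical names, and the `RemoteError` class here is a local stand-in for the exception named above, not the real one.

```python
class RemoteError(Exception):
    """Local stand-in for the RemoteError named in the documentation."""


def check_present(key, known_keys, reachable=True):
    """Hypothetical presence check with three possible outcomes.

    Returns True or False when presence can be determined, and raises
    RemoteError when it cannot (simulated here by reachable=False).
    """
    if not reachable:
        # Presence is unknown, which is different from "not present".
        raise RemoteError(f"cannot determine presence of {key}")
    return key in known_keys
```

The point of the sketch is that "unknown" is signalled by an exception rather than by returning False, so callers never mistake an unreachable remote for missing content.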
- checkurl(url)
Asks the remote to check if the url’s content can currently be downloaded (without downloading it). The remote can optionally provide additional information about the file.
- Parameters
url (str) –
- Returns
True if the url’s content can currently be downloaded and no additional information can be provided. False if it can’t currently be downloaded.
In order to provide additional information, a list of dictionaries can be returned. The dictionaries can have 3 keys: {'url': str, 'size': int, 'filename': str}. All of them are optional.
If there is only one file to be downloaded, we could return: [{'size': 512, 'filename': 'example_file.txt'}]
Other examples:
{'url': "https://example.com", 'size': 512, 'filename': "example_file.txt"}
[{'url': "Url1", 'size': 512, 'filename': "Filename1"}, {'url': "Url2", 'filename': "Filename2"}]
- Return type
Union[bool, List[Dict]]
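Because checkurl may return either a bool or a list of dictionaries, a caller has to handle both shapes. The following is a minimal sketch of one way to do that; `describe_checkurl_result` is a hypothetical helper and not part of datalad.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[str, int]]
CheckUrlResult = Union[bool, List[Entry]]


def describe_checkurl_result(url: str, result: CheckUrlResult) -> List[Entry]:
    """Normalize a checkurl-style result into a list of file descriptions.

    Hypothetical helper, for illustration only.
    """
    if result is False:
        # Content cannot currently be downloaded.
        return []
    if result is True:
        # Downloadable, but no additional information was provided.
        return [{"url": url}]
    # A list of dicts: every key is optional, so default the URL if missing.
    return [{"url": url, **entry} for entry in result]
```

With this helper, `describe_checkurl_result(url, True)` and a single-entry list without a `url` key both yield one entry that points at the requested URL.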
- claimurl(url)
Asks the remote if it wishes to claim responsibility for downloading a URL.
- Parameters
url (str) –
- Returns
True if it wants to claim this url. False if it doesn’t.
- Return type
bool
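One plausible way to decide whether to claim a URL is to compare its scheme against SUPPORTED_SCHEMES. The sketch below assumes the decision depends on the scheme alone, which the documentation above does not confirm; `wants_url` is a hypothetical function name.

```python
from urllib.parse import urlsplit

# Value documented for the class above.
SUPPORTED_SCHEMES = ("dl+archive",)


def wants_url(url: str) -> bool:
    """Hypothetical claim check: claim only URLs with a supported scheme."""
    return urlsplit(url).scheme in SUPPORTED_SCHEMES
```

Under this assumption, a `dl+archive:` URL would be claimed and an `https://` URL would be left to other remotes.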
- get_contentlocation(key, absolute=False, verify_exists=True)
Return the path (relative to top, or absolute) to the file containing the key.
This is a wrapper around AnnexRepo.get_contentlocation that caches the result, since the location of the same archive key is requested often.
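The caching idea can be illustrated with a small memoizing wrapper. This is a sketch under assumptions, not the actual implementation: `CachedLocator` and its `lookup` callable are hypothetical, and the sketch ignores the `absolute` and `verify_exists` parameters.

```python
from typing import Callable, Dict


class CachedLocator:
    """Hypothetical cache around an expensive key -> path lookup."""

    def __init__(self, lookup: Callable[[str], str]) -> None:
        self._lookup = lookup
        self._cache: Dict[str, str] = {}
        self.misses = 0  # number of times the underlying lookup was called

    def get(self, key: str) -> str:
        if key not in self._cache:
            # First request for this key: do the expensive lookup once.
            self.misses += 1
            self._cache[key] = self._lookup(key)
        return self._cache[key]
```

Repeated requests for the same archive key then hit the dictionary instead of repeating the underlying lookup. A real cache would also need a policy for entries whose files have since disappeared.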
- get_file_url(archive_file=None, archive_key=None, file=None, size=None)
Compose a URL for accessing a file within an archive, given the archive (as a file or a key) and the file.
Examples
- dl+archive:SHA256E-s176--69…3e.tar.gz#path=1/d2/2d&size=123
when size of file within archive was known to be 123
- dl+archive:SHA256E-s176--69…3e.tar.gz#path=1/d2/2d
when size of file within archive was not provided
- Parameters
size (int, optional) – Size of the file. If not provided, the size is left out of the URL
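The URL layout shown in the examples above can be reproduced with a short sketch. It covers only the key-based form, and `compose_archive_url` is a hypothetical stand-in, not the real get_file_url; the escaping applied to the path is an assumption.

```python
from typing import Optional
from urllib.parse import quote

# Value documented for the class above.
URL_PREFIX = "dl+archive:"


def compose_archive_url(archive_key: str, path: str,
                        size: Optional[int] = None) -> str:
    """Hypothetical sketch: build '<prefix><key>#path=<path>[&size=<size>]'."""
    # Keep "/" readable in the fragment; escape other unsafe characters.
    fragment = "path=" + quote(path.lstrip("/"), safe="/")
    if size is not None:
        # The size component is appended only when a size is known.
        fragment += f"&size={size}"
    return f"{URL_PREFIX}{archive_key}#{fragment}"
```

For a made-up key `KEY.tar.gz`, calling `compose_archive_url("KEY.tar.gz", "1/d2/2d", 123)` gives a URL with both `path` and `size` in the fragment, while omitting `size` gives the shorter form.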
- remove(key)
Requests the remote to remove a key’s contents.
- Parameters
key (str) –
- Raises
RemoteError – If the key couldn’t be deleted from the remote.
- transfer_retrieve(key, file)
Get the file identified by key from the remote and store it in file.
While the transfer is running, the remote can repeatedly call annex.progress(size) to indicate the number of bytes already stored. This will influence the progress shown to the user.
- Parameters
key (str) – The key to get from the remote.
file (str) – Path where to store the file. Note that in some cases, file may contain whitespace.
- Raises
RemoteError – If the file could not be received from the remote.
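The progress reporting described for transfer_retrieve can be sketched as a chunked copy that reports the cumulative byte count. Everything here is illustrative: `FakeAnnex`, `copy_with_progress`, and `demo` are hypothetical names, and the sketch copies a local file instead of extracting from an archive.

```python
import os
import tempfile


class FakeAnnex:
    """Hypothetical stand-in that records each progress report."""

    def __init__(self):
        self.reports = []

    def progress(self, size):
        self.reports.append(size)


def copy_with_progress(annex, src, dst, chunk_size=4):
    """Copy src to dst, reporting bytes stored so far after each chunk."""
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            written += len(chunk)
            annex.progress(written)  # cumulative, not per-chunk
    return written


def demo():
    """Copy 10 bytes in chunks of 4 and return the recorded reports."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        # The destination path contains whitespace on purpose.
        dst = os.path.join(tmp, "dst with space.bin")
        with open(src, "wb") as f:
            f.write(b"0123456789")
        annex = FakeAnnex()
        copy_with_progress(annex, src, dst)
        return annex.reports
```

Passing paths as single arguments to `open` keeps the copy safe when the destination contains whitespace.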
- whereis(key)
Asks the remote to provide additional information about ways to access the content of a key stored in it, such as public URLs. This will be displayed to the user by, e.g., git annex whereis. Note that users expect git annex whereis to run fast, without, e.g., network access.
- Parameters
key (str) –
- Returns
Information about the location of the key, e.g. public URLs.
- Return type
str