datalad_dataverse.baseremote
git-annex special remote
- class datalad_dataverse.baseremote.DataverseRemote(*args)
Bases: SpecialRemote
Special remote for IO with Dataverse datasets.
This remote provides the standard set of operations: CHECKPRESENT, STORE, RETRIEVE, and REMOVE.
It uses the pyDataverse package internally, which presently imposes some limitations, such as poor handling of large-file downloads.
The following sections contain notes on Dataverse and this particular implementation.
Dataverse
Dataverse datasets come with their own versioning. A version is created upon publishing a draft version. A pushed change either alters an existing draft version or, if none exists, implicitly creates a new one. Publishing is not part of this special remote's operations.
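The draft/publish behaviour described above can be pictured with a small toy model. This is only an illustrative sketch, not the Dataverse API: the class `ToyDataset` and its methods are hypothetical, and the sketch assumes that a new draft starts out with the files of the latest published version.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set


@dataclass
class ToyDataset:
    """Hypothetical in-memory model of draft and published versions."""

    published: List[FrozenSet[str]] = field(default_factory=list)
    draft: Optional[Set[str]] = None  # None means that no draft exists

    def push(self, filename: str) -> None:
        # A push alters the existing draft or implicitly creates a new one.
        if self.draft is None:
            # Assumption: a new draft starts from the latest published version.
            self.draft = set(self.published[-1]) if self.published else set()
        self.draft.add(filename)

    def publish(self) -> None:
        # Publishing turns the draft into a version. The special remote
        # itself never performs this step.
        if self.draft is None:
            raise ValueError("no draft version to publish")
        self.published.append(frozenset(self.draft))
        self.draft = None
```

In this model, two consecutive pushes end up in the same draft, while a push after `publish()` opens a fresh draft.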
Files uploaded to Dataverse have an associated database file ID. Their "path" inside a dataset is a combination of a label and a directoryLabel that jointly must be unique in a Dataverse dataset. However, these are only metadata associated with the file ID.
A file ID is persistent, but not technically a content identifier, as it is not derived from the content the way a hash is.
Recording the IDs with git-annex enables faster access for downloads, because a dataset content listing request can be avoided. Therefore, the special remote records the file IDs of annex keys and relies on them where possible.
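The idea of preferring recorded IDs over a content listing can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by this remote: `FileIdResolver`, `recorded`, and `list_dataset` are hypothetical names, and the listing is modelled as a plain mapping from key to file ID.

```python
from typing import Callable, Dict, Optional


class FileIdResolver:
    """Resolve an annex key to a file ID, preferring recorded IDs (sketch)."""

    def __init__(self, recorded: Dict[str, int],
                 list_dataset: Callable[[], Dict[str, int]]):
        self._recorded = recorded          # IDs previously recorded for keys
        self._list_dataset = list_dataset  # expensive: full content listing
        self._listing: Optional[Dict[str, int]] = None

    def resolve(self, key: str) -> Optional[int]:
        # Fast path: a recorded ID avoids the listing request entirely.
        if key in self._recorded:
            return self._recorded[key]
        # Slow path: request the listing once and reuse it afterwards.
        if self._listing is None:
            self._listing = self._list_dataset()
        return self._listing.get(key)
```

With this design, the listing request is issued at most once per resolver and only when a key has no recorded ID.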
Dataverse imposes strict naming limitations for directories and files. See https://github.com/IQSS/dataverse/issues/8807#issuecomment-1164434278. Therefore, remote paths are mangled to match these limitations.
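To illustrate what such mangling can look like, here is a minimal, reversible sketch. It is not the scheme this remote actually uses: the functions `mangle` and `unmangle`, the `-XX-` escape format, and the deliberately conservative `SAFE` character set are all assumptions made for the example. The authoritative rules are those discussed in the issue linked above.

```python
import re
import string

# Assumption: only these characters are passed through unchanged.
SAFE = set(string.ascii_letters + string.digits + "_.")


def mangle(name: str) -> str:
    """Escape every UTF-8 byte outside SAFE as -XX- (uppercase hex)."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else f"-{byte:02X}-")
    return "".join(out)


def unmangle(mangled: str) -> str:
    """Invert mangle(); '-' only ever occurs as part of an escape."""
    raw = bytearray()
    pos = 0
    for match in re.finditer(r"-([0-9A-F]{2})-", mangled):
        raw.extend(mangled[pos:match.start()].encode("ascii"))
        raw.append(int(match.group(1), 16))
        pos = match.end()
    raw.extend(mangled[pos:].encode("ascii"))
    return raw.decode("utf-8")
```

Because the escape character `-` is itself escaped, distinct input names always map to distinct mangled names, which preserves the uniqueness requirement for paths.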
- checkpresent(key)
Requests the remote to check if a key is present in it.
- Parameters:
key (str)
- Returns:
True if the key is present in the remote. False if the key is not present.
- Return type:
bool
- Raises:
RemoteError -- If the presence of the key couldn't be determined, e.g. in case of a connection error.
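The three possible outcomes (present, absent, undeterminable) can be sketched like this. The example is self-contained and hypothetical: `RemoteError` is a local stand-in for the exception named above, and `list_keys` is an assumed callable that returns the keys found on the remote.

```python
class RemoteError(Exception):
    """Local stand-in for the RemoteError exception named above."""


def checkpresent(key, list_keys):
    """Return True or False, or raise RemoteError if presence is unknown.

    list_keys is a hypothetical callable returning the set of remote keys.
    """
    try:
        remote_keys = list_keys()
    except ConnectionError as exc:
        # Presence is unknown here, so neither True nor False is a safe
        # answer; reporting False could wrongly suggest the key is missing.
        raise RemoteError(f"cannot check presence of {key}: {exc}") from exc
    return key in remote_keys
```

The point of the distinction is that False is reserved for a confirmed absence; any failure to obtain an answer is reported as an error instead.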
- initremote()
Use this command to initialize a remote:
git annex initremote dv1 type=external externaltype=dataverse encryption=none
- prepare()
Tells the remote that it's time to prepare itself to be used. Gets called whenever git annex is about to access any of the methods below, so it shouldn't be too expensive. Otherwise it will slow down operations like git annex whereis or git annex info.
An internet connection can be established here, though it's recommended to defer this until it's actually needed.
- Raises:
RemoteError -- If the remote could not be prepared.
- remove(key)
Requests the remote to remove a key's contents.
- Parameters:
key (str)
- Raises:
RemoteError -- If the key couldn't be deleted from the remote.
- transfer_retrieve(key, file)
Get the file identified by key from the remote and store it at the path given by file.
While the transfer is running, the remote can repeatedly call annex.progress(size) to indicate the number of bytes already stored. This will influence the progress shown to the user.
- Parameters:
key (str) -- The Key to get from the remote.
file (str) -- Path where to store the file. Note that in some cases, file may contain whitespace.
- Raises:
RemoteError -- If the file could not be received from the remote.
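A chunked download with cumulative progress reporting might look like the following sketch. It makes several assumptions: `retrieve_with_progress` is a hypothetical helper, `stream` is any object with a `read(n)` method, and `progress` is a plain callable standing in for annex.progress(size).

```python
import io
import os
import tempfile


def retrieve_with_progress(stream, local_path, progress, chunk_size=64 * 1024):
    """Copy stream to local_path in chunks, reporting cumulative bytes."""
    done = 0
    with open(local_path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            done += len(chunk)
            progress(done)  # total bytes written so far, not the chunk size
    return done


# Usage: 10 bytes in 4-byte chunks; the path deliberately contains a space.
reports = []
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my file.bin")
    total = retrieve_with_progress(
        io.BytesIO(b"0123456789"), target, reports.append, chunk_size=4
    )
    with open(target, "rb") as fh:
        content = fh.read()
```

Passing the destination as a single path argument to `open()` means that whitespace in the path needs no special treatment.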
- transfer_store(key, local_file)
Store the file in local_file to a unique location derived from key.
It's important that, while a Key is being stored, checkpresent(key) does not report it as present until all the data has been transferred. While the transfer is running, the remote can repeatedly call annex.progress(size) to indicate the number of bytes already stored. This will influence the progress shown to the user.
- Parameters:
key (str) -- The Key to be stored in the remote. In most cases, this is going to be the remote file name. The remote file name should at least be unambiguously derived from it.
local_file (str) -- Path to the file to upload. Note that in some cases, local_file may contain whitespace, and that local_file should not influence the filename used on the remote.
- Raises:
RemoteError -- If the file could not be stored to the remote.
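The ordering requirement between storing and presence checks can be illustrated with a toy in-memory remote. This is a sketch under assumptions, not how this remote talks to Dataverse: `InMemoryRemote` is hypothetical, content arrives as an iterable of byte chunks, and `progress` is a plain callable standing in for annex.progress(size).

```python
class InMemoryRemote:
    """Toy remote illustrating the store/checkpresent ordering."""

    def __init__(self):
        self._complete = {}  # key -> bytes; only fully transferred content

    def checkpresent(self, key):
        return key in self._complete

    def transfer_store(self, key, chunks, progress):
        staged = bytearray()  # partial data, invisible to checkpresent
        for chunk in chunks:
            staged.extend(chunk)
            progress(len(staged))  # cumulative number of bytes stored
        # Make the key visible only after the last chunk has arrived.
        self._complete[key] = bytes(staged)


# Usage: record what checkpresent reports while the transfer is running.
remote = InMemoryRemote()
seen_during_transfer = []


def on_progress(size):
    seen_during_transfer.append((size, remote.checkpresent("KEY-1")))


remote.transfer_store("KEY-1", [b"abc", b"de"], on_progress)
```

Staging partial data separately from completed content is what keeps an interrupted transfer from being mistaken for a stored key.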