datalad_dataverse.dataset

Dataverse IO abstraction

class datalad_dataverse.dataset.FileIdRecord(path: 'PurePosixPath', is_released: 'bool', is_latest_version: 'bool')[source]

Bases: object

is_latest_version: bool
is_released: bool
path: PurePosixPath
class datalad_dataverse.dataset.OnlineDataverseDataset(api, dsid: str, root_path: str | None = None)[source]

Bases: object

Representation of a Dataverse dataset on a remote instance.

Apart from providing an API for basic operations on such a dataset, a main purpose of this class is the uniform and consistent mangling of local DataLad dataset paths to their corresponding counterparts on Dataverse. Dataverse imposes strict limits on acceptable names for directoryLabel and label. The limits are so strict that they rule out anything not representable by a subset of ASCII, and therefore any non-Latin alphabet. See the documentation of the mangle_path() function for details.

If root_path is set, then all paths in the scope of the Dataverse dataset will be prefixed with this path. This establishes an alternative root path for all dataset operations. It will not be possible to upload, download, or rename files (or perform any other operation on them) outside this prefix scope, or across scopes.
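The prefixing behaviour described above can be illustrated with a minimal sketch. The helpers to_remote_path() and in_scope() are hypothetical names introduced only for this illustration and are not part of datalad_dataverse; the sketch also assumes that paths are relative POSIX paths.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def to_remote_path(path: PurePosixPath, root_path: str | None = None) -> PurePosixPath:
    """Prefix ``path`` with ``root_path`` if one is set (hypothetical helper)."""
    if root_path is None:
        return path
    # Assumes ``path`` is relative; an absolute path would discard the prefix.
    return PurePosixPath(root_path) / path


def in_scope(remote_path: PurePosixPath, root_path: str | None = None) -> bool:
    """Report whether ``remote_path`` lies below the ``root_path`` prefix."""
    if root_path is None:
        return True
    return PurePosixPath(root_path) in remote_path.parents
```

With root_path="project_a", a local path data/file.csv would map to project_a/data/file.csv, while a remote path such as other/file.csv would fall outside the scope.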

On initialization, only a record of what is in the latest version (draft or not) of the Dataverse dataset is retrieved, with each item annotated as released or not. This annotation is crucial because it determines what must be recorded when changes are uploaded. For example, it is not possible to actually remove content from a released version.

This record is later maintained locally when changes are made, without ever requesting a full update again. When checking for the presence of a file that does not appear to be part of the latest version, a request for such a record covering all known versions of the Dataverse dataset is made.
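The lookup strategy described above (latest version first, older versions only on a miss) could be sketched as follows. This is not the class's actual implementation: Record is a stand-in mirroring the fields of FileIdRecord, and RecordCache, fetch_latest, and fetch_all_versions are hypothetical names. The sketch further assumes that the all-versions listing is requested at most once.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


@dataclass
class Record:
    # Stand-in mirroring the fields of FileIdRecord.
    path: PurePosixPath
    is_released: bool
    is_latest_version: bool


class RecordCache:
    """Hypothetical cache: latest-version records up front, the rest on demand."""

    def __init__(
        self,
        fetch_latest: Callable[[], dict[int, Record]],
        fetch_all_versions: Callable[[], dict[int, Record]],
    ):
        self._fetch_all_versions = fetch_all_versions
        # Only the latest version is queried on initialization.
        self._records: dict[int, Record] = dict(fetch_latest())
        self._knows_all_versions = False

    def get_fileid_from_path(self, path: PurePosixPath, *, latest_only: bool) -> int | None:
        fid = self._lookup(path, latest=True)
        if fid is not None or latest_only:
            return fid
        if not self._knows_all_versions:
            # The additional request happens only after a miss in the latest version.
            for other_fid, rec in self._fetch_all_versions().items():
                self._records.setdefault(other_fid, rec)
            self._knows_all_versions = True
        return self._lookup(path, latest=False)

    def _lookup(self, path: PurePosixPath, *, latest: bool) -> int | None:
        for fid, rec in self._records.items():
            if rec.path == path and (rec.is_latest_version or not latest):
                return fid
        return None
```

Passing the two fetch callables in from outside keeps the sketch independent of any particular HTTP client and makes the number of requests easy to observe.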

property data_access_api
download_file(fid: int, path: Path)[source]
get_fileid_from_path(path: PurePosixPath, *, latest_only: bool) int | None[source]

Get the ID of a file that matches a given path.

The path is interpreted as the combination of a directoryLabel and a label (filename) in Dataverse terminology.

Parameters:
  • path (PurePosixPath)

  • latest_only (bool) -- Whether to only consider the latest version on dataverse. If False, matching against older versions will only be performed when there was no match in the latest version (implies that an additional request may be performed)

Return type:

int or None
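To make the directoryLabel/label interpretation of path concrete, the following sketch splits a PurePosixPath into those two parts. The function split_remote_path() is a hypothetical helper, not part of datalad_dataverse, and representing a top-level file by an empty directoryLabel is an assumption of this sketch.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def split_remote_path(path: PurePosixPath) -> tuple[str, str]:
    """Split ``path`` into (directoryLabel, label); hypothetical helper."""
    parent = str(path.parent)
    # Assumption: a file at the top level has an empty directoryLabel.
    directory_label = "" if parent == "." else parent
    return directory_label, path.name
```

For example, sub/dir/file.txt would be split into the directoryLabel sub/dir and the label file.txt.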

has_fileid(fid: int) bool[source]
has_fileid_in_latest_version(fid: int) bool[source]
has_path(path: PurePosixPath) bool[source]
has_path_in_latest_version(path: PurePosixPath) bool[source]
is_released_file(fid: int) bool[source]
remove_file(fid: int)[source]
rename_file(new_path: PurePosixPath, rename_id: int | None = None, rename_path: PurePosixPath | None = None)[source]
Raises:

RuntimeError -- Whenever the operation cannot or did not succeed. This could be because of a missing dependency, or because the file in question cannot be renamed (it is included in an earlier version).
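Because rename_file() signals failure with a RuntimeError, callers may want to handle that case explicitly. The wrapper below is a minimal sketch: try_rename() is a hypothetical helper, and dataset is assumed to be any object exposing rename_file() with the signature documented above.

```python
from __future__ import annotations

import logging
from pathlib import PurePosixPath

lgr = logging.getLogger(__name__)


def try_rename(
    dataset,
    new_path: PurePosixPath,
    *,
    rename_id: int | None = None,
    rename_path: PurePosixPath | None = None,
) -> bool:
    """Attempt a rename; return False instead of raising (hypothetical helper)."""
    try:
        dataset.rename_file(new_path, rename_id=rename_id, rename_path=rename_path)
    except RuntimeError as exc:
        # Raised whenever the rename cannot or did not succeed.
        lgr.warning("Rename to %s failed: %s", new_path, exc)
        return False
    return True
```

A caller that receives False could then choose a different strategy; which one is appropriate depends on the application and is not prescribed here.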

update_file_metadata(identifier, json_str=None, is_filepid=False)[source]
upload_file(local_path: Path, remote_path: PurePosixPath, replace_id: int | None = None) int[source]