datalad.api.add_sibling_dataverse
- datalad.api.add_sibling_dataverse(dv_url: str, ds_pid: str, *, dataset: DatasetParameter | None = None, name: str = 'dataverse', storage_name: str | None = None, mode: str = 'annex', credential: str | None = None, existing: str = 'error', root_path: PurePosixPath | None = None)
Add a Dataverse dataset as a sibling(-tandem)
Dataverse is a web application to share and cite research data.
This command registers an existing Dataverse dataset as a sibling of a DataLad dataset. Both dataset version history and file content can then be deposited at a Dataverse site via the standard push command.
Dataverse imposes strict limits on directory names (and to some degree also file names). Therefore, names of files that conflict with these rules (e.g., a directory name with any character not found in the English alphabet) are mangled on push. This mangling does not affect file names in the DataLad dataset, nor in clones from Dataverse. See the package documentation for details.
If a DataLad dataset's version history was deposited on Dataverse, the dataset can also be cloned from Dataverse again, via the standard clone command.
In order to use this command, a personal access token has to be generated on the Dataverse platform. You can find it by clicking on your name at the top right corner, and then clicking on API Token > Create Token.
Examples
Add a dataverse dataset sibling for sharing and citing:
> from datalad.api import Dataset
> ds = Dataset('.')
> ds.add_sibling_dataverse(
.     dv_url='https://demo.dataverse.org',
.     name='dataverse',
.     ds_pid='doi:10.5072/FK2/PMPMZM')
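The description above pairs sibling registration with the standard push command. A minimal sketch of that sequence follows. The helper name `add_and_push` is hypothetical and not part of DataLad, and the sketch assumes that `ds` behaves like a `datalad.api.Dataset` with the Dataverse extension installed.

```python
def add_and_push(ds, dv_url, ds_pid, name="dataverse"):
    """Register a Dataverse sibling on `ds`, then push to it.

    `ds` is assumed to behave like a datalad.api.Dataset that offers
    add_sibling_dataverse() and push(); this helper is illustrative only.
    """
    # Keyword names follow the signature documented on this page.
    ds.add_sibling_dataverse(dv_url=dv_url, ds_pid=ds_pid, name=name)
    # Deposit version history and file content via the standard push command.
    return ds.push(to=name)


# Possible usage, assuming DataLad and the extension are installed:
# from datalad.api import Dataset
# add_and_push(Dataset('.'), 'https://demo.dataverse.org',
#              'doi:10.5072/FK2/PMPMZM')
```

Passing the dataset object in, rather than constructing it inside the helper, keeps the sketch usable with any object that offers the two methods.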
- Parameters:
dv_url -- URL identifying the dataverse instance to connect to (e.g., https://demo.dataverse.org).
ds_pid -- Persistent identifier of the dataverse dataset to use as a sibling. This PID can be found on the dataset's landing page on Dataverse, either right at the top underneath the title of the dataset as a URL, or in the dataset's metadata. Both formats (doi:10.5072/FK2/PMPMZM and https://doi.org/10.5072/FK2/PMPMZM) are supported for this parameter.
dataset -- specify the dataset to add the sibling to. If no dataset is given, an attempt is made to identify the dataset based on the current working directory. [Default: None]
name -- name of the sibling. If none is given, the hostname-part of the URL will be used. [Default: 'dataverse']
storage_name -- name of the storage sibling (git-annex special remote). Must not be identical to the sibling name. If not specified, defaults to the sibling name plus '-storage' suffix. If only a storage sibling is created, this setting is ignored, and the primary sibling name is used. [Default: None]
mode -- Different sibling setups with varying ability to accept file content and dataset versions are supported: 'annex' for a sibling tandem, one for a dataset's Git history and one storage sibling to host any number of file versions; 'git-only' for a single sibling for the Git history only; 'annex-only' for a single annex sibling for multi-version file storage, but no dataset Git history; 'filetree' for a human-readable data organization on the dataverse end that matches the file tree of a dataset branch. This mode is useful for depositing a single dataset snapshot for consumption without DataLad. A dataset's Git history is included in the export and enables cloning from Dataverse. 'filetree-only' disables the Git history export, and removes the ability to clone from Dataverse. When both a storage sibling and a regular sibling are created together, a publication dependency on the storage sibling is configured for the regular sibling in the local dataset clone. [Default: 'annex']
credential -- name of the credential providing an API token for the dataverse installation of your choice, to be used for authorization. If no credential is given or known, credential discovery will be attempted based on the Dataverse URL. If no credential can be found, a token is prompted for. [Default: None]
existing -- action to perform, if a (storage) sibling is already configured under the given name. In this case, sibling creation can be skipped ('skip') or the sibling (re-)configured ('reconfigure') in the dataset, or the command be instructed to fail ('error'). [Default: 'error']
root_path -- optional alternative root path for the sibling inside the Dataverse dataset. This can be used to represent multiple DataLad datasets within a single Dataverse dataset without conflict. Must be given in POSIX notation. [Default: None]
on_failure ({'ignore', 'continue', 'stop'}, optional) -- behavior to perform on failure: 'ignore' any failure is reported, but does not cause an exception; 'continue' if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; 'stop': processing will stop on first failure and an exception is raised. A failure is any result with status 'impossible' or 'error'. Raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: 'continue']
result_filter (callable or None, optional) -- if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable's return value does not evaluate to False or a ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
result_renderer -- select the rendering mode for command results. 'tailored' enables a command-specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the 'generic' result renderer; 'generic' renders each result in one line with key info like action, status, path, and an optional message; 'json' a complete JSON line serialization of the full result record; 'json_pp' like 'json', but pretty-printed spanning multiple lines; 'disabled' turns off result rendering entirely; '<template>' reports any value(s) of any result properties in any format indicated by the template (e.g. '{path}', compare with JSON output for all key-value choices). The template syntax follows the Python "format() language". It is possible to report individual dictionary values, e.g. '{metadata[name]}'. If a 2nd-level key contains a colon, e.g. 'music:Genre', ':' must be substituted by '#' in the template, like so: '{metadata[music#Genre]}'. [Default: 'tailored']
result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) -- if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
return_type ({'generator', 'list', 'item-or-list'}, optional) -- return value behavior switch. If 'item-or-list' a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: 'list']
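The result-handling parameters above operate on result dictionaries. The following is a minimal sketch of a callable suitable for result_filter and of how a format() template addresses result properties. The names `only_ok` and `render` are hypothetical, and the sample records are made up for illustration; they only assume that a record carries a `status` key, as described for on_failure.

```python
def only_ok(result, **kwargs):
    """Candidate result_filter: keep only records whose status is 'ok'.

    Accepting **kwargs allows the callable to receive the keyword
    arguments of the original API call, as described for result_filter.
    """
    return result.get("status") == "ok"


def render(template, result):
    """Apply a Python format() template to one result record.

    Only plain str.format() behavior is shown. Any extra handling done
    by DataLad itself (such as the '#' substitution for colons) is not
    reproduced here.
    """
    return template.format(**result)


# Made-up result records, for illustration only.
records = [
    {"action": "add_sibling_dataverse", "status": "ok",
     "path": "/tmp/ds", "metadata": {"name": "demo"}},
    {"action": "add_sibling_dataverse", "status": "error",
     "path": "/tmp/other", "metadata": {"name": "broken"}},
]

# Emulate filtering followed by template-based rendering.
kept = [r for r in records if only_ok(r)]
lines = [render("{path}: {metadata[name]}", r) for r in kept]
```

With DataLad installed, such a callable would be passed as result_filter=only_ok, optionally combined with on_failure='ignore' and return_type='list'.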