datalad.api.create_sibling_webdav

datalad.api.create_sibling_webdav(url, *, dataset=None, name=None, storage_name=None, mode='annex', credential=None, existing='error', recursive=False, recursion_limit=None)

Create a sibling(-tandem) on a WebDAV server

WebDAV is a standard HTTP protocol extension for placing files on a server that is supported by a number of commercial storage services (e.g. 4shared.com, box.com), but also instances of cloud-storage solutions like Nextcloud or ownCloud. These software packages are also the basis for some institutional or public cloud storage solutions, such as EUDAT B2DROP.

For basic usage, only the URL with the desired dataset location on a WebDAV server needs to be specified for creating a sibling. However, the sibling setup can be flexibly customized (no storage sibling, or only a storage sibling, multi-version storage, or human-browsable single-version storage).
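
For instance, a Git-history-only sibling with no storage sibling could be requested as in the following minimal sketch, using the mode values documented below (the URL and sibling name are hypothetical):

> create_sibling_webdav(url='https://webdav.example.com/myds', mode='git-only', name='mywebdav')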

This command does not check for conflicting content on the WebDAV server!

When creating siblings recursively for a dataset hierarchy, subdataset exports are placed at their corresponding relative paths underneath the root location on the WebDAV server.
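
For instance (hypothetical URL), a recursive invocation that also creates siblings for all registered subdatasets could look like this:

> create_sibling_webdav(url='https://webdav.example.com/myds', recursive=True)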

Collaboration on WebDAV siblings

The primary use case for WebDAV siblings is dataset deposition, where only one site is uploading dataset and file content updates. For collaborative workflows with multiple contributors, please make sure to consult the documentation on the underlying datalad-annex:: Git remote helper for advice on appropriate setups: http://docs.datalad.org/projects/next/

Git-annex implementation details

Storage siblings are presently configured to NOT be enabled automatically on cloning a dataset. Due to a limitation of git-annex, this would initially fail (missing credentials). Instead, an explicit datalad siblings enable --name <storage-sibling-name> command must be executed after cloning. If necessary, it will prompt for credentials.
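
In the Python API this corresponds to a siblings call like the following sketch (the name shown assumes the default storage sibling name, i.e. the hostname-part of the URL plus a '-storage' suffix):

> siblings('enable', name='webdav.example.com-storage')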

This command does not (and likely will not) support embedding credentials in the repository (see embedcreds option of the git-annex webdav special remote; https://git-annex.branchable.com/special_remotes/webdav), because such credential copies would need to be updated, whenever they change or expire. Instead, credentials are retrieved from DataLad's credential system. In many cases, credentials are determined automatically, based on the HTTP authentication realm identified by a WebDAV server.
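
As a sketch of supplying a named credential explicitly (credential name and values are hypothetical; the configuration and environment variable mechanisms are described under the credential parameter below):

> import os
> os.environ['DATALAD_CREDENTIAL_MYWEBDAV_USER'] = 'me@example.com'
> os.environ['DATALAD_CREDENTIAL_MYWEBDAV_SECRET'] = 'my-secret-token'
> create_sibling_webdav(url='https://webdav.example.com/myds', credential='mywebdav')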

This command does not support setting up encrypted remotes (yet). Neither for the storage sibling, nor for the regular Git-remote. However, adding support for it is primarily a matter of extending the API of this command, and passing the respective options on to the underlying git-annex setup.

This command does not support setting up chunking for webdav storage siblings (https://git-annex.branchable.com/chunking).

Examples

Create a WebDAV sibling tandem for storage of a dataset's file content and revision history. A user will be prompted for any required credentials, if they are not yet known:

> create_sibling_webdav(url='https://webdav.example.com/myds')

Such a dataset can be cloned by DataLad via a specially crafted URL. Again, credentials are automatically determined, or a user is prompted to enter them:

> clone('datalad-annex::?type=webdav&encryption=none&url=https://webdav.example.com/myds')

A sibling can also be created with a human-readable file tree, suitable for data exchange with non-DataLad users, but only able to host a single version of each file:

> create_sibling_webdav(url='https://example.com/browseable', mode='filetree')

Cloning such dataset siblings is possible via a convenience URL:

> clone('webdavs://example.com/browseable')

In all cases, the storage sibling needs to be explicitly enabled prior to file content retrieval:

> siblings('enable', name='example.com-storage')

Parameters:
  • url -- URL identifying the sibling root on the target WebDAV server.

  • dataset -- specify the dataset to process. If no dataset is given, an attempt is made to identify the dataset based on the current working directory. [Default: None]

  • name -- name of the sibling. If none is given, the hostname-part of the WebDAV URL will be used. With recursive, the same name will be used to label all the subdatasets' siblings. [Default: None]

  • storage_name -- name of the storage sibling (git-annex special remote). Must not be identical to the sibling name. If not specified, defaults to the sibling name plus '-storage' suffix. If only a storage sibling is created, this setting is ignored, and the primary sibling name is used. [Default: None]

  • mode -- Siblings can be created in various modes: full-featured sibling tandem, one for a dataset's Git history and one storage sibling to host any number of file versions ('annex'). A single sibling for the Git history only ('git-only'). A single annex sibling for multi-version file storage only ('annex-only'). As an alternative to the standard (annex) storage sibling setup that is capable of storing any number of historical file versions using a content hash layout ('annex'|'annex-only'), the 'filetree' mode can be used. This mode offers a human-readable data organization on the WebDAV remote that matches the file tree of a dataset (branch). However, it can, consequently, only store a single version of each file in the file tree. This mode is useful for depositing a single dataset snapshot for consumption without DataLad. The 'filetree' mode nevertheless allows for cloning such a single-version dataset, because the full dataset history can still be pushed to the WebDAV server. Git history hosting can also be turned off for this setup ('filetree-only'). When both a storage sibling and a regular sibling are created together, a publication dependency on the storage sibling is configured for the regular sibling in the local dataset clone. [Default: 'annex']

  • credential -- name of the credential providing a user/password credential to be used for authorization. The credential can be supplied via configuration setting 'datalad.credential.<name>.user|secret', or environment variable DATALAD_CREDENTIAL_<NAME>_USER|SECRET, or will be queried from the active credential store using the provided name. If none is provided, the last-used credential for the authentication realm associated with the WebDAV URL will be used. Only if a credential name was given will it be encoded in the URL of the created WebDAV Git remote; otherwise, credential auto-discovery will be performed on each remote access. [Default: None]

  • existing -- action to perform, if a (storage) sibling is already configured under the given name. In this case, sibling creation can be skipped ('skip') or the sibling (re-)configured ('reconfigure') in the dataset, or the command be instructed to fail ('error'). [Default: 'error']

  • recursive (bool, optional) -- if set, recurse into potential subdatasets. [Default: False]

  • recursion_limit (int or None, optional) -- limit recursion into subdatasets to the given number of levels. [Default: None]

  • on_failure ({'ignore', 'continue', 'stop'}, optional) -- behavior to perform on failure: 'ignore': any failure is reported, but does not cause an exception; 'continue': if any failure occurs, an exception will be raised at the end, but processing of other actions will continue for as long as possible; 'stop': processing will stop on first failure and an exception is raised. A failure is any result with status 'impossible' or 'error'. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: 'continue']

  • result_filter (callable or None, optional) -- if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable's return value evaluates to True and no ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call (a combined usage sketch of these result-handling parameters follows after this list). [Default: None]

  • result_renderer -- select rendering mode for command results. 'tailored' enables a command-specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the 'generic' result renderer; 'generic' renders each result in one line with key info (action, status, path, and an optional message); 'json' a complete JSON line serialization of the full result record; 'json_pp' like 'json', but pretty-printed spanning multiple lines; 'disabled' turns off result rendering entirely; '<template>' reports any value(s) of any result properties in any format indicated by the template (e.g. '{path}', compare with JSON output for all key-value choices). The template syntax follows the Python "format() language". It is possible to report individual dictionary values, e.g. '{metadata[name]}'. If a 2nd-level key contains a colon, e.g. 'music:Genre', ':' must be substituted by '#' in the template, like so: '{metadata[music#Genre]}'. [Default: 'tailored']

  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) -- if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]

  • return_type ({'generator', 'list', 'item-or-list'}, optional) -- return value behavior switch. If 'item-or-list' a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: 'list']
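
A minimal usage sketch combining these result-handling parameters (the URL and the filter condition are hypothetical):

> create_sibling_webdav(url='https://webdav.example.com/myds', on_failure='ignore', result_renderer='disabled', result_filter=lambda r: r.get('status') == 'ok', return_type='list')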