datalad.api.create_sibling

datalad.api.create_sibling(sshurl, *, name=None, target_dir=None, target_url=None, target_pushurl=None, dataset=None, recursive=False, recursion_limit=None, existing='error', shared=None, group=None, ui=False, as_common_datasrc=None, publish_by_default=None, publish_depends=None, annex_wanted=None, annex_group=None, annex_groupwanted=None, inherit=False, since=None)

Create a dataset sibling on a UNIX-like Shell (local or SSH)-accessible machine

Given a local dataset, and a path or SSH login information this command creates a remote dataset repository and configures it as a dataset sibling to be used as a publication target (see publish command).

Various properties of the remote sibling can be configured (e.g. name location on the server, read and write access URLs, and access permissions.

Optionally, a basic web-viewer for DataLad datasets can be installed at the remote location.

This command supports recursive processing of dataset hierarchies, creating a remote sibling for each dataset in the hierarchy. By default, remote siblings are created in hierarchical structure that reflects the organization on the local file system. However, a simple templating mechanism is provided to produce a flat list of datasets (see –target-dir).

Parameters:

sshurl (str) – Login information for the target server. This can be given as a URL (ssh://host/path), SSH-style (user@host:path) or just a local path. Unless overridden, this also serves the future dataset’s access URL and path on the server.
name (str or None, optional) – sibling name to create for this publication target. If recursive is set, the same name will be used to label all the subdatasets’ siblings. When creating a target dataset fails, no sibling is added. [Default: None]
target_dir (str or None, optional) – path to the directory on the server where the dataset shall be created. By default this is set to the URL (or local path) specified via sshurl. If a relative path is provided here, it is interpreted as being relative to the user’s home directory on the server (or relative to sshurl, when that is a local path). Additional features are relevant for recursive processing of datasets with subdatasets. By default, the local dataset structure is replicated on the server. However, it is possible to provide a template for generating different target directory names for all (sub)datasets. Templates can contain certain placeholder that are substituted for each (sub)dataset. For example: “/mydirectory/dataset%%RELNAME”. Supported placeholders: %%RELNAME - the name of the datasets, with any slashes replaced by dashes. [Default: None]
target_url (str or None, optional) – “public” access URL of the to-be-created target dataset(s) (default: sshurl). Accessibility of this URL determines the access permissions of potential consumers of the dataset. As with target_dir, templates (same set of placeholders) are supported. Also, if specified, it is provided as the annex description. [Default: None]
target_pushurl (str or None, optional) – In case the target_url cannot be used to publish to the dataset, this option specifies an alternative URL for this purpose. As with target_url, templates (same set of placeholders) are supported. [Default: None]
dataset (Dataset or None, optional) – specify the dataset to create the publication target for. If no dataset is given, an attempt is made to identify the dataset based on the current working directory. [Default: None]
recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]
recursion_limit (int or None, optional) – limit recursion into subdatasets to the given number of levels. [Default: None]
existing ({'skip', 'error', 'reconfigure', 'replace'}, optional) – action to perform, if a sibling is already configured under the given name and/or a target (non-empty) directory already exists. In this case, a dataset can be skipped (‘skip’), the sibling configuration be updated (‘reconfigure’), or process interrupts with error (‘error’). DANGER ZONE: If ‘replace’ is used, an existing target directory will be forcefully removed, re-initialized, and the sibling (re-)configured (thus implies ‘reconfigure’). replace could lead to data loss, so use with care. To minimize possibility of data loss, in interactive mode DataLad will ask for confirmation, but it would raise an exception in non-interactive mode. [Default: ‘error’]
shared (str or bool or None, optional) – if given, configures the access permissions on the server for multi- users (this could include access by a webserver!). Possible values for this option are identical to those of git init –shared and are described in its documentation. [Default: None]
group (str or None, optional) – Filesystem group for the repository. Specifying the group is particularly important when shared=”group”. [Default: None]
ui (bool or str, optional) – publish a web interface for the dataset with an optional user- specified name for the html at publication target. defaults to index.html at dataset root. [Default: False]
as_common_datasrc – configure the created sibling as a common data source of the dataset that can be automatically used by all consumers of the dataset (technical: git-annex auto-enabled special remote). [Default: None]
publish_by_default (list of str or None, optional) – add a refspec to be published to this sibling by default if nothing specified. [Default: None]
publish_depends (list of str or None, optional) – add a dependency such that the given existing sibling is always published prior to the new sibling. This equals setting a configuration item ‘remote.SIBLINGNAME.datalad-publish-depends’. Multiple dependencies can be given as a list of sibling names. [Default: None]
annex_wanted (str or None, optional) – expression to specify ‘wanted’ content for the repository/sibling. See https://git-annex.branchable.com/git-annex-wanted/ for more information. [Default: None]
annex_group (str or None, optional) – expression to specify a group for the repository. See https://git- annex.branchable.com/git-annex-group/ for more information. [Default: None]
annex_groupwanted (str or None, optional) – expression for the groupwanted. Makes sense only if annex_wanted=”groupwanted” and annex-group is given too. See https://git-annex.branchable.com/git-annex-groupwanted/ for more information. [Default: None]
inherit (bool, optional) – if sibling is missing, inherit settings (git config, git annex wanted/group/groupwanted) from its super-dataset. [Default: False]
since (str or None, optional) – limit processing to subdatasets that have been changed since a given state (by tag, branch, commit, etc). This can be used to create siblings for recently added subdatasets. If ‘^’ is given, the last state of the current branch at the sibling is taken as a starting point. [Default: None]
on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’ any failure is reported, but does not cause an exception; ‘continue’ if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. Raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False or a ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
result_renderer – select rendering mode command results. ‘tailored’ enables a command- specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message); ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]
result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top- level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’ a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is return in case of an empty list. [Default: ‘list’]