datalad.api.create_sibling_gitlab

datalad.api.create_sibling_gitlab(path=None, *, site=None, project=None, layout=None, dataset=None, recursive=False, recursion_limit=None, name=None, existing='error', access=None, publish_depends=None, description=None, dryrun=False, dry_run=False)

Create dataset sibling at a GitLab site

An existing GitLab project, or a project created via the GitLab web interface can be configured as a sibling with the siblings command. Alternatively, this command can create a GitLab project at any location/path a given user has appropriate permissions for. This is particularly helpful for recursive sibling creation for subdatasets. API access and authentication are implemented via python-gitlab, and all its features are supported. A particular GitLab site must be configured in a named section of a python-gitlab.cfg file (see https://python-gitlab.readthedocs.io/en/stable/cli-usage.html#configuration-file-format for details), such as:

[mygit]
url = https://git.example.com
api_version = 4
private_token = abcdefghijklmnopqrst

Subsequently, this site is identified by its name (‘mygit’ in the example above).

(Recursive) sibling creation for all, or a selected subset of subdatasets is supported with two different project layouts (see –layout):

“flat”

All datasets are placed as GitLab projects in the same group. The project name of the top-level dataset follows the configured datalad.gitlab-SITENAME-project configuration. The project names of contained subdatasets extend the configured name with the subdatasets’ s relative path within the root dataset, with all path separator characters replaced by ‘-’. This path separator is configurable (see Configuration).

“collection”

A new group is created for the dataset hierarchy, following the datalad.gitlab-SITENAME-project configuration. The root dataset is placed in a “project” project inside this group, and all nested subdatasets are represented inside the group using a “flat” layout. The root datasets project name is configurable (see Configuration).

GitLab cannot host dataset content. However, in combination with other data sources (and siblings), publishing a dataset to GitLab can facilitate distribution and exchange, while still allowing any dataset consumer to obtain actual data content from alternative sources.

Configuration

Many configuration switches and options for GitLab sibling creation can be provided as arguments to the command. However, it is also possible to specify a particular setup in a dataset’s configuration. This is particularly important when managing large collections of datasets. Configuration options are:

“datalad.gitlab-default-site”

Name of the default GitLab site (see –site)

“datalad.gitlab-SITENAME-siblingname”

Name of the sibling configured for the local dataset that points to the GitLab instance SITENAME (see –name)

“datalad.gitlab-SITENAME-layout”

Project layout used at the GitLab instance SITENAME (see –layout)

“datalad.gitlab-SITENAME-access”

Access method used for the GitLab instance SITENAME (see –access)

“datalad.gitlab-SITENAME-project”

Project “location/path” used for a datasets at GitLab instance SITENAME (see –project). Configuring this is useful for deriving project paths for subdatasets, relative to superdataset. The root-level group (“location”) needs to be created beforehand via GitLab’s web interface.

“datalad.gitlab-default-projectname”

The collection layout publishes (sub)datasets as projects with a custom name. The default name “project” can be overridden with this configuration.

“datalad.gitlab-default-pathseparator”

The flat and collection layout represent subdatasets with project names that correspond to their path within the superdataset, with the regular path separator replaced with a “-”: superdataset-subdataset. This configuration can be used to override this default separator.

This command can be configured with “datalad.create-sibling-ghlike.extra-remote-settings.NETLOC.KEY=VALUE” in order to add any local KEY = VALUE configuration to the created sibling in the local .git/config file. NETLOC is the domain of the Gitlab instance to apply the configuration for. This leads to a behavior that is equivalent to calling datalad’s siblings('configure', ...)``||``siblings configure command with the respective KEY-VALUE pair after creating the sibling. The configuration, like any other, could be set at user- or system level, so users do not need to add this configuration to every sibling created with the service at NETLOC themselves.

Parameters:
  • path – selectively create siblings for any datasets underneath a given path. By default only the root dataset is considered. [Default: None]

  • site (None or str, optional) – name of the GitLab site to create a sibling at. Must match an existing python-gitlab configuration section with location and authentication settings (see https://python- gitlab.readthedocs.io/en/stable/cli-usage.html#configuration). By default the dataset configuration is consulted. [Default: None]

  • project (None or str, optional) – project name/location at the GitLab site. If a subdataset of the reference dataset is processed, its project path is automatically determined by the layout configuration, by default. Users need to create the root-level GitLab group (NAME) via the webinterface before running the command. [Default: None]

  • layout ({None, 'collection', 'flat'}, optional) – layout of projects at the GitLab site, if a collection, or a hierarchy of datasets and subdatasets is to be created. By default the dataset configuration is consulted. [Default: None]

  • dataset (Dataset or None, optional) – reference or root dataset. If no path constraints are given, a sibling for this dataset will be created. In this and all other cases, the reference dataset is also consulted for the GitLab configuration, and desired project layout. If no dataset is given, an attempt is made to identify the dataset based on the current working directory. [Default: None]

  • recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]

  • recursion_limit (int or None, optional) – limit recursion into subdatasets to the given number of levels. [Default: None]

  • name (str or None, optional) – name to represent the GitLab sibling remote in the local dataset installation. If not specified a name is looked up in the dataset configuration, or defaults to the site name. [Default: None]

  • existing ({'skip', 'error', 'reconfigure'}, optional) – desired behavior when already existing or configured siblings are discovered. ‘skip’: ignore; ‘error’: fail, if access URLs differ; ‘reconfigure’: use the existing repository and reconfigure the local dataset to use it as a sibling. [Default: ‘error’]

  • access ({None, 'http', 'ssh', 'ssh+http'}, optional) – access method used for data transfer to and from the sibling. ‘ssh’: read and write access used the SSH protocol; ‘http’: read and write access use HTTP requests; ‘ssh+http’: read access is done via HTTP and write access performed with SSH. Dataset configuration is consulted for a default, ‘http’ is used otherwise. [Default: None]

  • publish_depends (list of str or None, optional) – add a dependency such that the given existing sibling is always published prior to the new sibling. This equals setting a configuration item ‘remote.SIBLINGNAME.datalad-publish-depends’. Multiple dependencies can be given as a list of sibling names. [Default: None]

  • description (str or None, optional) – brief description for the GitLab project (displayed on the site). [Default: None]

  • dryrun (bool, optional) – Deprecated. Use the renamed dry_run parameter. [Default: False]

  • dry_run (bool, optional) – if set, no repository will be created, only tests for name collisions will be performed, and would-be repository names are reported for all relevant datasets. [Default: False]

  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’ any failure is reported, but does not cause an exception; ‘continue’ if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. Raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]

  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False or a ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]

  • result_renderer – select rendering mode command results. ‘tailored’ enables a command- specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message); ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]

  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top- level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]

  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’ a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is return in case of an empty list. [Default: ‘list’]