datalad.api.create_sibling_github

datalad.api.create_sibling_github(reponame, *, dataset=None, recursive=False, recursion_limit=None, name='github', existing='error', github_login=None, credential=None, github_organization=None, access_protocol='https', publish_depends=None, private=False, description=None, dryrun=False, dry_run=False, api='https://api.github.com')

Create dataset sibling on GitHub.org (or an enterprise deployment).

GitHub is a popular commercial solution for code hosting and collaborative development. GitHub cannot host dataset content (but see LFS, http://handbook.datalad.org/r.html?LFS). However, in combination with other data sources and siblings, publishing a dataset to GitHub can facilitate distribution and exchange, while still allowing any dataset consumer to obtain actual data content from alternative sources.

In order to be able to use this command, a personal access token has to be generated on the platform (Account->Settings->Developer Settings->Personal access tokens->Generate new token).

This command can be configured with “datalad.create-sibling-ghlike.extra-remote-settings.NETLOC.KEY=VALUE” in order to add any local KEY = VALUE configuration to the created sibling in the local .git/config file. NETLOC is the domain of the Github instance to apply the configuration for. This leads to a behavior that is equivalent to calling datalad’s siblings('configure', ...)``||``siblings configure command with the respective KEY-VALUE pair after creating the sibling. The configuration, like any other, could be set at user- or system level, so users do not need to add this configuration to every sibling created with the service at NETLOC themselves.

Changed in version 0.16: The API has been aligned with the some create_sibling_... commands of other GitHub-like services, such as GOGS, GIN, GitTea.

Deprecated since version 0.16: The dryrun option will be removed in a future release, use the renamed dry_run option instead. The github_login option will be removed in a future release, use the credential option instead. The github_organization option will be removed in a future release, prefix the repository name with <org>/ instead.

Examples

Use a new sibling on GIN as a common data source that is auto- available when cloning from GitHub:

> ds = Dataset('.')

# the sibling on GIN will host data content
> ds.create_sibling_gin('myrepo', name='gin')

# the sibling on GitHub will be used for collaborative work
> ds.create_sibling_github('myrepo', name='github')

# register the storage of the public GIN repo as a data source
> ds.siblings('configure', name='gin', as_common_datasrc='gin-storage')

# announce its availability on github
> ds.push(to='github')
Parameters:
  • reponame (str) – repository name, optionally including an ‘<organization>/’ prefix if the repository shall not reside under a user’s namespace. When operating recursively, a suffix will be appended to this name for each subdataset.

  • dataset (Dataset or None, optional) – dataset to create the publication target for. If not given, an attempt is made to identify the dataset based on the current working directory. [Default: None]

  • recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]

  • recursion_limit (int or None, optional) – limit recursion into subdatasets to the given number of levels. [Default: None]

  • name (str or None, optional) – name of the sibling in the local dataset installation (remote name). [Default: ‘github’]

  • existing ({'skip', 'error', 'reconfigure', 'replace'}, optional) – behavior when already existing or configured siblings are discovered: skip the dataset (‘skip’), update the configuration (‘reconfigure’), or fail (‘error’). DEPRECATED DANGER ZONE: With ‘replace’, an existing repository will be irreversibly removed, re-initialized, and the sibling (re-)configured (thus implies ‘reconfigure’). replace could lead to data loss! In interactive sessions a confirmation prompt is shown, an exception is raised in non-interactive sessions. The ‘replace’ mode will be removed in a future release. [Default: ‘error’]

  • github_login (str or None, optional) – Deprecated, use the credential parameter instead. If given must be a personal access token. [Default: None]

  • credential (str or None, optional) – name of the credential providing a personal access token to be used for authorization. The token can be supplied via configuration setting ‘datalad.credential.<name>.token’, or environment variable DATALAD_CREDENTIAL_<NAME>_TOKEN, or will be queried from the active credential store using the provided name. If none is provided, the host-part of the API URL is used as a name (e.g. ‘https://api.github.com’ -> ‘api.github.com’). [Default: None]

  • github_organization (str or None, optional) – Deprecated, prepend a repo name with an ‘<orgname>/’ prefix instead. [Default: None]

  • access_protocol ({'https', 'ssh', 'https-ssh'}, optional) – access protocol/URL to configure for the sibling. With ‘https-ssh’ SSH will be used for write access, whereas HTTPS is used for read access. [Default: ‘https’]

  • publish_depends (list of str or None, optional) – add a dependency such that the given existing sibling is always published prior to the new sibling. This equals setting a configuration item ‘remote.SIBLINGNAME.datalad-publish-depends’. Multiple dependencies can be given as a list of sibling names. [Default: None]

  • private (bool, optional) – if set, create a private repository. [Default: False]

  • description (str or None, optional) – Brief description, displayed on the project’s page. [Default: None]

  • dryrun (bool, optional) – Deprecated. Use the renamed dry_run parameter. [Default: False]

  • dry_run (bool, optional) – if set, no repository will be created, only tests for sibling name collisions will be performed, and would-be repository names are reported for all relevant datasets. [Default: False]

  • api (str or None, optional) – URL of the GitHub instance API. [Default: ‘https://api.github.com’]

  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’ any failure is reported, but does not cause an exception; ‘continue’ if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. Raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]

  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False or a ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]

  • result_renderer – select rendering mode command results. ‘tailored’ enables a command- specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message); ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]

  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top- level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]

  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’ a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is return in case of an empty list. [Default: ‘list’]