Python module reference

This module reference extends the manual with a comprehensive overview of the functionality built into DataLad. Each module in the package is documented with a general summary of its purpose and a list of the classes and functions it provides.
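The entries in this reference are named relative to the top-level `datalad` package, so `api.create` refers to `datalad.api.create` and `support.gitrepo` to the module `datalad.support.gitrepo`. As a minimal sketch of that mapping, the hypothetical helper `resolve` below (not part of DataLad) turns a short entry name into the corresponding Python object using only the standard library.

```python
import importlib


def resolve(entry, package="datalad"):
    """Return the module or attribute named by a reference entry.

    `resolve` is a hypothetical helper for illustration only.
    """
    qualified = f"{package}.{entry}"
    try:
        # Entries such as "support.gitrepo" name importable modules.
        return importlib.import_module(qualified)
    except ModuleNotFoundError:
        # Entries such as "api.create" name an attribute of a module.
        module_name, _, attr = qualified.rpartition(".")
        return getattr(importlib.import_module(module_name), attr)
```

With DataLad installed, `resolve("api.create")` should return the `create` command and `resolve("consts")` the `datalad.consts` module. The fallback also triggers when a module exists but one of its own imports is missing, so a real tool would want a more careful check.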

High-level user interface

Dataset operations

`api.Dataset`(*args, **kwargs)	Representation of a DataLad dataset/repository
`api.create`([path, initopts, force, ...])	Create a new dataset from scratch.
`api.create_sibling`(sshurl, *[, name, ...])	Create a dataset sibling on a UNIX-like Shell (local or SSH)-accessible machine
`api.create_sibling_github`(reponame, *[, ...])	Create dataset sibling on GitHub.org (or an enterprise deployment).
`api.create_sibling_gitlab`([path, site, ...])	Create dataset sibling at a GitLab site
`api.create_sibling_gogs`(reponame, *[, api, ...])	Create a dataset sibling on a GOGS site
`api.create_sibling_gitea`(reponame, *[, ...])	Create a dataset sibling on a Gitea site
`api.create_sibling_gin`(reponame, *[, ...])	Create a dataset sibling on a GIN site (with content hosting)
`api.create_sibling_ria`(url, name, *[, ...])	Create a sibling to a dataset in a RIA store
`api.drop`([path, what, reckless, dataset, ...])	Drop content of individual files or entire (sub)datasets
`api.get`([path, source, dataset, recursive, ...])	Get any dataset content (files/directories/subdatasets).
`api.install`([path, source, dataset, ...])	Install one or many datasets from remote URL(s) or local PATH source(s).
`api.push`([path, dataset, to, since, data, ...])	Push a dataset to a known sibling.
`api.remove`([path, dataset, drop, reckless, ...])	Remove components from datasets
`api.save`([path, message, dataset, ...])	Save the current state of a dataset
`api.status`([path, dataset, annex, ...])	Report on the state of dataset content.
`api.update`([path, sibling, merge, how, ...])	Update a dataset from a sibling.
`api.unlock`([path, dataset, recursive, ...])	Unlock file(s) of a dataset
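
As a rough illustration of how a few of the commands above combine, the sketch below creates a dataset, writes one file into it, saves it, and queries its status. It assumes DataLad and git-annex are installed. The helper name `create_and_save`, the file name, and the commit message are hypothetical, and `result_renderer="disabled"` is passed only to keep the calls quiet.

```python
from pathlib import Path


def create_and_save(path, filename="notes.txt", text="hello\n"):
    """Create a dataset at `path`, add one file, save it, and return its status.

    `create_and_save` is a hypothetical helper, not part of DataLad.
    """
    # Imported here so this module still loads where DataLad is not installed.
    import datalad.api as dl

    # api.create returns a Dataset instance for the new dataset.
    ds = dl.create(path=str(path), result_renderer="disabled")

    # Add some content to the dataset's working tree.
    (Path(ds.path) / filename).write_text(text)

    # Record the new file in the dataset's history.
    ds.save(message=f"Add {filename}", result_renderer="disabled")

    # Report on the state of the dataset content as a list of result records.
    return ds.status(result_renderer="disabled")
```

The same commands are available both as functions in `datalad.api` and as methods of a `Dataset` instance. The sketch uses the method form after creation so that the dataset does not have to be passed to each call.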

Reproducible execution

`api.run`([cmd, dataset, inputs, outputs, ...])	Run an arbitrary shell command and record its impact on a dataset.
`api.rerun`([revision, since, dataset, ...])	Re-execute previous datalad run commands.
`api.run_procedure`([spec, dataset, discover, ...])	Run prepared procedures (DataLad scripts) on a dataset
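
The following sketch shows one way `run` might be called from Python to execute a shell command inside an existing dataset and record the outcome. It assumes DataLad is installed and that `dataset_path` points to an existing dataset. The wrapper name `run_and_record` is hypothetical, and only the `cmd`, `outputs`, and `message` parameters are exercised.

```python
def run_and_record(dataset_path, cmd, outputs=None, message=None):
    """Run `cmd` in the dataset at `dataset_path` and record its impact.

    `run_and_record` is a hypothetical helper, not part of DataLad.
    """
    # Imported here so this module still loads where DataLad is not installed.
    import datalad.api as dl

    ds = dl.Dataset(dataset_path)

    # Changes made by the command are saved together with a record of the
    # command itself, which is what lets `rerun` re-execute it later.
    return ds.run(
        cmd=cmd,
        outputs=outputs,
        message=message,
        result_renderer="disabled",
    )
```

A call could look like `run_and_record("my-dataset", "echo hi > out.txt", outputs=["out.txt"], message="Write greeting")`, where the path and command are placeholders.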

Plumbing commands

`api.clean`(*[, dataset, what, dry_run, ...])	Clean up after DataLad (possible temporary files etc.)
`api.clone`(source[, path, git_clone_opts, ...])	Obtain a dataset (copy) from a URL or local directory
`api.copy_file`([path, dataset, recursive, ...])	Copy files and their availability metadata from one dataset to another.
`api.create_test_dataset`([path, spec, seed])	Create test (meta-)dataset.
`api.diff`([path, fr, to, dataset, annex, ...])	Report differences between two states of a dataset (hierarchy)
`api.download_url`(urls, *[, dataset, path, ...])	Download content
`api.foreach_dataset`(cmd, *[, cmd_type, ...])	Run a command or Python code on the dataset and/or each of its sub-datasets.
`api.siblings`([action, dataset, name, url, ...])	Manage sibling configuration
`api.sshrun`(login, cmd, *[, port, ipv4, ...])	Run command on remote machines via SSH.
`api.subdatasets`([path, dataset, state, ...])	Report subdatasets and their properties.

Miscellaneous commands

`api.add_archive_content`(archive, *[, ...])	Add content of an archive under git annex control.
`api.add_readme`([filename, dataset, existing])	Add basic information about DataLad datasets to a README file
`api.addurls`(urlfile, urlformat, ...[, ...])	Create and update a dataset from a list of URLs.
`api.check_dates`(paths, *[, reference_date, ...])	Find repository dates that are more recent than a reference date.
`api.configuration`([action, spec, scope, ...])	Get and set dataset, dataset-clone-local, or global configuration
`api.export_archive`([filename, dataset, ...])	Export the content of a dataset as a TAR/ZIP archive.
`api.export_archive_ora`(target[, opts, ...])	Export an archive of a local annex object store for the ORA remote.
`api.export_to_figshare`([filename, dataset, ...])	Export the content of a dataset as a ZIP archive to figshare
`api.no_annex`(dataset, pattern[, ref_dir, ...])	Configure a dataset to never put some content into the dataset's annex
`api.shell_completion`()	Display shell script for enabling shell completion for DataLad.
`api.wtf`(*[, dataset, sensitive, sections, ...])	Generate a report about the DataLad installation and configuration

Support functionality

`cmd`	Class that starts a subprocess and keeps it around to communicate with it via stdin.
`consts`	Constants for DataLad
`log`	Logging setup and utilities, including progress reporting
`utils`
`version`
`support.gitrepo`	Internal low-level interface to Git repositories
`support.annexrepo`	Interface to git-annex by Joey Hess.
`support.archives`	Various handlers/functionality for different types of files (e.g. for archives).
`support.extensions`	Support functionality for extension development
`customremotes.base`	Base classes to custom git-annex remotes (e.g. extraction from archives).
`customremotes.archives`	Custom remote to get the load from archives present under annex
`runner.nonasyncrunner`	Thread based subprocess execution with stdout and stderr passed to protocol objects
`runner.protocol`	Base class of a protocol to be used with the DataLad runner

Configuration management

`config`

Test infrastructure

`tests.utils_pytest`	Miscellaneous utilities to assist with testing
`tests.utils_testrepos`
`tests.heavyoutput`	Helper to provide heavy load on stdout and stderr

Command interface

`interface.base`

High-level interface generation

Command line interface infrastructure

`cli.exec`	Call a command interface
`cli.main`	This is the main() CLI entrypoint
`cli.parser`	Components to build the parser instance for the CLI
`cli.renderer`	Render results in a terminal