Python module reference

This module reference extends the manual with a comprehensive overview of the available functionality built into datalad. Each module in the package is documented by a general summary of its purpose and the list of classes and functions it provides.

High-level user interface

Dataset operations

api.Dataset(*args, **kwargs)

Representation of a DataLad dataset/repository

api.create([path, initopts, force, ...])

Create a new dataset from scratch.

api.create_sibling(sshurl, *[, name, ...])

Create a dataset sibling on a UNIX-like machine accessible via a local or SSH shell

api.create_sibling_github(reponame, *[, ...])

Create dataset sibling on GitHub.org (or an enterprise deployment).

api.create_sibling_gitlab([path, site, ...])

Create dataset sibling at a GitLab site

api.create_sibling_gogs(reponame, *[, api, ...])

Create a dataset sibling on a GOGS site

api.create_sibling_gitea(reponame, *[, ...])

Create a dataset sibling on a Gitea site

api.create_sibling_gin(reponame, *[, ...])

Create a dataset sibling on a GIN site (with content hosting)

api.create_sibling_ria(url, name, *[, ...])

Create a sibling to a dataset in a RIA store

api.drop([path, what, reckless, dataset, ...])

Drop content of individual files or entire (sub)datasets

api.get([path, source, dataset, recursive, ...])

Get any dataset content (files/directories/subdatasets).

api.install([path, source, dataset, ...])

Install one or many datasets from remote URL(s) or local PATH source(s).

api.push([path, dataset, to, since, data, ...])

Push a dataset to a known sibling.

api.remove([path, dataset, drop, reckless, ...])

Remove components from datasets

api.save([path, message, dataset, ...])

Save the current state of a dataset

api.status([path, dataset, annex, ...])

Report on the state of dataset content.

api.update([path, sibling, merge, how, ...])

Update a dataset from a sibling.

api.unlock([path, dataset, recursive, ...])

Unlock file(s) of a dataset
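As a rough illustration of how the dataset operations listed above can be combined, the sketch below creates a dataset, saves one file, and queries its state. The helper name create_and_save, the file name notes.txt, and the optional api parameter (present only so the function can be exercised with a stand-in object) are assumptions made for this example. The keyword arguments path, dataset, message, and result_renderer reflect the DataLad Python API as commonly documented and should be checked against the installed version.

```python
from pathlib import Path


def create_and_save(path, api=None):
    """Create a dataset, add one file, save it, and return status records."""
    if api is None:
        # Lazy import: requires DataLad (and git-annex) to be installed.
        import datalad.api as api

    path = Path(path)
    api.create(path=str(path))
    # Hypothetical file, used only to give save() something to record.
    (path / "notes.txt").write_text("first note\n")
    api.save(dataset=str(path), message="Add notes")
    # result_renderer="disabled" asks for result records without printing them.
    return api.status(dataset=str(path), result_renderer="disabled")
```

With DataLad installed, calling create_and_save("my-dataset") would be expected to return a list of result records, one per reported path.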

Reproducible execution

api.run([cmd, dataset, inputs, outputs, ...])

Run an arbitrary shell command and record its impact on a dataset.

api.rerun([revision, since, dataset, ...])

Re-execute previous datalad run commands.

api.run_procedure([spec, dataset, discover, ...])

Run prepared procedures (DataLad scripts) on a dataset
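The reproducible-execution commands above can be sketched as a thin wrapper around api.run. The function name record_command and the optional api parameter are hypothetical and exist only for this illustration. The keyword arguments dataset, inputs, outputs, and message follow the signature summarized above; result_renderer is assumed to be accepted as a common keyword argument.

```python
def record_command(dataset, cmd, inputs=(), outputs=(), message=None, api=None):
    """Run `cmd` in `dataset` and let DataLad record its impact."""
    if api is None:
        # Lazy import: requires DataLad (and git-annex) to be installed.
        import datalad.api as api

    return api.run(
        cmd,
        dataset=dataset,
        inputs=list(inputs),    # files the command reads
        outputs=list(outputs),  # files the command is expected to modify
        message=message,        # commit message for the recorded run
        result_renderer="disabled",
    )
```

A command recorded this way could presumably be repeated later with api.rerun(dataset=...), which re-executes previous run records.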

Plumbing commands

api.clean(*[, dataset, what, dry_run, ...])

Clean up after DataLad (possible temporary files etc.)

api.clone(source[, path, git_clone_opts, ...])

Obtain a dataset (copy) from a URL or local directory

api.copy_file([path, dataset, recursive, ...])

Copy files and their availability metadata from one dataset to another.

api.create_test_dataset([path, spec, seed])

Create test (meta-)dataset.

api.diff([path, fr, to, dataset, annex, ...])

Report differences between two states of a dataset (hierarchy)

api.download_url(urls, *[, dataset, path, ...])

Download content

api.foreach_dataset(cmd, *[, cmd_type, ...])

Run a command or Python code on the dataset and/or each of its sub-datasets.

api.siblings([action, dataset, name, url, ...])

Manage sibling configuration

api.sshrun(login, cmd, *[, port, ipv4, ...])

Run command on remote machines via SSH.

api.subdatasets([path, dataset, state, ...])

Report subdatasets and their properties.
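Two of the plumbing commands above, api.clone and api.subdatasets, might be combined as in the following sketch. The helper name clone_and_list_subdatasets and the optional api parameter are assumptions for illustration. The "path" and "state" keys read from each result record reflect how subdataset reports are commonly structured, and should be verified against the installed DataLad version.

```python
def clone_and_list_subdatasets(source, path, api=None):
    """Clone `source` into `path` and return (path, state) per subdataset."""
    if api is None:
        # Lazy import: requires DataLad (and git-annex) to be installed.
        import datalad.api as api

    api.clone(source=source, path=path)
    records = api.subdatasets(dataset=path, result_renderer="disabled")
    # Each record is treated as a dict; "state" is read defensively in case
    # it is absent from a given record.
    return [(rec["path"], rec.get("state")) for rec in records]
```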

Miscellaneous commands

api.add_archive_content(archive, *[, ...])

Add content of an archive under git annex control.

api.add_readme([filename, dataset, existing])

Add basic information about DataLad datasets to a README file

api.addurls(urlfile, urlformat, ...[, ...])

Create and update a dataset from a list of URLs.

api.check_dates(paths, *[, reference_date, ...])

Find repository dates that are more recent than a reference date.

api.configuration([action, spec, scope, ...])

Get and set dataset, dataset-clone-local, or global configuration

api.export_archive([filename, dataset, ...])

Export the content of a dataset as a TAR/ZIP archive.

api.export_archive_ora(target[, opts, ...])

Export an archive of a local annex object store for the ORA remote.

api.export_to_figshare([filename, dataset, ...])

Export the content of a dataset as a ZIP archive to figshare

api.no_annex(dataset, pattern[, ref_dir, ...])

Configure a dataset to never put some content into the dataset's annex

api.shell_completion()

Display shell script for enabling shell completion for DataLad.

api.wtf(*[, dataset, sensitive, sections, ...])

Generate a report about the DataLad installation and configuration

Support functionality

cmd

Class that starts a subprocess and keeps it around to communicate with it via stdin.

consts

Constants for DataLad

log

Logging setup and utilities, including progress reporting

utils

version

support.gitrepo

Internal low-level interface to Git repositories

support.annexrepo

Interface to git-annex by Joey Hess.

support.archives

Various handlers/functionality for different types of files (e.g. for archives).

support.extensions

Support functionality for extension development

customremotes.base

Base classes for custom git-annex remotes (e.g. extraction from archives).

customremotes.archives

Custom remote to retrieve content from archives present under annex

runner.nonasyncrunner

Thread-based subprocess execution with stdout and stderr passed to protocol objects

runner.protocol

Base class of a protocol to be used with the DataLad runner

Configuration management

config

Test infrastructure

tests.utils_pytest

Miscellaneous utilities to assist with testing

tests.utils_testrepos

tests.heavyoutput

Helper to provide heavy load on stdout and stderr

Command interface

interface.base

High-level interface generation

Command line interface infrastructure

cli.exec

Call a command interface

cli.main

This is the main() CLI entrypoint

cli.parser

Components to build the parser instance for the CLI

cli.renderer

Render results in a terminal