Python module reference

This module reference extends the manual with a comprehensive overview of the functionality built into DataLad. Each module in the package is documented with a short summary of its purpose and a list of the classes and functions it provides.

High-level user interface

Dataset operations

api.Dataset(*args, **kwargs)

Representation of a DataLad dataset/repository

api.create([path, initopts, force, ...])

Create a new dataset from scratch.

api.create_sibling(sshurl, *[, name, ...])

Create a dataset sibling on a UNIX-like, shell-accessible machine (local or via SSH)

api.create_sibling_github(reponame, *[, ...])

Create a dataset sibling on GitHub.com (or a GitHub Enterprise deployment).

api.create_sibling_gitlab([path, site, ...])

Create a dataset sibling at a GitLab site

api.create_sibling_gogs(reponame, *[, api, ...])

Create a dataset sibling on a GOGS site

api.create_sibling_gitea(reponame, *[, ...])

Create a dataset sibling on a Gitea site

api.create_sibling_gin(reponame, *[, ...])

Create a dataset sibling on a GIN site (with content hosting)

api.create_sibling_ria(url, name, *[, ...])

Create a sibling of a dataset in a RIA store

api.drop([path, what, reckless, dataset, ...])

Drop content of individual files or entire (sub)datasets

api.get([path, source, dataset, recursive, ...])

Get any dataset content (files/directories/subdatasets).

api.install([path, source, dataset, ...])

Install one or many datasets from remote URL(s) or local PATH source(s).

api.push([path, dataset, to, since, data, ...])

Push a dataset to a known sibling.

api.remove([path, dataset, drop, reckless, ...])

Remove components from datasets

api.save([path, message, dataset, ...])

Save the current state of a dataset

api.status([path, dataset, annex, ...])

Report on the state of dataset content.

api.update([path, sibling, merge, how, ...])

Update a dataset from a sibling.

api.unlock([path, dataset, recursive, ...])

Unlock file(s) of a dataset
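The dataset operations above are usually combined in a create, modify, save cycle. The following is a minimal sketch, assuming a recent DataLad release where `datalad.api` commands return lists of result records (dicts) and `status` records carry a `state` field. The helper `summarize_states`, the function `create_save_and_inspect`, and the file name `notes.txt` are illustrative only and not part of DataLad.

```python
import os
from collections import Counter
from typing import Iterable, Mapping, Tuple


def summarize_states(results: Iterable[Mapping]) -> Counter:
    """Count status result records by their 'state' field (e.g. 'clean', 'untracked')."""
    return Counter(r.get("state", "unknown") for r in results)


def create_save_and_inspect(path: str) -> Tuple[Counter, Counter]:
    # Hypothetical helper. Needs DataLad, Git, and git-annex when called;
    # the import is deferred so that merely defining this function works without them.
    import datalad.api as dl

    ds = dl.create(path=path)  # returns a Dataset instance
    with open(os.path.join(path, "notes.txt"), "w") as fh:
        fh.write("hello\n")
    before = summarize_states(ds.status())  # notes.txt is expected to appear as untracked
    ds.save(message="Add notes")
    after = summarize_states(ds.status())   # expected: no untracked content remains
    return before, after
```

Keeping the result-record handling in a plain function makes it usable with any command that reports `state`, for example `diff` as well as `status`.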

Reproducible execution

api.run([cmd, dataset, inputs, outputs, ...])

Run an arbitrary shell command and record its impact on a dataset.

api.rerun([revision, since, dataset, ...])

Re-execute previous datalad run commands.

api.run_procedure([spec, dataset, discover, ...])

Run prepared procedures (DataLad scripts) on a dataset
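`run` stores the executed command together with its declared inputs and outputs in the resulting commit, which is what allows `rerun` to repeat it later. The sketch below assumes the `{inputs}`/`{outputs}` command placeholders and the common `on_failure="ignore"` option (which returns failing result records instead of raising). DataLad marks failed actions with the status values `impossible` or `error`. `failures` and `record_and_replay` are hypothetical helpers, and `notes.txt`/`notes.count` are example paths.

```python
from typing import Iterable, List, Mapping

# Result-record status values that signal a failed action.
FAILURE_STATES = {"impossible", "error"}


def failures(results: Iterable[Mapping]) -> List[Mapping]:
    """Return only the result records that report a failure."""
    return [r for r in results if r.get("status") in FAILURE_STATES]


def record_and_replay(dataset_path: str) -> None:
    # Hypothetical helper. Needs DataLad at call time; the import is deferred.
    import datalad.api as dl

    results = dl.run(
        cmd="wc -l {inputs} > {outputs}",  # placeholders expand to the paths below
        dataset=dataset_path,
        inputs=["notes.txt"],
        outputs=["notes.count"],
        message="Count lines in notes.txt",
        on_failure="ignore",  # collect failures instead of raising
    )
    failed = failures(results)
    if failed:
        raise RuntimeError(f"run failed: {failed}")
    # Re-execute the most recent recorded command (HEAD by default).
    dl.rerun(dataset=dataset_path)
```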

Plumbing commands

api.clean(*[, dataset, what, dry_run, ...])

Clean up after DataLad (for example, temporary files it may have left behind)

api.clone(source[, path, git_clone_opts, ...])

Obtain a dataset (copy) from a URL or local directory

api.copy_file([path, dataset, recursive, ...])

Copy files and their availability metadata from one dataset to another.

api.create_test_dataset([path, spec, seed])

Create test (meta-)dataset.

api.diff([path, fr, to, dataset, annex, ...])

Report differences between two states of a dataset (hierarchy)

api.download_url(urls, *[, dataset, path, ...])

Download content from one or more URLs

api.foreach_dataset(cmd, *[, cmd_type, ...])

Run a command or Python code on the dataset and/or each of its sub-datasets.

api.siblings([action, dataset, name, url, ...])

Manage sibling configuration

api.sshrun(login, cmd, *[, port, ipv4, ...])

Run command on remote machines via SSH.

api.subdatasets([path, dataset, state, ...])

Report subdatasets and their properties.
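`clone` and `subdatasets` are often used together to see which nested datasets a fresh clone has registered but not yet installed. This sketch assumes that `subdatasets` accepts `state="absent"` and that its result records carry an absolute `path`. Both `relative_subdataset_paths` and `clone_and_list_absent` are hypothetical helpers.

```python
import os
from typing import Iterable, List, Mapping


def relative_subdataset_paths(records: Iterable[Mapping], root: str) -> List[str]:
    """Convert the absolute 'path' of each subdataset record into a path relative to root."""
    return sorted(os.path.relpath(r["path"], root) for r in records)


def clone_and_list_absent(source: str, path: str) -> List[str]:
    # Hypothetical helper. Needs DataLad at call time; the import is deferred.
    import datalad.api as dl

    ds = dl.clone(source=source, path=path)
    # Restrict the report to subdatasets that are registered but not installed.
    absent = ds.subdatasets(state="absent")
    return relative_subdataset_paths(absent, ds.path)
```

The returned relative paths can then be passed to `get` to install only the subdatasets that are actually needed.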

Miscellaneous commands

api.add_archive_content(archive, *[, ...])

Add content of an archive under git annex control.

api.add_readme([filename, dataset, existing])

Add basic information about DataLad datasets to a README file

api.addurls(urlfile, urlformat, ...[, ...])

Create and update a dataset from a list of URLs.

api.check_dates(paths, *[, reference_date, ...])

Find repository dates that are more recent than a reference date.

api.configuration([action, spec, scope, ...])

Get and set dataset, dataset-clone-local, or global configuration

api.export_archive([filename, dataset, ...])

Export the content of a dataset as a TAR/ZIP archive.

api.export_archive_ora(target[, opts, ...])

Export an archive of a local annex object store for the ORA remote.

api.export_to_figshare([filename, dataset, ...])

Export the content of a dataset as a ZIP archive to figshare

api.no_annex(dataset, pattern[, ref_dir, ...])

Configure a dataset to never put certain content into its annex

api.shell_completion()

Display shell script for enabling shell completion for DataLad.

api.wtf(*[, dataset, sensitive, sections, ...])

Generate a report about the DataLad installation and configuration
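`addurls` builds each URL and target file name from a table by filling Python format strings with the values of each row. The sketch below previews that expansion for a CSV table with plain `str.format`. It is an approximation: `addurls` itself supports more than plain field substitution (for example, automatically derived fields), which this sketch ignores. The column names `name` and `url`, and the helpers `preview_addurls` and `add_from_csv`, are illustrative assumptions.

```python
import csv
import io
from typing import Dict, List, Tuple


def preview_addurls(csv_text: str, urlformat: str, filenameformat: str) -> List[Tuple[str, str]]:
    """Expand the URL and file name format strings for every CSV row."""
    rows: List[Dict[str, str]] = list(csv.DictReader(io.StringIO(csv_text)))
    return [(urlformat.format(**row), filenameformat.format(**row)) for row in rows]


def add_from_csv(csv_path: str, dataset_path: str):
    # Hypothetical helper. Needs DataLad at call time; the import is deferred.
    import datalad.api as dl

    return dl.addurls(
        urlfile=csv_path,
        urlformat="{url}",
        filenameformat="data/{name}",
        dataset=dataset_path,
    )
```

Previewing the expansion locally is a cheap way to catch typos in column names before any download starts.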

Support functionality

cmd

Class that starts a subprocess and keeps it around to communicate with it via stdin.

consts

Constants for DataLad

log

Logging setup and utilities, including progress reporting

utils

version

support.gitrepo

Internal low-level interface to Git repositories

support.annexrepo

Interface to git-annex by Joey Hess.

support.archives

Various handlers and functionality for different types of files.

support.extensions

Support functionality for extension development

customremotes.base

Base classes for custom git-annex remotes.

customremotes.archives

Custom git-annex remote to retrieve content from archives that are themselves under annex control
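Extension code usually logs through a child of DataLad's own logger rather than configuring logging itself. The sketch below uses only the standard library and assumes that the `log` module attaches DataLad's handler and level to the top-level `datalad` logger; `datalad.myextension` and `double_all` are hypothetical names.

```python
import logging

# Hypothetical extension logger; the "datalad." prefix places it under DataLad's logger,
# so records propagate to whatever handler and level DataLad has configured there.
lgr = logging.getLogger("datalad.myextension")


def double_all(items):
    """Toy workload that reports progress through the extension logger."""
    lgr.debug("processing %d items", len(items))
    return [i * 2 for i in items]
```

Because the child logger sets no handler or level of its own, verbosity is controlled in one place, on the `datalad` logger, for both DataLad and the extension.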

Configuration management

config
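DataLad's configuration uses Git-style keys such as `user.name` or `remote.origin.url`, and the process-wide manager instance is available as `datalad.cfg`. Git treats the first dot-separated component of a key as the section, the last as the variable name, and anything in between as an optional subsection. The sketch below illustrates that rule; `split_config_key` and `read_user_name` are hypothetical helpers, not part of the `config` module.

```python
from typing import Optional, Tuple


def split_config_key(key: str) -> Tuple[str, Optional[str], str]:
    """Split a Git-style key into (section, subsection, variable)."""
    section, _, rest = key.partition(".")
    if not rest:
        raise ValueError(f"not a section.variable key: {key!r}")
    # Subsections may themselves contain dots, so split on the LAST dot only.
    subsection, sep, variable = rest.rpartition(".")
    return section, (subsection if sep else None), variable


def read_user_name() -> Optional[str]:
    # Hypothetical helper. Needs DataLad at call time; the import is deferred.
    from datalad import cfg  # process-wide ConfigManager instance

    return cfg.get("user.name")
```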

Test infrastructure

tests.utils_pytest

Miscellaneous utilities to assist with testing

tests.utils_testrepos

tests.heavyoutput

Helper to provide heavy load on stdout and stderr

Command interface

interface.base

High-level interface generation

Command line interface infrastructure

cli.exec

Call a command interface

cli.main

The main() CLI entry point

cli.parser

Components to build the parser instance for the CLI

cli.renderer

Render results in a terminal
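Besides rendering results for a terminal, the command line can emit each result record as one JSON object per line via the main option `-f json`, which is convenient for scripting. The sketch below assumes that output mode and the `-d` dataset option of `status`; `parse_json_lines` and `cli_status` are hypothetical helpers.

```python
import json
import subprocess
from typing import Dict, List


def parse_json_lines(text: str) -> List[Dict]:
    """Parse one JSON result record per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def cli_status(dataset_path: str) -> List[Dict]:
    # Hypothetical helper. Requires the `datalad` executable on PATH when called.
    proc = subprocess.run(
        ["datalad", "-f", "json", "status", "-d", dataset_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_json_lines(proc.stdout)
```

Log messages go to stderr, so parsing only stdout keeps the records clean.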