Python module reference

This module reference extends the manual with a comprehensive overview of the functionality built into DataLad. Each module in the package is documented by a general summary of its purpose and the list of classes and functions it provides.
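
The listings in this reference abbreviate signatures with "…" and show only a short summary per entry. As a minimal sketch (not necessarily how this reference was generated), a comparable overview with full signatures can be produced for any importable module using the standard `inspect` module. The helper name `summarize` is hypothetical and not part of DataLad.

```python
import inspect


def summarize(module):
    """Return (name, signature, first docstring line) for public callables."""
    rows = []
    for name, obj in sorted(vars(module).items()):
        # Skip private names and non-callable attributes (constants, submodules).
        if name.startswith("_") or not callable(obj):
            continue
        try:
            signature = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # Some callables (e.g. certain builtins) expose no signature.
            signature = "(...)"
        doc = inspect.getdoc(obj) or ""
        summary = doc.splitlines()[0] if doc else ""
        rows.append((name, signature, summary))
    return rows
```

Assuming DataLad is installed, something like `for row in summarize(datalad.api): print(*row)` after `import datalad.api` should print the unabbreviated counterparts of the entries listed below.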

High-level user interface

Dataset operations

api.Dataset(path) Representation of a DataLad dataset/repository
api.add([path, dataset, to_git, save, …]) Add files/directories to an existing dataset.
api.create([path, force, description, …]) Create a new dataset from scratch.
api.create_sibling(sshurl[, name, …]) Create a dataset sibling on a UNIX-like SSH-accessible machine
api.create_sibling_github(reponame[, …]) Create dataset sibling on GitHub.
api.drop([path, dataset, recursive, …]) Drop file content from datasets
api.plugin([plugin, dataset, …]) Generic plugin interface
api.get([path, source, dataset, recursive, …]) Get any dataset content (files/directories/subdatasets).
api.install([path, source, dataset, …]) Install a dataset from a (remote) source.
api.publish([path, dataset, to, since, …]) Publish a dataset to a known sibling.
api.remove([path, dataset, recursive, …]) Remove components from datasets
[message, path, dataset, …]) Save the current state of a dataset
api.update([path, sibling, merge, dataset, …]) Update a dataset from a sibling.
api.uninstall([path, dataset, recursive, …]) Uninstall subdatasets
api.unlock([path, dataset, recursive, …]) Unlock file(s) of a dataset
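
The dataset operations above are meant to be combined. The following is a minimal sketch of a create-then-add workflow; it uses only the keyword names that appear in the abbreviated signatures above (`path`, `description`, `dataset`, `save`). The helper `create_and_add` is hypothetical, and the `api` object is passed in as a parameter (assumed to behave like `datalad.api`) so the sketch can be exercised without DataLad installed. Return values are passed through untouched because their exact form is not specified here.

```python
def create_and_add(api, dataset_path, files, description=None):
    """Create a new dataset at `dataset_path` and add `files` to it.

    `api` is assumed to expose `create` and `add` with the keyword
    arguments shown in the listing above (an assumption of this sketch).
    """
    # Create a new dataset from scratch.
    api.create(path=dataset_path, description=description)
    # Add the given files to the new dataset and save the result.
    return api.add(path=files, dataset=dataset_path, save=True)
```

With DataLad installed, this would presumably be called as `create_and_add(datalad.api, "/tmp/demo", ["/tmp/demo/data.csv"])` after `import datalad.api`; both paths are placeholders.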

Meta data handling

[, dataset, search, report, …]) Search within the meta data available in datasets
api.aggregate_metadata(dataset[, …]) Aggregate meta data of a dataset for later query.

Plumbing commands

api.annotate_paths([path, dataset, …]) Analyze and act upon input paths
api.clean([dataset, what, recursive, …]) Clean up after DataLad (possible temporary files etc.)
api.clone(source[, path, dataset, …]) Obtain a dataset copy from a URL or local source (path)
api.create_test_dataset([path, spec, seed]) Create test (meta-)dataset.
api.diff([path, dataset, revision, staged, …]) Report changes of dataset components.
api.download_url(urls[, path, overwrite, …]) Download content
[, recursive, fast, all_, long_, …]) List summary information about URLs and dataset(s)
api.sshrun(login, cmd[, port, no_stdin]) Run command on remote machines via SSH.
api.siblings([action, dataset, name, url, …]) Manage sibling configuration
api.subdatasets([dataset, fulfilled, …]) Report subdatasets and their properties.

Miscellaneous commands

api.add_archive_content(archive[, annex, …]) Add content of an archive under git annex control.
api.crawl([path, is_pipeline, is_template, …]) Crawl online resource to create or update a dataset.
api.crawl_init([args, template, …]) Initialize crawling configuration
api.test([module, verbose, nocapture, pdb, stop]) Run internal DataLad (unit)tests.


Plugins

DataLad can be customized by plugins. The following plugins are shipped with DataLad.

add_readme add a README file to a dataset
export_tarball export a dataset to a tarball
no_annex configure which dataset parts to never put in the annex
wtf provide information about this DataLad installation

Support functionality

auto Proxy basic file operations (e.g. …)
cmd Wrapper for command and function calls, allowing for dry runs and output handling
consts Constants for DataLad
version Defines version to be imported in the module and obtained from …
support.annexrepo Interface to git-annex by Joey Hess.
support.archives Various handlers/functionality for different types of files (e.g. …)
customremotes.base Base classes for custom git-annex remotes (e.g. …)
customremotes.archives Custom remote to support getting the load from archives present under annex

Configuration management



crawler.base Crawling of external resources (e.g. …)
crawler.pipeline Pipeline functionality.

Test infrastructure

tests.utils Miscellaneous utilities to assist with testing
tests.heavyoutput Helper to provide heavy load on stdout and stderr

Command line interface infrastructure