datalad.api.Dataset

class datalad.api.Dataset(*args, **kwargs)

Representation of a DataLad dataset/repository

This is the core data type of DataLad: a representation of a dataset. Fundamentally, datasets are (git-annex enabled) Git repositories. This class provides all operations that can be performed on a dataset.

Creating a dataset instance is cheap; all actual operations are deferred until they are needed. Creating multiple Dataset class instances for the same dataset location automatically yields references to the same object.
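The two behaviours described above (deferred work and one shared object per location) can be illustrated with a small, self-contained sketch. This is not DataLad's implementation; the class name `LazyHandle` and its `listing` property are hypothetical and exist only for this illustration.

```python
import weakref
from pathlib import Path


class LazyHandle:
    """Hypothetical path-bound handle; a stand-in, not DataLad code."""

    # One live instance per resolved path. Entries disappear once no
    # other reference to the instance remains.
    _instances = weakref.WeakValueDictionary()

    def __new__(cls, path):
        key = str(Path(path).resolve())
        existing = cls._instances.get(key)
        if existing is not None:
            return existing  # reuse the object already bound to this path
        obj = super().__new__(cls)
        obj._key = key
        obj._listing = None  # nothing has been read from disk yet
        cls._instances[key] = obj
        return obj

    @property
    def listing(self):
        # The expensive part (touching the file system) runs on first access.
        if self._listing is None:
            location = Path(self._key)
            if location.is_dir():
                self._listing = sorted(p.name for p in location.iterdir())
            else:
                self._listing = []
        return self._listing


first = LazyHandle("some/dataset")
second = LazyHandle("some/dataset")
assert first is second          # same location, same object
assert first._listing is None   # construction did no file system work
```

A weak-value mapping is used here so that cached handles do not keep otherwise unused objects alive; whether DataLad makes the same choice is not stated on this page.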

A dataset instance comprises two major components: a repo attribute and a config attribute. The former offers access to low-level functionality of the Git or git-annex repository. The latter gives access to a dataset’s configuration manager.

Most functionality is available via methods of this class, but also as stand-alone functions with the same name in datalad.api.
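As a rough illustration of the two calling styles, the sketch below runs the `status` command once as a method of a Dataset instance and once as the stand-alone function from datalad.api. The helper name `status_both_ways` is hypothetical, and the sketch assumes that `path` points to an already installed dataset and that DataLad and Git are available.

```python
def status_both_ways(path):
    """Run `status` as a bound method and as a stand-alone function."""
    # Deferred import, so this sketch can be loaded without DataLad present.
    import datalad.api as dl

    ds = dl.Dataset(path)
    via_method = ds.status()              # method bound to the instance
    via_function = dl.status(dataset=ds)  # same command, dataset passed explicitly
    return via_method, via_function
```

In the function form, the dataset is identified through the `dataset` argument that appears in the method signatures listed below.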

__init__(path)

Parameters: path (str or Path) – Path to the dataset location. This location may or may not exist yet.
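Because the location does not have to exist, an instance can be created first and inspected afterwards. The following is a minimal sketch under that assumption; the helper name `inspect_dataset` and the keys of the returned dictionary are hypothetical, while `path`, `is_installed()` and `repo` are taken from the tables below.

```python
def inspect_dataset(path):
    """Collect basic facts about the dataset location at `path`."""
    # Deferred import, so this sketch can be loaded without DataLad present.
    from datalad.api import Dataset

    ds = Dataset(path)
    return {
        "path": ds.path,                      # location the instance is bound to
        "installed": ds.is_installed(),       # whether a dataset is installed there
        "has_repo": ds.repo is not None,      # repo is None if there is none yet
        "same_object": Dataset(path) is ds,   # same location, same instance
    }
```

For a location that has not been created yet, `installed` and `has_repo` would be expected to be false, while `same_object` should still be true.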

Methods

`__init__`(path)	Create a dataset instance for the location at path.
`add_archive_content`(*[, dataset, annex, ...])	Add content of an archive under git annex control.
`add_readme`(*[, dataset, existing])	Add basic information about DataLad datasets to a README file
`addurls`(urlformat, filenameformat, *[, ...])	Create and update a dataset from a list of URLs.
`clean`(*[, what, dry_run, recursive, ...])	Clean up after DataLad (possible temporary files etc.)
`clone`([path, git_clone_opts, dataset, ...])	Obtain a dataset (copy) from a URL or local directory
`close`()	Perform operations which would close any possible process using this Dataset
`configuration`([spec, scope, dataset, ...])	Get and set dataset, dataset-clone-local, or global configuration
`copy_file`(*[, dataset, recursive, ...])	Copy files and their availability metadata from one dataset to another.
`create`([initopts, force, description, ...])	Create a new dataset from scratch.
`create_sibling`(*[, name, target_dir, ...])	Create a dataset sibling on a UNIX-like Shell (local or SSH)-accessible machine
`create_sibling_gin`(*[, dataset, recursive, ...])	Create a dataset sibling on a GIN site (with content hosting)
`create_sibling_gitea`(*[, dataset, ...])	Create a dataset sibling on a Gitea site
`create_sibling_github`(*[, dataset, ...])	Create dataset sibling on GitHub.org (or an enterprise deployment).
`create_sibling_gitlab`(*[, site, project, ...])	Create dataset sibling at a GitLab site
`create_sibling_gogs`(*[, api, dataset, ...])	Create a dataset sibling on a GOGS site
`create_sibling_ria`(name, *[, dataset, ...])	Creates a sibling to a dataset in a RIA store
`diff`(*[, fr, to, dataset, annex, untracked, ...])	Report differences between two states of a dataset (hierarchy)
`download_url`(*[, dataset, path, overwrite, ...])	Download content
`drop`(*[, what, reckless, dataset, ...])	Drop content of individual files or entire (sub)datasets
`export_archive`(*[, dataset, archivetype, ...])	Export the content of a dataset as a TAR/ZIP archive.
`export_archive_ora`([opts, dataset, remote, ...])	Export an archive of a local annex object store for the ORA remote.
`export_to_figshare`(*[, dataset, ...])	Export the content of a dataset as a ZIP archive to figshare
`foreach_dataset`(*[, cmd_type, dataset, ...])	Run a command or Python code on the dataset and/or each of its sub-datasets.
`get`(*[, source, dataset, recursive, ...])	Get any dataset content (files/directories/subdatasets).
`get_superdataset`([datalad_only, topmost, ...])	Get the dataset's superdataset
`install`(*[, source, dataset, get_data, ...])	Install one or many datasets from remote URL(s) or local PATH source(s).
`is_installed`()	Returns whether a dataset is installed.
`no_annex`(pattern[, ref_dir, makedirs])	Configure a dataset to never put some content into the dataset's annex
`push`(*[, dataset, to, since, data, force, ...])	Push a dataset to a known sibling.
`recall_state`(whereto)	Something that can be used to check out a particular state (tag, commit) to "undo" a change or switch to an otherwise desired previous state.
`remove`(*[, dataset, drop, reckless, ...])	Remove components from datasets
`rerun`(*[, since, dataset, branch, message, ...])	Re-execute previous datalad run commands.
`run`(*[, dataset, inputs, outputs, expand, ...])	Run an arbitrary shell command and record its impact on a dataset.
`run_procedure`(*[, dataset, discover, help_proc])	Run prepared procedures (DataLad scripts) on a dataset
`save`(*[, message, dataset, version_tag, ...])	Save the current state of a dataset
`siblings`(*[, dataset, name, url, pushurl, ...])	Manage sibling configuration
`status`(*[, dataset, annex, untracked, ...])	Report on the state of dataset content.
`subdatasets`(*[, dataset, state, fulfilled, ...])	Report subdatasets and their properties.
`uninstall`(*[, dataset, recursive, check, ...])	DEPRECATED: use the drop command
`unlock`(*[, dataset, recursive, recursion_limit])	Unlock file(s) of a dataset
`update`(*[, sibling, merge, how, how_subds, ...])	Update a dataset from a sibling.
`wtf`(*[, sensitive, sections, flavor, decor, ...])	Generate a report about the DataLad installation and configuration

Attributes

`config`	Get a `ConfigManager` instance for a dataset's configuration
`id`	Identifier of the dataset.
`path`	Path to the dataset location, as a string
`pathobj`	Path to the dataset location, as a `Path` object
`repo`	Get an instance of the version control system/repo for this dataset, or None if there is none yet (or none anymore).