datalad.api.remove(path=None, *, dataset=None, drop='datasets', reckless=None, message=None, jobs=None, recursive=None, check=None, save=None, if_dirty=None)

Remove components from datasets

Removing “unlinks” a dataset component, such as a file or subdataset, from a dataset. Such a removal advances the state of a dataset, just like adding new content. A remove operation can be undone by restoring a previous dataset state, but doing so might require re-obtaining file content and subdatasets from remote locations.

This command relies on the ‘drop’ command for safe operation. By default, only the file content of datasets that will be uninstalled as part of a removal is dropped. Other file content is retained, so that restoring a previous version also immediately restores file content access, just as is the case for files committed directly to Git. This default behavior can be changed to always drop content prior to removal, for cases where a minimal storage footprint of local dataset installations is desirable.
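For example, to always drop file content prior to removal (an illustrative call; the paths are placeholders):

> remove(dataset='path/to/dataset', path='large/data', drop='all')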

Removing a dataset component is always a recursive operation. Removing a directory removes all content underneath that directory too. If subdatasets are located under a to-be-removed path, they will be uninstalled entirely and all of their content dropped. If any subdataset cannot be uninstalled safely, the remove operation will fail and halt.
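For instance, removing a directory that contains a subdataset also uninstalls that subdataset and drops its content (illustrative paths):

> remove(dataset='path/to/dataset', path='directory/with/subds')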

Changed in version 0.16: More in-depth and comprehensive safety checks are now performed by default. The if_dirty argument is ignored, will be removed in a future release, and can be dropped from calls for a safe-by-default behavior; for other cases consider the reckless argument. The save argument is ignored and will be removed in a future release; a dataset modification is now always saved. Consider save’s amend argument for post-remove fix-ups. The recursive argument is ignored and will be removed in a future release; removal operations are always recursive, and the parameter can be stripped from calls for a safe-by-default behavior.

Deprecated since version 0.16: The check argument will be removed in a future release. Use the reckless argument instead.


Permanently remove a subdataset (and all further subdatasets contained in it) from a dataset:

> remove(dataset='path/to/dataset', path='path/to/subds')

Permanently remove a superdataset (with all subdatasets) from the filesystem:

> remove(dataset='path/to/dataset')

DANGER-ZONE: Quickly wipe out a dataset and all its subdatasets, bypassing all safety checks:

> remove(dataset='path/to/dataset', reckless='kill')
  • path (sequence of str or None, optional) – path of a dataset or dataset component to be removed. [Default: None]

  • dataset (Dataset or None, optional) – specify the dataset to perform remove from. If no dataset is given, the current working directory is used as operation context. [Default: None]

  • drop ({'datasets', 'all'}, optional) – which dataset components to drop prior to removal. This parameter is passed on to the underlying drop operation as its ‘what’ argument. [Default: ‘datasets’]

  • reckless ({'modification', 'availability', 'undead', 'kill', None}, optional) – disable individual or all data safety measures that would normally prevent potentially irreversible data loss. With ‘modification’, unsaved modifications in a dataset will not be detected. This improves performance at the cost of permitting potential loss of unsaved or untracked dataset components. With ‘availability’, detection of dataset/branch-states that are only available in the local dataset, and detection of an insufficient number of file content copies will be disabled. Especially the latter is a potentially expensive check which might involve numerous network transactions. With ‘undead’, detection of whether a to-be-removed local annex is still known to exist in the network of dataset-clones is disabled. This could cause zombie-records of invalid file availability. With ‘kill’, all safety-checks are disabled. [Default: None]
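For example, to skip only the potentially expensive availability checks while keeping the other safety measures (illustrative call; paths are placeholders):

> remove(dataset='path/to/dataset', path='path/to/subds', reckless='availability')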

  • message (str or None, optional) – a description of the state or the changes made to a dataset. [Default: None]

  • jobs (int or None or {'auto'}, optional) – how many parallel jobs (where possible) to use. “auto” corresponds to the number defined by the ‘datalad.runtime.max-annex-jobs’ configuration item. NOTE: This option can only parallelize input retrieval (get) and output recording (save). DataLad does NOT parallelize your scripts for you. [Default: None]

  • recursive – DEPRECATED and IGNORED: removal is always a recursive operation. [Default: None]

  • check (bool, optional) – DEPRECATED: use reckless='availability' instead. [Default: None]

  • save (bool, optional) – DEPRECATED and IGNORED; use save’s amend argument instead. [Default: None]

  • if_dirty – DEPRECATED and IGNORED: use the reckless argument instead. [Default: None]

  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’ any failure is reported, but does not cause an exception; ‘continue’ if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. Raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
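The ‘continue’ semantics described above can be pictured with a small stand-in sketch (generic Python, not DataLad’s actual implementation; the IncompleteResultsError class below is a stub of the real exception, with only the failed attribute the documentation mentions):

```python
# Stub of the real exception: carries the failed result dictionaries.
class IncompleteResultsError(Exception):
    def __init__(self, failed):
        super().__init__(f"{len(failed)} result(s) failed")
        self.failed = failed


def process(results):
    """Sketch of on_failure='continue': keep going, raise once at the end."""
    failed = []
    returned = []
    for res in results:
        # A failure is any result with status 'impossible' or 'error'.
        if res.get('status') in ('impossible', 'error'):
            failed.append(res)
            continue
        returned.append(res)
    if failed:
        raise IncompleteResultsError(failed)
    return returned
```

With ‘stop’, by contrast, the exception would be raised immediately on the first failing result.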

  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False or a ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
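A minimal example of such a callable (hypothetical sketch; result records carry ‘action’ and ‘status’ keys, among others):

```python
# Hypothetical result_filter: keep only results reporting a successful removal.
def removed_ok(res, **kwargs):
    return res.get('action') == 'remove' and res.get('status') == 'ok'
```

Passed as result_filter=removed_ok, only matching result records would be returned.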

  • result_renderer – select rendering mode for command results. ‘tailored’ enables a command-specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the ‘generic’ result renderer; ‘generic’ renders each result on one line with key info like action, status, path, and an optional message; ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]

  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top- level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
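Instead of a pre-crafted label, a custom transformation can be supplied, for example one that reduces each result record to its path (hypothetical sketch):

```python
# Hypothetical result_xfm: return only the 'path' value of each result record.
def to_path(res):
    return res['path']
```

Passed as result_xfm=to_path, the command would return plain paths rather than full result dictionaries.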

  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’ a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: ‘list’]
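With ‘generator’, results can be consumed as they are produced rather than collected into a list first (illustrative call; the path is a placeholder):

> for res in remove(dataset='path/to/dataset', path='subds', return_type='generator'): print(res['status'])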