datalad.api.copy_file(path=None, dataset=None, recursive=False, target_dir=None, specs_from=None, message=None)

Copy files and their availability metadata from one dataset to another.

Unlike a system copy command, this command also copies content availability information, such as registered URLs, to the target dataset. Moreover, git-annex special remote configurations that may be required are detected in the source dataset and applied to the target dataset in an analogous fashion. A file whose content is not available locally can still be copied, by copying just the required metadata on content identity and availability.


At the moment, only URLs for the special remotes ‘web’ (git-annex built-in) and ‘datalad’ are recognized and transferred.

The interface is modeled after the POSIX ‘cp’ command, but with one additional way to specify what to copy where: specs_from allows the caller to flexibly input source-destination path pairs.
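As a minimal sketch of the pair-based input, the snippet below builds a list of source/destination 2-tuples of the kind specs_from is documented to accept. The helper names build_copy_pairs and copy_pairs, and all paths, are hypothetical and chosen only for illustration. How relative destination paths are resolved is not stated here, so treat that as something to verify.

```python
from pathlib import PurePosixPath


def build_copy_pairs(sources, target_dir):
    """Return (source, destination) 2-tuples that keep each file name under target_dir."""
    target = PurePosixPath(target_dir)
    return [(str(src), str(target / PurePosixPath(src).name)) for src in sources]


def copy_pairs(pairs, dataset_path):
    """Hand the pairs to copy_file; the import is deferred because it needs DataLad installed."""
    from datalad.api import copy_file
    return copy_file(dataset=dataset_path, specs_from=pairs)


# Hypothetical source files and target directory, for illustration only.
pairs = build_copy_pairs(["raw/a.dat", "raw/b.dat"], "path/to/myds/inputs")
```

With DataLad available, copy_pairs(pairs, 'path/to/myds') would pass these pairs on and save the result in the given dataset.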

This command can copy files out of and into a hierarchy of nested datasets. Unlike with other DataLad commands, the recursive switch does not enable recursion into subdatasets. Instead, it is analogous to the POSIX ‘cp’ command switch and enables subdirectory recursion, regardless of dataset boundaries. It is not necessary to enable recursion in order to save changes made to nested target subdatasets.


Copy a file into a dataset ‘myds’ using a path and a target directory specification, and save its addition to ‘myds’:

> copy_file('path/to/myfile', dataset='path/to/myds')

Copy a file to a dataset ‘myds’ and save it under a new name by providing two paths:

> copy_file(path=['path/to/myfile', 'path/to/myds/newname'],
            dataset='path/to/myds')

Copy a file into a dataset without saving it:

> copy_file('path/to/myfile', target_dir='path/to/myds/')

Copy a directory and its subdirectories into a dataset ‘myds’ and save the addition in ‘myds’:

> copy_file('path/to/dir/', recursive=True, dataset='path/to/myds')

Copy files using a path and optionally target specification from a file:

> copy_file(dataset='path/to/myds', specs_from='path/to/specfile')
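A spec file like the one used in the last example could be produced as sketched below. According to the specs_from parameter description, each line holds either a source path or a source/destination pair separated by a null byte. The helper name write_copy_specs and the example paths are assumptions made for illustration, not part of DataLad.

```python
from pathlib import Path


def write_copy_specs(spec_path, pairs):
    """Write one spec per line: 'source', or 'source<NUL>destination'.

    `pairs` is an iterable of (source, destination) tuples; a destination
    of None produces a source-only line.
    """
    lines = []
    for source, destination in pairs:
        if destination is None:
            lines.append(str(source))
        else:
            # A null byte separates source and destination on the same line.
            lines.append(f"{source}\0{destination}")
    Path(spec_path).write_bytes(("\n".join(lines) + "\n").encode("utf-8"))
    return lines
```

The resulting file can then be passed as specs_from='path/to/specfile'. Writing bytes explicitly avoids any platform-specific newline translation.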

Parameters:

  • path (sequence of str or None, optional) – paths to copy (and possibly a target path to copy to). [Default: None]
  • dataset (Dataset or None, optional) – root dataset to save after copy operations are completed. All destination paths must be within this dataset, or its subdatasets. If no dataset is given, dataset modifications will be left unsaved. [Default: None]
  • recursive (bool, optional) – copy directories recursively. [Default: False]
  • target_dir (str or None, optional) – copy all source files into this DIRECTORY. This value is overridden by any explicit destination path provided via ‘specs_from’. When not given, this defaults to the path of the dataset specified via ‘dataset’. [Default: None]
  • specs_from – read a list of source (and destination) path names from a given file, or from stdin (with ‘-’). Each line defines either a source path, or a source/destination path pair (separated by a null byte character). Alternatively, a list of 2-tuples with source/destination pairs can be given. [Default: None]
  • message (str or None, optional) – a description of the state or the changes made to a dataset. [Default: None]
  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure. ‘ignore’: any failure is reported, but does not cause an exception. ‘continue’: if any failure occurs, an exception will be raised at the end, but processing of other actions will continue for as long as possible. ‘stop’: processing will stop on the first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value evaluates to True. A result is dropped if the return value evaluates to False or if the callable raises a ValueError exception. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
  • result_renderer ({'default', 'json', 'json_pp', 'tailored'} or None, optional) – format of return value rendering on stdout. [Default: None]
  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. With ‘item-or-list’, a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: ‘list’]
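
To illustrate the result_filter parameter, here is a minimal sketch of a filter callable that keeps only failed results, using the two failure statuses named under on_failure. The function name only_failures and the sample result dictionaries are made up for illustration. Real result dictionaries carry more keys than shown.

```python
def only_failures(result, **kwargs):
    """Keep a result only if its status marks a failure.

    **kwargs is accepted so that the keyword arguments of the original
    API call can be passed along, as described for result_filter.
    """
    return result.get("status") in ("impossible", "error")


# Illustrative stand-ins for result dictionaries; not real command output.
sample_results = [
    {"status": "ok", "path": "path/to/myds/a.dat"},
    {"status": "error", "path": "path/to/myds/b.dat"},
    {"status": "impossible", "path": "path/to/myds/c.dat"},
]
kept = [r for r in sample_results if only_failures(r)]
```

Such a callable could then be passed as copy_file(..., result_filter=only_failures), for example together with on_failure='ignore' to inspect failures without an exception being raised.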