datalad.api.save

datalad.api.save(path=None, *, message=None, dataset=None, version_tag=None, recursive=False, recursion_limit=None, updated=False, message_file=None, to_git=None, jobs=None, amend=False, since=None, _since_sub_info=None, _sub_message=None)

Save the current state of a dataset

Saving the state of a dataset records changes that have been made to it. This change record is annotated with a user-provided description. Optionally, an additional tag, such as a version, can be assigned to the saved state. Such a tag enables straightforward retrieval of past versions at a later point in time.

Note

With Git versions before v2.22, any Git repository without an initial commit that is located inside a dataset is ignored, and content underneath it is saved to the respective superdataset. DataLad datasets always have an initial commit and are therefore not affected by this behavior.

Examples

Save any content underneath the current directory, without altering any potential subdataset:

> save(path='.')

Save specific content in the dataset:

> save(path='myfile.txt')

Attach a commit message to save:

> save(path='myfile.txt', message='add file')

Save any content underneath the current directory, and recurse into any potential subdatasets:

> save(path='.', recursive=True)

Save any modification of known dataset content in the current directory, but leave untracked files (e.g. temporary files) untouched:

> save(path='.', updated=True)

Tag the most recent saved state of a dataset:

> save(version_tag='bestyet')

Save a specific change but integrate it into the last commit, keeping the already recorded commit message:

> save(path='myfile.txt', amend=True)

Parameters:

path (sequence of str or None, optional) – path/name of the dataset component to save. If given, only changes made to those components are recorded in the new state. [Default: None]
message (str or None, optional) – a description of the state or the changes made to a dataset. [Default: None]
dataset (Dataset or None, optional) – specify the dataset to save. [Default: None]
version_tag (str or None, optional) – an additional marker for that state. Every dataset that is touched will receive the tag. [Default: None]
recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]
recursion_limit (int or None, optional) – limit recursion into subdatasets to the given number of levels. [Default: None]
updated (bool, optional) – if given, only saves previously tracked paths. [Default: False]
message_file (str or None, optional) – take the commit message from this file. This flag is mutually exclusive with message (-m on the command line). [Default: None]
to_git (bool, optional) – flag whether to add data directly to Git, instead of tracking data identity only. Use with caution: there is no guarantee that a file put directly into Git like this will not be annexed in a subsequent save operation. If not specified, it will be up to git-annex to decide how a file is tracked, based on a dataset’s configuration to track particular paths, file types, or file sizes with either Git or git-annex (see https://git-annex.branchable.com/tips/largefiles). [Default: None]
jobs (int or None or {'auto'}, optional) – how many parallel jobs (where possible) to use. “auto” corresponds to the number defined by the ‘datalad.runtime.max-annex-jobs’ configuration item. [Default: None]
amend (bool, optional) – if set, changes are not recorded in a new, separate commit, but are integrated with the changeset of the previous commit, and both together are recorded by replacing that previous commit. This is mutually exclusive with recursive operation. [Default: False]
since (str or None, optional) – if given, compare against this commit instead of HEAD to discover changes, and wrap any intermediate commits made since that point in a merge commit. The merge commit’s first parent is since (keeping first-parent history linear) and its second parent is the post-save HEAD (containing all changes). This is used by datalad run to wrap command-created commits, but can also be used standalone to close a unit of work as a merge. [Default: None]
on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’: any failure is reported, but does not cause an exception; ‘continue’: if any failure occurs, an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on the first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False and the callable does not raise a ValueError exception. If the given callable supports **kwargs, it will additionally be passed the keyword arguments of the original API call. [Default: None]
result_renderer – select the rendering mode for command results. ‘tailored’ enables a command-specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message; ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]
result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. With ‘item-or-list’, a single value is returned instead of a one-item list, a list is returned in case of multiple return values, and None is returned in case of an empty result list. [Default: ‘list’]
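The three on_failure policies can be illustrated with a small, self-contained sketch. It assumes result records are plain dictionaries with a status key, as described above. process_results and the local IncompleteResultsError class are hypothetical stand-ins written for illustration only; they are not DataLad's code, which raises its own IncompleteResultsError.

```python
# Sketch of the documented on_failure semantics; NOT DataLad's implementation.
# IncompleteResultsError here is a local stand-in that only mimics the
# documented `failed` attribute of DataLad's exception.

FAILURE_STATUSES = {"impossible", "error"}  # what the docs count as a failure


class IncompleteResultsError(RuntimeError):
    """Stand-in exception carrying the failed result dictionaries."""

    def __init__(self, failed):
        super().__init__(f"{len(failed)} result(s) failed")
        self.failed = failed


def process_results(results, on_failure="continue"):
    """Collect result records, applying one of the three on_failure policies."""
    if on_failure not in {"ignore", "continue", "stop"}:
        raise ValueError(f"unknown on_failure value: {on_failure!r}")
    reported, failed = [], []
    for res in results:
        reported.append(res)  # every result is reported, failures included
        if res.get("status") in FAILURE_STATUSES:
            failed.append(res)
            if on_failure == "stop":
                # 'stop': abort at the first failure
                raise IncompleteResultsError(failed)
    if failed and on_failure == "continue":
        # 'continue': everything was processed, now raise once at the end
        raise IncompleteResultsError(failed)
    # 'ignore' (or no failures at all): failures are reported, nothing is raised
    return reported
```

With two failing records among four, ‘ignore’ returns all four, ‘continue’ raises carrying both failures, and ‘stop’ raises carrying only the first one.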
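How result_filter, result_xfm, and return_type=‘item-or-list’ interact can be sketched in plain Python. This is an illustration under assumptions, not DataLad's implementation: apply_result_handling, only_ok, and to_path are hypothetical names, and the result records are made-up dictionaries carrying the status and path keys mentioned above. The order modeled here (filter first, then transform, then shape the return value) follows the parameter descriptions.

```python
# Sketch of how result_filter, result_xfm and return_type='item-or-list'
# are documented to interact; NOT DataLad's implementation.
# apply_result_handling, only_ok and to_path are hypothetical names.


def only_ok(res):
    """Example result_filter: keep only successful results."""
    return res.get("status") == "ok"


def to_path(res):
    """Example result_xfm: replace each result record by its path."""
    return res["path"]


def apply_result_handling(results, result_filter=None, result_xfm=None,
                          return_type="list"):
    """Filter, then transform, then shape the return value."""
    out = []
    for res in results:
        if result_filter is not None:
            try:
                if not result_filter(res):
                    continue  # a falsy return value drops the result
            except ValueError:
                continue  # so does a ValueError raised by the filter
        out.append(result_xfm(res) if result_xfm is not None else res)
    if return_type == "item-or-list":
        if not out:
            return None  # empty result list
        return out[0] if len(out) == 1 else out
    return out  # 'list'; the 'generator' mode is not modeled here
```

Against DataLad itself, the two example callables would be passed as the documented parameters, e.g. save(path='.', result_filter=only_ok, result_xfm=to_path, return_type='item-or-list').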
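To make the ‘<template>’ rule of result_renderer concrete, the following sketch renders a result record with str.format() after rewriting ‘:’ as ‘#’ in dictionary keys, which is what lets ‘{metadata[music#Genre]}’ reach a ‘music:Genre’ key. render_result is a hypothetical helper that mimics the documented behavior; DataLad's actual renderer may differ in details.

```python
# Sketch of the documented '<template>' rule for result_renderer:
# Python str.format() syntax, with ':' in nested keys written as '#'.
# Mimics the documented behavior; NOT DataLad's renderer.


def _colon_to_hash(obj):
    """Recursively rewrite ':' as '#' in dictionary keys."""
    if isinstance(obj, dict):
        return {str(k).replace(":", "#"): _colon_to_hash(v)
                for k, v in obj.items()}
    return obj


def render_result(template, result):
    """Render one result record with a format() template."""
    return template.format(**_colon_to_hash(result))


# Made-up result record for illustration
record = {
    "action": "save",
    "status": "ok",
    "path": "/tmp/ds",
    "metadata": {"name": "demo", "music:Genre": "jazz"},
}
print(render_result("{action}: {status} {path}", record))
print(render_result("{metadata[music#Genre]}", record))
```

Per the parameter description, passing such a template string itself as result_renderer (for instance result_renderer='{path}') should yield one line per result in that format.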