Result records
Result records are the standard return value format for all DataLad commands. Each command invocation yields one or more result records. Result records are routinely inspected throughout the code base, and are used to inform generic error handling, as well as particular calling commands on how to proceed with a specific operation.
The technical implementation of a result record is a Python dictionary. This dictionary must contain a number of mandatory fields/keys (see below). However, an arbitrary number of additional fields may be added to a result record.
The get_status_dict() function simplifies the creation of result records.
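As a minimal sketch, a result record can also be written out by hand as a plain dictionary that carries the three mandatory keys described below, plus any optional ones. The example path and the helper missing_mandatory_keys() are hypothetical and serve only as illustration; they are not part of DataLad.

```python
# Keys that every result record must contain (see "Mandatory fields").
MANDATORY_KEYS = ("action", "path", "status")


def missing_mandatory_keys(record):
    """Return the mandatory keys absent from a record (hypothetical helper)."""
    return [key for key in MANDATORY_KEYS if key not in record]


# A hand-written result record. The POSIX-style path is a made-up example;
# real records must use absolute, platform-specific paths.
record = {
    "action": "get",
    "path": "/tmp/example_dataset/data.csv",
    "status": "ok",
    # Optional fields may be added freely.
    "type": "file",
    "message": "content retrieved",
}

print(missing_mandatory_keys(record))
print(missing_mandatory_keys({"action": "get"}))
```

A check of this kind can be useful in tests, given that behavior is undefined when a mandatory key is missing.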
Note
Developers must compose result records with care! DataLad supports custom user-provided hook configurations that use result record fields to decide when to trigger a custom post-result operation. Such custom hooks rely on a persistent naming and composition of result record fields. Changes to result records, including field name changes and field value changes, but also changes to the timing or order in which records are emitted, can break user setups!
Mandatory fields
The following keys must be present in any result record. If any of these keys is missing, DataLad’s behavior is undefined.
action
A string label identifying which type of operation a result is associated with.
Labels must not contain white space. They should be compact and lower-case,
and use _ (underscore) to separate words in compound labels.
A result without an action label will not be processed and is discarded.
path
A string with an absolute path describing the local entity a result is
associated with. Paths must be platform-specific (e.g., Windows paths on
Windows, and POSIX paths on other operating systems). When a result is about an
entity that has no meaningful relation to the local file system (e.g., a URL to
be downloaded), the path value should be determined with respect to the
potential impact of the result on any local entity (e.g., a URL downloaded
to a local file path, a local dataset modified based on remote information).
status
This field indicates the nature of a result in terms of four categories, identified by a string label.
ok: a standard, to-be-expected result
notneeded: an operation that was requested, but found to be unnecessary in order to achieve a desired goal
impossible: a requested operation cannot be performed, possibly because its preconditions are not met
error: an error occurred while performing an operation
Based on the status field, a result is categorized into success (ok,
notneeded) and failure (impossible, error). Depending on the on_failure
parameterization of a command call, any failure-result emitted by a command
can lead to an IncompleteResultsError being raised on command exit, or a
non-zero exit code on the command line. With on_failure='stop', an operation
is halted on the first failure and the command errors out immediately. With
on_failure='continue', an operation will continue despite intermediate
failures and the command only errors out at the very end. With
on_failure='ignore', the command will not error even when failures occurred.
The latter mode can be used in cases where the initial status-characterization
needs to be corrected for the particular context of an operation (e.g., to
relabel expected and recoverable errors).
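The three on_failure modes can be illustrated with a small, self-contained sketch. It does not use DataLad itself: the collect() function and the IncompleteResults class are hypothetical stand-ins (the latter for IncompleteResultsError), and the record paths are made up.

```python
FAILURE_STATUS = {"impossible", "error"}  # "ok" and "notneeded" count as success


class IncompleteResults(Exception):
    """Illustrative stand-in for DataLad's IncompleteResultsError."""

    def __init__(self, failed):
        super().__init__(f"{len(failed)} failed result(s)")
        self.failed = failed


def collect(results, on_failure):
    """Consume result records and react to failures per `on_failure`."""
    if on_failure not in ("stop", "continue", "ignore"):
        raise ValueError(f"unknown on_failure mode: {on_failure!r}")
    seen, failed = [], []
    for res in results:
        seen.append(res)
        if res["status"] in FAILURE_STATUS:
            failed.append(res)
            if on_failure == "stop":
                # Halt on the first failure and error out immediately.
                raise IncompleteResults(failed)
    if failed and on_failure == "continue":
        # Everything was processed; error out only at the very end.
        raise IncompleteResults(failed)
    # Reached without failures, or with on_failure="ignore".
    return seen


results = [
    {"action": "get", "path": "/tmp/ds/a", "status": "ok"},
    {"action": "get", "path": "/tmp/ds/b", "status": "error",
     "message": "download failed"},
    {"action": "get", "path": "/tmp/ds/c", "status": "notneeded"},
]

print(len(collect(results, "ignore")))
try:
    collect(results, "continue")
except IncompleteResults as exc:
    print(exc.failed[0]["path"])
```

With "ignore", a caller receives all records and can inspect or relabel the failed ones itself, which is the use case described above.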
Common optional fields
The following fields are not required, but can be used to enrich a result record with additional information that improves its interpretability, or triggers particular optional functionality in generic result processing.
type
This field indicates the type of entity a result is associated with. This may
or may not be the type of the local entity identified by the path value.
The following values are common, and should be used in matching cases, but
arbitrary other values are supported too:
dataset: a DataLad dataset
file: a regular file
directory: a directory
symlink: a symbolic link
key: a git-annex key
sibling: a Dataset sibling or Git remote
message
A message providing additional human-readable information on the nature or
provenance of a result. Any non-ok results should have a message providing
information on the rationale of their status characterization.
A message can be a string or a tuple. In case of a tuple, the second item can
contain values for %-expansion of the message string. Expansion is performed
only immediately prior to actually outputting the message, hence string formatting
runtime costs can be avoided this way, if a message is not actually shown.
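The deferred expansion can be sketched as follows. The render_message() helper is hypothetical, and it assumes that all tuple items after the template are used as expansion values; a consumer that never displays the message never calls it and so never pays the formatting cost.

```python
def render_message(message):
    """Expand a (template, value, ...) tuple with %-formatting.

    Plain strings are returned unchanged. Hypothetical helper for
    illustration; treating every item after the first as an expansion
    value is an assumption of this sketch.
    """
    if isinstance(message, tuple):
        return message[0] % message[1:]
    return message


record = {
    "action": "copy",
    "path": "/tmp/ds",  # made-up example path
    "status": "impossible",
    # Stored unexpanded: no string formatting happens at creation time.
    "message": ("copied %d of %d files", 3, 5),
}

# Expansion happens only here, when the message is about to be shown.
print(render_message(record["message"]))
```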
logger
If a result record has a message field, then a given Logger instance
(typically from logging.getLogger()) will be used to automatically log
this message. The log channel/level is determined based on the
datalad.log.result-level configuration setting. By default, this is
the debug level. When set to match-status, the log level is determined
based on the status field of a result record:
debug for 'ok' and 'notneeded' results
warning for 'impossible' results
error for 'error' results
This feature should be used with care. Unconditional logging can lead to confusing double-reporting when results are rendered and also visibly logged.
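The level selection described above can be written as a lookup table. This is a sketch using only the Python standard library; the log_level_for() helper and its result_level argument (standing in for the datalad.log.result-level setting) are hypothetical names, not DataLad API.

```python
import logging

# Mapping used when the setting is "match-status".
STATUS_TO_LEVEL = {
    "ok": logging.DEBUG,
    "notneeded": logging.DEBUG,
    "impossible": logging.WARNING,
    "error": logging.ERROR,
}


def log_level_for(record, result_level="debug"):
    """Pick a logging level for a record's message (hypothetical helper)."""
    if result_level == "match-status":
        return STATUS_TO_LEVEL[record["status"]]
    # Otherwise the setting names a fixed level, e.g. "debug".
    return getattr(logging, result_level.upper())


record = {"action": "get", "path": "/tmp/ds/a", "status": "impossible"}
print(logging.getLevelName(log_level_for(record, "match-status")))
print(logging.getLevelName(log_level_for(record)))
```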
refds
This field can identify a path (using the same semantics and requirements as
the path field) to a reference dataset that represents the larger context
of an operation. For example, when recursively processing multiple files across
a number of subdatasets, a refds value may point to the common superdataset.
This value may influence, for example, how paths are rendered in user-output.
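One way a renderer might use refds is to display paths relative to the reference dataset. The sketch below assumes exactly that; render_path() is a hypothetical helper, and posixpath is used only to keep the example deterministic across platforms, whereas real records carry platform-specific paths.

```python
import posixpath


def render_path(record):
    """Return the path relative to refds, if a refds is given (sketch)."""
    refds = record.get("refds")
    if refds:
        return posixpath.relpath(record["path"], refds)
    return record["path"]


record = {
    "action": "status",
    "path": "/data/super/sub/file.txt",  # made-up example paths
    "status": "ok",
    "refds": "/data/super",
}

print(render_path(record))
```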
parentds
This field can identify a path (using the same semantics and requirements as
the path field) to a dataset containing an entity.
state
A string label categorizing the state of an entity. Common values are:
clean
untracked
modified
deleted
absent
present
error_message
An error message that was captured or produced while achieving a result.
An error message can be a string or a tuple. In the case of a tuple, the
second item can contain values for %-expansion of the message string.
exception
An exception that occurred while achieving the reported result.
exception_traceback
A string with a traceback for the exception reported in exception.
Additional fields observed “in the wild”
Given that arbitrary fields are supported in result records, it is impossible to compose a comprehensive list of field names (keys). However, in order to counteract needless proliferation, the following list describes fields that have been observed in implementations. Developers are encouraged to reuse compatible names from this list, or to extend the list with additional items.
In alphabetical order:
bytesize
The size of an entity in bytes (integer).
gitshasum
SHA1 of an entity (string).
prev_gitshasum
SHA1 of a previous state of an entity (string).
key
The git-annex key associated with an entity whose type is file.