Progress reporting
Progress reporting is implemented via the logging system. A dedicated function
datalad.log.log_progress()
represents the main API for progress
reporting. For some standard use cases, the utilities
datalad.log.with_progress()
and
datalad.log.with_result_progress()
can simplify progress reporting
further.
Design and implementation
The basic idea is to use an instance of datalad’s loggers to emit log messages
with particular attributes that are picked up by
datalad.log.ProgressHandler (derived from logging.Handler), and are acted on
differently, depending on
configuration and conditions of a session (e.g., interactive terminal sessions
vs. non-interactive usage in scripts). This variable behavior is implemented
via the log filters and handlers of the logging
standard library module.
Roughly speaking, datalad.log.ProgressHandler
will only be used for
interactive sessions. In non-interactive cases, progress log messages are
inspected by datalad.log.filter_noninteractive_progress(), and are
either discarded or treated like any other log message (see
datalad.log.LoggerHelper.get_initialized_logger()
for details on the
handler and filter setup).
datalad.log.ProgressHandler
inspects incoming log records for
attributes with names starting with dlm_progress. It processes only such
records, and passes all others on to the underlying original log handler.
datalad.log.ProgressHandler
takes care of creating, updating and
destroying any number of simultaneously running progress bars. Progress reports
must identify the respective process via an arbitrary string ID. It is the
caller’s responsibility to ensure that this ID is unique to the target
process/activity.
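The mechanism described above can be illustrated with a minimal, self-contained sketch built only on the standard library. This is not DataLad's implementation: the class name SketchProgressHandler and the attribute names dlm_progress_update and dlm_progress_increment are assumptions made for illustration (the text above only states that attribute names start with dlm_progress), and a plain dictionary stands in for real progress bars.

```python
import logging


class ListHandler(logging.Handler):
    """Stand-in for the "original" handler: collects messages in memory."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class SketchProgressHandler(logging.Handler):
    """Toy handler: tracks one value per activity ID, forwards other records."""

    def __init__(self, other_handler):
        super().__init__()
        self.other_handler = other_handler  # receives non-progress records
        self.bars = {}  # activity ID -> current progress value

    def emit(self, record):
        # assumed attribute name carrying the activity ID
        pid = getattr(record, "dlm_progress", None)
        if pid is None:
            # not a progress record: pass it on to the original handler
            self.other_handler.handle(record)
            return
        update = getattr(record, "dlm_progress_update", None)
        if pid not in self.bars:
            # previously unseen ID: create a new "bar"
            self.bars[pid] = 0
        elif update is None:
            # known ID without an update: the activity is finished
            del self.bars[pid]
        elif getattr(record, "dlm_progress_increment", False):
            self.bars[pid] += update
        else:
            self.bars[pid] = update


lgr = logging.getLogger("progress_sketch")
lgr.setLevel(logging.INFO)
lgr.propagate = False
plain = ListHandler()
handler = SketchProgressHandler(plain)
lgr.addHandler(handler)

# attributes are attached to the log record via ``extra``
lgr.info("Unlocking files", extra={"dlm_progress": "unlock-1"})
lgr.info("One file done", extra={"dlm_progress": "unlock-1",
                                 "dlm_progress_update": 1,
                                 "dlm_progress_increment": True})
lgr.info("A regular message")
```

After these calls, handler.bars holds one entry for "unlock-1", and only the regular message has reached the stand-in handler.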
Reporting progress with log_progress()
Typical progress reporting via datalad.log.log_progress()
involves
three types of calls.
1. Start reporting progress about a process
A typical call to start progress reporting looks like this:
log_progress(
    # the callable used to emit log messages
    lgr.info,
    # a unique identifier of the activity progress is reported for
    identifier,
    # main message
    'Unlocking files',
    # optional unit string for a progress bar
    unit=' Files',
    # optional label to be displayed in a progress bar
    label='Unlocking',
    # maximum value for a progress bar
    total=nfiles,
)
A new progress bar will be created automatically for any report with a previously
unseen activity identifier. It can be configured via the specification of
a number of arguments, most notably a target total
for the progress bar.
See datalad.log.log_progress()
for a complete overview.
Starting a progress report must be done with a dedicated call. It cannot be combined with a progress update.
2. Update progress information about a process
Any subsequent call to datalad.log.log_progress()
with an activity
identifier that has already been seen either updates or finishes the progress
reporting for an activity. Updates must contain an update
key which either
specifies a new value (if increment=False, the default) or an increment to
a previously known value (if increment=True):
log_progress(
    lgr.info,
    # must match the identifier used to start the progress reporting
    identifier,
    # arbitrary message content, string expansion supported just like
    # regular log messages
    "Files to unlock %i", nfiles,
    # critical key for report updates
    update=1,
    # ``update`` could be an absolute value or an increment
    increment=True
)
Updating a progress report can only be done after progress reporting was initialized (see above).
3. Report completion of a process
A progress bar will remain active until it is explicitly taken down, even if an
initially declared total
value may have been reached. Finishing a progress
report requires a final log message with the corresponding identifier which,
like the first initializing message, does NOT contain an update
key.
log_progress(
    lgr.info,
    identifier,
    # closing log message
    "Completed unlocking files",
)
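Put together, the three call types might be combined as in the following sketch. To keep it runnable without DataLad installed, the reporting function is passed in as a parameter; in real use one would pass datalad.log.log_progress. The function unlock_with_progress, the unlock_one callback, and the recording stand-in fake_log_progress are hypothetical names introduced only for this illustration. Placing the final call in a finally clause is a design choice of this sketch, motivated by the fact that a progress bar stays active until it is explicitly taken down.

```python
import logging

lgr = logging.getLogger("unlock_sketch")


def unlock_with_progress(paths, log_progress, unlock_one):
    """Process ``paths`` one by one, reporting start, updates, and end."""
    paths = list(paths)
    # the ID must be unique to this activity
    pid = "unlock-%d" % id(paths)
    # 1. start: dedicated call without an ``update`` key
    log_progress(lgr.info, pid, "Unlocking files",
                 unit=" Files", label="Unlocking", total=len(paths))
    try:
        for path in paths:
            unlock_one(path)
            # 2. update: advance by one per processed item
            log_progress(lgr.info, pid, "Unlocked %s", path,
                         update=1, increment=True)
    finally:
        # 3. finish: again no ``update`` key; runs even if an item fails
        log_progress(lgr.info, pid, "Finished unlocking files")


# recording stand-in with the call shape used throughout this document
calls = []


def fake_log_progress(lgrcall, pid, *args, **kwargs):
    calls.append((args[0], kwargs.get("update")))


unlock_with_progress(["a.dat", "b.dat"], fake_log_progress, lambda p: None)
```

The recorded calls show one start message, one update per item, and one closing message, in that order.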
Progress reporting in non-interactive sessions
datalad.log.log_progress()
takes a noninteractive_level argument
that can be used to specify a log level at which progress is logged when no
progress bars can be used, but actual log messages are produced.
import logging
log_progress(
    lgr.info,
    identifier,
    "Completed unlocking files",
    noninteractive_level=logging.INFO
)
Each call to log_progress()
can be given a different
log level, in order to control the verbosity of the reporting in such a scenario.
For example, it is possible to log the start or end of an activity at a higher
level than intermediate updates. It is also possible to single out particular
intermediate events, and report them at a higher level.
If no noninteractive_level is specified, the progress update is unconditionally logged at the level implied by the given logger callable.
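How such per-call levels could translate into filtering is sketched below with a standard library log filter. This is an illustration under assumptions, not DataLad's code: the class NonInteractiveProgressFilter, its threshold parameter, and the record attribute name dlm_progress_noninteractive_level are chosen for this example. Records without the attribute pass through unchanged.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the filter's effect can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class NonInteractiveProgressFilter(logging.Filter):
    """Drop progress records whose non-interactive level is too low."""

    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold

    def filter(self, record):
        # assumed attribute name; absent on regular log records
        level = getattr(record, "dlm_progress_noninteractive_level", None)
        return level is None or level >= self.threshold


lgr = logging.getLogger("noninteractive_sketch")
lgr.setLevel(logging.DEBUG)
lgr.propagate = False
collector = ListHandler()
collector.addFilter(NonInteractiveProgressFilter(threshold=logging.INFO))
lgr.addHandler(collector)

key = "dlm_progress_noninteractive_level"
# start and end are reported at a higher level than the intermediate update
lgr.info("Start unlocking", extra={key: logging.INFO})
lgr.info("Unlocked one file", extra={key: logging.DEBUG})
lgr.info("Completed unlocking files", extra={key: logging.INFO})
lgr.info("Unrelated message")
```

With the threshold at logging.INFO, the intermediate update is discarded while the start, the end, and the unrelated message are kept.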
Reporting progress with with_(result_)progress()
For cases where a list of items needs to be processed sequentially, and progress
shall be communicated, two additional helpers could be used: the decorators
datalad.log.with_progress()
and
datalad.log.with_result_progress(). They require a callable that takes
a list (or more generally a sequence) of items to be processed as the first
positional argument. They both set up and perform all necessary calls to
log_progress()
.
The difference between these helpers is that
datalad.log.with_result_progress()
expects a callable to produce
DataLad result records, and supports custom filters to decide which particular
result records to consider for progress reporting (e.g., only records for a
particular action and type).
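The idea behind such a helper can be sketched as a decorator that wraps a generator of result records and issues the start, update, and finish calls itself. The sketch below is hypothetical and does not reproduce the signature of the DataLad helpers: with_result_progress_sketch, its report parameter (a callable taking an activity ID, a message, and keyword arguments), and log_filter are names assumed for this example. With DataLad available, report could be built as functools.partial(log_progress, lgr.info).

```python
import functools


def with_result_progress_sketch(report, label="Total", log_filter=None):
    """Hypothetical decorator factory that reports progress on results."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(items, *args, **kwargs):
            items = list(items)
            pid = "%s:%d" % (fn.__name__, id(items))
            # start: announce the total number of items
            report(pid, "Start %s" % label, total=len(items), label=label)
            try:
                for res in fn(items, *args, **kwargs):
                    # only results accepted by the filter advance the bar
                    if log_filter is None or log_filter(res):
                        report(pid,
                               "%s: %s" % (res.get("status"), res.get("path")),
                               update=1, increment=True)
                    yield res
            finally:
                # finish: take the bar down, even on error
                report(pid, "Finished %s" % label)

        return wrapper

    return decorator


events = []


def record(pid, message, **kwargs):
    events.append((message, kwargs.get("update")))


@with_result_progress_sketch(
    record, label="Unlocking",
    log_filter=lambda r: r["action"] == "unlock" and r["type"] == "file")
def unlock(paths):
    for p in paths:
        yield {"action": "unlock", "type": "file", "path": p, "status": "ok"}
    # a result that the filter ignores for progress purposes
    yield {"action": "unlock", "type": "dataset", "path": ".", "status": "ok"}


results = list(unlock(["a.dat", "b.dat"]))
```

All three result records are passed through to the caller, but only the two file records count towards the reported progress.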
Output non-progress information without interfering with progress bars
log_progress()
can also be useful when not reporting
progress, but ensuring that no other output is interfering with progress bars,
and vice versa. The argument maint can be used in this case, with no
particular activity identifier (it always impacts all active progress bars):
log_progress(
    lgr.info,
    None,
    'Clear progress bars',
    maint='clear',
)
This call will trigger a temporary discontinuation of any progress bar display.
Progress bars can either be re-enabled all at once, by an analogous message with
maint='refresh', or will re-show themselves automatically when the next
update is received. A no_progress()
context manager helper
can be used to surround your context with those two calls to prevent progress
bars from interfering.
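A context manager of this kind could look roughly like the following sketch, which pairs a maint='clear' call on entry with a maint='refresh' call on exit. It is an illustration, not DataLad's no_progress() implementation: no_progress_sketch and its parameters are assumed names, and the reporting function is injected so the example runs without DataLad (in real use it would be datalad.log.log_progress).

```python
from contextlib import contextmanager


@contextmanager
def no_progress_sketch(log_progress, lgrcall):
    """Suspend all progress bars while the wrapped block produces output."""
    # no activity ID: maintenance calls affect all active progress bars
    log_progress(lgrcall, None, "Clear progress bars", maint="clear")
    try:
        yield
    finally:
        # bring the bars back, even if the block raised
        log_progress(lgrcall, None, "Refresh progress bars", maint="refresh")


# recording stand-in for the reporting function
sequence = []


def fake_log_progress(lgrcall, pid, *args, **kwargs):
    sequence.append(kwargs.get("maint"))


with no_progress_sketch(fake_log_progress, lgrcall=None):
    # output written here would not collide with progress bars
    sequence.append("output")
```

The recorded sequence shows the output sandwiched between the two maintenance calls.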