Change log

 ____            _             _                   _
|  _ \    __ _  | |_    __ _  | |       __ _    __| |
| | | |  / _` | | __|  / _` | | |      / _` |  / _` |
| |_| | | (_| | | |_  | (_| | | |___  | (_| | | (_| |
|____/   \__,_|  \__|  \__,_| |_____|  \__,_|  \__,_|
                                           Change Log

This is a high-level and sparse summary of the changes between releases. We recommend consulting the log of the DataLad git repository for more details.

0.12.5 (Apr 02, 2020) – a small step for datalad …

 Fix some bugs and make the world an even better place.

Fixes

  • Our log_progress helper mishandled the initial display and step of the progress bar. (#4326)
  • AnnexRepo.get_content_annexinfo is designed to accept init=None, but passing that led to an error. (#4330)
  • Update a regular expression to handle an output change in Git v2.26.0. (#4328)
  • We now set LC_MESSAGES to ‘C’ while running git to avoid failures when parsing output that is marked for translation. (#4342)
  • The helper for decoding JSON streams loaded the last line of input without decoding it if the line didn’t end with a newline, a regression introduced in the 0.12.0 release. (#4361)
  • The clone command failed to git-annex-init a fresh clone whenever it considered adding the origin of the origin as a remote. (#4367)

0.12.4 (Mar 19, 2020) – Windows?!

 The main purpose of this release is to have one on PyPI that has no associated wheel to enable a working installation on Windows (#4315).

Fixes

  • The description of the log.outputs config switch did not keep up with code changes and incorrectly stated that the output would be logged at the DEBUG level; logging actually happens at a lower level. (#4317)

0.12.3 (March 16, 2020) – .

Updates for compatibility with the latest git-annex, along with a few miscellaneous fixes.

Major refactoring and deprecations

  • All spots that raised a NoDatasetArgumentFound exception now raise a NoDatasetFound exception to better reflect the situation: it is the dataset rather than the argument that is not found. For compatibility, NoDatasetFound inherits from NoDatasetArgumentFound, but new code should prefer NoDatasetFound. (#4285)
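
The compatibility arrangement in the entry above can be sketched with stand-in classes. The class bodies and the two helper functions below are placeholders written for this illustration, not DataLad's actual code; only the two exception names and the direction of inheritance come from the entry.

```python
# Stand-in definitions for illustration only; the real classes live in DataLad.
class NoDatasetArgumentFound(Exception):
    """Older exception name, kept so existing handlers keep working."""


class NoDatasetFound(NoDatasetArgumentFound):
    """Newer exception name that code now raises."""


def require_dataset(path):
    # Hypothetical helper: pretend no dataset could be determined for `path`.
    raise NoDatasetFound("no dataset found at {}".format(path))


def legacy_handler(path):
    # Code written against the old name still catches the new exception,
    # because the new class is a subclass of the old one.
    try:
        require_dataset(path)
    except NoDatasetArgumentFound as exc:
        return type(exc).__name__
```

A handler written as `except NoDatasetFound` would be the form to prefer in new code; the older `except NoDatasetArgumentFound` form keeps working only because of the subclass relationship.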

Fixes

  • Updates for compatibility with git-annex version 8.20200226. (#4214)
  • datalad export-to-figshare failed to export if the generated title was fewer than three characters. It now queries the caller for the title and guards against titles that are too short. (#4140)
  • Authentication was requested multiple times when git-annex launched parallel downloads from the datalad special remote. (#4308)
  • At verbose logging levels, DataLad requests that git-annex display debugging information too. Work around a bug in git-annex that prevented that from happening. (#4212)
  • The internal command runner looked in the wrong place for some configuration variables, including datalad.log.outputs, resulting in the default value always being used. (#4194)
  • publish failed when trying to publish to a git-lfs special remote for the first time. (#4200)
  • AnnexRepo.set_remote_url is supposed to establish shared SSH connections but failed to do so. (#4262)

Enhancements and new features

  • The message provided when a command cannot determine what dataset to operate on has been improved. (#4285)
  • The “aws-s3” authentication type now allows specifying the host through “aws-s3_host”, which was needed to work around an authorization error due to a longstanding upstream bug. (#4239)
  • The xmp metadata extractor now recognizes “.wav” files.

0.12.2 (Jan 28, 2020) – Smoothen the ride

Mostly a bugfix release with various robustifications, but also makes the first step towards versioned dataset installation requests.

Major refactoring and deprecations

  • The minimum required version for GitPython is now 2.1.12. (#4070)

Fixes

  • The class for handling configuration values, ConfigManager, inappropriately considered the current working directory’s dataset, if any, for both reading and writing when instantiated with dataset=None. This misbehavior is fairly inaccessible through typical use of DataLad. It affects datalad.cfg, the top-level configuration instance that should not consider repository-specific values. It also affects Python users that call Dataset with a path that does not yet exist and persists until that dataset is created. (#4078)
  • update saved the dataset when called with --merge, which is unnecessary and risks committing unrelated changes. (#3996)
  • Confusing and irrelevant information about Python defaults has been dropped from the command-line help. (#4002)
  • The logic for automatically propagating the ‘origin’ remote when cloning a local source didn’t properly account for relative paths. (#4045)
  • Various fixes to file name handling and quoting on Windows. (#4049) (#4050)
  • When cloning failed, error lines were not bubbled up to the user in some scenarios. (#4060)

Enhancements and new features

  • clone (and thus install)
    • now propagates the reckless mode from the superdataset when cloning a dataset into it. (#4037)
    • gained support for ria+<protocol>:// URLs that point to RIA stores. (#4022)
    • learned to read “@version” from ria+ URLs and install that version of a dataset (#4036) and to apply URL rewrites configured through Git’s url.*.insteadOf mechanism (#4064).
    • now copies datalad.get.subdataset-source-candidate-<name> options configured within the superdataset into the subdataset. This is particularly useful for RIA data stores. (#4073)
  • Archives are now (optionally) handled with 7-Zip instead of patool. 7-Zip will be used by default, but patool will be used on non-Windows systems if the datalad.runtime.use-patool option is set or the 7z executable is not found. (#4041)
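
The selection rule for the archive tool stated in the last entry above can be written out as a small decision function. This is a sketch under assumptions: the function names and parameters are hypothetical, and the entry does not say what happens on Windows when no 7z executable is found, so the sketch simply returns "7z" in that case.

```python
import shutil


def choose_archive_tool(use_patool, on_windows, seven_zip_found):
    """Return "7z" or "patool" following the rule in the entry above.

    All three parameters are hypothetical inputs for this sketch:
    use_patool      -- value of the datalad.runtime.use-patool option
    on_windows      -- whether the platform is Windows
    seven_zip_found -- whether a 7z executable is available
    """
    # patool is only considered on non-Windows systems.
    if not on_windows and (use_patool or not seven_zip_found):
        return "patool"
    return "7z"


def seven_zip_available():
    # shutil.which returns None when the executable is not on PATH.
    return shutil.which("7z") is not None
```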

0.12.1 (Jan 15, 2020) – Small bump after big bang

Fix some fallout after major release.

Fixes

  • Revert incorrect relative path adjustment to URLs in clone. (#3538)
  • Various small fixes to internal helpers and test to run on Windows (#2566) (#2534)

0.12.0 (Jan 11, 2020) – Krakatoa

This release is the result of more than a year of development that includes fixes for a large number of issues, yielding more robust behavior across a wider range of use cases, and introduces major changes in API and behavior. It is the first release for which extensive user documentation is available in a dedicated DataLad Handbook. Python 3 (3.5 and later) is now the only supported Python flavor.

Major changes 0.12 vs 0.11

  • save fully replaces add (which is obsolete now, and will be removed in a future release).
  • A new Git-annex aware status command enables detailed inspection of dataset hierarchies. The previously available diff command has been adjusted to match status in argument semantics and behavior.
  • The ability to configure dataset procedures before and after the execution of particular commands has been replaced by a flexible “hook” mechanism that is able to run arbitrary DataLad commands whenever command results are detected that match a specification.
  • Support of the Windows platform has been improved substantially. While performance and feature coverage on Windows still fall behind Unix-like systems, typical data consumer use cases and standard dataset operations, such as create and save, are now working. Basic support for data provenance capture via run is also functional.
  • Support for Git-annex direct mode repositories has been removed, following the end of support in Git-annex itself.
  • The semantics of relative paths in command line arguments have changed. Previously, a call datalad save --dataset /tmp/myds some/relpath would have been interpreted as saving a file at /tmp/myds/some/relpath into dataset /tmp/myds. This has changed to saving $PWD/some/relpath into dataset /tmp/myds. More generally, relative paths are now always treated as relative to the current working directory, except for path arguments of Dataset class instance methods of the Python API. The resulting partial duplication of path specifications between path and dataset arguments is mitigated by the introduction of two special symbols that can be given as dataset argument: ^ and ^., which identify the topmost superdataset and the closest dataset that contains the working directory, respectively.
  • The concept of a “core API” has been introduced. Commands situated in the module datalad.core (such as create, save, run, status, diff) receive additional scrutiny regarding API and implementation, and are meant to provide longer-term stability. Application developers are encouraged to preferentially build on these commands.
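
The change in relative path semantics described in the list above can be illustrated with the example from that entry. The two functions below are hypothetical and only model the rule as stated; they are not DataLad's path resolution code, and the working directory /home/me is an invented value.

```python
from pathlib import PurePosixPath


def resolve_0_11(relpath, cwd, dataset):
    """Old rule when --dataset was given: relative to the dataset."""
    return PurePosixPath(dataset) / relpath


def resolve_0_12(relpath, cwd, dataset, via_dataset_method=False):
    """New rule: relative to the current directory, except for path
    arguments of Dataset instance methods in the Python API."""
    base = dataset if via_dataset_method else cwd
    return PurePosixPath(base) / relpath


# datalad save --dataset /tmp/myds some/relpath, called from /home/me
old = resolve_0_11("some/relpath", "/home/me", "/tmp/myds")
new = resolve_0_12("some/relpath", "/home/me", "/tmp/myds")
bound = resolve_0_12("some/relpath", "/home/me", "/tmp/myds",
                     via_dataset_method=True)
```

Here `old` is /tmp/myds/some/relpath, `new` is /home/me/some/relpath, and `bound` is again /tmp/myds/some/relpath because it models a call on a Dataset instance.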

Major refactoring and deprecations since 0.12.0rc6

  • clone has been incorporated into the growing core API. The public --alternative-source parameter has been removed, and a clone_dataset function with multi-source capabilities is provided instead. The --reckless parameter can now take literal mode labels instead of just being a binary flag, but backwards compatibility is maintained.
  • The get_file_content method of GitRepo was no longer used internally or in any known DataLad extensions and has been removed. (#3812)
  • The implementation of the function get_dataset_root has been replaced by that of rev_get_dataset_root. The name rev_get_dataset_root remains as a compatibility alias and will be removed in a later release. (#3815)
  • The add_sibling module, marked obsolete in v0.6.0, has been removed. (#3871)
  • mock is no longer declared as an external dependency because we can rely on it being in the standard library now that our minimum required Python version is 3.5. (#3860)
  • download-url now requires that directories be indicated with a trailing slash rather than interpreting a path as directory when it doesn’t exist. This avoids confusion that can result from typos and makes it possible to support directory targets that do not exist. (#3854)
  • The dataset_only argument of the ConfigManager class is deprecated. Use source="dataset" instead. (#3907)
  • The --proc-pre and --proc-post options have been removed, and configuration values for datalad.COMMAND.proc-pre and datalad.COMMAND.proc-post are no longer honored. The new result hook mechanism provides an alternative for proc-post procedures. (#3963)

Fixes since 0.12.0rc6

  • publish crashed when called with a detached HEAD. It now aborts with an informative message. (#3804)
  • Since 0.12.0rc6 the call to update in siblings resulted in a spurious warning. (#3877)
  • siblings crashed if it encountered an annex repository that was marked as dead. (#3892)
  • The update of rerun in v0.12.0rc3 for the rewritten diff command didn’t account for a change in the output of diff, leading to rerun --report unintentionally including unchanged files in its diff values. (#3873)
  • In 0.12.0rc5 download-url was updated to follow the new path handling logic, but its calls to AnnexRepo weren’t properly adjusted, resulting in incorrect path handling when called from a dataset subdirectory. (#3850)
  • download-url called git annex addurl in a way that failed to register a URL when its header didn’t report the content size. (#3911)
  • With Git v2.24.0, saving new subdatasets failed due to a bug in that Git release. (#3904)
  • With DataLad configured to stop on failure (e.g., specifying --on-failure=stop from the command line), a failing result record was not rendered. (#3863)
  • Installing a subdataset yielded an “ok” status in cases where the repository was not yet in its final state, making it ineffective for a caller to operate on the repository in response to the result. (#3906)
  • The internal helper for converting git-annex’s JSON output did not relay information from the “error-messages” field. (#3931)
  • run-procedure reported relative paths that were confusingly not relative to the current directory in some cases. It now always reports absolute paths. (#3959)
  • diff inappropriately reported files as deleted in some cases when to was a value other than None. (#3999)
  • An assortment of fixes for Windows compatibility. (#3971) (#3974) (#3975) (#3976) (#3979)
  • Subdatasets installed from a source given by relative path will now have this relative path used as ‘url’ in their .gitmodules record, instead of an absolute path generated by Git. (#3538)
  • clone will now correctly interpret ‘~/…’ paths as absolute path specifications. (#3958)
  • run-procedure mistakenly reported a directory as a procedure. (#3793)
  • The cleanup for batched git-annex processes has been improved. (#3794) (#3851)
  • The function for adding a version ID to an AWS S3 URL doesn’t support URLs with an “s3://” scheme and raises a NotImplementedError exception when it encounters one. The function learned to return a URL untouched if an “s3://” URL comes in with a version ID. (#3842)
  • A few spots needed to be adjusted for compatibility with git-annex’s new --sameas feature, which allows special remotes to share a data store. (#3856)
  • The swallow_logs utility failed to capture some log messages due to an incompatibility with Python 3.7. (#3935)
  • siblings
    • crashed if --inherit was passed but the parent dataset did not have a remote with a matching name. (#3954)
    • configured the wrong pushurl and annexurl values in some cases. (#3955)

Enhancements and new features since 0.12.0rc6

  • By default, datasets cloned from local source paths will now get a configured remote for any recursively discoverable ‘origin’ sibling that is also available from a local path in order to maximize automatic file availability across local annexes. (#3926)
  • The new result hooks mechanism allows callers to specify, via local Git configuration values, DataLad command calls that will be triggered in response to matching result records (i.e., what you see when you call a command with -f json_pp). (#3903)
  • The command interface classes learned to use a new _examples_ attribute to render documentation examples for both the Python and command-line API. (#3821)
  • Candidate URLs for cloning a submodule can now be generated based on configured templates that have access to various properties of the submodule, including its dataset ID. (#3828)
  • DataLad’s check that the user’s Git identity is configured has been sped up and now considers the appropriate environment variables as well. (#3807)
  • The tag method of GitRepo can now tag revisions other than HEAD and accepts a list of arbitrary git tag options. (#3787)
  • When get clones a subdataset and the subdataset’s HEAD differs from the commit that is registered in the parent, the active branch of the subdataset is moved to the registered commit if the registered commit is an ancestor of the subdataset’s HEAD commit. This handling has been moved to a more central location within GitRepo, and now applies to any update_submodule(..., init=True) call. (#3831)
  • The output of datalad -h has been reformatted to improve readability. (#3862)
  • unlock has been sped up. (#3880)
  • run-procedure learned to provide and render more information about discovered procedures, including whether the procedure is overridden by another procedure with the same base name. (#3960)
  • save now (#3817)
    • records the active branch in the superdataset when registering a new subdataset.
    • calls git annex sync when saving a dataset on an adjusted branch so that the changes are brought into the mainline branch.
  • subdatasets now aborts when its dataset argument points to a non-existent dataset. (#3940)
  • wtf now
    • reports the dataset ID if the current working directory is visiting a dataset. (#3888)
    • outputs entries deterministically. (#3927)
  • The ConfigManager class
    • learned to exclude .datalad/config as a source of configuration values, restricting the sources to standard Git configuration files, when called with source="local". (#3907)
    • accepts a value of “override” for its where argument to allow Python callers to more conveniently override configuration. (#3970)
  • Commands now accept a dataset value of “^.” as shorthand for “the dataset to which the current directory belongs”. (#3242)
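
The result hook idea from the list above (run a command whenever a result record matches a specification) can be sketched in a few lines. This is a simplified model under assumptions: matching is reduced to plain key/value equality, and the hook layout, the example hook, and the example result record are invented for illustration rather than taken from DataLad's configuration format.

```python
def matches(result, spec):
    """Return True if every key in `spec` is in `result` with an equal value."""
    return all(key in result and result[key] == value
               for key, value in spec.items())


def triggered_calls(result, hooks):
    """Return the call strings of all hooks whose spec matches `result`."""
    return [hook["call"] for hook in hooks if matches(result, hook["match"])]


# Hypothetical hook: after a dataset was installed, run another command.
hooks = [
    {"match": {"action": "install", "type": "dataset", "status": "ok"},
     "call": "unlock"},
]

# Hypothetical result record, shaped like the output of `-f json_pp`.
result = {"action": "install", "type": "dataset", "status": "ok",
          "path": "/tmp/myds/sub"}
```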

0.12.0rc6 (Oct 19, 2019) – some releases are better than the others

Bet we will fix some bugs and make the world an even better place.

Major refactoring and deprecations

  • DataLad no longer supports Python 2. The minimum supported version of Python is now 3.5. (#3629)
  • Much of the user-focused content at http://docs.datalad.org has been removed in favor of more up to date and complete material available in the DataLad Handbook. Going forward, the plan is to restrict http://docs.datalad.org to technical documentation geared at developers. (#3678)
  • update used to allow the caller to specify which dataset(s) to update as a PATH argument or via the --dataset option; now only the latter is supported. Path arguments only serve to restrict which subdatasets are updated when operating recursively. (#3700)
  • Result records from a get call no longer have a “state” key. (#3746)
  • update and get no longer support operating on independent hierarchies of datasets. (#3700) (#3746)
  • The run update in 0.12.0rc4 for the new path resolution logic broke the handling of inputs and outputs for calls from a subdirectory. (#3747)
  • The is_submodule_modified method of GitRepo as well as two helper functions in gitrepo.py, kwargs_to_options and split_remote_branch, were no longer used internally or in any known DataLad extensions and have been removed. (#3702) (#3704)
  • The only_remote option of GitRepo.is_with_annex was not used internally or in any known extensions and has been dropped. (#3768)
  • The get_tags method of GitRepo used to sort tags by committer date. It now sorts them by the tagger date for annotated tags and the committer date for lightweight tags. (#3715)
  • The rev_resolve_path helper has replaced the resolve_path helper. (#3797)
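
The changed sort order of get_tags described in the list above can be sketched as follows. The tag records and their field names are made up for this illustration and are not GitRepo's return format; the dates are plain integers standing in for timestamps.

```python
def tag_sort_key(tag):
    # Annotated tags carry their own tagger date. Lightweight tags do not,
    # so fall back to the committer date of the commit they point to.
    if tag["taggerdate"] is not None:
        return tag["taggerdate"]
    return tag["committerdate"]


tags = [
    {"name": "v2", "taggerdate": 300, "committerdate": 100},   # annotated
    {"name": "v1", "taggerdate": None, "committerdate": 200},  # lightweight
    {"name": "v3", "taggerdate": None, "committerdate": 400},  # lightweight
]

# Old behavior: committer date only.
by_committer = [t["name"] for t in sorted(tags, key=lambda t: t["committerdate"])]

# New behavior: tagger date for annotated tags, committer date otherwise.
by_creation = [t["name"] for t in sorted(tags, key=tag_sort_key)]
```

The annotated tag v2 moves from first to second position because it was created later than the commit it points to.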

Fixes

  • Correctly handle relative paths in publish. (#3799) (#3102)
  • Do not erroneously discover a directory as a procedure. (#3793)
  • Correctly extract version from manpage to trigger use of manpages for --help. (#3798)
  • The cfg_yoda procedure saved all modifications in the repository rather than saving only the files it modified. (#3680)
  • Some spots in the documentation that were supposed to appear as two hyphens were incorrectly rendered in the HTML output as en-dashes. (#3692)
  • create, install, and clone treated paths as relative to the dataset even when the string form was given, violating the new path handling rules. (#3749) (#3777) (#3780)
  • Providing the “^” shortcut to --dataset didn’t work properly when called from a subdirectory of a subdataset. (#3772)
  • We failed to propagate some errors from git-annex when working with its JSON output. (#3751)
  • With the Python API, callers are allowed to pass a string or list of strings as the cfg_proc argument to create, but the string form was mishandled. (#3761)
  • Incorrect command quoting for SSH calls on Windows rendered basic SSH-related functionality (e.g., sshrun) unusable on that platform. (#3688)
  • Annex JSON result handling assumed platform-specific paths on Windows instead of the POSIX-style paths that are used across all platforms. (#3719)
  • path_is_under() was incapable of comparing Windows paths with different drive letters. (#3728)

Enhancements and new features

  • Provide a collection of “public” call_git* helpers within GitRepo and replace use of “private” and less specific _git_custom_command calls. (#3791)
  • status gained a --report-filetype option. Setting it to “raw” can give a performance boost for the price of no longer distinguishing symlinks that point to annexed content from other symlinks. (#3701)
  • save disables file type reporting by status to improve performance. (#3712)
  • subdatasets (#3743)
    • now extends its result records with a contains field that lists which contains arguments matched a given subdataset.
    • yields an ‘impossible’ result record when a contains argument wasn’t matched to any of the reported subdatasets.
  • install now shows more readable output when cloning fails. (#3775)
  • SSHConnection now displays a more informative error message when it cannot start the ControlMaster process. (#3776)
  • If the new configuration option datalad.log.result-level is set to a single level, all result records will be logged at that level. If you’ve been bothered by DataLad’s double reporting of failures, consider setting this to “debug”. (#3754)
  • Configuration values from datalad -c OPTION=VALUE ... are now validated to provide better errors. (#3695)
  • rerun learned how to handle history with merges. As was already the case when cherry picking non-run commits, re-creating merges may result in conflicts, and rerun does not yet provide an interface to let the user handle these. (#2754)
  • The fsck method of AnnexRepo has been enhanced to expose more features of the underlying git fsck command. (#3693)
  • GitRepo now has a for_each_ref_ method that wraps git for-each-ref, which is used in various spots that used to rely on GitPython functionality. (#3705)
  • Do not pretend to be able to work in optimized (python -O) mode; crash early with an informative message. (#3803)
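
An early check for optimized mode, as mentioned in the last entry above, could look roughly like this. It is a minimal sketch: the function names and the error message are invented, and how DataLad performs the check is not stated in the entry. Python's -O flag strips assert statements, which is why `sys.flags.optimize` is inspected instead of relying on an assertion.

```python
import sys


def running_optimized():
    """Return True when the interpreter was started with -O or -OO."""
    return sys.flags.optimize > 0


def refuse_optimized_mode():
    # Hypothetical start-up check; the message text is invented for this sketch.
    if running_optimized():
        raise RuntimeError("optimized mode (python -O) is not supported")
```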

0.12.0rc5 (September 04, 2019) – .

Various fixes and enhancements that bring the 0.12.0 release closer.

Major refactoring and deprecations

  • The two modules below have a new home. The old locations still exist as compatibility shims and will be removed in a future release.

    • datalad.distribution.subdatasets has been moved to datalad.local.subdatasets (#3429)
    • datalad.interface.run has been moved to datalad.core.local.run (#3444)
  • The lock method of AnnexRepo and the options parameter of AnnexRepo.unlock were unused internally and have been removed. (#3459)

  • The get_submodules method of GitRepo has been rewritten without GitPython. When the new compat flag is true (the current default), the method returns a value that is compatible with the old return value. This backwards-compatible return value and the compat flag will be removed in a future release. (#3508)

  • The logic for resolving relative paths given to a command has changed (#3435). The new rule is that relative paths are taken as relative to the dataset only if a dataset instance is passed by the caller. In all other scenarios they’re considered relative to the current directory.

    The main user-visible difference from the command line is that using the --dataset argument does not result in relative paths being taken as relative to the specified dataset. (The undocumented distinction between “rel/path” and “./rel/path” no longer exists.)

    All commands under datalad.core and datalad.local, as well as unlock and addurls, follow the new logic. The goal is for all commands to eventually do so.

Fixes

  • The function for loading JSON streams wasn’t clever enough to handle content that included a Unicode line separator like U+2028. (#3524)
  • When unlock was called without an explicit target (i.e., a directory or no paths at all), the call failed if any of the files did not have content present. (#3459)
  • AnnexRepo.get_content_info failed in the rare case of a key without size information. (#3534)
  • save ignored --on-failure in its underlying call to status. (#3470)
  • Calling remove with a subdirectory displayed spurious warnings about the subdirectory files not existing. (#3586)
  • Our processing of git-annex --json output mishandled info messages from special remotes. (#3546)
  • create
    • didn’t bypass the “existing subdataset” check when called with --force as of 0.12.0rc3 (#3552)
    • failed to register the up-to-date revision of a subdataset when --cfg-proc was used with --dataset (#3591)
  • The base downloader had some error handling that wasn’t compatible with Python 3. (#3622)
  • Fixed a number of Unicode py2-compatibility issues. (#3602)
  • AnnexRepo.get_content_annexinfo did not properly chunk file arguments to avoid exceeding the command-line character limit. (#3587)
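
One way a line-based JSON stream reader can trip over a Unicode line separator, as in the first fix of the list above, is by relying on Python's str.splitlines(), which treats U+2028 as a line boundary. Whether this was the exact cause in DataLad is an assumption; the snippet only demonstrates the general pitfall with invented sample records.

```python
import json

record = {"msg": "first\u2028second"}

# ensure_ascii=False keeps U+2028 as a literal character in the output.
stream = (json.dumps(record, ensure_ascii=False) + "\n"
          + json.dumps({"msg": "ok"}) + "\n")

# str.splitlines() also splits on U+2028 and cuts the first record in two.
naive_lines = stream.splitlines()

# Splitting on "\n" only keeps each JSON document intact.
safe_lines = [line for line in stream.split("\n") if line]
decoded = [json.loads(line) for line in safe_lines]
```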

Enhancements and new features

  • New command create-sibling-gitlab provides an interface for creating a publication target on a GitLab instance. (#3447)
  • subdatasets (#3429)
    • now supports path-constrained queries in the same manner as commands like save and status
    • gained a --contains=PATH option that can be used to restrict the output to datasets that include a specific path.
    • now narrows the listed subdatasets to those underneath the current directory when called with no arguments
  • status learned to accept a plain --annex (no value) as shorthand for --annex basic. (#3534)
  • The .dirty property of GitRepo and AnnexRepo has been sped up. (#3460)
  • The get_content_info method of GitRepo, used by status and commands that depend on status, now restricts its git calls to a subset of files, if possible, for a performance gain in repositories with many files. (#3508)
  • Extensions that do not provide a command, such as those that provide only metadata extractors, are now supported. (#3531)
  • When calling git-annex with --json, we log standard error at the debug level rather than the warning level if a non-zero exit is expected behavior. (#3518)
  • create no longer refuses to create a new dataset in the odd scenario of an empty .git/ directory upstairs. (#3475)
  • As of v2.22.0 Git treats a sub-repository on an unborn branch as a repository rather than as a directory. Our documentation and tests have been updated appropriately. (#3476)
  • addurls learned to accept a --cfg-proc value and pass it to its create calls. (#3562)

0.12.0rc4 (May 15, 2019) – the revolution is over

With the replacement of the save command implementation with rev-save the revolution effort is now over, and the set of key commands for local dataset operations (create, run, save, status, diff) is now complete. This new core API is available from datalad.core.local (and also via datalad.api, as any other command).

Major refactoring and deprecations

  • The add command is now deprecated. It will be removed in a future release.

Fixes

  • Remove hard-coded dependencies on POSIX path conventions in SSH support code (#3400)
  • Emit an add result when adding a new subdataset during save (#3398)
  • SSH file transfer now actually opens a shared connection, if none exists yet (#3403)

Enhancements and new features

  • SSHConnection now offers methods for file upload and download (get(), put()). The previous copy() method only supported upload and was discontinued. (#3401)

0.12.0rc3 (May 07, 2019) – the revolution continues

 Continues API consolidation and replaces the create and diff commands with more performant implementations.

Major refactoring and deprecations

  • The previous diff command has been replaced by the diff variant from the datalad-revolution extension. (#3366)
  • rev-create has been renamed to create, and the previous create has been removed. (#3383)
  • The procedure setup_yoda_dataset has been renamed to cfg_yoda (#3353).
  • The --nosave of addurls now affects only added content, not newly created subdatasets (#3259).
  • Dataset.get_subdatasets (deprecated since v0.9.0) has been removed. (#3336)
  • The .is_dirty method of GitRepo and AnnexRepo has been replaced by .status or, for a subset of cases, the .dirty property. (#3330)
  • AnnexRepo.get_status has been replaced by AnnexRepo.status. (#3330)

Fixes

  • status
    • reported on directories that contained only ignored files (#3238)
    • gave a confusing failure when called from a subdataset with an explicitly specified dataset argument and “.” as a path (#3325)
    • misleadingly claimed that the locally present content size was zero when --annex basic was specified (#3378)
  • An informative error wasn’t given when a download provider was invalid. (#3258)
  • Calling rev-save PATH saved unspecified untracked subdatasets. (#3288)
  • The available choices for command-line options that take values are now displayed more consistently in the help output. (#3326)
  • The new pathlib-based code had various encoding issues on Python 2. (#3332)

Enhancements and new features

  • wtf now includes information about the Python version. (#3255)
  • When operating in an annex repository, checking whether git-annex is available is now delayed until a call to git-annex is actually needed, allowing systems without git-annex to operate on annex repositories in a restricted fashion. (#3274)
  • The load_stream helper now supports auto-detection of compressed files. (#3289)
  • create (formerly rev-create)
    • learned to be speedier by passing a path to status (#3294)
    • gained a --cfg-proc (or -c) convenience option for running configuration procedures (or more accurately any procedure that begins with “cfg_”) in the newly created dataset (#3353)
  • AnnexRepo.set_metadata now returns a list while AnnexRepo.set_metadata_ returns a generator, a behavior which is consistent with the add and add_ method pair. (#3298)
  • AnnexRepo.get_metadata now supports batch querying of known annex files. Note, however, that callers should carefully validate the input paths because the batch call will silently hang if given non-annex files. (#3364)
  • status
    • now reports a “bytesize” field for files tracked by Git (#3299)
    • gained a new option eval_subdataset_state that controls how the subdataset state is evaluated. Depending on the information you need, you can select a less expensive mode to make status faster. (#3324)
    • colors deleted files “red” (#3334)
  • Querying repository content is faster due to batching of git cat-file calls. (#3301)
  • The dataset ID of a subdataset is now recorded in the superdataset. (#3304)
  • GitRepo.diffstatus
    • now avoids subdataset recursion when the comparison is not with the working tree, which substantially improves performance when diffing large dataset hierarchies (#3314)
    • got smarter and faster about labeling a subdataset as “modified” (#3343)
  • GitRepo.get_content_info now supports disabling the file type evaluation, which gives a performance boost in cases where this information isn’t needed. (#3362)
  • The XMP metadata extractor now filters based on file name to improve its performance. (#3329)
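
The list/generator naming convention mentioned for set_metadata and set_metadata_ in the list above (and for the add and add_ pair) can be sketched generically. The function names and the shape of the yielded records below are hypothetical and do not reproduce AnnexRepo's methods.

```python
def set_values_(items):
    """Generator variant: yield one result per item as it is processed."""
    for item in items:
        yield {"file": item, "status": "ok"}


def set_values(items):
    """List variant: run the generator to completion and return all results."""
    return list(set_values_(items))
```

The trailing underscore marks the lazy variant, so a caller can process results one at a time or choose the plain name to get everything at once.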

0.12.0rc2 (Mar 18, 2019) – revolution!

Fixes

  • GitRepo.dirty does not report on nested empty directories (#3196).
  • GitRepo.save() reports results on deleted files.

Enhancements and new features

  • Absorb a new set of core commands from the datalad-revolution extension:
    • rev-status: like git status, but simpler and working with dataset hierarchies
    • rev-save: a 2-in-1 replacement for save and add
    • rev-create: a ~30% faster create
  • JSON support tools can now read and write compressed files.

0.12.0rc1 (Mar 03, 2019) – to boldly go …

Major refactoring and deprecations

  • Discontinued support for git-annex direct-mode (also no longer supported upstream).

Enhancements and new features

  • Dataset and Repo object instances are now hashable, and can be created based on pathlib Path object instances
  • Imported various additional methods for the Repo classes to query information and save changes.

0.11.8 (Oct 11, 2019) – annex-we-are-catching-up

Fixes

  • Our internal command runner failed to capture output in some cases. (#3656)
  • Work around, in the tests, an issue with CPython >= 3.7.5 in which a ‘;’ in the filename confuses mimetypes. (#3769) (#3770)

Enhancements and new features

  • Prepared for upstream changes in git-annex, including support for the latest git-annex
    • 7.20190912 auto-upgrades v5 repositories to v7. (#3648) (#3682)
    • 7.20191009 fixed treatment of (larger/smaller)than in .gitattributes (#3765)
  • The cfg_text2git procedure, as well as the --text-no-annex option of create, now configure .gitattributes so that empty files are stored in git rather than annex. (#3667)

0.11.7 (Sep 06, 2019) – python2-we-still-love-you-but-…

Primarily bugfixes with some optimizations and refactorings.

Fixes

  • addurls
    • now provides better handling when the URL file isn’t in the expected format. (#3579)
    • always considered a relative file for the URL file argument as relative to the current working directory, which goes against the convention used by other commands of taking relative paths as relative to the dataset argument. (#3582)
  • run-procedure
    • hard coded “python” when formatting the command for non-executable procedures ending with “.py”. sys.executable is now used. (#3624)
    • failed if arguments needed more complicated quoting than simply surrounding the value with double quotes. This has been resolved for systems that support shlex.quote, but note that on Windows values are left unquoted. (#3626)
  • siblings now displays an informative error message if a local path is given to --url but --name isn’t specified. (#3555)
  • sshrun, the command DataLad uses for GIT_SSH_COMMAND, didn’t support all the parameters that Git expects it to. (#3616)
  • Fixed a number of Unicode py2-compatibility issues. (#3597)
  • download-url now will create leading directories of the output path if they do not exist (#3646)
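
The two run-procedure fixes above (using sys.executable and quoting with shlex.quote) can be sketched as follows. The function name and argument layout are hypothetical and chosen for illustration; the actual DataLad code may be organized differently.

```python
import shlex
import sys


def format_procedure_call(script, args):
    """Build a shell command string for a non-executable ".py" procedure.

    Hypothetical helper for illustration only.
    """
    # Use the interpreter running this process instead of a hard-coded "python".
    parts = [sys.executable, script, *args]
    # shlex.quote handles spaces, quotes, and other shell metacharacters.
    # It targets POSIX shells, which is why Windows needs separate treatment.
    return " ".join(shlex.quote(p) for p in parts)
```

Quoting each part separately means an argument such as a file name with spaces or embedded double quotes survives the shell as a single value.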

Enhancements and new features

  • The annotate-paths helper now caches subdatasets it has seen to avoid unnecessary calls. (#3570)
  • A repeated configuration query has been dropped from the handling of --proc-pre and --proc-post. (#3576)
  • Calls to git annex find now use --in=. instead of the alias --in=here to take advantage of an optimization that git-annex (as of the current release, 7.20190730) applies only to the former. (#3574)
  • addurls now suggests close matches when the URL or file format contains an unknown field. (#3594)
  • Shared logic used in the setup.py files of DataLad and its extensions has been moved to modules in the _datalad_build_support/ directory. (#3600)
  • Get ready for upcoming git-annex dropping support for direct mode (#3631)

0.11.6 (Jul 30, 2019) – am I the last of 0.11.x?

Primarily bug fixes to achieve more robust performance

Fixes

  • Our tests needed various adjustments to keep up with upstream changes in Travis and Git. (#3479) (#3492) (#3493)
  • AnnexRepo.is_special_annex_remote was too selective in what it considered to be a special remote. (#3499)
  • We now provide information about unexpected output when git-annex is called with --json. (#3516)
  • Exception logging in the __del__ method of GitRepo and AnnexRepo no longer fails if the names it needs are no longer bound. (#3527)
  • addurls botched the construction of subdataset paths that were more than two levels deep and failed to create datasets in a reliable, breadth-first order. (#3561)
  • Cloning a type=git special remote showed a spurious warning about the remote not being enabled. (#3547)

Enhancements and new features

  • For calls to git and git-annex, we disable automatic garbage collection due to past issues with GitPython’s state becoming stale, but doing so results in a larger .git/objects/ directory that isn’t cleaned up until garbage collection is triggered outside of DataLad. Tests with the latest GitPython didn’t reveal any state issues, so we’ve re-enabled automatic garbage collection. (#3458)
  • rerun learned an --explicit flag, which it relays to its calls to run. This makes it possible to call rerun in a dirty working tree (#3498).
  • The metadata command aborts earlier if a metadata extractor is unavailable. (#3525)

0.11.5 (May 23, 2019) – stability is not overrated

Should be faster and less buggy, with a few enhancements.

Fixes

  • create-sibling (#3318)
    • Siblings are no longer configured with a post-update hook unless a web interface is requested with --ui.
    • git submodule update --init is no longer called from the post-update hook.
    • If --inherit is given for a dataset without a superdataset, a warning is now given instead of raising an error.
  • The internal command runner failed on Python 2 when its env argument had unicode values. (#3332)
  • The safeguard that prevents creating a dataset in a subdirectory that already contains tracked files for another repository failed on Git versions before 2.14. For older Git versions, we now warn the caller that the safeguard is not active. (#3347)
  • A regression introduced in v0.11.1 prevented save from committing changes under a subdirectory when the subdirectory was specified as a path argument. (#3106)
  • A workaround introduced in v0.11.1 made it possible for save to do a partial commit with an annex file that has gone below the annex.largefiles threshold. The logic of this workaround was faulty, leading to files being displayed as typechanged in the index following the commit. (#3365)
  • The resolve_path() helper confused paths that had a semicolon for SSH RIs. (#3425)
  • The detection of SSH RIs has been improved. (#3425)

Enhancements and new features

  • The internal command runner was too aggressive in its decision to sleep. (#3322)
  • The “INFO” label in log messages now retains the default text color for the terminal rather than using white, which only worked well for terminals with dark backgrounds. (#3334)
  • A short flag -R is now available for the --recursion-limit flag, a flag shared by several subcommands. (#3340)
  • The authentication logic for create-sibling-github has been revamped and now supports 2FA. (#3180)
  • New configuration option datalad.ui.progressbar can be used to configure the default backend for progress reporting (“none”, for example, results in no progress bars being shown). (#3396)
  • A new progress backend, available by setting datalad.ui.progressbar to “log”, replaces progress bars with a log message upon completion of an action. (#3396)
  • DataLad learned to consult the NO_COLOR environment variable and the new datalad.ui.color configuration option when deciding to color output. The default value, “auto”, retains the current behavior of coloring output if attached to a TTY (#3407).
  • clean now removes annex transfer directories, which is useful for cleaning up failed downloads. (#3374)
  • clone no longer refuses to clone into a local path that looks like a URL, making its behavior consistent with git clone. (#3425)
  • wtf
    • Learned to fall back to the dist package if platform.dist, which has been removed in the yet-to-be-released Python 3.8, does not exist. (#3439)
    • Gained a --section option for limiting the output to specific sections and a --decor option, which currently knows how to format the output as GitHub’s <details> section. (#3440)
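
The coloring decision described above (NO_COLOR plus the datalad.ui.color option with its “auto” default) might look roughly like the sketch below. The function is hypothetical; the “on” and “off” values, the precedence between the option and the environment variable, and treating any set NO_COLOR as a request to disable color are all assumptions, not confirmed DataLad behavior.

```python
import os
import sys


def use_color(setting="auto", stream=sys.stdout, environ=os.environ):
    """Decide whether to color output.

    `setting` stands in for the value of datalad.ui.color; the values
    "on" and "off" and the precedence order are assumptions.
    """
    if setting == "on":
        return True
    if setting == "off":
        return False
    # "auto": respect NO_COLOR, otherwise color only when attached to a TTY.
    if environ.get("NO_COLOR") is not None:
        return False
    return hasattr(stream, "isatty") and stream.isatty()
```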

0.11.4 (Mar 18, 2019) – get-ready

Largely a bug fix release with a few enhancements

Important

  • The 0.11.x series will be the last one to support the direct mode of git-annex, which is used on crippled (no symlinks and no locking) filesystems. v7 repositories should be used instead.

Fixes

  • Extraction of .gz files is broken without p7zip installed. We now abort with an informative error in this situation. (#3176)
  • Committing failed in some cases because we didn’t ensure that the path passed to git read-tree --index-output=... resided on the same filesystem as the repository. (#3181)
  • Some pointless warnings during metadata aggregation have been eliminated. (#3186)
  • With Python 3 the LORIS token authenticator did not properly decode a response (#3205).
  • With Python 3 downloaders unnecessarily decoded the response when getting the status, leading to an encoding error. (#3210)
  • In some cases, our internal command Runner did not adjust the environment’s PWD to match the current working directory specified with the cwd parameter. (#3215)
  • The specification of the pyliblzma dependency was broken. (#3220)
  • search displayed an uninformative blank log message in some cases. (#3222)
  • The logic for finding the location of the aggregate metadata DB anchored the search path incorrectly, leading to a spurious warning. (#3241)
  • Some progress bars were still displayed when stdout and stderr were not attached to a tty. (#3281)
  • Check for stdin/out/err to not be closed before checking for .isatty. (#3268)
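
To illustrate the PWD fix above, here is a minimal sketch of a runner that keeps the child's PWD environment variable in sync with the cwd it is started in. The run_in helper is hypothetical and is not DataLad's Runner class.

```python
import os
import subprocess


def run_in(cmd, cwd, env=None):
    """Run `cmd` in `cwd` and return its standard output.

    Hypothetical helper: PWD is set explicitly so that child processes
    which trust $PWD see the same directory the OS reports.
    """
    env = dict(os.environ if env is None else env)
    env["PWD"] = os.path.abspath(cwd)
    result = subprocess.run(
        cmd, cwd=cwd, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Without the explicit assignment, the child inherits the parent's PWD, which then disagrees with the directory it actually runs in.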

Enhancements and new features

  • Creating a new repository now aborts if any of the files in the directory are tracked by a repository in a parent directory. (#3211)
  • run learned to replace the {tmpdir} placeholder in commands with a temporary directory. (#3223)
  • duecredit support has been added for citing DataLad itself as well as datasets that an analysis uses. (#3184)
  • The eval_results interface helper unintentionally modified one of its arguments. (#3249)
  • A few DataLad constants have been added, changed, or renamed (#3250):
    • HANDLE_META_DIR is now DATALAD_DOTDIR. The old name should be considered deprecated.
    • METADATA_DIR now refers to DATALAD_DOTDIR/metadata rather than DATALAD_DOTDIR/meta (which is still available as OLDMETADATA_DIR).
    • The new DATASET_METADATA_FILE refers to METADATA_DIR/dataset.json.
    • The new DATASET_CONFIG_FILE refers to DATALAD_DOTDIR/config.
    • METADATA_FILENAME has been renamed to OLDMETADATA_FILENAME.

0.11.3 (Feb 19, 2019) – read-me-gently

Just a few important fixes and minor enhancements.

Fixes

  • The logic for setting the maximum command line length now works around Python 3.4 returning an unreasonably high value for SC_ARG_MAX on Debian systems. (#3165)
  • DataLad commands that are conceptually “read-only”, such as datalad ls -L, can fail when the caller lacks write permissions because git-annex tries merging remote git-annex branches to update information about availability. DataLad now disables annex.merge-annex-branches in some common “read-only” scenarios to avoid these failures. (#3164)

Enhancements and new features

  • Accessing an “unbound” dataset method now automatically imports the necessary module rather than requiring an explicit import from the Python caller. For example, calling Dataset.add no longer needs to be preceded by from datalad.distribution.add import Add or an import of datalad.api. (#3156)
  • Configuring the new variable datalad.ssh.identityfile instructs DataLad to pass a value to the -i option of ssh. (#3149) (#3168)

0.11.2 (Feb 07, 2019) – live-long-and-prosper

A variety of bugfixes and enhancements

Major refactoring and deprecations

  • All extracted metadata is now placed under git-annex by default. Previously files smaller than 20 kb were stored in git. (#3109)
  • The function datalad.cmd.get_runner has been removed. (#3104)

Fixes

  • Improved handling of long commands:
    • The code that inspected SC_ARG_MAX didn’t check that the reported value was a sensible, positive number. (#3025)
    • More commands that invoke git and git-annex with file arguments learned to split up the command calls when it is likely that the command would fail due to exceeding the maximum supported length. (#3138)
  • The setup_yoda_dataset procedure created a malformed .gitattributes line. (#3057)
  • download-url unnecessarily tried to infer the dataset when --no-save was given. (#3029)
  • rerun aborted too late and with a confusing message when a ref specified via --onto didn’t exist. (#3019)
  • run:
    • run didn’t preserve the current directory prefix (“./”) on inputs and outputs, which is problematic if the caller relies on this representation when formatting the command. (#3037)
    • Fixed a number of unicode py2-compatibility issues. (#3035) (#3046)
    • To proceed with a failed command, the user was confusingly instructed to use save instead of add even though run uses add underneath. (#3080)
  • Fixed a case where the helper class for checking external modules incorrectly reported a module as unknown. (#3051)
  • add-archive-content mishandled the archive path when the leading path contained a symlink. (#3058)
  • Following denied access, the credential code failed to consider a scenario, leading to a type error rather than an appropriate error message. (#3091)
  • Some tests failed when executed from a git worktree checkout of the source repository. (#3129)
  • During metadata extraction, batched annex processes weren’t properly terminated, leading to issues on Windows. (#3137)
  • add incorrectly handled an “invalid repository” exception when trying to add a submodule. (#3141)
  • Pass GIT_SSH_VARIANT=ssh to git processes to be able to specify alternative ports in SSH urls

Enhancements and new features

  • search learned to suggest closely matching keys if there are no hits. (#3089)
  • create-sibling
  • Interface classes can now override the default renderer for summarizing results. (#3061)
  • run:
    • --input and --output can now be shortened to -i and -o. (#3066)
    • Placeholders such as “{inputs}” are now expanded in the command that is shown in the commit message subject. (#3065)
    • interface.run.run_command gained an extra_inputs argument so that wrappers like datalad-container can specify additional inputs that aren’t considered when formatting the command string. (#3038)
    • “--” can now be used to separate options for run and those for the command in ambiguous cases. (#3119)
  • The utilities create_tree and ok_file_has_content now support “.gz” files. (#3049)
  • The Singularity container for 0.11.1 now uses nd_freeze to make its builds reproducible.
  • A publications page has been added to the documentation. (#3099)
  • GitRepo.set_gitattributes now accepts a mode argument that controls whether the .gitattributes file is appended to (default) or overwritten. (#3115)
  • datalad --help now avoids using man so that the list of subcommands is shown. (#3124)

0.11.1 (Nov 26, 2018) – v7-better-than-v6

Rushed out bugfix release to stay fully compatible with recent git-annex which introduced v7 to replace v6.

Fixes

  • install: be able to install recursively into a dataset (#2982)
  • save: be able to commit/save changes whenever files potentially could have swapped their storage between git and annex (#1651) (#2752) (#3009)
  • aggregate-metadata:
    • the dataset itself is no longer “aggregated” if specific paths are provided for aggregation (#3002). This resolves the issue of a -r invocation also aggregating all subdatasets of the specified dataset
    • also compare/verify the actual content checksum of aggregated metadata while considering subdataset metadata for re-aggregation (#3007)
  • annex commands are now chunked assuming a 50% “safety margin” on the maximal command line length. This should resolve crashes while operating on too many files at once (#3001)
  • run sidecar config processing (#2991)
  • no double trailing period in docs (#2984)
  • correct identification of the repository with symlinks in the paths in the tests (#2972)
  • re-evaluation of dataset properties in case of dataset changes (#2946)
  • text2git procedure to use ds.repo.set_gitattributes (#2974) (#2954)
  • Switch to use plain os.getcwd() if inconsistency with env var $PWD is detected (#2914)
  • Make sure that credential defined in env var takes precedence (#2960) (#2950)
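
The chunking of annex commands with a 50% safety margin, mentioned above, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the fallback limit of 2048 are hypothetical, and lengths are counted in characters rather than bytes.

```python
import os


def max_cmdline_length(default=2048):
    """Return the system's command line limit, or `default` if unknown."""
    try:
        value = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or the SC_ARG_MAX name is unavailable on this platform.
        return default
    # Guard against nonsensical (non-positive) values.
    return value if value > 0 else default


def chunk_files(base_cmd, files, limit=None, margin=0.5):
    """Yield lists of files so that each call stays within margin * limit."""
    budget = int((limit or max_cmdline_length()) * margin)
    # Each argument costs its length plus one separating space.
    used_by_cmd = sum(len(a) + 1 for a in base_cmd)
    chunk, used = [], used_by_cmd
    for f in files:
        cost = len(f) + 1
        if chunk and used + cost > budget:
            yield chunk
            chunk, used = [], used_by_cmd
        chunk.append(f)
        used += cost
    if chunk:
        yield chunk
```

Each yielded chunk would then be appended to base_cmd for one invocation, so a very long file list results in several shorter calls instead of one that exceeds the limit.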

Enhancements and new features

  • shub://datalad/datalad:git-annex-dev provides a Debian buster Singularity image with build environment for git-annex. tools/bisect-git-annex provides a helper for running git bisect on git-annex using that Singularity container (#2995)
  • Added .zenodo.json for better integration with Zenodo for citation
  • run-procedure now provides names and help messages with a custom renderer for (#2993)
  • Documentation: point to datalad-revolution extension (prototype of the greater DataLad future)
  • run
    • support injecting of a detached command (#2937)
  • The annex metadata extractor now extracts the annex.key metadata record, which should make it possible to identify uses of specific files etc. (#2952)
  • Test that we can install from http://datasets.datalad.org
  • Proper rendering of CommandError (e.g. in case of “out of space” error) (#2958)

0.11.0 (Oct 23, 2018) – Soon-to-be-perfect

git-annex 6.20180913 (or later) is now required - provides a number of fixes for v6 mode operations etc.

Major refactoring and deprecations

  • datalad.consts.LOCAL_CENTRAL_PATH constant was deprecated in favor of datalad.locations.default-dataset configuration variable (#2835)

Minor refactoring

  • "notneeded" messages are no longer reported by default results renderer
  • run no longer shows commit instructions upon command failure when explicit is true and no outputs are specified (#2922)
  • get_git_dir moved into GitRepo (#2886)
  • _gitpy_custom_call removed from GitRepo (#2894)
  • GitRepo.get_merge_base argument is now called commitishes instead of treeishes (#2903)

Fixes

  • update should not leave the dataset in non-clean state (#2858) and some other enhancements (#2859)
  • Fixed chunking of the long command lines to account for decorators and other arguments (#2864)
  • Progress bar should not crash the process on some missing progress information (#2891)
  • Default value for jobs set to be "auto" (not None) to take advantage of possible parallel get if in -g mode (#2861)
  • wtf must not crash if git-annex is not installed etc (#2865), (#2918), (#2917)
  • Fixed paths (with spaces etc) handling while reporting annex error output (#2892), (#2893)
  • __del__ should not access .repo but ._repo to avoid attempts for reinstantiation etc (#2901)
  • Fix up submodule .git right in GitRepo.add_submodule to avoid added submodules being non git-annex friendly (#2909), (#2904)
  • run-procedure (#2905)
    • now will provide dataset into the procedure if called within dataset
    • will not crash if procedure is an executable without .py or .sh suffixes
  • Use centralized .gitattributes handling while setting annex backend (#2912)
  • GlobbedPaths.expand(..., full=True) incorrectly returned relative paths when called more than once (#2921)

Enhancements and new features

  • Report progress on clone when installing from “smart” git servers (#2876)
  • Stale/unused sth_like_file_has_content was removed (#2860)
  • Enhancements to search to operate on “improved” metadata layouts (#2878)
  • Output of git annex init operation is now logged (#2881)
  • New
    • GitRepo.cherry_pick (#2900)
    • GitRepo.format_commit (#2902)
  • run-procedure (#2905)
    • procedures can now recursively be discovered in subdatasets as well. The uppermost has highest priority
    • Procedures in user and system locations now take precedence over those in datasets.

0.10.3.1 (Sep 13, 2018) – Nothing-is-perfect

Emergency bugfix to address forgotten boost of version in datalad/version.py.

0.10.3 (Sep 13, 2018) – Almost-perfect

This is largely a bugfix release which addressed many (but not yet all) issues of working with git-annex direct and version 6 modes, and operation on Windows in general. Among enhancements you will see the support of public S3 buckets (even with periods in their names), ability to configure new providers interactively, and improved egrep search backend.

Although this release does not require it, we recommend making sure that you are using a recent git-annex, since it has also received a variety of fixes and enhancements in the past months.

Fixes

  • Parsing of combined short options has been broken since DataLad v0.10.0. (#2710)
  • The datalad save instructions shown by datalad run for a command with a non-zero exit were incorrectly formatted. (#2692)
  • Decompression of zip files (e.g., through datalad add-archive-content) failed on Python 3. (#2702)
  • Windows:
    • colored log output was not being processed by colorama. (#2707)
    • more codepaths now try multiple times when removing a file to deal with latency and locking issues on Windows. (#2795)
  • Internal git fetch calls have been updated to work around a GitPython BadName issue. (#2712), (#2794)
  • The progress bar for annex file transferring was unable to handle an empty file. (#2717)
  • datalad add-readme halted when no aggregated metadata was found rather than displaying a warning. (#2731)
  • datalad rerun failed if --onto was specified and the history contained no run commits. (#2761)
  • Processing of a command’s results failed on a result record with a missing value (e.g., absent field or subfield in metadata). Now the missing value is rendered as “N/A”. (#2725).
  • A couple of documentation links in the “Delineation from related solutions” were misformatted. (#2773)
  • With the latest git-annex, several known V6 failures are no longer an issue. (#2777)
  • In direct mode, commit changes would often commit annexed content as regular Git files. A new approach fixes this and resolves a good number of known failures. (#2770)
  • The reporting of command results failed if the current working directory was removed (e.g., after an unsuccessful install). (#2788)
  • When installing into an existing empty directory, datalad install removed the directory after a failed clone. (#2788)
  • datalad run incorrectly handled inputs and outputs for paths with spaces and other characters that require shell escaping. (#2798)
  • Globbing inputs and outputs for datalad run didn’t work correctly if a subdataset wasn’t installed. (#2796)
  • Minor (in)compatibility with git 2.19 - (no) trailing period in an error message now. (#2815)

Enhancements and new features

  • Anonymous access is now supported for S3 and other downloaders. (#2708)
  • A new interface is available to ease setting up new providers. (#2708)
  • Metadata: changes to egrep mode search (#2735)
    • Queries in egrep mode are now case-sensitive when the query contains any uppercase letters and are case-insensitive otherwise. The new mode egrepcs can be used to perform a case-sensitive query with all lower-case letters.
    • Search can now be limited to a specific key.
    • Multiple queries (list of expressions) are evaluated using AND to determine whether something is a hit.
    • A single multi-field query (e.g., pa*:findme) is a hit, when any matching field matches the query.
    • All matching key/value combinations across all (multi-field) queries are reported in the query_matched result field.
    • egrep mode now shows all hits rather than limiting the results to the top 20 hits.
  • The documentation on how to format commands for datalad run has been improved. (#2703)
  • The method for determining the current working directory on Windows has been improved. (#2707)
  • datalad --version now simply shows the version without the license. (#2733)
  • datalad export-archive learned to export under an existing directory via its --filename option. (#2723)
  • datalad export-to-figshare now generates the zip archive in the root of the dataset unless --filename is specified. (#2723)
  • After importing datalad.api, help(datalad.api) (or datalad.api? in IPython) now shows a summary of the available DataLad commands. (#2728)
  • Support for using datalad from IPython has been improved. (#2722)
  • datalad wtf now returns structured data and reports the version of each extension. (#2741)
  • The internal handling of gitattributes information has been improved. A user-visible consequence is that datalad create --force no longer duplicates existing attributes. (#2744)
  • The “annex” metadata extractor can now be used even when no content is present. (#2724)
  • The add_url_to_file method (called by commands like datalad download-url and datalad add-archive-content) learned how to display a progress bar. (#2738)
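
The egrep rules listed above (case sensitivity triggered by uppercase letters, the egrepcs mode, and AND across multiple queries) can be illustrated with a small sketch. It matches against a single plain string and leaves out key restrictions and multi-field queries; the function names are hypothetical and this is not the actual search implementation.

```python
import re


def compile_query(query, mode="egrep"):
    """Compile one query following the case rule described above."""
    # Case-sensitive if requested via "egrepcs" or if the query contains
    # any uppercase letter. Note that this simple check also counts
    # uppercase regex escapes such as \S.
    case_sensitive = mode == "egrepcs" or any(c.isupper() for c in query)
    return re.compile(query, 0 if case_sensitive else re.IGNORECASE)


def is_hit(text, queries, mode="egrep"):
    """A record is a hit only if every query matches (logical AND)."""
    return all(compile_query(q, mode).search(text) for q in queries)
```

For example, is_hit("Mouse brain MRI", ["mri"]) matches in egrep mode, while the same all-lowercase query in egrepcs mode does not, because the text only contains the uppercase form.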

0.10.2 (Jul 09, 2018) – Thesecuriestever

Primarily a bugfix release to accommodate recent git-annex release forbidding file:// and http://localhost/ URLs which might lead to revealing private files if annex is publicly shared.

Fixes

  • fixed testing to be compatible with recent git-annex (6.20180626)
  • download-url will now download to current directory instead of the top of the dataset

Enhancements and new features

  • do not quote ~ in URLs to be consistent with quote implementation in Python 3.7 which now follows RFC 3986
  • run support for user-configured placeholder values
  • documentation on native git-annex metadata support
  • handle 401 errors from LORIS tokens
  • yoda procedure will instantiate README.md
  • --discover option added to run-procedure to list available procedures

0.10.1 (Jun 17, 2018) – OHBM polish

This is a minor bugfix release.

Fixes

  • Be able to use backports.lzma as a drop-in replacement for pyliblzma.
  • Give help when not specifying a procedure name in run-procedure.
  • Abort early when a downloader received no filename.
  • Avoid rerun error when trying to unlock non-available files.

0.10.0 (Jun 09, 2018) – The Release

This release is a major leap forward in metadata support.

Major refactoring and deprecations

  • Metadata
    • Prior metadata provided by datasets under .datalad/meta is no longer used or supported. Metadata must be reaggregated using version 0.10.
    • Metadata extractor types are no longer auto-guessed and must be explicitly specified in datalad.metadata.nativetype config (could contain multiple values)
    • Metadata aggregation of a dataset hierarchy no longer updates all datasets in the tree with new metadata. Instead, only the target dataset is updated. This behavior can be changed via the --update-mode switch. The new default prevents needless modification of (3rd-party) subdatasets.
    • Neuroimaging metadata support has been moved into a dedicated extension: https://github.com/datalad/datalad-neuroimaging
  • Crawler
  • export_tarball plugin has been generalized to export_archive and can now also generate ZIP archives.
  • By default a dataset X is now only considered to be a super-dataset of another dataset Y, if Y is also a registered subdataset of X.

Fixes

A number of fixes did not make it into the 0.9.x series:

  • Dynamic configuration overrides via the -c option were not in effect.
  • save is now more robust with respect to invocation in subdirectories of a dataset.
  • unlock now reports correct paths when running in a dataset subdirectory.
  • get is more robust to paths that contain symbolic links.
  • symlinks to subdatasets of a dataset are now correctly treated as a symlink, and not as a subdataset
  • add now correctly saves staged subdataset additions.
  • Running datalad save in a dataset no longer adds untracked content to the dataset. In order to add content a path has to be given, e.g. datalad save .
  • wtf now works reliably with a DataLad that wasn’t installed from Git (but, e.g., via pip)
  • More robust URL handling in simple_with_archives crawler pipeline.

Enhancements and new features

  • Support for DataLad extensions that can contribute API components from 3rd-party sources, incl. commands, metadata extractors, and test case implementations. See https://github.com/datalad/datalad-extension-template for a demo extension.
  • Metadata (everything has changed!)
    • Metadata extraction and aggregation is now supported for datasets and individual files.
    • Metadata query via search can now discover individual files.
    • Extracted metadata can now be stored in XZ compressed files, is optionally annexed (when exceeding a configurable size threshold), and obtained on demand (new configuration option datalad.metadata.create-aggregate-annex-limit).
    • Status and availability of aggregated metadata can now be reported via metadata --get-aggregates
    • New configuration option datalad.metadata.maxfieldsize to exclude too large metadata fields from aggregation.
    • The type of metadata is no longer guessed during metadata extraction. A new configuration option datalad.metadata.nativetype was introduced to enable one or more particular metadata extractors for a dataset.
    • New configuration option datalad.metadata.store-aggregate-content to enable the storage of aggregated metadata for dataset content (i.e. file-based metadata) in contrast to just metadata describing a dataset as a whole.
  • search was completely reimplemented. It offers three different modes now:
    • ‘egrep’ (default): expression matching in a plain string version of metadata
    • ‘textblob’: search a text version of all metadata using a fully featured query language (fast indexing, good for keyword search)
    • ‘autofield’: search an auto-generated index that preserves individual fields of metadata that can be represented in a tabular structure (substantial indexing cost, enables the most detailed queries of all modes)
  • New extensions:
    • addurls, an extension for creating a dataset (and possibly subdatasets) from a list of URLs.
    • export_to_figshare
    • extract_metadata
  • add_readme makes use of available metadata
  • By default the wtf extension now hides sensitive information, which can be included in the output by passing --sensitive=some or --sensitive=all.
  • Reduced startup latency by only importing commands necessary for a particular command line call.
  • create:
    • -d <parent> --nosave now registers subdatasets, when possible.
    • --fake-dates configures dataset to use fake-dates
  • run now provides a way for the caller to save the result when a command has a non-zero exit status.
  • datalad rerun now has a --script option that can be used to extract previous commands into a file.
  • A DataLad Singularity container is now available on Singularity Hub.
  • More casts have been embedded in the use case section of the documentation.
  • datalad --report-status has a new value ‘all’ that can be used to temporarily re-enable reporting that was disabled by configuration settings.

0.9.3 (Mar 16, 2018) – pi+0.02 release

Some important bug fixes which should improve usability

Fixes

  • datalad-archives special remote now will lock on acquiring or extracting an archive - this allows for it to be used with -J flag for parallel operation
  • Relaxed the demand, introduced in 0.9.2, that git be configured for DataLad operation; we now just issue a warning
  • datalad ls should now list “authored date” and work also for datasets in detached HEAD mode
  • datalad save will now save original file as well, if file was “git mv”ed, so you can now datalad run git mv old new and have changes recorded

Enhancements and new features

  • The --jobs argument can now take the value auto, which chooses the number of jobs based on the number of available CPUs. git-annex > 6.20180314 is recommended to avoid a regression with -J.
  • memoize calls to RI meta-constructor – should speed up operation a bit
  • DATALAD_SEED environment variable could be used to seed Python RNG and provide reproducible UUIDs etc (useful for testing and demos)
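
To show why seeding makes UUIDs reproducible, here is a minimal sketch that draws UUID bits from a seeded Python RNG. The helper names are hypothetical, and how DataLad actually wires DATALAD_SEED into its UUID generation is not stated in this change log.

```python
import os
import random
import uuid


def make_rng(environ=os.environ):
    """Return a Random instance seeded from DATALAD_SEED when it is set."""
    seed = environ.get("DATALAD_SEED")
    return random.Random(seed) if seed is not None else random.Random()


def seeded_uuid(rng):
    """Build a version-4 UUID from `rng` instead of the OS entropy source."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)
```

Two runs with the same seed value produce the same sequence of UUIDs, which is what makes tests and demos repeatable.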

0.9.2 (Mar 04, 2018) – it is (again) better than ever

Largely a bugfix release with a few enhancements.

Fixes

  • Execution of external commands (git) should not get stuck when there is a lot of both stdout and stderr output, and should not lose remaining output in some cases
  • Config overrides provided in the command line (-c) should now be handled correctly
  • Consider more remotes (not just tracking one, which might be none) while installing subdatasets
  • Compatibility with git 2.16 with some changed behaviors/annotations for submodules
  • Fail remove if annex drop failed
  • Do not fail operating on files which start with dash (-)
  • URL unquote paths within S3, URLs and DataLad RIs (///)
  • In non-interactive mode fail if authentication/access fails
  • Web UI:
    • refactored a little to fix incorrect listing of submodules in subdirectories
    • now auto-focuses on search edit box upon entering the page
  • Ensure that directories extracted from tarballs have the executable bit set

Enhancements and new features

  • A log message and progress bar will now inform if a tarball is to be downloaded while getting specific files (requires git-annex > 6.20180206)
  • A dedicated datalad rerun command capable of rerunning entire sequences of previously run commands. Reproducibility through VCS. Use run even if not interested in rerun
  • Alert the user if git is not yet configured but git operations are requested
  • Delay collection of previous ssh connections until it is actually needed. Also do not require ‘:’ while specifying ssh host
  • AutomagicIO: Added proxying of isfile, lzma.LZMAFile and io.open
  • Testing:
    • added DATALAD_DATASETS_TOPURL=http://datasets-tests.datalad.org to run tests against another website to not obscure access stats
    • tests run against temporary HOME to avoid side-effects
    • better unit-testing of interactions with special remotes
  • CONTRIBUTING.md describes how to set up and use the git-hub tool to “attach” commits to an issue, making it into a PR
  • DATALAD_USE_DEFAULT_GIT env variable could be used to cause DataLad to use default (not the one possibly bundled with git-annex) git
  • Be more robust while handling requests not supported by annex in special remotes
  • Use of swallow_logs in the code was refactored away – fewer mysteries now, just increase the logging level
  • wtf plugin will report more information about environment, externals and the system

0.9.1 (Oct 01, 2017) – “DATALAD!”(JBTM)

Minor bugfix release

Fixes

  • Should work correctly with subdatasets named as numbers or bool values (requires also GitPython >= 2.1.6)
  • Custom special remotes should work without crashing with git-annex >= 6.20170924

0.9.0 (Sep 19, 2017) – isn’t it a lucky day even though not a Friday?

Major refactoring and deprecations

  • the files argument of save has been renamed to path to be uniform with any other command
  • all major commands now implement more uniform API semantics and result reporting. Functionality for modification detection of dataset content has been completely replaced with a more efficient implementation
  • publish now features a --transfer-data switch that allows for an unambiguous specification of whether to publish data, independent of the selection of which datasets to publish (which is done via their paths). Moreover, publish now transfers data before repository content is pushed.

Fixes

  • drop no longer errors when some subdatasets are not installed
  • install will no longer report nothing when a Dataset instance was given as a source argument, but rather perform as expected
  • remove doesn’t remove when some files of a dataset could not be dropped
  • publish
    • no longer hides error during a repository push
    • publish behaves “correctly” for --since= in considering only the differences since the last “pushed” state
    • data transfer handling while publishing with dependencies, to github
  • improved robustness with broken Git configuration
  • search should search for unicode strings correctly and not crash
  • robustify git-annex special remotes protocol handling to allow for spaces in the last argument
  • UI credentials interface should now allow to Ctrl-C the entry
  • should not fail while operating on submodules named with numerics only or by bool (true/false) names
  • crawl templates should no longer override settings for largefiles if specified in .gitattributes

Enhancements and new features

  • Exciting new feature run command to protocol execution of an external command and rerun computation if desired. See screencast
  • save now uses Git for detecting which subdatasets need to be inspected for potential changes, instead of performing a complete traversal of a dataset tree
  • add looks for changes relative to the last committed state of a dataset to discover files to add more efficiently
  • diff can now report untracked files in addition to modified files
  • uninstall will check itself whether a subdataset is properly registered in a superdataset, even when no superdataset is given in a call
  • subdatasets can now configure subdatasets for exclusion from recursive installation (datalad-recursiveinstall submodule configuration property)
  • precrafted pipelines of crawl will now not override the annex.largefiles setting if any was set within .gitattributes (e.g. by datalad create --text-no-annex)
  • framework for screencasts: tools/cast* tools and sample cast scripts under doc/casts which are published at datalad.org/features.html
  • new project YouTube channel
  • tests failing in direct and/or v6 modes marked explicitly

0.8.1 (Aug 13, 2017) – the best birthday gift

Bugfixes

Fixes

  • Do not attempt to update a not installed sub-dataset
  • If too many files are specified for get or copy_to, we will make multiple invocations of the underlying git-annex command to avoid overflowing the command line
  • More robust handling of unicode output in terminals which might not support it
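
The idea of splitting an overly long file list across several invocations can be sketched as follows. This is an illustration under stated assumptions, not DataLad's code: chunk_paths is a hypothetical helper, and the limit of 40 characters is artificially small to keep the demo readable (real command-line limits are platform dependent and far larger).

```python
def chunk_paths(paths, max_chars):
    """Yield lists of paths whose joined length stays within max_chars.

    Each path is charged its length plus one separator character. A single
    path longer than max_chars is still yielded on its own.
    """
    chunk, used = [], 0
    for path in paths:
        cost = len(path) + 1
        if chunk and used + cost > max_chars:
            yield chunk
            chunk, used = [], 0
        chunk.append(path)
        used += cost
    if chunk:
        yield chunk


files = ["sub-%02d/anat.nii.gz" % i for i in range(1, 6)]
batches = list(chunk_paths(files, max_chars=40))
```

Each batch would then be passed to its own invocation of the underlying command, so that no single call exceeds the limit.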

Enhancements and new features

  • Ship a copy of numpy.testing to facilitate test without requiring numpy as a dependency. Also allow to pass to the command which test(s) to run
  • In get and copy_to provide actual original requested paths, not the ones we deduced need to be transferred, solely for knowing the total

0.8.0 (Jul 31, 2017) – it is better than ever

A variety of fixes and enhancements

Fixes

  • publish would now push merged git-annex branch even if no other changes were done
  • publish should be able to publish using relative path within SSH URI (git hook would use relative paths)
  • publish should better tolerate publishing to pure git and git-annex special remotes

Enhancements and new features

  • plugin mechanism came to replace export. See export_tarball for the replacement of export. Now it should be easy to extend datalad’s interface with custom functionality to be invoked along with other commands.
  • Minimalistic coloring of the results rendering
  • publish/copy_to got progress bar report now and support of --jobs
  • minor fixes and enhancements to crawler (e.g. support of recursive removes)

0.7.0 (Jun 25, 2017) – when it works - it is quite awesome!

New features, refactorings, and bug fixes.

Major refactoring and deprecations

Enhancements and new features

  • siblings can now be used to query and configure a local repository by using the sibling name here
  • siblings can now query and set annex preferred content configuration. This includes wanted (as previously supported in other commands), and now also required
  • New metadata command to interface with datasets/files meta-data
  • Documentation for all commands is now built in a uniform fashion
  • Significant parts of the documentation have been updated
  • Instantiate GitPython’s Repo instances lazily
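
Lazy instantiation, as mentioned for GitPython's Repo above, can be sketched with a cached property. The class LazyRepoHolder and the stand-in factory fake_repo are hypothetical and exist only for this illustration; in a real setting the factory could be something like git.Repo.

```python
class LazyRepoHolder:
    """Create the expensive repo object only on first access (hypothetical)."""

    def __init__(self, path, factory):
        self.path = path
        self._factory = factory  # e.g. git.Repo in a real setting
        self._repo = None

    @property
    def repo(self):
        if self._repo is None:
            self._repo = self._factory(self.path)
        return self._repo


created = []


def fake_repo(path):
    """Stand-in for an expensive constructor; records each call."""
    created.append(path)
    return {"path": path}


holder = LazyRepoHolder("/tmp/ds", fake_repo)
nothing_yet = list(created)   # the factory has not been called so far
repo_a = holder.repo          # first access triggers construction
repo_b = holder.repo          # later accesses reuse the same object
```

Code paths that never touch the repo attribute never pay the construction cost.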

Fixes

  • API documentation is now rendered properly as HTML, and is easier to browse by having more compact pages
  • Closed files left open on various occasions (Popen PIPEs, etc)
  • Restored basic (consumer mode of operation) compatibility with Windows OS

0.6.0 (Jun 14, 2017) – German perfectionism

This release includes a huge refactoring to make code base and functionality more robust and flexible

  • outputs from API commands could now be highly customized. See --output-format, --report-status, and --report-type options for datalad command.
  • effort was made to refactor code base so that underlying functions behave as generators where possible
  • input paths/arguments analysis was redone for majority of the commands to provide unified behavior

Major refactoring and deprecations

  • add-sibling and rewrite-urls were refactored in favor of new siblings command which should be used for siblings manipulations
  • ‘datalad.api.alwaysrender’ config setting/support is removed in favor of new outputs processing

Fixes

  • Do not flush manually git index in pre-commit to avoid “Death by the Lock” issue
  • The post-update hook script deployed by publish should now be more robust (tolerate directory names with spaces, etc.)
  • A variety of fixes, see list of pull requests and issues closed for more information

Enhancements and new features

  • new annotate-paths plumbing command to inspect and annotate provided paths. Use --modified to summarize changes between different points in the history
  • new clone plumbing command to provide a subset (install a single dataset from a URL) functionality of install
  • new diff plumbing command
  • new siblings command to list or manipulate siblings
  • new subdatasets command to list subdatasets and their properties
  • drop and remove commands were refactored
  • benchmarks/ collection of Airspeed velocity benchmarks initiated. See reports at http://datalad.github.io/datalad/
  • crawler would try to download a new url multiple times increasing delay between attempts. Helps to resolve problems with extended crawls of Amazon S3
  • CRCNS crawler pipeline now also fetches and aggregates meta-data for the datasets from datacite
  • overall optimisations to benefit from the aforementioned refactoring and improve user-experience
  • a few stub and not (yet) implemented commands (e.g. move) were removed from the interface
  • Web frontend got proper coloring for the breadcrumbs and some additional caching to speed up interactions. See http://datasets.datalad.org
  • Small improvements to the online documentation. See e.g. summary of differences between git/git-annex/datalad
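
The crawler's retry behavior described in this list (multiple download attempts with an increasing delay) follows a common pattern that can be sketched as below. The function fetch_with_retries, its parameters, the doubling factor, and the choice of OSError as the retryable error are all assumptions for illustration, not the crawler's actual settings.

```python
import time


def fetch_with_retries(fetch, url, attempts=4, first_delay=1.0, factor=2.0,
                       sleep=time.sleep):
    """Call fetch(url), retrying on OSError with a growing delay."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(delay)
            delay *= factor


# Demo with a fake downloader that fails twice, then succeeds.
calls = {"n": 0}
delays = []


def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("temporary failure")
    return "content of " + url


result = fetch_with_retries(flaky_fetch, "s3://bucket/key", sleep=delays.append)
```

Passing sleep as a parameter keeps the demo instant; in production use the default time.sleep would apply the real waits.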

0.5.1 (Mar 25, 2017) – cannot stop the progress

A bugfix release

Fixes

  • add was forcing addition of files to annex regardless of settings in .gitattributes. Now that decision is left to annex by default
  • tools/testing/run_doc_examples used to run doc examples as tests, fixed up to provide status per each example and not fail at once
  • doc/examples
  • progress bars
    • should no longer crash datalad and report correct sizes and speeds
    • should provide progress reports while using Python 3.x

Enhancements and new features

  • doc/examples
    • nipype_workshop_dataset.sh new example to demonstrate how new super- and sub- datasets were established as a part of our datasets collection

0.5.0 (Mar 20, 2017) – it’s huge

This release includes an avalanche of bug fixes, enhancements, and additions which at large should stay consistent with previous behavior but provide better functioning. Lots of code was refactored to provide more consistent code-base, and some API breakage has happened. Further work is ongoing to standardize output and results reporting (#1350)

Most notable changes

  • requires git-annex >= 6.20161210 (or better even >= 6.20161210 for improved functionality)
  • commands should now operate on paths specified (if any), without causing side-effects on other dirty/staged files
  • save
    • -a is deprecated in favor of -u or --all-updates, so only changes to known components get saved, and no new files are automagically added
    • -S no longer stores the originating dataset in its commit message
  • add
    • can specify commit/save message with -m
  • add-sibling and create-sibling
    • now take the name of the sibling (remote) as a -s (--name) option, not a positional argument
    • --publish-depends to set up publishing data and code to multiple repositories (e.g. github + webserver) should now be functional; see this comment
    • got --publish-by-default to specify what refs should be published by default
    • got --annex-wanted, --annex-groupwanted and --annex-group settings which would be used to instruct annex about preferred content. publish then will publish data using those settings if wanted is set.
    • got --inherit option to automagically figure out url/wanted and other git/annex settings for new remote sub-dataset to be constructed
  • publish
    • got --skip-failing refactored into --missing option which could use new feature of create-sibling --inherit

Fixes

  • More consistent interaction through ssh - all ssh connections go through sshrun shim for a “single point of authentication”, etc.
  • More robust ls operation outside of the datasets
  • A number of fixes for direct and v6 mode of annex

Enhancements and new features

  • New drop and remove commands
  • clean
    • got --what to specify explicitly what cleaning steps to perform and now could be invoked with -r
  • datalad and git-annex-remote* scripts now do not use setuptools entry points mechanism and rely on simple import to shorten start up time
  • Dataset is also now using Flyweight pattern, so the same instance is reused for the same dataset
  • progressbars should not add more empty lines
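
The Flyweight pattern mentioned above for Dataset means that constructing an object for the same dataset path returns the already existing instance. Here is a minimal sketch of the pattern using a metaclass. The toy Dataset class and the choice of the absolute path as the identity key are assumptions for illustration and do not reproduce DataLad's implementation.

```python
import os
import weakref


class Flyweight(type):
    """Metaclass that hands back an existing instance for an equal path."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Weak references let unused instances be garbage collected.
        cls._instances = weakref.WeakValueDictionary()

    def __call__(cls, path):
        key = os.path.abspath(path)
        instance = cls._instances.get(key)
        if instance is None:
            instance = super().__call__(key)
            cls._instances[key] = instance
        return instance


class Dataset(metaclass=Flyweight):
    """Toy stand-in, not DataLad's Dataset class."""

    def __init__(self, path):
        self.path = path


a = Dataset("data/study")
b = Dataset("./data/study")   # same location, spelled differently
c = Dataset("data/other")
```

Here a and b are the same object because both paths normalize to the same absolute path, while c is a distinct instance.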

Internal refactoring

  • Majority of the commands now go through _prep for arguments validation and pre-processing to avoid recursive invocations

0.4.1 (Nov 10, 2016) – CA release

Now requires GitPython >= 2.1.0

Fixes

  • save
    • to not save staged files if explicit paths were provided
  • improved (but not yet complete) support for direct mode
  • update to not crash if some sub-datasets are not installed
  • do not log calls to git config to avoid leakage of possibly sensitive settings to the logs

Enhancements and new features

  • New rfc822-compliant metadata format
  • save
    • -S to save the change also within all super-datasets
  • add now has progress-bar reporting
  • create-sibling-github to create a sibling of a dataset on github
  • OpenfMRI crawler and datasets were enriched with URLs to separate files where also available from openfmri s3 bucket (if upgrading your datalad datasets, you might need to run git annex enableremote datalad to make them available)
  • various enhancements to log messages
  • web interface
    • populates “install” box first thus making UX better over slower connections

0.4 (Oct 22, 2016) – Paris is waiting

Primarily it is a bugfix release but because of significant refactoring of the install and get implementation, it gets a new minor release.

Fixes

  • be able to get or install while providing paths while being outside of a dataset
  • remote annex datasets get properly initialized
  • robust detection of outdated git-annex

Enhancements and new features

  • interface changes
    • get --recursion-limit=existing to not recurse into not-installed subdatasets
    • get -n to possibly install sub-datasets without getting any data
    • install --jobs|-J to specify the number of parallel jobs the annex get call could use (ATM would not work when data comes from archives)
  • more (unit-)testing
  • documentation: see http://docs.datalad.org/en/latest/basics.html for basic principles and useful shortcuts in referring to datasets
  • various webface improvements: breadcrumb paths, instructions how to install dataset, show version from the tags, etc.

0.3.1 (Oct 1, 2016) – what a wonderful week

Primarily bugfixes but also a number of enhancements and core refactorings

Fixes

  • do not build manpages and examples during installation to avoid problems with possibly previously outdated dependencies
  • install can be called on already installed dataset (with -r or -g)

Enhancements and new features

  • complete overhaul of datalad configuration settings handling (see Configuration documentation), so the majority of the environment variables we have used were renamed to match configuration names. It now uses the git config format and stores persistent configuration settings under .datalad/config and local ones within .git/config
  • create-sibling no longer uploads the web front-end by default
  • export command with a plug-in interface and tarball plugin to export datasets
  • in Python, .api functions with rendering of results in command line got a _-suffixed sibling, which would render results in Python as well (e.g., using search_ instead of search would also render results, not only output them back as Python objects)
  • get
    • --jobs option (passed to annex get) for parallel downloads
    • total and per-download (with git-annex >= 6.20160923) progress bars (note that if content is to be obtained from an archive, no progress will be reported yet)
  • install --reckless mode option
  • search
    • highlights locations and fieldmaps for better readability
    • supports -d^ or -d/// to point to top-most or centrally installed meta-datasets
    • “complete” paths to the datasets are reported now
    • -s option to specify which fields (only) to search
  • various enhancements and small fixes to meta-data handling, ls, custom remotes, code-base formatting, downloaders, etc
  • completely switched to tqdm library (progressbar is no longer used/supported)

0.3 (Sep 23, 2016) – winter is coming

Lots of everything, including but not limited to

0.2.3 (Jun 28, 2016) – busy OHBM

New features and bugfix release

0.2.2 (Jun 20, 2016) – OHBM we are coming!

New feature and bugfix release

  • greatly improved documentation
  • publish command API RFing allows for custom options to annex, and uses --to REMOTE for consistency with annex invocation
  • variety of fixes and enhancements throughout

0.2.1 (Jun 10, 2016)

  • variety of fixes and enhancements throughout

0.2 (May 20, 2016)

Major RFing to switch from relying on rdf to git native submodules etc

0.1 (Oct 14, 2015)

Release primarily focusing on interface functionality including initial publishing