datalad download

Synopsis

datalad download [-h] [-d DATASET] [--force {overwrite-existing}] [--credential NAME] [--hash ALGORITHM] [--version] <path>|<url>|<url-path-pair> [<path>|<url>|<url-path-pair> ...]

Description

Download from URLs

This command is the front-end to an extensible framework for performing downloads from a variety of URL schemes. Built-in support for the schemes 'http', 'https', 'file', and 'ssh' is provided. Extension packages may add additional support.

In contrast to other downloader tools, this command integrates with the DataLad credential management and is able to auto-discover credentials. If no credential is available, it automatically prompts for one, and offers to store it for reuse after a successful authentication.

Simultaneous hashing (checksumming) of downloaded content is supported with user-specified algorithms.
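The idea of hashing while downloading can be illustrated with a minimal sketch. It is not the command's actual implementation: the function name copy_with_hashes, the chunk size, and the in-memory streams standing in for a network response and a target file are all assumptions made for illustration. Each chunk is written to the target and fed to one hasher per requested algorithm, so the content is read only once.

```python
import hashlib
import io


def copy_with_hashes(src, dst, algorithms, chunk_size=65536):
    """Copy src to dst in chunks, updating one hasher per algorithm.

    Hypothetical helper for illustration; not part of DataLad.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        # Every hasher sees the same bytes that were just written
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# In-memory stand-ins for a download source and a target file
payload = b"example payload"
source = io.BytesIO(payload)
target = io.BytesIO()
digests = copy_with_hashes(source, target, ["sha256", "sha512"])
```

The returned mapping has one hexdigest per algorithm name, similar in spirit to the digests the command reports in its result records.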

The command can process any number of downloads (serially). It can read download specifications from (command line) arguments, files, or STDIN. It can deposit downloads to individual files, or stream to STDOUT.

Implementation and extensibility

Each URL scheme is processed by a dedicated handler. Additional schemes can be supported by sub-classing datalad_next.url_operations.UrlOperations and implementing the download() method. Extension packages can register new handlers by patching them into the datalad_next.download._urlscheme_handlers registry dict.
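The general pattern of a per-scheme handler registry can be sketched as follows. This is a self-contained illustration, not DataLad code: BaseHandler, DemoHandler, scheme_handlers, get_handler, and the 'demo' scheme are hypothetical names, and the real UrlOperations class and its download() signature may differ.

```python
from urllib.parse import urlparse


class BaseHandler:
    """Hypothetical stand-in for a handler base class."""

    def download(self, url, target):
        raise NotImplementedError


class DemoHandler(BaseHandler):
    """Handler for a made-up 'demo' scheme; only reports what was requested."""

    def download(self, url, target):
        return {"url": url, "path": target, "status": "ok"}


# Illustrative registry dict: scheme name -> handler class
scheme_handlers = {}
# An extension would add its handler under the scheme it supports
scheme_handlers["demo"] = DemoHandler


def get_handler(url, registry):
    """Pick a handler instance based on the scheme of the URL."""
    scheme = urlparse(url).scheme
    if scheme not in registry:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    return registry[scheme]()


url = "demo://host/data.xml"
result = get_handler(url, scheme_handlers).download(url, "data.xml")
```

Keeping the mapping in a plain dict means that supporting a new scheme only requires adding one entry; the dispatch code itself stays unchanged.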

Examples

Download webpage to "myfile.txt":

% datalad download "http://example.com myfile.txt"

Read download specification from STDIN (e.g. JSON-lines):

% datalad download -

Simultaneously hash download, hexdigest reported in result record:

% datalad download --hash sha256 http://example.com/data.xml

Download from SSH server:

% datalad download "ssh://example.com/home/user/data.xml"

Stream a download to STDOUT:

% datalad -f disabled download "http://example.com -"

Options

<path>|<url>|<url-path-pair>

Download sources and targets can be given in a variety of formats: as a URL, or as a URL-path-pair that maps a source URL to a dedicated download target path. Any number of URLs or URL-path-pairs can be provided, either as an argument list, or read from a file (one item per line). Such a specification input file can be given as a path to an existing file (as a single value, not as part of a URL-path-pair). When the special path identifier '-' is used, the download is written to STDOUT. A specification can also be read in JSON-lines encoding (each line being a string with a URL or an object mapping a URL-string to a path-string).
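To make the accepted line formats concrete, here is a rough sketch of how a single specification line could be interpreted. It is not the command's own parser: parse_spec_line is a hypothetical helper, and splitting a plain-text URL-path-pair on the first space is an assumption based on the examples above.

```python
import json


def parse_spec_line(line):
    """Return (url, path) for one spec line; path is None if not given.

    Hypothetical helper for illustration; not part of DataLad.
    """
    line = line.strip()
    try:
        item = json.loads(line)
    except json.JSONDecodeError:
        item = None
    # JSON-lines: a bare string is a URL without a target path
    if isinstance(item, str):
        return item, None
    # JSON-lines: an object maps one URL-string to one path-string
    if isinstance(item, dict) and len(item) == 1:
        (url, path), = item.items()
        return url, path
    # Plain text: "<url>" or "<url> <path>" (assumed split on first space)
    url, _, path = line.partition(" ")
    return url, (path.strip() or None)


pairs = [
    parse_spec_line("http://example.com myfile.txt"),
    parse_spec_line('"http://example.com"'),
    parse_spec_line('{"http://example.com": "myfile.txt"}'),
    parse_spec_line("http://example.com -"),
]
```

In this sketch a path of '-' is passed through unchanged; treating it as "write to STDOUT" would be the job of whatever consumes the parsed pairs.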

-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-d DATASET, --dataset DATASET

Dataset to be used as a configuration source. Beyond reading configuration items, this command does not interact with the dataset.

--force {overwrite-existing}

By default, a target path for a download must not exist yet. 'overwrite-existing' disables this check.

--credential NAME

Name of a credential to be used for authorization. If no credential is specified, the last-used credential for the authentication realm associated with the download target will be used. If no credential is available yet, the command prompts for one. After a new credential has been used successfully, the command offers to save it.

--hash ALGORITHM

Name of a hashing algorithm supported by the Python 'hashlib' module, e.g. 'md5' or 'sha256'. This option can be given more than once.
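Which algorithm names are usable depends on the Python installation. A small sketch for checking names up front with the standard hashlib module follows; check_algorithms is a hypothetical helper, and whether the command performs a comparable check is not stated here.

```python
import hashlib


def check_algorithms(names):
    """Return the names that hashlib cannot provide on this interpreter."""
    return [name for name in names if name not in hashlib.algorithms_available]


# 'no-such-algorithm' is a deliberately invalid name for demonstration
unsupported = check_algorithms(["sha256", "no-such-algorithm"])
```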

--version

show the module and its version which provides the command

Authors

datalad is developed by The DataLad Team and Contributors <team@datalad.org>.