datalad_next.url_operations.UrlOperations

class datalad_next.url_operations.UrlOperations(*, cfg: ConfigManager | None = None)

Bases: object

Abstraction for operations on URLs

Support for specific URL schemes can be implemented via sub-classes. Such classes must comply with the following conditions:

  • Any configuration look-up must be performed with the self.cfg property, which is guaranteed to be a ConfigManager instance.

  • When downloads are to be supported, implement the download() method and comply with the behavior described in its documentation.

This class provides a range of helper methods to aid computation of hashes and progress reporting.
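The two conditions above can be illustrated with a minimal sketch. It deliberately does not import datalad-next: `StandInUrlOperations` is a local stand-in for the base class, a plain dict plays the role of the ConfigManager, and `FileUrlOperations` and the configuration key `example.chunk-size` are hypothetical names chosen for illustration. The `hash` and `timeout` arguments are accepted but ignored here, so this is not a fully compliant implementation.

```python
from __future__ import annotations

import shutil
import sys
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


class StandInUrlOperations:
    """Local stand-in for UrlOperations, so the sketch runs without DataLad."""

    def __init__(self, *, cfg: dict | None = None):
        # The real class guarantees a ConfigManager; a dict plays that role here.
        self._cfg = cfg if cfg is not None else {}

    @property
    def cfg(self) -> dict:
        return self._cfg


class FileUrlOperations(StandInUrlOperations):
    """Hypothetical handler for file:// URLs."""

    def download(self, from_url: str, to_path: Path | None, *,
                 credential: str | None = None,
                 hash: list[str] | None = None,
                 timeout: float | None = None) -> dict:
        # Condition 1: configuration is looked up via self.cfg only.
        # The key name is made up for this example.
        chunk_size = int(self.cfg.get("example.chunk-size", 65536))
        src = Path(url2pathname(urlparse(from_url).path))
        with src.open("rb") as fsrc:
            if to_path is None:
                # No target path: stream the data to stdout.
                shutil.copyfileobj(fsrc, sys.stdout.buffer, chunk_size)
            else:
                # Any existing file at to_path is overwritten.
                with to_path.open("wb") as fdst:
                    shutil.copyfileobj(fsrc, fdst, chunk_size)
        # The documented return value is a mapping of properties;
        # this sketch reports none.
        return {}
```

A real subclass would derive from UrlOperations itself and would also honor `hash` and `timeout` as described for download().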

property cfg: ConfigManager

delete(url: str, *, credential: str | None = None, timeout: float | None = None) -> Dict

Delete a resource identified by a URL

Parameters:
  • url (str) -- Valid URL with any scheme supported by a particular implementation.

  • credential (str, optional) -- The name of a dedicated credential to be used for authentication in order to perform the deletion. Particular implementations may or may not require or support authentication. They also may or may not support automatic credential lookup.

  • timeout (float, optional) -- If given, specifies a timeout in seconds. If the operation does not complete within this time, a TimeoutError is raised. If timeout is None, the operation never times out.

Returns:

A mapping of property names to values for the deletion.

Return type:

dict

Raises:
  • UrlOperationsRemoteError -- This exception is raised on any deletion-related error on the remote side, with a summary of the underlying issues as its message. It may carry a status code (e.g. HTTP status code) as its status_code property. Any underlying exception must be linked via the __cause__ property (e.g. raise UrlOperationsRemoteError(...) from ...).

  • UrlOperationsInteractionError, UrlOperationsAuthenticationError, UrlOperationsAuthorizationError, UrlOperationsResourceUnknown -- Implementations that can distinguish remote error types more finely than a general UrlOperationsRemoteError may raise one of these instead: UrlOperationsInteractionError for general issues in communicating with the remote side; UrlOperationsAuthenticationError for errors related to (failed) authentication at the remote; UrlOperationsAuthorizationError for (lack of) authorization to access a particular resource or perform a particular operation; UrlOperationsResourceUnknown if the target of an operation does not exist.

  • TimeoutError -- If timeout is given and the operation does not complete within the number of seconds specified by timeout.
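The exception-chaining requirement (linking the underlying exception via `__cause__`) can be sketched as follows. The exception classes below are local stand-ins so that the example runs without datalad-next; their constructor signature, the subclass relationship, the `delete_local` helper, and the in-memory "remote" are all assumptions made for illustration.

```python
from __future__ import annotations


class UrlOperationsRemoteError(Exception):
    """Stand-in for the documented exception (not the datalad-next class)."""

    def __init__(self, url: str, message: str, status_code: int | None = None):
        super().__init__(message)
        self.url = url
        self.status_code = status_code


class UrlOperationsResourceUnknown(UrlOperationsRemoteError):
    """Stand-in: the target of an operation does not exist."""


def delete_local(url: str, store: dict[str, bytes]) -> dict:
    """Delete `url` from an in-memory dict that plays the remote side."""
    try:
        del store[url]
    except KeyError as e:
        # `from e` sets __cause__ on the new exception, which is the
        # linkage the documentation asks for.
        raise UrlOperationsResourceUnknown(
            url, f"no such resource: {url}", status_code=404
        ) from e
    return {}
```

A caller that catches the general UrlOperationsRemoteError can then inspect `exc.__cause__` and `exc.status_code` to learn more about the failure.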

download(from_url: str, to_path: Path | None, *, credential: str | None = None, hash: list[str] | None = None, timeout: float | None = None) -> Dict

Download from a URL to a local file or stream to stdout

Parameters:
  • from_url (str) -- Valid URL with any scheme supported by a particular implementation.

  • to_path (Path or None) -- A local platform-native path or None. If None the downloaded data is written to stdout, otherwise it is written to a file at the given path. The path is assumed to not exist. Any existing file will be overwritten.

  • credential (str, optional) -- The name of a dedicated credential to be used for authentication in order to perform the download. Particular implementations may or may not require or support authentication. They also may or may not support automatic credential lookup.

  • hash (list(algorithm_names), optional) -- If given, must be a list of hash algorithm names supported by the hashlib module. A corresponding hash will be computed simultaneously with the download (without reading the data twice), and included in the return value.

  • timeout (float, optional) -- If given, specifies a timeout in seconds. If the operation does not complete within this time, a TimeoutError is raised. If timeout is None, the operation never times out.

Returns:

A mapping of property names to values for the completed download. If hash algorithm names are provided, a corresponding key for each algorithm is included in this mapping, with the hexdigest of the corresponding checksum as the value.
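Computing hashes while the data is transferred, without a second read, might look like the sketch below. `copy_with_hashes` is a hypothetical helper written against hashlib only; it is not one of the helper methods this class provides, and the chunk size is an arbitrary choice.

```python
from __future__ import annotations

import hashlib
from typing import BinaryIO


def copy_with_hashes(src: BinaryIO, dst: BinaryIO,
                     hash_names: list[str] | None = None,
                     chunk_size: int = 65536) -> dict:
    """Copy src to dst in chunks, feeding every chunk to all hashers."""
    hashers = {name: hashlib.new(name) for name in (hash_names or [])}
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        # Each chunk is hashed as it passes through, so the data is read once.
        for hasher in hashers.values():
            hasher.update(chunk)
    # One key per algorithm name, with the hexdigest as the value.
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Called with `hash_names=["sha256"]`, the returned mapping contains a `"sha256"` key holding the hexdigest of the copied bytes, which mirrors the return value described above.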

Return type:

dict

Raises:
  • UrlOperationsRemoteError -- This exception is raised on any download-related error on the remote side, with a summary of the underlying issues as its message. It may carry a status code (e.g. HTTP status code) as its status_code property. Any underlying exception must be linked via the __cause__ property (e.g. raise UrlOperationsRemoteError(...) from ...).

  • UrlOperationsInteractionError, UrlOperationsAuthenticationError, UrlOperationsAuthorizationError, UrlOperationsResourceUnknown -- Implementations that can distinguish remote error types more finely than a general UrlOperationsRemoteError may raise one of these instead: UrlOperationsInteractionError for general issues in communicating with the remote side; UrlOperationsAuthenticationError for errors related to (failed) authentication at the remote; UrlOperationsAuthorizationError for (lack of) authorization to access a particular resource or perform a particular operation; UrlOperationsResourceUnknown if the target of an operation does not exist.

  • TimeoutError -- If timeout is given and the operation does not complete within the number of seconds specified by timeout.

stat(url: str, *, credential: str | None = None, timeout: float | None = None) -> Dict

Gather information on a URL target, without downloading it

Returns:

A mapping of property names to values of the URL target. The particular composition of properties depends on the specific URL. A standard property is 'content-length', indicating the size of a download.
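Because the set of properties depends on the URL, code that consumes the mapping should not assume that 'content-length' is present. The sketch below shows one defensive way to use it for progress reporting; `download_fraction` is a hypothetical helper, and the `int()` conversion reflects the assumption that the value may arrive as an integer or as a numeric string.

```python
from __future__ import annotations


def download_fraction(props: dict, bytes_done: int) -> float | None:
    """Return progress in [0, 1], or None if the total size is unknown."""
    size = props.get("content-length")
    if size is None:
        return None
    size = int(size)
    if size <= 0:
        return None
    # Clamp in case more bytes arrive than were announced.
    return min(bytes_done / size, 1.0)
```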

Return type:

dict

Raises:
  • UrlOperationsRemoteError -- This exception is raised on any access-related error on the remote side, with a summary of the underlying issues as its message. It may carry a status code (e.g. HTTP status code) as its status_code property. Any underlying exception must be linked via the __cause__ property (e.g. raise UrlOperationsRemoteError(...) from ...).

  • UrlOperationsInteractionError, UrlOperationsAuthenticationError, UrlOperationsAuthorizationError, UrlOperationsResourceUnknown -- Implementations that can distinguish remote error types more finely than a general UrlOperationsRemoteError may raise one of these instead: UrlOperationsInteractionError for general issues in communicating with the remote side; UrlOperationsAuthenticationError for errors related to (failed) authentication at the remote; UrlOperationsAuthorizationError for (lack of) authorization to access a particular resource or perform a particular operation; UrlOperationsResourceUnknown if the target of an operation does not exist.

  • TimeoutError -- If timeout is given and the operation does not complete within the number of seconds specified by timeout.

upload(from_path: Path | None, to_url: str, *, credential: str | None = None, hash: list[str] | None = None, timeout: float | None = None) -> Dict

Upload from a local file or stream to a URL

Parameters:
  • from_path (Path or None) -- A local platform-native path or None. If None the upload data is read from stdin, otherwise it is read from a file at the given path.

  • to_url (str) -- Valid URL with any scheme supported by a particular implementation. The target is assumed to not conflict with existing content, and may be overwritten.

  • credential (str, optional) -- The name of a dedicated credential to be used for authentication in order to perform the upload. Particular implementations may or may not require or support authentication. They also may or may not support automatic credential lookup.

  • hash (list(algorithm_names), optional) -- If given, must be a list of hash algorithm names supported by the hashlib module. A corresponding hash will be computed simultaneously with the upload (without reading the data twice), and included in the return value.

  • timeout (float, optional) -- If given, specifies a timeout in seconds. If the operation does not complete within this time, a TimeoutError is raised. If timeout is None, the operation never times out.

Returns:

A mapping of property names to values for the completed upload. If hash algorithm names are provided, a corresponding key for each algorithm is included in this mapping, with the hexdigest of the corresponding checksum as the value.

Return type:

dict

Raises:
  • FileNotFoundError -- If the source file cannot be found.

  • UrlOperationsRemoteError -- This exception is raised on any upload-related error on the remote side, with a summary of the underlying issues as its message. It may carry a status code (e.g. HTTP status code) as its status_code property. Any underlying exception must be linked via the __cause__ property (e.g. raise UrlOperationsRemoteError(...) from ...).

  • UrlOperationsInteractionError, UrlOperationsAuthenticationError, UrlOperationsAuthorizationError, UrlOperationsResourceUnknown -- Implementations that can distinguish remote error types more finely than a general UrlOperationsRemoteError may raise one of these instead: UrlOperationsInteractionError for general issues in communicating with the remote side; UrlOperationsAuthenticationError for errors related to (failed) authentication at the remote; UrlOperationsAuthorizationError for (lack of) authorization to access a particular resource or perform a particular operation; UrlOperationsResourceUnknown if the target of an operation does not exist.

  • TimeoutError -- If timeout is given and the operation does not complete within the number of seconds specified by timeout.
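The from_path convention for upload() (a path, or None for stdin) could be handled in an implementation along these lines. `upload_source` is a hypothetical helper, not part of datalad-next; it relies on Path.open() to raise the documented FileNotFoundError for a missing source file.

```python
from __future__ import annotations

import sys
from contextlib import contextmanager
from pathlib import Path
from typing import BinaryIO, Iterator


@contextmanager
def upload_source(from_path: Path | None) -> Iterator[BinaryIO]:
    """Yield a binary stream to upload from: a file, or stdin if None."""
    if from_path is None:
        # stdin is owned by the process, so it is not closed here.
        yield sys.stdin.buffer
    else:
        # Path.open raises FileNotFoundError if the source does not exist.
        with from_path.open("rb") as f:
            yield f
```

Wrapping both cases in one context manager lets the transfer code read from the stream the same way regardless of where the data comes from.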