datalad_next.types.ArchivistLocator

class datalad_next.types.ArchivistLocator(akey: AnnexKey, member: PurePosixPath, size: int | None = None, atype: ArchiveType | None = None)[source]

Bases: object

Representation of a dl+archive: archive member locator

These locators are used by the datalad-archives and archivist git-annex special remotes. They identify a member of an archive that is itself identified by an annex key.

Each member can be annotated with its size (in bytes). Optionally, the file format type of the archive can be annotated too.

Syntax of dl+archive: locators

The locators have the following minimal form:

dl+archive:<archive-key>#path=<path-in-archive>

where <archive-key> is a regular git-annex key of an archive file, and <path-in-archive> is a POSIX-style relative path pointing to a member within the archive.

Two optional, additional attributes, size and atype, are recognized (of these, only size is also understood by the datalad-archives special remote).

size declares the size of the (extracted) archive member in bytes:

dl+archive:<archive-key>#path=<path-in-archive>&size=<size-in-bytes>

atype declares the type of the containing archive using a label. Currently recognized labels are tar (a TAR archive, compressed or not), and zip (a ZIP archive). See ArchiveType for all recognized labels.

If no type information is given, ArchivistLocator.from_str() will try to determine the archive type from the archive key. This is only possible for keys of *E-type git-annex backends (such as DataLad's default MD5E), which preserve the file name extension in the key.
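The idea of deriving the archive type from the key can be sketched in plain Python. This is an illustration only, not the code used by ArchivistLocator.from_str(): the function name guess_archive_type and the exact suffix rules are assumptions made for this example.

```python
from pathlib import PurePosixPath
from typing import Optional


def guess_archive_type(key: str) -> Optional[str]:
    """Hypothetical helper: guess 'tar' or 'zip' from an annex key string.

    Assumes the common key layout BACKEND-sSIZE--NAME, where NAME carries a
    file extension only for *E-type backends (e.g. MD5E).
    """
    backend, sep, rest = key.partition("-")
    if not sep or not backend.endswith("E"):
        # Keys of non-E backends carry no file extension to inspect
        return None
    # Everything after the first '--' is the key name (hash plus extension)
    name = rest.split("--", 1)[-1]
    suffixes = PurePosixPath(name).suffixes
    if not suffixes:
        return None
    if suffixes[-1] == ".zip":
        return "zip"
    if ".tar" in suffixes:
        # Covers plain and compressed TAR archives, e.g. .tar or .tar.gz
        return "tar"
    return None


tar_guess = guess_archive_type(
    "MD5E-s389--e9f624eb778e6f945771c543b6e9c7b2.tar.gz")
no_guess = guess_archive_type(
    "MD5-s389--e9f624eb778e6f945771c543b6e9c7b2")
```

With a key from a backend that does not end in E, such as plain MD5, the sketch has no extension to inspect and returns None. In that situation the atype attribute is the only way to declare the archive type.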

The order of the attributes in the fragment part of the URL (after #) is significant. path must come first, followed by size or atype. If both size and atype are present, size must be declared first. A complete example of a URL is:

dl+archive:MD5-s389--e9f624eb778e6f945771c543b6e9c7b2#path=dir/file.csv&size=234&atype=tar
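
The syntax and ordering rules described above can be sketched with a standard-library regular expression. This is a minimal illustration, not the parser used by datalad-next: the names parse_locator and ParsedLocator and the pattern itself are assumptions made for this example, and it does not handle any quoting or escaping of special characters in paths.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Hypothetical pattern: path first, then optional size, then optional atype
_LOCATOR_RE = re.compile(
    r"^dl\+archive:(?P<key>[^#]+)"
    r"#path=(?P<path>[^&]+)"
    r"(?:&size=(?P<size>\d+))?"
    r"(?:&atype=(?P<atype>[a-z0-9]+))?$"
)


class ParsedLocator(NamedTuple):
    """Hypothetical container for the parsed locator components."""
    key: str
    member: PurePosixPath
    size: Optional[int]
    atype: Optional[str]


def parse_locator(url: str) -> ParsedLocator:
    """Split a dl+archive: locator string into its components."""
    match = _LOCATOR_RE.match(url)
    if match is None:
        raise ValueError(f"not a valid dl+archive: locator: {url!r}")
    size = match.group("size")
    return ParsedLocator(
        key=match.group("key"),
        member=PurePosixPath(match.group("path")),
        size=int(size) if size is not None else None,
        atype=match.group("atype"),
    )


example = parse_locator(
    "dl+archive:MD5-s389--e9f624eb778e6f945771c543b6e9c7b2"
    "#path=dir/file.csv&size=234&atype=tar"
)
```

Because the pattern fixes the attribute order, a string that declares atype before size does not match and raises ValueError in this sketch.
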
akey: AnnexKey
atype: ArchiveType | None = None
classmethod from_str(url: str)[source]

Return ArchivistLocator from str form

member: PurePosixPath
size: int | None = None