datalad_next.itertools.itemize
- datalad_next.itertools.itemize(iterable: Iterable[T], sep: T | None, *, keep_ends: bool = False) -> Generator[T, None, None]
Yields complete items (only), assembled from an iterable
This function consumes chunks from an iterable and yields items defined by a separator. An item might span multiple input chunks. Input (chunks) can be bytes, bytearray, or str objects. The result type is determined by the type of the first input chunk. During its runtime, the type of the elements in iterable must not change.
Items are defined by a separator given via sep. If sep is None, the line-separators built into str.splitlines() are used, and each yielded item will be a line. If sep is not None, its type must be compatible with the type of the elements in iterable.
A separator could, for example, be b'\n', in which case the items would be terminated by Unix line-endings, i.e. each yielded item is a single line. The separator could also be b'\x00' (or '\x00'), to split zero-byte delimited content, like the output of git ls-files -z.
Separators can be longer than one byte or character, e.g. b'\r\n', or b'\n-------------------\n'.
Content after the last separator, possibly merged across input chunks, is always yielded as the last item, even if it is not terminated by the separator.
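The chunk-spanning behavior described above can be illustrated with a minimal sketch. This is not the datalad_next implementation: itemize_sketch is a hypothetical helper written only for illustration, it covers only the case where sep is not None, and it assumes that an empty remainder after a trailing separator is not yielded.

```python
from typing import Generator, Iterable, Optional, TypeVar

T = TypeVar("T", str, bytes, bytearray)


def itemize_sketch(
    iterable: Iterable[T], sep: T, keep_ends: bool = False
) -> Generator[T, None, None]:
    """Hypothetical illustration only, not the datalad_next code."""
    pending: Optional[T] = None  # content not yet terminated by a separator
    for chunk in iterable:
        # Merge leftover content from earlier chunks with the new chunk.
        pending = chunk if pending is None else pending + chunk
        parts = pending.split(sep)
        # The last part has no terminating separator (yet): keep it back.
        pending = parts.pop()
        for part in parts:
            yield part + sep if keep_ends else part
    # Assumption of this sketch: only a non-empty remainder is yielded.
    if pending:
        yield pending


# The two-byte separator b'\r\n' is split across the two input chunks.
chunks = [b"a\r", b"\nb\r\nc"]
print(list(itemize_sketch(chunks, b"\r\n")))
# [b'a', b'b', b'c']
print(list(itemize_sketch(chunks, b"\r\n", keep_ends=True)))
# [b'a\r\n', b'b\r\n', b'c']
```

Holding back the last part of every split is what lets an item, or even the separator itself, span several input chunks in this sketch.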
Performance notes:
Using None as a separator (splitlines-mode) is slower than providing a specific separator.
If a separator other than None is used, the runtime with keep_ends=False is faster than with keep_ends=True.
- Parameters:
iterable (Iterable[str | bytes | bytearray]) -- The iterable that yields the input data
sep (str | bytes | bytearray | None) -- The separator that defines items. If None, the items are determined by the line-separators that are built into str.splitlines().
keep_ends (bool) -- If True, the item-separator will remain at the end of a yielded item. If False, items will not contain the separator. Preserving separators implies a runtime cost, unless the separator is None.
- Yields:
str | bytes | bytearray -- The items determined from the input iterable. The type of the yielded items depends on the type of the first element in iterable.
Examples
>>> from datalad_next.itertools import itemize
>>> with open('/etc/passwd', 'rt') as f:
...     print(tuple(itemize(iter(f.read, ''), sep=None))[0:2])
('root:x:0:0:root:/root:/bin/bash', 'systemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin')
>>> with open('/etc/passwd', 'rt') as f:
...     print(tuple(itemize(iter(f.read, ''), sep=':'))[0:10])
('root', 'x', '0', '0', 'root', '/root', '/bin/bash\nsystemd-timesync', 'x', '497', '497')
>>> with open('/etc/passwd', 'rt') as f:
...     print(tuple(itemize(iter(f.read, ''), sep=':', keep_ends=True))[0:10])
('root:', 'x:', '0:', '0:', 'root:', '/root:', '/bin/bash\nsystemd-timesync:', 'x:', '497:', '497:')
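Since sep=None relies on the line-separators built into str.splitlines(), it may help to see how those differ from splitting on '\n' alone. The sketch below uses only Python built-ins and does not call itemize; the sample string is made up for illustration.

```python
# A made-up sample with several kinds of line boundaries:
# '\n', '\r\n', form feed ('\x0c'), and U+2028 (line separator).
text = "one\ntwo\r\nthree\x0cfour\u2028five"

# str.splitlines() recognizes all of these boundaries.
lines = text.splitlines()
print(lines)
# ['one', 'two', 'three', 'four', 'five']

# keepends=True preserves whichever boundary terminated each line.
lines_with_ends = text.splitlines(keepends=True)
print(lines_with_ends)
# ['one\n', 'two\r\n', 'three\x0c', 'four\u2028', 'five']

# Splitting on '\n' only leaves the other boundaries inside the items.
unix_only = text.split("\n")
print(unix_only)
# ['one', 'two\r', 'three\x0cfour\u2028five']
```

If only Unix line-endings are expected in the input, passing an explicit '\n' (or b'\n') separator avoids the broader splitlines rules.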