datalad_metalad.extractors.annex

MetadataRecord extractor for Git-annex metadata

This extractor only handles the metadata that can be assigned to annexed files via git-annex’s metadata command. It does not handle other implicit git-annex metadata, such as file availability information; that information is already handled by the metalad_core extractor.

There is no standard way to define a vocabulary that is used for this kind of metadata.
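For orientation, this kind of metadata can be listed with `git annex metadata --json`, which, as far as we are aware, emits one JSON object per file with a `fields` mapping from field names to lists of values, plus `lastchanged` timestamp bookkeeping. The sketch below parses such output into a plain dictionary. It is not this extractor's implementation: the helper `parse_annex_metadata` is hypothetical, and the sample record is invented for illustration.

```python
import json
from typing import Dict, List


def parse_annex_metadata(json_lines: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each annexed file to its user-assigned metadata fields.

    Hypothetical helper; assumes the line-delimited JSON layout of
    `git annex metadata --json`. Not part of datalad_metalad.
    """
    result: Dict[str, Dict[str, List[str]]] = {}
    for line in json_lines.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        fields = record.get("fields", {})
        # Drop git-annex's timestamp bookkeeping ("lastchanged" and
        # "<field>-lastchanged"); keep only user-assigned fields.
        result[record["file"]] = {
            name: values
            for name, values in fields.items()
            if name != "lastchanged" and not name.endswith("-lastchanged")
        }
    return result


# Invented sample record; field names are free-form, since there is
# no standard vocabulary for this kind of metadata.
sample = (
    '{"command":"metadata","file":"data.csv","key":"SHA256E-s3--abc",'
    '"fields":{"tag":["raw"],"tag-lastchanged":["2020-01-01@00-00-00"],'
    '"lastchanged":["2020-01-01@00-00-00"]},"success":true}\n'
)
print(parse_annex_metadata(sample))  # {'data.csv': {'tag': ['raw']}}
```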

class datalad_metalad.extractors.annex.AnnexMetadataExtractor

Bases: datalad_metalad.extractors.base.MetadataExtractor

get_state(dataset)

Report on extractor-related state and configuration

Extractors can reimplement this method to report arbitrary information in a dictionary. This information will be included in the metadata aggregate catalog in each dataset. Consequently, it should be brief and limited to the essential facts that, taken together, “fully” determine the extractor’s behavior. Only plain key-value items with simple values, such as strings, integers, floats, or lists thereof, are supported.
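A minimal sketch of what a compact state report might look like, together with a check for the simple value types listed above. `ExampleExtractor`, `is_plain_state`, and the keys `version` and `fields_excluded` are hypothetical; a real extractor would subclass `MetadataExtractor` rather than stand alone.

```python
from typing import Any, Dict


def is_plain_state(state: Dict[str, Any]) -> bool:
    """Check that a state dict holds only simple values.

    Allowed values: str, int, float, or lists of those.
    Keys are assumed to be strings.
    """
    scalar = (str, int, float)
    for key, value in state.items():
        if not isinstance(key, str):
            return False
        if isinstance(value, list):
            if not all(isinstance(v, scalar) for v in value):
                return False
        elif not isinstance(value, scalar):
            return False
    return True


class ExampleExtractor:
    """Hypothetical stand-in; a real extractor subclasses MetadataExtractor."""

    def get_state(self, dataset) -> Dict[str, Any]:
        # Report only the few facts that determine the extractor's output.
        return {"version": "1", "fields_excluded": ["lastchanged"]}


state = ExampleExtractor().get_state(dataset=None)
print(is_plain_state(state))                 # True
print(is_plain_state({"x": {"nested": 1}}))  # False
```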

Any change in the reported state, compared with the state recorded for an existing metadata aggregate, causes metadata to be re-extracted. The nature of the change does not matter, because the entire dictionary is compared. This is primarily useful for reporting per-extractor version information (such as a version of the extractor output format, or critical version information on external software components employed by the extractor) and any configuration settings that determine the behavior of an extractor.
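The comparison rule can be illustrated with plain dictionaries. `needs_reextraction` is a hypothetical helper that only mirrors the stated rule (the whole dictionary is compared); it is not datalad-metalad's actual code.

```python
from typing import Any, Dict


def needs_reextraction(recorded: Dict[str, Any], current: Dict[str, Any]) -> bool:
    """Hypothetical helper: any difference between the recorded and the
    current state dict, of whatever kind, requires re-extraction."""
    return recorded != current


recorded = {"version": "1"}
print(needs_reextraction(recorded, {"version": "1"}))              # False
print(needs_reextraction(recorded, {"version": "2"}))              # True
print(needs_reextraction(recorded, {"version": "1", "opt": "x"}))  # True
```

Note that merely adding a new key, without changing any existing value, is also a state change under this rule.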

State information can be dataset-specific. The respective Dataset object instance is passed via the method’s dataset argument.
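A sketch of dataset-specific state, assuming the extractor reads one setting from the dataset's configuration. The class `ConfigurableExtractor` and the key `datalad.metadata.example.include-tags` are hypothetical, and a `SimpleNamespace` holding a plain `dict` as `config` stands in for a real `Dataset` instance.

```python
from types import SimpleNamespace
from typing import Any, Dict


class ConfigurableExtractor:
    """Hypothetical extractor whose reported state depends on the dataset."""

    # Hypothetical configuration key, chosen for illustration only.
    OPTION = "datalad.metadata.example.include-tags"

    def get_state(self, dataset) -> Dict[str, Any]:
        # Assumes dataset.config offers dict-like get(key, default),
        # as the dict-backed stub below does.
        include = dataset.config.get(self.OPTION, "yes")
        return {"version": "1", "include_tags": include}


ds_a = SimpleNamespace(config={})
ds_b = SimpleNamespace(config={ConfigurableExtractor.OPTION: "no"})
ext = ConfigurableExtractor()
print(ext.get_state(ds_a))  # {'version': '1', 'include_tags': 'yes'}
print(ext.get_state(ds_b))  # {'version': '1', 'include_tags': 'no'}
```

Because each dataset reports its own state, changing the setting in one dataset would alter only that dataset's recorded state.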