datalad_metalad.extractors.core

MetadataRecord extractor for Datalad’s own core storage

class datalad_metalad.extractors.core.DataladCoreExtractor

Bases: datalad_metalad.extractors.base.MetadataExtractor

get_state(dataset)

Report on extractor-related state and configuration

Extractors can reimplement this method to report arbitrary information in a dictionary. This information will be included in the metadata aggregate catalog in each dataset. Consequently, it should be brief and limited to the essential facts that “fully” determine the behavior of an extractor. Only plain key-value items with simple values, such as strings, ints, floats, or lists thereof, are supported.

Any change in the reported state in comparison to a recorded state for an existing metadata aggregate will cause a re-extraction of metadata. The nature of the state change does not matter, as the entire dictionary will be compared. Primarily, this is useful for reporting per-extractor version information (such as a version for the extractor output format, or critical version information on external software components employed by the extractor), and potential configuration settings that determine the behavior of an extractor.

State information can be dataset-specific. The respective Dataset object instance is passed via the method’s dataset argument.
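
The following is a minimal sketch of the state-reporting idea described above, not the implementation used by DataladCoreExtractor. The class ExampleExtractor, the keys format_version and url_mode, and the helper needs_reextraction are all hypothetical; a real extractor would subclass MetadataExtractor and would be given an actual Dataset instance.

```python
class ExampleExtractor:
    """Hypothetical stand-in for an extractor (not part of datalad_metalad)."""

    FORMAT_VERSION = 1  # assumed output-format version, for illustration only

    def __init__(self, url_mode="all"):
        # Hypothetical configuration setting that influences extraction.
        self.url_mode = url_mode

    def get_state(self, dataset):
        # Report only plain key-value items with simple values.
        # The dataset argument is unused here, but state may depend on it.
        return {
            "format_version": self.FORMAT_VERSION,
            "url_mode": self.url_mode,
        }


def needs_reextraction(recorded_state, current_state):
    # The entire dictionary is compared, so any difference counts.
    return recorded_state != current_state


recorded = ExampleExtractor(url_mode="all").get_state(dataset=None)
unchanged = ExampleExtractor(url_mode="all").get_state(dataset=None)
changed = ExampleExtractor(url_mode="none").get_state(dataset=None)

print(needs_reextraction(recorded, unchanged))  # False
print(needs_reextraction(recorded, changed))    # True
```

Comparing whole dictionaries keeps the check simple: an extractor only has to make sure that every setting that affects its output appears in the reported state.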

datalad_metalad.extractors.core.ri2url(ri)
datalad_metalad.extractors.core.whereis_file(self, path)

Same as whereis_file_(), but takes a single path and returns a single dict

datalad_metalad.extractors.core.whereis_file_(self, paths)
Parameters: paths (iterable) – Paths of files to query for, either absolute paths matching the repository root (self.path), or paths relative to the root of the repository
Yields: dict – A response dictionary for each query path with the following keys: ‘path’ with the queried path in the same form it was provided; ‘status’ {ok|error} indicating whether git annex was queried successfully for a path; ‘key’ with the annex key for the file; ‘remotes’ with a dictionary of remotes that have a copy of the respective file (annex UUIDs are keys, and values are dictionaries with the keys ‘description’, ‘here’, and ‘urls’ (list) that contain the values of the respective ‘git annex whereis’ response).
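
To illustrate the shape of such a response dictionary, here is a small sketch that consumes one. The sample values (path, key, UUIDs, descriptions, URL) are invented placeholders, and the helpers describe_remotes and available_locally are hypothetical; only the key names follow the description above.

```python
# Hand-written sample mimicking one yielded response; all values are made up.
sample_response = {
    "path": "data/example.dat",
    "status": "ok",
    "key": "EXAMPLE-KEY",
    "remotes": {
        "uuid-aaaa": {
            "description": "laptop",
            "here": True,
            "urls": [],
        },
        "uuid-bbbb": {
            "description": "web",
            "here": False,
            "urls": ["https://example.com/example.dat"],
        },
    },
}


def describe_remotes(response):
    """Return sorted descriptions of remotes holding a copy (hypothetical helper)."""
    if response["status"] != "ok":
        # The query failed, so no remote information is assumed to be usable.
        return []
    return sorted(info["description"] for info in response["remotes"].values())


def available_locally(response):
    """Return True if any remote entry is flagged as 'here' (hypothetical helper)."""
    if response["status"] != "ok":
        return False
    return any(info["here"] for info in response["remotes"].values())


print(describe_remotes(sample_response))   # ['laptop', 'web']
print(available_locally(sample_response))  # True
```

Checking ‘status’ before reading the other keys is a defensive choice in this sketch; which keys are present in an error response is not specified above.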