datalad meta-extract

Synopsis

datalad meta-extract [-h] [-d DATASET] [-c CONTEXT] [--get-context] [--force-dataset-level] [--version] EXTRACTOR_NAME [FILE] [EXTRACTOR_ARGUMENTS [EXTRACTOR_ARGUMENTS ...]]

Description

Run a metadata extractor on a dataset or file.

This command distinguishes between dataset-level extraction and file-level extraction.

If no “path” argument is given, the command assumes that a given extractor is a dataset-level extractor and executes it on the dataset that is given by the current working directory or by the “-d” argument.

If a path is given, the command assumes that the path identifies a file and that the given extractor is a file-level extractor, which will then be executed on the specified file. If the file-level extractor requests the content of a file that is not present, the command might “get” the file content to make it locally available. The path must not refer to a sub-dataset or to a directory.
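The path rules for the FILE argument (relative to the dataset root, or absolute with the dataset path as prefix, as stated under Options) can be sketched as follows. This is a minimal illustration and not DataLad's actual implementation; the helper name `to_dataset_relative` is hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath
from typing import Optional


def to_dataset_relative(dataset_root: str, path: Optional[str]) -> Optional[str]:
    """Map FILE to a dataset-relative path (hypothetical helper).

    Returns None when the path is missing, empty, or equal to the dataset
    root, i.e. when dataset-level extraction would be assumed.
    """
    if path is None or path == "":
        return None
    candidate = PurePosixPath(path)
    if not candidate.is_absolute():
        # Relative paths are taken as relative to the dataset root.
        return str(candidate)
    # relative_to raises ValueError if dataset_root is not a prefix of path.
    relative = candidate.relative_to(PurePosixPath(dataset_root))
    return None if str(relative) == "." else str(relative)


print(to_dataset_relative("/home/datasets/ds0001",
                          "/home/datasets/ds0001/subdir/data_file_1.txt"))
```

An absolute path outside the dataset raises `ValueError` in this sketch, mirroring the requirement that an absolute path contain the dataset path as prefix.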

NOTE

If you want to insert sub-dataset metadata into the super-dataset’s metadata, you currently have to do the following: first, extract dataset metadata of the sub-dataset using a dataset-level extractor; second, add the extracted metadata together with sub-dataset information (i.e. dataset_path, root_dataset_id, root_dataset_version) to the metadata of the super-dataset.
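The second step of this procedure, attaching the sub-dataset information to an extracted record, might look roughly like the sketch below. The shape of the record, the helper name `add_subdataset_info`, and all values are assumptions made for illustration; the key `root_dataset_version` is spelled with underscores by analogy with the other two keys. Adding the enriched record to the super-dataset's metadata is not shown.

```python
import json
from typing import Any, Dict


def add_subdataset_info(
    record: Dict[str, Any],
    dataset_path: str,
    root_dataset_id: str,
    root_dataset_version: str,
) -> Dict[str, Any]:
    """Return a copy of `record` with the sub-dataset keys from the note."""
    enriched = dict(record)  # leave the original record untouched
    enriched.update(
        {
            "dataset_path": dataset_path,
            "root_dataset_id": root_dataset_id,
            "root_dataset_version": root_dataset_version,
        }
    )
    return enriched


# Hypothetical record, standing in for the output of a dataset-level
# extractor that was run on the sub-dataset.
sub_record = {"type": "dataset", "extractor_name": "metalad_example_dataset"}

enriched = add_subdataset_info(
    sub_record,
    dataset_path="subdatasets/sub1",                # placeholder path
    root_dataset_id="<root-dataset-id>",            # placeholder value
    root_dataset_version="<root-dataset-version>",  # placeholder value
)
print(json.dumps(enriched, indent=2))
```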

The extractor configuration can be parameterized with key-value pairs given as additional arguments. Each key-value pair consists of two arguments: first the key, then the value. If dataset-level extraction should be performed and you want to provide extractor arguments, you have to specify ‘--force-dataset-level’ to ensure dataset-level extraction, i.e. to prevent the key of the first extractor argument from being interpreted as the path for a file-level extraction.
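The argument layout described above can be sketched as a small Python helper that assembles the command line without running it. This is a minimal sketch, not part of DataLad: the helper name `build_meta_extract_argv` and the keys `param_a` and `param_b` are hypothetical, and only options listed in the synopsis are used.

```python
from typing import Dict, List, Optional


def build_meta_extract_argv(
    extractor_name: str,
    file_path: Optional[str] = None,
    extractor_args: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble an argument list for `datalad meta-extract` (hypothetical)."""
    argv = ["datalad", "meta-extract"]
    extractor_args = extractor_args or {}
    # Without a file path, the first key would be read as FILE, so a
    # dataset-level call with extractor arguments needs the flag.
    if file_path is None and extractor_args:
        argv.append("--force-dataset-level")
    argv.append(extractor_name)
    if file_path is not None:
        argv.append(file_path)
    # Each pair becomes two consecutive arguments: key, then value.
    for key, value in extractor_args.items():
        argv.extend([key, value])
    return argv


# Dataset-level extraction with two hypothetical extractor arguments.
print(build_meta_extract_argv(
    "metalad_example_dataset",
    extractor_args={"param_a": "1", "param_b": "x"},
))
```

The resulting list could be passed to `subprocess.run` to execute the command. Because every key is emitted together with its value, the number of extractor arguments is always even.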

The command can also take legacy datalad-metalad extractors and will execute them in either “content” or “dataset” mode, depending on whether file-level or dataset-level extraction is requested.

Examples

Use the metalad_example_file-extractor to extract metadata from the file “subdir/data_file_1.txt”. The dataset is given by the current working directory:

% datalad meta-extract metalad_example_file subdir/data_file_1.txt

Use the metalad_example_file-extractor to extract metadata from the file “subdir/data_file_1.txt” in the dataset /home/datasets/ds0001:

% datalad meta-extract -d /home/datasets/ds0001 metalad_example_file subdir/data_file_1.txt

Use the metalad_example_dataset-extractor to extract dataset-level metadata from the dataset given by the current working directory:

% datalad meta-extract metalad_example_dataset

Use the metalad_example_dataset-extractor to extract dataset-level metadata from the dataset in /home/datasets/ds0001:

% datalad meta-extract -d /home/datasets/ds0001 metalad_example_dataset

Options

EXTRACTOR_NAME

Name of a metadata extractor to be executed.

FILE

Path of a file or dataset to extract metadata from. The path should be relative to the root of the dataset. If this argument is provided, a file-level extractor is assumed. If the path is not given, or if it identifies the root of a dataset, i.e. “”, a dataset-level extractor is assumed. You may provide an absolute file path, but it has to contain the dataset path as prefix. Constraints: value must be a string or value must be NONE

EXTRACTOR_ARGUMENTS

Extractor arguments given as string arguments to the extractor. The extractor arguments are interpreted as key-value pairs. The first argument is the name of the key, the next argument is the value for that key, and so on. Consequently, there should be an even number of extractor arguments. If dataset-level extraction should be performed and you want to provide extractor arguments, you have to specify ‘--force-dataset-level’ to ensure dataset-level extraction, i.e. to prevent the key of the first extractor argument from being interpreted as the path for a file-level extraction. Constraints: value must be a string or value must be NONE

-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-d DATASET, --dataset DATASET

Dataset to extract metadata from. If no dataset is given, the dataset is determined by the current working directory. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path) or value must be NONE

-c CONTEXT, --context CONTEXT

Context, a JSON-serialized dictionary that provides constant data which has been gathered before, so meta-extract will not have to re-gather this data. Keys and values are strings. meta-extract will look for the following key: ‘dataset_version’. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path) or value must be NONE

--get-context

Show the context that meta-extract determines with the given parameters and exit. The context can be used in subsequent calls to meta-extract with identical parameters, except for --get-context, to speed up the execution of meta-extract.

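--force-dataset-level

Force dataset-level extraction. This is required when extractor arguments are given for a dataset-level extraction, to prevent the key of the first extractor argument from being interpreted as the path for a file-level extraction.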

--version

show the module and its version which provides the command
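The interplay of the -c/--context and --get-context options described above can be sketched in Python as follows. This is a minimal illustration under assumptions: the helper name `context_option` is hypothetical, and “<dataset-version>” is a placeholder for whatever value a previous call with --get-context reported.

```python
import json
from typing import Dict, List


def context_option(context: Dict[str, str]) -> List[str]:
    """Serialize a context dictionary for the -c option (hypothetical)."""
    # Keys and values are strings, per the option description.
    for key, value in context.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("context keys and values must be strings")
    return ["-c", json.dumps(context)]


# Placeholder context; 'dataset_version' is the key meta-extract looks for.
context = {"dataset_version": "<dataset-version>"}

argv = (
    ["datalad", "meta-extract"]
    + context_option(context)
    + ["metalad_example_dataset"]
)
print(argv)
```

Passing the argument list to `subprocess.run` avoids shell quoting problems with the JSON string.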

Authors

datalad is developed by The DataLad Team and Contributors <team@datalad.org>.