datalad meta-aggregate

Synopsis

datalad meta-aggregate [-h] [-d ROOT_DATASET] [--version] SUB_DATASET_PATH [SUB_DATASET_PATH ...]

Description

Aggregate metadata of one or more sub-datasets for later reporting.

NOTE

Metadata does not have to be stored inside the datalad repository of the dataset. It may be stored in the repository used by the dataset, but it may equally be stored in another repository (or in a non-git backend, once those exist). To distinguish metadata storage from dataset storage, we refer to the former as the metadata-store. For now, the metadata-store is usually the git repository that holds the dataset.

NOTE

This distinction is the reason for the “double”-path arguments below. For each source metadata-store that should be integrated into the root metadata-store, we have to give the source metadata-store itself and its intra-dataset-path with regard to the root dataset.

Metadata aggregation refers to a procedure that combines metadata from different sub-datasets into a root dataset, i.e. a dataset that contains all the sub-datasets. Aggregated metadata is “prefixed” with the intra-dataset-paths of the sub-datasets. The intra-dataset-path of a sub-dataset is the path from the top-level directory of the root dataset, i.e. the directory that contains the “.datalad” entry, to the top-level directory of the respective sub-dataset.
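As a minimal sketch of how an intra-dataset-path relates to the two directories, the following shell fragment derives it by stripping the root dataset path from the front of the sub-dataset path. The paths are illustrative (they match the Examples section), and the prefix stripping only illustrates the definition; it is not a statement about how datalad computes the path internally.

```shell
# Hypothetical paths, matching the example in the Examples section.
root_ds=/home/root_ds
sub_ds=/home/root_ds/sub_ds1

# Strip the root dataset path (and the separating slash) from the front.
intra_path=${sub_ds#"$root_ds"/}

echo "$intra_path"   # prints: sub_ds1
```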

Aggregate works on existing metadata; it will not extract metadata from data files. To create metadata, use the meta-extract command.

As a result of the aggregation, the metadata of all specified sub-datasets will be available in the root metadata-store. A datalad meta-dump command on the root metadata-store will therefore be able to process metadata from the root dataset, as well as all aggregated sub-datasets.

Examples

For example, if the root dataset path is ‘/home/root_ds’, the following command can be used to aggregate the metadata of two sub-datasets, e.g. ‘/home/root_ds/sub_ds1’ and ‘/home/root_ds/sub_ds2’, into the root dataset:

% datalad meta-aggregate -d /home/root_ds /home/root_ds/sub_ds1 /home/root_ds/sub_ds2
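The aggregate-then-dump workflow from the description could be scripted roughly as follows. This is a sketch under assumptions: the `run` helper and the `DRY_RUN` variable are hypothetical and exist only so that the commands are printed unless execution is requested, and the `-d` and `-r` options of meta-dump are assumed to be available in the installed version (check `datalad meta-dump --help`).

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

root_ds=/home/root_ds   # root dataset path from the example above

# Step 1: aggregate existing metadata of the sub-datasets into the root.
run datalad meta-aggregate -d "$root_ds" "$root_ds/sub_ds1" "$root_ds/sub_ds2"

# Step 2: dump metadata from the root metadata-store, including sub-datasets.
# Assumption: meta-dump accepts -d and -r; verify with `datalad meta-dump --help`.
run datalad meta-dump -d "$root_ds" -r
```

Metadata for the sub-datasets must already exist before step 1, since aggregation does not extract anything itself.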

Options

SUB_DATASET_PATH

SUB_DATASET_PATH is a path to a sub-dataset whose metadata shall be aggregated into the topmost dataset (ROOT_DATASET). The sub-dataset must be located within the directory of the topmost dataset. Note: if SUB_DATASET_PATH is relative, it is resolved against the current working directory, not against the path of the topmost dataset. Constraints: value must be a string or value must be NONE
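Because a relative SUB_DATASET_PATH is resolved against the current working directory, one way to avoid surprises is to convert it to an absolute path first. The fragment below is a sketch of that idea; the directory layout is hypothetical and created in a temporary directory, and the final command is printed rather than executed.

```shell
# Hypothetical layout, created in a temporary directory for illustration only.
root_ds=$(mktemp -d)
mkdir -p "$root_ds/sub_ds1"

# From inside the root dataset, the relative path "sub_ds1" resolves correctly;
# from any other directory it would point somewhere else.
cd "$root_ds" || exit 1

# Turn the relative path into an absolute one before passing it on.
sub_abs=$(cd ./sub_ds1 && pwd)

# Printed rather than executed, so the sketch runs without datalad installed.
echo "datalad meta-aggregate -d $root_ds $sub_abs"
```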

-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-d ROOT_DATASET, --dataset ROOT_DATASET

Topmost dataset into which metadata will be aggregated. If no dataset is specified, a dataset will be discovered based on the current working directory. Metadata for aggregated datasets will contain a dataset path that is relative to the topmost dataset. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path) or value must be NONE

--version

show the module and its version which provides the command

Authors

datalad is developed by The DataLad Team and Contributors <team@datalad.org>.