Installation

You can install and run DataLad Catalog on all major operating systems by following the steps below in the command line.

Step 1 - Setup and activate a virtual environment

With your virtual environment manager of choice, create a virtual environment and ensure you have a recent version of Python installed. Then activate the environment.

With venv:

python -m venv my_catalog_env
source my_catalog_env/bin/activate

With miniconda:

conda create -n my_catalog_env python=3.11
conda activate my_catalog_env

Step 2 - Install via PyPI

pip install datalad-catalog

Congratulations! You have now installed DataLad Catalog!

Optional - Clone the repo and install the package

If you want to access the latest, unreleased version of the software or contribute to the code, access the repository via GitHub:

git clone https://github.com/datalad/datalad-catalog.git
cd datalad-catalog
pip install -e .

Dependencies

Because this is an extension to datalad and builds on metadata handling functionality, the installation process also installed datalad and datalad-metalad as dependencies, although these do not have to be used as the only sources of metadata for a catalog. In addition datalad-next is installed in order to use the latest improvements and patches to the datalad core package.

While the catalog generation process does not expect data to be structured as DataLad datasets, it can still be very useful to do so when building a full (meta)data management pipeline from raw data to catalog publishing. For complete instructions on how to install datalad and git-annex, please refer to the DataLad Handbook.

Similarly, the metadata input to datalad-catalog can come from any source as long as it conforms to the catalog schema. While the catalog does not expect metadata originating only from datalad-metalad's extractors, this tool has advanced metadata handling capabilities that will integrate seamlessly with DataLad datasets and the catalog generation process.

In order to translate metadata extracted using datalad-metalad into the catalog schema, datalad-catalog provides translation modules that are dependent on jq.