Use case 1: Publishing and cloning datasets¶
Imagine you have been creating a reproducible workflow using DataLad from the get go. Everything is finished now, code, data, and paper are ready. Last thing to do: Publish your data, code, results, and workflows – ideally, all together, easily accessible, and also fast.
The solution: Publish the complete dataset to the OSF and let others clone the project to get access to data, code, version history, and workflows.
Therefore, you decide on the
annex sibling mode.
Creating the OSF sibling¶
Given OSF credentials are set, we can create a sibling in
We will also make the project public (
--public), and attach some meta data (
--tag) to it.
The code below will create a new public OSF project called
best-study-ever, a dataset sibling called
osf-annex, and a readily configured storage sibling
The project on the OSF will have a description with details on how to clone it and some meta data.
# inside of the tutorial DataLad dataset $ datalad create-sibling-osf --title best-study-ever \ -s osf-annex \ --category data \ --tag reproducibility \ --public create-sibling-osf(ok): https://osf.io/<id>/ [INFO ] Configure additional publication dependency on "osf-annex-storage" configure-sibling(ok): /tmp/collab_osf (sibling)
Publishing the dataset¶
Afterwards, all that’s left to do is a
datalad push to publish the dataset to the OSF.
$ datalad push --to osf-annex
The resulting dataset has all data and its Git history, but is not as human-readable as on a local computer:
Cloning the dataset¶
The dataset can be cloned with an
osf://<id> URL, where ID is the project ID assigned at project creation:
$ datalad clone osf://n6bgd/ best-study-ever install(ok): /tmp/best-study-ever (dataset)
All data can subsequently be obtained using