DataLad — data management and publication multitool
Welcome to DataLad’s technical documentation. The information here targets software developers and focuses on the Python API and the CLI, as well as on software design, employed technologies, and key features. Comprehensive user documentation, with information on installation, basic operation, support, and (advanced) use case descriptions, is available in the DataLad handbook.
Content
Concepts and technologies
- Background and motivation
- Delineation from related solutions
- Basic principles
- Credentials
- Customization and extension of functionality
- Design
- Command line interface
- Provenance capture
- Application-type vs. library-type usage
- File URL handling
- Result records
- dataset argument
- Log levels
- Drop dataset components
- Python import statements
- Miscellaneous patterns
- Exception handling
- Credential management
- URL substitution
- Threaded runner
- BatchedCommand and BatchedAnnex
- Standard parameters
- Positional vs Keyword parameters
- Docstrings
- Progress reporting
- GitHub Action
- Continuous integration and testing
- User messaging: result records vs exceptions vs logging
- Glossary
Commands and API
Extension packages
DataLad can be customized and additional functionality can be integrated via extensions. Each extension provides its own documentation:
- Advanced metadata tooling with JSON-LD reporting and additional metadata extractors
- Staged additions, performance and user experience improvements for DataLad
- Resources for working with the UKBiobank as a DataLad dataset
- Deposit and retrieve DataLad datasets via the Open Science Framework
- Special interest functionality or drafts of future additions to DataLad proper