This repo contains my own utils for parsing tabby input for DataLad catalog, with specific focus on SFB1451.
The code is functional and presented in the form of argparse-parameterised scripts, but heavily proototype in nature; parts of the code rely on files being saved or having been saved in the code / working directory; parts are unused.
The key scripts are:
load_inbox.py
covers reading an excel file or bunch of tsv files, and placing renamed tsv files in target directoryload_tabby.py
reads a dataset tabby file and produces a catalog schema translation
- A convention for reading the tabby collection is provided with this repository
- JSON-LD expansion and compaction is used to align set of terms (roughly) with catalog schema
- Processing of values (e.g. reshaping, ensuring type) is done via a set of
process-*
functions - Optionally, a “temporary” catalog in the working directory can be populated with created entries to allow “live” preview while prototyping. It needs to be created by hand with a permissive config, you can use this one.
- Some terms are resolved with API queries, done with simple requests and cached using requests_cache
“` python -m venv /tmp/my_env source /tmp/my_env/bin/activate
pip install requirements-devel.txt “`
Scripts are scattered across this repo (tabby-related) and the sfb1451 projects catalog (non-tabby) repo. To add a tabby subdataset to an existing project dataset, the following is needed:
- save the tabby files in a dataset (ideally recursively, so that superdataset is updated too)
- extract and add the project metadata (tabby addition means it is in a new version):
python .../extract_project.py PROJECTDIR OUTDIR datalad catalog-add -c ... -F ... -m ...
- “inject” metadata that does not come from any extractor (keywords, etc)
python .../inject_metadata.py --funding --keywords PROJECTDIR
- finally, add tabby metadata:
python .../load_tabby.py --catalog ... TABBYFILE