Website: https://data.sfb1451.de
The catalog is placed in the docs
directory (to allow other repository content at the root level)
and served to GitHub Pages from there.
- SFB superdataset: sfb1451/all-projects
- Utilities for handling tabby files: sfb1451/tabby-utils
- Custom extractors & translators: mslw/datalad-wackyextra
The following DataLad extensions are required to generate the catalog:
- DataLad-next: interaction with WebDAV remotes
- DataLad-metalad: metadata extraction
- DataLad-wackyextra: custom metadata extractors and translators
- DataLad-catalog: catalog generation
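One way to set these up is with pip; a sketch, assuming the first three package names on PyPI are current and that datalad-wackyextra is installed straight from its GitHub repository:

```shell
# Extensions published on PyPI (package names assumed)
pip install datalad-next datalad-metalad datalad-catalog
# datalad-wackyextra: assumed to be installable directly from GitHub
pip install git+https://github.com/mslw/datalad-wackyextra.git
```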
Project datasets are tracked in the Projects
superdataset. GitHub mirror: https://github.com/sfb1451/all-projects
datalad clone -d . webdavs://<base URL>/Projects/<Project>/<subfolder> <Project>
(Project names are used as folder names, because they are displayed as subdataset names.)
Tip: until datalad-next/issues/108 makes this automatic, the storage remote can be reconfigured with:
git annex initremote my-sciebo-storage --private --sameas <name or uuid> exporttree=yes type=webdav url="<url>"
or with clone URL substitution; check:
datalad configuration | grep 'datalad.clone.url-substitute'
for examples of how to alter all such URLs at once.
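As a sketch with hypothetical host names, a substitution rule can be set in Git config. In a datalad.clone.url-substitute.&lt;label&gt; value, the first character acts as the delimiter between the match pattern and its replacement:

```shell
# Hypothetical hosts: rewrite old WebDAV URLs to a new server on clone.
# The leading ',' is the delimiter between the regex and the replacement.
git config --global datalad.clone.url-substitute.sciebo \
  ',^webdavs://old\.example\.org/,webdavs://data.example.org/'
```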
Generation is done incrementally. Utility scripts are provided in the code directory. One script usually does one thing.
The scripts share a similar CLI: they need to be pointed to a dataset, a folder to store intermediate metadata, and (optionally) a catalog they will update. See the command-line help or source code for usage instructions.
Key scripts are:
extract_selected.py
: extract metadata from a project's subdataset. Allows selecting which extractors to use. Can optionally generate a file list, too.

extract_project.py
: extract metadata from a project's superdataset. Conducts project_extract_pipeline.json and translates the result. This pipeline combines metalad_core, CFF, and bibliography (ris/nbib/crossref) extractors. It also adds handcrafted metadata, like funding or keywords.

extract_superdataset
: extract superdataset metadata. Runs metalad_core and metalad_studyminimeta extractors.
Additional scripts are:
utils.py
: shared functions or classes

list_files.py
: generate a file listing based on datalad status. Deprecated, use extract_selected.py --files.

scrape_projects.py
: archived script that was used to scrape project descriptions.

scrape_publications.py
: archived script that was used to scrape project publications.