SFB 1451 publication catalog

Repository overview

The catalog lives in the docs directory (leaving the repository root free for other content) and is served from there via GitHub Pages.

Related repositories:

SFB superdataset: sfb1451/all-projects
Utilities for handling tabby files: sfb1451/tabby-utils
Custom extractors & translators: mslw/datalad-wackyextra

Required extensions

The following DataLad extensions are required to generate the catalog:

DataLad-next: used to interact with WebDAV remotes
DataLad-metalad: provides metadata extraction
DataLad-wackyextra: provides custom metadata extractors and translators
DataLad-catalog: used to generate the catalog
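Before running any of the scripts, it can help to confirm that all four extensions are importable. The following is a minimal sketch, not part of this repository: the function name is hypothetical, and the Python module names are assumptions based on the usual datalad_<name> convention (datalad_wackyextra in particular is a guess for the custom package).

```python
from importlib.util import find_spec

# Assumed import names for the required extensions. These are NOT taken
# from the repository; adjust them if the installed packages differ.
REQUIRED_MODULES = [
    "datalad_next",
    "datalad_metalad",
    "datalad_wackyextra",
    "datalad_catalog",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing extensions:", ", ".join(missing))
    else:
        print("All required extensions appear to be installed.")
```

The check only looks for importable modules; it does not verify versions or that DataLad has registered the extensions.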

Dataset locations

Project datasets are tracked in the Projects superdataset. GitHub mirror: https://github.com/sfb1451/all-projects

Obtaining a subdataset from Sciebo:

datalad clone -d . webdavs://<base URL>/Projects/<Project>/<subfolder> <Project>

Project names are used as folder names because these are displayed as subdataset names.
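When several subdatasets need to be obtained, the clone command above can be assembled programmatically. This is a sketch under stated assumptions: the helper names are hypothetical, the base URL is given without a scheme (as in the template above), and the example values are placeholders rather than real locations.

```python
import subprocess


def build_clone_command(base_url, project, subfolder):
    """Assemble the datalad clone call for one project subdataset.

    base_url is the WebDAV base URL without the scheme; project doubles
    as the local folder name, as described above.
    """
    source = f"webdavs://{base_url}/Projects/{project}/{subfolder}"
    return ["datalad", "clone", "-d", ".", source, project]


def clone_subdataset(base_url, project, subfolder):
    """Run the clone in the current directory (the superdataset root)."""
    subprocess.run(build_clone_command(base_url, project, subfolder), check=True)


# Placeholder values for illustration only.
print(" ".join(build_clone_command("example.org/webdav", "A01", "data")))
```

Building the command as a list avoids shell quoting problems with project names or subfolders that contain spaces.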

Tip: until datalad-next/issues/108 makes this automatic, the storage remote can be reconfigured with:

git annex initremote my-sciebo-storage --private --sameas <name or uuid> exporttree=yes type=webdav url="<url>"

Alternatively, use clone URL substitution. Run:

datalad configuration | grep 'datalad.clone.url-substitute'

to see some examples of how all such URLs can be altered at once.
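To illustrate what such a substitution does, here is a rough sketch. It assumes the specification format described in DataLad's configuration documentation: a string whose first character is a delimiter, followed by a Python regular expression and a replacement separated by that delimiter. The function name and the example URLs are hypothetical, and DataLad's own implementation may differ in detail.

```python
import re


def apply_substitution(spec, url):
    """Apply one delimiter-prefixed 'match/replacement' spec to a URL.

    Example spec: ",^https://old\\.example\\.org/,webdavs://new.example.org/"
    where "," is the delimiter chosen by the first character.
    """
    delimiter = spec[0]
    match_expr, replacement = spec[1:].split(delimiter, 1)
    # Replace only the first match, leaving the rest of the URL untouched.
    return re.sub(match_expr, replacement, url, count=1)


# Hypothetical spec and URL, for illustration only.
spec = ",^https://old\\.example\\.org/,webdavs://new.example.org/"
print(apply_substitution(spec, "https://old.example.org/Projects/A01/data"))
```

A URL that does not match the expression is returned unchanged, so one specification can be applied safely to every subdataset URL.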

Generation

Generation is done incrementally. Utility scripts are provided in the code directory. One script usually does one thing.

The scripts share a similar CLI: each needs to be pointed to a dataset, a folder for storing intermediate metadata, and (optionally) a catalog to update. See the command line help or the source code for usage instructions.
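The shared command line shape can be pictured with a small argparse sketch. This is an illustration only: the argument names (dataset, outdir, --catalog) are assumptions and may not match the actual scripts, so check each script's --help output for the real options.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with the three inputs the scripts have in common."""
    parser = argparse.ArgumentParser(
        description="Extract metadata from a dataset (illustrative sketch)."
    )
    # Hypothetical argument names; the real scripts may use different ones.
    parser.add_argument("dataset", type=Path, help="dataset to extract from")
    parser.add_argument(
        "outdir", type=Path, help="folder for intermediate metadata"
    )
    parser.add_argument(
        "-c",
        "--catalog",
        type=Path,
        default=None,
        help="catalog to update (optional)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"dataset={args.dataset} outdir={args.outdir} catalog={args.catalog}")
```

Keeping the catalog optional matches the incremental workflow: metadata can be extracted and stored first, and added to a catalog in a later step.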

Key scripts are:

extract_selected.py: extract metadata from a project’s subdataset. It allows selection of extractors to be used. Can optionally generate a file list, too.
extract_project.py: extract metadata from a project’s superdataset. Conducts the project_extract_pipeline.json pipeline and translates the result. This pipeline combines the metalad_core, CFF, and bibliography (ris/nbib/crossref) extractors. It also adds handcrafted metadata, such as funding or keywords.
extract_superdataset: extract superdataset metadata. Runs metalad_core and metalad_studyminimeta extractors.

Additional scripts are:

utils.py: shared functions or classes
list_files.py: generate a file listing based on datalad status. Deprecated; use extract_selected.py --files instead.
scrape_projects.py: archived script that was used to scrape project descriptions.
scrape_publications.py: archived script that was used to scrape project publications.
