Skip to content

sfb1451/metadata-catalog

Repository files navigation

SFB 1451 publication catalog

Website: https://data.sfb1451.de

Repository overview

The catalog is placed in the docs directory (to allow other repository content at root level), and served to GitHub pages from there.

Related repositories:

Required extensions

The following DataLad extensions are required to generate the catalog: DataLad-next is used to interact with WebDAV remotes, DataLad-metalad provides metadata extraction, DataLad-wackyextra provides custom metadata extractors and translators, DataLad-catalog is used to generate the catalog.

Dataset locations

Project datasets are tracked in the Projects superdataset. GitHub mirror: https://github.com/sfb1451/all-projects

Obtaining a subdataset from Sciebo:

datalad clone -d . webdavs://<base URL>/Projects/<Project>/<subfolder> <Project>

(using project names as folder names, because these would be displayed as subdataset names).

Tip: until datalad-next/issues/108 makes it automatic, storage remote can be reconfigured with:

git annex initremote my-sciebo-storage --private --sameas <name or uuid> exporttree=yes type=webdav url="<url>"

or with clone url substitution - check:

datalad configuration  | grep 'datalad.clone.url-substitute'

to get some examples how you can alter all such URLs at once.

Generation

Generation is done incrementally. Utility scripts are provided in the code directory. One script usually does one thing.

Scripts have similar CLI: they need to be pointed to a dataset, a folder to store intermediate metadata, and (optionally) to a catalog they will update. See command line help or source code for usage instructions.

Key scripts are:

  • extract_selected.py: extract metadata from a project’s subdataset. It allows selection of extractors to be used. Can optionally generate a file list, too.
  • extract_project.py: extract metadata from a project’s superdataset. Conducts project_extract_pipeline.json and translates the result. This pipeline combines metalad_core, CFF, and bibliography (ris/nbib/crossref) extractors. It also adds handcrafted metadata, like funding or keywords.
  • extract_superdataset: extract superdataset metadata. Runs metalad_core and metalad_studyminimeta extractors.

Additional scripts are:

  • utils.py: shared functions or classes
  • list_files.py: generate a file listing based on datalad status. Deprecated, use extract_selected.py --files.
  • scrape_projects.py: archived script that was used to scrape project descriptions.
  • scrape_publications.py: archived script that was used to scrape project publications.

About

The SFB 1451 data portal (metadata catalog)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages