Adds metadata source specification #484
base: main
The source specification defines how to structure a collection of metadata records that together form the source material for a catalog instance. It separates metadata source files and formats from tooling, ensuring that users can provide and maintain a metadata collection without depending on datalad-catalog tools, while providing a validated structure from which automated tools can generate datalad-catalog-compatible records to be rendered. This commit adds the specification as part of the project docs. Future commits should update the 'Pipeline description' section of the docs to suggest the use of tools that understand the metadata source specification, and should also remove or update the 'Metadata formats' section of the docs accordingly.
✅ Deploy Preview for datalad-catalog canceled.
├── config/
│   └── <config-version-id>/
│       └── config.json
One thing I'm uncertain about here, wrt versioned configs, is how the ingestion pipeline will know which config version to use to create the catalog entries. It will have to be parameterized somehow, but ideally the agent that created the metadata collection should be the one to specify which config version to use. I.e. that argument should be part of the collection somehow?
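One possible way to make that argument part of the collection, sketched under loud assumptions: a hypothetical top-level manifest file (called `collection.json` here, with a hypothetical `config_version` key) names the config version, and the pipeline resolves the config path from it. Neither the file name nor the key is part of the proposed specification.

```python
import json
from pathlib import Path


def resolve_config_path(collection_root, manifest_name="collection.json"):
    """Return the config.json path for the version named in the manifest."""
    root = Path(collection_root)
    # Hypothetical manifest content: {"config_version": "<config-version-id>"}
    manifest = json.loads((root / manifest_name).read_text(encoding="utf-8"))
    version_id = manifest["config_version"]
    # Layout taken from the tree above: config/<config-version-id>/config.json
    config_path = root / "config" / version_id / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"No config.json for config version {version_id!r}")
    return config_path
```

With this shape the creator of the collection decides the version, and the ingestion pipeline only needs the collection root as input.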
This directory should contain the catalog-level configuration file(s), one per version,
with the name ``config.json``.
Technically, `datalad-catalog` can also read YAML config files. Should we allow all possibilities (`.json`, `.yml`, `.yaml`), or just specify a single option?
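If all three extensions were allowed, a tool reading the collection would need a rule for finding the file. A minimal sketch, assuming exactly one `config.*` file per version directory; the function name and the "exactly one" rule are illustrative assumptions, not something the specification states:

```python
from pathlib import Path

# Candidate extensions taken from the question above; whether the
# specification should accept all of them is still undecided.
ALLOWED_CONFIG_SUFFIXES = (".json", ".yml", ".yaml")


def find_config_file(version_dir):
    """Return the single config file inside one <config-version-id> directory."""
    candidates = [Path(version_dir) / f"config{suffix}" for suffix in ALLOWED_CONFIG_SUFFIXES]
    found = [path for path in candidates if path.is_file()]
    if len(found) != 1:
        # Zero matches means the version has no config; several would be ambiguous.
        raise ValueError(f"Expected exactly one config file in {version_dir}, found {len(found)}")
    return found[0]
```

Specifying a single extension would make this lookup unnecessary, which is an argument for the stricter option.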
├── config/
│   └── <config-version-id>/
│       └── config.json
Another point about the config, it can also include a logo path (specified relative to the location of the config, within the context of the environment running the `datalad-catalog` code). For the purposes of the collection, this logo will either have to be provided as an image file in the collection itself (likely alongside the `config.json` file) or as a downloadable URL. Thoughts?
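To make the two options concrete, here is a rough sketch of how a tool could tell them apart. It assumes the logo value is either an http(s) URL or a path relative to the config file; the function name and the returned tuple shape are hypothetical and not part of `datalad-catalog`.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_logo(logo_value, config_path):
    """Classify a logo entry as a downloadable URL or a file next to the config."""
    if urlparse(logo_value).scheme in ("http", "https"):
        return ("url", logo_value)
    # Anything else is treated as a path relative to the config file's directory.
    return ("file", Path(config_path).parent / logo_value)
```

For example, `resolve_logo("logo.svg", "config/v1/config.json")` points at `config/v1/logo.svg`, which matches the "alongside the config" option.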
This should be a unique filename of a single record, with identifying characters that
can be parsed in order to match the specific file format with a specific reader or processing
tool. There is no restriction on the number of files contained in a given ``<dataset-version-id>``
directory, they should just all be unique.
It just occurred to me that it might not always be individual files, e.g. a tabby collection might be included here as a directory containing all the related tabby files?
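A sketch of what matching by name could look like if both files and directories are allowed as entries. It assumes a hypothetical naming convention in which a format identifier appears as one of the dot-separated parts of the entry name (for example `record.datacite.json` or a directory `study.tabby/`); the identifiers and reader names below are placeholders.

```python
from pathlib import Path

# Hypothetical mapping from a format identifier embedded in the entry
# name to a reader name; neither side is defined by the specification.
READERS = {"datacite": "datacite_reader", "tabby": "tabby_reader"}


def match_reader(entry_name):
    """Return the reader for an entry such as 'record.datacite.json', or None."""
    for suffix in Path(entry_name).suffixes:
        reader = READERS.get(suffix.lstrip("."))
        if reader is not None:
            return reader
    return None


def list_entries(version_dir):
    """Map every direct child (file or directory) of a version directory to a reader."""
    return {child.name: match_reader(child.name) for child in sorted(Path(version_dir).iterdir())}
```

Because only the entry name is inspected, a tabby directory is handled the same way as a single file, and uniqueness within the directory is already guaranteed by the filesystem.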
Closes #482