Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RF: move from single entrypoint API to entrypoint per command #309

Merged
merged 43 commits into from
Jul 28, 2023

Conversation

jsheunis
Copy link
Member

@jsheunis jsheunis commented Jun 2, 2023

The main goal of this PR is to close #245. IN the process, the PR evolved a lot and now includes these main changes:

  • update API to entrypoint per command: create, add, validate, serve, get, set, remove, translate, workflow
  • introducing datalad-next dependency, at first for constraint system but in future for whatever functionality is useful
  • refactor most of the code related to the main commands

Some specific outcomes to take note of:

  • Validation is now done according to the schema of a specific catalog (now located at path-to-catalog/schema/*), where previously validation was always done according to the latest schema of the installed package version (which is now the fallback).
  • Metadata can be passed to any catalog command that takes it as in argument, in different formats: json lines from STDIN, JSON-serialized string, and a file with JSON lines.
  • pytest fixtures are now located in tests/fixtures.py and exposed to all tests
  • catalog-get|set provides an extensible set of commands for configuring a catalog and reporting its properties
  • the handling of config files is refactored:
    • The main changes relate to how/when a config file is provided.
    • Previously, config files were specified during instantiation of the WebCatalog class, and a whole lot of obscure code was necessary to determine how this config applies to the catalog or dataset level.
    • The refactored code receives the config file via the Create() and Add() commands or via the WebCatalog.create() or Node.add_attributes() python methods.
    • Now, WebCatalog can be instantiated with location alone (no 'action' necessary anymore, and no config_file)
    • Now, config is set on WebCatalog or Node instances after/during running their respective create() methods and not during instantiation
  • Translation functionality is refactored, and requires updated to existing translators:
    • With the new catalog argument, translation occurs to the specific schema version of the catalog (if supported by an available translator); if the catalog argument is not supplied, the expected schema version is the latest supported by the package installation.
    • The translator matching process is streamlined by keeping track of previously matched and instantiated translators
    • Functionality to get the supported schema version, source name, and source version have been moved to the translator base class in order to support the abovementioned changes. This means existing and new translators will have to override these functions.

Issues that are addressed by this PR:

This commit starts the process of introducing multiple
entrypoints by creating new and separate modules for
'create' and 'add'.

It also adds the following changes:
- introduces a dependency on datalad-next
- introduces the use of dl-next's constraint system
for parameter validation and coercion of new commands
- allows multiple formats for the metadata argument to
the 'add' command, including STDIN, stringified JSON,
and a file with JSON lines.
- introduces common fixtures via a new module and conftest
- updates the tests associated with 'create' and 'add'
- updates relevant test data
- adds entrypoints for 'create' and 'add' commands

The goal is for future commits adding more entrypoints
to build on top of the basis established here.
This commit is necessary to allow validation according to the schema
of a specific catalog, where previously validation was always done
according to the latest schema of the installed package version.

All catalog schemas now hava a dedicated location in
'catalog-name'/schema. If a metadata item is to be validated, this
location will be inspected in order to build up a schema store
that will be validated against. If this location does not exist,
the fallback is the default schema store of the package
installation.

Subsequent changes include:
- updating and moving packahe/catalog/tests/schema locations
to the constants module
- including the new schema location in package data
- adding methods to the webcatalog class to locate
the relevant schema store and validator
- updating create and add commands and their tests
accordingly
jsheunis and others added 11 commits June 8, 2023 13:50
…on from a catalog, including metadata, config, and home page specs
This commit includes several changes that relate
to the refactoring of the config functionality. The
main changes relate to how/when a config file is provided.
Previously, config files were specified during
instantiation of the WebCatalog class, and a whole lot
of obscure code was necessary to determine how this config
applies to the catalog or dataset level. The refactored code
receives the config file via the Create() and Add() commands or via
the WebCatalog.create() or Node.add_attributes() python methods.

Important outcomes of the changed code include:
- WebCatalog can be instantiated with location alone (no 'action'
  necessary anymore, and no config_file)
- config is set on WebCatalog or Node instances after/during
  running their respective create() methods and not during
  instantiation
- WebCatalog class gained an 'add_record()' method to handle
  what was previously only done in Add() interface/command.
- MetaItem instantiation now handles config_file input, in order
  to pass it to Node.add_attributes() at the appropriate time.

Specific other changes include:
- splitting Node class into its own module
- updating utilities
- updating tests to conform to the refactors code
catalog-translate command line entrypoint. Changes include:
- using datalad-next constraints via validatedInterface class
- introducing the catalog argument, which allows the
  translation to occur to the specific schema version (if
  supported by an available translator); if the catalog
  argument is not supplied, the expected schema version is
  the latest supported by the package installation.
- streamlining the translator matching process by keeping
  track of previously matched and instantiated translators
- moving functionality to get the supported schema version,
  source name, and source version to the translator base
  class. Existing and new translators will have to override
  these functions.
@jsheunis
Copy link
Member Author

@mslw I think this comment from above is particularly relevant to you, since you also maintain a couple of translators:

Translation functionality is refactored, and requires updated to existing translators:

  • With the new catalog argument, translation occurs to the specific schema version of the catalog (if supported by an available translator); if the catalog argument is not supplied, the expected schema version is the latest supported by the package installation.
  • The translator matching process is streamlined by keeping track of previously matched and instantiated translators
  • Functionality to get the supported schema version, source name, and source version have been moved to the translator base class in order to support the abovementioned changes. This means existing and new translators will have to override these functions.

@codecov-commenter
Copy link

codecov-commenter commented Jul 14, 2023

Codecov Report

Patch coverage: 91.45% and project coverage change: +3.81 🎉

Comparison is base (f612cb1) 82.32% compared to head (57bf6b4) 86.14%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #309      +/-   ##
==========================================
+ Coverage   82.32%   86.14%   +3.81%     
==========================================
  Files          32       43      +11     
  Lines        2173     2700     +527     
==========================================
+ Hits         1789     2326     +537     
+ Misses        384      374      -10     
Flag Coverage Δ
unittests 79.00% <88.17%> (+7.24%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
datalad_catalog/__init__.py 100.00% <ø> (ø)
...lad_catalog/translators/bids_dataset_translator.py 48.05% <ø> (ø)
...lad_catalog/translators/datacite_gin_translator.py 100.00% <ø> (ø)
...lad_catalog/translators/metalad_core_translator.py 91.66% <ø> (ø)
...og/translators/metalad_studyminimeta_translator.py 94.44% <ø> (ø)
datalad_catalog/workflow.py 67.70% <67.70%> (ø)
datalad_catalog/catalog.py 90.00% <75.00%> (+9.88%) ⬆️
datalad_catalog/serve.py 77.77% <77.77%> (ø)
datalad_catalog/add.py 80.95% <80.95%> (ø)
datalad_catalog/webcatalog.py 81.81% <81.31%> (-8.06%) ⬇️
... and 26 more

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move from subcommands to an extended command suite
3 participants