Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not matching schema causes error only at render step for NERDm records #305

Open
bmaranville opened this issue Jan 17, 2023 · 3 comments
Open

Comments

@bmaranville
Copy link
Member

We made some (nominally) NERDm records and uploaded to the dev API, and they were accepted. It was only when trying to render them with the angular server /lps endpoint that we discovered a problem - the page returned an error and did not render the record. The error trace said something about a value being undefined.

After some slow investigation and looking at the schema, I found that a) the problem was caused by us not complying with the schema and b) it was because records contained the top-level key "keywords": string[] instead of the required "keyword": string[]

We already made the change to use the key keyword on our end, but I wonder what the appropriate place to validate the records is? At render-time is not ideal. Would it make sense to validate as part of the ingestion, returning an error if the record does not match the schema? I feel like that is a better guarantee of consistency than asking submitters to validate before PUT/PATCH on a record.

@RayPlante
Copy link
Collaborator

RayPlante commented Jan 17, 2023

The service validates records at two points: when NERDm record is submitted and when the service receives the "publish" action (actually, it may be "finalize" action which is implied by "publish"). The former validation is a laxed version that allows for required properties (like "keyword") to be missing; this allows one to upload a partially incomplete record, but then update with missing information. The latter validation is fully strict.

You have hit upon one gotcha: the schemas, on purpose, allow for additional properties to be included that are not part of any of the defined schemas; this is a key extension feature. On the other hand, the service (i.e. independent of the schema) will filter out unrecognized properties (I'm pretty sure) silently. As a result, if you mis-spell a property, it goes undetected (as you demonstrated).

I should note something about the landing page service: while the service should not have complained about the missing "keyword" data, we had not originally intended to run the landing page service with the publishing service, as the latter was intended to be fully automatic without support for human interactions. While we now plan to make the landing page service available for these records, we haven't actually set that up yet.

That said, I think we can improve the service to try to catch near-misses in keyword spellings (perhaps by rejecting records containing properties we would otherwise filter out).

@bmaranville
Copy link
Member Author

bmaranville commented Jan 17, 2023

Ah - I think I understand. I'm not able to test our workflow against the second validation (on "finalize"), without setting up an entire mock PDR database, right?

Should we have expected to see errors returned from the test endpoint on "finalize"? We did publish some of our malformed records there - and didn't get any errors back.

@RayPlante
Copy link
Collaborator

Actually, "finalize" should work. I was just looking at the code, and now I have a question about whether the intended validation actually gets run as it should. Nevertheless, that is the point of finalize: it makes all remaining tweaks to the record so that it looks fully like what will get published, and then check everything. If there is a problem, the client has one last opportunity to fix it (which normally wouldn't happen in a fully automated workflow).

I should correct two mis-statements I made above: (1) the validation done when NERDm metadata is submitted is not laxed, and (2) unrecognized properties are not filtered out. I was confusing this service with another one we are developing that has a lower trust of the client. Regardless, since it does allow custom properties, it will miss mis-spellings like "keywords".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants