Pre-populate degree dictionary #34
Comments
We need a way to deal with duplicates, e.g.:
Both are valid, by the way.
Are there meaningful differences between these?
@cthoyt this is only an example; there are many cases like that. They are not duplicates but specializations, and we can always use a more general term. That is what I do when manually curating these keys. Pruning them automatically may prove an endless task due to the variety of possible items. While we don't have a workflow for curating these duplicates, I'd rather roll back to the manually curated version of the file.
It's fine with me if you want to roll back, but I am optimistic that creating rules for processing the data is possible. Maybe you can start by assessing how big the overlap really is, by changing the returned data structure from a dict to something more like TSV data.
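One way to gauge the overlap could look like the sketch below. It assumes the dictionary maps a label to an identifier; that shape, the helper names (`to_rows`, `normalize`, `overlapping_labels`, `to_tsv`), and the placeholder identifiers are all hypothetical, not taken from the project.

```python
from collections import defaultdict


def to_rows(degrees):
    """Flatten an assumed {label: identifier} dict into sorted (identifier, label) rows."""
    return sorted((identifier, label) for label, identifier in degrees.items())


def normalize(label):
    """Case-fold a label and drop punctuation and spaces so near-identical keys collide."""
    return "".join(ch for ch in label.casefold() if ch.isalnum())


def overlapping_labels(rows):
    """Group rows by normalized label and keep only the groups with more than one row."""
    groups = defaultdict(set)
    for identifier, label in rows:
        groups[normalize(label)].add((identifier, label))
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


def to_tsv(rows):
    """Render the rows as tab-separated text, one row per line, for manual review."""
    return "\n".join("\t".join(row) for row in rows)


# Hypothetical sample data; the identifiers are placeholders, not real Wikidata IDs.
sample = {"PhD": "Q_A", "Ph.D.": "Q_B", "MSc": "Q_C"}
rows = to_rows(sample)
print(to_tsv(rows))
print(overlapping_labels(rows))
```

The normalization here is deliberately crude. It only catches spelling variants, so specializations of a more general term would still need a curated rule or a manual pass.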
@cthoyt actually, I think the duplicates appeared when I merged my curations with the automatic dict.
Using a SPARQL query to get all subclasses of academic title (Q3529618) would be a nice way to pre-populate `degrees.json`. The following SPARQL query (run at https://w.wiki/5o9H) gets the job done:
Caveats:
Alternate Multi-lingual SPARQL
Note that `DISTINCT` doesn't collapse entries that are tagged with multiple languages but still have the same text.
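A possible client-side workaround is to merge those rows after the query returns. The sketch below assumes the results arrive in the standard SPARQL JSON results format, where each literal binding carries its text under `value` and its language tag under `xml:lang`. The variable names `item` and `label`, the function name `collapse_languages`, and the example URIs are hypothetical and not taken from the query above.

```python
def collapse_languages(bindings, item_var="item", label_var="label"):
    """Merge bindings that share the same item and label text across language tags.

    Returns a sorted list of (item, text, languages) tuples.
    """
    merged = {}
    for binding in bindings:
        key = (binding[item_var]["value"], binding[label_var]["value"])
        # Plain literals have no language tag; record them under an empty string.
        language = binding[label_var].get("xml:lang", "")
        merged.setdefault(key, set()).add(language)
    return [(item, text, sorted(langs)) for (item, text), langs in sorted(merged.items())]


# Hypothetical bindings shaped like payload["results"]["bindings"].
example = [
    {"item": {"type": "uri", "value": "http://example.org/entity/E1"},
     "label": {"type": "literal", "xml:lang": "en", "value": "Doctor"}},
    {"item": {"type": "uri", "value": "http://example.org/entity/E1"},
     "label": {"type": "literal", "xml:lang": "de", "value": "Doctor"}},
    {"item": {"type": "uri", "value": "http://example.org/entity/E1"},
     "label": {"type": "literal", "xml:lang": "de", "value": "Doktor"}},
]
for row in collapse_languages(example):
    print(row)
```

Keeping the list of languages alongside each collapsed entry preserves the information that `DISTINCT` was treating as a difference, in case it matters during curation.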