rename uniprot to swissprot #4
Not sure what to do about this. I want files in this repo to correspond to semantic spaces. UniProt is definitely an issue, given that it's so big and I don't want to include TrEMBL.
Is there a downstream use case that merits me spending brain power on this?
Potential solution: create a subspace relationship in the Bioregistry.
But the subspace idea makes sense. E.g. when I run the ingest, I would get something like:
This makes it clear you are only ingesting a subset. It also means that if people do want to do a run ingesting all of TrEMBL, they can do it in a compatible way.
I am not sure the subsets need to be registered in the Bioregistry; there are a lot of ways to subdivide a large resource. Are you looking for use cases that require more than Swiss-Prot? For many non-human organisms, Swiss-Prot coverage is not complete (in fact, it's not even 100% complete for all human genes). The most useful subset of UniProt for an organism is often the gene-centric reference proteome, which is a mix of Swiss-Prot and TrEMBL (but not all of TrEMBL, just one representative entry per gene).
This ingest is currently causing a lot of confusion: people read it and think it's all of UniProt, but in fact it's just Swiss-Prot (i.e. the reviewed subset). I think the immediate action is just to rename it from uniprot to swissprot.
The uniprot OBO file is actually just Swiss-Prot, which is useful in its own right, but it should be called swissprot.
UniProt has another 229M entries from TrEMBL, which might be harder to get past GitHub's file size limits.
Another useful slice is all the reference proteomes. For human this more or less equates to Swiss-Prot, but for other organisms it gives a representative entry for each gene.
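For anyone who wants to check which slice a UniProtKB download actually contains: in UniProtKB FASTA files, reviewed Swiss-Prot entries have headers starting with `>sp|` and unreviewed TrEMBL entries start with `>tr|`. A minimal sketch, assuming FASTA input (not the OBO export discussed above) and using hypothetical helper names `split_uniprot_fasta_headers` and `count_subsets`, could tally the two subsets:

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# UniProtKB FASTA headers look like ">db|Accession|EntryName ...",
# where db is "sp" for reviewed (Swiss-Prot) and "tr" for unreviewed (TrEMBL).
SUBSET_BY_PREFIX = {"sp": "swissprot", "tr": "trembl"}


def split_uniprot_fasta_headers(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (subset, accession) for every UniProtKB FASTA header line (hypothetical helper)."""
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no subset information
        parts = line[1:].split("|", 2)
        if len(parts) < 3:
            continue  # not a UniProtKB-style header; ignore it
        db, accession, _rest = parts
        subset = SUBSET_BY_PREFIX.get(db)
        if subset is not None:
            yield subset, accession


def count_subsets(lines: Iterable[str]) -> Counter:
    """Count how many entries in a FASTA stream come from each subset (hypothetical helper)."""
    return Counter(subset for subset, _ in split_uniprot_fasta_headers(lines))


if __name__ == "__main__":
    # Toy input: the first header is a real Swiss-Prot entry (sequence truncated),
    # the second is a made-up TrEMBL entry for illustration only.
    sample = [
        ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2",
        "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF",
        ">tr|A0A0X0XXX0|A0A0X0XXX0_HUMAN Hypothetical protein OS=Homo sapiens OX=9606",
        "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
    ]
    print(count_subsets(sample))
```

Because it streams line by line, memory use stays flat, which would matter if anyone pointed it at the full TrEMBL download.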