Don't align transcripts with different numbers of exons #195
Comments
I have discovered an issue with transcript NM_001278433.1 (gene PRKAR1A), which I believe is an example of this issue. If my understanding is incorrect, please let me know. Exon sets for the transcript:
The GRCh37 splign chromosomal alignment has 10 exons:
The "self" alignment has 11 exons:
Judging by exon lengths, the discrepancy is in exon 1, so g-to-c calculations using hgvs give bad results for variants along the entire transcript. My assumption was that "transcript" is the relevant self-alignment, and not "transcript/8ecabff0" or "transcript/92190059".
First, I'm impressed that you dove this far into UTA internals! I don't know the story for this transcript specifically, and these data are 4-6 years old, perhaps from the time before NCBI released gff files. So, this might be hard to reproduce now from sources. The presence of a "/" (as in "transcript/8ecabff0") nearly always means that the assembly and/or alignments are problematic. So, proceed with caution. In uta_20190926, I see this:
So, it looks to me as though you should upgrade to uta_20190926, in which NM_001278433.1 aligns to NC_000017.10 and NC_000017.11 without issues. Please close if that answers your question.
Reece: Thank you very much for your time; that was helpful. I don't see uta_20190926 as a tag on the Docker Hub page, so I wasn't sure if it was advisable to use: Is this version an "official" release that was built and validated to the same standards as the uta_20180821 version? Also, if we did update to the 2019 UTA, which versions of hgvs and seqrepo would you recommend moving up to? We currently use:
Thanks again. Matt
uta_20190926 currently has an issue (#228) that prevents us from building Docker images. A change was made to materialize a very large view, and materializing the data had run for more than 12 hours when I killed it. We'll need to unwind that before distributing Docker images. You should be able to use any version of hgvs. The change log may help you figure out whether any of the changes since 1.3.0 are relevant to you. Unfortunately, you'll have to wait on the uta fixes. No ETA yet.
Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #195
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07
UTA historically has aligned transcript and genomic exons even when the number of exons in each exon set differs. This practice masks real issues in underlying data and should be discontinued.
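As a rough illustration of the proposed behavior (not UTA's actual code), the check could look something like the sketch below. The names `pair_exons` and `ExonCountMismatch`, and the representation of an exon as a `(start, end)` tuple, are hypothetical and chosen only for this example. It assumes both exon lists are already in transcript order.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is a (start, end) coordinate pair.
Exon = Tuple[int, int]


class ExonCountMismatch(ValueError):
    """Raised when two exon sets cannot be paired one-to-one."""


def pair_exons(tx_exons: List[Exon], alt_exons: List[Exon]) -> List[Tuple[Exon, Exon]]:
    """Pair transcript exons with alignment exons, refusing mismatched counts.

    Both lists are assumed to be in transcript order already.
    """
    if len(tx_exons) != len(alt_exons):
        # Fail loudly instead of silently pairing exons that do not correspond.
        raise ExonCountMismatch(
            f"transcript exon set has {len(tx_exons)} exons, "
            f"alignment exon set has {len(alt_exons)}"
        )
    return list(zip(tx_exons, alt_exons))
```

With a guard like this, a case such as 11 exons in one exon set against 10 in the other would raise an error at alignment time rather than surface later as bad coordinate mappings.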