VCF 4.4/4.5 specs: rlen should be computed rather than populated from END INFO field #1820

Open
davmlaw opened this issue Aug 12, 2024 · 3 comments

davmlaw commented Aug 12, 2024

The current behavior of bcf_update_info in vcf.c is to use the END INFO tag to set rlen.

For symbolic variants (e.g. <DEL>), rlen should instead be computed.

My reading of the specs is that the END INFO field is optional in VCF 4.4 and officially deprecated in 4.5:

VCF 4.4 spec

END: End reference position (1-based), indicating the variant spans positions POS–END on reference/contig
CHROM. Normally this is the position of the last base in the REF allele, so it can be derived from POS and
the length of REF, and no END INFO field is needed. However when symbolic alleles are used, e.g. in gVCF or
structural variants, an explicit END INFO field provides variant span information that is otherwise unknown.
If a record containing a symbolic structural variant allele does not have an END field, it must be computed
from the SVLEN field as per Section 3.

VCF 4.5 spec - END has been deprecated
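
To make the VCF 4.4 rule quoted above concrete, here is a minimal sketch of deriving END from SVLEN for a symbolic allele, and rlen from that END. It reflects my reading of the spec (END = POS + SVLEN for DEL, DUP, INV and CNV; END = POS for INS) and should be checked against the spec text before relying on it. The helper names `is_symbolic`, `sv_end_from_svlen` and `rlen_from_end` are hypothetical and not part of the HTSlib API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if alt is the symbolic allele <type>
 * or one of its subtypes <type:...>, e.g. <DEL> or <DEL:ME>. */
static int is_symbolic(const char *alt, const char *type)
{
    size_t n = strlen(type);
    return alt[0] == '<' && strncmp(alt + 1, type, n) == 0
        && (alt[1 + n] == '>' || alt[1 + n] == ':');
}

/* Hypothetical helper: 1-based inclusive END derived from POS and SVLEN.
 * Assumes END = POS + SVLEN for DEL/DUP/INV/CNV and END = POS for INS.
 * Returns -1 for alleles this sketch does not handle. */
static int64_t sv_end_from_svlen(int64_t pos1, const char *alt, int64_t svlen)
{
    /* Older files often write a negative SVLEN for deletions, so use the magnitude. */
    int64_t len = svlen < 0 ? -svlen : svlen;

    if (is_symbolic(alt, "INS"))
        return pos1;
    if (is_symbolic(alt, "DEL") || is_symbolic(alt, "DUP") ||
        is_symbolic(alt, "INV") || is_symbolic(alt, "CNV"))
        return pos1 + len;
    return -1;
}

/* Reference span covered by a record: both POS and END are 1-based and inclusive. */
static int64_t rlen_from_end(int64_t pos1, int64_t end1)
{
    return end1 - pos1 + 1;
}
```

For example, a <DEL> at POS 100 with SVLEN 50 gives END 150 and a span of 51 bases, because the span includes the padding base at POS.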

davmlaw changed the title from "VCF 4.4 and VCF4.5 specs: rlen should be computed rather than populated from END INFO field" to "VCF 4.4/4.5 specs: rlen should be computed rather than populated from END INFO field" on Aug 12, 2024
Member

pd3 commented Aug 12, 2024

As I commented in samtools/hts-specs#769 (comment), I did not find any mention of deprecating the INFO/END tag. This should be revisited and discussed more; I find deprecating it to be a terrible idea.

Author

davmlaw commented Aug 12, 2024

I like the robustness principle:

"be conservative in what you do, be liberal in what you accept from others"

And I definitely agree that if I were writing a VCF, I would include END to maximise compatibility.

But htslib should also be able to read and accept specification-conforming files that are missing the END INFO field.
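
One way a tolerant reader could apply this is sketched below: use an explicit END when it is present and plausible, otherwise fall back to an END derived from SVLEN, and finally to the end implied by REF. This precedence is an assumption made for illustration, not HTSlib's current or planned behavior, and `END_MISSING` and `span_end_tolerant` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel meaning "this value was not available". */
#define END_MISSING ((int64_t)-1)

/* Hypothetical helper: choose a 1-based inclusive end position for a record.
 *   info_end    - value of INFO/END, or END_MISSING if the tag is absent
 *   derived_end - END computed from SVLEN, or END_MISSING if not applicable
 * Values smaller than POS are treated as invalid and skipped. */
static int64_t span_end_tolerant(int64_t pos1, const char *ref,
                                 int64_t info_end, int64_t derived_end)
{
    if (info_end != END_MISSING && info_end >= pos1)
        return info_end;                        /* explicit END takes precedence */
    if (derived_end != END_MISSING && derived_end >= pos1)
        return derived_end;                     /* fall back to the SVLEN-derived END */
    return pos1 + (int64_t)strlen(ref) - 1;     /* last base of the REF allele */
}
```

With this ordering, a file that includes END and a file that omits it both yield a usable span, which is the point of accepting conforming input liberally.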

Member

daviesrob commented

HTSlib hasn't caught up with VCF 4.4 yet, let alone 4.5. When it does, it will compute END as required by the new versions of the specification.
