VCF 4.4: clarification on CNVs with position 0 #792

Closed
ctsa opened this issue Sep 24, 2024 · 6 comments

Comments

@ctsa

ctsa commented Sep 24, 2024

The VCF 4.4 spec states:

"Note that the position of symbolic structural variant alleles is the position of the base immediately preceding the
variant."

Does this imply that any kind of CNV detected from the beginning of a chromosome/contig will have position 0 and a REF value of N? If so, is this considered valid or best practice? I see that htslib/bcftools will generate a warning for VCFs using positions less than 1, so it's not clear that this CNV representation is okay.
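
To make the arithmetic in the question concrete, here is a minimal Python sketch, assuming 1-based inclusive coordinates for the affected bases. The function name `symbolic_cnv_fields` is hypothetical, and using `N` as the padding base follows the reading in the question rather than quoted spec text.

```python
def symbolic_cnv_fields(first_base, last_base):
    """Return POS/REF/END for a symbolic CNV covering first_base..last_base.

    Coordinates are 1-based and inclusive. Hypothetical helper for illustration.
    """
    if first_base < 1 or last_base < first_base:
        raise ValueError("expected 1 <= first_base <= last_base")
    # POS is the base immediately preceding the variant.
    pos = first_base - 1
    # Assumption: there is no reference base at position 0, so N is used as
    # padding. For pos >= 1 a real writer would look up the reference base.
    ref = "N"
    return {"POS": pos, "REF": ref, "END": last_base}


# A CNV whose first affected base is base 1 of the contig gets POS 0.
print(symbolic_cnv_fields(1, 50000))
```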

@jkbonfield
Contributor

Also see samtools/htslib#1573 for htslib handling of POS=0. It's somewhat of a low priority, as you can see by the speed of review. Personally I think this is a badly thought out feature, compounded by BCF and BAI indices which store pos-1, so position 0 becomes negative, arbitrarily limiting the choice of various data types and also breaking the index binning calculations. It sounds like Picard doesn't query this data correctly either, so I suspect POS 0 is a feature that's pretty much unsupported in the wild.
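
As a rough illustration of the binning problem, the sketch below is a Python transliteration of the `reg2bin` function given in the SAM specification, fed with the stored coordinate pos-1. It is not htslib's code, and it assumes a one-base record; real implementations use fixed-width and sometimes unsigned integers, so their exact failure mode may differ.

```python
def reg2bin(beg, end):
    """Bin number for a 0-based, half-open interval [beg, end).

    Transliterated from the reg2bin function in the SAM specification.
    """
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0


# One-base records: the index stores pos-1 as the 0-based start.
bin_pos1 = reg2bin(1 - 1, 1)  # POS=1 -> stored start 0
bin_pos0 = reg2bin(0 - 1, 0)  # POS=0 -> stored start -1
print(bin_pos1, bin_pos0)
```

In this transliteration the POS=1 record lands in the first 16 kb bin (4681), while the POS=0 record yields 4680, which sits at the end of the 128 kb level rather than in any bin covering the start of the contig.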

Not sure what that says about the spec and this question though. Over to you Daniel :)

@ctsa
Author

ctsa commented Sep 24, 2024

Thanks for the additional context James. Regardless of future design decisions, it sounds like in the short term we could special-case these CNVs to start at position 1 to avoid indexing headaches. These are all imprecise variant types anyway so this doesn't meaningfully change the call.
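
A minimal sketch of that short-term workaround, assuming the call is imprecise enough that a one-base shift is acceptable. `clamp_symbolic_pos` is a hypothetical name, not part of any library.

```python
def clamp_symbolic_pos(pos):
    """Move a POS of 0 to 1 and leave every other position unchanged."""
    if pos < 0:
        raise ValueError("POS must not be negative")
    return max(1, pos)


# A CNV starting at the first base of the contig would be written with POS 1.
print(clamp_symbolic_pos(0), clamp_symbolic_pos(12345))
```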

@ctsa
Author

ctsa commented Sep 24, 2024

Further looking into indexing complications, it appears that IGV isn't rendering the symbolic allele range in a way that's consistent with the spec, but rather interpreting POS as included in the range.
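
The two interpretations being contrasted can be written out as follows. This is only a sketch of the interval arithmetic, with hypothetical function names; it does not reproduce any viewer's actual code.

```python
def affected_range_spec(pos, end):
    """Bases covered when POS is the base preceding the variant: POS+1..END."""
    return (pos + 1, end)


def affected_range_pos_inclusive(pos, end):
    """Bases covered when POS itself is treated as part of the range: POS..END."""
    return (pos, end)


# The same record differs by one base at the left edge under the two readings.
print(affected_range_spec(100, 200), affected_range_pos_inclusive(100, 200))
```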

@d-cameron
Contributor

> I suspect the POS 0 is a feature that's pretty much unsupported in the wild.
> Not sure what that says about the spec and this question though. Over to you Daniel :)

POS 0 has been part of the specs since at least VCFv4.1 and there's been no change in this regard - the Section 5.4.5 POS=0 telomeric example was there back in 4.1. The only change in more recent versions has been additional reminder text about the semantics of symbolic SV interpretation.

It's not something that I particularly like, but it's always been part of the specs, and changing it now just penalises libraries and tools that actually do follow the specs, so it's not something that's currently on the agenda.

If VCFv5 ever comes about then it's something I'd like to revisit but that would be as part of a complete design of the specifications to properly support all types of genomic rearrangements.

@jkbonfield
Contributor

Agreed it's not something we can remove. However in SAM we've sometimes made recommendations which restrict the specification, in order to avoid problematic places.

Personally I'd at least be tempted to add a footnote acknowledging reality. While POS 0 is legal within the specification, it is highly likely that a lot of tooling will break as historically it's simply not been well supported.

@ctsa
Author

ctsa commented Sep 30, 2024

Thank you both, very helpful. I think I clearly understand now that I'm interpreting the letter of the spec correctly for this type of CNV, but should consider modifications for practical library support. Will close as answered.

@ctsa ctsa closed this as completed Sep 30, 2024

3 participants