document convention for "QC squeezing" in population VCF #527

Open
wants to merge 1 commit into
base: master

@mlin (Member) commented Sep 7, 2020

This PR documents a convention developed in spVCF to reduce the size of population-wide VCF files (those presenting the full locus × sample matrix) by selectively omitting FORMAT fields. As written, this is not a spec change; it merely suggests a useful invocation of an existing clause (referenced inline). It may be worth documenting expressly because we've encountered some downstream tools that are tripped up by it.
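For downstream tool authors, here is a minimal sketch of what tolerating such entries might look like. It assumes the existing clause in question is the VCF rule that trailing fields in a sample column may be dropped (GT excepted). The function name `parse_sample_cell` and the example FORMAT string are hypothetical, chosen only for illustration; this is not code from spVCF or any existing parser.

```python
def parse_sample_cell(format_keys, cell):
    """Map FORMAT keys to values for one sample column.

    Assumption: a cell may carry fewer colon-separated values than the
    FORMAT column declares, in which case the absent trailing fields
    are treated as missing (".").
    """
    values = cell.split(":")
    if len(values) > len(format_keys):
        raise ValueError("sample cell has more fields than FORMAT declares")
    # Pad dropped trailing fields with the VCF missing-value marker.
    values += ["."] * (len(format_keys) - len(values))
    return dict(zip(format_keys, values))


# Hypothetical FORMAT column and two sample cells: one complete, one
# truncated after DP.
format_keys = "GT:DP:AD:GQ:PL".split(":")
full = parse_sample_cell(format_keys, "0/1:30:14,16:99:255,0,255")
squeezed = parse_sample_cell(format_keys, "0/0:16")
```

A parser that instead indexes each cell by the position of a FORMAT key, without checking the cell's length, is the kind of tool that would fail on the truncated cell.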

In our experiments, applying this convention to WGS/WES VCF files for cohorts like 1KGP and UKB (generated with different pipelines) delivers 4-6X file size reduction without doing anything else.

Related PRs:

@hts-specs-bot

Changed PDFs as of 508c8c6: VCFv4.4.draft (diff).


3 participants