Major updates in v2.3 #271
Major enhancements
Included strand-bias annotation for ivar
NGS data are prone to certain types of artifact variant calls, and strand bias is a clear example. For instance, all but one of the variant-supporting reads may be on the reverse strand while the reference-supporting reads are equally represented on both strands, giving rise to a false-positive scenario known as strand bias [1]. Most current variant callers support strand-bias filtering, but ivar still lacks this functionality (andersen-lab/ivar#5). The new viralrecon release now offers this functionality, taking the artifact into account while converting the iVar variants tsv file to vcf format inside the ivar_variants_to_vcf.py script. In order to do that, a Fisher exact test is performed and
Input tsv:
Output vcf:
The Fisher exact test is based on a contingency table, as stated in the GATK documentation [2]:
Code - contingency legend:
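As a minimal sketch of the kind of test described here, the snippet below builds a 2x2 table of reference/alternate reads by forward/reverse strand and computes a two-sided Fisher exact p-value using only the standard library. The table layout and the function names are assumptions made for illustration, as is the use of the iVar tsv columns REF_DP, REF_RV, ALT_DP and ALT_RV to derive the counts; it is not the code in ivar_variants_to_vcf.py.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative): rows are ref/alt reads,
    columns are forward/reverse strand.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (as unlikely) as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def strand_bias_pvalue(row):
    """Derive strand counts from an iVar-like row (dict) and test them.

    Assumes *_DP is total depth and *_RV is reverse-strand depth.
    """
    ref_fw = row["REF_DP"] - row["REF_RV"]
    alt_fw = row["ALT_DP"] - row["ALT_RV"]
    return fisher_exact_two_sided(ref_fw, row["REF_RV"], alt_fw, row["ALT_RV"])


if __name__ == "__main__":
    # Reference reads balanced across strands, alternate reads almost all reverse.
    example = {"REF_DP": 100, "REF_RV": 50, "ALT_DP": 20, "ALT_RV": 19}
    print(strand_bias_pvalue(example))
```

A small p-value suggests that the alternate allele is distributed across strands differently from the reference allele; the cutoff applied to it is a separate choice and is not specified here.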
Strand-bias filtering is not a recommended filter for every type of experiment. Amplicon data, because of the PCR-based enrichment procedure, are prone to strand-bias artifacts that do not necessarily mean a greater probability of a false positive. Moreover, amplicon experiments normally generate deep-coverage data that do not need this type of filtering. That's why we
Consecutive variants called by ivar belonging to the same codon are now collapsed in one line in order to fix annotation
During variant analysis of SARS-CoV-2, for some complex variants, such as the triplet nucleotide change that alters an entire codon in the B.1.1.7 VOC, variant callers report three single-nucleotide changes instead of one change comprising the three nucleotides, which leads to a wrong amino acid annotation. This is also a known problem in ivar (andersen-lab/ivar#92), which we have fixed in this new viralrecon release, also through
Input tsv file with three variant lines and wrong
Output vcf with three variants belonging to the same codon merged in just one line:
Fixed annotation with snpeff:
As with the strand-bias implementation, the script comes with the parameter
Script logic for consecutive and same codon variants detection.
The script
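The idea of collapsing consecutive same-codon variants can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic of the actual script: the input is assumed to be a list of SNVs as (pos, ref, alt) tuples with 1-based positions, the coding sequence start (`cds_start`) is assumed to be supplied by the caller, and the function names are hypothetical.

```python
def collapse_run(run):
    """Join a run of adjacent SNVs into one multi-nucleotide variant."""
    pos = run[0][0]
    ref = "".join(r for _, r, _ in run)
    alt = "".join(a for _, _, a in run)
    return (pos, ref, alt)


def merge_codon_variants(variants, cds_start):
    """Merge SNVs that are adjacent and fall in the same codon.

    variants:  iterable of (pos, ref, alt) SNVs, 1-based positions.
    cds_start: 1-based genomic position of the first base of the CDS
               (assumed input, used to derive the codon index).
    """
    # Group variants by the codon they fall in.
    by_codon = {}
    for pos, ref, alt in variants:
        codon_index = (pos - cds_start) // 3
        by_codon.setdefault(codon_index, []).append((pos, ref, alt))

    merged = []
    for codon_index in sorted(by_codon):
        run = []
        for var in sorted(by_codon[codon_index]):
            # A gap inside the codon ends the current run.
            if run and var[0] != run[-1][0] + 1:
                merged.append(collapse_run(run))
                run = []
            run.append(var)
        merged.append(collapse_run(run))
    return merged


if __name__ == "__main__":
    # B.1.1.7 triplet change in the N gene (CDS start 28274 in NC_045512.2).
    calls = [(28280, "G", "C"), (28281, "A", "T"), (28282, "T", "A")]
    print(merge_codon_variants(calls, cds_start=28274))
```

In this sketch, variants that share a codon but are not adjacent stay on separate lines, and indels are expected to be excluded before calling the function.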
Once the dict is full we evaluate as follows:
Option to generate consensus with BCFTools / BEDTools using iVar variants
Another new functionality is that viralrecon now lets you choose which software to use for variant calling (iVar or Bcftools) and for consensus genome generation (iVar or Bcftools), so you can combine them (#246). Previous viralrecon versions had iVar as the default for both variant calling and consensus genome generation. This combination had some drawbacks related to known iVar issues (andersen-lab/ivar#103, andersen-lab/ivar#97, andersen-lab/ivar#85). Now, viralrecon performs variant calling using iVar, then filters those variants as explained above for strand bias and merged codons, and finally generates the consensus genome using the filtered variants called by iVar. This generates the following differences in the final consensus fasta files:
This is fixed when creating the consensus with iVar filtered variants:
First sequence is reference, second sequence is the consensus generated by Bcftools and third sequence is consensus generated with iVar. iVar's tsv file will look like this:
The deletion has a frequency lower than the 0.75 set in the consensus filter, yet it is added to the iVar consensus but not to the Bcftools consensus.
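The expected behaviour, where only variants at or above the consensus threshold change the reference, can be sketched as below. This is a simplified illustration and not how Bcftools builds a consensus: the function name is hypothetical, variants are assumed to be dicts with iVar-like POS, ALT and ALT_FREQ keys, and indels are assumed to use iVar's "-BASES" / "+BASES" notation anchored at the preceding position.

```python
def build_consensus(reference, variants, min_af=0.75):
    """Apply only variants with ALT_FREQ >= min_af to the reference.

    reference: reference sequence as a string.
    variants:  list of dicts with POS (1-based), ALT and ALT_FREQ.
    """
    seq = list(reference)
    kept = [v for v in variants if v["ALT_FREQ"] >= min_af]
    # Work from the 3' end so earlier coordinates stay valid after indels.
    for var in sorted(kept, key=lambda v: v["POS"], reverse=True):
        idx = var["POS"] - 1  # 0-based index of the anchor base
        alt = var["ALT"]
        if alt.startswith("-"):
            # Deletion: remove the bases following the anchor position.
            del seq[idx + 1: idx + len(alt)]
        elif alt.startswith("+"):
            # Insertion: add the bases right after the anchor position.
            seq[idx + 1: idx + 1] = list(alt[1:])
        else:
            seq[idx] = alt
    return "".join(seq)


if __name__ == "__main__":
    ref = "ACGTACGT"
    calls = [
        {"POS": 2, "ALT": "T", "ALT_FREQ": 0.90},    # applied
        {"POS": 4, "ALT": "-AC", "ALT_FREQ": 0.50},  # below threshold, ignored
    ]
    print(build_consensus(ref, calls))
```

With a low-frequency deletion like the one described, the reference bases are kept unchanged in the output.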
First sequence is reference, second sequence is the consensus generated by Bcftools and third sequence is consensus generated with iVar. iVar's .tsv file will look like this:
It was supposed to be a deletion, not an N nucleotide, and the N does not appear when creating the consensus with Bcftools.
As explained in iVar's manual, if one base is not enough to match a given frequency, then an ambiguous nucleotide is called at that position, which means low-frequency variants are included. Example:
These variants are at 0.3 AF, so the reference nucleotide AF is not enough to reach the minimum 0.75 AF, and both are included in the consensus as ambiguous nucleotides:
First sequence is reference, second sequence is the consensus generated by Bcftools and third sequence is consensus generated with iVar. iVar's consensus introduces R (A or G) at position 27665 and S (G or C) at position 27666 when only the reference should be included. This is fixed when creating the consensus with Bcftools.
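The ambiguity behaviour described above can be illustrated with a small sketch: bases are added in order of decreasing frequency until their cumulative frequency reaches the threshold, and the resulting set is written as an IUPAC code. This is a simplified reading of the rule quoted from iVar's manual, not iVar's implementation; the function name and the input format (a dict of base frequencies) are assumptions.

```python
# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_consensus_base(freqs, threshold=0.75):
    """Return the IUPAC code for the bases needed to reach the threshold.

    freqs: dict mapping base (A/C/G/T) to its frequency at one position.
    """
    if not freqs:
        return "N"  # no data at this position
    chosen = set()
    cumulative = 0.0
    # Add bases from most to least frequent until the threshold is met.
    for base, freq in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        chosen.add(base)
        cumulative += freq
        if cumulative >= threshold:
            break
    return IUPAC[frozenset(chosen)]


if __name__ == "__main__":
    print(call_consensus_base({"A": 0.7, "G": 0.3}))  # reference A alone is below 0.75
    print(call_consensus_base({"A": 0.9, "G": 0.1}))  # reference A alone is enough
```

With a reference base at 0.7 and a variant at 0.3, neither reaches 0.75 alone, so both end up in the ambiguity code, which matches the R and S calls described above.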
When there are deletions in iVar's tsv file with low allele frequency, the reference should be included, but iVar introduces Ns instead:
First sequence is reference, second sequence is the consensus generated by Bcftools and third sequence is consensus generated with iVar. The tsv file looks like this:
In the consensus, the reference nucleotides should be included, as with Bcftools.
New variants and lineage report table
viralrecon now provides a new variants report table that unifies variant calling, annotation and, if desired, lineage. This table can be really useful for variant inspection, co-infections or metagenomics data such as sewage SARS-CoV-2 sequencing.
Pipeline validation and benchmarking
The pipeline has been validated using 54 SARS-CoV-2 samples using the ARTIC amplicon scheme v4. These samples have a mixed composition of SARS-CoV-2 lineages including B.1.1.7, AY.* and BA.*, which are known to have problematic deletions and triplets.
Bibliography:
[1] Koboldt, D.C. Best practices for variant calling in clinical sequencing. Genome Med 12, 91 (2020).
[2] Fisher’s Exact Test. GATK Team (2020).
Special acknowledgement for this documentation to:
Please see below for a summary of changes.