How do general filterings on MQ, BQ and coverage work together? #68

Sfeng666 · 2024-01-15T22:29:52Z

Hi Michael,

Thanks for developing JACUSA & JACUSA2. Those are well-annotated tools with thoughtful functions for RNA editing analyses.

I have a few questions about how sites are filtered by mapping quality (MQ), base quality (BQ), and minimum coverage, assuming a call-1 scenario:

1. When filtering by mapping quality (-m), if one read has an MQ below the threshold, does JACUSA2 remove all genomic sites (i.e., 1 bp positions) covered by this read, even if other reads have an MQ above the threshold? Or does it simply discard this read and count only reads with an MQ above the threshold for coverage and allele counts? My guess is the latter.
2. Similarly, when filtering by base quality (-q), if one read has a BQ below the threshold at a given site, does JACUSA2 remove this genomic site, even if other reads have a BQ above the threshold at the same site? Or does it simply not count this read at this site, and count only read bases with a BQ above the threshold for coverage and allele counts? My guess is also the latter.
3. Just to confirm: if both -m and -q work by discarding the reads or bases that fail the filter, is the min-coverage filter (-c) based on the coverage calculated from the above-threshold reads/bases?
4. I also support @y9c's request to add an option to accept sites with coverage = 0. This would be helpful for downstream analyses involving multiple samples.

I know these could be basic questions, but they are not clearly explained in the JACUSA2 manual. Since a BQ is assigned to each base of each read, and an MQ is assigned to each read, a given genomic position can have multiple BQ/MQ values. The manual's explanation, "filter positions with BQ/MQ < min-BQ/MQ", is ambiguous about how the filtering decision is made from those values.

Thanks


piechottam · 2024-01-18T08:31:24Z

Thank you for your questions and feedback!

Answers

1. MQ is read-specific information; therefore, JACUSA2 discards the entire read when the criterion is not met.
2. BQ is read- and position-specific. JACUSA2 discards only the position of a read where the criterion is not met; other positions of that read are not affected.
3. Yes, "-c" is "pileup-specific". All base calls that pass the BQ filter, from reads that pass the MQ filter, are aggregated, and only positions that have sufficient coverage (-c) appear in the output.
4. Unfortunately, this is not possible. The test statistic expects an equal number of replicates for each site in a single comparison run. If zero-coverage sites were allowed, that specific replicate would be counted without contributing any base calls.
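To make the interplay of -m, -q and -c concrete, here is a minimal Python sketch of the logic described in the answers above. It is an illustration only, not JACUSA2's actual code: the `Read` class, the `pileup_counts` function, and the threshold values are hypothetical, and reads are assumed to align without indels.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical, simplified read: gapless alignment only."""
    start: int   # 0-based reference position of the first base
    seq: str     # aligned bases
    quals: list  # per-base qualities (BQ), same length as seq
    mapq: int    # mapping quality (MQ) of the whole read


def pileup_counts(reads, min_mapq, min_bq, min_cov):
    """Return {position: Counter(base -> count)} after MQ, BQ and coverage filters."""
    counts = {}
    for read in reads:
        # Read level: a low MQ discards the entire read.
        if read.mapq < min_mapq:
            continue
        for offset, (base, bq) in enumerate(zip(read.seq, read.quals)):
            # Base call level: a low BQ discards only this base call.
            if bq < min_bq:
                continue
            counts.setdefault(read.start + offset, Counter())[base] += 1
    # Pileup level: coverage is computed from the surviving base calls only.
    return {pos: c for pos, c in counts.items() if sum(c.values()) >= min_cov}


reads = [
    Read(0, "ACG", [30, 30, 30], 60),  # passes everywhere
    Read(0, "ACG", [30, 10, 30], 60),  # low BQ at position 1 only
    Read(0, "ATG", [30, 30, 30], 5),   # low MQ: whole read is dropped
]
result = pileup_counts(reads, min_mapq=20, min_bq=20, min_cov=2)
print(result)
```

In this toy input, positions 0 and 2 keep a coverage of 2, while position 1 falls to a coverage of 1 and is removed by the coverage filter. The low-MQ read never contributes its "T" at position 1.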

Filtering is carried out on multiple levels:

Read, e.g.: mapping quality, tags, flags
Base call, e.g.: base call quality
Pileup, e.g.: minimal coverage (-c)
Parallel pileup (comparing pileups from 2 conditions), e.g.: only sites that contain differences are output (or use "-A" to output all sites)
"Feature", e.g.: HomozygousFilter for RNA-differences comparisons, where you want to remove polymorphic positions (-a H:condition=1 -> require condition 1 to be homozygous)

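The parallel pileup level can be pictured with a small Python sketch. This is a simplification under stated assumptions, not how JACUSA2 decides: the real tool uses a test statistic, whereas the hypothetical `differing_sites` function below only compares which bases were observed in each condition. The `output_all` parameter is an illustrative stand-in for the idea behind "-A".

```python
from collections import Counter


def observed_bases(counter):
    """Bases with a non-zero count in one pileup."""
    return {base for base, n in counter.items() if n > 0}


def differing_sites(cond1, cond2, output_all=False):
    """List positions covered in both conditions whose observed bases differ.

    cond1 and cond2 map position -> Counter(base -> count), i.e. pileups
    that already passed the read, base call and coverage filters.
    """
    sites = []
    # A position missing from either condition has no pileup to compare.
    for pos in sorted(cond1.keys() & cond2.keys()):
        if output_all or observed_bases(cond1[pos]) != observed_bases(cond2[pos]):
            sites.append(pos)
    return sites


cond1 = {10: Counter(A=8), 11: Counter(C=9)}
cond2 = {10: Counter(A=5, G=4), 11: Counter(C=7), 12: Counter(T=6)}

print(differing_sites(cond1, cond2))
print(differing_sites(cond1, cond2, output_all=True))
```

Position 12 never appears in either result because it has no pileup in the first condition, which mirrors the point above that zero-coverage sites cannot take part in a comparison.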