BQSR: avoid throwing an error when read group is missing in the recal table, and some refactoring. #9020
Addresses #6242.
Current behavior: when all the reads in a read group are filtered in the base recalibration step, the read group is not logged in the recal table. Then ApplyBQSR encounters these reads, can't find the read group in the recal table, and throws an error.
New behavior: if the --allow-read-group flag is set to true, then ApplyBQSR outputs the original quality scores (after quantizing) for reads whose read group is missing from the recal table.
I avoided the alternative approach of collapsing (marginalizing) across the read groups, mostly because it would require a complete overhaul of the code. I also think that using recal data from other read groups might not be a good idea. In any case, using OQ should be good enough; I assume that these "missing" read groups are low enough quality to be filtered out and are likely to be thrown out by downstream tools.
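To make the fallback concrete, here is a minimal, self-contained sketch of the decision described above. It is not the GATK implementation: the class name, the `allowMissingReadGroup` parameter, the per-read-group "shift" standing in for the recal table, and the fixed-width quantization map are all hypothetical simplifications chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from the GATK code base.
final class MissingReadGroupSketch {

    // Builds a toy quantization map: quality q is mapped to the floor of its bin.
    static byte[] binnedQuantizationMap(int maxQual, int binWidth) {
        byte[] map = new byte[maxQual + 1];
        for (int q = 0; q <= maxQual; q++) {
            map[q] = (byte) ((q / binWidth) * binWidth);
        }
        return map;
    }

    // shiftByReadGroup is a stand-in for the recal table (read group ID -> quality shift).
    static byte[] applyRecalibration(String readGroup,
                                     byte[] originalQuals,
                                     Map<String, Integer> shiftByReadGroup,
                                     byte[] quantizationMap,
                                     boolean allowMissingReadGroup) {
        Integer shift = shiftByReadGroup.get(readGroup);
        if (shift == null && !allowMissingReadGroup) {
            // Old behavior: a read group absent from the table is an error.
            throw new IllegalStateException(
                    "Read group " + readGroup + " not found in the recalibration table");
        }
        byte[] out = new byte[originalQuals.length];
        for (int i = 0; i < originalQuals.length; i++) {
            int q = originalQuals[i];
            if (shift != null) {
                q += shift; // read group is present: apply the toy "recalibration"
            }
            // Clamp to the range covered by the map, then quantize.
            q = Math.max(0, Math.min(quantizationMap.length - 1, q));
            out[i] = quantizationMap[q];
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new HashMap<>();
        table.put("rg1", -3);
        byte[] quantMap = binnedQuantizationMap(40, 10);
        byte[] quals = {37, 25, 12};

        // Present read group: shifted, then quantized.
        System.out.println(Arrays.toString(
                applyRecalibration("rg1", quals, table, quantMap, true)));
        // Missing read group with the flag on: original qualities, quantized.
        System.out.println(Arrays.toString(
                applyRecalibration("rgMissing", quals, table, quantMap, true)));
    }
}
```

The point of the sketch is that the missing-read-group path skips the recalibration step but still passes through quantization, so the output qualities stay within the same set of quantized values as those of recalibrated reads.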
I also refactored the BQSR code, mostly to update the variable and class names to be more accurate and descriptive. For instance:
ReadCovariates.java -> PerReadCovariateMatrix.java
EstimatedQReported -> ReportedQuality