4.3.0.0

@droazen droazen released this 13 Oct 01:13
· 216 commits to master since this release
8dbb78f

Download release: gatk-4.3.0.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/

Highlights of the 4.3.0.0 release:

  • Support for the Ultima Genomics flow-based sequencing platform

  • A next-generation suite of tools for variant filtration based on site-level annotation, intended to eventually supersede the older VariantRecalibrator workflow

  • CompareReferences and CheckReferenceCompatibility: new tools for comparing and checking compatibility with genomic references

  • Support in HaplotypeCaller/Mutect2 for supplementing the variants discovered in local assembly with variants discovered via a pileup-based approach

Full list of changes:

  • Support for the Ultima Genomics flow-based sequencing platform (#7876)

    • Added a new --flow-mode argument to HaplotypeCaller which better supports flow-based calling
      • Added a new Haplotype Filtering step after assembly which removes suspicious haplotypes from the genotyper
      • Added two new likelihood models, FlowBasedHMM and the FlowBasedAlignmentLikelihoodEngine
    • Added a new --flow-mode argument to Mutect2 which better supports flow-based calling
    • Added support for uncertain read end-positions in MarkDuplicatesSpark
    • Added a new tool FlowFeatureMapper for quick heuristic calling of bams for diagnostics
    • Added a new tool GroundTruthReadsBuilder to generate ground truth files for Basecalling
    • Added a new diagnostic tool HaplotypeBasedVariantRecaller for recalling VCF files using the HaplotypeCallerEngine
    • Added a new tool, SplitCram, that breaks up CRAM files by their blocks
    • Added a new read interface called FlowBasedRead that manages the new features for FlowBased data
    • Added a number of flow-specific read filters
    • Added a number of flow-specific variant annotations
    • Added support for read annotation-clipping as part of ClipReads and GATKRead
    • Added a new PartialReadsWalker that supports terminating before traversal is finished
  • Next-generation suite of tools for variant filtration based on site-level annotations (#7954) (#8049)

    • This tool suite is intended to eventually supersede the older VariantRecalibrator workflow
    • The new tools include:
      • ExtractVariantAnnotations: extracts site-level variant annotations, labels, and other metadata from a VCF file to HDF5 files
      • TrainVariantAnnotationsModel: trains a model for scoring variant calls based on site-level annotations
      • ScoreVariantAnnotations: scores variant calls in a VCF file based on site-level annotations using a previously trained model
  • New Reference Comparison Tools

    • CompareReferences: a new tool for analyzing the differences between references at both the dictionary and the base level (#7930) (#7987) (#7973)
      • In its default mode, this tool uses the reference dictionaries to generate an MD5-keyed table comparing the specified references, and does an analysis to summarize the differences between the references provided.
      • Comparisons are made against a "primary" reference, specified with the -R argument. Subsequent references to be compared may be specified using the --references-to-compare argument.
      • A supplementary table keyed by sequence name can be displayed using the --display-sequences-by-name argument; to display only sequence names for which the references are not consistent, run with the --display-only-differing-sequences argument as well.
      • MD5s can be recalculated from the actual sequence when missing from the dictionary
      • When run with --base-comparison FULL_ALIGNMENT, the tool performs full-sequence alignment on the differing reference sequences to produce a VCF with SNPs and Indels. However, this mode ignores IUPAC / N bases.
      • Running with --base-comparison FIND_SNPS_ONLY finds single-base differences between differing reference sequences of the same length. This mode can handle IUPAC / N bases correctly, but not indels.
      • To perform the full-sequence alignment, GATK now packages a distribution of MUMmer for x86_64 Mac and Linux, which can be invoked from within the GATK using the new MummerExecutor class.
    • CheckReferenceCompatibility: a new tool to check a BAM/CRAM/VCF for compatibility against a set of references (#7959) (#7973)
      • This tool generates a table analyzing the compatibility of a BAM/CRAM/VCF input file against provided references.
      • The tool works to compare BAM/CRAMs (specified using the -I argument) as well as VCFs (specified using the -V argument) against provided reference(s), specified using the --references-to-compare argument.
      • When MD5s are present, the tool decides compatibility based on all sequence information (MD5, name, length); when MD5s are missing, the tool makes compatibility calls based only on sequence name and length.
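The MD5 fallback rule described for CheckReferenceCompatibility can be illustrated with a minimal Python sketch. This is not the tool's implementation: `SequenceRecord`, `records_match`, and `is_compatible` are hypothetical names, and the sketch assumes that an input is compatible only if every one of its sequences has a matching entry in the reference. The actual tool may report finer-grained outcomes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SequenceRecord:
    """One sequence dictionary entry (hypothetical stand-in, not a GATK class)."""
    name: str
    length: int
    md5: Optional[str] = None  # None when the dictionary carries no MD5


def records_match(a: SequenceRecord, b: SequenceRecord) -> bool:
    """Compare name and length always; compare MD5 only when both sides have one."""
    if a.name != b.name or a.length != b.length:
        return False
    if a.md5 is not None and b.md5 is not None:
        return a.md5 == b.md5
    # MD5 missing on at least one side: fall back to name and length only.
    return True


def is_compatible(input_dict: Iterable[SequenceRecord],
                  reference_dict: Iterable[SequenceRecord]) -> bool:
    """Assumed rule: every input sequence must match the same-named reference sequence."""
    by_name = {rec.name: rec for rec in reference_dict}
    return all(
        rec.name in by_name and records_match(rec, by_name[rec.name])
        for rec in input_dict
    )
```

Note that a name-and-length match without MD5s is weaker evidence than a full match, since two different sequences can share both a name and a length.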
  • HaplotypeCaller/Mutect2

    • Added an optional "Pileup Detection" step to Mutect2 and HaplotypeCaller before assembly that supplements the variants from local assembly with variants that show up in the pileups (#7432)
    • Fixed a Mutect2 IndexOutOfBoundsException with germline resource (#7979)
    • Mutect3 dataset enhancements: optional truth VCF for labels, seq error likelihood annotation (#7975)
    • Added Mutect3 dataset generation to the Mutect2 WDL (#7992)
    • GetPileupSummaries now streams its output rather than storing it in memory (#7664)
    • Fixed a rare edge case in the AdaptiveChainPruner where the ordering of the Java PriorityQueue is undefined for tied elements (#7851)
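The tie-handling problem in the AdaptiveChainPruner entry is a general property of heap-based priority queues: elements that compare equal may be returned in an order the queue does not guarantee. The Python sketch below shows one common remedy, adding an explicit tie-breaking key. It is an illustration only and is not necessarily how #7851 fixed the issue; `drain_with_tiebreak` is a hypothetical helper.

```python
import heapq
from itertools import count


def drain_with_tiebreak(items):
    """Return labels ordered by ascending score, ties broken by insertion order.

    `items` is an iterable of (score, label) pairs. The running counter is
    placed between score and label so that equal scores are ordered
    deterministically and labels themselves are never compared.
    """
    counter = count()
    heap = [(score, next(counter), label) for score, label in items]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

With the counter in place, repeated runs on the same input always produce the same order, which is what makes downstream results reproducible.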
  • SV Calling

    • CondenseDepthEvidence: a new tool that combines adjacent intervals in DepthEvidence files (#7926)
    • LocusDepthtoBAF: a new tool that merges locus-sorted LocusDepth evidence files, calculates the bi-allelic frequency (baf) for each sample and site, and writes these values as a BafEvidence output file (#7776)
    • PrintReadCounts: a new tool that prints (and optionally subsets) a read depth (DepthEvidence) file or a counts file as one or more (for multi-sample DepthEvidence files) counts files for CNV determination (#8015)
    • CollectSVEvidence: fixed a bug where trailing SNP sites and depth intervals without read coverage were being omitted from the output (#8045)
    • CollectSVEvidence: added read depth generation and raw-counts output (#8015)
    • Improved PrintSVEvidence performance by tweaking the MultiFeatureWalker traversal (#7869)
    • Fixes related to BafEvidence (biallelic-frequency of a sample at some locus) (#7861)
    • Fixed a bug where the end coordinate was being incorrectly compared when sorting discordant read pair evidence (#7835)
    • Sort output from SVClusterEngine (#7779)
    • Remove abandoned SV filtering project and unneeded build dependency (#7950)
  • CNV Calling

    • Fixed a no-call genotype ploidy bug in JointGermlineCNVSegmentation (#7779)
    • Added numerical-stability tests and updated test data for all ModelSegments single-sample and multiple-sample modes (#7652)
    • Added a gCNV integration test to detect numerical differences in the outputs (#7889)
  • GenomicsDB

    • GenomicsDBImport: added the ability to specify explicit index locations via the sample name map file (#7967)
      • Each line in the sample name map file may now optionally contain a third column with the path/URI to the index. This is useful when the index is not in the same location as the corresponding GVCF.
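The optional third column can be illustrated with a short Python sketch that parses one sample name map line. It assumes the columns are tab-separated, which these notes do not state. `SampleMapEntry` and `parse_sample_map_line` are hypothetical names, not part of GATK.

```python
from typing import NamedTuple, Optional


class SampleMapEntry(NamedTuple):
    sample: str
    gvcf: str
    index: Optional[str]  # None means no explicit index location was given


def parse_sample_map_line(line: str) -> SampleMapEntry:
    """Parse one line with two or three columns (tab-separated by assumption)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 columns, got {len(fields)}: {line!r}")
    sample, gvcf = fields[0], fields[1]
    index = fields[2] if len(fields) == 3 else None
    return SampleMapEntry(sample, gvcf, index)
```

A line with only two columns yields `index=None`, corresponding to the case where the index sits next to its GVCF.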
  • Bug Fixes

    • Fixed an issue where we weren't properly merging AD values when combining GVCFs and no PLs were present (#7836)
    • Fixed a bug in ReblockGVCF that could cause the first position on a contig to be dropped (#8028)
    • Fixed an allele-ordering issue in the allele-specific annotation code (#7585)
    • VariantRecalibrator: type change int -> long to prevent tranche novel variant count overflow (#7864)
    • Fixed an issue with tabix index generation (#7858)
    • Fixed a bug in SiteDepthCodec (#7910)
  • Miscellaneous Changes

    • VariantsToTable now includes all fields when none are specified (#7911)
    • SelectVariants now warns the user about poor performance when the sample names in the VCF header are unsorted (#7887)
    • VariantRecalibrator now has a --dont-run-rscript argument to disable execution of its R script but still output the actual R script file (#7900)
    • Added some generic read tag/expression filters for use on numeric tags (#7746)
    • Replaced Travis CI with GitHub Actions for our continuous testing (#7754)
    • Switched over to GitHub Actions for building our nightly docker image (#7775)
    • Created a new build_docker_remote.sh script for building the docker image remotely with Google Cloud Build (#7951)
    • Added an argument mode manager for group arguments and a demonstration of how it might be used in HaplotypeCaller --dragen-mode (#7745)
    • Added unit tests for the Utils.concat() methods (#7918)
    • Added a test to validate WDLs in the scripts directory. (#7826)
    • Added a use_allele_specific_annotation arg and fixed task with empty input in the JointVcfFiltering WDL (#8027)
    • Fixed an issue in the GATK stats script in which the first day's downloads on a new release were set to 0 (#7794)
    • Fixed a typo in the Dockerfile that broke git lfs pull (#7806)
    • Removed unused code in the utils.solver package (#7922)
    • Corrected the time for GATK nightly build cron jobs (#7784)
    • Disabled the red "X" from failing CodeCov builds, and delayed the posting of coverage information until the tests complete (#7817)
    • Some minor misc engine changes (#7744)
  • Documentation

    • Marked JointGermlineCNVSegmentation as a DocumentedFeature (#7871)
    • Marked SVAnnotate as a DocumentedFeature (#7833)
    • Marked CollectSVEvidence as a DocumentedFeature (#8041)
    • Docs clarification in GenotypeGVCFs for some reblocking-related funkiness (#7846)
    • Updated the GATK Readme to reflect the switch from Travis CI to GitHub Actions (#7808)
  • Dependencies

    • Updated HTSJDK to 3.0.1 (#8025)
    • Updated Picard to 2.27.5 (#8025)
    • Updated protobuf to 3.21.6 (#8036)
    • Updated gsalib to 2.2.1 (#8048)
    • Pinned typing_extensions Python package to 4.1.1 in the GATK conda environment (#7802)