[DRAFT] Merge master to ah var store again [VS-1178] #8890
Draft: mcovarr wants to merge 66 commits into ah_var_store from vs_1178_merge_master_to_ah_var_store_again
… names on output for B37 aligned files (#8539)
* hmer ondel must have mon length * Revert "hmer ondel must have mon length" This reverts commit 7852871. * remove superfluous variant type condition * fix error message to actually reflect missing argument * fixed unittest to include variant type * Remove conflict
* Additional fix + logging fixes * Added missing initialization
New Tool: GroundTruthScorer Update: FlowFeatureMapper
…dependency on the ADAM library (#8606) * Add a native GATK implementation for 2bit references, with comprehensive unit tests * For now, this is only hooked up to the Spark codepath, but it could easily be hooked up to ReferenceDataSource and the Walker codepath as well * Remove the dependency on the ADAM library, to resolve conflicts with future dependency upgrades
…curity scanner to build.gradle (#8607) * Updated many GATK dependencies to address known security vulnerabilities * Added a security scanner to build.gradle * There are still some remaining vulnerabilities in GATK dependencies, but this eliminates most of them
* Update http-nio and wire it so it's configured at startup along with GCS settings.
* New experimental tool to print out human readable file diagnostics for cram/crai/bai files.
…#8438)
* GATK's lack of support for az:// URIs means that although GenomicsDB can natively read them, parts of the Java code crash when interacting with them
* Adding --avoid-nio and --header arguments. These allow disabling all of the Java interaction with the az:// links and simply passing them through to GenomicsDB. This disables some safeguards but allows operating on files in Azure
* Update GenomicsDB version to 1.5.1 for improved Azure support
* There are no direct tests on Azure since we do not yet have any infrastructure to generate the necessary tokens; there is a disabled test which requires #8612 before we can enable it.
---------
Co-authored-by: Nalini Ganapati <[email protected]>
Co-authored-by: Nalini Ganapati <[email protected]>
For having variable ploidy in different regions, like making haploid calls outside the PAR on chrX or chrY, there is now a --ploidy-regions flag. The -ploidy flag sets the default ploidy to use everywhere, and --ploidy-regions should be a .bed or .interval_list with a "name" column containing the desired ploidy to use in that region when genotyping. Note that variants near the boundary may not have the matching ploidy, since the ploidy used will be determined using the following precedence:
* ploidy given in --ploidy-regions for all intervals overlapping the active region when calling your variant (with ties broken by using the largest ploidy); note that a ploidy interval may only overlap the active region and still determine the ploidy of your variant, even if the end coordinate written for your variant lies outside the given region
* ploidy given via the global -ploidy flag
* ploidy determined by the default global built-in constant for humans (2).
---------
Co-authored-by: Ty Kay <[email protected]>
Co-authored-by: rickymagner <[email protected]>
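The precedence described above can be sketched in a few lines. This is an illustrative Python model, not the GATK implementation: the `PloidyInterval` type, the `resolve_ploidy` function, and the 1-based inclusive coordinates are all assumptions made for the example, and "ties broken by using largest ploidy" is read here as taking the maximum over all overlapping intervals.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Built-in default for humans, as stated in the commit message.
DEFAULT_HUMAN_PLOIDY = 2


@dataclass(frozen=True)
class PloidyInterval:
    """One row of a hypothetical --ploidy-regions file (coordinates assumed 1-based, inclusive)."""
    contig: str
    start: int
    end: int
    ploidy: int


def overlaps(iv: PloidyInterval, contig: str, start: int, end: int) -> bool:
    """True if the interval shares at least one base with [start, end] on the same contig."""
    return iv.contig == contig and iv.start <= end and start <= iv.end


def resolve_ploidy(
    contig: str,
    region_start: int,
    region_end: int,
    ploidy_regions: Sequence[PloidyInterval],
    global_ploidy: Optional[int] = None,
) -> int:
    """Pick the ploidy for an active region using the three-level precedence."""
    # 1. Any overlapping --ploidy-regions interval wins; the largest ploidy breaks ties.
    matches = [iv.ploidy for iv in ploidy_regions
               if overlaps(iv, contig, region_start, region_end)]
    if matches:
        return max(matches)
    # 2. Otherwise fall back to the global -ploidy value, if one was given.
    if global_ploidy is not None:
        return global_ploidy
    # 3. Otherwise use the built-in default.
    return DEFAULT_HUMAN_PLOIDY
```

Because the lookup is keyed on the active region rather than on the variant itself, an active region that merely touches a haploid interval is genotyped as haploid throughout, which is the boundary effect the message warns about.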
* Update the GATK base image to the latest Ubuntu LTS release (22.04) * Add some additional useful utilities to the base image * Switch to a newer conda version with a much faster solver * Update the scripts and documentation for building the base image * Update the VETS integration tests to allow for a small epsilon during numeric comparisons, and include the full diff output in exception messages when a mismatch is detected
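The last item above, comparing numbers within a small epsilon and reporting the full diff on failure, follows a common testing pattern. The sketch below shows that pattern in Python under stated assumptions: the function names, the tab-separated input, and the default tolerance are hypothetical and do not reflect the actual VETS test code.

```python
import math
from typing import List


def values_match(expected: str, actual: str, epsilon: float = 1e-6) -> bool:
    """Compare two fields numerically within epsilon when both parse as floats, else exactly."""
    try:
        e, a = float(expected), float(actual)
    except ValueError:
        return expected == actual
    return math.isclose(e, a, rel_tol=0.0, abs_tol=epsilon)


def diff_report(expected_lines: List[str], actual_lines: List[str],
                epsilon: float = 1e-6) -> List[str]:
    """Return every mismatching line, so a failure message can show the whole diff."""
    mismatches = []
    for i, (exp, act) in enumerate(zip(expected_lines, actual_lines), start=1):
        exp_fields, act_fields = exp.split("\t"), act.split("\t")
        same = len(exp_fields) == len(act_fields) and all(
            values_match(x, y, epsilon) for x, y in zip(exp_fields, act_fields))
        if not same:
            mismatches.append(f"line {i}: expected {exp!r}, got {act!r}")
    if len(expected_lines) != len(actual_lines):
        mismatches.append(
            f"line count differs: expected {len(expected_lines)}, got {len(actual_lines)}")
    return mismatches
```

Collecting all mismatches before failing, rather than stopping at the first one, is what makes it possible to put the complete diff into the exception message.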
…oud-based docker build, and add a release script (#8247) * Added a -r argument to build_docker_remote.sh to toggle the RELEASE flag during docker builds * Added a release_prebuilt_docker_image.sh to release a prebuilt docker image to the official repos
* update to htsjdk 4.1.0 which enables http-nio in more cases * remove several test cases handling genomicsdb path parsing which were testing nonsensical paths that are now illegal
…nstant in build.gradle (#8625)
* This should make http access seamless in many places * The way this handles query parameters is not ideal for signed url cases so we'll need to revisit that
…ervals output (#8621) * Write gCNV interval output ID=GT header as Type=String. Incorrectly writing this as Type=Integer causes bcftools to misparse the genotype field. * Use correct header types and numbers in test VCF file
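For reference, the VCF specification declares the genotype field as `##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">`. A small check like the following could flag headers that declare GT with any other type. It is a sketch only: the regular expression and the `gt_header_problems` name are assumptions for illustration and are not part of GATK or bcftools.

```python
import re
from typing import Iterable, List

# Matches the leading ID/Number/Type keys of a ##FORMAT header line.
FORMAT_LINE = re.compile(
    r'^##FORMAT=<ID=(?P<id>[^,]+),Number=(?P<number>[^,]+),Type=(?P<type>[^,>]+)')


def gt_header_problems(header_lines: Iterable[str]) -> List[str]:
    """Return the GT FORMAT header lines whose declared Type is not String."""
    problems = []
    for line in header_lines:
        m = FORMAT_LINE.match(line)
        if m and m.group("id") == "GT" and m.group("type") != "String":
            problems.append(line)
    return problems
```

Running it over the header lines of a VCF returns an empty list when the GT declaration is correct.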
* include normal seq error log likelihood in Permutect dataset * handle different allele representations in multiallelic / indel variants for Permutect training data mode * set the default artifact to non-artifact ratio to 1 in Permutect training data mode
…exceptions in HaplotypeCaller (#8731)
…image (#8745) * Update README to include list of popular software included in docker image
… events (#8717) * M2 bad haplotype filter does not filter variants that share a haplotype with a germline event * two ECNT annotations -- haplotype and region -- and clustered events filter looks at both
…37/hg19 conversion (#8758) Don't print the very long and misleading "The following contigs are present in b37 and missing in the input VCF sequence dictionary" log message when we're not even doing b37/hg19 conversion.
Co-authored-by: Dror Kessler <[email protected]>
Set GGVCFs --all-sites GQ0 hom-refs to no-calls
Set regular GGVCFs GQ0 hom-refs to no-calls (any DP, PL) for better AF/AN annotations
Remove PLs in "no data" case where DP=0 for more accurate QUAL score
Several files tracked by git lfs were accidentally reimported as normal files. This makes them stubs again.
…elimited) (#8771) * Enable ReblockGVCF to subset AS annotations that aren't "raw" (i.e. pipe-delimited) * Fix tests by removing AssemblyComplexity from default annotations
* Add MappingQualityReadFilter * Added additional warnings for mmq * Fixed doc typo
* Add malaria spanning deletion exception regression test with fix * Disabling codecov. --------- Co-authored-by: Jonn Smith <[email protected]>
* Fixed a bug that prevented filtering by SOR in many cases
…f in GenomicsDB (#8759) * Allow for GT to be a nocall if GQ and PL[0] are zero instead of homref in GenomicsDB * Move to 1.5.3 release from snapshot --------- Co-authored-by: Nalini Ganapati <[email protected]> Co-authored-by: Nalini Ganapati <[email protected]>
* Reduced total layers in the GATK docker image from 44 down to 16.
* Reduced GATK base image layers from 20 to 3.
* This might be a better solution than a full squash down to a single layer, because:
  - If we are hosting this in a premium ACR, the limit is 10,000 readOps per minute. So with 16 layers, you get around 625 pulls per minute.
  - Also, this will still be able to take advantage of parallel pulls (default is 3, but at most 16 threads in this case, I believe), as opposed to one big layer, which will not download in parallel. There's the potential of that being a lot slower and subsequent jobs falling into the same "minute" because others are not done, making it easier to hit that 10k readOps limit.
  - Lastly, with a single layer, people using GATK outside data pipelines would not be able to take advantage of layer caching either.
Resolves #8684
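The pull-rate estimate in the message above can be checked with a few lines of arithmetic. This sketch assumes, as the message implies, that each image pull costs one read operation per layer; the function name is hypothetical.

```python
def max_pulls_per_minute(read_ops_limit: int, layers: int) -> int:
    """Upper bound on full image pulls per minute, assuming one readOp per layer per pull."""
    if layers <= 0:
        raise ValueError("layers must be positive")
    return read_ops_limit // layers


READ_OPS_LIMIT = 10_000  # per-minute limit quoted in the commit message

after = max_pulls_per_minute(READ_OPS_LIMIT, 16)   # 16 layers after this change
before = max_pulls_per_minute(READ_OPS_LIMIT, 44)  # 44 layers before this change
print(after, before)
```

Under that assumption, 16 layers gives the 625 pulls per minute quoted in the message, compared with 227 for the previous 44-layer image.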
…in VCF header (#8831) Added a --mask-description argument to VariantFiltration to write a custom description for the mask filter in the VCF header
… with truth VCF (#8836) * added 20 more Permutect read features * Permutect test data can, like training data, be annotated with a truth VCF
* [BIOIN-1570] Fixed edge case in variant annotation when the variant is close to the edge of the reference
…r_to_ah_var_store_again
…_master_to_ah_var_store_again
Github actions tests reported job failures from actions build 9650450183
No description provided.