Strange bug in VCFEncoder.addGenotypeData when reading a list of Variant and then using the list in a multithreaded env. #1026

Open
lindenb opened this issue Nov 14, 2017 · 1 comment · May be fixed by #1636

Comments

@lindenb
Contributor

lindenb commented Nov 14, 2017

Subject of the issue

Genotypes in a VariantContext appear to be mutable (??)

Minimal example

I've posted an example here: https://gist.github.com/lindenb/7ee76e30dfb662048fff3cf6125b980d

Description

This is a strange bug. Maybe it isn't in htsjdk, but I would be happy to know the cause.

I'm reading a list of variants from a VCFFileReader:

final CloseableIterator<VariantContext> iter2=vcfFileReader.iterator();
final List<VariantContext> inMemoryVariants =  Collections.unmodifiableList(iter2.stream().
		limit(3).
		collect(Collectors.toList())
		);
iter2.close();
vcfFileReader.close();

This read-only List is then used by several Java threads. I wait for the threads to finish, then write some of the variants to a new VCF file:

final VariantContextWriter w = (...)
final VCFHeader header2= (...)
w.writeHeader(header2);
w.add(bestResult.ctx1);
w.add(bestResult.ctx2);
w.close();

But here I get an exception from VCFEncoder.writeAllele, called from addGenotypeData:

java.lang.RuntimeException: Allele C* is not an allele in the variant context
	at htsjdk.variant.vcf.VCFEncoder.writeAllele

If I add the statement below at the beginning of VCFEncoder.addGenotypeData, I see my message on stderr: boummm3 {C=1, .=., T*=0} [C*, C*] class java.util.HashMap

final List<Allele> saveAllele = Collections.unmodifiableList(g.getAlleles());
if (!alleleMap.containsKey(g.getAllele(0))) {
    System.err.println(" boummm3 " + alleleMap + " " + saveAllele + " " + alleleMap.getClass());
}

Adding the following statement (accessing the alleles) before using the list inMemoryVariants fixes the problem.

inMemoryVariants.forEach(C->{long i=C.getGenotypes().stream().flatMap(G->G.getAlleles().stream()).count();});

I'm puzzled. Do you have any idea?

Your environment

  • version of htsjdk 2.13.0
  • version of java 1_8_152
  • which OS linux 32
lindenb added a commit to lindenb/jvarkit that referenced this issue Nov 14, 2017
@lbergelson
Member

I don't fully understand, but this is almost certainly a concurrency bug with LazyGenotypesContext, which seems to be highly unsafe for use with threads. I would expect that making LazyGenotypesContext.decode() synchronized would probably fix this.
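
To illustrate the idea in the comment above, here is a minimal, self-contained sketch of a lazily decoded container whose decode step is guarded by `synchronized`. It is not htsjdk code: `LazyHolderSketch`, `LazyList`, and `countDecodes` are hypothetical names invented for this example, and the real `LazyGenotypesContext` may differ in its internals. The sketch only shows why serializing the first decode keeps concurrent readers from observing a half-built state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a lazily decoded container; NOT htsjdk's LazyGenotypesContext. */
public class LazyHolderSketch {

    /** Holds undecoded data until the first call to get(). */
    static final class LazyList<T> {
        private final Supplier<List<T>> decoder;
        private List<T> decoded; // null until the first access

        LazyList(Supplier<List<T>> decoder) {
            this.decoder = decoder;
        }

        // synchronized: one thread performs the decode while the others wait,
        // so every caller sees the same, fully built list.
        synchronized List<T> get() {
            if (decoded == null) {
                decoded = Collections.unmodifiableList(new ArrayList<>(decoder.get()));
            }
            return decoded;
        }
    }

    /** Starts nThreads readers on one LazyList and returns how often the decoder ran. */
    static int countDecodes(int nThreads) {
        AtomicInteger calls = new AtomicInteger();
        LazyList<String> lazy = new LazyList<>(() -> {
            calls.incrementAndGet();
            return Arrays.asList("C", "T");
        });
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> lazy.get());
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countDecodes(8)); // prints 1: the decoder ran exactly once
    }
}
```

Without the `synchronized` keyword, two threads could both see `decoded == null` and decode at the same time. The workaround in the report (touching every genotype's alleles on a single thread before sharing the list) avoids this in a different way, by forcing the decode to happen before any other thread can reach the data.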
