Possible bug in htslib/bcftools 1.1: [E::bcf_hdr_add_sample_len] Duplicated sample name #1408
Comments
Edit - just figured out that this is because one of the sample names contains an unusual special character, "B�R1". Once I edited the name to remove it, the VCF could be parsed OK.
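For anyone hitting the same error, a quick way to spot affected names is to scan the `#CHROM` header line for sample names containing bytes outside ASCII. The following is a minimal sketch, not part of bcftools; the function name `non_ascii_samples` and the example header line are made up for illustration, and the byte sequence `\xc3\x84` merely stands in for the character in the report, which is not recoverable here.

```python
def non_ascii_samples(header_line):
    """Return sample names from a VCF #CHROM line that contain non-ASCII bytes."""
    fields = header_line.rstrip(b"\r\n").split(b"\t")
    # The first nine columns (#CHROM .. FORMAT) are fixed; sample names follow.
    samples = fields[9:]
    return [name for name in samples if any(byte >= 0x80 for byte in name)]


# Hypothetical header line; "B\xc3\x84R1" is an assumed stand-in for a name
# holding one two-byte UTF-8 character.
line = b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tB\xc3\x84R1\n"
print(non_ascii_samples(line))
```

Working on raw bytes rather than decoded text keeps the check independent of whatever encoding the file actually uses.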
This is due to the way HTSlib looks for tabs in the header line, which currently (on x86-64) mistakes UTF-8 characters for the end of the sample name. As a result it splits the name "B�R1" into several parts, and complains about a duplicate when it gets to "R1". Casting […] The VCF4.3 specification says it uses UTF-8 so presumably names like this ought to be allowed?
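To illustrate the kind of mistake described here: plain `char` is typically signed on x86-64, so every byte of a multi-byte UTF-8 character (all of which are 0x80 or above) reads as a negative number. The sketch below simulates that in Python. The loop shape in `scan_name` (stop at any value not greater than newline) is an assumption chosen to show the effect, not HTSlib's actual code, and the sample name is hypothetical.

```python
def as_signed_char(byte):
    """Interpret a byte value 0..255 the way a signed 8-bit char would."""
    return byte - 256 if byte >= 0x80 else byte


def scan_name(data, start, signed):
    """Return the index where a 'c > newline' style scan stops.

    This loop shape is an assumption for illustration, not HTSlib's code.
    """
    i = start
    while i < len(data):
        c = as_signed_char(data[i]) if signed else data[i]
        if c <= ord("\n"):  # values up to newline (tab, NUL, ...) end the name
            break
        i += 1
    return i


# Hypothetical name with one two-byte UTF-8 character, followed by a tab.
name = b"B\xc3\x84R1\tS2\n"
print(scan_name(name, 0, signed=True))   # 1: stops at the first UTF-8 byte
print(scan_name(name, 0, signed=False))  # 5: stops at the tab
```

With the signed interpretation the scan ends after "B", in the middle of the name; treating the bytes as unsigned lets it run to the real tab delimiter.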
The VCF spec is (as usual) vague on this. samtools/hts-specs#18 motivated UTF-8 by saying “in order to address the need to represent non-ASCII characters in INFO field values, VCF files are assumed to be encoded in UTF-8 […]” which I read as intending UTF-8 for use primarily in free text description fields and the like, as in SAM, but not necessarily in fields like these sample IDs that tools need to compare/etc. PR samtools/hts-specs#414 proposes rules for VCF sample IDs but remains vague. In a different context, ga4gh/refget#2 (comment) wisely noted that allowing arbitrary Unicode would make testing sample IDs for equivalence (e.g. […] Fortunately this HTSlib parsing code could be fixed to trigger only on […]
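One concrete reason equivalence testing gets harder with arbitrary Unicode is that the same visible text can have more than one code point sequence. A small sketch using Python's standard `unicodedata` module follows; the sample ID "Café" is made up for illustration and does not come from the thread.

```python
import unicodedata

composed = "Caf\u00e9"     # 'é' as a single code point (U+00E9)
decomposed = "Cafe\u0301"  # 'e' followed by a combining acute accent (U+0301)

# The two strings render identically but compare unequal, and their
# UTF-8 encodings differ in length.
print(composed == decomposed)
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))

# Normalizing both sides to the same form (here NFC) makes them compare equal.
same = unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed)
print(same)
```

A tool that compares sample IDs byte by byte would treat these as two different samples unless it normalizes first, which is the sort of ambiguity a restricted character set avoids.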
If we have a test.vcf and use htslib/bcftools 1.1 to run:
bcftools view test.vcf
We get
[E::bcf_hdr_add_sample_len] Duplicated sample name .... Failed to read from test.vcf: could not parse header
We get the same error when running any bcftools command.
But with an older version of bcftools (1.9-207-g2299ab6, using htslib 1.9-271-g6738132), the file can be viewed OK.