First of all, thanks for your amazing work.
I really like the concept behind your tool :)
Now concerning the bug:
I worked with a fresh Colab notebook
MMSplice version:
Installation with pip: mmsplice-2.4.0
Python version:
Python 3.10.13
Operating System:
Colab notebook
Description
I wanted to try out mmsplice, so I downloaded your example data and gave it a go.
But even with your test data, I kept getting the following error: ValueError: Fasta chrom names do not match with vcf chrom names
After hours of digging through and hacking on the package code, I found out:
The input requires not only a VCF file but also its index file (i.e. vcf.gz.tbi).
The README never states this.
Please fix that.
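Until the README mentions it, a small pre-flight check can catch the missing index early. This is only a sketch using the standard library: `find_vcf_index` and `require_vcf_index` are hypothetical helper names, not part of MMSplice, and the check assumes the index sits next to the VCF with a `.tbi` suffix, as in the vcf.gz.tbi case above.

```python
from pathlib import Path


def find_vcf_index(vcf_path):
    """Return the path of a .tbi index next to vcf_path, or None if absent.

    Hypothetical helper (not part of MMSplice). Assumes the usual tabix
    naming convention: <vcf file name> + ".tbi".
    """
    vcf = Path(vcf_path)
    candidate = vcf.with_name(vcf.name + ".tbi")
    return candidate if candidate.is_file() else None


def require_vcf_index(vcf_path):
    """Raise a descriptive error when no index is found beside the VCF."""
    index = find_vcf_index(vcf_path)
    if index is None:
        raise FileNotFoundError(
            f"No .tbi index found next to {vcf_path}; create one, e.g. by "
            "running `tabix -p vcf` on the bgzip-compressed VCF."
        )
    return index
```

Calling `require_vcf_index("my_variants.vcf.gz")` (placeholder file name) before building the dataloader would turn the confusing chromosome-name error into a message that points at the actual problem.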
Also, while hacking around, I noticed that the parsing of the seqnames seems buggy.
It only parses seqnames when an indexed VCF file is provided.
It also always includes {'1'} as a seqname for the VCF, no matter what is provided.
# In your vcf_dataloader.py
def _check_chrom_annotation(self):
    # I've added these two lines
    fasta_chroms = set(self.fasta.fasta.keys())
    vcf_chroms = set(self.vcf.seqnames)
    print("fasta: ", fasta_chroms, flush=True)
    print("vcf_chroms", vcf_chroms, flush=True)
    if not fasta_chroms.intersection(vcf_chroms):
        raise ValueError(
            'Fasta chrom names do not match with vcf chrom names')
--> Output:
fasta: {'17'}
vcf_chroms {'1', '17'}
...
The VCF seqnames should only include 17, since I only provided your example data.
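To narrow down where the extra '1' comes from, the chromosome names can be read straight from the VCF file, independently of MMSplice. The sketch below uses only the standard library; `vcf_chrom_names` is a hypothetical helper, and it reports names declared in `##contig` header lines separately from names that actually occur in variant records. Whether the stray '1' originates in the header is an assumption this check would confirm or rule out, not something I have verified.

```python
import gzip
import re


def _open_text(path):
    """Open a plain or gzip/bgzip-compressed VCF in text mode."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def vcf_chrom_names(path):
    """Return (header_contigs, record_chroms) as two sets of strings.

    Hypothetical diagnostic helper, not part of MMSplice.
    header_contigs: IDs declared in '##contig=<ID=...>' header lines.
    record_chroms:  values of the first (CHROM) column of variant records.
    """
    header_contigs, record_chroms = set(), set()
    contig_re = re.compile(r"^##contig=<.*?\bID=([^,>]+)")
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith("##"):
                match = contig_re.match(line)
                if match:
                    header_contigs.add(match.group(1))
            elif line.startswith("#"):
                continue  # the '#CHROM POS ID ...' column header line
            elif line.strip():
                record_chroms.add(line.split("\t", 1)[0])
    return header_contigs, record_chroms
```

If `record_chroms` contains only '17' while `header_contigs` also lists '1', the extra name is declared in the file header; if neither set contains '1', it is being added somewhere in the parsing code.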
What I Did
Take a look:
https://colab.research.google.com/drive/1hx6PAYT_lKuEYtnHCq0PN2lyvNBNDud1?usp=sharing