First, thanks for the great tools. I use Tabix/Bgzip extensively in my work and am very grateful to you folks for continuing to improve them (especially the extension of S3/GCS support)!
I think this may be related to #1037; if it is, or if it's part of another issue I missed in my brief search of the issues list, feel free to close this or merge it in there. Relatedly, @daviesrob may be interested in this ticket.
I noticed recently that when running many concurrent tabix queries (using GNU parallel with -j80) against a small set of bgzipped/indexed files in an S3 bucket, from an EC2 instance in the same AWS region, a few of them produced empty results when actual records should have been pulled down, yet no errors were reported (the return status was 0 for all queries).
I am using bash with set -exo pipefail, so I found this odd. (I'm fine with a minority of errors cropping up, as long as they're reported; I'll just re-run those queries.)
My working hypothesis is that I'm overloading the networking stack (probably a receive buffer somewhere) on the system and that libcurl is reporting errors for a few of the concurrent jobs, but these aren't being fully caught and reported by Tabix. That said, libcurl itself may be the culprit, though I'm assuming it isn't in this case.
I'm using htslib 1.20, but the section of code where I think the issue lies (below) doesn't appear to differ between 1.20 and the current development branch.
I went back and added some manual debug fprintf's to hfile_libcurl.c where I think the problem may be occurring, just before this line (htslib/hfile_libcurl.c, line 859 at ca92061).
The one test instance where I saw something relevant was here:
in libcurl_read: got,fp->finished, fp->final_result, to_skip, errno: 18882,0,-1,-1,0
in libcurl_read: got,fp->finished, fp->final_result, to_skip, errno: 32193,0,-1,-1,0
in libcurl_read: got,fp->finished, fp->final_result, to_skip, errno: 25206,1,0,-1,0
in libcurl_read: got,fp->finished, fp->final_result, to_skip, errno: 25206,1,0,-1,0
in libcurl_read: got,fp->finished, fp->final_result, to_skip, errno: 0,1,0,-1,0
[W::bgzf_read_block] EOF marker is absent. The input may be truncated
Command being timed: "htslib-1.20/tabix -D s3://S3_PATH_TO_BUCKET/allpairs.byfeature.gz chr12:11456460-11457010"
....
Exit status: 0
That range has records in the bgzipped file on S3, but the output was empty, and I noticed that got was 0 here, which libcurl_read(...) does not catch in this case.
My quick and dirty solution was to simply add:
if(got == 0) { return -1; }
and that seemed to fix it (in the sense of reporting an error when this happens, which is all I want), though I haven't run extensive tests.
I'm not claiming this fixes all the issues, but it does seem to get at a potential gap in the error checking in that file.
Thanks,
Chris
Using your fprintf statement I get this:
in libcurl_read: got,fp->finished, fp->final_result, to_skip, errno: 0,1,0,-1,0
at the end of every download from S3. It looks like a normal part of the process.
Can you check whether it also appears in your working tabix runs?