sam_read1 stuck after reaching end of the SAM file when using hts_tpool #1855

Open
yangao07 opened this issue Oct 30, 2024 · 0 comments · May be fixed by #1856
@yangao07

To reproduce:

// test.c
#include <stdio.h>
#include <stdlib.h>
#include "htslib/sam.h"
#include "htslib/thread_pool.h"

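int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <input.sam|input.bam>\n", argv[0]);
        return 1;
    }
    // Open the input file (SAM or BAM)
    samFile *bam_file = sam_open(argv[1], "r");
    if (bam_file == NULL) {
        fprintf(stderr, "Failed to open input file\n");
        return 1;
    }
    // Attach a pool with one worker thread. With SAM input, a dispatcher
    // then decodes SAM text into BAM records ahead of sam_read1.
    hts_tpool *p = hts_tpool_init(1);
    if (p == NULL) {
        fprintf(stderr, "Failed to create thread pool.\n");
        sam_close(bam_file);
        return 1;
    }
    htsThreadPool thread_pool = {p, 0};
    if (hts_set_thread_pool(bam_file, &thread_pool) != 0) {
        fprintf(stderr, "Failed to set thread pool.\n");
        sam_close(bam_file);
        hts_tpool_destroy(p);
        return 1;
    }
    // Read the header
    bam_hdr_t *header = sam_hdr_read(bam_file);
    if (header == NULL) {
        fprintf(stderr, "Failed to read header\n");
        sam_close(bam_file);
        hts_tpool_destroy(p);
        return 1;
    }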

    // Initialize the alignment data structure
    bam1_t *aln = bam_init1();

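    // Read every record; the loop stops when sam_read1 returns a
    // negative value (-1 at EOF)
    while (sam_read1(bam_file, header, aln) >= 0)
        ;
    // Call sam_read1 once more after EOF. BAM input returns EOF again,
    // but SAM input with a thread pool blocks here and never returns.
    int r = sam_read1(bam_file, header, aln);
    printf("r: %d\n", r);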

    // Cleanup; the pool must outlive the file that uses it, so destroy it last
    bam_destroy1(aln);
    bam_hdr_destroy(header);
    sam_close(bam_file);
    hts_tpool_destroy(p);

    return 0;
}
HTSLIB_DIR=
gcc test.c -I$HTSLIB_DIR $HTSLIB_DIR/libhts.a -lm -lz -lpthread -llzma -lbz2 -lcurl -lcrypto; ./a.out input.sam

The final sam_read1 call hangs when the input is SAM; with BAM input it returns EOF as expected.

jkbonfield added a commit to jkbonfield/htslib that referenced this issue Oct 31, 2024
The SAM dispatcher, sam_dispatcher_read, decodes blocks of SAM records
into blocks of BAM records.  As this is (hopefully) reading ahead of the
sam_read1 consumer code, when it hits EOF it adds a final NULL block as
a sentinel.  This works well and forces sam_read1 to return EOF too.

However, if the caller ignores that and calls sam_read1 again, the
sentinel block has already been consumed, so sam_read1 gets stuck
waiting for the next block of BAM records.  We now cache the EOF status
and check it first.

Note this doesn't affect iterators, as they already work at a different
level and the iterator itself tracks EOF.

Fixes samtools#1855
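The read-ahead/sentinel scheme described in the commit message, and the cached-EOF fix, can be modelled in a few lines of C. This is a minimal sketch under stated assumptions, not htslib's actual code: the `block`, `block_queue`, `reader`, `queue_push` and `model_read1` names are hypothetical, plain `int`s stand in for BAM records, and a mutex/condition-variable queue stands in for whatever htslib uses internally to hand blocks from the dispatcher to sam_read1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the SAM read-ahead pipeline; not htslib's real types. */
#define QCAP 8

typedef struct {                /* one decoded "block of BAM records" */
    const int *recs;
    size_t n;
} block;

typedef struct {                /* dispatcher -> consumer hand-off queue */
    const block *items[QCAP];   /* a NULL entry is the EOF sentinel */
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
} block_queue;

typedef struct {
    block_queue q;
    const block *cur;           /* block currently being consumed */
    size_t pos;                 /* index of the next record within cur */
    bool eof;                   /* the fix: remember that the sentinel was seen */
} reader;

void reader_init(reader *r) {
    r->q.head = r->q.tail = r->q.count = 0;
    pthread_mutex_init(&r->q.lock, NULL);
    pthread_cond_init(&r->q.nonempty, NULL);
    r->cur = NULL;
    r->pos = 0;
    r->eof = false;
}

/* Producer side (the dispatcher): enqueue a block, or NULL at EOF. */
void queue_push(block_queue *q, const block *b) {
    pthread_mutex_lock(&q->lock);
    assert(q->count < QCAP);    /* sketch only: no back-pressure */
    q->items[q->tail] = b;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: waits until the producer has queued something. */
static const block *queue_pop(block_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);  /* pre-fix hang point */
    const block *b = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return b;
}

/* Stores a record in *out and returns 0, or returns -1 at EOF,
 * mirroring sam_read1's -1-at-EOF convention. */
int model_read1(reader *r, int *out) {
    if (r->eof)                 /* the fix: check the cached EOF first */
        return -1;
    while (r->cur == NULL || r->pos == r->cur->n) {
        const block *b = queue_pop(&r->q);
        if (b == NULL) {        /* sentinel consumed; it will not be re-queued */
            r->eof = true;
            return -1;
        }
        r->cur = b;
        r->pos = 0;
    }
    *out = r->cur->recs[r->pos++];
    return 0;
}
```

Deleting the `if (r->eof)` check reproduces the reported behaviour: the extra call after EOF finds the queue empty and waits in `queue_pop` forever, because the only NULL sentinel has already been consumed.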
jkbonfield linked a pull request Oct 31, 2024 that will close this issue