lr-kallisto dorado unaligned bam files #450

Open
MustafaElshani opened this issue Jul 25, 2024 · 6 comments

@MustafaElshani

Hi

It is great to hear that long-read support is coming to kallisto. I would like to introduce it into our pipelines.

I have a few questions regarding running lr-kallisto on ONT dorado basecalled reads.

  1. My first step is kallisto bus --long -x bulk -i "$INDEX_PATH" -o "$OUTPUT_DIR/$SAMPLE_NAME" --bam "$BAM_FILE" -t $SLURM_NTASKS, using the dorado .bam output file. Is this correct, or should I use FASTQ files, such as the oriented full-length reads from pychopper?
    When I attempted to run with --bam, it threw an error:
Error: in order to use BAM, must compile with BAM option enabled
Threshold not in (0,1). Setting default threshold for unmapped kmers to 0.8
Error: --bam not supported in this mode
  2. My second step is bustools sort -t $SLURM_NTASKS -o "$OUTPUT_DIR/$SAMPLE_NAME/sorted.bus" "$OUTPUT_DIR/$SAMPLE_NAME/output.bus", followed by bustools count -o "$OUTPUT_DIR/$SAMPLE_NAME/count" -g "$GTF_PATH" -e "$OUTPUT_DIR/$SAMPLE_NAME/matrix.ec" -t "$OUTPUT_DIR/$SAMPLE_NAME/transcripts.txt" --cm "$OUTPUT_DIR$
    Is this correct?

  3. My third step is kallisto quant-tcc -i "$INDEX_PATH" -o "$OUTPUT_DIR/$SAMPLE_NAME" --long -P ONT --gtf "$GTF_PATH" --matrix-to-files -t $SLURM_NTASKS "$OUTPUT_DIR/$SAMPLE_NAME/count.mtx"

Is this the correct approach?
Additionally, if I had 10x Genomics Visium ONT reads, could I process these using -x Visium?

@Yenaled
Collaborator

Yenaled commented Jul 25, 2024

You need to compile with -DUSE_BAM=ON in cmake. I haven’t thoroughly tested whether BAM input works, though, so FASTQ is the safer option in terms of avoiding bugs.
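One way to follow the FASTQ route suggested above is to convert the unaligned dorado BAM before running kallisto. The sketch below assumes samtools is installed; samtools is not mentioned in this thread, and the function name `bam_to_fastq` and its argument order are hypothetical.

```shell
# Hypothetical helper: convert an unaligned BAM into a FASTQ file.
# Assumes samtools is on PATH (an assumption, not something from this thread).
#   $1 = input BAM, $2 = output FASTQ, $3 = threads (defaults to 1)
bam_to_fastq() {
    samtools fastq -@ "${3:-1}" "$1" > "$2"
}
```

A call might look like `bam_to_fastq "$BAM_FILE" reads.fastq "$SLURM_NTASKS"`, after which the FASTQ file can be passed to kallisto in place of the `--bam` argument.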

Your other commands seem fine.

I don’t think -x VISIUM works because that requires barcodes and UMIs to be at fixed positions in R1 with the sequence to be mapped being in R2 — ONT data doesn’t look like that.
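To illustrate what "fixed positions in R1" means, here is a toy sketch that slices a barcode and a UMI out of a read by position alone. The 16 bp barcode and 12 bp UMI lengths, the example read, and the function names are assumptions chosen for illustration, not values confirmed in this thread.

```shell
# Toy R1 read: 16 bp barcode, then 12 bp UMI, then a poly(T) stretch.
# The lengths are assumed for illustration only.
r1="AAACCCGGGTTTAAACACGTACGTACGTTTTTTTTT"

# Barcode: characters 1-16 of R1.
extract_barcode() { printf '%s\n' "$1" | cut -c1-16; }

# UMI: characters 17-28 of R1.
extract_umi() { printf '%s\n' "$1" | cut -c17-28; }

printf 'barcode=%s umi=%s\n' "$(extract_barcode "$r1")" "$(extract_umi "$r1")"
```

Position-based slicing like this only works when every read has the same layout, which is the property the reply above says ONT reads lack.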

@MustafaElshani
Author

Thank you, that's great. I think I will use the pychopper FASTQ files, as I have already run into a bug while building:

 make
[  2%] Performing configure step for 'htslib'
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
configure: error: cannot find install-sh, install.sh, or shtool in "." "./.." "./../.."
make[2]: *** [CMakeFiles/htslib.dir/build.make:92: /home/m/scratch/bioinformatic_tools/kallisto/ext/htslib/src/htslib-stamp/htslib-configure] Error 1
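As a quick diagnostic for the configure error in the log above, the sketch below looks for the helper scripts in the same three locations the message lists. The function name `find_install_helper` is hypothetical, and running it against `ext/htslib` assumes that is the directory where configure was executed.

```shell
# Hypothetical diagnostic: search the locations named in the error message
# (".", "./..", "./../..") relative to a given directory for install-sh,
# install.sh, or shtool. Prints the first match; returns 1 if none exists.
find_install_helper() {
    dir=$1
    for d in "$dir" "$dir/.." "$dir/../.."; do
        for f in install-sh install.sh shtool; do
            if [ -e "$d/$f" ]; then
                printf '%s\n' "$d/$f"
                return 0
            fi
        done
    done
    return 1
}
```

Running `find_install_helper ext/htslib` from the kallisto source directory and getting no output would be consistent with the error: the helper script is simply absent from the tree that configure inspected.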

It would be good if Visium long reads were supported; maybe that is something for the future.

Regards

Mustafa

@Yenaled
Collaborator

Yenaled commented Jul 25, 2024

You might be able to get it to work. I can compile htslib just fine, but I think I needed to use the right C compiler or build in a Docker container. It’s a bit tricky.

For VISIUM, you can probably get it to work — I’m just not familiar with the read structure.

Collaborator

Hi! lr-kallisto is designed to run directly with fastq files. Please let me know if you run into any other issues!
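A minimal sketch of the first step with FASTQ input follows. The flags are copied from the command in the original post, with `--bam "$BAM_FILE"` swapped for a FASTQ path; the wrapper name `run_lr_bus` is hypothetical, and the flags have not been checked here against any particular kallisto version.

```shell
# Hypothetical wrapper around the first step from the original post,
# taking a FASTQ file instead of --bam.
#   $1 = index, $2 = output directory, $3 = FASTQ file, $4 = threads (default 1)
run_lr_bus() {
    kallisto bus --long -x bulk -i "$1" -o "$2" -t "${4:-1}" "$3"
}
```

For example: `run_lr_bus "$INDEX_PATH" "$OUTPUT_DIR/$SAMPLE_NAME" reads.fastq "$SLURM_NTASKS"`.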

Collaborator

Apologies! I hadn't seen your other questions! I think you are missing the genemap for the -g flag, which can be generated by kb ref. For the Visium ONT data, -x visium is for paired-end reads, so it won't work directly. I'd be interested to hear more about your Visium ONT data, though! I'd be interested in adding the processing steps for it to our pipelines!
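Since the original command passed "$GTF_PATH" to -g, a small guard can catch a GTF being supplied where the genemap is expected. This is a rough heuristic sketch: the function name `looks_like_gtf` is hypothetical, and it only checks for the `gene_id "..."` attribute syntax that GTF files use.

```shell
# Hypothetical heuristic: succeed if a file looks like a GTF, i.e. some
# non-comment line contains a gene_id "..." attribute. A transcript-to-gene
# map is a plain tab-separated table and should not match.
looks_like_gtf() {
    awk '!/^#/ && /gene_id "/ { found = 1; exit } END { exit !found }' "$1"
}
```

It could be used before counting, for example `looks_like_gtf "$GTF_PATH" && echo "this is a GTF, not a genemap"`. The genemap itself is the file that kb ref writes to the path given with its own -g option.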

@hyeon9

hyeon9 commented Aug 30, 2024

Hi! I'm wondering whether the -x 10xv2 and -x 10xv3 options are also for paired-end reads.
