Error: reads file does not look like a FASTQ file #129

Open
amitjavilaventura opened this issue Feb 7, 2022 · 2 comments

Comments

@amitjavilaventura

Dear Dr. Langmead,

I am using Bowtie in a pipeline for small RNA-seq. It has worked for months, but now the same command fails with the error "reads file does not look like a FASTQ file":

Time loading forward index: 00:00:09
Time loading mirror index: 00:00:09
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'

The command is this one:

bowtie -v 1  -M 1  --seed 666 --best --strata  --quiet  --threads 8 --chunkmbs 1024 --time --sam <refGenomeIndexPrefix> <fastq.gz>

The only difference is that I am now running this on a cluster using a Singularity image, whereas before I was running bowtie locally using conda. The previous steps in the pipeline are adapter trimming with cutadapt and quality filtering with fastq_quality_filter.

I inspected the FASTQ.gz files and they look normal:

@7001450:617:CD9F5ANXX:4:2309:3398:1995 1:N:0:TGACCA
TCTCAGNTTGTCATTTGGAGACTCCCCA
+
BBBCCE#>?FGGGGGGGGGGGGGGGGGG
@7001450:617:CD9F5ANXX:4:2309:3690:1999 1:N:0:TGACCA
TGAACGGAGAATAGAGTACATTGAAGCGA
+
CBBBBGGGGGGGGGGGGGEEGGGGGGGGC

I used three different approaches to validate them:

  • Comparing the lengths of the sequence string and the quality string of each read and counting the reads where they differ (0 cases where the sequence and quality strings differ in length).
  • Using fastq_info from fastq_utils:
fastq_utils 0.25.1
DEFAULT_HASHSIZE=39000001
Scanning and indexing all reads from results/01_fastq/caroli1.filt.fastq.gz
CASAVA=1.8
43600000Scanning complete.

Reads processed: 43600732
Memory used in indexing: ~3346 MB
------------------------------------
Number of reads: 43600732
Quality encoding range: 35 71
Quality encoding: 33
Read length: 19 36 30
OK
  • Using validatefastq from biopet:
INFO  [2022-02-07 17:18:52,605] [ValidateFastq$] - Start
INFO  [2022-02-07 17:18:52,969] [ValidateFastq$$anonfun$main$1] - 100000 reads processed
INFO  [2022-02-07 17:18:53,156] [ValidateFastq$$anonfun$main$1] - 200000 reads processed
...
...
INFO  [2022-02-07 17:20:16,953] [ValidateFastq$$anonfun$main$1] - 43600000 reads processed
INFO  [2022-02-07 17:20:16,955] [ValidateFastq$] - Possible quality encodings found: Sanger, Illumina 1.8+
INFO  [2022-02-07 17:20:16,955] [ValidateFastq$] - Done processing 43600732 fastq records, no errors found
INFO  [2022-02-07 17:20:16,956] [ValidateFastq$] - Done

None of the approaches flagged the FASTQ as invalid.
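
The first check in the list above (comparing sequence and quality lengths) could be sketched roughly as follows. This is only an illustration, not the command actually used in this issue: the function name `count_length_mismatches` is hypothetical, and the code assumes strict four-line FASTQ records in a gzip-compressed file.

```python
import gzip


def count_length_mismatches(path):
    """Return (records, mismatches) for a gzip-compressed FASTQ file.

    Assumption: every record is exactly four lines
    (header, sequence, '+' separator, quality).
    """
    records = 0
    mismatches = 0
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:  # end of file
                break
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().rstrip("\r\n")
            records += 1
            if len(seq) != len(qual):
                mismatches += 1
    return records, mismatches
```

Calling `count_length_mismatches("reads.fastq.gz")` (a placeholder path) returns the number of records read and how many of them have sequence and quality strings of different lengths. Note that a check like this decompresses the file itself, so it says nothing about whether the aligner can read gzip input.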

What could be causing this?

Thank you.

Best regards,
Adrià.

@ch4rr0
Collaborator

ch4rr0 commented Feb 7, 2022

Hello,

What version of bowtie are you using on the server? It's possible that that version does not support compressed (GZIP) input.

@amitjavilaventura
Author

Hi,

Thanks for the response. I have just noticed that the versions are different.

The version used in the Singularity image on the cluster is:

/opt/miniconda3/bin/bowtie version 1.0.0
64-bit
Built on 4d87110594ec
Wed Mar 23 19:06:59 UTC 2016
Compiler: gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC) 
Options: -O3 -m64  -Wl,--hash-style=both  
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

And the version I have locally in the conda environment is:

bowtie-align version 1.2
64-bit
Built on testing-gce-ab28e1d1-a823-4ae9-9c55-f53e1e445058
Sat May  6 18:08:00 UTC 2017
Compiler: gcc version 4.8.5 (GCC) 
Options: -O3 -m64 -I/home/amitjavila/anaconda3/envs/smallRNA/include -L/home/amitjavila/anaconda3/envs/smallRNA/lib -Wl,--hash-style=both -DWITH_TBB -DPOPCNT_CAPABILITY -DNO_SPINLOCK -DWITH_QUEUELOCK=1  
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

I understand that version 1.0.0 does not support gzip-compressed (.gz) input.

Thanks.

Adrià.
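
For anyone hitting the same error with an older bowtie build, one possible workaround is to hand the aligner an uncompressed copy of the reads. The sketch below is an illustration under stated assumptions, not something taken from this thread: the helper names `is_gzip` and `decompress_if_needed` are hypothetical, and the explanation that the old binary rejects the file because its first bytes are the gzip signature (`0x1f 0x8b`) rather than `@` is an assumption consistent with the error message, not a confirmed reading of the bowtie source.

```python
import gzip
import shutil

# Every gzip file starts with these two signature bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file starts with the gzip signature bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def decompress_if_needed(src, dst):
    """Return a path to an uncompressed version of src.

    If src is gzip-compressed, it is decompressed into dst and dst is
    returned; otherwise src is returned unchanged and dst is not created.
    """
    if not is_gzip(src):
        return src
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, low memory use
    return dst
```

The returned path can then be passed to bowtie in place of the `.gz` file. Upgrading the bowtie version inside the container to match the local conda environment avoids the extra disk space this copy needs.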
