MetaBlast is a Metagenomic pipeline for parallel Blast analysis and report generation. After splitting the fasta file in chunks, it Blasts each read and then reports up to one top hit for each read. A simple summary file is then generated, reporting the number of reads supporting each top hit. Moreover, the full taxonomy of each top hit is retrieved.
Prerequisites
- Miniconda3.
Tested with conda 4.9.2.
which conda
should return the path to the executable. If you don't have Miniconda3 installed, you could download and install it with:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod 755 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
Then, after completing MetaBlast installation, set the MINICONDA_DIR variable in config_MetaBlast.sh to the full path to miniconda3 directory.
- A Blast-indexed database. For example, you may download NCBI nt db with the following instruction:
mkdir NCBI_nt_db
cd NCBI_nt_db
echo `date +%Y-%m-%d` > download_date.txt
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt*
targz_files=$(find . | grep \.tar\.gz$ | sed 's/\.\///g')
for f in $targz_files; do tar -xzvf $f; rm $f; done
Or, in case you want to use a custom <fasta file> downloaded from NCBI, you can create a Blast-indexed database with the following instruction:
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz
gunzip nucl_gb.accession2taxid.gz
tail -n+2 nucl_gb.accession2taxid | cut -f2,3 > Taxid_map.txt
makeblastdb -in <fasta file> -parse_seqids -dbtype nucl -taxid_map Taxid_map.txt
- A fasta file containing sequences you want to Blast. Be sure your fasta file has 2 lines for each sequence. If you need reformatting it do:
seqtk seq -A <sample_name_tmp.fasta> > <sample_name.fasta>
Installation
git clone https://github.com/MaestSi/MetaBlast.git
cd MetaBlast
chmod 755 *
./install.sh
Otherwise, you can download a docker image with:
docker pull maestsi/metablast:latest
A conda environment named MetaBlast_env is created, where blast, krona, parallel and R with packages taxize and data.table are installed. Then, you can open the config_MetaBlast.sh file with a text editor and set the variables PIPELINE_DIR and MINICONDA_DIR to the value suggested by the installation step.
MetaBlast.sh
Usage: ./MetaBlast.sh -f <sample_name.fasta> -db <blast_db>
Note: modify config_MetaBlast.sh before running;
Inputs:
- <sample_name.fasta>: fasta file containing reads to be analysed
- <blast_db>: Blast-indexed database
Outputs:
- <sample_name>_blast_hits_unique_min_id_perc_$MIN_ID_PERC_min_query_cov_$MIN_QUERY_COV.txt: file containing top Blast hit for each read
- <sample_name>_summary_blast_hits_unique_min_id_perc_$MIN_ID_PERC_min_query_cov_$MIN_QUERY_COV.txt: file containing summary of top Blast hits with full taxonomy
Collapse_taxa.sh
Usage: ./Collapse_taxa.sh <sample_name>_summary_blast_hits.txt <taxonomic level>
Inputs:
- <sample_name>_summary_blast_hits.txt: file produced by MetaBlast.sh
- <taxonomic level>: number indicating the column of the input file representing the desired taxonomic level
Output
- <sample_name>_summary_blast_hits_collapsed_level<taxonomic level>.txt: summary file collapsed at the desired taxonomic level
In the following, auxiliary scripts run by MetaBlast.sh are listed. These scripts should not be called directly.
config_MetaBlast.sh
Note: configuration script, should be edited before running MetaBlast.sh.
Filter_Blast_hits.R
Note: script run by MetaBlast.sh for filtering Blast results, retaining up to 1 top hit for each read.
Retrieve_taxonomy.R
Note: script run by MetaBlast.sh to retrieve the full taxonomy for each top hit.
After completing the analysis, you can open <sample_name>_blast_hits_unique_min_id_perc_$MIN_ID_PERC_min_query_cov_$MIN_QUERY_COV_Krona_report.html with a browser.
If this tool is useful for your work, please consider citing our manuscripts.
Marcolungo L, Passera A, Maestri S, Segala E, Alfano M, Gaffuri F, Marturano G, Casati P, Bianco PA, Delledonne M. Real-Time On-Site Diagnosis of Quarantine Pathogens in Plant Tissues by Nanopore-Based Sequencing. Pathogens. 2022; 11(2):199. https://doi.org/10.3390/pathogens11020199.
Tarquini G, Martini M, Maestri S, Firrao G, Ermacora P. The Virome of ‘Lamon Bean’: Application of MinION Sequencing to Investigate the Virus Population Associated with Symptomatic Beans in the Lamon Area, Italy. Plants. 2022; 11(6):779. https://doi.org/10.3390/plants11060779
Tarquini G, Maestri S, Ermacora P, Martini M. The Oxford Nanopore MinION as a Versatile Technology for the Diagnosis and Characterization of Emerging Plant Viruses. Methods Mol Biol. 2024;2732:235-249. doi:10.1007/978-1-0716-3515-5_16
Please refer to the following manuscripts for further information.
Tange, O. (2020, September 22). GNU Parallel 20200922 ('Ginsburg'). Zenodo. https://doi.org/10.5281/zenodo.4045386
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.
Scott Chamberlain, Eduard Szocs (2013). “taxize - taxonomic search and retrieval in R.” F1000Research.