Blast++
Category
Bioinformatics
Description
BLAST+ is a suite of command-line tools to run BLAST. The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
Available versions (Euler, old software stack)
Legacy versions | Supported versions | New versions |
---|---|---|
2.10.0, 2.2.30, 2.7.1 |
Please note that this page refers to installations from the old software stack. There are two software stacks on Euler. Newer versions of software are found in the new software stack.
Environment modules (Euler, old software stack)
Version | Module load command | Additional modules loaded automatically |
---|---|---|
2.10.0 | module load gcc/6.3.0 blast/2.10.0 | |
2.2.30 | module load gcc/4.8.2 blast/2.2.30 | |
2.7.1 | module load gcc/4.8.2 blast/2.7.1 | |
How to submit a job
You can submit a BLAST+ job in batch mode with the following command:
    sbatch [Slurm options] --wrap="[blast executable] [blast options]"
Here you need to replace [blast executable] with one of the following executables:
    [sfux@eu-login-05 bin]$ ls
    blastdb_aliastool  blastn             deltablast        makembindex           rpstblastn  tblastn
    blastdbcheck       blastp             dustmasker        makeprofiledb         seedtop     tblastx
    blastdbcmd         blastx             gene_info_reader  project_tree_builder  segmasker   update_blastdb.pl
    blastdbcp          convert2blastmask  legacy_blast.pl   psiblast              seqdb_demo  windowmasker
    blast_formatter    datatool           makeblastdb       rpsblast              seqdb_perf  windowmasker_2.2.22_adapter.py
Furthermore, you need to replace [blast options] with BLAST command-line options and [Slurm options] with Slurm parameters for the resource requirements of the job. The sbatch parameters are documented on the wiki page about the batch system.
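The substitution described above can be sketched as a dry run that only assembles and prints the submission command. The resource values, the file names query.fasta and result.txt, and the choice of blastn with the nt database are hypothetical and are not taken from this page; adjust them to your own job.

```shell
# Hypothetical example values for the three placeholders.
slurm_options="--ntasks=1 --time=01:00:00 --mem-per-cpu=2048"
blast_executable="blastn"
blast_options="-query query.fasta -db nt -out result.txt"

# Assemble the command following the template above. The escaped quotes
# keep the executable and its options together as one --wrap argument.
submit_cmd="sbatch $slurm_options --wrap=\"$blast_executable $blast_options\""

# Print the command for inspection instead of submitting it.
echo "$submit_cmd"
```

Printing the command first makes quoting mistakes in the --wrap argument visible before a job is queued.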
Example
As an example of a BLAST+ job, we run a simple query with the protein sequence in test.fasta:
    [sfux@eu-login-06 ~]$ cat test.fasta
    >sequence1
    MIKKIGVLTSGGDAPGMNAAIRGVVRSALTEGLEVMGIYDGYLGLYEDRMVQLDRYSVSD
    MINRGGTFLGSARFPEFRDENIRAVAIENLKKRGIDALVVIGGDGSYMGAMRLTEMGFPC
    IGLPGTIDNDIKGTDYTIGFFTALSTVVEAIDRLRDTSSSHQRISVVEVMGRYCGDLTLA
    AAIAGGCEFVVVPEVEFSREDLVNEIKAGIAKGKKHAIVAITEHMCDVDELAHFIEKETG
    RETRATVLGHIQRGGSPVPYDRILASRMGAYAIDLLLAGYGGRCVGIQNEQLVHHDIIDA
    IENMKRPFKGDWLDCAKKLY
We compare it against the nr database:
    [sfux@eu-login-06 ~]$ module load gcc/4.8.2 blast/2.2.30
    [sfux@eu-login-06 ~]$ bsub -n 1 -W 4:00 -R "rusage[mem=2048]" "blastp -query test.fasta -out output.blast.txt -db nr"
    #Generic job.
    #Job <33641518> is submitted to queue <normal.4h>.
    [sfux@eu-login-06 ~]$ bjobs
    JOBID      USER      STAT  QUEUE      FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
    33641518   leonhard  PEND  normal.4h  euler06
    [sfux@eu-login-06 ~]$ bjobs
    JOBID      USER      STAT  QUEUE      FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
    33641518   leonhard  RUN   normal.4h  euler06    e1057      *xt -db nr  Dec  6 10:02
    [sfux@eu-login-06 ~]$ bjobs
    No unfinished job found
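The session above was recorded with the LSF command bsub, whereas the submission template on this page uses sbatch. A possible Slurm counterpart is sketched below. The option mapping in the comments is an assumption based on the usual meaning of these flags and is not taken from this page; the BLAST module must be loaded first, as in the session above.

```shell
# Assumed mapping of the LSF options used above to Slurm options:
#   -n 1                   ->  --ntasks=1
#   -W 4:00 (hours:min)    ->  --time=04:00:00
#   -R "rusage[mem=2048]"  ->  --mem-per-cpu=2048 (MB per core)
wrap_cmd="blastp -query test.fasta -out output.blast.txt -db nr"

if command -v sbatch >/dev/null 2>&1; then
    # On a Slurm system: submit the job, then list your own jobs.
    sbatch --ntasks=1 --time=04:00:00 --mem-per-cpu=2048 --wrap="$wrap_cmd"
    squeue -u "$USER"
else
    # Elsewhere: only show what would be submitted.
    echo "sbatch --ntasks=1 --time=04:00:00 --mem-per-cpu=2048 --wrap=\"$wrap_cmd\""
fi
```

Note that Slurm reads --time=4:00 as four minutes, so the four-hour limit is written as 04:00:00.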
The result is then written to the output file output.blast.txt:
    [sfux@eu-login-06 ~]$ sed -n '25,40p' output.blast.txt
    Query= sequence1
    Length=320
                                                                         Score     E
    Sequences producing significant alignments:                          (Bits)  Value
    ref|WP_000591795.1| MULTISPECIES: ATP-dependent 6-phosphofructok...  650     0.0
    gb|EFJ85506.1| 6-phosphofructokinase [Escherichia coli MS 84-1]      651     0.0
    gb|ABF05609.1| 6-phosphofructokinase I [Shigella flexneri 5 str....  651     0.0
    ref|WP_024228092.1| ATP-dependent 6-phosphofructokinase [Escheri...  650     0.0
    gb|ABE09911.1| 6-phosphofructokinase isozyme I [Escherichia coli...  651     0.0
    gb|ADX52955.1| 6-phosphofructokinase [Escherichia coli KO11FL]       651     0.0
    ref|WP_000967668.1| ATP-dependent 6-phosphofructokinase [Escheri...  651     0.0
    ref|WP_032279226.1| MULTISPECIES: ATP-dependent 6-phosphofructok...  649     0.0
    gb|EEJ48186.1| 6-phosphofructokinase [Escherichia coli 83972]        650     0.0
    ref|WP_039061908.1| MULTISPECIES: ATP-dependent 6-phosphofructok...  649     0.0
You can find the resource usage summary of the job in the corresponding LSF log file.
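To reduce the hit list in the output above to sequence ID, bit score and E-value, a short awk filter may be enough. The sketch below uses two hit lines copied from that output as sample input; it assumes that the bit score and the E-value are always the last two whitespace-separated fields of a hit line.

```shell
# Two hit lines copied from the output above serve as sample input.
hits='gb|EFJ85506.1| 6-phosphofructokinase [Escherichia coli MS 84-1]  651  0.0
gb|ADX52955.1| 6-phosphofructokinase [Escherichia coli KO11FL]  651  0.0'

# Keep the first field (sequence ID) and the last two fields
# (bit score and E-value) of every line.
summary=$(printf '%s\n' "$hits" | awk '{ print $1, $(NF-1), $NF }')
echo "$summary"
```

For larger analyses, the tabular BLAST+ output format (-outfmt 6) is usually easier to parse than the default report.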
License information
LGPLv2.1
Notes
On the Euler cluster, we provide a local copy of the NCBI BLAST databases, which is synchronized once per week. If you use the centrally installed BLAST executables, then the local copy of the BLAST databases will be used. It is stored at
    /cluster/project/clcgenomics/CLC_BLAST_DB
If you would like to use the NCBI BLAST databases with applications other than the NCBI tools, then please specify the above-mentioned path to point your application to the databases.
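One way to pass the database path to another application is an environment variable. The sketch below exports BLASTDB, the variable that the NCBI tools themselves consult for the database location; whether your application reads this variable or expects the path as a command-line option is an assumption you need to check in its documentation.

```shell
# Path of the local database copy, as stated above.
BLASTDB="/cluster/project/clcgenomics/CLC_BLAST_DB"
export BLASTDB

# The directory only exists on the cluster, so check before using it.
if [ -d "$BLASTDB" ]; then
    ls "$BLASTDB" | head    # show the first few database files
else
    echo "database directory not found: $BLASTDB" >&2
fi
```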
Links
https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download
https://www.ncbi.nlm.nih.gov/guide/howto/run-blast-local
https://www.ncbi.nlm.nih.gov/books/NBK279675
https://www.ncbi.nlm.nih.gov/books/NBK279668