Blast++


Category

Bioinformatics

Description

BLAST+ is a suite of command-line tools for running BLAST. The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. It compares nucleotide or protein sequences against sequence databases and calculates the statistical significance of the matches. BLAST can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families.

Available versions (Euler, old software stack)

Legacy versions Supported versions New versions
2.10.0, 2.2.30, 2.7.1

Please note that this page refers to installations from the old software stack. There are two software stacks on Euler. Newer versions of software are found in the new software stack.

Environment modules (Euler, old software stack)

Version   Module load command                   Additional modules loaded automatically
2.10.0    module load gcc/6.3.0 blast/2.10.0
2.2.30    module load gcc/4.8.2 blast/2.2.30
2.7.1     module load gcc/4.8.2 blast/2.7.1

How to submit a job

You can submit a BLAST+ job in batch mode with the following command:
sbatch [Slurm options] --wrap="[blast executable] [blast options]"

Here you need to replace [blast executable] with one of the following executables:

[sfux@eu-login-05 bin]$ ls
blastdb_aliastool  blastn             deltablast        makembindex           rpstblastn  tblastn
blastdbcheck       blastp             dustmasker        makeprofiledb         seedtop     tblastx
blastdbcmd         blastx             gene_info_reader  project_tree_builder  segmasker   update_blastdb.pl
blastdbcp          convert2blastmask  legacy_blast.pl   psiblast              seqdb_demo  windowmasker
blast_formatter    datatool           makeblastdb       rpsblast              seqdb_perf  windowmasker_2.2.22_adapter.py
Furthermore, you need to replace [blast options] with the BLAST command-line options and [Slurm options] with the Slurm parameters that describe the resource requirements of the job. The sbatch parameters are documented on the wiki page about the batch system.
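
As an alternative to --wrap, the same request can be placed in a job script. The following is a minimal sketch, not a site-provided template: the file names blast_job.sh, query.fasta and result.txt are hypothetical, the resource values are only illustrative, and the module line is taken from the table above.

```shell
# Write a hypothetical job script; #SBATCH lines carry the Slurm options
cat > blast_job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=04:00:00
#SBATCH --mem-per-cpu=2048

# Load the compiler and BLAST+ modules (old software stack)
module load gcc/6.3.0 blast/2.10.0

# [blast executable] [blast options]; input and output names are placeholders
blastp -query query.fasta -out result.txt -db nr
EOF

# Submit only where sbatch is available, i.e. on a cluster login node
if command -v sbatch >/dev/null 2>&1; then
    sbatch blast_job.sh
fi
```

Keeping the options in a script makes the resource request easy to review and reuse across submissions.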

Example

As an example of a BLAST+ job, we run a simple query with the sequence in test.fasta:
[sfux@eu-login-06 ~]$ cat test.fasta 
>sequence1
MIKKIGVLTSGGDAPGMNAAIRGVVRSALTEGLEVMGIYDGYLGLYEDRMVQLDRYSVSD
MINRGGTFLGSARFPEFRDENIRAVAIENLKKRGIDALVVIGGDGSYMGAMRLTEMGFPC
IGLPGTIDNDIKGTDYTIGFFTALSTVVEAIDRLRDTSSSHQRISVVEVMGRYCGDLTLA
AAIAGGCEFVVVPEVEFSREDLVNEIKAGIAKGKKHAIVAITEHMCDVDELAHFIEKETG
RETRATVLGHIQRGGSPVPYDRILASRMGAYAIDLLLAGYGGRCVGIQNEQLVHHDIIDA
IENMKRPFKGDWLDCAKKLY

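and compare it against the nr database. The session below was recorded with the LSF batch system, so it uses bsub and bjobs instead of sbatch.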

[sfux@eu-login-06 ~]$ module load gcc/4.8.2 blast/2.2.30
[sfux@eu-login-06 ~]$ bsub -n 1 -W 4:00 -R "rusage[mem=2048]" "blastp -query test.fasta -out output.blast.txt -db nr"
#Generic job.
#Job <33641518> is submitted to queue <normal.4h>.
[sfux@eu-login-06 ~]$ bjobs
JOBID      USER      STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
33641518   leonhard  PEND  normal.4h  euler06
[sfux@eu-login-06 ~]$ bjobs
JOBID      USER      STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
33641518   leonhard  RUN   normal.4h  euler06     e1057       *xt -db nr Dec  6 10:02
[sfux@eu-login-06 ~]$ bjobs
No unfinished job found

The result is then written to the output file output.blast.txt:

[sfux@eu-login-06 ~]$ sed -n '25,40p' output.blast.txt 
Query= sequence1

Length=320
                                                                     Score     E
Sequences producing significant alignments:                          (Bits)  Value

ref|WP_000591795.1|  MULTISPECIES: ATP-dependent 6-phosphofructok...    650   0.0   
gb|EFJ85506.1|  6-phosphofructokinase [Escherichia coli MS 84-1]        651   0.0   
gb|ABF05609.1|  6-phosphofructokinase I [Shigella flexneri 5 str....    651   0.0   
ref|WP_024228092.1|  ATP-dependent 6-phosphofructokinase [Escheri...    650   0.0   
gb|ABE09911.1|  6-phosphofructokinase isozyme I [Escherichia coli...    651   0.0   
gb|ADX52955.1|  6-phosphofructokinase [Escherichia coli KO11FL]         651   0.0   
ref|WP_000967668.1|  ATP-dependent 6-phosphofructokinase [Escheri...    651   0.0   
ref|WP_032279226.1|  MULTISPECIES: ATP-dependent 6-phosphofructok...    649   0.0   
gb|EEJ48186.1|  6-phosphofructokinase [Escherichia coli 83972]          650   0.0   
ref|WP_039061908.1|  MULTISPECIES: ATP-dependent 6-phosphofructok...    649   0.0   
You can find the resource usage summary of the job in the corresponding LSF log file.
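
The bsub call in the example above can be translated to Slurm roughly as follows. The flag mapping is an assumption based on the usual meaning of the options (-n 1 as one task, -W 4:00 as four hours, rusage[mem=2048] as 2048 MB per core); please check it against the batch system documentation before relying on it.

```shell
# BLAST command taken unchanged from the example above
blast_cmd="blastp -query test.fasta -out output.blast.txt -db nr"

# Assumed Slurm counterparts of: -n 1 -W 4:00 -R "rusage[mem=2048]"
slurm_opts="--ntasks=1 --time=04:00:00 --mem-per-cpu=2048"

# Show the full command line for inspection
echo "sbatch $slurm_opts --wrap=\"$blast_cmd\""

# Submit only where sbatch is available, i.e. on a cluster login node
if command -v sbatch >/dev/null 2>&1; then
    # $slurm_opts is left unquoted on purpose so it splits into separate options
    sbatch $slurm_opts --wrap="$blast_cmd"
fi
```

As in the LSF session, the module load command has to be run before submitting so that blastp is found on the compute node.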

License information

LGPLv2.1

Notes

On the Euler cluster, we provide a local copy of the NCBI BLAST databases, which is synchronized once per week. If you use the centrally installed blast executables, then the local copy of the BLAST databases will be used. It is stored at
/cluster/project/clcgenomics/CLC_BLAST_DB
If you would like to use the NCBI BLAST databases with applications other than the NCBI tools, please point your application to the path given above.
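
One way to pass the path on is the BLASTDB environment variable, which the BLAST+ tools read to locate databases. Whether another application honors this variable is an assumption that needs to be checked in that application's documentation. A minimal sketch:

```shell
# Database directory from the note above
BLASTDB=/cluster/project/clcgenomics/CLC_BLAST_DB
export BLASTDB

# List the databases in that directory, if the tool and the path are available
if command -v blastdbcmd >/dev/null 2>&1 && [ -d "$BLASTDB" ]; then
    blastdbcmd -list "$BLASTDB"
fi
```

Applications that do not read BLASTDB usually take the database location as an explicit option or configuration entry instead.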

Links

https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download

https://www.ncbi.nlm.nih.gov/guide/howto/run-blast-local
https://www.ncbi.nlm.nih.gov/books/NBK279675
https://www.ncbi.nlm.nih.gov/books/NBK279668