GATK

From ScientificComputing

Category

Bioinformatics

Description

The Genome Analysis Toolkit (GATK) is used for identifying SNPs and indels in germline DNA and RNA-seq data. Its scope is expanding to include somatic variant calling tools and to cover copy number variation (CNV) and structural variation (SV). In addition to the variant callers themselves, the GATK includes many utilities for related tasks such as processing and quality control of high-throughput sequencing data. These tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but they can be adapted to handle a variety of other technologies and experimental designs. Although it was originally developed for human genetics, the GATK has since evolved to handle genome data from any organism, with any level of ploidy.

Available versions

Legacy versions Supported versions New versions
3.4.46, 3.5, 3.7, 3.8

Environment modules

Version | Module load command | Additional modules loaded automatically
3.4.46 | module load gcc/4.8.2 java/1.8.0_91 gatk/3.4.46 |
3.5 | module load gcc/4.8.2 java/1.8.0_91 gatk/3.5 |
3.7 | module load gcc/4.8.2 java/1.8.0_91 gatk/3.7 |
3.8 | module load gcc/4.8.2 java/1.8.0_91 gatk/3.8 |

How to submit a job

You can submit a GATK job in batch mode with the following command:
bsub [LSF options] "GATK [GATK options]"
Replace [GATK options] with the GATK command line options and [LSF options] with the LSF parameters that describe the resource requirements of the job. The bsub parameters are documented on the wiki page about the batch system.
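As an illustration of how the placeholders might be filled in, the following dry-run sketch assembles the module load line and the bsub command and prints them without submitting anything. The resource values (1 core, 4 hours, 4096 memory units), the choice of HaplotypeCaller, and the file names reference.fasta, sample.bam and sample.vcf are assumptions made for this example, not site recommendations; adjust them to your data and check how memory units are interpreted on your cluster.

```shell
#!/bin/bash
# Dry-run sketch: build the module and bsub command lines and print them.
# Nothing is submitted; every option value and file name is a hypothetical placeholder.

gatk_version="3.8"   # one of the versions listed under "Available versions"

# Module load line for the chosen version, following the environment modules table.
module_cmd="module load gcc/4.8.2 java/1.8.0_91 gatk/${gatk_version}"

# Assumed resource request: 1 core, 4 hours wall time, 4096 memory units.
lsf_options='-n 1 -W 4:00 -R "rusage[mem=4096]"'

# Assumed GATK 3.x arguments: tool, reference, input BAM, output VCF.
gatk_options='-T HaplotypeCaller -R reference.fasta -I sample.bam -o sample.vcf'

# Same shape as the template: bsub [LSF options] "GATK [GATK options]"
submit_cmd="bsub ${lsf_options} \"GATK ${gatk_options}\""

echo "${module_cmd}"
echo "${submit_cmd}"
```

Printing the commands first makes it easy to review the quoting before running them in a login shell; the inner quotes around the rusage string keep the shell from interpreting the square brackets.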

License information

GATK license

Links

https://software.broadinstitute.org/gatk

https://www.broadinstitute.org/partnerships/education/broade/best-practices-variant-calling-gatk-1
http://www.intel.com/content/www/us/en/healthcare-it/solutions/genomicscode-gatk.html
http://gatkforums.broadinstitute.org/gatk