GATK


Category

Bioinformatics

Description

The GATK is used for identifying SNPs and indels in germline DNA and RNAseq data. Its scope is now expanding to include somatic variant calling tools, and to tackle copy number variation (CNV) and structural variation (SV). In addition to the variant callers themselves, the GATK includes many utilities for related tasks such as processing and quality control of high-throughput sequencing data. These tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but they can be adapted to handle a variety of other technologies and experimental designs. Although it was originally developed for human genetics, the GATK has since evolved to handle genome data from any organism, with any level of ploidy.

Available versions (Euler, old software stack)

Legacy versions Supported versions New versions
3.4.46, 3.5, 3.7, 3.8

Please note that this page refers to installations from the old software stack. There are two software stacks on Euler. Newer versions of software are found in the new software stack.

Environment modules (Euler, old software stack)

Version | Module load command | Additional modules loaded automatically
3.4.46 | module load gcc/4.8.2 java/1.8.0_91 gatk/3.4.46 |
3.5 | module load gcc/4.8.2 java/1.8.0_91 gatk/3.5 |
3.7 | module load gcc/4.8.2 java/1.8.0_91 gatk/3.7 |
3.8 | module load gcc/4.8.2 java/1.8.0_91 gatk/3.8 |


How to submit a job

You can submit a GATK job in batch mode with the following command:
sbatch [Slurm options] --wrap="GATK [GATK options]"
Here you need to replace [GATK options] with the GATK command line options and [Slurm options] with the Slurm parameters for the resource requirements of the job. Documentation on the sbatch parameters is available on the wiki page about the batch system.
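As an illustration, the template above could be filled in as in the following sketch. The resource values and the file names (reference.fasta, sample.bam, sample.vcf) are placeholders chosen for this example, not recommendations, and the GATK options assume a GATK 3.x style HaplotypeCaller run. The snippet only assembles and prints the submission line so that it can be checked before anything is sent to the batch system.

```shell
# Placeholder Slurm resource requests; adjust to the needs of your job
slurm_opts="--time=04:00:00 --mem-per-cpu=8G"

# Placeholder GATK 3.x options; the input and output file names are hypothetical
gatk_opts="-T HaplotypeCaller -R reference.fasta -I sample.bam -o sample.vcf"

# Assemble the submission line following the template on this page
cmd="sbatch ${slurm_opts} --wrap=\"GATK ${gatk_opts}\""

# Print the line for review instead of submitting it
echo "$cmd"
```

After loading the module for the desired version (for example, module load gcc/4.8.2 java/1.8.0_91 gatk/3.8), the printed line can be pasted into the shell to submit the job.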

License information

GATK license

Links

https://software.broadinstitute.org/gatk

https://www.broadinstitute.org/partnerships/education/broade/best-practices-variant-calling-gatk-1
http://www.intel.com/content/www/us/en/healthcare-it/solutions/genomicscode-gatk.html
http://gatkforums.broadinstitute.org/gatk