Galaxy Depot Software Stack

Introduction

The Galaxy depot software stack [1] includes over 85 thousand containers covering many versions of software tools. They are mainly, though not exclusively, related to bioinformatics.

The containers are provided in a ready-to-run state and are optimized for size, which makes them faster to download.

The Singularity containers can be used on Euler either as single tools or within workflows, e.g. with Snakemake.

Single containers

First, fetch the container image from the Galaxy depot:

wget https://depot.galaxyproject.org/singularity/hisat2:2.1.0--py37hc9558a2_4
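
The image URL consists of the depot base URL followed by the tool name and a version tag. As a minimal sketch, assuming this pattern holds for other tools as it does for the images used on this page, a small Python helper can build the URL and the file name under which wget saves the image. The names depot_url and local_filename are hypothetical and not part of any Galaxy tooling.

```python
# Sketch only: the helper names are hypothetical; the URL pattern is taken
# from the example images used on this page.
DEPOT_BASE = "https://depot.galaxyproject.org/singularity"


def depot_url(tool: str, tag: str) -> str:
    """Return the depot URL for a tool name and its version/build tag."""
    return f"{DEPOT_BASE}/{tool}:{tag}"


def local_filename(url: str) -> str:
    """Return the last URL component, the name wget saves the image under."""
    return url.rsplit("/", 1)[-1]


if __name__ == "__main__":
    url = depot_url("hisat2", "2.1.0--py37hc9558a2_4")
    print(url)
    print(local_filename(url))
```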

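Then you can run it with Singularity, passing the path to the image file as the last argument. The example below uses an image that Snakemake has cached under .snakemake/singularity/, and the --bind options make directories on the cluster available inside the container.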

singularity run \
--bind /cluster/scratch/michalo/Anthony_RNA/:/mnt2 \
--bind /cluster/home/michalo/project_michalo/hisat/grch38/:/genomes \
./.snakemake/singularity/b61389370eb9d44658d3a60b6471b2b6.simg
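
To run one specific tool command inside the container rather than the image's default entry point, singularity exec can be used in place of singularity run. The following is a minimal sketch in Python that assembles such a command from the image path and bind directories of the example above. The helper singularity_exec_cmd is hypothetical, and the hisat2 --version call is only an illustration.

```python
import shlex


def singularity_exec_cmd(image, binds, tool_cmd):
    """Build an argv list for `singularity exec` with one --bind per mapping.

    binds maps host directories to mount points inside the container.
    """
    cmd = ["singularity", "exec"]
    for host_dir, container_dir in binds.items():
        cmd += ["--bind", f"{host_dir}:{container_dir}"]
    return cmd + [image] + list(tool_cmd)


if __name__ == "__main__":
    cmd = singularity_exec_cmd(
        image="./.snakemake/singularity/b61389370eb9d44658d3a60b6471b2b6.simg",
        binds={
            "/cluster/scratch/michalo/Anthony_RNA/": "/mnt2",
            "/cluster/home/michalo/project_michalo/hisat/grch38/": "/genomes",
        },
        tool_cmd=["hisat2", "--version"],  # illustrative command only
    )
    # Print a shell-quoted command line; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems if a path contains spaces.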

Snakemake use

An example Snakemake workflow for RNA-seq that uses containers from the Galaxy stack is available at:

https://github.com/michalogit/snake_hisat/blob/master/Snakefile_containers

Each rule gives the location of its container in the singularity directive:

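# CORES and GENOME are string variables that must be defined earlier in the Snakefile.
# /genomes and /mnt2 are the bind targets passed via --singularity-args.

# Map the trimmed reads to the reference genome with HISAT2.
# {input} already contains the trimmed_data/ prefix, so only /mnt2/ is prepended.
rule hisat_map:
    input:
        "trimmed_data/{sample}.fastq.gz"
    output:
        "mapped_reads/{sample}.sam"
    singularity:
        "https://depot.galaxyproject.org/singularity/hisat2:2.1.0--py37hc9558a2_4"
    shell:
        "hisat2 -q -p "+CORES+" -x /genomes/"+GENOME+" -U /mnt2/{input} -S /mnt2/mapped_reads/{wildcards.sample}.sam"

# Convert the SAM alignments to BAM with samtools.
rule samtools_convert:
    input:
        "mapped_reads/{sample}.sam"
    output:
        "mapped_reads/{sample}.bam"
    singularity:
        "https://depot.galaxyproject.org/singularity/samtools:1.9--h91753b0_8"
    shell:
        "samtools view -@ "+CORES+" -bS {input} > {output}"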


You can run the whole Snakemake workflow on Euler as shown below. Each container is downloaded on first use and then reused by Snakemake.

The Singularity arguments define which directories on the cluster are available inside the containers:

--singularity-args "--bind /cluster/scratch/michalo/Anthony_RNA/:/mnt2 --bind /cluster/home/michalo/project_michalo/hisat/grch38/:/genomes"

Then you can run the workflow with Slurm as follows:

snakemake -p -j 999 --use-singularity --cluster-config cluster.json \
--cluster "sbatch --time {cluster.time} -n 1 --cpus-per-task={cluster.n}" \
--singularity-args "--bind /cluster/scratch/michalo/Anthony_RNA/:/mnt2 --bind /cluster/home/michalo/project_michalo/hisat/grch38/:/genomes --bind /cluster/home/michalo/project_michalo/hg38/:/annots"
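
The file passed to --cluster-config supplies the values that replace {cluster.time} and {cluster.n} in the sbatch command. As a hedged sketch, the Python snippet below writes a cluster.json with a __default__ entry and one per-rule override. The time limits and CPU counts are hypothetical placeholders, not values from the example workflow, and the function names are chosen here for illustration.

```python
import json


def build_cluster_config():
    """Return a cluster configuration keyed by rule name.

    "__default__" applies to every rule without its own entry.
    All values are hypothetical placeholders.
    """
    return {
        "__default__": {"time": "04:00:00", "n": 1},
        "hisat_map": {"time": "24:00:00", "n": 8},
    }


def write_cluster_config(path="cluster.json"):
    """Write the configuration as JSON to the given path."""
    with open(path, "w") as handle:
        json.dump(build_cluster_config(), handle, indent=4)


if __name__ == "__main__":
    write_cluster_config()
```

With such a file, jobs for hisat_map would be submitted with --time 24:00:00 and --cpus-per-task=8, while all other rules would use the default values.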