Bioconductor R

Category

Mathematics, Statistics, Bioinformatics

Description

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language, and is open source and open development. It has two releases each year, 1296 software packages, and an active user community.

Available versions (Euler, old software stack)

Legacy versions Supported versions New versions
3.0, 3.4, 3.6

Please note that this page refers to installations from the old software stack. There are two software stacks on Euler. Newer versions of software are found in the new software stack.

Environment modules (Euler, old software stack)

Version   Module load command                      Additional modules loaded automatically
3.0       module load gcc/4.8.2 bioconductor/3.0   openblas/0.2.8_seq
3.4       module load gcc/4.8.2 bioconductor/3.4   openblas/0.2.13_seq bzip2/1.0.6 zlib/1.2.8 xz/5.2.2 pcre/8.38 curl/7.49.1 legacy centos_cruft/6
3.6       module load gcc/4.8.2 bioconductor/3.6   openblas/0.2.13_seq bzip2/1.0.6 zlib/1.2.8 xz/5.2.2 pcre/8.38 curl/7.49.1 legacy centos_cruft/6

Interactive session

After loading the Bioconductor module, you can start an interactive Bioconductor R session by typing the command R:
[sfux@eu-login-03 ~]$ module load gcc/4.8.2 bioconductor/3.0
Using OpenBLAS build of bioconductor R-3.0
[sfux@eu-login-03 ~]$ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-slackware-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
Please use interactive sessions only to debug your R code or to install extension packages. Jobs need to be submitted through the batch system.

How to submit a job

For small tests and for pre- and post-processing with Bioconductor, you can start an interactive R session on the login nodes. All other Bioconductor jobs have to be submitted through the batch system. You can submit a Bioconductor job (inputfile.R) in batch mode with the following command:
sbatch [Slurm options] --wrap="R --vanilla --slave < inputfile.R > outputfile"
Replace [Slurm options] with the Slurm parameters that describe the resource requirements of the job. The parameters of sbatch are documented on the wiki page about the batch system. In this command, stdout is redirected into "outputfile".
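The same submission can also be written as a batch script instead of a --wrap one-liner. The following is a minimal sketch under stated assumptions: the script name submit_bioc.sh is hypothetical, and the resource values (one core, 4 hours, 2048 MB per core) are only illustrative, chosen to mirror the example further down. Adjust them to your job.

```shell
# Write a Slurm batch script that runs inputfile.R non-interactively.
# The script name and the resource values are illustrative assumptions.
cat > submit_bioc.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=04:00:00
#SBATCH --mem-per-cpu=2048

# Load the same modules as in the interactive session
module load gcc/4.8.2 bioconductor/3.0

# Run the R script; stdout is redirected into "outputfile"
R --vanilla --slave < inputfile.R > outputfile
EOF

# On the cluster, the script would then be submitted with:
#   sbatch submit_bioc.sh
echo "wrote submit_bioc.sh"
```

Keeping the options in a script makes the resource request easier to review and reuse than a long command line.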

Example

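As an example of using Bioconductor, we will compare two globally aligned strings and create a consensus matrix. The transcript below was recorded with the LSF commands bsub and bjobs; under Slurm, the same script is submitted with the sbatch command shown above.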
[leonhard@euler03 ~]$ module load gcc/4.8.2 bioconductor/3.0
Using OpenBLAS build of bioconductor R-3.0
[leonhard@euler03 ~]$ cat test.R 
library(Biostrings)
## Compare two globally aligned strings
string1 <- "ACTTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAG"
string2 <- "GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC"
compareStrings(string1, string2)
## Create a consensus matrix
nw1 <-
pairwiseAlignment(AAStringSet(c("HLDNLKGTF", "HVDDMPNAL")), AAString("SMDDTEKMSMKL"),
substitutionMatrix = "BLOSUM50", gapOpening = 3, gapExtension = 1)
consensusMatrix(nw1)
[leonhard@euler03 ~]$ bsub -n 1 -W 4:00 -R "rusage[mem=2048]" "R --vanilla --slave < test.R > test.out"
Generic job.
Job <31331058> is submitted to queue <normal.4h>.
[leonhard@euler03 ~]$ bjobs
JOBID      USER        STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
31331058   leonhard    PEND  normal.4h  euler03                 * test.out Nov  8 13:38
[leonhard@euler03 ~]$ bjobs
JOBID      USER        STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
31331058   leonhard    RUN   normal.4h  euler03     e1442       * test.out Nov  8 13:38
[leonhard@euler03 ~]$ bjobs
No unfinished job found
[leonhard@euler03 ~]$ grep "[1]" lsf.o31331058
[1] "??TTCAC?A??TCC?T???GGTAAGT??AT?---AAA??---AAA???A?A?TTTTCA??"
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
-    0    0    0    0    2    2    2    1    1     0     0     0
A    0    0    0    0    0    0    0    0    0     0     1     0
D    0    0    2    1    0    0    0    0    0     0     0     0
F    0    0    0    0    0    0    0    0    0     0     0     1
K    0    0    0    0    0    0    0    0    0     0     1     0
L    0    1    0    0    0    0    0    0    0     1     0     1
M    0    0    0    0    0    0    0    1    0     0     0     0
N    0    0    0    1    0    0    0    0    0     1     0     0
P    0    0    0    0    0    0    0    0    1     0     0     0
V    0    1    0    0    0    0    0    0    0     0     0     0 
You can find more examples in the reference manuals of the corresponding Bioconductor packages.
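A note on the grep call in the transcript above: the pattern "[1]" is a bracket expression, so it matches every line that contains the character 1, not only lines that start with the R output prefix [1]. This is presumably why the matrix rows are printed as well, and rows that contain no 1 would be dropped. The sketch below contrasts this with grep -F, which matches the literal text. The sample lines are made up for illustration and are not real job output.

```shell
# Sample lines are made up for illustration; they are not real job output.
sample='[1] "ACGT"
A    0    1
C    0    0'

# "[1]" as a regular expression is a bracket expression:
# it matches every line that contains the character 1.
regex_hits=$(printf '%s\n' "$sample" | grep -c "[1]")

# With -F the pattern is a fixed string:
# it matches only lines that contain the literal text [1].
fixed_hits=$(printf '%s\n' "$sample" | grep -cF "[1]")

echo "regex: $regex_hits, fixed: $fixed_hits"
```

To inspect the complete result, it is usually simpler to read the whole output file than to filter it.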

Extensions

Bioconductor is based on R and can therefore be extended with additional packages, which can be downloaded from the Bioconductor web site. To install a package, you first need to start an interactive R session:
module load gcc/4.8.2 bioconductor/3.0
R

Afterwards you need to source the biocLite.R script:

source("http://bioconductor.org/biocLite.R")

As an example, we are installing the Bioconductor package a4:

biocLite("a4")

Since users do not have write permission in the Bioconductor installation directory, you will be asked if you would like to install the package locally:

Would you like to use a personal library instead?  (y/n) y
Would you like to create a personal library
~/R/x86_64-slackware-linux-gnu-library/3.1
to install packages into?  (y/n)
After you confirm the local installation, Bioconductor will download and install the package together with its dependencies. At the end you will be asked whether you would like to update already installed packages for which newer versions exist. Please reply with no, because you do not have write permission in the installation directory.
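The personal library can also be prepared from the shell before starting R. This is a minimal sketch, assuming the library path that R offers in the prompt above. R_LIBS_USER is the standard R environment variable for the personal library; exporting it is optional here and only makes the choice explicit.

```shell
# Personal library path as offered by R in the prompt above
lib="$HOME/R/x86_64-slackware-linux-gnu-library/3.1"

# Create the directory so that R can use it as the personal library
mkdir -p "$lib"

# Optional: state the personal library location explicitly for R
export R_LIBS_USER="$lib"

echo "R_LIBS_USER=$R_LIBS_USER"
```

With the directory in place, packages installed from an interactive session should go into this library rather than the read-only installation directory.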

License information

GPLv2

Links

https://www.bioconductor.org

https://en.wikipedia.org/wiki/Bioconductor
https://twitter.com/bioconductor