Using the CLC genomics service

Introduction

CLC Genomics Server is a software solution for centralized bioinformatics analysis and sharing of data generated from all high-throughput sequencing platforms. It contains the same tools as the CLC Genomics Workbench, such as mapping of reads to a known reference, de novo assembly, and variant calling. With a single click within CLC Genomics Workbench, resource-demanding tasks that could not be analysed in a desktop computer environment can be offloaded to an HPC cluster.

Account request

The CLC genomics server uses local accounts for authentication. If you would like to use this service, then please contact cluster support to request your CLC account.

Workbench versions compatible with genomics server

  • 11.0.0

Requirements for using the CLC genomics server

To use the CLC genomics server installation on Euler, certain requirements need to be fulfilled. First, you need an installation of the CLC genomics workbench client on your local computer. The client software is provided by IDES. Furthermore, you need to install the CLC workbench client plugin, which is used for communication between the CLC genomics workbench and the CLC genomics server.

  1. Start the CLC genomics workbench (on Windows computers you have to start it as administrator, i.e., right-click the CLC genomics icon and choose run as administrator)
  2. Click on the Plug-in button
  3. Click on the Download Plug-ins tab, choose the CLC workbench client plugin, and click on Download and Install

Figures: Plug-in button; Download and install plug-in.

As a last requirement, you need a CLC genomics server account. To request one, please contact cluster-support@id.ethz.ch

Login to the CLC genomics server from the CLC genomics workbench client

An SSH tunnel is no longer required to connect the CLC genomics workbench client to the CLC genomics server. The CLC genomics server on Euler runs in a virtual machine, and clients can connect to this virtual machine directly.

Connecting the client to the server

  1. Open the CLC genomics workbench client (at first, only the local data is shown in the menu at the top left)
  2. Open the File menu and click on the entry CLC Server Login
  3. Enter the username and password of your CLC genomics server account
  4. Click on "Advanced" and enter clc01.hpc-lca.ethz.ch as server host and 7777 as server port. Then click on the Login button
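
If the login fails, it can help to check first whether the server port is reachable from your machine at all. The following is a minimal sketch, not part of the CLC software: it only tries to open a TCP connection to the host and port named in step 4. The function name `can_reach` and the timeout value are illustrative assumptions.

```python
import socket

# Host and port as given in the login steps above.
CLC_HOST = "clc01.hpc-lca.ethz.ch"
CLC_PORT = 7777


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This checks network reachability only; it does not log in and
    does not verify that a CLC server is answering on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Calling `can_reach(CLC_HOST, CLC_PORT)` from a Python prompt returns `False` when the port cannot be reached, for example because of a firewall or because you are outside the network from which the server is accessible. A result of `True` only means the port is open, so a failed login after that is more likely an account issue.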

After the login procedure, the server data locations are displayed in the Navigation Area menu. When connected to the CLC genomics server, you will be able to see all server data locations (the folders with a blue dot next to them) but not their content. You will only be able to see and use the content of your own data location, unless you explicitly ask us to change the permissions because you would like to share data with other users.

Figures: Local data locations; Login option in the "File" menu; Enter username, password, server host and port; Server data locations.

Connecting to the server via the web interface

The CLC genomics server provides a web interface that allows users to connect to the server via their browser. It supports user-oriented tasks such as browsing data, uploading and downloading data, viewing and editing metadata, and running data queries.

  1. Open a web browser
  2. Enter clc01.hpc-lca.ethz.ch:7777 in the address field of your browser
  3. Enter your ETH username and password

Figures: Login screen of the web interface; Browsing data in the web interface.

Data management

User data that is to be processed by the CLC genomics server installation on Euler first needs to be imported into the server. For this purpose, we attach a server data location (one folder) to each CLC genomics server account that is created on Euler. Unless a user owns permanent space on Euler, the server data locations are considered scratch space that can be used for temporary storage of data and will be purged on a regular basis. After the jobs have finished, the results should be copied back to a local machine or any other storage location. Please note that there is no backup for these data sets.

In general, there are two different ways of importing data to a server data location. On the one hand, the data can be imported directly into the CLC genomics workbench client and then moved to the server data location by drag-and-drop within the client. To do this, click on a file in the local CLC data location and move it to the server data location that is attached to your CLC genomics server account. On the other hand, NAS shares from the IT services storage group can be mounted; this has been tested on Euler and should work.

Submitting jobs from the CLC genomics workbench client

As an example of how to submit a job from the CLC genomics workbench client to the Euler cluster, we use a BLAST search. All other tasks that can be carried out with the CLC genomics workbench client work the same way. There is a single difference between a CLC job on Euler and a local run using the CLC genomics workbench client: you need to choose the grid option instead of workbench, and in the next step you can choose a queue.

For CLC on Euler, we have several queues that range from 1 to 24 cores. Please be aware that not all applications of the CLC genomics server can make use of multiple cores. Only choose a queue with more than 1 core if the application you would like to use is listed here. Otherwise, please choose the 1 core queue. The list is based on the genomics server manual (http://resources.qiagenbioinformatics.com/manuals/clcserver/800/admin/User_Manual.pdf):

  • Basic Variant Detection
  • BLAST
  • Create Alignment
  • Create Detailed Mapping Report
  • Create Sequencing QC Report
  • De Novo Assembly
  • Extract and Count
  • Fixed Ploidy Variant Detection
  • K-mer Based Tree Construction
  • Large Gap Read Mapper (currently in beta, part of the Transcript Discovery plug-in)
  • Local Realignment
  • Low Frequency Variant Detection
  • Map Reads to Contigs
  • Map Reads to Reference
  • Maximum Likelihood Phylogeny
  • Model Testing
  • Probabilistic Variant Detection (legacy)
  • Quality-based Variant Detection (legacy)
  • RNA-Seq Analysis
  • Trim Sequences

When setting up a BLAST search, you can set in the workbench how many threads should be used. Please set this to 12 when using the 12 core queue.
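
The queue rule above can be written down as a small lookup. The following is a minimal sketch for illustration only, not something the CLC software provides: the tool names are copied from the list above (without the remarks in parentheses), the 1 to 24 range is the one stated on this page, and the function name `recommended_cores` is a hypothetical helper. The page does not list the individual queue sizes, so the function only checks the range.

```python
# Tools listed on this page as able to use multiple cores.
MULTICORE_TOOLS = {
    "Basic Variant Detection",
    "BLAST",
    "Create Alignment",
    "Create Detailed Mapping Report",
    "Create Sequencing QC Report",
    "De Novo Assembly",
    "Extract and Count",
    "Fixed Ploidy Variant Detection",
    "K-mer Based Tree Construction",
    "Large Gap Read Mapper",
    "Local Realignment",
    "Low Frequency Variant Detection",
    "Map Reads to Contigs",
    "Map Reads to Reference",
    "Maximum Likelihood Phylogeny",
    "Model Testing",
    "Probabilistic Variant Detection",
    "Quality-based Variant Detection",
    "RNA-Seq Analysis",
    "Trim Sequences",
}

# Case-insensitive lookup table built once from the names above.
_MULTICORE_KEYS = {name.casefold() for name in MULTICORE_TOOLS}


def recommended_cores(tool: str, queue_cores: int) -> int:
    """Return the core count to request for a tool.

    Listed tools keep the requested queue size; any other tool
    falls back to the 1 core queue.
    """
    if not 1 <= queue_cores <= 24:
        raise ValueError("queues on this page range from 1 to 24 cores")
    if tool.strip().casefold() in _MULTICORE_KEYS:
        return queue_cores
    return 1
```

For example, `recommended_cores("BLAST", 12)` returns 12, which is also the thread count to set in the workbench for that job, while a tool that is not in the list returns 1 regardless of the requested queue.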

Figures: Click on data in server location and an application; Choose CLC Server option; Job is submitted to cluster; Job is queued; Job is running; Job has finished, data can be copied back.

Local BLAST Searches

Euler provides a local BLAST database, which is currently static; in the future it will be updated once a week from the NCBI reference. A local BLAST search is much faster than BLAST requests sent to the NCBI. At a later stage of the project, users will also be able to provide their own databases in addition to the BLAST ones.

Reference Genomes

Information about available reference genomes

Tutorials

Please find below a list of tutorials for the CLC Genomics Workbench provided by Qiagen. You can find the links to the data required for the tutorials inside the PDF documents.

Description | Document link
An introduction to workflows | Workflow-intro.pdf
Assemble sequences to a reference | Assemble_sequences.pdf
BLAST searches | BLAST_tips.pdf
Phylogenetic trees and metadata | Phylogenetic_trees.pdf
Molecular biology basics | Getting_started_mol_bio.pdf
Gateway cloning | Gateway_cloning-1.pdf
Folding RNA molecules | Simple_RNA_folding.pdf
Bisulfite sequencing (requires Bisulfite Sequencing plugin) | Bisulfite_Sequencing.pdf
ChIP sequencing (requires Histone ChIP-Seq or Transcript Discovery (Beta) plugin) | ChIP-seq_peakshape.pdf
Visualize variants on protein structure | visualize_variant_on_structure.pdf
Small RNA analysis | Small_RNA_analysis_Illumina.pdf
Resequencing analysis using tracks | Resequencing-and-tracks-chrM.pdf
Resequencing – map reads to reference and variant detection | Resequencing.pdf
Reference genome and annotation tracks | Reference_genome_tracks.pdf
Read mapping in detail | Read_mapping_in_detail.pdf
Comparative analysis of three bovine genomes | Comparative_analysis_of_three_bovine_genomes.pdf
De novo assembly and BLAST | De_novo_assembly_and_BLAST.pdf
De novo assembly of paired data | De_novo_assembly_paired_data.pdf
Microarray-based expression analysis | Expression_analysis.pdf
Expression analysis with the Advanced RNA-Seq plugin (requires Advanced RNA-Seq plugin) | RNASeq-droso.pdf
Whole metagenome functional analysis (beta) (requires CLC Microbial Genomics Module and MetaGeneMark plugin) | Microbial_Analysis_Functional.pdf
Typing and epidemiological clustering of common pathogens (beta) (requires CLC Microbial Genomics Module plugin) | Typing_Epidemiological_Clustering.pdf
OTU clustering and analysis of microbial communities (requires CLC Microbial Genomics Module plugin) | OTU_Clustering_Microbial_Analysis.pdf
Microbiome profiling using workflows (requires CLC Microbial Genomics Module plugin) | OTU_Clustering_Microbial_Analysis_Quickguide.pdf

Documentation on the CLC Genomics Workbench

CLC Bio provides a variety of documentation and tutorials to help users get started: