Using the CLC genomics service
- 1 Introduction
- 2 Account request
- 3 Workbench versions compatible with genomics server
- 4 Requirements for using the CLC genomics server
- 5 Login to the CLC genomics server from the CLC genomics workbench client
- 6 Data management
- 7 Submitting jobs from the CLC genomics workbench client
- 8 Local BLAST Searches
- 9 Reference Genomes
- 10 Tutorials
- 11 Documentation on the CLC Genomics Workbench
CLC Genomics Server is a software solution for centralized bioinformatics analysis and sharing of data generated from all High-Throughput Sequencing platforms. It contains the same tools as the CLC Genomics Workbench, such as mapping of reads to a known reference, de novo assembly, and variant calling. With a single click within the CLC Genomics Workbench, resource-demanding tasks that would not be possible to analyse in a desktop computer environment can be offloaded to an HPC cluster. Please find further information about the CLC genomics workbench on the SharePoint page of D-BIOL.
Workbench versions compatible with genomics server
Requirements for using the CLC genomics server
To use the CLC genomics server installation on Euler, certain requirements need to be fulfilled. First of all, you need an installation of the CLC genomics workbench client on your local computer. The client software is provided by IDES (www.ides.ethz.ch). Furthermore, you need to install the CLC workbench client plugin, which is used for the communication between the CLC genomics workbench and the CLC genomics server.
- Start the CLC genomics workbench (on Windows computers you have to start it as administrator, i.e., right-click the CLC genomics icon and choose run as administrator)
- Click on the Plug-in button
- Click on the Download Plug-ins tab, choose the CLC workbench client plugin, and click on Download and Install
As a last requirement you need to request a CLC genomics server account. For requesting an account, please contact firstname.lastname@example.org
Login to the CLC genomics server from the CLC genomics workbench client
For connecting the CLC genomics workbench client to the CLC genomics server, an SSH-tunnel is no longer required. The CLC genomics server on Euler is running in a virtual machine and the clients can directly connect to this virtual machine.
Connecting the client to the server
- Open the CLC genomics workbench client (first only the local data is shown in the menu at the top left)
- Open the File menu and click on the entry CLC Server Login
- Enter the username and password of your CLC genomics server account
- Click on "Advanced" and enter clc01.hpc-lca.ethz.ch as server host and 7777 as server port. Then click on the Login button
After the login procedure, the server data locations are displayed in the Navigation Area menu. When connected to the CLC genomics server, you will be able to see all server data locations (the folders with a blue dot next to them) but not their content. You will only be able to see and use the content of your own data location, unless you explicitly ask us to change the permissions so that you can share data with other users.
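If the login fails, it can help to first check that the server port is reachable from your machine. The following is a minimal Python sketch and not part of the CLC software. The function name `can_reach` and the default timeout are illustrative assumptions; the host and port are the ones given in the login steps. It only tests whether a TCP connection can be opened and says nothing about whether your account credentials are valid.

```python
import socket

CLC_HOST = "clc01.hpc-lca.ethz.ch"  # server host from the login steps
CLC_PORT = 7777                      # server port from the login steps


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for troubleshooting; the timeout value is an
    assumption, not a documented setting.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

Calling `can_reach(CLC_HOST, CLC_PORT)` returns `False` when the host cannot be resolved or the port is blocked, which would point to a network problem rather than a problem with the account.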
Connecting to the server via the web interface
The CLC genomics server provides a web interface that allows users to connect to the server via their browser. It supports user-oriented tasks such as browsing data, uploading and downloading data, accessing and editing metadata, and running data queries.
- Open a web browser
- Enter clc01.hpc-lca.ethz.ch:7777 in the address field of your browser
- Enter your NETHZ username and password
Data management
The user data that is processed by the CLC genomics server installation on Euler first needs to be imported into the server. Therefore we attach a server data location (one folder) to each CLC genomics server account that is created on Euler. Unless a user owns some permanent space in Euler, the server data locations are considered scratch space that can be used for temporary storage of data and will be purged on a regular basis. After the jobs have finished, the results should be copied back to a local machine or any other storage location. Please note that there is no backup for these data sets.
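Because the server data locations are purged and not backed up, it is worth verifying any copy you keep elsewhere. The following Python sketch assumes that the results have already been exported from the workbench into a local folder. The function names and the folder layout are hypothetical and not part of the CLC software. It copies every file to a second location and compares SHA-256 checksums of source and copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir: Path, backup_dir: Path) -> list[Path]:
    """Copy all files under source_dir to backup_dir and verify each copy.

    Returns the list of copied files; raises OSError on a checksum mismatch.
    Both directory arguments are assumptions chosen by the caller.
    """
    copied = []
    # Collect the file list first so new files created during copying are ignored.
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        dst = backup_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        if sha256_of(src) != sha256_of(dst):
            raise OSError(f"Checksum mismatch for {src}")
        copied.append(dst)
    return copied
```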
In general there are two different ways of importing data to a server data location. On the one hand, the data can be imported directly into the CLC genomics workbench client and then be moved to the server data location by drag-and-drop within the client: click on a file in the local CLC data location and drag it to the server data location that is attached to your CLC genomics server account. Mounting NAS shares from the IT services storage group has been tested on Euler and should work.
Submitting jobs from the CLC genomics workbench client
To demonstrate how to submit a job from the CLC genomics workbench client to the Euler cluster, we use a BLAST search as an example. All other tasks that can be carried out with the CLC genomics workbench client work the same way. The only difference between a CLC job on Euler and a local run using the CLC genomics workbench client is that you need to choose the grid option instead of workbench. In the next step, you can then choose a queue.
For CLC on Euler, we have several queues that range from 1 to 24 cores. Please be aware that not all applications of the CLC genomics server can make use of multiple cores. Only choose a queue with more than 1 core if the application you would like to use is listed here. Otherwise, please choose the 1-core queue. The list is based on the genomics server manual (http://resources.qiagenbioinformatics.com/manuals/clcserver/800/admin/User_Manual.pdf):
- Basic Variant Detection
- Create Alignment
- Create Detailed Mapping Report
- Create Sequencing QC Report
- De Novo Assembly
- Extract and Count
- Fixed Ploidy Variant Detection
- K-mer Based Tree Construction
- Large Gap Read Mapper (currently in beta, part of the Transcript Discovery plug-in)
- Local Realignment
- Low Frequency Variant Detection
- Map Reads to Contigs
- Map Reads to Reference
- Maximum Likelihood Phylogeny
- Model Testing
- Probabilistic Variant Detection (legacy)
- Quality-based Variant Detection (legacy)
- RNA-Seq Analysis
- Trim Sequences
When setting up a BLAST search, you can set in the workbench how many threads should be used. Please set this to 12 when using the 12-core queue.
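The queue rule above can be written down as a small lookup. The following Python sketch is only an illustration and not part of the CLC software: the function name `cores_to_request` is hypothetical, and the tool names are copied from the list in this section. Since the text only states that the queues range from 1 to 24 cores, the helper checks that range and does not know which core counts actually have a queue. BLAST is not in the list because its thread count is set separately in the workbench, as described above.

```python
# Tools that can use multiple cores, copied from the list in this section.
MULTICORE_TOOLS = frozenset({
    "Basic Variant Detection",
    "Create Alignment",
    "Create Detailed Mapping Report",
    "Create Sequencing QC Report",
    "De Novo Assembly",
    "Extract and Count",
    "Fixed Ploidy Variant Detection",
    "K-mer Based Tree Construction",
    "Large Gap Read Mapper",
    "Local Realignment",
    "Low Frequency Variant Detection",
    "Map Reads to Contigs",
    "Map Reads to Reference",
    "Maximum Likelihood Phylogeny",
    "Model Testing",
    "Probabilistic Variant Detection (legacy)",
    "Quality-based Variant Detection (legacy)",
    "RNA-Seq Analysis",
    "Trim Sequences",
})

MAX_QUEUE_CORES = 24  # the queues range from 1 to 24 cores


def cores_to_request(tool: str, wanted_cores: int) -> int:
    """Return wanted_cores for a listed tool, otherwise 1.

    Hypothetical helper: it encodes the rule "use a multi-core queue only
    for listed tools" and nothing else.
    """
    if not 1 <= wanted_cores <= MAX_QUEUE_CORES:
        raise ValueError(f"wanted_cores must be between 1 and {MAX_QUEUE_CORES}")
    return wanted_cores if tool in MULTICORE_TOOLS else 1
```

For example, `cores_to_request("De Novo Assembly", 12)` keeps the 12 cores, while a tool that is not in the list falls back to the 1-core queue.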
Local BLAST Searches
Euler provides a local BLAST database, which is currently static but will in the future be updated once a week from the NCBI reference. A local BLAST search is much faster than BLAST requests sent to the NCBI. At a later stage of the project, users will also be able to provide their own databases in addition to the BLAST ones.
Tutorials
Please find below a list of tutorials for the CLC Genomics Workbench provided by Qiagen. You can find the links to the data required for the tutorials inside the PDF documents.
|Tutorial|PDF document|
|An introduction to workflows|Workflow-intro.pdf|
|Assemble sequences to a reference|Assemble_sequences.pdf|
|Phylogenetic trees and metadata|Phylogenetic_trees.pdf|
|Molecular biology basics|Getting_started_mol_bio.pdf|
|Folding RNA molecules|Simple_RNA_folding.pdf|
|Bisulfite sequencing (requires Bisulfite Sequencing plugin)|Bisulfite_Sequencing.pdf|
|ChIP sequencing (requires Histone ChIP-Seq or Transcript Discovery (Beta) plugin)|ChIP-seq_peakshape.pdf|
|Visualize variants on protein structure|visualize_variant_on_structure.pdf|
|Small RNA analysis|Small_RNA_analysis_Illumina.pdf|
|Resequencing analysis using tracks|Resequencing-and-tracks-chrM.pdf|
|Resequencing – map reads to reference and variant detection|Resequencing.pdf|
|Reference genome and annotation tracks|Reference_genome_tracks.pdf|
|Read mapping in detail|Read_mapping_in_detail.pdf|
|Comparative analysis of three bovine genomes|Comparative_analysis_of_three_bovine_genomes.pdf|
|De novo assembly and BLAST|De_novo_assembly_and_BLAST.pdf|
|De novo assembly of paired data|De_novo_assembly_paired_data.pdf|
|Microarray-based expression analysis|Expression_analysis.pdf|
|Expression analysis with the Advanced RNA-Seq plugin (requires Advanced RNA-Seq plugin)|RNASeq-droso.pdf|
|Whole metagenome functional analysis (beta) (requires CLC Microbial Genomics Module and MetaGeneMark plugin)|Microbial_Analysis_Functional.pdf|
|Typing and epidemiological clustering of common pathogens (beta) (requires CLC Microbial Genomics Module plugin)|Typing_Epidemiological_Clustering.pdf|
|OTU clustering and analysis of microbial communities (requires CLC Microbial Genomics Module plugin)|OTU_Clustering_Microbial_Analysis.pdf|
|Microbiome profiling using workflows (requires CLC Microbial Genomics Module plugin)|OTU_Clustering_Microbial_Analysis_Quickguide.pdf|
Documentation on the CLC Genomics Workbench
CLC Bio provides a variety of documentation and tutorials to help users get started: