CLC genomics server

From ScientificComputing
Revision as of 08:34, 15 August 2016 by Sfux

CLC Genomics Server 7.5.1 (for further information, see http://www.clcbio.com/products/clc-genomics-server) is installed on the Euler cluster.

Introduction

The CLC Genomics Workbench (http://www.clcbio.com/products/clc-genomics-workbench/) is a next-generation sequencing solution that provides numerous features in the fields of genomics, transcriptomics and epigenomics, and additionally includes all features of the CLC Main Workbench. Further information about the CLC Genomics Workbench is provided on the D-BIOL SharePoint page.

The CLC Genomics Workbench can be used as a stand-alone application, but it may reach its limits for calculations that require larger amounts of computational resources. Therefore, CLC Bio provides the CLC Genomics Server, which allows users to offload resource-demanding tasks from their CLC Genomics Workbench clients to the server installation on Euler. The jobs are submitted from the CLC Genomics Workbench and processed on the Euler cluster.

Workbench Versions Compatible with the CLC Genomics Server on Euler

  • 8.5.x
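The entry "8.5.x" means any 8.5 patch release of the Workbench. As a small illustration of that rule, here is a Python sketch; the helper `is_compatible_workbench` is hypothetical and not part of any CLC tool, and it assumes version strings of the form "major.minor[.patch]".

```python
# Hypothetical helper (not part of the CLC software): checks whether a
# CLC Genomics Workbench version string belongs to the "8.5.x" series
# listed as compatible with the CLC Genomics Server on Euler.
COMPATIBLE_SERIES = [(8, 5)]  # (major, minor) pairs from the list above


def is_compatible_workbench(version: str) -> bool:
    """Return True if `version` (e.g. "8.5.1") is in a compatible series."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        return False  # need at least "major.minor"
    try:
        major, minor = int(parts[0]), int(parts[1])
    except ValueError:
        return False  # non-numeric version components
    return (major, minor) in COMPATIBLE_SERIES
```

For example, `is_compatible_workbench("8.5.1")` is `True`, while `"8.0.3"` or `"9.5.2"` are rejected.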

Requirements for Using the CLC Genomics Server Installation on Euler

To use the CLC Genomics Server installation on Euler, certain requirements must be fulfilled. First, the CLC Genomics Workbench client must be installed on your local computer. The client software is provided by IDES (www.ides.ethz.ch). Furthermore, you need to install the CLC Workbench Client Plugin, which handles the communication between the CLC Genomics Workbench and the CLC Genomics Server.

  1. Start the CLC Genomics Workbench (on Windows computers, you have to start it as administrator: right-click the CLC Workbench icon and choose "Run as administrator")
  2. Click on the Plug-in button
  3. Click on the Download Plug-ins tab, choose the CLC Workbench Client Plugin and click on Download and Install
[Screenshots: Plug-in button; Download and install plug-in]

Finally, you need a CLC Genomics Server account to use the CLC Genomics Server installation on the Euler cluster. To request an account, please contact cluster-support@id.ethz.ch

Login to the CLC Genomics Server from the CLC Genomics Workbench Client

To connect the CLC Genomics Workbench client to the CLC Genomics Server, an SSH tunnel is no longer required. The CLC Genomics Server on Euler runs in a virtual machine, and the clients can connect directly to this virtual machine.

Connecting the Client to the Server

  1. Open the CLC Genomics Workbench client (initially, only the local data is shown in the Navigation Area at the top left)
  2. Open the File menu and click on the entry CLC Server Login
  3. Enter the username and password of your CLC Genomics Server account
  4. Click on "Advanced" and enter clc01.hpc-lca.ethz.ch as the server host and 7777 as the server port. Then click on the Login button
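If the login fails, it can help to check whether the server host and port from step 4 are reachable from your machine at all. The following Python sketch is not part of the CLC software; the helper name `server_reachable` is hypothetical, and a successful TCP connection only shows that the port answers, not that your account or password is valid.

```python
import socket

# Server host and port as entered in step 4 above.
CLC_HOST = "clc01.hpc-lca.ethz.ch"
CLC_PORT = 7777


def server_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical diagnostic helper: it does not log in to the CLC server.
    """
    try:
        # create_connection resolves the host name and tries each address in turn;
        # the with-block closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False
```

For example, `server_reachable(CLC_HOST, CLC_PORT)` returns `False` if the host name cannot be resolved, the connection is refused or the attempt times out.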

After logging in, the server data locations are displayed in the Navigation Area. While connected to the CLC Genomics Server, you can see all server data locations (the folders marked with a blue dot) but not their content. You can only see and use the content of your own data location, unless you ask us to change the permissions because you would like to share data with other users.

[Screenshots: local data locations; login option in the "File" menu; entering username, password, server host and port; server data locations]

Connecting to the Server via the Web Interface

The CLC Genomics Server provides a web interface that allows users to connect to the server via their browser. It supports more user-oriented tasks such as browsing data, uploading and downloading data, viewing and editing metadata, and running data queries.

  1. Open a web browser
  2. Enter clc01.hpc-lca.ethz.ch:7777 in the address field of your browser
  3. Enter your NETHZ username and password
[Screenshots: login screen of the web interface; browsing data in the web interface]

Data Management

Data sets that you want to use with the CLC Genomics Server installation on Euler must first be imported to the server. For this purpose, a server data location (one folder) is attached to each CLC Genomics Server account created on Euler. Unless a user owns permanent storage space on Euler, the server data locations are considered scratch space: they can be used for temporary storage and are purged on a regular basis. After your jobs have finished, copy the results back to a local machine or another storage location. Please note that these data sets are not backed up.

In general, there are two ways of importing data into a server data location. First, the data can be imported directly into the CLC Genomics Workbench client and then moved to the server data location by drag-and-drop within the client: click on a file in the local CLC data location and drag it to the server data location attached to your CLC Genomics Server account. Mounting NAS shares from the IT Services storage group has been tested on Euler and should work.

Submitting Jobs from the CLC Genomics Workbench Client to Euler

As an example of how to submit a job from the CLC Genomics Workbench client to the Euler cluster, we use a BLAST search. All other tasks that can be run from the CLC Genomics Workbench client work the same way. The only difference compared with running a job locally in the CLC Genomics Workbench client is that you choose the Grid option instead of Workbench and then select a queue.

For CLC on Euler, there are several queues ranging from 1 to 24 cores. Please be aware that not all applications of the CLC Genomics Server can make use of multiple cores. Only choose a queue with more than one core if the application you would like to use is listed below; otherwise, please choose the 1-core queue:

  • Trim Sequences
  • Create Alignment
  • Map Reads to Reference
  • De Novo Assembly
  • RNA-Seq Analysis
  • Probabilistic Variant Detection
  • Create Sequencing QC Report
  • Create Detailed Mapping Report
  • BLAST
  • Large Gap Read Mapper (currently in beta, part of the Transcript Discovery plug-in)

When setting up a BLAST search, you can specify in the Workbench how many threads should be used. Please set this to 12 when using the 12-core queue.
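The queue rule above (multi-core queues only for the listed applications, otherwise the 1-core queue) can be written down as a small decision helper. This Python sketch is illustrative only: `choose_queue_cores` is a hypothetical name, and the queue sizes in `QUEUE_CORES` are an assumption, since the text only names a 1-core and a 12-core queue and states that queues go up to 24 cores.

```python
# Applications from the list above that can make use of multiple cores.
MULTICORE_APPS = {
    "Trim Sequences",
    "Create Alignment",
    "Map Reads to Reference",
    "De Novo Assembly",
    "RNA-Seq Analysis",
    "Probabilistic Variant Detection",
    "Create Sequencing QC Report",
    "Create Detailed Mapping Report",
    "BLAST",
    "Large Gap Read Mapper",
}

# ASSUMPTION: illustrative queue sizes; the actual set of queues on Euler may differ.
QUEUE_CORES = (1, 12, 24)


def choose_queue_cores(application: str, wanted_cores: int) -> int:
    """Return the core count of the queue to select for `application` (hypothetical helper)."""
    if application not in MULTICORE_APPS or wanted_cores <= 1:
        return 1  # single-core applications belong on the 1-core queue
    for cores in QUEUE_CORES:  # sizes are sorted ascending
        if cores >= wanted_cores:
            return cores  # smallest queue offering at least the wanted cores
    return QUEUE_CORES[-1]  # cap at the largest queue
```

Rounding up to the smallest sufficient queue is a design choice for this sketch, not a documented policy. For BLAST, the thread count in the Workbench would then be set to the returned value, e.g. 12 for the 12-core queue as stated above.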


[Screenshots: selecting data in a server location and an application; choosing the CLC Server option; job submitted to the cluster; job queued; job running; job finished, data can be copied back]

Local BLAST Searches

Euler provides a local BLAST database, which is currently static but will in future be updated once a week from the NCBI reference. Local BLAST searches are much faster than BLAST requests sent to NCBI. At a later stage of the project, users will also be able to provide their own databases in addition to the provided BLAST databases.

Documentation and Tutorials on the CLC Genomics Workbench

CLC Bio provides a variety of documentation and tutorials to help users get started: