Difference between revisions of "CLC genomics server"

==Introduction ==
 
 
 
The CLC genomics workbench (http://www.clcbio.com/products/clc-genomics-workbench/) is a '''next-generation sequencing solution''' that provides numerous features for genomics, transcriptomics and epigenomics, and additionally includes all features of the CLC main workbench. Further information about the CLC genomics workbench is available on the [https://sharepoint.biol.ethz.ch/it/clc/SitePages/Home.aspx sharepoint] page of D-BIOL.
 
  
 
You can use the CLC genomics workbench as a stand-alone application, but for calculations that require larger amounts of computational resources, it may reach its limits. Qiagen therefore provides the CLC genomics server, which allows you to '''offload''' your '''resource-demanding tasks''' from the CLC genomics workbench '''to the genomics server''', which runs on the Euler cluster.
 
 
==Workbench versions compatible with genomics server==
 
 
*'''8.5.x'''
 
 
==Requirements for using the CLC genomics server==
 
 
To use the CLC genomics server installation on Euler, certain requirements need to be fulfilled. First of all, you need an '''installation of the CLC genomics workbench client''' on your local computer. The client software is provided by IDES (www.ides.ethz.ch). Furthermore, you need to install the '''CLC workbench client plugin''', which handles the communication between the CLC genomics workbench and the CLC genomics server.
 
 
# Start the CLC genomics workbench (on Windows computers you have to start it as '''administrator''', i.e., right-click the CLC genomics icon and choose '''run as administrator''')
 
# Click on the '''Plug-in''' button
 
# Click on the '''Download Plug-ins''' tab, choose the '''CLC workbench client plugin''' and click on '''Download and Install'''
 
 
{|
 
|-
 
|[[Image:Eulerclcplugin1.png|thumb|370px|Plug-in button]]
 
|[[Image:Eulerclcplugin2.png|thumb|400px|Download and install plug-in]]
 
 
|}
 
 
As a last requirement, you need a '''CLC genomics server account'''. To request an account, please contact cluster-support@id.ethz.ch.
 
 
== Login to the CLC genomics server from the CLC genomics workbench client ==
 
 
An SSH tunnel is no longer required to connect the CLC genomics workbench client to the CLC genomics server. The CLC genomics server on Euler runs in a virtual machine, and the clients can connect directly to this virtual machine.
 
 
===Connecting the client to the server===
 
 
# Open the CLC genomics workbench client (first only the '''local data''' is shown in the menu at the top left)
 
# Open the '''File''' menu and click on the entry '''CLC Server Login'''
 
# Enter the '''username''' and '''password''' of your CLC genomics server account
 
# Click on "Advanced" and enter '''clc01.hpc-lca.ethz.ch''' as server host and '''7777''' as server port. Then click on the '''Login''' button
 
 
After the login procedure, the server data locations are displayed in the '''Navigation Area''' menu. When connected to the CLC genomics server, you will be able to see all server data locations (the folders with a blue dot next to them) but not their content. You will only be able to see and use the content of your own data location, unless you explicitly ask us to change the permissions because you would like to share data with other users.
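
If the login fails, it can help to check first whether the server host and port given above are reachable from your computer at all. The following is a minimal sketch using only the Python standard library; the function name <code>can_connect</code> and the default timeout are illustrative assumptions and not part of the CLC software. A successful TCP connection only shows that the port is open; it says nothing about whether your account or the client plugin is set up correctly.

```python
import socket

# Host and port as given in the login instructions above.
CLC_HOST = "clc01.hpc-lca.ethz.ch"
CLC_PORT = 7777


def can_connect(host: str = CLC_HOST, port: int = CLC_PORT,
                timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it does not log in to the
    CLC genomics server, it only tests network reachability.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket;
        # the "with" block closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Calling <code>can_connect()</code> without arguments tests the Euler CLC server; a result of <code>False</code> from outside the ETH network may simply mean that the host is not reachable from there.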
 
 
{|
 
|-
 
|[[Image:Clcwiki1.png|thumb|360px|Local data locations]]
 
|[[Image:Clcwiki2.png|thumb|360px|Login option in the "File" menu]]
 
|-
 
|[[Image:Clcwiki3.png|thumb|360px|Enter username, password, server host and port]]
 
|[[Image:Clcwiki4.png|thumb|360px|Server data locations]]
 
|}
 
 
===Connecting to the server via the web interface===
 
 
The CLC genomics server provides a web interface that lets users connect to the server from a browser. It supports user-oriented tasks such as browsing data, uploading and downloading data, viewing and editing metadata, and running data queries.
 
 
# Open a web browser
 
# Enter '''clc01.hpc-lca.ethz.ch:7777''' in the address field of your browser
 
# Enter your NETHZ username and password
 
 
{|
 
|-
 
|[[Image:Clcweb1.png|thumb|360px|Login screen of the web interface]]
 
|[[Image:Clcweb2.png|thumb|360px|Browsing data in the web interface]]
 
|-
 
|}
 
 
==Data management==
 
 
The '''user data''' that is processed by the CLC genomics server installation on Euler first needs to be imported into the server. For this purpose, we attach a '''server data location''' (one folder) to each CLC genomics server account that is created on Euler. Unless a user owns permanent space on Euler, the server data locations are considered scratch space that can be used for temporary storage of data and will be purged on a regular basis. After the jobs have finished, the results should be copied back to a local machine or to another storage location. Please note that there is no backup for these data sets.
 
 
In general, there are two different ways of importing data to a server data location. On the one hand, the '''data can be imported directly into the CLC genomics workbench client''' and then moved to the server data location by drag-and-drop within the client. For this, click on a file in the local CLC data location and move it to the server data location that is attached to your CLC genomics server account. On the other hand, NAS shares from the IT services storage group can be mounted; this has been tested on Euler and should work.
 
 
==Submitting jobs from the CLC genomics workbench client==
 
 
To demonstrate how to submit a job from the CLC genomics workbench client to the Euler cluster, we use a BLAST search as an example. All other tasks that can be run with the CLC genomics workbench client work the same way. '''There is a single difference when comparing a CLC job on Euler with a local run using the CLC genomics workbench client'''. You need to '''choose the grid option instead of workbench''', and in the next step you can choose a queue.
 
 
For CLC on Euler, we have several queues that range from 1 to 24 cores. Please '''be aware that not all of the applications of the CLC genomics server can make use of multiple cores'''. <font color="red">Only choose a queue with more than 1 core if the application you would like to use is listed here. Otherwise, please choose the 1 core queue</font>:
 
 
*Trim Sequences
 
*Create Alignment
 
*Map Reads to Reference
 
*De Novo Assembly
 
*RNA-Seq Analysis
 
*Probabilistic Variant Detection
 
*Create Sequencing QC Report
 
*Create Detailed Mapping Report
 
*BLAST
 
*Large Gap Read Mapper (currently in beta, part of the Transcript Discovery plug-in)
 
'''When setting up a BLAST search, you can set in the workbench how many threads should be used. Please set this to 12 when using the 12 core queue.'''
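
The queue rule above can be summarised as a small decision function. This is only a sketch of the rule as stated on this page: the names <code>MULTICORE_TOOLS</code> and <code>cores_to_request</code> are hypothetical, the tool names are copied from the list above, and the actual queue sizes offered on Euler (apart from the 1 to 24 core range) are not encoded here.

```python
# Tools listed above as able to use multiple cores on the CLC genomics server.
MULTICORE_TOOLS = frozenset(
    name.lower()
    for name in (
        "Trim Sequences",
        "Create Alignment",
        "Map Reads to Reference",
        "De Novo Assembly",
        "RNA-Seq Analysis",
        "Probabilistic Variant Detection",
        "Create Sequencing QC Report",
        "Create Detailed Mapping Report",
        "BLAST",
        "Large Gap Read Mapper",
    )
)

MAX_CORES = 24  # largest queue size mentioned above


def cores_to_request(tool: str, wanted_cores: int) -> int:
    """Return the number of cores to choose for a tool (hypothetical helper).

    Tools on the multi-core list may use the requested number of cores;
    every other tool should go to the 1 core queue.
    """
    if not 1 <= wanted_cores <= MAX_CORES:
        raise ValueError(f"cores must be between 1 and {MAX_CORES}")
    if tool.strip().lower() in MULTICORE_TOOLS:
        return wanted_cores
    return 1
```

For example, <code>cores_to_request("BLAST", 12)</code> returns 12, in which case the BLAST thread option in the workbench should also be set to 12, while a tool that is not on the list gets 1 regardless of the requested value.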
 
 
 
 
{|
 
|-
 
|[[Image:Eulerclcsubjob1.png|thumb|360px|Click on data in server location and an application]]
 
|[[Image:Eulerclcsubjob2.png|thumb|360px|Choose '''CLC Server''' option]]
 
|-
 
|[[Image:Eulerclcsubjob3.png|thumb|360px|Job is submitted to cluster]]
 
|[[Image:Eulerclcsubjob4.png|thumb|360px|Job is queued]]
 
|-
 
|[[Image:Eulerclcsubjob5.png|thumb|360px|Job is running]]
 
|[[Image:Eulerclcsubjob6.png|thumb|360px|Job has finished, data can be copied back]]
 
|}
 
 
==Local BLAST Searches==
 
Euler provides a local BLAST database, which is currently static; in the future it will be updated once a week from the NCBI reference. A local BLAST search is much faster than BLAST requests sent to the NCBI. At a later stage of the project, users will also be able to provide their own databases in addition to the BLAST ones.
 
 
==Documentation and Tutorials on the CLC Genomics Workbench==
 
 
CLC Bio provides a variety of documentation and tutorials to help users get started:
 
* [http://www.clcbio.com/products/clc-genomics-workbench Main page]
 
* User manual, both [http://www.clcsupport.com/clcgenomicsworkbench/current online] and in [http://www.clcbio.com/files/usermanuals/CLC_Genomics_Workbench_User_Manual.pdf PDF format]
 
* [http://helpdesk.clcbio.com/index.php?pg=kb.book&id=7 FAQ]
 
* [http://www.clcbio.com/desktop-applications/top-features Features of the Genomics Workbench]
 
* [http://www.clcbio.com/support/tutorials Tutorials]
 
* [http://www.clcbio.tv/channel/629214 Video tutorials]
 

Revision as of 09:02, 18 August 2016
