AlphaFold2

Load modules

AlphaFold2 (https://github.com/deepmind/alphafold) is installed in the new software stack and can be loaded as follows.

$ env2lmod
$ module load gcc/6.3.0 openmpi/4.0.2 alphafold/2.1.1
Now run 'alphafold_init' to initialize the virtual environment

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0

$ alphafold_init
(venv_alphafold) [jarunanp@eu-login-18 ~]$ 
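
To check that the environment is set up as expected, you can list the currently loaded modules and the Python interpreter picked up by the virtual environment (standard Lmod and shell commands; no output is shown here because it depends on your session):

$ module list
$ which python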

Databases

The AlphaFold databases have a total size of 2.2 TB when unzipped. Users can download the databases to $SCRATCH. However, if there are several AlphaFold users in your group, institute or department, we recommend using group storage.

For D-BIOL members, the AlphaFold databases are currently located at /cluster/work/biol/alphafold.
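
If you use such a shared copy of the databases, you do not need to download anything yourself; it is enough to point the job script in the "Submit a job" section below to that location. A minimal sketch, assuming the D-BIOL path above:

# In the job script below, replace the $SCRATCH path with the shared location
DATA_DIR="/cluster/work/biol/alphafold"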

Download the AlphaFold databases to your $SCRATCH

  • Download and install aria2c in your $HOME
$ cd $HOME
$ wget https://github.com/aria2/aria2/releases/download/release-1.36.0/aria2-1.36.0.tar.gz
$ tar xvzf aria2-1.36.0.tar.gz
$ cd aria2-1.36.0
$ module load gcc/6.3.0 gnutls/3.5.13 openssl/1.0.1e
$ ./configure --prefix=$HOME/.local
$ make
$ make install
$ export PATH="$HOME/.local/bin:$PATH"
$ which aria2c
~/.local/bin/aria2c
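
The PATH export above only lasts for the current shell session. If you want aria2c to be found automatically in future sessions as well, you can append the same export to your ~/.bashrc (a common shell convention, independent of the cluster setup):

$ echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
$ source ~/.bashrc
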
  • Check if you have enough space in your $SCRATCH. You may need to free up space if there is not enough (see the note after the quota output below).
$ lquota
+-----------------------------+-------------+------------------+------------------+------------------+
| Storage location:           | Quota type: | Used:            | Soft quota:      | Hard quota:      |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/home/jarunanp      | space       |         10.38 GB |         17.18 GB |         21.47 GB |
| /cluster/home/jarunanp      | files       |            85658 |           160000 |           200000 |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/shadow             | space       |         16.38 kB |          2.15 GB |          2.15 GB |
| /cluster/shadow             | files       |                7 |            50000 |            50000 |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/scratch/jarunanp   | space       |          2.42 TB |          2.50 TB |          2.70 TB |
| /cluster/scratch/jarunanp   | files       |           201844 |          1000000 |          1500000 |
+-----------------------------+-------------+------------------+------------------+------------------+
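
If the output shows that you are close to the quota, you can check which directories in your $SCRATCH take up the most space before deleting or archiving data (standard du/sort usage; adjust the path as needed):

$ cd $SCRATCH
$ du -sh * | sort -h
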
  • Create a folder for the databases
$ cd $SCRATCH
$ mkdir alphafold_databases
  • Download the databases: you can call a script that downloads all of the databases, or call a separate script for each individual database. These scripts are located in the directory $ALPHAFOLD_ROOT/scripts/.
$ bsub -W 24:00 "$ALPHAFOLD_ROOT/scripts/download_all_data.sh $SCRATCH/alphafold_databases"
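
The command above downloads all databases in a single job. If you prefer to download them individually, list the scripts in $ALPHAFOLD_ROOT/scripts/ and submit one of them in the same way; the script name below is only an illustration, so check the directory listing for the exact names. The download job can be monitored with the usual LSF commands:

$ ls $ALPHAFOLD_ROOT/scripts/
$ bsub -W 24:00 "$ALPHAFOLD_ROOT/scripts/download_pdb70.sh $SCRATCH/alphafold_databases"
$ bjobs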

Submit a job

Here is an example of a job submission script which requests 12 CPU cores, a total of 120 GB of memory, a total of 120 GB of local scratch space, and one GPU.

#!/usr/bin/bash
#BSUB -n 12                                                    # 12 CPU cores
#BSUB -W 24:00                                                 # 24 h run time limit
#BSUB -R "rusage[mem=10000, scratch=10000, ngpus_excl_p=1]"    # per-core memory and scratch (12 x 10000 MB = 120 GB total each) and one GPU
#BSUB -J alphafold                                             # job name

source /cluster/apps/local/env2lmod.sh
module load gcc/6.3.0 openmpi/4.0.2 alphafold/2.1.1
source /cluster/apps/nss/alphafold/venv_alphafold/bin/activate

# Define paths to databases
DATA_DIR="/cluster/scratch/jarunanp/21_10_alphafold_databases"

python /cluster/apps/nss/alphafold/alphafold-2.1.1/run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$TMPDIR \
--max_template_date="2021-12-06" \
--bfd_database_path=$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniref90_database_path=$DATA_DIR/uniref90/uniref90.fasta \
--uniclust30_database_path=$DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--mgnify_database_path=$DATA_DIR/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path=$DATA_DIR/pdb70/pdb70 \
--template_mmcif_dir=$DATA_DIR/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=$DATA_DIR/pdb_mmcif/obsolete.dat \
--fasta_paths=ubiquitin.fasta

# Copy the results from the compute node
mkdir -p output
cp -r $TMPDIR/* output
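
To run this example, save the script to a file, e.g. run_alphafold.bsub (the file name is just a placeholder), place ubiquitin.fasta in the same directory, and submit the script as the standard input of bsub; bjobs then shows the status of the job:

$ bsub < run_alphafold.bsub
$ bjobs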

A small example such as ubiquitin.fasta took around 40 minutes to finish with the databases stored on $SCRATCH. The screen output is saved in an output file whose name starts with lsf.o followed by the job ID, e.g., lsf.o195525946.
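
Once the job has finished, the predicted structures are in the copied output directory. The listing below is only a sketch: AlphaFold writes its results into a subdirectory named after the input target, and the exact file names (e.g. the ranked PDB models) may vary between versions:

$ ls output/ubiquitin/
$ head output/ubiquitin/ranked_0.pdb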
