AlphaFold2
Load modules
AlphaFold2 is installed in the new software stack and can be loaded as follows.
$ env2lmod
$ module load gcc/6.3.0 openmpi/4.0.2 alphafold/2.1.1

Now run 'alphafold_init' to initialize the virtual environment

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0

$ alphafold_init
(venv_alphafold) [jarunanp@eu-login-18 ~]$
Databases
The AlphaFold databases have a total size of 2.2 TB when unzipped. Users can download the databases to $SCRATCH. However, if there are several users of AlphaFold in your group, institute or department, we recommend using group storage instead.
For D-BIOL members, the AlphaFold databases are currently located at /cluster/work/biol/alphafold.
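If you use this shared copy, there is no need to download the databases yourself; a minimal sketch, assuming the shared directory follows the same layout as the download scripts produce, is to point the DATA_DIR variable of the job script below at it:

DATA_DIR="/cluster/work/biol/alphafold"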
Download the AlphaFold databases to your $SCRATCH
- Download and install aria2c in your $HOME
$ cd $HOME
$ wget https://github.com/aria2/aria2/releases/download/release-1.36.0/aria2-1.36.0.tar.gz
$ tar xvzf aria2-1.36.0.tar.gz
$ cd aria2-1.36.0
$ module load gcc/6.3.0 gnutls/3.5.13 openssl/1.0.1e
$ ./configure --prefix=$HOME/.local
$ make
$ make install
$ export PATH="$HOME/.local/bin:$PATH"
$ which aria2c
~/.local/bin/aria2c
- Check if you have enough space in your $SCRATCH. If there is not enough, you may need to free up space first.
$ lquota
+-----------------------------+-------------+------------------+------------------+------------------+
| Storage location:           | Quota type: | Used:            | Soft quota:      | Hard quota:      |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/home/jarunanp      | space       |         10.38 GB |         17.18 GB |         21.47 GB |
| /cluster/home/jarunanp      | files       |            85658 |           160000 |           200000 |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/shadow             | space       |         16.38 kB |          2.15 GB |          2.15 GB |
| /cluster/shadow             | files       |                7 |            50000 |            50000 |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/scratch/jarunanp   | space       |          2.42 TB |          2.50 TB |          2.70 TB |
| /cluster/scratch/jarunanp   | files       |           201844 |          1000000 |          1500000 |
+-----------------------------+-------------+------------------+------------------+------------------+
- Create a folder for the databases
$ cd $SCRATCH
$ mkdir alphafold_databases
- Download the databases: you can call a single script to download all the databases, or call a separate script for each database. These scripts are located in $ALPHAFOLD_ROOT/scripts/.
$ bsub -W 24:00 "$ALPHAFOLD_ROOT/scripts/download_all_data.sh $SCRATCH/alphafold_databases"
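The per-database scripts rely on aria2c as well and take the same target directory. For example, to download only the PDB70 database (script name as shipped in the upstream AlphaFold repository):

$ bsub -W 24:00 "$ALPHAFOLD_ROOT/scripts/download_pdb70.sh $SCRATCH/alphafold_databases"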
Submit a job
Here is an example of a job submission script which requests 12 CPU cores, 120 GB of memory in total (10 GB per core), 120 GB of local scratch space in total, and one GPU.
#!/usr/bin/bash
#BSUB -n 12
#BSUB -W 24:00
#BSUB -R "rusage[mem=10000, scratch=10000, ngpus_excl_p=1]"
#BSUB -J alphafold

source /cluster/apps/local/env2lmod.sh
module load gcc/6.3.0 openmpi/4.0.2 alphafold/2.1.1
source /cluster/apps/nss/alphafold/venv_alphafold/bin/activate

# Define paths to databases
DATA_DIR="/cluster/scratch/jarunanp/21_10_alphafold_databases"

python /cluster/apps/nss/alphafold/alphafold-2.1.1/run_alphafold.py \
    --data_dir=$DATA_DIR \
    --output_dir=$TMPDIR \
    --max_template_date="2021-12-06" \
    --bfd_database_path=$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniref90_database_path=$DATA_DIR/uniref90/uniref90.fasta \
    --uniclust30_database_path=$DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --mgnify_database_path=$DATA_DIR/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path=$DATA_DIR/pdb70/pdb70 \
    --template_mmcif_dir=$DATA_DIR/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=$DATA_DIR/pdb_mmcif/obsolete.dat \
    --fasta_paths=ubiquitin.fasta

# Copy the results from the compute node
mkdir -p output
cp -r $TMPDIR/* output
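Assuming the script above is saved as run_alphafold.bsub (the file name is only an example) and that ubiquitin.fasta is in the current directory, the job can be submitted with:

$ bsub < run_alphafold.bsub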
A small example such as ubiquitin.fasta took around 40 minutes to finish with the databases stored on $SCRATCH. The screen output is saved in an output file whose name starts with lsf.o followed by the job ID, e.g., lsf.o195525946.
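While the job is running, you can check its status with bjobs; once it has finished, the screen output can be inspected with standard tools, e.g. (job ID hypothetical):

$ bjobs
$ cat lsf.o195525946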