AlphaFold2
AlphaFold2 predicts a protein's 3D structure from its amino acid sequence with an accuracy that is competitive with experimental results. This AI-powered structure prediction was recognized as the scientific breakthrough of the year 2021. The AlphaFold package is installed in the new software stack on Euler.
Changelog
23/07/2024 - AlphaPulldown is available on Euler as a container in /cluster/apps/nss/alphafold/containers/AlphaPulldown
15/04/2024 - AlphaFold 2.3.2 is available on Euler as a container in /cluster/apps/nss/alphafold/alphafold-2.3.2
12/09/2023 - Branch for the new script using AlphaFold 2.3.1 merged with main branch and available for all users
3/08/2023 - Uniref90 has been updated
25/07/2023 - New branch of the alphafold helper script (https://gitlab.ethz.ch/sis/alphafold_on_euler) is currently being tested. This branch uses AlphaFold 2.3.1 and is fully migrated to SLURM.
20/07/2023 - Updated bfd, mgnify, pdb, uniprot and uniref30 databases. Uniref90 is in the process of being updated
17/07/2023 - AlphaFold 2.3.1 is available on Euler. Release notes are available here
Create a job script
A job script is a BASH script containing commands to request computing resources, set up the computing environment, run the application and retrieve the results. You can generate this script by using our custom script available here.
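For orientation, a job script of this kind has roughly the shape sketched below. This is a hand-written illustration, not the file written by the setup script. The resource values are copied from the Ubiquitin estimate shown further down this page. The job name, the mapping of "Number of CPUs" to --cpus-per-task, the variable names and the placeholder for the AlphaFold command are all assumptions.

```shell
#!/bin/bash
# Sketch of a SLURM job script; the real script generated on Euler may differ.
# Resource requests (values taken from the Ubiquitin estimate on this page):
#SBATCH --job-name=alphafold_sketch
#SBATCH --time=04:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=30G
#SBATCH --gpus=1
#SBATCH --tmp=120G
#SBATCH --output=Ubiquitin.out
#SBATCH --error=Ubiquitin.err

# Set up the environment (hypothetical variable names).
FASTA_FILE="Ubiquitin.fasta"
PROTEIN_NAME="$(basename "$FASTA_FILE" .fasta)"
WORKDIR="${SCRATCH:-$PWD}"   # fall back to the current directory outside Euler

echo "Predicting $PROTEIN_NAME, results expected in $WORKDIR/$PROTEIN_NAME"

# Run the application: the containerised AlphaFold command written by the
# setup script belongs here. It is deliberately left out of this sketch.

# Retrieve the results: copy anything written to node-local scratch back to
# $WORKDIR before the job ends.
```

The requested GPU memory is not included above because the option for it is site-specific; use the line written by the setup script.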
Setup script
This setup script creates a job script with estimated computing resources that depend on the input protein sequence. To download the setup script:
git clone https://gitlab.ethz.ch/sis/alphafold_on_euler.git
This script uses a containerised version of AlphaFold 2.3.2. If older versions of AlphaFold are needed, please contact cluster support. Here is a quick rundown on how to use the script :
git clone https://gitlab.ethz.ch/sis/alphafold_on_euler
cd ./alphafold_on_euler/setup_run_script_AF2.3.2
module load stack/2024-06 gcc/12.2.0 python/3.11.6 #any reasonably recent version of python 3 would do
To display the full list of options :
python generate_SLURM_script.py --help
General usage :
python generate_SLURM_script.py -f [path to FASTA file] -o [output/working directory] -s [your share for GPU usage]
A simple example :
[nmarounina@eu-login-43 setup_run_script_container]$ python generate_SLURM_script.py -f ../fastafiles/Ubiquitin.fasta -o /cluster/scratch/nmarounina -s es_hpc -c 8
Estimate required resources, please adjust as needed in the final script:
Run time: 04:00:00 (hh:mm:ss)
Number of CPUs: 8
CPU memory per CPU: 30 (GB)
Number of GPUs: 1
Total GPU memory: 11 (GB)
Total scratch space: 120 (GB)
Output directory of the script : /cluster/scratch/nmarounina
/cluster/scratch/nmarounina/Ubiquitin.sbatch
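When several sequences need a job script, the general usage above can be repeated once per FASTA file. The sketch below only prints the commands so they can be reviewed before running them. The function name and its arguments are hypothetical; only the -f, -o and -s options shown above are used.

```shell
# Hypothetical helper: print one generate_SLURM_script.py command per FASTA file.
# Arguments: directory with *.fasta files, output/working directory, GPU share.
# Paths containing spaces are not quoted in the printed commands.
print_generate_commands() {
    fasta_dir="$1"
    out_dir="$2"
    share="$3"
    for fasta in "$fasta_dir"/*.fasta; do
        # Skip the unexpanded pattern when the directory has no FASTA files.
        [ -e "$fasta" ] || continue
        printf 'python generate_SLURM_script.py -f %s -o %s -s %s\n' \
            "$fasta" "$out_dir" "$share"
    done
}

# Example call (share name taken from the example above):
# print_generate_commands ../fastafiles "$SCRATCH" es_hpc
```

Printing rather than executing keeps the helper safe to try on a login node; the resource estimates of each generated script should still be checked individually.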
For SLURM, submit the script with the command
$ sbatch < run_alphafold.sbatch
Databases
The AlphaFold databases are available for all cluster users at /cluster/project/alphafold.
If you wish to download the databases separately, see the instructions here.
Expected outputs
The Ubiquitin FASTA file is provided with the AlphaFold setup script. It can be used to test AlphaFold2 on Euler. If the working directory is on $SCRATCH, a successful run should complete in about 40 minutes (depending on the type of resources allocated by the batch system) and generate the following files :
Ubiquitin.done
Ubiquitin.out
Ubiquitin.err
Ubiquitin
├── features.pkl
├── msas
│   ├── bfd_uniclust_hits.a3m
│   ├── mgnify_hits.sto
│   ├── pdb_hits.hhr
│   └── uniref90_hits.sto
├── ranked_0.pdb
├── ranked_1.pdb
├── ranked_2.pdb
├── ranked_3.pdb
├── ranked_4.pdb
├── ranking_debug.json
├── relaxed_model_1_pred_0.pdb
├── relaxed_model_2_pred_0.pdb
├── relaxed_model_3_pred_0.pdb
├── relaxed_model_4_pred_0.pdb
├── relaxed_model_5_pred_0.pdb
├── result_model_1_pred_0.pkl
├── result_model_2_pred_0.pkl
├── result_model_3_pred_0.pkl
├── result_model_4_pred_0.pkl
├── result_model_5_pred_0.pkl
├── timings.json
├── unrelaxed_model_1_pred_0.pdb
├── unrelaxed_model_2_pred_0.pdb
├── unrelaxed_model_3_pred_0.pdb
├── unrelaxed_model_4_pred_0.pdb
└── unrelaxed_model_5_pred_0.pdb
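A quick way to confirm that a run produced the files listed above is to test for each of them. The function below is a minimal sketch with a hypothetical name. It checks only the top-level files from the listing (not the msas subdirectory) and assumes the default five models with one prediction each.

```shell
# Hypothetical helper: report files missing from an AlphaFold output directory.
# Returns 0 when every expected file is present, non-zero otherwise.
check_alphafold_outputs() {
    run_dir="$1"
    missing=0
    expected="features.pkl ranking_debug.json timings.json"
    for i in 0 1 2 3 4; do
        expected="$expected ranked_${i}.pdb"
    done
    for i in 1 2 3 4 5; do
        expected="$expected unrelaxed_model_${i}_pred_0.pdb"
        expected="$expected relaxed_model_${i}_pred_0.pdb"
        expected="$expected result_model_${i}_pred_0.pkl"
    done
    for f in $expected; do
        if [ ! -f "$run_dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing in $run_dir"
    [ "$missing" -eq 0 ]
}

# Example call for the Ubiquitin test run:
# check_alphafold_outputs "$SCRATCH/Ubiquitin"
```

If files are reported missing, Ubiquitin.err and Ubiquitin.out are the first places to look.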
Further reading
- DeepMind Blog post: "AlphaFold: a solution to a 50-year-old grand challenge in biology"
- ETH News: "Computer algorithms are currently revolutionising biology"
- AlphaFold2 presentation slides 21 March 2022
- AlphaFold2 workshop 17 January 2024
- Downloading AlphaFold databases and benchmark results