AlphaFold2
Latest revision as of 10:36, 28 September 2023
AlphaFold2 predicts a protein's 3D structure from its amino acid sequence with accuracy competitive with experimental results. This AI-powered structure prediction has been recognized as the scientific breakthrough of the year 2021. The AlphaFold package is installed in the new software stack on Euler.
Changelog
12/09/2023 - Branch for the new script using AlphaFold 2.3.1 merged with main branch and available for all users
3/08/2023 - Uniref90 has been updated
25/07/2023 - A new branch of the alphafold helper script (https://gitlab.ethz.ch/sis/alphafold_on_euler) is currently being tested. This branch uses AlphaFold 2.3.1 and is fully migrated to SLURM.
20/07/2023 - Updated bfd, mgnify, pdb, uniprot and uniref30 databases. Uniref90 is in the process of being updated
17/07/2023 - AlphaFold 2.3.1 is available on Euler. Release notes are available here
Create a job script
A job script is a BASH script containing the commands to request computing resources, set up the computing environment, run the application and retrieve the results. Below is a breakdown of a typical job script for AlphaFold2 on Euler. Note that you can generate such a script with our custom setup script, available here.
Request computing resources
AlphaFold2 can run with CPUs only, or with CPUs and GPUs, which speeds up the computation significantly. Here we request 8 CPU cores, 240 GB of memory in total, 120 GB of local scratch space and one GPU. Your SLURM script should start with the shebang #!/usr/bin/bash, followed by #SBATCH pragmas that specify, line by line, which resources you would like to request for your AlphaFold run:
#!/usr/bin/bash
#SBATCH -n 8                   # Number of CPUs
#SBATCH --time=24:00:00        # Runtime
#SBATCH --mem-per-cpu=30000    # Memory per CPU core, in MB
#SBATCH --nodes=1              # All CPUs on the same node
#SBATCH -G 1                   # Number of GPUs
#SBATCH --gres=gpumem:10240    # GPU memory, in MB
#SBATCH --tmp=120000           # Total local scratch space, in MB
#SBATCH -A es_share            # Shareholder group name
#SBATCH -J alphafold           # Job name
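The per-core memory request multiplies out to the 240 GB total mentioned above; a quick shell sanity check (the values are copied from the pragmas, the variable names are illustrative):

```shell
# Total memory requested = number of cores x memory per core (MB)
cores=8
mem_per_cpu_mb=30000
total_mb=$((cores * mem_per_cpu_mb))
echo "total: ${total_mb} MB"   # prints: total: 240000 MB, i.e. 240 GB
```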
Set up a computing environment for AlphaFold
source /cluster/apps/local/env2lmod.sh
module load gcc/6.3.0 openmpi/4.0.2 alphafold/2.3.1
source /cluster/apps/nss/alphafold/venv_alphafold_2.3.1/bin/activate
Enable Unified Memory (if needed)
If the input protein sequence is too large to fit in the memory of a single GPU (roughly longer than 1500 amino acids), enable Unified Memory to bridge system memory to GPU memory, so that the memory of a single GPU can be oversubscribed.
export TF_FORCE_UNIFIED_MEMORY=1
export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0"
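With a GPU memory request of 10240 MB and a client memory fraction of 4.0, the process may address up to four times the physical GPU memory through Unified Memory; a quick check of that product (values taken from this example, bash needs awk here because it has no floating-point arithmetic):

```shell
# Addressable GPU memory = requested GPU memory x XLA client fraction
gpu_mem_mb=10240
fraction=4.0
awk -v m="$gpu_mem_mb" -v f="$fraction" 'BEGIN { printf "%d MB addressable\n", m * f }'
```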
Define paths
# Define paths to databases, FASTA file and output directory
DATA_DIR="/cluster/project/alphafold"          # Path to the AlphaFold databases on the cluster
FASTA_DIR="/cluster/home/jarunanp/fastafiles"  # Path to where the FASTA file is stored
OUTPUT_DIR=${TMPDIR}/output                    # Path to the immediate output of the run (local scratch in the generated script)
For the output directory, there are two options.
- Use $SCRATCH (max 2.7TB), $HOME (max. 20GB) or group storage (/cluster/project or /cluster/work), e.g.,
OUTPUT_DIR=${SCRATCH}/protein_name/output
- Use the local scratch as the output directory. To do so, request the scratch space with #SBATCH options (in this example, 120 GB of local scratch space in total via the --tmp option). At the end of the computation, don't forget to copy the results from there.
OUTPUT_DIR=${TMPDIR}/output
...
python /path/run_alphafold.py ...
...
cp -r ${TMPDIR}/output /to/desired/location
or
rsync -av $TMPDIR/output/ /to/desired/location
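A common pitfall when copying results off the local scratch is ending up with an extra nested directory at the destination. The following self-contained sketch (temporary directories stand in for $TMPDIR and the destination, and the file is a dummy) shows one way to copy the contents of output/:

```shell
src=$(mktemp -d)    # stands in for $TMPDIR
dst=$(mktemp -d)    # stands in for the destination directory
mkdir "$src/output"
echo "ranked_0.pdb" > "$src/output/listing.txt"

# "output/." copies the contents of output/, not the directory itself
cp -r "$src/output/." "$dst/"

cat "$dst/listing.txt"    # prints: ranked_0.pdb
rm -rf "$src" "$dst"
```

With rsync, a trailing slash on the source (`$TMPDIR/output/`) achieves the same contents-only copy.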
Call Python run script
python /cluster/apps/nss/alphafold/alphafold-2.1.1/run_alphafold.py \
    --data_dir=$DATA_DIR \
    --output_dir=$OUTPUT_DIR \
    --max_template_date="2021-12-06" \
    --bfd_database_path=$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniref90_database_path=$DATA_DIR/uniref90/uniref90.fasta \
    --uniclust30_database_path=$DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --mgnify_database_path=$DATA_DIR/mgnify/mgy_clusters_2018_12.fa \
    --template_mmcif_dir=$DATA_DIR/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=$DATA_DIR/pdb_mmcif/obsolete.dat \
Then, define the input FASTA file, select the model preset (monomer or multimer) and set the paths to the structure databases accordingly.
- For a monomeric protein
--fasta_paths=$FASTA_DIR/some_protein.fasta \
--model_preset=monomer \
--pdb70_database_path=$DATA_DIR/pdb70/pdb70
- For a multimeric protein
--fasta_paths=$FASTA_DIR/some_complicated_protein.fasta \
--model_preset=multimer \
--pdb_seqres_database_path=$DATA_DIR/pdb_seqres/pdb_seqres.txt \
--uniprot_database_path=$DATA_DIR/uniprot/uniprot.fasta
Enable relaxation on GPU (version >= 2.1.2)
From version 2.1.2 onward, relaxation can be run on the GPU with the option --use_gpu_relax. Under SLURM, the default GPU computing mode allows creating multiple contexts, so no further setup is needed.
--use_gpu_relax=1
Submit a job
For SLURM, submit a job with the command
$ sbatch < run_alphafold.sbatch
The screen output will be saved in the slurm-JobID.out file, e.g. slurm-3435300.out, unless other names for the standard output/error files have been defined with #SBATCH pragmas at the beginning of the script.
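In a wrapper script it can be handy to derive the log file name from the confirmation message sbatch prints; a minimal sketch (the message string is a hard-coded example here, not live sbatch output):

```shell
# sbatch prints "Submitted batch job <JobID>"; the JobID is the last word
submit_msg="Submitted batch job 3435300"
jobid="${submit_msg##* }"          # strip everything up to the last space
echo "slurm-${jobid}.out"          # prints: slurm-3435300.out
```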
From our benchmark, it took around 40 minutes to fold Ubiquitin[76aa] and 2.5 hours to fold T1050[779aa].
Setup script
This setup script creates a job script with estimated computing resources depending on the input protein sequence. To download the setup script:
git clone https://gitlab.ethz.ch/sis/alphafold_on_euler.git
Usage:
./setup_alphafold_run_script.sh -f [Fasta file] -w [work directory] --max_template_date yyyy-mm-dd -b [LSF/SLURM]
Example:
$ ./setup_alphafold_run_script.sh -f ../../fastafiles/IFGSC_6mer.fasta -w $SCRATCH
Reading /cluster/home/jarunanp/alphafold_run/fastafiles/IFGSC_6mer.fasta
Protein name: IFGSC_6mer
Number of sequences: 6
Protein type: multimer
Number of amino acids:
    sum: 1246
    max: 242
Estimate required resources:
    Run time: 24:00
    Number of CPUs: 12
    Total CPU memory: 120000
    Number of GPUs: 1
    Total GPU memory: 20480
    Total scratch space: 120000
Output an LSF run script for AlphaFold2: /cluster/scratch/jarunanp/run_alphafold.bsub
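The sequence statistics in this report (number of sequences, amino-acid counts) can be reproduced with standard tools; a minimal sketch using a small inline FASTA file for illustration:

```shell
# Build a tiny two-sequence FASTA file for demonstration
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>chainA
MQIFVKTLTG
>chainB
KTITLEVEPSDT
EOF

nseq=$(grep -c '^>' "$fasta")   # sequences = number of header lines
naa=$(grep -v '^>' "$fasta" | awk '{ total += length($0) } END { print total }')

echo "sequences=${nseq} residues=${naa}"   # prints: sequences=2 residues=22
rm -f "$fasta"
```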
For SLURM, submit the script with the command
$ sbatch < run_alphafold.sbatch
Postprocessing
Plots similar to those generated by the ColabFold Jupyter notebook can be created with the alphafold-postprocessing Python script, which is available on Euler as a module:
module load gcc/6.3.0 alphafold-postprocessing
postprocessing.py -o plots/ work_directory/
The above command will process pkl files generated by alphafold in the folder work_directory/ and put the resulting plots into a folder plots/.
The postprocessing is integrated in the setup script described above.
Databases
The AlphaFold databases are available for all cluster users at /cluster/project/alphafold.
If you wish to download the databases separately, see the instructions here.
Example
The Ubiquitin FASTA file is provided with the AlphaFold setup script and can be used to test AlphaFold2 on Euler. If the working directory is on $SCRATCH, a successful run completes in ~40 min (depending on the resources allocated by the batch system) and generates the following files:
Ubiquitin.done
Ubiquitin.out
Ubiquitin.err
Ubiquitin
├── features.pkl
├── msas
│   ├── bfd_uniclust_hits.a3m
│   ├── mgnify_hits.sto
│   ├── pdb_hits.hhr
│   └── uniref90_hits.sto
├── ranked_0.pdb
├── ranked_1.pdb
├── ranked_2.pdb
├── ranked_3.pdb
├── ranked_4.pdb
├── ranking_debug.json
├── relaxed_model_1_pred_0.pdb
├── relaxed_model_2_pred_0.pdb
├── relaxed_model_3_pred_0.pdb
├── relaxed_model_4_pred_0.pdb
├── relaxed_model_5_pred_0.pdb
├── result_model_1_pred_0.pkl
├── result_model_2_pred_0.pkl
├── result_model_3_pred_0.pkl
├── result_model_4_pred_0.pkl
├── result_model_5_pred_0.pkl
├── timings.json
├── unrelaxed_model_1_pred_0.pdb
├── unrelaxed_model_2_pred_0.pdb
├── unrelaxed_model_3_pred_0.pdb
├── unrelaxed_model_4_pred_0.pdb
└── unrelaxed_model_5_pred_0.pdb
Further readings
- DeepMind Blog post: "AlphaFold: a solution to a 50-year-old grand challenge in biology"
- ETH News: "Computer algorithms are currently revolutionising biology"
- AlphaFold2 presentation slides 21 March 2022
- Downloading AlphaFold databases and benchmark results