AlphaFold2
Revision as of 09:34, 22 March 2022
AlphaFold2 predicts a protein's 3D structure from its amino acid sequence with an accuracy that is competitive with experimental results. This AI-powered structure prediction was recognized as the scientific breakthrough of the year 2021. The AlphaFold package is now installed in the new software stack on Euler.
Load modules
The AlphaFold module can be loaded as follows.
$ env2lmod
$ module load gcc/6.3.0 openmpi/4.0.2 alphafold/2.1.1
Now run 'alphafold_init' to initialize the virtual environment

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0

$ alphafold_init
(venv_alphafold) [jarunanp@eu-login-18 ~]$
Databases
The AlphaFold databases are available for all cluster users at /cluster/project/alphafold.
If you wish to download the databases separately, see the instructions on the page "Downloading Alphafold databases".
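Before submitting a job, it can be useful to confirm that the database directories are visible from your account. The following is a minimal sketch, not an official tool: the function name check_alphafold_dbs is hypothetical, and the list of subdirectories is an assumption taken from the paths used in the job script on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected database subdirectories exist.
# The subdirectory names are assumed from the paths in the job script on this page.
check_alphafold_dbs() {
    data_dir="$1"
    missing=0
    for sub in bfd uniref90 uniclust30 mgnify pdb70 pdb_mmcif; do
        if [ -d "$data_dir/$sub" ]; then
            echo "found:   $data_dir/$sub"
        else
            echo "missing: $data_dir/$sub"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every expected directory was found
    [ "$missing" -eq 0 ]
}

# Example usage on Euler:
# check_alphafold_dbs /cluster/project/alphafold
```

The function returns a non-zero exit status if any directory is missing, so it can be used as a guard at the top of a job script.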
Submit a job
AlphaFold2 can run on CPUs only, or on CPUs and GPUs; using GPUs speeds up the computation significantly.
Here is an example of a job submission script (run_alphafold.bsub) which requests 12 CPU cores, a total of 120 GB of memory, a total of 120 GB of local scratch space, and one GPU. The job folds the monomeric protein ubiquitin (76 aa).
#!/usr/bin/bash
#BSUB -n 12
#BSUB -W 4:00
#BSUB -R "rusage[mem=10000, scratch=10000, ngpus_excl_p=1]"
#BSUB -J alphafold

source /cluster/apps/local/env2lmod.sh
module load gcc/6.3.0 openmpi/4.0.2 alphafold/2.1.1
source /cluster/apps/nss/alphafold/venv_alphafold/bin/activate

# Define paths to databases, fasta file and output directory
DATA_DIR="/cluster/project/alphafold"
FASTA_DIR="/cluster/home/jarunanp/fastafiles"
OUTPUT_DIR=$TMPDIR/output

python /cluster/apps/nss/alphafold/alphafold-2.1.1/run_alphafold.py \
    --data_dir=$DATA_DIR \
    --output_dir=$OUTPUT_DIR \
    --max_template_date="2021-12-06" \
    --bfd_database_path=$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniref90_database_path=$DATA_DIR/uniref90/uniref90.fasta \
    --uniclust30_database_path=$DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --mgnify_database_path=$DATA_DIR/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path=$DATA_DIR/pdb70/pdb70 \
    --template_mmcif_dir=$DATA_DIR/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=$DATA_DIR/pdb_mmcif/obsolete.dat \
    --fasta_paths=$FASTA_DIR/ubiquitin.fasta

# Copy the results from the compute node
mkdir -p output
cp -r $OUTPUT_DIR/* output
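The totals quoted for this script (120 GB of memory and 120 GB of scratch) follow if the mem and scratch values in the rusage string are read as MB per requested core. The arithmetic can be sketched as follows; the function name total_gb is hypothetical, and 1 GB is taken as 1000 MB for this illustration.

```shell
# Hypothetical helper: total resource in GB for a per-core request in MB.
# Assumes the rusage values are per core and that 1 GB = 1000 MB.
total_gb() {
    cores="$1"
    per_core_mb="$2"
    echo $(( cores * per_core_mb / 1000 ))
}

# 12 cores with mem=10000 (or scratch=10000) each
total_gb 12 10000   # prints 120
```

When changing the number of cores with -n, the per-core values may need adjusting to keep the same totals.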
To fold a multimeric protein, the option --model_preset=multimer has to be set, and --pdb_seqres_database_path and --uniprot_database_path have to be provided (the --pdb70_database_path option is not used in this case). The command to run AlphaFold becomes:
python /cluster/apps/nss/alphafold/alphafold-2.1.1/run_alphafold.py \
    --data_dir=$DATA_DIR \
    --output_dir=$OUTPUT_DIR \
    --max_template_date="2021-12-06" \
    --bfd_database_path=$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniref90_database_path=$DATA_DIR/uniref90/uniref90.fasta \
    --uniclust30_database_path=$DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --mgnify_database_path=$DATA_DIR/mgnify/mgy_clusters_2018_12.fa \
    --pdb_seqres_database_path=$DATA_DIR/pdb_seqres/pdb_seqres.txt \
    --uniprot_database_path=$DATA_DIR/uniprot/uniprot.fasta \
    --template_mmcif_dir=$DATA_DIR/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=$DATA_DIR/pdb_mmcif/obsolete.dat \
    --model_preset=multimer \
    --fasta_paths=$FASTA_DIR/IFGSC_6mer.fasta
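One way to decide which of the two commands applies is to count the sequence records in the input FASTA file. The sketch below assumes that a multimer input is a single FASTA file with one record per chain; the helper names count_fasta_records and suggest_preset are hypothetical and not part of AlphaFold.

```shell
# Hypothetical helper: count FASTA records (header lines starting with '>').
count_fasta_records() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence '|| true'
    grep -c '^>' "$1" || true
}

# Hypothetical helper: print "multimer" for multi-record input, else "monomer".
# These strings are labels for this sketch; only --model_preset=multimer
# is taken from this page.
suggest_preset() {
    n=$(count_fasta_records "$1")
    if [ "$n" -gt 1 ]; then
        echo "multimer"
    else
        echo "monomer"
    fi
}

# Example usage:
# suggest_preset "$FASTA_DIR/IFGSC_6mer.fasta"
```

This is only a heuristic for picking the command; it does not validate the sequences themselves.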
Submit the job with the command:
$ bsub < run_alphafold.bsub
The screen output is saved in an output file whose name starts with lsf.o followed by the job ID, e.g., lsf.o195525946. Please see this page for how to read the output file.
In our benchmark, it took around 40 minutes to fold ubiquitin (76 aa) and 2.5 hours to fold T1050 (779 aa).
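The naming scheme makes the output file easy to locate from the shell. The following is a small sketch with hypothetical helper names (lsf_output_name, latest_lsf_output); it assumes the output files are written to the directory from which the job was submitted.

```shell
# Hypothetical helper: build the output file name for a given job ID.
lsf_output_name() {
    printf 'lsf.o%s\n' "$1"
}

# Hypothetical helper: print the most recently modified lsf.o* file
# in a directory (default: current directory); prints nothing if none exist.
latest_lsf_output() {
    dir="${1:-.}"
    ls -t "$dir"/lsf.o* 2>/dev/null | head -n 1
}

# Example usage:
# tail -n 30 "$(lsf_output_name 195525946)"
# tail -n 30 "$(latest_lsf_output)"
```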
Further reading
- DeepMind Blog post: "AlphaFold: a solution to a 50-year-old grand challenge in biology"
- ETH News: "Computer algorithms are currently revolutionising biology"
- AlphaFold2 presentation slides 21 March 2022