AlphaFold 3

Introduction

AlphaFold 3 (AF3) is an updated version of the AlphaFold software. Full documentation and a link to the accompanying paper can be found on the AF3 GitHub page.

Prerequisites

Before running an AF3 job, you need to download the AF3 model parameters and ensure that you can use Apptainer/Singularity on Euler.

AF3 model parameters can be requested using this form. Unfortunately, due to Terms of Service that are quite constraining for institutions, we cannot provide the AF3 parameters centrally. However, the file takes up only ~1 GB of storage. Please be mindful of the 15-day purge if you save the parameters in your personal scratch directory.
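
If you want to keep the parameters beyond the 15-day purge, copy them to storage that is not auto-cleaned. Here is a minimal sketch, assuming the downloaded weights file is called af3.bin.zst and that you have access to a project directory (both the file name and the target path are placeholders; adjust them to your setup):

# Hypothetical target: a group project directory not subject to the scratch purge
mkdir -p /cluster/project/<your_group>/af3_weights
cp /cluster/scratch/$USER/af3.bin.zst /cluster/project/<your_group>/af3_weights/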

To ensure that you can run containers on the cluster, please run the `get_access` command in a terminal connected to Euler. Additional information is available here.
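
To verify that the container runtime is available afterwards (you may need to log out and back in for `get_access` to take effect), a quick check such as the following should print a version number:

singularity --version   # or: apptainer --version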

Test Job

First, you need to prepare an input file. Full instructions are available in the AF3 documentation. For this test, the input file should be named `input.json`. Here we again use ubiquitin, which also served as the test case for the AF2 version:

{
  "name": "Job name goes here",
  "modelSeeds": [1, 2],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQRESTLHLVLRLRGG"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 2
}
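
Before submitting a job, it can save time to check that the file parses as valid JSON, for example with Python's built-in json.tool module (it prints the parsed document, or an error message pointing at the problem):

python3 -m json.tool input.json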

To run a first AF3 job and familiarise yourself with the container on Euler, you can ask for an interactive job on the cluster. Here is an example with suggested resources for folding a small (~100 AA) protein:

srun -n 8 -G 1 --time=04:00:00 --mem-per-cpu=15g --pty bash

Once you get a prompt on a compute node, you need to run the AF3 container. Replace the <...> parts with paths that make sense for your setup:

singularity exec --nv \
  --bind <...path on euler to the folder containing the AF3 input json file...>:/root/af_input \
  --bind <...path on euler to where the container should write the outputs...>:/root/af_output \
  --bind <...path on euler to pre-downloaded AF3 weights...>:/root/models \
  --bind /cluster/project/alphafold/alphafold3:/root/public_databases \
  /cluster/apps/nss/alphafold/containers/AlphaFold3/af3.sif \
  python3 /app/alphafold/alphafold-3.0.1/run_alphafold.py \
    --json_path=/root/af_input/input.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --output_dir=/root/af_output
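
Once the interactive test works, you will typically want to run AF3 as a batch job instead. Here is a minimal sbatch sketch using the same resources as the interactive example above; the <...> paths are placeholders to adapt, and the script name af3_job.sh is just an example:

#!/bin/bash
#SBATCH -n 8
#SBATCH -G 1
#SBATCH --time=04:00:00
#SBATCH --mem-per-cpu=15g
#SBATCH --job-name=af3_test

# Same container invocation as the interactive example; adapt the <...> paths.
singularity exec --nv \
  --bind <...input folder...>:/root/af_input \
  --bind <...output folder...>:/root/af_output \
  --bind <...AF3 weights...>:/root/models \
  --bind /cluster/project/alphafold/alphafold3:/root/public_databases \
  /cluster/apps/nss/alphafold/containers/AlphaFold3/af3.sif \
  python3 /app/alphafold/alphafold-3.0.1/run_alphafold.py \
    --json_path=/root/af_input/input.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --output_dir=/root/af_output

Submit it with `sbatch af3_job.sh`.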

If you work with a GPU that has a compute capability lower than 8.0 (i.e., any GPU model on the cluster other than the A100 and RTX 4090), you will get an explicit error from AF3 requesting additional XLA options. If you have not requested a GPU model explicitly, you can see which model you were given by running the `nvidia-smi` command in the prompt of the interactive job. A full list of GPU models on Euler is available at the bottom of this page. This is the full command that accommodates GPUs with lower compute capability:

singularity exec --nv \
  --bind <...path on euler to the folder containing the AF3 input json file...>:/root/af_input \
  --bind <...path on euler to where the container should write the outputs...>:/root/af_output \
  --bind <...path on euler to pre-downloaded AF3 weights...>:/root/models \
  --bind /cluster/project/alphafold/alphafold3:/root/public_databases \
  --env=XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter" \
  /cluster/apps/nss/alphafold/containers/AlphaFold3/af3.sif \
  python3 /app/alphafold/alphafold-3.0.1/run_alphafold.py \
    --json_path=/root/af_input/input.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --output_dir=/root/af_output \
    --flash_attention_implementation=xla
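
If you are unsure which GPU you were given, or want to avoid the workaround entirely by requesting a suitable model up front, something like the following can help. Note that the compute_cap query requires a reasonably recent driver, and the model identifier below is only an example that must match the names in the Euler GPU list:

# Print the GPU model and its compute capability (needs a recent nvidia-smi)
nvidia-smi --query-gpu=name,compute_cap --format=csv

# Example: explicitly request an A100 for the interactive job
# (the model identifier is an assumption; check the Euler GPU list)
srun -n 8 --gpus=a100_80gb:1 --time=04:00:00 --mem-per-cpu=15g --pty bash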

Known Issues

In the AF3 outputs, you may see:

[...] xla_bridge.py:895] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
[...] xla_bridge.py:895] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory

These warnings are expected. At startup, AF3 checks whether a TPU is available. TPUs are proprietary Google chips that are not sold commercially; they can, however, be used on Google Cloud or Google Colab. AF3 also probes for ROCm and CUDA backends, so you will get a warning that ROCm was not found if you are on an NVIDIA GPU, and that CUDA was not found if you are on an AMD GPU.

The following warning is due to an outdated driver. We are working on updating drivers on our GPUs:

[...] external/xla/xla/service/gpu/nvptx_compiler.cc:930] The NVIDIA driver's CUDA version is 12.4 which is older than the PTX compiler version 12.6.77. Because the driver is older than the PTX compiler version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.