Commit 1b09a5a

Merge pull request #1599 from oracle-devrel/nim-alphafold
Nim alphafold on OKE and GPU
2 parents f696196 + 69dcb2b commit 1b09a5a

21 files changed

+1219
-0
lines changed

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
# Using NVIDIA BioNeMo on Oracle Cloud Infrastructure (OCI)

This repository shows how to deploy NVIDIA NIMs from the BioNeMo suite on OCI at scale to tackle a practical drug discovery problem.

Reviewed: 21.02.2025

# Table of Contents

1. [Use case overview](#use-case-overview)
2. [Objective](#objective)
3. [Protein Structure Prediction for DHFR Inhibitor Discovery](#protein-structure-prediction-for-dhfr-inhibitor-discovery)

*More steps to follow soon*

# Use case overview

Dihydrofolate reductase (DHFR) is a crucial enzyme in cellular metabolism, playing a vital role in DNA synthesis and cell proliferation. DHFR catalyzes the NADPH-dependent reduction of dihydrofolate to tetrahydrofolate (THF), an essential cofactor for several one-carbon transfer reactions in purine and pyrimidine synthesis. This reaction is critical for maintaining the intracellular pool of THF, which is necessary for the de novo synthesis of purines, thymidylate, and certain amino acids.

The importance of DHFR in DNA synthesis stems from its role in producing THF, which is required for the synthesis of nucleic acid precursors. Without sufficient THF, cells cannot efficiently produce the building blocks needed for DNA replication and cell division. This makes DHFR essential for rapidly dividing cells, such as cancer cells and bacteria.

DHFR has become a common target for antimicrobial and anticancer drugs due to its critical role in cell proliferation. By inhibiting DHFR, these drugs deplete the THF pool within cells, leading to disruption of DNA synthesis, slowed cell proliferation, and eventually cell death. This mechanism of action is particularly effective against rapidly dividing cells, making DHFR inhibitors valuable in treating cancer and bacterial infections.

Antifolate medications, which target DHFR, have been widely used in cancer treatment. For example, methotrexate is a well-known DHFR inhibitor used in cancer therapy and for treating rheumatoid arthritis. In antimicrobial applications, trimethoprim is a classic DHFR inhibitor used to combat bacterial infections.

The effectiveness of DHFR as a drug target has led to ongoing research into developing new inhibitors with improved efficacy and the ability to overcome resistance mechanisms. This includes efforts to design compounds that can inhibit both wild-type and mutant forms of DHFR, potentially leading to antibiotics less prone to resistance development.

# Objective

To develop a novel inhibitor for dihydrofolate reductase (DHFR), we are using NVIDIA BioNeMo and other open-source tools.

# [Protein Structure Prediction for DHFR Inhibitor Discovery](./alphafold2-oke/README.md)

This step makes use of the Alphafold2 NIM. The detailed explanation is available [here](./alphafold2-oke/README.md).

# Useful links

- [Build a Generative Protein Binder Design Pipeline](https://build.nvidia.com/nvidia/protein-binder-design-for-drug-discovery)
- [Protein structure prediction with Alphafold2 NIM](https://github.com/NVIDIA/bionemo-examples/blob/62aef816070399814e478234dc47eb2ccddfd1a0/examples/nims/alphafold2/AlphaFold2-NIM-example.ipynb)
- [Overview of Kubernetes Engine in OCI](https://docs.oracle.com/en-us/iaas/Content/ContEng/Concepts/contengoverview.htm)

# Acknowledgments

- **Authors** - Bruno Garbaccio (GPU Specialist), Wajahat Aziz (GPU Specialist leader)

# License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
#Copyright (c) 2024 Oracle and/or its affiliates.
#
#The Universal Permissive License (UPL), Version 1.0
#
#Subject to the condition set forth below, permission is hereby granted to any
#person obtaining a copy of this software, associated documentation and/or data
#(collectively the "Software"), free of charge and under any and all copyright
#rights in the Software, and any and all patent rights owned or freely
#licensable by each licensor hereunder covering either (i) the unmodified
#Software as contributed to or provided by such licensor, or (ii) the Larger
#Works (as defined below), to deal in both
#
#(a) the Software, and
#(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
#one is included with the Software (each a "Larger Work" to which the Software
#is contributed by such licensors),
#
#without restriction, including without limitation the rights to copy, create
#derivative works of, display, perform, and distribute the Software and make,
#use, sell, offer for sale, import, export, have made, and have sold the
#Software and the Larger Work(s), and to sublicense the foregoing rights on
#either these or other terms.
#
#This license is subject to the following condition:
#The above copyright notice and either this complete permission notice or at
#a minimum a reference to the UPL must be included in all copies or
#substantial portions of the Software.
#
#THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
#IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
#FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
#AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
#LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
#OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
#SOFTWARE.
Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
# Protein Structure Prediction for DHFR Inhibitor Discovery using NVIDIA NIM for Alphafold2
![protein structure visualisation after Alphafold2 prediction](./protein_image_pymol.png)

Reviewed: 21.02.2025

## Introduction
This tutorial demonstrates how to deploy [NVIDIA NIM for Alphafold2](https://docs.nvidia.com/nim/bionemo/alphafold2/latest/index.html) on Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE) to predict protein structures.

### Objectives
- Achieve a scalable deployment of NVIDIA NIM for Alphafold2.
- Predict protein structures from their amino acid sequences.
- Visualise the protein structures with PyMOL.

### Prerequisites
- Access to an Oracle Cloud Infrastructure (OCI) tenancy.

- Access to shapes with NVIDIA GPUs, such as A10 GPUs (e.g., `VM.GPU.A10.1`). For more information on requesting a limit increase, see [Service Limits](https://docs.oracle.com/en-us/iaas/Content/General/Concepts/servicelimits.htm).

- Access to NVIDIA NGC with a valid personal API key. This is required to use the container. For more information, see [Creating a NGC account and generating an API key](https://docs.nvidia.com/nim/bionemo/alphafold2/latest/prerequisites.html#ngc-nvidia-gpu-cloud-account).

- Basic knowledge of Kubernetes and Helm terminology.

## Task 1: Deploy an OKE cluster
Create an OKE cluster from the "quick create" tab with node type `managed`. For more information, see [Using the Console to create a Cluster with Default Settings in the 'Quick Create' workflow](https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengcreatingclusterusingoke_topic-Using_the_Console_to_create_a_Quick_Cluster_with_Default_Settings.htm).

- Start by creating one node pool called `management` for the default pod deployments (e.g., `VM.Standard.E4.Flex` with 5 OCPUs and 80 GB of RAM) with the default image.

- Once your cluster is up, create another node pool called `NIM` with one GPU node (e.g., `VM.GPU.A10.1`), using the default image that includes the GPU drivers (e.g., `Oracle-Linux-8.X-Gen2-GPU-XXXX.XX.XX`).

> [!IMPORTANT]
> Make sure to increase the boot volume to 2.5 TB and add the [cloud-init](./cloud-init) script under **Show advanced options**, in **Initialization script**. On the first deployment, Alphafold2 downloads the models and databases, which take up a lot of disk space. You can also upload your SSH public key in case access to the node is required. In that case, a [bastion session](https://docs.oracle.com/en-us/iaas/Content/Bastion/Concepts/bastionoverview.htm) is required to access the machine, because it sits in a private subnet.

## Task 2: Deploy the application using Helm in OCI Cloud Shell
To access OCI Cloud Shell, see [To access Cloud Shell via the Console](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/cloudshellgettingstarted.htm#:~:text=Login%20to%20the%20Console.,the%20Cloud%20Shell%20was%20started.).

1. You can find the Helm configuration in the folder [`helm`](./helm), where you can update `values.yaml`. There is 1 replica by default (this can be increased if more `VM.GPU.A10.1` nodes are added), and `service.type` is set to `LoadBalancer`, which creates a flexible load balancer with a public IP to expose the API endpoint of the container.
Upload the folder to your OCI Cloud Shell environment. For more information, see [To upload a file to Cloud Shell using the menu](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/devcloudshellgettingstarted.htm#:~:text=To%20upload%20a%20file%20to%20Cloud%20Shell%20using%20the%20menu,click%20select%20from%20your%20computer.).

2. Set your NGC key as a Kubernetes secret:
```
kubectl create secret generic ngc-registry-secret --from-literal=NGC_REGISTRY_KEY=<YOUR_NGC_REGISTRY_KEY>
```

3. Set an environment variable with your desired chart name (it is used as the Helm release name in the next step):
```
export CHART_NAME=<your-chart-name>
```

4. Install the Helm chart:
> [!IMPORTANT]
> The first deployment takes a long time because the models need to be downloaded. Once they are cached, they no longer need to be downloaded on those machines as long as `persistence.hostPath` remains the same. This also means that the values of `livenessProbe` and `readinessProbe` can be adjusted accordingly. Once the models are downloaded, the materialisation of the workspace can take up to 2 hours on a `VM.GPU.A10.1`, so the values might be set to `7200` after the initial deployment.
```
cd helm
helm install "${CHART_NAME}" . --debug
```
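
When tuning the probes, it can help to estimate how long a container is given before a probe gives up on it. The sketch below is a rough aid only: it uses the standard Kubernetes probe fields (`initialDelaySeconds`, `periodSeconds`, `failureThreshold`), ignores `timeoutSeconds`, and assumes that the `7200` mentioned above refers to `initialDelaySeconds`, which this README does not state. The `Probe` class and both helper functions are hypothetical names, and the way `values.yaml` exposes these fields should be checked in the chart itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Timing fields of a Kubernetes probe (hypothetical helper type)."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int


def startup_budget_seconds(probe: Probe) -> int:
    """Approximate time until the last allowed probe attempt is made.

    The first attempt happens after the initial delay, and each further
    attempt follows one period later, so the final attempt is
    (failure_threshold - 1) periods after the first one.
    """
    return probe.initial_delay_seconds + probe.period_seconds * (probe.failure_threshold - 1)


def covers(probe: Probe, expected_startup_seconds: int) -> bool:
    """Return True if the probe budget is at least the expected startup time."""
    return startup_budget_seconds(probe) >= expected_startup_seconds


# Illustrative values only: a 2-hour initial delay with 3 attempts, 10 s apart.
liveness = Probe(initial_delay_seconds=7200, period_seconds=10, failure_threshold=3)
print(startup_budget_seconds(liveness), covers(liveness, 2 * 60 * 60))
```

If `covers` returns `False` for the startup time you observe in the logs, the pod is likely to be restarted before the workspace is ready.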

5. Follow the container initialisation and get the logs if needed:
````
kubectl get pods
NAME                                                     READY   STATUS    RESTARTS       AGE
alphafoldnim-protein-design-chart-alphafold2-xxxx-xxxx   1/1     Running   1 (121m ago)   165m

# describe the pod
kubectl describe pods alphafoldnim-protein-design-chart-alphafold2-xxxx-xxxx

# get the logs
kubectl logs alphafoldnim-protein-design-chart-alphafold2-xxxx-xxxx --follow
````
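
If you prefer to check the pod state from a script rather than reading the table, the output of `kubectl get pods -o json` can be summarised as in the following sketch. The function name `summarize_pods` is hypothetical; the JSON fields it reads (`items`, `metadata.name`, `status.phase`, `status.containerStatuses[].ready` and `.restartCount`) are the standard Kubernetes pod list fields.

```python
import json


def summarize_pods(pods_json: str) -> list[dict]:
    """Summarise name, phase, readiness and restart count for each pod in a pod list."""
    summary = []
    for pod in json.loads(pods_json).get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        summary.append({
            "name": pod["metadata"]["name"],
            "phase": status.get("phase", "Unknown"),
            # A pod with no container statuses yet is treated as not ready.
            "ready": bool(containers) and all(c.get("ready", False) for c in containers),
            "restarts": sum(c.get("restartCount", 0) for c in containers),
        })
    return summary
```

One way to use it is to save the output of `kubectl get pods -o json` to a file and pass the file content to `summarize_pods`. A growing restart count during the first deployment can indicate that the probes are too strict for the model download.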

6. Get the external IP of the load balancer:
```
kubectl get svc
NAME                                           TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)          AGE
alphafoldnim-protein-design-chart-alphafold2   LoadBalancer   10.96.69.193   <EXTERNAL_IP>   8081:30449/TCP   75m
```
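
Once the external IP is known, you may want to wait until the service actually answers before sending predictions. The following is a minimal sketch, assuming the NIM exposes a readiness endpoint at `/v1/health/ready` (as described in NVIDIA's NIM documentation; verify it for your container version) and that the service listens on port `8081` as in the output above. The function name and its parameters are hypothetical.

```python
import time
import urllib.request


def wait_until_ready(base_url, timeout_s=7200, interval_s=30,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll the readiness endpoint until it returns HTTP 200 or the timeout elapses."""
    url = base_url.rstrip("/") + "/v1/health/ready"  # assumed readiness path
    waited = 0
    while True:
        try:
            with opener(url, timeout=10) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers connection errors and urllib's URLError/HTTPError:
            # the service is not reachable or not ready yet, keep polling.
            pass
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For example, `wait_until_ready("http://<EXTERNAL_IP>:8081")` returns `True` as soon as the endpoint responds with HTTP 200, and `False` if it never does within the timeout.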

## Task 3 (optional): Adapt the load balancer listener timeout
Because requests can take a long time to be processed, you may need to increase the default [timeout set for the load balancer listeners](https://docs.oracle.com/en-us/iaas/Content/Balance/Reference/connectionreuse.htm). The default is 300 s; to avoid issues, it can be increased to 3600 s (1 hour).

## Task 4: Perform protein structure predictions
The protein structure prediction code can be found in [alphafold2.ipynb](./alphafold2.ipynb). One simple way to run this notebook is to spin up a small VM (e.g., `VM.Standard.E4.Flex` with 2 OCPUs and 16 GB of RAM) in any public subnet, set up a Python virtual environment, install the requirements, and start the Jupyter server there. Running it locally is also possible. The following steps describe the Jupyter notebook setup:

1. [Create](https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/launchinginstance.htm) a `VM.Standard.E4.Flex` with 2 OCPUs and 16 GB of RAM in any public subnet. Use the default OL8 image and provide a public key so that you can SSH to it.

2. Once the instance is up, SSH to it and set up a Python virtual environment. Use the following [requirements.txt](./requirements.txt):
```
ssh opc@<PUBLIC_IP>

# install miniconda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh

# create and activate the new conda environment
source ~/miniconda3/bin/activate
conda create -n "nim" python=3.12
conda activate nim

# install the requirements
pip install -r requirements.txt
```

3. Open port 8000 on the machine:
```
# Open port 8000 on the machine for jupyter
sudo firewall-cmd --permanent --add-port=8000/tcp
sudo firewall-cmd --reload
```

4. Install tmux and start the Jupyter server:
```
# install tmux:
sudo yum install tmux -y

# in a tmux shell:
source ~/miniconda3/bin/activate
conda activate nim
jupyter lab --port=8000
```

5. Keep the output from the last command; it is needed to connect to the notebook, e.g., `http://localhost:8000/lab?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`

6. In another terminal window, create a local port forwarding:
```
ssh -L 8000:localhost:8000 opc@<PUBLIC_IP>
```

7. In your web browser, connect to `http://localhost:8000/lab?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`

8. Run [alphafold2.ipynb](./alphafold2.ipynb)
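
For orientation, a prediction request to the deployed service might look like the sketch below. It is not the content of `alphafold2.ipynb`: the endpoint path and the `sequence` and `databases` fields are taken from NVIDIA's public Alphafold2 NIM example and should be checked against the notebook and the NIM documentation. The function names are hypothetical, port `8081` comes from the service output in Task 2, and the 3600 s client timeout simply mirrors the listener timeout suggested in Task 3.

```python
import json
import urllib.request

# Assumed endpoint path, based on NVIDIA's public Alphafold2 NIM example.
PREDICT_PATH = "/protein-structure/alphafold2/predict-structure-from-sequence"


def build_prediction_request(base_url, sequence, databases=("small_bfd",)):
    """Build (but do not send) a POST request for one amino acid sequence."""
    # Remove whitespace and line breaks that often come with pasted sequences.
    sequence = "".join(sequence.split()).upper()
    if not sequence.isalpha():
        raise ValueError("sequence must contain only amino acid letters")
    payload = {"sequence": sequence, "databases": list(databases)}
    return urllib.request.Request(
        base_url.rstrip("/") + PREDICT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def predict(base_url, sequence, timeout_s=3600):
    """Send the request and return the decoded JSON response."""
    request = build_prediction_request(base_url, sequence)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `predict("http://<EXTERNAL_IP>:8081", "<AMINO_ACID_SEQUENCE>")` blocks until the prediction finishes, which is why the client timeout should not be shorter than the load balancer listener timeout.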

## Task 5: Clean up the Deployment

1. Once you have finished using NVIDIA NIM for Alphafold2, use Helm to delete the deployment:
```
$ helm list
NAME           NAMESPACE   REVISION   UPDATED                                  STATUS     CHART                        APP VERSION
alphafoldnim   default     1          2025-02-21 07:46:03.84342028 +0000 UTC   deployed   protein-design-chart-0.1.0   1.0.0

$ helm uninstall "${CHART_NAME}" --wait
```

# Acknowledgments

- **Authors** - Bruno Garbaccio (GPU Specialist), Wajahat Aziz (GPU Specialist leader)

# License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
