oracle-devrel
diff --git a/‎cloud-infrastructure/ai-infra-gpu/ai-infrastructure/bionemo/README.md
Lines changed: 51 additions & 0 deletions b/‎cloud-infrastructure/ai-infra-gpu/ai-infrastructure/bionemo/README.md
diff --git a/‎cloud-infrastructure/ai-infra-gpu/ai-infrastructure/bionemo/alphafold2-oke/LICENSE
Lines changed: 35 additions & 0 deletions b/‎cloud-infrastructure/ai-infra-gpu/ai-infrastructure/bionemo/alphafold2-oke/LICENSE
diff --git a/‎cloud-infrastructure/ai-infra-gpu/ai-infrastructure/bionemo/alphafold2-oke/README.md
Lines changed: 154 additions & 0 deletions b/‎cloud-infrastructure/ai-infra-gpu/ai-infrastructure/bionemo/alphafold2-oke/README.md
@@ -0,0 +1,51 @@
+# Using NVIDIA BioNeMo on Oracle Cloud Infrastructure (OCI)
+
+This repository showcases how to deploy NVIDIA NIMs from the BioNeMo suite on OCI at scale to tackle a practical drug discovery problem.
+
+Reviewed: 21.02.2025
+
+# Table of Contents
+
+1. [Use case overview](#use-case-overview)
+2. [Objective](#objective)
+3. [Protein Structure Prediction for DHFR Inhibitor Discovery](#protein-structure-prediction-for-dhfr-inhibitor-discovery)
+
+*More steps to follow soon*
+
+# Use case overview
+
+Dihydrofolate reductase (DHFR) is a crucial enzyme in cellular metabolism, playing a vital role in DNA synthesis and cell proliferation. DHFR catalyzes the NADPH-dependent reduction of dihydrofolate to tetrahydrofolate (THF), an essential cofactor for several one-carbon transfer reactions in purine and pyrimidine synthesis. This reaction is critical for maintaining the intracellular pool of THF, which is necessary for the de novo synthesis of purines, thymidylate, and certain amino acids.
+
+The importance of DHFR in DNA synthesis stems from its role in producing THF, which is required for the synthesis of nucleic acid precursors. Without sufficient THF, cells cannot efficiently produce the building blocks needed for DNA replication and cell division. This makes DHFR essential for rapidly dividing cells, such as cancer cells and bacteria.
+
+DHFR has become a common target for antimicrobial and anticancer drugs due to its critical role in cell proliferation. By inhibiting DHFR, these drugs deplete the THF pool within cells, leading to disruption of DNA synthesis, slowed cell proliferation, and eventually cell death. This mechanism of action is particularly effective against rapidly dividing cells, making DHFR inhibitors valuable in treating cancer and bacterial infections.
+
+Antifolate medications, which target DHFR, have been widely used in cancer treatment. For example, methotrexate is a well-known DHFR inhibitor used in cancer therapy and for treating rheumatoid arthritis. In antimicrobial applications, trimethoprim is a classic DHFR inhibitor used to combat bacterial infections.
+
+The effectiveness of DHFR as a drug target has led to ongoing research into developing new inhibitors with improved efficacy and the ability to overcome resistance mechanisms. This includes efforts to design compounds that can inhibit both wild-type and mutant forms of DHFR, potentially leading to antibiotics less prone to resistance development.
+
+# Objective
+
+To develop a novel inhibitor for dihydrofolate reductase (DHFR), we are using NVIDIA BioNeMo and other open-source tools.
+
+# [Protein Structure Prediction for DHFR Inhibitor Discovery](./alphafold2-oke/README.md)
+
+This step uses the Alphafold2 NIM. A detailed explanation is available [here](./alphafold2-oke/README.md).
+
+# Useful links
+
+- [Build a Generative Protein Binder Design Pipeline](https://build.nvidia.com/nvidia/protein-binder-design-for-drug-discovery)
+- [Protein structure prediction with Alphafold2 NIM](https://github.com/NVIDIA/bionemo-examples/blob/62aef816070399814e478234dc47eb2ccddfd1a0/examples/nims/alphafold2/AlphaFold2-NIM-example.ipynb)
+- [Overview of Kubernetes Engine in OCI](https://docs.oracle.com/en-us/iaas/Content/ContEng/Concepts/contengoverview.htm) 
+
+# Acknowledgments
+
+- **Authors** - Bruno Garbaccio (GPU Specialist), Wajahat Aziz (GPU Specialist leader)
+
+# License
+
+Copyright (c) 2024 Oracle and/or its affiliates.
+
+Licensed under the Universal Permissive License (UPL), Version 1.0.
+
+See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
@@ -0,0 +1,35 @@
+#Copyright (c) 2024 Oracle and/or its affiliates.
+#
+#The Universal Permissive License (UPL), Version 1.0
+#
+#Subject to the condition set forth below, permission is hereby granted to any
+#person obtaining a copy of this software, associated documentation and/or data
+#(collectively the "Software"), free of charge and under any and all copyright
+#rights in the Software, and any and all patent rights owned or freely
+#licensable by each licensor hereunder covering either (i) the unmodified
+#Software as contributed to or provided by such licensor, or (ii) the Larger
+#Works (as defined below), to deal in both
+#
+#(a) the Software, and
+#(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
+#one is included with the Software (each a "Larger Work" to which the Software
+#is contributed by such licensors),
+#
+#without restriction, including without limitation the rights to copy, create
+#derivative works of, display, perform, and distribute the Software and make,
+#use, sell, offer for sale, import, export, have made, and have sold the
+#Software and the Larger Work(s), and to sublicense the foregoing rights on
+#either these or other terms.
+#
+#This license is subject to the following condition:
+#The above copyright notice and either this complete permission notice or at
+#a minimum a reference to the UPL must be included in all copies or
+#substantial portions of the Software.
+#
+#THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+#IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+#FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+#AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+#LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+#OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+#SOFTWARE.
@@ -0,0 +1,154 @@
+# Protein Structure Prediction for DHFR Inhibitor Discovery using NVIDIA NIM for Alphafold2 
+![protein structure visualisation after Alphafold2 prediction](./protein_image_pymol.png)
+
+Reviewed: 21.02.2025
+
+## Introduction
+This tutorial demonstrates how to deploy [NVIDIA NIM for Alphafold2](https://docs.nvidia.com/nim/bionemo/alphafold2/latest/index.html) on Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE) to run protein structure prediction.
+
+### Objectives
+- Achieve a scalable deployment of NVIDIA NIM for Alphafold2
+- Predict protein structures from their amino acid sequences
+- Visualise the protein structure with Pymol
+
+### Prerequisites
+- Access to an Oracle Cloud Infrastructure (OCI) tenancy.
+
+- Access to shapes with NVIDIA GPUs, such as A10 GPUs (e.g., `VM.GPU.A10.1`). For more information on requests to increase the limit, see [Service Limits](https://docs.oracle.com/en-us/iaas/Content/General/Concepts/servicelimits.htm).
+
+- Access to NVIDIA NGC with valid personal keys. This is required to use the container. For more information, see [Creating an NGC account and generating an API key](https://docs.nvidia.com/nim/bionemo/alphafold2/latest/prerequisites.html#ngc-nvidia-gpu-cloud-account).
+
+- Knowledge of basic terminology of Kubernetes and Helm.
+
+## Task 1: Deploy an OKE cluster
+Create an OKE cluster from the "quick create" tab with node type `managed`. For more information, see [Using the Console to create a Cluster with Default Settings in the 'Quick Create' workflow](https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengcreatingclusterusingoke_topic-Using_the_Console_to_create_a_Quick_Cluster_with_Default_Settings.htm).
+
+- Start by creating 1 node pool called `management` that will be used for default pod deployments (e.g., `VM.Standard.E4.Flex` with 5 OCPU and 80GB RAM) with the default image.
+
+- Once your cluster is up, create another node pool called `NIM` with 1 GPU node (e.g., `VM.GPU.A10.1`), using the default image that includes the GPU drivers (e.g., `Oracle-Linux-8.X-Gen2-GPU-XXXX.XX.XX`).
+
+> [!IMPORTANT] 
+> Make sure to increase the boot volume to 2.5TB and add the [cloud-init](./cloud-init) script in **Show advanced options**, under **Initialization script**. On the first deployment, Alphafold2 downloads the models and databases, which take a lot of disk space. You can also upload your SSH public key in case access to the node is required. In that case, a [bastion session](https://docs.oracle.com/en-us/iaas/Content/Bastion/Concepts/bastionoverview.htm) is required to reach the machine, because it sits in a private subnet.
+
+## Task 2: Deploy the application using Helm in OCI Cloud Shell
+To access OCI Cloud Shell, see [To access Cloud Shell via the Console](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/cloudshellgettingstarted.htm#:~:text=Login%20to%20the%20Console.,the%20Cloud%20Shell%20was%20started.).
+
+1. You can find the Helm configuration in the folder [`helm`](./helm), where you can update `values.yaml`. There is 1 replica by default (this can be raised if more `VM.GPU.A10.1` nodes are added), and `service.type` is set to `LoadBalancer` to create a flexible load balancer with a public IP that exposes the API endpoint of the container.
+Upload the folder to your OCI Cloud Shell environment. For more information, see [To upload a file to Cloud Shell using the menu](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/devcloudshellgettingstarted.htm#:~:text=To%20upload%20a%20file%20to%20Cloud%20Shell%20using%20the%20menu,click%20select%20from%20your%20computer.). 
+
+2. Set your NGC key as a Kubernetes secret
+```
+kubectl create secret generic ngc-registry-secret --from-literal=NGC_REGISTRY_KEY=<YOUR_NGC_REGISTRY_KEY>
+```
+
+3. Set an environment variable with your desired chart name
+```
+export CHART_NAME=<your-chart-name>
+```
+
+4. Install Helm Chart:
+> [!IMPORTANT] 
+> The first deployment takes a long time because the models need to be downloaded. Once they are cached, the download is no longer necessary on those machines as long as `persistence.hostPath` remains the same. This also means that the values of `livenessProbe` and `readinessProbe` can be adjusted accordingly. Once the models are downloaded, the materialisation of the workspace can take up to 2h on a `VM.GPU.A10.1`, so the values might be set to `7200` after the initial deployment.
+```
+cd helm
+helm install "${CHART_NAME}" . --debug
+```
+
+5. Follow the container initialisation and get the logs if needed
+````
+kubectl get pods
+NAME                                                           READY   STATUS    RESTARTS       AGE
+alphafoldnim-protein-design-chart-alphafold2-xxxx-xxxx   1/1     Running   1 (121m ago)   165m
+
+# describe pod
+kubectl describe pods alphafoldnim-protein-design-chart-alphafold2-xxxx-xxxx
+
+#get log 
+kubectl logs alphafoldnim-protein-design-chart-alphafold2-xxxx-xxxx --follow
+````
+
+6. Get the external IP of the load balancer: 
+```
+kubectl get svc
+NAME                                           TYPE           CLUSTER-IP     EXTERNAL-IP       PORT(S)             AGE
+alphafoldnim-protein-design-chart-alphafold2   LoadBalancer   10.96.69.193   <EXTERNAL_IP>   8081:30449/TCP      75m
+```
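
The external IP can also be read programmatically and used to check that the NIM is ready before any request is sent. The following is a minimal sketch, assuming the service name and port shown in the output above and assuming the container exposes the `/v1/health/ready` route described in the NVIDIA NIM documentation. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: service name, port and health route are assumptions;
# adjust them to your deployment.

# Print the public IP assigned to a LoadBalancer service.
get_external_ip() {
  local svc="$1"
  kubectl get svc "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Poll the readiness route until it answers successfully, or give up.
# Args: base URL, max attempts (default 30), seconds between attempts (default 10).
wait_until_ready() {
  local base_url="$1" attempts="${2:-30}" delay="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --output /dev/null "${base_url}/v1/health/ready"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (not executed here):
#   EXTERNAL_IP="$(get_external_ip alphafoldnim-protein-design-chart-alphafold2)"
#   wait_until_ready "http://${EXTERNAL_IP}:8081" && echo "NIM is ready"
```

Because the first start can take hours, the attempt count and delay likely need to be raised well above the defaults used here.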
+
+## Task 3 (optional): Adapt the load balancer listener timeout
+Because requests can take a long time to be processed, you may need to increase the default [timeout set for the load balancer listeners](https://docs.oracle.com/en-us/iaas/Content/Balance/Reference/connectionreuse.htm). The default is 300s; to avoid issues, it can be increased to 3600s (1 hour).
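
One way to change the timeout without using the Console is to annotate the Kubernetes service. The following is a minimal sketch, assuming the annotation key `service.beta.kubernetes.io/oci-load-balancer-connection-idle-timeout` from the OCI cloud controller manager documentation applies to your OKE version. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the annotation key is an assumption taken from the OCI cloud
# controller manager documentation; confirm it for your OKE version.

IDLE_TIMEOUT_ANNOTATION="service.beta.kubernetes.io/oci-load-balancer-connection-idle-timeout"

# Set the listener idle timeout (in seconds) on a LoadBalancer service.
set_lb_idle_timeout() {
  local svc="$1" seconds="$2"
  case "$seconds" in
    ''|*[!0-9]*)
      echo "timeout must be a whole number of seconds" >&2
      return 2
      ;;
  esac
  kubectl annotate svc "$svc" "${IDLE_TIMEOUT_ANNOTATION}=${seconds}" --overwrite
}

# Usage (not executed here):
#   set_lb_idle_timeout alphafoldnim-protein-design-chart-alphafold2 3600
```

An annotation applied this way may be overwritten by a later Helm upgrade, so it is worth checking the listener in the Console afterwards.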
+
+## Task 4: Perform protein structure predictions
+The protein structure prediction code can be found in [alphafold2.ipynb](./alphafold2.ipynb). One simple way to run this notebook is to spin up a small VM (e.g., `VM.Standard.E4.Flex` with 2 OCPU and 16GB of RAM) in any public subnet, set up a Python virtual environment, install the requirements, and start the Jupyter server there. A local alternative is also possible. The following steps describe the Jupyter notebook setup:
+
+1. [Create](https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/launchinginstance.htm) a `VM.Standard.E4.Flex` with 2 OCPU and 16GB of RAM in any public subnet. Use the default OL8 image and provide a public key in order to ssh to it. 
+
+2. Once the instance is up, ssh to it and create a Python virtual environment. Use the following [requirements.txt](./requirements.txt)
+```
+ssh opc@<PUBLIC_IP>
+
+# install miniconda
+mkdir -p ~/miniconda3
+wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
+bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
+rm ~/miniconda3/miniconda.sh
+
+# create and activate the conda environment
+source ~/miniconda3/bin/activate
+conda create -n "nim" python=3.12 
+conda activate nim
+
+# install the requirements
+pip install -r requirements.txt
+```
+
+3. Open port 8000 on the machine
+```
+# Open port 8000 on the machine for jupyter
+sudo firewall-cmd --permanent --add-port=8000/tcp
+sudo firewall-cmd --reload
+```
+
+4. Install Tmux and start the jupyter server 
+```
+# install tmux:
+sudo yum install tmux -y
+
+# in a tmux shell:
+source ~/miniconda3/bin/activate
+conda activate nim
+jupyter lab --port=8000
+```
+
+5. Keep the output from the last command; it will be needed to connect to the notebook, e.g. `http://localhost:8000/lab?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
+
+6. In another terminal window, create a local port forwarding:
+```
+ssh -L 8000:localhost:8000 opc@<PUBLIC_IP>
+``` 
+
+7. In your web browser, connect to `http://localhost:8000/lab?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
+
+8. Run [alphafold2.ipynb](./alphafold2.ipynb)
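
For a quick check without Jupyter, a prediction request can also be sent to the NIM with `curl`. The following is a minimal sketch, assuming the route `/protein-structure/alphafold2/predict-structure-from-sequence` and the request fields `sequence` and `databases` from the NVIDIA NIM for Alphafold2 documentation, and port 8081 from the service created in Task 2. The function names are hypothetical, and the notebook may use different options.

```shell
#!/usr/bin/env bash
# Sketch only: route, port and JSON fields are assumptions based on the
# NVIDIA NIM for Alphafold2 documentation; the notebook may differ.

# Build the JSON request body for one amino acid sequence.
build_payload() {
  local sequence="$1" database="${2:-small_bfd}"
  case "$sequence" in
    ''|*[!A-Za-z]*)
      echo "sequence must contain letters only" >&2
      return 2
      ;;
  esac
  printf '{"sequence": "%s", "databases": ["%s"]}' "$sequence" "$database"
}

# POST the sequence to the NIM and write the raw JSON response to a file.
# --max-time is generous because a prediction can take a long time.
predict_structure() {
  local base_url="$1" sequence="$2" out_file="${3:-prediction.json}" payload
  payload="$(build_payload "$sequence")" || return
  curl --silent --show-error --fail --max-time 3600 \
    -H 'Content-Type: application/json' \
    -d "$payload" \
    -o "$out_file" \
    "${base_url}/protein-structure/alphafold2/predict-structure-from-sequence"
}

# Usage (not executed here):
#   predict_structure "http://<EXTERNAL_IP>:8081" "<AMINO_ACID_SEQUENCE>" dhfr.json
```

The sequence is validated before the request is built so that a malformed input fails locally instead of after a long wait on the GPU node.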
+
+## Task 5: Clean up the Deployment
+
+1. Once you have finished using NVIDIA NIM for Alphafold2, you should use helm to delete the deployment.
+```
+$ helm list
+NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
+alphafoldnim    default         1               2025-02-21 07:46:03.84342028 +0000 UTC  deployed        protein-design-chart-0.1.0      1.0.0  
+
+$ helm uninstall "${CHART_NAME}"  --wait
+```
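
`helm uninstall` only removes resources that belong to the release. The `ngc-registry-secret` from Task 2 was created directly with `kubectl`, so it is likely to remain afterwards. The following is a minimal sketch that removes both, assuming the release lives in the current namespace and the chart does not manage the secret itself. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the release lives in the current namespace and that the
# secret name matches the one created in Task 2.

# Uninstall the Helm release, then delete the manually created NGC secret.
cleanup_deployment() {
  local release="$1" secret="${2:-ngc-registry-secret}"
  if [ -z "$release" ]; then
    echo "usage: cleanup_deployment <release-name> [secret-name]" >&2
    return 2
  fi
  helm uninstall "$release" --wait || return
  kubectl delete secret "$secret" --ignore-not-found
}

# Usage (not executed here):
#   cleanup_deployment "${CHART_NAME}"
```

Helm does not touch the node pools or the OKE cluster; those have to be removed separately if they are no longer needed.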
+
+# Acknowledgments
+
+- **Authors** - Bruno Garbaccio (GPU Specialist), Wajahat Aziz (GPU Specialist leader)
+
+# License
+
+Copyright (c) 2024 Oracle and/or its affiliates.
+
+Licensed under the Universal Permissive License (UPL), Version 1.0.
+
+See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.