Blog: Pulumi-Powered AI/ML on Kubernetes and GCP #14363

Merged 15 commits on Mar 25, 2025
216 changes: 216 additions & 0 deletions content/blog/ai-ml-on-kubernetes-google-cloud-llm-rag/index.md
@@ -0,0 +1,216 @@
---
title: "AI/ML on Kubernetes: Deploying Models with Pulumi on Google Cloud"
allow_long_title: true

# The date represents the post's publish date, and by default corresponds with
# the date and time this file was generated. Dates are used for display and
# ordering purposes only; they have no effect on whether or when a post is
# published. To influence the ordering of posts published on the same date, use
# the time portion of the date value; posts are sorted in descending order by
# date/time.
date: 2025-03-24T07:32:19Z

# The draft setting determines whether a post is published. Set it to true if
# you want to be able to merge the post without publishing it.
draft: false

# Use the meta_desc property to provide a brief summary (one or two sentences)
# of the content of the post, which is useful for targeting search results or
# social-media previews. This field is required or the build will fail the
# linter test. Max length is 160 characters.
meta_desc: Run AI/ML workloads on Kubernetes with Pulumi and Google Cloud. Automate model training, deployment, and scaling with Infrastructure as Code.

# The meta_image appears in social-media previews and on the blog home page. A
# placeholder image representing the recommended format, dimensions and aspect
# ratio has been provided for you.
meta_image: ai-ml-kubernetes-gcp-blog-article.png


# At least one author is required. The values in this list correspond with the
# `id` properties of the team member files at /data/team/team. Create a file for
# yourself if you don't already have one.
authors:
- sara-huddleston

# At least one tag is required. Lowercase, hyphen-delimited is recommended.
tags:
- kubernetes
- ai
- llm
- google-cloud
- gke
- k8s

# The social copy used to promote this post on Twitter and Linkedin. These
# properties do not actually create the post and have no effect on the
# generated blog page. They are here strictly for reference.

# Here are some examples of posts we have made in the past for inspiration:
# https://www.linkedin.com/feed/update/urn:li:activity:7171191945841561601
# https://www.linkedin.com/feed/update/urn:li:activity:7169021002394296320
# https://www.linkedin.com/feed/update/urn:li:activity:7155606616455737345
# https://twitter.com/PulumiCorp/status/1763265391042654623
# https://twitter.com/PulumiCorp/status/1762900472489185492
# https://twitter.com/PulumiCorp/status/1755637618631405655

social:
  twitter:
  linkedin:

# See the blogging docs at https://github.com/pulumi/docs/blob/master/BLOGGING.md
# for details, and please remove these comments before submitting for review.
---

Kubernetes has transformed cloud infrastructure by enabling scalable, containerized applications. While it initially gained traction for managing web applications and microservices, its capabilities now extend to AI/ML workloads, making it the go-to platform for data scientists and machine learning engineers.

Running AI/ML workloads on Kubernetes presents unique challenges, including:

- **Specialized hardware** requirements (e.g., GPUs, TPUs)
- **Scalability** for model training and inference
- **Complex data pipelines** that integrate various cloud services
- **Infrastructure automation** for seamless deployment

Google Kubernetes Engine (GKE) provides a robust foundation for AI/ML workloads, but managing infrastructure manually can be cumbersome. This is where Pulumi comes in—enabling Infrastructure as Code (IaC) to automate and simplify AI/ML infrastructure on Kubernetes.

<!--more-->


## Pulumi: Automating AI/ML Infrastructure on Google Cloud

Pulumi is a modern Infrastructure as Code (IaC) tool that allows teams to define and manage cloud infrastructure using general-purpose programming languages like Python, TypeScript, and Go. This approach is particularly beneficial for AI/ML teams, as Python is already the dominant language in data science and machine learning.

With Pulumi, you can:

- **Provision and scale Kubernetes clusters** on [Google Cloud](https://www.pulumi.com/docs/iac/clouds/gcp/) automatically.
- **Define AI/ML environments as code**, making deployments repeatable and version-controlled.
- **Integrate infrastructure with machine learning pipelines**, reducing operational overhead.

{{% notes "info" %}}
Want to geek out and get the inside scoop on AI/ML using Kubernetes? Meet us at:

- **KubeCon Europe Booth S450** - [Request a 1:1 Demo](https://www.pulumi.com/kubecon-europe/)
- **Google Next '25 Booth 1589** - [Join us for a Happy Hour and/or request a 1:1 demo](https://www.pulumi.com/google-next/)

Don’t miss the chance to see Pulumi in action, ask questions, and explore AI/ML innovations on Kubernetes. 🚀
{{% /notes %}}

## Deploying AI/ML Workloads on Kubernetes

Below, we explore two use cases for running AI/ML workloads on [Google Kubernetes Engine (GKE)](https://www.pulumi.com/templates/kubernetes/gcp/) using Pulumi.

{{< youtube "CoZM9BCJcJ4?rel=0" >}}

### Use Case 1: Deploying a Large Language Model (LLM) with Retrieval Augmented Generation (RAG)

Large Language Models (LLMs) like **GPT-3**, along with other foundation models such as **Whisper and DALL-E**, require significant infrastructure for training and inference. **Retrieval Augmented Generation (RAG)** enhances LLMs by integrating external knowledge sources, improving accuracy and relevance.

Using Pulumi, you can **automate the deployment of an open-source [LLM with RAG](https://www.pulumi.com/blog/codegen-learnings/)** on Kubernetes.

#### Step 1: Set Up a Kubernetes Cluster on Google Cloud

```python
import pulumi
import pulumi_gcp as gcp

# Create a GKE cluster for AI/ML workloads
cluster = gcp.container.Cluster("ml-cluster",
    location="us-central1",
    initial_node_count=3,
    node_version="1.23",
    min_master_version="1.23")

# Create a node pool optimized for ML workloads
node_pool = gcp.container.NodePool("ml-node-pool",
    cluster=cluster.name,
    node_config=gcp.container.NodePoolNodeConfigArgs(
        machine_type="n1-standard-4",
        oauth_scopes=[
            "https://www.googleapis.com/auth/devstorage.read_only",
            "https://www.googleapis.com/auth/logging.write",
        ],
        labels={"team": "ml"},
        shielded_instance_config=gcp.container.NodePoolNodeConfigShieldedInstanceConfigArgs(
            enable_secure_boot=True,
            enable_integrity_monitoring=True,
        ),
    ),
    initial_node_count=2,
    location="us-central1",
    version="1.23")
...
```

This Pulumi script:

- Provisions a **Kubernetes cluster** on Google Cloud.
- Creates a **dedicated node pool** optimized for AI/ML workloads.
- Ensures **security best practices** for machine learning environments.
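
Before deploying anything to the new cluster, the Pulumi Kubernetes provider needs a kubeconfig for it. The following is a minimal sketch of that wiring (not part of the original demo): it assumes the `cluster` resource from the snippet above and that the `gke-gcloud-auth-plugin` is installed wherever Pulumi runs.

```python
import pulumi
import pulumi_kubernetes as k8s

# Build a kubeconfig from the cluster's outputs. The CA certificate comes back
# base64-encoded, which is what certificate-authority-data expects.
kubeconfig = pulumi.Output.all(
    cluster.name, cluster.endpoint, cluster.master_auth.cluster_ca_certificate
).apply(lambda args: f"""apiVersion: v1
kind: Config
clusters:
- name: {args[0]}
  cluster:
    server: https://{args[1]}
    certificate-authority-data: {args[2]}
contexts:
- name: {args[0]}
  context:
    cluster: {args[0]}
    user: {args[0]}
current-context: {args[0]}
users:
- name: {args[0]}
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: gke-gcloud-auth-plugin
      provideClusterInfo: true
""")

# A Kubernetes provider bound to the new GKE cluster; pass it to each
# Kubernetes resource via ResourceOptions(provider=gke_provider).
gke_provider = k8s.Provider("gke", kubeconfig=kubeconfig)
```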

#### Step 2: Deploy the LLM with RAG Model

Once the cluster is set up, we can deploy the **LLM with RAG model** using Pulumi’s [Kubernetes provider](https://www.pulumi.com/blog/pko-2-0-ga/):

1. Define **Kubernetes Deployments** for the LLM and RAG model.
2. Package models as **Docker containers** and deploy them to the cluster.
3. Configure **Kubernetes Services** to expose APIs for model inference.
4. Use **ConfigMaps and Secrets** to manage parameters and credentials.

By defining these resources in Pulumi, deployments become **fully automated, repeatable, and scalable**.
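
As a rough illustration of steps 1-3, the sketch below defines a Deployment and a LoadBalancer Service for a hypothetical inference container. The image name, port, and environment variable are placeholders rather than part of the original demo, and `gke_provider` refers to the provider from the earlier sketch.

```python
import pulumi
import pulumi_kubernetes as k8s

# Placeholder image for the LLM + RAG inference server.
inference_image = "us-central1-docker.pkg.dev/my-project/ml/llm-rag-server:latest"
app_labels = {"app": "llm-rag"}

deployment = k8s.apps.v1.Deployment(
    "llm-rag-deployment",
    spec=k8s.apps.v1.DeploymentSpecArgs(
        replicas=2,
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels=app_labels),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[k8s.core.v1.ContainerArgs(
                    name="llm-rag",
                    image=inference_image,
                    ports=[k8s.core.v1.ContainerPortArgs(container_port=8080)],
                    # Non-sensitive settings belong in a ConfigMap; credentials
                    # (e.g. a vector-store API key) belong in a Secret.
                    env=[k8s.core.v1.EnvVarArgs(name="MODEL_NAME", value="llama-2-7b")],
                )],
            ),
        ),
    ),
    opts=pulumi.ResourceOptions(provider=gke_provider),
)

# Expose the inference API outside the cluster.
service = k8s.core.v1.Service(
    "llm-rag-service",
    spec=k8s.core.v1.ServiceSpecArgs(
        selector=app_labels,
        ports=[k8s.core.v1.ServicePortArgs(port=80, target_port=8080)],
        type="LoadBalancer",
    ),
    opts=pulumi.ResourceOptions(provider=gke_provider),
)

pulumi.export("inference_ip", service.status.load_balancer.ingress[0].ip)
```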

### Use Case 2: Training and Serving Custom Machine Learning Models

Beyond pre-trained LLMs, [Kubernetes](https://www.pulumi.com/docs/iac/clouds/kubernetes/) is ideal for **training and serving custom AI/ML models**. Pulumi can help automate every stage of the ML lifecycle.

#### Step 1: Set Up the Model Training Environment

Using Pulumi, we define a training environment (sketched below) with:

- A **Kubernetes Deployment** for training jobs (GPU-enabled).
- **Persistent Volume Claims (PVCs)** for storing training data and model artifacts.
- **Monitoring tools** for tracking performance.
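
A minimal sketch of such an environment is shown here; the image, storage size, and GPU count are illustrative placeholders, and `gke_provider` is the provider from the earlier sketch. It also assumes the node pool exposes NVIDIA GPUs (for example, via a GPU-enabled machine type and the GKE device plugin).

```python
import pulumi
import pulumi_kubernetes as k8s

# Persistent storage for training data and model artifacts.
artifacts_pvc = k8s.core.v1.PersistentVolumeClaim(
    "training-artifacts",
    spec=k8s.core.v1.PersistentVolumeClaimSpecArgs(
        access_modes=["ReadWriteOnce"],
        resources={"requests": {"storage": "100Gi"}},
    ),
    opts=pulumi.ResourceOptions(provider=gke_provider),
)

# GPU-enabled training workload with the PVC mounted at /data.
trainer_labels = {"app": "trainer"}
training = k8s.apps.v1.Deployment(
    "model-training",
    spec=k8s.apps.v1.DeploymentSpecArgs(
        replicas=1,
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=trainer_labels),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels=trainer_labels),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[k8s.core.v1.ContainerArgs(
                    name="trainer",
                    image="us-central1-docker.pkg.dev/my-project/ml/trainer:latest",
                    resources=k8s.core.v1.ResourceRequirementsArgs(
                        limits={"nvidia.com/gpu": "1"},
                    ),
                    volume_mounts=[k8s.core.v1.VolumeMountArgs(
                        name="artifacts", mount_path="/data",
                    )],
                )],
                volumes=[k8s.core.v1.VolumeArgs(
                    name="artifacts",
                    persistent_volume_claim=k8s.core.v1.PersistentVolumeClaimVolumeSourceArgs(
                        claim_name=artifacts_pvc.metadata.name,
                    ),
                )],
            ),
        ),
    ),
    opts=pulumi.ResourceOptions(provider=gke_provider),
)
```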

#### Step 2: Deploy and Serve the Trained Model

Once the model is trained, Pulumi can be used to:

- Deploy the trained model as a **Kubernetes Deployment**.
- Expose the model via a **Kubernetes Service** (REST API or gRPC).
- Add **autoscaling rules** for dynamic inference scaling.

Pulumi allows teams to **manage the entire AI/ML pipeline** in a structured and automated way.
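
For the autoscaling piece, one option is a `HorizontalPodAutoscaler` managed by Pulumi. The sketch below targets the hypothetical serving `deployment` from the earlier sketch and scales on CPU utilization; scaling on GPU or request metrics would require a custom or external metrics adapter.

```python
import pulumi
import pulumi_kubernetes as k8s

# Scale the model-serving Deployment between 1 and 5 replicas,
# targeting roughly 70% average CPU utilization.
hpa = k8s.autoscaling.v2.HorizontalPodAutoscaler(
    "llm-rag-hpa",
    spec=k8s.autoscaling.v2.HorizontalPodAutoscalerSpecArgs(
        scale_target_ref=k8s.autoscaling.v2.CrossVersionObjectReferenceArgs(
            api_version="apps/v1",
            kind="Deployment",
            name=deployment.metadata.name,
        ),
        min_replicas=1,
        max_replicas=5,
        metrics=[k8s.autoscaling.v2.MetricSpecArgs(
            type="Resource",
            resource=k8s.autoscaling.v2.ResourceMetricSourceArgs(
                name="cpu",
                target=k8s.autoscaling.v2.MetricTargetArgs(
                    type="Utilization",
                    average_utilization=70,
                ),
            ),
        )],
    ),
    opts=pulumi.ResourceOptions(provider=gke_provider),
)
```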

Try Jay’s demo code on [Creating an AI Training Platform on GKE with Pulumi](https://github.com/jasonsmithio/pulumi-experiments/tree/main/ai-ml-platform/gke-training).

## Why Use Pulumi for AI/ML on Kubernetes?

Pulumi provides several advantages for AI/ML teams running workloads on Kubernetes:

### 1. Use General-Purpose Languages for Infrastructure as Code

- Most AI/ML engineers already work with **Python** or **Go**, and Pulumi lets them manage infrastructure using the same language.
- No need to learn YAML or Kubernetes manifests—define everything programmatically.

### 2. Automate AI/ML Workflows

- Define infrastructure, training jobs, and model serving in **[one unified IaC framework](https://www.pulumi.com/blog/unified-programmatic-approach-infrastructure-management-bmw-using-pulumi/)**.
- Ensure **consistency** across development, staging, and production environments.

### 3. Improve Scalability and Cost Efficiency

- Pulumi integrates with **Google Cloud AI services** for optimized compute resources.
- Automate **autoscaling and resource allocation** for AI/ML workloads.

### 4. Increase Security and Compliance

- Manage credentials and secrets securely with **[Pulumi ESC (Secrets Management)](https://www.pulumi.com/docs/esc/)**.
- Apply **[policy-as-code](https://www.pulumi.com/docs/iac/using-pulumi/crossguard/)** to enforce security best practices (see the sketch below).
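
As an illustration of the policy-as-code point, here is a minimal CrossGuard policy-pack sketch. The specific rule and the property names it inspects are assumptions for illustration, not an official Pulumi policy.

```python
from pulumi_policy import (
    EnforcementLevel,
    PolicyPack,
    ResourceValidationArgs,
    ResourceValidationPolicy,
)

def require_private_endpoint(args: ResourceValidationArgs, report_violation):
    # Flag GKE clusters whose control-plane endpoint is not private.
    # (Property names are illustrative assumptions.)
    if args.resource_type == "gcp:container/cluster:Cluster":
        private_cfg = args.props.get("privateClusterConfig") or {}
        if not private_cfg.get("enablePrivateEndpoint"):
            report_violation("GKE clusters for ML workloads should use a private control-plane endpoint.")

PolicyPack(
    name="gke-ml-policies",
    enforcement_level=EnforcementLevel.ADVISORY,
    policies=[
        ResourceValidationPolicy(
            name="gke-private-endpoint",
            description="Require a private control-plane endpoint on GKE clusters.",
            validate=require_private_endpoint,
        ),
    ],
)
```

Running `pulumi preview --policy-pack <path-to-policy-pack>` evaluates the stack against these rules before any changes are applied.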

## Get Started with AI/ML on Kubernetes with Pulumi

Pulumi makes it easy to deploy, scale, and manage AI/ML workloads on Kubernetes, leveraging Google Cloud's AI infrastructure. Whether you're serving LLMs, training custom models, or automating ML pipelines, Pulumi provides a developer-friendly, scalable, and secure solution.

- [Explore AI/ML Projects using Pulumi](https://www.pulumi.com/blog/tag/ml/)
- [Discover Essential Kubernetes Best Practices](https://www.pulumi.com/blog/kubernetes-best-practices-i-wish-i-had-known-before/)
- [Get Started with Pulumi on Google Cloud](https://www.pulumi.com/docs/iac/clouds/gcp/)
- [Sign up for Pulumi ➡️](https://app.pulumi.com/signup)

By combining Kubernetes, Google Cloud, and Pulumi, you can accelerate AI/ML innovation while reducing infrastructure complexity.
@@ -21,8 +21,6 @@ tags:

The AI industry is stealing the show as tech's gold rush of the '20s. Just looking at ChatGPT's [record-setting user growth](https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/), and rapid third-party integration [by top brands](https://www.forbes.com/sites/bernardmarr/2023/05/30/10-amazing-real-world-examples-of-how-companies-are-using-chatgpt-in-2023/?sh=ed1c90f14418), it is not surprising the hype suggests this is the beginning of a major digital transformation.

{{< youtube "IWfPPlzzBzA?rel=0" >}}

However, using AI/ML in your own products has some major challenges and obstacles. Below is a diagram of the end-to-end workflow of building and using an AI model: preparing the data, training a model, fine-tuning a model, hosting and running a model, building a backend service to serve the model, and building the user interface that interacts with the model. Most AI engineers are only involved in a few steps of the process. Yet there is one challenge that is common across the entire workflow: creating and managing the cloud infrastructure is hard.

> Figure 1. A sneak peek at the map of ML development and product path
@@ -49,6 +47,8 @@ Before we get into the details of using Pulumi to deploy a private LlaMa 2 API c

If you are ready to try the *#MLOpsChallenge* along with us, then let's double-check a few requirements before starting the clock on our 30-minute chatbot challenge. While Pulumi supports many languages, we chose Python for this project, as Python is a familiar language in the AI industry.

{{< youtube "IWfPPlzzBzA?rel=0" >}}

### Requirements

* [Pulumi CLI](https://www.pulumi.com/docs/install/)