feat: Serving Gemma 2 with multiple LoRA adapters with Text Generation Inference (TGI) on Vertex AI notebook #1586
base: main
Conversation
Hi @inardini and reviewers,
I'm currently reviewing this pull request and will post my detailed review in a few minutes. In the meantime, here's a quick summary of the changes for you and other reviewers to get up to speed:
This pull request adds a new notebook demonstrating how to deploy the Gemma 2 2B model from Hugging Face Hub to Vertex AI, using multiple LoRA adapters for different tasks (like coding and SQL). It leverages Hugging Face's Text Generation Inference (TGI) Deep Learning Container (DLC) along with a custom handler for enhanced flexibility.
Here's a breakdown of the changes:
- open-models/README.md: A new entry was added (lines 9-12) to the README file, pointing to the new vertex_ai_tgi_gemma_multi_lora_adapters_deployment.ipynb notebook.
- open-models/serving/vertex_ai_tgi_gemma_multi_lora_adapters_deployment.ipynb: This is a completely new Jupyter notebook (1569 lines of code) that details the entire process, from setting up the environment and authentication to deploying the model and making predictions using different LoRA adapters. The notebook covers:
  - Setting up the environment and authentication (Hugging Face and Google Cloud).
  - Creating and testing a custom handler for managing multiple LoRA adapters (a minimal handler sketch follows this list).
  - Downloading the base Gemma 2 model and LoRA adapters from Hugging Face Hub.
  - Implementing an LLM-based router to select the appropriate adapter based on the user prompt (see the router sketch below).
  - Registering the model on Vertex AI.
  - Deploying the model to a Vertex AI endpoint.
  - Making online predictions via the Vertex AI SDK, gcloud CLI, and curl.
  - Cleaning up resources.
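To make the custom-handler step concrete, here is a minimal sketch of the shape such a handler typically takes, assuming the Hugging Face custom-handler convention (a handler.py exposing an EndpointHandler class). The adapter names, paths, and payload fields are illustrative assumptions, not the notebook's actual code:

```python
# handler.py -- minimal sketch of a multi-LoRA custom handler.
# Adapter ids and payload fields below are hypothetical.
from typing import Any, Dict

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = "") -> None:
        # Load the base Gemma 2 model once, then attach each LoRA adapter
        # under its own name so requests can switch between them cheaply.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        base = AutoModelForCausalLM.from_pretrained(path, device_map="auto")
        self.model = PeftModel.from_pretrained(
            base, f"{path}/adapters/coding", adapter_name="coding"
        )
        self.model.load_adapter(f"{path}/adapters/sql", adapter_name="sql")

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Pre-process: activate the adapter named in the request.
        prompt = data["inputs"]
        adapter = data.get("parameters", {}).get("adapter_id", "coding")
        self.model.set_adapter(adapter)
        # Inference + post-process: generate and decode the completion.
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output = self.model.generate(**inputs, max_new_tokens=256)
        return {"generated_text": self.tokenizer.decode(output[0], skip_special_tokens=True)}
```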
The overall intent is to provide a comprehensive and practical guide for deploying Gemma 2 with LoRA adapters on Vertex AI using the TGI DLC and a custom handler. This allows users to easily leverage the power of fine-tuned models for various tasks within a production-ready environment.
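The LLM-based router mentioned above can be as small as a single classification call against the base model. A minimal sketch, assuming a TGI endpoint reachable through huggingface_hub's InferenceClient; the category labels and adapter ids are made up for illustration:

```python
from huggingface_hub import InferenceClient

# Assumed local TGI endpoint; adapter ids below are hypothetical.
client = InferenceClient("http://localhost:8080")

ROUTER_PROMPT = (
    "Classify the request into exactly one word: coding, sql, or general.\n"
    "Request: {prompt}\nCategory:"
)


def route(prompt: str):
    """Ask the base model which adapter fits the prompt; None means base model."""
    category = client.text_generation(
        ROUTER_PROMPT.format(prompt=prompt), max_new_tokens=3
    ).strip().lower()
    # Unknown categories fall back to the base model (no adapter).
    return {"coding": "coding", "sql": "sql"}.get(category)
```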
And now, a little haiku to lighten the mood:
Code flows like a stream,
LoRA adapts, sharp and keen,
Vertex AI serves well.
Hi team,
I'm currently reviewing this pull request and will post my full review in a few minutes. In the meantime, here's a quick summary of the changes for you and other reviewers:
This pull request adds a new Jupyter notebook demonstrating how to deploy the Gemma 2 2B model from Hugging Face, along with multiple LoRA adapters, using Hugging Face's Text Generation Inference (TGI) on Google Vertex AI. The key improvements are:
- New Notebook: open-models/serving/vertex_ai_tgi_gemma_multi_lora_adapters_deployment.ipynb has been added. This notebook provides a comprehensive guide and code examples for deploying Gemma 2 with LoRA adapters for tasks like coding and SQL query generation.
- Custom Handler: The notebook implements a custom handler to manage the pre-processing, inference, and post-processing steps for efficient model serving on Vertex AI. This allows for flexible handling of multiple LoRA adapters based on the user's prompt.
- README Update: The open-models/README.md file has been updated to include a link to the new notebook, improving discoverability.
- Spelling Updates: The .github/actions/spelling/allow.txt file has been updated to include some additional words, likely to address false positives in the spelling checks.
The notebook covers the entire process, from setting up the environment and authentication to deploying the model and making online predictions using both the Vertex AI SDK and the gcloud CLI. It also includes a section on cleaning up resources after the deployment.
Overall, this PR significantly enhances the existing examples by providing a practical, detailed guide to a more sophisticated, adapter-based LLM deployment on Vertex AI.
Here's a little haiku to lighten the mood:
Code flows like a stream,
LoRA adapts, sharp and keen,
Vertex AI awaits.
},
"outputs": [],
"source": [
"handler_module = '''\n",
Would it be possible to move this to a separate .py file and upload it directly? This would allow proper syntax highlighting and linting.
I understand the formatting requirement. For learning purposes, though, don't you think it is better to have the code embedded in the notebook? This approach lets learners review the code alongside the explanations and possibly gain a deeper understanding of the code's logic.
@holtskinner let me know what you think.
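One possible middle ground, offered here as an assumption rather than something either reviewer proposed: keep the handler source visible in a notebook cell but write it out with IPython's %%writefile cell magic, so the same code also exists as a plain .py file that linters and syntax highlighters can process:

```python
%%writefile handler.py
# The cell body is saved to handler.py: the notebook still shows the source
# inline next to the explanations, while tooling works on the .py file.
from typing import Any, Dict


class EndpointHandler:  # hypothetical stub; the real handler lives in the notebook
    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return data
```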
Description
This notebook showcases how to deploy Gemma 2 2B from the Hugging Face Hub with multiple LoRA adapters fine-tuned for different purposes, such as coding or SQL, using Hugging Face's Text Generation Inference (TGI) Deep Learning Container (DLC) in combination with a custom handler on Vertex AI.
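For readers skimming the PR, a minimal sketch of the registration-and-deployment step with the Vertex AI SDK follows. The container URI is a placeholder, the LORA_ADAPTERS environment variable reflects TGI's multi-LoRA startup loading (the adapter ids are hypothetical), and the notebook's custom-handler setup may differ:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder image URI -- see the notebook for the exact TGI DLC image.
TGI_DLC_URI = "us-docker.pkg.dev/example/huggingface-text-generation-inference:latest"

# Register the model with the TGI serving container on Vertex AI.
model = aiplatform.Model.upload(
    display_name="gemma-2-2b-multi-lora",
    serving_container_image_uri=TGI_DLC_URI,
    serving_container_environment_variables={
        "MODEL_ID": "google/gemma-2-2b-it",
        # Hypothetical adapter ids, loaded by TGI at startup.
        "LORA_ADAPTERS": "my-org/gemma-2-coding-lora,my-org/gemma-2-sql-lora",
    },
)

# Deploy to an endpoint; machine and accelerator choices are assumptions.
endpoint = model.deploy(
    machine_type="g2-standard-8",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
```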
Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- Follow the CONTRIBUTING Guide.
- You are listed in CODEOWNERS for the file(s).
- Ensure the tests and linter pass (run nox -s format from the repository root to format).