
feat: Serving Gemma 2 with multiple LoRA adapters with Text Generation Inference (TGI) on Vertex AI notebook #1586

Open · wants to merge 10 commits into base: main
Conversation

inardini (Contributor)
Description

This notebook showcases how to deploy Gemma 2 2B from the Hugging Face Hub with multiple LoRA adapters fine-tuned for different purposes, such as coding or SQL, using Hugging Face's Text Generation Inference (TGI) Deep Learning Container (DLC) in combination with a custom handler on Vertex AI.
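For context on what "multiple LoRA adapters" means at request time: TGI serves one base model and lets each request name the adapter it wants. A minimal sketch of the request-body shape, assuming TGI's multi-LoRA `adapter_id` parameter; the adapter id "sql" is a hypothetical name, not one from this notebook:

```python
# Illustrative sketch, not code from the notebook: building a TGI generate
# request body that targets one of the served LoRA adapters. TGI's
# multi-LoRA support selects the adapter per request via an "adapter_id"
# parameter; the adapter id used here ("sql") is hypothetical.
import json


def build_tgi_request(prompt, adapter_id=None, max_new_tokens=256):
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id  # omit to use the base model
    return json.dumps({"inputs": prompt, "parameters": parameters})


payload = build_tgi_request(
    "Write a SQL query for total sales by region.", adapter_id="sql"
)
```

A request without `adapter_id` would fall through to the base Gemma 2 model.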

Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
    • Your account is listed in CODEOWNERS for the file(s).
  • Ensure your Pull Request title follows the https://www.conventionalcommits.org/ specification.
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

@inardini inardini requested a review from a team as a code owner December 30, 2024 14:38

@code-review-assist bot left a comment

Hi @inardini and reviewers,

I'm currently reviewing this pull request and will post my detailed review in a few minutes. In the meantime, here's a quick summary of the changes for you and other reviewers to get up to speed:

This pull request adds a new notebook demonstrating how to deploy the Gemma 2 2B model from Hugging Face Hub to Vertex AI, using multiple LoRA adapters for different tasks (like coding and SQL). It leverages Hugging Face's Text Generation Inference (TGI) Deep Learning Container (DLC) along with a custom handler for enhanced flexibility.

Here's a breakdown of the changes:

  • open-models/README.md: A new entry was added (lines 9-12) to the README file, pointing to the new vertex_ai_tgi_gemma_multi_lora_adapters_deployment.ipynb notebook.
  • open-models/serving/vertex_ai_tgi_gemma_multi_lora_adapters_deployment.ipynb: This is a completely new Jupyter Notebook (1569 lines of code) that details the entire process, from setting up the environment and authentication to deploying the model and making predictions using different LoRA adapters. The notebook covers:
    • Setting up the environment and authentication (Hugging Face and Google Cloud).
    • Creating and testing a custom handler for managing multiple LoRA adapters.
    • Downloading the base Gemma 2 model and LoRA adapters from Hugging Face Hub.
    • Implementing an LLM-based router to select the appropriate adapter based on the user prompt.
    • Registering the model on Vertex AI.
    • Deploying the model to a Vertex AI endpoint.
    • Making online predictions via the Vertex AI SDK, gcloud CLI, and curl.
    • Cleaning up resources.
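Of the steps above, the router is the one with a non-obvious contract: prompt in, adapter id (or nothing) out. The notebook uses an LLM-based router; as a hedged stand-in, the same contract can be sketched with keyword matching, with hypothetical adapter names:

```python
# Minimal stand-in for the notebook's adapter router. The notebook routes
# with an LLM; this sketch uses simple keyword matching instead, purely to
# illustrate the routing contract (prompt in, adapter id or None out).
# The adapter names "sql" and "code" are hypothetical.
def route_adapter(prompt):
    p = prompt.lower()
    if any(k in p for k in ("select", "sql", "query", "table")):
        return "sql"
    if any(k in p for k in ("python", "function", "code", "bug")):
        return "code"
    return None  # fall back to the base Gemma 2 model


chosen = route_adapter("Write a SQL query joining two tables")
```

Whatever the router returns is then passed along as the per-request adapter selection.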

The overall intent is to provide a comprehensive and practical guide for deploying Gemma 2 with LoRA adapters on Vertex AI using the TGI DLC and a custom handler. This allows users to easily leverage the power of fine-tuned models for various tasks within a production-ready environment.


And now, a little haiku to lighten the mood:

Code flows like a stream,
LoRA adapts, sharp and keen,
Vertex AI serves well.

Gemini encountered an error creating the review. You can try again by commenting @code-review-assist review.


Gemini encountered an error creating the summary. You can try again by commenting @code-review-assist summarize.

@inardini inardini requested a review from gericdong December 30, 2024 14:44
@code-review-assist bot left a comment

Hi team,

I'm currently reviewing this pull request and will post my full review in a few minutes. In the meantime, here's a quick summary of the changes for you and other reviewers:

This pull request adds a new Jupyter notebook demonstrating how to deploy the Gemma 2 2B model from Hugging Face, along with multiple LoRA adapters, using Hugging Face's Text Generation Inference (TGI) on Google Vertex AI. The key improvements are:

  • New Notebook: open-models/serving/vertex_ai_tgi_gemma_multi_lora_adapters_deployment.ipynb has been added. This notebook provides a comprehensive guide and code examples for deploying Gemma 2 with LoRA adapters for tasks like coding and SQL query generation.
  • Custom Handler: The notebook implements a custom handler to manage the pre-processing, inference, and post-processing steps for efficient model serving on Vertex AI. This allows for flexible handling of multiple LoRA adapters based on the user's prompt.
  • README Update: The open-models/README.md file has been updated to include a link to the new notebook, improving discoverability.
  • Spelling Updates: The .github/actions/spelling/allow.txt file has been updated to include some additional words, likely to address false positives in the spelling checks.

The notebook covers the entire process, from setting up the environment and authentication to deploying the model and making online predictions using both the Vertex AI SDK and the gcloud CLI. It also includes a section on cleaning up resources after the deployment.
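The custom handler's pre-processing/inference/post-processing split mentioned above can be sketched as a class. This follows the Hugging Face custom-handler convention of an `EndpointHandler` with a `__call__` method, which is an assumption here; the model call is stubbed out so only the structure is illustrated:

```python
# Hedged sketch of the handler shape described in the summary, not the
# notebook's actual handler. Inference is stubbed with an echo so the
# pre-process / predict / post-process structure is the point.
class EndpointHandler:
    def __init__(self, model_dir="."):
        # A real handler would load the base model and LoRA adapters here.
        self.model_dir = model_dir

    def __call__(self, data):
        # Pre-processing: pull the prompt and requested adapter from the payload.
        prompt = data.get("inputs", "")
        adapter = data.get("parameters", {}).get("adapter_id")
        # Inference (stubbed): a real handler would call the model here.
        text = f"[{adapter or 'base'}] echo: {prompt}"
        # Post-processing: wrap the result in the response schema.
        return {"generated_text": text}


handler = EndpointHandler()
out = handler({"inputs": "hi", "parameters": {"adapter_id": "sql"}})
```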

Overall, this PR significantly enhances the existing examples by providing a practical and detailed guide to a more sophisticated, adapter-based LLM deployment on Vertex AI.


Here's a little haiku to lighten the mood:

Code flows like a stream,
LoRA adapts, sharp and keen,
Vertex AI awaits.


Gemini encountered an error creating the review. You can try again by commenting @code-review-assist review.

open-models/README.md (review thread: outdated, resolved)
},
"outputs": [],
"source": [
"handler_module = '''\n",
Collaborator


Would it be possible to move this to a separate .py file and upload it directly? That would allow proper syntax highlighting and linting.
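One possible middle ground (a sketch, not code from the PR): keep the source embedded in the notebook as a string, but also write it out to a .py file so linters and highlighters can operate on it. The file name `handler.py` and the trivial module contents are illustrative:

```python
# Sketch: write the notebook's embedded module string to a real file so
# tooling (linting, syntax highlighting) can run against it. The module
# contents below are a trivial placeholder, not the PR's handler code.
from pathlib import Path

handler_module = '''\
def health():
    """Trivial placeholder so the file is lintable."""
    return "ok"
'''

path = Path("handler.py")
path.write_text(handler_module)
```

The notebook could then both display the string for learners and ship the file for deployment.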

inardini (Contributor, Author)

I understand the formatting concern. For learning purposes, don't you think it is better to have the code embedded in the notebook? That way learners can review the code alongside the explanations and gain a deeper understanding of its logic.

inardini (Contributor, Author)

@holtskinner let me know what you think

@inardini inardini removed the request for review from gericdong January 6, 2025 08:27
2 participants