feat: add Google embedding integration #1304

Open: bwook00 wants to merge 8 commits into develop

Conversation

@bwook00 (Contributor) commented Jul 24, 2025

Description

Add Google Embedding provider

@Pouyanpi,

I used langchain-google-genai.

However, if the LangChain dependency is a concern, I can switch to calling the Google SDK directly (from google import genai).

Related Issue(s)

Fixes #1292

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.


Documentation preview

https://nvidia.github.io/NeMo-Guardrails/review/pr-1304

@Pouyanpi requested review from Copilot and Pouyanpi on July 29, 2025, 12:06

Copilot AI left a comment

Pull Request Overview

This PR adds Google embedding integration to the NeMo Guardrails project by implementing a new GoogleEmbeddingModel provider that uses the langchain-google-genai library.

  • Implements GoogleEmbeddingModel class with sync/async embedding capabilities
  • Adds comprehensive test suite for Google embeddings functionality
  • Updates documentation to include Google as a supported embedding provider

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| nemoguardrails/embeddings/providers/google.py | Main implementation of GoogleEmbeddingModel class with encoding methods |
| nemoguardrails/embeddings/providers/__init__.py | Registers the Google embedding provider in the system |
| tests/test_embeddings_google.py | Comprehensive test suite including sync/async tests and live integration tests |
| tests/test_configs/with_google_embeddings/config.yml | Test configuration for Google embeddings |
| tests/test_configs/with_google_embeddings/config.co | Test flow configuration |
| docs/user-guides/configuration-guide.md | Documentation update adding Google to supported providers table |

Comments suppressed due to low confidence (1)

tests/test_embeddings_google.py:70

  • This function has the same name as the async function on line 52 but different signature. Consider renaming to 'test_sync_live_query' to differentiate from the async version.
def test_live_query(app):
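
For context on why a duplicated test name matters: in Python, a second def with the same name in one module rebinds that name, so the earlier function is no longer reachable by name and a collector such as pytest sees only the later one. The sketch below uses placeholder bodies and a simplified signature; it is not the PR's actual test code.

```python
import inspect

calls = []


async def test_live_query():
    # Placeholder for an async test; this definition gets shadowed below.
    calls.append("async")


def test_live_query():
    # Same name as above: this assignment replaces the async function.
    calls.append("sync")


# Only the second definition is still bound to the name.
survivor_is_async = inspect.iscoroutinefunction(test_live_query)
test_live_query()
```

Giving the two tests distinct names, as the comment suggests, would keep both visible to the test runner.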

Comment on lines +64 to +67
    self.embedding_size = self.embedding_size_dict[self.model]
else:
    # Perform a first encoding to get the embedding size
    self.embedding_size = len(self.encode(["test"])[0])

Copilot AI Jul 29, 2025

Making an actual API call during initialization to determine embedding size could cause unnecessary latency and API costs. Consider using a placeholder or lazy initialization approach.

Suggested change
-    self.embedding_size = self.embedding_size_dict[self.model]
-else:
-    # Perform a first encoding to get the embedding size
-    self.embedding_size = len(self.encode(["test"])[0])
+    self._embedding_size = self.embedding_size_dict[self.model]
+else:
+    # Defer embedding size determination until it is accessed
+    self._embedding_size = None
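
The suggestion stores None but does not show how the size would later be resolved. One possible way to complete it is a cached property, sketched below. The class name LazySizeEmbedder, the encode_fn callable, and the known_size parameter are hypothetical names introduced for illustration; they are not part of the PR or of NeMo Guardrails.

```python
from typing import Callable, List, Optional

Encoder = Callable[[List[str]], List[List[float]]]


class LazySizeEmbedder:
    """Hypothetical stand-in for an embedding provider (not the PR's class)."""

    def __init__(self, encode_fn: Encoder, known_size: Optional[int] = None):
        # encode_fn is an assumed callable that maps texts to vectors.
        self._encode_fn = encode_fn
        # None means "not determined yet"; no request is made here.
        self._embedding_size = known_size

    @property
    def embedding_size(self) -> int:
        # Probe the encoder only on first access, then cache the result.
        if self._embedding_size is None:
            self._embedding_size = len(self._encode_fn(["test"])[0])
        return self._embedding_size
```

With this shape, constructing the object makes no request, and a model whose size is already known never triggers the probe. The trade-off is that the first read of embedding_size for an unknown model may block on a network call.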

Copilot uses AI. Check for mistakes.

Comment on lines +57 to +64
self.embedding_size_dict = {
    "gemini-embedding-001": 3072,
    "text-embedding-005": 768,
    "text-multilingual-embedding-002": 768,
}

if self.model in self.embedding_size_dict:
    self.embedding_size = self.embedding_size_dict[self.model]

Copilot AI Jul 29, 2025

[nitpick] The embedding size dictionary is hardcoded in the constructor. Consider moving this to a class-level constant or configuration file to improve maintainability when new models are added.

Suggested change
-self.embedding_size_dict = {
-    "gemini-embedding-001": 3072,
-    "text-embedding-005": 768,
-    "text-multilingual-embedding-002": 768,
-}
-if self.model in self.embedding_size_dict:
-    self.embedding_size = self.embedding_size_dict[self.model]
+# Mapping of embedding models to their respective sizes.
+embedding_size_dict = {
+    "gemini-embedding-001": 3072,
+    "text-embedding-005": 768,
+    "text-multilingual-embedding-002": 768,
+}
+if self.model in self.__class__.embedding_size_dict:
+    self.embedding_size = self.__class__.embedding_size_dict[self.model]
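
A class-level lookup such as self.__class__.embedding_size_dict only works if the dictionary is assigned in the class body; a plain name assigned inside __init__ is a local variable, not a class attribute. A minimal sketch of the class-level placement follows, assuming a hypothetical class SizedEmbeddingModel and attribute name EMBEDDING_SIZES. The model names and sizes are the ones quoted in the comment above.

```python
from typing import Dict, Optional


class SizedEmbeddingModel:
    """Hypothetical class used only to illustrate the attribute placement."""

    # Defined in the class body, so it is shared by all instances and
    # reachable as self.EMBEDDING_SIZES or SizedEmbeddingModel.EMBEDDING_SIZES.
    EMBEDDING_SIZES: Dict[str, int] = {
        "gemini-embedding-001": 3072,
        "text-embedding-005": 768,
        "text-multilingual-embedding-002": 768,
    }

    def __init__(self, model: str):
        self.model = model
        # Unknown models yield None here; how to resolve them is left open.
        self.embedding_size: Optional[int] = self.EMBEDDING_SIZES.get(model)
```

Adding a new model then means editing one mapping at the top of the class rather than the constructor body.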


@codecov-commenter

Codecov Report

❌ Patch coverage is 40.90909% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.40%. Comparing base (bee719b) to head (4e0cd50).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| nemoguardrails/embeddings/providers/google.py | 35.00% | 13 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1304      +/-   ##
===========================================
- Coverage    70.45%   70.40%   -0.05%     
===========================================
  Files          161      162       +1     
  Lines        16214    16235      +21     
===========================================
+ Hits         11423    11431       +8     
- Misses        4791     4804      +13     

| Flag | Coverage Δ |
| --- | --- |
| python | 70.40% <40.90%> (-0.05%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| nemoguardrails/embeddings/providers/__init__.py | 96.42% <100.00%> (+0.13%) ⬆️ |
| nemoguardrails/embeddings/providers/google.py | 35.00% <35.00%> (ø) |

Development

Successfully merging this pull request may close these issues.

feature: Add Google embedding provider
2 participants