feat: add Google embedding integration #1304
base: develop
Conversation
Pull Request Overview
This PR adds Google embedding integration to the NeMo Guardrails project by implementing a new GoogleEmbeddingModel provider that uses the langchain-google-genai library.
- Implements GoogleEmbeddingModel class with sync/async embedding capabilities
- Adds comprehensive test suite for Google embeddings functionality
- Updates documentation to include Google as a supported embedding provider
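For orientation, here is a minimal sketch of the kind of provider the overview describes, built on langchain-google-genai. The class name and engine name come from the PR description; the constructor signature and the encode/encode_async method names are assumptions about the nemoguardrails provider interface, not the exact code under review.

```python
from typing import List

from langchain_google_genai import GoogleGenerativeAIEmbeddings


class GoogleEmbeddingModel:
    """Illustrative Google embedding provider backed by langchain-google-genai."""

    engine_name = "google"  # assumed registry name

    def __init__(self, embedding_model: str, **kwargs):
        self.model = embedding_model
        # GoogleGenerativeAIEmbeddings reads GOOGLE_API_KEY from the environment
        # unless an explicit google_api_key argument is provided.
        self.client = GoogleGenerativeAIEmbeddings(model=embedding_model, **kwargs)

    def encode(self, documents: List[str]) -> List[List[float]]:
        # Synchronous batch embedding of a list of documents.
        return self.client.embed_documents(documents)

    async def encode_async(self, documents: List[str]) -> List[List[float]]:
        # Async variant via LangChain's aembed_documents.
        return await self.client.aembed_documents(documents)
```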
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| nemoguardrails/embeddings/providers/google.py | Main implementation of the GoogleEmbeddingModel class with encoding methods |
| nemoguardrails/embeddings/providers/__init__.py | Registers the Google embedding provider in the system |
| tests/test_embeddings_google.py | Comprehensive test suite including sync/async tests and live integration tests |
| tests/test_configs/with_google_embeddings/config.yml | Test configuration for Google embeddings |
| tests/test_configs/with_google_embeddings/config.co | Test flow configuration |
| docs/user-guides/configuration-guide.md | Documentation update adding Google to the supported providers table |
Comments suppressed due to low confidence (1)
tests/test_embeddings_google.py:70

This function has the same name as the async function on line 52 but a different signature. Consider renaming it to 'test_sync_live_query' to differentiate it from the async version.

```python
def test_live_query(app):
```
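The rename matters beyond readability: a second def with the same name at module scope silently overwrites the first, so pytest would only collect one of the two tests. A minimal sketch of the suggested layout, with placeholder bodies rather than the PR's actual test code:

```python
import pytest


@pytest.mark.asyncio
async def test_live_query(app):
    # Async live-query test (placeholder body).
    ...


def test_sync_live_query(app):
    # Renamed sync variant; the distinct name keeps both tests collected.
    ...
```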
```python
    self.embedding_size = self.embedding_size_dict[self.model]
else:
    # Perform a first encoding to get the embedding size
    self.embedding_size = len(self.encode(["test"])[0])
```
Making an actual API call during initialization to determine embedding size could cause unnecessary latency and API costs. Consider using a placeholder or lazy initialization approach.
Suggested change:

```python
    self._embedding_size = self.embedding_size_dict[self.model]
else:
    # Defer embedding size determination until it is accessed
    self._embedding_size = None
```
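One way to realize the deferred approach is to expose the size as a property that performs a single probe request on first access and caches the result. A rough sketch of that pattern, assuming encode is available on the same class; this is not the code in the PR:

```python
from typing import List, Optional


class LazyEmbeddingSizeMixin:
    """Illustrative mixin: resolve the embedding size only when first needed."""

    _embedding_size: Optional[int] = None

    def encode(self, documents: List[str]) -> List[List[float]]:
        raise NotImplementedError  # provided by the concrete provider

    @property
    def embedding_size(self) -> int:
        if self._embedding_size is None:
            # Probe the API once, on first access, instead of during __init__.
            self._embedding_size = len(self.encode(["test"])[0])
        return self._embedding_size
```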
```python
self.embedding_size_dict = {
    "gemini-embedding-001": 3072,
    "text-embedding-005": 768,
    "text-multilingual-embedding-002": 768,
}

if self.model in self.embedding_size_dict:
    self.embedding_size = self.embedding_size_dict[self.model]
```
[nitpick] The embedding size dictionary is hardcoded in the constructor. Consider moving this to a class-level constant or configuration file to improve maintainability when new models are added.
Suggested change:

```python
# Mapping of embedding models to their respective sizes.
embedding_size_dict = {
    "gemini-embedding-001": 3072,
    "text-embedding-005": 768,
    "text-multilingual-embedding-002": 768,
}

if self.model in self.__class__.embedding_size_dict:
    self.embedding_size = self.__class__.embedding_size_dict[self.model]
```
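Taken together with the lazy-initialization comment above, the constructor could shrink to something like the following. This combines the two review suggestions for illustration only; names such as EMBEDDING_SIZES are mine, not the PR's:

```python
from typing import Dict, Optional


class GoogleEmbeddingModel:
    # Class-level mapping of known models to their embedding sizes;
    # supporting a new model only requires adding an entry here.
    EMBEDDING_SIZES: Dict[str, int] = {
        "gemini-embedding-001": 3072,
        "text-embedding-005": 768,
        "text-multilingual-embedding-002": 768,
    }

    def __init__(self, embedding_model: str):
        self.model = embedding_model
        # Known models resolve immediately; unknown models stay None until probed lazily.
        self._embedding_size: Optional[int] = self.EMBEDDING_SIZES.get(embedding_model)
```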
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```diff
@@             Coverage Diff             @@
##           develop    #1304      +/-   ##
===========================================
- Coverage    70.45%   70.40%   -0.05%
===========================================
  Files          161      162       +1
  Lines        16214    16235      +21
===========================================
+ Hits         11423    11431       +8
- Misses        4791     4804      +13
```

Flags with carried forward coverage won't be shown.
Description
Add Google Embedding provider
@Pouyanpi, I used langchain-google-genai. However, if you're worried about the langchain dependency, I can change the implementation to directly use `from google import genai` instead (a rough sketch of that alternative follows).
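For reference, a direct-SDK variant of the provider might look roughly like this. It is a hypothetical sketch of the alternative mentioned above, not code from this PR; the embed_content call and response shape reflect my reading of the google-genai client and should be verified against its current docs.

```python
from typing import List

from google import genai


class DirectGoogleEmbeddingModel:
    """Hypothetical provider variant that calls the google-genai SDK directly."""

    def __init__(self, embedding_model: str = "gemini-embedding-001"):
        self.model = embedding_model
        # The client typically picks up the API key from the environment
        # (e.g. GOOGLE_API_KEY) when no explicit api_key is supplied.
        self.client = genai.Client()

    def encode(self, documents: List[str]) -> List[List[float]]:
        # Batch-embed the documents and return plain lists of floats.
        response = self.client.models.embed_content(
            model=self.model,
            contents=documents,
        )
        return [embedding.values for embedding in response.embeddings]
```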
Related Issue(s)
Fixes #1292
Checklist