Feature/fix opensearch vector mapping #2399
Merged
+37
−6
Description
This PR also includes the changes proposed in 2376 -- I'm closing that one in favor of this (Mauricio)
The current OpenSearch configuration requires username and password for authentication, which is not compliant with security policies in many enterprises that enforce AWS IAM-based authentication (e.g., AWS SigV4 via SAML or IAM roles).
This feature request proposes adding support for AWS authentication methods such as AWSV4SignerAuth or AWS4Auth, which are already supported by the opensearch-py library. This would enable seamless authentication via AWS IAM roles, improving security, compliance, and ease of integration with AWS-hosted OpenSearch domains.
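A minimal sketch of what IAM-based authentication could look like with opensearch-py, assuming credentials come from the default boto3 credential chain. The helper names `make_sigv4_auth` and `build_client_kwargs` are hypothetical and are not taken from this PR; the AWS libraries are imported lazily so the configuration helper can be used without them.

```python
from typing import Any, Dict, Optional


def make_sigv4_auth(region: str, service: str = "es") -> Any:
    """Build a SigV4 signer from the default AWS credential chain.

    Hypothetical helper: requires boto3 and opensearch-py to be installed.
    Use service="aoss" for OpenSearch Serverless collections.
    """
    # Imported lazily so the rest of this sketch works without AWS libraries.
    import boto3
    from opensearchpy import AWSV4SignerAuth

    credentials = boto3.Session().get_credentials()
    return AWSV4SignerAuth(credentials, region, service)


def build_client_kwargs(
    host: str, port: int = 443, http_auth: Optional[Any] = None
) -> Dict[str, Any]:
    """Assemble keyword arguments for opensearchpy.OpenSearch.

    http_auth may be a (username, password) tuple or a SigV4 signer,
    so both authentication styles share one code path.
    """
    kwargs: Dict[str, Any] = {
        "hosts": [{"host": host, "port": port}],
        "use_ssl": True,
        "verify_certs": True,
    }
    if http_auth is not None:
        kwargs["http_auth"] = http_auth
    return kwargs


# Usage (assumes opensearch-py and boto3 are installed; host and region are placeholders):
# from opensearchpy import OpenSearch, RequestsHttpConnection
# auth = make_sigv4_auth("us-east-1")
# client = OpenSearch(
#     connection_class=RequestsHttpConnection,
#     **build_client_kwargs("my-domain.example.com", http_auth=auth),
# )
```

Accepting either a credential tuple or a signer object in `http_auth` would keep the existing username/password configuration working while allowing IAM roles where policy requires them.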
Fixes #2375
Issue
The OpenSearch integration had a critical limitation with vector search filtering. The `create_index()` method was creating OpenSearch indices without explicitly specifying a vector engine, which caused OpenSearch to default to using `nmslib` as the vector engine. This default engine doesn't support query filters during search operations; it only allows post-query filtering, which is less efficient.
When filters are applied with `nmslib`, the system runs the vector search first and applies the filters to the results afterwards. This approach is inefficient because it processes and ranks potentially irrelevant vectors that will later be filtered out.
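To make the distinction concrete, the sketch below builds two OpenSearch k-NN query bodies: one that places the filter inside the `knn` clause (filtering during the search, which the Lucene engine supports) and one that uses `post_filter` (filtering after the nearest neighbors are chosen). The field name `vector_field`, the `user_id` filter key, and both function names are illustrative assumptions, not code from this PR.

```python
from typing import Any, Dict, List


def _term_clauses(filters: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {"key": value} pairs into exact-match term clauses."""
    return [{"term": {key: value}} for key, value in filters.items()]


def knn_query_with_filter(
    vector: List[float], k: int, filters: Dict[str, Any], field: str = "vector_field"
) -> Dict[str, Any]:
    """Filter is evaluated as part of the k-NN search itself."""
    return {
        "size": k,
        "query": {
            "knn": {
                field: {
                    "vector": vector,
                    "k": k,
                    "filter": {"bool": {"must": _term_clauses(filters)}},
                }
            }
        },
    }


def knn_query_with_post_filter(
    vector: List[float], k: int, filters: Dict[str, Any], field: str = "vector_field"
) -> Dict[str, Any]:
    """Filter is applied only after the k nearest neighbors are selected,
    so fewer than k hits may remain."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
        "post_filter": {"bool": {"must": _term_clauses(filters)}},
    }
```

With the post-filter form, a search for `k` neighbors can return fewer than `k` results when most of the nearest vectors belong to other users, which is the inefficiency described above.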
Solution
This PR updates the `create_index()` method to explicitly configure the Lucene engine with the HNSW algorithm for vector search, matching the configuration already present in the `create_col()` method.
Benefits
With this change, the OpenSearch integration now configures indices consistently across the `create_index()` and `create_col()` methods, both using the same vector engine.
Technical Details
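As a rough illustration of the mapping change described under Solution, the sketch below builds an index body that names the engine explicitly instead of relying on the server default. The function name `build_index_body`, the field name `vector_field`, the `cosinesimil` space type, and the dimension value are assumptions chosen for the example and may differ from the merged code.

```python
from typing import Any, Dict


def build_index_body(
    dimension: int,
    engine: str = "lucene",
    space_type: str = "cosinesimil",
    field: str = "vector_field",
) -> Dict[str, Any]:
    """Return an index body whose knn_vector field pins the engine and algorithm."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    # Naming the engine avoids falling back to the server default.
                    "method": {
                        "engine": engine,
                        "name": "hnsw",
                        "space_type": space_type,
                    },
                }
            }
        },
    }


# Usage (assumes an opensearch-py client named `client`; the index name is a placeholder):
# client.indices.create(index="memories", body=build_index_body(1536))
```

Sharing one builder between `create_index()` and `create_col()` is one way to keep both methods on the same engine, though the PR may achieve this differently.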
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Please delete options that are not relevant.
Checklist:
Maintainer Checklist