[python] Support native batch vector search in Lumina reader #8280
Draft
XiaoHongbo-Hope wants to merge 2 commits into
Batch vector search in pypaimon looped single searches (one search_list call per query vector). The Lumina native binding already exposes a batch entry: search_list(flat, n, k) / search_with_filter_list, the same native call the Java path uses. Route batch search through it.

- Add BatchVectorSearch predicate (mirrors Java BatchVectorSearch).
- GlobalIndexReader: default visit_batch_vector_search fans out to single search for readers without a native batch path.
- OffsetGlobalIndexReader: offset wrapper for batch.
- Lumina reader: native visit_batch_vector_search flattens the n query vectors into one (n * dim) buffer, calls search_list/search_with_filter_list once, and slices each query's results from [q * k, q * k + k).
- read_batch: one batch call per split, merged per query vector across splits (mirrors Java BatchVectorReadImpl).
- Tests: cover the default fan-out and the native batch path; add a real Lumina batch-vs-single equivalence regression test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
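The flatten-then-slice step described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `flatten_queries`, `slice_results`, `batch_search` and the brute-force `make_toy_native` stand-in are hypothetical names, and the assumed return shape of the native call (two flat lists of length n * k) may differ from what the Lumina binding actually returns.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed shape of a native batch call: (flat buffer, n, k) -> (ids, distances),
# each a flat list of length n * k. The real binding may differ.
NativeBatch = Callable[[List[float], int, int], Tuple[List[int], List[float]]]


def flatten_queries(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Concatenate n query vectors into one buffer of n * dim floats."""
    flat: List[float] = []
    for v in vectors:
        if len(v) != dim:
            raise ValueError(f"expected dimension {dim}, got {len(v)}")
        flat.extend(float(x) for x in v)
    return flat


def slice_results(ids: List[int], dists: List[float], n: int, k: int):
    """Query q owns the half-open slice [q * k, q * k + k) of the flat results."""
    return [
        list(zip(ids[q * k:q * k + k], dists[q * k:q * k + k]))
        for q in range(n)
    ]


def batch_search(native: NativeBatch, vectors, dim: int, k: int):
    """One native call for all n queries instead of n separate calls."""
    n = len(vectors)
    ids, dists = native(flatten_queries(vectors, dim), n, k)
    return slice_results(ids, dists, n, k)


def make_toy_native(data: Sequence[Sequence[float]], dim: int) -> NativeBatch:
    """Brute-force squared-L2 stand-in for the native index (illustration only)."""
    def native(flat: List[float], n: int, k: int):
        ids: List[int] = []
        dists: List[float] = []
        for q in range(n):
            query = flat[q * dim:(q + 1) * dim]
            scored = sorted(
                (sum((a - b) ** 2 for a, b in zip(query, row)), i)
                for i, row in enumerate(data)
            )[:k]
            # Pad with a placeholder id when fewer than k rows exist.
            scored += [(float("inf"), -1)] * (k - len(scored))
            ids.extend(i for _, i in scored)
            dists.extend(d for d, _ in scored)
        return ids, dists
    return native
```

With the toy stand-in, running the queries as one batch and running them one at a time should give the same per-query results, which is the property the batch-vs-single regression test is meant to guard.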
visit_vector_search and visit_batch_vector_search shared an almost identical body (flatten + dim check, effective_k clamping, filter vs non-filter native call, sentinel result collection). Extract a single _run_search(vectors, ...) that both delegate to -- single search is just the n == 1 case -- removing the duplicated logic and the n == 1 special case so the two paths can no longer drift. Also return [None] * n from _eval_batch on empty index files so the per-split batch result type is uniform.
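A minimal sketch of the shape this refactor describes, where single search is the n == 1 case of one shared helper. Everything here is assumed for illustration: the `SketchReader` class, its constructor arguments, the `-1` sentinel, clamping `k` to the row count, and the toy native function are not taken from the PR, and the real `_run_search` signature may differ.

```python
from typing import Callable, List, Optional, Sequence, Set

SENTINEL_ID = -1  # assumed "empty slot" marker; the real sentinel may differ


def make_toy_native(data: Sequence[Sequence[float]], dim: int) -> Callable:
    """Brute-force stand-in returning a flat list of n * k ids (illustration only)."""
    def search(flat: List[float], n: int, k: int, allowed: Optional[Set[int]] = None):
        out: List[int] = []
        for q in range(n):
            query = flat[q * dim:(q + 1) * dim]
            scored = sorted(
                (sum((a - b) ** 2 for a, b in zip(query, row)), i)
                for i, row in enumerate(data)
                if allowed is None or i in allowed
            )[:k]
            top = [i for _, i in scored]
            out.extend(top + [SENTINEL_ID] * (k - len(top)))
        return out
    return search


class SketchReader:
    """Hypothetical reader: both visit methods delegate to _run_search."""

    def __init__(self, native: Callable, dim: int, row_count: int):
        self._native = native
        self._dim = dim
        self._row_count = row_count

    def _run_search(self, vectors, k: int, allowed: Optional[Set[int]] = None):
        n = len(vectors)
        flat: List[float] = []
        for v in vectors:  # flatten + dim check
            if len(v) != self._dim:
                raise ValueError(f"expected dimension {self._dim}, got {len(v)}")
            flat.extend(float(x) for x in v)
        effective_k = min(k, self._row_count)  # assumed clamping rule
        if allowed is None:  # non-filter vs filter native call
            ids = self._native(flat, n, effective_k)
        else:
            ids = self._native(flat, n, effective_k, allowed)
        # Collect per-query slices, dropping sentinel slots.
        return [
            [i for i in ids[q * effective_k:(q + 1) * effective_k] if i != SENTINEL_ID]
            for q in range(n)
        ]

    def visit_vector_search(self, vector, k: int, allowed=None):
        return self._run_search([vector], k, allowed)[0]  # the n == 1 case

    def visit_batch_vector_search(self, vectors, k: int, allowed=None):
        return self._run_search(vectors, k, allowed)


def eval_batch_sketch(index_files: list, n: int, run: Callable) -> list:
    """Return a uniform list of n entries even when there is nothing to search."""
    if not index_files:
        return [None] * n
    return run(index_files)
```

Because both entry points go through the same helper, a change to the clamping or sentinel handling applies to single and batch search alike.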
Purpose
Tests