[python] Support native batch vector search in Lumina reader #8280

Draft
XiaoHongbo-Hope wants to merge 2 commits into apache:master from XiaoHongbo-Hope:batch_search_fix

Conversation

@XiaoHongbo-Hope

Contributor

Purpose

Tests

XiaoHongbo-Hope and others added 2 commits June 18, 2026 03:43
Batch vector search in pypaimon previously looped over single searches,
issuing one search_list call per query vector. The Lumina native binding
already exposes batch entry points, search_list(flat, n, k) and
search_with_filter_list, the same native calls the Java path uses. Route
batch search through them.

- Add BatchVectorSearch predicate (mirrors Java BatchVectorSearch).
- GlobalIndexReader: default visit_batch_vector_search fans out to single
  search for readers without a native batch path.
- OffsetGlobalIndexReader: offset wrapper for batch.
- Lumina reader: native visit_batch_vector_search flattens the n query
  vectors into one (n * dim) buffer, calls search_list/search_with_filter_list
  once, and slices each query's results from [q * k, q * k + k).
- read_batch: one batch call per split, merged per query vector across
  splits (mirrors Java BatchVectorReadImpl).
- Tests: cover the default fan-out and the native batch path; add a real
  Lumina batch-vs-single equivalence regression test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
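
The flatten-and-slice step described in the commit above can be sketched as follows. This is only an illustration, not the code in this PR: `fake_search_list` is a hypothetical brute-force stand-in for the native `search_list(flat, n, k)` call, and the `-1` padding sentinel, the squared-L2 scoring, and the `batch_search` helper name are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Hit = Tuple[int, float]  # (row_id, squared L2 distance); assumed result shape


def fake_search_list(index: Sequence[Sequence[float]], flat: Sequence[float],
                     n: int, k: int) -> Tuple[List[int], List[float]]:
    """Hypothetical stand-in for a native batch call (NOT the Lumina API).

    Takes n query vectors packed into one flat buffer and returns two flat
    arrays of length n * k, padded with the assumed sentinel id -1.
    """
    dim = len(flat) // n
    ids = [-1] * (n * k)
    dists = [float("inf")] * (n * k)
    for q in range(n):
        query = flat[q * dim:(q + 1) * dim]
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(query, vec)), row_id)
            for row_id, vec in enumerate(index)
        )
        for j, (dist, row_id) in enumerate(scored[:k]):
            ids[q * k + j] = row_id
            dists[q * k + j] = dist
    return ids, dists


def batch_search(index: Sequence[Sequence[float]],
                 queries: Sequence[Sequence[float]], k: int) -> List[List[Hit]]:
    """Flatten n queries into one (n * dim) buffer, search once, slice per query."""
    n = len(queries)
    dim = len(index[0])
    if any(len(q) != dim for q in queries):
        raise ValueError("every query vector must have the index dimension")
    flat = [x for q in queries for x in q]  # one (n * dim) buffer
    ids, dists = fake_search_list(index, flat, n, k)  # single batch call
    results = []
    for q in range(n):
        window = range(q * k, q * k + k)  # query q owns [q * k, q * k + k)
        results.append([(ids[i], dists[i]) for i in window if ids[i] != -1])
    return results
```

With this layout a single search is just the `n == 1` case, so `batch_search(index, [v], k)[0]` should match the corresponding entry of a larger batch, which is the kind of equivalence the regression test in this commit checks against the real binding.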
visit_vector_search and visit_batch_vector_search shared an almost
identical body (flatten + dim check, effective_k clamping, filter vs
non-filter native call, sentinel result collection). Extract a single
_run_search(vectors, ...) that both delegate to -- single search is just
the n == 1 case -- removing the duplicated logic and the n == 1 special
case so the two paths can no longer drift.

Also return [None] * n from _eval_batch on empty index files so the
per-split batch result type is uniform.
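
Why a uniform `[None] * n` per-split result helps can be seen in a small merge sketch. This is a hedged illustration, not the PR's `read_batch` or `_eval_batch` code: the `(row_id, score)` hit shape, the "lower score is better" ordering, and the names `empty_split_result` and `merge_batch_results` are assumptions introduced for the example.

```python
import heapq
from typing import List, Optional, Tuple

Hit = Tuple[int, float]                   # (row_id, score); assumed shape
SplitResult = List[Optional[List[Hit]]]   # one slot per query vector


def empty_split_result(n: int) -> SplitResult:
    """Result for a split with no index files: one None slot per query."""
    return [None] * n


def merge_batch_results(per_split: List[SplitResult], n: int, k: int) -> List[List[Hit]]:
    """Merge per-split batch results into one top-k list per query vector.

    Assumes lower scores are better (distance-like). Because every split
    returns exactly n slots, the merge needs no special case for empty splits.
    """
    merged = []
    for q in range(n):
        candidates: List[Hit] = []
        for split in per_split:
            if len(split) != n:
                raise ValueError("each split must return one slot per query")
            hits = split[q]
            if hits is not None:
                candidates.extend(hits)
        merged.append(heapq.nsmallest(k, candidates, key=lambda hit: hit[1]))
    return merged
```

If an empty split returned a bare `None` or an empty list instead, the inner loop would need a type check before indexing by `q`; keeping the shape uniform moves that concern out of the merge.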