Reject non-finite persisted vector floats by QuakeWang · Pull Request #37 · apache/paimon-vector-index

QuakeWang · 2026-06-11T07:07:56Z

Summary

User input validation already rejects NaN and Infinity, but corrupted or older index files could still persist non-finite f32 values in reader-side float sections. Those values could enter search ranking and hit partial_cmp().unwrap() panic paths, crossing JNI as a panic-boundary error instead of a normal malformed-index error.

This PR rejects non-finite persisted f32 values while decoding v1 reader sections for IVF-FLAT, IVF-HNSW-FLAT, IVF-PQ, and IVF-HNSW-SQ. It also hardens reader-side top-k ordering with total_cmp and keeps intentional JNI panic-boundary tests separate from malformed-index error coverage.

Testing

cargo fmt --all -- --check
cargo build --all-targets
cargo test -p paimon-vindex-core
cargo clippy --all-targets --workspace -- -D warnings
cargo clippy -p paimon-vindex-jni --all-targets --features panic-boundary-test-hook -- -D warnings
mvn -f java/pom.xml test
cargo build -p paimon-vindex-jni --release --features panic-boundary-test-hook
java -cp java/target/classes:java/target/test-classes org.apache.paimon.index.vector.VectorIndexNativePanicBoundaryTest target/release/libpaimon_vindex_jni.dylib

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

JingsongLi · 2026-06-12T07:06:46Z

Potential performance concern: this PR adds finite-f32 validation while decoding persisted list payloads. For IVF-FLAT in particular, read_inverted_list() is on the search path and runs once per probed list (search) / unique probed list (search_batch). The new bytes_to_f32_vec_checked() first decodes the whole vector payload into Vec<f32> and then scans all values again with is_finite(), so each query adds an extra O(sum(count * d) over probed lists) pass over candidate vectors.

Could we either merge the finite check into the bytes-to-f32 decode loop to avoid the second pass, or validate/cache a list only once if the reader assumes the underlying index bytes are immutable? The one-time checks in ensure_loaded() look fine; the main concern is the repeated per-search list-vector validation.

QuakeWang · 2026-06-12T08:18:14Z

Potential performance concern: this PR adds finite-f32 validation while decoding persisted list payloads. For IVF-FLAT in particular, read_inverted_list() is on the search path and runs once per probed list (search) / unique probed list (search_batch). The new bytes_to_f32_vec_checked() first decodes the whole vector payload into Vec<f32> and then scans all values again with is_finite(), so each query adds an extra O(sum(count * d) over probed lists) pass over candidate vectors.

Could we either merge the finite check into the bytes-to-f32 decode loop to avoid the second pass, or validate/cache a list only once if the reader assumes the underlying index bytes are immutable? The one-time checks in ensure_loaded() look fine; the main concern is the repeated per-search list-vector validation.

@JingsongLi Thanks for pointing this out. I agree this validation was adding an avoidable second pass on the search path.

I updated the checked f32 decoding helpers used by IVF-FLAT / IVF-HNSW-FLAT list payload decoding to validate each value while decoding bytes into f32, so we no longer decode the full vector payload and scan it again.

I intentionally left the IVF-PQ ensure_loaded() checks unchanged because those are one-time metadata/codebook validations, not repeated per-search list-vector validation.

JingsongLi · 2026-06-12T11:32:35Z

I don't think we should pay regular costs for corrupted index files. Even though the current cost is not low, it is still a linear cost.

JingsongLi · 2026-06-12T11:36:54Z

I know we should have better validation, but I don't think we should overly pursue validation. The core should still focus on functional phenomena and problems, or performance improvement.

Reject non-finite persisted vector floats

b327c22

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

Avoid repeated finite validation pass while decoding floats

62c1b53

JingsongLi closed this Jun 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reject non-finite persisted vector floats#37

Reject non-finite persisted vector floats#37
QuakeWang wants to merge 2 commits into
apache:mainfrom
QuakeWang:reject-nonfinite-index

QuakeWang commented Jun 11, 2026

Uh oh!

JingsongLi commented Jun 12, 2026

Uh oh!

QuakeWang commented Jun 12, 2026

Uh oh!

JingsongLi commented Jun 12, 2026 •

edited

Loading

Uh oh!

JingsongLi commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

QuakeWang commented Jun 11, 2026

Summary

Testing

Uh oh!

JingsongLi commented Jun 12, 2026

Uh oh!

QuakeWang commented Jun 12, 2026

Uh oh!

JingsongLi commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JingsongLi commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JingsongLi commented Jun 12, 2026 •

edited

Loading