Reject non-finite persisted vector floats #37
Conversation
Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
Potential performance concern: this PR adds finite-f32 validation while decoding persisted list payloads. For IVF-FLAT in particular, [...] Could we either merge the finite check into the bytes-to-f32 decode loop to avoid the second pass, or validate/cache a list only once if the reader assumes the underlying index bytes are immutable? The one-time checks in [...]
@JingsongLi Thanks for pointing this out. I agree this validation was adding an avoidable second pass on the search path. I updated the checked f32 decoding helpers used by IVF-FLAT / IVF-HNSW-FLAT list payload decoding to validate each value while decoding bytes into [...] I intentionally left the IVF-PQ [...]
I don't think we should pay regular costs for corrupted index files. Even though the current cost is not low, it is still a linear cost.
I know we should have better validation, but I don't think we should overly pursue validation. The core should still focus on functional phenomena and problems, or performance improvement.
Summary
User input validation already rejects NaN and Infinity, but corrupted or older index files could still persist non-finite f32 values in reader-side float sections. Those values could enter search ranking and hit partial_cmp().unwrap() panic paths, crossing JNI as a panic-boundary error instead of a normal malformed-index error.
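To illustrate the panic path described above, here is a minimal Rust sketch. It is not code from this PR: `sort_by_distance` and `nan_is_unordered` are hypothetical names, and the `(id, distance)` candidate layout is an assumption. It shows that `partial_cmp` returns `None` when a NaN is involved (so `.unwrap()` would panic), while `f32::total_cmp` always yields an ordering.

```rust
/// Hypothetical helper (not from the PR): sort (id, distance) candidates
/// in ascending distance order. `f32::total_cmp` is a total order over all
/// f32 values, including NaN, so the comparator cannot panic.
pub fn sort_by_distance(candidates: &mut [(u32, f32)]) {
    candidates.sort_by(|a, b| a.1.total_cmp(&b.1));
}

/// Returns true when `x` cannot be ordered against a NaN with `partial_cmp`.
/// This holds for every `x`, which is why `partial_cmp().unwrap()` panics
/// as soon as a NaN reaches the ranking code.
pub fn nan_is_unordered(x: f32) -> bool {
    // 0x7fc0_0000 is a positive quiet NaN bit pattern.
    let nan = f32::from_bits(0x7fc0_0000);
    x.partial_cmp(&nan).is_none()
}
```

Under `total_cmp`, a positive NaN sorts after positive infinity, so a corrupted distance ends up at the tail of an ascending ranking instead of aborting the search.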
This PR rejects non-finite persisted f32 values while decoding v1 reader sections for IVF-FLAT, IVF-HNSW-FLAT, IVF-PQ, and IVF-HNSW-SQ. It also hardens reader-side top-k ordering with total_cmp and keeps intentional JNI panic-boundary tests separate from malformed-index error coverage.
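The single-pass validation discussed in the conversation could look roughly like the following Rust sketch. It is an illustration under stated assumptions, not the PR's implementation: `decode_finite_f32_le` and `DecodeError` are hypothetical names, and little-endian byte order is assumed because the on-disk layout is not stated here.

```rust
/// Hypothetical error type for this sketch (not the PR's actual error type).
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// The payload length is not a multiple of 4 bytes.
    TruncatedPayload { len: usize },
    /// The value at `index` decoded to NaN or +/- infinity.
    NonFinite { index: usize },
}

/// Decode a byte payload into f32 values, rejecting non-finite values in the
/// same loop that performs the conversion, so no second pass is needed.
/// Assumption: values are stored little-endian.
pub fn decode_finite_f32_le(bytes: &[u8]) -> Result<Vec<f32>, DecodeError> {
    if bytes.len() % 4 != 0 {
        return Err(DecodeError::TruncatedPayload { len: bytes.len() });
    }
    let mut out = Vec::with_capacity(bytes.len() / 4);
    for (index, chunk) in bytes.chunks_exact(4).enumerate() {
        let value = f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        // Validate while decoding: fail fast with a normal error, not a panic.
        if !value.is_finite() {
            return Err(DecodeError::NonFinite { index });
        }
        out.push(value);
    }
    Ok(out)
}
```

Returning a `Result` lets the caller surface a malformed-index error through its usual error path, and the extra `is_finite` check per value stays inside the loop that already touches every element.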
Testing