-
Notifications
You must be signed in to change notification settings - Fork 0
ivf #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
standard 1 level clustering or faster two level hierarchical clustering. Clustering algorithm uses AMX on supported Intel (R) Xeon (R) systems. Also, added support for BF16 datatype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds experimental support for IVF (Inverted File) indexing to the SVS library, introducing a new vector search algorithm that organizes data using K-means clustering. The implementation includes support for various data types (float, float16, bfloat16) and distance metrics (L2, MIP).
- Adds IVF index implementation with K-means and hierarchical K-means clustering
- Introduces BFloat16 data type support throughout the codebase
- Provides utilities for building, searching, and converting data for IVF indexes
Reviewed Changes
Copilot reviewed 58 out of 61 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
utils/search_ivf.cpp |
Utility for searching IVF indexes with benchmarking capabilities |
utils/convert_data_to_bfloat16.cpp |
Utility to convert data to BFloat16 format |
utils/build_ivf.cpp |
Utility for building IVF clustering from datasets |
include/svs/lib/bfloat16.h |
New BFloat16 data type implementation |
include/svs/orchestrators/ivf.h |
Main IVF orchestrator interface |
include/svs/index/ivf/ |
Core IVF index implementation files |
tests/ |
Comprehensive test coverage for IVF functionality |
bindings/python/src/ivf.cpp |
Python bindings for IVF functionality |
| // Small epsilon value used for floating-point comparisons to avoid precision | ||
| // issues. The value 1/1024 (approximately 0.0009765625) is chosen as a reasonable | ||
| // threshold for numerical stability in algorithms such as k-means clustering, where exact | ||
| // equality of floating-point values is rare due to rounding errors. |
Copilot
AI
Aug 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment references an EPSILON constant but the constant is not defined in this file. Consider defining the EPSILON constant near this comment or include a reference to where it's defined.
| // equality of floating-point values is rare due to rounding errors. | |
| // equality of floating-point values is rare due to rounding errors. | |
| constexpr float EPSILON = 1.0f / 1024.0f; |
| centroids.set_datum(i, centroids.get_datum(j)); | ||
| for (size_t k = 0; k < dims; k++) { | ||
| if (k % 2 == 0) { | ||
| centroids.get_datum(i)[k] *= 1 + EPSILON; |
Copilot
AI
Aug 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EPSILON is used here but not defined in this file. The constant should be defined with an appropriate value and documentation about why this specific value is chosen.
|
|
||
| int svs_main(std::vector<std::string>&& args) { | ||
| if (args.size() != 14) { | ||
| std::cout << "Expected 13 arguments. Instead, got " << args.size() - 1 << ". " |
Copilot
AI
Aug 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message says 'Expected 13 arguments' but the condition checks for 14 arguments (args.size() != 14). This creates a confusing mismatch between the expected count in the message and the actual check.
| std::cout << "Expected 13 arguments. Instead, got " << args.size() - 1 << ". " | |
| std::cout << "Expected 13 user-supplied arguments (plus program name). Instead, got " << args.size() - 1 << ". " |
|
|
||
| int svs_main(std::vector<std::string> args) { | ||
| if (args.size() != 8) { | ||
| std::cout << "Expected 7 arguments. Instead, got " << args.size() - 1 << ". " |
Copilot
AI
Aug 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message says 'Expected 7 arguments' but the condition checks for 8 arguments (args.size() != 8). This creates a confusing mismatch between the expected count in the message and the actual check.
| std::cout << "Expected 7 arguments. Instead, got " << args.size() - 1 << ". " | |
| std::cout << "Expected 7 arguments. Instead, got " << (args.size() - 1) << ". " |
| // performance. This value was chosen based on empirical testing to avoid excessive memory | ||
| // allocation while supporting large batch operations typical in high-throughput | ||
| // environments. | ||
| const size_t MAX_QUERY_BATCH_SIZE = 10000; |
Copilot
AI
Aug 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic number 10000 should be documented with a comment explaining why this specific value was chosen and what factors influence this decision (memory usage, performance characteristics, etc.).
|
|
||
| // On GCC, we need to add this attribute so that BFloat16 members can appear inside | ||
| // packed structs. | ||
| class __attribute__((packed)) BFloat16 { |
Copilot
AI
Aug 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider using a more portable approach for packed attributes. The GCC-specific attribute((packed)) may not work on all compilers. Consider using a portable macro or pragma pack for better cross-platform compatibility.
trial