Skip to content

Conversation

@ibhati
Copy link
Owner

@ibhati ibhati commented Aug 6, 2025

trial

@ibhati ibhati requested a review from Copilot August 6, 2025 20:18

This comment was marked as outdated.

@ibhati ibhati requested a review from Copilot August 6, 2025 21:07

This comment was marked as outdated.

@ibhati ibhati requested a review from Copilot August 6, 2025 21:30
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds experimental support for IVF (Inverted File) indexing to the SVS library, introducing a new vector search algorithm that organizes data using K-means clustering. The implementation includes support for various data types (float, float16, bfloat16) and distance metrics (L2, MIP).

  • Adds IVF index implementation with K-means and hierarchical K-means clustering
  • Introduces BFloat16 data type support throughout the codebase
  • Provides utilities for building, searching, and converting data for IVF indexes

Reviewed Changes

Copilot reviewed 58 out of 61 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
utils/search_ivf.cpp Utility for searching IVF indexes with benchmarking capabilities
utils/convert_data_to_bfloat16.cpp Utility to convert data to BFloat16 format
utils/build_ivf.cpp Utility for building IVF clustering from datasets
include/svs/lib/bfloat16.h New BFloat16 data type implementation
include/svs/orchestrators/ivf.h Main IVF orchestrator interface
include/svs/index/ivf/ Core IVF index implementation files
tests/ Comprehensive test coverage for IVF functionality
bindings/python/src/ivf.cpp Python bindings for IVF functionality

// Small epsilon value used for floating-point comparisons to avoid precision
// issues. The value 1/1024 (approximately 0.0009765625) is chosen as a reasonable
// threshold for numerical stability in algorithms such as k-means clustering, where exact
// equality of floating-point values is rare due to rounding errors.
Copy link

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment references an EPSILON constant but the constant is not defined in this file. Consider defining the EPSILON constant near this comment or include a reference to where it's defined.

Suggested change
// equality of floating-point values is rare due to rounding errors.
// equality of floating-point values is rare due to rounding errors.
constexpr float EPSILON = 1.0f / 1024.0f;

Copilot uses AI. Check for mistakes.
centroids.set_datum(i, centroids.get_datum(j));
for (size_t k = 0; k < dims; k++) {
if (k % 2 == 0) {
centroids.get_datum(i)[k] *= 1 + EPSILON;
Copy link

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EPSILON is used here but not defined in this file. The constant should be defined with an appropriate value and documentation about why this specific value is chosen.

Copilot uses AI. Check for mistakes.

int svs_main(std::vector<std::string>&& args) {
if (args.size() != 14) {
std::cout << "Expected 13 arguments. Instead, got " << args.size() - 1 << ". "
Copy link

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message says 'Expected 13 arguments' but the condition checks for 14 arguments (args.size() != 14). This creates a confusing mismatch between the expected count in the message and the actual check.

Suggested change
std::cout << "Expected 13 arguments. Instead, got " << args.size() - 1 << ". "
std::cout << "Expected 13 user-supplied arguments (plus program name). Instead, got " << args.size() - 1 << ". "

Copilot uses AI. Check for mistakes.

int svs_main(std::vector<std::string> args) {
if (args.size() != 8) {
std::cout << "Expected 7 arguments. Instead, got " << args.size() - 1 << ". "
Copy link

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message says 'Expected 7 arguments' but the condition checks for 8 arguments (args.size() != 8). This creates a confusing mismatch between the expected count in the message and the actual check.

Suggested change
std::cout << "Expected 7 arguments. Instead, got " << args.size() - 1 << ". "
std::cout << "Expected 7 arguments. Instead, got " << (args.size() - 1) << ". "

Copilot uses AI. Check for mistakes.
// performance. This value was chosen based on empirical testing to avoid excessive memory
// allocation while supporting large batch operations typical in high-throughput
// environments.
const size_t MAX_QUERY_BATCH_SIZE = 10000;
Copy link

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic number 10000 should be documented with a comment explaining why this specific value was chosen and what factors influence this decision (memory usage, performance characteristics, etc.).

Copilot uses AI. Check for mistakes.

// On GCC, we need to add this attribute so that BFloat16 members can appear inside
// packed structs.
class __attribute__((packed)) BFloat16 {
Copy link

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider using a more portable approach for packed attributes. The GCC-specific attribute((packed)) may not work on all compilers. Consider using a portable macro or pragma pack for better cross-platform compatibility.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants