-
Notifications
You must be signed in to change notification settings - Fork 152
Grouped top k operator #993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Cursor Agent can help with this pull request. Just |
🦋 Changeset detectedLatest commit: 190a02b The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
More templates
@tanstack/angular-db
@tanstack/db
@tanstack/db-ivm
@tanstack/electric-db-collection
@tanstack/offline-transactions
@tanstack/powersync-db-collection
@tanstack/query-db-collection
@tanstack/react-db
@tanstack/rxdb-db-collection
@tanstack/solid-db
@tanstack/svelte-db
@tanstack/trailbase-db-collection
@tanstack/vue-db
commit: |
|
Size Change: 0 B Total Size: 88.2 kB ℹ️ View Unchanged
|
|
Size Change: 0 B Total Size: 3.35 kB ℹ️ View Unchanged
|
This commit introduces a new operator that allows for topK operations to be performed independently on groups of data. Co-authored-by: kevin.de.porre <[email protected]>
Introduce TopKState<K, T> to encapsulate the common state and operations for managing a single topK window (multiplicity tracking and topK data structure). Both TopKWithFractionalIndexOperator (single instance) and GroupedTopKWithFractionalIndexOperator (one instance per group) now use this helper class, eliminating code duplication. Also extract handleMoveIn and handleMoveOut as standalone functions that can be reused by both operators.
fc2fc60 to
1b9d7ea
Compare
1b9d7ea to
88dbcf6
Compare
KyleAMathews
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review: PR #993
Overall Assessment: Approve ✅
This is excellent foundational work that both refactors existing code AND adds new capability. Like Brigham Young organizing the Saints into companies with captains of tens and hundreds, this PR introduces a hierarchical grouping structure for topK operations while maintaining clean separation of concerns.
Summary
The PR introduces:
TopKArray- Extracted TopK data structure with fractional indexingTopKState- Encapsulated state management (multiplicity tracking + TopK)GroupedTopKWithFractionalIndexOperator- Per-group topK windows- Refactored original
topKWithFractionalIndexto use shared abstractions
Strengths
-
Clean Extraction & Reuse: The
TopKArrayandTopKStateclasses are cleanly extracted and reusable. -
Memory Management: Empty groups are cleaned up to prevent memory leaks via
#cleanupGroupIfEmpty. -
Extensible Design: The
createTopKmethod is protected and can be overridden for different implementations (e.g., B+ tree). -
Stable Ordering: The
createKeyedComparatorprovides deterministic ordering by using the row key as a tie-breaker. -
Comprehensive Test Suite: Tests cover all major scenarios including incremental updates, removals, multiple groups, offset support, and window movement.
Minor Suggestions
-
topKArray.ts:183 - The TODO comment about refactoring insert/delete is a good observation. The window-shifting logic could potentially be unified.
-
topKState.ts:61-65 - The error message for B+-tree move could be more informative about what the caller should do instead.
-
groupedTopKWithFractionalIndex.ts:107 - The TODO comment should be resolved or turned into an issue.
Question
For the grouped operator, when all items in a group are removed and then items are added back to that same group, does the fractional indexing start fresh? This seems correct (new group = new TopKArray = fresh fractional indices), but worth confirming.
Solid, well-organized code that enables important hierarchical data projection capabilities! 🏛️
Followup on #738.
Implements the same strategy as in that PR to support grouped topK but as a new grouped topK operator. The logic regarding an individual topK is extracted into a common
TopKStateclass that is reused by both the original topK operator and the grouped topK operator.