feat: use partition range cache in scan#7873
feat: use partition range cache in scan#7873evenyag wants to merge 9 commits intoGreptimeTeam:mainfrom
Conversation
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances read performance by introducing and integrating a partition range cache into the sequential and series scan operations. The changes focus on optimizing memory management during batch processing through an asynchronous compaction mechanism and a dedicated memory limiter, ensuring efficient caching of frequently accessed data without excessive resource usage. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a RangeResultMemoryLimiter to manage memory for range scan results, integrating it into the CacheManager and CacheStrategy. The caching mechanism for range scan results has been significantly refactored to use asynchronous batch compaction and memory limiting, with the CacheBatchBuffer now handling these operations. The scanning logic in seq_scan.rs and series_scan.rs has been updated to leverage this new caching infrastructure, including a new build_flat_partition_range_read function. Additionally, a test in scan_test.rs was refactored for more robust result comparison. A review comment highlights a potential inconsistency in RangeResultMemoryLimiter where the permit_bytes value is not stored, leading to acquire() relying on a hardcoded constant instead of the initialized value.
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 018c0dd337
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Signed-off-by: evenyag <realevenyag@gmail.com>
Deadlock Chain 1. Range-level merge tasks: Each concurrent build_flat_partition_range_read (line 494-506) calls build_flat_reader_from_sources → create_parallel_flat_sources → spawn_flat_scan_task. These background tasks loop: acquire permit → input.next() → release permit. 2. Final merge tasks: After all range tasks return streams (line 509-511), the distributor calls build_flat_reader_from_sources again (line 520-527) → create_parallel_flat_sources → more spawn_flat_scan_task tasks. These also loop: acquire permit → input.next() → release permit. 3. Circular wait: The final merge tasks' input.next() reads from ReceiverStreams backed by range-level merge tasks. If all num_partitions permits are held by final merge tasks blocked on input.next(), the range-level merge tasks can't acquire permits to produce data → deadlock. Signed-off-by: evenyag <realevenyag@gmail.com>
|
would it be useful to add a test with multiple partition range(each with multiple sources) and with both permit being small like 1 or 2? |
Signed-off-by: evenyag <realevenyag@gmail.com>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Uses partition range cache in SeqScan and SeriesScan read paths.
Improves how the range cache buffers results.
PR Checklist
Please convert it to a draft if some of the following conditions are not met.