
feat: use partition range cache in scan #7873

Open

evenyag wants to merge 9 commits into GreptimeTeam:main from evenyag:pr/partition-range-cache-read-path

Conversation

@evenyag
Contributor

@evenyag evenyag commented Mar 26, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Uses the partition range cache in the SeqScan and SeriesScan read paths.

Improves how the range cache buffers results.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

evenyag added 3 commits March 24, 2026 19:55
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
@github-actions github-actions bot added size/M docs-not-required This change does not impact docs. labels Mar 26, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances read performance by introducing and integrating a partition range cache into the sequential and series scan operations. The changes focus on optimizing memory management during batch processing through an asynchronous compaction mechanism and a dedicated memory limiter, ensuring efficient caching of frequently accessed data without excessive resource usage.

Highlights

  • Memory Limiter for Range Cache: Introduced a RangeResultMemoryLimiter to control memory usage during the concatenation of record batches for the range result cache, preventing excessive memory consumption.
  • Asynchronous Batch Compaction: Implemented an asynchronous mechanism for compacting record batches within the range cache, allowing for efficient processing and storage of cached data by combining smaller batches into larger ones.
  • Integration with Scan Paths: Integrated the new partition range caching logic into both SeqScan (sequential scan) and SeriesScan read paths, enabling these operations to leverage cached results for improved performance.
  • Refined Cache Value Structure: Refactored RangeScanCacheValue to store CachedBatchSlice objects, which contain the compacted RecordBatch and original slice lengths, facilitating accurate replay of cached data.
  • Improved Series Scan Testing: Updated the test_series_scan_flat function in scan_test.rs to use a more robust and explicit method for verifying series scan results, enhancing test reliability.
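The compaction-and-replay idea in the highlights above can be sketched in miniature. This is an illustrative model only: a Vec<i32> stands in for an arrow RecordBatch, and the field and method names are assumptions, not the PR's actual definitions.

```rust
// Toy model of caching compacted batches while remembering the
// original slice lengths so cached data can be replayed with the
// same batch boundaries the scan originally produced.
struct CachedBatchSlice {
    compacted: Vec<i32>,       // stand-in for the concatenated RecordBatch
    slice_lengths: Vec<usize>, // original batch lengths, for replay
}

impl CachedBatchSlice {
    /// Combine small batches into one, recording each original length.
    fn compact(batches: Vec<Vec<i32>>) -> Self {
        let slice_lengths = batches.iter().map(|b| b.len()).collect();
        let compacted = batches.into_iter().flatten().collect();
        Self { compacted, slice_lengths }
    }

    /// Replay the cached data as slices with the original boundaries.
    fn replay(&self) -> Vec<&[i32]> {
        let mut out = Vec::new();
        let mut offset = 0;
        for &len in &self.slice_lengths {
            out.push(&self.compacted[offset..offset + len]);
            offset += len;
        }
        out
    }
}

fn main() {
    let cached = CachedBatchSlice::compact(vec![vec![1, 2], vec![3], vec![4, 5, 6]]);
    let replayed = cached.replay();
    assert_eq!(replayed.len(), 3);
    assert_eq!(replayed[2], &[4, 5, 6][..]);
    println!("ok");
}
```

In arrow terms, the replay step would correspond to zero-copy `RecordBatch::slice` calls over the compacted batch, which is what makes storing one large batch plus lengths cheaper than storing many small batches.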

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a RangeResultMemoryLimiter to manage memory for range scan results, integrating it into the CacheManager and CacheStrategy. The caching mechanism for range scan results has been significantly refactored to use asynchronous batch compaction and memory limiting, with the CacheBatchBuffer now handling these operations. The scanning logic in seq_scan.rs and series_scan.rs has been updated to leverage this new caching infrastructure, including a new build_flat_partition_range_read function. Additionally, a test in scan_test.rs was refactored for more robust result comparison. A review comment highlights a potential inconsistency in RangeResultMemoryLimiter where the permit_bytes value is not stored, leading to acquire() relying on a hardcoded constant instead of the initialized value.
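The inconsistency flagged above follows a common pattern: a constructor accepts a configuration value that a later method silently ignores. A minimal hypothetical sketch of the fix (the struct layout, method names, and div_ceil accounting are assumptions for illustration, not the PR's actual code) is to store the configured permit size and derive permit counts from it:

```rust
// Hypothetical sketch: the buggy version computed permits from a
// hardcoded constant, so a limiter built with a different permit size
// acquired the wrong number of permits. Storing the configured value
// keeps construction and acquisition in sync.
struct RangeResultMemoryLimiter {
    permit_bytes: usize, // the fix: remember the value passed to `new`
}

impl RangeResultMemoryLimiter {
    fn new(permit_bytes: usize) -> Self {
        Self { permit_bytes }
    }

    /// Permits needed to admit `bytes` of batch data.
    /// Buggy version: `bytes.div_ceil(SOME_HARDCODED_CONST)`.
    fn permits_for(&self, bytes: usize) -> usize {
        bytes.div_ceil(self.permit_bytes)
    }
}

fn main() {
    let limiter = RangeResultMemoryLimiter::new(4096);
    assert_eq!(limiter.permits_for(8192), 2);
    assert_eq!(limiter.permits_for(1), 1); // round up: partial permit still needed
    println!("ok");
}
```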

evenyag added 3 commits March 26, 2026 21:58
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
@github-actions github-actions bot added size/L and removed size/M labels Mar 27, 2026
@evenyag evenyag marked this pull request as ready for review March 30, 2026 03:01

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 018c0dd337


@evenyag evenyag mentioned this pull request Mar 30, 2026
Signed-off-by: evenyag <realevenyag@gmail.com>
@evenyag evenyag requested a review from discord9 March 30, 2026 07:27
Deadlock Chain

1. Range-level merge tasks: Each concurrent build_flat_partition_range_read (lines 494-506) calls build_flat_reader_from_sources → create_parallel_flat_sources → spawn_flat_scan_task. These background tasks loop: acquire permit → input.next() → release permit.
2. Final merge tasks: After all range tasks return streams (lines 509-511), the distributor calls build_flat_reader_from_sources again (lines 520-527) → create_parallel_flat_sources → more spawn_flat_scan_task tasks. These also loop: acquire permit → input.next() → release permit.
3. Circular wait: The final merge tasks' input.next() reads from ReceiverStreams backed by the range-level merge tasks. If all num_partitions permits are held by final merge tasks blocked on input.next(), the range-level merge tasks cannot acquire permits to produce data → deadlock.
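The circular wait above reduces to a simple accounting invariant: when both stages draw from one shared permit pool, progress is possible only while at least one permit is not held by a blocked final-merge consumer. A toy model (assumed numbers, not the real scheduler or semaphore) makes the condition explicit:

```rust
// Toy model of the shared-permit deadlock: `total` permits are shared
// by both pipeline stages; `final_merge_waiting` of them are held by
// final-merge tasks blocked on input.next(). A range-level producer
// needs at least one free permit to emit a batch and unblock a consumer.
fn can_make_progress(total: usize, final_merge_waiting: usize) -> bool {
    total > final_merge_waiting
}

fn main() {
    // All num_partitions permits held by blocked consumers: circular wait.
    assert!(!can_make_progress(4, 4));
    // One spare permit lets a range-level task produce, breaking the cycle.
    assert!(can_make_progress(4, 3));
    println!("ok");
}
```

This is why fixes for this class of deadlock typically either give each pipeline stage its own permit pool or release permits before blocking on upstream input, so the producer side can never be starved by its own consumers.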

Signed-off-by: evenyag <realevenyag@gmail.com>
@discord9
Contributor

discord9 commented Apr 1, 2026

Would it be useful to add a test with multiple partition ranges (each with multiple sources) and with both permits being small, like 1 or 2?

@v0y4g3r v0y4g3r added this pull request to the merge queue Apr 1, 2026
@v0y4g3r v0y4g3r removed this pull request from the merge queue due to a manual request Apr 1, 2026
Signed-off-by: evenyag <realevenyag@gmail.com>

Labels

docs-not-required This change does not impact docs. size/L

3 participants