feat: use partition range cache in scan by evenyag · Pull Request #7873 · GreptimeTeam/greptimedb

evenyag · 2026-03-26T13:35:42Z

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Uses partition range cache in SeqScan and SeriesScan read paths.

Improves how the range cache buffers results.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.
This PR requires documentation updates.
API changes are backward compatible.
Schema or data changes are backward compatible.

Signed-off-by: evenyag <realevenyag@gmail.com>

gemini-code-assist · 2026-03-26T13:36:08Z

Summary of Changes


This pull request aims to improve read performance by integrating a partition range cache into the sequential (SeqScan) and series (SeriesScan) scan paths. To keep memory bounded while batches are buffered for the cache, it adds an asynchronous batch compaction step and a dedicated memory limiter.

Highlights

Memory Limiter for Range Cache: Introduces a RangeResultMemoryLimiter that bounds the memory used while concatenating record batches for the range result cache.
Asynchronous Batch Compaction: Compacts record batches in the range cache asynchronously, combining smaller batches into larger ones before they are stored.
Integration with Scan Paths: Integrates the partition range cache into both the SeqScan (sequential scan) and SeriesScan read paths, so these scans can reuse cached results.
Refined Cache Value Structure: Refactors RangeScanCacheValue to store CachedBatchSlice objects, each containing a compacted RecordBatch and the original slice lengths, so cached data can be replayed accurately.
Improved Series Scan Testing: Updates test_series_scan_flat in scan_test.rs to verify series scan results more explicitly and robustly.
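The "Refined Cache Value Structure" highlight pairs a compacted batch with the original slice lengths. A minimal std-only sketch of that idea follows. It assumes a plain `Vec<i64>` in place of Arrow's `RecordBatch`; `CachedSlice`, `compact`, and `replay` are hypothetical names, not the PR's `CachedBatchSlice` API, and the compaction here runs synchronously, whereas the PR performs the concat in the background.

```rust
/// Hypothetical stand-in for a compacted cache entry: several small batches
/// concatenated into one buffer, plus each original batch length.
/// (Vec<i64> replaces Arrow's RecordBatch to keep the sketch std-only.)
struct CachedSlice {
    rows: Vec<i64>,
    slice_lens: Vec<usize>,
}

/// Concatenates small batches into one buffer, recording each batch length.
fn compact(batches: &[Vec<i64>]) -> CachedSlice {
    let total: usize = batches.iter().map(Vec::len).sum();
    let mut rows = Vec::with_capacity(total);
    let mut slice_lens = Vec::with_capacity(batches.len());
    for batch in batches {
        rows.extend_from_slice(batch);
        slice_lens.push(batch.len());
    }
    CachedSlice { rows, slice_lens }
}

/// Replays the cached entry using the original batch boundaries.
fn replay(cached: &CachedSlice) -> Vec<&[i64]> {
    let mut out = Vec::with_capacity(cached.slice_lens.len());
    let mut offset = 0;
    for &len in &cached.slice_lens {
        out.push(&cached.rows[offset..offset + len]);
        offset += len;
    }
    out
}

fn main() {
    let batches = vec![vec![1, 2], vec![3], vec![4, 5, 6]];
    let cached = compact(&batches);
    // Copy the replayed slices back out so they can be compared with the input.
    let replayed: Vec<Vec<i64>> = replay(&cached).iter().map(|s| s.to_vec()).collect();
    assert_eq!(replayed, batches);
    println!("{} rows in one buffer, {} slices", cached.rows.len(), replayed.len());
}
```

Keeping the lengths lets the cache hold one contiguous allocation while still handing readers batches with the same boundaries the scan originally produced.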


gemini-code-assist

Code Review

This pull request introduces a RangeResultMemoryLimiter to manage memory for range scan results and integrates it into CacheManager and CacheStrategy. The caching of range scan results is refactored to use asynchronous batch compaction and memory limiting, both now handled by CacheBatchBuffer. The scan logic in seq_scan.rs and series_scan.rs is updated to use this caching infrastructure, including a new build_flat_partition_range_read function. A test in scan_test.rs is also refactored for more robust result comparison. One review comment flags a potential inconsistency in RangeResultMemoryLimiter: the permit_bytes value passed at construction is not stored, so acquire() relies on a hard-coded constant instead of the configured value.
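The permit_bytes issue is easiest to see in a minimal limiter. The sketch below is hypothetical and std-only; it is not the PR's RangeResultMemoryLimiter API. It keeps `permit_bytes` as a field and derives the number of permits a request needs from that stored value, rounding up, so acquiring never falls back to a hard-coded constant. The non-blocking `try_acquire` is a simplification; a limiter on the scan path would more likely wait asynchronously.

```rust
use std::sync::Mutex;

/// Hypothetical byte-budget limiter. `permit_bytes` is stored so that every
/// acquisition uses the configured granularity.
struct MemoryLimiter {
    permit_bytes: usize,
    available_permits: Mutex<usize>,
}

impl MemoryLimiter {
    fn new(total_bytes: usize, permit_bytes: usize) -> Self {
        assert!(permit_bytes > 0, "permit_bytes must be non-zero");
        MemoryLimiter {
            permit_bytes,
            available_permits: Mutex::new(total_bytes / permit_bytes),
        }
    }

    /// Number of permits needed to cover `bytes`, rounded up.
    fn permits_for(&self, bytes: usize) -> usize {
        bytes.div_ceil(self.permit_bytes)
    }

    /// Reserves memory for `bytes`; returns the permits taken, or None if the
    /// budget cannot cover the request right now.
    fn try_acquire(&self, bytes: usize) -> Option<usize> {
        let needed = self.permits_for(bytes);
        let mut available = self.available_permits.lock().unwrap();
        if needed > *available {
            return None;
        }
        *available -= needed;
        Some(needed)
    }

    /// Returns previously acquired permits to the pool.
    fn release(&self, permits: usize) {
        *self.available_permits.lock().unwrap() += permits;
    }
}

fn main() {
    // 1 KiB budget split into 256-byte permits: 4 permits in total.
    let limiter = MemoryLimiter::new(1024, 256);
    let held = limiter.try_acquire(300).expect("300 bytes fits in the budget");
    println!("300 bytes took {held} permits");
    // 600 bytes needs 3 permits, but only 2 remain.
    println!("600 more bytes fit: {}", limiter.try_acquire(600).is_some());
    limiter.release(held);
}
```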

src/mito2/src/cache.rs


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 018c0dd337


src/mito2/src/read/seq_scan.rs


src/mito2/src/read/range_cache.rs

Deadlock chain:
1. Range-level merge tasks: Each concurrent build_flat_partition_range_read (lines 494-506) calls build_flat_reader_from_sources → create_parallel_flat_sources → spawn_flat_scan_task. These background tasks loop: acquire permit → input.next() → release permit.
2. Final merge tasks: After all range tasks return streams (lines 509-511), the distributor calls build_flat_reader_from_sources again (lines 520-527) → create_parallel_flat_sources → more spawn_flat_scan_task tasks. These also loop: acquire permit → input.next() → release permit.
3. Circular wait: The final merge tasks' input.next() reads from ReceiverStreams backed by the range-level merge tasks. If all num_partitions permits are held by final merge tasks blocked on input.next(), the range-level merge tasks cannot acquire permits to produce data, so the scan deadlocks.
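The circular wait can be reproduced in miniature. The sketch below is a hypothetical, std-only model, not the mito2 code: `Permits` is a hand-rolled counting semaphore standing in for the shared permit pool, the calling thread plays the final merge tasks, and a spawned thread plays a range-level task that needs a permit before it can produce a batch. Timeouts turn the hang into a `false` result. Waiting for input before taking a permit is one common way to break the cycle; it is not necessarily the fix this PR applied.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore standing in for the shared permit pool.
struct Permits {
    available: Mutex<usize>,
    cond: Condvar,
}

impl Permits {
    fn new(n: usize) -> Arc<Self> {
        Arc::new(Permits { available: Mutex::new(n), cond: Condvar::new() })
    }

    /// Waits up to `timeout` for a free permit; returns false on timeout.
    fn acquire_timeout(&self, timeout: Duration) -> bool {
        let guard = self.available.lock().unwrap();
        let (mut guard, _) = self
            .cond
            .wait_timeout_while(guard, timeout, |n| *n == 0)
            .unwrap();
        if *guard == 0 {
            return false;
        }
        *guard -= 1;
        true
    }

    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}

/// The calling thread models the final merge tasks; the spawned thread models
/// a range-level task. `hold_while_waiting = true` mirrors the ordering
/// "acquire permit -> input.next()". Returns true if the batch arrived.
fn final_merge_gets_data(num_permits: usize, hold_while_waiting: bool) -> bool {
    let permits = Permits::new(num_permits);
    let timeout = Duration::from_millis(100);

    if hold_while_waiting {
        // Every permit is taken by a final merge task before it awaits input.
        for _ in 0..num_permits {
            assert!(permits.acquire_timeout(timeout));
        }
    }

    let (tx, rx) = mpsc::channel::<u64>();
    let producer_permits = Arc::clone(&permits);
    let producer = thread::spawn(move || {
        // Range-level task: acquire permit -> produce -> release permit.
        if producer_permits.acquire_timeout(timeout) {
            tx.send(42).unwrap();
            producer_permits.release();
        }
        // Without a permit, `tx` is dropped and nothing is sent.
    });

    // A real scan would block forever here; the timeout makes the hang observable.
    let received = rx.recv_timeout(timeout * 2).is_ok();
    producer.join().unwrap();

    if hold_while_waiting {
        for _ in 0..num_permits {
            permits.release();
        }
    } else if received {
        // Reordered loop: input.next() first, then a permit to process the batch.
        assert!(permits.acquire_timeout(timeout));
        permits.release();
    }
    received
}

fn main() {
    println!("holding permits while waiting: {}", final_merge_gets_data(2, true));
    println!("waiting without a permit:      {}", final_merge_gets_data(2, false));
}
```

Calling it with `num_permits` of 1 or 2 exercises the same small-permit setting proposed for testing in this thread.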

discord9 · 2026-04-01T03:36:17Z

Would it be useful to add a test with multiple partition ranges (each with multiple sources) and with both permits set small, like 1 or 2?


evenyag added 3 commits March 24, 2026 19:55

feat: use range cache in scan

63e87a4

Signed-off-by: evenyag <realevenyag@gmail.com>

refactor: rename dedup to skip_dedup

17bc3fa

Signed-off-by: evenyag <realevenyag@gmail.com>

feat: use background concat for buffered batches

92d1fb1

Signed-off-by: evenyag <realevenyag@gmail.com>

github-actions bot added size/M and docs-not-required labels Mar 26, 2026

gemini-code-assist bot reviewed Mar 26, 2026

src/mito2/src/cache.rs (resolved)

evenyag added 3 commits March 26, 2026 21:58

chore: fmt

e4d19ee

Signed-off-by: evenyag <realevenyag@gmail.com>

fix: store permits

f594e91

Signed-off-by: evenyag <realevenyag@gmail.com>

fix: fix potential panic

018c0dd

Signed-off-by: evenyag <realevenyag@gmail.com>

github-actions bot added size/L and removed size/M labels Mar 27, 2026

evenyag marked this pull request as ready for review March 30, 2026 03:01

evenyag requested review from v0y4g3r and waynexia as code owners March 30, 2026 03:01

chatgpt-codex-connector bot reviewed Mar 30, 2026

src/mito2/src/read/seq_scan.rs (resolved)

evenyag mentioned this pull request Mar 30, 2026

Release v1.0.0 #7883 (open, 6 tasks)

fix: skip range-cache wrapping when cache is disabled

33d9b23

Signed-off-by: evenyag <realevenyag@gmail.com>

evenyag requested a review from discord9 March 30, 2026 07:27

discord9 reviewed Mar 30, 2026

src/mito2/src/read/range_cache.rs (resolved)

discord9 approved these changes Apr 1, 2026


v0y4g3r approved these changes Apr 1, 2026


v0y4g3r added this pull request to the merge queue Apr 1, 2026

v0y4g3r removed this pull request from the merge queue due to a manual request Apr 1, 2026

test: add test for small permits

73067c0

Signed-off-by: evenyag <realevenyag@gmail.com>

evenyag wants to merge 9 commits into GreptimeTeam:main from evenyag:pr/partition-range-cache-read-path


3 participants
