Conversation
Signed-off-by: discord9 <discord9@163.com>
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces several enhancements to the flow query engine, focusing on improving the reliability and accuracy of incremental queries. It includes mechanisms for detecting and handling stale cursor errors, tracking region watermarks, and extending the query context with flow-specific configurations. These changes contribute to a more robust and efficient flow query processing pipeline.
Activity
Code Review
This pull request introduces support for incremental queries in flows. The changes are extensive, touching many parts of the codebase, from the client and server communication layers to the storage and query engines. Key changes include:

- Modifying the flight protocol handling in the client to process metrics messages that may follow record batches.
- Extending `QueryContext` and related structs to carry sequence numbers for snapshot and incremental reads.
- Implementing a `RegionWatermarkStream` in the datanode to attach the latest region sequence number (watermark) to the query result stream.
- Adding logic in the flow task runner to detect and handle "stale cursor" errors from incremental queries by falling back to a full re-computation.
- Introducing checks in the storage engine (`mito2`) to detect stale incremental queries and return a specific error.
- Adding comprehensive integration tests to validate the new incremental query and error handling logic.
My feedback focuses on improving error handling in the client and enhancing code documentation for better maintainability. The overall implementation appears solid and well-thought-out.
src/client/src/database.rs
Outdated
```rust
let m: Option<Arc<RecordBatchMetrics>> =
    serde_json::from_str(&s).ok().map(Arc::new);
metrics_ref.swap(m);
```
The use of .ok() silently discards any JSON deserialization errors. This could hide potential issues with metrics serialization from the server, making debugging more difficult. It would be better to log a warning when deserialization fails. This also applies to the similar code block at line 494.
Suggested change:

```diff
-let m: Option<Arc<RecordBatchMetrics>> =
-    serde_json::from_str(&s).ok().map(Arc::new);
-metrics_ref.swap(m);
+let m = match serde_json::from_str(&s) {
+    Ok(metrics) => Some(Arc::new(metrics)),
+    Err(e) => {
+        warn!("Failed to deserialize RecordBatchMetrics from flight message: {}", e);
+        None
+    }
+};
+metrics_ref.swap(m);
```
```rust
pub struct FlowQueryContext {
    /// Current catalog name - needed for flow metadata and recovery
    pub catalog: String,
    /// Current schema name - needed for table resolution during flow execution
    pub schema: String,
    /// Timezone for timestamp operations in the flow
    pub timezone: String,
    #[serde(default)]
    pub extensions: HashMap<String, String>,
    #[serde(default)]
    pub channel: u8,
    #[serde(default)]
    pub snapshot_seqs: HashMap<u64, u64>,
    #[serde(default)]
    pub sst_min_sequences: HashMap<u64, u64>,
}
```
The doc comments for the catalog, schema, and timezone fields in FlowQueryContext were removed. These comments were useful for understanding the purpose of each field. It would be great to restore them, and also to add doc comments for the newly added fields (extensions, channel, snapshot_seqs, sst_min_sequences) to improve code clarity and maintainability.
```rust
impl FrontendClient {
    /// TODO(discord9): better way to detect stale cursor error instead of parsing the error message
```
I hereby agree to the terms of the GreptimeDB CLA.
Summary
This PR adds a proof-of-concept for Flow incremental queries using a changed-rows-only incremental shape.
Instead of recomputing the full result from the full source input on every round, the POC aggregates only the delta window and joins it with the sink state, emitting only the rows that need to be updated in the sink.
Why
The goal is to validate whether incremental queries can materially reduce Flow query cost before we invest further in end-to-end scheduling and batching work.
Benchmark Result
The strongest result comes from a workload shaped like `src >> sink >> delta` (benchmark `small_delta_src_ultra_sink_mid`):

- full recompute: 707-716 ms
- incremental (delta-only left join update): 124-126 ms

This is roughly a 5.7x improvement in wall-clock latency.
Profiling
For this workload, full recompute is dominated by scanning and aggregating the full source table, while the incremental path shifts the main cost to sink-side scan and join work.
That means the POC already demonstrates a meaningful win, and the next optimization target is clear: reduce sink-side read cost further.
Caveats

- The demonstrated win so far is limited to workloads shaped like `src >> sink >> delta`.

Next Step
Continue optimizing the changed-rows-only incremental path, especially sink-side scan/repartition cost.
PR Checklist
Please convert it to a draft if some of the following conditions are not met.