feat: flow inc query #7821

Draft
discord9 wants to merge 16 commits into main from feat/flow_inc_query
Conversation

@discord9 discord9 commented Mar 17, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Summary

This PR adds a proof-of-concept for Flow incremental queries using a changed-rows-only incremental shape.

Instead of recomputing the full result from the full source input on every round, the POC aggregates only the delta window and joins it with sink state to emit only the rows that need to be updated in the sink.
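
A minimal sketch of the changed-rows-only shape described above, assuming a SUM-style aggregate that can be merged with previously stored sink values. The `Row` type and the functions `changed_rows_only` and `full_recompute` are hypothetical names for illustration and are not taken from the PR; the real POC expresses this as a query plan rather than in-memory maps.

```rust
use std::collections::HashMap;

/// Hypothetical source row: (group key, value).
type Row = (String, i64);

/// Baseline: aggregate the entire source on every round.
fn full_recompute(source: &[Row]) -> HashMap<String, i64> {
    let mut out: HashMap<String, i64> = HashMap::new();
    for (key, value) in source {
        *out.entry(key.clone()).or_insert(0) += *value;
    }
    out
}

/// Incremental: aggregate only the delta window, then left join the result
/// with the current sink state. Returns only the rows that must be upserted.
fn changed_rows_only(sink: &HashMap<String, i64>, delta: &[Row]) -> HashMap<String, i64> {
    // Step 1: aggregate the delta rows by key.
    let delta_agg = full_recompute(delta);
    // Step 2: left join with the sink; keys missing from the sink start at 0.
    delta_agg
        .into_iter()
        .map(|(key, partial)| {
            let previous = sink.get(&key).copied().unwrap_or(0);
            (key, previous + partial)
        })
        .collect()
}
```

Keys that do not appear in the delta are never touched, which is where the saving comes from. This only holds for aggregates whose partial results can be combined with stored state (SUM, COUNT, MIN/MAX on append-only data); other aggregates would need extra state in the sink.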

Why

The goal is to validate whether incremental queries can materially reduce Flow query cost before we invest further in end-to-end scheduling and batching work.

Benchmark Result

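The strongest result comes from a workload shaped like src >> sink >> delta, that is, a source table much larger than the sink table, which in turn is much larger than the delta window: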

  • small_delta_src_ultra_sink_mid
    • full recompute: 707-716 ms
    • changed-rows-only incremental (delta-only left join update): 124-126 ms

This is roughly a 5.7x improvement in wall-clock latency.
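
The ratio can be checked against both ends of the reported ranges. The helper below is a trivial illustration with a hypothetical name, not part of the benchmark harness.

```rust
/// Speedup of the incremental path relative to full recompute.
fn speedup(full_ms: f64, incremental_ms: f64) -> f64 {
    full_ms / incremental_ms
}
```

Pairing the lower bounds (707 ms and 124 ms) and the upper bounds (716 ms and 126 ms) both give a ratio close to 5.7.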

Profiling

For this workload, full recompute is dominated by scanning and aggregating the full source table, while the incremental path shifts the main cost to sink-side scan and join work.

That means the POC already demonstrates a meaningful win, and the next optimization target is clear: reduce sink-side read cost further.

Caveats

  • This is still a POC, not the final production shape.
  • The gain is workload-dependent; the positive region is currently clearest when src >> sink >> delta.
  • The current benchmark focuses on query-shape validation, not full end-to-end Flow scheduling behavior.

Next Step

Continue optimizing the changed-rows-only incremental path, especially sink-side scan/repartition cost.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

Signed-off-by: discord9 <discord9@163.com>
@github-actions github-actions bot added size/XL docs-not-required This change does not impact docs. labels Mar 17, 2026
@discord9 discord9 changed the title Feat/flow inc query feat: flow inc query Mar 17, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces several enhancements to the flow query engine, focusing on improving the reliability and accuracy of incremental queries. It includes mechanisms for detecting and handling stale cursor errors, tracking region watermarks, and extending the query context with flow-specific configurations. These changes contribute to a more robust and efficient flow query processing pipeline.

Highlights

  • Stale Cursor Handling: Introduces mechanisms to detect and handle stale cursor errors in flow queries, ensuring more robust incremental query processing.
  • Region Watermark Tracking: Adds functionality to track region watermarks, enabling more accurate incremental queries and data consistency.
  • Query Context Extensions: Enhances query context with extensions for flow control, including incremental mode and region sequence tracking.
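
One way to picture the watermark tracking mentioned in the highlights is a per-region cursor that records the last sequence already folded into the sink. This is a sketch under assumptions: `RegionCursors`, `next_range`, and `advance` are hypothetical names, and the PR's actual bookkeeping may differ.

```rust
use std::collections::HashMap;

/// Hypothetical cursor state: region id -> last sequence already processed.
#[derive(Debug, Default)]
struct RegionCursors {
    seqs: HashMap<u64, u64>,
}

impl RegionCursors {
    /// Range to read next for a region as (exclusive lower, inclusive upper),
    /// or None when the reported watermark shows nothing new.
    fn next_range(&self, region_id: u64, watermark: u64) -> Option<(u64, u64)> {
        let since = self.seqs.get(&region_id).copied().unwrap_or(0);
        if watermark > since {
            Some((since, watermark))
        } else {
            None
        }
    }

    /// Record a watermark after its results were written to the sink.
    /// The cursor never moves backwards.
    fn advance(&mut self, region_id: u64, watermark: u64) {
        let entry = self.seqs.entry(region_id).or_insert(0);
        *entry = (*entry).max(watermark);
    }
}
```

Advancing only after the sink write succeeds means a failed round is simply retried over the same range.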

Activity

  • discord9 authored and implemented the changes.
  • The PR is currently awaiting review and further activity.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for incremental queries in flows. The changes are extensive, touching many parts of the codebase from the client and server communication layers to the storage and query engines. Key changes include:

  • Modifying the flight protocol handling in the client to process metrics messages that may follow record batches.
  • Extending QueryContext and related structs to carry sequence numbers for snapshot and incremental reads.
  • Implementing a RegionWatermarkStream in the datanode to attach the latest region sequence number (watermark) to the query result stream.
  • Adding logic in the flow task runner to detect and handle "stale cursor" errors from incremental queries by falling back to a full re-computation.
  • Introducing checks in the storage engine (mito2) to detect stale incremental queries and return a specific error.
  • Adding comprehensive integration tests to validate the new incremental query and error handling logic.
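
The stale-cursor detection and fallback listed above can be sketched as follows. This assumes a cursor is stale when the requested start sequence is older than the smallest sequence the region can still serve incrementally; that rule, and the names `QueryError`, `Plan`, `check_cursor`, and `run_round`, are assumptions for illustration rather than the PR's code.

```rust
#[derive(Debug, PartialEq)]
enum QueryError {
    /// The incremental read position can no longer be served.
    StaleCursor { since_seq: u64, min_seq: u64 },
    Other(String),
}

#[derive(Debug, PartialEq)]
enum Plan {
    Incremental,
    FullRecompute,
}

/// Storage-side check (assumed rule): reject reads that start before `min_seq`.
fn check_cursor(since_seq: u64, min_seq: u64) -> Result<(), QueryError> {
    if since_seq < min_seq {
        Err(QueryError::StaleCursor { since_seq, min_seq })
    } else {
        Ok(())
    }
}

/// Runner-side fallback: try the incremental query first and fall back to a
/// full recompute only on a stale cursor. Other errors are returned unchanged.
fn run_round<T>(
    incremental: impl FnOnce() -> Result<T, QueryError>,
    full: impl FnOnce() -> Result<T, QueryError>,
) -> Result<(Plan, T), QueryError> {
    match incremental() {
        Ok(value) => Ok((Plan::Incremental, value)),
        Err(QueryError::StaleCursor { .. }) => full().map(|v| (Plan::FullRecompute, v)),
        Err(other) => Err(other),
    }
}
```

Restricting the fallback to the stale-cursor case keeps unrelated failures visible instead of hiding them behind an expensive full recompute.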

My feedback focuses on improving error handling in the client and enhancing code documentation for better maintainability. The overall implementation appears solid and well-thought-out.

Comment on lines +468 to +470
    let m: Option<Arc<RecordBatchMetrics>> =
        serde_json::from_str(&s).ok().map(Arc::new);
    metrics_ref.swap(m);
medium

The use of .ok() silently discards any JSON deserialization errors. This could hide potential issues with metrics serialization from the server, making debugging more difficult. It would be better to log a warning when deserialization fails. This also applies to the similar code block at line 494.

Suggested change
-   let m: Option<Arc<RecordBatchMetrics>> =
-       serde_json::from_str(&s).ok().map(Arc::new);
-   metrics_ref.swap(m);
+   let m = match serde_json::from_str(&s) {
+       Ok(metrics) => Some(Arc::new(metrics)),
+       Err(e) => {
+           warn!("Failed to deserialize RecordBatchMetrics from flight message: {}", e);
+           None
+       }
+   };
+   metrics_ref.swap(m);

Comment on lines 1480 to +1491
    pub struct FlowQueryContext {
        /// Current catalog name - needed for flow metadata and recovery
        pub catalog: String,
        /// Current schema name - needed for table resolution during flow execution
        pub schema: String,
        /// Timezone for timestamp operations in the flow
        pub timezone: String,
        #[serde(default)]
        pub extensions: HashMap<String, String>,
        #[serde(default)]
        pub channel: u8,
        #[serde(default)]
        pub snapshot_seqs: HashMap<u64, u64>,
        #[serde(default)]
        pub sst_min_sequences: HashMap<u64, u64>,
medium

The doc comments for catalog, schema, and timezone fields in FlowQueryContext were removed. These comments were useful for understanding the purpose of each field. It would be great to restore them and also add doc comments for the newly added fields (extensions, channel, snapshot_seqs, sst_min_sequences) to improve code clarity and maintainability.

}

impl FrontendClient {
    /// TODO(discord9): better way to detect stale cursor error instead of parsing the error message
medium

Parsing error messages is brittle and can easily break if the error message format changes in the future. It's good that this is marked with a TODO, but this should be prioritized to be replaced with a more robust mechanism, like structured error propagation, to ensure long-term stability.
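
To illustrate the difference, here is a small sketch comparing message parsing with a typed error code. `ErrorCode`, `RemoteError`, and both helper functions are hypothetical and do not correspond to GreptimeDB's actual error types.

```rust
use std::fmt;

/// Hypothetical machine-readable code carried alongside the message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCode {
    StaleCursor,
    Internal,
}

#[derive(Debug)]
struct RemoteError {
    code: ErrorCode,
    message: String,
}

impl fmt::Display for RemoteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.code, self.message)
    }
}

impl std::error::Error for RemoteError {}

/// Brittle: depends on the exact wording of the message.
fn is_stale_by_message(err: &RemoteError) -> bool {
    err.message.contains("stale cursor")
}

/// Robust: depends only on the structured code.
fn is_stale_by_code(err: &RemoteError) -> bool {
    err.code == ErrorCode::StaleCursor
}
```

If the server rewords the message to, say, "incremental read position expired", the message-based check silently stops matching while the code-based check keeps working.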

Signed-off-by: discord9 <discord9@163.com>
@github-actions github-actions bot added size/XXL and removed size/XL labels Mar 18, 2026
Signed-off-by: discord9 <discord9@163.com>
Labels

docs-not-required This change does not impact docs. size/XXL
