feat: some optimistic paths for instant manipulate by waynexia · Pull Request #7812 · GreptimeTeam/greptimedb

waynexia · 2026-03-14T04:33:23Z

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Some optimistic paths to avoid unnecessary array materialization

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

- I have written the necessary rustdoc comments.
- I have added the necessary unit tests and integration tests.
- This PR requires documentation updates.
- API changes are backward compatible.
- Schema or data changes are backward compatible.

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

gemini-code-assist · 2026-03-14T04:33:37Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces significant performance optimizations within the InstantManipulate operator by intelligently reusing column data. By identifying and propagating information about constant tag columns, the system can avoid redundant materialization of array data, leading to more efficient query execution, especially in scenarios where the output record batch size changes but certain columns remain static.

Highlights

- Optimized Column Materialization: Introduced optimistic paths to avoid unnecessary array materialization for columns that remain constant across a series, significantly improving performance for certain query patterns.
- Tag Column Propagation: Added a tag_columns field to the InstantManipulate logical plan, allowing the planner to pass hints about tag columns to the execution plan. This enables the new optimization for reusing non-sample columns.
- Dynamic Tag Column Resolution: Implemented logic to dynamically resolve tag_columns from the input SeriesDivide logical plan if not explicitly provided, ensuring the optimization can be applied even when tag columns are inferred.
- New Helper Function for Column Reuse: Added a reuse_constant_column helper function to efficiently handle the reuse of constant columns when the output record batch length changes, avoiding recomputation or re-allocation.
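The slice-or-extend behavior described for reuse_constant_column can be illustrated with plain vectors. This is a minimal std-only sketch, not the PR's code: the real helper operates on Arrow arrays, and the generic signature, the `Option` return type, and the handling of empty input are assumptions made for the example.

```rust
/// Simplified model of a column as a slice of values.
/// Assumption: every value in `column` is identical (a per-series constant such as a tag).
/// Returns `None` when a non-empty output is requested but there is no value to repeat.
fn reuse_constant_column<T: Clone>(column: &[T], output_len: usize) -> Option<Vec<T>> {
    if output_len == 0 {
        return Some(Vec::new());
    }
    // A non-empty output needs at least one value to copy from.
    let first = column.first()?;
    if output_len <= column.len() {
        // Shrinking or same length: a prefix is enough because all values are equal.
        Some(column[..output_len].to_vec())
    } else {
        // Growing: keep the existing values and pad with copies of the constant.
        let mut out = column.to_vec();
        out.resize(output_len, first.clone());
        Some(out)
    }
}
```

The point of the sketch is that neither branch needs per-row indices, which is where the saving over an index-based gather would come from.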

Changelog

src/promql/src/extension_plan/instant_manipulate.rs
- Imported DataType, ScalarValue, and Extension for enhanced type handling and logical plan extensions.
- Removed unused ArrowResult import.
- Imported SeriesDivide for logical plan introspection.
- Added tag_columns field to InstantManipulate struct to store planner-provided tag column hints.
- Updated InstantManipulate::new constructor and UserDefinedLogicalNodeCore implementations to accept and propagate tag_columns.
- Implemented resolve_tag_columns method to infer tag columns from SeriesDivide input if not explicitly set.
- Added reuse_all_non_sample_columns field to InstantManipulateExec and InstantManipulateStream to control the new optimization.
- Optimized vector capacity allocation for take_indices and aligned_ts in InstantManipulateStream.
- Modified InstantManipulateStream::take_record_batch to implement the logic for reusing non-sample columns based on reuse_all_non_sample_columns.
- Introduced reuse_constant_column function to efficiently slice or extend constant columns.
- Added new unit tests for rebuild_should_recover_tag_columns_from_series_divide_input and tsid_fast_path_reuses_non_sample_columns_when_output_grows to validate the new functionality.
src/query/src/promql/planner.rs
- Modified PromPlanner::plan_instant_manipulate to pass the tag_columns from the context to the InstantManipulate logical plan.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces an optimistic execution path for InstantManipulate to improve performance by avoiding unnecessary array materialization, particularly when using __tsid. The changes are mostly centered around instant_manipulate.rs, with a key modification in take_record_batch_optional to reuse constant columns. While the optimization itself is well-implemented, I've identified a critical issue in the planner logic that currently prevents this new fast path from being activated. Additionally, I've suggested an improvement to make the column resolution logic more robust. Addressing these points will ensure the new feature works as intended and improves the overall resilience of the query plan.

gemini-code-assist · 2026-03-14T04:36:01Z

src/query/src/promql/planner.rs

                .time_index_column
                .clone()
                .expect("time index should be set in `setup_context`"),
+            self.ctx.tag_columns.clone(),


The tag_columns passed to InstantManipulate::new here is incorrect when use_tsid is true. self.ctx.tag_columns holds the original tag columns from the table metadata, not ["__tsid"]. This prevents the reuse_all_non_sample_columns optimization in InstantManipulate from being triggered, which undermines a key goal of this pull request.

To fix this, you should conditionally pass vec!["__tsid".to_string()] when self.ctx.use_tsid is true, similar to how series_key_columns is determined for SeriesDivide.

if self.ctx.use_tsid {
    vec![store_api::metric_engine_consts::DATA_SCHEMA_TSID_COLUMN_NAME.to_string()]
} else {
    self.ctx.tag_columns.clone()
},
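To make the suggested selection easier to test in isolation, it can be pulled into a small function. The sketch below is self-contained and hypothetical: `PlannerCtx` is a trimmed-down stand-in for the planner context, `manipulate_tag_columns` is an invented name, and `TSID_COLUMN` assumes the constant's value is the "__tsid" literal quoted in this review.

```rust
/// Hypothetical, trimmed-down planner context with only the two fields the comment mentions.
struct PlannerCtx {
    use_tsid: bool,
    tag_columns: Vec<String>,
}

/// Assumed value of DATA_SCHEMA_TSID_COLUMN_NAME, based on the "__tsid" literal in the review.
const TSID_COLUMN: &str = "__tsid";

/// Picks the tag-column hint to hand to InstantManipulate:
/// the single tsid column when tsid is in use, otherwise the table's tag columns.
fn manipulate_tag_columns(ctx: &PlannerCtx) -> Vec<String> {
    if ctx.use_tsid {
        vec![TSID_COLUMN.to_string()]
    } else {
        ctx.tag_columns.clone()
    }
}
```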

gemini-code-assist · 2026-03-14T04:36:01Z

src/promql/src/extension_plan/instant_manipulate.rs

+        let LogicalPlan::Extension(Extension { node }) = input else {
+            return Vec::new();
+        };
+
+        node.as_any()
+            .downcast_ref::<SeriesDivide>()
+            .map(|series_divide| series_divide.tags().to_vec())
+            .unwrap_or_default()


This implementation of resolve_tag_columns only inspects the immediate input of InstantManipulate. However, the planner constructs a plan where the input is a SeriesNormalize node, which in turn wraps the SeriesDivide node. Consequently, the downcast_ref::<SeriesDivide>() will fail, and tag columns won't be resolved from the input plan if they are not explicitly provided.

While the primary fix should be in the planner to pass the correct tag_columns, making this function more robust by looking through SeriesNormalize would be a valuable improvement for future use cases. You might consider traversing the plan to find the underlying SeriesDivide node.
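One possible shape for that traversal, sketched on a toy plan type rather than on DataFusion's LogicalPlan: the `Plan` enum and its variants are assumptions that only mirror the node names used in this review, and the function stops at any node it does not recognize instead of walking the whole tree.

```rust
/// Toy logical plan. Node names mirror the review; the structure is an assumption.
enum Plan {
    TableScan,
    SeriesDivide { tags: Vec<String>, input: Box<Plan> },
    SeriesNormalize { input: Box<Plan> },
    Other { input: Box<Plan> },
}

/// Looks through any chain of SeriesNormalize wrappers to find a SeriesDivide
/// and returns its tags; returns an empty list if none is found.
fn resolve_tag_columns(plan: &Plan) -> Vec<String> {
    let mut current = plan;
    loop {
        match current {
            Plan::SeriesDivide { tags, .. } => return tags.clone(),
            // Transparent wrapper: keep descending into its input.
            Plan::SeriesNormalize { input } => current = &**input,
            // Unknown or leaf node: give up rather than guess.
            Plan::TableScan | Plan::Other { .. } => return Vec::new(),
        }
    }
}
```

Stopping at unknown nodes is a conservative choice: an empty result simply leaves the optimization off, whereas descending through arbitrary operators could pick up tags that no longer describe the rows reaching InstantManipulate.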

evenyag · 2026-03-17T07:45:14Z

src/promql/src/extension_plan/instant_manipulate.rs

+        let reuse_all_non_sample_columns =
+            matches!(self.tag_columns.as_slice(), [tag] if tag == "__tsid");


So we only set reuse_all_non_sample_columns to true when there are no tags except tsid?
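For reference, the slice pattern in the quoted diff matches only a one-element slice whose single entry equals "__tsid". A standalone sketch of the same check follows; the free function is a hypothetical wrapper introduced here, and only the `matches!` expression comes from the diff.

```rust
/// Same predicate as in the diff, lifted into a hypothetical free function.
/// True only when the tag list is exactly ["__tsid"].
fn reuse_all_non_sample_columns(tag_columns: &[String]) -> bool {
    matches!(tag_columns, [tag] if tag == "__tsid")
}
```

Under this predicate a list such as ["__tsid", "host"] or ["host"] leaves the flag false, as does an empty list.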

evenyag · 2026-03-17T07:46:56Z

src/promql/src/extension_plan/instant_manipulate.rs

+            .with_exprs_and_inputs(vec![], vec![series_divide])
+            .unwrap();
+
+        assert_eq!(plan.tag_columns, vec!["__tsid".to_string()]);


Also assert reuse_all_non_sample_columns?

evenyag · 2026-03-17T07:48:27Z

src/promql/src/extension_plan/instant_manipulate.rs

+            reuse_all_non_sample_columns: true,
+            input,
+            metric: ExecutionPlanMetricsSet::new(),
+        });


In non-test code, we won't set reuse_all_non_sample_columns to true because tags also contain a host column. Is it expected?

waynexia · 2026-03-19T23:09:47Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe2a9f91d4

chatgpt-codex-connector · 2026-03-19T23:14:13Z

src/promql/src/extension_plan/instant_manipulate.rs

+            if self.reuse_all_non_sample_columns && Some(index) != self.field_index {
+                arrays.push(reuse_constant_column(array, output_len)?);


Keep secondary value columns on the take() path

When __tsid is used as the series key, this branch treats every non-time, non-field_index column as a constant and rebuilds it with reuse_constant_column(). PromPlanner::prom_vector_selector_to_plan still passes only self.ctx.field_columns.first() into InstantManipulate, so on metric-engine tables with multiple value columns (field_1, field_2, …) those extra samples now bypass take_indices entirely. Any instant query that expands or skips points via lookback will therefore return stale/corrupted values for the secondary fields instead of the rows selected by compute::take().
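The difference between the two paths can be shown with a std-only model. Both helpers below are illustrative stand-ins written for this example: `take` imitates an index-based gather like the compute::take() mentioned above, and `repeat_first` imitates treating a column as constant. Neither is the PR's code, and the sample values are made up.

```rust
/// Gathers rows by index: a plain-vector model of an index-based take.
/// Panics if an index is out of bounds, which is acceptable for a sketch.
fn take<T: Clone>(column: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| column[i].clone()).collect()
}

/// Treats the column as constant: repeats its first value for every output row.
/// Returns an empty vector when the input column is empty.
fn repeat_first<T: Clone>(column: &[T], output_len: usize) -> Vec<T> {
    column
        .first()
        .map(|v| vec![v.clone(); output_len])
        .unwrap_or_default()
}
```

With a secondary field holding 10.0, 20.0, 30.0 and take indices 0, 0, 2, the gather yields 10.0, 10.0, 30.0, while the constant path yields 10.0 three times. The two agree only when the column really is constant, which holds for tags within a series but not for additional value columns.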


feat: some optimistic paths for instant manipulate

456288d

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

waynexia requested review from discord9 and evenyag as code owners March 14, 2026 04:33

github-actions bot added the size/S label Mar 14, 2026

github-actions bot added the docs-not-required label (This change does not impact docs.) Mar 14, 2026

gemini-code-assist bot reviewed Mar 14, 2026

evenyag reviewed Mar 17, 2026

use tsid in manipulate plan, resolve_tag_columns walks whole plan

fe2a9f9

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

chatgpt-codex-connector bot reviewed Mar 19, 2026

waynexia wants to merge 2 commits into main from inst-mani-fast-path

2 participants
