feat: tune constants #7851

Open
waynexia wants to merge 3 commits into main from tune-consts

@waynexia
Member

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Adjust constants to make them more streamlined

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@waynexia requested review from a team, discord9, evenyag and v0y4g3r as code owners on March 23, 2026 at 23:22
@github-actions bot added the size/S and docs-not-required (This change does not impact docs.) labels on Mar 23, 2026
@gemini-code-assist
Contributor

Summary of Changes

This pull request focuses on refining several internal constants and mechanisms to enhance the system's performance and resource utilization. The changes include optimizing Parquet file metadata loading, improving the file cache's capacity distribution, and aligning query execution batch sizes with DataFusion defaults for more efficient data processing. These adjustments aim to streamline operations and improve overall system responsiveness.

Highlights

  • Parquet Metadata Prefetching: Implemented support for metadata_size_hint in the Parquet file reader, allowing for optimized prefetching of metadata and potentially faster file access.
  • File Cache Capacity Management: Refactored the file cache's capacity allocation logic to ensure a more robust and balanced distribution between Parquet and Puffin caches, respecting overall budget and minimum capacity requirements.
  • Optimized Parquet Read Batch Size: Increased the default Parquet read batch size to align with DataFusion's default, which helps reduce rebatching and concatenation overhead in the query pipeline.
  • Dynamic Batch Sizing for Query Plans: Modified Absent and RangeSelect query execution plans to dynamically utilize the session's configured batch size, removing previously hardcoded values and improving flexibility.
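
To make the batch-size highlights above concrete, here is a minimal sketch of why matching the reader's batch size to the downstream target can avoid copying. It is not code from this PR: `rebatch` is a hypothetical helper, and `Vec<i64>` stands in for an Arrow record batch. For reference, DataFusion's documented default for `datafusion.execution.batch_size` is 8192 rows.

```rust
/// Hypothetical re-chunking step: turns a sequence of row batches into
/// batches of at most `target` rows and reports how many rows had to be
/// copied along the way. `Vec<i64>` is a stand-in for a record batch.
pub fn rebatch(input: Vec<Vec<i64>>, target: usize) -> (Vec<Vec<i64>>, usize) {
    assert!(target > 0, "target batch size must be positive");
    let mut output = Vec::new();
    let mut buffer: Vec<i64> = Vec::new();
    let mut copied_rows = 0;

    for batch in input {
        if buffer.is_empty() && batch.len() == target {
            // Fast path: the batch already has the desired size, so it is
            // forwarded as-is without touching its rows.
            output.push(batch);
            continue;
        }
        // Slow path: rows are copied into the buffer, and full batches are
        // cut from the front of it.
        copied_rows += batch.len();
        buffer.extend(batch);
        while buffer.len() >= target {
            let rest = buffer.split_off(target);
            output.push(std::mem::replace(&mut buffer, rest));
        }
    }
    // Flush the trailing partial batch, if any.
    if !buffer.is_empty() {
        output.push(buffer);
    }
    (output, copied_rows)
}
```

In this sketch, input batches that already match `target` pass through with zero copied rows, while smaller batches are all copied and concatenated. That is the kind of overhead the alignment is meant to reduce.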

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a set of valuable improvements focused on tuning constants and enhancing data processing logic. Key changes include adjusting Parquet read/write constants to improve performance and on-disk format stability, and refactoring several components to use the configurable session batch size, which increases consistency across the system. Additionally, the logic for splitting file cache capacity has been made more robust, fixing a potential overallocation bug. The adoption of a more modern Parquet metadata reader is also a welcome improvement. Overall, these changes are well-implemented and positively contribute to the codebase's quality and configurability.
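
The review mentions a more robust capacity split that avoids overallocation. A minimal sketch of one way to do that is shown below. It is not the PR's implementation: `split_capacity`, `CacheSplit`, the percentage parameter, and the per-cache minimum are all hypothetical names and policies chosen for illustration.

```rust
/// Hypothetical split of a total file-cache budget (in bytes) between a
/// Parquet cache and a Puffin cache. Names and policy are assumptions,
/// not GreptimeDB's actual code.
#[derive(Debug, PartialEq, Eq)]
pub struct CacheSplit {
    pub parquet: u64,
    pub puffin: u64,
}

pub fn split_capacity(total: u64, puffin_percent: u64, min_each: u64) -> CacheSplit {
    let percent = puffin_percent.min(100);
    // Use a u128 intermediate so very large budgets cannot overflow.
    let mut puffin = (total as u128 * percent as u128 / 100) as u64;

    if min_each <= total / 2 {
        // The budget can satisfy both floors, so keep the Puffin share
        // inside [min_each, total - min_each].
        puffin = puffin.clamp(min_each, total - min_each);
    }
    // Otherwise both floors cannot be met at once; the proportional split
    // is kept rather than exceeding the budget.

    // Parquet takes the remainder, so parquet + puffin == total always holds.
    CacheSplit {
        parquet: total - puffin,
        puffin,
    }
}
```

Deriving the second share as a remainder, instead of applying a minimum to each share independently, is what keeps the sum from exceeding the total in this sketch.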

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 930f70b052

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Contributor

@discord9 left a comment

Absent could implement output batch sizing; the rest LGTM.

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@github-actions bot added size/M and removed size/S labels on Mar 24, 2026
Comment on lines +1166 to +1170
    let num_rows = self.output_batch.as_ref().unwrap().num_rows();
    if num_rows == 0 {
        self.output_batch_offset = 0;
        return Ok(self.output_batch.take());
    }
Contributor

Is it expected to return an empty record batch in range select?
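
For readers following the question, here is a minimal, self-contained sketch of the pattern the snippet appears to use: a buffered result handed out in slices of at most `batch_size` rows. It is an illustration under assumptions, not the PR's code. `SlicedOutput` and `next_slice` are hypothetical names, and `Vec<i64>` stands in for a record batch.

```rust
/// Hypothetical operator state that buffers a fully computed result and
/// hands it out in slices of at most `batch_size` rows.
pub struct SlicedOutput {
    output_batch: Option<Vec<i64>>,
    output_batch_offset: usize,
    batch_size: usize,
}

impl SlicedOutput {
    pub fn new(rows: Vec<i64>, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch size must be positive");
        Self {
            output_batch: Some(rows),
            output_batch_offset: 0,
            batch_size,
        }
    }

    /// Returns the next slice, or `None` once the buffer is exhausted.
    pub fn next_slice(&mut self) -> Option<Vec<i64>> {
        let num_rows = self.output_batch.as_ref()?.len();
        if num_rows == 0 {
            // Mirrors the snippet under review: an empty buffer is handed
            // out once, as an empty batch, instead of being skipped.
            self.output_batch_offset = 0;
            return self.output_batch.take();
        }
        let start = self.output_batch_offset;
        let end = (start + self.batch_size).min(num_rows);
        let slice = self.output_batch.as_ref()?[start..end].to_vec();
        if end == num_rows {
            // Everything has been emitted; drop the buffer.
            self.output_batch = None;
            self.output_batch_offset = 0;
        } else {
            // Remember where the next call should resume.
            self.output_batch_offset = end;
        }
        Some(slice)
    }
}
```

In this sketch an empty buffer yields exactly one empty slice before the stream ends, which is the behavior the question is about. Whether downstream operators should ever see that empty batch is the design choice left to the author.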

Comment on lines +548 to +550
    if self.output_timestamps.len() >= self.batch_size {
        return Ok(());
    }
Contributor

If we return early here, do we need to update the input_timestamp_offset?
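
The concern behind this question can be shown with a small sketch: when a producer stops early because its output buffer is full, any cursor into the input has to be stored so the next call resumes at the right place. The code below is a simplified illustration, not the Absent operator itself. `AbsentSketch`, its fields, and the gap-finding logic are hypothetical.

```rust
/// Hypothetical gap finder: emits every step timestamp in [start, end]
/// that does NOT appear in the sorted input, at most `batch_size` per call.
pub struct AbsentSketch {
    input_timestamps: Vec<i64>, // sorted ascending
    input_offset: usize,        // next unread input position
    next_ts: i64,               // next candidate timestamp
    end: i64,
    step: i64,
    batch_size: usize,
}

impl AbsentSketch {
    pub fn new(input_timestamps: Vec<i64>, start: i64, end: i64, step: i64, batch_size: usize) -> Self {
        assert!(step > 0 && batch_size > 0);
        Self { input_timestamps, input_offset: 0, next_ts: start, end, step, batch_size }
    }

    /// Produces the next batch of absent timestamps; an empty vector means done.
    pub fn next_batch(&mut self) -> Vec<i64> {
        let mut out = Vec::new();
        while self.next_ts <= self.end {
            if out.len() >= self.batch_size {
                // Early return: `input_offset` and `next_ts` are stored on
                // `self`, so the next call resumes where this one stopped.
                return out;
            }
            // Skip input timestamps that lie before the candidate.
            while self.input_offset < self.input_timestamps.len()
                && self.input_timestamps[self.input_offset] < self.next_ts
            {
                self.input_offset += 1;
            }
            let present = self.input_offset < self.input_timestamps.len()
                && self.input_timestamps[self.input_offset] == self.next_ts;
            if !present {
                out.push(self.next_ts);
            }
            self.next_ts += self.step;
        }
        out
    }
}
```

Here the early return is safe only because both cursors are updated before the size check is reached again. If either were a local variable, or were advanced only after the loop, an early return would lose or repeat work.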

@evenyag
Contributor

evenyag commented Mar 26, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Labels

docs-not-required (This change does not impact docs.), size/M

3 participants