Skip to content

Conversation

MichaelScofield
Copy link
Collaborator

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

so that there are no intermediate Values to occupy memory, and the performance for iterating record batch is significantly better:

from iter_record_batch_rows.rs:

iter_record_batch/by_greptimedb_values/1
                        time:   [194.58 ns 198.29 ns 201.68 ns]
iter_record_batch/by_loop_rows_and_columns/1
                        time:   [22.715 ns 22.834 ns 22.965 ns]
iter_record_batch/by_datafusion_scalar_values/1
                        time:   [74.872 ns 75.163 ns 75.510 ns]
iter_record_batch/by_datafusion_scalar_values_with_buf/1
                        time:   [77.063 ns 77.415 ns 77.786 ns]
iter_record_batch/by_greptimedb_values/10
                        time:   [391.12 ns 393.26 ns 395.50 ns]
iter_record_batch/by_loop_rows_and_columns/10
                        time:   [95.108 ns 96.171 ns 97.482 ns]
iter_record_batch/by_datafusion_scalar_values/10
                        time:   [602.60 ns 605.13 ns 607.61 ns]
iter_record_batch/by_datafusion_scalar_values_with_buf/10
                        time:   [381.24 ns 382.93 ns 384.55 ns]
iter_record_batch/by_greptimedb_values/100
                        time:   [2.8510 µs 2.8677 µs 2.8859 µs]
iter_record_batch/by_loop_rows_and_columns/100
                        time:   [748.73 ns 750.04 ns 751.43 ns]
iter_record_batch/by_datafusion_scalar_values/100
                        time:   [6.0008 µs 6.0182 µs 6.0378 µs]
iter_record_batch/by_datafusion_scalar_values_with_buf/100
                        time:   [3.4424 µs 3.4534 µs 3.4646 µs]
iter_record_batch/by_greptimedb_values/1000
                        time:   [27.471 µs 27.617 µs 27.767 µs]
iter_record_batch/by_loop_rows_and_columns/1000
                        time:   [7.3151 µs 7.3463 µs 7.3815 µs]
iter_record_batch/by_datafusion_scalar_values/1000
                        time:   [58.915 µs 59.231 µs 59.632 µs]
iter_record_batch/by_datafusion_scalar_values_with_buf/1000
                        time:   [33.747 µs 33.856 µs 33.972 µs]
iter_record_batch/by_greptimedb_values/10000
                        time:   [270.98 µs 272.14 µs 273.35 µs]
iter_record_batch/by_loop_rows_and_columns/10000
                        time:   [72.313 µs 72.386 µs 72.475 µs]
iter_record_batch/by_datafusion_scalar_values/10000
                        time:   [589.19 µs 591.01 µs 592.89 µs]
iter_record_batch/by_datafusion_scalar_values_with_buf/10000
                        time:   [335.18 µs 336.08 µs 337.07 µs]

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@MichaelScofield MichaelScofield requested a review from a team as a code owner October 15, 2025 10:09
@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Oct 15, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the MySQL writer to convert Arrow arrays directly to MySQL values instead of using intermediate GreptimeDB Value objects, resulting in significant performance improvements for iterating record batches.

Key changes:

  • Replaced Value-based iteration with direct Arrow array processing in the MySQL writer
  • Added benchmark tests to demonstrate performance improvements
  • Updated dependencies to support new direct conversion approach

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

File Description
src/servers/src/mysql/writer.rs Complete refactor of record batch processing to use direct Arrow array access instead of Value objects
src/servers/Cargo.toml Added common-decimal dependency for direct decimal handling
src/common/recordbatch/benches/iter_record_batch_rows.rs New benchmark file comparing different record batch iteration approaches
src/common/recordbatch/Cargo.toml Added criterion benchmark dependency and configuration

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@MichaelScofield MichaelScofield force-pushed the refactor/mysql-write-arrow-directly branch from f74b7a3 to e47fed7 Compare October 15, 2025 11:27
@sunng87
Copy link
Member

sunng87 commented Oct 15, 2025

How will we do with the new Json type, since there is no concrete json type in datafusion?

@MichaelScofield MichaelScofield marked this pull request as draft October 16, 2025 01:32
@MichaelScofield MichaelScofield force-pushed the refactor/mysql-write-arrow-directly branch from e47fed7 to e023788 Compare October 17, 2025 05:54
@MichaelScofield MichaelScofield marked this pull request as ready for review October 17, 2025 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-not-required This change does not impact docs. size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants