Skip to content

fix: Fix peak memory display in EXPLAIN ANALYZE for multiple operators#23140

Open
2010YOUY01 wants to merge 1 commit into
apache:mainfrom
2010YOUY01:hj-mem-fix
Open

fix: Fix peak memory display in EXPLAIN ANALYZE for multiple operators#23140
2010YOUY01 wants to merge 1 commit into
apache:mainfrom
2010YOUY01:hj-mem-fix

Conversation

@2010YOUY01

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

See reproducer in datafusion-cli:

yongting@Yongtings-MacBook-Pro-2 ~/C/d/datafusion (hj-mem-fix *)> datafusion-cli
DataFusion CLI v54.0.0
> set datafusion.explain.analyze_categories = 'bytes';
0 row(s) fetched.
Elapsed 0.001 seconds.

> explain analyze
select *
from generate_series(100000000) as t1(v1)
join generate_series(100000000) as t2(v1)
on t1.v1=t2.v1;
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                  |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | HashJoinExec: mode=Partitioned, join_type=Inner, on=[(v1@0, v1@0)], metrics=[output_bytes=1526.7 MB, build_mem_used=2.80 B]                           |
|                   |   RepartitionExec: partitioning=Hash([v1@0], 14), input_partitions=1, maintains_sort_order=true, metrics=[output_bytes=763.4 MB, spilled_bytes=0.0 B] |
|                   |     ProjectionExec: expr=[value@0 as v1], metrics=[output_bytes=763.0 MB]                                                                             |
|                   |       LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100000000, batch_size=8192], metrics=[output_bytes=763.0 MB]      |
|                   |   RepartitionExec: partitioning=Hash([v1@0], 14), input_partitions=1, maintains_sort_order=true, metrics=[output_bytes=763.4 MB, spilled_bytes=0.0 B] |
|                   |     ProjectionExec: expr=[value@0 as v1], metrics=[output_bytes=763.0 MB]                                                                             |
|                   |       LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100000000, batch_size=8192], metrics=[output_bytes=763.0 MB]      |
|                   |                                                                                                                                                       |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 1.040 seconds.

See HashJoinExec: ... build_mem_used=2.80 B, it actually means 2.8 billion bytes, but it looks quite misleading. This PR fixes it:

+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                  |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | HashJoinExec: mode=Partitioned, join_type=Inner, on=[(v1@0, v1@0)], metrics=[output_bytes=1526.7 MB, build_mem_used=2.6 GB]  

The reason is those metrics are tracking peak memory usage for a operator in the query lifecycle, so it's using Gauge metrics type, but how to choose display format become tricky (bytes or count?)

This PR adds a new Metric type, that wraps the existing Gauge, and use it to represent 'gauge for memory bytes', this fixes the bytes display issue.

What changes are included in this PR?

  1. Introduce MetricValue::PeakMemoryUsage to address the above issue
  2. Use this metric type for applicable metrics
  3. Add slts for testing.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions Bot added physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) ffi Changes to the ffi crate physical-plan Changes to the physical-plan crate labels Jun 24, 2026
@github-actions

Copy link
Copy Markdown

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-ffi v54.0.0 (current)
       Built [  64.037s] (current)
     Parsing datafusion-ffi v54.0.0 (current)
      Parsed [   0.065s] (current)
    Building datafusion-ffi v54.0.0 (baseline)
       Built [  58.659s] (baseline)
     Parsing datafusion-ffi v54.0.0 (baseline)
      Parsed [   0.065s] (baseline)
    Checking datafusion-ffi v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.369s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 125.086s] datafusion-ffi
    Building datafusion-physical-expr-common v54.0.0 (current)
       Built [  23.003s] (current)
     Parsing datafusion-physical-expr-common v54.0.0 (current)
      Parsed [   0.023s] (current)
    Building datafusion-physical-expr-common v54.0.0 (baseline)
       Built [  23.392s] (baseline)
     Parsing datafusion-physical-expr-common v54.0.0 (baseline)
      Parsed [   0.023s] (baseline)
    Checking datafusion-physical-expr-common v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.302s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/enum_variant_added.ron

Failed in:
  variant MetricValue:PeakMemoryUsage in /home/runner/work/datafusion/datafusion/datafusion/physical-expr-common/src/metrics/value.rs:676

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  47.535s] datafusion-physical-expr-common
    Building datafusion-physical-plan v54.0.0 (current)
       Built [  35.290s] (current)
     Parsing datafusion-physical-plan v54.0.0 (current)
      Parsed [   0.138s] (current)
    Building datafusion-physical-plan v54.0.0 (baseline)
       Built [  34.832s] (baseline)
     Parsing datafusion-physical-plan v54.0.0 (baseline)
      Parsed [   0.140s] (baseline)
    Checking datafusion-physical-plan v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.890s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  72.563s] datafusion-physical-plan
    Building datafusion-sqllogictest v54.0.0 (current)
       Built [ 186.647s] (current)
     Parsing datafusion-sqllogictest v54.0.0 (current)
      Parsed [   0.024s] (current)
    Building datafusion-sqllogictest v54.0.0 (baseline)
       Built [ 182.692s] (baseline)
     Parsing datafusion-sqllogictest v54.0.0 (baseline)
      Parsed [   0.024s] (baseline)
    Checking datafusion-sqllogictest v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.118s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 372.122s] datafusion-sqllogictest

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 24, 2026
@2010YOUY01

Copy link
Copy Markdown
Contributor Author

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details

It suggests to use #[non_exhaustive]

// library crate
#[non_exhaustive]
pub enum Color {
    Red,
    Green,
}

Then downstream crate must include wildcard _ and later adding new enum variant won't be counted as a breaking change.

match color {
    Color::Red => "red",
    Color::Green => "green",
    _ => "unknown future color",
}

I think it's better to keep it as a breaking change and require downstreams to update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change ffi Changes to the ffi crate physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant