Skip to content

Set TrackConsumersPool as default in datafusion-cli #16081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 22, 2025

Conversation

ding-young
Copy link
Contributor

Which issue does this PR close?

Part of #16065

Rationale for this change

Currently, datafusion-cli does not provide an option to use TrackConsumersPool as memory pool. It would be useful to set TrackConsumersPool as default so that users can view top memory consumers when running out of memory.

What changes are included in this PR?

This pr adds a new option top-memory-consumers in datafusion-cli. Thereby, user can set TrackConsumersPool { inner: <mem-pool-type>, top: <top-memory-consumers>} as memory pool. If top-memory-consumers is set to 0, datafusion-cli will use mem-pool-type not wrapped with TrackConsumersPool.

Are these changes tested?

Yes, via locally.

Are there any user-facing changes?

I updated the docs. (usage.md)

@github-actions github-actions bot added the documentation Improvements or additions to documentation label May 18, 2025
@jfahne
Copy link
Contributor

jfahne commented May 19, 2025

Ideally there would be a basic unit test to document/verify the functionality of the configuration option. I believe it would be added here (link to cli_integration.rs).

@2010YOUY01
Copy link
Contributor

I think it's good to go after adding the test.

@ding-young
Copy link
Contributor Author

@jfahne @2010YOUY01 Thank you for review! I'll add the test

@ding-young
Copy link
Contributor Author

ding-young commented May 19, 2025

I think the order of top memory consumers can even change 😓 I'll update the snapshot tests.

@ding-young ding-young force-pushed the default-mempool branch 2 times, most recently from 4dfbf1c to 77089a2 Compare May 21, 2025 02:07
Copy link
Contributor

@jfahne jfahne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for adding the test.

@2010YOUY01 2010YOUY01 requested a review from Copilot May 21, 2025 04:54
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR sets TrackConsumersPool as the default memory pool in datafusion-cli by introducing a new CLI option --top-memory-consumers.

  • Adds the --top-memory-consumers option to control tracking of top memory consumers.
  • Extends integration tests and updates snapshots to validate the new memory pool behavior.
  • Updates CLI documentation to include the new option.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

File Description
docs/source/user-guide/cli/usage.md Added documentation for the new --top-memory-consumers option.
datafusion-cli/tests/snapshots/cli_top_memory_consumers@top3_default.snap, [email protected], cli_top_memory_consumers@no_track.snap Updated snapshots to reflect error messages with and without memory consumer tracking.
datafusion-cli/tests/cli_integration.rs Added integration tests to exercise the new --top-memory-consumers behavior.
datafusion-cli/src/main.rs Updated CLI argument parsing and memory pool initialization to support TrackConsumersPool.

Comment on lines 181 to 196
let pool: Arc<dyn MemoryPool> = match args.mem_pool_type {
PoolType::Fair => Arc::new(FairSpillPool::new(memory_limit)),
PoolType::Greedy => Arc::new(GreedyMemoryPool::new(memory_limit)),
PoolType::Fair if args.top_memory_consumers == 0 => {
Arc::new(FairSpillPool::new(memory_limit))
}
PoolType::Fair => Arc::new(TrackConsumersPool::new(
FairSpillPool::new(memory_limit),
NonZeroUsize::new(args.top_memory_consumers).unwrap(),
)),
PoolType::Greedy if args.top_memory_consumers == 0 => {
Arc::new(GreedyMemoryPool::new(memory_limit))
}
PoolType::Greedy => Arc::new(TrackConsumersPool::new(
GreedyMemoryPool::new(memory_limit),
NonZeroUsize::new(args.top_memory_consumers).unwrap(),
)),
};
Copy link
Preview

Copilot AI May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The logic for wrapping Fair and Greedy pools with TrackConsumersPool is duplicated. Consider refactoring this common functionality into a helper function to reduce code duplication.

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@2010YOUY01 I ended up keeping the match statement as is since TrackConsumersPool::new() requires a concrete type implementing MemoryPool, and we can't wrap a dyn MemoryPool directly. 😢

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ding-young and @jfahne

I think the reviews from the bot are also worth addressing before merging.

@ding-young
Copy link
Contributor Author

Thank you @2010YOUY01 @jfahne . I'll address them by today.

add snapshot tests for memory exhaustion
@@ -57,6 +57,9 @@ OPTIONS:
--mem-pool-type <MEM_POOL_TYPE>
Specify the memory pool type 'greedy' or 'fair', default to 'greedy'

--top-memory-consumers <TOP_MEMORY_CONSUMERS>
The number of top memory consumers to display when query fails due to memory exhaustion. To disable memory consumer tracking, set this value to 0 [default: 3]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description updated

@2010YOUY01 2010YOUY01 merged commit cb45f1f into apache:main May 22, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants