-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Set TrackConsumersPool
as default in datafusion-cli
#16081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c261c9e
to
e2dc277
Compare
Ideally there would be a basic unit test to document/verify the functionality of the configuration option. I believe it would be added here (link to cli_integration.rs). |
I think it's good to go after adding the test. |
@jfahne @2010YOUY01 Thank you for review! I'll add the test |
I think the order of top memory consumers can even change 😓 I'll update the snapshot tests. |
4dfbf1c
to
77089a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for adding the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR sets TrackConsumersPool as the default memory pool in datafusion-cli by introducing a new CLI option --top-memory-consumers.
- Adds the --top-memory-consumers option to control tracking of top memory consumers.
- Extends integration tests and updates snapshots to validate the new memory pool behavior.
- Updates CLI documentation to include the new option.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
docs/source/user-guide/cli/usage.md | Added documentation for the new --top-memory-consumers option. |
datafusion-cli/tests/snapshots/cli_top_memory_consumers@top3_default.snap, [email protected], cli_top_memory_consumers@no_track.snap | Updated snapshots to reflect error messages with and without memory consumer tracking. |
datafusion-cli/tests/cli_integration.rs | Added integration tests to exercise the new --top-memory-consumers behavior. |
datafusion-cli/src/main.rs | Updated CLI argument parsing and memory pool initialization to support TrackConsumersPool. |
let pool: Arc<dyn MemoryPool> = match args.mem_pool_type { | ||
PoolType::Fair => Arc::new(FairSpillPool::new(memory_limit)), | ||
PoolType::Greedy => Arc::new(GreedyMemoryPool::new(memory_limit)), | ||
PoolType::Fair if args.top_memory_consumers == 0 => { | ||
Arc::new(FairSpillPool::new(memory_limit)) | ||
} | ||
PoolType::Fair => Arc::new(TrackConsumersPool::new( | ||
FairSpillPool::new(memory_limit), | ||
NonZeroUsize::new(args.top_memory_consumers).unwrap(), | ||
)), | ||
PoolType::Greedy if args.top_memory_consumers == 0 => { | ||
Arc::new(GreedyMemoryPool::new(memory_limit)) | ||
} | ||
PoolType::Greedy => Arc::new(TrackConsumersPool::new( | ||
GreedyMemoryPool::new(memory_limit), | ||
NonZeroUsize::new(args.top_memory_consumers).unwrap(), | ||
)), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The logic for wrapping Fair and Greedy pools with TrackConsumersPool is duplicated. Consider refactoring this common functionality into a helper function to reduce code duplication.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@2010YOUY01 I ended up keeping the match statement as is since TrackConsumersPool::new()
requires a concrete type implementing MemoryPool
, and we can't wrap a dyn MemoryPool
directly. 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ding-young and @jfahne
I think the reviews from the bot are also worth addressing before merging.
Thank you @2010YOUY01 @jfahne . I'll address them by today. |
add snapshot tests for memory exhaustion
77089a2
to
08e7e91
Compare
@@ -57,6 +57,9 @@ OPTIONS: | |||
--mem-pool-type <MEM_POOL_TYPE> | |||
Specify the memory pool type 'greedy' or 'fair', default to 'greedy' | |||
|
|||
--top-memory-consumers <TOP_MEMORY_CONSUMERS> | |||
The number of top memory consumers to display when query fails due to memory exhaustion. To disable memory consumer tracking, set this value to 0 [default: 3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description updated
Which issue does this PR close?
Part of #16065
Rationale for this change
Currently, datafusion-cli does not provide an option to use
TrackConsumersPool
as memory pool. It would be useful to setTrackConsumersPool
as default so that users can view top memory consumers when running out of memory.What changes are included in this PR?
This pr adds a new option
top-memory-consumers
in datafusion-cli. Thereby, user can setTrackConsumersPool { inner: <mem-pool-type>, top: <top-memory-consumers>}
as memory pool. Iftop-memory-consumers
is set to 0, datafusion-cli will usemem-pool-type
not wrapped withTrackConsumersPool
.Are these changes tested?
Yes, via locally.
Are there any user-facing changes?
I updated the docs. (
usage.md
)