
[BUG] Pinned pool size setting may be missing from the autotuner recommendation #1398

Open
kuhushukla opened this issue Oct 29, 2024 · 2 comments
Labels: bug (Something isn't working), core_tools (Scope the core module (scala))

Comments

@kuhushukla (Collaborator)

Describe the bug
The AutoTuner does not always recommend a pinned pool size setting (spark.rapids.memory.pinnedPool.size) together with an appropriate overhead memory setting. This was seen in on-prem environments.

We should verify that this setting is recommended consistently with related parameters as well, e.g. spark.executor.memory and spark.executor.memoryOverhead.
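For concreteness, here is a minimal sketch of the invariant we would want to verify (illustrative names, not the actual tools code): the pinned pool is allocated off-heap, so on YARN spark.executor.memoryOverhead must leave room for it on top of a baseline overhead (Spark's default minimum overhead is 384 MiB).

```scala
object PinnedPoolCheck {
  // Parse sizes like "4g" or "2048m" into MiB (simplified; assumes a unit suffix).
  def toMiB(size: String): Long = size.toLowerCase.last match {
    case 'g' => size.dropRight(1).toLong * 1024
    case 'm' => size.dropRight(1).toLong
    case c   => throw new IllegalArgumentException(s"unsupported unit: $c")
  }

  // True when the configured overhead leaves room for the pinned pool
  // on top of the baseline overhead.
  def overheadCoversPinnedPool(memoryOverhead: String,
                               pinnedPoolSize: String,
                               baselineOverheadMiB: Long = 384): Boolean =
    toMiB(memoryOverhead) >= toMiB(pinnedPoolSize) + baselineOverheadMiB

  def main(args: Array[String]): Unit = {
    println(overheadCoversPinnedPool("4g", "2g"))    // true:  4096 >= 2048 + 384
    println(overheadCoversPinnedPool("2g", "2048m")) // false: 2048 <  2048 + 384
  }
}
```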

@kuhushukla kuhushukla added ? - Needs Triage bug Something isn't working labels Oct 29, 2024
@amahussein amahussein added the core_tools Scope the core module (scala) label Nov 1, 2024
@mattahrens (Collaborator)

Pinned memory logic is in one of two places:

Can you give exact repro information for where the pinned pool wasn't being recommended? I wonder if cluster memory info was missing in the input.

@parthosa (Collaborator) commented Jan 7, 2025

> We do not always recommend the user to have a pinned pool size setting with an appropriate overhead memory setting.
> spark.rapids.memory.pinnedPool.size

I investigated this and found some bugs that could have caused it:

Case 1:
@kuhushukla Do you remember if the recommended configs did not have an entry for spark.rapids.memory.pinnedPool.size but the comments had this line?

- 'spark.rapids.memory.pinnedPool.size' should be set to 2048m.

This is a bug in our code due to incorrect cluster memory calculation when dynamic allocation is enabled. It should be fixed by my dev branch that addresses #1121.
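To illustrate the failure mode (a hypothetical reduction, not the actual AutoTuner code): under dynamic allocation spark.executor.instances is typically unset, so a cluster memory calculation keyed on it silently produces no value, and the pinnedPool.size recommendation only surfaces as a comment.

```scala
object Case1Sketch extends App {
  // With dynamic allocation on, the instance count is usually absent.
  val sparkProps: Map[String, String] = Map(
    "spark.dynamicAllocation.enabled" -> "true"
    // "spark.executor.instances" is not set
  )

  sparkProps.get("spark.executor.instances").map(_.toInt) match {
    case Some(n) => println(s"size pinned pool using $n executors")
    case None    => println("no instance count: cluster memory calculation falls through")
  }
}
```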

Case 2:
The AutoTuner comment had this line:

- 'spark.executor.memoryOverhead' must be set if using 'spark.rapids.memory.pinnedPool.size'

This is a bug in our code where we compare a memory value string (e.g. 4G) against the literal config key string spark.executor.memoryOverhead, so the check can never pass. We should also fix this.
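A hypothetical reduction of that comparison bug (names are illustrative):

```scala
object Case2Sketch extends App {
  // Buggy pattern: compares the configured *value* against the config *key*
  // name, which is never true, so the "must be set" comment always fires.
  val overheadValue = "4g" // value read from the application's properties
  val buggyIsSet = overheadValue == "spark.executor.memoryOverhead" // always false

  // Intended pattern: check that the key itself is present in the properties.
  val props = Map("spark.executor.memoryOverhead" -> "4g")
  val fixedIsSet = props.contains("spark.executor.memoryOverhead")

  println(s"buggy=$buggyIsSet fixed=$fixedIsSet") // buggy=false fixed=true
}
```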

Case 3:
Was the job using standalone mode? In that case, recommending spark.executor.memoryOverhead is disabled, as it is not supported there.
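If so, a guard along these lines (a hypothetical sketch keyed on the master URL, not the actual tools code) would be why the recommendation is skipped:

```scala
object Case3Sketch extends App {
  // Spark standalone master URLs start with "spark://"; YARN uses "yarn".
  def isStandalone(master: String): Boolean = master.startsWith("spark://")

  // Skip the memoryOverhead recommendation on standalone, per the comment above.
  def shouldRecommendOverhead(master: String): Boolean = !isStandalone(master)

  println(shouldRecommendOverhead("spark://host:7077")) // false
  println(shouldRecommendOverhead("yarn"))              // true
}
```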
