Describe the bug
We do not always recommend a pinned pool size (spark.rapids.memory.pinnedPool.size) together with an appropriate overhead memory setting. This was seen in on-prem environments.
We should also verify that this setting is consistent with related parameters, e.g. spark.executor.memory and spark.executor.memoryOverhead.
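As a rough illustration of the kind of consistency check meant here, the sketch below compares the pinned pool size against the executor overhead. It is only a sketch under stated assumptions: `to_bytes` and `overhead_covers_pinned_pool` are hypothetical helpers (not part of the tool), the rule "overhead must be at least as large as the pinned pool" is an assumed simplification, and values without a unit suffix are assumed to be MiB.

```python
import re

# Binary multipliers for Spark-style size suffixes.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def to_bytes(value: str, default_unit: str = "m") -> int:
    """Parse a memory string such as '2048m', '4G' or '4gb' into bytes.

    Assumption: a bare number is interpreted in `default_unit` (MiB here).
    """
    match = re.fullmatch(r"\s*(\d+)\s*([bkmgt])?b?\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized memory value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or default_unit]


def overhead_covers_pinned_pool(conf: dict) -> bool:
    """Return True if memoryOverhead is set and at least the pinned pool size.

    Assumption: the pinned pool must fit inside the executor overhead.
    """
    pinned = conf.get("spark.rapids.memory.pinnedPool.size")
    overhead = conf.get("spark.executor.memoryOverhead")
    if pinned is None:
        return True  # no pinned pool configured, nothing to verify
    if overhead is None:
        return False  # pinned pool requested without any overhead setting
    return to_bytes(overhead) >= to_bytes(pinned)
```

A configuration that sets only the pinned pool size, or sets an overhead smaller than the pool, would be flagged by this check.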
> We do not always recommend the user to have a pinned pool size setting with an appropriate overhead memory setting. spark.rapids.memory.pinnedPool.size
I investigated this and was able to find some bugs that could have caused this:
Case 1: @kuhushukla Do you remember if the recommended configs did not have an entry for spark.rapids.memory.pinnedPool.size but the comments had this line?
- 'spark.rapids.memory.pinnedPool.size' should be set to 2048m.
This is a bug in our code caused by an incorrect cluster memory calculation when dynamic allocation is enabled. It should be fixed by my dev branch that addresses #1121.
Case 2:
The AutoTuner comment had this line:
- 'spark.executor.memoryOverhead' must be set if using 'spark.rapids.memory.pinnedPool.size
This is a bug in our code: we compare a memory value string (e.g. 4G) with the literal string spark.executor.memoryOverhead, so the comparison never matches. We should fix this as well.
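To make the Case 2 failure mode concrete, here is a hypothetical reconstruction in Python. It is not the tool's actual code: the function names and the `recommended` dictionary are made up for illustration, and the only thing taken from the description above is the pattern of comparing a value such as 4G with the property name.

```python
OVERHEAD_KEY = "spark.executor.memoryOverhead"


def overhead_missing_buggy(recommended: dict) -> bool:
    """Buggy pattern: compares the looked-up value with the key name.

    A value like '4G' never equals 'spark.executor.memoryOverhead', so the
    setting is always reported as missing and the comment is always emitted.
    """
    return recommended.get(OVERHEAD_KEY) != OVERHEAD_KEY


def overhead_missing_fixed(recommended: dict) -> bool:
    """Fixed pattern: test for the presence of the key itself."""
    return OVERHEAD_KEY not in recommended


recommended = {
    "spark.rapids.memory.pinnedPool.size": "2048m",
    OVERHEAD_KEY: "4G",
}
print(overhead_missing_buggy(recommended))  # True, although the key is set
print(overhead_missing_fixed(recommended))  # False
```

With the buggy check the "must be set" comment appears even when the overhead is already recommended, which matches the symptom described in this case.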
Case 3:
Was the job running in standalone mode? In that case, recommending spark.executor.memoryOverhead is disabled because it is not supported.
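For reference, the Case 3 behaviour could look roughly like the sketch below. The helper names are hypothetical, and detecting the cluster manager from the `spark.master` URL is an assumption made for illustration; the tool may determine the deployment mode differently.

```python
OVERHEAD_KEY = "spark.executor.memoryOverhead"


def supports_memory_overhead(master: str) -> bool:
    """Assumption: only YARN and Kubernetes masters honor memoryOverhead.

    Standalone masters use URLs of the form spark://host:port and are
    therefore treated as unsupported.
    """
    return master.startswith("yarn") or master.startswith("k8s://")


def filter_recommendations(master: str, recommended: dict) -> dict:
    """Drop the memoryOverhead recommendation when the master cannot use it."""
    if supports_memory_overhead(master):
        return dict(recommended)
    return {k: v for k, v in recommended.items() if k != OVERHEAD_KEY}
```

Under this sketch a standalone job keeps its pinned pool recommendation but loses the overhead entry, which would explain a pinned pool size appearing without a matching overhead setting.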