-
Notifications
You must be signed in to change notification settings - Fork 15
Provide hugepage advice only when the THP enabled setting is madvise #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Provide hugepage advice only when the THP enabled setting is madvise #27
Conversation
src/hpa.c
Outdated
init_system_thp_mode == thp_mode_always && | ||
opt_thp == thp_mode_default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If opt_thp is thp_mode_always or thp_mode_never, we should not explicitly hugify either?
should this be something like
opt_thp == thp_mode_default && !(opt_hpa_opts.experimental_thp_always_no_madvise &&
init_system_thp_mode == thp_mode_always)
src/hpa.c
Outdated
|
||
/* Actually do the purging, now that the lock is dropped. */ | ||
if (batch[i].dehugify) { | ||
if (hpa_explicit_hugify_and_dehugfiy() && batch[i].dehugify) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a separate flag/PR, but i thought for experiments we also wanted to allow directly purging without dehugify (regardless of init_system_thp_mode)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, just noticed #23
include/jemalloc/internal/hpa_opts.h
Outdated
* Whether to skip advising memory regions with MADV_{NO,}HUGEPAGE if | ||
* the transparent hugepages enabled value is set to always. | ||
*/ | ||
bool experimental_thp_always_no_madvise; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value of this should be added in the HPA_SHARD_OPTS_DEFAULT
below since it is used to initialize this struct.
src/ctl.c
Outdated
CTL_PROTO(opt_hpa_hugify_sync) | ||
CTL_PROTO(opt_hpa_min_purge_interval_ms) | ||
CTL_PROTO(opt_experimental_hpa_max_purge_nhp) | ||
CTL_PROTO(opt_experimental_thp_always_no_madvise) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what scenarios do we still want to madvise hugepage when we have thp:always
?
Besides, we still madvise MADV_DONTNEED
even with this flag and we have already removed dehugifying calls, so maybe opt_experimental_thp_always_no_madv_hugepage
is more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if the user sets thp:{always,never}
then the extent hooks will always advise regions it allocates as MADV_{,NO}HUGEPAGE
. What does this have to do with HPA? As far as I can tell, nothing, because when HPA allocates memory, it does not get memory from the extent hooks. Instead, it calls pages_map
directly. (That might make sense to change if we get rid of PAC but that is a change for another day.)
As for your second question, I think the flag name could use improvement and might be better named experimental_hpa_(something)
and not mention "thp" since that is easily confused with the thp
option. I could also pull opt_thp == default
out from the conjunction that says whether to suppress the use of madvise
when hugifying.
So, how about the following follow-up changes?
- Rename the flag to
experimental_hpa_no_explicit_hugify
(not mentioningMADV_HUGEPAGE
avoids excludingMADV_COLLAPSE
) - Remove the check for
opt_thp == default
- Rename the predicate
hpa_explicit_hugify
since we do not explicitly dehugify now that my other PR landed
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. While the change is 1. no perf impact on systems setting kernel thp to madvise and 2. the perf impact should be a win for kernel thp always/never, there is not a high necessity in maintaining a separate runtime options here. We originally wanted to have this option mainly to benefit testing and rollout, but it is a burden not worthy, so let's also remove the opt_experimental_thp_always_no_madvise
.
Besides, I also agree to remove the check for opt_thp == default
. For the long term, this option controls thp usage in pac and should not be used in HPA. Once we have HPA well-prepared, all traffic that wants to use thp should go through HPA instead of setting opt_thp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what scenarios do we still want to madvise hugepage when we have thp:always?
I thought we at least want a switch for now for experimentation purpose?
We originally wanted to have this option mainly to benefit testing and rollout, but it is a burden not worthy, so let's also remove the opt_experimental_thp_always_no_madvise.
We discussed that we want to improve hugificaiton heuristic to be more compatible with kernel thp heuristic (i.e. taking pressure into account along with others), wouldn't we want to experiment explicit madvise on top of thp=always for our hugification heuristic improvements, compare that against not requesting hugification at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we at least want a switch for now for experimentation purpose?
That was the original plan and it's what I suggested in a follow-up comment above. However, the feedback I got afterward was to disable the advice when THP is enabled system-wide since, as far as we could tell, with that setting the advice is a no-op.
That said, further research revealed that the advice does have a surprising effect when the defrag setting is defer+madvise
but none of the systems we care about use that setting, at least at the moment.
We discussed that we want to improve hugificaiton heuristic to be more compatible with kernel thp heuristic (i.e. taking pressure into account along with others), wouldn't we want to experiment explicit madvise on top of thp=always for our hugification heuristic improvements, compare that against not requesting hugification at all.
Yes, I think we do want to adjust the heuristics to be more in line with the current behavior of the kernel. However, the no-op nature of MADV_HUGEPAGE
when THP is enabled system-wide suggests it is unlikely to be a factor in getting there.
What the current data tells us is that when configured to hugify solely by advising, there is a high percentage of full pageslabs that are never advised. There could be a bug, or bad thresholds, or it could be that we are not budgeting enough foreground or background explicit hugification work. When we get to the bottom of that issue and observe the system with a higher percentage of advised memory, I think the specific mechanisms to use to promote hugification will become clear.
When THP is enabled system-wide, there is no need to provide hugepage advice to enable THP for a region of memory. On the other hand, when the THP defragmentation mode is defer+madvise, hugepage advice can result in unwanted application stalls to compact and defragment memory if THP is not immediately available for allocation.
ef38b48
to
d7f6e13
Compare
When THP is enabled system-wide, there is no need to provide hugepage
advice to enable THP for a region of memory. On the other hand, when
the THP defragmentation mode is defer+madvise, hugepage advice can
result in unwanted application stalls to compact and defragment memory
if THP is not immediately available for allocation.