Skip to content

Conversation

lexprfuncall
Copy link

@lexprfuncall lexprfuncall commented Aug 11, 2025

When THP is enabled system-wide, there is no need to provide hugepage
advice to enable THP for a region of memory. On the other hand, when
the THP defragmentation mode is defer+madvise, hugepage advice can
result in unwanted application stalls to compact and defragment memory
if THP is not immediately available for allocation.

@meta-cla meta-cla bot added the cla signed label Aug 11, 2025
src/hpa.c Outdated
Comment on lines 430 to 431
init_system_thp_mode == thp_mode_always &&
opt_thp == thp_mode_default);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If opt_thp is thp_mode_always or thp_mode_never, we should not explicitly hugify either?

should this be something like

opt_thp == thp_mode_default && !(opt_hpa_opts.experimental_thp_always_no_madvise && 
init_system_thp_mode == thp_mode_always)

src/hpa.c Outdated

/* Actually do the purging, now that the lock is dropped. */
if (batch[i].dehugify) {
if (hpa_explicit_hugify_and_dehugfiy() && batch[i].dehugify) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be a separate flag/PR, but i thought for experiments we also wanted to allow directly purging without dehugify (regardless of init_system_thp_mode)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, just noticed #23

* Whether to skip advising memory regions with MADV_{NO,}HUGEPAGE if
* the transparent hugepages enabled value is set to always.
*/
bool experimental_thp_always_no_madvise;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of this should be added in the HPA_SHARD_OPTS_DEFAULT below since it is used to initialize this struct.

src/ctl.c Outdated
CTL_PROTO(opt_hpa_hugify_sync)
CTL_PROTO(opt_hpa_min_purge_interval_ms)
CTL_PROTO(opt_experimental_hpa_max_purge_nhp)
CTL_PROTO(opt_experimental_thp_always_no_madvise)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what scenarios do we still want to madvise hugepage when we have thp:always?

Besides, we still madvise MADV_DONTNEED even with this flag and we have already removed dehugifying calls, so maybe opt_experimental_thp_always_no_madv_hugepage is more appropriate?

Copy link
Author

@lexprfuncall lexprfuncall Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if the user sets thp:{always,never} then the extent hooks will always advise regions it allocates as MADV_{,NO}HUGEPAGE. What does this have to do with HPA? As far as I can tell, nothing, because when HPA allocates memory, it does not get memory from the extent hooks. Instead, it calls pages_map directly. (That might make sense to change if we get rid of PAC but that is a change for another day.)

As for your second question, I think the flag name could use improvement and might be better named experimental_hpa_(something) and not mention "thp" since that is easily confused with the thp option. I could also pull opt_thp == default out from the conjunction that says whether to suppress the use of madvise when hugifying.

So, how about the following follow-up changes?

  1. Rename the flag to experimental_hpa_no_explicit_hugify (not mentioning MADV_HUGEPAGE avoids excluding MADV_COLLAPSE)
  2. Remove the check for opt_thp == default
  3. Rename the predicate hpa_explicit_hugify since we do not explicitly dehugify now that my other PR landed

WDYT?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. While the change is 1. no perf impact on systems setting kernel thp to madvise and 2. the perf impact should be a win for kernel thp always/never, there is not a high necessity in maintaining a separate runtime options here. We originally wanted to have this option mainly to benefit testing and rollout, but it is a burden not worthy, so let's also remove the opt_experimental_thp_always_no_madvise.

Besides, I also agree to remove the check for opt_thp == default. For the long term, this option controls thp usage in pac and should not be used in HPA. Once we have HPA well-prepared, all traffic that wants to use thp should go through HPA instead of setting opt_thp.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what scenarios do we still want to madvise hugepage when we have thp:always?

I thought we at least want a switch for now for experimentation purpose?

We originally wanted to have this option mainly to benefit testing and rollout, but it is a burden not worthy, so let's also remove the opt_experimental_thp_always_no_madvise.

We discussed that we want to improve hugificaiton heuristic to be more compatible with kernel thp heuristic (i.e. taking pressure into account along with others), wouldn't we want to experiment explicit madvise on top of thp=always for our hugification heuristic improvements, compare that against not requesting hugification at all.

Copy link
Author

@lexprfuncall lexprfuncall Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we at least want a switch for now for experimentation purpose?

That was the original plan and it's what I suggested in a follow-up comment above. However, the feedback I got afterward was to disable the advice when THP is enabled system-wide since, as far as we could tell, with that setting the advice is a no-op.

That said, further research revealed that the advice does have a surprising effect when the defrag setting is defer+madvise but none of the systems we care about use that setting, at least at the moment.

We discussed that we want to improve hugificaiton heuristic to be more compatible with kernel thp heuristic (i.e. taking pressure into account along with others), wouldn't we want to experiment explicit madvise on top of thp=always for our hugification heuristic improvements, compare that against not requesting hugification at all.

Yes, I think we do want to adjust the heuristics to be more in line with the current behavior of the kernel. However, the no-op nature of MADV_HUGEPAGE when THP is enabled system-wide suggests it is unlikely to be a factor in getting there.

What the current data tells us is that when configured to hugify solely by advising, there is a high percentage of full pageslabs that are never advised. There could be a bug, or bad thresholds, or it could be that we are not budgeting enough foreground or background explicit hugification work. When we get to the bottom of that issue and observe the system with a higher percentage of advised memory, I think the specific mechanisms to use to promote hugification will become clear.

When THP is enabled system-wide, there is no need to provide hugepage
advice to enable THP for a region of memory.  On the other hand, when
the THP defragmentation mode is defer+madvise, hugepage advice can
result in unwanted application stalls to compact and defragment memory
if THP is not immediately available for allocation.
@lexprfuncall lexprfuncall force-pushed the experimental-thp-always-no-madvise branch from ef38b48 to d7f6e13 Compare August 22, 2025 02:32
@lexprfuncall lexprfuncall changed the title Add an option to disable advice for hugifying and dehugifying Provide hugepage advice only when the THP enabled setting is madvise Aug 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants