From 42cd40267df845621f7046f3aa0fc99119d143e1 Mon Sep 17 00:00:00 2001
From: Wei Xu
Date: Fri, 22 Nov 2024 00:05:00 +0100
Subject: [PATCH] mm/mglru: only clear kswapd_failures if reclaimable

BugLink: https://bugs.launchpad.net/bugs/2087886

lru_gen_shrink_node() unconditionally clears kswapd_failures, which can
prevent kswapd from sleeping and cause 100% kswapd CPU usage even when
kswapd repeatedly fails to make progress in reclaim.

Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes some
progress, similar to shrink_node().

I happened to run into this problem in one of my tests recently.  It
requires a combination of several conditions: the allocator needs to
allocate just the right amount of pages, such that it can wake up kswapd
without itself being OOM killed; there is no memory for kswapd to reclaim
(my test disables swap and cleans the page cache first); and no other
process frees enough memory at the same time.

Link: https://lkml.kernel.org/r/20241014221211.832591-1-weixugc@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Wei Xu
Cc: Axel Rasmussen
Cc: Brian Geffon
Cc: Jan Alexander Steffens
Cc: Suleiman Souhlal
Cc: Yu Zhao
Cc:
Signed-off-by: Andrew Morton
(cherry picked from commit b130ba4a6259f6b64d8af15e9e7ab1e912bcb7ad)
Signed-off-by: Matthew Ruffell
Acked-by: Koichiro Den
Acked-by: Agathe Porte
Acked-by: Manuel Diewald
Signed-off-by: Stefan Bader
---
 mm/vmscan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eefd9c908b659..9f3cc52fc3dab 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4925,8 +4925,8 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
 
 	blk_finish_plug(&plug);
 done:
-	/* kswapd should never fail */
-	pgdat->kswapd_failures = 0;
+	if (sc->nr_reclaimed > reclaimed)
+		pgdat->kswapd_failures = 0;
 }
 
 /******************************************************************************