| author | JP Kobryn (Meta) <jp.kobryn@linux.dev> | 2026-04-06 12:50:14 -0700 |
|---|---|---|
| committer | Andrew Morton <akpm@linux-foundation.org> | 2026-05-28 21:04:50 -0700 |
| commit | f2a950170f7a78761c2b2e5e535716fb0f8c0813 (patch) | |
| tree | c843729cef614e01cf7ea27f478cc8f8e2de418d /scripts/objdiff | |
| parent | 1d8274b82cd1870eba883fd20204bcd8601c3527 (diff) | |
mm/vmpressure: skip socket pressure for costly order reclaim
When reclaim is triggered by high order allocations on a fragmented
system, vmpressure() can report poor reclaim efficiency even though the
system has plenty of free memory. This is because many pages are scanned
but few can actually be reclaimed: the pages are in active use and do not
need to be freed. The resulting scan:reclaim ratio causes
vmpressure() to assert socket pressure, throttling TCP throughput
unnecessarily.
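To make the scan:reclaim ratio concrete, the following is a rough userspace sketch of how a pressure percentage could be derived from scanned and reclaimed page counts. It is modeled on my recollection of the calculation in mm/vmpressure.c, not copied from it. The function names, the enum, and the 60/95 thresholds are assumptions for illustration only.

```c
#include <assert.h>

/* Illustrative levels; names and thresholds are assumptions, not kernel API. */
enum pressure_level { LEVEL_LOW, LEVEL_MEDIUM, LEVEL_CRITICAL };

#define LEVEL_MEDIUM_THRESHOLD   60UL	/* assumed */
#define LEVEL_CRITICAL_THRESHOLD 95UL	/* assumed */

/*
 * Hypothetical helper: returns a 0..100 pressure value. The value rises as
 * fewer of the scanned pages end up being reclaimed.
 */
static unsigned long calc_pressure(unsigned long scanned,
				   unsigned long reclaimed)
{
	unsigned long scale = scanned + reclaimed;
	unsigned long pressure;

	/* Reclaiming at least as much as was scanned means no pressure. */
	if (reclaimed >= scanned)
		return 0;

	/* reclaimed < scanned here, so scanned is non-zero. */
	assert(scanned > 0);

	pressure = scale - (reclaimed * scale / scanned);
	return pressure * 100 / scale;
}

/* Hypothetical helper: maps a pressure percentage to a coarse level. */
static enum pressure_level level_for(unsigned long pressure)
{
	if (pressure >= LEVEL_CRITICAL_THRESHOLD)
		return LEVEL_CRITICAL;
	if (pressure >= LEVEL_MEDIUM_THRESHOLD)
		return LEVEL_MEDIUM;
	return LEVEL_LOW;
}
```

Under these assumptions, scanning 1000 pages and reclaiming only 10 gives a pressure of 99, which maps to the critical level even if the system has ample free order-0 memory.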
Costly order allocations (above PAGE_ALLOC_COSTLY_ORDER) rely heavily on
compaction to succeed, so poor reclaim efficiency at these orders does not
necessarily indicate memory pressure. The kernel already treats this
order as the boundary where reclaim is no longer expected to succeed and
compaction may take over.
Make vmpressure() order-aware through an additional parameter sourced from
scan_control at existing call sites. Socket pressure is now only asserted
when order <= PAGE_ALLOC_COSTLY_ORDER.
Memcg reclaim is unaffected since try_to_free_mem_cgroup_pages() always
uses order 0, which passes the filter unconditionally. Similarly,
vmpressure_prio() now passes order 0 internally when calling vmpressure(),
ensuring critical pressure from low reclaim priority is not suppressed by
the order filter.
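The order filter described above can be sketched as a small userspace model. This is not the patch itself: the helper name is hypothetical, and the only value taken from the kernel is PAGE_ALLOC_COSTLY_ORDER, which is 3 in mainline.

```c
#include <assert.h>
#include <stdbool.h>

/* Matches the mainline definition in include/linux/mmzone.h. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical helper: returns true when reclaim at the given allocation
 * order is still allowed to assert socket pressure. Orders above
 * PAGE_ALLOC_COSTLY_ORDER are filtered out because they depend on
 * compaction rather than reclaim.
 */
static bool order_allows_socket_pressure(int order)
{
	assert(order >= 0);	/* allocation orders are never negative */
	return order <= PAGE_ALLOC_COSTLY_ORDER;
}
```

In this model, order 0 (the value used by memcg reclaim and passed by vmpressure_prio()) always passes the filter, while the order-7 kswapd reclaim seen in the production case does not.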
The patch was motivated by a case of degraded network throughput in
production. On one affected host, the memory state at the time showed
~15GB available, zero cgroup pressure, and the following buddyinfo state:
  Order    FreePages
  0:       133,970
  1:       29,230
  2:       17,351
  3:       18,984
  7+:      0
Using bpf, it was found that 94% of vmpressure calls on this host were
from order-7 kswapd reclaim.
TCP minimum recv window is rcv_ssthresh:19712.
Before patch:
723 out of 3,843 (19%) TCP connections stuck at minimum recv window
After live-patching and ~30min elapsed:
0 out of 3,470 TCP connections stuck at minimum recv window
Link: https://lore.kernel.org/20260406195014.112521-1-jp.kobryn@linux.dev
Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Liam Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qi Zheng <qi.zheng@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
