<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/memcontrol.c, branch v5.9</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm: memcontrol: fix missing suffix of workingset_restore</title>
<updated>2020-09-26T17:33:57+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2020-09-26T04:19:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8d3fe09d8d6645dcbbe2413cde58f51ceb6545a6'/>
<id>8d3fe09d8d6645dcbbe2413cde58f51ceb6545a6</id>
<content type='text'>
We forget to add the suffix to the workingset_restore string, so fix it.

And also update the documentation of cgroup-v2.rst.

Fixes: 170b04b7ae49 ("mm/workingset: prepare the workingset detection infrastructure for anon LRU")
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Zefan Li &lt;lizefan@huawei.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Link: https://lkml.kernel.org/r/20200916100030.71698-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We forget to add the suffix to the workingset_restore string, so fix it.

And also update the documentation of cgroup-v2.rst.

Fixes: 170b04b7ae49 ("mm/workingset: prepare the workingset detection infrastructure for anon LRU")
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Zefan Li &lt;lizefan@huawei.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Link: https://lkml.kernel.org/r/20200916100030.71698-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memcg: fix use-after-free in uncharge_batch</title>
<updated>2020-09-05T19:14:29+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2020-09-04T23:35:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f1796544a0ca0f14386a679d3d05fbc69235015e'/>
<id>f1796544a0ca0f14386a679d3d05fbc69235015e</id>
<content type='text'>
syzbot has reported an use-after-free in the uncharge_batch path

  BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
  BUG: KASAN: use-after-free in atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
  BUG: KASAN: use-after-free in atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
  BUG: KASAN: use-after-free in page_counter_cancel mm/page_counter.c:54 [inline]
  BUG: KASAN: use-after-free in page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
  Write of size 8 at addr ffff8880371c0148 by task syz-executor.0/9304

  CPU: 0 PID: 9304 Comm: syz-executor.0 Not tainted 5.8.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1f0/0x31e lib/dump_stack.c:118
    print_address_description+0x66/0x620 mm/kasan/report.c:383
    __kasan_report mm/kasan/report.c:513 [inline]
    kasan_report+0x132/0x1d0 mm/kasan/report.c:530
    check_memory_region_inline mm/kasan/generic.c:183 [inline]
    check_memory_region+0x2b5/0x2f0 mm/kasan/generic.c:192
    instrument_atomic_write include/linux/instrumented.h:71 [inline]
    atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
    atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
    page_counter_cancel mm/page_counter.c:54 [inline]
    page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
    uncharge_batch+0x6c/0x350 mm/memcontrol.c:6764
    uncharge_page+0x115/0x430 mm/memcontrol.c:6796
    uncharge_list mm/memcontrol.c:6835 [inline]
    mem_cgroup_uncharge_list+0x70/0xe0 mm/memcontrol.c:6877
    release_pages+0x13a2/0x1550 mm/swap.c:911
    tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
    tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
    tlb_flush_mmu+0x780/0x910 mm/mmu_gather.c:249
    tlb_finish_mmu+0xcb/0x200 mm/mmu_gather.c:328
    exit_mmap+0x296/0x550 mm/mmap.c:3185
    __mmput+0x113/0x370 kernel/fork.c:1076
    exit_mm+0x4cd/0x550 kernel/exit.c:483
    do_exit+0x576/0x1f20 kernel/exit.c:793
    do_group_exit+0x161/0x2d0 kernel/exit.c:903
    get_signal+0x139b/0x1d30 kernel/signal.c:2743
    arch_do_signal+0x33/0x610 arch/x86/kernel/signal.c:811
    exit_to_user_mode_loop kernel/entry/common.c:135 [inline]
    exit_to_user_mode_prepare+0x8d/0x1b0 kernel/entry/common.c:166
    syscall_exit_to_user_mode+0x5e/0x1a0 kernel/entry/common.c:241
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Commit 1a3e1f40962c ("mm: memcontrol: decouple reference counting from
page accounting") reworked the memcg lifetime to be bound the the struct
page rather than charges.  It also removed the css_put_many from
uncharge_batch and that is causing the above splat.

uncharge_batch() is supposed to uncharge accumulated charges for all
pages freed from the same memcg.  The queuing is done by uncharge_page
which however drops the memcg reference after it adds charges to the
batch.  If the current page happens to be the last one holding the
reference for its memcg then the memcg is OK to go and the next page to
be freed will trigger batched uncharge which needs to access the memcg
which is gone already.

Fix the issue by taking a reference for the memcg in the current batch.

Fixes: 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting")
Reported-by: syzbot+b305848212deec86eabe@syzkaller.appspotmail.com
Reported-by: syzbot+b5ea6fb6f139c8b9482b@syzkaller.appspotmail.com
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Link: https://lkml.kernel.org/r/20200820090341.GC5033@dhcp22.suse.cz
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
syzbot has reported an use-after-free in the uncharge_batch path

  BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
  BUG: KASAN: use-after-free in atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
  BUG: KASAN: use-after-free in atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
  BUG: KASAN: use-after-free in page_counter_cancel mm/page_counter.c:54 [inline]
  BUG: KASAN: use-after-free in page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
  Write of size 8 at addr ffff8880371c0148 by task syz-executor.0/9304

  CPU: 0 PID: 9304 Comm: syz-executor.0 Not tainted 5.8.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1f0/0x31e lib/dump_stack.c:118
    print_address_description+0x66/0x620 mm/kasan/report.c:383
    __kasan_report mm/kasan/report.c:513 [inline]
    kasan_report+0x132/0x1d0 mm/kasan/report.c:530
    check_memory_region_inline mm/kasan/generic.c:183 [inline]
    check_memory_region+0x2b5/0x2f0 mm/kasan/generic.c:192
    instrument_atomic_write include/linux/instrumented.h:71 [inline]
    atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
    atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
    page_counter_cancel mm/page_counter.c:54 [inline]
    page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
    uncharge_batch+0x6c/0x350 mm/memcontrol.c:6764
    uncharge_page+0x115/0x430 mm/memcontrol.c:6796
    uncharge_list mm/memcontrol.c:6835 [inline]
    mem_cgroup_uncharge_list+0x70/0xe0 mm/memcontrol.c:6877
    release_pages+0x13a2/0x1550 mm/swap.c:911
    tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
    tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
    tlb_flush_mmu+0x780/0x910 mm/mmu_gather.c:249
    tlb_finish_mmu+0xcb/0x200 mm/mmu_gather.c:328
    exit_mmap+0x296/0x550 mm/mmap.c:3185
    __mmput+0x113/0x370 kernel/fork.c:1076
    exit_mm+0x4cd/0x550 kernel/exit.c:483
    do_exit+0x576/0x1f20 kernel/exit.c:793
    do_group_exit+0x161/0x2d0 kernel/exit.c:903
    get_signal+0x139b/0x1d30 kernel/signal.c:2743
    arch_do_signal+0x33/0x610 arch/x86/kernel/signal.c:811
    exit_to_user_mode_loop kernel/entry/common.c:135 [inline]
    exit_to_user_mode_prepare+0x8d/0x1b0 kernel/entry/common.c:166
    syscall_exit_to_user_mode+0x5e/0x1a0 kernel/entry/common.c:241
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Commit 1a3e1f40962c ("mm: memcontrol: decouple reference counting from
page accounting") reworked the memcg lifetime to be bound the the struct
page rather than charges.  It also removed the css_put_many from
uncharge_batch and that is causing the above splat.

uncharge_batch() is supposed to uncharge accumulated charges for all
pages freed from the same memcg.  The queuing is done by uncharge_page
which however drops the memcg reference after it adds charges to the
batch.  If the current page happens to be the last one holding the
reference for its memcg then the memcg is OK to go and the next page to
be freed will trigger batched uncharge which needs to access the memcg
which is gone already.

Fix the issue by taking a reference for the memcg in the current batch.

Fixes: 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting")
Reported-by: syzbot+b305848212deec86eabe@syzkaller.appspotmail.com
Reported-by: syzbot+b5ea6fb6f139c8b9482b@syzkaller.appspotmail.com
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Link: https://lkml.kernel.org/r/20200820090341.GC5033@dhcp22.suse.cz
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: replace hpage_nr_pages with thp_nr_pages</title>
<updated>2020-08-15T02:56:56+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2020-08-15T00:30:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6c357848b44b4016ca422178aa368a7472245f6f'/>
<id>6c357848b44b4016ca422178aa368a7472245f6f</id>
<content type='text'>
The thp prefix is more frequently used than hpage and we should be
consistent between the various functions.

[akpm@linux-foundation.org: fix mm/migrate.c]

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The thp prefix is more frequently used than hpage and we should be
consistent between the various functions.

[akpm@linux-foundation.org: fix mm/migrate.c]

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcontrol: fix warning when allocating the root cgroup</title>
<updated>2020-08-13T19:15:21+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2020-08-13T14:40:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9f457179244a1c0316546b1760f8993d0d718861'/>
<id>9f457179244a1c0316546b1760f8993d0d718861</id>
<content type='text'>
Commit 3e38e0aaca9e ("mm: memcg: charge memcg percpu memory to the
parent cgroup") adds memory tracking to the memcg kernel structures
themselves to make cgroups liable for the memory they are consuming
through the allocation of child groups (which can be significant).

This code is a bit awkward as it's spread out through several functions:
The outermost function does memalloc_use_memcg(parent) to set up
current-&gt;active_memcg, which designates which cgroup to charge, and the
inner functions pass GFP_ACCOUNT to request charging for specific
allocations.  To make sure this dependency is satisfied at all times -
to make sure we don't randomly charge whoever is calling the functions -
the inner functions warn on !current-&gt;active_memcg.

However, this triggers a false warning when the root memcg itself is
allocated.  No parent exists in this case, and so current-&gt;active_memcg
is rightfully NULL.  It's a false positive, not indicative of a bug.

Delete the warnings for now, we can revisit this later.

Fixes: 3e38e0aaca9e ("mm: memcg: charge memcg percpu memory to the parent cgroup")
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Acked-by: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 3e38e0aaca9e ("mm: memcg: charge memcg percpu memory to the
parent cgroup") adds memory tracking to the memcg kernel structures
themselves to make cgroups liable for the memory they are consuming
through the allocation of child groups (which can be significant).

This code is a bit awkward as it's spread out through several functions:
The outermost function does memalloc_use_memcg(parent) to set up
current-&gt;active_memcg, which designates which cgroup to charge, and the
inner functions pass GFP_ACCOUNT to request charging for specific
allocations.  To make sure this dependency is satisfied at all times -
to make sure we don't randomly charge whoever is calling the functions -
the inner functions warn on !current-&gt;active_memcg.

However, this triggers a false warning when the root memcg itself is
allocated.  No parent exists in this case, and so current-&gt;active_memcg
is rightfully NULL.  It's a false positive, not indicative of a bug.

Delete the warnings for now, we can revisit this later.

Fixes: 3e38e0aaca9e ("mm: memcg: charge memcg percpu memory to the parent cgroup")
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Acked-by: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memcontrol.c: delete duplicated words</title>
<updated>2020-08-12T17:57:58+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2020-08-12T01:33:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ac5ddd0fcea9bfee1d48884e7fa94e706d213aac'/>
<id>ac5ddd0fcea9bfee1d48884e7fa94e706d213aac</id>
<content type='text'>
Drop the repeated word "down".

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Link: http://lkml.kernel.org/r/20200801173822.14973-6-rdunlap@infradead.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Drop the repeated word "down".

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Link: http://lkml.kernel.org/r/20200801173822.14973-6-rdunlap@infradead.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/workingset: prepare the workingset detection infrastructure for anon LRU</title>
<updated>2020-08-12T17:57:55+00:00</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2020-08-12T01:30:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=170b04b7ae49634df103810dad67b22cf8a99aa6'/>
<id>170b04b7ae49634df103810dad67b22cf8a99aa6</id>
<content type='text'>
To prepare the workingset detection for anon LRU, this patch splits
workingset event counters for refault, activate and restore into anon and
file variants, as well as the refaults counter in struct lruvec.

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Link: http://lkml.kernel.org/r/1595490560-15117-4-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To prepare the workingset detection for anon LRU, this patch splits
workingset event counters for refault, activate and restore into anon and
file variants, as well as the refaults counter in struct lruvec.

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Link: http://lkml.kernel.org/r/1595490560-15117-4-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcg: charge memcg percpu memory to the parent cgroup</title>
<updated>2020-08-12T17:57:55+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-08-12T01:30:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3e38e0aaca9eafb12b1c4b731d1c10975cbe7974'/>
<id>3e38e0aaca9eafb12b1c4b731d1c10975cbe7974</id>
<content type='text'>
Memory cgroups are using large chunks of percpu memory to store vmstat
data.  Yet this memory is not accounted at all, so in the case when there
are many (dying) cgroups, it's not exactly clear where all the memory is.

Because the size of memory cgroup internal structures can dramatically
exceed the size of object or page which is pinning it in the memory, it's
not a good idea to simply ignore it.  It actually breaks the isolation
between cgroups.

Let's account the consumed percpu memory to the parent cgroup.

[guro@fb.com: add WARN_ON_ONCE()s, per Johannes]
  Link: http://lkml.kernel.org/r/20200811170611.GB1507044@carbon.DHCP.thefacebook.com

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Dennis Zhou &lt;dennis@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Tobin C. Harding &lt;tobin@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Bixuan Cui &lt;cuibixuan@huawei.com&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Link: http://lkml.kernel.org/r/20200623184515.4132564-5-guro@fb.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Memory cgroups are using large chunks of percpu memory to store vmstat
data.  Yet this memory is not accounted at all, so in the case when there
are many (dying) cgroups, it's not exactly clear where all the memory is.

Because the size of memory cgroup internal structures can dramatically
exceed the size of object or page which is pinning it in the memory, it's
not a good idea to simply ignore it.  It actually breaks the isolation
between cgroups.

Let's account the consumed percpu memory to the parent cgroup.

[guro@fb.com: add WARN_ON_ONCE()s, per Johannes]
  Link: http://lkml.kernel.org/r/20200811170611.GB1507044@carbon.DHCP.thefacebook.com

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Dennis Zhou &lt;dennis@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Tobin C. Harding &lt;tobin@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Bixuan Cui &lt;cuibixuan@huawei.com&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Link: http://lkml.kernel.org/r/20200623184515.4132564-5-guro@fb.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcg/percpu: per-memcg percpu memory statistics</title>
<updated>2020-08-12T17:57:55+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-08-12T01:30:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=772616b031f06e05846488b01dab46a7c832da13'/>
<id>772616b031f06e05846488b01dab46a7c832da13</id>
<content type='text'>
Percpu memory can represent a noticeable chunk of the total memory
consumption, especially on big machines with many CPUs.  Let's track
percpu memory usage for each memcg and display it in memory.stat.

A percpu allocation is usually scattered over multiple pages (and nodes),
and can be significantly smaller than a page.  So let's add a byte-sized
counter on the memcg level: MEMCG_PERCPU_B.  Byte-sized vmstat infra
created for slabs can be perfectly reused for percpu case.

[guro@fb.com: v3]
  Link: http://lkml.kernel.org/r/20200623184515.4132564-4-guro@fb.com

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Dennis Zhou &lt;dennis@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Tobin C. Harding &lt;tobin@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Bixuan Cui &lt;cuibixuan@huawei.com&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Link: http://lkml.kernel.org/r/20200608230819.832349-4-guro@fb.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Percpu memory can represent a noticeable chunk of the total memory
consumption, especially on big machines with many CPUs.  Let's track
percpu memory usage for each memcg and display it in memory.stat.

A percpu allocation is usually scattered over multiple pages (and nodes),
and can be significantly smaller than a page.  So let's add a byte-sized
counter on the memcg level: MEMCG_PERCPU_B.  Byte-sized vmstat infra
created for slabs can be perfectly reused for percpu case.

[guro@fb.com: v3]
  Link: http://lkml.kernel.org/r/20200623184515.4132564-4-guro@fb.com

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Dennis Zhou &lt;dennis@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Tobin C. Harding &lt;tobin@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Bixuan Cui &lt;cuibixuan@huawei.com&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Link: http://lkml.kernel.org/r/20200608230819.832349-4-guro@fb.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcontrol: don't count limit-setting reclaim as memory pressure</title>
<updated>2020-08-07T18:33:26+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2020-08-07T06:22:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e22c6ed90aa91abc08f107344428ebb8c2629e98'/>
<id>e22c6ed90aa91abc08f107344428ebb8c2629e98</id>
<content type='text'>
When an outside process lowers one of the memory limits of a cgroup (or
uses the force_empty knob in cgroup1), direct reclaim is performed in the
context of the write(), in order to directly enforce the new limit and
have it being met by the time the write() returns.

Currently, this reclaim activity is accounted as memory pressure in the
cgroup that the writer(!) belongs to.  This is unexpected.  It
specifically causes problems for senpai
(https://github.com/facebookincubator/senpai), which is an agent that
routinely adjusts the memory limits and performs associated reclaim work
in tens or even hundreds of cgroups running on the host.  The cgroup that
senpai is running in itself will report elevated levels of memory
pressure, even though it itself is under no memory shortage or any sort of
distress.

Move the psi annotation from the central cgroup reclaim function to
callsites in the allocation context, and thereby no longer count any
limit-setting reclaim as memory pressure.  If the newly set limit causes
the workload inside the cgroup into direct reclaim, that of course will
continue to count as memory pressure.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Reviewed-by: Roman Gushchin &lt;guro@fb.com&gt;
Acked-by: Chris Down &lt;chris@chrisdown.name&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Link: http://lkml.kernel.org/r/20200728135210.379885-2-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When an outside process lowers one of the memory limits of a cgroup (or
uses the force_empty knob in cgroup1), direct reclaim is performed in the
context of the write(), in order to directly enforce the new limit and
have it being met by the time the write() returns.

Currently, this reclaim activity is accounted as memory pressure in the
cgroup that the writer(!) belongs to.  This is unexpected.  It
specifically causes problems for senpai
(https://github.com/facebookincubator/senpai), which is an agent that
routinely adjusts the memory limits and performs associated reclaim work
in tens or even hundreds of cgroups running on the host.  The cgroup that
senpai is running in itself will report elevated levels of memory
pressure, even though it itself is under no memory shortage or any sort of
distress.

Move the psi annotation from the central cgroup reclaim function to
callsites in the allocation context, and thereby no longer count any
limit-setting reclaim as memory pressure.  If the newly set limit causes
the workload inside the cgroup into direct reclaim, that of course will
continue to count as memory pressure.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Reviewed-by: Roman Gushchin &lt;guro@fb.com&gt;
Acked-by: Chris Down &lt;chris@chrisdown.name&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Link: http://lkml.kernel.org/r/20200728135210.379885-2-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcontrol: restore proper dirty throttling when memory.high changes</title>
<updated>2020-08-07T18:33:26+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2020-08-07T06:22:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=19ce33acbbd03c371f2d4b533879c4755686f95f'/>
<id>19ce33acbbd03c371f2d4b533879c4755686f95f</id>
<content type='text'>
Commit 8c8c383c04f6 ("mm: memcontrol: try harder to set a new
memory.high") inadvertently removed a callback to recalculate the
writeback cache size in light of a newly configured memory.high limit.

Without letting the writeback cache know about a potentially heavily
reduced limit, it may permit too many dirty pages, which can cause
unnecessary reclaim latencies or even avoidable OOM situations.

This was spotted while reading the code, it hasn't knowingly caused any
problems in practice so far.

Fixes: 8c8c383c04f6 ("mm: memcontrol: try harder to set a new memory.high")
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Chris Down &lt;chris@chrisdown.name&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Link: http://lkml.kernel.org/r/20200728135210.379885-1-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 8c8c383c04f6 ("mm: memcontrol: try harder to set a new
memory.high") inadvertently removed a callback to recalculate the
writeback cache size in light of a newly configured memory.high limit.

Without letting the writeback cache know about a potentially heavily
reduced limit, it may permit too many dirty pages, which can cause
unnecessary reclaim latencies or even avoidable OOM situations.

This was spotted while reading the code, it hasn't knowingly caused any
problems in practice so far.

Fixes: 8c8c383c04f6 ("mm: memcontrol: try harder to set a new memory.high")
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Chris Down &lt;chris@chrisdown.name&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Link: http://lkml.kernel.org/r/20200728135210.379885-1-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
