linux-stable.git/mm, branch v5.13.2

kfence: unconditionally use unbound work queue

2021-07-14T15:07:47+00:00

[ Upstream commit ff06e45d3aace3f93d23956c1e655224f363ebe2 ]

Unconditionally use unbound work queue, and not just if wq_power_efficient
is true.  Because if the system is idle, KFENCE may wait, and by being run
on the unbound work queue, we permit the scheduler to make better
scheduling decisions and not require pinning KFENCE to the same CPU upon
waking up.

Link: https://lkml.kernel.org/r/20210521111630.472579-1-elver@google.com
Fixes: 36f0b35d0894 ("kfence: use power-efficient work queue to run delayed work")
Signed-off-by: Marco Elver 
Reported-by: Hillf Danton 
Reviewed-by: Alexander Potapenko 
Cc: Dmitry Vyukov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm/zswap.c: fix two bugs in zswap_writeback_entry()

2021-07-14T15:07:47+00:00

[ Upstream commit 46b76f2e09dc35f70aca2f4349eb0d158f53fe93 ]

In the ZSWAP_SWAPCACHE_FAIL and ZSWAP_SWAPCACHE_EXIST case, we forgot to
call zpool_unmap_handle() when zpool can't sleep. And we might sleep in
zswap_get_swap_cache_page() while zpool can't sleep. To fix all of these,
zpool_unmap_handle() should be done before zswap_get_swap_cache_page()
when zpool can't sleep.

Link: https://lkml.kernel.org/r/20210522092242.3233191-4-linmiaohe@huawei.com
Fixes: fc6697a89f56 ("mm/zswap: add the flag can_sleep_mapped")
Signed-off-by: Miaohe Lin 
Cc: Colin Ian King 
Cc: Dan Streetman 
Cc: Nathan Chancellor 
Cc: Sebastian Andrzej Siewior 
Cc: Seth Jennings 
Cc: Tian Tao 
Cc: Vitaly Wool 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm: migrate: fix missing update page_private to hugetlb_page_subpool

2021-07-14T15:07:47+00:00

[ Upstream commit 6acfb5ba150cf75005ce85e0e25d79ef2fec287c ]

Since commit d6995da31122 ("hugetlb: use page.private for hugetlb specific
page flags") converts page.private for hugetlb specific page flags.  We
should use hugetlb_page_subpool() to get the subpool pointer instead of
page_private().

This 'could' prevent the migration of hugetlb pages.  page_private(hpage)
is now used for hugetlb page specific flags.  At migration time, the only
flag which could be set is HPageVmemmapOptimized.  This flag will only be
set if the new vmemmap reduction feature is enabled.  In addition,
!page_mapping() implies an anonymous mapping.  So, this will prevent
migration of hugetb pages in anonymous mappings if the vmemmap reduction
feature is enabled.

In addition, that if statement checked for the rare race condition of a
page being migrated while in the process of being freed.  Since that check
is now wrong, we could leak hugetlb subpool usage counts.

The commit forgot to update it in the page migration routine.  So fix it.

[songmuchun@bytedance.com: fix compiler error when !CONFIG_HUGETLB_PAGE reported by Randy]
  Link: https://lkml.kernel.org/r/20210521022747.35736-1-songmuchun@bytedance.com

Link: https://lkml.kernel.org/r/20210520025949.1866-1-songmuchun@bytedance.com
Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
Signed-off-by: Muchun Song 
Reported-by: Anshuman Khandual 
Reviewed-by: Mike Kravetz 
Acked-by: Michal Hocko 
Tested-by: Anshuman Khandual 	[arm64]
Cc: Oscar Salvador 
Cc: David Hildenbrand 
Cc: Matthew Wilcox 
Cc: Xiongchun Duan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page

2021-07-14T15:07:47+00:00

[ Upstream commit 28473d91ff7f686d58047ff55f2fa98ab59114a4 ]

We should use release_z3fold_page_locked() to release z3fold page when
it's locked, although it looks harmless to use release_z3fold_page() now.

Link: https://lkml.kernel.org/r/20210619093151.1492174-7-linmiaohe@huawei.com
Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Miaohe Lin 
Reviewed-by: Vitaly Wool 
Cc: Hillf Danton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm/z3fold: fix potential memory leak in z3fold_destroy_pool()

2021-07-14T15:07:47+00:00

[ Upstream commit dac0d1cfda56472378d330b1b76b9973557a7b1d ]

There is a memory leak in z3fold_destroy_pool() as it forgets to
free_percpu pool->unbuddied.  Call free_percpu for pool->unbuddied to fix
this issue.

Link: https://lkml.kernel.org/r/20210619093151.1492174-6-linmiaohe@huawei.com
Fixes: d30561c56f41 ("z3fold: use per-cpu unbuddied lists")
Signed-off-by: Miaohe Lin 
Reviewed-by: Vitaly Wool 
Cc: Hillf Danton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

hugetlb: remove prep_compound_huge_page cleanup

2021-07-14T15:07:47+00:00

[ Upstream commit 48b8d744ea841b8adf8d07bfe7a2d55f22e4d179 ]

Patch series "Fix prep_compound_gigantic_page ref count adjustment".

These patches address the possible race between
prep_compound_gigantic_page and __page_cache_add_speculative as described
by Jann Horn in [1].

The first patch simply removes the unnecessary/obsolete helper routine
prep_compound_huge_page to make the actual fix a little simpler.

The second patch is the actual fix and has a detailed explanation in the
commit message.

This potential issue has existed for almost 10 years and I am unaware of
anyone actually hitting the race.  I did not cc stable, but would be happy
to squash the patches and send to stable if anyone thinks that is a good
idea.

[1] https://lore.kernel.org/linux-mm/CAG48ez23q0Jy9cuVnwAe7t_fdhMk2S7N5Hdi-GLcCeq5bsfLxw@mail.gmail.com/

This patch (of 2):

I could not think of a reliable way to recreate the issue for testing.
Rather, I 'simulated errors' to exercise all the error paths.

The routine prep_compound_huge_page is a simple wrapper to call either
prep_compound_gigantic_page or prep_compound_page.  However, it is only
called from gather_bootmem_prealloc which only processes gigantic pages.
Eliminate the routine and call prep_compound_gigantic_page directly.

Link: https://lkml.kernel.org/r/20210622021423.154662-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20210622021423.154662-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz 
Cc: Andrea Arcangeli 
Cc: Jan Kara 
Cc: Jann Horn 
Cc: John Hubbard 
Cc: "Kirill A . Shutemov" 
Cc: Matthew Wilcox 
Cc: Michal Hocko 
Cc: Youquan Song 
Cc: Muchun Song 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm/huge_memory.c: don't discard hugepage if other processes are mapping it

2021-07-14T15:07:47+00:00

[ Upstream commit babbbdd08af98a59089334eb3effbed5a7a0cf7f ]

If other processes are mapping any other subpages of the hugepage, i.e.
in pte-mapped thp case, page_mapcount() will return 1 incorrectly.  Then
we would discard the page while other processes are still mapping it.  Fix
it by using total_mapcount() which can tell whether other processes are
still mapping it.

Link: https://lkml.kernel.org/r/20210511134857.1581273-6-linmiaohe@huawei.com
Fixes: b8d3c4c3009d ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called")
Reviewed-by: Yang Shi 
Signed-off-by: Miaohe Lin 
Cc: Alexey Dobriyan 
Cc: "Aneesh Kumar K . V" 
Cc: Anshuman Khandual 
Cc: David Hildenbrand 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Cc: Kirill A. Shutemov 
Cc: Matthew Wilcox 
Cc: Minchan Kim 
Cc: Ralph Campbell 
Cc: Rik van Riel 
Cc: Song Liu 
Cc: William Kucharski 
Cc: Zi Yan 
Cc: Mike Kravetz 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()

2021-07-14T15:07:46+00:00

[ Upstream commit e6be37b2e7bddfe0c76585ee7c7eee5acc8efeab ]

Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for
(non-shmem) FS"), read-only THP file mapping is supported.  But it forgot
to add checking for it in transparent_hugepage_enabled().  To fix it, we
add checking for read-only THP file mapping and also introduce helper
transhuge_vma_enabled() to check whether thp is enabled for specified vma
to reduce duplicated code.  We rename transparent_hugepage_enabled to
transparent_hugepage_active to make the code easier to follow as suggested
by David Hildenbrand.

[linmiaohe@huawei.com: define transhuge_vma_enabled next to transhuge_vma_suitable]
  Link: https://lkml.kernel.org/r/20210514093007.4117906-1-linmiaohe@huawei.com

Link: https://lkml.kernel.org/r/20210511134857.1581273-4-linmiaohe@huawei.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Miaohe Lin 
Reviewed-by: Yang Shi 
Cc: Alexey Dobriyan 
Cc: "Aneesh Kumar K . V" 
Cc: Anshuman Khandual 
Cc: David Hildenbrand 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Cc: Kirill A. Shutemov 
Cc: Matthew Wilcox 
Cc: Minchan Kim 
Cc: Ralph Campbell 
Cc: Rik van Riel 
Cc: Song Liu 
Cc: William Kucharski 
Cc: Zi Yan 
Cc: Mike Kravetz 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm/page_alloc: fix counting of managed_pages

2021-07-14T15:06:54+00:00

[ Upstream commit f7ec104458e00d27a190348ac3a513f3df3699a4 ]

commit f63661566fad ("mm/page_alloc.c: clear out zone->lowmem_reserve[] if
the zone is empty") clears out zone->lowmem_reserve[] if zone is empty.
But when zone is not empty and sysctl_lowmem_reserve_ratio[i] is set to
zero, zone_managed_pages(zone) is not counted in the managed_pages either.
This is inconsistent with the description of lowmem_reserve, so fix it.

Link: https://lkml.kernel.org/r/20210527125707.3760259-1-liushixin2@huawei.com
Fixes: f63661566fad ("mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty")
Signed-off-by: Liu Shixin 
Reported-by: yangerkun 
Reviewed-by: Baoquan He 
Acked-by: David Hildenbrand 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm: memcg/slab: properly set up gfp flags for objcg pointer array

2021-07-14T15:06:54+00:00

[ Upstream commit 41eb5df1cbc9b302fc263ad7c9f38cfc38b4df61 ]

Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4.

Since the merging of the new slab memory controller in v5.9, the page
structure stores a pointer to objcg pointer array for slab pages.  When
the slab has no used objects, it can be freed in free_slab() which will
call kfree() to free the objcg pointer array in
memcg_alloc_page_obj_cgroups().  If it happens that the objcg pointer
array is the last used object in its slab, that slab may then be freed
which may caused kfree() to be called again.

With the right workload, the slab cache may be set up in a way that allows
the recursive kfree() calling loop to nest deep enough to cause a kernel
stack overflow and panic the system.  In fact, we have a reproducer that
can cause kernel stack overflow on a s390 system involving kmalloc-rcl-256
and kmalloc-rcl-128 slabs with the following kfree() loop recursively
called 74 times:

  [ 285.520739] [<000000000ec432fc>] kfree+0x4bc/0x560 [ 285.520740]
[<000000000ec43466>] __free_slab+0xc6/0x228 [ 285.520741]
[<000000000ec41fc2>] __slab_free+0x3c2/0x3e0 [ 285.520742]
[<000000000ec432fc>] kfree+0x4bc/0x560 : While investigating this issue, I
also found an issue on the allocation side.  If the objcg pointer array
happen to come from the same slab or a circular dependency linkage is
formed with multiple slabs, those affected slabs can never be freed again.

This patch series addresses these two issues by introducing a new set of
kmalloc-cg- caches split from kmalloc- caches.  The new set will
only contain non-reclaimable and non-dma objects that are accounted in
memory cgroups whereas the old set are now for unaccounted objects only.
By making this split, all the objcg pointer arrays will come from the
kmalloc- caches, but those caches will never hold any objcg pointer
array.  As a result, deeply nested kfree() call and the unfreeable slab
problems are now gone.

This patch (of 4):

Since the merging of the new slab memory controller in v5.9, the page
structure may store a pointer to obj_cgroup pointer array for slab pages.
Currently, only the __GFP_ACCOUNT bit is masked off.  However, the array
is not readily reclaimable and doesn't need to come from the DMA buffer.
So those GFP bits should be masked off as well.

Do the flag bit clearing at memcg_alloc_page_obj_cgroups() to make sure
that it is consistently applied no matter where it is called.

Link: https://lkml.kernel.org/r/20210505200610.13943-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20210505200610.13943-2-longman@redhat.com
Fixes: 286e04b8ed7a ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
Signed-off-by: Waiman Long 
Reviewed-by: Shakeel Butt 
Acked-by: Roman Gushchin 
Reviewed-by: Vlastimil Babka 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin