<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/vmalloc.c, branch v6.5.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm: add a call to flush_cache_vmap() in vmap_pfn()</title>
<updated>2023-08-21T20:07:21+00:00</updated>
<author>
<name>Alexandre Ghiti</name>
<email>alexghiti@rivosinc.com</email>
</author>
<published>2023-08-09T16:46:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a50420c79731fc5cf27ad43719c1091e842a2606'/>
<id>a50420c79731fc5cf27ad43719c1091e842a2606</id>
<content type='text'>
flush_cache_vmap() must be called after new vmalloc mappings are installed
in the page table in order to allow architectures to make sure the new
mapping is visible.

It could lead to a panic since on some architectures (like powerpc),
the page table walker could see the wrong pte value and trigger a
spurious page fault that can not be resolved (see commit f1cb8f9beba8
("powerpc/64s/radix: avoid ptesync after set_pte and
ptep_set_access_flags")).

But actually the patch is aiming at riscv: the riscv specification
allows the caching of invalid entries in the TLB, and since we recently
removed the vmalloc page fault handling, we now need to emit a tlb
shootdown whenever a new vmalloc mapping is emitted
(https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/).
That's a temporary solution, there are ways to avoid that :)

Link: https://lkml.kernel.org/r/20230809164633.1556126-1-alexghiti@rivosinc.com
Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
Reported-by: Dylan Jhong &lt;dylan@andestech.com&gt;
Closes: https://lore.kernel.org/linux-riscv/ZMytNY2J8iyjbPPy@atctrx.andestech.com/
Signed-off-by: Alexandre Ghiti &lt;alexghiti@rivosinc.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Palmer Dabbelt &lt;palmer@rivosinc.com&gt;
Acked-by: Palmer Dabbelt &lt;palmer@rivosinc.com&gt;
Reviewed-by: Dylan Jhong &lt;dylan@andestech.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
flush_cache_vmap() must be called after new vmalloc mappings are installed
in the page table in order to allow architectures to make sure the new
mapping is visible.

It could lead to a panic since on some architectures (like powerpc),
the page table walker could see the wrong pte value and trigger a
spurious page fault that can not be resolved (see commit f1cb8f9beba8
("powerpc/64s/radix: avoid ptesync after set_pte and
ptep_set_access_flags")).

But actually the patch is aiming at riscv: the riscv specification
allows the caching of invalid entries in the TLB, and since we recently
removed the vmalloc page fault handling, we now need to emit a tlb
shootdown whenever a new vmalloc mapping is emitted
(https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/).
That's a temporary solution, there are ways to avoid that :)

Link: https://lkml.kernel.org/r/20230809164633.1556126-1-alexghiti@rivosinc.com
Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
Reported-by: Dylan Jhong &lt;dylan@andestech.com&gt;
Closes: https://lore.kernel.org/linux-riscv/ZMytNY2J8iyjbPPy@atctrx.andestech.com/
Signed-off-by: Alexandre Ghiti &lt;alexghiti@rivosinc.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Palmer Dabbelt &lt;palmer@rivosinc.com&gt;
Acked-by: Palmer Dabbelt &lt;palmer@rivosinc.com&gt;
Reviewed-by: Dylan Jhong &lt;dylan@andestech.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.</title>
<updated>2023-06-23T23:58:19+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2023-06-23T23:58:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=63773d2b593d86440c3b96fd300ed80d00cd06ef'/>
<id>63773d2b593d86440c3b96fd300ed80d00cd06ef</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: ptep_get() conversion</title>
<updated>2023-06-19T23:19:25+00:00</updated>
<author>
<name>Ryan Roberts</name>
<email>ryan.roberts@arm.com</email>
</author>
<published>2023-06-12T15:15:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c33c794828f21217f72ce6fc140e0d34e0d56bff'/>
<id>c33c794828f21217f72ce6fc140e0d34e0d56bff</id>
<content type='text'>
Convert all instances of direct pte_t* dereferencing to instead use
ptep_get() helper.  This means that by default, the accesses change from a
C dereference to a READ_ONCE().  This is technically the correct thing to
do since where pgtables are modified by HW (for access/dirty) they are
volatile and therefore we should always ensure READ_ONCE() semantics.

But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte.  Arch code
is deliberately not converted, as the arch code knows best.  It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.

Conversion was done using Coccinelle:

----

// $ make coccicheck \
//          COCCI=ptepget.cocci \
//          SPFLAGS="--include-headers" \
//          MODE=patch

virtual patch

@ depends on patch @
pte_t *v;
@@

- *v
+ ptep_get(v)

----

Then reviewed and hand-edited to avoid multiple unnecessary calls to
ptep_get(), instead opting to store the result of a single call in a
variable, where it is correct to do so.  This aims to negate any cost of
READ_ONCE() and will benefit arch-overrides that may be more complex.

Included is a fix for an issue in an earlier version of this patch that
was pointed out by kernel test robot.  The issue arose because config
MMU=n elides definition of the ptep helper functions, including
ptep_get().  HUGETLB_PAGE=n configs still define a simple
huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
So when both configs are disabled, this caused a build error because
ptep_get() is not defined.  Fix by continuing to do a direct dereference
when MMU=n.  This is safe because for this config the arch code cannot be
trying to virtualize the ptes because none of the ptep helpers are
defined.

Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
Signed-off-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Dave Airlie &lt;airlied@gmail.com&gt;
Cc: Dimitri Sivanich &lt;dimitri.sivanich@hpe.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: Oleksandr Tyshchenko &lt;oleksandr_tyshchenko@epam.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Convert all instances of direct pte_t* dereferencing to instead use
ptep_get() helper.  This means that by default, the accesses change from a
C dereference to a READ_ONCE().  This is technically the correct thing to
do since where pgtables are modified by HW (for access/dirty) they are
volatile and therefore we should always ensure READ_ONCE() semantics.

But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte.  Arch code
is deliberately not converted, as the arch code knows best.  It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.

Conversion was done using Coccinelle:

----

// $ make coccicheck \
//          COCCI=ptepget.cocci \
//          SPFLAGS="--include-headers" \
//          MODE=patch

virtual patch

@ depends on patch @
pte_t *v;
@@

- *v
+ ptep_get(v)

----

Then reviewed and hand-edited to avoid multiple unnecessary calls to
ptep_get(), instead opting to store the result of a single call in a
variable, where it is correct to do so.  This aims to negate any cost of
READ_ONCE() and will benefit arch-overrides that may be more complex.

Included is a fix for an issue in an earlier version of this patch that
was pointed out by kernel test robot.  The issue arose because config
MMU=n elides definition of the ptep helper functions, including
ptep_get().  HUGETLB_PAGE=n configs still define a simple
huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
So when both configs are disabled, this caused a build error because
ptep_get() is not defined.  Fix by continuing to do a direct dereference
when MMU=n.  This is safe because for this config the arch code cannot be
trying to virtualize the ptes because none of the ptep helpers are
defined.

Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
Signed-off-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Dave Airlie &lt;airlied@gmail.com&gt;
Cc: Dimitri Sivanich &lt;dimitri.sivanich@hpe.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: Oleksandr Tyshchenko &lt;oleksandr_tyshchenko@epam.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmalloc: vmalloc_to_page() use pte_offset_kernel()</title>
<updated>2023-06-19T23:19:14+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2023-06-09T01:21:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0d1c81edc61e553ed7a5db18fb8074c8b78e1538'/>
<id>0d1c81edc61e553ed7a5db18fb8074c8b78e1538</id>
<content type='text'>
vmalloc_to_page() was using pte_offset_map() (followed by pte_unmap()),
but it's intended for userspace page tables: prefer pte_offset_kernel().

Link: https://lkml.kernel.org/r/696386a-84f8-b33c-82e5-f865ed6eb39@google.com
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Steven Price &lt;steven.price@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Zack Rusin &lt;zackr@vmware.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
vmalloc_to_page() was using pte_offset_map() (followed by pte_unmap()),
but it's intended for userspace page tables: prefer pte_offset_kernel().

Link: https://lkml.kernel.org/r/696386a-84f8-b33c-82e5-f865ed6eb39@google.com
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Steven Price &lt;steven.price@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Zack Rusin &lt;zackr@vmware.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmalloc: replace the ternary conditional operator with min()</title>
<updated>2023-06-19T23:19:11+00:00</updated>
<author>
<name>Lu Hongfei</name>
<email>luhongfei@vivo.com</email>
</author>
<published>2023-06-09T09:30:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0e4bc271110e0c58c010071a9bbf150f39851dac'/>
<id>0e4bc271110e0c58c010071a9bbf150f39851dac</id>
<content type='text'>
It would be better to replace the traditional ternary conditional
operator with min() in zero_iter

Link: https://lkml.kernel.org/r/20230609093057.27777-1-luhongfei@vivo.com
Signed-off-by: Lu Hongfei &lt;luhongfei@vivo.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It would be better to replace the traditional ternary conditional
operator with min() in zero_iter

Link: https://lkml.kernel.org/r/20230609093057.27777-1-luhongfei@vivo.com
Signed-off-by: Lu Hongfei &lt;luhongfei@vivo.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmalloc: do not output a spurious warning when huge vmalloc() fails</title>
<updated>2023-06-19T20:19:31+00:00</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lstoakes@gmail.com</email>
</author>
<published>2023-06-05T20:11:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=95a301eefa82057571207edd06ea36218985a75e'/>
<id>95a301eefa82057571207edd06ea36218985a75e</id>
<content type='text'>
In __vmalloc_area_node() we always warn_alloc() when an allocation
performed by vm_area_alloc_pages() fails unless it was due to a pending
fatal signal.

However, huge page allocations instigated either by vmalloc_huge() or
__vmalloc_node_range() (or a caller that invokes this like kvmalloc() or
kvmalloc_node()) always falls back to order-0 allocations if the huge page
allocation fails.

This renders the warning useless and noisy, especially as all callers
appear to be aware that this may fallback.  This has already resulted in
at least one bug report from a user who was confused by this (see link).

Therefore, simply update the code to only output this warning for order-0
pages when no fatal signal is pending.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
Link: https://lkml.kernel.org/r/20230605201107.83298-1-lstoakes@gmail.com
Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
Signed-off-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In __vmalloc_area_node() we always warn_alloc() when an allocation
performed by vm_area_alloc_pages() fails unless it was due to a pending
fatal signal.

However, huge page allocations instigated either by vmalloc_huge() or
__vmalloc_node_range() (or a caller that invokes this like kvmalloc() or
kvmalloc_node()) always falls back to order-0 allocations if the huge page
allocation fails.

This renders the warning useless and noisy, especially as all callers
appear to be aware that this may fallback.  This has already resulted in
at least one bug report from a user who was confused by this (see link).

Therefore, simply update the code to only output this warning for order-0
pages when no fatal signal is pending.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
Link: https://lkml.kernel.org/r/20230605201107.83298-1-lstoakes@gmail.com
Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
Signed-off-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: vmalloc must set pte via arch code</title>
<updated>2023-06-09T23:25:55+00:00</updated>
<author>
<name>Ryan Roberts</name>
<email>ryan.roberts@arm.com</email>
</author>
<published>2023-06-02T09:29:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b3f78e74986546a6da5d28c24e627d95d17f79ec'/>
<id>b3f78e74986546a6da5d28c24e627d95d17f79ec</id>
<content type='text'>
Patch series "Fixes for pte encapsulation bypasses", v3.

A series to improve the encapsulation of pte entries by disallowing
non-arch code from directly dereferencing pte_t pointers.


This patch (of 4):

It is bad practice to directly set pte entries within a pte table. 
Instead all modifications must go through arch-provided helpers such as
set_pte_at() to give the arch code visibility and allow it to check (and
potentially modify) the operation.

Link: https://lkml.kernel.org/r/20230602092949.545577-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230602092949.545577-2-ryan.roberts@arm.com
Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
Signed-off-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Acked-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Reviewed-by: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "Fixes for pte encapsulation bypasses", v3.

A series to improve the encapsulation of pte entries by disallowing
non-arch code from directly dereferencing pte_t pointers.


This patch (of 4):

It is bad practice to directly set pte entries within a pte table. 
Instead all modifications must go through arch-provided helpers such as
set_pte_at() to give the arch code visibility and allow it to check (and
potentially modify) the operation.

Link: https://lkml.kernel.org/r/20230602092949.545577-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230602092949.545577-2-ryan.roberts@arm.com
Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
Signed-off-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Acked-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Reviewed-by: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmalloc: dont purge usable blocks unnecessarily</title>
<updated>2023-06-09T23:25:41+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2023-05-25T12:57:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=77e50af07f14ea7b53f82f9417ddf2fd96c78da3'/>
<id>77e50af07f14ea7b53f82f9417ddf2fd96c78da3</id>
<content type='text'>
Purging fragmented blocks is done unconditionally in several contexts:

  1) From drain_vmap_area_work(), when the number of lazy to be freed
     vmap_areas reached the threshold

  2) Reclaiming vmalloc address space from pcpu_get_vm_areas()

  3) _vm_unmap_aliases()

#1 There is no reason to zap fragmented vmap blocks unconditionally, simply
   because reclaiming all lazy areas drains at least

      32MB * fls(num_online_cpus())

   per invocation which is plenty.

#2 Reclaiming when running out of space or due to memory pressure makes a
   lot of sense

#3 _unmap_aliases() requires to touch everything because the caller has no
   clue which vmap_area used a particular page last and the vmap_area lost
   that information too.

   Except for the vfree + VM_FLUSH_RESET_PERMS case, which removes the
   vmap area first and then cares about the flush. That in turn requires
   a full walk of _all_ vmap areas including the one which was just
   added to the purge list.

   But as this has to be flushed anyway this is an opportunity to combine
   outstanding TLB flushes and do the housekeeping of purging freed areas,
   but like #1 there is no real good reason to zap usable vmap blocks
   unconditionally.

Add a @force_purge argument to the newly split out block purge function and
if not true only purge fragmented blocks which have less than 1/4 of their
capacity left.

Rename purge_vmap_area_lazy() to reclaim_and_purge_vmap_areas() to make it
clear what the function does.

[lstoakes@gmail.com: correct VMAP_PURGE_THRESHOLD check]
  Link: https://lkml.kernel.org/r/3e92ef61-b910-4576-88e7-cf43211fd4e7@lucifer.local
Link: https://lkml.kernel.org/r/20230525124504.864005691@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Purging fragmented blocks is done unconditionally in several contexts:

  1) From drain_vmap_area_work(), when the number of lazy to be freed
     vmap_areas reached the threshold

  2) Reclaiming vmalloc address space from pcpu_get_vm_areas()

  3) _vm_unmap_aliases()

#1 There is no reason to zap fragmented vmap blocks unconditionally, simply
   because reclaiming all lazy areas drains at least

      32MB * fls(num_online_cpus())

   per invocation which is plenty.

#2 Reclaiming when running out of space or due to memory pressure makes a
   lot of sense

#3 _unmap_aliases() requires to touch everything because the caller has no
   clue which vmap_area used a particular page last and the vmap_area lost
   that information too.

   Except for the vfree + VM_FLUSH_RESET_PERMS case, which removes the
   vmap area first and then cares about the flush. That in turn requires
   a full walk of _all_ vmap areas including the one which was just
   added to the purge list.

   But as this has to be flushed anyway this is an opportunity to combine
   outstanding TLB flushes and do the housekeeping of purging freed areas,
   but like #1 there is no real good reason to zap usable vmap blocks
   unconditionally.

Add a @force_purge argument to the newly split out block purge function and
if not true only purge fragmented blocks which have less than 1/4 of their
capacity left.

Rename purge_vmap_area_lazy() to reclaim_and_purge_vmap_areas() to make it
clear what the function does.

[lstoakes@gmail.com: correct VMAP_PURGE_THRESHOLD check]
  Link: https://lkml.kernel.org/r/3e92ef61-b910-4576-88e7-cf43211fd4e7@lucifer.local
Link: https://lkml.kernel.org/r/20230525124504.864005691@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmalloc: add missing READ/WRITE_ONCE() annotations</title>
<updated>2023-06-09T23:25:41+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2023-05-25T12:57:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7f48121e9fa82bdaf0bd0f7a7e49f48803c6c0e8'/>
<id>7f48121e9fa82bdaf0bd0f7a7e49f48803c6c0e8</id>
<content type='text'>
purge_fragmented_blocks() accesses vmap_block::free and vmap_block::dirty
lockless for a quick check.

Add the missing READ/WRITE_ONCE() annotations.

Link: https://lkml.kernel.org/r/20230525124504.807356682@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
purge_fragmented_blocks() accesses vmap_block::free and vmap_block::dirty
lockless for a quick check.

Add the missing READ/WRITE_ONCE() annotations.

Link: https://lkml.kernel.org/r/20230525124504.807356682@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmalloc: check free space in vmap_block lockless</title>
<updated>2023-06-09T23:25:41+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2023-05-25T12:57:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=43d7650234c62201ba3ca5b731226b0b189989a8'/>
<id>43d7650234c62201ba3ca5b731226b0b189989a8</id>
<content type='text'>
vb_alloc() unconditionally locks a vmap_block on the free list to check
the free space.

This can be done locklessly because vmap_block::free never increases, it's
only decreased on allocations.

Check the free space lockless and only if that succeeds, recheck under the
lock.

Link: https://lkml.kernel.org/r/20230525124504.750481992@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
vb_alloc() unconditionally locks a vmap_block on the free list to check
the free space.

This can be done locklessly because vmap_block::free never increases, it's
only decreased on allocations.

Check the free space lockless and only if that succeeds, recheck under the
lock.

Link: https://lkml.kernel.org/r/20230525124504.750481992@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
