<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/memfd.c, branch v6.16</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm: move folio_index to mm/swap.h and remove no longer needed helper</title>
<updated>2025-05-13T06:50:50+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-04-30T18:10:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7d0f0f06153116bb737eef76d47406639e161613'/>
<id>7d0f0f06153116bb737eef76d47406639e161613</id>
<content type='text'>
There are no remaining users of folio_index() outside the mm subsystem. 
Move it to mm/swap.h to co-locate it with swap_cache_index(), eliminating
a forward declaration, and a function call overhead.

Also remove the helper that was used to fix circular header dependency
issue.

Link: https://lkml.kernel.org/r/20250430181052.55698-6-ryncsn@gmail.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Chao Yu &lt;chao@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@linux.alibaba.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Cc: Joanne Koong &lt;joannelkoong@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Qu Wenruo &lt;wqu@suse.com&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are no remaining users of folio_index() outside the mm subsystem. 
Move it to mm/swap.h to co-locate it with swap_cache_index(), eliminating
a forward declaration, and a function call overhead.

Also remove the helper that was used to fix circular header dependency
issue.

Link: https://lkml.kernel.org/r/20250430181052.55698-6-ryncsn@gmail.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Chao Yu &lt;chao@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@linux.alibaba.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Cc: Joanne Koong &lt;joannelkoong@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Qu Wenruo &lt;wqu@suse.com&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memfd: fix spelling and grammatical issues</title>
<updated>2025-03-17T05:06:04+00:00</updated>
<author>
<name>Liu Ye</name>
<email>liuye@kylinos.cn</email>
</author>
<published>2025-02-06T06:09:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=33c9b01ed2fcbc101cdfeb497f4581e981e7c1e7'/>
<id>33c9b01ed2fcbc101cdfeb497f4581e981e7c1e7</id>
<content type='text'>
The comment "If a private mapping then writability is irrelevant" contains
a typo.  It should be "If a private mapping then writability is
irrelevant".  The comment "SEAL_EXEC implys SEAL_WRITE, making W^X from
the start." contains a typo.  It should be "SEAL_EXEC implies SEAL_WRITE,
making W^X from the start."

Link: https://lkml.kernel.org/r/20250206060958.98010-1-liuye@kylinos.cn
Signed-off-by: Liu Ye &lt;liuye@kylinos.cn&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The comment "If a private mapping then writability is irrelevant" contains
a typo.  It should be "If a private mapping then writability is
irrelevant".  The comment "SEAL_EXEC implys SEAL_WRITE, making W^X from
the start." contains a typo.  It should be "SEAL_EXEC implies SEAL_WRITE,
making W^X from the start."

Link: https://lkml.kernel.org/r/20250206060958.98010-1-liuye@kylinos.cn
Signed-off-by: Liu Ye &lt;liuye@kylinos.cn&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memfd: use strncpy_from_user() to read memfd name</title>
<updated>2025-01-26T04:22:40+00:00</updated>
<author>
<name>Isaac J. Manjarres</name>
<email>isaacmanjarres@google.com</email>
</author>
<published>2025-01-10T16:59:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8f65ac0b7577a9691ed838c95e514ce287d8e36e'/>
<id>8f65ac0b7577a9691ed838c95e514ce287d8e36e</id>
<content type='text'>
The existing logic uses strnlen_user() to calculate the length of the
memfd name from userspace and then copies the string into a buffer using
copy_from_user().  This is error-prone, as the string length could have
changed between the time when it was calculated and when the string was
copied.  The existing logic handles this by ensuring that the last byte in
the buffer is the terminating zero.

This handling is contrived and can better be handled by using
strncpy_from_user(), which gets the length of the string and copies it in
one shot.  Therefore, simplify the logic for copying the memfd name by
using strncpy_from_user().

No functional change.

Link: https://lkml.kernel.org/r/20250110165904.3437374-3-isaacmanjarres@google.com
Signed-off-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Reviewed-by: Alice Ryhl &lt;aliceryhl@google.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: John Stultz &lt;jstultz@google.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The existing logic uses strnlen_user() to calculate the length of the
memfd name from userspace and then copies the string into a buffer using
copy_from_user().  This is error-prone, as the string length could have
changed between the time when it was calculated and when the string was
copied.  The existing logic handles this by ensuring that the last byte in
the buffer is the terminating zero.

This handling is contrived and can better be handled by using
strncpy_from_user(), which gets the length of the string and copies it in
one shot.  Therefore, simplify the logic for copying the memfd name by
using strncpy_from_user().

No functional change.

Link: https://lkml.kernel.org/r/20250110165904.3437374-3-isaacmanjarres@google.com
Signed-off-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Reviewed-by: Alice Ryhl &lt;aliceryhl@google.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: John Stultz &lt;jstultz@google.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memfd: refactor and cleanup the logic in memfd_create()</title>
<updated>2025-01-26T04:22:40+00:00</updated>
<author>
<name>Isaac J. Manjarres</name>
<email>isaacmanjarres@google.com</email>
</author>
<published>2025-01-10T16:58:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f5dbcd90dacd39ecd17fca86b47f52be6ffdd179'/>
<id>f5dbcd90dacd39ecd17fca86b47f52be6ffdd179</id>
<content type='text'>
Patch series "Cleanup for memfd_create()", v4.

memfd_create() handles all of its logic in a single function.  Some of the
logic in the function is also somewhat contrived (i.e.  copying the memfd
name from userpace).

This series aims to cleanup memfd_create() by splitting out the logic into
helper functions, and simplifying the memfd name copying to make the code
easier to follow.

This has no intended functional changes.

Thank you Alice and Lorenzo for reviewing v3 of this series and for your
feedback!


This patch (of 2):

memfd_create() is a pretty busy function that could be easier to read if
some of the logic was split out into helper functions.

Therefore, split the flags sanitization, name allocation, and file
structure allocation into their own helper functions.

No functional change.

Link: https://lkml.kernel.org/r/20250110165904.3437374-1-isaacmanjarres@google.com
Link: https://lkml.kernel.org/r/20250110165904.3437374-2-isaacmanjarres@google.com
Signed-off-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Reviewed-by: Alice Ryhl &lt;aliceryhl@google.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Cc: John Stultz &lt;jstultz@google.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "Cleanup for memfd_create()", v4.

memfd_create() handles all of its logic in a single function.  Some of the
logic in the function is also somewhat contrived (i.e.  copying the memfd
name from userpace).

This series aims to cleanup memfd_create() by splitting out the logic into
helper functions, and simplifying the memfd name copying to make the code
easier to follow.

This has no intended functional changes.

Thank you Alice and Lorenzo for reviewing v3 of this series and for your
feedback!


This patch (of 2):

memfd_create() is a pretty busy function that could be easier to read if
some of the logic was split out into helper functions.

Therefore, split the flags sanitization, name allocation, and file
structure allocation into their own helper functions.

No functional change.

Link: https://lkml.kernel.org/r/20250110165904.3437374-1-isaacmanjarres@google.com
Link: https://lkml.kernel.org/r/20250110165904.3437374-2-isaacmanjarres@google.com
Signed-off-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Reviewed-by: Alice Ryhl &lt;aliceryhl@google.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Cc: John Stultz &lt;jstultz@google.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: perform all memfd seal checks in a single place</title>
<updated>2025-01-14T06:40:51+00:00</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lorenzo.stoakes@oracle.com</email>
</author>
<published>2024-12-06T21:28:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fa00b8ef1803fe133b4897c25227aa0d298dd093'/>
<id>fa00b8ef1803fe133b4897c25227aa0d298dd093</id>
<content type='text'>
We no longer actually need to perform these checks in the f_op-&gt;mmap()
hook any longer.

We already moved the operation which clears VM_MAYWRITE on a read-only
mapping of a write-sealed memfd in order to work around the restrictions
imposed by commit 5de195060b2e ("mm: resolve faulty mmap_region() error
path behaviour").

There is no reason for us not to simply go ahead and additionally check to
see if any pre-existing seals are in place here rather than defer this to
the f_op-&gt;mmap() hook.

By doing this we remove more logic from shmem_mmap() which doesn't belong
there, as well as doing the same for hugetlbfs_file_mmap().  We also
remove dubious shared logic in mm.h which simply does not belong there
either.

It makes sense to do these checks at the earliest opportunity, we know
these are shmem (or hugetlbfs) mappings whose relevant VMA flags will not
change from the invoking do_mmap() so there is simply no need to wait.

This also means the implementation of further memfd seal flags can be done
within mm/memfd.c and also have the opportunity to modify VMA flags as
necessary early in the mapping logic.

[lorenzo.stoakes@oracle.com: fix typos in !memfd inline stub]
  Link: https://lkml.kernel.org/r/7dee6c5d-480b-4c24-b98e-6fa47dbd8a23@lucifer.local
Link: https://lkml.kernel.org/r/20241206212846.210835-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Tested-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Jeff Xu &lt;jeffxu@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We no longer actually need to perform these checks in the f_op-&gt;mmap()
hook any longer.

We already moved the operation which clears VM_MAYWRITE on a read-only
mapping of a write-sealed memfd in order to work around the restrictions
imposed by commit 5de195060b2e ("mm: resolve faulty mmap_region() error
path behaviour").

There is no reason for us not to simply go ahead and additionally check to
see if any pre-existing seals are in place here rather than defer this to
the f_op-&gt;mmap() hook.

By doing this we remove more logic from shmem_mmap() which doesn't belong
there, as well as doing the same for hugetlbfs_file_mmap().  We also
remove dubious shared logic in mm.h which simply does not belong there
either.

It makes sense to do these checks at the earliest opportunity, we know
these are shmem (or hugetlbfs) mappings whose relevant VMA flags will not
change from the invoking do_mmap() so there is simply no need to wait.

This also means the implementation of further memfd seal flags can be done
within mm/memfd.c and also have the opportunity to modify VMA flags as
necessary early in the mapping logic.

[lorenzo.stoakes@oracle.com: fix typos in !memfd inline stub]
  Link: https://lkml.kernel.org/r/7dee6c5d-480b-4c24-b98e-6fa47dbd8a23@lucifer.local
Link: https://lkml.kernel.org/r/20241206212846.210835-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Tested-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Jeff Xu &lt;jeffxu@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: reinstate ability to map write-sealed memfd mappings read-only</title>
<updated>2024-12-31T01:59:06+00:00</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lorenzo.stoakes@oracle.com</email>
</author>
<published>2024-11-28T15:06:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8ec396d05d1b737c87311fb7311f753b02c2a6b1'/>
<id>8ec396d05d1b737c87311fb7311f753b02c2a6b1</id>
<content type='text'>
Patch series "mm: reinstate ability to map write-sealed memfd mappings
read-only".

In commit 158978945f31 ("mm: perform the mapping_map_writable() check
after call_mmap()") (and preceding changes in the same series) it became
possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only.

Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
behaviour") unintentionally undid this logic by moving the
mapping_map_writable() check before the shmem_mmap() hook is invoked,
thereby regressing this change.

This series reworks how we both permit write-sealed mappings being mapped
read-only and disallow mprotect() from undoing the write-seal, fixing this
regression.

We also add a regression test to ensure that we do not accidentally
regress this in future.

Thanks to Julian Orth for reporting this regression.


This patch (of 2):

In commit 158978945f31 ("mm: perform the mapping_map_writable() check
after call_mmap()") (and preceding changes in the same series) it became
possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only.

This was previously unnecessarily disallowed, despite the man page
documentation indicating that it would be, thereby limiting the usefulness
of F_SEAL_WRITE logic.

We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE
seal (one which disallows future writes to the memfd) to also be used for
F_SEAL_WRITE.

For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a
read-only mapping to disallow mprotect() from overriding the seal - an
operation performed by seal_check_write(), invoked from shmem_mmap(), the
f_op-&gt;mmap() hook used by shmem mappings.

By extending this to F_SEAL_WRITE and critically - checking
mapping_map_writable() to determine if we may map the memfd AFTER we
invoke shmem_mmap() - the desired logic becomes possible.  This is because
mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will
have cleared.

Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
behaviour") unintentionally undid this logic by moving the
mapping_map_writable() check before the shmem_mmap() hook is invoked,
thereby regressing this change.

We reinstate this functionality by moving the check out of shmem_mmap()
and instead performing it in do_mmap() at the point at which VMA flags are
being determined, which seems in any case to be a more appropriate place
in which to make this determination.

In order to achieve this we rework memfd seal logic to allow us access to
this information using existing logic and eliminate the clearing of
VM_MAYWRITE from seal_check_write() which we are performing in do_mmap()
instead.

Link: https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com
Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour")
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Reported-by: Julian Orth &lt;ju.orth@gmail.com&gt;
Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=U7s4L_Jem5x3AaPZxrYpQ@mail.gmail.com/
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "mm: reinstate ability to map write-sealed memfd mappings
read-only".

In commit 158978945f31 ("mm: perform the mapping_map_writable() check
after call_mmap()") (and preceding changes in the same series) it became
possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only.

Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
behaviour") unintentionally undid this logic by moving the
mapping_map_writable() check before the shmem_mmap() hook is invoked,
thereby regressing this change.

This series reworks how we both permit write-sealed mappings being mapped
read-only and disallow mprotect() from undoing the write-seal, fixing this
regression.

We also add a regression test to ensure that we do not accidentally
regress this in future.

Thanks to Julian Orth for reporting this regression.


This patch (of 2):

In commit 158978945f31 ("mm: perform the mapping_map_writable() check
after call_mmap()") (and preceding changes in the same series) it became
possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only.

This was previously unnecessarily disallowed, despite the man page
documentation indicating that it would be, thereby limiting the usefulness
of F_SEAL_WRITE logic.

We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE
seal (one which disallows future writes to the memfd) to also be used for
F_SEAL_WRITE.

For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a
read-only mapping to disallow mprotect() from overriding the seal - an
operation performed by seal_check_write(), invoked from shmem_mmap(), the
f_op-&gt;mmap() hook used by shmem mappings.

By extending this to F_SEAL_WRITE and critically - checking
mapping_map_writable() to determine if we may map the memfd AFTER we
invoke shmem_mmap() - the desired logic becomes possible.  This is because
mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will
have cleared.

Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
behaviour") unintentionally undid this logic by moving the
mapping_map_writable() check before the shmem_mmap() hook is invoked,
thereby regressing this change.

We reinstate this functionality by moving the check out of shmem_mmap()
and instead performing it in do_mmap() at the point at which VMA flags are
being determined, which seems in any case to be a more appropriate place
in which to make this determination.

In order to achieve this we rework memfd seal logic to allow us access to
this information using existing logic and eliminate the clearing of
VM_MAYWRITE from seal_check_write() which we are performing in do_mmap()
instead.

Link: https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com
Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour")
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Reported-by: Julian Orth &lt;ju.orth@gmail.com&gt;
Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=U7s4L_Jem5x3AaPZxrYpQ@mail.gmail.com/
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/hugetlb: simplify refs in memfd_alloc_folio</title>
<updated>2024-09-26T21:01:44+00:00</updated>
<author>
<name>Steve Sistare</name>
<email>steven.sistare@oracle.com</email>
</author>
<published>2024-09-04T19:41:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dc677b5f3765cfd0944c8873d1ea57f1a3439676'/>
<id>dc677b5f3765cfd0944c8873d1ea57f1a3439676</id>
<content type='text'>
The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
delete the matching folio_put in memfd_pin_folios.  This also avoids
leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
fails.  That error path is also broken in a second way -- when its
folio_put causes the ref to become 0, it will implicitly call
free_huge_folio, but then the path *explicitly* calls free_huge_folio. 
Delete the latter.

This is a continuation of the fix
  "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"

[steven.sistare@oracle.com: remove explicit call to free_huge_folio(), per Matthew]
  Link: https://lkml.kernel.org/r/Zti-7nPVMcGgpcbi@casper.infradead.org
  Link: https://lkml.kernel.org/r/1725481920-82506-1-git-send-email-steven.sistare@oracle.com
Link: https://lkml.kernel.org/r/1725478868-61732-1-git-send-email-steven.sistare@oracle.com
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Suggested-by: Vivek Kasireddy &lt;vivek.kasireddy@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
delete the matching folio_put in memfd_pin_folios.  This also avoids
leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
fails.  That error path is also broken in a second way -- when its
folio_put causes the ref to become 0, it will implicitly call
free_huge_folio, but then the path *explicitly* calls free_huge_folio. 
Delete the latter.

This is a continuation of the fix
  "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"

[steven.sistare@oracle.com: remove explicit call to free_huge_folio(), per Matthew]
  Link: https://lkml.kernel.org/r/Zti-7nPVMcGgpcbi@casper.infradead.org
  Link: https://lkml.kernel.org/r/1725481920-82506-1-git-send-email-steven.sistare@oracle.com
Link: https://lkml.kernel.org/r/1725478868-61732-1-git-send-email-steven.sistare@oracle.com
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Suggested-by: Vivek Kasireddy &lt;vivek.kasireddy@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/gup: fix memfd_pin_folios hugetlb page allocation</title>
<updated>2024-09-26T21:01:43+00:00</updated>
<author>
<name>Steve Sistare</name>
<email>steven.sistare@oracle.com</email>
</author>
<published>2024-09-03T14:25:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9289f020da47ef04b28865589eeee3d56d4bafea'/>
<id>9289f020da47ef04b28865589eeee3d56d4bafea</id>
<content type='text'>
When memfd_pin_folios -&gt; memfd_alloc_folio creates a hugetlb page, the
index is wrong.  The subsequent call to filemap_get_folios_contig thus
cannot find it, and fails, and memfd_pin_folios loops forever.  To fix,
adjust the index for the huge_page_order.

memfd_alloc_folio also forgets to unlock the folio, so the next touch of
the page calls hugetlb_fault which blocks forever trying to take the lock.
Unlock it.

Link: https://lkml.kernel.org/r/1725373521-451395-5-git-send-email-steven.sistare@oracle.com
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Acked-by: Vivek Kasireddy &lt;vivek.kasireddy@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When memfd_pin_folios -&gt; memfd_alloc_folio creates a hugetlb page, the
index is wrong.  The subsequent call to filemap_get_folios_contig thus
cannot find it, and fails, and memfd_pin_folios loops forever.  To fix,
adjust the index for the huge_page_order.

memfd_alloc_folio also forgets to unlock the folio, so the next touch of
the page calls hugetlb_fault which blocks forever trying to take the lock.
Unlock it.

Link: https://lkml.kernel.org/r/1725373521-451395-5-git-send-email-steven.sistare@oracle.com
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Acked-by: Vivek Kasireddy &lt;vivek.kasireddy@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak</title>
<updated>2024-09-26T21:01:43+00:00</updated>
<author>
<name>Steve Sistare</name>
<email>steven.sistare@oracle.com</email>
</author>
<published>2024-09-03T14:25:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=26a8ea80929c518bdec5e53a5776f95919b7c88e'/>
<id>26a8ea80929c518bdec5e53a5776f95919b7c88e</id>
<content type='text'>
memfd_pin_folios followed by unpin_folios leaves resv_huge_pages elevated
if the pages were not already faulted in.  During a normal page fault,
resv_huge_pages is consumed here:

hugetlb_fault()
  alloc_hugetlb_folio()
    dequeue_hugetlb_folio_vma()
      dequeue_hugetlb_folio_nodemask()
        dequeue_hugetlb_folio_node_exact()
          free_huge_pages--
      resv_huge_pages--

During memfd_pin_folios, the page is created by calling
alloc_hugetlb_folio_nodemask instead of alloc_hugetlb_folio, and
resv_huge_pages is not modified:

memfd_alloc_folio()
  alloc_hugetlb_folio_nodemask()
    dequeue_hugetlb_folio_nodemask()
      dequeue_hugetlb_folio_node_exact()
        free_huge_pages--

alloc_hugetlb_folio_nodemask has other callers that must not modify
resv_huge_pages.  Therefore, to fix, define an alternate version of
alloc_hugetlb_folio_nodemask for this call site that adjusts
resv_huge_pages.

Link: https://lkml.kernel.org/r/1725373521-451395-4-git-send-email-steven.sistare@oracle.com
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Acked-by: Vivek Kasireddy &lt;vivek.kasireddy@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
memfd_pin_folios followed by unpin_folios leaves resv_huge_pages elevated
if the pages were not already faulted in.  During a normal page fault,
resv_huge_pages is consumed here:

hugetlb_fault()
  alloc_hugetlb_folio()
    dequeue_hugetlb_folio_vma()
      dequeue_hugetlb_folio_nodemask()
        dequeue_hugetlb_folio_node_exact()
          free_huge_pages--
      resv_huge_pages--

During memfd_pin_folios, the page is created by calling
alloc_hugetlb_folio_nodemask instead of alloc_hugetlb_folio, and
resv_huge_pages is not modified:

memfd_alloc_folio()
  alloc_hugetlb_folio_nodemask()
    dequeue_hugetlb_folio_nodemask()
      dequeue_hugetlb_folio_node_exact()
        free_huge_pages--

alloc_hugetlb_folio_nodemask has other callers that must not modify
resv_huge_pages.  Therefore, to fix, define an alternate version of
alloc_hugetlb_folio_nodemask for this call site that adjusts
resv_huge_pages.

Link: https://lkml.kernel.org/r/1725373521-451395-4-git-send-email-steven.sistare@oracle.com
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Acked-by: Vivek Kasireddy &lt;vivek.kasireddy@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/gup: introduce memfd_pin_folios() for pinning memfd folios</title>
<updated>2024-07-12T22:52:09+00:00</updated>
<author>
<name>Vivek Kasireddy</name>
<email>vivek.kasireddy@intel.com</email>
</author>
<published>2024-06-24T06:36:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=89c1905d9c140372b7f50ef48f42378cf85d9bc5'/>
<id>89c1905d9c140372b7f50ef48f42378cf85d9bc5</id>
<content type='text'>
For drivers that would like to longterm-pin the folios associated with a
memfd, the memfd_pin_folios() API provides an option to not only pin the
folios via FOLL_PIN but also to check and migrate them if they reside in
movable zone or CMA block.  This API currently works with memfds but it
should work with any files that belong to either shmemfs or hugetlbfs. 
Files belonging to other filesystems are rejected for now.

The folios need to be located first before pinning them via FOLL_PIN.  If
they are found in the page cache, they can be immediately pinned. 
Otherwise, they need to be allocated using the filesystem specific APIs
and then pinned.

[akpm@linux-foundation.org: improve the CONFIG_MMU=n situation, per SeongJae]
[vivek.kasireddy@intel.com: return -EINVAL if the end offset is greater than the size of memfd]
  Link: https://lkml.kernel.org/r/IA0PR11MB71850525CBC7D541CAB45DF1F8DB2@IA0PR11MB7185.namprd11.prod.outlook.com
Link: https://lkml.kernel.org/r/20240624063952.1572359-4-vivek.kasireddy@intel.com
Signed-off-by: Vivek Kasireddy &lt;vivek.kasireddy@intel.com&gt;
Suggested-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt; (v2)
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt; (v3)
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt; (v6)
Acked-by: Dave Airlie &lt;airlied@redhat.com&gt;
Acked-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Dongwon Kim &lt;dongwon.kim@intel.com&gt;
Cc: Junxiao Chang &lt;junxiao.chang@intel.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For drivers that would like to longterm-pin the folios associated with a
memfd, the memfd_pin_folios() API provides an option to not only pin the
folios via FOLL_PIN but also to check and migrate them if they reside in
movable zone or CMA block.  This API currently works with memfds but it
should work with any files that belong to either shmemfs or hugetlbfs. 
Files belonging to other filesystems are rejected for now.

The folios need to be located first before pinning them via FOLL_PIN.  If
they are found in the page cache, they can be immediately pinned. 
Otherwise, they need to be allocated using the filesystem specific APIs
and then pinned.

[akpm@linux-foundation.org: improve the CONFIG_MMU=n situation, per SeongJae]
[vivek.kasireddy@intel.com: return -EINVAL if the end offset is greater than the size of memfd]
  Link: https://lkml.kernel.org/r/IA0PR11MB71850525CBC7D541CAB45DF1F8DB2@IA0PR11MB7185.namprd11.prod.outlook.com
Link: https://lkml.kernel.org/r/20240624063952.1572359-4-vivek.kasireddy@intel.com
Signed-off-by: Vivek Kasireddy &lt;vivek.kasireddy@intel.com&gt;
Suggested-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt; (v2)
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt; (v3)
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt; (v6)
Acked-by: Dave Airlie &lt;airlied@redhat.com&gt;
Acked-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Dongwon Kim &lt;dongwon.kim@intel.com&gt;
Cc: Junxiao Chang &lt;junxiao.chang@intel.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
