<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/open.c, branch v6.15</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'vfs-6.15-rc1.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2025-03-24T20:19:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-03-24T20:19:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=912b82dc0b27abc407c831e74fbcbdebfe19997b'/>
<id>912b82dc0b27abc407c831e74fbcbdebfe19997b</id>
<content type='text'>
Pull vfs file handling updates from Christian Brauner:
 "This contains performance improvements for struct file's new refcount
  mechanism and various other performance work:

   - The stock kernel transitioning the file to no refs held penalizes
     the caller with an extra atomic to block any increments. For cases
     where the file is highly likely to be going away this is easily
     avoidable.

     Add file_ref_put_close() to better handle the common case where
     closing a file descriptor also operates on the last reference and
     build fput_close_sync() and fput_close() on top of it. This brings
     about 1% performance improvement by eliding one atomic in the
     common case.

   - Predict no error in close() since the vast majority of the time
     system call returns 0.

   - Reduce the work done in fdget_pos() by predicting that the file was
     found and by explicitly comparing the reference count to one and
     ignoring the dead zone"

* tag 'vfs-6.15-rc1.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: reduce work in fdget_pos()
  fs: use fput_close() in path_openat()
  fs: use fput_close() in filp_close()
  fs: use fput_close_sync() in close()
  file: add fput and file_ref_put routines optimized for use when closing a fd
  fs: predict no error in close()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull vfs file handling updates from Christian Brauner:
 "This contains performance improvements for struct file's new refcount
  mechanism and various other performance work:

   - The stock kernel transitioning the file to no refs held penalizes
     the caller with an extra atomic to block any increments. For cases
     where the file is highly likely to be going away this is easily
     avoidable.

     Add file_ref_put_close() to better handle the common case where
     closing a file descriptor also operates on the last reference and
     build fput_close_sync() and fput_close() on top of it. This brings
     about 1% performance improvement by eliding one atomic in the
     common case.

   - Predict no error in close() since the vast majority of the time
     system call returns 0.

   - Reduce the work done in fdget_pos() by predicting that the file was
     found and by explicitly comparing the reference count to one and
     ignoring the dead zone"

* tag 'vfs-6.15-rc1.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: reduce work in fdget_pos()
  fs: use fput_close() in path_openat()
  fs: use fput_close() in filp_close()
  fs: use fput_close_sync() in close()
  file: add fput and file_ref_put routines optimized for use when closing a fd
  fs: predict no error in close()
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2025-03-24T16:13:50+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-03-24T16:13:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=99c21beaab2db53d1ba17102b7cedc7a584dfe23'/>
<id>99c21beaab2db53d1ba17102b7cedc7a584dfe23</id>
<content type='text'>
Pull misc vfs updates from Christian Brauner:
 "Features:

   - Add CONFIG_DEBUG_VFS infrastucture:
      - Catch invalid modes in open
      - Use the new debug macros in inode_set_cached_link()
      - Use debug-only asserts around fd allocation and install

   - Place f_ref to 3rd cache line in struct file to resolve false
     sharing

Cleanups:

   - Start using anon_inode_getfile_fmode() helper in various places

   - Don't take f_lock during SEEK_CUR if exclusion is guaranteed by
     f_pos_lock

   - Add unlikely() to kcmp()

   - Remove legacy -&gt;remount_fs method from ecryptfs after port to the
     new mount api

   - Remove invalidate_inodes() in favour of evict_inodes()

   - Simplify ep_busy_loopER by removing unused argument

   - Avoid mmap sem relocks when coredumping with many missing pages

   - Inline getname()

   - Inline new_inode_pseudo() and de-staticize alloc_inode()

   - Dodge an atomic in putname if ref == 1

   - Consistently deref the files table with rcu_dereference_raw()

   - Dedup handling of struct filename init and refcounts bumps

   - Use wq_has_sleeper() in end_dir_add()

   - Drop the lock trip around I_NEW wake up in evict()

   - Load the -&gt;i_sb pointer once in inode_sb_list_{add,del}

   - Predict not reaching the limit in alloc_empty_file()

   - Tidy up do_sys_openat2() with likely/unlikely

   - Call inode_sb_list_add() outside of inode hash lock

   - Sort out fd allocation vs dup2 race commentary

   - Turn page_offset() into a wrapper around folio_pos()

   - Remove locking in exportfs around -&gt;get_parent() call

   - try_lookup_one_len() does not need any locks in autofs

   - Fix return type of several functions from long to int in open

   - Fix return type of several functions from long to int in ioctls

  Fixes:

   - Fix watch queue accounting mismatch"

* tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  fs: sort out fd allocation vs dup2 race commentary, take 2
  fs: call inode_sb_list_add() outside of inode hash lock
  fs: tidy up do_sys_openat2() with likely/unlikely
  fs: predict not reaching the limit in alloc_empty_file()
  fs: load the -&gt;i_sb pointer once in inode_sb_list_{add,del}
  fs: drop the lock trip around I_NEW wake up in evict()
  fs: use wq_has_sleeper() in end_dir_add()
  VFS/autofs: try_lookup_one_len() does not need any locks
  fs: dedup handling of struct filename init and refcounts bumps
  fs: consistently deref the files table with rcu_dereference_raw()
  exportfs: remove locking around -&gt;get_parent() call.
  fs: use debug-only asserts around fd allocation and install
  fs: dodge an atomic in putname if ref == 1
  vfs: Remove invalidate_inodes()
  ecryptfs: remove NULL remount_fs from super_operations
  watch_queue: fix pipe accounting mismatch
  fs: place f_ref to 3rd cache line in struct file to resolve false sharing
  epoll: simplify ep_busy_loop by removing always 0 argument
  fs: Turn page_offset() into a wrapper around folio_pos()
  kcmp: improve performance adding an unlikely hint to task comparisons
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull misc vfs updates from Christian Brauner:
 "Features:

   - Add CONFIG_DEBUG_VFS infrastucture:
      - Catch invalid modes in open
      - Use the new debug macros in inode_set_cached_link()
      - Use debug-only asserts around fd allocation and install

   - Place f_ref to 3rd cache line in struct file to resolve false
     sharing

Cleanups:

   - Start using anon_inode_getfile_fmode() helper in various places

   - Don't take f_lock during SEEK_CUR if exclusion is guaranteed by
     f_pos_lock

   - Add unlikely() to kcmp()

   - Remove legacy -&gt;remount_fs method from ecryptfs after port to the
     new mount api

   - Remove invalidate_inodes() in favour of evict_inodes()

   - Simplify ep_busy_loopER by removing unused argument

   - Avoid mmap sem relocks when coredumping with many missing pages

   - Inline getname()

   - Inline new_inode_pseudo() and de-staticize alloc_inode()

   - Dodge an atomic in putname if ref == 1

   - Consistently deref the files table with rcu_dereference_raw()

   - Dedup handling of struct filename init and refcounts bumps

   - Use wq_has_sleeper() in end_dir_add()

   - Drop the lock trip around I_NEW wake up in evict()

   - Load the -&gt;i_sb pointer once in inode_sb_list_{add,del}

   - Predict not reaching the limit in alloc_empty_file()

   - Tidy up do_sys_openat2() with likely/unlikely

   - Call inode_sb_list_add() outside of inode hash lock

   - Sort out fd allocation vs dup2 race commentary

   - Turn page_offset() into a wrapper around folio_pos()

   - Remove locking in exportfs around -&gt;get_parent() call

   - try_lookup_one_len() does not need any locks in autofs

   - Fix return type of several functions from long to int in open

   - Fix return type of several functions from long to int in ioctls

  Fixes:

   - Fix watch queue accounting mismatch"

* tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  fs: sort out fd allocation vs dup2 race commentary, take 2
  fs: call inode_sb_list_add() outside of inode hash lock
  fs: tidy up do_sys_openat2() with likely/unlikely
  fs: predict not reaching the limit in alloc_empty_file()
  fs: load the -&gt;i_sb pointer once in inode_sb_list_{add,del}
  fs: drop the lock trip around I_NEW wake up in evict()
  fs: use wq_has_sleeper() in end_dir_add()
  VFS/autofs: try_lookup_one_len() does not need any locks
  fs: dedup handling of struct filename init and refcounts bumps
  fs: consistently deref the files table with rcu_dereference_raw()
  exportfs: remove locking around -&gt;get_parent() call.
  fs: use debug-only asserts around fd allocation and install
  fs: dodge an atomic in putname if ref == 1
  vfs: Remove invalidate_inodes()
  ecryptfs: remove NULL remount_fs from super_operations
  watch_queue: fix pipe accounting mismatch
  fs: place f_ref to 3rd cache line in struct file to resolve false sharing
  epoll: simplify ep_busy_loop by removing always 0 argument
  fs: Turn page_offset() into a wrapper around folio_pos()
  kcmp: improve performance adding an unlikely hint to task comparisons
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: tidy up do_sys_openat2() with likely/unlikely</title>
<updated>2025-03-20T09:41:59+00:00</updated>
<author>
<name>Mateusz Guzik</name>
<email>mjguzik@gmail.com</email>
</author>
<published>2025-03-20T09:23:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d5a05a5a44a983ead5c57673af2c5bcfd2f0d3e9'/>
<id>d5a05a5a44a983ead5c57673af2c5bcfd2f0d3e9</id>
<content type='text'>
Otherwise gcc 13 generates conditional forward jumps (aka branch
mispredict by default) for build_open_flags() being succesfull.

Signed-off-by: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Link: https://lore.kernel.org/r/20250320092331.1921700-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Otherwise gcc 13 generates conditional forward jumps (aka branch
mispredict by default) for build_open_flags() being succesfull.

Signed-off-by: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Link: https://lore.kernel.org/r/20250320092331.1921700-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: use fput_close() in filp_close()</title>
<updated>2025-03-05T17:31:23+00:00</updated>
<author>
<name>Mateusz Guzik</name>
<email>mjguzik@gmail.com</email>
</author>
<published>2025-03-05T12:36:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a914bd93f3edfedcdd59deb615e8dd1b3643cac5'/>
<id>a914bd93f3edfedcdd59deb615e8dd1b3643cac5</id>
<content type='text'>
When tracing a kernel build over refcounts seen this is a wash:
@[kprobe:filp_close]:
[0]                32195 |@@@@@@@@@@                                          |
[1]               164567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

I verified vast majority of the skew comes from do_close_on_exec() which
could be changed to use a different variant instead.

Even without changing that, the 19.5% of calls which got here still can
save the extra atomic. Calls here are borderline non-existent compared
to fput (over 3.2 mln!), so they should not negatively affect
scalability.

Signed-off-by: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Link: https://lore.kernel.org/r/20250305123644.554845-4-mjguzik@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When tracing a kernel build over refcounts seen this is a wash:
@[kprobe:filp_close]:
[0]                32195 |@@@@@@@@@@                                          |
[1]               164567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

I verified vast majority of the skew comes from do_close_on_exec() which
could be changed to use a different variant instead.

Even without changing that, the 19.5% of calls which got here still can
save the extra atomic. Calls here are borderline non-existent compared
to fput (over 3.2 mln!), so they should not negatively affect
scalability.

Signed-off-by: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Link: https://lore.kernel.org/r/20250305123644.554845-4-mjguzik@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: use fput_close_sync() in close()</title>
<updated>2025-03-05T17:31:23+00:00</updated>
<author>
<name>Mateusz Guzik</name>
<email>mjguzik@gmail.com</email>
</author>
<published>2025-03-05T12:36:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3e46a92a27c2927fcef996ba06cbe299da629c28'/>
<id>3e46a92a27c2927fcef996ba06cbe299da629c28</id>
<content type='text'>
This bumps open+close rate by 1% on Sapphire Rapids by eliding one
atomic.

It would be higher if it was not for several other slowdowns of the same
nature.

Signed-off-by: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Link: https://lore.kernel.org/r/20250305123644.554845-3-mjguzik@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This bumps open+close rate by 1% on Sapphire Rapids by eliding one
atomic.

It would be higher if it was not for several other slowdowns of the same
nature.

Signed-off-by: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Link: https://lore.kernel.org/r/20250305123644.554845-3-mjguzik@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: predict no error in close()</title>
<updated>2025-03-05T17:25:36+00:00</updated>
<author>
<name>Mateusz Guzik</name>
<email>mjguzik@gmail.com</email>
</author>
<published>2025-03-01T10:43:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=23e490336467fcdaf95e1efcf8f58067b59f647b'/>
<id>23e490336467fcdaf95e1efcf8f58067b59f647b</id>
<content type='text'>
Vast majority of the time the system call returns 0.

Letting the compiler know shortens the routine (119 -&gt; 116) and the fast
path.

Disasm starting at the call to __fput_sync():

before:
&lt;+55&gt;:    call   0xffffffff816b0da0 &lt;__fput_sync&gt;
&lt;+60&gt;:    lea    0x201(%rbx),%eax
&lt;+66&gt;:    cmp    $0x1,%eax
&lt;+69&gt;:    jbe    0xffffffff816ab707 &lt;__x64_sys_close+103&gt;
&lt;+71&gt;:    mov    %ebx,%edx
&lt;+73&gt;:    movslq %ebx,%rax
&lt;+76&gt;:    and    $0xfffffffd,%edx
&lt;+79&gt;:    cmp    $0xfffffdfc,%edx
&lt;+85&gt;:    mov    $0xfffffffffffffffc,%rdx
&lt;+92&gt;:    cmove  %rdx,%rax
&lt;+96&gt;:    pop    %rbx
&lt;+97&gt;:    pop    %rbp
&lt;+98&gt;:    jmp    0xffffffff82242fa0 &lt;__x86_return_thunk&gt;
&lt;+103&gt;:   mov    $0xfffffffffffffffc,%rax
&lt;+110&gt;:   jmp    0xffffffff816ab700 &lt;__x64_sys_close+96&gt;
&lt;+112&gt;:   mov    $0xfffffffffffffff7,%rax
&lt;+119&gt;:   jmp    0xffffffff816ab700 &lt;__x64_sys_close+96&gt;

after:
&lt;+56&gt;:    call   0xffffffff816b0da0 &lt;__fput_sync&gt;
&lt;+61&gt;:    xor    %eax,%eax
&lt;+63&gt;:    test   %ebp,%ebp
&lt;+65&gt;:    jne    0xffffffff816ab6ea &lt;__x64_sys_close+74&gt;
&lt;+67&gt;:    pop    %rbx
&lt;+68&gt;:    pop    %rbp
&lt;+69&gt;:    jmp    0xffffffff82242fa0 &lt;__x86_return_thunk&gt; # the jmp out
&lt;+74&gt;:    lea    0x201(%rbp),%edx
&lt;+80&gt;:    mov    $0xfffffffffffffffc,%rax
&lt;+87&gt;:    cmp    $0x1,%edx
&lt;+90&gt;:    jbe    0xffffffff816ab6e3 &lt;__x64_sys_close+67&gt;
&lt;+92&gt;:    mov    %ebp,%edx
&lt;+94&gt;:    and    $0xfffffffd,%edx
&lt;+97&gt;:    cmp    $0xfffffdfc,%edx
&lt;+103&gt;:   cmovne %rbp,%rax
&lt;+107&gt;:   jmp    0xffffffff816ab6e3 &lt;__x64_sys_close+67&gt;
&lt;+109&gt;:   mov    $0xfffffffffffffff7,%rax
&lt;+116&gt;:   jmp    0xffffffff816ab6e3 &lt;__x64_sys_close+67&gt;

Signed-off-by: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Link: https://lore.kernel.org/r/20250301104356.246031-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Vast majority of the time the system call returns 0.

Letting the compiler know shortens the routine (119 -&gt; 116) and the fast
path.

Disasm starting at the call to __fput_sync():

before:
&lt;+55&gt;:    call   0xffffffff816b0da0 &lt;__fput_sync&gt;
&lt;+60&gt;:    lea    0x201(%rbx),%eax
&lt;+66&gt;:    cmp    $0x1,%eax
&lt;+69&gt;:    jbe    0xffffffff816ab707 &lt;__x64_sys_close+103&gt;
&lt;+71&gt;:    mov    %ebx,%edx
&lt;+73&gt;:    movslq %ebx,%rax
&lt;+76&gt;:    and    $0xfffffffd,%edx
&lt;+79&gt;:    cmp    $0xfffffdfc,%edx
&lt;+85&gt;:    mov    $0xfffffffffffffffc,%rdx
&lt;+92&gt;:    cmove  %rdx,%rax
&lt;+96&gt;:    pop    %rbx
&lt;+97&gt;:    pop    %rbp
&lt;+98&gt;:    jmp    0xffffffff82242fa0 &lt;__x86_return_thunk&gt;
&lt;+103&gt;:   mov    $0xfffffffffffffffc,%rax
&lt;+110&gt;:   jmp    0xffffffff816ab700 &lt;__x64_sys_close+96&gt;
&lt;+112&gt;:   mov    $0xfffffffffffffff7,%rax
&lt;+119&gt;:   jmp    0xffffffff816ab700 &lt;__x64_sys_close+96&gt;

after:
&lt;+56&gt;:    call   0xffffffff816b0da0 &lt;__fput_sync&gt;
&lt;+61&gt;:    xor    %eax,%eax
&lt;+63&gt;:    test   %ebp,%ebp
&lt;+65&gt;:    jne    0xffffffff816ab6ea &lt;__x64_sys_close+74&gt;
&lt;+67&gt;:    pop    %rbx
&lt;+68&gt;:    pop    %rbp
&lt;+69&gt;:    jmp    0xffffffff82242fa0 &lt;__x86_return_thunk&gt; # the jmp out
&lt;+74&gt;:    lea    0x201(%rbp),%edx
&lt;+80&gt;:    mov    $0xfffffffffffffffc,%rax
&lt;+87&gt;:    cmp    $0x1,%edx
&lt;+90&gt;:    jbe    0xffffffff816ab6e3 &lt;__x64_sys_close+67&gt;
&lt;+92&gt;:    mov    %ebp,%edx
&lt;+94&gt;:    and    $0xfffffffd,%edx
&lt;+97&gt;:    cmp    $0xfffffdfc,%edx
&lt;+103&gt;:   cmovne %rbp,%rax
&lt;+107&gt;:   jmp    0xffffffff816ab6e3 &lt;__x64_sys_close+67&gt;
&lt;+109&gt;:   mov    $0xfffffffffffffff7,%rax
&lt;+116&gt;:   jmp    0xffffffff816ab6e3 &lt;__x64_sys_close+67&gt;

Signed-off-by: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Link: https://lore.kernel.org/r/20250301104356.246031-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>open: Fix return type of several functions from long to int</title>
<updated>2025-02-21T09:25:32+00:00</updated>
<author>
<name>Yuichiro Tsuji</name>
<email>yuichtsu@amazon.com</email>
</author>
<published>2025-01-21T07:08:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=29d80d506b18384f64a54b01fae78184e2d327f3'/>
<id>29d80d506b18384f64a54b01fae78184e2d327f3</id>
<content type='text'>
Fix the return type of several functions from long to int to match its actu
al behavior. These functions only return int values. This change improves
type consistency across the filesystem code and aligns the function signatu
re with its existing implementation and usage.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Yuichiro Tsuji &lt;yuichtsu@amazon.com&gt;
Link: https://lore.kernel.org/r/20250121070844.4413-2-yuichtsu@amazon.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix the return type of several functions from long to int to match its actu
al behavior. These functions only return int values. This change improves
type consistency across the filesystem code and aligns the function signatu
re with its existing implementation and usage.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Yuichiro Tsuji &lt;yuichtsu@amazon.com&gt;
Link: https://lore.kernel.org/r/20250121070844.4413-2-yuichtsu@amazon.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fsnotify: disable notification by default for all pseudo files</title>
<updated>2025-02-07T09:27:26+00:00</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2025-02-03T22:32:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2a42754b3104d78a2bc7a2ad8844427411c76ca6'/>
<id>2a42754b3104d78a2bc7a2ad8844427411c76ca6</id>
<content type='text'>
Most pseudo files are not applicable for fsnotify events at all,
let alone to the new pre-content events.

Disable notifications to all files allocated with alloc_file_pseudo()
and enable legacy inotify events for the specific cases of pipe and
socket, which have known users of inotify events.

Pre-content events are also kept disabled for sockets and pipes.

Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
Reported-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/
Tested-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Link: https://lore.kernel.org/r/20250203223205.861346-3-amir73il@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Most pseudo files are not applicable for fsnotify events at all,
let alone to the new pre-content events.

Disable notifications to all files allocated with alloc_file_pseudo()
and enable legacy inotify events for the specific cases of pipe and
socket, which have known users of inotify events.

Pre-content events are also kept disabled for sockets and pipes.

Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
Reported-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/
Tested-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Link: https://lore.kernel.org/r/20250203223205.861346-3-amir73il@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fsnotify: use accessor to set FMODE_NONOTIFY_*</title>
<updated>2025-02-07T09:27:26+00:00</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2025-02-03T22:32:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=95101401bb50ae2cf9deee1bbf4d2b28d0dfdc26'/>
<id>95101401bb50ae2cf9deee1bbf4d2b28d0dfdc26</id>
<content type='text'>
The FMODE_NONOTIFY_* bits are a 2-bits mode.  Open coding manipulation
of those bits is risky.  Use an accessor file_set_fsnotify_mode() to
set the mode.

Rename file_set_fsnotify_mode() =&gt; file_set_fsnotify_mode_from_watchers()
to make way for the simple accessor name.

Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Link: https://lore.kernel.org/r/20250203223205.861346-2-amir73il@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The FMODE_NONOTIFY_* bits are a 2-bits mode.  Open coding manipulation
of those bits is risky.  Use an accessor file_set_fsnotify_mode() to
set the mode.

Rename file_set_fsnotify_mode() =&gt; file_set_fsnotify_mode_from_watchers()
to make way for the simple accessor name.

Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Link: https://lore.kernel.org/r/20250203223205.861346-2-amir73il@gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-01-27T02:36:23+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-01-27T02:36:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9c5968db9e625019a0ee5226c7eebef5519d366a'/>
<id>9c5968db9e625019a0ee5226c7eebef5519d366a</id>
<content type='text'>
Pull MM updates from Andrew Morton:
 "The various patchsets are summarized below. Plus of course many
  indivudual patches which are described in their changelogs.

   - "Allocate and free frozen pages" from Matthew Wilcox reorganizes
     the page allocator so we end up with the ability to allocate and
     free zero-refcount pages. So that callers (ie, slab) can avoid a
     refcount inc &amp; dec

   - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
     use large folios other than PMD-sized ones

   - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
     and fixes for this small built-in kernel selftest

   - "mas_anode_descend() related cleanup" from Wei Yang tidies up part
     of the mapletree code

   - "mm: fix format issues and param types" from Keren Sun implements a
     few minor code cleanups

   - "simplify split calculation" from Wei Yang provides a few fixes and
     a test for the mapletree code

   - "mm/vma: make more mmap logic userland testable" from Lorenzo
     Stoakes continues the work of moving vma-related code into the
     (relatively) new mm/vma.c

   - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
     Hildenbrand cleans up and rationalizes handling of gfp flags in the
     page allocator

   - "readahead: Reintroduce fix for improper RA window sizing" from Jan
     Kara is a second attempt at fixing a readahead window sizing issue.
     It should reduce the amount of unnecessary reading

   - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
     addresses an issue where "huge" amounts of pte pagetables are
     accumulated:

       https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/

     Qi's series addresses this windup by synchronously freeing PTE
     memory within the context of madvise(MADV_DONTNEED)

   - "selftest/mm: Remove warnings found by adding compiler flags" from
     Muhammad Usama Anjum fixes some build warnings in the selftests
     code when optional compiler warnings are enabled

   - "mm: don't use __GFP_HARDWALL when migrating remote pages" from
     David Hildenbrand tightens the allocator's observance of
     __GFP_HARDWALL

   - "pkeys kselftests improvements" from Kevin Brodsky implements
     various fixes and cleanups in the MM selftests code, mainly
     pertaining to the pkeys tests

   - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
     estimate application working set size

   - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
     provides some cleanups to memcg's hugetlb charging logic

   - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
     removes the global swap cgroup lock. A speedup of 10% for a
     tmpfs-based kernel build was demonstrated

   - "zram: split page type read/write handling" from Sergey Senozhatsky
     has several fixes and cleaups for zram in the area of
     zram_write_page(). A watchdog softlockup warning was eliminated

   - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
     Brodsky cleans up the pagetable destructor implementations. A rare
     use-after-free race is fixed

   - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
     simplifies and cleans up the debugging code in the VMA merging
     logic

   - "Account page tables at all levels" from Kevin Brodsky cleans up
     and regularizes the pagetable ctor/dtor handling. This results in
     improvements in accounting accuracy

   - "mm/damon: replace most damon_callback usages in sysfs with new
     core functions" from SeongJae Park cleans up and generalizes
     DAMON's sysfs file interface logic

   - "mm/damon: enable page level properties based monitoring" from
     SeongJae Park increases the amount of information which is
     presented in response to DAMOS actions

   - "mm/damon: remove DAMON debugfs interface" from SeongJae Park
     removes DAMON's long-deprecated debugfs interfaces. Thus the
     migration to sysfs is completed

   - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
     Peter Xu cleans up and generalizes the hugetlb reservation
     accounting

   - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
     removes a never-used feature of the alloc_pages_bulk() interface

   - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
     extends DAMOS filters to support not only exclusion (rejecting),
     but also inclusion (allowing) behavior

   - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
     introduces a new memory descriptor for zswap.zpool that currently
     overlaps with struct page for now. This is part of the effort to
     reduce the size of struct page and to enable dynamic allocation of
     memory descriptors

   - "mm, swap: rework of swap allocator locks" from Kairui Song redoes
     and simplifies the swap allocator locking. A speedup of 400% was
     demonstrated for one workload. As was a 35% reduction for kernel
     build time with swap-on-zram

   - "mm: update mips to use do_mmap(), make mmap_region() internal"
     from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
     mmap_region() can be made MM-internal

   - "mm/mglru: performance optimizations" from Yu Zhao fixes a few
     MGLRU regressions and otherwise improves MGLRU performance

   - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
     Park updates DAMON documentation

   - "Cleanup for memfd_create()" from Isaac Manjarres does that thing

   - "mm: hugetlb+THP folio and migration cleanups" from David
     Hildenbrand provides various cleanups in the areas of hugetlb
     folios, THP folios and migration

   - "Uncached buffered IO" from Jens Axboe implements the new
     RWF_DONTCACHE flag which provides synchronous dropbehind for
     pagecache reading and writing. To permite userspace to address
     issues with massive buildup of useless pagecache when
     reading/writing fast devices

   - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
     Weißschuh fixes and optimizes some of the MM selftests"

* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm/compaction: fix UBSAN shift-out-of-bounds warning
  s390/mm: add missing ctor/dtor on page table upgrade
  kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
  tools: add VM_WARN_ON_VMG definition
  mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
  seqlock: add missing parameter documentation for raw_seqcount_try_begin()
  mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
  mm/page_alloc: remove the incorrect and misleading comment
  zram: remove zcomp_stream_put() from write_incompressible_page()
  mm: separate move/undo parts from migrate_pages_batch()
  mm/kfence: use str_write_read() helper in get_access_type()
  selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
  kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
  selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
  selftests/mm: vm_util: split up /proc/self/smaps parsing
  selftests/mm: virtual_address_range: unmap chunks after validation
  selftests/mm: virtual_address_range: mmap() without PROT_WRITE
  selftests/memfd/memfd_test: fix possible NULL pointer dereference
  mm: add FGP_DONTCACHE folio creation flag
  mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull MM updates from Andrew Morton:
 "The various patchsets are summarized below. Plus of course many
  indivudual patches which are described in their changelogs.

   - "Allocate and free frozen pages" from Matthew Wilcox reorganizes
     the page allocator so we end up with the ability to allocate and
     free zero-refcount pages. So that callers (ie, slab) can avoid a
     refcount inc &amp; dec

   - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
     use large folios other than PMD-sized ones

   - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
     and fixes for this small built-in kernel selftest

   - "mas_anode_descend() related cleanup" from Wei Yang tidies up part
     of the mapletree code

   - "mm: fix format issues and param types" from Keren Sun implements a
     few minor code cleanups

   - "simplify split calculation" from Wei Yang provides a few fixes and
     a test for the mapletree code

   - "mm/vma: make more mmap logic userland testable" from Lorenzo
     Stoakes continues the work of moving vma-related code into the
     (relatively) new mm/vma.c

   - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
     Hildenbrand cleans up and rationalizes handling of gfp flags in the
     page allocator

   - "readahead: Reintroduce fix for improper RA window sizing" from Jan
     Kara is a second attempt at fixing a readahead window sizing issue.
     It should reduce the amount of unnecessary reading

   - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
     addresses an issue where "huge" amounts of pte pagetables are
     accumulated:

       https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/

     Qi's series addresses this windup by synchronously freeing PTE
     memory within the context of madvise(MADV_DONTNEED)

   - "selftest/mm: Remove warnings found by adding compiler flags" from
     Muhammad Usama Anjum fixes some build warnings in the selftests
     code when optional compiler warnings are enabled

   - "mm: don't use __GFP_HARDWALL when migrating remote pages" from
     David Hildenbrand tightens the allocator's observance of
     __GFP_HARDWALL

   - "pkeys kselftests improvements" from Kevin Brodsky implements
     various fixes and cleanups in the MM selftests code, mainly
     pertaining to the pkeys tests

   - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
     estimate application working set size

   - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
     provides some cleanups to memcg's hugetlb charging logic

   - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
     removes the global swap cgroup lock. A speedup of 10% for a
     tmpfs-based kernel build was demonstrated

   - "zram: split page type read/write handling" from Sergey Senozhatsky
     has several fixes and cleaups for zram in the area of
     zram_write_page(). A watchdog softlockup warning was eliminated

   - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
     Brodsky cleans up the pagetable destructor implementations. A rare
     use-after-free race is fixed

   - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
     simplifies and cleans up the debugging code in the VMA merging
     logic

   - "Account page tables at all levels" from Kevin Brodsky cleans up
     and regularizes the pagetable ctor/dtor handling. This results in
     improvements in accounting accuracy

   - "mm/damon: replace most damon_callback usages in sysfs with new
     core functions" from SeongJae Park cleans up and generalizes
     DAMON's sysfs file interface logic

   - "mm/damon: enable page level properties based monitoring" from
     SeongJae Park increases the amount of information which is
     presented in response to DAMOS actions

   - "mm/damon: remove DAMON debugfs interface" from SeongJae Park
     removes DAMON's long-deprecated debugfs interfaces. Thus the
     migration to sysfs is completed

   - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
     Peter Xu cleans up and generalizes the hugetlb reservation
     accounting

   - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
     removes a never-used feature of the alloc_pages_bulk() interface

   - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
     extends DAMOS filters to support not only exclusion (rejecting),
     but also inclusion (allowing) behavior

   - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
     introduces a new memory descriptor for zswap.zpool that currently
     overlaps with struct page for now. This is part of the effort to
     reduce the size of struct page and to enable dynamic allocation of
     memory descriptors

   - "mm, swap: rework of swap allocator locks" from Kairui Song redoes
     and simplifies the swap allocator locking. A speedup of 400% was
     demonstrated for one workload. As was a 35% reduction for kernel
     build time with swap-on-zram

   - "mm: update mips to use do_mmap(), make mmap_region() internal"
     from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
     mmap_region() can be made MM-internal

   - "mm/mglru: performance optimizations" from Yu Zhao fixes a few
     MGLRU regressions and otherwise improves MGLRU performance

   - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
     Park updates DAMON documentation

   - "Cleanup for memfd_create()" from Isaac Manjarres does that thing

   - "mm: hugetlb+THP folio and migration cleanups" from David
     Hildenbrand provides various cleanups in the areas of hugetlb
     folios, THP folios and migration

   - "Uncached buffered IO" from Jens Axboe implements the new
     RWF_DONTCACHE flag which provides synchronous dropbehind for
     pagecache reading and writing. To permite userspace to address
     issues with massive buildup of useless pagecache when
     reading/writing fast devices

   - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
     Weißschuh fixes and optimizes some of the MM selftests"

* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm/compaction: fix UBSAN shift-out-of-bounds warning
  s390/mm: add missing ctor/dtor on page table upgrade
  kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
  tools: add VM_WARN_ON_VMG definition
  mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
  seqlock: add missing parameter documentation for raw_seqcount_try_begin()
  mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
  mm/page_alloc: remove the incorrect and misleading comment
  zram: remove zcomp_stream_put() from write_incompressible_page()
  mm: separate move/undo parts from migrate_pages_batch()
  mm/kfence: use str_write_read() helper in get_access_type()
  selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
  kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
  selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
  selftests/mm: vm_util: split up /proc/self/smaps parsing
  selftests/mm: virtual_address_range: unmap chunks after validation
  selftests/mm: virtual_address_range: mmap() without PROT_WRITE
  selftests/memfd/memfd_test: fix possible NULL pointer dereference
  mm: add FGP_DONTCACHE folio creation flag
  mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
  ...
</pre>
</div>
</content>
</entry>
</feed>
