<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/file.c, branch v6.8</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>file: remove __receive_fd()</title>
<updated>2023-12-12T13:24:14+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-11-30T12:49:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e94ddfe2aab72139acb8d5372fac9e6c3f3e383'/>
<id>4e94ddfe2aab72139acb8d5372fac9e6c3f3e383</id>
<content type='text'>
Honestly, there's little value in having a helper with and without that
int __user *ufd argument. It's just messy and doesn't really give us
anything. Just expose receive_fd() with that argument and get rid of
that helper.

Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-5-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Honestly, there's little value in having a helper with and without that
int __user *ufd argument. It's just messy and doesn't really give us
anything. Just expose receive_fd() with that argument and get rid of
that helper.

Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-5-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>file: remove pointless wrapper</title>
<updated>2023-12-12T13:24:13+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-11-30T12:49:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=24fa3ae9467f49dd9698fd884f2c6b13cc8ea12d'/>
<id>24fa3ae9467f49dd9698fd884f2c6b13cc8ea12d</id>
<content type='text'>
Only io_uring uses __close_fd_get_file(). All it does is hide
current-&gt;files but io_uring accesses files_struct directly right now
anyway so it's a bit pointless. Just rename pick_file() to
file_close_fd_locked() and let io_uring use it. Add a lockdep assert in
there that we expect the caller to hold file_lock while we're at it.

Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-2-e73ca6f4ea83@kernel.org
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Only io_uring uses __close_fd_get_file(). All it does is hide
current-&gt;files but io_uring accesses files_struct directly right now
anyway so it's a bit pointless. Just rename pick_file() to
file_close_fd_locked() and let io_uring use it. Add a lockdep assert in
there that we expect the caller to hold file_lock while we're at it.

Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-2-e73ca6f4ea83@kernel.org
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>file: s/close_fd_get_file()/file_close_fd()/g</title>
<updated>2023-12-12T13:24:13+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-11-30T12:49:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a88c955fcfb49727d0ed86b47410f6555a8e69e4'/>
<id>a88c955fcfb49727d0ed86b47410f6555a8e69e4</id>
<content type='text'>
That really shouldn't have "get" in there as that implies we're bumping
the reference count which we don't do at all. We used to but not anmore.
Now we're just closing the fd and pick that file from the fdtable
without bumping the reference count. Update the wrong documentation
while at it.

Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-1-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
That really shouldn't have "get" in there as that implies we're bumping
the reference count which we don't do at all. We used to but not anmore.
Now we're just closing the fd and pick that file from the fdtable
without bumping the reference count. Update the wrong documentation
while at it.

Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-1-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Improve __fget_files_rcu() code generation (and thus __fget_light())</title>
<updated>2023-12-12T13:24:13+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-11-26T20:24:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=253ca8678d30bcf94410b54476fc1e0f1627a137'/>
<id>253ca8678d30bcf94410b54476fc1e0f1627a137</id>
<content type='text'>
Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") caused a
performance regression as reported by the kernel test robot.

The __fget_light() function is one of those critical ones for some
loads, and the code generation was unnecessarily impacted.  Let's just
write that function to better.

Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.sang@intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/CAHk-=wiCJtLbFWNURB34b9a_R_unaH3CiMRXfkR0-iihB_z68A@mail.gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") caused a
performance regression as reported by the kernel test robot.

The __fget_light() function is one of those critical ones for some
loads, and the code generation was unnecessarily impacted.  Let's just
write that function to better.

Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.sang@intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/CAHk-=wiCJtLbFWNURB34b9a_R_unaH3CiMRXfkR0-iihB_z68A@mail.gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>file, i915: fix file reference for mmap_singleton()</title>
<updated>2023-10-25T20:17:04+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-10-25T10:14:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=61d4fb0b349ec1b33119913c3b0bd109de30142c'/>
<id>61d4fb0b349ec1b33119913c3b0bd109de30142c</id>
<content type='text'>
Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915-&gt;gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915-&gt;gem.mmap_singleton pointer itself never had rcu semantics.

The i915-&gt;gem.mmap_singleton is replaced in
file-&gt;f_op-&gt;release::singleton_release():

        static int singleton_release(struct inode *inode, struct file *file)
        {
                struct drm_i915_private *i915 = file-&gt;private_data;

                cmpxchg(&amp;i915-&gt;gem.mmap_singleton, file, NULL);
                drm_dev_put(&amp;i915-&gt;drm);

                return 0;
        }

The cmpxchg() is ordered against a concurrent update of
i915-&gt;gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915-&gt;gem.mmap_singleton:

While mmap_singleton() does

        rcu_read_lock();
        file = get_file_rcu(&amp;i915-&gt;gem.mmap_singleton);
        rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

        smp_store_mb(i915-&gt;gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file-&gt;f_count to zero leaving the pointer
in i915-&gt;gem.mmap_singleton in tact.

Now, there might be delays until
file-&gt;f_op-&gt;release::singleton_release() is called and
i915-&gt;gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

        rcu_read_lock();
        file = get_file_rcu(&amp;i915-&gt;gem.mmap_singleton);
        rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915-&gt;gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file-&gt;f_op-&gt;release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

&lt;3&gt; [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
&lt;3&gt; [511.395716] rcu:   Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
&lt;3&gt; [511.395934] rcu:   (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
&lt;6&gt; [511.395944] task:i915_selftest   state:R  running task     stack:10568 pid:6238  tgid:6238  ppid:1001   flags:0x00004002
&lt;6&gt; [511.395962] Call Trace:
&lt;6&gt; [511.395966]  &lt;TASK&gt;
&lt;6&gt; [511.395974]  ? __schedule+0x3a8/0xd70
&lt;6&gt; [511.395995]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
&lt;6&gt; [511.396003]  ? lockdep_hardirqs_on+0xc3/0x140
&lt;6&gt; [511.396013]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
&lt;6&gt; [511.396029]  ? get_file_rcu+0x10/0x30
&lt;6&gt; [511.396039]  ? get_file_rcu+0x10/0x30
&lt;6&gt; [511.396046]  ? i915_gem_object_mmap+0xbc/0x450 [i915]
&lt;6&gt; [511.396509]  ? i915_gem_mmap+0x272/0x480 [i915]
&lt;6&gt; [511.396903]  ? mmap_region+0x253/0xb60
&lt;6&gt; [511.396925]  ? do_mmap+0x334/0x5c0
&lt;6&gt; [511.396939]  ? vm_mmap_pgoff+0x9f/0x1c0
&lt;6&gt; [511.396949]  ? rcu_is_watching+0x11/0x50
&lt;6&gt; [511.396962]  ? igt_mmap_offset+0xfc/0x110 [i915]
&lt;6&gt; [511.397376]  ? __igt_mmap+0xb3/0x570 [i915]
&lt;6&gt; [511.397762]  ? igt_mmap+0x11e/0x150 [i915]
&lt;6&gt; [511.398139]  ? __trace_bprintk+0x76/0x90
&lt;6&gt; [511.398156]  ? __i915_subtests+0xbf/0x240 [i915]
&lt;6&gt; [511.398586]  ? __pfx___i915_live_setup+0x10/0x10 [i915]
&lt;6&gt; [511.399001]  ? __pfx___i915_live_teardown+0x10/0x10 [i915]
&lt;6&gt; [511.399433]  ? __run_selftests+0xbc/0x1a0 [i915]
&lt;6&gt; [511.399875]  ? i915_live_selftests+0x4b/0x90 [i915]
&lt;6&gt; [511.400308]  ? i915_pci_probe+0x106/0x200 [i915]
&lt;6&gt; [511.400692]  ? pci_device_probe+0x95/0x120
&lt;6&gt; [511.400704]  ? really_probe+0x164/0x3c0
&lt;6&gt; [511.400715]  ? __pfx___driver_attach+0x10/0x10
&lt;6&gt; [511.400722]  ? __driver_probe_device+0x73/0x160
&lt;6&gt; [511.400731]  ? driver_probe_device+0x19/0xa0
&lt;6&gt; [511.400741]  ? __driver_attach+0xb6/0x180
&lt;6&gt; [511.400749]  ? __pfx___driver_attach+0x10/0x10
&lt;6&gt; [511.400756]  ? bus_for_each_dev+0x77/0xd0
&lt;6&gt; [511.400770]  ? bus_add_driver+0x114/0x210
&lt;6&gt; [511.400781]  ? driver_register+0x5b/0x110
&lt;6&gt; [511.400791]  ? i915_init+0x23/0xc0 [i915]
&lt;6&gt; [511.401153]  ? __pfx_i915_init+0x10/0x10 [i915]
&lt;6&gt; [511.401503]  ? do_one_initcall+0x57/0x270
&lt;6&gt; [511.401515]  ? rcu_is_watching+0x11/0x50
&lt;6&gt; [511.401521]  ? kmalloc_trace+0xa3/0xb0
&lt;6&gt; [511.401532]  ? do_init_module+0x5f/0x210
&lt;6&gt; [511.401544]  ? load_module+0x1d00/0x1f60
&lt;6&gt; [511.401581]  ? init_module_from_file+0x86/0xd0
&lt;6&gt; [511.401590]  ? init_module_from_file+0x86/0xd0
&lt;6&gt; [511.401613]  ? idempotent_init_module+0x17c/0x230
&lt;6&gt; [511.401639]  ? __x64_sys_finit_module+0x56/0xb0
&lt;6&gt; [511.401650]  ? do_syscall_64+0x3c/0x90
&lt;6&gt; [511.401659]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
&lt;6&gt; [511.401684]  &lt;/TASK&gt;

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn &lt;jannh@google.com&gt;,
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915-&gt;gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915-&gt;gem.mmap_singleton pointer itself never had rcu semantics.

The i915-&gt;gem.mmap_singleton is replaced in
file-&gt;f_op-&gt;release::singleton_release():

        static int singleton_release(struct inode *inode, struct file *file)
        {
                struct drm_i915_private *i915 = file-&gt;private_data;

                cmpxchg(&amp;i915-&gt;gem.mmap_singleton, file, NULL);
                drm_dev_put(&amp;i915-&gt;drm);

                return 0;
        }

The cmpxchg() is ordered against a concurrent update of
i915-&gt;gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915-&gt;gem.mmap_singleton:

While mmap_singleton() does

        rcu_read_lock();
        file = get_file_rcu(&amp;i915-&gt;gem.mmap_singleton);
        rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

        smp_store_mb(i915-&gt;gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file-&gt;f_count to zero leaving the pointer
in i915-&gt;gem.mmap_singleton in tact.

Now, there might be delays until
file-&gt;f_op-&gt;release::singleton_release() is called and
i915-&gt;gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

        rcu_read_lock();
        file = get_file_rcu(&amp;i915-&gt;gem.mmap_singleton);
        rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915-&gt;gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file-&gt;f_op-&gt;release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

&lt;3&gt; [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
&lt;3&gt; [511.395716] rcu:   Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
&lt;3&gt; [511.395934] rcu:   (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
&lt;6&gt; [511.395944] task:i915_selftest   state:R  running task     stack:10568 pid:6238  tgid:6238  ppid:1001   flags:0x00004002
&lt;6&gt; [511.395962] Call Trace:
&lt;6&gt; [511.395966]  &lt;TASK&gt;
&lt;6&gt; [511.395974]  ? __schedule+0x3a8/0xd70
&lt;6&gt; [511.395995]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
&lt;6&gt; [511.396003]  ? lockdep_hardirqs_on+0xc3/0x140
&lt;6&gt; [511.396013]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
&lt;6&gt; [511.396029]  ? get_file_rcu+0x10/0x30
&lt;6&gt; [511.396039]  ? get_file_rcu+0x10/0x30
&lt;6&gt; [511.396046]  ? i915_gem_object_mmap+0xbc/0x450 [i915]
&lt;6&gt; [511.396509]  ? i915_gem_mmap+0x272/0x480 [i915]
&lt;6&gt; [511.396903]  ? mmap_region+0x253/0xb60
&lt;6&gt; [511.396925]  ? do_mmap+0x334/0x5c0
&lt;6&gt; [511.396939]  ? vm_mmap_pgoff+0x9f/0x1c0
&lt;6&gt; [511.396949]  ? rcu_is_watching+0x11/0x50
&lt;6&gt; [511.396962]  ? igt_mmap_offset+0xfc/0x110 [i915]
&lt;6&gt; [511.397376]  ? __igt_mmap+0xb3/0x570 [i915]
&lt;6&gt; [511.397762]  ? igt_mmap+0x11e/0x150 [i915]
&lt;6&gt; [511.398139]  ? __trace_bprintk+0x76/0x90
&lt;6&gt; [511.398156]  ? __i915_subtests+0xbf/0x240 [i915]
&lt;6&gt; [511.398586]  ? __pfx___i915_live_setup+0x10/0x10 [i915]
&lt;6&gt; [511.399001]  ? __pfx___i915_live_teardown+0x10/0x10 [i915]
&lt;6&gt; [511.399433]  ? __run_selftests+0xbc/0x1a0 [i915]
&lt;6&gt; [511.399875]  ? i915_live_selftests+0x4b/0x90 [i915]
&lt;6&gt; [511.400308]  ? i915_pci_probe+0x106/0x200 [i915]
&lt;6&gt; [511.400692]  ? pci_device_probe+0x95/0x120
&lt;6&gt; [511.400704]  ? really_probe+0x164/0x3c0
&lt;6&gt; [511.400715]  ? __pfx___driver_attach+0x10/0x10
&lt;6&gt; [511.400722]  ? __driver_probe_device+0x73/0x160
&lt;6&gt; [511.400731]  ? driver_probe_device+0x19/0xa0
&lt;6&gt; [511.400741]  ? __driver_attach+0xb6/0x180
&lt;6&gt; [511.400749]  ? __pfx___driver_attach+0x10/0x10
&lt;6&gt; [511.400756]  ? bus_for_each_dev+0x77/0xd0
&lt;6&gt; [511.400770]  ? bus_add_driver+0x114/0x210
&lt;6&gt; [511.400781]  ? driver_register+0x5b/0x110
&lt;6&gt; [511.400791]  ? i915_init+0x23/0xc0 [i915]
&lt;6&gt; [511.401153]  ? __pfx_i915_init+0x10/0x10 [i915]
&lt;6&gt; [511.401503]  ? do_one_initcall+0x57/0x270
&lt;6&gt; [511.401515]  ? rcu_is_watching+0x11/0x50
&lt;6&gt; [511.401521]  ? kmalloc_trace+0xa3/0xb0
&lt;6&gt; [511.401532]  ? do_init_module+0x5f/0x210
&lt;6&gt; [511.401544]  ? load_module+0x1d00/0x1f60
&lt;6&gt; [511.401581]  ? init_module_from_file+0x86/0xd0
&lt;6&gt; [511.401590]  ? init_module_from_file+0x86/0xd0
&lt;6&gt; [511.401613]  ? idempotent_init_module+0x17c/0x230
&lt;6&gt; [511.401639]  ? __x64_sys_finit_module+0x56/0xb0
&lt;6&gt; [511.401650]  ? do_syscall_64+0x3c/0x90
&lt;6&gt; [511.401659]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
&lt;6&gt; [511.401684]  &lt;/TASK&gt;

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn &lt;jannh@google.com&gt;,
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>backing file: free directly</title>
<updated>2023-10-19T09:02:49+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-10-05T17:08:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6cf41fcfe099b61b4e3ea7dbe04a906f4c904822'/>
<id>6cf41fcfe099b61b4e3ea7dbe04a906f4c904822</id>
<content type='text'>
Backing files as used by overlayfs are never installed into file
descriptor tables and are explicitly documented as such. They aren't
subject to rcu access conditions like regular files are.

Their lifetime is bound to the lifetime of the overlayfs file, i.e.,
they're stashed in ovl_file-&gt;private_data and go away otherwise. If
they're set as vma-&gt;vm_file - which is their main purpose - then they're
subject to regular refcount rules and vma-&gt;vm_file can't be installed
into an fdtable after having been set. All in all I don't see any need
for rcu delay here. So free it directly.

This all hinges on such hybrid beasts to never actually be installed
into fdtables which - as mentioned before - is not allowed. So add an
explicit WARN_ON_ONCE() so we catch any case where someone is suddenly
trying to install one of those things into a file descriptor table so we
can have a nice long chat with them.

Link: https://lore.kernel.org/r/20231005-sakralbau-wappnen-f5c31755ed70@brauner
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backing files as used by overlayfs are never installed into file
descriptor tables and are explicitly documented as such. They aren't
subject to rcu access conditions like regular files are.

Their lifetime is bound to the lifetime of the overlayfs file, i.e.,
they're stashed in ovl_file-&gt;private_data and go away otherwise. If
they're set as vma-&gt;vm_file - which is their main purpose - then they're
subject to regular refcount rules and vma-&gt;vm_file can't be installed
into an fdtable after having been set. All in all I don't see any need
for rcu delay here. So free it directly.

This all hinges on such hybrid beasts to never actually be installed
into fdtables which - as mentioned before - is not allowed. So add an
explicit WARN_ON_ONCE() so we catch any case where someone is suddenly
trying to install one of those things into a file descriptor table so we
can have a nice long chat with them.

Link: https://lore.kernel.org/r/20231005-sakralbau-wappnen-f5c31755ed70@brauner
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>file: convert to SLAB_TYPESAFE_BY_RCU</title>
<updated>2023-10-19T09:02:48+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-09-29T06:45:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0ede61d8589cc2d93aa78230d74ac58b5b8d0244'/>
<id>0ede61d8589cc2d93aa78230d74ac58b5b8d0244</id>
<content type='text'>
In recent discussions around some performance improvements in the file
handling area we discussed switching the file cache to rely on
SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based
freeing for files completely. This is a pretty sensitive change overall
but it might actually be worth doing.

The main downside is the subtlety. The other one is that we should
really wait for Jann's patch to land that enables KASAN to handle
SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this
exists.

With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times
which requires a few changes. So it isn't sufficient anymore to just
acquire a reference to the file in question under rcu using
atomic_long_inc_not_zero() since the file might have already been
recycled and someone else might have bumped the reference.

In other words, callers might see reference count bumps from newer
users. For this reason it is necessary to verify that the pointer is the
same before and after the reference count increment. This pattern can be
seen in get_file_rcu() and __files_get_rcu().

In addition, it isn't possible to access or check fields in struct file
without first aqcuiring a reference on it. Not doing that was always
very dodgy and it was only usable for non-pointer data in struct file.
With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a
reference under rcu or they must hold the files_lock of the fdtable.
Failing to do either one of this is a bug.

Thanks to Jann for pointing out that we need to ensure memory ordering
between reallocations and pointer check by ensuring that all subsequent
loads have a dependency on the second load in get_file_rcu() and
providing a fixup that was folded into this patch.

Cc: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In recent discussions around some performance improvements in the file
handling area we discussed switching the file cache to rely on
SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based
freeing for files completely. This is a pretty sensitive change overall
but it might actually be worth doing.

The main downside is the subtlety. The other one is that we should
really wait for Jann's patch to land that enables KASAN to handle
SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this
exists.

With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times
which requires a few changes. So it isn't sufficient anymore to just
acquire a reference to the file in question under rcu using
atomic_long_inc_not_zero() since the file might have already been
recycled and someone else might have bumped the reference.

In other words, callers might see reference count bumps from newer
users. For this reason it is necessary to verify that the pointer is the
same before and after the reference count increment. This pattern can be
seen in get_file_rcu() and __files_get_rcu().

In addition, it isn't possible to access or check fields in struct file
without first aqcuiring a reference on it. Not doing that was always
very dodgy and it was only usable for non-pointer data in struct file.
With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a
reference under rcu or they must hold the files_lock of the fdtable.
Failing to do either one of this is a bug.

Thanks to Jann for pointing out that we need to ensure memory ordering
between reallocations and pointer check by ensuring that all subsequent
loads have a dependency on the second load in get_file_rcu() and
providing a fixup that was folded into this patch.

Cc: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2023-08-28T17:17:14+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-08-28T17:17:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=de16588a7737b12e63ec646d72b45befb2b1f8f7'/>
<id>de16588a7737b12e63ec646d72b45befb2b1f8f7</id>
<content type='text'>
Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual filesystems.

  Features:

   - Block mode changes on symlinks and rectify our broken semantics

   - Report file modifications via fsnotify() for splice

   - Allow specifying an explicit timeout for the "rootwait" kernel
     command line option. This allows to timeout and reboot instead of
     always waiting indefinitely for the root device to show up

   - Use synchronous fput for the close system call

  Cleanups:

   - Get rid of open-coded lockdep workarounds for async io submitters
     and replace it all with a single consolidated helper

   - Simplify epoll allocation helper

   - Convert simple_write_begin and simple_write_end to use a folio

   - Convert page_cache_pipe_buf_confirm() to use a folio

   - Simplify __range_close to avoid pointless locking

   - Disable per-cpu buffer head cache for isolated cpus

   - Port ecryptfs to kmap_local_page() api

   - Remove redundant initialization of pointer buf in pipe code

   - Unexport the d_genocide() function which is only used within core
     vfs

   - Replace printk(KERN_ERR) and WARN_ON() with WARN()

  Fixes:

   - Fix various kernel-doc issues

   - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE

   - Fix a mainly theoretical issue in devpts

   - Check the return value of __getblk() in reiserfs

   - Fix a racy assert in i_readcount_dec

   - Fix integer conversion issues in various functions

   - Fix LSM security context handling during automounts that prevented
     NFS superblock sharing"

* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
  cachefiles: use kiocb_{start,end}_write() helpers
  ovl: use kiocb_{start,end}_write() helpers
  aio: use kiocb_{start,end}_write() helpers
  io_uring: use kiocb_{start,end}_write() helpers
  fs: create kiocb_{start,end}_write() helpers
  fs: add kerneldoc to file_{start,end}_write() helpers
  io_uring: rename kiocb_end_write() local helper
  splice: Convert page_cache_pipe_buf_confirm() to use a folio
  libfs: Convert simple_write_begin and simple_write_end to use a folio
  fs/dcache: Replace printk and WARN_ON by WARN
  fs/pipe: remove redundant initialization of pointer buf
  fs: Fix kernel-doc warnings
  devpts: Fix kernel-doc warnings
  doc: idmappings: fix an error and rephrase a paragraph
  init: Add support for rootwait timeout parameter
  vfs: fix up the assert in i_readcount_dec
  fs: Fix one kernel-doc comment
  docs: filesystems: idmappings: clarify from where idmappings are taken
  fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
  vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual filesystems.

  Features:

   - Block mode changes on symlinks and rectify our broken semantics

   - Report file modifications via fsnotify() for splice

   - Allow specifying an explicit timeout for the "rootwait" kernel
     command line option. This allows to timeout and reboot instead of
     always waiting indefinitely for the root device to show up

   - Use synchronous fput for the close system call

  Cleanups:

   - Get rid of open-coded lockdep workarounds for async io submitters
     and replace it all with a single consolidated helper

   - Simplify epoll allocation helper

   - Convert simple_write_begin and simple_write_end to use a folio

   - Convert page_cache_pipe_buf_confirm() to use a folio

   - Simplify __range_close to avoid pointless locking

   - Disable per-cpu buffer head cache for isolated cpus

   - Port ecryptfs to kmap_local_page() api

   - Remove redundant initialization of pointer buf in pipe code

   - Unexport the d_genocide() function which is only used within core
     vfs

   - Replace printk(KERN_ERR) and WARN_ON() with WARN()

  Fixes:

   - Fix various kernel-doc issues

   - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE

   - Fix a mainly theoretical issue in devpts

   - Check the return value of __getblk() in reiserfs

   - Fix a racy assert in i_readcount_dec

   - Fix integer conversion issues in various functions

   - Fix LSM security context handling during automounts that prevented
     NFS superblock sharing"

* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
  cachefiles: use kiocb_{start,end}_write() helpers
  ovl: use kiocb_{start,end}_write() helpers
  aio: use kiocb_{start,end}_write() helpers
  io_uring: use kiocb_{start,end}_write() helpers
  fs: create kiocb_{start,end}_write() helpers
  fs: add kerneldoc to file_{start,end}_write() helpers
  io_uring: rename kiocb_end_write() local helper
  splice: Convert page_cache_pipe_buf_confirm() to use a folio
  libfs: Convert simple_write_begin and simple_write_end to use a folio
  fs/dcache: Replace printk and WARN_ON by WARN
  fs/pipe: remove redundant initialization of pointer buf
  fs: Fix kernel-doc warnings
  devpts: Fix kernel-doc warnings
  doc: idmappings: fix an error and rephrase a paragraph
  init: Add support for rootwait timeout parameter
  vfs: fix up the assert in i_readcount_dec
  fs: Fix one kernel-doc comment
  docs: filesystems: idmappings: clarify from where idmappings are taken
  fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
  vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: Fix kernel-doc warnings</title>
<updated>2023-08-19T10:12:12+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2023-08-18T20:08:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=35931eb3945b8d38c31f8e956aee3cf31c52121b'/>
<id>35931eb3945b8d38c31f8e956aee3cf31c52121b</id>
<content type='text'>
These have a variety of causes and a corresponding variety of solutions.

Signed-off-by: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Message-Id: &lt;20230818200824.2720007-1-willy@infradead.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These have a variety of causes and a corresponding variety of solutions.

Signed-off-by: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Message-Id: &lt;20230818200824.2720007-1-willy@infradead.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: rely on -&gt;iterate_shared to determine f_pos locking</title>
<updated>2023-08-06T13:08:36+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-08-06T12:49:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7d84d1b9af6366aa9df1b523bdb7e002372e38d0'/>
<id>7d84d1b9af6366aa9df1b523bdb7e002372e38d0</id>
<content type='text'>
Now that we removed -&gt;iterate we don't need to check for either
-&gt;iterate or -&gt;iterate_shared in file_needs_f_pos_lock(). Simply check
for -&gt;iterate_shared instead. This will tell us whether we need to
unconditionally take the lock. Not just does it allow us to avoid
checking f_inode's mode it also actually clearly shows that we're
locking because of readdir.

Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we removed -&gt;iterate we don't need to check for either
-&gt;iterate or -&gt;iterate_shared in file_needs_f_pos_lock(). Simply check
for -&gt;iterate_shared instead. This will tell us whether we need to
unconditionally take the lock. Not just does it allow us to avoid
checking f_inode's mode it also actually clearly shows that we're
locking because of readdir.

Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
