summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
24 hoursMerge tag 'vfs-7.1-rc7.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix error handling in ovl_cache_get() - Tighten access checks for exited tasks in pidfd_getfd() - Fix selftests leak in __wait_for_test() - Limit FUSE_NOTIFY_RETRIEVE to uptodate folios - Reject fuse_notify() pagecache ops on directories - Clear JOBCTL_PENDING_MASK for caller in zap_other_threads() - Fix failure to unlock in nfsd4_create_file() - Fix pointer arithmetic in qnx6 directory iteration - Fix UAF due to unlocked ->mnt_ns read in may_decode_fh() - Avoid potential null folio->mapping deref during iomap error reporting * tag 'vfs-7.1-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: avoid potential null folio->mapping deref during error reporting fhandle: fix UAF due to unlocked ->mnt_ns read in may_decode_fh() fs/qnx6: fix pointer arithmetic in directory iteration VFS: fix possible failure to unlock in nfsd4_create_file() signal: clear JOBCTL_PENDING_MASK for caller in zap_other_threads() fuse: reject fuse_notify() pagecache ops on directories fuse: limit FUSE_NOTIFY_RETRIEVE to uptodate folios selftests: harness: fix pidfd leak in __wait_for_test pidfd: refuse access to tasks that have started exiting harder ovl: keep err zero after successful ovl_cache_get()
46 hoursMerge tag 'nfs-for-7.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fix from Trond Myklebust: - Fix a use after free in nfs_write_completion * tag 'nfs-for-7.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: write_completion: dereference loop-local req, not hdr->req
47 hoursMerge tag 'xfs-fixes-7.1-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: "A collection of fixes mostly for the RT device, including a small refactor that has no functional change" * tag 'xfs-fixes-7.1-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Remove mention of PageWriteback xfs: abort mount if xfs_fs_reserve_ag_blocks fails xfs: factor rtgroup geom write pointer reporting into a helper xfs: drop the RTG reference later in xfs_ioc_rtgroup_geometry xfs: fix rtgroup cleanup in CoW fork repair xfs: fix error returns in CoW fork repair xfs: fix overlapping extents returned for pNFS LAYOUTGET xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path xfs: handle racing deletions in xfs_zone_gc_iter_irec
47 hoursMerge tag 'erofs-for-7.1-rc7-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Fix a UAF of sbi->sync_decompress when compressed I/Os race with unmount - Fix a regression introduced this development cycle that incorrectly rejects multiple-algorithm images * tag 'erofs-for-7.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix EFSCORRUPTED on multi-algorithm images in z_erofs_map_sanity_check() erofs: fix use-after-free on sbi->sync_decompress
3 daysiomap: avoid potential null folio->mapping deref during error reportingJoanne Koong
When a buffered read fails, iomap_finish_folio_read() reports the error with fserror_report_io(folio->mapping->host, ...). This is called after ifs->read_bytes_pending has been decremented by the bytes attempted to be read. For a folio split across multiple read completions, the folio is only guaranteed to stay locked while read_bytes_pending > 0. Once iomap_finish_folio_read() decrements read_bytes_pending, another in-flight read can complete and end the read on the folio, which unlocks it. This allows truncate logic to run and detach the folio (set folio->mapping to NULL). The error reporting path then can dereference a NULL folio->mapping. As reported by Sam Sun, this is the race that can occur: CPU0: failed completion CPU1: final completion CPU2: truncate ----------------------- ---------------------- -------------- read_bytes_pending -= len finished = false /* preempted before fserror_report_io() */ read_bytes_pending -= len finished = true folio_end_read() truncate clears folio->mapping fserror_report_io( folio->mapping->host, ...) ^ NULL deref Fix this by reporting the error first before decrementing ifs->read_bytes_pending. Fixes: a9d573ee88af ("iomap: report file I/O errors to the VFS") Cc: stable@vger.kernel.org Reported-by: Sam Sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/linux-fsdevel/CAEkJfYPhWdd59RKmuNLJg-bkypHz7xiOwaWyNVu3A8CUqQCnvg@mail.gmail.com/ Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260604011858.2297561-1-joannelkoong@gmail.com Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 daysfhandle: fix UAF due to unlocked ->mnt_ns read in may_decode_fh()Jann Horn
may_decode_fh() accesses mount::mnt_ns without holding any locks; that means the mount can concurrently be unmounted, and the mnt_namespace can concurrently be freed after an RCU grace period. This race can happens as follows, assuming that the mount point was created by open_tree(..., OPEN_TREE_CLONE): thread 1 thread 2 RCU __do_sys_open_by_handle_at do_handle_open handle_to_path may_decode_fh is_mounted [mount::mnt_ns access] [mount::mnt_ns access] __do_sys_close fput_close_sync __fput dissolve_on_fput umount_tree class_namespace_excl_destructor namespace_unlock free_mnt_ns mnt_ns_tree_remove call_rcu(mnt_ns_release_rcu) mnt_ns_release_rcu mnt_ns_release kfree [mnt_namespace::user_ns access] **UAF** Fix it by taking rcu_read_lock() around the mount::mnt_ns access, like in __prepend_path(). Additionally, document the semantics of mount::mnt_ns, and use WRITE_ONCE() for writers that can race with lockless readers. This bug is unreachable unless one of the following is set: - CONFIG_PREEMPTION - CONFIG_RCU_STRICT_GRACE_PERIOD because it requires an RCU grace period to happen during a syscall without an explicit preemption. This doesn't seem to have interesting security impact; worst-case, it could leak the result of an integer comparison to userspace (from the level check in cap_capable()), cause an endless loop, or crash the kernel by dereferencing an invalid address. Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260603-vfs-fhandle-uaf-fix-v2-1-d05db76a5084@google.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
5 daysNFS: write_completion: dereference loop-local req, not hdr->reqDave Jones
5d3869a41f36 ("NFS: fix writeback in presence of errors") introduced a dereference of hdr->req->wb_lock_context in nfs_write_completion's per-request loop. hdr->req is set once at nfs_pgheader_init() time and is not refcount-protected for the lifetime of the loop; when hdr aggregates requests from multiple page groups (common under heavy NFSv3 writeback), a parallel COMMIT on hdr->req's group can drop the last reference and free it while the outer loop is still iterating requests from other groups. KASAN catches this as an 8-byte read at offset +24 of a freed nfs_page slab object (wb_lock_context). All requests in a given pgio share the same open_context, so reading the loop-local req's wb_lock_context yields the same value and is safe -- req is still on hdr->pages and holds its writeback kref through the commit branch. Caught with kasan: BUG: KASAN: slab-use-after-free in nfs_write_completion+0x8f8/0xa50 [nfs] Read of size 8 at addr ffff888118af2058 by task kworker/u16:16/122062 CPU: 2 UID: 0 PID: 122062 Comm: kworker/u16:16 Kdump: loaded Not tainted 7.1.0-rc4+ #ge05a759574b2 PREEMPT Workqueue: nfsiod rpc_async_release Call Trace: <TASK> dump_stack_lvl+0xaf/0x100 ? nfs_write_completion+0x8f8/0xa50 [nfs] print_report+0x157/0x4a1 ? __virt_addr_valid+0x1fb/0x400 ? nfs_write_completion+0x8f8/0xa50 [nfs] kasan_report+0xc2/0x190 ? nfs_write_completion+0x8f8/0xa50 [nfs] nfs_write_completion+0x8f8/0xa50 [nfs] ? nfs_commit_release_pages+0xbd0/0xbd0 [nfs] ? lock_acquire+0x182/0x2e0 ? process_one_work+0x937/0x1890 ? nfs_pgio_header_alloc+0xd0/0xd0 [nfs] rpc_free_task+0xee/0x160 rpc_async_release+0x5d/0xb0 process_one_work+0x9b0/0x1890 ? pwq_dec_nr_in_flight+0xed0/0xed0 ? rpc_final_put_task+0x140/0x140 worker_thread+0x75a/0x10a0 ? process_one_work+0x1890/0x1890 ? kthread+0x1af/0x4d0 ? process_one_work+0x1890/0x1890 kthread+0x3d3/0x4d0 ? kthread_affine_node+0x2c0/0x2c0 ret_from_fork+0x669/0xa50 ? native_tss_update_io_bitmap+0x660/0x660 ? __switch_to+0x9dd/0x1310 ? kthread_affine_node+0x2c0/0x2c0 ret_from_fork_asm+0x11/0x20 </TASK> Allocated by task 121997 on cpu 3 at 31643.290294s: kasan_save_stack+0x1e/0x40 kasan_save_track+0x13/0x60 __kasan_slab_alloc+0x62/0x70 kmem_cache_alloc_noprof+0x1ab/0x4e0 nfs_page_create+0x152/0x460 [nfs] nfs_page_create_from_folio+0x7e/0x210 [nfs] nfs_update_folio+0x7a9/0x32a0 [nfs] nfs_write_end+0x290/0xc60 [nfs] generic_perform_write+0x4ce/0x990 nfs_file_write+0x6b3/0xce0 [nfs] vfs_write+0x63c/0xfa0 ksys_write+0x122/0x240 do_syscall_64+0xc3/0x13f0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Freed by task 122046 on cpu 0 at 31647.037964s: kasan_save_stack+0x1e/0x40 kasan_save_track+0x13/0x60 kasan_save_free_info+0x37/0x60 __kasan_slab_free+0x3b/0x60 kmem_cache_free+0x11b/0x5a0 nfs_page_group_destroy+0x13a/0x210 [nfs] nfs_unlock_and_release_request+0x64/0x90 [nfs] nfs_commit_release_pages+0x339/0xbd0 [nfs] nfs_commit_release+0x51/0xb0 [nfs] rpc_free_task+0xee/0x160 rpc_async_release+0x5d/0xb0 process_one_work+0x9b0/0x1890 worker_thread+0x75a/0x10a0 kthread+0x3d3/0x4d0 ret_from_fork+0x669/0xa50 ret_from_fork_asm+0x11/0x20 The buggy address belongs to the object at ffff888118af2040\x0a which belongs to the cache nfs_page of size 96 The buggy address is located 24 bytes inside of\x0a freed 96-byte region [ffff888118af2040, ffff888118af20a0) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x118af2 head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x4000000000000040(head|zone=2) page_type: f5(slab) raw: 4000000000000040 ffff88818cf2c4c0 ffffea000e61b990 ffffea0004e7d110 raw: 0000000000000000 0000000800190019 00000000f5000000 0000000000000000 head: 4000000000000040 ffff88818cf2c4c0 ffffea000e61b990 ffffea0004e7d110 head: 0000000000000000 0000000800190019 00000000f5000000 0000000000000000 head: 4000000000000001 ffffffffffffff81 00000000ffffffff 00000000ffffffff head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 121997, tgid 121997 (rsync), ts 31643290274577, free_ts 31642154777182 post_alloc_hook+0xd1/0x100 get_page_from_freelist+0xbad/0x2910 __alloc_frozen_pages_noprof+0x1c6/0x4a0 allocate_slab+0x330/0x620 ___slab_alloc+0xe9/0x930 kmem_cache_alloc_noprof+0x35b/0x4e0 nfs_page_create+0x152/0x460 [nfs] nfs_page_create_from_folio+0x7e/0x210 [nfs] nfs_update_folio+0x7a9/0x32a0 [nfs] nfs_write_end+0x290/0xc60 [nfs] generic_perform_write+0x4ce/0x990 nfs_file_write+0x6b3/0xce0 [nfs] vfs_write+0x63c/0xfa0 ksys_write+0x122/0x240 do_syscall_64+0xc3/0x13f0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 page last free pid 122202 tgid 122202 stack trace: __free_frozen_pages+0x6da/0xf30 qlist_free_all+0x53/0x130 kasan_quarantine_reduce+0x198/0x1f0 __kasan_slab_alloc+0x46/0x70 kmem_cache_alloc_noprof+0x1ab/0x4e0 __alloc_object+0x2f/0x230 __create_object+0x22/0x80 kmem_cache_alloc_node_noprof+0x416/0x4d0 __alloc_skb+0x146/0x6e0 tcp_stream_alloc_skb+0x35/0x660 tcp_sendmsg_locked+0x1746/0x4260 tcp_sendmsg+0x2f/0x40 inet_sendmsg+0x9e/0xe0 __sock_sendmsg+0xd9/0x180 sock_sendmsg+0x122/0x200 xprt_sock_sendmsg+0x4ff/0x9a0 Memory state around the buggy address: ffff888118af1f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc ffff888118af1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888118af2000: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ^ ffff888118af2080: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ffff888118af2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Reviewed-by Jeff Layton <jlayton@kernel.org> Fixes: 5d3869a41f36 ("NFS: fix writeback in presence of errors") Cc: Olga Kornievskaia <okorniev@redhat.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: linux-nfs@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
6 dayserofs: fix EFSCORRUPTED on multi-algorithm images in z_erofs_map_sanity_check()Zhan Xusheng
Commit a5242d37c83a ("erofs: error out obviously illegal extents in advance") changed the per-extent algorithm presence check from "is the bit set" to "is the only bit set": - !(sbi->available_compr_algs & (1 << map->m_algorithmformat)) + (sbi->available_compr_algs ^ BIT(map->m_algorithmformat)) `available_compr_algs` is a bitmap of every compression algorithm available in the image (z_erofs_parse_cfgs() iterates it with for_each_set_bit()), so an image that enables more than one algorithm has multiple bits set. XOR is zero only when the bitmap is exactly BIT(map->m_algorithmformat); for any image with two or more algorithms the test is non-zero for every extent and the read fails with -EFSCORRUPTED ("inconsistent algorithmtype %u"). Reproducer (mkfs.erofs from erofs-utils 1.7.1): $ mkdir src $ yes A | head -c 100K > src/a $ head -c 64K /dev/zero > src/b $ mkfs.erofs -zlz4:deflate multi.erofs src $ mount -t erofs -o loop multi.erofs /mnt $ cat /mnt/a >/dev/null cat: /mnt/a: Structure needs cleaning $ dmesg | tail erofs (device loop0): inconsistent algorithmtype 0 for nid 46 erofs (device loop0): read error -117 @ 0 of nid 46 The erofs on-disk format (Z_EROFS_COMPRESSION_MAX = 4 with LZ4, LZMA, DEFLATE, ZSTD) and the kernel parser explicitly support multi-algorithm images, and erofs-utils 1.7.1 generates them via the "-z X:Y" syntax. Restore the original per-bit presence check. Fixes: a5242d37c83a ("erofs: error out obviously illegal extents in advance") Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
7 daysksmbd: fix use-after-free of a deferred file_lock on double SMB2_CANCELGil Portnoy
A deferred byte-range lock (an SMB2_LOCK that blocks) registers an async work on conn->async_requests via setup_async_work(), with cancel_fn = smb2_remove_blocked_lock and cancel_argv[0] pointing at the struct file_lock. When the request is cancelled, the worker frees the file_lock with locks_free_lock() and takes the cancelled early-exit, which "goto out"s and never reaches release_async_work() -- the only site that unlinks the work from conn->async_requests and clears cancel_fn/cancel_argv. The work therefore stays matchable on async_requests with a live cancel_fn pointing at the freed file_lock, until connection teardown finally runs release_async_work(). smb2_cancel() fires cancel_fn unconditionally with no state guard, so a second SMB2_CANCEL for the same AsyncId, arriving in that window, re-runs smb2_remove_blocked_lock() on the freed file_lock -- a slab use-after-free: BUG: KASAN: slab-use-after-free in __locks_delete_block __locks_delete_block locks_delete_block ksmbd_vfs_posix_lock_unblock smb2_remove_blocked_lock smb2_cancel <- 2nd SMB2_CANCEL fires cancel_fn handle_ksmbd_work Allocated by ...: locks_alloc_lock <- smb2_lock Freed by ...: locks_free_lock <- smb2_lock (cancelled branch) ... cache file_lock_cache of size 192 Reproduced on mainline with KASAN by an authenticated SMB client. Skip a work whose state is already KSMBD_WORK_CANCELLED so its cancel callback cannot be fired a second time. Cc: stable@vger.kernel.org Signed-off-by: Gil Portnoy <dddhkts1@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
7 daysksmbd: fix durable reconnect double-bind race in ksmbd_reopen_durable_fdGil Portnoy
Two concurrent same-user DHnC reconnects can both observe fp->conn == NULL before either sets it. ksmbd_reopen_durable_fd() checks fp->conn to guard against a handle already being reconnected, but the check and the binding assignment are not atomic: both threads pass the guard, both call ksmbd_conn_get() on the same fp, and both eventually reach kfree(fp->owner.name) -- a double-free of the owner.name slab object. The double-bound ksmbd_file also causes a write-UAF on the 344-byte ksmbd_file_cache object when a concurrent smb2_close() spins on fp->f_lock after the object has been freed by the losing reconnect path. KASAN on 7.1-rc5 (48-thread concurrent reconnect, 3000 cycles): BUG: KASAN: double-free in ksmbd_reopen_durable_fd+0x268/0x308 BUG: KASAN: slab-use-after-free in _raw_spin_lock+0xac/0x150 Write of size 4 at offset 24 into freed ksmbd_file_cache object Five double-bind windows observed; 63 total KASAN reports triggered. Fix: validate and claim fp->conn under write_lock(&global_ft.lock) so the check-and-claim is atomic. ksmbd_lookup_durable_fd() already treats fp->conn != NULL as "in use" and skips such an fp; setting fp->conn before dropping the lock closes the race. ksmbd_conn_get() is a non-sleeping refcount increment, safe under the rwlock. The rollback path on __open_id() failure also clears fp->conn/tcon under the lock so concurrent readers see a consistent state. Fixes: b1f1e80620de ("ksmbd: centralize ksmbd_conn final release to plug transport leak") Assisted-by: Henry (Claude):claude-opus-4 Signed-off-by: Gil Portnoy <dddhkts1@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
7 daysksmbd: fix NULL-deref of opinfo->conn in oplock/lease break notifiersGil Portnoy
smb2_oplock_break_noti() and smb2_lease_break_noti() read opinfo->conn into a local with neither READ_ONCE() nor a NULL check. Both run from oplock_break() after opinfo_get_list() has dropped ci->m_lock, so a concurrent SMB2 LOGOFF (session_fd_check()) can set op->conn = NULL under ci->m_lock within that window. ksmbd_conn_r_count_inc(conn) then writes through NULL at offset 0xc4 -- a remotely triggerable oops. Guard both reads the way compare_guid_key() already does: read opinfo->conn with READ_ONCE() and return early if it is NULL, before allocating the work struct so nothing leaks. A NULL conn means the client is gone and the break is moot, so return 0; oplock_break() treats that as success and runs the normal teardown. Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2") Assisted-by: Henry (Claude):claude-opus-4 Signed-off-by: Gil Portnoy <dddhkts1@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
8 daysMerge tag 'v7.1-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - fix uninitialized variable in smb2_writev_callback() - detect short folioq copy in cifs_copy_folioq_to_iter() * tag 'v7.1-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix uninitialized variable in smb2_writev_callback smb: client: detect short folioq copy in cifs_copy_folioq_to_iter()
8 daysxfs: Remove mention of PageWritebackMatthew Wilcox (Oracle)
Update a comment to refer to folios instead of pages. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
8 daysxfs: abort mount if xfs_fs_reserve_ag_blocks failsChristoph Hellwig
xfs_mountfs currently ignores all errors from xfs_fs_reserve_ag_blocks, which can lead to the mount path continuing on corruption errors. Fix the check to only ignore -ENOSPC as in other callers, and unwind for all other errors. Fixes: 81ed94751b15 ("xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
8 daysxfs: factor rtgroup geom write pointer reporting into a helperChristoph Hellwig
Sticks out a bit better if we add a separate helper for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
8 daysxfs: drop the RTG reference later in xfs_ioc_rtgroup_geometryChristoph Hellwig
Keep the rtgroup reference until after reporting the write pointer, as that uses it. Right now this is not a major issue as we don't support shrinking file systems in a way that makes RTGs go away, but let's stick to the proper reference counting to prepare for that. Fixes: c6ce65cb17aa ("xfs: add write pointer to xfs_rtgroup_geometry") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
8 daysxfs: fix rtgroup cleanup in CoW fork repairYingjie Gao
xrep_cow_find_bad_rt() initializes scrub rtgroup state before the force-rebuild path calls xrep_cow_mark_file_range(). If that call fails, the code jumps directly to out_rtg, which skips the scrub rtgroup cleanup and only drops the local rtgroup reference. Remove the unnecessary jump so the function falls through to out_sr, ensuring the realtime cursors, lock state, and sr->rtg reference are released before returning. Fixes: fd97fe111208 ("xfs: fix CoW forks for realtime files") Cc: <stable@vger.kernel.org> # v6.14 Signed-off-by: Yingjie Gao <gaoyingjie@uniontech.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
8 daysxfs: fix error returns in CoW fork repairYingjie Gao
xrep_cow_find_bad() returns success after the cleanup labels even if AG setup, btree queries, or bitmap updates failed. This can make repair continue with an incomplete bad-file-offset bitmap instead of stopping at the original error. The force-rebuild path has a related cleanup problem. If xrep_cow_mark_file_range() fails, the function returns directly and skips the scrub AG context and perag cleanup. Let the force-rebuild path fall through to the existing cleanup code and return the saved error after cleanup. Fixes: dbbdbd008632 ("xfs: repair problems in CoW forks") Cc: <stable@vger.kernel.org> # v6.8 Signed-off-by: Yingjie Gao <gaoyingjie@uniontech.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
8 daysxfs: fix overlapping extents returned for pNFS LAYOUTGETDai Ngo
xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(), which causes the bmap code to expand the mapping to cover the entire extent rather than the requested range. A single LAYOUTGET request from the client can cause the server to issue multiple calls to xfs_fs_map_blocks() for different offsets within the same extent. Because the use of XFS_BMAPI_ENTIRE flag, these calls can produce overlapping mappings. As a result, the LAYOUTGET reply sent to the NFS client may contain overlapping extents. This creates ambiguity in extent selection for a given file range, which can lead to incorrect device selection, inconsistent handling of datastate, and ultimately data corruption or protocol violations on the client side. Problem discovered with xfstest generic/075 test using NFSv4.2 mount with SCSI layout. Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that xfs_bmapi_read() returns only the mapping for the requested range. Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET"). Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
8 daysxfs: fix use of uninitialized imap in xfs_fs_map_blocks error pathDai Ngo
xfs_fs_map_blocks() acquires the data map lock and then calls xfs_bmapi_read(). If xfs_bmapi_read() fails, the function currently still falls through to xfs_bmbt_to_iomap(), which consumes an uninitialized imap record and may return invalid data to the caller. Fix this by releasing the data map lock and returning immediately when xfs_bmapi_read() reports an error. This prevents xfs_bmbt_to_iomap() from being called with an uninitialized xfs_bmbt_irec. Fixes: 527851124d10f ("xfs: implement pNFS export operations") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
8 daysxfs: handle racing deletions in xfs_zone_gc_iter_irecHans Holmberg
Under heavy garbage collection pressure from RocksDB workloads, filesystem shutdowns can occur in xfs_zone_gc_iter_irec when xfs_iget() returns -EINVAL for deleted files. Fix this by handling -EINVAL just like we handle -ENOENT, allowing zone GC to safely ignore stale mappings. Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
8 daysMerge tag 'v7.1-rc6-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - security fix for FSCTL_SET_SPARSE - fix leak in ksmbd_query_inode_status() - fix OOB read in smb_check_perm_dacl() * tag 'v7.1-rc6-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix FSCTL permission bypass by adding a permission check for FSCTL_SET_SPARSE ksmbd: release ksmbd_inode ref via ksmbd_inode_put on lookup paths ksmbd: OOB read regression in smb_check_perm_dacl() ACE-walk loops
9 dayserofs: fix use-after-free on sbi->sync_decompressGao Xiang
z_erofs_decompress_kickoff() can race with filesystem unmount, causing a use-after-free on sbi->sync_decompress. When I/O completes, z_erofs_endio() calls z_erofs_decompress_kickoff() to queue z_erofs_decompressqueue_work() asynchronously. Then, after all folios are unlocked, unmount workflow can proceed and sbi will be freed before accessing to sbi->sync_decompress. Thread (unmount) I/O completion kworker queue_work z_erofs_decompressqueue_work (all folios are unlocked) cleanup_mnt .. erofs_kill_sb erofs_sb_free kfree(sbi) access sbi->sync_decompress // UAF!! Fixes: 40452ffca3c1 ("erofs: add sysfs node to control sync decompression strategy") Reported-by: syzbot+52bae5c495dbe261a0bc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=52bae5c495dbe261a0bc Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Jianan Huang <jnhuang95@gmail.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
10 daysfs/qnx6: fix pointer arithmetic in directory iterationArpith Kalaginanavoor
The conversion to qnx6_get_folio() in commit b2aa61556fcf ("qnx6: Convert qnx6_get_page() to qnx6_get_folio()") introduced a regression in directory iteration. The pointer 'de' and the 'limit' address were calculated using byte offsets from a char pointer without scaling by the size of a QNX6 directory entry. This causes the driver to read from incorrect memory offsets, leading to "invalid direntry size" errors and premature termination of directory scans. Fix this by casting 'kaddr' to 'struct qnx6_dir_entry *' before applying the offset and last_entry(...) increments. This allows the compiler to correctly scale the pointer arithmetic by the 32-byte stride of the directory entry structure. Fixes: b2aa61556fcf ("qnx6: Convert qnx6_get_page() to qnx6_get_folio()") Cc: stable@vger.kernel.org Signed-off-by: Arpith Kalaginanavoor <arpithk@nvidia.com> Link: https://patch.msgid.link/20260526123858.1683035-1-arpithk@nvidia.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 daysVFS: fix possible failure to unlock in nfsd4_create_file()NeilBrown
atomic_create() in fs/namei.c drops the reference to the dentry when it returns an error. This behaviour was imported into dentry_create() so that it will drop the reference if an error is returned from atomic_create(), though not if vfs_create() returns an error (in the case where ->atomic_create is not supported). The caller - nfsd4_create_file() - is made aware of this by checking path->dentry, which will either be a counted reference to a dentry, or an error pointer. However the change to use start_creating()/end_creating() (which landed shortly before the dentry_create() change landed, though was likely developed around the same time) means that nfsd4_create_file() *needs* a valid dentry so that it can unlock the parent. The net result is that if NFSD exports a filesystem which uses ->atomic_create, and if a call to ->atomic_create returns an error, then nfsd4_create_file() will pass an error pointer to end_creating() and the parent will not be unlocked. Fix this by changing dentry_create() to make sure path->dentry is always a valid dentry, never an error-pointer. The actual error is already returned a different way. Note that if ->atomic_create() returns a different dentry (which may not be possible in practice) we are guaranteed (because it is only ever provided by d_spliace_alias()) that it will have the same d_parent and so it will have the same effect when passed to end_creating(). Fixes: 64a989dbd144 ("VFS/knfsd: Teach dentry_create() to use atomic_open()") Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/177969022571.3379282.16448744624428323496@noble.neil.brown.name Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Reviewed-by: Jori Koolstra <jkoolstra@xs4all.nl> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
11 dayssmb: client: fix uninitialized variable in smb2_writev_callbackSteve French
compiling with W=2 pointed out that "written may be used uninitialized" Fixes: 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref") Cc: stable@vger.kernel.org Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
11 dayssmb: client: detect short folioq copy in cifs_copy_folioq_to_iter()Jeremy Erazo
cifs_copy_folioq_to_iter() copies a requested number of bytes from a folio queue into the destination iterator. Since the encrypted SMB2 READ path was changed to pass the server-declared payload length (data_len) instead of the larger folioq buffer length, the caller can ask for fewer bytes than the folio queue holds. In that case the helper continues walking the remaining folios after data_size has reached zero and calls copy_folio_to_iter() with len = 0, which is unnecessary work. The helper also returns 0 (success) when the folio queue is exhausted before data_size bytes have been copied. The caller has no way to distinguish that from a full copy and the reported transfer count ends up larger than the amount of data placed in the iterator. Add an early exit when data_size reaches zero, and return an error when the folio queue is exhausted before all requested bytes have been copied. Signed-off-by: Jeremy Erazo <mendozayt13@gmail.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: fix FSCTL permission bypass by adding a permission check for ↵Sean Shen
FSCTL_SET_SPARSE FSCTL_SET_SPARSE in fsctl_set_sparse() modifies the file's sparse attribute and saves it through xattr without any permission checks. This exposes two issues: 1) A client on a read-only share can change the sparse attribute on files it opened, even though the share is read-only. Other FSCTL write operations already check test_tree_conn_flag(work->tcon, KSMBD_TREE_CONN_FLAG_WRITABLE), but FSCTL_SET_SPARSE does not. 2) Even on writable shares, clients without FILE_WRITE_DATA or FILE_WRITE_ATTRIBUTES access should not modify the sparse attribute. Similar handle-level checks exist in other functions but are missing here. Add both share-level writable check and per-handle access check. Use goto out on error to avoid leaking file references. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steve French <smfrench@gmail.com> Signed-off-by: Sean Shen <grayhat@foxmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: release ksmbd_inode ref via ksmbd_inode_put on lookup pathsAleksandr Golovnya
ksmbd_query_inode_status() and ksmbd_lookup_fd_inode() both take a reference on a ksmbd_inode via __ksmbd_inode_lookup() (which performs atomic_inc_not_zero()) and later release it using a bare atomic_dec(&ci->m_count). Unlike ksmbd_inode_put(), a bare atomic_dec() does not check whether the reference count has reached zero, so if the caller happens to drop the last reference, the ksmbd_inode is leaked: it stays in the global inode hash table with m_count == 0, future __ksmbd_inode_lookup() calls reject it via atomic_inc_not_zero(), and ksmbd_inode_free() is never invoked. The race is: T1: __ksmbd_inode_lookup() -> atomic_inc_not_zero(): m_count = 2 T2: ksmbd_inode_put() -> atomic_dec_and_test(): m_count = 1 (not freed) T1: atomic_dec(&ci->m_count) -> m_count = 0 return (LEAK) In ksmbd_lookup_fd_inode() the matched-fp path (which now also uses ksmbd_inode_put()) cannot currently reach m_count == 0 because the matched ksmbd_file holds its own reference on ci, but converting it to the proper API keeps the three call sites consistent and avoids future regressions if the locking changes. Because ksmbd_inode_put() may free the ksmbd_inode if this drops the last reference, the call must happen after up_read(&ci->m_lock) on the two affected paths in ksmbd_lookup_fd_inode(). On the no-match path this is a pure reordering; on the matched path ksmbd_fp_get() is moved above the unlock so that the returned ksmbd_file is pinned before the inode reference is released. Signed-off-by: Aleksandr Golovnya <cofedish@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: OOB read regression in smb_check_perm_dacl() ACE-walk loopsAli Ganiyev
Commit d07b26f39246 ("ksmbd: require minimum ACE size in smb_check_perm_dacl()") introduced a transposed bounds check: if (offsetof(struct smb_ace, sid) + aces_size < CIFS_SID_BASE_SIZE) Since offsetof(..sid) is 8 and CIFS_SID_BASE_SIZE is 8, this evaluates to `aces_size < 0`. Because `aces_size` is always non-negative, this check becomes dead code and never breaks the loop. Worse, that commit removed the old 4-byte guard, meaning the loop now reads `ace->size` (offset 2) even when `aces_size` is 0-3 bytes. This re-opens a 2-byte heap out-of-bounds (OOB) read past the pntsd allocation during subsequent SMB2_CREATE operations. Fix this by properly transposing the comparison to require at least 16 bytes (8-byte offset + 8-byte SID base), matching the correct form used in smb_inherit_dacl(). Fixes: d07b26f39246 ("ksmbd: require minimum ACE size in smb_check_perm_dacl()") Cc: stable@vger.kernel.org Signed-off-by: Ali Ganiyev <ali.qaniyev@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysMerge tag 'nfsd-7.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Regressions: - Tighten bounds checking for sunrpc cache hash tables - Don't report key material in the ftrace log Stable fix: - Fix lockd's implementation of the NLM TEST procedure" * tag 'nfsd-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: lockd: fix TEST handling when not all permissions are available. NFSD: Report whether fh_key was actually updated sunrpc: prevent out-of-bounds read in __cache_seq_start()
12 daysMerge tag 'mm-hotfixes-stable-2026-05-25-16-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes. 9 are for MM. 9 are cc:stable and the remaining 4 address post-7.1 issues or aren't considered suitable for backporting. All patches are singletons - please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2026-05-25-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "mm: introduce a new page type for page pool in page type" mm/vmalloc: do not trigger BUG() on BH disabled context MAINTAINERS, mailmap: change email for Eugen Hristev mm/migrate_device: fix pgtable leak in migrate_vma_insert_huge_pmd_page kernel/fork: validate exit_signal in kernel_clone() mm: memcontrol: propagate NMI slab stats to memcg vmstats mm/damon/sysfs-schemes: delete tried region in regions_rmdirs() mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one zram: fix use-after-free in zram_writeback_endio memfd: deny writeable mappings when implying SEAL_WRITE ipc: limit next_id allocation to the valid ID range Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare" MAINTAINERS: .mailmap: update after GEHC spin-off
13 dayshpfs: fix a crash if hpfs_map_dnode_bitmap failsMikulas Patocka
If hpfs_map_dnode_bitmap fails, the code would call hpfs_brelse4 on uninitialized quad buffer head, causing a crash. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Farhad Alemi <farhad.alemi@berkeley.edu> Cc: stable@vger.kernel.org
2026-05-23Merge tag 'v7.1-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - fix for creating tmpfiles - fix durable reconnect error path - validate SID in security descriptor when inheriting DACL * tag 'v7.1-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd: smb/server: promote S_DEL_ON_CLS to S_DEL_PENDING when close ksmbd: validate SID in parent security descriptor during ACL inheritance ksmbd: fix durable reconnect error path file lifetime
2026-05-23Merge tag 'for-7.1-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A batch of fixes to simple quotas: - add conditional rescheduling point not dependent on the lock during inode iterations to avoid delays with PREEMPT_NONE enabled - fix subvolume deletion so it does not break the squota invariants - properly handle enabling squota, tracking extents in the initial transaction - catch and warn about underflows, clamp to zero to avoid further problems And one fix to inode size handling: - fix handling of preallocated extents beyond i_size when not using the no-holes feature" * tag 'for-7.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: swallow btrfs_record_squota_delta() ENOENT btrfs: clamp to avoid squota underflow btrfs: fix squota accounting during enable generation btrfs: check for subvolume before deleting squota qgroup btrfs: always drop root->inodes lock before cond_resched() btrfs: mark file extent range dirty after converting prealloc extents
2026-05-23Merge tag 'xfs-fixes-7.1-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fix from Carlos Maiolino: "A single fix for a race in xfs buffer cache which may lead to filesystem shutdown due to inconsistent metadata if the buffer lookup happens to find an old dead buffer still in the cache" * tag 'xfs-fixes-7.1-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix a buffer lookup against removal race
2026-05-23Merge tag 'driver-core-7.1-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Danilo Krummrich: - Remove the software node on platform device release(); without this, the software node remains registered after the device is gone and a subsequent platform_device_register_full() reusing the same node fails with -EBUSY - In sysfs_update_group(), do not remove a pre-existing directory when create_files() fails; the previous code would silently destroy a sysfs group that the caller did not create - Set fwnode->secondary to NULL in fwnode_init() to avoid dereferencing uninitialized memory (e.g. in dev_to_swnode()) when the firmware node is allocated on the stack or via a non-zeroing allocator * tag 'driver-core-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: device property: set fwnode->secondary to NULL in fwnode_init() sysfs: don't remove existing directory on update failure driver core: platform: remove software node on release()
2026-05-23Merge tag 'xfs-fixes-7.1-rc5' of ↵Carlos Maiolino
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux into test_merge xfs: fixes for v7.1-rc5 Signed-off-by: Carlos Maiolino <cem@kernel.org> Lines starting with '#' will be ignored.
2026-05-22Merge tag 'v7.1-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Fix missing lock - Fix dentry in use after unmounting - cifs.upcall security fix - require CAP_NET_ADMIN for swn netlink - change allocation in DUP_CTX_STR to GFP_KERNEL - minor smbdirect debug fix - handle_read_data() folio fix * tag 'v7.1-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: change allocation requirements in DUP_CTX_STR macro smb: client: require net admin for CIFS SWN netlink smb: smbdirect: divide, not multiply, milliseconds by 1000 cifs: Fix busy dentry used after unmounting smb: client: use data_len for SMB2 READ encrypted folioq copy smb: client: reject userspace cifs.spnego descriptions smb: client: protect tc_count increment in smb2_find_smb_sess_tcon_unlocked()
2026-05-22Merge tag 'zonefs-7.1-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: - Avoid potential overflow when converting a zonefs file number string to an inode number (from Johannes) * tag 'zonefs-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: handle integer overflow in zonefs_fname_to_fno
2026-05-22fuse: reject fuse_notify() pagecache ops on directoriesJann Horn
The operations FUSE_NOTIFY_STORE and FUSE_NOTIFY_RETRIEVE allow the FUSE daemon to actively write/read pagecache contents. For directories with FOPEN_CACHE_DIR, the pagecache is used as kernel-internal cache storage, and userspace is not supposed to have direct access to this cache - in particular, fuse_parse_cache() will hit WARN_ON() if the cache contains bogus data. Reject FUSE_NOTIFY_STORE and FUSE_NOTIFY_RETRIEVE on anything other than regular files with -EINVAL. Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260519-fuse-dir-pagecache-v2-1-5428fa48e175@google.com Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-22fuse: limit FUSE_NOTIFY_RETRIEVE to uptodate foliosJann Horn
FUSE_NOTIFY_RETRIEVE must be limited to uptodate folios; !uptodate folios can contain uninitialized data. Since FUSE_NOTIFY_RETRIEVE is intended to only return data that is already in the page cache and not wait for data from the FUSE daemon, treat !uptodate folios as if they weren't present. This only has security impact on systems that don't enable automatic zero-initialization of all page allocations via CONFIG_INIT_ON_ALLOC_DEFAULT_ON or init_on_alloc=1. Cc: stable@kernel.org Fixes: 2d45ba381a74 ("fuse: add retrieve request") Signed-off-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260519-fuse-retrieve-uptodate-v1-1-a7a1912a37f9@google.com Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-21Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare"Lorenzo Stoakes
This reverts commit ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare") with conflict resolution to account for changes in commit ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare"). The patch incorrectly handled hugetlb VMA lock allocation at the mmap_prepare stage, where a failed allocation occurring after mmap_prepare is called might result in the lock leaking. There is no risk of a merge causing a similar issues, as VMA_DONTEXPAND_BIT is set for hugetlb mappings. As a first step in addressing this issue, simply revert the change so we can rework how we do this having corrected the underlying issues. We maintain the VMA flags changes as best we can, accounting for the fact that we were working with a VMA descriptor previously and propagating like-for-like changes for this. Note that we invoke vma_set_flags() and do not call vma_start_write() as vm_flags_set() does. This is OK as it's being done in an .mmap hook where the VMA is not yet linked into the tree so nobody else can be accessing it. Link: https://lore.kernel.org/20260512160643.266960-1-ljs@kernel.org Fixes: ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare") Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reported-by: Mingyu Wang <25181214217@stu.xidian.edu.cn> Closes: https://lore.kernel.org/linux-mm/20260425070700.562229-1-25181214217@stu.xidian.edu.cn/ Acked-by: Muchun Song <muchun.song@linux.dev> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-21smb/server: promote S_DEL_ON_CLS to S_DEL_PENDING when closeChenXiaoSong
Reproducer: 1. server: systemctl start ksmbd 2. client: mount -t cifs //${server_ip}/export /mnt 3. client: C program: openat(AT_FDCWD, "/mnt", O_RDWR | O_TMPFILE, 0600) Do not treat `FILE_DELETE_ON_CLOSE_LE` as delete pending while files remain open. This patch fixes xfstests generic/004. Cc: stable@vger.kernel.org Link: https://chenxiaosong.com/en/smb-xfstests-generic-004.html Co-developed-by: Huiwen He <hehuiwen@kylinos.cn> Signed-off-by: Huiwen He <hehuiwen@kylinos.cn> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Tested-by: Steve French <stfrench@microsoft.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-21ksmbd: validate SID in parent security descriptor during ACL inheritanceJunyi Liu
Introduce smb_validate_ntsd_sid() helper to safely validate Owner SID and Group SID inside the NT Security Descriptor (smb_ntsd) retrieved from the parent directory. Cc: stable@vger.kernel.org Signed-off-by: Junyi Liu <moss80199@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-21ksmbd: fix durable reconnect error path file lifetimeJunyi Liu
After a durable reconnect succeeds, ksmbd_reopen_durable_fd() republishes the same ksmbd_file into the session volatile-id table. If smb2_open() then takes a later error path, cleanup first calls ksmbd_fd_put(work, fp) and then unconditionally calls ksmbd_put_durable_fd(dh_info.fp). In this case fp and dh_info.fp are the same object. The first put drops the reconnect lookup reference, but the final durable put can run __ksmbd_close_fd(NULL, fp). Because the final close is not session-aware, it can free the file object without removing the volatile-id entry that was just published into the session table. Use the session-aware put for the final reconnect drop when the reconnect had already succeeded and the error path is cleaning up the republished file. Earlier reconnect failures, before fp is assigned to dh_info.fp, keep using the durable-only put path. Fixes: 1baff47b81f9 ("ksmbd: fix use-after-free in smb2_open during durable reconnect") Signed-off-by: Junyi Liu <moss80199@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-21lockd: fix TEST handling when not all permissions are available.NeilBrown
The F_GETLK fcntl can work with either read access or write access or both. It can query F_RDLCK and F_WRLCK locks in either case. However lockd currently treats F_GETLK similar to F_SETLK in that read access is required to query an F_RDLCK lock and write access is required to query a F_WRLCK lock. This is wrong and can cause problems - e.g. when qemu accesses a read-only (e.g. iso) filesystem image over NFS (though why it queries if it can get a write lock - I don't know. But it does, and this works with local filesystems). So we need TEST requests to be handled differently. To do this: - change nlm_do_fopen() to accept O_RDWR as a mode and in that case succeed if either a O_RDONLY or O_WRONLY file can be opened. - change nlm_lookup_file() to accept a mode argument from caller, instead of deducing base on lock time, and pass that on to nlm_do_fopen() - change nlm4svc_retrieve_args() and nlmsvc_retrieve_args() to detect TEST requests and pass O_RDWR as a mode to nlm_lookup_file, passing the same mode as before for other requests. Also set lock->fl.c.flc_file to whichever file is available for TEST requests. - change nlmsvc_testlock() to also not calculate the mode, but to use whatever was stored in lock->fl.c.flc_file. This behaviour of lockd - requesting O_WRONLY access to TEST for exclusive locks - has been present at least since git history began. However it was hidden until recently because knfsd ignored the access requested by lockd and required only READ access for all locking requests (unless the underlying filesystem provided an f_op->open function which checked access permissions). The commit mentioned in Fixes: below changed nfsd_permission() to NOT override the access request for LOCK requests and this exposed the bug that we are now fixing. Note that there is another issue that this patch does not address. The flock(.., LOCK_EX) call is permitted on a read-only file descriptor. Linux NFS maps this to NLM locking as whole-file byte-range locks. nfsd will see this as though it were fcntl( F_SETLK (F_WRLCK)) and will now require write access, which it might not be able to get. It is not clear if this is a problem in practice, or what the best solution might be. So no attempt is made to address it. Reported-by: Tj <tj.iam.tj@proton.me> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1128861 Fixes: 4cc9b9f2bf4d ("nfsd: refine and rename NFSD_MAY_LOCK") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-05-21NFSD: Report whether fh_key was actually updatedChuck Lever
The nfsd_ctl_fh_key_set tracepoint was introduced to capture operator activity on the filehandle signing key. Earlier revisions logged the key bytes verbatim; the version that landed hashes the 16 key bytes through crc32_le and stores the result. CRC32 is a linear projection of its input rather than a one-way function, and truncating any hash of fixed-size secret material leaves the key recoverable under offline brute force when the threat model includes an attacker with access to the trace ring. The operational question the fingerprint was meant to answer is whether a NFSD_CMD_THREADS_SET call that carries an NFSD_A_SERVER_FH_KEY attribute actually replaced the active key or re-installed the value already in place. Answer that question directly: compare the incoming key bytes against the current nn->fh_key inside nfsd_nl_fh_key_set() and surface a single bit to the tracepoint. The event now prints "updated" when the stored key changed and "unmodified" otherwise. A first set that fails kmalloc reports "unmodified" because no key was installed. Reported-by: jaeyeong <fin@spl.team> Fixes: 62346217fd72 ("NFSD: Add a key for signing filehandles") Cc: Benjamin Coddington <bcodding@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-05-21smb: client: change allocation requirements in DUP_CTX_STR macroFredric Cover
Currently, the macro DUP_CTX_STR allocates new_ctx->field using GFP_ATOMIC. DUP_CTX_STR is only used in smb3_fs_context_dup(), which is never called in an atomic context. Using GFP_ATOMIC puts unnecessary pressure on emergency memory pools. Change GFP_ATOMIC to GFP_KERNEL. Signed-off-by: Fredric Cover <fredric.cover.lkernel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-21smb: client: require net admin for CIFS SWN netlinkMichael Bommarito
CIFS_GENL_CMD_SWN_NOTIFY is the userspace witness-notify command. The intended sender is the cifs.witness helper, but the generic-netlink operation currently has no capability flag, so any local process can send RESOURCE_CHANGE or CLIENT_MOVE notifications to the in-kernel witness handler. The same family exposes CIFS_GENL_MCGRP_SWN without multicast-group capability flags. Register messages sent to that group include the witness registration id and, for NTLM-authenticated mounts, the username, domain, and password attributes copied from the CIFS session. An unprivileged local process should not be able to join that group and receive those messages. Require CAP_NET_ADMIN for incoming SWN_NOTIFY commands with GENL_ADMIN_PERM, and require CAP_NET_ADMIN over the network namespace for joining the SWN multicast group with GENL_MCAST_CAP_NET_ADMIN. The cifs.witness service runs with the privileges needed for both operations. Fixes: fed979a7e082 ("cifs: Set witness notification handler for messages from userspace daemon") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Steve French <stfrench@microsoft.com>