linux-stable.git - Linux kernel stable tree

Age	Commit message	Author
14 hours	Merge tag 'ntfs-for-7.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs	Linus Torvalds
	Pull ntfs fixes from Namjae Jeon: - Keep RECALL_ON_OPEN in inode flags when reloading them from $FILE_NAME - Check runlist reallocation sizes for negative values and overflow - Drop stale page cache after shrinking non-resident attributes to prevent writeback failures and data loss * tag 'ntfs-for-7.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs: ntfs: drop stale page-cache when shrinking a non-resident attr ntfs: harden runlist realloc size calculations ntfs: preserve RECALL_ON_OPEN on WSL special-file reparse points
15 hours	Merge tag 'v7.2-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd	Linus Torvalds
	Pull smb server fixes from Steve French: - Use memcmp() when comparing fixed-size binary ClientGUIDs, so embedded NUL bytes are handled correctly - Reject repeated SMB2 NEGOTIATE requests after dialect selection This prevents preauth_info leaks, enforces the SMB2 protocol requirements, and serializes negotiation state updates. - Fix a use-after-free in __close_file_table_ids() by removing the volatile file ID from the owning IDR before dropping the IDR reference * tag 'v7.2-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: use memcmp() to compare ClientGUIDs ksmbd: reject repeated SMB2 NEGOTIATE requests ksmbd: fix use-after-free in __close_file_table_ids()
39 hours	Merge tag 'v7.2-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds
	Pull smb client fixes from Steve French: - fix SMB1 read and write potential buffer leaks - netfs error handling fix - fix check for last write time in truncate and setattr and cleanup use of smp_store_release() - fscache fix and cleanup - validate idmap key payload length - minor SMB1 error mapping cleanup - witness protocol memory allocation fix * tag 'v7.2-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: add fscache_resize_cookie() to cifs_setsize() cifs: fix time_last_write stamp placement in setattr/truncate paths cifs: consolidate time_last_write stamp into _cifsFileInfo_put() smb: client: simplify cifs_fscache_get_super_cookie() smb: client: free partially allocated transform folio queue cifs: validate idmap key payload length smb: client: remove conditional return with no effect smb: client: fix buffer leaks in SMB1 read and write smb: client: use GFP_KERNEL for registry allocation
39 hours	ksmbd: use memcmp() to compare ClientGUIDs	Namjae Jeon
	ClientGUID is a fixed-size binary value and can contain embedded NUL bytes. strncmp() stops comparing at the first NUL byte, so different ClientGUID values can incorrectly be treated as equal. Use memcmp() in SMB3 multichannel session binding and FSCTL_VALIDATE_NEGOTIATE_INFO to compare all SMB2_CLIENT_GUID_SIZE bytes. Fixes: f5a544e3bab7 ("ksmbd: add support for SMB3 multichannel") Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Reported-by: Samu <nomomentomori@gmail.com> Suggested-by: Samu <nomomentomori@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
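The difference between the two comparisons can be reproduced in userspace. The following is a minimal sketch, not the ksmbd code: CLIENT_GUID_SIZE is a local stand-in for SMB2_CLIENT_GUID_SIZE (assumed here to be 16 bytes), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for SMB2_CLIENT_GUID_SIZE; 16 bytes is an assumption of this sketch. */
#define CLIENT_GUID_SIZE 16

/* Old behaviour: strncmp() stops at the first NUL byte it meets in both buffers. */
static bool guid_equal_strncmp(const unsigned char *a, const unsigned char *b)
{
	assert(a && b);
	return strncmp((const char *)a, (const char *)b, CLIENT_GUID_SIZE) == 0;
}

/* Fixed behaviour: memcmp() always compares all CLIENT_GUID_SIZE bytes. */
static bool guid_equal_memcmp(const unsigned char *a, const unsigned char *b)
{
	assert(a && b);
	return memcmp(a, b, CLIENT_GUID_SIZE) == 0;
}
```

Two GUIDs that share a prefix up to and including an embedded NUL byte compare equal under the strncmp() helper and different under the memcmp() helper.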
39 hours	ksmbd: reject repeated SMB2 NEGOTIATE requests	Namjae Jeon
	An unauthenticated client can send multiple successful SMB2 NEGOTIATE requests on one connection before SESSION_SETUP. While the connection is in KSMBD_SESS_NEED_SETUP, smb2_handle_negotiate() accepts another SMB3.1.1 NEGOTIATE and overwrites conn->preauth_info with a new allocation. Only the final allocation is freed when the connection is released, leaking one object for every additional successful request. A repeated SMB2 NEGOTIATE after a dialect has been selected is a protocol violation. MS-SMB2 section 3.3.5.4 requires the server to disconnect without replying in this case. Set the connection exiting when rejecting the request, in addition to suppressing the response. Reject SMB2 NEGOTIATE unless the connection is new or is waiting for the SMB2 NEGOTIATE that follows an SMB1 multi-protocol negotiate. Serialize both SMB1 and SMB2 negotiation paths under conn->srv_mutex, since they update connection-wide dialect and negotiation state. Move the locking contract to ksmbd_smb_negotiate_common(), where the state and dialect are selected, and add ksmbd_conn_new() for consistent state access. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: Runa Takemoto <takemotoruna223@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
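The state gate described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the ksmbd implementation: the enum values, struct conn, handle_negotiate() and conn_release() are hypothetical names, the 64-byte allocation is arbitrary, and the conn->srv_mutex serialization is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical connection states; only the gating logic mirrors the commit text. */
enum conn_state {
	CONN_NEW,
	CONN_NEED_SMB2_NEGOTIATE,	/* after an SMB1 multi-protocol negotiate */
	CONN_NEED_SETUP,
	CONN_EXITING,
};

struct conn {
	enum conn_state state;
	void *preauth_info;
};

/* Accept NEGOTIATE only in the two legal states; otherwise mark the connection exiting. */
static int handle_negotiate(struct conn *c)
{
	assert(c);
	if (c->state != CONN_NEW && c->state != CONN_NEED_SMB2_NEGOTIATE) {
		c->state = CONN_EXITING;	/* disconnect, no reply */
		return -EINVAL;
	}
	c->preauth_info = calloc(1, 64);	/* allocated at most once per connection */
	if (!c->preauth_info)
		return -ENOMEM;
	c->state = CONN_NEED_SETUP;
	return 0;
}

static void conn_release(struct conn *c)
{
	free(c->preauth_info);
	c->preauth_info = NULL;
}
```

Because a second NEGOTIATE never reaches the allocation, the pointer freed at release time is the only one ever stored, which is the property the fix relies on.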
39 hours	ksmbd: fix use-after-free in __close_file_table_ids()	Namjae Jeon
	A ksmbd_file can remain alive after logical close while another session holds a temporary reference obtained through ksmbd_lookup_fd_inode(). ksmbd_close_fd() currently marks the file closed and drops the idr-owned reference, but leaves the pointer published in the closing session's idr until the final reference is dropped. If the foreign holder performs the final ksmbd_fd_put(), __put_fd_final() supplies the foreign session's file table to __ksmbd_close_fd(). The object is then freed without being removed from its owner's idr, and the owner session later dereferences the stale pointer during file-table teardown. Remove the volatile id from the owner's idr while ksmbd_close_fd() still holds that table's lock, and clear volatile_id before dropping the idr-owned reference. A later foreign final put then only performs physical destruction and cannot remove the object from the wrong table. Fixes: 8510a043d334 ("ksmbd: increment reference count of parent fp") Reported-by: Yunseong Kim <yunseong.kim@est.tech> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
4 days	Merge tag 'for-7.2-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux	Linus Torvalds
	Pull btrfs fixes from David Sterba: "Zoned mode: - fix assertion and handle case of finished zone and truncated extent - fix zone metadata write pointer on actual zone reset - fix deadlock between metadata writeback and transaction commit - fix return value reuse leading to confusion about chunk reservations raid56 scrub: - fix tracking of sector checksums when no checksums are found - fix inverted logic when submitting parity read bio mount/remount fixes: - fix leaking 'remount in progress' state which can break other operations (qgroup rescan, autodefrag, reclaim) - adjust using global block reserve after read-only mount when using rescue= option - handle missing raid stripe tree when mounted with 'ignorebadroots' Misc: - fix -Wmaybe-uninitialized warning in GET_CSUMS ioctl" * tag 'for-7.2-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: raid56: fix scrub read assembly submitting no reads btrfs: zoned: skip fully truncated ordered extents at zone finish btrfs: initialize 'args' to avoid compiler warning in btrfs_ioctl_get_csums() btrfs: zoned: fix missing chunk metadata reservation btrfs: raid56: fix an incorrect csum skip during scrub btrfs: report missing raid stripe tree root during lookup btrfs: skip global block reserve accounting for rescue mounts btrfs: zoned: reset meta_write_pointer on zone reset btrfs: zoned: fix deadlock between metadata writeback and transaction commit btrfs: fix leaking BTRFS_FS_STATE_REMOUNTING flag
5 days	cifs: add fscache_resize_cookie() to cifs_setsize()	Frank Sorenson
	Several code paths update the VFS inode size by calling netfs_resize_file() and cifs_setsize(), but omit the corresponding fscache_resize_cookie() call, leaving the fscache cookie out of sync with the actual file size: - cifs_file_set_size() in inode.c: server-side truncation via setattr - cifs_do_truncate() in file.c: truncates to zero on O_TRUNC open - smb2_duplicate_extents() in smb2ops.c: file clone extending EOF - smb3_simple_falloc() in smb2ops.c: two branches that extend EOF via write-range and SMB2_set_eof respectively Since every caller of cifs_setsize() must resize the fscache cookie, add the call to cifs_setsize() itself, consistent with how truncate_pagecache() is already consolidated there. Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite") Fixes: 93a43155127f ("cifs: Fix missing set of remote_i_size") Fixes: 110fee6b9bb5 ("smb: client: fix missing timestamp updates with O_TRUNC") Fixes: 7a06d3b816d7 ("smb/client: emulate small EOF-extending mode 0 fallocate ranges") Cc: stable@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Cc: Paulo Alcantara <pc@manguebit.org> Cc: Huiwen He <hehuiwen@kylinos.cn> Signed-off-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Paulo Alcantara <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
5 days	cifs: fix time_last_write stamp placement in setattr/truncate paths	Frank Sorenson
	cifs_file_set_size() calls cifs_setsize() on success, which calls i_size_write(), updating i_size to the new value. The subsequent check attrs->ia_size != i_size_read() in both cifs_setattr_unix() and cifs_setattr_nounix() therefore always evaluates false after a successful cifs_file_set_size(), making the smp_store_release() of time_last_write dead code. The truncate path was unprotected against stale readdir size updates. Move the stamp to before the cifs_file_set_size() RPC call, guarded by attrs->ia_size != i_size_read() to exclude no-op same-size ftruncate(2) calls from stamping time_last_write unnecessarily. On the error path the stamp remains rather than being restored: restoring a stale snapshot (prev_tlw) could silently erase a concurrent _cifsFileInfo_put() close stamp if that close arrived between the READ_ONCE and the smp_store_release. readdir is suppressed until the stamp expires, which extends beyond one acregmax if the caller retries failed truncations. stat() is unaffected: the cifs_revalidate_dentry_attr() path calls cifs_fattr_to_inode() with from_readdir=false, which bypasses the time_last_write check in is_size_safe_to_change() entirely and always writes the authoritative QUERY_INFO result to i_size. Remove the now-unreachable stamp from the dead block in both functions. Fixes: e8a8d54c2d50 ("cifs: prevent readdir from changing file size due to stale directory metadata") Signed-off-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
5 days	cifs: consolidate time_last_write stamp into _cifsFileInfo_put()	Frank Sorenson
	The time_last_write stamp was scattered across cifs_close(), smb2_deferred_work_close(), and the three drain functions in misc.c. This missed the case where background I/O holds the final reference after userspace close() returns, and required explicit maintenance at each close-path site. Move the smp_store_release() into _cifsFileInfo_put(), immediately before releasing open_file_lock. This single location covers all close paths unconditionally: normal close, background I/O dropping the final reference, deferred close via timer or external drain. The spinlock's store-release/load-acquire pairing with is_inode_writable() already provides the ordering guarantee documented in is_size_safe_to_change(). Remove the now-redundant stamps from cifs_close(), smb2_deferred_work_close(), and all six stamp sites in the misc.c deferred-close drain functions. Fixes: e8a8d54c2d50 ("cifs: prevent readdir from changing file size due to stale directory metadata") Signed-off-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
5 days	smb: client: simplify cifs_fscache_get_super_cookie()	Dmitry Antipov
	Avoid redundant 'strlen()' and use the convenient 'strreplace()' to simplify 'cifs_fscache_get_super_cookie()'. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Steve French <stfrench@microsoft.com>
5 days	smb: client: free partially allocated transform folio queue	Yichong Chen
	netfs_alloc_folioq_buffer() may leave a partially allocated folio queue attached to the caller's buffer pointer when it returns an error. smb3_init_transform_rq() stores the buffer in the request only after allocation succeeds, so the common error path cannot free a partial allocation. Store the buffer pointer before checking the return value so err_free releases it. Signed-off-by: Yichong Chen <chenyichong@uniontech.com> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
5 days	Merge tag 'mm-hotfixes-stable-2026-07-27-14-18' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm	Linus Torvalds
	Pull misc fixes from Andrew Morton: "13 hotfixes. All are cc:stable. 11 are for MM. All are singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-07-27-14-18' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: fs/proc/task_mmu: fix PAGEMAP_SCAN written state for PMD holes mm/hugetlb: fix list corruption in allocate_file_region_entries() mm: mglru: fix stale batch updates after memcg reparenting selftest: fix headers in fclog.c ocfs2: fix boundary check in ocfs2_check_dir_entry() to use buffer offset mm/percpu-km: fix bitmap overflow and accounting in pcpu_create_chunk() mm/util: don't read __page_2 for order-1 folios in snapshot_page() mm/hugetlb: fix swap entry corruption when clearing uffd-wp at fork() mm: migrate_device: fix pte_pfn/pte_dirty called on non-present PTE fs/proc/task_mmu: fix PAGEMAP_SCAN written state for unpopulated ptes userfaultfd: wait on source PMD during UFFDIO_MOVE lib: test_hmm: use device devt for coherent device range selection mm/vmstat: fold stranded per-cpu node stats when a node comes online
5 days	Merge tag 'erofs-for-7.2-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs	Linus Torvalds
	Pull erofs fixes from Gao Xiang: "Fix a regression in page cache sharing which can cause a NULL pointer dereference, and limit LZMA stream memory usage on systems with many CPUs. - Keep a valid f_path for page cache sharing to fix a recent mincore() NULL pointer dereference - Limit LZMA stream pool size when too many processors are available - Sync up with Hongbo Li's latest email address" * tag 'erofs-for-7.2-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: cap LZMA stream pool size erofs: ensure valid f_path for page cache sharing MAINTAINERS: update Hongbo Li's email address
5 days	erofs: cap LZMA stream pool size	Michael Bommarito
	fs/erofs/decompressor_lzma.c sizes the module-global MicroLZMA stream pool from num_possible_cpus() when the lzma_streams module parameter is unset, then z_erofs_load_lzma_config() preallocates one image-supplied dictionary per stream, accepting dictionaries up to 8 MiB. On high-CPU systems, a small EROFS image can pin hundreds of MiB of vmalloc-backed decoder state until the erofs module is unloaded. Impact: An EROFS image mounted by the system can pin up to 8 MiB of vmalloc memory per LZMA stream, either as intended or unexpectedly. Bound the default stream count by a new CONFIG_EROFS_FS_ZIP_LZMA_DEFAULT_MAX_STREAMS option, default 16, so the worst-case default preallocation is 128 MiB if the number of CPUs is no less than 16 while preserving the existing per-image dictionary limit. An explicit lzma_streams module parameter is still honoured as-is, so administrators who deliberately size the pool are not affected. Fixes: 622ceaddb764 ("erofs: lzma compression support") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
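The worst-case arithmetic in this commit message can be checked with a small model. This is only a sketch: the function and macro names are hypothetical, and the 8 MiB dictionary limit and the default cap of 16 streams are taken from the text above rather than from the source tree.

```c
#include <assert.h>

#define MAX_DICT_BYTES		(8u << 20)	/* 8 MiB per-stream dictionary limit (from the commit text) */
#define DEFAULT_MAX_STREAMS	16u		/* default of the new Kconfig option (from the commit text) */

/* Pick the pool size: an explicit module parameter wins, otherwise cap the CPU count. */
static unsigned int lzma_stream_count(unsigned int module_param, unsigned int possible_cpus)
{
	assert(possible_cpus > 0);
	if (module_param)
		return module_param;
	return possible_cpus < DEFAULT_MAX_STREAMS ? possible_cpus : DEFAULT_MAX_STREAMS;
}

/* Worst-case preallocation when every stream holds a maximum-size dictionary. */
static unsigned long long worst_case_bytes(unsigned int streams)
{
	return (unsigned long long)streams * MAX_DICT_BYTES;
}
```

With 256 possible CPUs and no module parameter, the model picks 16 streams and a 128 MiB ceiling; without the cap the same machine would preallocate 256 x 8 MiB = 2 GiB.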
5 days	erofs: ensure valid f_path for page cache sharing	Gao Xiang
	Previously, backing files for page cache sharing were set up with f_path left as NULL (only f_inode was valid). It worked, but a recent mincore fix relies on f_path.mnt and crashes (found by "erofs/028" on 7.2-rc4): BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 3 UID: 0 PID: 675528 Comm: fincore Not tainted 7.2.0-rc4-00002-g[]-dirty #1 PREEMPT(lazy) Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014 RIP: 0010:__do_sys_mincore+0xc0/0x2c0 ... Specify valid paths using valid disconnected dentries together with erofs_ishare_mnt instead of leaving f_path empty, so they are more like real backing files in a pseudo filesystem and standard backing_file_open() can be used directly. Fixes: e187bc02f8fa ("mm: do file ownership checks with the proper mount idmap") Acked-by: Hongbo Li <hongbohbli@tencent.com> Signed-off-by: Gao Xiang <xiang@kernel.org>
6 days	cifs: validate idmap key payload length	Li Qiang
	The cifs.idmap key type stores its payload length in key->datalen, which is limited to U16_MAX. Accepting a larger key payload truncates the recorded length and can make later users interpret the payload using inconsistent bounds. Reject oversized preparsed payloads before allocating or copying them. This keeps key->datalen consistent with the stored data for both inline and separately allocated idmap payloads. Signed-off-by: Li Qiang <liqiang01@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
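The truncation hazard is plain integer narrowing and can be shown outside the kernel. In this sketch struct fake_key and both helpers are hypothetical, and the -EINVAL return value is an assumption; the commit message does not say which error the real check returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a key whose recorded length is 16 bits wide. */
struct fake_key {
	uint16_t datalen;
};

/* Unchecked store: lengths above UINT16_MAX are silently reduced modulo 65536. */
static void store_len_unchecked(struct fake_key *k, size_t len)
{
	assert(k);
	k->datalen = (uint16_t)len;
}

/* Checked store: reject oversized payloads before recording anything. */
static int store_len_checked(struct fake_key *k, size_t len)
{
	assert(k);
	if (len > UINT16_MAX)
		return -EINVAL;	/* error code is an assumption of this sketch */
	k->datalen = (uint16_t)len;
	return 0;
}
```

A payload of 65540 bytes would be recorded as 4 bytes by the unchecked helper, which is the inconsistent-bounds situation the patch rules out.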
6 days	smb: client: remove conditional return with no effect	Sang-Heon Jeon
	Both branches of the check return the same value, so the check has no effect. Remove it and return the value directly. This is the result of running the Coccinelle script from scripts/coccinelle/misc/cond_return_no_effect.cocci. Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
6 days	smb: client: fix buffer leaks in SMB1 read and write	Dawei Feng
	CIFSSMBRead(), CIFSSMBWrite() and CIFSSMBWrite2() allocate a request buffer before checking whether tcon->ses->server is NULL. If that defensive check ever fails, the helper returns -ECONNABORTED without releasing the request buffer. Fix these leaks by releasing the allocated request buffer before returning from these error paths. Use cifs_small_buf_release() for the buffers allocated by small_smb_init() and cifs_buf_release() for the buffer allocated by smb_init(). The bug was first flagged by an experimental analysis tool we are developing for kernel memory-management bugs while analyzing v6.13-rc1. The tool is still under development and is not yet publicly available. Manual inspection confirms that the bug is still present in v7.1.1. An x86_64 allyesconfig build showed no new warnings. Runtime validation used a temporary fault-injection hook to force tcon->ses->server to NULL after request-buffer initialization. On the unfixed kernel, the harness observed two leaked small request buffers and one leaked large request buffer, with directed kmemleak dumps confirming the CIFS buffer allocation stacks. After the fix, no CIFS request-buffer deltas remained. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
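The leak pattern, an early return placed after the allocation, can be illustrated with a toy buffer pool. Everything here is hypothetical scaffolding (pool, live_bufs, req_alloc(), req_free(), struct server, both send_request_*() functions); it stands in for small_smb_init()/smb_init() and their release helpers without reproducing them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy LIFO buffer pool standing in for the CIFS request-buffer allocators. */
static char pool[8][64];
static int live_bufs;	/* number of buffers currently handed out */

static void *req_alloc(void)
{
	return live_bufs < 8 ? pool[live_bufs++] : NULL;
}

static void req_free(void *buf)
{
	assert(buf && live_bufs > 0);
	live_bufs--;
}

struct server { int dummy; };

/* Before the fix: the NULL-server check returns without releasing the buffer. */
static int send_request_leaky(const struct server *srv)
{
	void *req = req_alloc();

	if (!req)
		return -ENOMEM;
	if (!srv)
		return -ECONNABORTED;	/* req is leaked here */
	req_free(req);
	return 0;
}

/* After the fix: every exit taken after the allocation releases the buffer. */
static int send_request_fixed(const struct server *srv)
{
	void *req = req_alloc();

	if (!req)
		return -ENOMEM;
	if (!srv) {
		req_free(req);
		return -ECONNABORTED;
	}
	req_free(req);
	return 0;
}
```

Watching live_bufs after a forced NULL-server call plays the role of the request-buffer deltas mentioned in the runtime validation: the fixed path leaves it at zero, the leaky path leaves one buffer outstanding.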
6 days	smb: client: use GFP_KERNEL for registry allocation	Fredric Cover
	Currently, cifs_get_swn_reg() allocates new registry entries using GFP_ATOMIC. Since we lock a mutex here, this is clearly not an atomic context. Use GFP_KERNEL instead. Also, fix a minor grammatical error in the comment above the function. Signed-off-by: Fredric Cover <fredric.cover.lkernel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
6 days	Merge tag 'vfs-7.2-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds
	Pull vfs fixes from Christian Brauner: - vfs: Preserve the ACL_DONT_CACHE state in forget_cached_acl(). ACL_DONT_CACHE is meant to be a permanent opt-out from ACL caching which FUSE relies on for servers that don't negotiate FUSE_POSIX_ACL. The helper replaced it with ACL_NOT_CACHED, silently re-enabling the cache, and as fuse doesn't invalidate the cache for such servers a properly timed get_acl() returned stale ACLs. Comes with a fuse selftest reproducing this. - pidfs: - Preserve PIDFD_THREAD when a thread pidfd is reopened via open_by_handle_at(). PIDFD_THREAD shares the O_EXCL bit which do_dentry_open() strips after the flags have been validated, so the reopened pidfd silently became a process pidfd. Comes with a selftest. - Add a pidfs_dentry_open() helper so the regular pidfd allocation path and the file handle path share the code that forces O_RDWR and reapplies the pidfd flags that do_dentry_open() strips. - Handle FS_IOC32_GETVERSION in the compat ioctl path. - Make pidfs_ino_lock static. - iomap: - Fix the block range calculation in ifs_clear_range_dirty() so a partial clear doesn't drop the dirty state of blocks the range only partially covers. - Support invalidating partial folios so a partial truncate or hole punch with blocksize < foliosize doesn't leave stale dirty bits behind. - Only set did_zero when iomap_zero_iter() actually zeroed something. - Guard ifs_set_range_dirty() and ifs_set_range_uptodate() against zero-length ranges where the unsigned last-block calculation underflows and bitmap_set() writes far beyond the ifs->state allocation. - Don't merge ioends with different io_private values as the merge could leak or corrupt the private data of the individual ioends. - exec: - Raise bprm->have_execfd only once the binfmt_misc interpreter has actually been opened. The flag was set as soon as a matching 'O' or 'C' entry was found. 
If the interpreter open failed with ENOEXEC the exec fell through to the next binary format with have_execfd raised but no executable staged and begin_new_exec() NULL derefed past the point of no return. - Fix an unsigned loop counter wrap in transfer_args_to_stack() on nommu. An overlong argument or environment string pushes bprm->p below PAGE_SIZE, the stop index becomes zero, and the loop never terminates, wrapping its counter and copying garbage from in front of the page array into the new process stack. - Make binfmt_elf_fdpic only honour the first PT_INTERP like binfmt_elf does. Each additional PT_INTERP overwrote the previous interpreter, leaking the name allocation and the interpreter file reference together with the write denial open_exec() took, leaving the file unwritable for as long as the system runs. - overlayfs: - Compare the full escaped xattr prefix including the trailing dot. An xattr like "trusted.overlay.overlayfoo" was misclassified as an escaped overlay xattr. - Check read access to the copy_file_range() source with the source's mounter credentials. - super: Thawing a filesystem whose block device was frozen with bdev_freeze() deadlocked. Dropping the last block layer freeze reference from under s_umount ends up in fs_bdev_thaw() which reacquires s_umount on the same task. Pin the superblock with an active reference instead and call bdev_thaw() without holding s_umount. - procfs: Return EACCES instead of success when the ptrace access check for namespace links fails. - afs: Use afs_dir_get_block() rather than afs_dir_find_block() for block 0 in afs_edit_dir_remove(), matching afs_edit_dir_add(). - Push the memcg gating of ->nr_cached_objects() down into the btrfs and shmem callbacks instead of skipping every callback during non-root memcg reclaim. The blanket check short-circuited XFS whose inode reclaim hook is intentionally driven from per-memcg contexts to free memcg-charged slab. - eventpoll: Pin files while checking reverse paths. 
Since struct file became SLAB_TYPESAFE_BY_RCU a concurrent close could free and recycle the file under the check which then took and dropped the f_lock of whatever live file now occupies that slot. * tag 'vfs-7.2-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits) super: fix emergency thaw deadlock on frozen block devices pidfs: make pidfs_ino_lock static eventpoll: pin files while checking reverse paths fs: push nr_cached_objects memcg gating into individual filesystems afs: Fix afs_edit_dir_remove() to get, not find, block 0 iomap: prevent ioend merge when io_private differs iomap: add comments for ifs_clear/set_range_dirty() iomap: fix out-of-bounds bitmap_set() with zero-length range iomap: fix incorrect did_zero setting in iomap_zero_iter() iomap: support invalidating partial folios iomap: correct the range of a partial dirty clear fs/super: fix emergency thaw double-unlock of s_umount pidfs: handle FS_IOC32_GETVERSION in compat ioctl ovl: check access to copy_file_range source with src mounter creds proc: Fix broken error paths for namespace links pidfs: add pidfs_dentry_open() helper selftests/pidfd: check PIDFD_THREAD survives open_by_handle_at() pidfs: preserve thread pidfds reopened by file handle ovl: fix trusted xattr escape prefix matching selftests/fuse: add ACL_DONT_CACHE regression test ...
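The zero-length underflow named in the iomap items of this pull can be reproduced with unsigned arithmetic alone. The sketch below is a simplified model, not the iomap code: BLOCK_BITS, the size_t types and both function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BITS 12u	/* hypothetical 4 KiB blocks */

/* Number of blocks touched by [off, off + len), computed without a length check. */
static size_t nr_blocks_unguarded(size_t off, size_t len)
{
	size_t first = off >> BLOCK_BITS;
	size_t last = (off + len - 1) >> BLOCK_BITS;	/* wraps when off == 0 and len == 0 */

	return last - first + 1;
}

/* Same calculation with the zero-length guard in front of it. */
static size_t nr_blocks_guarded(size_t off, size_t len)
{
	if (len == 0)
		return 0;
	assert(off + len >= off);	/* the model does not handle wrapping ranges */
	return nr_blocks_unguarded(off, len);
}
```

For off = 0 and len = 0 the unguarded version yields (SIZE_MAX >> BLOCK_BITS) + 1 blocks, a count that a bitmap_set()-style helper would happily walk far past any real allocation.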
6 days	super: fix emergency thaw deadlock on frozen block devices	Christian Brauner
	do_thaw_all_callback() calls bdev_thaw() while holding sb->s_umount exclusively. If the block device was frozen via bdev_freeze() dropping the last block layer freeze reference calls fs_bdev_thaw() which reacquires s_umount: do_thaw_all_callback(sb) super_lock_excl(sb) # holds sb->s_umount bdev_thaw(sb->s_bdev) mutex_lock(&bdev->bd_fsfreeze_mutex) # bd_fsfreeze_count drops 1 -> 0 bd_holder_ops->thaw == fs_bdev_thaw get_bdev_super(bdev) bdev_super_lock(bdev, true) super_lock(sb, true) down_write(&sb->s_umount) # same task: deadlock The emergency thaw worker deadlocks against itself holding both s_umount and bd_fsfreeze_mutex. That fscks any subsequent unmount, freeze, or thaw of that filesystem and block device. [ 81.878470] sysrq: Show Blocked State [ 81.880140] task:kworker/0:1 state:D stack:0 pid:11 tgid:11 ppid:2 task_flags:0x4208060 flags:0x00080000 [ 81.884876] Workqueue: events do_thaw_all [ 81.886656] Call Trace: [ 81.887759] <TASK> [ 81.888763] __schedule+0x579/0x1420 [ 81.890372] schedule+0x3a/0x100 [ 81.891794] schedule_preempt_disabled+0x15/0x30 [ 81.893848] rwsem_down_write_slowpath+0x1ea/0x900 [ 81.895191] ? __pfx_do_thaw_all_callback+0x10/0x10 [ 81.896528] down_write+0xbd/0xc0 [ 81.897505] super_lock+0x91/0x180 [ 81.898457] ? __mutex_lock+0xa99/0x1140 [ 81.900748] ? __mutex_unlock_slowpath+0x1f/0x400 [ 81.902069] bdev_super_lock+0x5b/0x150 [ 81.903132] get_bdev_super+0x10/0x60 [ 81.904042] fs_bdev_thaw+0x23/0xf0 [ 81.904755] bdev_thaw+0x82/0x100 [ 81.905484] do_thaw_all_callback+0x2c/0x50 [ 81.906298] __iterate_supers+0x5d/0x130 [ 81.907067] do_thaw_all+0x20/0x40 [ 81.907739] process_one_work+0x206/0x5e0 [ 81.908545] worker_thread+0x1e2/0x3c0 [ 81.909339] ? __pfx_worker_thread+0x10/0x10 [ 81.910171] kthread+0xf4/0x130 [ 81.910799] ? __pfx_kthread+0x10/0x10 [ 81.911528] ret_from_fork+0x2e2/0x3b0 [ 81.912259] ? 
__pfx_kthread+0x10/0x10 [ 81.913010] ret_from_fork_asm+0x1a/0x30 [ 81.913806] </TASK> bdev_super_lock() even documents the violated requirement with lockdep_assert_not_held(&sb->s_umount). Acquiring bd_fsfreeze_mutex under s_umount also inverts the bd_fsfreeze_mutex vs. s_umount ordering established by bdev_{freeze,thaw}() and can thus ABBA against a concurrent block-layer freeze even when the recursive path isn't hit. Fix this by not holding s_umount around the bdev_thaw() loop at all. Pin the superblock with an active reference instead as filesystems_freeze_callback() does. The active reference keeps the superblock from being shut down and so ->s_bdev stays valid without holding s_umount. The block-layer-held freeze is dropped by fs_bdev_thaw() with FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE exactly as a regular unfreeze would and thaw_super_locked() handles filesystem-level freezes as before. The emergency thaw path has deadlocked like this in one form or another for a long long time but the current exclusively-held shape dates back to commit [1] where thaw_bdev() already ended in thaw_super() with s_umount held by do_thaw_all_callback(). Fixes: 08fdc8a0138a ("buffer.c: call thaw_super during emergency thaw") [1] Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260723-work-super-emergency_thaw-v1-1-7c315c600245@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
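The self-deadlock and its fix can be modelled with a non-recursive lock reduced to a flag. This is a loose sketch: struct sb_model and every function below are hypothetical, a failed acquisition stands in for "the task would block forever", and bd_fsfreeze_mutex is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical superblock model: s_umount is non-recursive, so re-acquiring it fails. */
struct sb_model {
	bool s_umount_held;
	int active_refs;
};

static bool lock_umount(struct sb_model *sb)
{
	assert(sb);
	if (sb->s_umount_held)
		return false;	/* in the kernel this task would sleep forever */
	sb->s_umount_held = true;
	return true;
}

static void unlock_umount(struct sb_model *sb)
{
	sb->s_umount_held = false;
}

/* Models the thaw callback chain, which takes s_umount itself. */
static bool bdev_thaw_model(struct sb_model *sb)
{
	if (!lock_umount(sb))
		return false;
	unlock_umount(sb);
	return true;
}

/* Old shape: call the thaw with s_umount already held by the same task. */
static bool emergency_thaw_old(struct sb_model *sb)
{
	bool ok;

	lock_umount(sb);
	ok = bdev_thaw_model(sb);
	unlock_umount(sb);
	return ok;
}

/* New shape: pin the superblock with an active reference and leave s_umount alone. */
static bool emergency_thaw_new(struct sb_model *sb)
{
	bool ok;

	sb->active_refs++;
	ok = bdev_thaw_model(sb);
	sb->active_refs--;
	return ok;
}
```

The old shape fails inside the nested acquisition, while the new shape completes because the reference count, not the lock, keeps the superblock alive across the call.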
8 days	Merge tag 'v7.2-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd	Linus Torvalds
	Pull smb server fixes from Steve French: "This contains eight ksmbd fixes covering POSIX ACL handling, SMB signing enforcement, DACL parsing and construction hardening, session lifetime handling, and validation of malformed transform and compressed SMB2 requests: - preserve inherited POSIX ACL mask when creating objects. - enforce the session signing requirement for plaintext SMB requests. - harden DACL/ACE processing against size overflows, incomplete ACE copies, and undersized SIDs. - defer teardown of a previous session until NTLM authentication succeeds. - reject undersized encryption-transform and decompressed SMB2 requests before they can reach normal SMB2 request processing" * tag 'v7.2-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: reject undersized decompressed SMB2 requests ksmbd: validate minimum PDU size for transform requests ksmbd: defer destroy_previous_session() until after NTLM authentication ksmbd: validate ACE size against SID sub-authorities ksmbd: restore DACL size on check_add_overflow() to avoid malformed ACL ksmbd: bound DACL dedup walk to copied ACEs ksmbd: enforce signing required by the session ksmbd: preserve VFS inherited POSIX ACL mask
8 days	Merge tag 'ceph-for-7.2-rc5' of https://github.com/ceph/ceph-client	Linus Torvalds
	Pull ceph fixes from Ilya Dryomov: "A bunch of assorted fixes with the majority being hardening against malformed input and invalid data scenarios that don't happen in real deployments but can be utilized to trigger use-after-free and similar issues, some error path leak fixups and two patches from Max to avoid a potential hang in __ceph_get_caps() and unintended nesting of current->journal_info while handling replies from the MDS. All marked for stable" * tag 'ceph-for-7.2-rc5' of https://github.com/ceph/ceph-client: ceph: avoid fs reclaim while using current->journal_info ceph: add owner/capability checks for CEPH_IOC_SET_LAYOUT* ceph: fix hanging __ceph_get_caps() with stale mds_wanted rbd: Reset positive result codes to zero in object map update path libceph: bound pg_{temp,upmap,upmap_items} length to CEPH_PG_MAX_SIZE libceph: refresh auth->authorizer_buf{,_len} after authorizer update ceph: fix refcount leak in ceph_readdir() libceph: guard missing CRUSH type name lookup libceph: remove debugfs files before client teardown libceph: bound get_version reply decode to front len ceph: fix writeback_count leak in write_folio_nounlock() libceph: fix two unsafe bare decodes in decode_lockers() ceph: fix pre-auth out-of-bounds read on snaptrace in ceph_handle_caps() libceph: Reject monmaps advertising zero monitors libceph: reject zero bucket types in crush_decode libceph: Fix multiplication overflow in decode_new_up_state_weight()
8 days	Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux	Linus Torvalds
	Pull fscrypt fixes from Eric Biggers: "A couple fixes for AI-detected bugs" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: Avoid dynamic allocation in fscrypt_get_devices() fscrypt: Add missing superblock check in find_or_insert_direct_key()
8 days	pidfs: make pidfs_ino_lock static	Mateusz Guzik
	Fixes: 87caaeef7995 ("pidfs: implement ino allocation without the pidmap lock") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202607231547.ehCQxi0L-lkp@intel.com/ Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20260723160114.291515-1-mjguzik@gmail.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
9 days	Merge tag 'v7.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds
	Pull smb client fixes from Steve French: - Fix leak in cifs_close_deferred_file() - Fix resolving MacOS symlinks - Fix stale file size in readdir - Update git branches in MAINTAINERS file - Fix bounds check in cifs_filldir - Fix checks in parse_dfs_referrals() - Fix DFS referral checks for malformed packet * tag 'v7.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix cifsFileInfo leak on kmalloc failure in deferred close drain paths cifs: prevent readdir from changing file size due to stale directory metadata smb: client: handle STATUS_STOPPED_ON_SYMLINK responses without a symlink target Add missing git branch info for cifs and ksmbd to MAINTAINERS file smb: client: bound dirent name against end of SMB response in cifs_filldir smb: client: validate DFS referral PathConsumed
9 days	ceph: avoid fs reclaim while using current->journal_info	Max Kellermann
	handle_reply() stores a `ceph_mds_request` pointer in `current->journal_info` while filling the inode and dentry cache from an MDS reply. An allocation in this section can enter direct reclaim and prune dentries from another filesystem. If this dirties an ext4 inode, ext4 starts a JBD2 transaction. JBD2 interprets the Ceph request in `current->journal_info` as a journal handle and dereferences the request's `r_tid` as `h_transaction`, causing a kernel crash, e.g.: Unable to handle kernel paging request at virtual address 00000000077b4818 [...] Internal error: Oops: 0000000096000004 [#1] SMP Modules linked in: CPU: 6 UID: 0 PID: 2699135 Comm: kworker/6:3 Tainted: G W 6.18.38-i3 #1113 NONE [...] Workqueue: ceph-msgr ceph_con_workfn pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : jbd2__journal_start+0x2c/0x208 lr : __ext4_journal_start_sb+0x100/0x178 [...] Call trace: jbd2__journal_start+0x2c/0x208 (P) __ext4_journal_start_sb+0x100/0x178 ext4_dirty_inode+0x3c/0x90 __mark_inode_dirty+0x58/0x400 iput.part.0+0x2b0/0x370 iput+0x18/0x30 dentry_unlink_inode+0xc0/0x158 __dentry_kill+0x80/0x250 shrink_dentry_list+0x90/0x130 prune_dcache_sb+0x60/0x98 super_cache_scan+0xe8/0x190 do_shrink_slab+0x174/0x388 shrink_slab+0xd8/0x4c0 shrink_node+0x31c/0x908 do_try_to_free_pages+0xd0/0x508 try_to_free_pages+0x11c/0x238 __alloc_frozen_pages_noprof+0x4d0/0xdd0 __folio_alloc_noprof+0x18/0x70 __filemap_get_folio+0x248/0x440 ceph_readdir_prepopulate+0x570/0x9e8 mds_dispatch+0x1424/0x1ba0 ceph_con_process_message+0x74/0xa0 ceph_con_v1_try_read+0x3a0/0x1510 ceph_con_workfn+0x260/0x460 Enter a scoped NOFS allocation context and leave it after clearing `journal_info`. This prevents filesystem reclaim from recursing into another filesystem while the field contains Ceph-private data. 
Cc: stable@vger.kernel.org Fixes: 315f24088048 ("ceph: fix security xattr deadlock") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Reviewed-by: Xiubo Li <xiubo.li@clyso.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
9 days	ceph: add owner/capability checks for CEPH_IOC_SET_LAYOUT*	Max Kellermann
	These permission checks were already missing in the initial implementation of these ioctls. This allows any user who owns a file descriptor to manipulate the layout of any file, even if they don't have write permissions. It might be a good idea to guard other ioctls with permission checks as well or even disallow regular users (even if they own the file) to manipulate layout settings completely, as this may be abused to DoS the Ceph servers, but right now, I find it most urgent to have setter checks at all. Cc: stable@vger.kernel.org Fixes: 8f4e91dee2a2 ("ceph: ioctls") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Xiubo Li <xiubo.li@clyso.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
9 days	ceph: fix hanging __ceph_get_caps() with stale mds_wanted	Max Kellermann
	A reader can hang forever in __ceph_get_caps() when the client no longer holds `FILE_RD`, but local cap state still says that the capability is already wanted (via `mds_wanted`). One way to trigger this is through MDS cap revocation. If another client performs a conflicting operation, the MDS can revoke `FILE_RD` from the reader; the next read then has to reacquire `FILE_RD`. If the cap update that should request `FILE_RD` never reaches the MDS after `cap->mds_wanted` was raised, the reader is left holding only non-file caps while local `mds_wanted` still includes the file read caps. In that state, try_get_cap_refs() sees `need <= mds_wanted` and returns 0, so __ceph_get_caps() just waits on `i_cap_wq`. If the cap update that was supposed to request `FILE_RD` never reaches the MDS after `cap->mds_wanted` was raised, no further request is sent and the waiter can sleep indefinitely until unrelated cap traffic happens to wake it up. The ordering issue is that `cap->mds_wanted` is updated in __prep_cap() before the `CEPH_MSG_CLIENT_CAPS` message is actually queued for send. That makes one field serve two different meanings at once: what this client wants, and what the client believes the MDS already knows it wants. A proper fix would be to split those states and track whether a cap update is actually in flight or has been observed by the MDS. However, simply moving the `cap->mds_wanted` assignment later would not be sufficient: queueing the message in the messenger does not guarantee that the MDS processed that specific wanted set, and reconnect or message loss can still invalidate that assumption. Fixing that properly would require a larger rework of the cap state machine.
To allow simpler backports to stable kernels, this patch implements a simpler workaround: - stop waiting forever in __ceph_get_caps(); after a bounded wait, fall back to the renew path - make ceph_renew_caps() issue a synchronous `OPEN` request whenever the inode still does not actually hold the wanted caps, instead of only calling ceph_check_caps() The extra issued-vs-wanted check in ceph_renew_caps() is necessary because the previous test only checked whether the inode still had any real caps at all. That is not enough after revocation: the client can still hold something like `pLs` and yet be missing `FILE_RD` completely. In that case, falling back to ceph_check_caps() is not sufficient, because it still trusts `cap->mds_wanted` and may resend nothing. By requiring `(issued & wanted) == wanted` before taking the asynchronous path, the code only uses ceph_check_caps() when the wanted caps are already actually issued. Otherwise, it sends the synchronous `OPEN` renew. This preserves the existing asynchronous fast path when the wanted caps are already issued, avoids changing cap-state semantics, and fixes the hang by guaranteeing that a stalled waiter eventually retries through a path that does not rely on the stale `mds_wanted` state. [ idryomov: move CEPH_GET_CAPS_WAIT_TIMEOUT from libceph.h to mds_client.h, formatting ] Cc: stable@vger.kernel.org Fixes: 0a454bdd501a ("ceph: reorganize __send_cap for less spinlock abuse") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
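The issued-versus-wanted test in the message above is a plain bitmask subset check. A minimal userspace sketch follows; the cap bit values and the function name are invented for illustration and are not the Ceph definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap bits, for illustration only (not the CEPH_CAP_* values). */
#define CAP_PIN      0x01u
#define CAP_LINK_SH  0x02u
#define CAP_FILE_RD  0x04u

/*
 * True when every wanted cap is already issued, so the asynchronous
 * path would be enough. Otherwise a caller would fall back to a
 * synchronous renew.
 */
static bool wanted_caps_issued(unsigned int issued, unsigned int wanted)
{
	return (issued & wanted) == wanted;
}
```

With these assumed bits, a client holding only CAP_PIN | CAP_LINK_SH while wanting CAP_FILE_RD fails the check, which is the post-revocation case the message describes.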
9 days	ceph: fix refcount leak in ceph_readdir()	WenTao Liang
	The ceph_readdir() function allocates a ceph_mds_request via ceph_mdsc_create_request() and stores it in dfi->last_readdir. In the directory entry processing loop, if the entry's offset is less than ctx->pos or if the inode pointer is unexpectedly NULL, the function returns -EIO without releasing the reference held by dfi->last_readdir, causing a refcount leak. Fix this by adding ceph_mdsc_put_request(dfi->last_readdir) before returning on these error paths. Also set dfi->last_readdir to NULL for safety, matching the cleanup done at the normal exit. Cc: stable@vger.kernel.org Fixes: af9ffa6df7e3 ("ceph: add support to readdir for encrypted names") Signed-off-by: WenTao Liang <vulab@iscas.ac.cn> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
9 days	ceph: fix writeback_count leak in write_folio_nounlock()	Wentao Liang
	write_folio_nounlock() increments fsc->writeback_count to track in-flight writeback operations. On several error paths where the function returns early (folio lookup failure, snapshot context allocation failure, and writepages submission failure), the function returns without calling atomic_long_dec_return() to decrement the counter. Each leaked increment keeps the counter above zero, which can prevent the filesystem from cleanly unmounting or suspending writes. Add atomic_long_dec_return() calls on all error paths that currently return without decrementing the counter. Cc: stable@vger.kernel.org Fixes: d55207717ded ("ceph: add encryption support to writepage and writepages") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
9 days	ceph: fix pre-auth out-of-bounds read on snaptrace in ceph_handle_caps()	Bryam Vargas
	ceph_handle_caps() reads snap_trace_len from the wire-format ceph_mds_caps header and uses it unconditionally to build a fake end pointer (snaptrace + snaptrace_len) that is later handed to ceph_update_snap_trace() in the CEPH_CAP_OP_IMPORT case: snaptrace = h + 1; snaptrace_len = le32_to_cpu(h->snap_trace_len); p = snaptrace + snaptrace_len; ... case CEPH_CAP_OP_IMPORT: if (snaptrace_len) { ... if (ceph_update_snap_trace(mdsc, snaptrace, snaptrace + snaptrace_len, false, &realm)) { ... } ceph_update_snap_trace() then decodes a struct ceph_mds_snap_realm from snaptrace using ceph_decode_need(&p, e, sizeof(ri), bad) with the attacker-supplied fake end e == snaptrace + snaptrace_len. With snaptrace_len == 0xFFFFFFFF the bound check is trivially satisfied, ri = p reads sizeof(struct ceph_mds_snap_realm) past the legitimate msg->front buffer, and ri->num_snaps / ri->num_prior_parent_snaps then drive further out-of-bounds reads of the encoded snap arrays. The eleven msg_version >= 2 .. msg_version >= 12 decoder blocks above the op switch each catch this OOB through their ceph_decode_*_safe() / ceph_decode_need() helpers, but they sit behind a hdr.version-gated if, so a malicious or compromised MDS that sets msg->hdr.version = 1 reaches the IMPORT path with no version-gated decoder having validated snap_trace_len. The shape has been present since ceph_handle_caps() was introduced. Validate snap_trace_len against the message front buffer before consuming it, using the canonical ceph_decode_need() / ceph_has_room() helper. The helper bounds the length with subtraction (n <= end - p, guarded by end >= p) rather than pointer addition, so it is wrap-safe for the attacker-controlled u32 length on 32-bit builds where p + snap_trace_len could overflow the address space. This matches the rest of the ceph decode path (e.g. the pool_ns_len check a few lines below), and the existing goto bad cleanup already covers this exit path.
Cc: stable@vger.kernel.org Fixes: a8599bd821d0 ("ceph: capability management") Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
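To illustrate why the subtraction form of the bounds check is wrap-safe, here is a small userspace sketch. Addresses are modelled as 32-bit integers so the wraparound is well defined; the function names are hypothetical and this is not the ceph_has_room() source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Addresses are modelled as 32-bit integers so that wraparound is
 * well defined; real code would compare pointers.
 */

/* Naive check: p + n can wrap past zero and then compare as "in bounds". */
static bool room_by_addition(uint32_t p, uint32_t end, uint32_t n)
{
	return (uint32_t)(p + n) <= end;
}

/* Subtraction form: n <= end - p, guarded by end >= p. Cannot wrap. */
static bool room_by_subtraction(uint32_t p, uint32_t end, uint32_t n)
{
	return end >= p && n <= end - p;
}
```

For a 0x100-byte buffer at 0xC0000000 and a length of 0xFFFFFFFF, the addition form accepts the length because the sum wraps to 0xBFFFFFFF, while the subtraction form rejects it.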
9 days	eventpoll: pin files while checking reverse paths	Guidong Han
	Commit 319c15174757 ("epoll: take epitem list out of struct file") intentionally removed temporary file references from the reverse path check list. At the time, both epitems and their files were freed after an RCU grace period, so unlist_file() could obtain file->f_lock through an epitem while clear_tfile_check_list() held rcu_read_lock(). Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") made struct file SLAB_TYPESAFE_BY_RCU and removed its RCU-delayed freeing. RCU still protects the epitem, but no longer keeps the referenced file from being freed and reused. A concurrent close can therefore make unlist_file() lock or unlock f_lock in a recycled file object. This violates the documented SLAB_TYPESAFE_BY_RCU rule requiring a reference before acquiring an object's lock. The race was reproduced, causing a wild unlock of f_lock in a recycled file and breaking its mutual exclusion. Add ->file to epitems_head to remember the pinned file independently of ->epitems. A concurrent EPOLL_CTL_DEL can empty ->epitems before the head is unlisted, leaving no epi->ffd.file from which to drop the reference. In list_file(), acquire the reference before adding the head to the check list. The caller either owns a reference or holds the ep->mtx for the epitem leading to the file. In the latter case, file_ref_get() can fail after the last reference is dropped, but eventpoll_release_file() must acquire the same mutex before the file can be freed. The dying leaf can be skipped because removing links cannot increase the reverse path count. In unlist_file(), epnested_mutex excludes another list_file() or unlist_file(), while head->next prevents a concurrent EPOLL_CTL_DEL from freeing the head. Save head->file locally, clear it with head->next under f_lock, and drop the reference after the RCU-protected operation. Christian Brauner <brauner@kernel.org> quotes: > SLAB_TYPESAFE_BY_RCU allows a slab slot to be reused while an RCU reader > still holds its old address. 
Once that address contains a new live > struct file, KASAN sees valid, unpoisoned memory and cannot distinguish > the stale object identity. CONFIG_DEBUG_SPINLOCK exposes the failure > instead. > > The failing interleaving is: > > CPU0: nested EPOLL_CTL_ADD CPU1: close/open churn > ------------------------------------ --------------------------------- > p = hlist_first_rcu(&head->epitems) > epi = container_of(p, ...) > close(victim) > __fput() > eventpoll_release_file() > file_free(victim) > // the slot is free; f_lock remains > spin_lock(&epi->ffd.file->f_lock) > open() reuses the slot as new_file > spin_lock_init(&new_file->f_lock) > spin_unlock(&epi->ffd.file->f_lock) // wild unlock of new_file's lock > > CONFIG_DEBUG_SPINLOCK reports: > > BUG: spinlock already unlocked on CPU#0, poc_unlist/150 > lock: 0xffff8880067fb200, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1 > CPU: 0 UID: 1000 PID: 150 Comm: poc_unlist Not tainted 7.2.0-rc3-dirty #22 PREEMPTLAZY > Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 > Call Trace: > <TASK> > dump_stack_lvl+0x64/0x80 > do_raw_spin_unlock+0x75/0xb0 > _raw_spin_unlock+0xe/0x30 > clear_tfile_check_list+0x88/0xe0 > do_epoll_ctl_file+0x519/0xcf0 > ? __pfx_ep_ptable_queue_proc+0x10/0x10 > do_epoll_ctl+0x8f/0x100 > __x64_sys_epoll_ctl+0x6f/0xa0 > do_syscall_64+0xdc/0x520 > ? 
srso_alias_return_thunk+0x5/0xfbef5 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > RIP: 0033:0x42034e > Code: 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 e9 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007a657ff3c198 EFLAGS: 00000202 ORIG_RAX: 00000000000000e9 > RAX: ffffffffffffffda RBX: 00007a657ff3ccdc RCX: 000000000042034e > RDX: 0000000000000003 RSI: 0000000000000001 RDI: 0000000000000004 > RBP: 00007a657ff3c2f0 R08: 0000000000000000 R09: 00007a657ff3c6c0 > R10: 00007a657ff3c1a4 R11: 0000000000000202 R12: 00007a657ff3c6c0 > R13: ffffffffffffffb8 R14: 000000000000000d R15: 00007fffb7de0210 > </TASK> > ------------[ cut here ]------------ > > unlist_file() does not appear as a separate frame because it was inlined > into clear_tfile_check_list(). This report was obtained with mdelay() > instrumentation immediately before spin_lock() and spin_unlock() in > unlist_file() to widen the two race windows. > > More importantly, this is a wild unlock. The stale unlock can target > f_lock of a different live file and invalidate mutual exclusion for > state protected by that lock. Turning this into a reliable exploit > would require precise scheduling and same-slot reuse and is likely > difficult, but the primitive is potentially exploitable. Reported-by: Qi Tang <tpluszz77@gmail.com> Reported-by: Junxi Qian <qjx1298677004@gmail.com> Fixes: 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") Cc: stable@vger.kernel.org Signed-off-by: Guidong Han <2045gemini@gmail.com> Link: https://patch.msgid.link/20260718104406.27897-1-2045gemini@gmail.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
9 days	fs: push nr_cached_objects memcg gating into individual filesystems	Usama Arif
	Commit 0baad6f9b997 ("fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink") added a check in fs/super.c that skipped every ->nr_cached_objects() hook whenever the shrinker was invoked for a non-root memcg, on the assumption that none of them honour sc->memcg. That assumption is wrong for XFS, whose inode-reclaim hook is intentionally driven from per-memcg contexts to free memcg-charged slab. Encoding a blanket "never memcg-aware" policy in fs/super.c short-circuits that path. Push the check down into the callbacks whose counters really are irrelevant to per-memcg reclaim - btrfs_nr_cached_objects() and shmem_unused_huge_count() - and drop the fs/super.c gate. Each filesystem can now lift the restriction independently if its counter later grows memcg awareness, without touching fs/super.c. Introduce mem_cgroup_shrink_is_root() in <linux/memcontrol.h> so the callbacks don't open-code "sc->memcg is NULL or root". Fixes: 0baad6f9b997 ("fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink") Acked-by: Qi Zheng <qi.zheng@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Usama Arif <usama.arif@linux.dev> Link: https://patch.msgid.link/20260715103516.2410175-1-usama.arif@linux.dev Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
9 days	afs: Fix afs_edit_dir_remove() to get, not find, block 0	David Howells
	Fix afs_edit_dir_remove() to use afs_dir_get_block() to get block 0 rather than afs_dir_find_block() as the latter caches the found block in the afs_dir_iter and may[*] switch out the page it's on if another afs_dir_find_block() is done. This parallels what afs_edit_dir_add() does. [*] There's more than one block per page. Fixes: a5b5beebcf96 ("afs: Use the contained hashtable to search a directory") Closes: https://sashiko.dev/#/patchset/20260706153408.1231650-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/2380759.1783956175@warthog.procyon.org.uk cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
9 days	iomap: prevent ioend merge when io_private differs	Zhang Yi
	Different io_private values indicate distinct completion contexts that must not be merged together, as this could leak or corrupt the private data associated with each ioend. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260713074206.1768006-1-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
9 days	Merge patch series "iomap: trivial fixes for ext4 conversion"	Christian Brauner
	Zhang Yi <yi.zhang@huaweicloud.com> says: This patch series contains a few trivial iomap-related fixes in preparation for converting ext4 buffered I/O to use iomap. The first three patches are taken from my ext4 conversion series [1], as suggested by Christoph. The fourth patch fixes a bug originally reported by Sashiko during review of my series; although unrelated to the ext4 conversion, it is worth fixing on its own. Please see the following patches for details. The fifth patch adds comments for ifs_clear/set_range_dirty(), and the last patch avoids merging ioends that have different private data. [1] https://lore.kernel.org/linux-ext4/20260511072344.191271-1-yi.zhang@huaweicloud.com/ * patches from https://patch.msgid.link/20260714082325.325163-1-yi.zhang@huaweicloud.com: iomap: add comments for ifs_clear/set_range_dirty() iomap: fix out-of-bounds bitmap_set() with zero-length range iomap: fix incorrect did_zero setting in iomap_zero_iter() iomap: support invalidating partial folios iomap: correct the range of a partial dirty clear Link: https://patch.msgid.link/20260714082325.325163-1-yi.zhang@huaweicloud.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
9 days	iomap: add comments for ifs_clear/set_range_dirty()	Zhang Yi
	The range alignment strategy differs between ifs_clear_range_dirty() and ifs_set_range_dirty(). The former rounds inwards to clear only fully-covered blocks, while the latter rounds outwards to mark any partially-touched block as dirty. Add comments to document this asymmetry in block range calculation. Suggested-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260714082325.325163-6-yi.zhang@huaweicloud.com Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
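The inward/outward asymmetry described above can be sketched in userspace C. This is an illustration under assumptions, not the iomap code: the struct and function names are hypothetical, and the outward variant assumes a non-zero length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: first block index and number of blocks. */
struct blk_span {
	size_t first;
	size_t nr;
};

/* Round outwards: every block touched by [off, off + len) is included.
 * The caller is assumed to pass len > 0. */
static struct blk_span span_outward(size_t off, size_t len, unsigned int blkbits)
{
	size_t first = off >> blkbits;
	size_t last = (off + len - 1) >> blkbits;
	struct blk_span s = { first, last - first + 1 };

	return s;
}

/* Round inwards: only blocks fully covered by [off, off + len) are included. */
static struct blk_span span_inward(size_t off, size_t len, unsigned int blkbits)
{
	size_t blksz = (size_t)1 << blkbits;
	size_t first = (off + blksz - 1) >> blkbits;	/* round start up */
	size_t end = (off + len) >> blkbits;		/* round end down (exclusive) */
	struct blk_span s = { first, end > first ? end - first : 0 };

	return s;
}
```

With 1024-byte blocks, the byte range [512, 2560) touches three blocks but fully covers only block 1, so the outward span is three blocks and the inward span is one.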
9 days	iomap: fix out-of-bounds bitmap_set() with zero-length range	Zhang Yi
	ifs_set_range_dirty() and ifs_set_range_uptodate() compute last_blk as (off + len - 1) >> i_blkbits. When off is 0 and len is 0, the unsigned subtraction underflows to SIZE_MAX, producing a huge last_blk and nr_blks value that causes bitmap_set() to write far beyond the ifs->state allocation. Regarding ifs_set_range_uptodate(), it is temporarily safe because len cannot be passed in as 0. However, for ifs_set_range_dirty() this is reachable from __iomap_write_end(): when copy_folio_from_iter_atomic() returns 0 (e.g. user buffer fault) and the folio is already uptodate, the guard at the top of __iomap_write_end() does not trigger because !folio_test_uptodate() is false, and iomap_set_range_dirty() is called with copied == 0. Add a !len guard to both functions before the computation, so that a zero-length range is a no-op. Fixes: 4ce02c679722 ("iomap: Add per-block dirty state tracking to improve performance") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260714082325.325163-5-yi.zhang@huaweicloud.com Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
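The underflow and the `!len` guard can be reproduced in a few lines of userspace C. This is only a sketch of the arithmetic described above; the struct and function names are hypothetical stand-ins, not the iomap helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the block-range computation. */
struct blk_range {
	size_t first_blk;
	size_t nr_blks;
};

/* Unguarded: with off == 0 and len == 0, (off + len - 1) wraps to SIZE_MAX. */
static struct blk_range range_unguarded(size_t off, size_t len, unsigned int blkbits)
{
	size_t first = off >> blkbits;
	size_t last = (off + len - 1) >> blkbits;
	struct blk_range r = { first, last - first + 1 };

	return r;
}

/* Guarded: a zero-length range is treated as a no-op. */
static struct blk_range range_guarded(size_t off, size_t len, unsigned int blkbits)
{
	struct blk_range r = { 0, 0 };

	if (!len)
		return r;
	return range_unguarded(off, len, blkbits);
}
```

With 4096-byte blocks, the unguarded version reports (SIZE_MAX >> 12) + 1 blocks for an empty range at offset 0, which is the kind of count that would send a bitmap write far past the allocation.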
9 days	iomap: fix incorrect did_zero setting in iomap_zero_iter()	Zhang Yi
	The did_zero output parameter was unconditionally set after the loop, which is incorrect. It should only be set when the zeroing operation actually completes, not when IOMAP_F_STALE is set or when IOMAP_F_FOLIO_BATCH is set but !folio causes the loop to break early, or when iomap_iter_advance() returns an error. This causes did_zero to be incorrectly set when zeroing a clean unwritten extent because the loop exits early without actually zeroing any data. Fix it by using a local variable to track whether any folio was actually zeroed, and only set did_zero after the loop if zeroing happened. Fixes: 98eb8d95025b ("iomap: set did_zero to true when zeroing successfully") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260714082325.325163-4-yi.zhang@huaweicloud.com Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
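The local-flag pattern described above might look roughly like this. It is a toy model under assumptions: the step enum, the function names, and the error value are hypothetical and do not mirror iomap_zero_iter().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-iteration outcomes standing in for the real loop body. */
enum step { STEP_ZEROED, STEP_NOTHING_TO_ZERO, STEP_STALE, STEP_ERROR };

static int zero_iter_sketch(const enum step *steps, size_t n, bool *did_zero)
{
	bool zeroed = false;	/* local flag: has anything been zeroed? */
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		if (steps[i] == STEP_STALE)
			break;		/* early exit without zeroing */
		if (steps[i] == STEP_ERROR) {
			ret = -1;
			break;
		}
		if (steps[i] == STEP_ZEROED)
			zeroed = true;
	}

	/* Report only real work instead of setting the flag unconditionally. */
	if (did_zero && zeroed)
		*did_zero = true;
	return ret;
}

/* Convenience wrapper: returns the flag a caller would observe. */
static bool reports_zeroing(const enum step *steps, size_t n)
{
	bool did_zero = false;

	(void)zero_iter_sketch(steps, n, &did_zero);
	return did_zero;
}
```

A loop that exits immediately on a stale mapping leaves the flag untouched, whereas the unconditional assignment criticised in the message would have reported zeroing.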
9 days	iomap: support invalidating partial folios	Zhang Yi
	Current iomap_invalidate_folio() can only invalidate an entire folio. If we truncate a partial folio on a filesystem where the block size is smaller than the folio size, it will leave behind dirty bits for the truncated or punched blocks. During the write-back process, it will attempt to map the invalid hole range. Fortunately, this has not caused any real problems so far because the ->writeback_range() function corrects the length. However, the implementation of FALLOC_FL_ZERO_RANGE in ext4 depends on the support for invalidating partial folios. When ext4 partially zeroes out a dirty and unwritten folio, it does not perform a flush first like XFS. Therefore, if the dirty bits of the corresponding area cannot be cleared, the zeroed area after writeback remains in the written state rather than reverting to the unwritten state. Fix this by supporting invalidation of partial folios. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260714082325.325163-3-yi.zhang@huaweicloud.com Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
9 days	iomap: correct the range of a partial dirty clear	Zhang Yi
	The block range calculation in ifs_clear_range_dirty() is incorrect when partially clearing a range in a folio. We cannot clear the dirty bit of the first block or the last block if the start or end offset is not blocksize-aligned. This has not yet caused any issues since we always clear a whole folio in iomap_writeback_folio(). Fix this by rounding up the first block to blocksize alignment, and calculate the last block by rounding down (using truncation). Correct the nr_blks calculation accordingly. Fixes: 4ce02c679722 ("iomap: Add per-block dirty state tracking to improve performance") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260714082325.325163-2-yi.zhang@huaweicloud.com Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
9 days	fs/super: fix emergency thaw double-unlock of s_umount	Chen Changcheng
	do_thaw_all() iterates over all superblocks via __iterate_supers() with SUPER_ITER_EXCL, which acquires s_umount exclusively before calling the callback and releases it afterwards. However, the callback do_thaw_all_callback() calls thaw_super_locked() which unconditionally releases s_umount on every code path. This results in a second unlock attempt in __iterate_supers() that corrupts the rwsem state, triggering a DEBUG_RWSEMS warning: [ 182.601148] sysrq: Emergency Thaw of all frozen filesystems [ 182.601865] ------------[ cut here ]------------ [ 182.602375] DEBUG_RWSEMS_WARN_ON((rwsem_owner(sem) != current) && !rwsem_test_oflags(sem, RWSEM_NONSPINNABLE)): count = 0x0, magic = 0xffff99b1011e5870, owner = 0x0, curr 0xffff99b101b06c80, list not empty [ 182.603817] WARNING: kernel/locking/rwsem.c:1412 at up_write+0xa3/0x170, CPU#2: kworker/2:1/53 [ 182.604578] Modules linked in: [ 182.604864] CPU: 2 UID: 0 PID: 53 Comm: kworker/2:1 Not tainted 7.2.0-rc4-00001-gbd3bd93ea98a-dirty #4 PREEMPT(lazy) [ 182.605711] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1kylin1 04/01/2014 [ 182.606417] Workqueue: events do_thaw_all [ 182.606750] RIP: 0010:up_write+0xaf/0x170 [ 182.607076] Code: 19 3a 92 48 0f 44 c2 48 8b 55 08 48 8b 55 00 4c 8b 45 08 48 8b 55 00 48 8d 3d ad 91 e0 01 48 8b 4d 20 50 48 c7 c6 f0 8c 26 92 <67> 48 0f b9 3a e8 d7 93 4e 00 58 eb 81 48 83 7f 18 00 48 c7 c2 8d [ 182.608563] RSP: 0018:ffffb670001d7e08 EFLAGS: 00010246 [ 182.609007] RAX: ffffffff92349e8d RBX: 0000000000000000 RCX: ffff99b1011e5870 [ 182.609595] RDX: 0000000000000000 RSI: ffffffff92268cf0 RDI: ffffffff92914d10 [ 182.610283] RBP: ffff99b1011e5870 R08: 0000000000000000 R09: ffff99b101b06c80 [ 182.610847] R10: ffff99b10139a808 R11: fefefefefefefeff R12: 0000000000000000 [ 182.611414] R13: ffffffff90cf74d0 R14: 0000000000000000 R15: ffff99b1011e5800 [ 182.612009] FS: 0000000000000000(0000) GS:ffff99b1eaaee000(0000) knlGS:0000000000000000 [ 182.612670] CS: 0010 DS: 0000 ES: 
0000 CR0: 0000000080050033 [ 182.613146] CR2: 00000000005c631c CR3: 00000000013ee000 CR4: 00000000000006f0 [ 182.613722] Call Trace: [ 182.613946] <TASK> [ 182.614130] __iterate_supers+0x128/0x150 [ 182.614463] do_thaw_all+0x1b/0x30 [ 182.614759] process_scheduled_works+0xbb/0x3f0 [ 182.615150] ? __pfx_worker_thread+0x10/0x10 [ 182.615499] worker_thread+0x129/0x270 [ 182.615816] ? __pfx_worker_thread+0x10/0x10 [ 182.616201] kthread+0xe2/0x120 [ 182.616469] ? __pfx_kthread+0x10/0x10 [ 182.616792] ret_from_fork+0x15b/0x240 [ 182.617115] ? __pfx_kthread+0x10/0x10 [ 182.617426] ret_from_fork_asm+0x1a/0x30 [ 182.617761] </TASK> [ 182.617968] ---[ end trace 0000000000000000 ]--- [ 182.618412] Emergency Thaw complete Fix this by switching to SUPER_ITER_UNLOCKED and acquiring s_umount in the callback via super_lock_excl() before calling thaw_super_locked(). This matches the locking pattern expected by thaw_super_locked() and eliminates the double unlock. While at it, remove the dead 'return;' at the end of do_thaw_all_callback(). Fixes: 2992476528ae ("super: use a common iterator (Part 1)") Cc: stable@vger.kernel.org Signed-off-by: Chen Changcheng <chenchangcheng@kylinos.cn> Link: https://patch.msgid.link/20260721064140.152305-1-chenchangcheng@kylinos.cn Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days	ksmbd: reject undersized decompressed SMB2 requests	Namjae Jeon
	ksmbd_decompress_request() bounds the decompressed size only against the maximum request size. A compression transform can therefore produce a buffer smaller than an SMB2 PDU and install it as conn->request_buf. The receive path subsequently calls ksmbd_smb_request(), which reads the protocol ID before the normal SMB2 minimum-size check. If the decompressed output is too short, that read can access beyond the request allocation. Require the decompressed output to contain at least a complete minimum SMB2 PDU before allocating and installing the replacement request buffer. Fixes: a08de24c2b85 ("ksmbd: negotiate and decode SMB2 compression") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
10 days	ksmbd: validate minimum PDU size for transform requests	Namjae Jeon
	The receive path applies the minimum SMB2 PDU size check only when ProtocolId is SMB2_PROTO_NUMBER. A packet carrying SMB2_TRANSFORM_PROTO_NUM bypasses the check even when the negotiated dialect does not provide transform handling. On an SMB 2.1 connection, a short transform packet therefore reaches init_smb2_rsp_hdr(), which interprets the request as a full SMB2 header and reads beyond the request allocation. The copied fields can then be returned to the unauthenticated client. Compression transforms are converted to ordinary SMB2 messages before protocol validation. After that conversion, validate ordinary SMB2 requests against SMB2_MIN_SUPPORTED_PDU_SIZE and require encryption transform requests to contain both a transform header and an SMB2 header. This rejects truncated requests before work allocation. Fixes: 368ba06881c3 ("ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loop") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-31063 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
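A minimal sketch of the two size rules described above, written as userspace C. The numeric sizes are assumptions taken from the SMB2 wire format (64-byte SMB2 header, 52-byte transform header) and the names are hypothetical; the real constants and checks live in the ksmbd sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes for illustration; the real constants are in the ksmbd headers. */
#define SMB2_HDR_SIZE		64u			/* fixed SMB2 header */
#define SMB2_MIN_PDU_SIZE	(SMB2_HDR_SIZE + 2u)	/* header plus StructureSize */
#define SMB2_TRANSFORM_SIZE	52u			/* SMB2 TRANSFORM_HEADER */

enum pdu_kind { PDU_SMB2, PDU_TRANSFORM };

/* Reject truncated requests before any further parsing. */
static bool pdu_size_ok(enum pdu_kind kind, size_t pdu_size)
{
	switch (kind) {
	case PDU_SMB2:
		return pdu_size >= SMB2_MIN_PDU_SIZE;
	case PDU_TRANSFORM:
		/* Must hold the transform header and an inner SMB2 header. */
		return pdu_size >= SMB2_TRANSFORM_SIZE + SMB2_HDR_SIZE;
	}
	return false;
}
```

Under these assumed sizes, a transform packet that carries only its 52-byte header is rejected, since nothing behind it could be read as an SMB2 header.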
10 days	ksmbd: defer destroy_previous_session() until after NTLM authentication	James Montgomery
	In ntlm_authenticate(), destroy_previous_session() is called using a user pointer resolved from the client-supplied NTLM blob username field before the NTLMv2 response is validated. An authenticated attacker can set the NTLM blob username to match a victim account and set PreviousSessionId to the victim's session ID; destroy_previous_session() destroys the victim's session while ksmbd_decode_ntlmssp_auth_blob() subsequently rejects the request with -EPERM. Move destroy_previous_session() and the prev_id assignment to after ksmbd_decode_ntlmssp_auth_blob() returns success and use sess->user rather than the pre-authentication lookup result. This matches the ordering already used by krb5_authenticate(), where destroy_previous_session() is called only after ksmbd_krb5_authenticate() returns success. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-cifs/20260702155449.3639773-1-james_montgomery@disroot.org/ Signed-off-by: James Montgomery <james_montgomery@disroot.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
10 days	ksmbd: validate ACE size against SID sub-authorities	Namjae Jeon
	set_ntacl_dacl() validates sid.num_subauth before copying an ACE, but does not verify that the declared ACE size contains all sub-authorities described by that field. An undersized ACE can therefore be copied and later make the POSIX ACL deduplication walk inspect data beyond the copied ACE boundary. The existing initial bound check is also too small. It only ensures that the ACE size field is accessible before set_ntacl_dacl() reads sid.num_subauth farther into the input buffer. Require enough input for the fixed SID header before accessing num_subauth, reject ACEs smaller than that header, and skip ACEs whose declared size cannot contain the complete SID. This makes the validation consistent with the other ACE walk paths. Reported-by: LocalHost <localhost.detect@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
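The validation order described above can be sketched as a standalone check. This is an assumption-laden illustration rather than the patched set_ntacl_dacl(): the `SK_`/`sk_` names are hypothetical, and the layout (8-byte ACE header, 8-byte fixed SID header, 4 bytes per sub-authority, at most 15 sub-authorities) follows the on-wire ACE and SID format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; sizes follow the on-wire layout. */
#define SK_ACE_HDR_SIZE    8u  /* type, flags, 16-bit size, access mask */
#define SK_SID_FIXED_SIZE  8u  /* revision, num_subauth, 6-byte authority */
#define SK_SID_MAX_SUBAUTH 15u

/*
 * Return true if the ACE at 'ace' is fully contained in 'remaining'
 * bytes and its declared size covers every sub-authority of its SID.
 */
static bool sk_ace_fits(const uint8_t *ace, size_t remaining)
{
	uint16_t ace_size;
	uint8_t num_subauth;

	/* Enough input for the fixed SID header before reading num_subauth. */
	if (remaining < SK_ACE_HDR_SIZE + SK_SID_FIXED_SIZE)
		return false;

	ace_size = (uint16_t)(ace[2] | ace[3] << 8); /* little-endian size */
	num_subauth = ace[SK_ACE_HDR_SIZE + 1];

	/* Declared size must cover the fixed headers and stay in bounds. */
	if (ace_size < SK_ACE_HDR_SIZE + SK_SID_FIXED_SIZE || ace_size > remaining)
		return false;
	if (num_subauth > SK_SID_MAX_SUBAUTH)
		return false;

	/* Declared size must contain the complete SID. */
	return ace_size >= SK_ACE_HDR_SIZE + SK_SID_FIXED_SIZE + 4u * num_subauth;
}
```

A caller can treat a false result as "skip this ACE" or "stop parsing", depending on which of the failing conditions it wants to distinguish; the sketch leaves that policy out.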
10 days	ksmbd: restore DACL size on check_add_overflow() to avoid malformed ACL	Wentao Guan
	check_add_overflow() unconditionally writes the truncated sum into d even on overflow, per its contract in include/linux/overflow.h. The four check_add_overflow() guards in set_posix_acl_entries_dacl() and set_ntacl_dacl() break out of the ACE-building loops on overflow, but the truncated size is then consumed downstream at the end of set_ntacl_dacl(): pndacl->size = cpu_to_le16(le16_to_cpu(pndacl->size) + size); This produces an on-wire NT ACL whose pndacl->size under-reports the bytes actually written by the preceding fill_ace_for_sid()/memcpy() calls, yielding a malformed ACL that can trigger out-of-bounds reads when re-parsed by clients or ksmbd itself. Restore size to its pre-addition value on each overflow branch (via `size -= ace_sz` / `size -= nt_ace_size`) so that after the break, *size once again holds the cumulative size of the successfully-written ACEs. The committed ACL is then truncated-but-self-consistent rather than malformed. The ksmbd DACL builders are the only check_add_overflow() sites found where an overflow path breaks out of a loop and the destination value is consumed afterward. The other nearby break-style cases either return -EINVAL on overflow (transport_ipc.c) or break without consuming the overflowed destination value afterward (buildid.c). Fixes: 299f962c0b02 ("ksmbd: use check_add_overflow() to prevent u16 DACL size overflow") Assisted-by: atomcode:glm-5.2 Assisted-by: Codex:gpt-5.5 Cc: stable@vger.kernel.org Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
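The restore-on-overflow pattern described above can be tried outside the kernel. The sketch below uses the GCC/Clang builtin `__builtin_add_overflow()` as a stand-in for check_add_overflow(); like the kernel helper, it stores the wrapped sum in the destination even when it reports overflow. The function name `sk_accumulate` and its array-of-sizes interface are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add ACE sizes into *size until one would overflow the 16-bit total.
 * Returns the number of sizes accounted for; on return *size is the
 * cumulative size of exactly those entries.
 */
static size_t sk_accumulate(const uint16_t *ace_sizes, size_t n, uint16_t *size)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (__builtin_add_overflow(*size, ace_sizes[i], size)) {
			/*
			 * *size now holds the wrapped sum. Subtracting the
			 * same addend wraps back to the pre-addition value.
			 */
			*size -= ace_sizes[i];
			break;
		}
	}
	return i;
}
```

Without the subtraction, the caller would read the wrapped total after the break and under-report the bytes already written, which is the inconsistency the commit removes.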
10 days	ksmbd: bound DACL dedup walk to copied ACEs	Namjae Jeon
	set_ntacl_dacl() can stop copying ACEs before consuming the full input DACL when size accounting overflows. When that happens, num_aces reflects only the ACEs that were actually copied into the output DACL, but set_posix_acl_entries_dacl() still receives nt_num_aces and uses it to walk the existing ACE array during dedup. That makes the dedup walk scan past the copied ACE array and inspect buffer tail that does not contain valid ACEs. Split the two meanings currently carried by the NT ACE count. Pass the number of copied NT ACEs to bound the dedup walk, and preserve the original "input DACL had NT ACEs" state separately for the Everyone/default ACL fallback. This keeps the dedup walk aligned with the ACEs that are actually present in the rebuilt DACL. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
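The split between "how many ACEs were copied" and "did the input have NT ACEs" can be sketched as two separate values. Everything here is hypothetical and heavily simplified: `struct sk_ace` reduces a SID to one integer, and `struct sk_nt_state` merely names the two meanings the commit separates; it is not the ksmbd interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified ACE: one integer stands in for the SID. */
struct sk_ace {
	uint32_t sid_key;
};

/* The two meanings previously carried by a single NT ACE count. */
struct sk_nt_state {
	size_t copied;       /* ACEs actually present in the rebuilt DACL */
	bool input_had_aces; /* original input state, kept for fallback logic */
};

/*
 * Dedup lookup bounded by the copied count, so slots past the copied
 * ACEs (stale buffer tail) are never inspected.
 */
static bool sk_dedup_hit(const struct sk_ace *out,
			 const struct sk_nt_state *st, uint32_t key)
{
	for (size_t i = 0; i < st->copied; i++) {
		if (out[i].sid_key == key)
			return true;
	}
	return false;
}
```

Bounding the loop by `copied` rather than by the input count keeps the walk inside the ACEs that exist, while `input_had_aces` remains available for any decision that depends on the original DACL.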