linux-stable.git - Linux kernel stable tree

Age	Commit message	Author
47 hours	Merge tag 'block-7.2-20260731' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux	Linus Torvalds
	Pull block fixes from Jens Axboe:
	 - A set of fixes for s390/dasd, via Stefan
	 - Fix for a missing stop of the timeout timer, if a disk has never been added
	 - Clear kernel owned fields on ublk setup by default
	* tag 'block-7.2-20260731' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
	  s390/dasd: Fix undersized format-check buffer
	  s390/dasd: Fix potential NULL pointer dereference
	  s390/dasd: Fix path verification interrupted by concurrent dasd_sleep_on_immediatly
	  block: stop the timeout timer when releasing a never added disk
	  ublk: reset kernel-owned dev_info fields in ublk_ctrl_add_dev()
6 days	ublk: reset kernel-owned dev_info fields in ublk_ctrl_add_dev()	Ming Lei
	ublk_ctrl_add_dev() memcpy()s the userspace ublksrv_ctrl_dev_info into ub->dev_info and then fixes up the fields the driver owns, but misses ->state and ->ublksrv_pid. A device added with ->state = UBLK_S_DEV_LIVE passes the "->state != UBLK_S_DEV_DEAD" test that ublk_stop_dev_unlocked() uses as its proxy for "a disk is attached", while ->ub_disk is still NULL, so DEL_DEV right after ADD_DEV oopses in del_gendisk(). UBLK_S_DEV_QUIESCED plus UBLK_F_USER_RECOVERY dies one step earlier, in ublk_force_abort_dev(). A poisoned ->state also gets START_USER_RECOVERY and the char device read/write path onto a device that was never started, and wedges START_DEV at -EEXIST. A poisoned ->ublksrv_pid just makes GET_DEV_INFO report an unrelated task as the ublk server. Reset both after the memcpy(), as ublk_detach_disk() does. Userspace only ever reads these back, so correcting them silently breaks nothing. ADD_DEV has copied ->state in unsanitized since ublk was merged, but back then it was harmless: the gendisk was allocated during ADD_DEV, and both teardown and the START_DEV -EEXIST check keyed off disk_live() rather than ->state. The oops became reachable once the disk allocation moved to START_DEV and those checks switched to ->state. Fixes: 6d9e6dfdf3b2 ("ublk: defer disk allocation") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://patch.msgid.link/20260726145025.1507383-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
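The pattern this fix applies (copy the caller's struct wholesale, then overwrite every field the kernel owns) can be sketched in plain userspace C. This is only an illustration: the struct layout, the `DEMO_*` constants, the `demo_add_dev()` name, and the choice of -1 as the "no server" pid are assumptions made for the example, not the ublk driver's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the device states. */
enum demo_state {
    DEMO_S_DEV_DEAD = 0,
    DEMO_S_DEV_LIVE = 1,
    DEMO_S_DEV_QUIESCED = 2,
};

/* Hypothetical subset of a device-info struct shared with userspace. */
struct demo_dev_info {
    uint16_t nr_hw_queues;  /* chosen by userspace */
    uint16_t queue_depth;   /* chosen by userspace */
    uint16_t state;         /* owned by the kernel */
    int32_t  server_pid;    /* owned by the kernel */
};

/*
 * Copy the caller's struct, then reset the kernel-owned fields so that a
 * caller-supplied value can never be mistaken for real device state.
 */
void demo_add_dev(struct demo_dev_info *dst, const struct demo_dev_info *user)
{
    assert(dst != NULL && user != NULL);

    memcpy(dst, user, sizeof(*dst));
    dst->state = DEMO_S_DEV_DEAD;  /* no disk is attached yet */
    dst->server_pid = -1;          /* assumed "no server bound" marker */
}
```

With this shape, a caller that passes `state = DEMO_S_DEV_LIVE` still ends up with a device recorded as dead, so later teardown code that keys off `state` cannot be misled.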
9 days	Merge tag 'block-7.2-20260724' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux	Linus Torvalds
	Pull block fixes from Jens Axboe:
	 - Fix a ublk recovery hang, where END_USER_RECOVERY without a successful START_USER_RECOVERY could be satisfied by a stale completion latch
	 - Fix a stack out-of-bounds read in the CDROMVOLCTRL ioctl
	 - MAINTAINERS email address update for Roger Pau Monne
	* tag 'block-7.2-20260724' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
	  MAINTAINERS: update my email address
	  cdrom: fix stack out-of-bounds read in CDROMVOLCTRL
	  ublk: wait on ublk_dev_ready() instead of ub->completion
10 days	rbd: Reset positive result codes to zero in object map update path	Raphael Zimmer
	In a reply message to an RBD request, a positive result code indicates a data payload, which is not allowed for writes. While rbd_osd_req_callback() already resets a positive result code for writes to zero, rbd_object_map_callback() does not. This allows a corrupted reply to an object map update to trigger the rbd_assert(result < 0) in __rbd_obj_handle_request(). This happens, because rbd_object_map_callback() calls rbd_obj_handle_request() -> __rbd_obj_handle_request() and passes this positive result code. From __rbd_obj_handle_request(), rbd_obj_advance_write() is called, which leaves the positive result code unchanged and returns true. Therefore, the if(done && result) branch is executed in __rbd_obj_handle_request() and the assertion triggers. This patch fixes the issue by adjusting the logic in the rbd_object_map_callback() path. A positive result code for an object map update is now reset to zero (similar to rbd_osd_req_callback()), and the message is subsequently handled the same way as if the result code was zero from the beginning. Additionally, a WARN_ON_ONCE() is added for this case. Cc: stable@vger.kernel.org Fixes: 22e8bd51bb04 ("rbd: support for object-map and fast-diff") Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
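The normalisation step described above can be sketched as a small helper. The function name and the `warned` flag (standing in for `WARN_ON_ONCE()`) are hypothetical; the real change lives in the `rbd_object_map_callback()` path and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: a positive result code on a write-type reply would
 * indicate a data payload, which is not allowed, so treat it as zero and
 * record that something unexpected was seen. Negative codes are errors
 * and pass through unchanged.
 */
int demo_normalize_write_result(int result, bool *warned)
{
    assert(warned != NULL);

    if (result > 0) {
        *warned = true;  /* stands in for WARN_ON_ONCE() */
        return 0;
    }
    return result;
}
```

After this step the downstream handler only ever sees zero or a negative error, which is the invariant the assertion in the commit message relies on.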
2026-07-19	Merge tag 'block-7.2-20260717' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux	Linus Torvalds
	Pull block fixes from Jens Axboe:
	 - Fixes for the dio bounce buffer helpers: correct the alignment of bounced dio read bios to avoid a double unpin, handle huge zero folios in bio_free_folios(), and don't warn on the larger-order folio attempts in the greedy allocation path.
	 - Try a slab allocation in bio_alloc_bioset() before falling back to the mempool, restoring the previous behavior for non-sleeping allocations from a cache-enabled bioset.
	 - Serialize elevator changes for the same queue using the writer lock.
	 - Fix a race in blk_time_get_ns() where a task preempted between setting PF_BLOCK_TS and the cached-timestamp reload could return 0.
	 - blk-cgroup fix for leaks and the online flag on a radix_tree_insert() failure in blkg_create().
	 - Free the copied pages when blk_rq_map_kern() fails after blk_rq_append_bio() rejects the bio.
	 - Remove manually added partitions on loop device detach, fixing dead partition devices left behind and a subsequent LOOP_CONFIGURE -EBUSY
	 - Bound the AIX partition lvd scan to the sector that was actually read.
	 - Show the block operation in error injection rules (Jackie)
	* tag 'block-7.2-20260717' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
	  block: fix aligning of bounced dio read bios
	  block: handle huge zero folios in bio_free_folios
	  block: try slab allocation in bio_alloc_bioset() before mempool
	  block: show operation in error injection rules
	  block: serialize elevator changes for the same queue using a writer lock
	  block: free copied pages when blk_rq_map_kern() fails
	  block: do not warn when doing greedy allocation in folio_alloc_greedy()
	  partitions: aix: bound the lvd scan to one sector
	  blk-cgroup: fix leaks and online flag on radix_tree_insert failure
	  loop: remove manually added partitions on detach
	  block: fix race in blk_time_get_ns() returning 0
2026-07-19	ublk: wait on ublk_dev_ready() instead of ub->completion	Ming Lei
	ub->completion is only re-armed by a successful START_USER_RECOVERY. If the ublk server sends END_USER_RECOVERY without one - e.g. its START failed with -EBUSY and the error was ignored - the wait is satisfied by the stale completion of the previous recovery cycle, and the device is marked LIVE and the requeue list kicked while the FETCH stream is still running and ubq->canceling is still set. The kick redispatches a previously requeued request, __ublk_queue_rq_common() sees ->canceling and parks it again via __ublk_abort_rq(), and after the last FETCH clears ->canceling nothing ever kicks the requeue list again: the request is stranded there while holding its tag. If it is the flush machinery's flush_rq, every subsequent fsync piles up in uninterruptible sleep and teardown hangs on tag draining. This matches a report of a lost PREFLUSH with ext4 on top of ublk after daemon crash recovery. ub->completion is an edge-triggered latch used as a proxy for the level condition "every queue has fetched all I/O commands", which can regress (F_BATCH's UNPREP, daemon death) and whose re-arm can be skipped. Drop it and wait on the real condition instead: the new helper ublk_wait_dev_ready_and_lock() waits on ublk_dev_ready() via wait_var_event_interruptible(), woken from ublk_mark_io_ready(), then re-checks it under ub->mutex, waiting again on regression, and returns with the mutex held and readiness guaranteed. Readiness becomes true in the same ub->mutex critical section that clears the last queue's ->canceling, so END_USER_RECOVERY marks the device LIVE and kicks the requeue list strictly after ->canceling clears. The wait stays interruptible, so a server whose daemon died can still be signalled out. For ublk_ctrl_start_dev() this replaces the fail-fast -EINVAL on an F_BATCH ready->UNPREP regression with waiting until the device is ready again. 
Reported-by: George Salisbury <gsalisbury@apnic.net> Fixes: 728cbac5fe21 ("ublk: move device reset into ublk_ch_release()") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260719134540.120269-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
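The difference between an edge-triggered latch and the level condition it approximates can be shown with a deliberately simplified, single-threaded model. Everything below is hypothetical (`struct demo_dev` and the `demo_*` functions are invented for the illustration); the real code sleeps via `wait_var_event_interruptible()` and re-checks readiness under `ub->mutex`, which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: readiness means every queue has fetched its commands. */
struct demo_dev {
    int nr_queues;
    int nr_ready;
    bool latch;  /* edge-triggered: set once, never cleared on regression */
};

/* The level condition itself. */
bool demo_dev_ready(const struct demo_dev *d)
{
    assert(d != NULL);
    return d->nr_ready == d->nr_queues;
}

/* One more queue became ready; fire the latch on the last one. */
void demo_mark_queue_ready(struct demo_dev *d)
{
    assert(d != NULL && d->nr_ready < d->nr_queues);
    d->nr_ready++;
    if (demo_dev_ready(d))
        d->latch = true;
}

/* A queue regresses (for example after daemon death); the latch is left stale. */
void demo_queue_regress(struct demo_dev *d)
{
    assert(d != NULL && d->nr_ready > 0);
    d->nr_ready--;
}

/* Old behaviour: trust the latch. New behaviour: test the real condition. */
bool demo_end_recovery_latch(const struct demo_dev *d) { return d->latch; }
bool demo_end_recovery_level(const struct demo_dev *d) { return demo_dev_ready(d); }
```

After a regression the latch-based check still reports success while the level-based check does not, which is the stale-completion situation the commit message describes.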
2026-07-15	loop: remove manually added partitions on detach	Daan De Meyer
	Commit 267ec4d7223a ("loop: fix partition scan race between udev and loop_reread_partitions()") stopped disk_force_media_change() from setting GD_NEED_PART_SCAN because loop devices with LO_FLAGS_PARTSCAN rescan partitions explicitly. However, partitions can also be added manually with BLKPG while LO_FLAGS_PARTSCAN is clear. When such a loop device is detached, __loop_clr_fd() skips bdev_disk_changed(). Without GD_NEED_PART_SCAN, reopening the unbound device no longer performs the previous lazy cleanup, leaving dead partition devices behind. A subsequent LOOP_CONFIGURE can then fail its partition scan with -EBUSY, as seen in blktests loop/009 after loop/008. Call bdev_disk_changed() unconditionally during __loop_clr_fd(). The disk capacity is already zero and the release path holds open_mutex, so this drops all partitions without rescanning the detached backing file. The new blktests loop/013 case covers this sequence by adding a partition with BLKPG without LO_FLAGS_PARTSCAN, detaching the loop device, and checking that the partition is gone when the device is reopened. Fixes: 267ec4d7223a ("loop: fix partition scan race between udev and loop_reread_partitions()") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202607150754.b660f5b9-lkp@intel.com Signed-off-by: Daan De Meyer <daan@amutable.com> Link: https://patch.msgid.link/20260715-b4-loop-partition-cleanup-v1-1-b9f59910cd1e@amutable.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-07-11	Merge tag 'block-7.2-20260710' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux	Linus Torvalds
	Pull block fixes from Jens Axboe:
	 - Limit blk_hctx_poll() to one jiffy. Prevents buggy drivers from spinning for too long, hence triggering a stalled RCU read section warning
	 - Avoid a potential deadlock on zone revalidation failure, which could otherwise trigger a lockdep circular locking splat during a SCSI disk rescan
	 - Remove a redundant GD_NEED_PART_SCAN set in add_disk_final()
	 - Make writes to queue/wbt_lat_usec honor the WBT enable state
	 - ublk fix to snapshot the batch commands before preparing IO, so that userspace can't change an already processed tag and trip the WARN_ON_ONCE() in the rollback path
	 - xen-blkfront fix for a double completion of split requests on resume
	 - drbd fix to reject data replies with an out-of-range payload size
	* tag 'block-7.2-20260710' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
	  block: remove redundant GD_NEED_PART_SCAN in add_disk_final()
	  drbd: reject data replies with an out-of-range payload size
	  xen-blkfront: fix double completion of split requests on resume
	  ublk: snapshot batch commands before preparing I/O
	  block: Make WBT latency writes honor enable state
	  block: avoid potential deadlock on zone revalidation failure
	  blk-mq: bound blk_hctx_poll() to one jiffy
2026-07-10	drbd: reject data replies with an out-of-range payload size	Michael Bommarito
	recv_dless_read() receives a P_DATA_REPLY from a peer into the bio of an outstanding read request. The peer-supplied payload length reaches it as the signed int data_size, and two peer-controlled inputs can make it negative. With a negotiated data-integrity-alg the digest length is subtracted first, so a reply whose payload is smaller than the digest underflows data_size. With no integrity algorithm (the default) data_size is assigned from the unsigned h95/h100 wire length and drbdd() never bounds it for a payload-carrying command, so a length above INT_MAX casts it negative; this path needs no non-default feature. The bio receive loop then computes expect = min_t(int, data_size, bv_len), which is negative, and drbd_recv_all_warn(mapped, expect) receives with a size_t of SIZE_MAX into the first mapped page. The sibling receive path read_in_block() is not affected: it uses an unsigned size and rejects it against DRBD_MAX_BIO_SIZE before receiving. Reject a data reply whose size is negative after the optional digest subtraction, covering both triggers. Impact: a malicious or man-in-the-middle DRBD peer copies attacker-chosen bytes past a bio page in the receiver, corrupting kernel memory. A node that reads from its peer (a diskless node, or read-balancing to the peer) is exposed in the default configuration; data-integrity-alg is not required. Fixes: b411b3637fa7 ("The DRBD driver") Cc: stable@vger.kernel.org Assisted-by: Codex:gpt-5-5-xhigh Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://patch.msgid.link/20260710022837.3738461-1-michael.bommarito@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
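Both triggers described above (a payload shorter than the digest, and a wire length above INT_MAX) come down to a peer-supplied unsigned length ending up as a negative `int`. A minimal sketch of a check that rejects both is shown below. The helper name and signature are hypothetical, and doing the arithmetic in a 64-bit signed type is a choice made for this illustration; the actual patch rejects a negative `data_size` after the optional digest subtraction.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical validation helper. wire_len is the unsigned length taken
 * from the packet header, digest_size the negotiated digest length (0 when
 * no integrity algorithm is in use). Returns true and stores the payload
 * size only when that size fits in a non-negative int.
 */
bool demo_reply_size_ok(uint32_t wire_len, uint32_t digest_size, int *data_size)
{
    /* A wider signed type keeps the subtraction from wrapping. */
    int64_t size = (int64_t)wire_len - (int64_t)digest_size;

    assert(data_size != NULL);

    if (size < 0 || size > INT_MAX)
        return false;

    *data_size = (int)size;
    return true;
}
```

A caller would drop the connection or fail the request when the helper returns false, before any receive loop derives a length from the value.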
2026-07-09	xen-blkfront: fix double completion of split requests on resume	Doruk Tan Ozturk
	When a block request is too large for a single ring entry and the backend does not support indirect descriptors, blkfront splits it across two ring requests. This only happens when the frontend runs on a 64K-page kernel (e.g. arm64): there, even a single-page request may not fit in one ring slot and must be split. blkif_ring_get_request() is called twice and both shadow slots (shadow[id] and shadow[extra_id]) point at the same struct request, linked through associated_id. blkif_completion() collapses the pair on the normal completion path, recycling the second slot and completing the request once. The suspend/resume walk in blkfront_resume() does not: it visits every shadow slot with ->request set and calls blk_mq_end_request() or re-queues ->request. For an in-flight split request it therefore processes the shared struct request twice on resume/migration -- a double completion. Skip the secondary slot of a split request in the resume walk so each logical request is processed exactly once. The secondary slot is the linked one (associated_id != NO_ASSOCIATED_ID) that carries no scatter-gather list (num_sg == 0); the first slot always keeps the sg list. The bug is only reachable on suspend/resume or live migration of such a guest, so it has no local reproducer. Fixes: 6cc568339047 ("xen/blkfront: Handle non-indirect grant with 64KB pages") Assisted-by: 0sec:claude-opus-4-8 Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://patch.msgid.link/20260709100853.7489-1-doruk@0sec.ai Signed-off-by: Jens Axboe <axboe@kernel.dk>
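The skip rule for the resume walk can be sketched over a simplified shadow array. The struct, the `DEMO_NO_ASSOCIATED_ID` value, and the function below are stand-ins invented for this example (the driver's own shadow entries carry more state); only the predicate "linked and carrying no scatter-gather list" is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NO_ASSOCIATED_ID (-1)  /* hypothetical "not linked" marker */

/* Hypothetical, reduced shadow slot. */
struct demo_shadow {
    void *request;        /* NULL when the slot is unused */
    int associated_id;    /* index of the paired slot, if any */
    unsigned int num_sg;  /* 0 on the secondary slot of a split request */
};

/* Count how many logical requests a resume walk would process. */
size_t demo_resume_walk(const struct demo_shadow *shadow, size_t n)
{
    size_t processed = 0;

    assert(shadow != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (shadow[i].request == NULL)
            continue;
        /* Secondary half of a split request: linked, but no sg list. */
        if (shadow[i].associated_id != DEMO_NO_ASSOCIATED_ID &&
            shadow[i].num_sg == 0)
            continue;
        processed++;
    }
    return processed;
}
```

With one split request occupying two slots and one ordinary request in a third, the walk counts two logical requests rather than three.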
2026-07-03	Replace <linux/mod_devicetable.h> by more specific <linux/device-id/*.h> (c files)	Uwe Kleine-König (The Capable Hub)
	Replace the #include of <linux/mod_devicetable.h> by the more specific <linux/device-id/*.h> where applicable. For most cases the include can be dropped completely, only a few drivers need one or two headers added.
	Acked-by: Danilo Krummrich <dakr@kernel.org>
	Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
	Acked-by: Bjorn Helgaas <bhelgaas@google.com>
	Link: https://patch.msgid.link/1a3f2007c5c5dcf555c09a4035ce3ae8ef1b6c49.1782808461.git.u.kleine-koenig@baylibre.com
	Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
2026-07-02	ublk: snapshot batch commands before preparing I/O	Yousef Alhouseen
	The batch prepare path rereads its userspace element array when rolling back a partially prepared batch. Userspace can change an already processed tag before the second read, causing rollback to reject the replacement tag and leave earlier I/O slots prepared. The WARN_ON_ONCE() in the rollback path then fires. Copy the bounded batch into kernel memory before changing any I/O state and use the same snapshot for preparation and rollback. Commit and fetch batches retain the existing chunked userspace walk. Fixes: b256795b3606 ("ublk: handle UBLK_U_IO_PREP_IO_CMDS") Reported-by: syzbot+1a67ee1aa79484801ec6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1a67ee1aa79484801ec6 Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260630211827.50475-1-alhouseenyousef@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
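The snapshot-then-process idea can be sketched in userspace C: copy the bounded batch once, then drive both preparation and rollback from that private copy, so a concurrent change to the caller's array cannot make the two phases disagree. All names here are hypothetical, `memcpy()` stands in for `copy_from_user()`, and a single-threaded example cannot reproduce the race itself; it only shows the structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NR_SLOTS 8  /* hypothetical bound on tags and batch size */

struct demo_elem {
    uint16_t tag;
};

/*
 * Prepare every slot named in the batch. Returns 0 on success, or -1 with
 * all slots prepared by this call rolled back.
 */
int demo_prep_batch(bool prepared[DEMO_NR_SLOTS],
                    const struct demo_elem *user, size_t n)
{
    struct demo_elem snap[DEMO_NR_SLOTS];
    size_t i;

    assert(prepared != NULL && (user != NULL || n == 0));

    if (n > DEMO_NR_SLOTS)
        return -1;
    if (n > 0)
        memcpy(snap, user, n * sizeof(snap[0]));  /* the one and only read */

    for (i = 0; i < n; i++) {
        if (snap[i].tag >= DEMO_NR_SLOTS || prepared[snap[i].tag])
            goto rollback;
        prepared[snap[i].tag] = true;
    }
    return 0;

rollback:
    /* Undo entries 0..i-1 using the same snapshot that prepared them. */
    while (i-- > 0)
        prepared[snap[i].tag] = false;
    return -1;
}
```

Because rollback never re-reads `user`, it always undoes exactly the slots that preparation touched.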
2026-06-26	Merge tag 'ceph-for-7.2-rc1' of https://github.com/ceph/ceph-client	Linus Torvalds
	Pull ceph updates from Ilya Dryomov: "This adds support for manual client session reset in CephFS, allowing operators to get out of tricky livelock situations involving caps and file locks without evicting the problematic client instance on the MDS side or rebooting the client node both of which can be disruptive" * tag 'ceph-for-7.2-rc1' of https://github.com/ceph/ceph-client: ceph: add manual reset debugfs control and tracepoints ceph: add client reset state machine and session teardown ceph: add diagnostic timeout loop to wait_caps_flush() ceph: harden send_mds_reconnect and handle active-MDS peer reset ceph: use proper endian conversion for flock_len in reconnect ceph: convert inode flags to named bit positions and atomic bitops rbd: switch to dynamic root device
2026-06-25	Merge tag 'block-7.2-20260625' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block fixes from Jens Axboe: - blk-cgroup locking rework and fixes: - fix a use-after-free in __blkcg_rstat_flush() - defer freeing policy data until after an RCU grace period - defer the blkcg css_put until the blkg is unlinked from the queue - unwind the queue_lock nesting under RCU / blkcg->lock across the lookup, create, associate and destroy paths - NVMe fixes via Keith: - Fix a crash and memory leak during invalid cdev teardown, and related cdev cleanups (Maurizio, John) - nvmet fixes: handle TCP_CLOSING in the tcp state_change handler, reject short AUTH_RECEIVE buffers, handle inline data with a nonzero offset in rdma, fix an sq refcount leak, and allocate ana_state with the port (Maurizio, Michael, Bryam, Wentao, Rosen) - nvme-fc fix to not cancel requests on an IO target before it is initialized (Mohamed) - nvme-apple fix to prevent shared tags across queues on Apple A11 (Nick) - Various smaller fixes and cleanups (John) - MD fixes via Yu Kuai: - raid1/raid10 fixes for writes_pending and barrier reference leaks on write and discard failures, plus REQ_NOWAIT handling fixes (Abd-Alrhman) - raid5 discard accounting and validation, and a batch of fixes for stripe batch races (Yu Kuai, Chen) - Protect raid1 head_position during read balancing (Chen) - block bio-integrity fixes: correct an error injection static key decrement, fix GFP flag confusion in bio_integrity_alloc_buf(), and handle REQ_OP_ZONE_APPEND in __bio_integrity_action() (Christoph) - Fixes for bio_iov_iter_bounce_write(): revert the iov_iter after a short copy, and respect the iov_iter nofault flag (Qu) - Invalidate the cached plug timestamp after a task switch, and clear PF_BLOCK_TS in copy_process() (Usama) - Fix the IORING_URING_CMD_REISSUE flags check in blkdev_uring_cmd() (Yitang) - Remove a redundant plug in __submit_bio() (Wen) - Don't warn when reclassifying a busy socket lock in nbd (Deepanshu) * tag 'block-7.2-20260625' of 
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (45 commits) block: handle REQ_OP_ZONE_APPEND in __bio_integrity_action block: fix GFP_ flags confusion in bio_integrity_alloc_buf block, bfq: don't grab queue_lock to initialize bfq mm/page_io: don't nest queue_lock under rcu in bio_associate_blkg_from_page() blk-cgroup: don't nest queue_lock under blkcg->lock in blkcg_destroy_blkgs() blk-cgroup: don't nest queue_lock under rcu in bio_associate_blkg() blk-cgroup: don't nest queue_lock under rcu in blkg_lookup_create() blk-cgroup: don't nest queue_lock under rcu in blkcg_print_blkgs() blk-cgroup: delay freeing policy data after rcu grace period blk-cgroup: protect iterating blkgs with blkcg->lock in blkcg_print_stat() md/raid5: avoid R5_Overlap races while breaking stripe batches md/raid5: use stripe state snapshot in break_stripe_batch_list() blk-cgroup: defer blkcg css_put until blkg is unlinked from queue blk-cgroup: fix UAF in __blkcg_rstat_flush() block, bfq: protect async queue reset with blkcg locks nbd: don't warn when reclassifying a busy socket lock block: fix incorrect error injection static key decrement md/raid5: let stripe batch bm_seq comparison wrap-safe md/raid1: protect head_position for read balance md/raid1: free r1_bio when REQ_NOWAIT is set and read would block on retry ...
2026-06-22	nbd: don't warn when reclassifying a busy socket lock	Deepanshu Kartikey
	nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is held at the point of reclassification. That assertion was copied from nvme-tcp, where the socket is created internally by the kernel (sock_create_kern()) and is never visible to user space, so the lock is guaranteed to be free. NBD is different: the socket is looked up from a user-supplied fd in nbd_get_socket(), and user space retains that fd. A concurrent syscall on the same socket (or softirq processing taking bh_lock_sock() on a connected TCP socket) can legitimately hold the lock at the instant NBD reclassifies it. sock_allow_reclassification() then returns false and the WARN_ON_ONCE() fires, which turns into a crash under panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT against socket activity on the same fd, as reported by syzbot. Hitting a held lock here is expected for an externally owned socket and is not a kernel bug, so skip reclassification silently instead of warning. Reclassification is a lockdep-only annotation, so skipping it in the rare racing case is harmless. Reported-by: syzbot+6b85d1e39a5b8ed9a954@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6b85d1e39a5b8ed9a954 Fixes: d532cddb6c60 ("nbd: Reclassify sockets to avoid lockdep circular dependency") Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260621235255.66015-1-kartikey406@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-06-22	rbd: switch to dynamic root device	Johan Hovold
	Driver core expects devices to be dynamically allocated and will, for example, complain loudly when no release function has been provided. Use root_device_register() to allocate and register the root device instead of open coding using a static device. Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-06-19	Merge tag 'mm-stable-2026-06-18-09-26' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "selftests/mm: clean up build output and verbosity" (Li Wang) Remove some noise from the MM selftests build - "mm: Free contiguous order-0 pages efficiently" (Ryan Roberts) Speed up the freeing of a batch of 0-order pages by first scanning them for coalescing opportunities. This is applicable to vfree() and to the releasing of frozen pages - "mm/damon: introduce DAMOS failed region quota charge ratio" (SeongJae Park) Address a DAMOS usability issue: The DAMOS quota often exhausts prematurely because it charges for all memory attempted, causing slow and inconsistent performance when actions fail on unreclaimable memory. To fix this, a new feature lets users set a smaller, flexible quota charge ratio (via a numerator and denominator) for failed regions. Since failed actions cause less overhead, reducing their quota cost ensures more predictable and efficient DAMOS processing - "selftests/cgroup: improve zswap tests robustness and support large page sizes" (Li Wang) Fix various spurious failures and improves the overall robustness of the cgroup zswap selftests - "fix MAP_DROPPABLE not supported errno" (Anthony Yznaga) Fix an issue in the mlock selftests on arm32 - "mm: huge_memory: clean up defrag sysfs with shared" (Breno Leitao) Some maintenance work in the huge_memory code - "treewide: fixup gfp_t printks" (Brendan Jackman) Use the special vprintf() gfp_t conversion in various places - "mm: Fix vmemmap optimization accounting and initialization" (Muchun Song) Fix several bugs in the vmemmap optimization, mainly around incorrect page accounting and memmap initialization in the DAX and memory hotplug paths. 
It also fixes pageblock migratetype initialization and struct page initialization for ZONE_DEVICE compound pages - "mm/damon: repost non-hotfix reviewed patches in damon/next tree" A sprinkle of unrelated minor bugfixes for DAMON - "mm: remove page_mapped()" (David Hildenbrand) Remove this function from the tree, replacing it with folio_mapped() - "mm/damon: let DAMON be paused and resumed" (SeongJae Park) Allow DAMON to be paused and resumed without losing its current state - "kasan: hw_tags: Disable tagging for stack and page-tables" (Muhammad Usama Anjum) Simplify and speed up kasan by removing its ineffective tagging of stacks and page tables - "mm/damon/reclaim,lru_sort: monitor all system rams by default" (SeongJae Park) Simplify deployment on diverse hardware like NUMA systems by updating DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the physical address range covering all System RAM areas by default, replacing the overly restrictive behavior that only targeted the single largest memory block to save on negligible overhead - "mm/damon/sysfs: document filters/ directory as deprecated" (SeongJae Park) Update some DAMON docs - "mm: use spinlock guards for zone lock" (Dmitry Ilvokhin) Switch zone->lock handling over to using the guard() mechanisms - "mm/filemap: tighten mmap_miss hit accounting" (fujunjie) Fix a flaw where the mmap_miss counter over-credited page cache hits during fault-arounds and page-fault retries. 
This results in significant reduction of redundant synchronous mmap readahead I/O, drastically cutting down execution time and gigabytes read for sparse random or strided memory access workloads - "selftests/cgroup: Fix false positive failures in test_percpu_basic" (Li Wang) Fix a couple of false-positives in the cgroup kmem selftests - "mm/damon/reclaim: support monitoring intervals auto-tuning" (SeongJae Park) Add a new parameter to DAMON permitting DAMON_RECLAIM to automatically tune DAMON's sampling and aggregation intervals - "mm/damon/stat: add kdamond_pid parameter" (SeongJae Park) Change DAMON_STAT to provide the pid of its kdamond - "mm/kmemleak: dedupe verbose scan output" (Breno Leitao) Remove large amounts of duplicated backtraces from the verbose-mode kmemleak output - "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" (David Hildenbrand) Reduce our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to removing it entirely in a later series - "mm/damon: validate min_region_size to be power of 2" (Liew Rui Yan) Prevent users from passing a non-power-of-2 value of `addr_unit', as this later results in undesirable behavior - "mm: document read_pages and simplify usage" (Frederick Mayle) - "tools/mm/page-types: Fix misc bugs" (Ye Liu) Fix three issues in tools/mm/page-types.c - "mm: misc cleanups from __GFP_UNMAPPED series" (Brendan Jackman) Implement several cleanups in the page allocator and related code - "mm, swap: swap table phase IV: unify allocation" (Kairui Song) Unify the allocation and charging of anon and shmem swap in folios, provides better synchronization, consolidates the metadata management, hence dropping the static array and map, and improves performance - "mm/damon: introduce data attributes monitoring" (SeongJae Park) Extend DAMON to monitor general data attributes other than accesses - "mm/vmalloc: free unused pages on vrealloc() shrink" (Shivam Kalra) Implement the TODO in vrealloc() to unmap and free unused pages when shrinking
across a page boundary - "mm/damon: documentation and comment fixes" (niecheng) - "remove mmap_action success, error hooks" (Lorenzo Stoakes) Eliminate custom hooks from mmap_action by removing the problematic success_hook which allowed drivers to improperly access uninitialized VMAs. It replaces the error_hook with a simple error-code field and updates the memory char driver accordingly - "mm/damon: minor improvements for code readability and tests" (SeongJae Park) - "mm/damon: fix macro arguments and clarify quota goals doc" (Maksym Shcherba) - "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" (Mike Rapoport) - "mm/mglru: improve reclaim loop and dirty folio" (Kairui Song and others) Clean up and slightly improves MGLRU's reclaim loop and dirty writeback handling. Large performance improvements are measured - "use vma locks for proc/pid/{smaps\|numa_maps} reads" (Suren Baghdasaryan) Use per-vma locks when reading /proc/pid/smaps and numa_maps similar to reduce contention on central mmap_lock - "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()" (Ran Xiaokai) Some cleanup work in the THP code - "selftests/memfd: fix compilation warnings" (Konstantin Khorenko) Fix a few build glitches in the memfd selftest code. - "memcg: shrink obj_stock_pcp and cache multiple objcgs" (Shakeel Butt) Resolve a 68% performance regression caused by NUMA-node cache thrashing around struct obj_stock_pcp by shrinking its existing fields and expanding it into a multi-slot array that caches up to five obj_cgroup pointers per CPU, allowing per-node variants of the same memcg to coexist within a single 64-byte cache line. 
- "zram: writeback fixes" (Sergey Senozhatsky) address a couple of unrelated zram writeback issues - "mm: switch THP shrinker to list_lru" (Johannes Weiner) Resolve NUMA-awareness issues and streamlines callsite interaction by refactoring and extending the list_lru API to completely replace the complex, open-coded deferred split queue for Transparent Huge Pages - "mm: improve large folio readahead for exec memory" (Usama Arif) Improve large-folio readahead on systems like 64K-page arm64 by preventing the mmap_miss check from permanently disabling target-oriented VM_EXEC readahead, and by generalizing the force_thp_readahead gate to support mappings with any usefully large maximum folio order under the cache cap. - "userfaultfd/pagemap: pre-existing fixes" (Kiryl Shutsemau) Fix a bunch of minor issues in the userfaultfd/pagemap, all of which were flagged by Sashiko review of proposed new material - "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and vmemmap_check_pmd()" (Muchun Song) Provide generic versions of these two functions so the four arch-specific implementations can be removed. - "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device" (Youngjun Park) Address a uswsusp-vs-swapoff race and reduces the swap device reference taking/releasing frequency. 
- "mm/hmm: A fix and a selftest" (Dev Jain) * tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries fs/proc/task_mmu: do not warn on seeing non-migration pmd entry lib/test_hmm: check alloc_page_vma() return value and handle OOM mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX mm/swap: remove redundant swap device reference in alloc/free mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device mm/filemap: use folio_next_index() for start vmalloc: fix NULL pointer dereference in is_vm_area_hugepages() sparc/mm: drop vmemmap_check_pmd helper and use generic code loongarch/mm: drop vmemmap_check_pmd helper and use generic code riscv/mm: drop vmemmap_pmd helpers and use generic code arm64/mm: drop vmemmap_pmd helpers and use generic code mm/sparse-vmemmap: provide generic vmemmap_set_pmd() and vmemmap_check_pmd() rust: page: mark Page::nid as inline userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks userfaultfd: gate must_wait writability check on pte_present() mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole() fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry() fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race ...
2026-06-16	Merge tag 'for-7.2/block-20260615' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - NVMe pull request via Keith: - Per-controller admin and IO timeout sysfs attributes, and letting the block layer set request timeouts (Maurizio, Maximilian) - Multipath passthrough iostats, and PCI P2PDMA enablement for multipath devices (Keith, Kiran) - A new diag sysfs attribute group exporting per-controller counters (retries, multipath failover, error counters, requeue and failure counts, reset and reconnect events) (Nilay) - FDP configuration validation and bounds check fixes (liuxixin) - Various nvmet fixes, including a pre-auth out-of-bounds read in the Discovery Get Log Page handler, auth payload bounds validation, and tcp error-path leak fixes (Bryam, Tianchu, Geliang) - nvme-tcp lockdep and workqueue fixes (Shin'ichiro, Kuniyuki, Eric) - Assorted other fixes and cleanups (John, Yao, Chao, Mateusz, Achkinazi, Wentao) - MD pull request via Yu Kuai: - raid1/raid10 fixes for a deadlock in the read error recovery path, error-path detection and bio accounting with cloned bios, and an nr_pending leak in the REQ_ATOMIC bad-block error path (Abd-Alrhman) - PCI P2PDMA propagation from member devices to the RAID device (Kiran) - dm-raid bio requeue fix, and various smaller fixes and cleanups (Benjamin, Chen, Li, Thorsten) - Enable Clang lock context analysis for the block layer, with the accompanying annotations across queue limits, the blk_holder_ops callbacks, crypto, cgroup, iocost, kyber and mq-deadline (Bart) - Block status code infrastructure work: a tagged status table, a str_to_blk_op() helper, a bio_endio_status() helper, and on top of that a new configurable block-layer error injection facility (Christoph) - DRBD netlink rework, replacing the genl_magic machinery with explicit netlink serialization and moving the DRBD UAPI headers to include/uapi/linux/ (Christoph Böhmwalder) - bvec improvements: a bvec_folio() helper and making the bvec_iter helpers proper 
inline functions (Willy, Christoph) - ublk cleanups and a canceling-flag fix for the disk-not-allocated case (Caleb, Ming) - Partition handling fixes: bound the AIX pp_count scan, fix an of_node refcount leak, and replace __get_free_page() with kmalloc() (Bryam, Wentao, Mike) - Convert numa_node to int in blk_mq_hw_ctx and ->init_request, and add WQ_PERCPU to the block workqueue users (Mateusz, Marco) - Block statistics and tracing: propagate in-flight to the whole disk on partition IO, export passthrough stats, and a new block_rq_tag_wait tracepoint (Tang, Keith, Aaron) - A round of removals, unexports and cleanups across bio, direct-io and the bvec helpers (Christoph) - Various driver fixes (mtip32xx use-after-free, rbd snap_count validation and strscpy conversion, nbd socket lockdep reclassify, virtio-blk zone report clamp, floppy) and a batch of MAINTAINERS email/list updates (Coly, Li, Yu, Christoph Böhmwalder) - Other little fixes and cleanups all over * tag 'for-7.2/block-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (117 commits) MAINTAINERS: Update Coly Li's email address block: check bio split for unaligned bvec nbd: Reclassify sockets to avoid lockdep circular dependency block: add configurable error injection block: add a str_to_blk_op helper block: add a "tag" for block status codes block: add a macro to initialize the status table floppy: Drop unused pnp driver data block: propagate in_flight to whole disk on partition I/O virtio-blk: clamp zone report to the report buffer capacity block: optimize I/O merge hot path with unlikely() hints drivers/block/rbd: Use strscpy() to copy strings into arrays partitions: aix: bound the pp_count scan to the ppe array block: Enable lock context analysis block/mq-deadline: Make the lock context annotations compatible with Clang block/Kyber: Make the lock context annotations compatible with Clang block/blk-mq-debugfs: Improve lock context annotations block/blk-iocost: Inline iocg_lock() and 
iocg_unlock() block/blk-iocost: Split ioc_rqos_throttle() block/crypto: Annotate the crypto functions ...
2026-06-13	nbd: Reclassify sockets to avoid lockdep circular dependency	Eric Dumazet
	syzbot reported a possible circular locking dependency in udp_sendmsg() where fs_reclaim can be triggered while holding sk_lock, and fs_reclaim can eventually depend on another sk_lock (e.g., if NBD is used for swap or writeback and NBD uses TLS/TCP which acquires sk_lock). Since the UDP socket and the NBD TCP/TLS socket are different, this is a false positive. Fix this by reclassifying NBD sockets to a separate lock class when they are added to the NBD device. This is similar to what nvme-tcp and other network block devices do. Fixes: ffa1e7ada456 ("block: Make request_queue lockdep splats show up earlier") Reported-by: syzbot+607cdcf978b3e79da878@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a2cdafe.428ffe26.258b27.0161.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260613042619.1108126-1-edumazet@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-06-10	floppy: Drop unused pnp driver data	Uwe Kleine-König (The Capable Hub)
	The pnp_device_id array is only used for module data to support auto-loading the floppy module. So the .driver_data member is unused and this assignment can be dropped. While touching that array, align the coding style to what is used most for these. This patch doesn't modify the compiled array, only its representation in source form benefits. The former was confirmed with x86 and arm64 builds. Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> Reviewed-by: Denis Efremov (Oracle) <efremov@linux.com> Link: https://patch.msgid.link/99dbf851ffb99229ea1dcfd8f58e9ee6a1f05349.1781075967.git.u.kleine-koenig@baylibre.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-06-09	virtio-blk: clamp zone report to the report buffer capacity	Michael Bommarito
	virtblk_report_zones() trusts the device-reported number of zones when walking the report buffer: nz = min_t(u64, virtio64_to_cpu(vblk->vdev, report->nr_zones), nr_zones); ... for (i = 0; i < nz && zone_idx < nr_zones; i++) { ret = virtblk_parse_zone(vblk, &report->zones[i], ...); The buffer is allocated by virtblk_alloc_report_buffer(), whose size is capped by the queue's max hardware sectors and max segments and can therefore hold fewer descriptors than nr_zones. nz is bounded only by the device-supplied report->nr_zones and the requested nr_zones, never by the buffer's descriptor capacity. At probe time the request count is unbounded (blk_revalidate_disk_zones() calls report_zones() with nr_zones == UINT_MAX), so the device-supplied report->nr_zones is the sole gate: a device that reports more zones than fit in the buffer drives the loop to read report->zones[i] past the end of the allocation. A malicious or buggy virtio-blk device that reports an inflated nr_zones triggers this during zone revalidation at probe. KASAN reports a vmalloc-out-of-bounds read in virtblk_report_zones() against the report buffer allocated a few lines earlier. Clamp nz to the number of descriptors that actually fit in the report buffer. Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices") Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://patch.msgid.link/20260607124834.3059944-1-michael.bommarito@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
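The clamp described in the commit above can be sketched in user-space C. This is a simplified model and not the driver code: `struct report_hdr`, `struct zone_desc`, `clamp_nr_zones()` and the 64-byte header and descriptor sizes are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the report layout; the sizes are assumed. */
struct zone_desc { uint8_t raw[64]; };
struct report_hdr { uint64_t nr_zones; uint8_t reserved[56]; };

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * Return how many descriptors may be parsed. The count is bounded by what
 * the device claims, by what the caller requested, and by how many
 * descriptors actually fit in a buffer of buflen bytes.
 */
uint64_t clamp_nr_zones(uint64_t device_nr_zones,
                        uint64_t requested_nr_zones,
                        size_t buflen)
{
    uint64_t capacity = 0;

    if (buflen > sizeof(struct report_hdr))
        capacity = (buflen - sizeof(struct report_hdr)) /
                   sizeof(struct zone_desc);

    return min_u64(min_u64(device_nr_zones, requested_nr_zones), capacity);
}
```

With a buffer that holds four descriptors, a device claiming 1000 zones is limited to four parsed entries even when the caller's request count is effectively unbounded.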
2026-06-08	zram: drop unused bio parameter from write helpers	Cunlong Li
	After "zram: fix use-after-free in zram_bvec_write_partial()", zram_bvec_write_partial() always passes NULL to zram_read_page() and no longer needs the parent bio. Mirror the read side (zram_bvec_read_partial() has not taken a bio since commit 4e3c87b9421d ("zram: fix synchronous reads")) and drop the parameter from zram_bvec_write_partial() and zram_bvec_write(). No functional change. Link: https://lore.kernel.org/20260528-zram-v3-2-cab86eef8764@gmail.com Signed-off-by: Cunlong Li <shenxiaogll@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-06-08	drivers/block/rbd: Use strscpy() to copy strings into arrays	David Laight
	Replacing strcpy() with strscpy() ensures that overflow of the target buffer cannot happen. Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Alex Elder <elder@riscstar.com> Link: https://patch.msgid.link/20260606202744.5113-5-david.laight.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-06-04	zram: clear trailing bytes of compressed writeback pages	Sergey Senozhatsky
	Patch series "zram: writeback fixes", v2. Brian (privately) reported a "leak" of the writeback bitmap in certain cases, so that the backing device can store fewer pages; and a theoretical data leak in the trailing bytes of compressed writeback pages. Both issues are low risk. This patch (of 2): When compressed writeback is available, written-back pages contain "garbage" in the PAGE_SIZE - obj_size trailing bytes. That "garbage" is, basically, whatever data that page held before we got it for writeback. To take advantage of it, an attacker needs to be able to read from the active backing swap device, which is already catastrophic. Still, just in case, zero out those trailing bytes before writeback to a backing device so that we only store swapped-out data there. Link: https://lore.kernel.org/20260526022754.2377730-1-senozhatsky@chromium.org Link: https://lore.kernel.org/20260526022754.2377730-3-senozhatsky@chromium.org Fixes: d38fab605c66 ("zram: introduce compressed data writeback") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Suggested-by: Brian Geffon <bgeffon@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: Richard Chang <richardycc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
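A minimal user-space sketch of the zeroing step from the commit above. The helper name `zero_trailing_bytes()` and the fixed 4096-byte `SKETCH_PAGE_SIZE` are assumptions; the real code operates on kernel pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the example */

/*
 * A page holds obj_size bytes of compressed data followed by whatever the
 * page contained before. Clear that tail before the page leaves memory.
 */
void zero_trailing_bytes(unsigned char *page, size_t obj_size)
{
    if (obj_size < SKETCH_PAGE_SIZE)
        memset(page + obj_size, 0, SKETCH_PAGE_SIZE - obj_size);
}
```

The payload itself is left untouched; only the PAGE_SIZE - obj_size tail is cleared.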
2026-06-04	zram: do not leak blk idx at the end of writeback	Sergey Senozhatsky
	zram_writeback_slots() loop can terminate with a valid reserved backing device blk_idx. The problem is that the cleanup code doesn't release that reserved blk_idx before zram_writeback_slots() returns, which leads to a blk_idx leak (it becomes permanently busy and cannot be used for actual writeback). This does not lead to any system instability; it only means that we can write back fewer pages. The scenario is hard to hit in practice as it requires writeback to race with modification (slot-free or overwrite) of the final post-processing slot. Release reserved but unused blk_idx before returning from zram_writeback_slots(). Link: https://lore.kernel.org/20260526022754.2377730-2-senozhatsky@chromium.org Fixes: f405066a1f0db ("zram: introduce writeback bio batching") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Suggested-by: Brian Geffon <bgeffon@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: Richard Chang <richardycc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
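The leak pattern in the commit above can be modelled with a toy bitmap allocator. Everything here is hypothetical (`reserve_blk()`, `release_blk()`, `writeback_slots()`, the 8-entry bitmap); it only illustrates releasing a reservation that no slot consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BLOCKS   8
#define INVALID_IDX (-1)

static bool busy[NR_BLOCKS];    /* toy backing-device bitmap */

static int reserve_blk(void)
{
    for (int i = 0; i < NR_BLOCKS; i++) {
        if (!busy[i]) {
            busy[i] = true;
            return i;
        }
    }
    return INVALID_IDX;
}

static void release_blk(int idx)
{
    busy[idx] = false;
}

/*
 * Write back the slots flagged valid and return how many were written.
 * slot_valid[i] == false models a slot that was freed or overwritten
 * while writeback was racing with it.
 */
int writeback_slots(const bool *slot_valid, size_t nr_slots)
{
    int blk_idx = INVALID_IDX;
    int written = 0;

    for (size_t i = 0; i < nr_slots; i++) {
        if (blk_idx == INVALID_IDX) {
            blk_idx = reserve_blk();
            if (blk_idx == INVALID_IDX)
                break;          /* backing device is full */
        }
        if (!slot_valid[i])
            continue;           /* keep the reservation for the next slot */
        written++;              /* the reserved block is now in real use */
        blk_idx = INVALID_IDX;
    }

    /* The fix: hand back a reservation that no slot consumed. */
    if (blk_idx != INVALID_IDX)
        release_blk(blk_idx);
    return written;
}

/* Count blocks currently marked busy. */
int busy_blocks(void)
{
    int n = 0;

    for (int i = 0; i < NR_BLOCKS; i++)
        n += busy[i];
    return n;
}
```

Without the final release, a run whose last slot turns out to be invalid would leave one extra block marked busy forever.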
2026-06-03	zram: fix use-after-free in zram_bvec_write_partial()	Cunlong Li
	zram_read_page() picks the sync or async backing device read path based on whether the parent bio is NULL. zram_bvec_write_partial() passes its parent bio down, so for ZRAM_WB slots the read is dispatched asynchronously and zram_read_page() returns 0 while the bio is still in flight. The caller then runs memcpy_from_bvec(), zram_write_page() and __free_page() on the buffer, leaving the async read to write into a freed page. zram_bvec_read_partial() was switched to NULL in commit 4e3c87b9421d ("zram: fix synchronous reads") for the same reason; the write_partial counterpart was missed. Link: https://lore.kernel.org/20260528-zram-v3-1-cab86eef8764@gmail.com Fixes: 8e654f8fbff5 ("zram: read page from backing device") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Cunlong Li <shenxiaogll@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: Yisheng Xie <xieyisheng1@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-06-02	n64cart: use strscpy in n64cart_probe	Thorsten Blum
	strcpy() has been deprecated [1] because it performs no bounds checking on the destination buffer, which can lead to buffer overflows. While the current code works correctly, replace strcpy() with the safer strscpy() to follow secure coding best practices. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20260517172617.3954-2-thorsten.blum@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
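For readers outside the kernel, the bounded-copy behaviour that motivates this conversion can be approximated in user space. `sketch_strscpy()` below is a hypothetical re-implementation of the documented semantics (always NUL-terminate, report truncation with -E2BIG); it is not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/*
 * Copy src into dst, writing at most count bytes including the NUL.
 * Returns the number of characters copied, or -E2BIG if count is zero
 * or src did not fit and was truncated.
 */
ssize_t sketch_strscpy(char *dst, const char *src, size_t count)
{
    size_t i;

    if (count == 0)
        return -E2BIG;

    for (i = 0; i < count - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

Unlike strcpy(), a too-long source never writes past the destination, and the caller can detect the truncation from the return value.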
2026-05-31	rbd: check snap_count against RBD_MAX_SNAP_COUNT	Rosen Penev
	snap_count is u32 but the comparison is against a SIZE_MAX-derived value (~2^61 on 64-bit), which clang flags as always false with -Wtautological-constant-out-of-range-compare. The proper check here should be that snap_count does not go over RBD_MAX_SNAP_COUNT. Assisted-by: Opencode:Big-pickle Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Alex Elder <elder@riscstar.com> Link: https://patch.msgid.link/20260530011255.52916-1-rosenp@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
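The tautology clang complains about can be shown with a small sketch. The function names, the 64-byte header allowance and the `SKETCH_MAX_SNAP_COUNT` value are assumptions for illustration; the commit only states that the check should be against RBD_MAX_SNAP_COUNT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SNAP_COUNT 510u  /* assumed cap, for illustration only */

/*
 * Old-style check: the bound is derived from SIZE_MAX, so on a 64-bit
 * build no 32-bit snap_count can ever exceed it.
 */
bool snap_count_too_big_old(uint32_t snap_count)
{
    uint64_t limit = (SIZE_MAX - 64) / sizeof(uint64_t);

    return snap_count > limit;
}

/* New-style check: compare against an explicit, meaningful maximum. */
bool snap_count_too_big_new(uint32_t snap_count)
{
    return snap_count > SKETCH_MAX_SNAP_COUNT;
}
```

On a 64-bit build the old check returns false even for UINT32_MAX, while the explicit cap rejects anything above the chosen maximum.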
2026-05-28	loop: cleanup lo_rw_aio	Christoph Hellwig
	Port over the changes from the zloop driver to remove the need for the local bio, bvec and offset variables and clean up the code by that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://patch.msgid.link/20260527151043.2349900-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-27	ublk: set canceling flag even when disk is not allocated	Ming Lei
	ublk_start_cancel() previously bailed out early when ublk_get_disk() returned NULL, treating it as "our disk has been dead". That is correct for the post-teardown case, but it also wrongly covers the pre-start case: ublk_ctrl_start_dev() has not assigned ub->ub_disk yet, while io_uring is already tearing down the daemon's uring_cmds via ublk_uring_cmd_cancel_fn(). In that window, the cancel path skips ublk_set_canceling(), so ubq->canceling stays false, even though ublk_cancel_cmd() goes on to NULL out every io->cmd. ublk_ctrl_start_dev() then proceeds to set ub->ub_disk, call add_disk(), and schedule partition_scan_work. When ublk_partition_scan_work() runs bdev_disk_changed() and the resulting read reaches ublk_queue_rq() -> ublk_queue_cmd(), the ubq->canceling check passes and the code dereferences the NULL io->cmd: BUG: kernel NULL pointer dereference, address: 0000000000000018 RIP: ublk_queue_cmd drivers/block/ublk_drv.c [inline] RIP: ublk_queue_rq+0x73/0x100 Call Trace: blk_mq_dispatch_rq_list+0x1c5/0xca0 ... bdev_disk_changed+0x3d4/0x5e0 ublk_partition_scan_work+0x89/0xe0 process_one_work+0x344/0x8a0 Fix it by always setting ub->canceling / ubq->canceling under cancel_mutex. When the disk is allocated, keep the existing quiesce/unquiesce dance so the flag is observed across the ublk_queue_rq() barrier. When the disk is not yet allocated, there is no request_queue and ublk_queue_rq() cannot be running concurrently, so simply flipping the flag is sufficient: any subsequent I/O - including the partition scan started by ublk_ctrl_start_dev() - will see canceling set and be aborted via __ublk_queue_rq_common(). Fixes: 7fc4da6a304b ("ublk: scan partition in async way") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260527144042.2095194-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-26	mtip32xx: fix use-after-free on service thread failure	Yuho Choi
	If service thread creation fails after device_add_disk() succeeds, mtip_block_initialize() calls del_gendisk() and then falls through to put_disk(). Since mtip32xx uses .free_disk to free struct driver_data, put_disk() can release dd on the added-disk path. The same unwind then continues to use dd for blk_mq_free_tag_set() and mtip_hw_exit(), and mtip_pci_probe() can later free dd again. This can cause a use-after-free and double free. Track whether the disk was added in the current initialization call. For the post-add service-thread failure path, remove the disk, release the local hardware resources, and return without dropping the final disk reference. The probe error path can then finish its cleanup and call put_disk() after it is done using dd. Keep the pre-add path using put_disk() before blk_mq_free_tag_set(), and clear dd->disk so the outer probe cleanup frees dd directly. Fixes: e8b58ef09e84 ("mtip32xx: fix device removal") Signed-off-by: Yuho Choi <dbgh9129@gmail.com> Link: https://patch.msgid.link/20260525162531.1406677-1-dbgh9129@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-26	block: switch numa_node to int in blk_mq_hw_ctx and init_request	Mateusz Nowicki
	numa_node in blk_mq_hw_ctx and the matching argument of blk_mq_ops::init_request can be NUMA_NO_NODE (-1). Declared as unsigned int, NUMA_NO_NODE becomes UINT_MAX and walks off nvme_dev::descriptor_pools[] on CONFIG_NUMA=n [1]. Switch the field and the callback prototype to int and update all in-tree init_request implementations. No functional change: cpu_to_node(), kmalloc_node() and blk_alloc_flush_queue() already take int. Link: https://lore.kernel.org/linux-nvme/20260522150628.399288-1-mateusz.nowicki@posteo.net/ [1] Link: https://lore.kernel.org/linux-nvme/20260309062840.2937858-2-iam@sung-woo.kim/ Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Suggested-by: Sung-woo Kim <iam@sung-woo.kim> Signed-off-by: Mateusz Nowicki <mateusz.nowicki@posteo.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260523125210.272274-1-mateusz.nowicki@posteo.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
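The signedness problem in the commit above is easy to reproduce in isolation. In this sketch `SKETCH_NO_NODE` mirrors NUMA_NO_NODE, while `NR_POOLS`, `as_unsigned_node()` and `pool_slot()` (including its fall-back to slot 0) are hypothetical and do not reflect the nvme driver's actual logic.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_NO_NODE (-1)   /* mirrors NUMA_NO_NODE (-1) */
#define NR_POOLS 4            /* hypothetical per-node array size */

/* What happens when the "no node" value travels through an unsigned field. */
unsigned int as_unsigned_node(int node)
{
    return (unsigned int)node;
}

/*
 * With a signed node the sentinel stays recognisable. Returns the array
 * slot to use, or -1 if the node is out of range. Falling back to slot 0
 * for "no node" is an assumption of this sketch.
 */
int pool_slot(int node)
{
    if (node == SKETCH_NO_NODE)
        return 0;
    return (node >= 0 && node < NR_POOLS) ? node : -1;
}
```

Once converted to unsigned, the sentinel equals UINT_MAX and any array indexed by it is read far out of bounds, which is why the field and the callback argument were switched to int.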
2026-05-26	Merge tag 'mm-hotfixes-stable-2026-05-25-16-22' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes. 9 are for MM. 9 are cc:stable and the remaining 4 address post-7.1 issues or aren't considered suitable for backporting. All patches are singletons - please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2026-05-25-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "mm: introduce a new page type for page pool in page type" mm/vmalloc: do not trigger BUG() on BH disabled context MAINTAINERS, mailmap: change email for Eugen Hristev mm/migrate_device: fix pgtable leak in migrate_vma_insert_huge_pmd_page kernel/fork: validate exit_signal in kernel_clone() mm: memcontrol: propagate NMI slab stats to memcg vmstats mm/damon/sysfs-schemes: delete tried region in regions_rmdirs() mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one zram: fix use-after-free in zram_writeback_endio memfd: deny writeable mappings when implying SEAL_WRITE ipc: limit next_id allocation to the valid ID range Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare" MAINTAINERS: .mailmap: update after GEHC spin-off
2026-05-22	ublk: factor out ublk_init_iod() helper	Caleb Sander Mateos
	The code for initializing struct ublksrv_io_desc on I/O dispatch is largely duplicated in 3 places. Commit 4d4a512a1f87 ("ublk: add PFN-based buffer matching in I/O path") added support to ublk_setup_iod() for matching request buffers against registered UBLK_F_SHMEM_ZC buffers, but missed adding it to ublk_setup_iod_zoned() for zoned requests. Move the duplicated logic to a new helper ublk_init_iod(). This way, zone appends can also benefit from avoiding the data copy. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260520203654.1413640-3-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-22	ublk: move ublk_req_build_flags() earlier	Caleb Sander Mateos
	Move ublk_req_build_flags() above its callers so it doesn't need to be forward-declared. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260520203654.1413640-2-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-22	ublk: optimize ublk_rq_has_data()	Caleb Sander Mateos
	ublk_rq_has_data() currently uses bio_has_data(), which involves 2 indirections and several branches. Use blk_rq_has_data() instead to save an indirection and NULL check. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260513211846.1956810-3-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-21	zram: fix use-after-free in zram_writeback_endio	Richard Chang
	A crash was observed in zram_writeback_endio due to a NULL pointer dereference in wake_up. The root cause is a race condition between the bio completion handler (zram_writeback_endio) and the writeback task. In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after releasing wb_ctl->done_lock. This creates a race window where the writeback task can see num_inflight become 0, return, and free wb_ctl before zram_writeback_endio calls wake_up(). CPU 0 (zram_writeback_endio) CPU 1 (writeback_store) ============================ ============================ zram_writeback_slots zram_submit_wb_request zram_submit_wb_request wait_event(wb_ctl->done_wait) spin_lock(&wb_ctl->done_lock); list_add(&req->entry, &wb_ctl->done_reqs); spin_unlock(&wb_ctl->done_lock); wake_up(&wb_ctl->done_wait); zram_complete_done_reqs spin_lock(&wb_ctl->done_lock); list_add(&req->entry, &wb_ctl->done_reqs); spin_unlock(&wb_ctl->done_lock); while (num_inflight) > 0) spin_lock(&wb_ctl->done_lock); list_del(&req->entry); spin_unlock(&wb_ctl->done_lock); // num_inflight becomes 0 atomic_dec(num_inflight); // Leave zram_writeback_slots // Free wb_ctl release_wb_ctl(wb_ctl); // UAF crash! wake_up(&wb_ctl->done_wait); This patch fixes this race by using RCU. By protecting wb_ctl with rcu_read_lock() in zram_writeback_endio and using kfree_rcu() to free it, we ensure that wb_ctl remains valid during the execution of zram_writeback_endio. 
Link: https://lore.kernel.org/20260512074918.2606208-1-richardycc@google.com Fixes: f405066a1f0d ("zram: introduce writeback bio batching") Signed-off-by: Richard Chang <richardycc@google.com> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org> Suggested-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Martin Liu <liumartin@google.com> Cc: wang wei <a929244872@163.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-20	rbd: eliminate a race in lock_dwork draining on unmap	Ilya Dryomov
	Given how rbd_lock_add_request() and rbd_img_exclusive_lock() are written, lock_dwork may be (re)queued more than it's actually needed: for example in case a new I/O request comes in while we are in the middle of rbd_acquire_lock() on behalf of another I/O request. This is expected and with rbd_release_lock() preemptively canceling lock_dwork is benign under normal operation. A more problematic example is maybe_kick_acquire(): if (have_requests || delayed_work_pending(&rbd_dev->lock_dwork)) { dout("%s rbd_dev %p kicking lock_dwork\n", __func__, rbd_dev); mod_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0); } It's not unrealistic for lock_dwork to get canceled right after delayed_work_pending() returns true and for mod_delayed_work() to requeue it right there anyway. This is a classic TOCTOU race. When it comes to unmapping the image, there is an implicit assumption of no self-initiated exclusive lock activity past the point of return from rbd_dev_image_unlock() which unlocks the lock if it happens to be held. This unlock is assumed to be final and lock_dwork (as well as all other exclusive lock tasks, really) isn't expected to get queued again. However, lock_dwork is canceled only in cancel_tasks_sync() (i.e. later in the unmap sequence) and on top of that the cancellation can get in effect nullified by maybe_kick_acquire(). This may result in rbd_acquire_lock() executing after rbd_dev_device_release() and rbd_dev_image_release() run and free and/or reset a bunch of things. One of the possible failure modes then is a violated rbd_assert(rbd_image_format_valid(rbd_dev->image_format)); in rbd_dev_header_info() which is called via rbd_dev_refresh() from rbd_post_acquire_action(). Redo exclusive lock task draining to provide saner semantics and try to meet the assumptions around rbd_dev_image_unlock(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
2026-05-11	ublk: reject max_sectors smaller than PAGE_SECTORS in parameter validation	Ming Lei
	blk_validate_limits() requires max_hw_sectors >= PAGE_SECTORS and fires a WARN_ON_ONCE if this invariant is violated. ublk_validate_params() only checked the upper bound of max_sectors against max_io_buf_bytes, allowing userspace to pass small values (including zero) that trigger the warning when blk_mq_alloc_disk() is called from ublk_ctrl_start_dev(). Before 494ea040bcb5, ublk used blk_queue_max_hw_sectors() which silently clamped small values up to PAGE_SECTORS. The conversion to passing queue_limits directly to blk_mq_alloc_disk() lost that clamping and now hits blk_validate_limits()'s WARN_ON_ONCE instead. Validate that max_sectors is at least PAGE_SECTORS in ublk_validate_params() so invalid values are rejected early with -EINVAL instead of reaching the block layer. Fixes: 494ea040bcb5 ("ublk: pass queue_limits to blk_mq_alloc_disk") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260510144843.769031-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
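A sketch of the two-sided bound the commit above describes. `validate_max_sectors()` is a hypothetical helper, and the 4096-byte page size is an assumption; only the lower bound of PAGE_SECTORS and the upper bound derived from max_io_buf_bytes come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE    4096u   /* assumed page size */
#define SKETCH_SECTOR_SHIFT 9       /* 512-byte sectors */
#define SKETCH_PAGE_SECTORS (SKETCH_PAGE_SIZE >> SKETCH_SECTOR_SHIFT)

/* Reject values the block layer would later warn about. */
int validate_max_sectors(uint32_t max_sectors, uint32_t max_io_buf_bytes)
{
    /* Upper bound: the request must fit in the I/O buffer. */
    if (max_sectors > (max_io_buf_bytes >> SKETCH_SECTOR_SHIFT))
        return -EINVAL;

    /* Lower bound added by the fix: at least one page worth of sectors. */
    if (max_sectors < SKETCH_PAGE_SECTORS)
        return -EINVAL;

    return 0;
}
```

With 4 KiB pages the smallest accepted value is 8 sectors, so zero and other tiny values are refused with -EINVAL instead of reaching the block layer.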
2026-05-08	drbd: replace genl_magic with explicit netlink serialization	Christoph Böhmwalder
	Replace the genl_magic multi-include macro system with explicit serialization and parsing. The _gen files were initially produced from a YNL spec via a customized ynl-gen-c, but the DRBD netlink family is effectively frozen, so the generator is kept unmodified. All new functionality will land in a separate, properly-designed family. Carry the resulting code as ordinary in-tree source rather than landing the spec and generator changes that produced it. The bulk of the changes are mechanical renames to fit the YNL naming conventions: - Handler functions: drbd_adm_ -> drbd_nl__doit/dumpit - GENL_MAGIC_VERSION -> DRBD_FAMILY_VERSION - GENL_MAGIC_FAMILY_HDRSZ -> sizeof(struct drbd_genlmsghdr) - drbd_genl_family -> drbd_nl_family - Attribute IDs: T_ -> DRBD_A_* Remove the nested_attr_tb static global buffer and move to a per-call allocation approach: each deserialization manages its own nested attribute table. This will be needed anyway when we eventually move to parallel_ops, and it's actually simpler this way, so make the move now. Replace the functionality of the "sensitive" flag: this was only used by a single field (shared_secret); open-code redaction logic for that locally. Also replace the "invariant" flag: this only had a couple of users, and those basically never change. Hard code the check directly inline. The genl_family struct itself is defined manually in drbd_nl.c. Also replace a couple of drbd-specific wrappers (nla_put_u64_0pad, drbd_nla_find_nested) with standard kernel functions while we're at it. Finally, completely remove the genl_magic system; DRBD was its only user. Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260506124541.1951772-3-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-08	drbd: move UAPI headers to include/uapi/linux/	Christoph Böhmwalder
	drbd.h and drbd_limits.h contain only type definitions, enums, and constants shared between kernel and userspace. These should be part of UAPI. Split the genl_api header into two: the genlmsghdr and the enums are UAPI, the rest stays there for now (it will be removed by one of the next commits in this series). drbd_config.h is clearly DRBD-internal, so move it there. Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260506124541.1951772-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-08	ublk: fix use-after-free in ublk_cancel_cmd()	Ming Lei
	When ublk_reset_ch_dev() clears io->cmd via ublk_queue_reinit() concurrently with ublk_cancel_cmd(), ublk_cancel_cmd() can read a stale pointer and pass it to io_uring_cmd_done(), causing a use-after-free. Fix by synchronizing the two paths with ubq->cancel_lock: - ublk_cancel_cmd(): read and clear io->cmd under cancel_lock, then call io_uring_cmd_done() on the saved local copy outside the lock. - ublk_reset_ch_dev(): hold cancel_lock across ublk_queue_reinit() so that io->cmd and io->flags are cleared atomically with respect to ublk_cancel_cmd(). Fixes: 216c8f5ef0f2 ("ublk: replace monitor with cancelable uring_cmd") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260508123746.242018-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
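The "claim under the lock, complete outside it" pattern from the commit above, sketched in user space. A pthread mutex stands in for the kernel lock, and `struct sketch_io`, `struct sketch_cmd`, `sketch_cancel_cmd()` and `sketch_reset()` are hypothetical names; incrementing `done` stands in for io_uring_cmd_done().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct sketch_cmd {
    int done;                       /* how many times it was completed */
};

struct sketch_io {
    pthread_mutex_t cancel_lock;    /* stands in for ubq->cancel_lock */
    struct sketch_cmd *cmd;
};

/*
 * Read and clear io->cmd under the lock, then complete the saved local
 * copy outside the lock. Returns 1 if this caller completed the command.
 */
int sketch_cancel_cmd(struct sketch_io *io)
{
    struct sketch_cmd *cmd;

    pthread_mutex_lock(&io->cancel_lock);
    cmd = io->cmd;
    io->cmd = NULL;
    pthread_mutex_unlock(&io->cancel_lock);

    if (!cmd)
        return 0;

    cmd->done++;                    /* completion happens exactly once */
    return 1;
}

/* The reset path clears the pointer under the same lock. */
void sketch_reset(struct sketch_io *io)
{
    pthread_mutex_lock(&io->cancel_lock);
    io->cmd = NULL;
    pthread_mutex_unlock(&io->cancel_lock);
}
```

Because both paths take the same lock, the cancel path either sees the live pointer and owns it, or sees NULL and does nothing; it can no longer act on a stale value.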
2026-05-06	ublk: validate physical_bs_shift, io_min_shift and io_opt_shift	Ming Lei
	ublk_validate_params() checks logical_bs_shift is within [9, PAGE_SHIFT] but has no upper bound for physical_bs_shift, io_min_shift, or io_opt_shift. A malicious userspace can set any of these to a large value (e.g., 44), causing undefined behavior from `1 << shift` in ublk_ctrl_start_dev() since the result is stored in 32-bit unsigned int. Cap all three at ilog2(SZ_256M) (28). 256M is big enough to cover all practical block sizes, and originates from the maximum physical block size possible in NVMe (lba_size * (1 + npwg), where npwg is 16-bit). Also zero out ub->params with memset() when copy_from_user() fails or ublk_validate_params() returns error, so that no stale or partial params survive for a subsequent START_DEV to consume. Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260506082238.22363-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
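A sketch of the shift cap described above. The limit of 28 (ilog2 of 256 MiB) comes from the commit message; `validate_shifts()` and `shift_to_bytes()` are hypothetical helpers, not the driver's functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_SHIFT 28   /* ilog2(256 MiB), per the commit message */

/* Reject any shift whose 1 << shift would not fit the 32-bit result. */
int validate_shifts(uint8_t physical_bs_shift, uint8_t io_min_shift,
                    uint8_t io_opt_shift)
{
    if (physical_bs_shift > SKETCH_MAX_SHIFT ||
        io_min_shift > SKETCH_MAX_SHIFT ||
        io_opt_shift > SKETCH_MAX_SHIFT)
        return -EINVAL;
    return 0;
}

/* Only call this on a shift that validate_shifts() has accepted. */
uint32_t shift_to_bytes(uint8_t shift)
{
    return UINT32_C(1) << shift;
}
```

A value such as 44 is refused up front, so the later shift never exceeds the width of the 32-bit destination.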
2026-05-01	ublk: don't issue uring_cmd from fallback task work	Jens Axboe
	When ublk_ch_uring_cmd_cb() runs as fallback task work (e.g., because the submitting task is exiting), the command should not be issued as current is a kworker, not the daemon task. This can cause io->task to capture the wrong task in __ublk_fetch(), leading to a task mismatch warning in ublk_uring_cmd_cancel_fn(). Check tw.cancel and return -ECANCELED instead of issuing the command from fallback context. Fixes: 3421c7f68bba ("ublk: make sure io cmd handled in submitter task context") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260501112312.947327-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-24	Merge tag 'block-7.1-20260424' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block fixes from Jens Axboe: - Series for zloop, fixing a variety of issues - t10-pi code cleanup - Fix for a merge window regression with the bio memory allocation mask - Fix for a merge window regression in ublk, caused by an issue with the maple tree iteration code at teardown - ublk self tests additions - Zoned device pgmap fixes - Various little cleanups and fixes * tag 'block-7.1-20260424' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (21 commits) Revert "floppy: fix reference leak on platform_device_register() failure" ublk: avoid unpinning pages under maple tree spinlock ublk: refactor common helper ublk_shmem_remove_ranges() ublk: fix maple tree lockdep warning in ublk_buf_cleanup selftests: ublk: add ublk auto integrity test selftests: ublk: enable test_integrity_02.sh on fio 3.42 selftests: ublk: remove unused argument to _cleanup block: only restrict bio allocation gfp mask asked to block block/blk-throttle: Add WQ_PERCPU to alloc_workqueue users block: Add WQ_PERCPU to alloc_workqueue users block: relax pgmap check in bio_add_page for compatible zone device pages block: add pgmap check to biovec_phys_mergeable floppy: fix reference leak on platform_device_register() failure ublk: use unchecked copy helpers for bio page data t10-pi: reduce ref tag code duplication zloop: remove irq-safe locking zloop: factor out zloop_mark_{full,empty} helpers zloop: set RQF_QUIET when completing requests on deleted devices zloop: improve the unaligned write pointer warning zloop: use vfs_truncate ...
2026-04-24	Merge tag 'ceph-for-7.1-rc1' of https://github.com/ceph/ceph-client	Linus Torvalds
	Pull ceph updates from Ilya Dryomov:
	"We have a series from Alex which extends CephFS client metrics with support for per-subvolume data I/O performance and latency tracking (metadata operations aren't included) and a good variety of fixes and cleanups across RBD and CephFS"
	* tag 'ceph-for-7.1-rc1' of https://github.com/ceph/ceph-client:
	  ceph: add subvolume metrics collection and reporting
	  ceph: parse subvolume_id from InodeStat v9 and store in inode
	  ceph: handle InodeStat v8 versioned field in reply parsing
	  libceph: Fix slab-out-of-bounds access in auth message processing
	  rbd: fix null-ptr-deref when device_add_disk() fails
	  crush: cleanup in crush_do_rule() method
	  ceph: clear s_cap_reconnect when ceph_pagelist_encode_32() fails
	  ceph: only d_add() negative dentries when they are unhashed
	  libceph: update outdated comment in ceph_sock_write_space()
	  libceph: Remove obsolete session key alignment logic
	  ceph: fix num_ops off-by-one when crypto allocation fails
	  libceph: Prevent potential null-ptr-deref in ceph_handle_auth_reply()
2026-04-23	Revert "floppy: fix reference leak on platform_device_register() failure"	Jens Axboe
	This reverts commit e784f2ea0b4fd0e7b70028ff8218f22456c5dcf8. Jiri says the patch is buggy, and it looks like he is right. Revert it for now. Link: https://lore.kernel.org/linux-block/897f442d-4e04-4b70-b716-38fd10b8af36@kernel.org/ Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-23	ublk: avoid unpinning pages under maple tree spinlock	Ming Lei
	ublk_shmem_remove_ranges() calls unpin_user_pages() while holding the maple tree spinlock (mas_lock). Although unpin_user_pages() is safe in atomic context, holding the spinlock across potentially many page unpinning operations is not ideal. Split into __ublk_shmem_remove_ranges() which erases up to 64 ranges under mas_lock, collecting base_pfn and nr_pages into a temporary xarray. Then drop the lock and unpin pages outside spinlock context. ublk_shmem_remove_ranges() loops until all matching ranges are processed. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260423033058.2805135-4-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
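The batching scheme described above can be modelled in user-space C as a rough sketch. Everything here is an assumption made for illustration: a fixed array guarded by a pthread mutex stands in for the maple tree and `mas_lock`, a plain stack array stands in for the temporary xarray, and `unpin_range()` stands in for `unpin_user_pages()`. Only the batch size of 64, the erase-under-lock then unpin-outside-lock ordering, the outer loop, and the `buf_index < 0` "match all" convention are taken from the commit messages.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX  64	/* batch size named in the commit message */
#define MAX_RANGES 256	/* hypothetical capacity of the stand-in table */

/* Hypothetical stand-in for one registered buffer range. */
struct range {
	bool in_use;
	int buf_index;
	unsigned long base_pfn;
	unsigned long nr_pages;
};

/* Hypothetical stand-in for the tree plus its lock. */
struct range_table {
	pthread_mutex_t lock;
	struct range slots[MAX_RANGES];
};

static unsigned long pages_unpinned;	/* work done outside the lock */

/* Stand-in for the unpin step; deliberately called with the lock dropped. */
static void unpin_range(const struct range *r)
{
	pages_unpinned += r->nr_pages;
}

/*
 * Erase up to BATCH_MAX matching ranges while holding the lock and copy
 * them into @batch. A negative @buf_index matches every range.
 */
static size_t remove_batch_locked(struct range_table *t, int buf_index,
				  struct range *batch)
{
	size_t n = 0;

	pthread_mutex_lock(&t->lock);
	for (size_t i = 0; i < MAX_RANGES && n < BATCH_MAX; i++) {
		struct range *r = &t->slots[i];

		if (!r->in_use)
			continue;
		if (buf_index >= 0 && r->buf_index != buf_index)
			continue;
		batch[n++] = *r;	/* remember what to unpin later */
		r->in_use = false;	/* erase from the table */
	}
	pthread_mutex_unlock(&t->lock);
	assert(n <= BATCH_MAX);
	return n;
}

/* Loop until no matching range is left; returns the number removed. */
static size_t remove_ranges(struct range_table *t, int buf_index)
{
	struct range batch[BATCH_MAX];
	size_t total = 0, n;

	do {
		n = remove_batch_locked(t, buf_index, batch);
		for (size_t i = 0; i < n; i++)
			unpin_range(&batch[i]);	/* lock is not held here */
		total += n;
	} while (n == BATCH_MAX);	/* a short batch means the scan finished */
	return total;
}
```

Bounding each batch keeps the lock hold time proportional to at most 64 erases, regardless of how many pages each range covers.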
2026-04-23	ublk: refactor common helper ublk_shmem_remove_ranges()	Ming Lei
	Extract the shared walk+erase+unpin+kfree loop into ublk_shmem_remove_ranges(). When buf_index >= 0, only ranges matching that index are removed; when buf_index < 0, all ranges are removed. Also extract ublk_unpin_range_pages() to share the page unpinning loop. Convert both __ublk_ctrl_unreg_buf() and ublk_buf_cleanup() to use the new helper. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260423033058.2805135-3-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-23	ublk: fix maple tree lockdep warning in ublk_buf_cleanup	Ming Lei
	ublk_buf_cleanup() iterates the maple tree with mas_for_each() without holding mas_lock, triggering a lockdep splat on CONFIG_PROVE_RCU kernels since mas_find() internally uses rcu_dereference_check() which requires either RCU or the tree lock. Fix by holding mas_lock around the iteration, and call mas_erase() before freeing each range to avoid dangling pointers in the tree. Fixes: 5e864438e285 ("ublk: replace xarray with IDA for shmem buffer index allocation") Reported-by: Jens Axboe <axboe@kernel.dk> Closes: https://lore.kernel.org/linux-block/0349d72d-dff8-4f9f-b448-919fa5ae96da@kernel.dk/ Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260423033058.2805135-2-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
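The two rules in this fix (iterate only with the lock held, and erase an entry from the index before freeing it) can be shown with a small user-space sketch. The types and names below are hypothetical: an array of pointers guarded by a pthread mutex stands in for the maple tree and `mas_lock`, and `free()` stands in for freeing a range.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_SLOTS 8	/* hypothetical size of the stand-in index */

/* Hypothetical stand-in for the tree and its internal lock. */
struct range_index {
	pthread_mutex_t lock;
	int *slots[NR_SLOTS];
};

/*
 * Walk every slot with the lock held. Each entry is unlinked from the
 * index first and freed second, so the index never holds a pointer to
 * freed memory. Returns the number of entries released.
 */
static size_t index_cleanup(struct range_index *idx)
{
	size_t freed = 0;

	assert(idx);
	pthread_mutex_lock(&idx->lock);
	for (size_t i = 0; i < NR_SLOTS; i++) {
		int *entry = idx->slots[i];

		if (!entry)
			continue;
		idx->slots[i] = NULL;	/* erase from the index first */
		free(entry);		/* then free the entry */
		freed++;
	}
	pthread_mutex_unlock(&idx->lock);
	return freed;
}
```

A second call finds only empty slots, which is the property the erase-before-free ordering is meant to guarantee.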