linux.git/block, branch v7.2-rc1

block: handle REQ_OP_ZONE_APPEND in __bio_integrity_action

2026-06-24T12:53:25+00:00

Otherwise zone append commands will miss their integrity data.  While
this works "fine" for auto-PI, it breaks file system PI and non-PI
metadata.

With this, XFS on a ZNS namespace works with non-PI metadata and with
512-byte sector formats with PI, while 4k sector formats with PI work
only when Caleb's "block: fix integrity offset/length conversions" is
applied as well.

Note that unlike regular writes, zone append does not need remapping as
partitions are not supported on zoned block devices.

Fixes: df3c485e0e60 ("block: switch on bio operation in bio_integrity_prep")
Signed-off-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Link: https://patch.msgid.link/20260624080014.1998650-3-hch@lst.de
Signed-off-by: Jens Axboe

block: fix GFP_ flags confusion in bio_integrity_alloc_buf

2026-06-24T12:53:25+00:00

bio_integrity_alloc_buf's usage of GFP_ flags is messed up.  For one it
mixes GFP_NOFS and GFP_NOIO for neighbouring allocations, but it also
makes the allocations fail more often than needed.  That code was copied
from bio_alloc_bioset, which needs to do that so that it can punt to the
rescuer workqueue, but none of that is needed for the integrity
allocations, which either sit in the file system or at the very bottom
of the I/O stack.  Failing early means we'll do a fully waiting
allocation from the mempool ->alloc callback, and that allocation is
usually much larger than required.

Fix this by passing a gfp_t so that the file system path can pass
GFP_NOFS and the auto-integrity code can pass GFP_NOIO, and don't
modify the allocation type except for disabling warnings.
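
As a rough userspace illustration of that calling convention (not the
kernel code): the flag values and the helper name below are invented for
the sketch.  Only the idea of passing the caller's gfp_t through and
OR-ing in a no-warn bit is taken from this message.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel GFP bits; the values are invented. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_RECLAIM 0x4u
#define MODEL_GFP_NOWARN  0x8u
/* No I/O and no FS recursion allowed. */
#define MODEL_GFP_NOIO    (MODEL_GFP_RECLAIM)
/* I/O allowed, FS recursion not allowed. */
#define MODEL_GFP_NOFS    (MODEL_GFP_RECLAIM | MODEL_GFP_IO)

/*
 * Assumed helper: keep the caller's allocation context untouched and
 * only disable allocation-failure warnings.
 */
static model_gfp_t integrity_buf_gfp(model_gfp_t caller_gfp)
{
	return caller_gfp | MODEL_GFP_NOWARN;
}
```

In this model the file system path would call
integrity_buf_gfp(MODEL_GFP_NOFS) and the auto-integrity path would call
integrity_buf_gfp(MODEL_GFP_NOIO); neither loses its reclaim bit.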

Fixes: ec7f31b2a2d3 ("block: make bio auto-integrity deadlock safe")
Signed-off-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
Link: https://patch.msgid.link/20260624080014.1998650-2-hch@lst.de
Signed-off-by: Jens Axboe

block, bfq: don't grab queue_lock to initialize bfq

2026-06-24T12:42:31+00:00

The request_queue is frozen and quiesced while the elevator init_sched()
method runs, so queue_lock is not needed for BFQ cgroup initialization.

Signed-off-by: Yu Kuai 
Link: https://patch.msgid.link/1965073ea20f33114a8d903816b986e483b9bb34.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe

blk-cgroup: don't nest queue_lock under blkcg->lock in blkcg_destroy_blkgs()

2026-06-24T12:42:31+00:00

The correct lock order is q->queue_lock before blkcg->lock.  To avoid a
deadlock, blkcg_destroy_blkgs() uses trylock for q->queue_lock while
blkcg->lock is already held, which is hacky.

Refactor blkcg_destroy_blkgs() to hold blkcg->lock only long enough to
get the first blkg and then release it. Then take q->queue_lock and
blkcg->lock in the correct order to destroy the blkg. This is a very cold
path, so the extra lock/unlock cycles are acceptable.
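
The loop shape described above can be modelled in userspace roughly as
follows.  This is a sketch under assumptions, not the patch itself: the
model_* types are hypothetical, the "locks" only record ownership so the
ordering can be asserted, and the reference pinning a real
implementation needs between the two steps is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Instrumented stand-ins for spinlocks: they only record ownership. */
struct model_lock { int held; };

struct model_queue { struct model_lock queue_lock; };

struct model_blkg {
	struct model_queue *q;
	struct model_blkg *next;	/* link in the owning blkcg's list */
};

struct model_blkcg {
	struct model_lock lock;
	struct model_blkg *blkg_list;
};

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Returns the number of blkgs destroyed. */
static int model_destroy_blkgs(struct model_blkcg *blkcg)
{
	int destroyed = 0;

	for (;;) {
		struct model_blkg *blkg;
		struct model_queue *q;

		/* Step 1: hold blkcg->lock only long enough to peek. */
		model_lock_acquire(&blkcg->lock);
		blkg = blkcg->blkg_list;
		q = blkg ? blkg->q : NULL;
		model_lock_release(&blkcg->lock);
		if (!blkg)
			break;

		/* Step 2: queue_lock first, then blkcg->lock. */
		assert(!blkcg->lock.held);
		model_lock_acquire(&q->queue_lock);
		model_lock_acquire(&blkcg->lock);
		/* Re-check: the list may have changed while unlocked. */
		if (blkcg->blkg_list == blkg) {
			blkcg->blkg_list = blkg->next;
			blkg->next = NULL;
			destroyed++;
		}
		model_lock_release(&blkcg->lock);
		model_lock_release(&q->queue_lock);
	}
	return destroyed;
}
```

Each iteration costs one extra lock/unlock of blkcg->lock compared to
the trylock variant, which is the trade-off accepted for this cold path.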

This also prepares for protecting blkcg with blkcg_mutex instead of
queue_lock.

Signed-off-by: Yu Kuai 
Link: https://patch.msgid.link/00b03cf74a9937cb4d6dd67a189ddc00a3de0451.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe

blk-cgroup: don't nest queue_lock under rcu in bio_associate_blkg()

2026-06-24T12:42:31+00:00

If a bio is already associated with a blkg, the blkcg is already pinned
until the bio is done, so there is no need for RCU protection. Otherwise,
protect blkcg_css() with RCU independently. Prepare to protect blkcg with
blkcg_mutex instead of queue_lock.

Signed-off-by: Yu Kuai 
Link: https://patch.msgid.link/8496fa234b21d4b31b7f068766906d0bffcac8e6.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe

blk-cgroup: don't nest queue_lock under rcu in blkg_lookup_create()

2026-06-24T12:42:31+00:00

Change this in two steps:

1) hold rcu lock and do blkg_lookup() from fast path;
2) hold queue_lock directly from slow path, and don't nest it under rcu
   lock;

Prepare to protect blkcg with blkcg_mutex instead of queue_lock.
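
The two steps can be sketched as a small userspace model.  Everything
named lc_* is hypothetical; RCU and queue_lock are reduced to counters
so that the "never take queue_lock inside an RCU read section" property
can be asserted.

```c
#include <assert.h>

struct lc_state {
	int rcu_depth;		/* >0 while inside a modelled RCU read section */
	int queue_lock_held;
	int blkg_present;	/* 1 once the blkg exists */
	int slow_path_taken;	/* how often the slow path ran */
};

static void lc_rcu_read_lock(struct lc_state *s)
{
	s->rcu_depth++;
}

static void lc_rcu_read_unlock(struct lc_state *s)
{
	assert(s->rcu_depth > 0);
	s->rcu_depth--;
}

static void lc_queue_lock(struct lc_state *s)
{
	/* The property of interest: never nested under RCU. */
	assert(s->rcu_depth == 0);
	assert(!s->queue_lock_held);
	s->queue_lock_held = 1;
}

static void lc_queue_unlock(struct lc_state *s)
{
	assert(s->queue_lock_held);
	s->queue_lock_held = 0;
}

/* Returns 1 if a blkg is available on return. */
static int lc_lookup_create(struct lc_state *s)
{
	int found;

	/* 1) fast path: lookup under RCU only */
	lc_rcu_read_lock(s);
	found = s->blkg_present;
	lc_rcu_read_unlock(s);
	if (found)
		return 1;

	/* 2) slow path: take queue_lock directly, outside RCU */
	lc_queue_lock(s);
	s->slow_path_taken++;
	if (!s->blkg_present)	/* re-check under the lock */
		s->blkg_present = 1;
	lc_queue_unlock(s);
	return 1;
}
```

The first call in this model goes through the slow path and creates the
blkg; later calls are served by the fast path alone.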

Signed-off-by: Yu Kuai 
Link: https://patch.msgid.link/93f33cc9e5a39dddb78dcd934d0c1d04b564fb00.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe

blk-cgroup: don't nest queue_lock under rcu in blkcg_print_blkgs()

2026-06-24T12:42:19+00:00

With the previous change to delay freeing policy data until after an RCU
grace period, prfill() can run under RCU instead of taking queue_lock. However,
policy teardown can still clear blkg->pd[plid] after blkcg_print_blkgs()
observes the policy enabled bit.

Load policy data once with READ_ONCE() and skip the blkg if teardown
already cleared it. Do the same in recursive stat walks for descendant
blkgs. Remove the stale BFQ debug queue_lock assertion because
blkcg_print_blkgs() no longer calls prfill() with queue_lock held. This
also lets ioc_qos_prfill() and ioc_cost_model_prfill() use IRQ-safe
ioc->lock locking without re-enabling IRQs while queue_lock is still held.
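
A minimal userspace sketch of the "load once, skip if cleared" pattern,
using a C11 relaxed atomic load as a stand-in for READ_ONCE().  The pd_*
names and the summing loop are assumptions made for illustration and do
not mirror the real prfill() callbacks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PD_MAX_POLICIES 4

struct pd_data { int value; };

struct pd_blkg {
	/* A slot may be cleared to NULL concurrently by policy teardown. */
	_Atomic(struct pd_data *) pd[PD_MAX_POLICIES];
};

/* Sum values over blkgs, skipping any whose slot was already cleared. */
static int pd_print_blkgs(struct pd_blkg *blkgs, size_t n, int plid)
{
	int total = 0;

	for (size_t i = 0; i < n; i++) {
		/* Single load, analogous in spirit to READ_ONCE(). */
		struct pd_data *pd =
			atomic_load_explicit(&blkgs[i].pd[plid],
					     memory_order_relaxed);

		if (!pd)
			continue;	/* teardown already cleared it; skip */
		total += pd->value;	/* only the local copy is used */
	}
	return total;
}
```

Reading the slot a second time after the NULL check would reopen the
race, which is why the pointer is kept in a local variable.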

Signed-off-by: Yu Kuai 
Link: https://patch.msgid.link/db7633d5e263dd1c2bf9b901762545a84b7d714e.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe

blk-cgroup: delay freeing policy data after rcu grace period

2026-06-24T12:37:54+00:00

Currently blkcg_print_blkgs() must hold RCU to iterate blkgs from a
blkcg, and prfill() must hold queue_lock to prevent policy data from
being freed by policy deactivation. As a consequence, queue_lock has to
be nested under RCU from blkcg_print_blkgs().

Delay freeing policy data until after an RCU grace period so prfill() can
be protected by RCU alone.
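
The idea of deferring the free can be modelled in userspace like this.
It is a simplified sketch, not how RCU is implemented: the gp_* names
are hypothetical, readers are a plain counter, and the "grace period" is
an explicit call made once no reader remains.

```c
#include <assert.h>
#include <stdlib.h>

struct gp_pd {
	int value;
	struct gp_pd *next_deferred;
};

struct gp_state {
	int readers;		/* active modelled RCU readers */
	struct gp_pd *deferred;	/* objects waiting for a grace period */
	int freed;		/* number of objects actually freed */
};

/* Instead of freeing right away, queue the object (call_rcu()-like). */
static void gp_defer_free(struct gp_state *s, struct gp_pd *pd)
{
	pd->next_deferred = s->deferred;
	s->deferred = pd;
}

/* A grace period has elapsed once no pre-existing reader remains. */
static void gp_grace_period_elapsed(struct gp_state *s)
{
	assert(s->readers == 0);
	while (s->deferred) {
		struct gp_pd *pd = s->deferred;

		s->deferred = pd->next_deferred;
		free(pd);
		s->freed++;
	}
}
```

A reader that entered before gp_defer_free() can keep dereferencing the
object until it leaves, without holding any lock.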

Signed-off-by: Yu Kuai 
Link: https://patch.msgid.link/e20e5d984b41a026d61851966bed35eb094c4bff.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe

blk-cgroup: protect iterating blkgs with blkcg->lock in blkcg_print_stat()

2026-06-24T12:37:54+00:00

blkcg_print_one_stat() will be called for each blkg:
- access blkg->iostat, which is freed from rcu callback
  blkg_free_workfn();
- access policy data from pd_stat_fn(), which is freed from
  pd_free_fn(), while pd_free_fn() can be called by removing blkcg or
  deactivating policy;

Take blkcg->lock while iterating so the blkgs stay online and both
blkg->iostat and policy data for activated policies stay valid.  Use
irq-safe locking because blkcg->lock can be nested under q->queue_lock,
which is used from IRQ completion paths.

Prepare to protect a request_queue's blkgs with a mutex.

Signed-off-by: Yu Kuai 
Link: https://patch.msgid.link/05799877e720dcd300e2ddd4625e8e162959d7cc.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe

blk-cgroup: defer blkcg css_put until blkg is unlinked from queue

2026-06-22T21:59:53+00:00

[BUG]
Our fuzz testing triggered a blkcg use-after-free issue:

  BUG: KASAN: slab-use-after-free in _raw_spin_lock+0x75/0xe0
  Call Trace:
  ...
  blkcg_deactivate_policy+0x244/0x4d0
  ioc_rqos_exit+0x44/0xe0
  rq_qos_exit+0xba/0x120
  __del_gendisk+0x50b/0x800
  del_gendisk+0xff/0x190
  ...

[CAUSE]
process1						process2
cgroup_rmdir
...
  css_killed_work_fn
    offline_css
    ...
      blkcg_destroy_blkgs
      ...
        __blkg_release
	  css_put(&blkg->blkcg->css)
          blkg_free
	    INIT_WORK(xxx, blkg_free_workfn)
	    schedule_work
    css_put
    ...
      blkcg_css_free
        kfree(blkcg)--------blkcg has been freed!!!
====================================schedule_work
              blkg_free_workfn
							__del_gendisk
							  rq_qos_exit
							    ioc_rqos_exit
							      blkcg_deactivate_policy
							        mutex_lock(&q->blkcg_mutex)
								spin_lock_irq(&q->queue_lock)
							        list_for_each_entry(blkg, xxx)
								  blkcg = blkg->blkcg
								  spin_lock(&blkcg->lock)-------UAF!!!
	        mutex_lock(&q->blkcg_mutex)
	        spin_lock_irq(&q->queue_lock)
	        /* Only then is the blkg removed from the list */
	        list_del_init(&blkg->q_node)

As a result, a blkg can still be reachable through q->blkg_list while
its ->blkcg has already been freed.

[FIX]
Fix this by deferring the blkcg css_put() until after the blkg has been
unlinked from q->blkg_list in blkg_free_workfn(). This ensures that the
blkcg outlives every blkg still reachable through q->blkg_list, so any
iterator holding q->queue_lock is guaranteed to observe a valid
blkg->blkcg.

While at it, move css_tryget_online() from blkg_create() into blkg_alloc()
so that the css reference is owned by the alloc/free pair rather than
straddling layers:
blkg_alloc()  <-> blkg_free()
blkg_create() <-> blkg_destroy()
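
The resulting ownership rule can be illustrated with a tiny userspace
reference-count model.  The ref_* names are hypothetical, and the queue
list is reduced to a flag; the sketch only shows that a blkg still on
the list always sees a live blkcg.

```c
#include <assert.h>
#include <stddef.h>

struct ref_blkcg { int refcnt; int freed; };

struct ref_blkg {
	struct ref_blkcg *blkcg;
	int on_queue_list;	/* still reachable via q->blkg_list */
};

static void ref_css_put(struct ref_blkcg *blkcg)
{
	assert(blkcg->refcnt > 0);
	if (--blkcg->refcnt == 0)
		blkcg->freed = 1;	/* stands in for kfree(blkcg) */
}

/* The css reference is taken at allocation time ... */
static void ref_blkg_alloc(struct ref_blkg *blkg, struct ref_blkcg *blkcg)
{
	blkcg->refcnt++;
	blkg->blkcg = blkcg;
	blkg->on_queue_list = 0;
}

/* ... creation links the blkg into the queue list ... */
static void ref_blkg_create(struct ref_blkg *blkg)
{
	blkg->on_queue_list = 1;
}

/* ... and the free work drops the reference only after unlinking. */
static void ref_blkg_free_workfn(struct ref_blkg *blkg)
{
	blkg->on_queue_list = 0;	/* list_del_init(&blkg->q_node) */
	ref_css_put(blkg->blkcg);	/* deferred css_put() */
}

/* What a queue iterator relies on: a listed blkg has a live blkcg. */
static int ref_iterator_sees_valid_blkcg(const struct ref_blkg *blkg)
{
	return !blkg->on_queue_list || !blkg->blkcg->freed;
}
```

With the old ordering, the put would happen before the unlink and the
iterator check above could fail between the two steps.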

Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Suggested-by: Hou Tao 
Signed-off-by: Zizhi Wo 
Reviewed-by: Yu Kuai 
Reviewed-by: Tang Yizhou 
Link: https://patch.msgid.link/20260616011746.2451461-1-wozizhi@huaweicloud.com
Signed-off-by: Jens Axboe