<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/block/blk.h, branch linux-4.9.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>blk-mq: remove -&gt;map_queue</title>
<updated>2016-09-15T14:42:03+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2016-09-14T14:18:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7d7e0f90b70f6c5367c2d1c9a7e87dd228bd0816'/>
<id>7d7e0f90b70f6c5367c2d1c9a7e87dd228bd0816</id>
<content type='text'>
All drivers use the default, so provide an inline version of it.  If we
ever need another queue mapping we can add an optional method back,
although supporting it will also require major changes to the queue
setup code.

This provides better code generation, and better debuggability as well.
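
A minimal sketch of the resulting inline helper, assuming the mq_map /
queue_hw_ctx layout used in this tree:

  static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
                                                        int cpu)
  {
          /* the default mapping every driver used, now hard-coded */
          return q-&gt;queue_hw_ctx[q-&gt;mq_map[cpu]];
  }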

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Keith Busch &lt;keith.busch@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: simplify and export blk_rq_append_bio</title>
<updated>2016-07-20T23:38:32+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2016-07-19T09:31:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=98d61d5b1a65a9df7cb3d9605f5d37d3dbbb4b5e'/>
<id>98d61d5b1a65a9df7cb3d9605f5d37d3dbbb4b5e</id>
<content type='text'>
The target SCSI passthrough backend is much better served with the low-level
blk_rq_append_bio construct than the helpers built on top of it, so export it.

Also use the opportunity to remove the pointless request_queue argument and
make the code flow a little more readable.
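
Roughly, the exported helper becomes (a sketch; note it now takes the
queue from rq-&gt;q instead of a separate argument):

  int blk_rq_append_bio(struct request *rq, struct bio *bio)
  {
          if (!rq-&gt;bio) {
                  blk_rq_bio_prep(rq-&gt;q, rq, bio);
          } else {
                  if (!ll_back_merge_fn(rq-&gt;q, rq, bio))
                          return -EINVAL;
                  rq-&gt;biotail-&gt;bi_next = bio;
                  rq-&gt;biotail = bio;
                  rq-&gt;__data_len += bio-&gt;bi_iter.bi_size;
          }
          return 0;
  }
  EXPORT_SYMBOL(blk_rq_append_bio);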

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: defer timeouts to a workqueue</title>
<updated>2015-12-22T16:38:16+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2015-10-30T12:57:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=287922eb0b186e2a5bf54fdd04b734c25c90035c'/>
<id>287922eb0b186e2a5bf54fdd04b734c25c90035c</id>
<content type='text'>
Timer context is not very useful for drivers to perform any meaningful abort
action from.  So instead of calling the driver from this useless context,
defer it to a workqueue as soon as possible.

Note that while a delayed_work item would seem the right thing here I didn't
dare to use it due to the magic in blk_add_timer that pokes deep into timer
internals.  But maybe this encourages Tejun to add a sensible API for that to
the workqueue API and we'll all be fine in the end :)

Contains a major update from Keith Busch:

"This patch removes synchronizing the timeout work so that the timer can
 start a freeze on its own queue. The timer enters the queue, so timer
 context can only start a freeze, but not wait for frozen."
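
The timer handler itself is reduced to kicking the work item; roughly
(a sketch, assuming the timeout_work member added to the request_queue
here):

  static void blk_rq_timed_out_timer(unsigned long data)
  {
          struct request_queue *q = (struct request_queue *)data;

          /* punt to process context; the actual timeout handling now
           * runs from the workqueue
           */
          kblockd_schedule_work(&amp;q-&gt;timeout_work);
  }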

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Keith Busch &lt;keith.busch@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: protect rw_page against device teardown</title>
<updated>2015-11-19T21:47:10+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2015-11-19T21:29:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2e6edc95382cc36423aff18a237173ad62d5ab52'/>
<id>2e6edc95382cc36423aff18a237173ad62d5ab52</id>
<content type='text'>
Fix use-after-free crashes like the following:

 general protection fault: 0000 [#1] SMP
 Call Trace:
  [&lt;ffffffffa0050216&gt;] ? pmem_do_bvec.isra.12+0xa6/0xf0 [nd_pmem]
  [&lt;ffffffffa0050ba2&gt;] pmem_rw_page+0x42/0x80 [nd_pmem]
  [&lt;ffffffff8128fd90&gt;] bdev_read_page+0x50/0x60
  [&lt;ffffffff812972f0&gt;] do_mpage_readpage+0x510/0x770
  [&lt;ffffffff8128fd20&gt;] ? I_BDEV+0x20/0x20
  [&lt;ffffffff811d86dc&gt;] ? lru_cache_add+0x1c/0x50
  [&lt;ffffffff81297657&gt;] mpage_readpages+0x107/0x170
  [&lt;ffffffff8128fd20&gt;] ? I_BDEV+0x20/0x20
  [&lt;ffffffff8128fd20&gt;] ? I_BDEV+0x20/0x20
  [&lt;ffffffff8129058d&gt;] blkdev_readpages+0x1d/0x20
  [&lt;ffffffff811d615f&gt;] __do_page_cache_readahead+0x28f/0x310
  [&lt;ffffffff811d6039&gt;] ? __do_page_cache_readahead+0x169/0x310
  [&lt;ffffffff811c5abd&gt;] ? pagecache_get_page+0x2d/0x1d0
  [&lt;ffffffff811c76f6&gt;] filemap_fault+0x396/0x530
  [&lt;ffffffff811f816e&gt;] __do_fault+0x4e/0xf0
  [&lt;ffffffff811fce7d&gt;] handle_mm_fault+0x11bd/0x1b50

Cc: &lt;stable@vger.kernel.org&gt;
Cc: Jens Axboe &lt;axboe@fb.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Reported-by: kbuild test robot &lt;lkp@intel.com&gt;
Acked-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
[willy: symmetry fixups]
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.4/integrity' of git://git.kernel.dk/linux-block</title>
<updated>2015-11-05T04:51:48+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-05T04:51:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=527d1529e38b36fd22e65711b653ab773179d9e8'/>
<id>527d1529e38b36fd22e65711b653ab773179d9e8</id>
<content type='text'>
Pull block integrity updates from Jens Axboe:
 ""This is the joint work of Dan and Martin, cleaning up and improving
  the support for block data integrity"

* 'for-4.4/integrity' of git://git.kernel.dk/linux-block:
  block, libnvdimm, nvme: provide a built-in blk_integrity nop profile
  block: blk_flush_integrity() for bio-based drivers
  block: move blk_integrity to request_queue
  block: generic request_queue reference counting
  nvme: suspend i/o during runtime blk_integrity_unregister
  md: suspend i/o during runtime blk_integrity_unregister
  md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown
  block: Inline blk_integrity in struct gendisk
  block: Export integrity data interval size in sysfs
  block: Reduce the size of struct blk_integrity
  block: Consolidate static integrity profile properties
  block: Move integrity kobject to struct gendisk
</content>
</entry>
<entry>
<title>block: fix plug list flushing for nomerge queues</title>
<updated>2015-10-21T21:00:48+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2015-10-20T15:13:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0809e3ac62319dc7534b64f95ac37e230d740e8a'/>
<id>0809e3ac62319dc7534b64f95ac37e230d740e8a</id>
<content type='text'>
Request queues with merging disabled will not flush the plug list after
BLK_MAX_REQUEST_COUNT requests have been queued, since the code relies
on blk_attempt_plug_merge to compute the request_count.  Fix this by
computing the number of queued requests even for nomerge queues.
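
The fix in the submission path looks roughly like this (a sketch, with
blk_plug_queued_count() being the new helper that walks the plug list):

  /* still count plugged requests when merging is disabled */
  if (!blk_queue_nomerges(q)) {
          if (blk_attempt_plug_merge(q, bio, &amp;request_count, NULL))
                  return;
  } else
          request_count = blk_plug_queued_count(q);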

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: blk_flush_integrity() for bio-based drivers</title>
<updated>2015-10-21T20:43:44+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2015-10-21T17:20:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5a48fc147d7f2718a5c7e73bc8c4067235791fc1'/>
<id>5a48fc147d7f2718a5c7e73bc8c4067235791fc1</id>
<content type='text'>
Since they lack requests to pin the request_queue active, synchronous
bio-based drivers may have in-flight integrity work from
bio_integrity_endio() that is not flushed by blk_freeze_queue().  Flush
that work to prevent races between freeing the queue and the final use
of the blk_integrity profile.

This is temporary unless/until bio-based drivers start to generically
take a q_usage_counter reference while a bio is in-flight.
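
The flush itself is a one-liner over the integrity workqueue (a sketch,
assuming the kintegrityd workqueue used by bio-integrity):

  void blk_flush_integrity(void)
  {
          /* drain any bio_integrity_endio() work still in flight */
          flush_workqueue(kintegrityd_wq);
  }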

Cc: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
[martin: fix the CONFIG_BLK_DEV_INTEGRITY=n case]
Tested-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: generic request_queue reference counting</title>
<updated>2015-10-21T20:43:41+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2015-10-21T17:20:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3ef28e83ab15799742e55fd13243a5f678b04242'/>
<id>3ef28e83ab15799742e55fd13243a5f678b04242</id>
<content type='text'>
Allow pmem, and other synchronous/bio-based block drivers, to fall back
on a per-cpu reference count managed by the core for tracking queue
live/dead state.

The existing per-cpu reference count for the blk_mq case is promoted to
be used in all block i/o scenarios.  This involves initializing it by
default, waiting for it to drop to zero at exit, and holding a live
reference over the invocation of q-&gt;make_request_fn() in
generic_make_request().  The blk_mq code continues to take its own
reference per blk_mq request and retains the ability to freeze the
queue, but the check that the queue is frozen is moved to
generic_make_request().
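
Conceptually the bio path now does something like the following sketch
(the enter/exit pair wraps a per-cpu reference on q_usage_counter):

  if (percpu_ref_tryget_live(&amp;q-&gt;q_usage_counter)) {
          q-&gt;make_request_fn(q, bio);
          percpu_ref_put(&amp;q-&gt;q_usage_counter);
  } else {
          /* queue is dying or frozen */
          bio_io_error(bio);
  }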

This fixes crash signatures like the following:

 BUG: unable to handle kernel paging request at ffff880140000000
 [..]
 Call Trace:
  [&lt;ffffffff8145e8bf&gt;] ? copy_user_handle_tail+0x5f/0x70
  [&lt;ffffffffa004e1e0&gt;] pmem_do_bvec.isra.11+0x70/0xf0 [nd_pmem]
  [&lt;ffffffffa004e331&gt;] pmem_make_request+0xd1/0x200 [nd_pmem]
  [&lt;ffffffff811c3162&gt;] ? mempool_alloc+0x72/0x1a0
  [&lt;ffffffff8141f8b6&gt;] generic_make_request+0xd6/0x110
  [&lt;ffffffff8141f966&gt;] submit_bio+0x76/0x170
  [&lt;ffffffff81286dff&gt;] submit_bh_wbc+0x12f/0x160
  [&lt;ffffffff81286e62&gt;] submit_bh+0x12/0x20
  [&lt;ffffffff813395bd&gt;] jbd2_write_superblock+0x8d/0x170
  [&lt;ffffffff8133974d&gt;] jbd2_mark_journal_empty+0x5d/0x90
  [&lt;ffffffff813399cb&gt;] jbd2_journal_destroy+0x24b/0x270
  [&lt;ffffffff810bc4ca&gt;] ? put_pwq_unlocked+0x2a/0x30
  [&lt;ffffffff810bc6f5&gt;] ? destroy_workqueue+0x225/0x250
  [&lt;ffffffff81303494&gt;] ext4_put_super+0x64/0x360
  [&lt;ffffffff8124ab1a&gt;] generic_shutdown_super+0x6a/0xf0

Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block</title>
<updated>2015-09-11T01:56:14+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-11T01:56:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b0a1ea51bda4c2bcdde460221e1772f3a4f8c44f'/>
<id>b0a1ea51bda4c2bcdde460221e1772f3a4f8c44f</id>
<content type='text'>
Pull blk-cg updates from Jens Axboe:
 "A bit later in the cycle, but this has been in the block tree for a a
  while.  This is basically four patchsets from Tejun, that improve our
  buffered cgroup writeback.  It was dependent on the other cgroup
  changes, but they went in earlier in this cycle.

  Series 1 is a set of 5 patches that has cgroup writeback updates:

   - bdi_writeback iteration fix which could lead to some wb's being
     skipped or repeated during e.g. sync under memory pressure.

   - Simplification of wb work wait mechanism.

   - Writeback tracepoints updated to report cgroup.

  Series 2 is a set of updates for the CFQ cgroup writeback handling:

     cfq has always charged all async IOs to the root cgroup.  It didn't
     have much choice as writeback didn't know about cgroups and there
     was no way to tell who to blame for a given writeback IO.
     writeback finally grew support for cgroups and now tags each
     writeback IO with the appropriate cgroup to charge it against.

     This patchset updates cfq so that it follows the blkcg each bio is
     tagged with.  Async cfq_queues are now shared across cfq_group,
     which is per-cgroup, instead of per-request_queue cfq_data.  This
     makes all IOs follow the weight based IO resource distribution
     implemented by cfq.

     - Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff.

     - Other misc review points addressed, acks added and rebased.

  Series 3 is the blkcg policy cleanup patches:

     This patchset contains assorted cleanups for blkcg_policy methods
     and blk[c]g_policy_data handling.

     - alloc/free added for blkg_policy_data.  exit dropped.

     - alloc/free added for blkcg_policy_data.

     - blk-throttle's async percpu allocation is replaced with direct
       allocation.

     - all methods now take blk[c]g_policy_data instead of blkcg_gq or
       blkcg.

  And finally, series 4 is a set of patches cleaning up the blkcg stats
  handling:

    blkcg's stats have always been somewhat of a mess.  This patchset
    tries to improve the situation a bit.

     - The following patches were added to consolidate the blkcg entry
       point and blkg creation.  This in itself is an improvement and
       helps collect common stats on bio issue.

     - per-blkg stats are now accounted on bio issue rather than request
       completion so that bio-based and request-based drivers can behave
       the same way.  The issue was spotted by Vivek.

     - cfq-iosched implements custom recursive stats and blk-throttle
       implements custom per-cpu stats.  This patchset makes blkcg core
       support both by default.

     - cfq-iosched and blk-throttle keep track of the same stats
       multiple times.  Unify them"

* 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits)
  blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy
  blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/
  blkcg: implement interface for the unified hierarchy
  blkcg: misc preparations for unified hierarchy interface
  blkcg: separate out tg_conf_updated() from tg_set_conf()
  blkcg: move body parsing from blkg_conf_prep() to its callers
  blkcg: mark existing cftypes as legacy
  blkcg: rename subsystem name from blkio to io
  blkcg: refine error codes returned during blkcg configuration
  blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device()
  blkcg: reduce stack usage of blkg_rwstat_recursive_sum()
  blkcg: remove cfqg_stats-&gt;sectors
  blkcg: move io_service_bytes and io_serviced stats into blkcg_gq
  blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq
  blkcg: make blkcg_[rw]stat per-cpu
  blkcg: add blkg_[rw]stat-&gt;aux_cnt and replace cfq_group-&gt;dead_stats with it
  blkcg: consolidate blkg creation in blkcg_bio_issue_check()
  blk-throttle: improve queue bypass handling
  blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup()
  blkcg: inline [__]blkg_lookup()
  ...
</content>
</entry>
<entry>
<title>blkcg: consolidate blkg creation in blkcg_bio_issue_check()</title>
<updated>2015-08-18T22:49:17+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-08-18T21:55:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ae11889636111199dbcf47283b4167f578b69472'/>
<id>ae11889636111199dbcf47283b4167f578b69472</id>
<content type='text'>
blkg (blkcg_gq) is currently created by blkcg policies invoking
blkg_lookup_create(), which ends up repeating roughly the same code in
different policies.  Theoretically, this can avoid the overhead of
looking and/or creating blkg's if blkcg is enabled but no policy is in
use; however, the cost of blkg lookup / creation is very low
especially if only the root blkcg is in use which is highly likely if
no blkcg policy is in active use - it boils down to a single very
predictable conditional and surrounding RCU protection.

This patch consolidates blkg creation to a new function
blkcg_bio_issue_check() which is called during bio issue from
generic_make_request_checks().  blkcg_bio_issue_check() is now the
only function which tries to create missing blkg's.  The subsequent
policy and request_list operations just perform blkg_lookup() and if
missing falls back to the root.

* blk_get_rl() no longer tries to create blkg.  It uses blkg_lookup()
  instead of blkg_lookup_create().

* blk_throtl_bio() is now called from blkcg_bio_issue_check() with rcu
  read locked and blkg already looked up.  Both throtl_lookup_tg() and
  throtl_lookup_create_tg() are dropped.

* cfq is similarly updated.  cfq_lookup_create_cfqg() is replaced with
  cfq_lookup_cfqg() which uses blkg_lookup().

This consolidates blkg handling and avoids unnecessary blkg creation
retries under memory pressure.  In addition, this provides a common
bio entry point into blkcg where things like common accounting can be
performed.
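
A sketch of the new entry point (locking details and the root-blkg
fallback shown here are approximate):

  static inline bool blkcg_bio_issue_check(struct request_queue *q,
                                           struct bio *bio)
  {
          struct blkcg *blkcg;
          struct blkcg_gq *blkg;
          bool throtl = false;

          rcu_read_lock();
          blkcg = bio_blkcg(bio);

          blkg = blkg_lookup(blkcg, q);
          if (unlikely(!blkg)) {
                  spin_lock_irq(q-&gt;queue_lock);
                  blkg = blkg_lookup_create(blkcg, q);
                  if (IS_ERR(blkg))
                          blkg = NULL;    /* fall back to the root blkg */
                  spin_unlock_irq(q-&gt;queue_lock);
          }

          throtl = blk_throtl_bio(q, blkg, bio);

          rcu_read_unlock();
          return !throtl;
  }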

v2: Build fixes for !CONFIG_CFQ_GROUP_IOSCHED and
    !CONFIG_BLK_DEV_THROTTLING.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Arianna Avanzini &lt;avanzini.arianna@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
