linux-stable.git/block/blk-sysfs.c, branch linux-4.9.y

blk-mq: register device instead of disk

2016-09-21T13:56:16+00:00

Enable devices without a gendisk instance to register themselves with
blk-mq and expose the associated multi-queue sysfs entries.

Signed-off-by: Matias Bjørling 
Signed-off-by: Jens Axboe

block: expose QUEUE_FLAG_DAX in sysfs

2016-07-21T03:01:08+00:00

Provide the ability to identify DAX-enabled devices from userspace.
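
As a rough illustration of how userspace might consume this, the sketch
below reads a boolean queue attribute. It assumes the flag is exposed as
/sys/block/<disk>/queue/dax and prints "0" or "1"; the helper names
parse_queue_flag() and disk_supports_dax() are hypothetical and not part
of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a boolean queue attribute.
 * Returns 1 if the flag is set, 0 if it is clear, -1 if the text is malformed.
 */
static int parse_queue_flag(const char *text)
{
	char *end;
	long v;

	assert(text != NULL);
	v = strtol(text, &end, 10);
	if (end == text || (*end != '\n' && *end != '\0'))
		return -1;
	return v != 0;
}

/*
 * Assumption: the attribute lives at /sys/block/<disk>/queue/dax.
 * Returns 1 or 0 for the flag value, -1 if the file cannot be read.
 */
static int disk_supports_dax(const char *disk)
{
	char path[256], buf[16];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/dax", disk);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_queue_flag(buf);
	fclose(f);
	return ret;
}
```

A caller would pass a disk name such as "pmem0" and treat -1 as "unknown"
rather than "not DAX capable", since older kernels lack the attribute.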

Signed-off-by: Yigal Korman 
Signed-off-by: Toshi Kani 
Acked-by: Dan Williams 
Signed-off-by: Mike Snitzer 
Signed-off-by: Jens Axboe

block: add ability to flag write back caching on a device

2016-04-12T21:46:27+00:00

Add an internal helper and flag for setting whether a queue has
write back caching, or write through (or none). Add a sysfs file
to show this as well, and make it changeable from user space.

This will replace the (awkward) blk_queue_flush() interface that
drivers currently use to inform the block layer of write cache state
and capabilities.
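
A minimal userspace model of the sysfs show/store pair described above
might look like the following. It assumes the file uses the strings
"write back" and "write through", and that "none" is accepted on write as
an alias for write through; the enum and function names are hypothetical
and this is not the kernel implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the two cache states a queue can report. */
enum cache_mode {
	CACHE_INVALID = -1,
	CACHE_WRITE_THROUGH = 0,
	CACHE_WRITE_BACK = 1,
};

/*
 * Model of the store side: map user input to a cache mode.
 * Prefix matching lets a trailing newline from `echo` pass through.
 */
static enum cache_mode parse_write_cache(const char *text)
{
	assert(text != NULL);
	if (!strncmp(text, "write back", 10))
		return CACHE_WRITE_BACK;
	if (!strncmp(text, "write through", 13) || !strncmp(text, "none", 4))
		return CACHE_WRITE_THROUGH;
	return CACHE_INVALID;
}

/* Model of the show side: the text a read of the file would return. */
static const char *show_write_cache(enum cache_mode mode)
{
	return mode == CACHE_WRITE_BACK ? "write back\n" : "write through\n";
}
```

Anything that does not match one of the known strings is rejected, which
in a real store handler would correspond to returning -EINVAL.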

Signed-off-by: Jens Axboe 
Reviewed-by: Christoph Hellwig

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

2016-04-04T17:41:08+00:00

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.

This promise never materialized, and it is unlikely that it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE.  And it's a constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
the script below.  For some reason, coccinelle doesn't patch header
files.  I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation will
also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov 
Acked-by: Michal Hocko 
Signed-off-by: Linus Torvalds

blk: fix overflow in queue_discard_max_hw_show

2016-02-17T17:20:42+00:00

We get this right for queue_discard_max_show but not max_hw_show. Follow the
same pattern as queue_discard_max_show instead so that we don't truncate.
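
The truncation can be reproduced outside the kernel. The sketch below
assumes the limit is stored as a 32-bit sector count and converted to
bytes by shifting left by 9; the function names are hypothetical. Widening
the value before the shift, rather than after, is what preserves the high
bits.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Buggy pattern: the shift is done in 32 bits, so high bits are lost. */
static uint32_t discard_bytes_truncated(uint32_t sectors)
{
	return sectors << SECTOR_SHIFT;
}

/* Fixed pattern: widen to 64 bits first, then shift. */
static unsigned long long discard_bytes(uint32_t sectors)
{
	return (unsigned long long)sectors << SECTOR_SHIFT;
}

/* A 4 GiB limit (2^23 sectors) wraps to 0 when shifted in 32 bits. */
static void discard_overflow_demo(void)
{
	uint32_t sectors = UINT32_C(1) << 23;

	assert(discard_bytes_truncated(sectors) == 0);
	assert(discard_bytes(sectors) == 4294967296ULL);
}
```

Assigning the 32-bit result to a 64-bit variable afterwards does not help,
because the wraparound has already happened by the time of the assignment.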

Signed-off-by: Alan Cox 
Signed-off-by: Jens Axboe

Merge branch 'mkp-fixes' into fixes

2015-12-03T17:32:33+00:00

block/sd: Fix device-imposed transfer length limits

2015-11-26T02:38:58+00:00

Commit 4f258a46346c ("sd: Fix maximum I/O size for BLOCK_PC requests")
had the unfortunate side-effect of removing an implicit clamp to
BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
code. This caused problems for some SMR drives.

Debugging this issue revealed a few problems with the existing
infrastructure since the block layer didn't know how to deal with
device-imposed limits, only limits set by the I/O controller.

 - Introduce a new queue limit, max_dev_sectors, which is used by the
   ULD to signal the maximum sectors for a REQ_TYPE_FS request.

 - Ensure that max_dev_sectors is correctly stacked and taken into
   account when overriding max_sectors through sysfs.

 - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
   values for later processing.

 - In sd_revalidate() set the queue's max_dev_sectors based on the
   MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
   is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
   field size.

 - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
   VPD--if reported and sane--to signal the preferred device transfer
   size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.

 - blk_limits_max_hw_sectors() is no longer used and can be removed.

Signed-off-by: Martin K. Petersen 
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581
Reviewed-by: Christoph Hellwig 
Tested-by: sweeneygj@gmx.com
Tested-by: Arzeets 
Tested-by: David Eisner 
Tested-by: Mario Kicherer 
Signed-off-by: Martin K. Petersen

block: add block polling support

2015-11-07T17:40:47+00:00

Add basic support for polling for specific IO to complete. This uses
the cookie that blk-mq passes back, which enables the block layer
to pass this cookie to the driver to spin for a specific request.

This will be combined with request latency tracking, so we can make
qualified decisions about when to poll and when not to. For now, for
benchmark purposes, we add a sysfs file that controls whether polling
is enabled or not.

Signed-off-by: Jens Axboe 
Acked-by: Christoph Hellwig 
Acked-by: Keith Busch

Merge branch 'for-4.4/integrity' of git://git.kernel.dk/linux-block

2015-11-05T04:51:48+00:00

Pull block integrity updates from Jens Axboe:
 "This is the joint work of Dan and Martin, cleaning up and improving
  the support for block data integrity"

* 'for-4.4/integrity' of git://git.kernel.dk/linux-block:
  block, libnvdimm, nvme: provide a built-in blk_integrity nop profile
  block: blk_flush_integrity() for bio-based drivers
  block: move blk_integrity to request_queue
  block: generic request_queue reference counting
  nvme: suspend i/o during runtime blk_integrity_unregister
  md: suspend i/o during runtime blk_integrity_unregister
  md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown
  block: Inline blk_integrity in struct gendisk
  block: Export integrity data interval size in sysfs
  block: Reduce the size of struct blk_integrity
  block: Consolidate static integrity profile properties
  block: Move integrity kobject to struct gendisk

block: generic request_queue reference counting

2015-10-21T20:43:41+00:00

Allow pmem, and other synchronous/bio-based block drivers, to fall back
on a per-cpu reference count managed by the core for tracking queue
live/dead state.

The existing per-cpu reference count for the blk_mq case is promoted to
be used in all block i/o scenarios.  This involves initializing it by
default, waiting for it to drop to zero at exit, and holding a live
reference over the invocation of q->make_request_fn() in
generic_make_request().  The blk_mq code continues to take its own
reference per blk_mq request and retains the ability to freeze the
queue, but the check that the queue is frozen is moved to
generic_make_request().
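
The enter/exit discipline described above can be modelled in userspace
with a single atomic counter. This is only a sketch: the kernel uses a
per-cpu reference count, while the model below uses one C11 atomic, and
struct queue_model and the queue_model_*() functions are hypothetical
names. It shows why a submitter can never run against a queue whose
teardown has already observed a zero count.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the queue's live/dead tracking state. */
struct queue_model {
	atomic_long usage;	/* submitters currently inside the queue */
	atomic_bool dying;	/* set once teardown has started */
};

static void queue_model_init(struct queue_model *q)
{
	atomic_init(&q->usage, 0);
	atomic_init(&q->dying, false);
}

/*
 * Take a reference before calling into the driver. The count is raised
 * first and the dying flag checked second, so teardown either sees our
 * reference or we see the flag and back out.
 */
static int queue_model_enter(struct queue_model *q)
{
	atomic_fetch_add(&q->usage, 1);
	if (atomic_load(&q->dying)) {
		atomic_fetch_sub(&q->usage, 1);
		return -ENODEV;
	}
	return 0;
}

/* Drop the reference once the driver call has returned. */
static void queue_model_exit(struct queue_model *q)
{
	long old = atomic_fetch_sub(&q->usage, 1);

	assert(old > 0);
	(void)old;
}

/* Start teardown: no new submitter may enter after this. */
static void queue_model_kill(struct queue_model *q)
{
	atomic_store(&q->dying, true);
}

/* Teardown may free resources only once this returns true. */
static bool queue_model_drained(struct queue_model *q)
{
	return atomic_load(&q->dying) && atomic_load(&q->usage) == 0;
}
```

In this model a submission path would wrap the driver call between
queue_model_enter() and queue_model_exit(), and the exit path would wait
for queue_model_drained() before releasing anything the driver touches.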

This fixes crash signatures like the following:

 BUG: unable to handle kernel paging request at ffff880140000000
 [..]
 Call Trace:
  [] ? copy_user_handle_tail+0x5f/0x70
  [] pmem_do_bvec.isra.11+0x70/0xf0 [nd_pmem]
  [] pmem_make_request+0xd1/0x200 [nd_pmem]
  [] ? mempool_alloc+0x72/0x1a0
  [] generic_make_request+0xd6/0x110
  [] submit_bio+0x76/0x170
  [] submit_bh_wbc+0x12f/0x160
  [] submit_bh+0x12/0x20
  [] jbd2_write_superblock+0x8d/0x170
  [] jbd2_mark_journal_empty+0x5d/0x90
  [] jbd2_journal_destroy+0x24b/0x270
  [] ? put_pwq_unlocked+0x2a/0x30
  [] ? destroy_workqueue+0x225/0x250
  [] ext4_put_super+0x64/0x360
  [] generic_shutdown_super+0x6a/0xf0

Cc: Jens Axboe 
Cc: Keith Busch 
Cc: Ross Zwisler 
Suggested-by: Christoph Hellwig 
Reviewed-by: Christoph Hellwig 
Tested-by: Ross Zwisler 
Signed-off-by: Dan Williams 
Signed-off-by: Jens Axboe