<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/block/genhd.c, branch v2.6.31</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Driver Core: block: add nodename support for block drivers.</title>
<updated>2009-06-16T04:30:25+00:00</updated>
<author>
<name>Kay Sievers</name>
<email>kay.sievers@vrfy.org</email>
</author>
<published>2009-04-30T13:23:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b03f38b685e2e1db174fb8982930e789a516f414'/>
<id>b03f38b685e2e1db174fb8982930e789a516f414</id>
<content type='text'>
This adds support for block drivers to report their requested nodename
to userspace.  It also updates a number of block drivers to provide the
needed subdirectory and device name to be used for them.

Signed-off-by: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Signed-off-by: Jan Blunck &lt;jblunck@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds support for block drivers to report their requested nodename
to userspace.  It also updates a number of block drivers to provide the
needed subdirectory and device name to be used for them.

Signed-off-by: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Signed-off-by: Jan Blunck &lt;jblunck@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: Export I/O topology for block devices and partitions</title>
<updated>2009-05-22T21:22:55+00:00</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2009-05-22T21:17:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c72758f33784e5e2a1a4bb9421ef3e6de8f9fcf3'/>
<id>c72758f33784e5e2a1a4bb9421ef3e6de8f9fcf3</id>
<content type='text'>
To support devices with physical block sizes bigger than 512 bytes we
need to ensure proper alignment.  This patch adds support for exposing
I/O topology characteristics as devices are stacked.

  logical_block_size is the smallest unit the device can address.

  physical_block_size indicates the smallest I/O the device can write
  without incurring a read-modify-write penalty.

  The io_min parameter is the smallest preferred I/O size reported by
  the device.  In many cases this is the same as the physical block
  size.  However, the io_min parameter can be scaled up when stacking
  (RAID5 chunk size &gt; physical block size).

  The io_opt characteristic indicates the optimal I/O size reported by
  the device.  This is usually the stripe width for arrays.

  The alignment_offset parameter indicates the number of bytes the start
  of the device/partition is offset from the device's natural alignment.
  Partition tools and MD/DM utilities can use this to pad their offsets
  so filesystems start on proper boundaries.

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To support devices with physical block sizes bigger than 512 bytes we
need to ensure proper alignment.  This patch adds support for exposing
I/O topology characteristics as devices are stacked.

  logical_block_size is the smallest unit the device can address.

  physical_block_size indicates the smallest I/O the device can write
  without incurring a read-modify-write penalty.

  The io_min parameter is the smallest preferred I/O size reported by
  the device.  In many cases this is the same as the physical block
  size.  However, the io_min parameter can be scaled up when stacking
  (RAID5 chunk size &gt; physical block size).

  The io_opt characteristic indicates the optimal I/O size reported by
  the device.  This is usually the stripe width for arrays.

  The alignment_offset parameter indicates the number of bytes the start
  of the device/partition is offset from the device's natural alignment.
  Partition tools and MD/DM utilities can use this to pad their offsets
  so filesystems start on proper boundaries.

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: include empty disks in /proc/diskstats</title>
<updated>2009-04-22T06:35:10+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2009-04-17T06:34:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=71982a409f12c50d011325a4471aa20666bb908d'/>
<id>71982a409f12c50d011325a4471aa20666bb908d</id>
<content type='text'>
/proc/diskstats used to show stats for all disks whether they're
zero-sized or not and their non-zero partitions.  Commit
074a7aca7afa6f230104e8e65eba3420263714a5 accidentally changed the
behavior such that it doesn't print out zero sized disks.  This patch
implements DISK_PITER_INCL_EMPTY_PART0 flag to partition iterator and
uses it in diskstats_show() such that empty part0 is shown in
/proc/diskstats.

Reported and bisectd by Dianel Collins.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Daniel Collins &lt;solemnwarning@solemnwarning.no-ip.org&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
/proc/diskstats used to show stats for all disks whether they're
zero-sized or not and their non-zero partitions.  Commit
074a7aca7afa6f230104e8e65eba3420263714a5 accidentally changed the
behavior such that it doesn't print out zero sized disks.  This patch
implements DISK_PITER_INCL_EMPTY_PART0 flag to partition iterator and
uses it in diskstats_show() such that empty part0 is shown in
/proc/diskstats.

Reported and bisectd by Dianel Collins.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Daniel Collins &lt;solemnwarning@solemnwarning.no-ip.org&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: add documentation for register_blkdev()</title>
<updated>2009-02-26T09:45:48+00:00</updated>
<author>
<name>Márton Németh</name>
<email>nm127@freemail.hu</email>
</author>
<published>2009-02-20T07:12:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9e8c0bccdc944bd09361672d47660810c027bcaa'/>
<id>9e8c0bccdc944bd09361672d47660810c027bcaa</id>
<content type='text'>
Add documentation for register_blkdev() function and for the parameters.

Signed-off-by: Márton Németh &lt;nm127@freemail.hu&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add documentation for register_blkdev() function and for the parameters.

Signed-off-by: Márton Németh &lt;nm127@freemail.hu&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix booting from partitioned md array</title>
<updated>2009-02-18T09:33:59+00:00</updated>
<author>
<name>Neil Brown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-02-18T09:33:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=41b8c853a495438208faa5be03bbb0050859163b'/>
<id>41b8c853a495438208faa5be03bbb0050859163b</id>
<content type='text'>
Hi Tejun,

 it looks like your commit:

   block: don't depend on consecutive minor space
   f331c0296f2a9fee0d396a70598b954062603015

 broke a particular case for booting from partitioned md/raid devices.
 That is the second time this has been broken recently.  The previous
 time was fixed by

   block: do_mounts - accept root=&lt;non-existant partition&gt;
   30f2f0eb4bd2c43d10a8b0d872c6e5ad8f31c9a0

 Because the data isn't available when an md device is first created
 (we add disks and set it up after creation), the initial partition
 scan finds nothing.  It is not until the device is opened that
 another partition scan happens and finds something.

 So at the point where the kernel parameter "root=/dev/md_d0p1" is
 being parsed, md_d0 exists, but md_d0p1 does not.
 However if we let blk_lookup_devt return the correct device number
 even though the device doesn't exist, then the attempt to mount it
 will successfully find the partition.

 I have tried in the past to find a way to get the partition table to
 be read as soon as the array is assembled but that proved impossible
 (at the time).  I don't remember the details, and could possibly
 revisit it.  However it would be really nice if blk_lookup_devt
 could be adjusted to again accept non existant partitions.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hi Tejun,

 it looks like your commit:

   block: don't depend on consecutive minor space
   f331c0296f2a9fee0d396a70598b954062603015

 broke a particular case for booting from partitioned md/raid devices.
 That is the second time this has been broken recently.  The previous
 time was fixed by

   block: do_mounts - accept root=&lt;non-existant partition&gt;
   30f2f0eb4bd2c43d10a8b0d872c6e5ad8f31c9a0

 Because the data isn't available when an md device is first created
 (we add disks and set it up after creation), the initial partition
 scan finds nothing.  It is not until the device is opened that
 another partition scan happens and finds something.

 So at the point where the kernel parameter "root=/dev/md_d0p1" is
 being parsed, md_d0 exists, but md_d0p1 does not.
 However if we let blk_lookup_devt return the correct device number
 even though the device doesn't exist, then the attempt to mount it
 will successfully find the partition.

 I have tried in the past to find a way to get the partition table to
 be read as soon as the array is assembled but that proved impossible
 (at the time).  I don't remember the details, and could possibly
 revisit it.  However it would be really nice if blk_lookup_devt
 could be adjusted to again accept non existant partitions.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: struct device - replace bus_id with dev_name(), dev_set_name()</title>
<updated>2009-01-06T18:44:43+00:00</updated>
<author>
<name>Kay Sievers</name>
<email>kay.sievers@vrfy.org</email>
</author>
<published>2009-01-06T18:44:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3ada8b7e980dac7cc42937d42d90ee51b19204fe'/>
<id>3ada8b7e980dac7cc42937d42d90ee51b19204fe</id>
<content type='text'>
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Signed-off-by: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Signed-off-by: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: add one-hit cache for disk partition lookup</title>
<updated>2008-12-29T07:29:51+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2008-10-24T10:52:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a6f23657d3072bde6844055bbc2290e497f33fbc'/>
<id>a6f23657d3072bde6844055bbc2290e497f33fbc</id>
<content type='text'>
disk_map_sector_rcu() returns a partition from a sector offset,
which we use for IO statistics on a per-partition basis. The
lookup itself is an O(N) list lookup, where N is the number of
partitions. This actually hurts performance quite a bit, even
on the lower end partitions. On higher numbered partitions,
it can get pretty bad.

Solve this by adding a one-hit cache for partition lookup.
This makes the lookup O(1) for the case where we do most IO to
one partition. Even for mixed partition workloads, amortized cost
is pretty close to O(1) since the natural IO batching makes the
one-hit cache last for lots of IOs.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
disk_map_sector_rcu() returns a partition from a sector offset,
which we use for IO statistics on a per-partition basis. The
lookup itself is an O(N) list lookup, where N is the number of
partitions. This actually hurts performance quite a bit, even
on the lower end partitions. On higher numbered partitions,
it can get pretty bad.

Solve this by adding a one-hit cache for partition lookup.
This makes the lookup O(1) for the case where we do most IO to
one partition. Even for mixed partition workloads, amortized cost
is pretty close to O(1) since the natural IO batching makes the
one-hit cache last for lots of IOs.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: set disk-&gt;node_id before it's being used</title>
<updated>2008-12-03T11:41:20+00:00</updated>
<author>
<name>Cheng Renquan</name>
<email>crquan@gmail.com</email>
</author>
<published>2008-11-20T07:37:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bf91db18ac2852a3ff39fe25ff56c5557c0fff78'/>
<id>bf91db18ac2852a3ff39fe25ff56c5557c0fff78</id>
<content type='text'>
disk-&gt;node_id will be refered in allocating in disk_expand_part_tbl, so we
should set it before disk-&gt;node_id is refered.

Signed-off-by: Cheng Renquan &lt;crquan@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
disk-&gt;node_id will be refered in allocating in disk_expand_part_tbl, so we
should set it before disk-&gt;node_id is refered.

Signed-off-by: Cheng Renquan &lt;crquan@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash</title>
<updated>2008-11-18T14:08:56+00:00</updated>
<author>
<name>Zhang, Yanmin</name>
<email>yanmin_zhang@linux.intel.com</email>
</author>
<published>2008-11-14T07:26:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=561ec68e4de7947167937c49c451728e6b19e63b'/>
<id>561ec68e4de7947167937c49c451728e6b19e63b</id>
<content type='text'>
We run into system boot failure with kernel 2.6.28-rc. We found it on a
couple of machines, including T61 notebook, nehalem machine, and another
HPC NX6325 notebook.  All the machines use FedoraCore 8 or FedoraCore 9.
With kernel prior to 2.6.28-rc, system boot doesn't fail.

I debug it and locate the root cause. Pls. see
http://bugzilla.kernel.org/show_bug.cgi?id=11899
https://bugzilla.redhat.com/show_bug.cgi?id=471517

As a matter of fact, there are 2 bugs.

1)root=/dev/sda1, system boot randomly fails. Mostly, boot for 5 times
and fails once. nash has a bug. Some of its functions misuse return
value 0.  Sometimes, 0 means timeout and no uevent available. Sometimes,
0 means nash gets an uevent, but the uevent isn't block-related (for
exmaple, usb). If by coincidence, kernel tells nash that uevents are
available, but kernel also set timeout, nash might stops collecting
other uevents in queue if current uevent isn't block-related.  I work
out a patch for nash to fix it.
http://bugzilla.kernel.org/attachment.cgi?id=18858

2) root=LABEL=/, system always can't boot. initrd init reports
switchroot fails. Here is an executation branch of nash when booting:
    (1) nash read /sys/block/sda/dev; Assume major is 8 (on my desktop)
    (2) nash query /proc/devices with the major number; It found line
	"8 sd";
    (3) nash use 'sd' to search its own probe table to find device (DISK)
	type for the device and add it to its own list;
    (4) Later on, it probes all devices in its list to get filesystem
	labels; scsi register "8 sd" always.

When major is 259, nash fails to find the device(DISK) type. I enables
CONFIG_DEBUG_BLOCK_EXT_DEVT=y when compiling kernel, so 259 is picked up
for device /dev/sda1, which causes nash to fail to find device (DISK)
type.

To fixing issue 2), I create a patch for nash and another patch for
kernel.

http://bugzilla.kernel.org/attachment.cgi?id=18859
http://bugzilla.kernel.org/attachment.cgi?id=18837

Below is the patch for kernel 2.6.28-rc4. It registers blkext, a new
block device in proc/devices.

With 2 patches on nash and 1 patch on kernel, I boot my machines for
dozens of times without failure.

Signed-off-by Zhang Yanmin &lt;yanmin.zhang@linux.intel.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We run into system boot failure with kernel 2.6.28-rc. We found it on a
couple of machines, including T61 notebook, nehalem machine, and another
HPC NX6325 notebook.  All the machines use FedoraCore 8 or FedoraCore 9.
With kernel prior to 2.6.28-rc, system boot doesn't fail.

I debug it and locate the root cause. Pls. see
http://bugzilla.kernel.org/show_bug.cgi?id=11899
https://bugzilla.redhat.com/show_bug.cgi?id=471517

As a matter of fact, there are 2 bugs.

1)root=/dev/sda1, system boot randomly fails. Mostly, boot for 5 times
and fails once. nash has a bug. Some of its functions misuse return
value 0.  Sometimes, 0 means timeout and no uevent available. Sometimes,
0 means nash gets an uevent, but the uevent isn't block-related (for
exmaple, usb). If by coincidence, kernel tells nash that uevents are
available, but kernel also set timeout, nash might stops collecting
other uevents in queue if current uevent isn't block-related.  I work
out a patch for nash to fix it.
http://bugzilla.kernel.org/attachment.cgi?id=18858

2) root=LABEL=/, system always can't boot. initrd init reports
switchroot fails. Here is an executation branch of nash when booting:
    (1) nash read /sys/block/sda/dev; Assume major is 8 (on my desktop)
    (2) nash query /proc/devices with the major number; It found line
	"8 sd";
    (3) nash use 'sd' to search its own probe table to find device (DISK)
	type for the device and add it to its own list;
    (4) Later on, it probes all devices in its list to get filesystem
	labels; scsi register "8 sd" always.

When major is 259, nash fails to find the device(DISK) type. I enables
CONFIG_DEBUG_BLOCK_EXT_DEVT=y when compiling kernel, so 259 is picked up
for device /dev/sda1, which causes nash to fail to find device (DISK)
type.

To fixing issue 2), I create a patch for nash and another patch for
kernel.

http://bugzilla.kernel.org/attachment.cgi?id=18859
http://bugzilla.kernel.org/attachment.cgi?id=18837

Below is the patch for kernel 2.6.28-rc4. It registers blkext, a new
block device in proc/devices.

With 2 patches on nash and 1 patch on kernel, I boot my machines for
dozens of times without failure.

Signed-off-by Zhang Yanmin &lt;yanmin.zhang@linux.intel.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: move /proc/diskstats boilerplate to block/genhd.c</title>
<updated>2008-10-23T13:57:37+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2008-10-06T08:55:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=31d85ab28e71b0c938e0ef48af45747e80d99b53'/>
<id>31d85ab28e71b0c938e0ef48af45747e80d99b53</id>
<content type='text'>
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
