<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/block/genhd.c, branch v3.0-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>block: always allocate genhd-&gt;ev if check_events is implemented</title>
<updated>2011-05-26T19:06:50+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-05-26T19:06:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=75e3f3ee3c64968d42f4843ec49e579f84b5aa0c'/>
<id>75e3f3ee3c64968d42f4843ec49e579f84b5aa0c</id>
<content type='text'>
9fd097b149 (block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe
drivers) removed DISK_EVENT_MEDIA_CHANGE from legacy/fringe block
drivers which have inadequate -&gt;check_events().  Combined with earlier
change 7c88a168da (block: don't propagate unlisted DISK_EVENTs to
userland), this enables using -&gt;check_events() for internal processing
while avoiding enabling in-kernel block event polling which can lead
to infinite event loop.

Unfortunately, this made many drivers including floppy without any bit
set in disk-&gt;events and -&gt;async_events in which case disk_add_events()
simply skipped allocation of disk-&gt;ev, which disables whole event
handling.  As -&gt;check_events() is still used during open processing
for revalidation, this can lead to open failure.

This patch always allocates disk-&gt;ev if -&gt;check_events is implemented.
In the long term, it would make sense to simply include the event
structure inline into genhd as it's now used by virtually all block
devices.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Ondrej Zary &lt;linux@rainbow-software.org&gt;
Reported-by: Alex Villacis Lasso &lt;avillaci@ceibo.fiec.espol.edu.ec&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
9fd097b149 (block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe
drivers) removed DISK_EVENT_MEDIA_CHANGE from legacy/fringe block
drivers which have inadequate -&gt;check_events().  Combined with earlier
change 7c88a168da (block: don't propagate unlisted DISK_EVENTs to
userland), this enables using -&gt;check_events() for internal processing
while avoiding enabling in-kernel block event polling which can lead
to infinite event loop.

Unfortunately, this made many drivers including floppy without any bit
set in disk-&gt;events and -&gt;async_events in which case disk_add_events()
simply skipped allocation of disk-&gt;ev, which disables whole event
handling.  As -&gt;check_events() is still used during open processing
for revalidation, this can lead to open failure.

This patch always allocates disk-&gt;ev if -&gt;check_events is implemented.
In the long term, it would make sense to simply include the event
structure inline into genhd as it's now used by virtually all block
devices.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Ondrej Zary &lt;linux@rainbow-software.org&gt;
Reported-by: Alex Villacis Lasso &lt;avillaci@ceibo.fiec.espol.edu.ec&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: don't propagate unlisted DISK_EVENTs to userland</title>
<updated>2011-04-21T17:43:58+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-04-21T17:43:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7c88a168da8003fd4d8fb6ae103c4ecf29cb1130'/>
<id>7c88a168da8003fd4d8fb6ae103c4ecf29cb1130</id>
<content type='text'>
DISK_EVENT_MEDIA_CHANGE is used for both userland visible event and
internal event for revalidation of removeable devices.  Some legacy
drivers don't implement proper event detection and continuously
generate events under certain circumstances.  For example, ide-cd
generates media changed continuously if there's no media in the drive,
which can lead to infinite loop of events jumping back and forth
between the driver and userland event handler.

This patch updates disk event infrastructure such that it never
propagates events not listed in disk-&gt;events to userland.  Those
events are processed the same for internal purposes but uevent
generation is suppressed.

This also ensures that userland only gets events which are advertised
in the @events sysfs node lowering risk of confusion.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
DISK_EVENT_MEDIA_CHANGE is used for both userland visible event and
internal event for revalidation of removeable devices.  Some legacy
drivers don't implement proper event detection and continuously
generate events under certain circumstances.  For example, ide-cd
generates media changed continuously if there's no media in the drive,
which can lead to infinite loop of events jumping back and forth
between the driver and userland event handler.

This patch updates disk event infrastructure such that it never
propagates events not listed in disk-&gt;events to userland.  Those
events are processed the same for internal purposes but uevent
generation is suppressed.

This also ensures that userland only gets events which are advertised
in the @events sysfs node lowering risk of confusion.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix common misspellings</title>
<updated>2011-03-31T14:26:23+00:00</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.demarchi@profusion.mobi</email>
</author>
<published>2011-03-31T01:57:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=25985edcedea6396277003854657b5f3cb31a628'/>
<id>25985edcedea6396277003854657b5f3cb31a628</id>
<content type='text'>
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@profusion.mobi&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@profusion.mobi&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: Don't implicitly trigger event check on disk_unblock_events()</title>
<updated>2011-03-09T18:54:27+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-03-09T18:54:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=facc31ddc3570a3a0d8951c94f16b898e01b464d'/>
<id>facc31ddc3570a3a0d8951c94f16b898e01b464d</id>
<content type='text'>
Currently, disk_unblock_events() implicitly kick event check if the
block count reaches zero.  This behavior is not described in the
comment and hinders with future changes.  Make the unblocker
explicitly check events by calling disk_check_events() as necessary.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, disk_unblock_events() implicitly kick event check if the
block count reaches zero.  This behavior is not described in the
comment and hinders with future changes.  Make the unblocker
explicitly check events by calling disk_check_events() as necessary.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of ../linux-2.6-block into block-for-2.6.39/core</title>
<updated>2011-03-04T18:09:02+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-03-04T18:09:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e83a46bbb1d4c03defd733a64b727632a40059ad'/>
<id>e83a46bbb1d4c03defd733a64b727632a40059ad</id>
<content type='text'>
This merge creates two set of conflicts.  One is simple context
conflicts caused by removal of throtl_scheduled_delayed_work() in
for-linus and removal of throtl_shutdown_timer_wq() in
for-2.6.39/core.

The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
call directly into q-&gt;request_fn() __blk_run_queue()) in for-linus
crashing with FLUSH reimplementation in for-2.6.39/core.  The conflict
isn't trivial but the resolution is straight-forward.

* __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
  should be called with @force_kblockd set to %true.

* elv_insert() in blk_kick_flush() should use
  %ELEVATOR_INSERT_REQUEUE.

Both changes are to avoid invoking -&gt;request_fn() directly from
request completion path and closely match the changes in the commit
255bb490c8.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This merge creates two set of conflicts.  One is simple context
conflicts caused by removal of throtl_scheduled_delayed_work() in
for-linus and removal of throtl_shutdown_timer_wq() in
for-2.6.39/core.

The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
call directly into q-&gt;request_fn() __blk_run_queue()) in for-linus
crashing with FLUSH reimplementation in for-2.6.39/core.  The conflict
isn't trivial but the resolution is straight-forward.

* __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
  should be called with @force_kblockd set to %true.

* elv_insert() in blk_kick_flush() should use
  %ELEVATOR_INSERT_REQUEUE.

Both changes are to avoid invoking -&gt;request_fn() directly from
request completion path and closely match the changes in the commit
255bb490c8.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block/genhd: Change some numerals into macros</title>
<updated>2011-03-02T16:00:15+00:00</updated>
<author>
<name>Liu Yuan</name>
<email>tailai.ly@taobao.com</email>
</author>
<published>2011-03-02T16:00:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=53f22956effe1c9e7961b8c6e4362ecca5e460b7'/>
<id>53f22956effe1c9e7961b8c6e4362ecca5e460b7</id>
<content type='text'>
Rename the numerals in the diskstats_show() into the macros.

Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Liu Yuan &lt;tailai.ly@taobao.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rename the numerals in the diskstats_show() into the macros.

Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Liu Yuan &lt;tailai.ly@taobao.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix over-zealous flush_disk when changing device size.</title>
<updated>2011-02-24T06:25:47+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2011-02-24T06:25:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=93b270f76e7ef3b81001576860c2701931cdc78b'/>
<id>93b270f76e7ef3b81001576860c2701931cdc78b</id>
<content type='text'>
There are two cases when we call flush_disk.
In one, the device has disappeared (check_disk_change) so any
data will hold becomes irrelevant.
In the oter, the device has changed size (check_disk_size_change)
so data we hold may be irrelevant.

In both cases it makes sense to discard any 'clean' buffers,
so they will be read back from the device if needed.

In the former case it makes sense to discard 'dirty' buffers
as there will never be anywhere safe to write the data.  In the
second case it *does*not* make sense to discard dirty buffers
as that will lead to file system corruption when you simply enlarge
the containing devices.

flush_disk calls __invalidate_devices.
__invalidate_device calls both invalidate_inodes and invalidate_bdev.

invalidate_inodes *does* discard I_DIRTY inodes and this does lead
to fs corruption.

invalidate_bev *does*not* discard dirty pages, but I don't really care
about that at present.

So this patch adds a flag to __invalidate_device (calling it
__invalidate_device2) to indicate whether dirty buffers should be
killed, and this is passed to invalidate_inodes which can choose to
skip dirty inodes.

flusk_disk then passes true from check_disk_change and false from
check_disk_size_change.

dm avoids tripping over this problem by calling i_size_write directly
rathher than using check_disk_size_change.

md does use check_disk_size_change and so is affected.

This regression was introduced by commit 608aeef17a which causes
check_disk_size_change to call flush_disk, so it is suitable for any
kernel since 2.6.27.

Cc: stable@kernel.org
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Andrew Patterson &lt;andrew.patterson@hp.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are two cases when we call flush_disk.
In one, the device has disappeared (check_disk_change) so any
data will hold becomes irrelevant.
In the oter, the device has changed size (check_disk_size_change)
so data we hold may be irrelevant.

In both cases it makes sense to discard any 'clean' buffers,
so they will be read back from the device if needed.

In the former case it makes sense to discard 'dirty' buffers
as there will never be anywhere safe to write the data.  In the
second case it *does*not* make sense to discard dirty buffers
as that will lead to file system corruption when you simply enlarge
the containing devices.

flush_disk calls __invalidate_devices.
__invalidate_device calls both invalidate_inodes and invalidate_bdev.

invalidate_inodes *does* discard I_DIRTY inodes and this does lead
to fs corruption.

invalidate_bev *does*not* discard dirty pages, but I don't really care
about that at present.

So this patch adds a flag to __invalidate_device (calling it
__invalidate_device2) to indicate whether dirty buffers should be
killed, and this is passed to invalidate_inodes which can choose to
skip dirty inodes.

flusk_disk then passes true from check_disk_change and false from
check_disk_size_change.

dm avoids tripping over this problem by calling i_size_write directly
rathher than using check_disk_size_change.

md does use check_disk_size_change and so is affected.

This regression was introduced by commit 608aeef17a which causes
check_disk_size_change to call flush_disk, so it is suitable for any
kernel since 2.6.27.

Cc: stable@kernel.org
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Andrew Patterson &lt;andrew.patterson@hp.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-2.6.38/event-handling' into for-2.6.38/core</title>
<updated>2011-01-13T13:47:54+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-01-13T13:47:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=81c5e2ae33c4b19e53966b427e33646bf6811830'/>
<id>81c5e2ae33c4b19e53966b427e33646bf6811830</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>block: add internal hd part table references</title>
<updated>2011-01-07T07:43:37+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-01-07T07:43:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6c23a9681c0fe7fb7dd331b39dda11926f43746e'/>
<id>6c23a9681c0fe7fb7dd331b39dda11926f43746e</id>
<content type='text'>
We can't use krefs since it's apparently restricted to very basic
reference counting.

This reverts commit e4a683c8.

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We can't use krefs since it's apparently restricted to very basic
reference counting.

This reverts commit e4a683c8.

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix accounting bug on cross partition merges</title>
<updated>2011-01-05T15:57:38+00:00</updated>
<author>
<name>Jerome Marchand</name>
<email>jmarchan@redhat.com</email>
</author>
<published>2011-01-05T15:57:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=09e099d4bafea3b15be003d548bdf94b4b6e0e17'/>
<id>09e099d4bafea3b15be003d548bdf94b4b6e0e17</id>
<content type='text'>
/proc/diskstats would display a strange output as follows.

$ cat /proc/diskstats |grep sda
   8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
   8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                ~~~~~~~~~~
   8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
   8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
   8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
   8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

Its reason is the wrong way of accounting hd_struct-&gt;in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.

The detailed root cause is as follows.

Assuming that there are two partition, sda1 and sda2.

1. A request for sda2 is in request_queue. Hence sda1's hd_struct-&gt;in_flight
   is 0 and sda2's one is 1.

        | hd_struct-&gt;in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

2. A bio belongs to sda1 is issued and is merged into the request mentioned on
   step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
   from sda2 region to sda1 region. However the two partition's
   hd_struct-&gt;in_flight are not changed.

        | hd_struct-&gt;in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

3. The request is finished and blk_account_io_done() is called. In this case,
   sda2's hd_struct-&gt;in_flight, not a sda1's one, is decremented.

        | hd_struct-&gt;in_flight
   ---------------------------
   sda1 |         -1
   sda2 |          1
   ---------------------------

The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.

Also add a refcount to struct hd_struct to keep the partition in
memory as long as users exist. We use kref_test_and_get() to ensure
we don't add a reference to a partition which is going away.

Signed-off-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Signed-off-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
/proc/diskstats would display a strange output as follows.

$ cat /proc/diskstats |grep sda
   8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
   8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                ~~~~~~~~~~
   8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
   8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
   8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
   8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

Its reason is the wrong way of accounting hd_struct-&gt;in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.

The detailed root cause is as follows.

Assuming that there are two partition, sda1 and sda2.

1. A request for sda2 is in request_queue. Hence sda1's hd_struct-&gt;in_flight
   is 0 and sda2's one is 1.

        | hd_struct-&gt;in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

2. A bio belongs to sda1 is issued and is merged into the request mentioned on
   step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
   from sda2 region to sda1 region. However the two partition's
   hd_struct-&gt;in_flight are not changed.

        | hd_struct-&gt;in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

3. The request is finished and blk_account_io_done() is called. In this case,
   sda2's hd_struct-&gt;in_flight, not a sda1's one, is decremented.

        | hd_struct-&gt;in_flight
   ---------------------------
   sda1 |         -1
   sda2 |          1
   ---------------------------

The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.

Also add a refcount to struct hd_struct to keep the partition in
memory as long as users exist. We use kref_test_and_get() to ensure
we don't add a reference to a partition which is going away.

Signed-off-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Signed-off-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
