<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/md/raid10.c, branch v2.6.37</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>md: protect against NULL reference when waiting to start a raid10.</title>
<updated>2010-12-09T06:02:14+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-12-09T06:02:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=589a594be1fb8815b3f18e517be696c48664f728'/>
<id>589a594be1fb8815b3f18e517be696c48664f728</id>
<content type='text'>
When we fail to start a raid10 for some reason, we call
md_unregister_thread to kill the thread that was created.

Unfortunately md_thread() will then make one call into the handler
(raid10d) even though md_wakeup_thread has not been called.  This is
not safe and as md_unregister_thread is called after mddev-&gt;private
has been set to NULL, it will definitely cause a NULL dereference.

So fix this at both ends:
 - md_thread should only call the handler if THREAD_WAKEUP has been
   set.
 - raid10 should call md_unregister_thread before setting things
   to NULL just like all the other raid modules do.

This is applicable to 2.6.35 and later.

Cc: stable@kernel.org
Reported-by: "Citizen" &lt;citizen_lee@thecus.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we fail to start a raid10 for some reason, we call
md_unregister_thread to kill the thread that was created.

Unfortunately md_thread() will then make one call into the handler
(raid10d) even though md_wakeup_thread has not been called.  This is
not safe and as md_unregister_thread is called after mddev-&gt;private
has been set to NULL, it will definitely cause a NULL dereference.

So fix this at both ends:
 - md_thread should only call the handler if THREAD_WAKEUP has been
   set.
 - raid10 should call md_unregister_thread before setting things
   to NULL just like all the other raid modules do.

This is applicable to 2.6.35 and later.

Cc: stable@kernel.org
Reported-by: "Citizen" &lt;citizen_lee@thecus.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: use separate bio pool for each md device.</title>
<updated>2010-10-28T06:36:15+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-10-26T07:31:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a167f663243662aa9153c01086580a11cde9ffdc'/>
<id>a167f663243662aa9153c01086580a11cde9ffdc</id>
<content type='text'>
bio_clone and bio_alloc allocate from a common bio pool.
If an md device is stacked with other devices that use this pool, or under
something like swap which uses the pool, then the multiple calls on
the pool can cause deadlocks.

So allocate a local bio pool for each md array and use that rather
than the common pool.

This pool is used both for regular IO and metadata updates.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bio_clone and bio_alloc allocate from a common bio pool.
If an md device is stacked with other devices that use this pool, or under
something like swap which uses the pool, then the multiple calls on
the pool can cause deadlocks.

So allocate a local bio pool for each md array and use that rather
than the common pool.

This pool is used both for regular IO and metadata updates.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: change type of first arg to sync_page_io.</title>
<updated>2010-10-28T06:36:11+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-10-27T04:16:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2b193363ef68667ad717a6723165e0dccf99470f'/>
<id>2b193363ef68667ad717a6723165e0dccf99470f</id>
<content type='text'>
Currently sync_page_io takes a 'bdev'.
Every caller passes 'rdev-&gt;bdev'.
We will soon want another field out of the rdev in sync_page_io,
So just pass the rdev instead of the bdev out of it.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently sync_page_io takes a 'bdev'.
Every caller passes 'rdev-&gt;bdev'.
We will soon want another field out of the rdev in sync_page_io,
So just pass the rdev instead of the bdev out of it.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: use bio_kmalloc rather than bio_alloc when failure is acceptable.</title>
<updated>2010-10-28T06:36:06+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-10-26T06:33:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6746557f0325a66f57d179126426e38a8ea66945'/>
<id>6746557f0325a66f57d179126426e38a8ea66945</id>
<content type='text'>
bio_alloc can never fail (as it uses a mempool) but an block
indefinitely, especially if the caller is holding a reference to a
previously allocated bio.

So these to places which both handle failure and hold multiple bios
should not use bio_alloc, they should use bio_kmalloc.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bio_alloc can never fail (as it uses a mempool) but an block
indefinitely, especially if the caller is holding a reference to a
previously allocated bio.

So these to places which both handle failure and hold multiple bios
should not use bio_alloc, they should use bio_kmalloc.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Fix possible deadlock with multiple mempool allocations.</title>
<updated>2010-10-28T06:34:07+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-10-19T01:54:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e78064f42ad474ce9c31760861f7fb0cfc22532'/>
<id>4e78064f42ad474ce9c31760861f7fb0cfc22532</id>
<content type='text'>
It is not safe to allocate from a mempool while holding an item
previously allocated from that mempool as that can deadlock when the
mempool is close to exhaustion.

So don't use a bio list to collect the bios to write to multiple
devices in raid1 and raid10.
Instead queue each bio as it becomes available so an unplug will
activate all previously allocated bios and so a new bio has a chance
of being allocated.

This means we must set the 'remaining' count to '1' before submitting
any requests, then when all are submitted, decrement 'remaining' and
possible handle the write completion at that point.

Reported-by: Torsten Kaiser &lt;just.for.lkml@googlemail.com&gt;
Tested-by: Torsten Kaiser &lt;just.for.lkml@googlemail.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is not safe to allocate from a mempool while holding an item
previously allocated from that mempool as that can deadlock when the
mempool is close to exhaustion.

So don't use a bio list to collect the bios to write to multiple
devices in raid1 and raid10.
Instead queue each bio as it becomes available so an unplug will
activate all previously allocated bios and so a new bio has a chance
of being allocated.

This means we must set the 'remaining' count to '1' before submitting
any requests, then when all are submitted, decrement 'remaining' and
possible handle the write completion at that point.

Reported-by: Torsten Kaiser &lt;just.for.lkml@googlemail.com&gt;
Tested-by: Torsten Kaiser &lt;just.for.lkml@googlemail.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: use sector_t in bitmap_get_counter</title>
<updated>2010-10-28T06:32:26+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-10-18T23:03:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=57dab0bdf689d42972975ec646d862b0900a4bf3'/>
<id>57dab0bdf689d42972975ec646d862b0900a4bf3</id>
<content type='text'>
bitmap_get_counter returns the number of sectors covered
by the counter in a pass-by-reference variable.
In some cases this can be very large, so make it a sector_t
for safety.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bitmap_get_counter returns the number of sectors covered
by the counter in a pass-by-reference variable.
In some cases this can be very large, so make it a sector_t
for safety.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: implment REQ_FLUSH/FUA support</title>
<updated>2010-09-10T10:35:38+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-09-03T09:56:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e9c7469bb4f502dafc092166201bea1ad5fc0fbf'/>
<id>e9c7469bb4f502dafc092166201bea1ad5fc0fbf</id>
<content type='text'>
This patch converts md to support REQ_FLUSH/FUA instead of now
deprecated REQ_HARDBARRIER.  In the core part (md.c), the following
changes are notable.

* Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
  processing of other requests and thus there is no reason to mark the
  queue congested while FLUSH/FUA is in progress.

* REQ_FLUSH/FUA failures are final and its users don't need retry
  logic.  Retry logic is removed.

* Preflush needs to be issued to all member devices but FUA writes can
  be handled the same way as other writes - their processing can be
  deferred to request_queue of member devices.  md_barrier_request()
  is renamed to md_flush_request() and simplified accordingly.

For linear, raid0 and multipath, the core changes are enough.  raid1,
5 and 10 need the following conversions.

* raid1: Handling of FLUSH/FUA bio's can simply be deferred to
  request_queues of member devices.  Barrier related logic removed.

* raid5: Queue draining logic dropped.  FUA bit is propagated through
  biodrain and stripe resconstruction such that all the updated parts
  of the stripe are written out with FUA writes if any of the dirtying
  writes was FUA.  preread_active_stripes handling in make_request()
  is updated as suggested by Neil Brown.

* raid10: FUA bit needs to be propagated to write clones.

linear, raid0, 1, 5 and 10 tested.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch converts md to support REQ_FLUSH/FUA instead of now
deprecated REQ_HARDBARRIER.  In the core part (md.c), the following
changes are notable.

* Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
  processing of other requests and thus there is no reason to mark the
  queue congested while FLUSH/FUA is in progress.

* REQ_FLUSH/FUA failures are final and its users don't need retry
  logic.  Retry logic is removed.

* Preflush needs to be issued to all member devices but FUA writes can
  be handled the same way as other writes - their processing can be
  deferred to request_queue of member devices.  md_barrier_request()
  is renamed to md_flush_request() and simplified accordingly.

For linear, raid0 and multipath, the core changes are enough.  raid1,
5 and 10 need the following conversions.

* raid1: Handling of FLUSH/FUA bio's can simply be deferred to
  request_queues of member devices.  Barrier related logic removed.

* raid5: Queue draining logic dropped.  FUA bit is propagated through
  biodrain and stripe resconstruction such that all the updated parts
  of the stripe are written out with FUA writes if any of the dirtying
  writes was FUA.  preread_active_stripes handling in make_request()
  is updated as suggested by Neil Brown.

* raid10: FUA bit needs to be propagated to write clones.

linear, raid0, 1, 5 and 10 tested.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md raid-1/10 Fix bio_rw bit manipulations again</title>
<updated>2010-08-18T06:16:05+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-08-18T06:16:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2c7d46ec192e4f2b350f67a0e185b9bce646cd6b'/>
<id>2c7d46ec192e4f2b350f67a0e185b9bce646cd6b</id>
<content type='text'>
commit 7b6d91daee5cac6402186ff224c3af39d79f4a0e changed the behaviour
of a few variables in raid1 and raid10 from flags to bit-sets, but
left them as type 'bool' so they did not work.

Change them (back) to unsigned long.
(historical note: see 1ef04fefe2241087d9db7e9615c3f11b516e36cf)

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reported-by: Jiri Slaby &lt;jslaby@suse.cz&gt; and many others
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7b6d91daee5cac6402186ff224c3af39d79f4a0e changed the behaviour
of a few variables in raid1 and raid10 from flags to bit-sets, but
left them as type 'bool' so they did not work.

Change them (back) to unsigned long.
(historical note: see 1ef04fefe2241087d9db7e9615c3f11b516e36cf)

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reported-by: Jiri Slaby &lt;jslaby@suse.cz&gt; and many others
</pre>
</div>
</content>
</entry>
<entry>
<title>md: provide appropriate return value for spare_active functions.</title>
<updated>2010-08-18T02:04:32+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-08-18T01:56:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6b9656205469269c050963c71fca1998b247a560'/>
<id>6b9656205469269c050963c71fca1998b247a560</id>
<content type='text'>
md_check_recovery expects -&gt;spare_active to return 'true' if any
spares were activated, but none of them do, so the consequent change
in 'degraded' is not notified through sysfs.

So count the number of spares activated, subtract it from 'degraded'
just once, and return it.

Reported-by: Adrian Drzewiecki &lt;adriand@vmware.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
md_check_recovery expects -&gt;spare_active to return 'true' if any
spares were activated, but none of them do, so the consequent change
in 'degraded' is not notified through sysfs.

So count the number of spares activated, subtract it from 'degraded'
just once, and return it.

Reported-by: Adrian Drzewiecki &lt;adriand@vmware.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Notify sysfs when RAID1/5/10 disk is In_sync.</title>
<updated>2010-08-18T01:49:02+00:00</updated>
<author>
<name>Adrian Drzewiecki</name>
<email>adriand@vmware.com</email>
</author>
<published>2010-08-18T01:49:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e6ffbcb6cd0ac471223df24ae77eb486c1ee68cc'/>
<id>e6ffbcb6cd0ac471223df24ae77eb486c1ee68cc</id>
<content type='text'>
When RAID1 is done syncing disks, it'll update the state
of synced rdevs to In_sync. But it neglected to notify
sysfs that the attribute changed. So any programs that
are waiting for an rdev's state to change will not be
woken.

(raid5/raid10 added by neilb)

Signed-off-by: Adrian Drzewiecki &lt;adriand@vmware.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When RAID1 is done syncing disks, it'll update the state
of synced rdevs to In_sync. But it neglected to notify
sysfs that the attribute changed. So any programs that
are waiting for an rdev's state to change will not be
woken.

(raid5/raid10 added by neilb)

Signed-off-by: Adrian Drzewiecki &lt;adriand@vmware.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;

</pre>
</div>
</content>
</entry>
</feed>
