linux-stable.git/drivers/md/raid1.c, branch linux-3.17.y

md/raid1: fix_read_error should act on all non-faulty devices.

2014-09-22T01:26:01+00:00

If a devices is being recovered it is not InSync and is not Faulty.

If a read error is experienced on that device, fix_read_error()
will be called, but it ignores non-InSync devices.  So it will
neither fix the error nor fail the device.

It is incorrect that fix_read_error() ignores non-InSync devices.
It should only ignore Faulty devices.  So fix it.

This became a bug when we allowed reading from a device that was being
recovered.  It is suitable for any subsequent -stable kernel.

Fixes: da8840a747c0dbf49506ec906757a6b87b9741e9
Cc: stable@vger.kernel.org (v3.5+)
Reported-by: Alexander Lyakas 
Tested-by: Alexander Lyakas 
Signed-off-by: NeilBrown

md/raid1: count resync requests in nr_pending.

2014-09-22T01:26:01+00:00

Both normal IO and resync IO can be retried with reschedule_retry()
and so be counted into ->nr_queued, but only normal IO gets counted in
->nr_pending.

Before the recent improvement to RAID1 resync there could only
possibly have been one or the other on the queue.  When handling a
read failure it could only be normal IO.  So when handle_read_error()
called freeze_array() the fact that freeze_array only compares
->nr_queued against ->nr_pending was safe.

But now that these two types can interleave, we can have both normal
and resync IO requests queued, so we need to count them both in
nr_pending.

This error can lead to freeze_array() hanging if there is a read
error, so it is suitable for -stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
cc: stable@vger.kernel.org (v3.13+)
Reported-by: Brassow Jonathan 
Signed-off-by: NeilBrown

md/raid1: update next_resync under resync_lock.

2014-09-22T01:26:01+00:00

raise_barrier() uses next_resync as part of its calculations, so it
really should be updated first, instead of afterwards.

next_resync is always used under resync_lock so update it under
resync lock to, just before it is used.  That is safest.

This could cause normal IO and resync IO to interact badly so
it suitable for -stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
cc: stable@vger.kernel.org (v3.13+)
Signed-off-by: NeilBrown

md/raid1: Don't use next_resync to determine how far resync has progressed

2014-09-22T01:26:01+00:00

next_resync is (approximately) the location for the next resync request.
However it does *not* reliably determine the earliest location
at which resync might be happening.
This is because resync requests can complete out of order, and
we only limit the number of current requests, not the distance
from the earliest pending request to the latest.

mddev->curr_resync_completed is a reliable indicator of the earliest
position at which resync could be happening.   It is updated less
frequently, but is actually reliable which is more important.

So use it to determine if a write request is before the region
being resynced and so safe from conflict.

This error can allow resync IO to interfere with normal IO which
could lead to data corruption. Hence: stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
cc: stable@vger.kernel.org (v3.13+)
Signed-off-by: NeilBrown

md/raid1: make sure resync waits for conflicting writes to complete.

2014-09-22T01:26:01+00:00

The resync/recovery process for raid1 was recently changed
so that writes could happen in parallel with resync providing
they were in different regions of the device.

There is a problem though:  While a write request will always
wait for conflicting resync to complete, a resync request
will *not* always wait for conflicting writes to complete.

Two changes are needed to fix this:

1/ raise_barrier (which waits until it is safe to do resync)
   must wait until current_window_requests is zero
2/ wait_battier (which waits at the start of a new write request)
   must update current_window_requests if the request could
   possible conflict with a concurrent resync.

As concurrent writes and resync can lead to data loss,
this patch is suitable for -stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Cc: stable@vger.kernel.org (v3.13+)
Cc: majianpeng 
Signed-off-by: NeilBrown

md/raid1: clean up request counts properly in close_sync()

2014-09-22T01:26:01+00:00

If there are outstanding writes when close_sync is called,
the change to ->start_next_window might cause them to
decrement the wrong counter when they complete.  Fix this
by merging the two counters into the one that will be decremented.

Having an incorrect value in a counter can cause raise_barrier()
to hangs, so this is suitable for -stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
cc: stable@vger.kernel.org (v3.13+)
Signed-off-by: NeilBrown

md/raid1: be more cautious where we read-balance during resync.

2014-09-22T00:26:41+00:00

commit 79ef3a8aa1cb1523cc231c9a90a278333c21f761 made
it possible for reads to happen concurrently with resync.
This means that we need to be more careful where read_balancing
is allowed during resync - we can no longer be sure that any
resync that has already started will definitely finish.

So keep read_balancing to before recovery_cp, which is conservative
but safe.

This bug makes it possible to read from a device that doesn't
have up-to-date data, so it can cause data corruption.
So it is suitable for any kernel since 3.11.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
cc: stable@vger.kernel.org (v3.13+)
Signed-off-by: NeilBrown

md/raid1: intialise start_next_window for READ case to avoid hang

2014-09-22T00:18:03+00:00

r1_bio->start_next_window is not initialised in the READ
case, so allow_barrier may incorrectly decrement
   conf->current_window_requests
which can cause raise_barrier() to block forever.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
cc: stable@vger.kernel.org (v3.13+)
Reported-by: Brassow Jonathan 
Signed-off-by: NeilBrown

md/raid1,raid10: always abort recover on write error.

2014-07-31T00:16:52+00:00

Currently we don't abort recovery on a write error if the write error
to the recovering device was triggerd by normal IO (as opposed to
recovery IO).

This means that for one bitmap region, the recovery might write to the
recovering device for a few sectors, then not bother for subsequent
sectors (as it never writes to failed devices).  In this case
the bitmap bit will be cleared, but it really shouldn't.

The result is that if the recovering device fails and is then re-added
(after fixing whatever hardware problem triggerred the failure),
the second recovery won't redo the region it was in the middle of,
so some of the device will not be recovered properly.

If we abort the recovery, the region being processes will be cancelled
(bit not cleared) and the whole region will be retried.

As the bug can result in data corruption the patch is suitable for
-stable.  For kernels prior to 3.11 there is a conflict in raid10.c
which will require care.

Original-from: jiao hui 
Reported-and-tested-by: jiao hui 
Signed-off-by: NeilBrown 
Cc: stable@vger.kernel.org

md/raid1: r1buf_pool_alloc: free allocate pages when subsequent allocation fails.

2014-04-09T04:42:23+00:00

When performing a user-request check/repair (MD_RECOVERY_REQUEST is set)
on a raid1, we allocate multiple bios each with their own set of pages.

If the page allocations for one bio fails, we currently do *not* free
the pages allocated for the previous bios, nor do we free the bio itself.

This patch frees all the already-allocate pages, and makes sure that
all the bios are freed as well.

This bug can cause a memory leak which can ultimately OOM a machine.
It was introduced in 3.10-rc1.

Fixes: a07876064a0b73ab5ef1ebcf14b1cf0231c07858
Cc: Kent Overstreet 
Cc: stable@vger.kernel.org (3.10+)
Reported-by: Russell King - ARM Linux 
Signed-off-by: NeilBrown