linux-stable.git/drivers/md/raid5.c, branch linux-3.16.y

md/raid: raid5 preserve the writeback action after the parity check

2019-11-22T15:57:32+00:00

commit b2176a1dfb518d870ee073445d27055fea64dfb8 upstream.

The problem is that any 'uptodate' vs 'disks' check is not precise
in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the
device that might try to kick off writes and then skip the action.
Better to prevent the raid driver from taking unexpected action *and* keep
the system alive vs killing the machine with BUG_ON.

Note: fixed warning reported by kbuild test robot 

Signed-off-by: Dan Williams 
Signed-off-by: Nigel Croxon 
Signed-off-by: Song Liu 
Signed-off-by: Ben Hutchings

md: Fix failed allocation of md_register_thread

2019-07-09T21:04:16+00:00

commit e406f12dde1a8375d77ea02d91f313fb1a9c6aec upstream.

mddev->sync_thread can be set to NULL on kzalloc failure downstream.
The patch checks for such a scenario and frees allocated resources.

Committer node:

Added similar fix to raid5.c, as suggested by Guoqing.

Acked-by: Guoqing Jiang 
Signed-off-by: Aditya Pakki 
Signed-off-by: Song Liu 
Signed-off-by: Ben Hutchings

md/raid5: add thread_group worker async_tx_issue_pending_all

2017-11-11T13:33:07+00:00

commit 7e96d559634b73a8158ee99a7abece2eacec2668 upstream.

Since thread_group worker and raid5d kthread are not in sync, if
worker writes stripe before raid5d then requests will be waiting
for issue_pendig.

Issue observed when building raid5 with ext4, in some build runs
jbd2 would get hung and requests were waiting in the HW engine
waiting to be issued.

Fix this by adding a call to async_tx_issue_pending_all in the
raid5_do_work.

Signed-off-by: Ofer Heifetz 
Signed-off-by: Shaohua Li 
Signed-off-by: Ben Hutchings

Raid5 should update rdev->sectors after reshape

2017-11-11T13:32:53+00:00

commit b5d27718f38843a74552e9a93d32e2391fd3999f upstream.

The raid5 md device is created by the disks which we don't use the total size. For example,
the size of the device is 5G and it just uses 3G of the devices to create one raid5 device.
Then change the chunksize and wait reshape to finish. After reshape finishing stop the raid
and assemble it again. It fails.
mdadm -CR /dev/md0 -l5 -n3 /dev/loop[0-2] --size=3G --chunk=32 --assume-clean
mdadm /dev/md0 --grow --chunk=64
wait reshape to finish
mdadm -S /dev/md0
mdadm -As
The error messages:
[197519.814302] md: loop1 does not have a valid v1.2 superblock, not importing!
[197519.821686] md: md_import_device returned -22

After reshape the data offset is changed. It selects backwards direction in this condition.
In function super_1_load it compares the available space of the underlying device with
sb->data_size. The new data offset gets bigger after reshape. So super_1_load returns -EINVAL.
rdev->sectors is updated in md_finish_reshape. Then sb->data_size is set in super_1_sync based
on rdev->sectors. So add md_finish_reshape in end_reshape.

Signed-off-by: Xiao Ni 
Acked-by: Guoqing Jiang 
Signed-off-by: Shaohua Li 
Signed-off-by: Ben Hutchings

md: don't use flush_signals in userspace processes

2017-10-12T14:27:48+00:00

commit f9c79bc05a2a91f4fba8bfd653579e066714b1ec upstream.

The function flush_signals clears all pending signals for the process. It
may be used by kernel threads when we need to prepare a kernel thread for
responding to signals. However using this function for an userspaces
processes is incorrect - clearing signals without the program expecting it
can cause misbehavior.

The raid1 and raid5 code uses flush_signals in its request routine because
it wants to prepare for an interruptible wait. This patch drops
flush_signals and uses sigprocmask instead to block all signals (including
SIGKILL) around the schedule() call. The signals are not lost, but the
schedule() call won't respond to them.

Signed-off-by: Mikulas Patocka 
Acked-by: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Ben Hutchings

md/raid5: limit request size according to implementation limits

2017-03-16T02:26:26+00:00

commit e8d7c33232e5fdfa761c3416539bc5b4acd12db5 upstream.

Current implementation employ 16bit counter of active stripes in lower
bits of bio->bi_phys_segments. If request is big enough to overflow
this counter bio will be completed and freed too early.

Fortunately this not happens in default configuration because several
other limits prevent that: stripe_cache_size * nr_disks effectively
limits count of active stripes. And small max_sectors_kb at lower
disks prevent that during normal read/write operations.

Overflow easily happens in discard if it's enabled by module parameter
"devices_handle_discard_safely" and stripe_cache_size is set big enough.

This patch limits requests size with 256Mb - 8Kb to prevent overflows.

Signed-off-by: Konstantin Khlebnikov 
Cc: Shaohua Li 
Cc: Neil Brown 
Signed-off-by: Shaohua Li 
Signed-off-by: Ben Hutchings

md/raid5: Compare apples to apples (or sectors to sectors)

2016-04-30T22:05:51+00:00

commit e7597e69dec59b65c5525db1626b9d34afdfa678 upstream.

'max_discard_sectors' is in sectors, while 'stripe' is in bytes.

This fixes the problem where DISCARD would get disabled on some larger
RAID5 configurations (6 or more drives in my testing), while it worked
as expected with smaller configurations.

Fixes: 620125f2bf8 ("MD: raid5 trim support")
Signed-off-by: Jes Sorensen 
Signed-off-by: Shaohua Li 
Signed-off-by: Ben Hutchings

md/raid5: fix locking in handle_stripe_clean_event()

2015-11-16T11:27:16+00:00

commit b8a9d66d043ffac116100775a469f05f5158c16f upstream.

After commit 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()")
__find_stripe() is called under conf->hash_locks + hash.
But handle_stripe_clean_event() calls remove_hash() under
conf->device_lock.

Under some cirscumstances the hash chain can be circuited,
and we get an infinite loop with disabled interrupts and locked hash
lock in __find_stripe(). This leads to hard lockup on multiple CPUs
and following system crash.

I was able to reproduce this behavior on raid6 over 6 ssd disks.
The devices_handle_discard_safely option should be set to enable trim
support. The following script was used:

for i in `seq 1 32`; do
    dd if=/dev/zero of=large$i bs=10M count=100 &
done

Signed-off-by: Roman Gushchin 
Fixes: 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()")
Signed-off-by: NeilBrown 
Cc: Shaohua Li 
[ luis: backported to 3.16: used Roman's backport to 3.14 ]
Signed-off-by: Luis Henriques

md/raid5: don't record new size if resize_stripes fails.

2015-05-28T09:00:00+00:00

commit 6e9eac2dcee5e19f125967dd2be3e36558c42fff upstream.

If any memory allocation in resize_stripes fails we will return
-ENOMEM, but in some cases we update conf->pool_size anyway.

This means that if we try again, the allocations will be assumed
to be larger than they are, and badness results.

So only update pool_size if there is no error.

This bug was introduced in 2.6.17 and the patch is suitable for
-stable.

Fixes: ad01c9e3752f ("[PATCH] md: Allow stripes to be expanded in preparation for expanding an array")
Signed-off-by: NeilBrown 
Signed-off-by: Luis Henriques

md/raid5: Fix livelock when array is both resyncing and degraded.

2015-03-02T15:04:30+00:00

commit 26ac107378c4742978216be1005b7291b799c7b2 upstream.

Commit a7854487cd7128a30a7f4f5259de9f67d5efb95f:
  md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.

Causes an RCW cycle to be forced even when the array is degraded.
A degraded array cannot support RCW as that requires reading all data
blocks, and one may be missing.

Forcing an RCW when it is not possible causes a live-lock and the code
spins, repeatedly deciding to do something that cannot succeed.

So change the condition to only force RCW on non-degraded arrays.

Reported-by: Manibalan P 
Bisected-by: Jes Sorensen 
Tested-by: Jes Sorensen 
Signed-off-by: NeilBrown 
Fixes: a7854487cd7128a30a7f4f5259de9f67d5efb95f
Signed-off-by: Luis Henriques