linux-stable.git/drivers/md, branch v3.18.48

dm space map metadata: fix 'struct sm_metadata' leak on failed create

2017-01-15T14:49:51+00:00

[ Upstream commit 314c25c56c1ee5026cf99c570bdfe01847927acb ]

In dm_sm_metadata_create() we temporarily change the dm_space_map
operations from 'ops' (whose .destroy function deallocates the
sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).

If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
dm_sm_destroy() with the intention of freeing the sm_metadata, but it
doesn't (because the dm_space_map operations are still set to
'bootstrap_ops').

Fix this by setting the dm_space_map operations back to 'ops' if
dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.
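
The leak described above can be modelled in a few lines of userspace C.
This is only a sketch: sketch_sm, sketch_smm, ops_destroy(),
bootstrap_destroy() and the restore_ops_on_error switch are hypothetical
stand-ins, not the kernel's definitions. The create step always fails
while the bootstrap operations are installed, and the caller then invokes
whatever .destroy is currently set.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for dm_space_map and sm_metadata. */
struct sketch_sm {
	void (*destroy)(struct sketch_sm *sm);
};

struct sketch_smm {
	struct sketch_sm sm;	/* first member, so the cast in ops_destroy() is valid */
};

static int live_objects;	/* allocations not yet freed */

static void ops_destroy(struct sketch_sm *sm)
{
	free((struct sketch_smm *)sm);
	live_objects--;
}

static void bootstrap_destroy(struct sketch_sm *sm)
{
	(void)sm;	/* deliberately frees nothing */
}

/* Always fails while the bootstrap operations are installed. */
static int sketch_create(struct sketch_sm *sm, int restore_ops_on_error)
{
	sm->destroy = bootstrap_destroy;
	/* a step such as sm_ll_new_metadata() is assumed to fail here */
	if (restore_ops_on_error)
		sm->destroy = ops_destroy;
	return -1;
}

/* Returns how many objects a failed create leaves behind. */
static int leaked_by_failed_create(int restore_ops_on_error)
{
	int before = live_objects;
	struct sketch_smm *smm = malloc(sizeof(*smm));

	if (!smm)
		return 0;
	live_objects++;
	smm->sm.destroy = ops_destroy;
	if (sketch_create(&smm->sm, restore_ops_on_error))
		smm->sm.destroy(&smm->sm);	/* what the caller does on error */
	return live_objects - before;
}
```

With restore_ops_on_error set to 0 the object survives the caller's
destroy call; with it set to 1 the object is freed.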

Signed-off-by: Benjamin Marzinski 
Acked-by: Joe Thornber 
Signed-off-by: Mike Snitzer 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

md/raid5: limit request size according to implementation limits

2017-01-15T14:49:50+00:00

[ Upstream commit e8d7c33232e5fdfa761c3416539bc5b4acd12db5 ]

The current implementation employs a 16-bit counter of active stripes in
the lower bits of bio->bi_phys_segments. If a request is big enough to
overflow this counter, the bio will be completed and freed too early.

Fortunately this does not happen in the default configuration because
several other limits prevent it: stripe_cache_size * nr_disks effectively
limits the count of active stripes, and a small max_sectors_kb on the
lower disks prevents it during normal read/write operations.

Overflow happens easily during discard if discard is enabled by the module
parameter "devices_handle_discard_safely" and stripe_cache_size is set
large enough.

This patch limits request size to 256MiB - 8KiB to prevent overflows.
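
The arithmetic behind that cap can be checked with a small sketch. It
assumes one stripe unit per 4KiB page, which is an assumption made here
and not stated in this message; the SKETCH_* names and helper functions
are hypothetical. Under that assumption 0xfffe stripes is exactly
256MiB - 8KiB, while a full 256MiB request needs 65536 stripes, one more
than a 16-bit counter can hold.

```c
#include <stdint.h>

#define SKETCH_STRIPE_BYTES 4096u	/* assumed: one 4KiB page per stripe unit */
#define SKETCH_COUNTER_MAX  0xffffu	/* largest value of a 16-bit counter */

/* Number of stripe units needed to cover a request of the given size. */
static uint64_t stripes_for_request(uint64_t bytes)
{
	return (bytes + SKETCH_STRIPE_BYTES - 1) / SKETCH_STRIPE_BYTES;
}

/* True if the request needs more stripes than the counter can count. */
static int counter_overflows(uint64_t bytes)
{
	return stripes_for_request(bytes) > SKETCH_COUNTER_MAX;
}

/* The cap named in the text, expressed as 0xfffe stripe units. */
static uint64_t capped_request_bytes(void)
{
	return (uint64_t)0xfffe * SKETCH_STRIPE_BYTES;
}
```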

Signed-off-by: Konstantin Khlebnikov 
Cc: Shaohua Li 
Cc: Neil Brown 
Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin

dm crypt: mark key as invalid until properly loaded

2017-01-15T14:49:49+00:00

[ Upstream commit 265e9098bac02bc5e36cda21fdbad34cb5b2f48d ]

In crypt_set_key(), if a failure occurs while replacing the old key
(e.g. tfm->setkey() fails), the key must not have the DM_CRYPT_KEY_VALID
flag set.  Otherwise, the crypto layer would have an invalid key that
still has the DM_CRYPT_KEY_VALID flag set.
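
One way to get that ordering is to clear the flag before attempting the
replacement and set it again only on success. The sketch below uses a
hypothetical struct sketch_crypt with a plain bool instead of the real
flags word, and passes the setkey step in as a callback so both outcomes
can be exercised. It illustrates the ordering only and is not the patch
itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the dm-crypt configuration. */
struct sketch_crypt {
	bool key_valid;		/* models DM_CRYPT_KEY_VALID */
};

static int setkey_ok(void)   { return 0; }
static int setkey_fail(void) { return -1; }

/* Replace the key; the key counts as valid only after setkey succeeds. */
static int sketch_set_key(struct sketch_crypt *cc, int (*setkey)(void))
{
	int r;

	cc->key_valid = false;	/* invalid until properly loaded */
	r = setkey();
	if (!r)
		cc->key_valid = true;
	return r;
}
```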

Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Kozina 
Reviewed-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Signed-off-by: Sasha Levin

md: be careful not to leak internal curr_resync value into metadata. -- (all)

2016-11-24T04:09:03+00:00

[ Upstream commit 1217e1d1999ed6c9c1e1b1acae0a74ab70464ae2 ]

mddev->curr_resync usually records where the current resync is up to,
but during the starting phase it has some "magic" values.

 1 - means that the array is trying to start a resync, but has yielded
     to another array which shares physical devices, and also needs to
     start a resync
 2 - means the array is trying to start resync, but has found another
     array which shares physical devices and has already started resync.

 3 - means that resync has commenced, but it is possible that nothing
     has actually been resynced yet.

It is important that this value not be visible to user-space and
particularly that it doesn't get written to the metadata, as the
resync or recovery checkpoint.  In part, this is because it may be
slightly higher than the correct value, though this is very rare.
In part, because it is not a multiple of 4K, and some devices only
support 4K aligned accesses.

There are two places where this value is propagated into either
->curr_resync_completed or ->recovery_cp or ->recovery_offset.
These currently avoid the propagation of values 1 and 2, but will
allow 3 to leak through.

Change them to only propagate the value if it is > 3.
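
The guard amounts to treating every value up to and including 3 as "no
real progress yet". A minimal sketch follows; checkpoint_after() and its
parameters are hypothetical, and the real call sites test further
conditions that are left out here.

```c
#include <stdint.h>

/*
 * Return the checkpoint to record: the current resync position if it is
 * a real offset, otherwise the previously recorded checkpoint.
 * Values 1, 2 and 3 are the "magic" start-up states and never propagate.
 */
static uint64_t checkpoint_after(uint64_t curr_resync, uint64_t old_checkpoint)
{
	if (curr_resync > 3)
		return curr_resync;
	return old_checkpoint;
}
```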

As this can cause an array to fail, the patch is suitable for -stable.

Cc: stable@vger.kernel.org (v3.7+)
Reported-by: Viswesh 
Signed-off-by: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin

md: sync sync_completed has correct value as recovery finishes.

2016-11-24T04:08:55+00:00

[ Upstream commit 5ed1df2eacc0ba92c8c7e2499c97594b5ef928a8 ]

There can be a small window between the moment that recovery
actually writes the last block and the time when various sysfs
and /proc/mdstat attributes report that it has finished.
During this time, 'sync_completed' can have the wrong value.
This can confuse monitoring software.

So:
 - don't set curr_resync_completed beyond the end of the devices,
 - set it correctly when resync/recovery has completed.

Signed-off-by: NeilBrown 
Signed-off-by: Sasha Levin

dm table: fix missing dm_put_target_type() in dm_table_add_target()

2016-11-24T03:34:22+00:00

[ Upstream commit dafa724bf582181d9a7d54f5cb4ca0bf8ef29269 ]

dm_get_target_type() has already been called by this point, so every
error path in dm_table_add_target() must call dm_put_target_type() before
returning.  Otherwise the DM target module's reference count will leak
and the associated kernel module cannot be unloaded.

Also, leverage the fact that r is already -EINVAL and remove an extra
newline.
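
The pattern is the usual single error label that drops the reference. The
sketch below uses a hypothetical refcounted struct sketch_type and a
simplified add_target(); it only illustrates that every failure after the
"get" must pass through the "put".

```c
#include <errno.h>

/* Hypothetical refcounted target type. */
struct sketch_type {
	int refcount;
};

static struct sketch_type *get_type(struct sketch_type *t)
{
	t->refcount++;
	return t;
}

static void put_type(struct sketch_type *t)
{
	t->refcount--;
}

/* On success the table keeps the reference; on error it is dropped. */
static int add_target(struct sketch_type *type, int args_valid)
{
	int r = -EINVAL;
	struct sketch_type *t = get_type(type);

	if (!args_valid)
		goto bad;	/* r is already -EINVAL */
	return 0;

bad:
	put_type(t);
	return r;
}
```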

Fixes: 36a0456 ("dm table: add immutable feature")
Fixes: cc6cbe1 ("dm table: add always writeable feature")
Fixes: 3791e2f ("dm table: add singleton feature")
Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: tang.junhui 
Signed-off-by: Mike Snitzer 
Signed-off-by: Sasha Levin

dm crypt: fix free of bad values after tfm allocation failure

2016-09-15T22:54:06+00:00

[ Upstream commit 5d0be84ec0cacfc7a6d6ea548afdd07d481324cd ]

If crypt_alloc_tfms() had to allocate multiple tfms and it failed before
the last allocation, then it would call crypt_free_tfms() and could free
pointers from uninitialized memory -- due to the crypt_free_tfms() check
for non-zero cc->tfms[i].  Fix by allocating zeroed memory.
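
A userspace sketch of the same idea, with calloc() standing in for a
zeroing kernel allocation: slots that were never filled read as NULL, so
the cleanup loop skips them instead of freeing garbage. The names
(sketch_tfms, alloc_tfms(), free_tfms(), fail_at) are hypothetical.

```c
#include <stdlib.h>

struct sketch_tfms {
	void **slots;
	unsigned count;
};

/* Free only the slots that were actually allocated. */
static void free_tfms(struct sketch_tfms *s)
{
	unsigned i;

	if (!s->slots)
		return;
	for (i = 0; i < s->count; i++)
		if (s->slots[i])	/* safe only because the array was zeroed */
			free(s->slots[i]);
	free(s->slots);
	s->slots = NULL;
}

/* Allocate 'count' slots; simulate a failure at index 'fail_at'. */
static int alloc_tfms(struct sketch_tfms *s, unsigned count, unsigned fail_at)
{
	unsigned i;

	s->slots = calloc(count, sizeof(*s->slots));	/* zeroed array */
	if (!s->slots)
		return -1;
	s->count = count;

	for (i = 0; i < count; i++) {
		if (i != fail_at)
			s->slots[i] = malloc(16);
		if (!s->slots[i]) {
			free_tfms(s);
			return -1;
		}
	}
	return 0;
}
```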

Signed-off-by: Eric Biggers 
Signed-off-by: Mike Snitzer 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

dm crypt: fix error with too large bios

2016-09-15T22:54:06+00:00

[ Upstream commit 4e870e948fbabf62b78e8410f04c67703e7c816b ]

When dm-crypt processes writes, it allocates a new bio in
crypt_alloc_buffer().  The bio is allocated from a bio set and it can
have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
larger (e.g. if it was allocated by bcache).  If the incoming bio is
larger, bio_alloc_bioset() fails and an error is returned.

To avoid the error, we test for a too large bio in the function
crypt_map() and use dm_accept_partial_bio() to split the bio.
dm_accept_partial_bio() trims the current bio to the desired size and
asks DM core to send another bio with the rest of the data.
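
The effect of accepting only part of an oversized bio can be sketched as
a clamp that is applied repeatedly until nothing is left. The constants
below assume 256 vector entries per bio and 4KiB pages with 512-byte
sectors; both values and the helper names are assumptions made for this
illustration.

```c
#include <stdint.h>

#define SKETCH_MAX_PAGES    256u	/* assumed per-bio vector limit */
#define SKETCH_PAGE_SECTORS 8u		/* assumed: 4KiB page / 512-byte sector */

/* How many sectors of an incoming write are accepted in one pass. */
static uint32_t sectors_to_accept(uint32_t bio_sectors)
{
	uint32_t limit = SKETCH_MAX_PAGES * SKETCH_PAGE_SECTORS;

	return bio_sectors > limit ? limit : bio_sectors;
}

/* How many passes it takes before the whole write has been accepted. */
static unsigned passes_needed(uint32_t bio_sectors)
{
	unsigned passes = 0;

	while (bio_sectors) {
		bio_sectors -= sectors_to_accept(bio_sectors);
		passes++;
	}
	return passes;
}
```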

Signed-off-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Sasha Levin

dm flakey: fix reads to be issued if drop_writes configured

2016-09-01T02:05:44+00:00

[ Upstream commit 299f6230bc6d0ccd5f95bb0fb865d80a9c7d5ccc ]

v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the
down_interval") overlooked the 'drop_writes' feature, which is meant to
allow reads to be issued rather than errored, during the down_interval.
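
The intended behaviour during the down_interval can be written as a small
decision function. This sketch covers only the read/write direction and
the drop_writes setting; it ignores every other dm-flakey feature, the
enum and function names are hypothetical, and the assumption that writes
are silently dropped is taken from the feature's name.

```c
#include <stdbool.h>

enum sketch_action {
	ACT_PASS,	/* issue the bio to the underlying device */
	ACT_ERROR,	/* complete the bio with an error */
	ACT_DROP	/* complete the bio without issuing it */
};

/* What to do with a bio that arrives during the down_interval. */
static enum sketch_action down_interval_action(bool is_read, bool drop_writes)
{
	if (is_read)
		return drop_writes ? ACT_PASS : ACT_ERROR;
	return drop_writes ? ACT_DROP : ACT_ERROR;
}
```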

Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval")
Reported-by: Qu Wenruo 
Signed-off-by: Mike Snitzer 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two.

2016-09-01T02:05:44+00:00

[ Upstream commit acc9cf8c66c66b2cbbdb4a375537edee72be64df ]

This patch fixes a cachedev registration-time allocation deadlock.
This can deadlock on boot if your initrd auto-registers bcache devices:

Allocator thread:
[  720.727614] INFO: task bcache_allocato:3833 blocked for more than 120 seconds.
[  720.732361]  [] schedule+0x37/0x90
[  720.732963]  [] bch_bucket_alloc+0x188/0x360 [bcache]
[  720.733538]  [] ? prepare_to_wait_event+0xf0/0xf0
[  720.734137]  [] bch_prio_write+0x19d/0x340 [bcache]
[  720.734715]  [] bch_allocator_thread+0x3ff/0x470 [bcache]
[  720.735311]  [] ? __schedule+0x2dc/0x950
[  720.735884]  [] ? invalidate_buckets+0x980/0x980 [bcache]

Registration thread:
[  720.710403] INFO: task bash:3531 blocked for more than 120 seconds.
[  720.715226]  [] schedule+0x37/0x90
[  720.715805]  [] __bch_btree_map_nodes+0x12d/0x150 [bcache]
[  720.716409]  [] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
[  720.717008]  [] bch_btree_insert+0xf4/0x170 [bcache]
[  720.717586]  [] ? prepare_to_wait_event+0xf0/0xf0
[  720.718191]  [] bch_journal_replay+0x14a/0x290 [bcache]
[  720.718766]  [] ? ttwu_do_activate.constprop.94+0x5d/0x70
[  720.719369]  [] ? try_to_wake_up+0x1d4/0x350
[  720.719968]  [] run_cache_set+0x580/0x8e0 [bcache]
[  720.720553]  [] register_bcache+0xe2e/0x13b0 [bcache]
[  720.721153]  [] kobj_attr_store+0xf/0x20
[  720.721730]  [] sysfs_kf_write+0x3d/0x50
[  720.722327]  [] kernfs_fop_write+0x12a/0x180
[  720.722904]  [] __vfs_write+0x37/0x110
[  720.723503]  [] ? __sb_start_write+0x58/0x110
[  720.724100]  [] ? security_file_permission+0x23/0xa0
[  720.724675]  [] vfs_write+0xa9/0x1b0
[  720.725275]  [] ? do_audit_syscall_entry+0x6c/0x70
[  720.725849]  [] SyS_write+0x55/0xd0
[  720.726451]  [] ? do_page_fault+0x30/0x80
[  720.727045]  [] system_call_fastpath+0x12/0x71

The fifo code in upstream bcache can't use the last element in the buffer,
which was the cause of the bug: if you asked for a power of two size,
it'd give you a fifo that could hold one less than what you asked for
rather than allocating a buffer twice as big.
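
The off-by-one can be reproduced with the sizing arithmetic of a generic
power-of-two ring buffer that keeps one slot unused to tell "full" from
"empty". This is not bcache's fifo code; the helper names are
hypothetical and only the capacity calculation is modelled.

```c
#include <stddef.h>

/* Smallest power of two that is >= n, for n >= 1. */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* A ring that keeps one slot empty can hold slots - 1 elements. */
static size_t ring_capacity(size_t slots)
{
	return slots - 1;
}

/* Sizing that rounds the request itself: short by one for powers of two. */
static size_t capacity_exact(size_t requested)
{
	return ring_capacity(roundup_pow2(requested));
}

/* Sizing that accounts for the unusable slot before rounding. */
static size_t capacity_padded(size_t requested)
{
	return ring_capacity(roundup_pow2(requested + 1));
}
```

Asking for 8 entries gives room for only 7 with the first sizing and 15
with the second, at the cost of a buffer twice as big. A request of 6
holds 7 either way.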

Signed-off-by: Kent Overstreet 
Tested-by: Eric Wheeler 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin