linux-stable.git/drivers/md/dm.c, branch linux-3.18.y

dm: remove duplicate dm_get_live_table() in __dm_destroy()

2018-11-22T06:32:45+00:00

[3.18.y only, to fix a previous patch]

__dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) twice
which leads to a deadlock.  Remove taking lock before holding
suspend_lock to prevent a different potential deadlock.

Signed-off-by: Corey Wright 
Fixes: e1db66a5fdcc ("dm: fix AB-BA deadlock in __dm_destroy()")
Cc: Sasha Levin

dm: fix AB-BA deadlock in __dm_destroy()

2018-11-10T15:39:18+00:00

[ Upstream commit 2a708cff93f1845b9239bc7d6310aef54e716c6a ]

__dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and
suspend_lock in reverse order.  Doing so can cause AB-BA deadlock:

  __dm_destroy                    dm_swap_table
  ---------------------------------------------------
                                  mutex_lock(suspend_lock)
  dm_get_live_table()
    srcu_read_lock(io_barrier)
                                  dm_sync_table()
                                    synchronize_srcu(io_barrier)
                                      .. waiting for dm_put_live_table()
  mutex_lock(suspend_lock)
    .. waiting for suspend_lock

Fix this by taking the locks in proper order.

Signed-off-by: Jun'ichi Nomura 
Fixes: ab7c7bb6f4ab ("dm: hold suspend_lock while suspending device during device deletion")
Acked-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

dm: fix race between dm_get_from_kobject() and __dm_destroy()

2017-11-30T08:35:49+00:00

commit b9a41d21dceadf8104812626ef85dc56ee8a60ed upstream.

The following BUG_ON was hit when testing repeat creation and removal of
DM devices:

    kernel BUG at drivers/md/dm.c:2919!
    CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
    Call Trace:
     [] dm_get_from_kobject+0x34/0x3a
     [] dm_attr_show+0x2b/0x5e
     [] ? mutex_lock+0x26/0x44
     [] sysfs_kf_seq_show+0x83/0xcf
     [] kernfs_seq_show+0x23/0x25
     [] seq_read+0x16f/0x325
     [] kernfs_fop_read+0x3a/0x13f
     [] __vfs_read+0x26/0x9d
     [] ? security_file_permission+0x3c/0x44
     [] ? rw_verify_area+0x83/0xd9
     [] vfs_read+0x8f/0xcf
     [] ? __fdget_pos+0x12/0x41
     [] SyS_read+0x4b/0x76
     [] system_call_fastpath+0x12/0x71

The bug can be easily triggered, if an extra delay (e.g. 10ms) is added
between the test of DMF_FREEING & DMF_DELETING and dm_get() in
dm_get_from_kobject().

To fix it, we need to ensure the test of DMF_FREEING & DMF_DELETING and
dm_get() are done in an atomic way, so _minor_lock is used.

The other callers of dm_get() have also been checked to be OK: some
callers invoke dm_get() under _minor_lock, some callers invoke it under
_hash_lock, and dm_start_request() invoke it after increasing
md->open_count.

Signed-off-by: Hou Tao 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm: flush queued bios when process blocks to avoid deadlock

2017-04-18T05:55:51+00:00

commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 upstream.

Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by
stacking drivers") created a workqueue for every bio set and code
in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
by redirecting bios queued on current->bio_list to the workqueue if the
system is low on memory.  However other deadlocks (see below **) may
happen, without any low memory condition, because generic_make_request
is queuing bios to current->bio_list (rather than submitting them).

** the related dm-snapshot deadlock is detailed here:
https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html

Fix this deadlock by redirecting any bios on current->bio_list to the
bio_set's rescue workqueue on every schedule() call.  Consequently,
when the process blocks on a mutex, the bios queued on
current->bio_list are dispatched to independent workqueus and they can
complete without waiting for the mutex to be available.

The structure blk_plug contains an entry cb_list and this list can contain
arbitrary callback functions that are called when the process blocks.
To implement this fix DM (ab)uses the onstack plug's cb_list interface
to get its flush_current_bio_list() called at schedule() time.

This fixes the snapshot deadlock - if the map method blocks,
flush_current_bio_list() will be called and it redirects bios waiting
on current->bio_list to appropriate workqueues.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
Depends-on: df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers")
Signed-off-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm snapshot: suspend merging snapshot when doing exception handover

2015-04-17T00:13:10+00:00

[ Upstream commit 09ee96b21456883e108c3b00597bb37ec512151b ]

The "dm snapshot: suspend origin when doing exception handover" commit
fixed a exception store handover bug associated with pending exceptions
to the "snapshot-origin" target.

However, a similar problem exists in snapshot merging.  When snapshot
merging is in progress, we use the target "snapshot-merge" instead of
"snapshot-origin".  Consequently, during exception store handover, we
must find the snapshot-merge target and suspend its associated
mapped_device.

To avoid lockdep warnings, the target must be suspended and resumed
without holding _origins_lock.

Introduce a dm_hold() function that grabs a reference on a
mapped_device, but unlike dm_get(), it doesn't crash if the device has
the DMF_FREEING flag set, it returns an error in this case.

In snapshot_resume() we grab the reference to the origin device using
dm_hold() while holding _origins_lock (_origins_lock guarantees that the
device won't disappear).  Then we release _origins_lock, suspend the
device and grab _origins_lock again.

NOTE to stable@ people:
When backporting to kernels 3.18 and older, use dm_internal_suspend and
dm_internal_resume instead of dm_internal_suspend_fast and
dm_internal_resume_fast.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

dm snapshot: suspend origin when doing exception handover

2015-04-17T00:12:50+00:00

[ Upstream commit b735fede8d957d9d255e9c5cf3964cfa59799637 ]

In the function snapshot_resume we perform exception store handover.  If
there is another active snapshot target, the exception store is moved
from this target to the target that is being resumed.

The problem is that if there is some pending exception, it will point to
an incorrect exception store after that handover, causing a crash due to
dm-snap-persistent.c:get_exception()'s BUG_ON.

This bug can be triggered by repeatedly changing snapshot permissions
with "lvchange -p r" and "lvchange -p rw" while there are writes on the
associated origin device.

To fix this bug, we must suspend the origin device when doing the
exception store handover to make sure that there are no pending
exceptions:
- introduce _origin_hash that keeps track of dm_origin structures.
- introduce functions __lookup_dm_origin, __insert_dm_origin and
  __remove_dm_origin that manipulate the origin hash.
- modify snapshot_resume so that it calls dm_internal_suspend_fast() and
  dm_internal_resume_fast() on the origin device.

NOTE to stable@ people:

When backporting to kernels 3.12-3.18, use dm_internal_suspend and
dm_internal_resume instead of dm_internal_suspend_fast and
dm_internal_resume_fast.

When backporting to kernels older than 3.12, you need to pick functions
dm_internal_suspend and dm_internal_resume from the commit
fd2ed4d252701d3bbed4cd3e3d267ad469bb832a.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

dm: hold suspend_lock while suspending device during device deletion

2015-04-17T00:11:45+00:00

[ Upstream commit ab7c7bb6f4ab95dbca96fcfc4463cd69843e3e24 ]

__dm_destroy() must take the suspend_lock so that its presuspend and
postsuspend calls do not race with an internal suspend.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

dm: fix a race condition in dm_get_md

2015-03-24T01:02:47+00:00

commit 2bec1f4a8832e74ebbe859f176d8a9cb20dd97f4 upstream.

The function dm_get_md finds a device mapper device with a given dev_t,
increases the reference count and returns the pointer.

dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the
device, tests that the device doesn't have DMF_DELETING or DMF_FREEING
flag, drops _minor_lock and returns pointer to the device. dm_get_md then
calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag,
otherwise it increments the reference count.

There is a possible race condition - after dm_find_md exits and before
dm_get is called, there are no locks held, so the device may disappear or
DMF_FREEING flag may be set, which results in BUG.

To fix this bug, we need to call dm_get while we hold _minor_lock. This
patch renames dm_find_md to dm_get_md and changes it so that it calls
dm_get while holding the lock.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm: fix missed error code if .end_io isn't implemented by target_type

2015-01-27T16:29:39+00:00

commit 5164bece1673cdf04782f8ed3fba70743700f5da upstream.

In bio-based DM's clone_endio(), when target_type doesn't implement
.end_io (e.g. linear) r will be always be initialized 0.  So if a
WRITE SAME bio fails WRITE SAME will not be disabled as intended.

Fix this by initializing r to error, rather than 0, in clone_endio().

Signed-off-by: Alex Chen 
Signed-off-by: Mike Snitzer 
Fixes: 7eee4ae2db ("dm: disable WRITE SAME if it fails")
Signed-off-by: Greg Kroah-Hartman

dm: allow active and inactive tables to share dm_devs

2014-10-06T00:03:35+00:00

Until this change, when loading a new DM table, DM core would re-open
all of the devices in the DM table.  Now, DM core will avoid redundant
device opens (and closes when destroying the old table) if the old
table already has a device open using the same mode.  This is achieved
by managing reference counts on the table_devices that DM core now
stores in the mapped_device structure (rather than in the dm_table
structure).  So a mapped_device's active and inactive dm_tables' dm_dev
lists now just point to the dm_devs stored in the mapped_device's
table_devices list.

This improvement in DM core's device reference counting has the
side-effect of fixing a long-standing limitation of the multipath
target: a DM multipath table couldn't include any paths that were unusable
(failed).  For example: if all paths have failed and you add a new,
working, path to the table; you can't use it since the table load would
fail due to it still containing failed paths.  Now a re-load of a
multipath table can include failed devices and when those devices become
active again they can be used instantly.

The device list code in dm.c isn't a straight copy/paste from the code in
dm-table.c, but it's very close (aside from some variable renames).  One
subtle difference is that find_table_device for the tables_devices list
will only match devices with the same name and mode.  This is because we
don't want to upgrade a device's mode in the active table when an
inactive table is loaded.

Access to the mapped_device structure's tables_devices list requires a
mutex (tables_devices_lock), so that tables cannot be created and
destroyed concurrently.

Signed-off-by: Benjamin Marzinski 
Signed-off-by: Mike Snitzer