<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/md/dm-cache-target.c, branch v3.17</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>dm cache: fix race causing dirty blocks to be marked as clean</title>
<updated>2014-09-10T15:20:47+00:00</updated>
<author>
<name>Anssi Hannula</name>
<email>anssi.hannula@iki.fi</email>
</author>
<published>2014-09-05T00:11:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=40aa978eccec61347cd47b97c598df49acde8be5'/>
<id>40aa978eccec61347cd47b97c598df49acde8be5</id>
<content type='text'>
When a writeback or a promotion of a block is completed, the cell of
that block is removed from the prison, the block is marked as clean, and
the clear_dirty() callback of the cache policy is called.

Unfortunately, performing those actions in this order allows an incoming
new write bio for that block to come in before clearing the dirty status
is completed and therefore possibly causing one of these two scenarios:

Scenario A:

Thread 1                      Thread 2
cell_defer()                  .
- cell removed from prison    .
- detained bios queued        .
.                             incoming write bio
.                             remapped to cache
.                             set_dirty() called,
.                               but block already dirty
.                               =&gt; it does nothing
clear_dirty()                 .
- block marked clean          .
- policy clear_dirty() called .

Result: Block is marked clean even though it is actually dirty. No
writeback will occur.

Scenario B:

Thread 1                      Thread 2
cell_defer()                  .
- cell removed from prison    .
- detained bios queued        .
clear_dirty()                 .
- block marked clean          .
.                             incoming write bio
.                             remapped to cache
.                             set_dirty() called
.                             - block marked dirty
.                             - policy set_dirty() called
- policy clear_dirty() called .

Result: Block is properly marked as dirty, but policy thinks it is clean
and therefore never asks us to writeback it.
This case is visible in "dmsetup status" dirty block count (which
normally decreases to 0 on a quiet device).

Fix these issues by calling clear_dirty() before calling cell_defer().
Incoming bios for that block will then be detained in the cell and
released only after clear_dirty() has completed, so the race will not
occur.

Found by inspecting the code after noticing spurious dirty counts
(scenario B).

Signed-off-by: Anssi Hannula &lt;anssi.hannula@iki.fi&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a writeback or a promotion of a block is completed, the cell of
that block is removed from the prison, the block is marked as clean, and
the clear_dirty() callback of the cache policy is called.

Unfortunately, performing those actions in this order allows an incoming
new write bio for that block to come in before clearing the dirty status
is completed and therefore possibly causing one of these two scenarios:

Scenario A:

Thread 1                      Thread 2
cell_defer()                  .
- cell removed from prison    .
- detained bios queued        .
.                             incoming write bio
.                             remapped to cache
.                             set_dirty() called,
.                               but block already dirty
.                               =&gt; it does nothing
clear_dirty()                 .
- block marked clean          .
- policy clear_dirty() called .

Result: Block is marked clean even though it is actually dirty. No
writeback will occur.

Scenario B:

Thread 1                      Thread 2
cell_defer()                  .
- cell removed from prison    .
- detained bios queued        .
clear_dirty()                 .
- block marked clean          .
.                             incoming write bio
.                             remapped to cache
.                             set_dirty() called
.                             - block marked dirty
.                             - policy set_dirty() called
- policy clear_dirty() called .

Result: Block is properly marked as dirty, but policy thinks it is clean
and therefore never asks us to writeback it.
This case is visible in "dmsetup status" dirty block count (which
normally decreases to 0 on a quiet device).

Fix these issues by calling clear_dirty() before calling cell_defer().
Incoming bios for that block will then be detained in the cell and
released only after clear_dirty() has completed, so the race will not
occur.

Found by inspecting the code after noticing spurious dirty counts
(scenario B).

Signed-off-by: Anssi Hannula &lt;anssi.hannula@iki.fi&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>dm cache: set minimum_io_size to cache's data block size</title>
<updated>2014-08-01T16:30:36+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2014-07-19T17:25:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b02465308f13f3266480275fd143747bfd441a2f'/>
<id>b02465308f13f3266480275fd143747bfd441a2f</id>
<content type='text'>
Before, if the block layer's limit stacking didn't establish an
optimal_io_size that was compatible with the cache's data block size
we'd set optimal_io_size to the data block size and minimum_io_size to 0
(which the block layer adjusts to be physical_block_size).

Update cache_io_hints() to set both minimum_io_size and optimal_io_size
to the cache's data block size.  This fixes an issue where mkfs.xfs
would create more XFS Allocation Groups on cache volumes than on a
normal linear LV of comparable size.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before, if the block layer's limit stacking didn't establish an
optimal_io_size that was compatible with the cache's data block size
we'd set optimal_io_size to the data block size and minimum_io_size to 0
(which the block layer adjusts to be physical_block_size).

Update cache_io_hints() to set both minimum_io_size and optimal_io_size
to the cache's data block size.  This fixes an issue where mkfs.xfs
would create more XFS Allocation Groups on cache volumes than on a
normal linear LV of comparable size.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm cache metadata: use dm-space-map-metadata.h defined size limits</title>
<updated>2014-08-01T16:30:33+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2014-07-14T19:37:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=895b47d7989af3aacea16380b190b1bb8f046362'/>
<id>895b47d7989af3aacea16380b190b1bb8f046362</id>
<content type='text'>
Commit 7d48935e cleaned up the persistent-data's space-map-metadata
limits by elevating them to dm-space-map-metadata.h.  Update
dm-cache-metadata to use these same limits.

The calculation for DM_CACHE_METADATA_MAX_SECTORS didn't account for the
sizeof the disk_bitmap_header.  So the supported maximum metadata size
is a bit smaller (reduced from 33423360 to 33292800 sectors).

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 7d48935e cleaned up the persistent-data's space-map-metadata
limits by elevating them to dm-space-map-metadata.h.  Update
dm-cache-metadata to use these same limits.

The calculation for DM_CACHE_METADATA_MAX_SECTORS didn't account for the
sizeof the disk_bitmap_header.  So the supported maximum metadata size
is a bit smaller (reduced from 33423360 to 33292800 sectors).

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm cache: fail migrations in the do_worker error path</title>
<updated>2014-08-01T16:30:33+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2014-06-24T19:36:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=304affaa88328fe8f40ff14f5e6ccbfad8f121eb'/>
<id>304affaa88328fe8f40ff14f5e6ccbfad8f121eb</id>
<content type='text'>
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm cache: simplify deferred set reference count increments</title>
<updated>2014-08-01T16:30:32+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2014-05-13T15:18:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8c081b52c6833a30a69ea3bdcef316eccc740c87'/>
<id>8c081b52c6833a30a69ea3bdcef316eccc740c87</id>
<content type='text'>
Factor out inc_and_issue and inc_ds helpers to simplify deferred set
reference count increments.  Also cleanup cache_map to consistently call
cell_defer and inc_ds when the bio is DM_MAPIO_REMAPPED.

No functional change.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Factor out inc_and_issue and inc_ds helpers to simplify deferred set
reference count increments.  Also cleanup cache_map to consistently call
cell_defer and inc_ds when the bio is DM_MAPIO_REMAPPED.

No functional change.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm cache: fix race affecting dirty block count</title>
<updated>2014-08-01T16:25:22+00:00</updated>
<author>
<name>Anssi Hannula</name>
<email>anssi.hannula@iki.fi</email>
</author>
<published>2014-08-01T15:55:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=44fa816bb778edbab6b6ddaaf24908dd6295937e'/>
<id>44fa816bb778edbab6b6ddaaf24908dd6295937e</id>
<content type='text'>
nr_dirty is updated without locking, causing it to drift so that it is
non-zero (either a small positive integer, or a very large one when an
underflow occurs) even when there are no actual dirty blocks.  This was
due to a race between the workqueue and map function accessing nr_dirty
in parallel without proper protection.

People were seeing under runs due to a race on increment/decrement of
nr_dirty, see: https://lkml.org/lkml/2014/6/3/648

Fix this by using an atomic_t for nr_dirty.

Reported-by: roma1390@gmail.com
Signed-off-by: Anssi Hannula &lt;anssi.hannula@iki.fi&gt;
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nr_dirty is updated without locking, causing it to drift so that it is
non-zero (either a small positive integer, or a very large one when an
underflow occurs) even when there are no actual dirty blocks.  This was
due to a race between the workqueue and map function accessing nr_dirty
in parallel without proper protection.

People were seeing under runs due to a race on increment/decrement of
nr_dirty, see: https://lkml.org/lkml/2014/6/3/648

Fix this by using an atomic_t for nr_dirty.

Reported-by: roma1390@gmail.com
Signed-off-by: Anssi Hannula &lt;anssi.hannula@iki.fi&gt;
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>dm cache: always split discards on cache block boundaries</title>
<updated>2014-05-27T14:33:05+00:00</updated>
<author>
<name>Heinz Mauelshagen</name>
<email>heinzm@redhat.com</email>
</author>
<published>2014-05-23T18:10:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f1daa838e861ae1a0fb7cd9721a21258430fcc8c'/>
<id>f1daa838e861ae1a0fb7cd9721a21258430fcc8c</id>
<content type='text'>
The DM cache target cannot cope with discards that span multiple cache
blocks, so each discard bio that spans more than one cache block must
get split by the DM core.

Signed-off-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org # v3.9+
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The DM cache target cannot cope with discards that span multiple cache
blocks, so each discard bio that spans more than one cache block must
get split by the DM core.

Signed-off-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org # v3.9+
</pre>
</div>
</content>
</entry>
<entry>
<title>dm cache: fix writethrough mode quiescing in cache_map</title>
<updated>2014-05-01T20:14:24+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2014-05-01T20:14:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=131cd131a9ff63d4b84f3fe15073a2984ac30066'/>
<id>131cd131a9ff63d4b84f3fe15073a2984ac30066</id>
<content type='text'>
Commit 2ee57d58735 ("dm cache: add passthrough mode") inadvertently
removed the deferred set reference that was taken in cache_map()'s
writethrough mode support.  Restore taking this reference.

This issue was found with code inspection.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
Cc: stable@vger.kernel.org # 3.13+
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 2ee57d58735 ("dm cache: add passthrough mode") inadvertently
removed the deferred set reference that was taken in cache_map()'s
writethrough mode support.  Restore taking this reference.

This issue was found with code inspection.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
Cc: stable@vger.kernel.org # 3.13+
</pre>
</div>
</content>
</entry>
<entry>
<title>dm cache: fix a lock-inversion</title>
<updated>2014-04-04T18:53:05+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2014-04-03T15:16:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0596661f0a16d9d69bf1033320e70b6ff52b5e81'/>
<id>0596661f0a16d9d69bf1033320e70b6ff52b5e81</id>
<content type='text'>
When suspending a cache the policy is walked and the individual policy
hints written to the metadata via sync_metadata().  This led to this
lock order:

      policy-&gt;lock
        cache_metadata-&gt;root_lock

When loading the cache target the policy is populated while the metadata
lock is held:

      cache_metadata-&gt;root_lock
         policy-&gt;lock

Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
ensuring the cache_metadata root_lock is held whilst all the hints are
written, rather than being repeatedly locked while policy-&gt;lock is held
(as was the case with each callout that policy_walk_mappings() made to
the old save_hint() method).

Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
locking correctness") build option.  However, it is not clear how the
LOCKDEP reported paths can lead to a deadlock since the two paths,
suspending a target and loading a target, never occur at the same time.
But that doesn't mean the same lock-inversion couldn't have occurred
elsewhere.

Reported-by: Marian Csontos &lt;mcsontos@redhat.com&gt;
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When suspending a cache the policy is walked and the individual policy
hints written to the metadata via sync_metadata().  This led to this
lock order:

      policy-&gt;lock
        cache_metadata-&gt;root_lock

When loading the cache target the policy is populated while the metadata
lock is held:

      cache_metadata-&gt;root_lock
         policy-&gt;lock

Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
ensuring the cache_metadata root_lock is held whilst all the hints are
written, rather than being repeatedly locked while policy-&gt;lock is held
(as was the case with each callout that policy_walk_mappings() made to
the old save_hint() method).

Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
locking correctness") build option.  However, it is not clear how the
LOCKDEP reported paths can lead to a deadlock since the two paths,
suspending a target and loading a target, never occur at the same time.
But that doesn't mean the same lock-inversion couldn't have occurred
elsewhere.

Reported-by: Marian Csontos &lt;mcsontos@redhat.com&gt;
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>dm cache: remove remainder of distinct discard block size</title>
<updated>2014-03-27T20:56:23+00:00</updated>
<author>
<name>Heinz Mauelshagen</name>
<email>heinzm@redhat.com</email>
</author>
<published>2014-03-27T19:14:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=64ab346a360a4b15c28fb8531918d4a01f4eabd9'/>
<id>64ab346a360a4b15c28fb8531918d4a01f4eabd9</id>
<content type='text'>
Discard block size not being equal to cache block size causes data
corruption by erroneously avoiding migrations in issue_copy() because
the discard state is being cleared for a group of cache blocks when it
should not.

Completely remove all code that enabled a distinction between the
cache block size and discard block size.

Signed-off-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Discard block size not being equal to cache block size causes data
corruption by erroneously avoiding migrations in issue_copy() because
the discard state is being cleared for a group of cache blocks when it
should not.

Completely remove all code that enabled a distinction between the
cache block size and discard block size.

Signed-off-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
