<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/md/persistent-data/dm-btree.c, branch v6.1</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>dm btree: add a defensive bounds check to insert_at()</title>
<updated>2022-01-04T18:56:03+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2021-12-10T13:44:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=85bca3c05b6cca31625437eedf2060e846c4bbad'/>
<id>85bca3c05b6cca31625437eedf2060e846c4bbad</id>
<content type='text'>
Corrupt metadata could trigger an out of bounds write.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Corrupt metadata could trigger an out of bounds write.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm space maps: improve performance with inc/dec on ranges of blocks</title>
<updated>2021-06-04T16:07:22+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2021-04-13T10:03:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=be500ed721a6ec8d49bf0814c277ce7162acee0e'/>
<id>be500ed721a6ec8d49bf0814c277ce7162acee0e</id>
<content type='text'>
When we break sharing on btree nodes we typically need to increment
the reference counts to every value held in the node.  This can
cause a lot of repeated calls to the space maps.  Fix this by changing
the interface to the space map inc/dec methods to take ranges of
adjacent blocks to be operated on.

For installations that are using a lot of snapshots this will reduce
cpu overhead of fundamental operations such as provisioning a new block,
or deleting a snapshot, by as much as 10 times.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we break sharing on btree nodes we typically need to increment
the reference counts to every value held in the node.  This can
cause a lot of repeated calls to the space maps.  Fix this by changing
the interface to the space map inc/dec methods to take ranges of
adjacent blocks to be operated on.

For installations that are using a lot of snapshots this will reduce
cpu overhead of fundamental operations such as provisioning a new block,
or deleting a snapshot, by as much as 10 times.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm btree: improve btree residency</title>
<updated>2021-06-04T16:07:20+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2021-04-08T12:47:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4eafdb1515a708d97e4659bd488ddac19f274c4f'/>
<id>4eafdb1515a708d97e4659bd488ddac19f274c4f</id>
<content type='text'>
This commit improves the residency of btrees built in the metadata for
dm-thin and dm-cache.

When inserting a new entry into a full btree node the current code
splits the node into two.  This can result in very many half full nodes,
particularly if the insertions are occurring in an ascending order (as
happens in dm-thin with large writes).

With this commit, when we insert into a full node we first try and move
some entries to a neighbouring node that has space, failing that it
tries to split two neighbouring nodes into three.

Results are given below.  'Residency' is how full nodes are on average
as a percentage.  Average instruction counts for the operations
are given to show the extra processing has little overhead.

                         +--------------------------+--------------------------+
                         |         Before           |         After            |
+------------+-----------+-----------+--------------+-----------+--------------+
|    Test    |   Phase   | Residency | Instructions | Residency | Instructions |
+------------+-----------+-----------+--------------+-----------+--------------+
| Ascending  | insert    |        50 |         1876 |        96 |         1930 |
|            | overwrite |        50 |         1789 |        96 |         1746 |
|            | lookup    |        50 |          778 |        96 |          778 |
| Descending | insert    |        50 |         3024 |        96 |         3181 |
|            | overwrite |        50 |         1789 |        96 |         1746 |
|            | lookup    |        50 |          778 |        96 |          778 |
| Random     | insert    |        68 |         3800 |        84 |         3736 |
|            | overwrite |        68 |         4254 |        84 |         3911 |
|            | lookup    |        68 |          779 |        84 |          779 |
| Runs       | insert    |        63 |         2546 |        82 |         2815 |
|            | overwrite |        63 |         2013 |        82 |         1986 |
|            | lookup    |        63 |          778 |        82 |          779 |
+------------+-----------+-----------+--------------+-----------+--------------+

   Ascending - keys are inserted in ascending order.
   Descending - keys are inserted in descending order.
   Random - keys are inserted in random order.
   Runs - keys are split into ascending runs of ~20 length.  Then
          the runs are shuffled.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt; # contains_key() fix
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit improves the residency of btrees built in the metadata for
dm-thin and dm-cache.

When inserting a new entry into a full btree node the current code
splits the node into two.  This can result in very many half full nodes,
particularly if the insertions are occurring in an ascending order (as
happens in dm-thin with large writes).

With this commit, when we insert into a full node we first try and move
some entries to a neighbouring node that has space, failing that it
tries to split two neighbouring nodes into three.

Results are given below.  'Residency' is how full nodes are on average
as a percentage.  Average instruction counts for the operations
are given to show the extra processing has little overhead.

                         +--------------------------+--------------------------+
                         |         Before           |         After            |
+------------+-----------+-----------+--------------+-----------+--------------+
|    Test    |   Phase   | Residency | Instructions | Residency | Instructions |
+------------+-----------+-----------+--------------+-----------+--------------+
| Ascending  | insert    |        50 |         1876 |        96 |         1930 |
|            | overwrite |        50 |         1789 |        96 |         1746 |
|            | lookup    |        50 |          778 |        96 |          778 |
| Descending | insert    |        50 |         3024 |        96 |         3181 |
|            | overwrite |        50 |         1789 |        96 |         1746 |
|            | lookup    |        50 |          778 |        96 |          778 |
| Random     | insert    |        68 |         3800 |        84 |         3736 |
|            | overwrite |        68 |         4254 |        84 |         3911 |
|            | lookup    |        68 |          779 |        84 |          779 |
| Runs       | insert    |        63 |         2546 |        82 |         2815 |
|            | overwrite |        63 |         2013 |        82 |         1986 |
|            | lookup    |        63 |          778 |        82 |          779 |
+------------+-----------+-----------+--------------+-----------+--------------+

   Ascending - keys are inserted in ascending order.
   Descending - keys are inserted in descending order.
   Random - keys are inserted in random order.
   Runs - keys are split into ascending runs of ~20 length.  Then
          the runs are shuffled.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt; # contains_key() fix
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm thin metadata: Remove unused local variable when create thin and snap</title>
<updated>2020-09-29T20:33:11+00:00</updated>
<author>
<name>Huaisheng Ye</name>
<email>yehs1@lenovo.com</email>
</author>
<published>2020-09-15T08:56:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=399c9bdbd6500254895bdbb574a4acbb860cda41'/>
<id>399c9bdbd6500254895bdbb574a4acbb860cda41</id>
<content type='text'>
The local variable disk details is not used during the creating of thin &amp; snap
devices. Remove them from dm-thin-metadata, and add pointer validity check for
pointer value in btree_lookup_raw. Skip memory copy when the caller doesn't need
the value.

Signed-off-by: Huaisheng Ye &lt;yehs1@lenovo.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The local variable disk details is not used during the creating of thin &amp; snap
devices. Remove them from dm-thin-metadata, and add pointer validity check for
pointer value in btree_lookup_raw. Skip memory copy when the caller doesn't need
the value.

Signed-off-by: Huaisheng Ye &lt;yehs1@lenovo.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm btree: fix order of block initialization in btree_split_beneath</title>
<updated>2019-08-22T20:11:23+00:00</updated>
<author>
<name>ZhangXiaoxu</name>
<email>zhangxiaoxu5@huawei.com</email>
</author>
<published>2019-08-17T05:32:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e4f9d6013820d1eba1432d51dd1c5795759aa77f'/>
<id>e4f9d6013820d1eba1432d51dd1c5795759aa77f</id>
<content type='text'>
When btree_split_beneath() splits a node to two new children, it will
allocate two blocks: left and right.  If right block's allocation
failed, the left block will be unlocked and marked dirty.  If this
happened, the left block'ss content is zero, because it wasn't
initialized with the btree struct before the attempot to allocate the
right block.  Upon return, when flushing the left block to disk, the
validator will fail when check this block.  Then a BUG_ON is raised.

Fix this by completely initializing the left block before allocating and
initializing the right block.

Fixes: 4dcb8b57df359 ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path")
Cc: stable@vger.kernel.org
Signed-off-by: ZhangXiaoxu &lt;zhangxiaoxu5@huawei.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When btree_split_beneath() splits a node to two new children, it will
allocate two blocks: left and right.  If right block's allocation
failed, the left block will be unlocked and marked dirty.  If this
happened, the left block'ss content is zero, because it wasn't
initialized with the btree struct before the attempot to allocate the
right block.  Upon return, when flushing the left block to disk, the
validator will fail when check this block.  Then a BUG_ON is raised.

Fix this by completely initializing the left block before allocating and
initializing the right block.

Fixes: 4dcb8b57df359 ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path")
Cc: stable@vger.kernel.org
Signed-off-by: ZhangXiaoxu &lt;zhangxiaoxu5@huawei.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm btree: fix serious bug in btree_split_beneath()</title>
<updated>2018-01-17T14:07:55+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>thornber@redhat.com</email>
</author>
<published>2017-12-20T09:56:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bc68d0a43560e950850fc69b58f0f8254b28f6d6'/>
<id>bc68d0a43560e950850fc69b58f0f8254b28f6d6</id>
<content type='text'>
When inserting a new key/value pair into a btree we walk down the spine of
btree nodes performing the following 2 operations:

  i) space for a new entry
  ii) adjusting the first key entry if the new key is lower than any in the node.

If the _root_ node is full, the function btree_split_beneath() allocates 2 new
nodes, and redistibutes the root nodes entries between them.  The root node is
left with 2 entries corresponding to the 2 new nodes.

btree_split_beneath() then adjusts the spine to point to one of the two new
children.  This means the first key is never adjusted if the new key was lower,
ie. operation (ii) gets missed out.  This can result in the new key being
'lost' for a period; until another low valued key is inserted that will uncover
it.

This is a serious bug, and quite hard to make trigger in normal use.  A
reproducing test case ("thin create devices-in-reverse-order") is
available as part of the thin-provision-tools project:
  https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593

Fix the issue by changing btree_split_beneath() so it no longer adjusts
the spine.  Instead it unlocks both the new nodes, and lets the main
loop in btree_insert_raw() relock the appropriate one and make any
neccessary adjustments.

Cc: stable@vger.kernel.org
Reported-by: Monty Pavel &lt;monty_pavel@sina.com&gt;
Signed-off-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When inserting a new key/value pair into a btree we walk down the spine of
btree nodes performing the following 2 operations:

  i) space for a new entry
  ii) adjusting the first key entry if the new key is lower than any in the node.

If the _root_ node is full, the function btree_split_beneath() allocates 2 new
nodes, and redistibutes the root nodes entries between them.  The root node is
left with 2 entries corresponding to the 2 new nodes.

btree_split_beneath() then adjusts the spine to point to one of the two new
children.  This means the first key is never adjusted if the new key was lower,
ie. operation (ii) gets missed out.  This can result in the new key being
'lost' for a period; until another low valued key is inserted that will uncover
it.

This is a serious bug, and quite hard to make trigger in normal use.  A
reproducing test case ("thin create devices-in-reverse-order") is
available as part of the thin-provision-tools project:
  https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593

Fix the issue by changing btree_split_beneath() so it no longer adjusts
the spine.  Instead it unlocks both the new nodes, and lets the main
loop in btree_insert_raw() relock the appropriate one and make any
neccessary adjustments.

Cc: stable@vger.kernel.org
Reported-by: Monty Pavel &lt;monty_pavel@sina.com&gt;
Signed-off-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm btree: fix for dm_btree_find_lowest_key()</title>
<updated>2017-04-24T18:47:49+00:00</updated>
<author>
<name>Vinothkumar Raja</name>
<email>vinraja@cs.stonybrook.edu</email>
</author>
<published>2017-04-07T02:09:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7d1fedb6e96a960aa91e4ff70714c3fb09195a5a'/>
<id>7d1fedb6e96a960aa91e4ff70714c3fb09195a5a</id>
<content type='text'>
dm_btree_find_lowest_key() is giving incorrect results.  find_key()
traverses the btree correctly for finding the highest key, but there is
an error in the way it traverses the btree for retrieving the lowest
key.  dm_btree_find_lowest_key() fetches the first key of the rightmost
block of the btree instead of fetching the first key from the leftmost
block.

Fix this by conditionally passing the correct parameter to value64()
based on the @find_highest flag.

Cc: stable@vger.kernel.org
Signed-off-by: Erez Zadok &lt;ezk@fsl.cs.sunysb.edu&gt;
Signed-off-by: Vinothkumar Raja &lt;vinraja@cs.stonybrook.edu&gt;
Signed-off-by: Nidhi Panpalia &lt;npanpalia@cs.stonybrook.edu&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dm_btree_find_lowest_key() is giving incorrect results.  find_key()
traverses the btree correctly for finding the highest key, but there is
an error in the way it traverses the btree for retrieving the lowest
key.  dm_btree_find_lowest_key() fetches the first key of the rightmost
block of the btree instead of fetching the first key from the leftmost
block.

Fix this by conditionally passing the correct parameter to value64()
based on the @find_highest flag.

Cc: stable@vger.kernel.org
Signed-off-by: Erez Zadok &lt;ezk@fsl.cs.sunysb.edu&gt;
Signed-off-by: Vinothkumar Raja &lt;vinraja@cs.stonybrook.edu&gt;
Signed-off-by: Nidhi Panpalia &lt;npanpalia@cs.stonybrook.edu&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm persistent data: add cursor skip functions to the cursor APIs</title>
<updated>2017-02-16T18:12:50+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2016-10-05T14:40:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9b696229aa7de356675a938c6c8a70b46085ed66'/>
<id>9b696229aa7de356675a938c6c8a70b46085ed66</id>
<content type='text'>
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm btree: use GFP_NOFS in dm_btree_del()</title>
<updated>2017-02-16T18:09:10+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2015-11-19T13:36:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9f9ef0657d53d988dc07b096052b3dd07d6e3c46'/>
<id>9f9ef0657d53d988dc07b096052b3dd07d6e3c46</id>
<content type='text'>
dm_btree_del() is called from an ioctl so don't recurse into FS.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dm_btree_del() is called from an ioctl so don't recurse into FS.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm btree: introduce cursor api</title>
<updated>2016-09-22T15:15:04+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2016-09-15T14:49:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7d111c81fa29041c730010450618917fb05cab62'/>
<id>7d111c81fa29041c730010450618917fb05cab62</id>
<content type='text'>
This uses prefetching to speed up iteration through a btree.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This uses prefetching to speed up iteration through a btree.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
