<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux/radix-tree.h, branch v4.10.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm: workingset: fix use-after-free in shadow node shrinker</title>
<updated>2017-01-08T02:22:40+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2017-01-07T00:21:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ea07b862ac8ef9b8c8358517d2e39f847dda6659'/>
<id>ea07b862ac8ef9b8c8358517d2e39f847dda6659</id>
<content type='text'>
Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked like
use-after-free access from the shadow node shrinker.

Dave Jones managed to reproduce the issue with a debug patch applied,
which confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:

  WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
  CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
  Call Trace:
     delete_node+0x1e4/0x200
     __radix_tree_delete_node+0xd/0x10
     shadow_lru_isolate+0xe6/0x220
     __list_lru_walk_one.isra.4+0x9b/0x190
     list_lru_walk_one+0x23/0x30
     scan_shadow_nodes+0x2e/0x40
     shrink_slab.part.44+0x23d/0x5d0
     shrink_node+0x22c/0x330
     kswapd+0x392/0x8f0

This is the WARN_ON_ONCE(!list_empty(&amp;node-&gt;private_list)) placed in the
inlined radix_tree_shrink().

The problem is with 14b468791fa9 ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node.

While the reclaimed shadow node itself is unlinked by the shrinker, its
deletion from the tree can cause the left-most leaf node in the tree to
be shrunk.  If that happens to be a shadow node as well, we don't unlink
it from the LRU as we should.

Consider this tree, where the s are shadow entries:

       root-&gt;rnode
            |
       [0       n]
        |       |
     [s    ] [sssss]

Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:

       root-&gt;rnode
            |
       [0        ]
        |
    [s     ]

Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:

       root-&gt;rnode
            |
       [s        ]

The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root-&gt;rnode directly and free
the node - which is a leaf node on the shadow node LRU.

  root-&gt;rnode
       |
       s

Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.

Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.

Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.

Fixes: 14b468791fa9 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner &lt;david@fromorbit.com&gt;
Reported-by: Hugh Dickins &lt;hughd@google.com&gt;
Reported-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-and-tested-by: Dave Jones &lt;davej@codemonkey.org.uk&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Chris Leech &lt;cleech@redhat.com&gt;
Cc: Lee Duncan &lt;lduncan@suse.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@linuxonhyperv.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked like
use-after-free access from the shadow node shrinker.

Dave Jones managed to reproduce the issue with a debug patch applied,
which confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:

  WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
  CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
  Call Trace:
     delete_node+0x1e4/0x200
     __radix_tree_delete_node+0xd/0x10
     shadow_lru_isolate+0xe6/0x220
     __list_lru_walk_one.isra.4+0x9b/0x190
     list_lru_walk_one+0x23/0x30
     scan_shadow_nodes+0x2e/0x40
     shrink_slab.part.44+0x23d/0x5d0
     shrink_node+0x22c/0x330
     kswapd+0x392/0x8f0

This is the WARN_ON_ONCE(!list_empty(&amp;node-&gt;private_list)) placed in the
inlined radix_tree_shrink().

The problem is with 14b468791fa9 ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node.

While the reclaimed shadow node itself is unlinked by the shrinker, its
deletion from the tree can cause the left-most leaf node in the tree to
be shrunk.  If that happens to be a shadow node as well, we don't unlink
it from the LRU as we should.

Consider this tree, where the s are shadow entries:

       root-&gt;rnode
            |
       [0       n]
        |       |
     [s    ] [sssss]

Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:

       root-&gt;rnode
            |
       [0        ]
        |
    [s     ]

Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:

       root-&gt;rnode
            |
       [s        ]

The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root-&gt;rnode directly and free
the node - which is a leaf node on the shadow node LRU.

  root-&gt;rnode
       |
       s

Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.

Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.

Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.

Fixes: 14b468791fa9 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner &lt;david@fromorbit.com&gt;
Reported-by: Hugh Dickins &lt;hughd@google.com&gt;
Reported-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-and-tested-by: Dave Jones &lt;davej@codemonkey.org.uk&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Chris Leech &lt;cleech@redhat.com&gt;
Cc: Lee Duncan &lt;lduncan@suse.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@linuxonhyperv.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>radix-tree: add radix_tree_split_preload()</title>
<updated>2016-12-15T00:04:10+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@linux.intel.com</email>
</author>
<published>2016-12-14T23:09:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2791653a6814d170fa893344618563a7b1da95c6'/>
<id>2791653a6814d170fa893344618563a7b1da95c6</id>
<content type='text'>
Calculate how many nodes we need to allocate to split an old_order entry
into multiple entries, each of size new_order.  The test suite checks
that we allocated exactly the right number of nodes; neither too many
(checked by rtp-&gt;nr == 0), nor too few (checked by comparing
nr_allocated before and after the call to radix_tree_split()).

Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Calculate how many nodes we need to allocate to split an old_order entry
into multiple entries, each of size new_order.  The test suite checks
that we allocated exactly the right number of nodes; neither too many
(checked by rtp-&gt;nr == 0), nor too few (checked by comparing
nr_allocated before and after the call to radix_tree_split()).

Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>radix-tree: add radix_tree_split</title>
<updated>2016-12-15T00:04:10+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@linux.intel.com</email>
</author>
<published>2016-12-14T23:09:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e157b555945fb16ddc6cce605a1eb6b4135ea5f1'/>
<id>e157b555945fb16ddc6cce605a1eb6b4135ea5f1</id>
<content type='text'>
This new function splits a larger multiorder entry into smaller entries
(potentially multi-order entries).  These entries are initialised to
RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't
confused.  The caller should then call radix_tree_for_each_slot() and
radix_tree_replace_slot() in order to turn these retry entries into the
intended new entries.  Tags are replicated from the original multiorder
entry into each new entry.

Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This new function splits a larger multiorder entry into smaller entries
(potentially multi-order entries).  These entries are initialised to
RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't
confused.  The caller should then call radix_tree_for_each_slot() and
radix_tree_replace_slot() in order to turn these retry entries into the
intended new entries.  Tags are replicated from the original multiorder
entry into each new entry.

Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>radix-tree: add radix_tree_join</title>
<updated>2016-12-15T00:04:10+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@linux.intel.com</email>
</author>
<published>2016-12-14T23:08:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=175542f575723e43f897ddb09d0011c13f7cf0ec'/>
<id>175542f575723e43f897ddb09d0011c13f7cf0ec</id>
<content type='text'>
This new function allows for the replacement of many smaller entries in
the radix tree with one larger multiorder entry.  From the point of view
of an RCU walker, they may see a mixture of the smaller entries and the
large entry during the same walk, but they will never see NULL for an
index which was populated before the join.

Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This new function allows for the replacement of many smaller entries in
the radix tree with one larger multiorder entry.  From the point of view
of an RCU walker, they may see a mixture of the smaller entries and the
large entry during the same walk, but they will never see NULL for an
index which was populated before the join.

Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>radix-tree: delete radix_tree_range_tag_if_tagged()</title>
<updated>2016-12-15T00:04:10+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2016-12-14T23:08:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=268f42de718128cd0301293177e79c08c38e39a6'/>
<id>268f42de718128cd0301293177e79c08c38e39a6</id>
<content type='text'>
This is an exceptionally complicated function with just one caller
(tag_pages_for_writeback).  We devote a large portion of the runtime of
the test suite to testing this one function which has one caller.  By
introducing the new function radix_tree_iter_tag_set(), we can eliminate
all of the complexity while keeping the performance.  The caller can now
use a fairly standard radix_tree_for_each() loop, and it doesn't need to
worry about tricksy things like 'start' wrapping.

The test suite continues to spend a large amount of time investigating
this function, but now it's testing the underlying primitives such as
radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator
which are also used by other parts of the kernel.

Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is an exceptionally complicated function with just one caller
(tag_pages_for_writeback).  We devote a large portion of the runtime of
the test suite to testing this one function which has one caller.  By
introducing the new function radix_tree_iter_tag_set(), we can eliminate
all of the complexity while keeping the performance.  The caller can now
use a fairly standard radix_tree_for_each() loop, and it doesn't need to
worry about tricksy things like 'start' wrapping.

The test suite continues to spend a large amount of time investigating
this function, but now it's testing the underlying primitives such as
radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator
which are also used by other parts of the kernel.

Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>radix-tree: delete radix_tree_locate_item()</title>
<updated>2016-12-15T00:04:10+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2016-12-14T23:08:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=478922e2b0f41567e4a530771bfb3f693f857d45'/>
<id>478922e2b0f41567e4a530771bfb3f693f857d45</id>
<content type='text'>
This rather complicated function can be better implemented as an
iterator.  It has only one caller, so move the functionality to the only
place that needs it.  Update the test suite to follow the same pattern.

Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Acked-by: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This rather complicated function can be better implemented as an
iterator.  It has only one caller, so move the functionality to the only
place that needs it.  Update the test suite to follow the same pattern.

Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Acked-by: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>radix-tree: improve multiorder iterators</title>
<updated>2016-12-15T00:04:10+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2016-12-14T23:08:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=148deab223b23734069abcacb5c7118b0e7deadc'/>
<id>148deab223b23734069abcacb5c7118b0e7deadc</id>
<content type='text'>
This fixes several interlinked problems with the iterators in the
presence of multiorder entries.

1. radix_tree_iter_next() would only advance by one slot, which would
   result in the iterators returning the same entry more than once if
   there were sibling entries.

2. radix_tree_next_slot() could return an internal pointer instead of
   a user pointer if a tagged multiorder entry was immediately followed by
   an entry of lower order.

3. radix_tree_next_slot() expanded to a lot more code than it used to
   when multiorder support was compiled in.  And I wasn't comfortable with
   entry_to_node() being in a header file.

Fixing radix_tree_iter_next() for the presence of sibling entries
necessarily involves examining the contents of the radix tree, so we now
need to pass 'slot' to radix_tree_iter_next(), and we need to change the
calling convention so it is called *before* dropping the lock which
protects the tree.  Also rename it to radix_tree_iter_resume(), as some
people thought it was necessary to call radix_tree_iter_next() each time
around the loop.

radix_tree_next_slot() becomes closer to how it looked before multiorder
support was introduced.  It only checks to see if the next entry in the
chunk is a sibling entry or a pointer to a node; this should be rare
enough that handling this case out of line is not a performance impact
(and such impact is amortised by the fact that the entry we just
processed was a multiorder entry).  Also, radix_tree_next_slot() used to
force a new chunk lookup for untagged entries, which is more expensive
than the out of line sibling entry skipping.

Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fixes several interlinked problems with the iterators in the
presence of multiorder entries.

1. radix_tree_iter_next() would only advance by one slot, which would
   result in the iterators returning the same entry more than once if
   there were sibling entries.

2. radix_tree_next_slot() could return an internal pointer instead of
   a user pointer if a tagged multiorder entry was immediately followed by
   an entry of lower order.

3. radix_tree_next_slot() expanded to a lot more code than it used to
   when multiorder support was compiled in.  And I wasn't comfortable with
   entry_to_node() being in a header file.

Fixing radix_tree_iter_next() for the presence of sibling entries
necessarily involves examining the contents of the radix tree, so we now
need to pass 'slot' to radix_tree_iter_next(), and we need to change the
calling convention so it is called *before* dropping the lock which
protects the tree.  Also rename it to radix_tree_iter_resume(), as some
people thought it was necessary to call radix_tree_iter_next() each time
around the loop.

radix_tree_next_slot() becomes closer to how it looked before multiorder
support was introduced.  It only checks to see if the next entry in the
chunk is a sibling entry or a pointer to a node; this should be rare
enough that handling this case out of line is not a performance impact
(and such impact is amortised by the fact that the entry we just
processed was a multiorder entry).  Also, radix_tree_next_slot() used to
force a new chunk lookup for untagged entries, which is more expensive
than the out of line sibling entry skipping.

Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>radix-tree: move rcu_head into a union with private_list</title>
<updated>2016-12-15T00:04:10+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2016-12-14T23:08:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=91d9c05ac6c788531136888d31ef18c6a0ec160f'/>
<id>91d9c05ac6c788531136888d31ef18c6a0ec160f</id>
<content type='text'>
I want to be able to reference node-&gt;parent after freeing node.

Currently node-&gt;parent is in a union with rcu_head, so it is overwritten
when the node is put on the RCU list.  We know that private_list is not
referenced after the node is freed, so it is safe for these two members
to share space.

Link: http://lkml.kernel.org/r/1480369871-5271-50-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I want to be able to reference node-&gt;parent after freeing node.

Currently node-&gt;parent is in a union with rcu_head, so it is overwritten
when the node is put on the RCU list.  We know that private_list is not
referenced after the node is freed, so it is safe for these two members
to share space.

Link: http://lkml.kernel.org/r/1480369871-5271-50-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Tested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: workingset: move shadow entry tracking to radix tree exceptional tracking</title>
<updated>2016-12-13T02:55:08+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2016-12-13T00:43:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=14b468791fa955d442f962fdf5207dfd39a131c8'/>
<id>14b468791fa955d442f962fdf5207dfd39a131c8</id>
<content type='text'>
Currently, we track the shadow entries in the page cache in the upper
bits of the radix_tree_node-&gt;count, behind the back of the radix tree
implementation.  Because the radix tree code has no awareness of them,
we rely on random subtleties throughout the implementation (such as the
node-&gt;count != 1 check in the shrinking code, which is meant to exclude
multi-entry nodes but also happens to skip nodes with only one shadow
entry, as that's accounted in the upper bits).  This is error prone and
has, in fact, caused the bug fixed in d3798ae8c6f3 ("mm: filemap: don't
plant shadow entries without radix tree node").

To remove these subtleties, this patch moves shadow entry tracking from
the upper bits of node-&gt;count to the existing counter for exceptional
entries.  node-&gt;count goes back to being a simple counter of valid
entries in the tree node and can be shrunk to a single byte.

This vastly simplifies the page cache code.  All accounting happens
natively inside the radix tree implementation, and maintaining the LRU
linkage of shadow nodes is consolidated into a single function in the
workingset code that is called for leaf nodes affected by a change in
the page cache tree.

This also removes the last user of the __radix_delete_node() return
value.  Eliminate it.

Link: http://lkml.kernel.org/r/20161117193211.GE23430@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@linuxonhyperv.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, we track the shadow entries in the page cache in the upper
bits of the radix_tree_node-&gt;count, behind the back of the radix tree
implementation.  Because the radix tree code has no awareness of them,
we rely on random subtleties throughout the implementation (such as the
node-&gt;count != 1 check in the shrinking code, which is meant to exclude
multi-entry nodes but also happens to skip nodes with only one shadow
entry, as that's accounted in the upper bits).  This is error prone and
has, in fact, caused the bug fixed in d3798ae8c6f3 ("mm: filemap: don't
plant shadow entries without radix tree node").

To remove these subtleties, this patch moves shadow entry tracking from
the upper bits of node-&gt;count to the existing counter for exceptional
entries.  node-&gt;count goes back to being a simple counter of valid
entries in the tree node and can be shrunk to a single byte.

This vastly simplifies the page cache code.  All accounting happens
natively inside the radix tree implementation, and maintaining the LRU
linkage of shadow nodes is consolidated into a single function in the
workingset code that is called for leaf nodes affected by a change in
the page cache tree.

This also removes the last user of the __radix_delete_node() return
value.  Eliminate it.

Link: http://lkml.kernel.org/r/20161117193211.GE23430@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@linuxonhyperv.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib: radix-tree: update callback for changing leaf nodes</title>
<updated>2016-12-13T02:55:08+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2016-12-13T00:43:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4d693d08607ab319095ec8942909df4b4aebdf66'/>
<id>4d693d08607ab319095ec8942909df4b4aebdf66</id>
<content type='text'>
Support handing __radix_tree_replace() a callback that gets invoked for
all leaf nodes that change or get freed as a result of the slot
replacement, to assist users tracking nodes with node-&gt;private_list.

This prepares for putting page cache shadow entries into the radix tree
root again and drastically simplifying the shadow tracking.

Link: http://lkml.kernel.org/r/20161117193134.GD23430@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@linuxonhyperv.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Support handing __radix_tree_replace() a callback that gets invoked for
all leaf nodes that change or get freed as a result of the slot
replacement, to assist users tracking nodes with node-&gt;private_list.

This prepares for putting page cache shadow entries into the radix tree
root again and drastically simplifying the shadow tracking.

Link: http://lkml.kernel.org/r/20161117193134.GD23430@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@linuxonhyperv.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
