<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/rcu/tree.h, branch v4.8</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>rcu: Correctly handle sparse possible cpus</title>
<updated>2016-06-15T23:00:05+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2016-06-03T14:20:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bc75e99983df1efd977a5cd468893d55d52b8d70'/>
<id>bc75e99983df1efd977a5cd468893d55d52b8d70</id>
<content type='text'>
In many cases in the RCU tree code, we iterate over the set of cpus for
a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
per-cpu data for each cpu in this range. However, if the set of possible
cpus is sparse, some cpus described in this range are not possible, and
thus no per-cpu region will have been allocated (or initialised) for
them by the generic percpu code.

Erroneous accesses to a per-cpu area for these !possible cpus may fault
or may hit other data depending on the addressed generated when the
erroneous per cpu offset is applied. In practice, both cases have been
observed on arm64 hardware (the former being silent, but detectable with
additional patches).

To avoid issues resulting from this, we must iterate over the set of
*possible* cpus for a given leaf node. This patch add a new helper,
for_each_leaf_node_possible_cpu, to enable this. As iteration is often
intertwined with rcu_node local bitmask manipulation, a new
leaf_node_cpu_bit helper is added to make this simpler and more
consistent. The RCU tree code is made to use both of these where
appropriate.

Without this patch, running reboot at a shell can result in an oops
like:

[ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
[ 3369.083881] pgd = ffffffc3ecdda000
[ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
[ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[ 3369.101781] Modules linked in:
[ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G        W       4.6.0+ #3
[ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
[ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
[ 3369.139479] pc : [&lt;ffffff80081109a8&gt;] lr : [&lt;ffffff8008110924&gt;] pstate: 200001c5
[ 3369.146860] sp : ffffffc3eb9435a0
[ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
[ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
[ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
[ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
[ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
[ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
[ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
[ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
[ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
[ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
[ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
[ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
[ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
[ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
[ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
...
[ 3369.972257] [&lt;ffffff80081109a8&gt;] sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.978685] [&lt;ffffff80081128b4&gt;] synchronize_rcu_expedited+0x64/0xa8
[ 3369.985026] [&lt;ffffff80086b987c&gt;] synchronize_net+0x24/0x30
[ 3369.990499] [&lt;ffffff80086ddb54&gt;] dev_deactivate_many+0x28c/0x298
[ 3369.996493] [&lt;ffffff80086b6bb8&gt;] __dev_close_many+0x60/0xd0
[ 3370.002052] [&lt;ffffff80086b6d48&gt;] __dev_close+0x28/0x40
[ 3370.007178] [&lt;ffffff80086bf62c&gt;] __dev_change_flags+0x8c/0x158
[ 3370.012999] [&lt;ffffff80086bf718&gt;] dev_change_flags+0x20/0x60
[ 3370.018558] [&lt;ffffff80086cf7f0&gt;] do_setlink+0x288/0x918
[ 3370.023771] [&lt;ffffff80086d0798&gt;] rtnl_newlink+0x398/0x6a8
[ 3370.029158] [&lt;ffffff80086cee84&gt;] rtnetlink_rcv_msg+0xe4/0x220
[ 3370.034891] [&lt;ffffff80086e274c&gt;] netlink_rcv_skb+0xc4/0xf8
[ 3370.040364] [&lt;ffffff80086ced8c&gt;] rtnetlink_rcv+0x2c/0x40
[ 3370.045663] [&lt;ffffff80086e1fe8&gt;] netlink_unicast+0x160/0x238
[ 3370.051309] [&lt;ffffff80086e24b8&gt;] netlink_sendmsg+0x2f0/0x358
[ 3370.056956] [&lt;ffffff80086a0070&gt;] sock_sendmsg+0x18/0x30
[ 3370.062168] [&lt;ffffff80086a21cc&gt;] ___sys_sendmsg+0x26c/0x280
[ 3370.067728] [&lt;ffffff80086a30ac&gt;] __sys_sendmsg+0x44/0x88
[ 3370.073027] [&lt;ffffff80086a3100&gt;] SyS_sendmsg+0x10/0x20
[ 3370.078153] [&lt;ffffff8008085e70&gt;] el0_svc_naked+0x24/0x28

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Reported-by: Dennis Chen &lt;dennis.chen@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Steve Capper &lt;steve.capper@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In many cases in the RCU tree code, we iterate over the set of cpus for
a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
per-cpu data for each cpu in this range. However, if the set of possible
cpus is sparse, some cpus described in this range are not possible, and
thus no per-cpu region will have been allocated (or initialised) for
them by the generic percpu code.

Erroneous accesses to a per-cpu area for these !possible cpus may fault
or may hit other data depending on the addressed generated when the
erroneous per cpu offset is applied. In practice, both cases have been
observed on arm64 hardware (the former being silent, but detectable with
additional patches).

To avoid issues resulting from this, we must iterate over the set of
*possible* cpus for a given leaf node. This patch add a new helper,
for_each_leaf_node_possible_cpu, to enable this. As iteration is often
intertwined with rcu_node local bitmask manipulation, a new
leaf_node_cpu_bit helper is added to make this simpler and more
consistent. The RCU tree code is made to use both of these where
appropriate.

Without this patch, running reboot at a shell can result in an oops
like:

[ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
[ 3369.083881] pgd = ffffffc3ecdda000
[ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
[ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[ 3369.101781] Modules linked in:
[ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G        W       4.6.0+ #3
[ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
[ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
[ 3369.139479] pc : [&lt;ffffff80081109a8&gt;] lr : [&lt;ffffff8008110924&gt;] pstate: 200001c5
[ 3369.146860] sp : ffffffc3eb9435a0
[ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
[ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
[ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
[ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
[ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
[ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
[ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
[ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
[ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
[ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
[ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
[ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
[ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
[ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
[ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
...
[ 3369.972257] [&lt;ffffff80081109a8&gt;] sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.978685] [&lt;ffffff80081128b4&gt;] synchronize_rcu_expedited+0x64/0xa8
[ 3369.985026] [&lt;ffffff80086b987c&gt;] synchronize_net+0x24/0x30
[ 3369.990499] [&lt;ffffff80086ddb54&gt;] dev_deactivate_many+0x28c/0x298
[ 3369.996493] [&lt;ffffff80086b6bb8&gt;] __dev_close_many+0x60/0xd0
[ 3370.002052] [&lt;ffffff80086b6d48&gt;] __dev_close+0x28/0x40
[ 3370.007178] [&lt;ffffff80086bf62c&gt;] __dev_change_flags+0x8c/0x158
[ 3370.012999] [&lt;ffffff80086bf718&gt;] dev_change_flags+0x20/0x60
[ 3370.018558] [&lt;ffffff80086cf7f0&gt;] do_setlink+0x288/0x918
[ 3370.023771] [&lt;ffffff80086d0798&gt;] rtnl_newlink+0x398/0x6a8
[ 3370.029158] [&lt;ffffff80086cee84&gt;] rtnetlink_rcv_msg+0xe4/0x220
[ 3370.034891] [&lt;ffffff80086e274c&gt;] netlink_rcv_skb+0xc4/0xf8
[ 3370.040364] [&lt;ffffff80086ced8c&gt;] rtnetlink_rcv+0x2c/0x40
[ 3370.045663] [&lt;ffffff80086e1fe8&gt;] netlink_unicast+0x160/0x238
[ 3370.051309] [&lt;ffffff80086e24b8&gt;] netlink_sendmsg+0x2f0/0x358
[ 3370.056956] [&lt;ffffff80086a0070&gt;] sock_sendmsg+0x18/0x30
[ 3370.062168] [&lt;ffffff80086a21cc&gt;] ___sys_sendmsg+0x26c/0x280
[ 3370.067728] [&lt;ffffff80086a30ac&gt;] __sys_sendmsg+0x44/0x88
[ 3370.073027] [&lt;ffffff80086a3100&gt;] SyS_sendmsg+0x10/0x20
[ 3370.078153] [&lt;ffffff8008085e70&gt;] el0_svc_naked+0x24/0x28

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Reported-by: Dennis Chen &lt;dennis.chen@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Steve Capper &lt;steve.capper@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'doc.2016.04.19a', 'exp.2016.03.31d', 'fixes.2016.03.31d' and 'torture.2016.04.21a' into HEAD</title>
<updated>2016-04-21T20:48:20+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2016-04-21T20:48:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dcd36d01fb3f99d1d5df01714f6ccbe3fbbaf81f'/>
<id>dcd36d01fb3f99d1d5df01714f6ccbe3fbbaf81f</id>
<content type='text'>
doc.2016.04.19a: Documentation updates
exp.2016.03.31d: Expedited grace-period updates
fixes.2016.03.31d: Miscellaneous fixes
torture.2016.004.21a Torture-test updates
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
doc.2016.04.19a: Documentation updates
exp.2016.03.31d: Expedited grace-period updates
fixes.2016.03.31d: Miscellaneous fixes
torture.2016.004.21a Torture-test updates
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Awaken grace-period kthread if too long since FQS</title>
<updated>2016-03-31T20:34:50+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2016-01-04T04:29:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8c7c4829a81c1838f18c12ce5a3a5c29a08bf0a8'/>
<id>8c7c4829a81c1838f18c12ce5a3a5c29a08bf0a8</id>
<content type='text'>
Recent kernels can fail to awaken the grace-period kthread for
quiescent-state forcing.  This commit is a crude hack that does
a wakeup if a scheduling-clock interrupt sees that it has been
too long since force-quiescent-state (FQS) processing.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recent kernels can fail to awaken the grace-period kthread for
quiescent-state forcing.  This commit is a crude hack that does
a wakeup if a scheduling-clock interrupt sees that it has been
too long since force-quiescent-state (FQS) processing.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Overlap wakeups with next expedited grace period</title>
<updated>2016-03-31T20:34:11+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2016-03-16T23:47:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3b5f668e715bc19610ad967ef97a7e8c55a186ec'/>
<id>3b5f668e715bc19610ad967ef97a7e8c55a186ec</id>
<content type='text'>
The current expedited grace-period implementation makes subsequent grace
periods wait on wakeups for the prior grace period.  This does not fit
the dictionary definition of "expedited", so this commit allows these two
phases to overlap.  Doing this requires four waitqueues rather than two
because tasks can now be waiting on the previous, current, and next grace
periods.  The fourth waitqueue makes the bit masking work out nicely.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current expedited grace-period implementation makes subsequent grace
periods wait on wakeups for the prior grace period.  This does not fit
the dictionary definition of "expedited", so this commit allows these two
phases to overlap.  Doing this requires four waitqueues rather than two
because tasks can now be waiting on the previous, current, and next grace
periods.  The fourth waitqueue makes the bit masking work out nicely.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Enforce expedited-GP fairness via funnel wait queue</title>
<updated>2016-03-31T20:34:08+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2016-01-31T01:57:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f6a12f34a448cc8a624070fd365c29c890138a48'/>
<id>f6a12f34a448cc8a624070fd365c29c890138a48</id>
<content type='text'>
The current mutex-based funnel-locking approach used by expedited grace
periods is subject to severe unfairness.  The problem arises when a
few tasks, making a path from leaves to root, all wake up before other
tasks do.  A new task can then follow this path all the way to the root,
which needlessly delays tasks whose grace period is done, but who do
not happen to acquire the lock quickly enough.

This commit avoids this problem by maintaining per-rcu_node wait queues,
along with a per-rcu_node counter that tracks the latest grace period
sought by an earlier task to visit this node.  If that grace period
would satisfy the current task, instead of proceeding up the tree,
it waits on the current rcu_node structure using a pair of wait queues
provided for that purpose.  This decouples awakening of old tasks from
the arrival of new tasks.

If the wakeups prove to be a bottleneck, additional kthreads can be
brought to bear for that purpose.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current mutex-based funnel-locking approach used by expedited grace
periods is subject to severe unfairness.  The problem arises when a
few tasks, making a path from leaves to root, all wake up before other
tasks do.  A new task can then follow this path all the way to the root,
which needlessly delays tasks whose grace period is done, but who do
not happen to acquire the lock quickly enough.

This commit avoids this problem by maintaining per-rcu_node wait queues,
along with a per-rcu_node counter that tracks the latest grace period
sought by an earlier task to visit this node.  If that grace period
would satisfy the current task, instead of proceeding up the tree,
it waits on the current rcu_node structure using a pair of wait queues
provided for that purpose.  This decouples awakening of old tasks from
the arrival of new tasks.

If the wakeups prove to be a bottleneck, additional kthreads can be
brought to bear for that purpose.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Shorten expedited_workdone* to exp_workdone*</title>
<updated>2016-03-31T20:34:08+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2016-03-08T22:43:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d40a4f09a448382961fa9b1a2f7d4f34813f0273'/>
<id>d40a4f09a448382961fa9b1a2f7d4f34813f0273</id>
<content type='text'>
Just a name change to save a few lines and a bit of typing.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Just a name change to save a few lines and a bit of typing.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Remove expedited GP funnel-lock bypass</title>
<updated>2016-03-31T20:34:07+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2016-01-31T01:23:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e2fd9d35847d1936398d44c4df68dceb3d7f64e7'/>
<id>e2fd9d35847d1936398d44c4df68dceb3d7f64e7</id>
<content type='text'>
Commit #cdacbe1f91264 ("rcu: Add fastpath bypassing funnel locking")
turns out to be a pessimization at high load because it forces a tree
full of tasks to wait for an expedited grace period that they probably
do not need.  This commit therefore removes this optimization.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit #cdacbe1f91264 ("rcu: Add fastpath bypassing funnel locking")
turns out to be a pessimization at high load because it forces a tree
full of tasks to wait for an expedited grace period that they probably
do not need.  This commit therefore removes this optimization.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge commit 'fixes.2015.02.23a' into core/rcu</title>
<updated>2016-03-15T08:01:06+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2016-03-15T08:00:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8bc6782fe20bd2584c73a35c47329c9fd0a8d34c'/>
<id>8bc6782fe20bd2584c73a35c47329c9fd0a8d34c</id>
<content type='text'>
 Conflicts:
	kernel/rcu/tree.c

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 Conflicts:
	kernel/rcu/tree.c

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Use simple wait queues where possible in rcutree</title>
<updated>2016-02-25T10:27:16+00:00</updated>
<author>
<name>Paul Gortmaker</name>
<email>paul.gortmaker@windriver.com</email>
</author>
<published>2016-02-19T08:46:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=abedf8e2419fb873d919dd74de2e84b510259339'/>
<id>abedf8e2419fb873d919dd74de2e84b510259339</id>
<content type='text'>
As of commit dae6e64d2bcfd ("rcu: Introduce proper blocking to no-CBs kthreads
GP waits") the RCU subsystem started making use of wait queues.

Here we convert all additions of RCU wait queues to use simple wait queues,
since they don't need the extra overhead of the full wait queue features.

Originally this was done for RT kernels[1], since we would get things like...

  BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
  in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
  Pid: 8, comm: rcu_preempt Not tainted
  Call Trace:
   [&lt;ffffffff8106c8d0&gt;] __might_sleep+0xd0/0xf0
   [&lt;ffffffff817d77b4&gt;] rt_spin_lock+0x24/0x50
   [&lt;ffffffff8106fcf6&gt;] __wake_up+0x36/0x70
   [&lt;ffffffff810c4542&gt;] rcu_gp_kthread+0x4d2/0x680
   [&lt;ffffffff8105f910&gt;] ? __init_waitqueue_head+0x50/0x50
   [&lt;ffffffff810c4070&gt;] ? rcu_gp_fqs+0x80/0x80
   [&lt;ffffffff8105eabb&gt;] kthread+0xdb/0xe0
   [&lt;ffffffff8106b912&gt;] ? finish_task_switch+0x52/0x100
   [&lt;ffffffff817e0754&gt;] kernel_thread_helper+0x4/0x10
   [&lt;ffffffff8105e9e0&gt;] ? __init_kthread_worker+0x60/0x60
   [&lt;ffffffff817e0750&gt;] ? gs_change+0xb/0xb

...and hence simple wait queues were deployed on RT out of necessity
(as simple wait uses a raw lock), but mainline might as well take
advantage of the more streamline support as well.

[1] This is a carry forward of work from v3.10-rt; the original conversion
was by Thomas on an earlier -rt version, and Sebastian extended it to
additional post-3.10 added RCU waiters; here I've added a commit log and
unified the RCU changes into one, and uprev'd it to match mainline RCU.

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As of commit dae6e64d2bcfd ("rcu: Introduce proper blocking to no-CBs kthreads
GP waits") the RCU subsystem started making use of wait queues.

Here we convert all additions of RCU wait queues to use simple wait queues,
since they don't need the extra overhead of the full wait queue features.

Originally this was done for RT kernels[1], since we would get things like...

  BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
  in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
  Pid: 8, comm: rcu_preempt Not tainted
  Call Trace:
   [&lt;ffffffff8106c8d0&gt;] __might_sleep+0xd0/0xf0
   [&lt;ffffffff817d77b4&gt;] rt_spin_lock+0x24/0x50
   [&lt;ffffffff8106fcf6&gt;] __wake_up+0x36/0x70
   [&lt;ffffffff810c4542&gt;] rcu_gp_kthread+0x4d2/0x680
   [&lt;ffffffff8105f910&gt;] ? __init_waitqueue_head+0x50/0x50
   [&lt;ffffffff810c4070&gt;] ? rcu_gp_fqs+0x80/0x80
   [&lt;ffffffff8105eabb&gt;] kthread+0xdb/0xe0
   [&lt;ffffffff8106b912&gt;] ? finish_task_switch+0x52/0x100
   [&lt;ffffffff817e0754&gt;] kernel_thread_helper+0x4/0x10
   [&lt;ffffffff8105e9e0&gt;] ? __init_kthread_worker+0x60/0x60
   [&lt;ffffffff817e0750&gt;] ? gs_change+0xb/0xb

...and hence simple wait queues were deployed on RT out of necessity
(as simple wait uses a raw lock), but mainline might as well take
advantage of the more streamline support as well.

[1] This is a carry forward of work from v3.10-rt; the original conversion
was by Thomas on an earlier -rt version, and Sebastian extended it to
additional post-3.10 added RCU waiters; here I've added a commit log and
unified the RCU changes into one, and uprev'd it to match mainline RCU.

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp-&gt;lock</title>
<updated>2016-02-25T10:27:16+00:00</updated>
<author>
<name>Daniel Wagner</name>
<email>daniel.wagner@bmw-carit.de</email>
</author>
<published>2016-02-19T08:46:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=065bb78c5b09df54d1c32e03227deb777ddff57b'/>
<id>065bb78c5b09df54d1c32e03227deb777ddff57b</id>
<content type='text'>
rcu_nocb_gp_cleanup() is called while holding rnp-&gt;lock. Currently,
this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will
not enable the IRQs. lockdep is happy.

By switching over using swait this is not true anymore. swake_up_all()
enables the IRQs while processing the waiters. __do_softirq() can now
run and will eventually call rcu_process_callbacks() which wants to
grap nrp-&gt;lock.

Let's move the rcu_nocb_gp_cleanup() call outside the lock before we
switch over to swait.

If we would hold the rnp-&gt;lock and use swait, lockdep reports
following:

 =================================
 [ INFO: inconsistent lock state ]
 4.2.0-rc5-00025-g9a73ba0 #136 Not tainted
 ---------------------------------
 inconsistent {IN-SOFTIRQ-W} -&gt; {SOFTIRQ-ON-W} usage.
 rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (rcu_node_1){+.?...}, at: [&lt;ffffffff811387c7&gt;] rcu_gp_kthread+0xb97/0xeb0
 {IN-SOFTIRQ-W} state was registered at:
   [&lt;ffffffff81109b9f&gt;] __lock_acquire+0xd5f/0x21e0
   [&lt;ffffffff8110be0f&gt;] lock_acquire+0xdf/0x2b0
   [&lt;ffffffff81841cc9&gt;] _raw_spin_lock_irqsave+0x59/0xa0
   [&lt;ffffffff81136991&gt;] rcu_process_callbacks+0x141/0x3c0
   [&lt;ffffffff810b1a9d&gt;] __do_softirq+0x14d/0x670
   [&lt;ffffffff810b2214&gt;] irq_exit+0x104/0x110
   [&lt;ffffffff81844e96&gt;] smp_apic_timer_interrupt+0x46/0x60
   [&lt;ffffffff81842e70&gt;] apic_timer_interrupt+0x70/0x80
   [&lt;ffffffff810dba66&gt;] rq_attach_root+0xa6/0x100
   [&lt;ffffffff810dbc2d&gt;] cpu_attach_domain+0x16d/0x650
   [&lt;ffffffff810e4b42&gt;] build_sched_domains+0x942/0xb00
   [&lt;ffffffff821777c2&gt;] sched_init_smp+0x509/0x5c1
   [&lt;ffffffff821551e3&gt;] kernel_init_freeable+0x172/0x28f
   [&lt;ffffffff8182cdce&gt;] kernel_init+0xe/0xe0
   [&lt;ffffffff8184231f&gt;] ret_from_fork+0x3f/0x70
 irq event stamp: 76
 hardirqs last  enabled at (75): [&lt;ffffffff81841330&gt;] _raw_spin_unlock_irq+0x30/0x60
 hardirqs last disabled at (76): [&lt;ffffffff8184116f&gt;] _raw_spin_lock_irq+0x1f/0x90
 softirqs last  enabled at (0): [&lt;ffffffff810a8df2&gt;] copy_process.part.26+0x602/0x1cf0
 softirqs last disabled at (0): [&lt;          (null)&gt;]           (null)
 other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0
        ----
   lock(rcu_node_1);
   &lt;Interrupt&gt;
     lock(rcu_node_1);
  *** DEADLOCK ***
 1 lock held by rcu_preempt/8:
  #0:  (rcu_node_1){+.?...}, at: [&lt;ffffffff811387c7&gt;] rcu_gp_kthread+0xb97/0xeb0
 stack backtrace:
 CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136
 Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
  0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0
  0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b
  0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f
 Call Trace:
  [&lt;ffffffff818379e0&gt;] dump_stack+0x4f/0x7b
  [&lt;ffffffff8110813b&gt;] print_usage_bug+0x1db/0x1e0
  [&lt;ffffffff8102fa4f&gt;] ? save_stack_trace+0x2f/0x50
  [&lt;ffffffff811087ad&gt;] mark_lock+0x66d/0x6e0
  [&lt;ffffffff81107790&gt;] ? check_usage_forwards+0x150/0x150
  [&lt;ffffffff81108898&gt;] mark_held_locks+0x78/0xa0
  [&lt;ffffffff81841330&gt;] ? _raw_spin_unlock_irq+0x30/0x60
  [&lt;ffffffff81108a28&gt;] trace_hardirqs_on_caller+0x168/0x220
  [&lt;ffffffff81108aed&gt;] trace_hardirqs_on+0xd/0x10
  [&lt;ffffffff81841330&gt;] _raw_spin_unlock_irq+0x30/0x60
  [&lt;ffffffff810fd1c7&gt;] swake_up_all+0xb7/0xe0
  [&lt;ffffffff811386e1&gt;] rcu_gp_kthread+0xab1/0xeb0
  [&lt;ffffffff811089bf&gt;] ? trace_hardirqs_on_caller+0xff/0x220
  [&lt;ffffffff81841341&gt;] ? _raw_spin_unlock_irq+0x41/0x60
  [&lt;ffffffff81137c30&gt;] ? rcu_barrier+0x20/0x20
  [&lt;ffffffff810d2014&gt;] kthread+0x104/0x120
  [&lt;ffffffff81841330&gt;] ? _raw_spin_unlock_irq+0x30/0x60
  [&lt;ffffffff810d1f10&gt;] ? kthread_create_on_node+0x260/0x260
  [&lt;ffffffff8184231f&gt;] ret_from_fork+0x3f/0x70
  [&lt;ffffffff810d1f10&gt;] ? kthread_create_on_node+0x260/0x260

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rcu_nocb_gp_cleanup() is called while holding rnp-&gt;lock. Currently,
this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will
not enable the IRQs. lockdep is happy.

By switching over using swait this is not true anymore. swake_up_all()
enables the IRQs while processing the waiters. __do_softirq() can now
run and will eventually call rcu_process_callbacks() which wants to
grap nrp-&gt;lock.

Let's move the rcu_nocb_gp_cleanup() call outside the lock before we
switch over to swait.

If we would hold the rnp-&gt;lock and use swait, lockdep reports
following:

 =================================
 [ INFO: inconsistent lock state ]
 4.2.0-rc5-00025-g9a73ba0 #136 Not tainted
 ---------------------------------
 inconsistent {IN-SOFTIRQ-W} -&gt; {SOFTIRQ-ON-W} usage.
 rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (rcu_node_1){+.?...}, at: [&lt;ffffffff811387c7&gt;] rcu_gp_kthread+0xb97/0xeb0
 {IN-SOFTIRQ-W} state was registered at:
   [&lt;ffffffff81109b9f&gt;] __lock_acquire+0xd5f/0x21e0
   [&lt;ffffffff8110be0f&gt;] lock_acquire+0xdf/0x2b0
   [&lt;ffffffff81841cc9&gt;] _raw_spin_lock_irqsave+0x59/0xa0
   [&lt;ffffffff81136991&gt;] rcu_process_callbacks+0x141/0x3c0
   [&lt;ffffffff810b1a9d&gt;] __do_softirq+0x14d/0x670
   [&lt;ffffffff810b2214&gt;] irq_exit+0x104/0x110
   [&lt;ffffffff81844e96&gt;] smp_apic_timer_interrupt+0x46/0x60
   [&lt;ffffffff81842e70&gt;] apic_timer_interrupt+0x70/0x80
   [&lt;ffffffff810dba66&gt;] rq_attach_root+0xa6/0x100
   [&lt;ffffffff810dbc2d&gt;] cpu_attach_domain+0x16d/0x650
   [&lt;ffffffff810e4b42&gt;] build_sched_domains+0x942/0xb00
   [&lt;ffffffff821777c2&gt;] sched_init_smp+0x509/0x5c1
   [&lt;ffffffff821551e3&gt;] kernel_init_freeable+0x172/0x28f
   [&lt;ffffffff8182cdce&gt;] kernel_init+0xe/0xe0
   [&lt;ffffffff8184231f&gt;] ret_from_fork+0x3f/0x70
 irq event stamp: 76
 hardirqs last  enabled at (75): [&lt;ffffffff81841330&gt;] _raw_spin_unlock_irq+0x30/0x60
 hardirqs last disabled at (76): [&lt;ffffffff8184116f&gt;] _raw_spin_lock_irq+0x1f/0x90
 softirqs last  enabled at (0): [&lt;ffffffff810a8df2&gt;] copy_process.part.26+0x602/0x1cf0
 softirqs last disabled at (0): [&lt;          (null)&gt;]           (null)
 other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0
        ----
   lock(rcu_node_1);
   &lt;Interrupt&gt;
     lock(rcu_node_1);
  *** DEADLOCK ***
 1 lock held by rcu_preempt/8:
  #0:  (rcu_node_1){+.?...}, at: [&lt;ffffffff811387c7&gt;] rcu_gp_kthread+0xb97/0xeb0
 stack backtrace:
 CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136
 Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
  0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0
  0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b
  0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f
 Call Trace:
  [&lt;ffffffff818379e0&gt;] dump_stack+0x4f/0x7b
  [&lt;ffffffff8110813b&gt;] print_usage_bug+0x1db/0x1e0
  [&lt;ffffffff8102fa4f&gt;] ? save_stack_trace+0x2f/0x50
  [&lt;ffffffff811087ad&gt;] mark_lock+0x66d/0x6e0
  [&lt;ffffffff81107790&gt;] ? check_usage_forwards+0x150/0x150
  [&lt;ffffffff81108898&gt;] mark_held_locks+0x78/0xa0
  [&lt;ffffffff81841330&gt;] ? _raw_spin_unlock_irq+0x30/0x60
  [&lt;ffffffff81108a28&gt;] trace_hardirqs_on_caller+0x168/0x220
  [&lt;ffffffff81108aed&gt;] trace_hardirqs_on+0xd/0x10
  [&lt;ffffffff81841330&gt;] _raw_spin_unlock_irq+0x30/0x60
  [&lt;ffffffff810fd1c7&gt;] swake_up_all+0xb7/0xe0
  [&lt;ffffffff811386e1&gt;] rcu_gp_kthread+0xab1/0xeb0
  [&lt;ffffffff811089bf&gt;] ? trace_hardirqs_on_caller+0xff/0x220
  [&lt;ffffffff81841341&gt;] ? _raw_spin_unlock_irq+0x41/0x60
  [&lt;ffffffff81137c30&gt;] ? rcu_barrier+0x20/0x20
  [&lt;ffffffff810d2014&gt;] kthread+0x104/0x120
  [&lt;ffffffff81841330&gt;] ? _raw_spin_unlock_irq+0x30/0x60
  [&lt;ffffffff810d1f10&gt;] ? kthread_create_on_node+0x260/0x260
  [&lt;ffffffff8184231f&gt;] ret_from_fork+0x3f/0x70
  [&lt;ffffffff810d1f10&gt;] ? kthread_create_on_node+0x260/0x260

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
