<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/rcu/tree.h, branch v5.19.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>Merge branch 'exp.2022.05.11a' into HEAD</title>
<updated>2022-05-11T18:49:35+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2022-05-11T18:49:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ce13389053a347aa9f8ffbfda2238352536e15c9'/>
<id>ce13389053a347aa9f8ffbfda2238352536e15c9</id>
<content type='text'>
exp.2022.05.11a: Expedited-grace-period latency-reduction updates.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
exp.2022.05.11a: Expedited-grace-period latency-reduction updates.
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Move expedited grace period (GP) work to RT kthread_worker</title>
<updated>2022-05-11T18:47:10+00:00</updated>
<author>
<name>Kalesh Singh</name>
<email>kaleshsingh@google.com</email>
</author>
<published>2022-04-09T00:35:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9621fbee44df940e2e1b94b0676460a538dffefa'/>
<id>9621fbee44df940e2e1b94b0676460a538dffefa</id>
<content type='text'>
Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
latency because its workqueues run at SCHED_OTHER, and thus can be
delayed by normal processes.  This commit avoids these delays by moving
the expedited GP work items to a real-time-priority kthread_worker.

This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
default on PREEMPT_RT=y kernels which disable expedited grace periods
after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.

The results were evaluated on arm64 Android devices (6GB ram) running
5.10 kernel, and capturing trace data in critical user-level code.

The table below shows the resulting order-of-magnitude improvements
in synchronize_rcu_expedited() latency:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Count                    |          725  |            688  |         |
------------------------------------------------------------------------
| Min Duration       (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Q1                 (ns)  |       39,428  |         38,971  |  -1.16% |
------------------------------------------------------------------------
| Q2 - Median        (ns)  |       98,225  |         69,743  | -29.00% |
------------------------------------------------------------------------
| Q3                 (ns)  |      342,122  |        126,638  | -62.98% |
------------------------------------------------------------------------
| Max Duration       (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Avg Duration       (ns)  |    2,746,353  |        151,242  | -94.49% |
------------------------------------------------------------------------
| Standard Deviation (ns)  |   19,327,765  |        294,408  |         |
------------------------------------------------------------------------

The below table show the range of maximums/minimums for
synchronize_rcu_expedited() latency from all experiments:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Total No. of Experiments |           25  |             23  |         |
------------------------------------------------------------------------
| Largest  Maximum   (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Smallest Maximum   (ns)  |       38,819  |         86,954  | 124.00% |
------------------------------------------------------------------------
| Range of Maximums  (ns)  |  372,728,148  |      2,242,717  |         |
------------------------------------------------------------------------
| Largest  Minimum   (ns)  |       88,623  |         27,588  | -68.87% |
------------------------------------------------------------------------
| Smallest Minimum   (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Range of Minimums  (ns)  |       88,297  |         27,141  |         |
------------------------------------------------------------------------

Cc: "Paul E. McKenney" &lt;paulmck@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Tim Murray &lt;timmurray@google.com&gt;
Reported-by: Wei Wang &lt;wvw@google.com&gt;
Tested-by: Kyle Lin &lt;kylelin@google.com&gt;
Tested-by: Chunwei Lu &lt;chunweilu@google.com&gt;
Tested-by: Lulu Wang &lt;luluw@google.com&gt;
Signed-off-by: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
latency because its workqueues run at SCHED_OTHER, and thus can be
delayed by normal processes.  This commit avoids these delays by moving
the expedited GP work items to a real-time-priority kthread_worker.

This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
default on PREEMPT_RT=y kernels which disable expedited grace periods
after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.

The results were evaluated on arm64 Android devices (6GB ram) running
5.10 kernel, and capturing trace data in critical user-level code.

The table below shows the resulting order-of-magnitude improvements
in synchronize_rcu_expedited() latency:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Count                    |          725  |            688  |         |
------------------------------------------------------------------------
| Min Duration       (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Q1                 (ns)  |       39,428  |         38,971  |  -1.16% |
------------------------------------------------------------------------
| Q2 - Median        (ns)  |       98,225  |         69,743  | -29.00% |
------------------------------------------------------------------------
| Q3                 (ns)  |      342,122  |        126,638  | -62.98% |
------------------------------------------------------------------------
| Max Duration       (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Avg Duration       (ns)  |    2,746,353  |        151,242  | -94.49% |
------------------------------------------------------------------------
| Standard Deviation (ns)  |   19,327,765  |        294,408  |         |
------------------------------------------------------------------------

The below table show the range of maximums/minimums for
synchronize_rcu_expedited() latency from all experiments:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Total No. of Experiments |           25  |             23  |         |
------------------------------------------------------------------------
| Largest  Maximum   (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Smallest Maximum   (ns)  |       38,819  |         86,954  | 124.00% |
------------------------------------------------------------------------
| Range of Maximums  (ns)  |  372,728,148  |      2,242,717  |         |
------------------------------------------------------------------------
| Largest  Minimum   (ns)  |       88,623  |         27,588  | -68.87% |
------------------------------------------------------------------------
| Smallest Minimum   (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Range of Minimums  (ns)  |       88,297  |         27,141  |         |
------------------------------------------------------------------------

Cc: "Paul E. McKenney" &lt;paulmck@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Tim Murray &lt;timmurray@google.com&gt;
Reported-by: Wei Wang &lt;wvw@google.com&gt;
Tested-by: Kyle Lin &lt;kylelin@google.com&gt;
Tested-by: Chunwei Lu &lt;chunweilu@google.com&gt;
Tested-by: Lulu Wang &lt;luluw@google.com&gt;
Signed-off-by: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b', 'rcu-tasks.2022.04.11b', 'srcu.2022.05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD</title>
<updated>2022-05-03T17:21:40+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2022-05-03T17:21:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=be05ee54378d451e1d60196921566d6087f3079f'/>
<id>be05ee54378d451e1d60196921566d6087f3079f</id>
<content type='text'>
docs.2022.04.20a: Documentation updates.
fixes.2022.04.20a: Miscellaneous fixes.
nocb.2022.04.11b: Callback-offloading updates.
rcu-tasks.2022.04.11b: RCU-tasks updates.
srcu.2022.05.03a: Put SRCU on a memory diet.
torture.2022.04.11b: Torture-test updates.
torture-tasks.2022.04.20a: Avoid torture testing changing RCU configuration.
torturescript.2022.04.20a: Torture-test scripting updates.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
docs.2022.04.20a: Documentation updates.
fixes.2022.04.20a: Miscellaneous fixes.
nocb.2022.04.11b: Callback-offloading updates.
rcu-tasks.2022.04.11b: RCU-tasks updates.
srcu.2022.05.03a: Put SRCU on a memory diet.
torture.2022.04.11b: Torture-test updates.
torture-tasks.2022.04.20a: Avoid torture testing changing RCU configuration.
torturescript.2022.04.20a: Torture-test scripting updates.
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Check for jiffies going backwards</title>
<updated>2022-04-12T00:28:48+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2022-02-24T01:29:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c708b08c65a0dfae127b9ee33b0fb73535a5e066'/>
<id>c708b08c65a0dfae127b9ee33b0fb73535a5e066</id>
<content type='text'>
A report of a 12-jiffy normal RCU CPU stall warning raises interesting
questions about the nature of time on the offending system.  This commit
instruments rcu_sched_clock_irq(), which is RCU's hook into the
scheduling-clock interrupt, checking for the jiffies counter going
backwards.

Reported-by: Saravanan D &lt;sarvanand@fb.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A report of a 12-jiffy normal RCU CPU stall warning raises interesting
questions about the nature of time on the offending system.  This commit
instruments rcu_sched_clock_irq(), which is RCU's hook into the
scheduling-clock interrupt, checking for the jiffies counter going
backwards.

Reported-by: Saravanan D &lt;sarvanand@fb.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu/nocb: Initialize nocb kthreads only for boot CPU prior SMP initialization</title>
<updated>2022-04-12T00:05:58+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2022-02-16T15:42:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=87c5adf06bfbf14c9d13e59d5d174ff5f2aafc0e'/>
<id>87c5adf06bfbf14c9d13e59d5d174ff5f2aafc0e</id>
<content type='text'>
The rcu_spawn_gp_kthread() function is called as an early initcall, which
means that SMP initialization hasn't happened yet and only the boot CPU is
online. Therefore, create only the NOCB kthreads related to the boot CPU.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Cc: Uladzislau Rezki &lt;uladzislau.rezki@sony.com&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The rcu_spawn_gp_kthread() function is called as an early initcall, which
means that SMP initialization hasn't happened yet and only the boot CPU is
online. Therefore, create only the NOCB kthreads related to the boot CPU.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Cc: Uladzislau Rezki &lt;uladzislau.rezki@sony.com&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Initialize boost kthread only for boot node prior SMP initialization</title>
<updated>2022-04-12T00:05:58+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2022-02-16T15:42:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3352911fa9b47a90165e5c6fed440048c55146d1'/>
<id>3352911fa9b47a90165e5c6fed440048c55146d1</id>
<content type='text'>
The rcu_spawn_gp_kthread() function is called as an early initcall,
which means that SMP initialization hasn't happened yet and only the
boot CPU is online.  Therefore, create only the boost kthread for the
leaf node of the boot CPU.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Cc: Uladzislau Rezki &lt;uladzislau.rezki@sony.com&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The rcu_spawn_gp_kthread() function is called as an early initcall,
which means that SMP initialization hasn't happened yet and only the
boot CPU is online.  Therefore, create only the boost kthread for the
leaf node of the boot CPU.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Cc: Uladzislau Rezki &lt;uladzislau.rezki@sony.com&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu/nocb: Move rcu_nocb_is_setup to rcu_state</title>
<updated>2022-04-12T00:05:57+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2022-02-14T13:23:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8d2aaa9b7c290e766a41f29c71ec72192851d538'/>
<id>8d2aaa9b7c290e766a41f29c71ec72192851d538</id>
<content type='text'>
This commit moves the RCU nocb initialization witness within rcu_state
to consolidate RCU's global state.

Reported-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Cc: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Cc: Uladzislau Rezki &lt;uladzislau.rezki@sony.com&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit moves the RCU nocb initialization witness within rcu_state
to consolidate RCU's global state.

Reported-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Cc: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Cc: Uladzislau Rezki &lt;uladzislau.rezki@sony.com&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'exp.2022.02.24a', 'fixes.2022.02.14a', 'rcu_barrier.2022.02.08a', 'rcu-tasks.2022.02.08a', 'rt.2022.02.01b', 'torture.2022.02.01b' and 'torturescript.2022.02.08a' into HEAD</title>
<updated>2022-02-24T17:38:46+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2022-02-24T17:38:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d5578190bed3d110203e3b6b29c5a7a39d51c6c0'/>
<id>d5578190bed3d110203e3b6b29c5a7a39d51c6c0</id>
<content type='text'>
exp.2022.02.24a: Expedited grace-period updates.
fixes.2022.02.14a: Miscellaneous fixes.
rcu_barrier.2022.02.08a: Make rcu_barrier() no longer exclude CPU hotplug.
rcu-tasks.2022.02.08a: RCU-tasks updates.
rt.2022.02.01b: Real-time-related updates.
torture.2022.02.01b: Torture-test updates.
torturescript.2022.02.08a: Torture-test scripting updates.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
exp.2022.02.24a: Expedited grace-period updates.
fixes.2022.02.14a: Miscellaneous fixes.
rcu_barrier.2022.02.08a: Make rcu_barrier() no longer exclude CPU hotplug.
rcu-tasks.2022.02.08a: RCU-tasks updates.
rt.2022.02.01b: Real-time-related updates.
torture.2022.02.01b: Torture-test updates.
torturescript.2022.02.08a: Torture-test scripting updates.
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Add mutex for rcu boost kthread spawning and affinity setting</title>
<updated>2022-02-14T18:36:35+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2021-12-08T23:41:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=218b957a6959a2fb5b3967fc824072bb89ac2611'/>
<id>218b957a6959a2fb5b3967fc824072bb89ac2611</id>
<content type='text'>
As we handle parallel CPU bringup, we will need to take care to avoid
spawning multiple boost threads, or race conditions when setting their
affinity. Spotted by Paul McKenney.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As we handle parallel CPU bringup, we will need to take care to avoid
spawning multiple boost threads, or race conditions when setting their
affinity. Spotted by Paul McKenney.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rcu: Make rcu_barrier() no longer block CPU-hotplug operations</title>
<updated>2022-02-08T18:12:28+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2021-12-14T21:35:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=80b3fd474c91b3ecfd845b4a0bfb58706b877ba5'/>
<id>80b3fd474c91b3ecfd845b4a0bfb58706b877ba5</id>
<content type='text'>
This commit removes the cpus_read_lock() and cpus_read_unlock() calls
from rcu_barrier(), thus allowing CPUs to come and go during the course
of rcu_barrier() execution.  Posting of the -&gt;barrier_head callbacks does
synchronize with portions of RCU's CPU-hotplug notifiers, but these locks
are held for short time periods on both sides.  Thus, full CPU-hotplug
operations could both start and finish during the execution of a given
rcu_barrier() invocation.

Additional synchronization is provided by a global -&gt;barrier_lock.
Since the -&gt;barrier_lock is only used during rcu_barrier() execution and
during onlining/offlining a CPU, the contention for this lock should
be low.  It might be tempting to make use of a per-CPU lock just on
general principles, but straightforward attempts to do this have the
problems shown below.

Initial state: 3 CPUs present, CPU 0 and CPU1 do not have
any callback and CPU2 has callbacks.

1. CPU0 calls rcu_barrier().

2. CPU1 starts offlining for CPU2. CPU1 calls
   rcutree_migrate_callbacks(). rcu_barrier_entrain() is called
   from rcutree_migrate_callbacks(), with CPU2's rdp-&gt;barrier_lock.
   It does not entrain -&gt;barrier_head for CPU2, as rcu_barrier()
   on CPU0 hasn't started the barrier sequence (by calling
   rcu_seq_start(&amp;rcu_state.barrier_sequence)) yet.

3. CPU0 starts new barrier sequence. It iterates over
   CPU0 and CPU1, after acquiring their per-cpu -&gt;barrier_lock
   and finds 0 segcblist length. It updates -&gt;barrier_seq_snap
   for CPU0 and CPU1 and continues loop iteration to CPU2.

    for_each_possible_cpu(cpu) {
        raw_spin_lock_irqsave(&amp;rdp-&gt;barrier_lock, flags);
        if (!rcu_segcblist_n_cbs(&amp;rdp-&gt;cblist)) {
            WRITE_ONCE(rdp-&gt;barrier_seq_snap, gseq);
            raw_spin_unlock_irqrestore(&amp;rdp-&gt;barrier_lock, flags);
            rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence);
            continue;
        }

4. rcutree_migrate_callbacks() completes execution on CPU1.
   Segcblist len for CPU2 becomes 0.

5. The loop iteration on CPU0, checks rcu_segcblist_n_cbs(&amp;rdp-&gt;cblist)
   for CPU2 and completes the loop iteration after setting
   -&gt;barrier_seq_snap.

6. As there isn't any -&gt;barrier_head callback entrained; at
   this point, rcu_barrier() in CPU0 returns.

7. The callbacks, which migrated from CPU2 to CPU1, execute.

Straightforward per-CPU locking is also subject to the following race
condition noted by Boqun Feng:

1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking
   rcu_seq_start() and init_completion(), but does not yet initialize
   rcu_state.barrier_cpu_count.

2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(),
   which in turn calls rcu_barrier_entrain() holding CPU2's.
   rdp-&gt;barrier_lock.  It then entrains -&gt;barrier_head for CPU2
   and atomically increments rcu_state.barrier_cpu_count, which is
   unfortunately not yet initialized to the value 2.

3. The just-entrained RCU callback is invoked.  It atomically
   decrements rcu_state.barrier_cpu_count and sees that it is
   now zero.  This callback therefore invokes complete().

4. CPU0 continues executing rcu_barrier(), but is not blocked
   by its call to wait_for_completion().  This results in rcu_barrier()
   returning before all pre-existing callbacks have been invoked,
   which is a bug.

Therefore, synchronization is provided by rcu_state.barrier_lock,
which is also held across the initialization sequence, especially the
rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count
to the value 2.  In addition, this lock is held when entraining the
rcu_barrier() callback, when deciding whether or not a CPU has callbacks
that rcu_barrier() must wait on, when setting the -&gt;qsmaskinitnext for
incoming CPUs, and when migrating callbacks from a CPU that is going
offline.

Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Co-developed-by: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Signed-off-by: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit removes the cpus_read_lock() and cpus_read_unlock() calls
from rcu_barrier(), thus allowing CPUs to come and go during the course
of rcu_barrier() execution.  Posting of the -&gt;barrier_head callbacks does
synchronize with portions of RCU's CPU-hotplug notifiers, but these locks
are held for short time periods on both sides.  Thus, full CPU-hotplug
operations could both start and finish during the execution of a given
rcu_barrier() invocation.

Additional synchronization is provided by a global -&gt;barrier_lock.
Since the -&gt;barrier_lock is only used during rcu_barrier() execution and
during onlining/offlining a CPU, the contention for this lock should
be low.  It might be tempting to make use of a per-CPU lock just on
general principles, but straightforward attempts to do this have the
problems shown below.

Initial state: 3 CPUs present, CPU 0 and CPU1 do not have
any callback and CPU2 has callbacks.

1. CPU0 calls rcu_barrier().

2. CPU1 starts offlining for CPU2. CPU1 calls
   rcutree_migrate_callbacks(). rcu_barrier_entrain() is called
   from rcutree_migrate_callbacks(), with CPU2's rdp-&gt;barrier_lock.
   It does not entrain -&gt;barrier_head for CPU2, as rcu_barrier()
   on CPU0 hasn't started the barrier sequence (by calling
   rcu_seq_start(&amp;rcu_state.barrier_sequence)) yet.

3. CPU0 starts new barrier sequence. It iterates over
   CPU0 and CPU1, after acquiring their per-cpu -&gt;barrier_lock
   and finds 0 segcblist length. It updates -&gt;barrier_seq_snap
   for CPU0 and CPU1 and continues loop iteration to CPU2.

    for_each_possible_cpu(cpu) {
        raw_spin_lock_irqsave(&amp;rdp-&gt;barrier_lock, flags);
        if (!rcu_segcblist_n_cbs(&amp;rdp-&gt;cblist)) {
            WRITE_ONCE(rdp-&gt;barrier_seq_snap, gseq);
            raw_spin_unlock_irqrestore(&amp;rdp-&gt;barrier_lock, flags);
            rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence);
            continue;
        }

4. rcutree_migrate_callbacks() completes execution on CPU1.
   Segcblist len for CPU2 becomes 0.

5. The loop iteration on CPU0, checks rcu_segcblist_n_cbs(&amp;rdp-&gt;cblist)
   for CPU2 and completes the loop iteration after setting
   -&gt;barrier_seq_snap.

6. As there isn't any -&gt;barrier_head callback entrained; at
   this point, rcu_barrier() in CPU0 returns.

7. The callbacks, which migrated from CPU2 to CPU1, execute.

Straightforward per-CPU locking is also subject to the following race
condition noted by Boqun Feng:

1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking
   rcu_seq_start() and init_completion(), but does not yet initialize
   rcu_state.barrier_cpu_count.

2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(),
   which in turn calls rcu_barrier_entrain() holding CPU2's.
   rdp-&gt;barrier_lock.  It then entrains -&gt;barrier_head for CPU2
   and atomically increments rcu_state.barrier_cpu_count, which is
   unfortunately not yet initialized to the value 2.

3. The just-entrained RCU callback is invoked.  It atomically
   decrements rcu_state.barrier_cpu_count and sees that it is
   now zero.  This callback therefore invokes complete().

4. CPU0 continues executing rcu_barrier(), but is not blocked
   by its call to wait_for_completion().  This results in rcu_barrier()
   returning before all pre-existing callbacks have been invoked,
   which is a bug.

Therefore, synchronization is provided by rcu_state.barrier_lock,
which is also held across the initialization sequence, especially the
rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count
to the value 2.  In addition, this lock is held when entraining the
rcu_barrier() callback, when deciding whether or not a CPU has callbacks
that rcu_barrier() must wait on, when setting the -&gt;qsmaskinitnext for
incoming CPUs, and when migrating callbacks from a CPU that is going
offline.

Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Co-developed-by: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Signed-off-by: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
