linux.git/kernel/rcu/tasks.h, branch v5.17

rcu-tasks: Fix computation of CPU-to-list shift counts

2022-01-26T21:04:05+00:00

The ->percpu_enqueue_shift field is used to map from the running CPU
number to the index of the corresponding callback list.  This mapping
can change at runtime in response to varying callback load, resulting
in varying levels of contention on the callback-list locks.

Unfortunately, the initial value of this field is correct only if the
system happens to have a power-of-two number of CPUs, otherwise the
callbacks from the high-numbered CPUs can be placed into the callback list
indexed by 1 (rather than 0), and those index-1 callbacks will be ignored.
This can result in soft lockups and hangs.

This commit therefore corrects this mapping, adding one to this shift
count as needed for systems having odd numbers of CPUs.

Fixes: 7a30871b6a27 ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")
Reported-by: Andrii Nakryiko 
Cc: Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney

rcu-tasks: Use fewer callbacks queues if callback flood ends

2021-12-09T18:52:11+00:00

By default, when lock contention is encountered, the RCU Tasks flavors
of RCU switch to using per-CPU queueing.  However, if the callback
flood ends, per-CPU queueing continues to be used, which introduces
significant additional overhead, especially for callback invocation,
which fans out a series of workqueue handlers.

This commit therefore switches back to single-queue operation if at the
beginning of a grace period there are very few callbacks.  The definition
of "very few" is set by the rcupdate.rcu_task_collapse_lim module
parameter, which defaults to 10.  This switch happens in two phases,
with the first phase causing future callbacks to be enqueued on CPU 0's
queue, but with all queues continuing to be checked for grace periods
and callback invocation.  The second phase checks to see if an RCU grace
period has elapsed and if all remaining RCU-Tasks callbacks are queued
on CPU 0.  If so, only CPU 0 is checked for future grace periods and
callback operation.

Of course, the return of contention anywhere during this process will
result in returning to per-CPU callback queueing.

Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney

rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing

2021-12-09T18:52:11+00:00

Decreasing the number of callback queues is a bit tricky because it
is necessary to handle callbacks that were queued before the number of
queues decreased, but which were not ready to invoke until afterwards.
This commit takes a first step in this direction by maintaining a separate
->percpu_dequeue_lim to control callback dequeueing, in addition to the
existing ->percpu_enqueue_lim which now controls only enqueueing.

Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney

rcu-tasks: Use more callback queues if contention encountered

2021-12-09T18:52:11+00:00

The rcupdate.rcu_task_enqueue_lim module parameter allows system
administrators to tune the number of callback queues used by the RCU
Tasks flavors.  However if callback storms are infrequent, it would
be better to operate with a single queue on a given system unless and
until that system actually needed more queues.  Systems not needing
more queues can then avoid the overhead of checking the extra queues
and especially avoid the overhead of fanning workqueue handlers out to
all CPUs to invoke callbacks.

This commit therefore switches to using all the CPUs' callback queues if
call_rcu_tasks_generic() encounters too much lock contention.  The amount
of lock contention to tolerate defaults to 100 contended lock acquisitions
per jiffy, and can be adjusted using the new rcupdate.rcu_task_contend_lim
module parameter.

Such switching is undertaken only if the rcupdate.rcu_task_enqueue_lim
module parameter is negative, which is its default value (-1).
This allows savvy systems administrators to set the number of queues
to some known good value and to not have to worry about the kernel doing
any second guessing.

[ paulmck: Apply feedback from Guillaume Tucker and kernelci. ]

Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney

rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic()

2021-12-09T18:52:11+00:00

If the caller of of call_rcu_tasks(), call_rcu_tasks_rude(),
or call_rcu_tasks_trace() holds a raw spinlock, and then if
call_rcu_tasks_generic() determines that the grace-period kthread must
be awakened, then the wakeup might acquire a normal spinlock while a
raw spinlock is held.  This results in lockdep splats when the
kernel is built with CONFIG_PROVE_RAW_LOCK_NESTING=y.

This commit therefore defers the wakeup using irq_work_queue().

It would be nice to directly invoke wakeup when a raw spinlock is not
held, but there is currently no way to check for this in all kernels.

Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney

rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention

2021-12-09T18:52:11+00:00

This commit converts the unconditional raw_spin_lock_rcu_node() lock
acquisition in call_rcu_tasks_generic() to a trylock followed by an
unconditional acquisition if the trylock fails.  If the trylock fails,
the failure is counted, but the count is reset to zero on each new jiffy.

This statistic will be used to determine when to move from a single
callback queue to per-CPU callback queues.

Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney

rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing

2021-12-09T18:52:11+00:00

This commit adds a rcupdate.rcu_task_enqueue_lim module parameter that
sets the initial number of callback queues to use for the RCU Tasks
family of RCU implementations.  This parameter allows testing of various
fanout values.

Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney

rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues

2021-12-09T18:52:11+00:00

Currently, rcu_barrier_tasks(), rcu_barrier_tasks_rude(),
and rcu_barrier_tasks_trace() simply invoke the corresponding
synchronize_rcu_tasks*() function.  This works because there is only
one callback queue.

However, there will soon be multiple callback queues.  This commit
therefore scans the queues currently in use, entraining a callback on
each non-empty queue.  Sequence numbers and reference counts are used
to synchronize this process in a manner similar to the approach taken
by rcu_barrier().

Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney

rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations

2021-12-09T18:52:00+00:00

If there is a flood of callbacks, it is necessary to put multiple
CPUs to work invoking those callbacks.  This commit therefore uses a
workqueue-flooding approach to parallelize RCU Tasks callback execution.

Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney

rcu-tasks: Abstract invocations of callbacks

2021-12-09T18:51:44+00:00

This commit adds a rcu_tasks_invoke_cbs() function that invokes all
ready callbacks on all of the per-CPU lists that are currently in use.

Reported-by: Martin Lau 
Cc: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney