linux-stable.git/kernel/time/timer.c, branch linux-6.10.y

Merge tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

2024-05-18T00:31:24+00:00

Pull sysctl updates from Joel Granados:

 - Remove sentinel elements from ctl_table structs in kernel/*

   Removing sentinels in ctl_table arrays reduces the build time size
   and runtime memory consumed by ~64 bytes per array. Removals for
   net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline
   through their respective subsystems making the next release the most
   likely place where the final series that removes the check for
   proc_name == NULL will land.

   This adds to removals already in arch/, drivers/ and fs/.

 - Adjust ctl_table definitions and references to allow constification
     - Remove unused ctl_table function arguments
     - Move non-const elements from ctl_table to ctl_table_header
     - Make ctl_table pointers const in ctl_table_root structure

   Making the static ctl_table structs const will increase safety by
   keeping the pointers to proc_handler functions in .rodata. Though no
   ctl_tables where made const in this PR, the ground work for making
   that possible has started with these changes sent by Thomas
   Weißschuh.

* tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: drop now unnecessary out-of-bounds check
  sysctl: move sysctl type to ctl_table_header
  sysctl: drop sysctl_is_perm_empty_ctl_table
  sysctl: treewide: constify argument ctl_table_root::permissions(table)
  sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table)
  bpf: Remove the now superfluous sentinel elements from ctl_table array
  delayacct: Remove the now superfluous sentinel elements from ctl_table array
  kprobes: Remove the now superfluous sentinel elements from ctl_table array
  printk: Remove the now superfluous sentinel elements from ctl_table array
  scheduler: Remove the now superfluous sentinel elements from ctl_table array
  seccomp: Remove the now superfluous sentinel elements from ctl_table array
  timekeeping: Remove the now superfluous sentinel elements from ctl_table array
  ftrace: Remove the now superfluous sentinel elements from ctl_table array
  umh: Remove the now superfluous sentinel elements from ctl_table array
  kernel misc: Remove the now superfluous sentinel elements from ctl_table array

Merge tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2024-05-14T00:18:51+00:00

Pull scheduler updates from Ingo Molnar:

 - Add cpufreq pressure feedback for the scheduler

 - Rework misfit load-balancing wrt affinity restrictions

 - Clean up and simplify the code around ::overutilized and
   ::overload access.

 - Simplify sched_balance_newidle()

 - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
   handling that changed the output.

 - Rework & clean up  interactions wrt arch_vtime_task_switch()

 - Reorganize, clean up and unify most of the higher level
   scheduler balancing function names around the sched_balance_*()
   prefix

 - Simplify the balancing flag code (sched_balance_running)

 - Miscellaneous cleanups & fixes

* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  sched/pelt: Remove shift of thermal clock
  sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
  thermal/cpufreq: Remove arch_update_thermal_pressure()
  sched/cpufreq: Take cpufreq feedback into account
  cpufreq: Add a cpufreq pressure feedback for the scheduler
  sched/fair: Fix update of rd->sg_overutilized
  sched/vtime: Do not include  header
  s390/irq,nmi: Include  header directly
  s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
  sched/vtime: Get rid of generic vtime_task_switch() implementation
  sched/vtime: Remove confusing arch_vtime_task_switch() declaration
  sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
  sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
  sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
  sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
  sched/fair: Rename root_domain::overload to ::overloaded
  sched/fair: Use helper functions to access root_domain::overload
  sched/fair: Check root_domain::overload value before update
  sched/fair: Combine EAS check with root_domain::overutilized access
  sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
  ...

timekeeping: Remove the now superfluous sentinel elements from ctl_table array

2024-04-24T07:43:54+00:00

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel element from time_sysctl

Signed-off-by: Joel Granados

timers: Fix text inconsistencies and spelling

2024-04-01T08:36:35+00:00

Fix some text for consistency: s/lvl/level/ in a comment and use
correct/full function names in comments.

Correct spelling errors as reported by codespell.

Signed-off-by: Randy Dunlap 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20240331172652.14086-7-rdunlap@infradead.org

Merge tag 'v6.9-rc1' into sched/core, to pick up fixes and to refresh the branch

2024-03-25T10:32:29+00:00

Signed-off-by: Ingo Molnar

timers: Fix removed self-IPI on global timer's enqueue in nohz_full

2024-03-19T09:14:55+00:00

While running in nohz_full mode, a task may enqueue a timer while the
tick is stopped. However the only places where the timer wheel,
alongside the timer migration machinery's decision, may reprogram the
next event accordingly with that new timer's expiry are the idle loop or
any IRQ tail.

However neither the idle task nor an interrupt may run on the CPU if it
resumes busy work in userspace for a long while in full dynticks mode.

To solve this, the timer enqueue path raises a self-IPI that will
re-evaluate the timer wheel on its IRQ tail. This asynchronous solution
avoids potential locking inversion.

This is supposed to happen both for local and global timers but commit:

	b2cf7507e186 ("timers: Always queue timers on the local CPU")

broke the global timers case with removing the ->is_idle field handling
for the global base. As a result, global timers enqueue may go unnoticed
in nohz_full.

Fix this with restoring the idle tracking of the global timer's base,
allowing self-IPIs again on enqueue time.

Fixes: b2cf7507e186 ("timers: Always queue timers on the local CPU")
Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20240318230729.15497-3-frederic@kernel.org

sched/balancing: Rename scheduler_tick() => sched_tick()

2024-03-12T10:59:59+00:00

- Standardize on prefixing scheduler-internal functions defined
  in  with sched_*() prefix. scheduler_tick() was
  the only function using the scheduler_ prefix. Harmonize it.

- The other reason to rename it is the NOHZ scheduler tick
  handling functions are already named sched_tick_*().
  Make the 'git grep sched_tick' more meaningful.

Signed-off-by: Ingo Molnar 
Acked-by: Valentin Schneider 
Reviewed-by: Shrikanth Hegde 
Link: https://lore.kernel.org/r/20240308111819.1101550-3-mingo@kernel.org

timers: Assert no next dyntick timer look-up while CPU is offline

2024-02-26T10:37:32+00:00

The next timer (re-)evaluation, with the purpose of entering/updating
the dyntick mode, can happen from 3 sites and none of them are relevant
while the CPU is offline:

1) The idle loop:
	a) From the quick check helping the cpuidle governor to heuristically
	   predict the best C-state.
	b) While stopping the tick.

   But if the CPU is offline, the tick has been cancelled and there is
   consequently no need to further stop the tick.

2) Remote expiry: when a CPU remotely expires global timers on behalf of
   another CPU, the latter target's next timer is re-evaluated
   afterwards. However remote expîry doesn't happen on offline CPUs.

3) IRQ exit: on nohz_full mode, the tick is (re-)evaluated on IRQ exit.
   But full dynticks is disabled on offline CPUs.

Therefore it is safe to assume that no next dyntick timer lookup can
be performed on offline CPUs.

Assert this expectation to report any surprise.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20240225225508.11587-17-frederic@kernel.org

timers: Always queue timers on the local CPU

2024-02-22T16:52:32+00:00

The timer pull model is in place so we can remove the heuristics which try
to guess the best target CPU at enqueue/modification time.

All non pinned timers are queued on the local CPU in the separate storage
and eventually pulled at expiry time to a remote CPU.

Originally-by: Richard Cochran (linutronix GmbH) 
Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Frederic Weisbecker 
Link: https://lore.kernel.org/r/20240221090548.36600-21-anna-maria@linutronix.de

timers: Implement the hierarchical pull model

2024-02-22T16:52:32+00:00

Placing timers at enqueue time on a target CPU based on dubious heuristics
does not make any sense:

 1) Most timer wheel timers are canceled or rearmed before they expire.

 2) The heuristics to predict which CPU will be busy when the timer expires
    are wrong by definition.

So placing the timers at enqueue wastes precious cycles.

The proper solution to this problem is to always queue the timers on the
local CPU and allow the non pinned timers to be pulled onto a busy CPU at
expiry time.

Therefore split the timer storage into local pinned and global timers:
Local pinned timers are always expired on the CPU on which they have been
queued. Global timers can be expired on any CPU.

As long as a CPU is busy it expires both local and global timers. When a
CPU goes idle it arms for the first expiring local timer. If the first
expiring pinned (local) timer is before the first expiring movable timer,
then no action is required because the CPU will wake up before the first
movable timer expires. If the first expiring movable timer is before the
first expiring pinned (local) timer, then this timer is queued into an idle
timerqueue and eventually expired by another active CPU.

To avoid global locking the timerqueues are implemented as a hierarchy. The
lowest level of the hierarchy holds the CPUs. The CPUs are associated to
groups of 8, which are separated per node. If more than one CPU group
exist, then a second level in the hierarchy collects the groups. Depending
on the size of the system more than 2 levels are required. Each group has a
"migrator" which checks the timerqueue during the tick for remote expirable
timers.

If the last CPU in a group goes idle it reports the first expiring event in
the group up to the next group(s) in the hierarchy. If the last CPU goes
idle it arms its timer for the first system wide expiring timer to ensure
that no timer event is missed.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Frederic Weisbecker 
Link: https://lore.kernel.org/r/20240222103710.32582-1-anna-maria@linutronix.de