linux-stable.git/kernel/sched/cputime.c, branch v5.12.2

irqtime: Move irqtime entry accounting after irq offset incrementation

2020-12-02T19:20:05+00:00

IRQ time entry is currently accounted before HARDIRQ_OFFSET or
SOFTIRQ_OFFSET are incremented. This is convenient to decide to which
index the cputime to account is dispatched.

Unfortunately it prevents tick_irq_enter() from being called under
HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ
entry accounting due to the necessary clock catch up. As a result we
don't benefit from appropriate lockdep coverage on tick_irq_enter().

To prepare for fixing this, move the IRQ entry cputime accounting after
the preempt offset is incremented. This requires the cputime dispatch
code to handle the extra offset.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lore.kernel.org/r/20201202115732.27827-5-frederic@kernel.org

sched/vtime: Consolidate IRQ time accounting

2020-12-02T19:20:05+00:00

The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
all have their own version of irq time accounting that dispatch the
cputime to the appropriate index: hardirq, softirq, system, idle,
guest... from an all-in-one function.

Instead of having these ad-hoc versions, move the cputime destination
dispatch decision to the core code and leave only the actual per-index
cputime accounting to the architecture.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20201202115732.27827-4-frederic@kernel.org

s390/vtime: Use the generic IRQ entry accounting

2020-12-02T19:20:04+00:00

s390 has its own version of IRQ entry accounting because it doesn't
account the idle time the same way the other architectures do. Only
the actual idle sleep time is accounted as idle time, the rest of the
idle task execution is accounted as system time.

Make the generic IRQ entry accounting aware of architectures that have
their own way of accounting idle time and convert s390 to use it.

This prepares s390 to get involved in further consolidations of IRQ
time accounting.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20201202115732.27827-3-frederic@kernel.org

sched/cputime: Remove symbol exports from IRQ time accounting

2020-12-02T19:20:04+00:00

account_irq_enter_time() and account_irq_exit_time() are not called
from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ
cputime accounting functions called from there.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20201202115732.27827-2-frederic@kernel.org

sched/cputime: Improve cputime_adjust()

2020-06-15T12:10:00+00:00

People report that utime and stime from /proc//stat become very
wrong when the numbers are big enough, especially if you watch these
counters incrementally.

Specifically, the current implementation of: stime*rtime/total,
results in a saw-tooth function on top of the desired line, where the
teeth grow in size the larger the values become. IOW, it has a
relative error.

The result is that, when watching incrementally as time progresses
(for large values), we'll see periods of pure stime or utime increase,
irrespective of the actual ratio we're striving for.

Replace scale_stime() with a math64.h helper: mul_u64_u64_div_u64()
that is far more accurate. This also allows architectures to override
the implementation -- for instance they can opt for the old algorithm
if this new one turns out to be too expensive for them.

Signed-off-by: Oleg Nesterov 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20200519172506.GA317395@hirez.programming.kicks-ass.net

sched/vtime: Work around an unitialized variable warning

2020-04-15T09:06:50+00:00

Work around this warning:

  kernel/sched/cputime.c: In function ‘kcpustat_field’:
  kernel/sched/cputime.c:1007:6: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized]

because GCC can't see that val is used only when err is 0.

Acked-by: Peter Zijlstra 
Signed-off-by: Borislav Petkov 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200327214334.GF8015@zn.tnic

sched/vtime: Prevent unstable evaluation of WARN(vtime->state)

2020-03-06T11:57:16+00:00

As the vtime is sampled under loose seqcount protection by kcpustat, the
vtime fields may change as the code flows. Where logic dictates a field
has a static value, use a READ_ONCE.

Signed-off-by: Chris Wilson 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Fixes: 74722bb223d0 ("sched/vtime: Bring up complete kcpustat accessor")
Link: https://lkml.kernel.org/r/20200123180849.28486-1-frederic@kernel.org

sched/cputime: move rq parameter in irqtime_account_process_tick

2020-01-17T09:19:21+00:00

Every time we call irqtime_account_process_tick() is in a interrupt,
Every caller will get and assign a parameter rq = this_rq(), This is
unnecessary and increase the code size a little bit. Move the rq getting
action to irqtime_account_process_tick internally is better.

             base               with this patch
cputime.o    578792 bytes        577888 bytes

Signed-off-by: Alex Shi 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/1577959674-255537-1-git-send-email-alex.shi@linux.alibaba.com

sched/vtime: Bring up complete kcpustat accessor

2019-11-21T06:33:24+00:00

Many callsites want to fetch the values of system, user, user_nice, guest
or guest_nice kcpustat fields altogether or at least a pair of these.

In that case calling kcpustat_field() for each requested field brings
unecessary overhead when we could fetch all of them in a row.

So provide kcpustat_cpu_fetch() that fetches the whole kcpustat array
in a vtime safe way under the same RCU and seqcount block.

Signed-off-by: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Wanpeng Li 
Cc: Yauheni Kaliuta 
Link: https://lkml.kernel.org/r/20191121024430.19938-3-frederic@kernel.org
Signed-off-by: Ingo Molnar

sched/cputime: Support other fields on kcpustat_field()

2019-11-21T06:33:23+00:00

Provide support for user, nice, guest and guest_nice fields through
kcpustat_field().

Whether we account the delta to a nice or not nice field is decided on
top of the nice value snapshot taken at the time we call kcpustat_field().
If the nice value of the task has been changed since the last vtime
update, we may have inacurrate distribution of the nice VS unnice
cputime.

However this is considered as a minor issue compared to the proper fix
that would involve interrupting the target on nice updates, which is
undesired on nohz_full CPUs.

Signed-off-by: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Wanpeng Li 
Cc: Yauheni Kaliuta 
Link: https://lkml.kernel.org/r/20191121024430.19938-2-frederic@kernel.org
Signed-off-by: Ingo Molnar