<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/sched, branch v5.7-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>sched/vtime: Work around an uninitialized variable warning</title>
<updated>2020-04-15T09:06:50+00:00</updated>
<author>
<name>Borislav Petkov</name>
<email>bp@suse.de</email>
</author>
<published>2020-03-27T21:43:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e0d648f9d883ec1efab261af158d73aa30e9dd12'/>
<id>e0d648f9d883ec1efab261af158d73aa30e9dd12</id>
<content type='text'>
Work around this warning:

  kernel/sched/cputime.c: In function ‘kcpustat_field’:
  kernel/sched/cputime.c:1007:6: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized]

because GCC can't see that val is used only when err is 0.

Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20200327214334.GF8015@zn.tnic
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Work around this warning:

  kernel/sched/cputime.c: In function ‘kcpustat_field’:
  kernel/sched/cputime.c:1007:6: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized]

because GCC can't see that val is used only when err is 0.

Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20200327214334.GF8015@zn.tnic
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters</title>
<updated>2020-04-15T08:38:26+00:00</updated>
<author>
<name>Peter Xu</name>
<email>peterx@redhat.com</email>
</author>
<published>2020-04-03T22:35:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3662daf023500dc084fa3b96f68a6f46179ddc73'/>
<id>3662daf023500dc084fa3b96f68a6f46179ddc73</id>
<content type='text'>
The "isolcpus=" parameter allows sub-parameters before the cpulist is
specified, and if the parser detects an unknown sub-parameter the whole
parameter will be ignored.

This design is incompatible with itself when new sub-parameters are added.
An older kernel will not recognize the new sub-parameter and will
invalidate the whole parameter so the CPU isolation will not take
effect. It emits a warning:

    isolcpus: Error, unknown flag

The better and compatible way is to allow "isolcpus=" to skip unknown
sub-parameters, so that even if new sub-parameters are added an older
kernel will still be able to behave as usual even with the new
sub-parameter specified on the command line.
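
<![CDATA[
The skip-unknown behaviour described above can be sketched roughly as
follows. This is a simplified illustration in Python, not the kernel's
parser: the function name parse_isolcpus(), the KNOWN_FLAGS set and the
rule "a token starting with a letter is a flag" are assumptions made for
the example.

```python
# Hypothetical model of "isolcpus=[flag-list,]cpulist" parsing.
# KNOWN_FLAGS is an assumed set of sub-parameters this "kernel" understands.
KNOWN_FLAGS = {"nohz", "domain", "managed_irq"}


def parse_isolcpus(value):
    """Split 'flag,flag,...,cpulist' into (flags, skipped, cpulist)."""
    flags, skipped = [], []
    parts = value.split(",")
    i = 0
    # Leading tokens that start with a letter are treated as flags;
    # the first token that does not start with a letter begins the cpulist.
    while i < len(parts) and parts[i][:1].isalpha():
        if parts[i] in KNOWN_FLAGS:
            flags.append(parts[i])
        else:
            # Unknown flag: remember it and move on instead of
            # rejecting the whole parameter.
            skipped.append(parts[i])
        i += 1
    cpulist = ",".join(parts[i:])
    return flags, skipped, cpulist


if __name__ == "__main__":
    print(parse_isolcpus("nohz,future_flag,1-3,8"))
```

With this shape, a sub-parameter added later only lands in the skipped
list on an older parser, and the cpulist is still honoured.
]]>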

Ideally this should have been there when the first sub-parameter for
"isolcpus=" was introduced.

Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lkml.kernel.org/r/20200403223517.406353-1-peterx@redhat.com

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The "isolcpus=" parameter allows sub-parameters before the cpulist is
specified, and if the parser detects an unknown sub-parameter the whole
parameter will be ignored.

This design is incompatible with itself when new sub-parameters are added.
An older kernel will not recognize the new sub-parameter and will
invalidate the whole parameter so the CPU isolation will not take
effect. It emits a warning:

    isolcpus: Error, unknown flag

The better and compatible way is to allow "isolcpus=" to skip unknown
sub-parameters, so that even if new sub-parameters are added an older
kernel will still be able to behave as usual even with the new
sub-parameter specified on the command line.

Ideally this should have been there when the first sub-parameter for
"isolcpus=" was introduced.

Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lkml.kernel.org/r/20200403223517.406353-1-peterx@redhat.com

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/debug: Add task uclamp values to SCHED_DEBUG procfs</title>
<updated>2020-04-08T09:35:27+00:00</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2020-02-26T12:45:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=96e74ebf8d594496f3dda5f8e26af6b4e161e4e9'/>
<id>96e74ebf8d594496f3dda5f8e26af6b4e161e4e9</id>
<content type='text'>
Requested and effective uclamp values can be a bit tricky to decipher when
playing with cgroup hierarchies. Add them to a task's procfs when
SCHED_DEBUG is enabled.

Reviewed-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200226124543.31986-4-valentin.schneider@arm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Requested and effective uclamp values can be a bit tricky to decipher when
playing with cgroup hierarchies. Add them to a task's procfs when
SCHED_DEBUG is enabled.

Reviewed-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200226124543.31986-4-valentin.schneider@arm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/debug: Factor out printing formats into common macros</title>
<updated>2020-04-08T09:35:26+00:00</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2020-02-26T12:45:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9e3bf9469c29f7e4e49c5c0d8fecaf8ac57d1fe4'/>
<id>9e3bf9469c29f7e4e49c5c0d8fecaf8ac57d1fe4</id>
<content type='text'>
The printing macros in debug.c keep redefining the same output
format. Collect each output format in a single definition, and reuse that
definition in the other macros. While at it, add a layer of parentheses and
replace printf's with the newly introduced macros.

Reviewed-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200226124543.31986-3-valentin.schneider@arm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The printing macros in debug.c keep redefining the same output
format. Collect each output format in a single definition, and reuse that
definition in the other macros. While at it, add a layer of parentheses and
replace printf's with the newly introduced macros.

Reviewed-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200226124543.31986-3-valentin.schneider@arm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/debug: Remove redundant macro define</title>
<updated>2020-04-08T09:35:24+00:00</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2020-02-26T12:45:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c745a6212c9923eb2253f4229e5d7277ca3d9d8e'/>
<id>c745a6212c9923eb2253f4229e5d7277ca3d9d8e</id>
<content type='text'>
Most printing macros for procfs are defined globally in debug.c, and they
are re-defined (to the exact same thing) within proc_sched_show_task().

Get rid of the duplicate defines.

Reviewed-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200226124543.31986-2-valentin.schneider@arm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Most printing macros for procfs are defined globally in debug.c, and they
are re-defined (to the exact same thing) within proc_sched_show_task().

Get rid of the duplicate defines.

Reviewed-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200226124543.31986-2-valentin.schneider@arm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/core: Remove unused rq::last_load_update_tick</title>
<updated>2020-04-08T09:35:23+00:00</updated>
<author>
<name>Vincent Donnefort</name>
<email>vincent.donnefort@arm.com</email>
</author>
<published>2020-03-20T13:21:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=275b2f6723ab9173484e1055ae138d4c2dd9d7c5'/>
<id>275b2f6723ab9173484e1055ae138d4c2dd9d7c5</id>
<content type='text'>
The following commit:

  5e83eafbfd3b ("sched/fair: Remove the rq-&gt;cpu_load[] update code")

eliminated the last use case for rq-&gt;last_load_update_tick, so remove
the field as well.

Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/1584710495-308969-1-git-send-email-vincent.donnefort@arm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following commit:

  5e83eafbfd3b ("sched/fair: Remove the rq-&gt;cpu_load[] update code")

eliminated the last use case for rq-&gt;last_load_update_tick, so remove
the field as well.

Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/1584710495-308969-1-git-send-email-vincent.donnefort@arm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>workqueue: Remove the warning in wq_worker_sleeping()</title>
<updated>2020-04-08T09:35:20+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2020-03-27T23:29:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=62849a9612924a655c67cf6962920544aa5c20db'/>
<id>62849a9612924a655c67cf6962920544aa5c20db</id>
<content type='text'>
The kernel test robot triggered a warning with the following race:
   task-ctx A                            interrupt-ctx B
 worker
  -&gt; process_one_work()
    -&gt; work_item()
      -&gt; schedule();
         -&gt; sched_submit_work()
           -&gt; wq_worker_sleeping()
             -&gt; -&gt;sleeping = 1
               atomic_dec_and_test(nr_running)
         __schedule();                *interrupt*
                                       async_page_fault()
                                       -&gt; local_irq_enable();
                                       -&gt; schedule();
                                          -&gt; sched_submit_work()
                                            -&gt; wq_worker_sleeping()
                                               -&gt; if (WARN_ON(-&gt;sleeping)) return
                                          -&gt; __schedule()
                                            -&gt;  sched_update_worker()
                                              -&gt; wq_worker_running()
                                                 -&gt; atomic_inc(nr_running);
                                                 -&gt; -&gt;sleeping = 0;

      -&gt;  sched_update_worker()
        -&gt; wq_worker_running()
          if (!-&gt;sleeping) return

In this context the warning is pointless; everything is fine.
An interrupt before wq_worker_sleeping() will perform the -&gt;sleeping
assignment (0 -&gt; 1 -&gt; 0) twice.
An interrupt after wq_worker_sleeping() will trigger the warning and
nr_running will be decremented (by A) and incremented once (only by B, A
will skip it). This is the case until the -&gt;sleeping is zeroed again in
wq_worker_running().

Remove the WARN statement because this condition may happen. Document
that preemption around wq_worker_sleeping() needs to be disabled to
protect -&gt;sleeping and not just as an optimisation.

Fixes: 6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The kernel test robot triggered a warning with the following race:
   task-ctx A                            interrupt-ctx B
 worker
  -&gt; process_one_work()
    -&gt; work_item()
      -&gt; schedule();
         -&gt; sched_submit_work()
           -&gt; wq_worker_sleeping()
             -&gt; -&gt;sleeping = 1
               atomic_dec_and_test(nr_running)
         __schedule();                *interrupt*
                                       async_page_fault()
                                       -&gt; local_irq_enable();
                                       -&gt; schedule();
                                          -&gt; sched_submit_work()
                                            -&gt; wq_worker_sleeping()
                                               -&gt; if (WARN_ON(-&gt;sleeping)) return
                                          -&gt; __schedule()
                                            -&gt;  sched_update_worker()
                                              -&gt; wq_worker_running()
                                                 -&gt; atomic_inc(nr_running);
                                                 -&gt; -&gt;sleeping = 0;

      -&gt;  sched_update_worker()
        -&gt; wq_worker_running()
          if (!-&gt;sleeping) return

In this context the warning is pointless; everything is fine.
An interrupt before wq_worker_sleeping() will perform the -&gt;sleeping
assignment (0 -&gt; 1 -&gt; 0) twice.
An interrupt after wq_worker_sleeping() will trigger the warning and
nr_running will be decremented (by A) and incremented once (only by B, A
will skip it). This is the case until the -&gt;sleeping is zeroed again in
wq_worker_running().

Remove the WARN statement because this condition may happen. Document
that preemption around wq_worker_sleeping() needs to be disabled to
protect -&gt;sleeping and not just as an optimisation.

Fixes: 6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Fix negative imbalance in imbalance calculation</title>
<updated>2020-04-08T09:35:20+00:00</updated>
<author>
<name>Aubrey Li</name>
<email>aubrey.li@intel.com</email>
</author>
<published>2020-03-26T05:42:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=111688ca1c4a43a7e482f5401f82c46326b8ed49'/>
<id>111688ca1c4a43a7e482f5401f82c46326b8ed49</id>
<content type='text'>
A negative imbalance value was observed after the imbalance calculation.
This happens when the local sched group type is group_fully_busy
and the average load of the local group is greater than that of the
selected busiest group. Fix this problem by comparing the average load of
the local and busiest groups before applying the imbalance calculation formula.
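
<![CDATA[
A rough sketch of why the comparison helps. The formula below is a
simplified stand-in assumed for illustration (the commit message does not
quote the kernel's formula), and the names calc_imbalance(), domain_avg
and the guard switch are hypothetical.

```python
SCHED_CAPACITY_SCALE = 1024  # assumed fixed-point scale for capacities


def calc_imbalance(local_avg, busiest_avg, domain_avg,
                   local_cap, busiest_cap, guard=True):
    """Return the load to move from the busiest group to the local group.

    With guard=True the average loads are compared first, so a local
    group that is already more loaded than the busiest one pulls nothing.
    """
    if guard and local_avg >= busiest_avg:
        return 0
    # Assumed simplified formula: move no more than what brings either
    # group to the domain-wide average.
    return min((busiest_avg - domain_avg) * busiest_cap,
               (domain_avg - local_avg) * local_cap) // SCHED_CAPACITY_SCALE


if __name__ == "__main__":
    # Local group busier than the "busiest" group.
    print(calc_imbalance(900, 800, 850, 1024, 1024, guard=False))
    print(calc_imbalance(900, 800, 850, 1024, 1024))
```

Without the guard both terms of the min() are negative in this case, so
the result is negative; with the guard the function returns 0 early.
]]>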

Suggested-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Phil Auld &lt;pauld@redhat.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Aubrey Li &lt;aubrey.li@linux.intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/1585201349-70192-1-git-send-email-aubrey.li@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A negative imbalance value was observed after the imbalance calculation.
This happens when the local sched group type is group_fully_busy
and the average load of the local group is greater than that of the
selected busiest group. Fix this problem by comparing the average load of
the local and busiest groups before applying the imbalance calculation formula.

Suggested-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Phil Auld &lt;pauld@redhat.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Aubrey Li &lt;aubrey.li@linux.intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/1585201349-70192-1-git-send-email-aubrey.li@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Fix race between runtime distribution and assignment</title>
<updated>2020-04-08T09:35:19+00:00</updated>
<author>
<name>Huaixin Chang</name>
<email>changhuaixin@linux.alibaba.com</email>
</author>
<published>2020-03-27T03:26:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=26a8b12747c975b33b4a82d62e4a307e1c07f31b'/>
<id>26a8b12747c975b33b4a82d62e4a307e1c07f31b</id>
<content type='text'>
Currently, there is a potential race between distribute_cfs_runtime()
and assign_cfs_rq_runtime(). The race happens when distribute_cfs_runtime()
reads cfs_b-&gt;runtime, distributes runtime without holding the lock, and
then finds that there is not enough runtime left to charge against,
because assign_cfs_rq_runtime() might be called during the distribution
and use cfs_b-&gt;runtime at the same time.

Fibtest is the tool used to test this race. Assume all gcfs_rq are throttled
and the cfs period timer runs; slow threads might run and sleep, returning
unused cfs_rq runtime and keeping min_cfs_rq_runtime in their local
pool. If all this happens sufficiently quickly, cfs_b-&gt;runtime will drop
a lot. If the runtime distributed is large too, over-use of runtime happens.

Runtime over-use of about 70 percent of quota is seen when we
run fibtest on a 96-core machine. We run fibtest with 1 fast thread and
95 slow threads in the test group, configure a 10ms quota for this group and
see that the CPU usage of fibtest is 17.0%, which is far more than the
expected 10%.

On a smaller machine with 32 cores, we also run fibtest with 96
threads. CPU usage is more than 12%, which is also more than the expected
10%. This shows that on similar workloads, this race does affect CPU
bandwidth control.

Solve this by holding the lock inside distribute_cfs_runtime().
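
<![CDATA[
The idea can be modelled with a small sketch, assuming a single shared
pool protected by one lock. BandwidthPool, assign() and distribute() are
hypothetical names for this illustration and only loosely mirror
cfs_b->runtime, assign_cfs_rq_runtime() and distribute_cfs_runtime().

```python
import threading


class BandwidthPool:
    """Toy model of a shared runtime pool guarded by a lock."""

    def __init__(self, runtime):
        self.runtime = runtime
        self.lock = threading.Lock()

    def assign(self, want):
        """Hand out up to `want` units to a single requester."""
        with self.lock:
            granted = min(want, self.runtime)
            self.runtime -= granted
            return granted

    def distribute(self, deficits):
        """Refill several throttled queues.

        The pool is re-read and charged under the lock for every queue,
        so a concurrent assign() can never make the total handed out
        exceed what the pool held.
        """
        granted = []
        for deficit in deficits:
            with self.lock:
                amount = min(deficit, self.runtime)
                self.runtime -= amount
            granted.append(amount)
        return granted
```

Because both paths charge the pool under the same lock, the sum of all
grants plus the remaining pool stays equal to the initial runtime.
]]>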

Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Huaixin Chang &lt;changhuaixin@linux.alibaba.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/lkml/20200325092602.22471-1-changhuaixin@linux.alibaba.com/
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, there is a potential race between distribute_cfs_runtime()
and assign_cfs_rq_runtime(). The race happens when distribute_cfs_runtime()
reads cfs_b-&gt;runtime, distributes runtime without holding the lock, and
then finds that there is not enough runtime left to charge against,
because assign_cfs_rq_runtime() might be called during the distribution
and use cfs_b-&gt;runtime at the same time.

Fibtest is the tool used to test this race. Assume all gcfs_rq are throttled
and the cfs period timer runs; slow threads might run and sleep, returning
unused cfs_rq runtime and keeping min_cfs_rq_runtime in their local
pool. If all this happens sufficiently quickly, cfs_b-&gt;runtime will drop
a lot. If the runtime distributed is large too, over-use of runtime happens.

Runtime over-use of about 70 percent of quota is seen when we
run fibtest on a 96-core machine. We run fibtest with 1 fast thread and
95 slow threads in the test group, configure a 10ms quota for this group and
see that the CPU usage of fibtest is 17.0%, which is far more than the
expected 10%.

On a smaller machine with 32 cores, we also run fibtest with 96
threads. CPU usage is more than 12%, which is also more than the expected
10%. This shows that on similar workloads, this race does affect CPU
bandwidth control.

Solve this by holding the lock inside distribute_cfs_runtime().

Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Huaixin Chang &lt;changhuaixin@linux.alibaba.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/lkml/20200325092602.22471-1-changhuaixin@linux.alibaba.com/
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Align rq-&gt;avg_idle and rq-&gt;avg_scan_cost</title>
<updated>2020-04-08T09:35:18+00:00</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2020-03-30T09:01:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d76343c6b2b79f5e89c392bc9ce9dabc4c9e90cb'/>
<id>d76343c6b2b79f5e89c392bc9ce9dabc4c9e90cb</id>
<content type='text'>
sched/core.c uses update_avg() for rq-&gt;avg_idle and sched/fair.c uses an
open-coded version (with the exact same decay factor) for
rq-&gt;avg_scan_cost. On top of that, select_idle_cpu() expects to be able to
compare these two fields.

The only difference between the two is that rq-&gt;avg_scan_cost is computed
using a pure division rather than a shift. Turns out it actually matters,
first of all because the shifted value can be negative, and the standard
has this to say about it:

  """
  The result of E1 &gt;&gt; E2 is E1 right-shifted E2 bit positions. [...] If E1
  has a signed type and a negative value, the resulting value is
  implementation-defined.
  """

Not only this, but (arithmetic) right shifting a negative value (using 2's
complement) is *not* equivalent to dividing it by the corresponding power
of 2. Let's look at a few examples:

  -4      -&gt; 0xF..FC
  -4 &gt;&gt; 3 -&gt; 0xF..FF == -1 != -4 / 8

  -8      -&gt; 0xF..F8
  -8 &gt;&gt; 3 -&gt; 0xF..FF == -1 == -8 / 8

  -9      -&gt; 0xF..F7
  -9 &gt;&gt; 3 -&gt; 0xF..FE == -2 != -9 / 8

Make update_avg() use a division, and export it to the private scheduler
header to reuse it where relevant. Note that this still lets compilers use
a shift here, but should prevent any unwanted surprise. The disassembly of
select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2
instructions; the diff sort of looks like this:

  - sub x1, x1, x0
  + subs x1, x1, x0 // set condition codes
  + add x0, x1, #0x7
  + csel x0, x0, x1, mi // x0 = x1 &lt; 0 ? x0 : x1
    add x0, x3, x0, asr #3

which does the right thing (i.e. gives us the expected result while still
using an arithmetic shift)
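
<![CDATA[
The arithmetic above can be checked with a short sketch. It is written in
Python purely as a model: Python's >> on integers behaves like an
arithmetic shift, and C's truncating division is emulated by hand. The
helper names asr(), c_div() and div8_via_shift() are made up for this
example, and update_avg() here is only an assumed shape based on the 1/8
decay implied by the shift by 3.

```python
def asr(value, shift):
    """Arithmetic right shift (rounds toward negative infinity)."""
    return value >> shift


def c_div(numerator, denominator):
    """Integer division that truncates toward zero, like C's '/'."""
    quotient = abs(numerator) // abs(denominator)
    same_sign = (numerator < 0) == (denominator < 0)
    return quotient if same_sign else -quotient


def div8_via_shift(value):
    """Divide by 8 with a shift, adding 7 first when the value is negative.

    This mirrors the add/csel/asr sequence in the disassembly.
    """
    biased = value + 7 if value < 0 else value
    return biased >> 3


def update_avg(avg, sample):
    """Moving average with a 1/8 decay, using truncating division."""
    return avg + c_div(sample - avg, 8)


if __name__ == "__main__":
    for v in (-4, -8, -9):
        print(v, asr(v, 3), c_div(v, 8), div8_via_shift(v))
```

The plain shift and the division disagree for -4 and -9 but agree for -8,
while the biased shift matches the division for every input.
]]>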

Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200330090127.16294-1-valentin.schneider@arm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
sched/core.c uses update_avg() for rq-&gt;avg_idle and sched/fair.c uses an
open-coded version (with the exact same decay factor) for
rq-&gt;avg_scan_cost. On top of that, select_idle_cpu() expects to be able to
compare these two fields.

The only difference between the two is that rq-&gt;avg_scan_cost is computed
using a pure division rather than a shift. Turns out it actually matters,
first of all because the shifted value can be negative, and the standard
has this to say about it:

  """
  The result of E1 &gt;&gt; E2 is E1 right-shifted E2 bit positions. [...] If E1
  has a signed type and a negative value, the resulting value is
  implementation-defined.
  """

Not only this, but (arithmetic) right shifting a negative value (using 2's
complement) is *not* equivalent to dividing it by the corresponding power
of 2. Let's look at a few examples:

  -4      -&gt; 0xF..FC
  -4 &gt;&gt; 3 -&gt; 0xF..FF == -1 != -4 / 8

  -8      -&gt; 0xF..F8
  -8 &gt;&gt; 3 -&gt; 0xF..FF == -1 == -8 / 8

  -9      -&gt; 0xF..F7
  -9 &gt;&gt; 3 -&gt; 0xF..FE == -2 != -9 / 8

Make update_avg() use a division, and export it to the private scheduler
header to reuse it where relevant. Note that this still lets compilers use
a shift here, but should prevent any unwanted surprise. The disassembly of
select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2
instructions; the diff sort of looks like this:

  - sub x1, x1, x0
  + subs x1, x1, x0 // set condition codes
  + add x0, x1, #0x7
  + csel x0, x0, x1, mi // x0 = x1 &lt; 0 ? x0 : x1
    add x0, x3, x0, asr #3

which does the right thing (i.e. gives us the expected result while still
using an arithmetic shift)

Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200330090127.16294-1-valentin.schneider@arm.com
</pre>
</div>
</content>
</entry>
</feed>
