<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/sched/fair.c, branch linux-4.6.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sched/fair: Fix effective_load() to consistently use smoothed load</title>
<updated>2016-08-10T10:54:48+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2016-06-24T13:53:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=81dc160128e7d8a9e3d5a85657514cd9a110007d'/>
<id>81dc160128e7d8a9e3d5a85657514cd9a110007d</id>
<content type='text'>
commit 7dd4912594daf769a46744848b05bd5bc6d62469 upstream.

Starting with the following commit:

  fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")

calc_tg_weight() doesn't compute the right value as expected by effective_load().

The difference is in the 'correction' term. In order to ensure \Sum
rw_j &gt;= rw_i we cannot use tg-&gt;load_avg directly, since that might be
lagging a correction on the current cfs_rq-&gt;avg.load_avg value.
Therefore we use tg-&gt;load_avg - cfs_rq-&gt;tg_load_avg_contrib +
cfs_rq-&gt;avg.load_avg.

Now, per the referenced commit, calc_tg_weight() doesn't use
cfs_rq-&gt;avg.load_avg, as is later used in @w, but uses
cfs_rq-&gt;load.weight instead.

So stop using calc_tg_weight() and do it explicitly.

The effects of this bug are wake_affine() making randomly
poor choices in cgroup-intense workloads.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7dd4912594daf769a46744848b05bd5bc6d62469 upstream.

Starting with the following commit:

  fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")

calc_tg_weight() doesn't compute the right value as expected by effective_load().

The difference is in the 'correction' term. In order to ensure \Sum
rw_j &gt;= rw_i we cannot use tg-&gt;load_avg directly, since that might be
lagging a correction on the current cfs_rq-&gt;avg.load_avg value.
Therefore we use tg-&gt;load_avg - cfs_rq-&gt;tg_load_avg_contrib +
cfs_rq-&gt;avg.load_avg.

Now, per the referenced commit, calc_tg_weight() doesn't use
cfs_rq-&gt;avg.load_avg, as is later used in @w, but uses
cfs_rq-&gt;load.weight instead.

So stop using calc_tg_weight() and do it explicitly.

The effects of this bug are wake_affine() making randomly
poor choices in cgroup-intense workloads.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/debug: Fix deadlock when enabling sched events</title>
<updated>2016-08-10T10:54:44+00:00</updated>
<author>
<name>Josh Poimboeuf</name>
<email>jpoimboe@redhat.com</email>
</author>
<published>2016-06-13T07:32:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f6e9bad4cb245dac00bf99b568cce1168cf5cecf'/>
<id>f6e9bad4cb245dac00bf99b568cce1168cf5cecf</id>
<content type='text'>
commit eda8dca519269c92a0771668b3d5678792de7b78 upstream.

I see a hang when enabling sched events:

  echo 1 &gt; /sys/kernel/debug/tracing/events/sched/enable

The printk buffer shows:

  BUG: spinlock recursion on CPU#1, swapper/1/0
   lock: 0xffff88007d5d8c00, .magic: dead4ead, .owner: swapper/1/0, .owner_cpu: 1
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc2+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
  ...
  Call Trace:
   &lt;IRQ&gt;  [&lt;ffffffff8143d663&gt;] dump_stack+0x85/0xc2
   [&lt;ffffffff81115948&gt;] spin_dump+0x78/0xc0
   [&lt;ffffffff81115aea&gt;] do_raw_spin_lock+0x11a/0x150
   [&lt;ffffffff81891471&gt;] _raw_spin_lock+0x61/0x80
   [&lt;ffffffff810e5466&gt;] ? try_to_wake_up+0x256/0x4e0
   [&lt;ffffffff810e5466&gt;] try_to_wake_up+0x256/0x4e0
   [&lt;ffffffff81891a0a&gt;] ? _raw_spin_unlock_irqrestore+0x4a/0x80
   [&lt;ffffffff810e5705&gt;] wake_up_process+0x15/0x20
   [&lt;ffffffff810cebb4&gt;] insert_work+0x84/0xc0
   [&lt;ffffffff810ced7f&gt;] __queue_work+0x18f/0x660
   [&lt;ffffffff810cf9a6&gt;] queue_work_on+0x46/0x90
   [&lt;ffffffffa00cd95b&gt;] drm_fb_helper_dirty.isra.11+0xcb/0xe0 [drm_kms_helper]
   [&lt;ffffffffa00cdac0&gt;] drm_fb_helper_sys_imageblit+0x30/0x40 [drm_kms_helper]
   [&lt;ffffffff814babcd&gt;] soft_cursor+0x1ad/0x230
   [&lt;ffffffff814ba379&gt;] bit_cursor+0x649/0x680
   [&lt;ffffffff814b9d30&gt;] ? update_attr.isra.2+0x90/0x90
   [&lt;ffffffff814b5e6a&gt;] fbcon_cursor+0x14a/0x1c0
   [&lt;ffffffff81555ef8&gt;] hide_cursor+0x28/0x90
   [&lt;ffffffff81558b6f&gt;] vt_console_print+0x3bf/0x3f0
   [&lt;ffffffff81122c63&gt;] call_console_drivers.constprop.24+0x183/0x200
   [&lt;ffffffff811241f4&gt;] console_unlock+0x3d4/0x610
   [&lt;ffffffff811247f5&gt;] vprintk_emit+0x3c5/0x610
   [&lt;ffffffff81124bc9&gt;] vprintk_default+0x29/0x40
   [&lt;ffffffff811e965b&gt;] printk+0x57/0x73
   [&lt;ffffffff810f7a9e&gt;] enqueue_entity+0xc2e/0xc70
   [&lt;ffffffff810f7b39&gt;] enqueue_task_fair+0x59/0xab0
   [&lt;ffffffff8106dcd9&gt;] ? kvm_sched_clock_read+0x9/0x20
   [&lt;ffffffff8103fb39&gt;] ? sched_clock+0x9/0x10
   [&lt;ffffffff810e3fcc&gt;] activate_task+0x5c/0xa0
   [&lt;ffffffff810e4514&gt;] ttwu_do_activate+0x54/0xb0
   [&lt;ffffffff810e5cea&gt;] sched_ttwu_pending+0x7a/0xb0
   [&lt;ffffffff810e5e51&gt;] scheduler_ipi+0x61/0x170
   [&lt;ffffffff81059e7f&gt;] smp_trace_reschedule_interrupt+0x4f/0x2a0
   [&lt;ffffffff81893ba6&gt;] trace_reschedule_interrupt+0x96/0xa0
   &lt;EOI&gt;  [&lt;ffffffff8106e0d6&gt;] ? native_safe_halt+0x6/0x10
   [&lt;ffffffff8110fb1d&gt;] ? trace_hardirqs_on+0xd/0x10
   [&lt;ffffffff81040ac0&gt;] default_idle+0x20/0x1a0
   [&lt;ffffffff8104147f&gt;] arch_cpu_idle+0xf/0x20
   [&lt;ffffffff81102f8f&gt;] default_idle_call+0x2f/0x50
   [&lt;ffffffff8110332e&gt;] cpu_startup_entry+0x37e/0x450
   [&lt;ffffffff8105af70&gt;] start_secondary+0x160/0x1a0

Note the hang only occurs when echoing the above from a physical serial
console, not from an ssh session.

The bug is caused by a deadlock where the task is trying to grab the rq
lock twice because printk()'s aren't safe in sched code.

Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/20160613073209.gdvdybiruljbkn3p@treble
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit eda8dca519269c92a0771668b3d5678792de7b78 upstream.

I see a hang when enabling sched events:

  echo 1 &gt; /sys/kernel/debug/tracing/events/sched/enable

The printk buffer shows:

  BUG: spinlock recursion on CPU#1, swapper/1/0
   lock: 0xffff88007d5d8c00, .magic: dead4ead, .owner: swapper/1/0, .owner_cpu: 1
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc2+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
  ...
  Call Trace:
   &lt;IRQ&gt;  [&lt;ffffffff8143d663&gt;] dump_stack+0x85/0xc2
   [&lt;ffffffff81115948&gt;] spin_dump+0x78/0xc0
   [&lt;ffffffff81115aea&gt;] do_raw_spin_lock+0x11a/0x150
   [&lt;ffffffff81891471&gt;] _raw_spin_lock+0x61/0x80
   [&lt;ffffffff810e5466&gt;] ? try_to_wake_up+0x256/0x4e0
   [&lt;ffffffff810e5466&gt;] try_to_wake_up+0x256/0x4e0
   [&lt;ffffffff81891a0a&gt;] ? _raw_spin_unlock_irqrestore+0x4a/0x80
   [&lt;ffffffff810e5705&gt;] wake_up_process+0x15/0x20
   [&lt;ffffffff810cebb4&gt;] insert_work+0x84/0xc0
   [&lt;ffffffff810ced7f&gt;] __queue_work+0x18f/0x660
   [&lt;ffffffff810cf9a6&gt;] queue_work_on+0x46/0x90
   [&lt;ffffffffa00cd95b&gt;] drm_fb_helper_dirty.isra.11+0xcb/0xe0 [drm_kms_helper]
   [&lt;ffffffffa00cdac0&gt;] drm_fb_helper_sys_imageblit+0x30/0x40 [drm_kms_helper]
   [&lt;ffffffff814babcd&gt;] soft_cursor+0x1ad/0x230
   [&lt;ffffffff814ba379&gt;] bit_cursor+0x649/0x680
   [&lt;ffffffff814b9d30&gt;] ? update_attr.isra.2+0x90/0x90
   [&lt;ffffffff814b5e6a&gt;] fbcon_cursor+0x14a/0x1c0
   [&lt;ffffffff81555ef8&gt;] hide_cursor+0x28/0x90
   [&lt;ffffffff81558b6f&gt;] vt_console_print+0x3bf/0x3f0
   [&lt;ffffffff81122c63&gt;] call_console_drivers.constprop.24+0x183/0x200
   [&lt;ffffffff811241f4&gt;] console_unlock+0x3d4/0x610
   [&lt;ffffffff811247f5&gt;] vprintk_emit+0x3c5/0x610
   [&lt;ffffffff81124bc9&gt;] vprintk_default+0x29/0x40
   [&lt;ffffffff811e965b&gt;] printk+0x57/0x73
   [&lt;ffffffff810f7a9e&gt;] enqueue_entity+0xc2e/0xc70
   [&lt;ffffffff810f7b39&gt;] enqueue_task_fair+0x59/0xab0
   [&lt;ffffffff8106dcd9&gt;] ? kvm_sched_clock_read+0x9/0x20
   [&lt;ffffffff8103fb39&gt;] ? sched_clock+0x9/0x10
   [&lt;ffffffff810e3fcc&gt;] activate_task+0x5c/0xa0
   [&lt;ffffffff810e4514&gt;] ttwu_do_activate+0x54/0xb0
   [&lt;ffffffff810e5cea&gt;] sched_ttwu_pending+0x7a/0xb0
   [&lt;ffffffff810e5e51&gt;] scheduler_ipi+0x61/0x170
   [&lt;ffffffff81059e7f&gt;] smp_trace_reschedule_interrupt+0x4f/0x2a0
   [&lt;ffffffff81893ba6&gt;] trace_reschedule_interrupt+0x96/0xa0
   &lt;EOI&gt;  [&lt;ffffffff8106e0d6&gt;] ? native_safe_halt+0x6/0x10
   [&lt;ffffffff8110fb1d&gt;] ? trace_hardirqs_on+0xd/0x10
   [&lt;ffffffff81040ac0&gt;] default_idle+0x20/0x1a0
   [&lt;ffffffff8104147f&gt;] arch_cpu_idle+0xf/0x20
   [&lt;ffffffff81102f8f&gt;] default_idle_call+0x2f/0x50
   [&lt;ffffffff8110332e&gt;] cpu_startup_entry+0x37e/0x450
   [&lt;ffffffff8105af70&gt;] start_secondary+0x160/0x1a0

Note the hang only occurs when echoing the above from a physical serial
console, not from an ssh session.

The bug is caused by a deadlock where the task is trying to grab the rq
lock twice because printk()'s aren't safe in sched code.

Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/20160613073209.gdvdybiruljbkn3p@treble
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Fix cfs_rq avg tracking underflow</title>
<updated>2016-07-27T15:42:13+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2016-06-16T08:50:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=59eca1a4c30d3bc3e3c002dd4cfbc12bc43c1874'/>
<id>59eca1a4c30d3bc3e3c002dd4cfbc12bc43c1874</id>
<content type='text'>
commit 8974189222159154c55f24ddad33e3613960521a upstream.

As per commit:

  b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")

&gt; the code generated from update_cfs_rq_load_avg():
&gt;
&gt; 	if (atomic_long_read(&amp;cfs_rq-&gt;removed_load_avg)) {
&gt; 		s64 r = atomic_long_xchg(&amp;cfs_rq-&gt;removed_load_avg, 0);
&gt; 		sa-&gt;load_avg = max_t(long, sa-&gt;load_avg - r, 0);
&gt; 		sa-&gt;load_sum = max_t(s64, sa-&gt;load_sum - r * LOAD_AVG_MAX, 0);
&gt; 		removed_load = 1;
&gt; 	}
&gt;
&gt; turns into:
&gt;
&gt; ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
&gt; ffffffff8108706b:       48 85 c0                test   %rax,%rax
&gt; ffffffff8108706e:       74 40                   je     ffffffff810870b0 &lt;update_blocked_averages+0xc0&gt;
&gt; ffffffff81087070:       4c 89 f8                mov    %r15,%rax
&gt; ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
&gt; ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
&gt; ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
&gt; ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
&gt; ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
&gt; ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
&gt;
&gt; Which you'll note ends up with sa-&gt;load_avg -= r in memory at
&gt; ffffffff8108707a.

So I _should_ have looked at other unserialized users of -&gt;load_avg,
but alas. Luckily nikbor reported a similar /0 from task_h_load() which
instantly triggered recollection of this here problem.

Aside from the intermediate value hitting memory and causing problems,
there's another problem: the underflow detection relies on the signed
bit. This reduces the effective width of the variables, IOW its
effectively the same as having these variables be of signed type.

This patch changes to a different means of unsigned underflow
detection to not rely on the signed bit. This allows the variables to
use the 'full' unsigned range. And it does so with explicit LOAD -
STORE to ensure any intermediate value will never be visible in
memory, allowing these unserialized loads.

Note: GCC generates crap code for this, might warrant a look later.

Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
       maybe we should do clamping on add too.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Yuyang Du &lt;yuyang.du@intel.com&gt;
Cc: bsegall@google.com
Cc: kernel@kyup.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: steve.muckle@linaro.org
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8974189222159154c55f24ddad33e3613960521a upstream.

As per commit:

  b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")

&gt; the code generated from update_cfs_rq_load_avg():
&gt;
&gt; 	if (atomic_long_read(&amp;cfs_rq-&gt;removed_load_avg)) {
&gt; 		s64 r = atomic_long_xchg(&amp;cfs_rq-&gt;removed_load_avg, 0);
&gt; 		sa-&gt;load_avg = max_t(long, sa-&gt;load_avg - r, 0);
&gt; 		sa-&gt;load_sum = max_t(s64, sa-&gt;load_sum - r * LOAD_AVG_MAX, 0);
&gt; 		removed_load = 1;
&gt; 	}
&gt;
&gt; turns into:
&gt;
&gt; ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
&gt; ffffffff8108706b:       48 85 c0                test   %rax,%rax
&gt; ffffffff8108706e:       74 40                   je     ffffffff810870b0 &lt;update_blocked_averages+0xc0&gt;
&gt; ffffffff81087070:       4c 89 f8                mov    %r15,%rax
&gt; ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
&gt; ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
&gt; ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
&gt; ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
&gt; ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
&gt; ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
&gt;
&gt; Which you'll note ends up with sa-&gt;load_avg -= r in memory at
&gt; ffffffff8108707a.

So I _should_ have looked at other unserialized users of -&gt;load_avg,
but alas. Luckily nikbor reported a similar /0 from task_h_load() which
instantly triggered recollection of this here problem.

Aside from the intermediate value hitting memory and causing problems,
there's another problem: the underflow detection relies on the signed
bit. This reduces the effective width of the variables, IOW its
effectively the same as having these variables be of signed type.

This patch changes to a different means of unsigned underflow
detection to not rely on the signed bit. This allows the variables to
use the 'full' unsigned range. And it does so with explicit LOAD -
STORE to ensure any intermediate value will never be visible in
memory, allowing these unserialized loads.

Note: GCC generates crap code for this, might warrant a look later.

Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
       maybe we should do clamping on add too.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Yuyang Du &lt;yuyang.du@intel.com&gt;
Cc: bsegall@google.com
Cc: kernel@kyup.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: steve.muckle@linaro.org
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "sched/fair: Fix fairness issue on migration"</title>
<updated>2016-05-11T06:25:53+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2016-05-11T06:25:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=53d3bc773eaa7ab1cf63585e76af7ee869d5e709'/>
<id>53d3bc773eaa7ab1cf63585e76af7ee869d5e709</id>
<content type='text'>
Mike reported that this recent commit:

  3a47d5124a95 ("sched/fair: Fix fairness issue on migration")

... broke interactivity and the signal starvation test.

We have a proper fix series in the works but ran out of time for
v4.6, so revert the commit.

Reported-by: Mike Galbraith &lt;efault@gmx.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Mike reported that this recent commit:

  3a47d5124a95 ("sched/fair: Fix fairness issue on migration")

... broke interactivity and the signal starvation test.

We have a proper fix series in the works but ran out of time for
v4.6, so revert the commit.

Reported-by: Mike Galbraith &lt;efault@gmx.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Fix !CONFIG_SMP kernel cpufreq governor breakage</title>
<updated>2016-05-07T05:45:34+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2016-05-06T12:58:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=536bd00cdbb7b908573e5a93bae67b64cbae60d8'/>
<id>536bd00cdbb7b908573e5a93bae67b64cbae60d8</id>
<content type='text'>
The following commit:

  34e2c555f3e1 ("cpufreq: Add mechanism for registering utilization update callbacks")

overlooked the fact that update_load_avg(), where CFS invokes cpufreq
utilization update callbacks, becomes an empty stub on UP kernels.

In consequence, if !CONFIG_SMP, cpufreq governors are never invoked
from CFS and they do not have a chance to evaluate CPU performace
levels and update them often enough.

Needless to say, things don't work as expected then.

Fix the problem by making the !CONFIG_SMP stub of update_load_avg()
invoke cpufreq update callbacks too.

Reported-by: Steve Muckle &lt;steve.muckle@linaro.org&gt;
Tested-by: Steve Muckle &lt;steve.muckle@linaro.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Steve Muckle &lt;steve.muckle@linaro.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Linux PM list &lt;linux-pm@vger.kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Fixes: 34e2c555f3e1 (cpufreq: Add mechanism for registering utilization update callbacks)
Link: http://lkml.kernel.org/r/6282396.VVEdgVYxO3@vostro.rjw.lan
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following commit:

  34e2c555f3e1 ("cpufreq: Add mechanism for registering utilization update callbacks")

overlooked the fact that update_load_avg(), where CFS invokes cpufreq
utilization update callbacks, becomes an empty stub on UP kernels.

In consequence, if !CONFIG_SMP, cpufreq governors are never invoked
from CFS and they do not have a chance to evaluate CPU performace
levels and update them often enough.

Needless to say, things don't work as expected then.

Fix the problem by making the !CONFIG_SMP stub of update_load_avg()
invoke cpufreq update callbacks too.

Reported-by: Steve Muckle &lt;steve.muckle@linaro.org&gt;
Tested-by: Steve Muckle &lt;steve.muckle@linaro.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Steve Muckle &lt;steve.muckle@linaro.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Linux PM list &lt;linux-pm@vger.kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Fixes: 34e2c555f3e1 (cpufreq: Add mechanism for registering utilization update callbacks)
Link: http://lkml.kernel.org/r/6282396.VVEdgVYxO3@vostro.rjw.lan
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2016-03-24T16:42:50+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-03-24T16:42:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=be53f58fa0fcd97c62a84f2eb98cff528f8b2443'/>
<id>be53f58fa0fcd97c62a84f2eb98cff528f8b2443</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes: a cgroup fix, a fair-scheduler migration accounting fix, a
  cputime fix and two cpuacct cleanups"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cpuacct: Simplify the cpuacct code
  sched/cpuacct: Rename parameter in cpuusage_write() for readability
  sched/fair: Add comments to explain select_idle_sibling()
  sched/fair: Fix fairness issue on migration
  sched/cgroup: Fix/cleanup cgroup teardown/init
  sched/cputime: Fix steal time accounting vs. CPU hotplug
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes: a cgroup fix, a fair-scheduler migration accounting fix, a
  cputime fix and two cpuacct cleanups"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cpuacct: Simplify the cpuacct code
  sched/cpuacct: Rename parameter in cpuusage_write() for readability
  sched/fair: Add comments to explain select_idle_sibling()
  sched/fair: Fix fairness issue on migration
  sched/cgroup: Fix/cleanup cgroup teardown/init
  sched/cputime: Fix steal time accounting vs. CPU hotplug
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Add comments to explain select_idle_sibling()</title>
<updated>2016-03-21T09:52:51+00:00</updated>
<author>
<name>Matt Fleming</name>
<email>matt@codeblueprint.co.uk</email>
</author>
<published>2016-03-09T14:59:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d4335581dc30ec6545999c7443bb9fead274a980'/>
<id>d4335581dc30ec6545999c7443bb9fead274a980</id>
<content type='text'>
It's not entirely obvious how the main loop in select_idle_sibling()
works on first glance. Sprinkle a few comments to explain the design
and intention behind the loop based on some conversations with Mike
and Peter.

Signed-off-by: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.com&gt;
Cc: Mike Galbraith &lt;mgalbraith@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1457535548-15329-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's not entirely obvious how the main loop in select_idle_sibling()
works on first glance. Sprinkle a few comments to explain the design
and intention behind the loop based on some conversations with Mike
and Peter.

Signed-off-by: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.com&gt;
Cc: Mike Galbraith &lt;mgalbraith@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1457535548-15329-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Fix fairness issue on migration</title>
<updated>2016-03-21T09:49:23+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2016-03-09T12:04:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3a47d5124a957358274e9ca7b115b2f3a914f56d'/>
<id>3a47d5124a957358274e9ca7b115b2f3a914f56d</id>
<content type='text'>
Pavan reported that in the presence of very light tasks (or cgroups)
the placement of migrated tasks can cause severe fairness issues.

The problem is that enqueue_entity() places the task before it updates
time, thereby it can place the task far in the past (remember that
light tasks will shoot virtual time forward at a high speed, so in
relation to the pre-existing light task, we can land far in the past).

This is done because update_curr() needs the current task, and we
might be placing the current task.

The obvious solution is to differentiate between the current and any
other task; placing the current before we update time, and placing any
other task after, such that !curr tasks end up at the current moment
in time, and not in the past.

Reported-by: Pavan Kondeti &lt;pkondeti@codeaurora.org&gt;
Tested-by: Pavan Kondeti &lt;pkondeti@codeaurora.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: byungchul.park@lge.com
Link: http://lkml.kernel.org/r/20160309120403.GK6344@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pavan reported that in the presence of very light tasks (or cgroups)
the placement of migrated tasks can cause severe fairness issues.

The problem is that enqueue_entity() places the task before it updates
time, thereby it can place the task far in the past (remember that
light tasks will shoot virtual time forward at a high speed, so in
relation to the pre-existing light task, we can land far in the past).

This is done because update_curr() needs the current task, and we
might be placing the current task.

The obvious solution is to differentiate between the current and any
other task; placing the current before we update time, and placing any
other task after, such that !curr tasks end up at the current moment
in time, and not in the past.

Reported-by: Pavan Kondeti &lt;pkondeti@codeaurora.org&gt;
Tested-by: Pavan Kondeti &lt;pkondeti@codeaurora.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: byungchul.park@lge.com
Link: http://lkml.kernel.org/r/20160309120403.GK6344@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2016-03-16T21:10:53+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-03-16T21:10:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=277edbabf6fece057b14fb6db5e3a34e00f42f42'/>
<id>277edbabf6fece057b14fb6db5e3a34e00f42f42</id>
<content type='text'>
Pull power management and ACPI updates from Rafael Wysocki:
 "This time the majority of changes go into cpufreq and they are
  significant.

  First off, the way CPU frequency updates are triggered is different
  now.  Instead of having to set up and manage a deferrable timer for
  each CPU in the system to evaluate and possibly change its frequency
  periodically, cpufreq governors set up callbacks to be invoked by the
  scheduler on a regular basis (basically on utilization updates).  The
  "old" governors, "ondemand" and "conservative", still do all of their
  work in process context (although that is triggered by the scheduler
  now), but intel_pstate does it all in the callback invoked by the
  scheduler with no need for any additional asynchronous processing.

  Of course, this eliminates the overhead related to the management of
  all those timers, but also it allows the cpufreq governor code to be
  simplified quite a bit.  On top of that, the common code and data
  structures used by the "ondemand" and "conservative" governors are
  cleaned up and made more straightforward and some long-standing and
  quite annoying problems are addressed.  In particular, the handling of
  governor sysfs attributes is modified and the related locking becomes
  more fine grained which allows some concurrency problems to be avoided
  (particularly deadlocks with the core cpufreq code).

  In principle, the new mechanism for triggering frequency updates
  allows utilization information to be passed from the scheduler to
  cpufreq.  Although the current code doesn't make use of it, in the
  works is a new cpufreq governor that will make decisions based on the
  scheduler's utilization data.  That should allow the scheduler and
  cpufreq to work more closely together in the long run.

  In addition to the core and governor changes, cpufreq drivers are
  updated too.  Fixes and optimizations go into intel_pstate, the
  cpufreq-dt driver is updated on top of some modification in the
  Operating Performance Points (OPP) framework and there are fixes and
  other updates in the powernv cpufreq driver.

  Apart from the cpufreq updates there is some new ACPICA material,
  including a fix for a problem introduced by previous ACPICA updates,
  and some less significant changes in the ACPI code, like CPPC code
  optimizations, ACPI processor driver cleanups and support for loading
  ACPI tables from initrd.

  Also updated are the generic power domains framework, the Intel RAPL
  power capping driver and the turbostat utility and we have a bunch of
  traditional assorted fixes and cleanups.

  Specifics:

   - Redesign of cpufreq governors and the intel_pstate driver to make
     them use callbacks invoked by the scheduler to trigger CPU
     frequency evaluation instead of using per-CPU deferrable timers for
     that purpose (Rafael Wysocki).

   - Reorganization and cleanup of cpufreq governor code to make it more
     straightforward and fix some concurrency problems in it (Rafael
     Wysocki, Viresh Kumar).

   - Cleanup and improvements of locking in the cpufreq core (Viresh
     Kumar).

   - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
     Kumar, Eric Biggers).

   - intel_pstate driver updates including fixes, optimizations and a
     modification to make it enable enable hardware-coordinated P-state
     selection (HWP) by default if supported by the processor (Philippe
     Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
     Franciosi).

   - Operating Performance Points (OPP) framework updates to improve its
     handling of voltage regulators and device clocks and updates of the
     cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).

   - Updates of the powernv cpufreq driver to fix initialization and
     cleanup problems in it and correct its worker thread handling with
     respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
     Bhat).

   - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).

   - ACPICA updates including one fix for a regression introduced by
     previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
     Colin Ian King).

   - Support for installing ACPI tables from initrd (Lv Zheng).

   - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
     Chaugule).

   - Support for _HID(ACPI0010) devices (ACPI processor containers) and
     ACPI processor driver cleanups (Sudeep Holla).

   - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
     Aleksey Makarov).

   - Modification of the ACPI PCI IRQ management code to make it treat
     255 in the Interrupt Line register as "not connected" on x86 (as
     per the specification) and avoid attempts to use that value as a
     valid interrupt vector (Chen Fan).

   - ACPI APEI fixes related to resource leaks (Josh Hunt).

   - Removal of modularity from a few ACPI drivers (BGRT, GHES,
     intel_pmic_crc) that cannot be built as modules in practice (Paul
     Gortmaker).

   - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
     as a valid resource type (Harb Abdulhamid).

   - New device ID (future AMD I2C controller) in the ACPI driver for
     AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).

   - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).

   - cpuidle menu governor optimization to avoid a square root
     computation in it (Rasmus Villemoes).

   - Fix for potential use-after-free in the generic device properties
     framework (Heikki Krogerus).

   - Updates of the generic power domains (genpd) framework including
     support for multiple power states of a domain, fixes and debugfs
     output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
     Geert Uytterhoeven).

   - Intel RAPL power capping driver updates to reduce IPI overhead in
     it (Jacob Pan).

   - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
     Sengar).

   - Year 2038 fix for the process freezer (Abhilash Jindal).

   - turbostat utility updates including new features (decoding of more
     registers and CPUID fields, sub-second intervals support, GFX MHz
     and RC6 printout, --out command line option), fixes (syscall jitter
     detection and workaround, reductioin of the number of syscalls
     made, fixes related to Xeon x200 processors, compiler warning
     fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"

* tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
  tools/power turbostat: bugfix: TDP MSRs print bits fixing
  tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
  tools/power turbostat: call __cpuid() instead of __get_cpuid()
  tools/power turbostat: indicate SMX and SGX support
  tools/power turbostat: detect and work around syscall jitter
  tools/power turbostat: show GFX%rc6
  tools/power turbostat: show GFXMHz
  tools/power turbostat: show IRQs per CPU
  tools/power turbostat: make fewer systems calls
  tools/power turbostat: fix compiler warnings
  tools/power turbostat: add --out option for saving output in a file
  tools/power turbostat: re-name "%Busy" field to "Busy%"
  tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
  tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
  tools/power turbostat: allow sub-sec intervals
  ACPI / APEI: ERST: Fixed leaked resources in erst_init
  ACPI / APEI: Fix leaked resources
  intel_pstate: Do not skip samples partially
  intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
  intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull power management and ACPI updates from Rafael Wysocki:
 "This time the majority of changes go into cpufreq and they are
  significant.

  First off, the way CPU frequency updates are triggered is different
  now.  Instead of having to set up and manage a deferrable timer for
  each CPU in the system to evaluate and possibly change its frequency
  periodically, cpufreq governors set up callbacks to be invoked by the
  scheduler on a regular basis (basically on utilization updates).  The
  "old" governors, "ondemand" and "conservative", still do all of their
  work in process context (although that is triggered by the scheduler
  now), but intel_pstate does it all in the callback invoked by the
  scheduler with no need for any additional asynchronous processing.

  Of course, this eliminates the overhead related to the management of
  all those timers, but also it allows the cpufreq governor code to be
  simplified quite a bit.  On top of that, the common code and data
  structures used by the "ondemand" and "conservative" governors are
  cleaned up and made more straightforward and some long-standing and
  quite annoying problems are addressed.  In particular, the handling of
  governor sysfs attributes is modified and the related locking becomes
  more fine grained which allows some concurrency problems to be avoided
  (particularly deadlocks with the core cpufreq code).

  In principle, the new mechanism for triggering frequency updates
  allows utilization information to be passed from the scheduler to
  cpufreq.  Although the current code doesn't make use of it, in the
  works is a new cpufreq governor that will make decisions based on the
  scheduler's utilization data.  That should allow the scheduler and
  cpufreq to work more closely together in the long run.

  In addition to the core and governor changes, cpufreq drivers are
  updated too.  Fixes and optimizations go into intel_pstate, the
  cpufreq-dt driver is updated on top of some modification in the
  Operating Performance Points (OPP) framework and there are fixes and
  other updates in the powernv cpufreq driver.

  Apart from the cpufreq updates there is some new ACPICA material,
  including a fix for a problem introduced by previous ACPICA updates,
  and some less significant changes in the ACPI code, like CPPC code
  optimizations, ACPI processor driver cleanups and support for loading
  ACPI tables from initrd.

  Also updated are the generic power domains framework, the Intel RAPL
  power capping driver and the turbostat utility and we have a bunch of
  traditional assorted fixes and cleanups.

  Specifics:

   - Redesign of cpufreq governors and the intel_pstate driver to make
     them use callbacks invoked by the scheduler to trigger CPU
     frequency evaluation instead of using per-CPU deferrable timers for
     that purpose (Rafael Wysocki).

   - Reorganization and cleanup of cpufreq governor code to make it more
     straightforward and fix some concurrency problems in it (Rafael
     Wysocki, Viresh Kumar).

   - Cleanup and improvements of locking in the cpufreq core (Viresh
     Kumar).

   - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
     Kumar, Eric Biggers).

   - intel_pstate driver updates including fixes, optimizations and a
     modification to make it enable enable hardware-coordinated P-state
     selection (HWP) by default if supported by the processor (Philippe
     Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
     Franciosi).

   - Operating Performance Points (OPP) framework updates to improve its
     handling of voltage regulators and device clocks and updates of the
     cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).

   - Updates of the powernv cpufreq driver to fix initialization and
     cleanup problems in it and correct its worker thread handling with
     respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
     Bhat).

   - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).

   - ACPICA updates including one fix for a regression introduced by
     previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
     Colin Ian King).

   - Support for installing ACPI tables from initrd (Lv Zheng).

   - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
     Chaugule).

   - Support for _HID(ACPI0010) devices (ACPI processor containers) and
     ACPI processor driver cleanups (Sudeep Holla).

   - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
     Aleksey Makarov).

   - Modification of the ACPI PCI IRQ management code to make it treat
     255 in the Interrupt Line register as "not connected" on x86 (as
     per the specification) and avoid attempts to use that value as a
     valid interrupt vector (Chen Fan).

   - ACPI APEI fixes related to resource leaks (Josh Hunt).

   - Removal of modularity from a few ACPI drivers (BGRT, GHES,
     intel_pmic_crc) that cannot be built as modules in practice (Paul
     Gortmaker).

   - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
     as a valid resource type (Harb Abdulhamid).

   - New device ID (future AMD I2C controller) in the ACPI driver for
     AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).

   - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).

   - cpuidle menu governor optimization to avoid a square root
     computation in it (Rasmus Villemoes).

   - Fix for potential use-after-free in the generic device properties
     framework (Heikki Krogerus).

   - Updates of the generic power domains (genpd) framework including
     support for multiple power states of a domain, fixes and debugfs
     output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
     Geert Uytterhoeven).

   - Intel RAPL power capping driver updates to reduce IPI overhead in
     it (Jacob Pan).

   - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
     Sengar).

   - Year 2038 fix for the process freezer (Abhilash Jindal).

   - turbostat utility updates including new features (decoding of more
     registers and CPUID fields, sub-second intervals support, GFX MHz
     and RC6 printout, --out command line option), fixes (syscall jitter
     detection and workaround, reductioin of the number of syscalls
     made, fixes related to Xeon x200 processors, compiler warning
     fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"

* tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
  tools/power turbostat: bugfix: TDP MSRs print bits fixing
  tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
  tools/power turbostat: call __cpuid() instead of __get_cpuid()
  tools/power turbostat: indicate SMX and SGX support
  tools/power turbostat: detect and work around syscall jitter
  tools/power turbostat: show GFX%rc6
  tools/power turbostat: show GFXMHz
  tools/power turbostat: show IRQs per CPU
  tools/power turbostat: make fewer systems calls
  tools/power turbostat: fix compiler warnings
  tools/power turbostat: add --out option for saving output in a file
  tools/power turbostat: re-name "%Busy" field to "Busy%"
  tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
  tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
  tools/power turbostat: allow sub-sec intervals
  ACPI / APEI: ERST: Fixed leaked resources in erst_init
  ACPI / APEI: Fix leaked resources
  intel_pstate: Do not skip samples partially
  intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
  intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>cpufreq: Add mechanism for registering utilization update callbacks</title>
<updated>2016-03-09T13:39:19+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2016-02-15T19:20:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=34e2c555f3e13c90e9284e23d00f03be8a6e06c5'/>
<id>34e2c555f3e13c90e9284e23d00f03be8a6e06c5</id>
<content type='text'>
Introduce a mechanism by which parts of the cpufreq subsystem
("setpolicy" drivers or the core) can register callbacks to be
executed from cpufreq_update_util() which is invoked by the
scheduler's update_load_avg() on CPU utilization changes.

This allows the "setpolicy" drivers to dispense with their timers
and do all of the computations they need and frequency/voltage
adjustments in the update_load_avg() code path, among other things.

The update_load_avg() changes were suggested by Peter Zijlstra.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce a mechanism by which parts of the cpufreq subsystem
("setpolicy" drivers or the core) can register callbacks to be
executed from cpufreq_update_util() which is invoked by the
scheduler's update_load_avg() on CPU utilization changes.

This allows the "setpolicy" drivers to dispense with their timers
and do all of the computations they need and frequency/voltage
adjustments in the update_load_avg() code path, among other things.

The update_load_avg() changes were suggested by Peter Zijlstra.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
