<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel, branch linux-4.6.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble</title>
<updated>2016-08-10T10:54:49+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-07-12T19:59:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5e8123b618c8761b607817e84a53ca65d27e8475'/>
<id>5e8123b618c8761b607817e84a53ca65d27e8475</id>
<content type='text'>
commit a7c734140aa36413944eef0f8c660e0e2256357d upstream.

Xiaolong Ye reported lock debug warnings triggered by the following commit:

  8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine")

The bug is the following: the cpuhp_bp_states[] array is cut short when
CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless
and happily scribble outside of the array bounds...

We need to store them in case that the state is unregistered so we can invoke
the teardown function. That's independent of CONFIG_SMP. Make sure the array
is large enough.

Reported-by: kernel test robot &lt;xiaolong.ye@intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Adam Borowski &lt;kilobyte@angband.pl&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Kan Liang &lt;kan.liang@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Cc: lkp@01.org
Cc: tipbuild@zytor.com
Fixes: cff7d378d3fd "cpu/hotplug: Convert to a state machine for the control processor"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanos
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a7c734140aa36413944eef0f8c660e0e2256357d upstream.

Xiaolong Ye reported lock debug warnings triggered by the following commit:

  8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine")

The bug is the following: the cpuhp_bp_states[] array is cut short when
CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless
and happily scribble outside of the array bounds...

We need to store them in case that the state is unregistered so we can invoke
the teardown function. That's independent of CONFIG_SMP. Make sure the array
is large enough.

Reported-by: kernel test robot &lt;xiaolong.ye@intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Adam Borowski &lt;kilobyte@angband.pl&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Kan Liang &lt;kan.liang@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Cc: lkp@01.org
Cc: tipbuild@zytor.com
Fixes: cff7d378d3fd "cpu/hotplug: Convert to a state machine for the control processor"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanos
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>posix_cpu_timer: Exit early when process has been reaped</title>
<updated>2016-08-10T10:54:49+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2016-07-07T22:39:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2acf7a3a4af4e9590f06bd89518c139d3a2884db'/>
<id>2acf7a3a4af4e9590f06bd89518c139d3a2884db</id>
<content type='text'>
commit 2c13ce8f6b2f6fd9ba2f9261b1939fc0f62d1307 upstream.

Variable "now" seems to be genuinely used unintialized
if branch

	if (CPUCLOCK_PERTHREAD(timer-&gt;it_clock)) {

is not taken and branch

	if (unlikely(sighand == NULL)) {

is taken. In this case the process has been reaped and the timer is marked as
disarmed anyway. So none of the postprocessing of the sample is
required. Return right away.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2c13ce8f6b2f6fd9ba2f9261b1939fc0f62d1307 upstream.

Variable "now" seems to be genuinely used unintialized
if branch

	if (CPUCLOCK_PERTHREAD(timer-&gt;it_clock)) {

is not taken and branch

	if (unlikely(sighand == NULL)) {

is taken. In this case the process has been reaped and the timer is marked as
disarmed anyway. So none of the postprocessing of the sample is
required. Return right away.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Fix effective_load() to consistently use smoothed load</title>
<updated>2016-08-10T10:54:48+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2016-06-24T13:53:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=81dc160128e7d8a9e3d5a85657514cd9a110007d'/>
<id>81dc160128e7d8a9e3d5a85657514cd9a110007d</id>
<content type='text'>
commit 7dd4912594daf769a46744848b05bd5bc6d62469 upstream.

Starting with the following commit:

  fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")

calc_tg_weight() doesn't compute the right value as expected by effective_load().

The difference is in the 'correction' term. In order to ensure \Sum
rw_j &gt;= rw_i we cannot use tg-&gt;load_avg directly, since that might be
lagging a correction on the current cfs_rq-&gt;avg.load_avg value.
Therefore we use tg-&gt;load_avg - cfs_rq-&gt;tg_load_avg_contrib +
cfs_rq-&gt;avg.load_avg.

Now, per the referenced commit, calc_tg_weight() doesn't use
cfs_rq-&gt;avg.load_avg, as is later used in @w, but uses
cfs_rq-&gt;load.weight instead.

So stop using calc_tg_weight() and do it explicitly.

The effects of this bug are wake_affine() making randomly
poor choices in cgroup-intense workloads.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7dd4912594daf769a46744848b05bd5bc6d62469 upstream.

Starting with the following commit:

  fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")

calc_tg_weight() doesn't compute the right value as expected by effective_load().

The difference is in the 'correction' term. In order to ensure \Sum
rw_j &gt;= rw_i we cannot use tg-&gt;load_avg directly, since that might be
lagging a correction on the current cfs_rq-&gt;avg.load_avg value.
Therefore we use tg-&gt;load_avg - cfs_rq-&gt;tg_load_avg_contrib +
cfs_rq-&gt;avg.load_avg.

Now, per the referenced commit, calc_tg_weight() doesn't use
cfs_rq-&gt;avg.load_avg, as is later used in @w, but uses
cfs_rq-&gt;load.weight instead.

So stop using calc_tg_weight() and do it explicitly.

The effects of this bug are wake_affine() making randomly
poor choices in cgroup-intense workloads.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Disable IRQs while holding css_set_lock</title>
<updated>2016-08-10T10:54:46+00:00</updated>
<author>
<name>Daniel Bristot de Oliveira</name>
<email>bristot@redhat.com</email>
</author>
<published>2016-06-22T20:28:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2644b9787694c3e18fc82ec74087daf15eef3249'/>
<id>2644b9787694c3e18fc82ec74087daf15eef3249</id>
<content type='text'>
commit 82d6489d0fed2ec8a8c48c19e8d8a04ac8e5bb26 upstream.

While testing the deadline scheduler + cgroup setup I hit this
warning.

[  132.612935] ------------[ cut here ]------------
[  132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
[  132.612952] Modules linked in: (a ton of modules...)
[  132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2
[  132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[  132.612982]  0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e
[  132.612984]  0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b
[  132.612985]  00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00
[  132.612986] Call Trace:
[  132.612987]  &lt;IRQ&gt;  [&lt;ffffffff813d229e&gt;] dump_stack+0x63/0x85
[  132.612994]  [&lt;ffffffff810a652b&gt;] __warn+0xcb/0xf0
[  132.612997]  [&lt;ffffffff810e76a0&gt;] ? push_dl_task.part.32+0x170/0x170
[  132.612999]  [&lt;ffffffff810a665d&gt;] warn_slowpath_null+0x1d/0x20
[  132.613000]  [&lt;ffffffff810aba5b&gt;] __local_bh_enable_ip+0x6b/0x80
[  132.613008]  [&lt;ffffffff817d6c8a&gt;] _raw_write_unlock_bh+0x1a/0x20
[  132.613010]  [&lt;ffffffff817d6c9e&gt;] _raw_spin_unlock_bh+0xe/0x10
[  132.613015]  [&lt;ffffffff811388ac&gt;] put_css_set+0x5c/0x60
[  132.613016]  [&lt;ffffffff8113dc7f&gt;] cgroup_free+0x7f/0xa0
[  132.613017]  [&lt;ffffffff810a3912&gt;] __put_task_struct+0x42/0x140
[  132.613018]  [&lt;ffffffff810e776a&gt;] dl_task_timer+0xca/0x250
[  132.613027]  [&lt;ffffffff810e76a0&gt;] ? push_dl_task.part.32+0x170/0x170
[  132.613030]  [&lt;ffffffff8111371e&gt;] __hrtimer_run_queues+0xee/0x270
[  132.613031]  [&lt;ffffffff81113ec8&gt;] hrtimer_interrupt+0xa8/0x190
[  132.613034]  [&lt;ffffffff81051a58&gt;] local_apic_timer_interrupt+0x38/0x60
[  132.613035]  [&lt;ffffffff817d9b0d&gt;] smp_apic_timer_interrupt+0x3d/0x50
[  132.613037]  [&lt;ffffffff817d7c5c&gt;] apic_timer_interrupt+0x8c/0xa0
[  132.613038]  &lt;EOI&gt;  [&lt;ffffffff81063466&gt;] ? native_safe_halt+0x6/0x10
[  132.613043]  [&lt;ffffffff81037a4e&gt;] default_idle+0x1e/0xd0
[  132.613044]  [&lt;ffffffff810381cf&gt;] arch_cpu_idle+0xf/0x20
[  132.613046]  [&lt;ffffffff810e8fda&gt;] default_idle_call+0x2a/0x40
[  132.613047]  [&lt;ffffffff810e92d7&gt;] cpu_startup_entry+0x2e7/0x340
[  132.613048]  [&lt;ffffffff81050235&gt;] start_secondary+0x155/0x190
[  132.613049] ---[ end trace f91934d162ce9977 ]---

The warn is the spin_(lock|unlock)_bh(&amp;css_set_lock) in the interrupt
context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid
this problem - and other problems of sharing a spinlock with an
interrupt.

Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Juri Lelli &lt;juri.lelli@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Reviewed-by: "Luis Claudio R. Goncalves" &lt;lgoncalv@redhat.com&gt;
Signed-off-by: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 82d6489d0fed2ec8a8c48c19e8d8a04ac8e5bb26 upstream.

While testing the deadline scheduler + cgroup setup I hit this
warning.

[  132.612935] ------------[ cut here ]------------
[  132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
[  132.612952] Modules linked in: (a ton of modules...)
[  132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2
[  132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[  132.612982]  0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e
[  132.612984]  0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b
[  132.612985]  00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00
[  132.612986] Call Trace:
[  132.612987]  &lt;IRQ&gt;  [&lt;ffffffff813d229e&gt;] dump_stack+0x63/0x85
[  132.612994]  [&lt;ffffffff810a652b&gt;] __warn+0xcb/0xf0
[  132.612997]  [&lt;ffffffff810e76a0&gt;] ? push_dl_task.part.32+0x170/0x170
[  132.612999]  [&lt;ffffffff810a665d&gt;] warn_slowpath_null+0x1d/0x20
[  132.613000]  [&lt;ffffffff810aba5b&gt;] __local_bh_enable_ip+0x6b/0x80
[  132.613008]  [&lt;ffffffff817d6c8a&gt;] _raw_write_unlock_bh+0x1a/0x20
[  132.613010]  [&lt;ffffffff817d6c9e&gt;] _raw_spin_unlock_bh+0xe/0x10
[  132.613015]  [&lt;ffffffff811388ac&gt;] put_css_set+0x5c/0x60
[  132.613016]  [&lt;ffffffff8113dc7f&gt;] cgroup_free+0x7f/0xa0
[  132.613017]  [&lt;ffffffff810a3912&gt;] __put_task_struct+0x42/0x140
[  132.613018]  [&lt;ffffffff810e776a&gt;] dl_task_timer+0xca/0x250
[  132.613027]  [&lt;ffffffff810e76a0&gt;] ? push_dl_task.part.32+0x170/0x170
[  132.613030]  [&lt;ffffffff8111371e&gt;] __hrtimer_run_queues+0xee/0x270
[  132.613031]  [&lt;ffffffff81113ec8&gt;] hrtimer_interrupt+0xa8/0x190
[  132.613034]  [&lt;ffffffff81051a58&gt;] local_apic_timer_interrupt+0x38/0x60
[  132.613035]  [&lt;ffffffff817d9b0d&gt;] smp_apic_timer_interrupt+0x3d/0x50
[  132.613037]  [&lt;ffffffff817d7c5c&gt;] apic_timer_interrupt+0x8c/0xa0
[  132.613038]  &lt;EOI&gt;  [&lt;ffffffff81063466&gt;] ? native_safe_halt+0x6/0x10
[  132.613043]  [&lt;ffffffff81037a4e&gt;] default_idle+0x1e/0xd0
[  132.613044]  [&lt;ffffffff810381cf&gt;] arch_cpu_idle+0xf/0x20
[  132.613046]  [&lt;ffffffff810e8fda&gt;] default_idle_call+0x2a/0x40
[  132.613047]  [&lt;ffffffff810e92d7&gt;] cpu_startup_entry+0x2e7/0x340
[  132.613048]  [&lt;ffffffff81050235&gt;] start_secondary+0x155/0x190
[  132.613049] ---[ end trace f91934d162ce9977 ]---

The warn is the spin_(lock|unlock)_bh(&amp;css_set_lock) in the interrupt
context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid
this problem - and other problems of sharing a spinlock with an
interrupt.

Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Juri Lelli &lt;juri.lelli@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Reviewed-by: "Luis Claudio R. Goncalves" &lt;lgoncalv@redhat.com&gt;
Signed-off-by: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: set css-&gt;id to -1 during init</title>
<updated>2016-08-10T10:54:46+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-05-26T19:42:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=65856851b6229563e43e256b3932d5c8513c047b'/>
<id>65856851b6229563e43e256b3932d5c8513c047b</id>
<content type='text'>
commit 8fa3b8d689a54d6d04ff7803c724fb7aca6ce98e upstream.

If percpu_ref initialization fails during css_create(), the free path
can end up trying to free css-&gt;id of zero.  As ID 0 is unused, it
doesn't cause a critical breakage but it does trigger a warning
message.  Fix it by setting css-&gt;id to -1 from init_and_link_css().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Wenwei Tao &lt;ww.tao0320@gmail.com&gt;
Fixes: 01e586598b22 ("cgroup: release css-&gt;id after css_free")
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8fa3b8d689a54d6d04ff7803c724fb7aca6ce98e upstream.

If percpu_ref initialization fails during css_create(), the free path
can end up trying to free css-&gt;id of zero.  As ID 0 is unused, it
doesn't cause a critical breakage but it does trigger a warning
message.  Fix it by setting css-&gt;id to -1 from init_and_link_css().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Wenwei Tao &lt;ww.tao0320@gmail.com&gt;
Fixes: 01e586598b22 ("cgroup: release css-&gt;id after css_free")
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: remove redundant cleanup in css_create</title>
<updated>2016-08-10T10:54:46+00:00</updated>
<author>
<name>Wenwei Tao</name>
<email>ww.tao0320@gmail.com</email>
</author>
<published>2016-05-13T14:59:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=64e7dc481cbb34dcf11623905d54e67fe7ffd035'/>
<id>64e7dc481cbb34dcf11623905d54e67fe7ffd035</id>
<content type='text'>
commit b00c52dae6d9ee8d0f2407118ef6544ae5524781 upstream.

When create css failed, before call css_free_rcu_fn, we remove the css
id and exit the percpu_ref, but we will do these again in
css_free_work_fn, so they are redundant.  Especially the css id, that
would cause problem if we remove it twice, since it may be assigned to
another css after the first remove.

tj: This was broken by two commits updating the free path without
    synchronizing the creation failure path.  This can be easily
    triggered by trying to create more than 64k memory cgroups.

Signed-off-by: Wenwei Tao &lt;ww.tao0320@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Fixes: 9a1049da9bd2 ("percpu-refcount: require percpu_ref to be exited explicitly")
Fixes: 01e586598b22 ("cgroup: release css-&gt;id after css_free")
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b00c52dae6d9ee8d0f2407118ef6544ae5524781 upstream.

When create css failed, before call css_free_rcu_fn, we remove the css
id and exit the percpu_ref, but we will do these again in
css_free_work_fn, so they are redundant.  Especially the css id, that
would cause problem if we remove it twice, since it may be assigned to
another css after the first remove.

tj: This was broken by two commits updating the free path without
    synchronizing the creation failure path.  This can be easily
    triggered by trying to create more than 64k memory cgroups.

Signed-off-by: Wenwei Tao &lt;ww.tao0320@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Fixes: 9a1049da9bd2 ("percpu-refcount: require percpu_ref to be exited explicitly")
Fixes: 01e586598b22 ("cgroup: release css-&gt;id after css_free")
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/debug: Fix deadlock when enabling sched events</title>
<updated>2016-08-10T10:54:44+00:00</updated>
<author>
<name>Josh Poimboeuf</name>
<email>jpoimboe@redhat.com</email>
</author>
<published>2016-06-13T07:32:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f6e9bad4cb245dac00bf99b568cce1168cf5cecf'/>
<id>f6e9bad4cb245dac00bf99b568cce1168cf5cecf</id>
<content type='text'>
commit eda8dca519269c92a0771668b3d5678792de7b78 upstream.

I see a hang when enabling sched events:

  echo 1 &gt; /sys/kernel/debug/tracing/events/sched/enable

The printk buffer shows:

  BUG: spinlock recursion on CPU#1, swapper/1/0
   lock: 0xffff88007d5d8c00, .magic: dead4ead, .owner: swapper/1/0, .owner_cpu: 1
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc2+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
  ...
  Call Trace:
   &lt;IRQ&gt;  [&lt;ffffffff8143d663&gt;] dump_stack+0x85/0xc2
   [&lt;ffffffff81115948&gt;] spin_dump+0x78/0xc0
   [&lt;ffffffff81115aea&gt;] do_raw_spin_lock+0x11a/0x150
   [&lt;ffffffff81891471&gt;] _raw_spin_lock+0x61/0x80
   [&lt;ffffffff810e5466&gt;] ? try_to_wake_up+0x256/0x4e0
   [&lt;ffffffff810e5466&gt;] try_to_wake_up+0x256/0x4e0
   [&lt;ffffffff81891a0a&gt;] ? _raw_spin_unlock_irqrestore+0x4a/0x80
   [&lt;ffffffff810e5705&gt;] wake_up_process+0x15/0x20
   [&lt;ffffffff810cebb4&gt;] insert_work+0x84/0xc0
   [&lt;ffffffff810ced7f&gt;] __queue_work+0x18f/0x660
   [&lt;ffffffff810cf9a6&gt;] queue_work_on+0x46/0x90
   [&lt;ffffffffa00cd95b&gt;] drm_fb_helper_dirty.isra.11+0xcb/0xe0 [drm_kms_helper]
   [&lt;ffffffffa00cdac0&gt;] drm_fb_helper_sys_imageblit+0x30/0x40 [drm_kms_helper]
   [&lt;ffffffff814babcd&gt;] soft_cursor+0x1ad/0x230
   [&lt;ffffffff814ba379&gt;] bit_cursor+0x649/0x680
   [&lt;ffffffff814b9d30&gt;] ? update_attr.isra.2+0x90/0x90
   [&lt;ffffffff814b5e6a&gt;] fbcon_cursor+0x14a/0x1c0
   [&lt;ffffffff81555ef8&gt;] hide_cursor+0x28/0x90
   [&lt;ffffffff81558b6f&gt;] vt_console_print+0x3bf/0x3f0
   [&lt;ffffffff81122c63&gt;] call_console_drivers.constprop.24+0x183/0x200
   [&lt;ffffffff811241f4&gt;] console_unlock+0x3d4/0x610
   [&lt;ffffffff811247f5&gt;] vprintk_emit+0x3c5/0x610
   [&lt;ffffffff81124bc9&gt;] vprintk_default+0x29/0x40
   [&lt;ffffffff811e965b&gt;] printk+0x57/0x73
   [&lt;ffffffff810f7a9e&gt;] enqueue_entity+0xc2e/0xc70
   [&lt;ffffffff810f7b39&gt;] enqueue_task_fair+0x59/0xab0
   [&lt;ffffffff8106dcd9&gt;] ? kvm_sched_clock_read+0x9/0x20
   [&lt;ffffffff8103fb39&gt;] ? sched_clock+0x9/0x10
   [&lt;ffffffff810e3fcc&gt;] activate_task+0x5c/0xa0
   [&lt;ffffffff810e4514&gt;] ttwu_do_activate+0x54/0xb0
   [&lt;ffffffff810e5cea&gt;] sched_ttwu_pending+0x7a/0xb0
   [&lt;ffffffff810e5e51&gt;] scheduler_ipi+0x61/0x170
   [&lt;ffffffff81059e7f&gt;] smp_trace_reschedule_interrupt+0x4f/0x2a0
   [&lt;ffffffff81893ba6&gt;] trace_reschedule_interrupt+0x96/0xa0
   &lt;EOI&gt;  [&lt;ffffffff8106e0d6&gt;] ? native_safe_halt+0x6/0x10
   [&lt;ffffffff8110fb1d&gt;] ? trace_hardirqs_on+0xd/0x10
   [&lt;ffffffff81040ac0&gt;] default_idle+0x20/0x1a0
   [&lt;ffffffff8104147f&gt;] arch_cpu_idle+0xf/0x20
   [&lt;ffffffff81102f8f&gt;] default_idle_call+0x2f/0x50
   [&lt;ffffffff8110332e&gt;] cpu_startup_entry+0x37e/0x450
   [&lt;ffffffff8105af70&gt;] start_secondary+0x160/0x1a0

Note the hang only occurs when echoing the above from a physical serial
console, not from an ssh session.

The bug is caused by a deadlock where the task is trying to grab the rq
lock twice because printk()'s aren't safe in sched code.

Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/20160613073209.gdvdybiruljbkn3p@treble
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit eda8dca519269c92a0771668b3d5678792de7b78 upstream.

I see a hang when enabling sched events:

  echo 1 &gt; /sys/kernel/debug/tracing/events/sched/enable

The printk buffer shows:

  BUG: spinlock recursion on CPU#1, swapper/1/0
   lock: 0xffff88007d5d8c00, .magic: dead4ead, .owner: swapper/1/0, .owner_cpu: 1
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc2+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
  ...
  Call Trace:
   &lt;IRQ&gt;  [&lt;ffffffff8143d663&gt;] dump_stack+0x85/0xc2
   [&lt;ffffffff81115948&gt;] spin_dump+0x78/0xc0
   [&lt;ffffffff81115aea&gt;] do_raw_spin_lock+0x11a/0x150
   [&lt;ffffffff81891471&gt;] _raw_spin_lock+0x61/0x80
   [&lt;ffffffff810e5466&gt;] ? try_to_wake_up+0x256/0x4e0
   [&lt;ffffffff810e5466&gt;] try_to_wake_up+0x256/0x4e0
   [&lt;ffffffff81891a0a&gt;] ? _raw_spin_unlock_irqrestore+0x4a/0x80
   [&lt;ffffffff810e5705&gt;] wake_up_process+0x15/0x20
   [&lt;ffffffff810cebb4&gt;] insert_work+0x84/0xc0
   [&lt;ffffffff810ced7f&gt;] __queue_work+0x18f/0x660
   [&lt;ffffffff810cf9a6&gt;] queue_work_on+0x46/0x90
   [&lt;ffffffffa00cd95b&gt;] drm_fb_helper_dirty.isra.11+0xcb/0xe0 [drm_kms_helper]
   [&lt;ffffffffa00cdac0&gt;] drm_fb_helper_sys_imageblit+0x30/0x40 [drm_kms_helper]
   [&lt;ffffffff814babcd&gt;] soft_cursor+0x1ad/0x230
   [&lt;ffffffff814ba379&gt;] bit_cursor+0x649/0x680
   [&lt;ffffffff814b9d30&gt;] ? update_attr.isra.2+0x90/0x90
   [&lt;ffffffff814b5e6a&gt;] fbcon_cursor+0x14a/0x1c0
   [&lt;ffffffff81555ef8&gt;] hide_cursor+0x28/0x90
   [&lt;ffffffff81558b6f&gt;] vt_console_print+0x3bf/0x3f0
   [&lt;ffffffff81122c63&gt;] call_console_drivers.constprop.24+0x183/0x200
   [&lt;ffffffff811241f4&gt;] console_unlock+0x3d4/0x610
   [&lt;ffffffff811247f5&gt;] vprintk_emit+0x3c5/0x610
   [&lt;ffffffff81124bc9&gt;] vprintk_default+0x29/0x40
   [&lt;ffffffff811e965b&gt;] printk+0x57/0x73
   [&lt;ffffffff810f7a9e&gt;] enqueue_entity+0xc2e/0xc70
   [&lt;ffffffff810f7b39&gt;] enqueue_task_fair+0x59/0xab0
   [&lt;ffffffff8106dcd9&gt;] ? kvm_sched_clock_read+0x9/0x20
   [&lt;ffffffff8103fb39&gt;] ? sched_clock+0x9/0x10
   [&lt;ffffffff810e3fcc&gt;] activate_task+0x5c/0xa0
   [&lt;ffffffff810e4514&gt;] ttwu_do_activate+0x54/0xb0
   [&lt;ffffffff810e5cea&gt;] sched_ttwu_pending+0x7a/0xb0
   [&lt;ffffffff810e5e51&gt;] scheduler_ipi+0x61/0x170
   [&lt;ffffffff81059e7f&gt;] smp_trace_reschedule_interrupt+0x4f/0x2a0
   [&lt;ffffffff81893ba6&gt;] trace_reschedule_interrupt+0x96/0xa0
   &lt;EOI&gt;  [&lt;ffffffff8106e0d6&gt;] ? native_safe_halt+0x6/0x10
   [&lt;ffffffff8110fb1d&gt;] ? trace_hardirqs_on+0xd/0x10
   [&lt;ffffffff81040ac0&gt;] default_idle+0x20/0x1a0
   [&lt;ffffffff8104147f&gt;] arch_cpu_idle+0xf/0x20
   [&lt;ffffffff81102f8f&gt;] default_idle_call+0x2f/0x50
   [&lt;ffffffff8110332e&gt;] cpu_startup_entry+0x37e/0x450
   [&lt;ffffffff8105af70&gt;] start_secondary+0x160/0x1a0

Note the hang only occurs when echoing the above from a physical serial
console, not from an ssh session.

The bug is caused by a deadlock where the task is trying to grab the rq
lock twice because printk()'s aren't safe in sched code.

Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/20160613073209.gdvdybiruljbkn3p@treble
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w</title>
<updated>2016-08-10T10:54:43+00:00</updated>
<author>
<name>Andrey Ryabinin</name>
<email>aryabinin@virtuozzo.com</email>
</author>
<published>2016-06-09T12:20:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=543a87abcf8080f62a449d5566332c48b4eecddb'/>
<id>543a87abcf8080f62a449d5566332c48b4eecddb</id>
<content type='text'>
commit 57675cb976eff977aefb428e68e4e0236d48a9ff upstream.

Lengthy output of sysrq-w may take a lot of time on slow serial console.

Currently we reset NMI-watchdog on the current CPU to avoid spurious
lockup messages. Sometimes this doesn't work since softlockup watchdog
might trigger on another CPU which is waiting for an IPI to proceed.
We reset softlockup watchdogs on all CPUs, but we do this only after
listing all tasks, and this may be too late on a busy system.

So, reset watchdogs CPUs earlier, in for_each_process_thread() loop.

Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1465474805-14641-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 57675cb976eff977aefb428e68e4e0236d48a9ff upstream.

Lengthy output of sysrq-w may take a lot of time on slow serial console.

Currently we reset NMI-watchdog on the current CPU to avoid spurious
lockup messages. Sometimes this doesn't work since softlockup watchdog
might trigger on another CPU which is waiting for an IPI to proceed.
We reset softlockup watchdogs on all CPUs, but we do this only after
listing all tasks, and this may be too late on a busy system.

So, reset watchdogs CPUs earlier, in for_each_process_thread() loop.

Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1465474805-14641-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Handle NULL formats in hold_module_trace_bprintk_format()</title>
<updated>2016-07-27T15:42:15+00:00</updated>
<author>
<name>Steven Rostedt (Red Hat)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2016-06-17T20:10:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=faa84d836ea205dc53942c3d115f73f11227515c'/>
<id>faa84d836ea205dc53942c3d115f73f11227515c</id>
<content type='text'>
commit 70c8217acd4383e069fe1898bbad36ea4fcdbdcc upstream.

If a task uses a non constant string for the format parameter in
trace_printk(), then the trace_printk_fmt variable is set to NULL. This
variable is then saved in the __trace_printk_fmt section.

The function hold_module_trace_bprintk_format() checks to see if duplicate
formats are used by modules, and reuses them if so (saves them to the list
if it is new). But this function calls lookup_format() that does a strcmp()
to the value (which is now NULL) and can cause a kernel oops.

This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print
when not using bprintk()") which added "__used" to the trace_printk_fmt
variable, and before that, the kernel simply optimized it out (no NULL value
was saved).

The fix is simply to handle the NULL pointer in lookup_format() and have the
caller ignore the value if it was NULL.

Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com

Reported-by: xingzhen &lt;zhengjun.xing@intel.com&gt;
Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()")
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 70c8217acd4383e069fe1898bbad36ea4fcdbdcc upstream.

If a task uses a non constant string for the format parameter in
trace_printk(), then the trace_printk_fmt variable is set to NULL. This
variable is then saved in the __trace_printk_fmt section.

The function hold_module_trace_bprintk_format() checks to see if duplicate
formats are used by modules, and reuses them if so (saves them to the list
if it is new). But this function calls lookup_format() that does a strcmp()
to the value (which is now NULL) and can cause a kernel oops.

This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print
when not using bprintk()") which added "__used" to the trace_printk_fmt
variable, and before that, the kernel simply optimized it out (no NULL value
was saved).

The fix is simply to handle the NULL pointer in lookup_format() and have the
caller ignore the value if it was NULL.

Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com

Reported-by: xingzhen &lt;zhengjun.xing@intel.com&gt;
Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()")
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Fix cfs_rq avg tracking underflow</title>
<updated>2016-07-27T15:42:13+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2016-06-16T08:50:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=59eca1a4c30d3bc3e3c002dd4cfbc12bc43c1874'/>
<id>59eca1a4c30d3bc3e3c002dd4cfbc12bc43c1874</id>
<content type='text'>
commit 8974189222159154c55f24ddad33e3613960521a upstream.

As per commit:

  b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")

&gt; the code generated from update_cfs_rq_load_avg():
&gt;
&gt; 	if (atomic_long_read(&amp;cfs_rq-&gt;removed_load_avg)) {
&gt; 		s64 r = atomic_long_xchg(&amp;cfs_rq-&gt;removed_load_avg, 0);
&gt; 		sa-&gt;load_avg = max_t(long, sa-&gt;load_avg - r, 0);
&gt; 		sa-&gt;load_sum = max_t(s64, sa-&gt;load_sum - r * LOAD_AVG_MAX, 0);
&gt; 		removed_load = 1;
&gt; 	}
&gt;
&gt; turns into:
&gt;
&gt; ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
&gt; ffffffff8108706b:       48 85 c0                test   %rax,%rax
&gt; ffffffff8108706e:       74 40                   je     ffffffff810870b0 &lt;update_blocked_averages+0xc0&gt;
&gt; ffffffff81087070:       4c 89 f8                mov    %r15,%rax
&gt; ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
&gt; ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
&gt; ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
&gt; ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
&gt; ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
&gt; ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
&gt;
&gt; Which you'll note ends up with sa-&gt;load_avg -= r in memory at
&gt; ffffffff8108707a.

So I _should_ have looked at other unserialized users of -&gt;load_avg,
but alas. Luckily nikbor reported a similar /0 from task_h_load() which
instantly triggered recollection of this here problem.

Aside from the intermediate value hitting memory and causing problems,
there's another problem: the underflow detection relies on the signed
bit. This reduces the effective width of the variables, IOW its
effectively the same as having these variables be of signed type.

This patch changes to a different means of unsigned underflow
detection to not rely on the signed bit. This allows the variables to
use the 'full' unsigned range. And it does so with explicit LOAD -
STORE to ensure any intermediate value will never be visible in
memory, allowing these unserialized loads.

Note: GCC generates crap code for this, might warrant a look later.

Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
       maybe we should do clamping on add too.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Yuyang Du &lt;yuyang.du@intel.com&gt;
Cc: bsegall@google.com
Cc: kernel@kyup.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: steve.muckle@linaro.org
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8974189222159154c55f24ddad33e3613960521a upstream.

As per commit:

  b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")

&gt; the code generated from update_cfs_rq_load_avg():
&gt;
&gt; 	if (atomic_long_read(&amp;cfs_rq-&gt;removed_load_avg)) {
&gt; 		s64 r = atomic_long_xchg(&amp;cfs_rq-&gt;removed_load_avg, 0);
&gt; 		sa-&gt;load_avg = max_t(long, sa-&gt;load_avg - r, 0);
&gt; 		sa-&gt;load_sum = max_t(s64, sa-&gt;load_sum - r * LOAD_AVG_MAX, 0);
&gt; 		removed_load = 1;
&gt; 	}
&gt;
&gt; turns into:
&gt;
&gt; ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
&gt; ffffffff8108706b:       48 85 c0                test   %rax,%rax
&gt; ffffffff8108706e:       74 40                   je     ffffffff810870b0 &lt;update_blocked_averages+0xc0&gt;
&gt; ffffffff81087070:       4c 89 f8                mov    %r15,%rax
&gt; ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
&gt; ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
&gt; ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
&gt; ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
&gt; ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
&gt; ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
&gt;
&gt; Which you'll note ends up with sa-&gt;load_avg -= r in memory at
&gt; ffffffff8108707a.

So I _should_ have looked at other unserialized users of -&gt;load_avg,
but alas. Luckily nikbor reported a similar /0 from task_h_load() which
instantly triggered recollection of this here problem.

Aside from the intermediate value hitting memory and causing problems,
there's another problem: the underflow detection relies on the signed
bit. This reduces the effective width of the variables, IOW its
effectively the same as having these variables be of signed type.

This patch changes to a different means of unsigned underflow
detection to not rely on the signed bit. This allows the variables to
use the 'full' unsigned range. And it does so with explicit LOAD -
STORE to ensure any intermediate value will never be visible in
memory, allowing these unserialized loads.

Note: GCC generates crap code for this, might warrant a look later.

Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
       maybe we should do clamping on add too.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Yuyang Du &lt;yuyang.du@intel.com&gt;
Cc: bsegall@google.com
Cc: kernel@kyup.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: steve.muckle@linaro.org
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</pre>
</div>
</content>
</entry>
</feed>
