<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/sched/fair.c, branch v4.5</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sched/numa: Fix use-after-free bug in the task_numa_compare</title>
<updated>2016-01-22T12:51:04+00:00</updated>
<author>
<name>Gavin Guo</name>
<email>gavin.guo@canonical.com</email>
</author>
<published>2016-01-20T04:36:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1dff76b92f69051e579bdc131e01500da9fa2a91'/>
<id>1dff76b92f69051e579bdc131e01500da9fa2a91</id>
<content type='text'>
The following message can be observed on the Ubuntu v3.13.0-65 with KASan
backported:

  ==================================================================
  BUG: KASan: use after free in task_numa_find_cpu+0x64c/0x890 at addr ffff880dd393ecd8
  Read of size 8 by task qemu-system-x86/3998900
  =============================================================================
  BUG kmalloc-128 (Tainted: G    B        ): kasan: bad access detected
  -----------------------------------------------------------------------------

  INFO: Allocated in task_numa_fault+0xc1b/0xed0 age=41980 cpu=18 pid=3998890
	__slab_alloc+0x4f8/0x560
	__kmalloc+0x1eb/0x280
	task_numa_fault+0xc1b/0xed0
	do_numa_page+0x192/0x200
	handle_mm_fault+0x808/0x1160
	__do_page_fault+0x218/0x750
	do_page_fault+0x1a/0x70
	page_fault+0x28/0x30
	SyS_poll+0x66/0x1a0
	system_call_fastpath+0x1a/0x1f
  INFO: Freed in task_numa_free+0x1d2/0x200 age=62 cpu=18 pid=0
	__slab_free+0x2ab/0x3f0
	kfree+0x161/0x170
	task_numa_free+0x1d2/0x200
	finish_task_switch+0x1d2/0x210
	__schedule+0x5d4/0xc60
	schedule_preempt_disabled+0x40/0xc0
	cpu_startup_entry+0x2da/0x340
	start_secondary+0x28f/0x360
  Call Trace:
   [&lt;ffffffff81a6ce35&gt;] dump_stack+0x45/0x56
   [&lt;ffffffff81244aed&gt;] print_trailer+0xfd/0x170
   [&lt;ffffffff8124ac36&gt;] object_err+0x36/0x40
   [&lt;ffffffff8124cbf9&gt;] kasan_report_error+0x1e9/0x3a0
   [&lt;ffffffff8124d260&gt;] kasan_report+0x40/0x50
   [&lt;ffffffff810dda7c&gt;] ? task_numa_find_cpu+0x64c/0x890
   [&lt;ffffffff8124bee9&gt;] __asan_load8+0x69/0xa0
   [&lt;ffffffff814f5c38&gt;] ? find_next_bit+0xd8/0x120
   [&lt;ffffffff810dda7c&gt;] task_numa_find_cpu+0x64c/0x890
   [&lt;ffffffff810de16c&gt;] task_numa_migrate+0x4ac/0x7b0
   [&lt;ffffffff810de523&gt;] numa_migrate_preferred+0xb3/0xc0
   [&lt;ffffffff810e0b88&gt;] task_numa_fault+0xb88/0xed0
   [&lt;ffffffff8120ef02&gt;] do_numa_page+0x192/0x200
   [&lt;ffffffff81211038&gt;] handle_mm_fault+0x808/0x1160
   [&lt;ffffffff810d7dbd&gt;] ? sched_clock_cpu+0x10d/0x160
   [&lt;ffffffff81068c52&gt;] ? native_load_tls+0x82/0xa0
   [&lt;ffffffff81a7bd68&gt;] __do_page_fault+0x218/0x750
   [&lt;ffffffff810c2186&gt;] ? hrtimer_try_to_cancel+0x76/0x160
   [&lt;ffffffff81a6f5e7&gt;] ? schedule_hrtimeout_range_clock.part.24+0xf7/0x1c0
   [&lt;ffffffff81a7c2ba&gt;] do_page_fault+0x1a/0x70
   [&lt;ffffffff81a772e8&gt;] page_fault+0x28/0x30
   [&lt;ffffffff8128cbd4&gt;] ? do_sys_poll+0x1c4/0x6d0
   [&lt;ffffffff810e64f6&gt;] ? enqueue_task_fair+0x4b6/0xaa0
   [&lt;ffffffff810233c9&gt;] ? sched_clock+0x9/0x10
   [&lt;ffffffff810cf70a&gt;] ? resched_task+0x7a/0xc0
   [&lt;ffffffff810d0663&gt;] ? check_preempt_curr+0xb3/0x130
   [&lt;ffffffff8128b5c0&gt;] ? poll_select_copy_remaining+0x170/0x170
   [&lt;ffffffff810d3bc0&gt;] ? wake_up_state+0x10/0x20
   [&lt;ffffffff8112a28f&gt;] ? drop_futex_key_refs.isra.14+0x1f/0x90
   [&lt;ffffffff8112d40e&gt;] ? futex_requeue+0x3de/0xba0
   [&lt;ffffffff8112e49e&gt;] ? do_futex+0xbe/0x8f0
   [&lt;ffffffff81022c89&gt;] ? read_tsc+0x9/0x20
   [&lt;ffffffff8111bd9d&gt;] ? ktime_get_ts+0x12d/0x170
   [&lt;ffffffff8108f699&gt;] ? timespec_add_safe+0x59/0xe0
   [&lt;ffffffff8128d1f6&gt;] SyS_poll+0x66/0x1a0
   [&lt;ffffffff81a830dd&gt;] system_call_fastpath+0x1a/0x1f

As commit 1effd9f19324 ("sched/numa: Fix unsafe get_task_struct() in
task_numa_assign()") points out, the rcu_read_lock() cannot protect the
task_struct from being freed in the finish_task_switch(). And the bug
happens in the process of calculation of imp which requires the access of
p-&gt;numa_faults being freed in the following path:

do_exit()
        current-&gt;flags |= PF_EXITING;
    release_task()
        ~~delayed_put_task_struct()~~
    schedule()
    ...
    ...
rq-&gt;curr = next;
    context_switch()
        finish_task_switch()
            put_task_struct()
                __put_task_struct()
		    task_numa_free()

The fix here to get_task_struct() early before end of dst_rq-&gt;lock to
protect the calculation process and also put_task_struct() in the
corresponding point if finally the dst_rq-&gt;curr somehow cannot be
assigned.

Additional credit to Liang Chen who helped fix the error logic and add the
put_task_struct() to the place it missed.

Signed-off-by: Gavin Guo &lt;gavin.guo@canonical.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: jay.vosburgh@canonical.com
Cc: liang.chen@canonical.com
Link: http://lkml.kernel.org/r/1453264618-17645-1-git-send-email-gavin.guo@canonical.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following message can be observed on the Ubuntu v3.13.0-65 with KASan
backported:

  ==================================================================
  BUG: KASan: use after free in task_numa_find_cpu+0x64c/0x890 at addr ffff880dd393ecd8
  Read of size 8 by task qemu-system-x86/3998900
  =============================================================================
  BUG kmalloc-128 (Tainted: G    B        ): kasan: bad access detected
  -----------------------------------------------------------------------------

  INFO: Allocated in task_numa_fault+0xc1b/0xed0 age=41980 cpu=18 pid=3998890
	__slab_alloc+0x4f8/0x560
	__kmalloc+0x1eb/0x280
	task_numa_fault+0xc1b/0xed0
	do_numa_page+0x192/0x200
	handle_mm_fault+0x808/0x1160
	__do_page_fault+0x218/0x750
	do_page_fault+0x1a/0x70
	page_fault+0x28/0x30
	SyS_poll+0x66/0x1a0
	system_call_fastpath+0x1a/0x1f
  INFO: Freed in task_numa_free+0x1d2/0x200 age=62 cpu=18 pid=0
	__slab_free+0x2ab/0x3f0
	kfree+0x161/0x170
	task_numa_free+0x1d2/0x200
	finish_task_switch+0x1d2/0x210
	__schedule+0x5d4/0xc60
	schedule_preempt_disabled+0x40/0xc0
	cpu_startup_entry+0x2da/0x340
	start_secondary+0x28f/0x360
  Call Trace:
   [&lt;ffffffff81a6ce35&gt;] dump_stack+0x45/0x56
   [&lt;ffffffff81244aed&gt;] print_trailer+0xfd/0x170
   [&lt;ffffffff8124ac36&gt;] object_err+0x36/0x40
   [&lt;ffffffff8124cbf9&gt;] kasan_report_error+0x1e9/0x3a0
   [&lt;ffffffff8124d260&gt;] kasan_report+0x40/0x50
   [&lt;ffffffff810dda7c&gt;] ? task_numa_find_cpu+0x64c/0x890
   [&lt;ffffffff8124bee9&gt;] __asan_load8+0x69/0xa0
   [&lt;ffffffff814f5c38&gt;] ? find_next_bit+0xd8/0x120
   [&lt;ffffffff810dda7c&gt;] task_numa_find_cpu+0x64c/0x890
   [&lt;ffffffff810de16c&gt;] task_numa_migrate+0x4ac/0x7b0
   [&lt;ffffffff810de523&gt;] numa_migrate_preferred+0xb3/0xc0
   [&lt;ffffffff810e0b88&gt;] task_numa_fault+0xb88/0xed0
   [&lt;ffffffff8120ef02&gt;] do_numa_page+0x192/0x200
   [&lt;ffffffff81211038&gt;] handle_mm_fault+0x808/0x1160
   [&lt;ffffffff810d7dbd&gt;] ? sched_clock_cpu+0x10d/0x160
   [&lt;ffffffff81068c52&gt;] ? native_load_tls+0x82/0xa0
   [&lt;ffffffff81a7bd68&gt;] __do_page_fault+0x218/0x750
   [&lt;ffffffff810c2186&gt;] ? hrtimer_try_to_cancel+0x76/0x160
   [&lt;ffffffff81a6f5e7&gt;] ? schedule_hrtimeout_range_clock.part.24+0xf7/0x1c0
   [&lt;ffffffff81a7c2ba&gt;] do_page_fault+0x1a/0x70
   [&lt;ffffffff81a772e8&gt;] page_fault+0x28/0x30
   [&lt;ffffffff8128cbd4&gt;] ? do_sys_poll+0x1c4/0x6d0
   [&lt;ffffffff810e64f6&gt;] ? enqueue_task_fair+0x4b6/0xaa0
   [&lt;ffffffff810233c9&gt;] ? sched_clock+0x9/0x10
   [&lt;ffffffff810cf70a&gt;] ? resched_task+0x7a/0xc0
   [&lt;ffffffff810d0663&gt;] ? check_preempt_curr+0xb3/0x130
   [&lt;ffffffff8128b5c0&gt;] ? poll_select_copy_remaining+0x170/0x170
   [&lt;ffffffff810d3bc0&gt;] ? wake_up_state+0x10/0x20
   [&lt;ffffffff8112a28f&gt;] ? drop_futex_key_refs.isra.14+0x1f/0x90
   [&lt;ffffffff8112d40e&gt;] ? futex_requeue+0x3de/0xba0
   [&lt;ffffffff8112e49e&gt;] ? do_futex+0xbe/0x8f0
   [&lt;ffffffff81022c89&gt;] ? read_tsc+0x9/0x20
   [&lt;ffffffff8111bd9d&gt;] ? ktime_get_ts+0x12d/0x170
   [&lt;ffffffff8108f699&gt;] ? timespec_add_safe+0x59/0xe0
   [&lt;ffffffff8128d1f6&gt;] SyS_poll+0x66/0x1a0
   [&lt;ffffffff81a830dd&gt;] system_call_fastpath+0x1a/0x1f

As commit 1effd9f19324 ("sched/numa: Fix unsafe get_task_struct() in
task_numa_assign()") points out, the rcu_read_lock() cannot protect the
task_struct from being freed in the finish_task_switch(). And the bug
happens in the process of calculation of imp which requires the access of
p-&gt;numa_faults being freed in the following path:

do_exit()
        current-&gt;flags |= PF_EXITING;
    release_task()
        ~~delayed_put_task_struct()~~
    schedule()
    ...
    ...
rq-&gt;curr = next;
    context_switch()
        finish_task_switch()
            put_task_struct()
                __put_task_struct()
		    task_numa_free()

The fix here to get_task_struct() early before end of dst_rq-&gt;lock to
protect the calculation process and also put_task_struct() in the
corresponding point if finally the dst_rq-&gt;curr somehow cannot be
assigned.

Additional credit to Liang Chen who helped fix the error logic and add the
put_task_struct() to the place it missed.

Signed-off-by: Gavin Guo &lt;gavin.guo@canonical.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: jay.vosburgh@canonical.com
Cc: liang.chen@canonical.com
Link: http://lkml.kernel.org/r/1453264618-17645-1-git-send-email-gavin.guo@canonical.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task()</title>
<updated>2016-01-06T10:06:29+00:00</updated>
<author>
<name>Yuyang Du</name>
<email>yuyang.du@intel.com</email>
</author>
<published>2015-12-16T23:34:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0905f04eb21fc1c2e690bed5d0418a061d56c225'/>
<id>0905f04eb21fc1c2e690bed5d0418a061d56c225</id>
<content type='text'>
If a newly created task is selected to go to a different CPU in fork
balance when it wakes up the first time, its load averages should
not be removed from the source CPU since they are never added to
it before. The same is also applicable to a never used group entity.

Fix it in remove_entity_load_avg(): when entity's last_update_time
is 0, simply return. This should precisely identify the case in
question, because in other migrations, the last_update_time is set
to 0 after remove_entity_load_avg().

Reported-by: Steve Muckle &lt;steve.muckle@linaro.org&gt;
Signed-off-by: Yuyang Du &lt;yuyang.du@intel.com&gt;
[peterz: cfs_rq_last_update_time]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Juri Lelli &lt;Juri.Lelli@arm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Patrick Bellasi &lt;patrick.bellasi@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: http://lkml.kernel.org/r/20151216233427.GJ28098@intel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a newly created task is selected to go to a different CPU in fork
balance when it wakes up the first time, its load averages should
not be removed from the source CPU since they are never added to
it before. The same is also applicable to a never used group entity.

Fix it in remove_entity_load_avg(): when entity's last_update_time
is 0, simply return. This should precisely identify the case in
question, because in other migrations, the last_update_time is set
to 0 after remove_entity_load_avg().

Reported-by: Steve Muckle &lt;steve.muckle@linaro.org&gt;
Signed-off-by: Yuyang Du &lt;yuyang.du@intel.com&gt;
[peterz: cfs_rq_last_update_time]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Juri Lelli &lt;Juri.Lelli@arm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Patrick Bellasi &lt;patrick.bellasi@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: http://lkml.kernel.org/r/20151216233427.GJ28098@intel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched/urgent' into sched/core, to pick up fixes before merging new patches</title>
<updated>2016-01-06T10:02:29+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2016-01-06T10:02:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=567bee2803cb46caeb6011de5b738fde33dc3896'/>
<id>567bee2803cb46caeb6011de5b738fde33dc3896</id>
<content type='text'>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Fix multiplication overflow on 32-bit systems</title>
<updated>2016-01-06T10:01:05+00:00</updated>
<author>
<name>Andrey Ryabinin</name>
<email>aryabinin@virtuozzo.com</email>
</author>
<published>2015-12-14T12:47:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9e0e83a1eca66f8369e5a02973f85aad65c32416'/>
<id>9e0e83a1eca66f8369e5a02973f85aad65c32416</id>
<content type='text'>
Make 'r' 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX'
on 32-bit systems:

	UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
	signed integer overflow:
	87950 * 47742 cannot be represented in type 'int'

The most likely effect of this bug are bad load average numbers
resulting in weird scheduling. It's also likely that this can
persist for a longer time - until the system goes idle for
a long time so that all load avg numbers get reset.

[ This is the CFS load average metric, not the procfs output, which
  is separate. ]

Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/1450097243-30137-1-git-send-email-aryabinin@virtuozzo.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make 'r' 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX'
on 32-bit systems:

	UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
	signed integer overflow:
	87950 * 47742 cannot be represented in type 'int'

The most likely effect of this bug are bad load average numbers
resulting in weird scheduling. It's also likely that this can
persist for a longer time - until the system goes idle for
a long time so that all load avg numbers get reset.

[ This is the CFS load average metric, not the procfs output, which
  is separate. ]

Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/1450097243-30137-1-git-send-email-aryabinin@virtuozzo.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Disable the task group load_avg update for the root_task_group</title>
<updated>2015-12-04T09:34:49+00:00</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hpe.com</email>
</author>
<published>2015-12-02T18:41:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aa0b7ae06387d40a988ce16a189082dee6e570bc'/>
<id>aa0b7ae06387d40a988ce16a189082dee6e570bc</id>
<content type='text'>
Currently, the update_tg_load_avg() function attempts to update the
tg's load_avg value whenever the load changes even for root_task_group
where the load_avg value will never be used. This patch will disable
the load_avg update when the given task group is the root_task_group.

Running a Java benchmark with noautogroup and a 4.3 kernel on a
16-socket IvyBridge-EX system, the amount of CPU time (as reported by
perf) consumed by task_tick_fair() which includes update_tg_load_avg()
decreased from 0.71% to 0.22%, a more than 3X reduction. The Max-jOPs
results also increased slightly from 983015 to 986449.

Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Yuyang Du &lt;yuyang.du@intel.com&gt;
Link: http://lkml.kernel.org/r/1449081710-20185-4-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, the update_tg_load_avg() function attempts to update the
tg's load_avg value whenever the load changes even for root_task_group
where the load_avg value will never be used. This patch will disable
the load_avg update when the given task group is the root_task_group.

Running a Java benchmark with noautogroup and a 4.3 kernel on a
16-socket IvyBridge-EX system, the amount of CPU time (as reported by
perf) consumed by task_tick_fair() which includes update_tg_load_avg()
decreased from 0.71% to 0.22%, a more than 3X reduction. The Max-jOPs
results also increased slightly from 983015 to 986449.

Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Yuyang Du &lt;yuyang.du@intel.com&gt;
Link: http://lkml.kernel.org/r/1449081710-20185-4-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Avoid redundant idle_cpu() call in update_sg_lb_stats()</title>
<updated>2015-12-04T09:34:47+00:00</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hpe.com</email>
</author>
<published>2015-11-25T19:09:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a426f99c91d1036767a7819aaaba6bd3191b7f06'/>
<id>a426f99c91d1036767a7819aaaba6bd3191b7f06</id>
<content type='text'>
Part of the responsibility of the update_sg_lb_stats() function is to
update the idle_cpus statistical counter in struct sg_lb_stats. This
check is done by calling idle_cpu(). The idle_cpu() function, in
turn, checks a number of fields within the run queue structure such
as rq-&gt;curr and rq-&gt;nr_running.

With the current layout of the run queue structure, rq-&gt;curr and
rq-&gt;nr_running are in separate cachelines. The rq-&gt;curr variable is
checked first followed by nr_running. As nr_running is also accessed
by update_sg_lb_stats() earlier, it makes no sense to load another
cacheline when nr_running is not 0 as idle_cpu() will always return
false in this case.

This patch eliminates this redundant cacheline load by checking the
cached nr_running before calling idle_cpu().

Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1448478580-26467-2-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Part of the responsibility of the update_sg_lb_stats() function is to
update the idle_cpus statistical counter in struct sg_lb_stats. This
check is done by calling idle_cpu(). The idle_cpu() function, in
turn, checks a number of fields within the run queue structure such
as rq-&gt;curr and rq-&gt;nr_running.

With the current layout of the run queue structure, rq-&gt;curr and
rq-&gt;nr_running are in separate cachelines. The rq-&gt;curr variable is
checked first followed by nr_running. As nr_running is also accessed
by update_sg_lb_stats() earlier, it makes no sense to load another
cacheline when nr_running is not 0 as idle_cpu() will always return
false in this case.

This patch eliminates this redundant cacheline load by checking the
cached nr_running before calling idle_cpu().

Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1448478580-26467-2-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Make it possible to account fair load avg consistently</title>
<updated>2015-12-04T09:34:42+00:00</updated>
<author>
<name>Byungchul Park</name>
<email>byungchul.park@lge.com</email>
</author>
<published>2015-10-23T16:16:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ad936d8658fd348338cb7d42c577dac77892b074'/>
<id>ad936d8658fd348338cb7d42c577dac77892b074</id>
<content type='text'>
The current code accounts for the time a task was absent from the fair
class (per ATTACH_AGE_LOAD). However it does not work correctly when a
task got migrated or moved to another cgroup while outside of the fair
class.

This patch tries to address that by aging on migration. We locklessly
read the 'last_update_time' stamp from both the old and new cfs_rq,
ages the load upto the old time, and sets it to the new time.

These timestamps should in general not be more than 1 tick apart from
one another, so there is a definite bound on things.

Signed-off-by: Byungchul Park &lt;byungchul.park@lge.com&gt;
[ Changelog, a few edits and !SMP build fix ]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current code accounts for the time a task was absent from the fair
class (per ATTACH_AGE_LOAD). However it does not work correctly when a
task got migrated or moved to another cgroup while outside of the fair
class.

This patch tries to address that by aging on migration. We locklessly
read the 'last_update_time' stamp from both the old and new cfs_rq,
ages the load upto the old time, and sets it to the new time.

These timestamps should in general not be more than 1 tick apart from
one another, so there is a definite bound on things.

Signed-off-by: Byungchul Park &lt;byungchul.park@lge.com&gt;
[ Changelog, a few edits and !SMP build fix ]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Modify the comment about lock assumptions in migrate_task_rq_fair()</title>
<updated>2015-11-23T08:48:20+00:00</updated>
<author>
<name>Byungchul Park</name>
<email>byungchul.park@lge.com</email>
</author>
<published>2015-11-18T00:34:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=525628c73bd6af65f27d927e699e7460d7d55ed3'/>
<id>525628c73bd6af65f27d927e699e7460d7d55ed3</id>
<content type='text'>
The comment describing migrate_task_rq_fair() says that the caller
should hold p-&gt;pi_lock. But in some cases the caller can hold
task_rq(p)-&gt;lock instead of p-&gt;pi_lock. So the comment is broken and
this patch fixes it.

Signed-off-by: Byungchul Park &lt;byungchul.park@lge.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1447806899-20303-1-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The comment describing migrate_task_rq_fair() says that the caller
should hold p-&gt;pi_lock. But in some cases the caller can hold
task_rq(p)-&gt;lock instead of p-&gt;pi_lock. So the comment is broken and
this patch fixes it.

Signed-off-by: Byungchul Park &lt;byungchul.park@lge.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1447806899-20303-1-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/core: Fix incorrect wait time and wait count statistics</title>
<updated>2015-11-23T08:48:17+00:00</updated>
<author>
<name>Joonwoo Park</name>
<email>joonwoop@codeaurora.org</email>
</author>
<published>2015-11-13T03:38:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3ea94de15ce9f3a217f6d0a7e9e0f48388902bb7'/>
<id>3ea94de15ce9f3a217f6d0a7e9e0f48388902bb7</id>
<content type='text'>
At present scheduler resets task's wait start timestamp when the task
migrates to another rq.  This misleads scheduler itself into reporting
less wait time than actual by omitting time spent for waiting prior to
migration and also more wait count than actual by counting migration as
wait end event which can be seen by trace or /proc/&lt;pid&gt;/sched with
CONFIG_SCHEDSTATS=y.

Carry forward migrating task's wait time prior to migration and
don't count migration as a wait end event to fix such statistics error.

In order to determine whether task is migrating mark task-&gt;on_rq with
TASK_ON_RQ_MIGRATING while dequeuing and enqueuing due to migration.

Signed-off-by: Joonwoo Park &lt;joonwoop@codeaurora.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: ohaugan@codeaurora.org
Link: http://lkml.kernel.org/r/20151113033854.GA4247@codeaurora.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At present scheduler resets task's wait start timestamp when the task
migrates to another rq.  This misleads scheduler itself into reporting
less wait time than actual by omitting time spent for waiting prior to
migration and also more wait count than actual by counting migration as
wait end event which can be seen by trace or /proc/&lt;pid&gt;/sched with
CONFIG_SCHEDSTATS=y.

Carry forward migrating task's wait time prior to migration and
don't count migration as a wait end event to fix such statistics error.

In order to determine whether task is migrating mark task-&gt;on_rq with
TASK_ON_RQ_MIGRATING while dequeuing and enqueuing due to migration.

Signed-off-by: Joonwoo Park &lt;joonwoop@codeaurora.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: ohaugan@codeaurora.org
Link: http://lkml.kernel.org/r/20151113033854.GA4247@codeaurora.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Remove old email address</title>
<updated>2015-11-23T08:44:58+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-11-16T10:08:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=90eec103b96e30401c0b846045bf8a1c7159b6da'/>
<id>90eec103b96e30401c0b846045bf8a1c7159b6da</id>
<content type='text'>
There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
