<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/sched/core.c, branch linux-4.15.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sched, cgroup: Don't reject lower cpu.max on ancestors</title>
<updated>2018-03-28T16:22:57+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-01-22T19:26:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6823e0efcb013d09d7c350ebf7aa3dcd7733eab7'/>
<id>6823e0efcb013d09d7c350ebf7aa3dcd7733eab7</id>
<content type='text'>
commit c53593e5cb693d59d9e8b64fb3a79436bf99c3b3 upstream.

While adding cgroup2 interface for the cpu controller, 0d5936344f30
("sched: Implement interface for cgroup unified hierarchy") forgot to
update input validation and left it to reject cpu.max config if any
descendant has set a higher value.

cgroup2 officially supports delegation and a descendant must not be
able to restrict what its ancestors can configure.  For absolute
limits such as cpu.max and memory.max, this means that the config at
each level should only act as the upper limit at that level and
shouldn't interfere with what other cgroups can configure.

This patch updates config validation on cgroup2 so that the cpu
controller follows the same convention.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 0d5936344f30 ("sched: Implement interface for cgroup unified hierarchy")
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c53593e5cb693d59d9e8b64fb3a79436bf99c3b3 upstream.

While adding cgroup2 interface for the cpu controller, 0d5936344f30
("sched: Implement interface for cgroup unified hierarchy") forgot to
update input validation and left it to reject cpu.max config if any
descendant has set a higher value.

cgroup2 officially supports delegation and a descendant must not be
able to restrict what its ancestors can configure.  For absolute
limits such as cpu.max and memory.max, this means that the config at
each level should only act as the upper limit at that level and
shouldn't interfere with what other cgroups can configure.

This patch updates config validation on cgroup2 so that the cpu
controller follows the same convention.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 0d5936344f30 ("sched: Implement interface for cgroup unified hierarchy")
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Stop resched_cpu() from sending IPIs to offline CPUs</title>
<updated>2018-03-19T08:09:51+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2017-10-13T23:24:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fc82675c04f8ce18ebba73fb0c48dd5d98ada5e5'/>
<id>fc82675c04f8ce18ebba73fb0c48dd5d98ada5e5</id>
<content type='text'>
[ Upstream commit a0982dfa03efca6c239c52cabebcea4afb93ea6b ]

The rcutorture test suite occasionally provokes a splat due to invoking
resched_cpu() on an offline CPU:

WARNING: CPU: 2 PID: 8 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x37/0x40
Modules linked in:
CPU: 2 PID: 8 Comm: rcu_preempt Not tainted 4.14.0-rc4+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff902ede9daf00 task.stack: ffff96c50010c000
RIP: 0010:native_smp_send_reschedule+0x37/0x40
RSP: 0018:ffff96c50010fdb8 EFLAGS: 00010096
RAX: 000000000000002e RBX: ffff902edaab4680 RCX: 0000000000000003
RDX: 0000000080000003 RSI: 0000000000000000 RDI: 00000000ffffffff
RBP: ffff96c50010fdb8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 00000000299f36ae R12: 0000000000000001
R13: ffffffff9de64240 R14: 0000000000000001 R15: ffffffff9de64240
FS:  0000000000000000(0000) GS:ffff902edfc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f7d4c642 CR3: 000000001e0e2000 CR4: 00000000000006e0
Call Trace:
 resched_curr+0x8f/0x1c0
 resched_cpu+0x2c/0x40
 rcu_implicit_dynticks_qs+0x152/0x220
 force_qs_rnp+0x147/0x1d0
 ? sync_rcu_exp_select_cpus+0x450/0x450
 rcu_gp_kthread+0x5a9/0x950
 kthread+0x142/0x180
 ? force_qs_rnp+0x1d0/0x1d0
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40
Code: 14 01 0f 92 c0 84 c0 74 14 48 8b 05 14 4f f4 00 be fd 00 00 00 ff 90 a0 00 00 00 5d c3 89 fe 48 c7 c7 38 89 ca 9d e8 e5 56 08 00 &lt;0f&gt; ff 5d c3 0f 1f 44 00 00 8b 05 52 9e 37 02 85 c0 75 38 55 48
---[ end trace 26df9e5df4bba4ac ]---

This splat cannot be generated by expedited grace periods because they
always invoke resched_cpu() on the current CPU, which is good because
expedited grace periods require that resched_cpu() unconditionally
succeed.  However, other parts of RCU can tolerate resched_cpu() acting
as a no-op, at least as long as it doesn't happen too often.

This commit therefore makes resched_cpu() invoke resched_curr() only if
the CPU is either online or is the current CPU.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;

Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit a0982dfa03efca6c239c52cabebcea4afb93ea6b ]

The rcutorture test suite occasionally provokes a splat due to invoking
resched_cpu() on an offline CPU:

WARNING: CPU: 2 PID: 8 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x37/0x40
Modules linked in:
CPU: 2 PID: 8 Comm: rcu_preempt Not tainted 4.14.0-rc4+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff902ede9daf00 task.stack: ffff96c50010c000
RIP: 0010:native_smp_send_reschedule+0x37/0x40
RSP: 0018:ffff96c50010fdb8 EFLAGS: 00010096
RAX: 000000000000002e RBX: ffff902edaab4680 RCX: 0000000000000003
RDX: 0000000080000003 RSI: 0000000000000000 RDI: 00000000ffffffff
RBP: ffff96c50010fdb8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 00000000299f36ae R12: 0000000000000001
R13: ffffffff9de64240 R14: 0000000000000001 R15: ffffffff9de64240
FS:  0000000000000000(0000) GS:ffff902edfc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f7d4c642 CR3: 000000001e0e2000 CR4: 00000000000006e0
Call Trace:
 resched_curr+0x8f/0x1c0
 resched_cpu+0x2c/0x40
 rcu_implicit_dynticks_qs+0x152/0x220
 force_qs_rnp+0x147/0x1d0
 ? sync_rcu_exp_select_cpus+0x450/0x450
 rcu_gp_kthread+0x5a9/0x950
 kthread+0x142/0x180
 ? force_qs_rnp+0x1d0/0x1d0
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40
Code: 14 01 0f 92 c0 84 c0 74 14 48 8b 05 14 4f f4 00 be fd 00 00 00 ff 90 a0 00 00 00 5d c3 89 fe 48 c7 c7 38 89 ca 9d e8 e5 56 08 00 &lt;0f&gt; ff 5d c3 0f 1f 44 00 00 8b 05 52 9e 37 02 85 c0 75 38 55 48
---[ end trace 26df9e5df4bba4ac ]---

This splat cannot be generated by expedited grace periods because they
always invoke resched_cpu() on the current CPU, which is good because
expedited grace periods require that resched_cpu() unconditionally
succeed.  However, other parts of RCU can tolerate resched_cpu() acting
as a no-op, at least as long as it doesn't happen too often.

This commit therefore makes resched_cpu() invoke resched_curr() only if
the CPU is either online or is the current CPU.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;

Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>delayacct: Account blkio completion on the correct task</title>
<updated>2018-01-16T02:29:36+00:00</updated>
<author>
<name>Josh Snyder</name>
<email>joshs@netflix.com</email>
</author>
<published>2017-12-18T16:15:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c96f5471ce7d2aefd0dda560cc23f08ab00bc65d'/>
<id>c96f5471ce7d2aefd0dda560cc23f08ab00bc65d</id>
<content type='text'>
Before commit:

  e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")

delayacct_blkio_end() was called after context-switching into the task which
completed I/O.

This resulted in double counting: the task would account a delay both waiting
for I/O and for time spent in the runqueue.

With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up().
In ttwu, we have not yet context-switched. This is more correct, in that
the delay accounting ends when the I/O is complete.

But delayacct_blkio_end() relies on 'get_current()', and we have not yet
context-switched into the task whose I/O completed. This results in the
wrong task having its delay accounting statistics updated.

Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(),
so that it can update the statistics of the correct task.

Signed-off-by: Josh Snyder &lt;joshs@netflix.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Brendan Gregg &lt;bgregg@netflix.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-block@vger.kernel.org
Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
Link: http://lkml.kernel.org/r/1513613712-571-1-git-send-email-joshs@netflix.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before commit:

  e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")

delayacct_blkio_end() was called after context-switching into the task which
completed I/O.

This resulted in double counting: the task would account a delay both waiting
for I/O and for time spent in the runqueue.

With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up().
In ttwu, we have not yet context-switched. This is more correct, in that
the delay accounting ends when the I/O is complete.

But delayacct_blkio_end() relies on 'get_current()', and we have not yet
context-switched into the task whose I/O completed. This results in the
wrong task having its delay accounting statistics updated.

Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(),
so that it can update the statistics of the correct task.

Signed-off-by: Josh Snyder &lt;joshs@netflix.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Brendan Gregg &lt;bgregg@netflix.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-block@vger.kernel.org
Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
Link: http://lkml.kernel.org/r/1513613712-571-1-git-send-email-joshs@netflix.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/core: Fix kernel-doc warnings after code movement</title>
<updated>2017-12-11T15:10:42+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2017-12-03T21:19:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2064a5ab04707c55003e099e5abbf19a0826bbac'/>
<id>2064a5ab04707c55003e099e5abbf19a0826bbac</id>
<content type='text'>
Fix the following kernel-doc warnings after code restructuring:

  ../kernel/sched/core.c:5113: warning: No description found for parameter 't'
  ../kernel/sched/core.c:5113: warning: Excess function parameter 'interval' description in 'sched_rr_get_interval'

	get rid of set_fs()")

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: abca5fc535a3e ("sched_rr_get_interval(): move compat to native,
Link: http://lkml.kernel.org/r/995c6ded-b32e-bbe4-d9f5-4d42d121aff1@infradead.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix the following kernel-doc warnings after code restructuring:

  ../kernel/sched/core.c:5113: warning: No description found for parameter 't'
  ../kernel/sched/core.c:5113: warning: Excess function parameter 'interval' description in 'sched_rr_get_interval'

	get rid of set_fs()")

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: abca5fc535a3e ("sched_rr_get_interval(): move compat to native,
Link: http://lkml.kernel.org/r/995c6ded-b32e-bbe4-d9f5-4d42d121aff1@infradead.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2017-11-17T19:54:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-11-17T19:54:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=93f30c73ecd0281cf3685ef0e4e384980a176176'/>
<id>93f30c73ecd0281cf3685ef0e4e384980a176176</id>
<content type='text'>
Pull compat and uaccess updates from Al Viro:

 - {get,put}_compat_sigset() series

 - assorted compat ioctl stuff

 - more set_fs() elimination

 - a few more timespec64 conversions

 - several removals of pointless access_ok() in places where it was
   followed only by non-__ variants of primitives

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
  coredump: call do_unlinkat directly instead of sys_unlink
  fs: expose do_unlinkat for built-in callers
  ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
  ipmi: get rid of pointless access_ok()
  pi433: sanitize ioctl
  cxlflash: get rid of pointless access_ok()
  mtdchar: get rid of pointless access_ok()
  r128: switch compat ioctls to drm_ioctl_kernel()
  selection: get rid of field-by-field copyin
  VT_RESIZEX: get rid of field-by-field copyin
  i2c compat ioctls: move to -&gt;compat_ioctl()
  sched_rr_get_interval(): move compat to native, get rid of set_fs()
  mips: switch to {get,put}_compat_sigset()
  sparc: switch to {get,put}_compat_sigset()
  s390: switch to {get,put}_compat_sigset()
  ppc: switch to {get,put}_compat_sigset()
  parisc: switch to {get,put}_compat_sigset()
  get_compat_sigset()
  get rid of {get,put}_compat_itimerspec()
  io_getevents: Use timespec64 to represent timeouts
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull compat and uaccess updates from Al Viro:

 - {get,put}_compat_sigset() series

 - assorted compat ioctl stuff

 - more set_fs() elimination

 - a few more timespec64 conversions

 - several removals of pointless access_ok() in places where it was
   followed only by non-__ variants of primitives

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
  coredump: call do_unlinkat directly instead of sys_unlink
  fs: expose do_unlinkat for built-in callers
  ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
  ipmi: get rid of pointless access_ok()
  pi433: sanitize ioctl
  cxlflash: get rid of pointless access_ok()
  mtdchar: get rid of pointless access_ok()
  r128: switch compat ioctls to drm_ioctl_kernel()
  selection: get rid of field-by-field copyin
  VT_RESIZEX: get rid of field-by-field copyin
  i2c compat ioctls: move to -&gt;compat_ioctl()
  sched_rr_get_interval(): move compat to native, get rid of set_fs()
  mips: switch to {get,put}_compat_sigset()
  sparc: switch to {get,put}_compat_sigset()
  s390: switch to {get,put}_compat_sigset()
  ppc: switch to {get,put}_compat_sigset()
  parisc: switch to {get,put}_compat_sigset()
  get_compat_sigset()
  get rid of {get,put}_compat_itimerspec()
  io_getevents: Use timespec64 to represent timeouts
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2017-11-15T22:29:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-11-15T22:29:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=22714a2ba4b55737cd7d5299db7aaf1fa8287354'/>
<id>22714a2ba4b55737cd7d5299db7aaf1fa8287354</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:
 "Cgroup2 cpu controller support is finally merged.

   - Basic cpu statistics support to allow monitoring by default without
     the CPU controller enabled.

   - cgroup2 cpu controller support.

   - /sys/kernel/cgroup files to help dealing with new / optional
     features"

* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: export list of cgroups v2 features using sysfs
  cgroup: export list of delegatable control files using sysfs
  cgroup: mark @cgrp __maybe_unused in cpu_stat_show()
  MAINTAINERS: relocate cpuset.c
  cgroup, sched: Move basic cpu stats from cgroup.stat to cpu.stat
  sched: Implement interface for cgroup unified hierarchy
  sched: Misc preps for cgroup unified hierarchy interface
  sched/cputime: Add dummy cputime_adjust() implementation for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
  cgroup: statically initialize init_css_set-&gt;dfl_cgrp
  cgroup: Implement cgroup2 basic CPU usage accounting
  cpuacct: Introduce cgroup_account_cputime[_field]()
  sched/cputime: Expose cputime_adjust()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull cgroup updates from Tejun Heo:
 "Cgroup2 cpu controller support is finally merged.

   - Basic cpu statistics support to allow monitoring by default without
     the CPU controller enabled.

   - cgroup2 cpu controller support.

   - /sys/kernel/cgroup files to help dealing with new / optional
     features"

* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: export list of cgroups v2 features using sysfs
  cgroup: export list of delegatable control files using sysfs
  cgroup: mark @cgrp __maybe_unused in cpu_stat_show()
  MAINTAINERS: relocate cpuset.c
  cgroup, sched: Move basic cpu stats from cgroup.stat to cpu.stat
  sched: Implement interface for cgroup unified hierarchy
  sched: Misc preps for cgroup unified hierarchy interface
  sched/cputime: Add dummy cputime_adjust() implementation for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
  cgroup: statically initialize init_css_set-&gt;dfl_cgrp
  cgroup: Implement cgroup2 basic CPU usage accounting
  cpuacct: Introduce cgroup_account_cputime[_field]()
  sched/cputime: Expose cputime_adjust()
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2017-11-13T21:37:52+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-11-13T21:37:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3e2014637c50e5d6a77cd63d5db6c209fe29d1b1'/>
<id>3e2014637c50e5d6a77cd63d5db6c209fe29d1b1</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:
 "The main updates in this cycle were:

   - Group balancing enhancements and cleanups (Brendan Jackman)

   - Move CPU isolation related functionality into its separate
     kernel/sched/isolation.c file, with related 'housekeeping_*()'
     namespace and nomenclature et al. (Frederic Weisbecker)

   - Improve the interactive/cpu-intense fairness calculation (Josef
     Bacik)

   - Improve the PELT code and related cleanups (Peter Zijlstra)

   - Improve the logic of pick_next_task_fair() (Uladzislau Rezki)

   - Improve the RT IPI based balancing logic (Steven Rostedt)

   - Various micro-optimizations:

   - better !CONFIG_SCHED_DEBUG optimizations (Patrick Bellasi)

   - better idle loop (Cheng Jian)

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds
  sched/sysctl: Fix attributes of some extern declarations
  sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated
  sched/isolation: Add basic isolcpus flags
  sched/isolation: Move isolcpus= handling to the housekeeping code
  sched/isolation: Handle the nohz_full= parameter
  sched/isolation: Introduce housekeeping flags
  sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL
  sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu()
  sched/isolation: Use its own static key
  sched/isolation: Make the housekeeping cpumask private
  sched/isolation: Provide a dynamic off-case to housekeeping_any_cpu()
  sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version
  sched/isolation: Move housekeeping related code to its own file
  sched/idle: Micro-optimize the idle loop
  sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK
  x86/tsc: Append the 'tsc=' description for the 'tsc=unstable' boot parameter
  sched/rt: Simplify the IPI based RT balancing logic
  block/ioprio: Use a helper to check for RT prio
  sched/rt: Add a helper to test for a RT task
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull scheduler updates from Ingo Molnar:
 "The main updates in this cycle were:

   - Group balancing enhancements and cleanups (Brendan Jackman)

   - Move CPU isolation related functionality into its separate
     kernel/sched/isolation.c file, with related 'housekeeping_*()'
     namespace and nomenclature et al. (Frederic Weisbecker)

   - Improve the interactive/cpu-intense fairness calculation (Josef
     Bacik)

   - Improve the PELT code and related cleanups (Peter Zijlstra)

   - Improve the logic of pick_next_task_fair() (Uladzislau Rezki)

   - Improve the RT IPI based balancing logic (Steven Rostedt)

   - Various micro-optimizations:

   - better !CONFIG_SCHED_DEBUG optimizations (Patrick Bellasi)

   - better idle loop (Cheng Jian)

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds
  sched/sysctl: Fix attributes of some extern declarations
  sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated
  sched/isolation: Add basic isolcpus flags
  sched/isolation: Move isolcpus= handling to the housekeeping code
  sched/isolation: Handle the nohz_full= parameter
  sched/isolation: Introduce housekeeping flags
  sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL
  sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu()
  sched/isolation: Use its own static key
  sched/isolation: Make the housekeeping cpumask private
  sched/isolation: Provide a dynamic off-case to housekeeping_any_cpu()
  sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version
  sched/isolation: Move housekeeping related code to its own file
  sched/idle: Micro-optimize the idle loop
  sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK
  x86/tsc: Append the 'tsc=' description for the 'tsc=unstable' boot parameter
  sched/rt: Simplify the IPI based RT balancing logic
  block/ioprio: Use a helper to check for RT prio
  sched/rt: Add a helper to test for a RT task
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds</title>
<updated>2017-11-09T06:35:08+00:00</updated>
<author>
<name>Patrick Bellasi</name>
<email>patrick.bellasi@arm.com</email>
</author>
<published>2017-11-08T18:41:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=765cc3a4b224e22bf524fabe40284a524f37cdd0'/>
<id>765cc3a4b224e22bf524fabe40284a524f37cdd0</id>
<content type='text'>
When the kernel is compiled with !CONFIG_SCHED_DEBUG support, we expect that
all SCHED_FEAT are turned into compile time constants being propagated
to support compiler optimizations.

Specifically, we expect that code blocks like this:

   if (sched_feat(FEATURE_NAME) [&amp;&amp; &lt;other_conditions&gt;]) {
	/* FEATURE CODE */
   }

are turned into dead-code in case FEATURE_NAME defaults to FALSE, and thus
being removed by the compiler from the finale image.

For this mechanism to properly work it's required for the compiler to
have full access, from each translation unit, to whatever is the value
defined by the sched_feat macro. This macro is defined as:

   #define sched_feat(x) (sysctl_sched_features &amp; (1UL &lt;&lt; __SCHED_FEAT_##x))

and thus, the compiler can optimize that code only if the value of
sysctl_sched_features is visible within each translation unit.

Since:

   029632fbb ("sched: Make separate sched*.c translation units")

the scheduler code has been split into separate translation units
however the definition of sysctl_sched_features is part of
kernel/sched/core.c while, for all the other scheduler modules, it is
visible only via kernel/sched/sched.h as an:

   extern const_debug unsigned int sysctl_sched_features

Unfortunately, an extern reference does not allow the compiler to apply
constants propagation. Thus, on !CONFIG_SCHED_DEBUG kernel we still end up
with code to load a memory reference and (eventually) doing an unconditional
jump of a chunk of code.

This mechanism is unavoidable when sched_features can be turned on and off at
run-time. However, this is not the case for "production" kernels compiled with
!CONFIG_SCHED_DEBUG. In this case, sysctl_sched_features is just a constant value
which cannot be changed at run-time and thus memory loads and jumps can be
avoided altogether.

This patch fixes the case of !CONFIG_SCHED_DEBUG kernel by declaring a local version
of the sysctl_sched_features constant for each translation unit. This will
ultimately allow the compiler to perform constants propagation and dead-code
pruning.

Tests have been done, with !CONFIG_SCHED_DEBUG on a v4.14-rc8 with and without
the patch, by running 30 iterations of:

   perf bench sched messaging --pipe --thread --group 4 --loop 50000

on a 40 cores Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz using the
powersave governor to rule out variations due to frequency scaling.

Statistics on the reported completion time:

                   count     mean       std     min       99%     max
  v4.14-rc8         30.0  15.7831  0.176032  15.442  16.01226  16.014
  v4.14-rc8+patch   30.0  15.5033  0.189681  15.232  15.93938  15.962

... show a 1.8% speedup on average completion time and 0.5% speedup in the
99 percentile.

Signed-off-by: Patrick Bellasi &lt;patrick.bellasi@arm.com&gt;
Signed-off-by: Chris Redpath &lt;chris.redpath@arm.com&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Brendan Jackman &lt;brendan.jackman@arm.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: http://lkml.kernel.org/r/20171108184101.16006-1-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the kernel is compiled with !CONFIG_SCHED_DEBUG support, we expect that
all SCHED_FEAT are turned into compile time constants being propagated
to support compiler optimizations.

Specifically, we expect that code blocks like this:

   if (sched_feat(FEATURE_NAME) [&amp;&amp; &lt;other_conditions&gt;]) {
	/* FEATURE CODE */
   }

are turned into dead-code in case FEATURE_NAME defaults to FALSE, and thus
being removed by the compiler from the finale image.

For this mechanism to properly work it's required for the compiler to
have full access, from each translation unit, to whatever is the value
defined by the sched_feat macro. This macro is defined as:

   #define sched_feat(x) (sysctl_sched_features &amp; (1UL &lt;&lt; __SCHED_FEAT_##x))

and thus, the compiler can optimize that code only if the value of
sysctl_sched_features is visible within each translation unit.

Since:

   029632fbb ("sched: Make separate sched*.c translation units")

the scheduler code has been split into separate translation units
however the definition of sysctl_sched_features is part of
kernel/sched/core.c while, for all the other scheduler modules, it is
visible only via kernel/sched/sched.h as an:

   extern const_debug unsigned int sysctl_sched_features

Unfortunately, an extern reference does not allow the compiler to apply
constants propagation. Thus, on !CONFIG_SCHED_DEBUG kernel we still end up
with code to load a memory reference and (eventually) doing an unconditional
jump of a chunk of code.

This mechanism is unavoidable when sched_features can be turned on and off at
run-time. However, this is not the case for "production" kernels compiled with
!CONFIG_SCHED_DEBUG. In this case, sysctl_sched_features is just a constant value
which cannot be changed at run-time and thus memory loads and jumps can be
avoided altogether.

This patch fixes the case of !CONFIG_SCHED_DEBUG kernel by declaring a local version
of the sysctl_sched_features constant for each translation unit. This will
ultimately allow the compiler to perform constants propagation and dead-code
pruning.

Tests have been done, with !CONFIG_SCHED_DEBUG on a v4.14-rc8 with and without
the patch, by running 30 iterations of:

   perf bench sched messaging --pipe --thread --group 4 --loop 50000

on a 40 cores Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz using the
powersave governor to rule out variations due to frequency scaling.

Statistics on the reported completion time:

                   count     mean       std     min       99%     max
  v4.14-rc8         30.0  15.7831  0.176032  15.442  16.01226  16.014
  v4.14-rc8+patch   30.0  15.5033  0.189681  15.232  15.93938  15.962

... show a 1.8% speedup on average completion time and 0.5% speedup in the
99 percentile.

Signed-off-by: Patrick Bellasi &lt;patrick.bellasi@arm.com&gt;
Signed-off-by: Chris Redpath &lt;chris.redpath@arm.com&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Brendan Jackman &lt;brendan.jackman@arm.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: http://lkml.kernel.org/r/20171108184101.16006-1-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/isolation: Move isolcpus= handling to the housekeeping code</title>
<updated>2017-10-27T07:55:30+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2017-10-27T02:42:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=edb9382175c3ebdced8ffdb3e0f20052ad9fdbe9'/>
<id>edb9382175c3ebdced8ffdb3e0f20052ad9fdbe9</id>
<content type='text'>
We want to centralize the isolation features, to be done by the housekeeping
subsystem and scheduler domain isolation is a significant part of it.

No intended behaviour change, we just reuse the housekeeping cpumask
and core code.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Chris Metcalf &lt;cmetcalf@mellanox.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1509072159-31808-11-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We want to centralize the isolation features, to be done by the housekeeping
subsystem and scheduler domain isolation is a significant part of it.

No intended behaviour change, we just reuse the housekeeping cpumask
and core code.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Chris Metcalf &lt;cmetcalf@mellanox.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1509072159-31808-11-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/isolation: Introduce housekeeping flags</title>
<updated>2017-10-27T07:55:29+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2017-10-27T02:42:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=de201559df872f83d0c08fb4effe3efd28e6cbc8'/>
<id>de201559df872f83d0c08fb4effe3efd28e6cbc8</id>
<content type='text'>
Before we implement isolcpus under housekeeping, we need the isolation
features to be more finegrained. For example some people want NOHZ_FULL
without the full scheduler isolation, others want full scheduler
isolation without NOHZ_FULL.

So let's cut all these isolation features piecewise, at the risk of
overcutting it right now. We can still merge some flags later if they
always make sense together.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Chris Metcalf &lt;cmetcalf@mellanox.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1509072159-31808-9-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before we implement isolcpus under housekeeping, we need the isolation
features to be more finegrained. For example some people want NOHZ_FULL
without the full scheduler isolation, others want full scheduler
isolation without NOHZ_FULL.

So let's cut all these isolation features piecewise, at the risk of
overcutting it right now. We can still merge some flags later if they
always make sense together.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Chris Metcalf &lt;cmetcalf@mellanox.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1509072159-31808-9-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
