<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/sched/core.c, branch v3.4.112</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sched/core: Fix TASK_DEAD race in finish_task_switch()</title>
<updated>2016-04-27T10:55:21+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-09-29T12:45:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a12321d34f35fb8eacb3f39d1d53eb8d6e52fa8a'/>
<id>a12321d34f35fb8eacb3f39d1d53eb8d6e52fa8a</id>
<content type='text'>
commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.

So the problem this patch is trying to address is as follows:

        CPU0                            CPU1

        context_switch(A, B)
                                        ttwu(A)
                                          LOCK A-&gt;pi_lock
                                          A-&gt;on_cpu == 0
        finish_task_switch(A)
          prev_state = A-&gt;state  &lt;-.
          WMB                      |
          A-&gt;on_cpu = 0;           |
          UNLOCK rq0-&gt;lock         |
                                   |    context_switch(C, A)
                                   `--  A-&gt;state = TASK_DEAD
          prev_state == TASK_DEAD
            put_task_struct(A)
                                        context_switch(A, C)
                                        finish_task_switch(A)
                                          A-&gt;state == TASK_DEAD
                                            put_task_struct(A)

The argument being that the WMB will allow the load of A-&gt;state on CPU0
to cross over and observe CPU1's store of A-&gt;state, which will then
result in a double-drop and use-after-free.

Now the comment states (and this was true once upon a long time ago)
that we need to observe A-&gt;state while holding rq-&gt;lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq-&gt;lock; it takes A-&gt;pi_lock these days.

We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.

The alternative this patch takes is: smp_store_release(&amp;A-&gt;on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.

Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a18 ("sched: Remove rq-&gt;lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[lizf: Backported to 3.4: use smb_mb() instead of smp_store_release(), which
 is not defined in 3.4.y]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.

So the problem this patch is trying to address is as follows:

        CPU0                            CPU1

        context_switch(A, B)
                                        ttwu(A)
                                          LOCK A-&gt;pi_lock
                                          A-&gt;on_cpu == 0
        finish_task_switch(A)
          prev_state = A-&gt;state  &lt;-.
          WMB                      |
          A-&gt;on_cpu = 0;           |
          UNLOCK rq0-&gt;lock         |
                                   |    context_switch(C, A)
                                   `--  A-&gt;state = TASK_DEAD
          prev_state == TASK_DEAD
            put_task_struct(A)
                                        context_switch(A, C)
                                        finish_task_switch(A)
                                          A-&gt;state == TASK_DEAD
                                            put_task_struct(A)

The argument being that the WMB will allow the load of A-&gt;state on CPU0
to cross over and observe CPU1's store of A-&gt;state, which will then
result in a double-drop and use-after-free.

Now the comment states (and this was true once upon a long time ago)
that we need to observe A-&gt;state while holding rq-&gt;lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq-&gt;lock; it takes A-&gt;pi_lock these days.

We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.

The alternative this patch takes is: smp_store_release(&amp;A-&gt;on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.

Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a18 ("sched: Remove rq-&gt;lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[lizf: Backported to 3.4: use smb_mb() instead of smp_store_release(), which
 is not defined in 3.4.y]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Queue RT tasks to head when prio drops</title>
<updated>2015-09-18T01:20:47+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-02-07T19:58:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aaedb09057b05c7c9e213dc465bff5f70e708535'/>
<id>aaedb09057b05c7c9e213dc465bff5f70e708535</id>
<content type='text'>
commit 81a44c5441d7f7d2c3dc9105f4d65ad0d5818617 upstream.

The following scenario does not work correctly:

Runqueue of CPUx contains two runnable and pinned tasks:

 T1: SCHED_FIFO, prio 80
 T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority
ceiling scenario):

 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
 ...
 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
 ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!

The same happens w/o actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.

This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to head of the prio
bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1391803122-4425-6-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 81a44c5441d7f7d2c3dc9105f4d65ad0d5818617 upstream.

The following scenario does not work correctly:

Runqueue of CPUx contains two runnable and pinned tasks:

 T1: SCHED_FIFO, prio 80
 T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority
ceiling scenario):

 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
 ...
 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
 ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!

The same happens w/o actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.

This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to head of the prio
bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1391803122-4425-6-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix RLIMIT_RTTIME when PI-boosting to RT</title>
<updated>2015-06-19T03:40:29+00:00</updated>
<author>
<name>Brian Silverman</name>
<email>brian@peloton-tech.com</email>
</author>
<published>2015-02-19T00:23:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=36cddaebe771b9476da10b724da435d5130bb0aa'/>
<id>36cddaebe771b9476da10b724da435d5130bb0aa</id>
<content type='text'>
commit 746db9443ea57fd9c059f62c4bfbf41cf224fe13 upstream.

When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.

I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.

Signed-off-by: Brian Silverman &lt;brian@peloton-tech.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: austin@peloton-tech.com
Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[lizf: Backported to 3.4: adjust contest]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 746db9443ea57fd9c059f62c4bfbf41cf224fe13 upstream.

When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.

I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.

Signed-off-by: Brian Silverman &lt;brian@peloton-tech.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: austin@peloton-tech.com
Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[lizf: Backported to 3.4: adjust contest]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>printk: rename printk_sched to printk_deferred</title>
<updated>2014-08-07T19:00:10+00:00</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2014-06-04T23:11:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f73ff697833654ee578b606ea746d15dc1220aab'/>
<id>f73ff697833654ee578b606ea746d15dc1220aab</id>
<content type='text'>
commit aac74dc495456412c4130a1167ce4beb6c1f0b38 upstream.

After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.

This only changes the function name. No logic changes.

Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Jiri Bohac &lt;jbohac@suse.cz&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit aac74dc495456412c4130a1167ce4beb6c1f0b38 upstream.

After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.

This only changes the function name. No logic changes.

Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Jiri Bohac &lt;jbohac@suse.cz&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix hotplug vs. set_cpus_allowed_ptr()</title>
<updated>2014-06-11T19:04:11+00:00</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2014-05-16T03:50:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d6e81d4f7b4e8cea2817c7568e3756798cd29aae'/>
<id>d6e81d4f7b4e8cea2817c7568e3756798cd29aae</id>
<content type='text'>
commit 6acbfb96976fc3350e30d964acb1dbbdf876d55e upstream.

Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Gautham R. Shenoy &lt;ego@linux.vnet.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Michael wang &lt;wangyun@linux.vnet.ibm.com&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Cc: Toshi Kani &lt;toshi.kani@hp.com&gt;
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6acbfb96976fc3350e30d964acb1dbbdf876d55e upstream.

Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Gautham R. Shenoy &lt;ego@linux.vnet.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Michael wang &lt;wangyun@linux.vnet.ibm.com&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Cc: Toshi Kani &lt;toshi.kani@hp.com&gt;
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/debug: Fix sd-&gt;*_idx limit range avoiding overflow</title>
<updated>2014-06-07T23:02:04+00:00</updated>
<author>
<name>libin</name>
<email>huawei.libin@huawei.com</email>
</author>
<published>2013-04-08T06:39:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e5f1ec5d39f6d59bfe5c53f4b28a94e3a4245e02'/>
<id>e5f1ec5d39f6d59bfe5c53f4b28a94e3a4245e02</id>
<content type='text'>
commit fd9b86d37a600488dbd80fe60cca46b822bff1cd upstream.

Commit 201c373e8e ("sched/debug: Limit sd-&gt;*_idx range on
sysctl") was an incomplete bug fix.

This patch fixes sd-&gt;*_idx limit range to [0 ~ CPU_LOAD_IDX_MAX-1]
avoiding array overflow caused by setting sd-&gt;*_idx to CPU_LOAD_IDX_MAX
on sysctl.

Signed-off-by: Libin &lt;huawei.libin@huawei.com&gt;
Cc: &lt;jiang.liu@huawei.com&gt;
Cc: &lt;guohanjun@huawei.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/51626610.2040607@huawei.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Rui Xiang &lt;rui.xiang@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit fd9b86d37a600488dbd80fe60cca46b822bff1cd upstream.

Commit 201c373e8e ("sched/debug: Limit sd-&gt;*_idx range on
sysctl") was an incomplete bug fix.

This patch fixes sd-&gt;*_idx limit range to [0 ~ CPU_LOAD_IDX_MAX-1]
avoiding array overflow caused by setting sd-&gt;*_idx to CPU_LOAD_IDX_MAX
on sysctl.

Signed-off-by: Libin &lt;huawei.libin@huawei.com&gt;
Cc: &lt;jiang.liu@huawei.com&gt;
Cc: &lt;guohanjun@huawei.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/51626610.2040607@huawei.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Rui Xiang &lt;rui.xiang@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/debug: Limit sd-&gt;*_idx range on sysctl</title>
<updated>2014-06-07T23:02:04+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung.kim@lge.com</email>
</author>
<published>2012-08-16T08:03:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4aff95ab8f985d14f585187b5f78c65a56dd716b'/>
<id>4aff95ab8f985d14f585187b5f78c65a56dd716b</id>
<content type='text'>
commit 201c373e8e4823700d3160d5c28e1ab18fd1193e upstream.

Various sd-&gt;*_idx's are used for refering the rq's load average table
when selecting a cpu to run.  However they can be set to any number
with sysctl knobs so that it can crash the kernel if something bad is
given. Fix it by limiting them into the actual range.

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1345104204-8317-1-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Rui Xiang &lt;rui.xiang@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 201c373e8e4823700d3160d5c28e1ab18fd1193e upstream.

Various sd-&gt;*_idx's are used for refering the rq's load average table
when selecting a cpu to run.  However they can be set to any number
with sysctl knobs so that it can crash the kernel if something bad is
given. Fix it by limiting them into the actual range.

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1345104204-8317-1-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Rui Xiang &lt;rui.xiang@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/nohz: Fix rq-&gt;cpu_load calculations some more</title>
<updated>2014-02-20T18:45:32+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-05-17T15:15:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f5a4c4b79e57f875b6788f6f8352ca246bfd8450'/>
<id>f5a4c4b79e57f875b6788f6f8352ca246bfd8450</id>
<content type='text'>
commit 5aaa0b7a2ed5b12692c9ffb5222182bd558d3146 upstream.

Follow up on commit 556061b00 ("sched/nohz: Fix rq-&gt;cpu_load[]
calculations") since while that fixed the busy case it regressed the
mostly idle case.

Add a callback from the nohz exit to also age the rq-&gt;cpu_load[]
array. This closes the hole where either there was no nohz load
balance pass during the nohz, or there was a 'significant' amount of
idle time between the last nohz balance and the nohz exit.

So we'll update unconditionally from the tick to not insert any
accidental 0 load periods while busy, and we try and catch up from
nohz idle balance and nohz exit. Both these are still prone to missing
a jiffy, but that has always been the case.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: pjt@google.com
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5aaa0b7a2ed5b12692c9ffb5222182bd558d3146 upstream.

Follow up on commit 556061b00 ("sched/nohz: Fix rq-&gt;cpu_load[]
calculations") since while that fixed the busy case it regressed the
mostly idle case.

Add a callback from the nohz exit to also age the rq-&gt;cpu_load[]
array. This closes the hole where either there was no nohz load
balance pass during the nohz, or there was a 'significant' amount of
idle time between the last nohz balance and the nohz exit.

So we'll update unconditionally from the tick to not insert any
accidental 0 load periods while busy, and we try and catch up from
nohz idle balance and nohz exit. Both these are still prone to missing
a jiffy, but that has always been the case.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: pjt@google.com
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/nohz: Fix rq-&gt;cpu_load[] calculations</title>
<updated>2014-02-20T18:45:32+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-05-11T15:31:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e2d51f27e382be7b70a755f3ea2fbbeacdb50834'/>
<id>e2d51f27e382be7b70a755f3ea2fbbeacdb50834</id>
<content type='text'>
commit 556061b00c9f2fd6a5524b6bde823ef12f299ecf upstream.

While investigating why the load-balancer did funny I found that the
rq-&gt;cpu_load[] tables were completely screwy.. a bit more digging
revealed that the updates that got through were missing ticks followed
by a catchup of 2 ticks.

The catchup assumes the cpu was idle during that time (since only nohz
can cause missed ticks and the machine is idle etc..) this means that
esp. the higher indices were significantly lower than they ought to
be.

The reason for this is that its not correct to compare against jiffies
on every jiffy on any other cpu than the cpu that updates jiffies.

This patch cludges around it by only doing the catch-up stuff from
nohz_idle_balance() and doing the regular stuff unconditionally from
the tick.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: pjt@google.com
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 556061b00c9f2fd6a5524b6bde823ef12f299ecf upstream.

While investigating why the load-balancer did funny I found that the
rq-&gt;cpu_load[] tables were completely screwy.. a bit more digging
revealed that the updates that got through were missing ticks followed
by a catchup of 2 ticks.

The catchup assumes the cpu was idle during that time (since only nohz
can cause missed ticks and the machine is idle etc..) this means that
esp. the higher indices were significantly lower than they ought to
be.

The reason for this is that its not correct to compare against jiffies
on every jiffy on any other cpu than the cpu that updates jiffies.

This patch cludges around it by only doing the catch-up stuff from
nohz_idle_balance() and doing the regular stuff unconditionally from
the tick.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: pjt@google.com
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Unthrottle rt runqueues in __disable_runtime()</title>
<updated>2014-02-13T19:51:19+00:00</updated>
<author>
<name>Peter Boonstoppel</name>
<email>pboonstoppel@nvidia.com</email>
</author>
<published>2012-08-09T22:34:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f61eb9ceb26cee3fdbb8c7a4920f171f7661fb4f'/>
<id>f61eb9ceb26cee3fdbb8c7a4920f171f7661fb4f</id>
<content type='text'>
commit a4c96ae319b8047f62dedbe1eac79e321c185749 upstream.

migrate_tasks() uses _pick_next_task_rt() to get tasks from the
real-time runqueues to be migrated. When rt_rq is throttled
_pick_next_task_rt() won't return anything, in which case
migrate_tasks() can't move all threads over and gets stuck in an
infinite loop.

Instead unthrottle rt runqueues before migrating tasks.

Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()

Signed-off-by: Peter Boonstoppel &lt;pboonstoppel@nvidia.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Link: http://lkml.kernel.org/r/5FBF8E85CA34454794F0F7ECBA79798F379D3648B7@HQMAIL04.nvidia.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[ lizf: backported to 3.4: adjust context ]
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a4c96ae319b8047f62dedbe1eac79e321c185749 upstream.

migrate_tasks() uses _pick_next_task_rt() to get tasks from the
real-time runqueues to be migrated. When rt_rq is throttled
_pick_next_task_rt() won't return anything, in which case
migrate_tasks() can't move all threads over and gets stuck in an
infinite loop.

Instead unthrottle rt runqueues before migrating tasks.

Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()

Signed-off-by: Peter Boonstoppel &lt;pboonstoppel@nvidia.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Link: http://lkml.kernel.org/r/5FBF8E85CA34454794F0F7ECBA79798F379D3648B7@HQMAIL04.nvidia.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[ lizf: backported to 3.4: adjust context ]
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
