linux.git/kernel/sched_fair.c, branch master

sched: Move all scheduler bits into kernel/sched/

2011-11-17T11:20:22+00:00

There's too many sched*.[ch] files in kernel/, give them their own
directory.

(No code changed, other than Makefile glue added.)

Signed-off-by: Peter Zijlstra 
Signed-off-by: Ingo Molnar

sched: Make separate sched*.c translation units

2011-11-17T11:20:19+00:00

Since once needs to do something at conferences and fixing compile
warnings doesn't actually require much if any attention I decided
to break up the sched.c #include "*.c" fest.

This further modularizes the scheduler code.

Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-x0fcd3mnp8f9c99grcpewmhi@git.kernel.org
Signed-off-by: Ingo Molnar

sched: Use jump labels to reduce overhead when bandwidth control is inactive

2011-11-16T07:48:18+00:00

Now that the linkage of jump-labels has been fixed they show a measurable
improvement in overhead for the enabled-but-unused case.

Workload is:

 'taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches
  bash -c "for ((i=0;i<5;i++)); do $(dirname $0)/pipe-test 20000; done"'

There's a speedup for all situations:

                 instructions            cycles                  branches
 -------------------------------------------------------------------------
 Intel Westmere
 base            806611770               745895590               146765378
 +jumplabel      803090165 (-0.44%)      713381840 (-4.36%)      144561130

 AMD Barcelona
 base            824657415               740055589               148855354
 +jumplabel      821056910 (-0.44%)      737558389 (-0.34%)      146635229

Signed-off-by: Paul Turner 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20111108042736.560831357@google.com
Signed-off-by: Ingo Molnar

sched: Fix buglet in return_cfs_rq_runtime()

2011-11-16T07:43:45+00:00

In return_cfs_rq_runtime() we want to return bandwidth when there are no
remaining tasks, not "return" when this is the case.

Signed-off-by: Paul Turner 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20111108042736.623812423@google.com
Signed-off-by: Ingo Molnar

sched: Avoid SMT siblings in select_idle_sibling() if possible

2011-11-16T07:43:43+00:00

Avoid select_idle_sibling() from picking a sibling thread if there's
an idle core that shares cache.

This fixes SMT balancing in the increasingly common case where there's
a shared cache core available to balance to.

Tested-by: Mike Galbraith 
Signed-off-by: Peter Zijlstra 
Cc: Suresh Siddha 
Link: http://lkml.kernel.org/r/1321350377.1421.55.camel@twins
Signed-off-by: Ingo Molnar

sched_fair: Fix a typo in the comment describing update_sd_lb_stats

2011-11-14T11:50:34+00:00

Signed-off-by: Hui Kang 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1318388459-4427-1-git-send-email-hkang.sunysb@gmail.com
Signed-off-by: Ingo Molnar

sched: Add a comment to effective_load() since it's a pain

2011-11-14T11:50:32+00:00

Every time I have to stare at this function I need to completely
reverse engineer its workings, about time I write a comment
explaining the thing.

Collected bits and pieces from previous changelogs, mostly:

  4be9daaa1b33701f011f4117f22dc1e45a3e6e34
  83378269a5fad98f562ebc0f09c349575e6cbfe1

Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1318518057.27731.2.camel@twins
Signed-off-by: Ingo Molnar

sched: Wrap scheduler p->cpus_allowed access

2011-10-06T10:46:56+00:00

This task is preparatory for the migrate_disable() implementation, but
stands on its own and provides a cleanup.

It currently only converts those sites required for task-placement.
Kosaki-san once mentioned replacing cpus_allowed with a proper
cpumask_t instead of the NR_CPUS sized array it currently is, that
would also require something like this.

Signed-off-by: Peter Zijlstra 
Acked-by: Thomas Gleixner 
Cc: KOSAKI Motohiro 
Link: http://lkml.kernel.org/n/tip-e42skvaddos99psip0vce41o@git.kernel.org
Signed-off-by: Ingo Molnar

sched: Request for idle balance during nohz idle load balance

2011-10-06T10:46:27+00:00

rq's idle_at_tick is set to idle/busy during the timer tick
depending on the cpu was idle or not. This will be used later in the load
balance that will be done in the softirq context (which is a process
context in -RT kernels).

For nohz kernels, for the cpu doing nohz idle load balance on behalf of
all the idle cpu's, its rq->idle_at_tick might have a stale value (which is
recorded when it got the timer tick presumably when it is busy).

As the nohz idle load balancing is also being done at the same place
as the regular load balancing, nohz idle load balancing was bailing out
when it sees rq's idle_at_tick not set.

Thus leading to poor system utilization.

Rename rq's idle_at_tick to idle_balance and set it when someone requests
for nohz idle balance on an idle cpu.

Reported-by: Srivatsa Vaddagiri 
Signed-off-by: Suresh Siddha 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20111003220934.892350549@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar

sched: Use resched IPI to kick off the nohz idle balance

2011-10-06T10:46:23+00:00

Current use of smp call function to kick the nohz idle balance can deadlock
in this scenario.

1. cpu-A did a generic_exec_single() to cpu-B and after queuing its call single
data (csd) to the call single queue, cpu-A took a timer interrupt.  Actual IPI
to cpu-B to process the call single queue is not yet sent.

2. As part of the timer interrupt handler, cpu-A decided to kick cpu-B
for the idle load balancing (sets cpu-B's rq->nohz_balance_kick to 1)
and __smp_call_function_single() with nowait will queue the csd to the
cpu-B's queue. But the generic_exec_single() won't send an IPI to cpu-B
as the call single queue was not empty.

3. cpu-A is busy with lot of interrupts

4. Meanwhile cpu-B is entering and exiting idle and noticed that it has
it's rq->nohz_balance_kick set to '1'. So it will go ahead and do the
idle load balancer and clear its rq->nohz_balance_kick.

5. At this point, csd queued as part of the step-2 above is still locked
and waiting to be serviced on cpu-B.

6. cpu-A is still busy with interrupt load and now it got another timer
interrupt and as part of it decided to kick cpu-B for another idle load
balancing (as it finds cpu-B's rq->nohz_balance_kick cleared in step-4
above) and does __smp_call_function_single() with the same csd that is
still locked.

7. And we get a deadlock waiting for the csd_lock() in the
__smp_call_function_single().

Main issue here is that cpu-B can service the idle load balancer kick
request from cpu-A even with out receiving the IPI and this lead to
doing multiple __smp_call_function_single() on the same csd leading to
deadlock.

To kick a cpu, scheduler already has the reschedule vector reserved. Use
that mechanism (kick_process()) instead of using the generic smp call function
mechanism to kick off the nohz idle load balancing and avoid the deadlock.

   [ This issue is present from 2.6.35+ kernels, but marking it -stable
     only from v3.0+ as the proposed fix depends on the scheduler_ipi()
     that is introduced recently. ]

Reported-by: Prarit Bhargava 
Signed-off-by: Suresh Siddha 
Cc: stable@kernel.org # v3.0+
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20111003220934.834943260@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar