linux-stable.git/kernel/sched.c, branch linux-2.6.20.y

[PATCH] sched: fix next_interval determination in idle_balance()

2007-08-15T08:02:31+00:00

Fix massive SMP imbalance on NUMA nodes observed on 2.6.21.5 with CFS.
(and later on reproduced without CFS as well).

The intervals of domains that do not have SD_BALANCE_NEWIDLE must be
considered for the calculation of the time of the next balance.
Otherwise we may defer rebalancing forever and nodes might stay idle for
very long times.

Siddha also spotted that the conversion of the balance interval to
jiffies is missing. Fix that to.

From: Srivatsa Vaddagiri 

also continue the loop if !(sd->flags & SD_LOAD_BALANCE).

Tested-by: Paul E. McKenney 

It did in fact trigger under all three of mainline, CFS, and -rt
including CFS -- see below for a couple of emails from last Friday
giving results for these three on the AMD box (where it happened) and on
a single-quad NUMA-Q system (where it did not, at least not with such
severity).

Signed-off-by: Christoph Lameter 
Signed-off-by: Ingo Molnar 
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

sched: fix SMT scheduler bug

2007-03-09T18:50:27+00:00

The SMT scheduler incorrectly skips kernel threads even if they are
runnable (but they are preempted by a higher-prio user-space task which got
SMT-delayed by an even higher-priority task running on a sibling CPU).

Fix this for now by only doing the SMT-nice optimization if the
to-be-delayed task is the only runnable task.  (This should cover most of
the real-life cases anyway.)

This bug has been in the SMT scheduler since 2.6.17 or so, but has only
been noticed now by the active check in the dynticks code.

Signed-off-by: Ingo Molnar 
Cc: Michal Piotrowski 
Cc: Nick Piggin 
Cc: Thomas Gleixner 
Cc: Chuck Ebbert 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Fix posix-cpu-timer breakage caused by stale p->last_ran value

2007-03-09T18:50:24+00:00

Problem description at:
http://bugzilla.kernel.org/show_bug.cgi?id=8048

Commit b18ec80396834497933d77b81ec0918519f4e2a7 
    [PATCH] sched: improve migration accuracy
optimized the scheduler time calculations, but broke posix-cpu-timers.

The problem is that the p->last_ran value is not updated after a context
switch. So a subsequent call to current_sched_time() calculates with a
stale p->last_ran value, i.e. accounts the full time, which the task was
scheduled away.

Signed-off-by: Thomas Gleixner 
Acked-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

[PATCH] sched: tasks cannot run on cpus onlined after boot

2007-01-12T02:18:20+00:00

Commit 5c1e176781f43bc902a51e5832f789756bff911b ("sched: force /sbin/init
off isolated cpus") sets init's cpus_allowed to a subset of cpu_online_map
at boot time, which means that tasks won't be scheduled on cpus that are
added to the system later.

Make init's cpus_allowed a subset of cpu_possible_map instead.  This should
still preserve the behavior that Nick's change intended.

Thanks to Giuliano Pochini for reporting this and testing the fix:

http://ozlabs.org/pipermail/linuxppc-dev/2006-December/029397.html

Signed-off-by: Nathan Lynch 
Acked-by: Ingo Molnar 
Cc: Nick Piggin 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] sched: fix cond_resched_softirq() offset

2006-12-30T18:56:41+00:00

Remove the __resched_legal() check: it is conceptually broken.  The biggest
problem it had is that it can mask buggy cond_resched() calls.  A
cond_resched() call is only legal if we are not in an atomic context, with
two narrow exceptions:

 - if the system is booting
 - a reacquire_kernel_lock() down() done while PREEMPT_ACTIVE is set

But __resched_legal() hid this and just silently returned whenever
these primitives were called from invalid contexts. (Same goes for
cond_resched_locked() and cond_resched_softirq()).

Furthermore, the __legal_resched(0) call was buggy in that it caused
unnecessarily long softirq latencies via cond_resched_softirq().  (which is
only called from softirq-off sections, hence the code did nothing.)

The fix is to resurrect the efficiency of the might_sleep checks and to
only allow the narrow exceptions.

Signed-off-by: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] sched: remove __cpuinitdata anotation to cpu_isolated_map

2006-12-22T16:55:47+00:00

The structure cpu_isolated_map is used not only during initialization.
Multi-core scheduler configuration changes and exclusive cpusets
use this during run time.  During setting of sched_mc_power_savings
 policy, this structure is accessed to update sched_domains.

Signed-off-by: Tim Chen 
Acked-by: Suresh Siddha 
Acked-by: Ingo Molnar 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Conditionally check expected_preempt_count in __resched_legal()

2006-12-22T16:55:46+00:00

Commit 2d7d253548cffdce80f4e03664686e9ccb1b0ed7 ("fix cond_resched() fix")
introduced an 'expected_preempt_count' parameter to __resched_legal() to
fix a bug where it was returning a false negative when called from
cond_resched_lock() and preemption was enabled.

Unfortunately this broke things for when preemption is disabled.
preempt_count() will always return zero, thus failing the check against any
value of expected_preempt_count not equal to zero.  cond_resched_lock() for
example, passes an expected_preempt_count value of 1.

So fix the fix for the cond_resched() fix by skipping the check of
preempt_count() against expected_preempt_count when preemption is disabled.

Credit should go to Sunil Mushran for spotting the bug during testing.

Signed-off-by: Mark Fasheh 
Acked-by: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] sched: improve efficiency of sched_fork()

2006-12-21T08:11:51+00:00

Problem:
  sched_fork() has always called scheduler_tick() in some (unlikely)
  circumstances in order to update the current task in light of those
  circumstances.  It has always been the case that the work done by
  scheduler_tick() was more than was required to handle the problem in
  hand but no harm was done except for the waste of a few CPU cycles.

  However, the splitting of scheduler_tick() into two procedures in
  2.6.20-rc1 enables the wasted cycles to be saved as the new procedure
  task_running_tick() does all the work that is required to rectify the
  problem being handled.

Solution:
  Replace the call to scheduler_tick() in sched_fork() with a call to
  task_running_tick().

Signed-off-by: Peter Williams 
Acked-by: Ingo Molnar 
Signed-off-by: Linus Torvalds

[PATCH] lockdep: print irq-trace info on asserts

2006-12-13T17:05:50+00:00

When we print an assert due to scheduling-in-atomic bugs, and if lockdep
is enabled, then the IRQ tracing information of lockdep can be printed
to pinpoint the code location that disabled interrupts. This saved me
quite a bit of debugging time in cases where the backtrace did not
identify the irq-disabling site well enough.

Signed-off-by: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] kernel/sched.c: whitespace cleanups

2006-12-10T17:57:20+00:00

[akpm@osdl.org: additional cleanups]
Signed-off-by: Miguel Ojeda Sandonis 
Acked-by: Ingo Molnar 
Cc: Nick Piggin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds