<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux/cpuset.h, branch linux-2.6.34.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sched: Make select_fallback_rq() cpuset friendly</title>
<updated>2011-01-06T23:07:50+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-03-15T09:10:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=20b1d1ed393ddf313006bb5645417e55ef264a40'/>
<id>20b1d1ed393ddf313006bb5645417e55ef264a40</id>
<content type='text'>
commit 9084bb8246ea935b98320554229e2f371f7f52fa upstream.

Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems
with select_fallback_rq(). It can be called from any context and can't use
any cpuset locks including task_lock(). It is called when the task doesn't
have online cpus in -&gt;cpus_allowed but ttwu/etc must be able to find a
suitable cpu.

I am not proud of this patch. Everything which needs such a fat comment
can't be good even if correct. But I'd prefer to not change the locking
rules in the code I hardly understand, and in any case I believe this
simple change make the code much more correct compared to deadlocks we
currently have.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;20100315091027.GA9155@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 9084bb8246ea935b98320554229e2f371f7f52fa upstream.

Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems
with select_fallback_rq(). It can be called from any context and can't use
any cpuset locks including task_lock(). It is called when the task doesn't
have online cpus in -&gt;cpus_allowed but ttwu/etc must be able to find a
suitable cpu.

I am not proud of this patch. Everything which needs such a fat comment
can't be good even if correct. But I'd prefer to not change the locking
rules in the code I hardly understand, and in any case I believe this
simple change make the code much more correct compared to deadlocks we
currently have.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;20100315091027.GA9155@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Kill the broken and deadlockable cpuset_lock/cpuset_cpus_allowed_locked code</title>
<updated>2011-01-06T23:07:49+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-03-15T09:10:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b16ba66ee81f4df8466ea8aaf0f193bd87e458fc'/>
<id>b16ba66ee81f4df8466ea8aaf0f193bd87e458fc</id>
<content type='text'>
commit 897f0b3c3ff40b443c84e271bef19bd6ae885195 upstream.

This patch just states the fact the cpusets/cpuhotplug interaction is
broken and removes the deadlockable code which only pretends to work.

- cpuset_lock() doesn't really work. It is needed for
  cpuset_cpus_allowed_locked() but we can't take this lock in
  try_to_wake_up()-&gt;select_fallback_rq() path.

- cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes
  callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex
  stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
  cpuset_lock() and hangs forever because CPU is already dead and thus
  T can't be scheduled.

- cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
  which is not irq-safe, but try_to_wake_up() can be called from irq.

Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
we currently do without CONFIG_CPUSETS.

Also, with or without this patch, with or without CONFIG_CPUSETS, the
callers of select_fallback_rq() can race with each other or with
set_cpus_allowed() pathes.

The subsequent patches try to to fix these problems.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;20100315091003.GA9123@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 897f0b3c3ff40b443c84e271bef19bd6ae885195 upstream.

This patch just states the fact the cpusets/cpuhotplug interaction is
broken and removes the deadlockable code which only pretends to work.

- cpuset_lock() doesn't really work. It is needed for
  cpuset_cpus_allowed_locked() but we can't take this lock in
  try_to_wake_up()-&gt;select_fallback_rq() path.

- cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes
  callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex
  stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
  cpuset_lock() and hangs forever because CPU is already dead and thus
  T can't be scheduled.

- cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
  which is not irq-safe, but try_to_wake_up() can be called from irq.

Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
we currently do without CONFIG_CPUSETS.

Also, with or without this patch, with or without CONFIG_CPUSETS, the
callers of select_fallback_rq() can race with each other or with
set_cpus_allowed() pathes.

The subsequent patches try to to fix these problems.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;20100315091003.GA9123@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset,mm: update tasks' mems_allowed in time</title>
<updated>2009-06-17T02:47:31+00:00</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2009-06-16T22:31:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=58568d2a8215cb6f55caf2332017d7bdff954e1c'/>
<id>58568d2a8215cb6f55caf2332017d7bdff954e1c</id>
<content type='text'>
Fix allocating page cache/slab object on the unallowed node when memory
spread is set by updating tasks' mems_allowed after its cpuset's mems is
changed.

In order to update tasks' mems_allowed in time, we must modify the code of
memory policy.  Because the memory policy is applied in the process's
context originally.  After applying this patch, one task directly
manipulates anothers mems_allowed, and we use alloc_lock in the
task_struct to protect mems_allowed and memory policy of the task.

But in the fast path, we didn't use lock to protect them, because adding a
lock may lead to performance regression.  But if we don't add a lock,the
task might see no nodes when changing cpuset's mems_allowed to some
non-overlapping set.  In order to avoid it, we set all new allowed nodes,
then clear newly disallowed ones.

[lee.schermerhorn@hp.com:
  The rework of mpol_new() to extract the adjusting of the node mask to
  apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
  with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
  allocation.  Fix this by adding the check for MPOL_PREFERRED and empty
  node mask to mpol_new_mpolicy().

  Remove the now unneeded 'nodes = NULL' from mpol_new().

  Note that mpol_new_mempolicy() is always called with a non-NULL
  'nodes' parameter now that it has been removed from mpol_new().
  Therefore, we don't need to test nodes for NULL before testing it for
  'empty'.  However, just to be extra paranoid, add a VM_BUG_ON() to
  verify this assumption.]
[lee.schermerhorn@hp.com:

  I don't think the function name 'mpol_new_mempolicy' is descriptive
  enough to differentiate it from mpol_new().

  This function applies cpuset set context, usually constraining nodes
  to those allowed by the cpuset.  However, when the 'RELATIVE_NODES flag
  is set, it also translates the nodes.  So I settled on
  'mpol_set_nodemask()', because the comment block for mpol_new() mentions
  that we need to call this function to "set nodes".

  Some additional minor line length, whitespace and typo cleanup.]
Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix allocating page cache/slab object on the unallowed node when memory
spread is set by updating tasks' mems_allowed after its cpuset's mems is
changed.

In order to update tasks' mems_allowed in time, we must modify the code of
memory policy.  Because the memory policy is applied in the process's
context originally.  After applying this patch, one task directly
manipulates anothers mems_allowed, and we use alloc_lock in the
task_struct to protect mems_allowed and memory policy of the task.

But in the fast path, we didn't use lock to protect them, because adding a
lock may lead to performance regression.  But if we don't add a lock,the
task might see no nodes when changing cpuset's mems_allowed to some
non-overlapping set.  In order to avoid it, we set all new allowed nodes,
then clear newly disallowed ones.

[lee.schermerhorn@hp.com:
  The rework of mpol_new() to extract the adjusting of the node mask to
  apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
  with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
  allocation.  Fix this by adding the check for MPOL_PREFERRED and empty
  node mask to mpol_new_mpolicy().

  Remove the now unneeded 'nodes = NULL' from mpol_new().

  Note that mpol_new_mempolicy() is always called with a non-NULL
  'nodes' parameter now that it has been removed from mpol_new().
  Therefore, we don't need to test nodes for NULL before testing it for
  'empty'.  However, just to be extra paranoid, add a VM_BUG_ON() to
  verify this assumption.]
[lee.schermerhorn@hp.com:

  I don't think the function name 'mpol_new_mempolicy' is descriptive
  enough to differentiate it from mpol_new().

  This function applies cpuset set context, usually constraining nodes
  to those allowed by the cpuset.  However, when the 'RELATIVE_NODES flag
  is set, it also translates the nodes.  So I settled on
  'mpol_set_nodemask()', because the comment block for mpol_new() mentions
  that we need to call this function to "set nodes".

  Some additional minor line length, whitespace and typo cleanup.]
Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpusets: replace zone allowed functions with node allowed</title>
<updated>2009-04-03T02:04:57+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2009-04-02T23:57:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a1bc5a4eee990a1f290735c8694d0aebdad095fa'/>
<id>a1bc5a4eee990a1f290735c8694d0aebdad095fa</id>
<content type='text'>
The cpuset_zone_allowed() variants are actually only a function of the
zone's node.

Cc: Paul Menage &lt;menage@google.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cpuset_zone_allowed() variants are actually only a function of the
zone's node.

Cc: Paul Menage &lt;menage@google.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpumask: use new cpumask_ functions in core code.</title>
<updated>2009-03-30T11:35:16+00:00</updated>
<author>
<name>Rusty Russell</name>
<email>rusty@rustcorp.com.au</email>
</author>
<published>2009-03-31T04:05:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aa85ea5b89c36c51200d795dd788139bd9b8cf50'/>
<id>aa85ea5b89c36c51200d795dd788139bd9b8cf50</id>
<content type='text'>
Impact: cleanup

Time to clean up remaining laggards using the old cpu_ functions.

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Trond.Myklebust@netapp.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Impact: cleanup

Time to clean up remaining laggards using the old cpu_ functions.

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Trond.Myklebust@netapp.com
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: remove remaining pointers to cpumask_t</title>
<updated>2009-01-08T16:31:11+00:00</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2009-01-08T02:08:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6af866af34a96fed24a55979a78b6f73bd4e8e87'/>
<id>6af866af34a96fed24a55979a78b6f73bd4e8e87</id>
<content type='text'>
Impact: cleanups, use new cpumask API

Final trivial cleanups: mainly s/cpumask_t/struct cpumask

Note there is a FIXME in generate_sched_domains(). A future patch will
change struct cpumask *doms to struct cpumask *doms[].
(I suppose Rusty will do this.)

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Acked-by: Mike Travis &lt;travis@sgi.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Impact: cleanups, use new cpumask API

Final trivial cleanups: mainly s/cpumask_t/struct cpumask

Note there is a FIXME in generate_sched_domains(). A future patch will
change struct cpumask *doms to struct cpumask *doms[].
(I suppose Rusty will do this.)

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Acked-by: Mike Travis &lt;travis@sgi.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>oom: print triggering task's cpuset and mems allowed</title>
<updated>2009-01-06T23:58:59+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2009-01-06T22:39:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=75aa199410359dc5fbcf9025ff7af98a9d20f0d5'/>
<id>75aa199410359dc5fbcf9025ff7af98a9d20f0d5</id>
<content type='text'>
When cpusets are enabled, it's necessary to print the triggering task's
set of allowable nodes so the subsequently printed meminfo can be
interpreted correctly.

We also print the task's cpuset name for informational purposes.

[rientjes@google.com: task lock current before dereferencing cpuset]
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When cpusets are enabled, it's necessary to print the triggering task's
set of allowable nodes so the subsequently printed meminfo can be
interpreted correctly.

We also print the task's cpuset name for informational purposes.

[rientjes@google.com: task lock current before dereferencing cpuset]
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: update top cpuset's mems after adding a node</title>
<updated>2008-11-20T02:49:58+00:00</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2008-11-19T23:36:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f481891fdc49d3d1b8a9674a1825d183069a805f'/>
<id>f481891fdc49d3d1b8a9674a1825d183069a805f</id>
<content type='text'>
After adding a node into the machine, top cpuset's mems isn't updated.

By reviewing the code, we found that the update function

  cpuset_track_online_nodes()

was invoked after node_states[N_ONLINE] changes.  It is wrong because
N_ONLINE just means node has pgdat, and if node has/added memory, we use
N_HIGH_MEMORY.  So, We should invoke the update function after
node_states[N_HIGH_MEMORY] changes, just like its commit says.

This patch fixes it.  And we use notifier of memory hotplug instead of
direct calling of cpuset_track_online_nodes().

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Acked-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Paul Menage &lt;menage@google.com
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After adding a node into the machine, top cpuset's mems isn't updated.

By reviewing the code, we found that the update function

  cpuset_track_online_nodes()

was invoked after node_states[N_ONLINE] changes.  It is wrong because
N_ONLINE just means node has pgdat, and if node has/added memory, we use
N_HIGH_MEMORY.  So, We should invoke the update function after
node_states[N_HIGH_MEMORY] changes, just like its commit says.

This patch fixes it.  And we use notifier of memory hotplug instead of
direct calling of cpuset_track_online_nodes().

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Acked-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Paul Menage &lt;menage@google.com
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: arch_reinit_sched_domains() must destroy domains to force rebuild</title>
<updated>2008-09-06T17:22:15+00:00</updated>
<author>
<name>Max Krasnyansky</name>
<email>maxk@qualcomm.com</email>
</author>
<published>2008-08-29T20:11:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dfb512ec4834116124da61d6c1ee10fd0aa32bd6'/>
<id>dfb512ec4834116124da61d6c1ee10fd0aa32bd6</id>
<content type='text'>
What I realized recently is that calling rebuild_sched_domains() in
arch_reinit_sched_domains() by itself is not enough when cpusets are enabled.
partition_sched_domains() code is trying to avoid unnecessary domain rebuilds
and will not actually rebuild anything if new domain masks match the old ones.

What this means is that doing
     echo 1 &gt; /sys/devices/system/cpu/sched_mc_power_savings
on a system with cpusets enabled will not take affect untill something changes
in the cpuset setup (ie new sets created or deleted).

This patch fixes restore correct behaviour where domains must be rebuilt in
order to enable MC powersaving flags.

Test on quad-core Core2 box with both CONFIG_CPUSETS and !CONFIG_CPUSETS.
Also tested on dual-core Core2 laptop. Lockdep is happy and things are working
as expected.

Signed-off-by: Max Krasnyansky &lt;maxk@qualcomm.com&gt;
Tested-by: Vaidyanathan Srinivasan &lt;svaidy@linux.vnet.ibm.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
What I realized recently is that calling rebuild_sched_domains() in
arch_reinit_sched_domains() by itself is not enough when cpusets are enabled.
partition_sched_domains() code is trying to avoid unnecessary domain rebuilds
and will not actually rebuild anything if new domain masks match the old ones.

What this means is that doing
     echo 1 &gt; /sys/devices/system/cpu/sched_mc_power_savings
on a system with cpusets enabled will not take affect untill something changes
in the cpuset setup (ie new sets created or deleted).

This patch fixes restore correct behaviour where domains must be rebuilt in
order to enable MC powersaving flags.

Test on quad-core Core2 box with both CONFIG_CPUSETS and !CONFIG_CPUSETS.
Also tested on dual-core Core2 laptop. Lockdep is happy and things are working
as expected.

Signed-off-by: Max Krasnyansky &lt;maxk@qualcomm.com&gt;
Tested-by: Vaidyanathan Srinivasan &lt;svaidy@linux.vnet.ibm.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)</title>
<updated>2008-07-18T11:22:25+00:00</updated>
<author>
<name>Max Krasnyansky</name>
<email>maxk@qualcomm.com</email>
</author>
<published>2008-07-15T11:43:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e761b7725234276a802322549cee5255305a0930'/>
<id>e761b7725234276a802322549cee5255305a0930</id>
<content type='text'>
This is based on Linus' idea of creating cpu_active_map that prevents
scheduler load balancer from migrating tasks to the cpu that is going
down.

It allows us to simplify domain management code and avoid unecessary
domain rebuilds during cpu hotplug event handling.

Please ignore the cpusets part for now. It needs some more work in order
to avoid crazy lock nesting. Although I did simplfy and unify domain
reinitialization logic. We now simply call partition_sched_domains() in
all the cases. This means that we're using exact same code paths as in
cpusets case and hence the test below cover cpusets too.
Cpuset changes to make rebuild_sched_domains() callable from various
contexts are in the separate patch (right next after this one).

This not only boots but also easily handles
	while true; do make clean; make -j 8; done
and
	while true; do on-off-cpu 1; done
at the same time.
(on-off-cpu 1 simple does echo 0/1 &gt; /sys/.../cpu1/online thing).

Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
this on right now in gnome-terminal and things are moving just fine.

Also this is running with most of the debug features enabled (lockdep,
mutex, etc) no BUG_ONs or lockdep complaints so far.

I believe I addressed all of the Dmitry's comments for original Linus'
version. I changed both fair and rt balancer to mask out non-active cpus.
And replaced cpu_is_offline() with !cpu_active() in the main scheduler
code where it made sense (to me).

Signed-off-by: Max Krasnyanskiy &lt;maxk@qualcomm.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Gregory Haskins &lt;ghaskins@novell.com&gt;
Cc: dmitry.adamushko@gmail.com
Cc: pj@sgi.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is based on Linus' idea of creating cpu_active_map that prevents
scheduler load balancer from migrating tasks to the cpu that is going
down.

It allows us to simplify domain management code and avoid unecessary
domain rebuilds during cpu hotplug event handling.

Please ignore the cpusets part for now. It needs some more work in order
to avoid crazy lock nesting. Although I did simplfy and unify domain
reinitialization logic. We now simply call partition_sched_domains() in
all the cases. This means that we're using exact same code paths as in
cpusets case and hence the test below cover cpusets too.
Cpuset changes to make rebuild_sched_domains() callable from various
contexts are in the separate patch (right next after this one).

This not only boots but also easily handles
	while true; do make clean; make -j 8; done
and
	while true; do on-off-cpu 1; done
at the same time.
(on-off-cpu 1 simple does echo 0/1 &gt; /sys/.../cpu1/online thing).

Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
this on right now in gnome-terminal and things are moving just fine.

Also this is running with most of the debug features enabled (lockdep,
mutex, etc) no BUG_ONs or lockdep complaints so far.

I believe I addressed all of the Dmitry's comments for original Linus'
version. I changed both fair and rt balancer to mask out non-active cpus.
And replaced cpu_is_offline() with !cpu_active() in the main scheduler
code where it made sense (to me).

Signed-off-by: Max Krasnyanskiy &lt;maxk@qualcomm.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Gregory Haskins &lt;ghaskins@novell.com&gt;
Cc: dmitry.adamushko@gmail.com
Cc: pj@sgi.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
</feed>
