<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/sparse.c, branch linux-2.6.20.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>[PATCH] x86_64: allocate sparsemem memmap above 4G</title>
<updated>2007-08-15T08:02:24+00:00</updated>
<author>
<name>Zou Nan hai</name>
<email>nanhai.zou@intel.com</email>
</author>
<published>2007-06-01T07:46:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6f86cc30ff0f6610b6ec1d41d7d1ab1061e0cc47'/>
<id>6f86cc30ff0f6610b6ec1d41d7d1ab1061e0cc47</id>
<content type='text'>
On systems with huge amount of physical memory, VFS cache and memory memmap
may eat all available system memory under 4G, then the system may fail to
allocate swiotlb bounce buffer.

There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix dose
not cover sparsemem model.

This patch add fix to sparsemem model by first try to allocate memmap above
4G.

Signed-off-by: Zou Nan hai &lt;nanhai.zou@intel.com&gt;
Acked-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[chrisw: trivial backport]
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On systems with huge amount of physical memory, VFS cache and memory memmap
may eat all available system memory under 4G, then the system may fail to
allocate swiotlb bounce buffer.

There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix dose
not cover sparsemem model.

This patch add fix to sparsemem model by first try to allocate memmap above
4G.

Signed-off-by: Zou Nan hai &lt;nanhai.zou@intel.com&gt;
Acked-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[chrisw: trivial backport]
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] numa node ids are int, page_to_nid and zone_to_nid should return int</title>
<updated>2006-12-07T16:39:23+00:00</updated>
<author>
<name>Andy Whitcroft</name>
<email>apw@shadowen.org</email>
</author>
<published>2006-12-07T04:33:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=25ba77c141dbcd2602dd0171824d0d72aa023a01'/>
<id>25ba77c141dbcd2602dd0171824d0d72aa023a01</id>
<content type='text'>
NUMA node ids are passed as either int or unsigned int almost exclusivly
page_to_nid and zone_to_nid both return unsigned long.  This is a throw
back to when page_to_nid was a #define and was thus exposing the real type
of the page flags field.

In addition to fixing up the definitions of page_to_nid and zone_to_nid I
audited the users of these functions identifying the following incorrect
uses:

1) mm/page_alloc.c show_node() -- printk dumping the node id,
2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison
   against numa_node_id() which returns an int from cpu_to_node(), and
3) mm/mpolicy.c check_pte_range -- used as an index in node_isset which
   uses bit_set which in generic code takes an int.

Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Christoph Lameter &lt;clameter@engr.sgi.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
NUMA node ids are passed as either int or unsigned int almost exclusivly
page_to_nid and zone_to_nid both return unsigned long.  This is a throw
back to when page_to_nid was a #define and was thus exposing the real type
of the page flags field.

In addition to fixing up the definitions of page_to_nid and zone_to_nid I
audited the users of these functions identifying the following incorrect
uses:

1) mm/page_alloc.c show_node() -- printk dumping the node id,
2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison
   against numa_node_id() which returns an int from cpu_to_node(), and
3) mm/mpolicy.c check_pte_range -- used as an index in node_isset which
   uses bit_set which in generic code takes an int.

Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Christoph Lameter &lt;clameter@engr.sgi.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Get rid of zone_table[]</title>
<updated>2006-12-07T16:39:20+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-12-07T04:31:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=89689ae7f95995723fbcd5c116c47933a3bb8b13'/>
<id>89689ae7f95995723fbcd5c116c47933a3bb8b13</id>
<content type='text'>
The zone table is mostly not needed.  If we have a node in the page flags
then we can get to the zone via NODE_DATA() which is much more likely to be
already in the cpu cache.

In case of SMP and UP NODE_DATA() is a constant pointer which allows us to
access an exact replica of zonetable in the node_zones field.  In all of
the above cases there will be no need at all for the zone table.

The only remaining case is if in a NUMA system the node numbers do not fit
into the page flags.  In that case we make sparse generate a table that
maps sections to nodes and use that table to to figure out the node number.
 This table is sized to fit in a single cache line for the known 32 bit
NUMA platform which makes it very likely that the information can be
obtained without a cache miss.

For sparsemem the zone table seems to be have been fairly large based on
the maximum possible number of sections and the number of zones per node.
There is some memory saving by removing zone_table.  The main benefit is to
reduce the cache foootprint of the VM from the frequent lookups of zones.
Plus it simplifies the page allocator.

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The zone table is mostly not needed.  If we have a node in the page flags
then we can get to the zone via NODE_DATA() which is much more likely to be
already in the cpu cache.

In case of SMP and UP NODE_DATA() is a constant pointer which allows us to
access an exact replica of zonetable in the node_zones field.  In all of
the above cases there will be no need at all for the zone table.

The only remaining case is if in a NUMA system the node numbers do not fit
into the page flags.  In that case we make sparse generate a table that
maps sections to nodes and use that table to to figure out the node number.
 This table is sized to fit in a single cache line for the known 32 bit
NUMA platform which makes it very likely that the information can be
obtained without a cache miss.

For sparsemem the zone table seems to be have been fairly large based on
the maximum possible number of sections and the number of zones per node.
There is some memory saving by removing zone_table.  The main benefit is to
reduce the cache foootprint of the VM from the frequent lookups of zones.
Plus it simplifies the page allocator.

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] memory hotplug: __GFP_NOWARN is better for __kmalloc_section_memmap()</title>
<updated>2006-10-28T18:30:52+00:00</updated>
<author>
<name>Yasunori Goto</name>
<email>y-goto@jp.fujitsu.com</email>
</author>
<published>2006-10-28T17:38:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f2d0aa5bf8d4f7ae4cb1a7feebf5b1afddd0b9b0'/>
<id>f2d0aa5bf8d4f7ae4cb1a7feebf5b1afddd0b9b0</id>
<content type='text'>
Add __GFP_NOWARN flag to calling of __alloc_pages() in
__kmalloc_section_memmap().  It can reduce noisy failure message.

In ia64, section size is 1 GB, this means that order 8 pages are necessary
for each section's memmap.  It is often very hard requirement under heavy
memory pressure as you know.  So, __alloc_pages() gives up allocation and
shows many noisy stack traces which means no page for each sections.
(Current my environment shows 32 times of stack trace....)

But, __kmalloc_section_memmap() calls vmalloc() after failure of it, and it
can succeed allocation of memmap.  So, its stack trace warning becomes just
noisy.  I suppose it shouldn't be shown.

Signed-off-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add __GFP_NOWARN flag to calling of __alloc_pages() in
__kmalloc_section_memmap().  It can reduce noisy failure message.

In ia64, section size is 1 GB, this means that order 8 pages are necessary
for each section's memmap.  It is often very hard requirement under heavy
memory pressure as you know.  So, __alloc_pages() gives up allocation and
shows many noisy stack traces which means no page for each sections.
(Current my environment shows 32 times of stack trace....)

But, __kmalloc_section_memmap() calls vmalloc() after failure of it, and it
can succeed allocation of memmap.  So, its stack trace warning becomes just
noisy.  I suppose it shouldn't be shown.

Signed-off-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove obsolete #include &lt;linux/config.h&gt;</title>
<updated>2006-06-30T17:25:36+00:00</updated>
<author>
<name>Jörn Engel</name>
<email>joern@wohnheim.fh-wedel.de</email>
</author>
<published>2006-06-30T17:25:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6ab3d5624e172c553004ecc862bfeac16d9d68b7'/>
<id>6ab3d5624e172c553004ecc862bfeac16d9d68b7</id>
<content type='text'>
Signed-off-by: Jörn Engel &lt;joern@wohnheim.fh-wedel.de&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Jörn Engel &lt;joern@wohnheim.fh-wedel.de&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] spin/rwlock init cleanups</title>
<updated>2006-06-28T00:32:39+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2006-06-27T09:53:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=34af946a22724c4e2b204957f2b24b22a0fb121c'/>
<id>34af946a22724c4e2b204957f2b24b22a0fb121c</id>
<content type='text'>
locking init cleanups:

 - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
 - convert rwlocks in a similar manner

this patch was generated automatically.

Motivation:

 - cleanliness
 - lockdep needs control of lock initialization, which the open-coded
   variants do not give
 - it's also useful for -rt and for lock debugging in general

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
locking init cleanups:

 - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
 - convert rwlocks in a similar manner

this patch was generated automatically.

Motivation:

 - cleanliness
 - lockdep needs control of lock initialization, which the open-coded
   variants do not give
 - it's also useful for -rt and for lock debugging in general

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] sparsemem: record nid during memory present</title>
<updated>2006-06-23T14:42:51+00:00</updated>
<author>
<name>Andy Whitcroft</name>
<email>apw@shadowen.org</email>
</author>
<published>2006-06-23T09:03:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=30c253e6da655d73eb8bfe2adca9b8f4d82fb81e'/>
<id>30c253e6da655d73eb8bfe2adca9b8f4d82fb81e</id>
<content type='text'>
Record the node id as we mark sections for instantiation.  Use this nid
during instantiation to direct allocations.

Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Bob Picco &lt;bob.picco@hp.com&gt;
Cc: Jack Steiner &lt;steiner@sgi.com&gt;
Cc: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Martin Bligh &lt;mbligh@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Record the node id as we mark sections for instantiation.  Use this nid
during instantiation to direct allocations.

Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Bob Picco &lt;bob.picco@hp.com&gt;
Cc: Jack Steiner &lt;steiner@sgi.com&gt;
Cc: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Martin Bligh &lt;mbligh@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] SPARSEMEM incorrectly calculates section number</title>
<updated>2006-05-21T19:59:17+00:00</updated>
<author>
<name>Mike Kravetz</name>
<email>kravetz@us.ibm.com</email>
</author>
<published>2006-05-20T22:00:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=12783b002db1f02c29353c8f698a85514420b9f4'/>
<id>12783b002db1f02c29353c8f698a85514420b9f4</id>
<content type='text'>
A bad calculation/loop in __section_nr() could result in incorrect section
information being put into sysfs memory entries.  This primarily impacts
memory add operations as the sysfs information is used while onlining new
memory.

Fix suggested by Dave Hansen.

Note that the bug may not be obvious from the patch.  It actually occurs in
the function's return statement:

	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);

In the existing code, root_nr has already been multiplied by
SECTIONS_PER_ROOT.

Signed-off-by: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A bad calculation/loop in __section_nr() could result in incorrect section
information being put into sysfs memory entries.  This primarily impacts
memory add operations as the sysfs information is used while onlining new
memory.

Fix suggested by Dave Hansen.

Note that the bug may not be obvious from the patch.  It actually occurs in
the function's return statement:

	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);

In the existing code, root_nr has already been multiplied by
SECTIONS_PER_ROOT.

Signed-off-by: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] add slab_is_available() routine for boot code</title>
<updated>2006-05-15T18:20:56+00:00</updated>
<author>
<name>Mike Kravetz</name>
<email>kravetz@us.ibm.com</email>
</author>
<published>2006-05-15T16:44:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=39d24e64263cd3211705d3b61ea4171c65030921'/>
<id>39d24e64263cd3211705d3b61ea4171c65030921</id>
<content type='text'>
slab_is_available() indicates slab based allocators are available for use.
SPARSEMEM code needs to know this as it can be called at various times
during the boot process.

Signed-off-by: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
slab_is_available() indicates slab based allocators are available for use.
SPARSEMEM code needs to know this as it can be called at various times
during the boot process.

Signed-off-by: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] sparsemem interaction with memory add bug fixes</title>
<updated>2006-05-02T01:17:46+00:00</updated>
<author>
<name>Mike Kravetz</name>
<email>mjkravetz@verizon.net</email>
</author>
<published>2006-05-01T19:16:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=46a66eecdf7bc12562ecb492297447ed0e1ecf59'/>
<id>46a66eecdf7bc12562ecb492297447ed0e1ecf59</id>
<content type='text'>
This patch fixes two bugs with the way sparsemem interacts with memory add.
They are:

- memory leak if memmap for section already exists

- calling alloc_bootmem_node() after boot

These bugs were discovered and a first cut at the fixes were provided by
Arnd Bergmann &lt;arnd@arndb.de&gt; and Joel Schopp &lt;jschopp@us.ibm.com&gt;.

Signed-off-by: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Signed-off-by: Joel Schopp &lt;jschopp@austin.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch fixes two bugs with the way sparsemem interacts with memory add.
They are:

- memory leak if memmap for section already exists

- calling alloc_bootmem_node() after boot

These bugs were discovered and a first cut at the fixes were provided by
Arnd Bergmann &lt;arnd@arndb.de&gt; and Joel Schopp &lt;jschopp@us.ibm.com&gt;.

Signed-off-by: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Signed-off-by: Joel Schopp &lt;jschopp@austin.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
