<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/sparse.c, branch linux-2.6.25.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm: sparsemem memory_present() fix</title>
<updated>2008-04-16T02:30:19+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2008-04-15T23:40:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bead9a3abd15710b0bdfd418daef606722d86282'/>
<id>bead9a3abd15710b0bdfd418daef606722d86282</id>
<content type='text'>
Fix memory corruption and crash on 32-bit x86 systems.

If a !PAE x86 kernel is booted on a 32-bit system with more than 4GB of
RAM, then we call memory_present() with a start/end that goes outside
the scope of MAX_PHYSMEM_BITS.

That causes this loop to happily walk over the limit of the sparse
memory section map:

    for (pfn = start; pfn &lt; end; pfn += PAGES_PER_SECTION) {
                unsigned long section = pfn_to_section_nr(pfn);
                struct mem_section *ms;

                sparse_index_init(section, nid);
                set_section_nid(section, nid);

                ms = __nr_to_section(section);
                if (!ms-&gt;section_mem_map)
                        ms-&gt;section_mem_map = sparse_encode_early_nid(nid) |
			                                SECTION_MARKED_PRESENT;

'ms' will be out of bounds and we'll corrupt a small amount of memory by
encoding the node ID and writing SECTION_MARKED_PRESENT (==0x1) over it.

The corruption might happen when encoding a non-zero node ID, or due to
the SECTION_MARKED_PRESENT which is 0x1:

	mmzone.h:#define	SECTION_MARKED_PRESENT	(1UL&lt;&lt;0)

The fix is to sanity check anything the architecture passes to
sparsemem.

This bug seems to be rather old (as old as sparsemem support itself),
but the exact incarnation depended on random details like configs, which
made this bug more prominent in v2.6.25-to-be.

An additional enhancement might be to print a warning about ignored or
trimmed memory ranges.

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Tested-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Cc: Yinghai Lu &lt;Yinghai.Lu@sun.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix memory corruption and crash on 32-bit x86 systems.

If a !PAE x86 kernel is booted on a 32-bit system with more than 4GB of
RAM, then we call memory_present() with a start/end that goes outside
the scope of MAX_PHYSMEM_BITS.

That causes this loop to happily walk over the limit of the sparse
memory section map:

    for (pfn = start; pfn &lt; end; pfn += PAGES_PER_SECTION) {
                unsigned long section = pfn_to_section_nr(pfn);
                struct mem_section *ms;

                sparse_index_init(section, nid);
                set_section_nid(section, nid);

                ms = __nr_to_section(section);
                if (!ms-&gt;section_mem_map)
                        ms-&gt;section_mem_map = sparse_encode_early_nid(nid) |
			                                SECTION_MARKED_PRESENT;

'ms' will be out of bounds and we'll corrupt a small amount of memory by
encoding the node ID and writing SECTION_MARKED_PRESENT (==0x1) over it.

The corruption might happen when encoding a non-zero node ID, or due to
the SECTION_MARKED_PRESENT which is 0x1:

	mmzone.h:#define	SECTION_MARKED_PRESENT	(1UL&lt;&lt;0)

The fix is to sanity check anything the architecture passes to
sparsemem.

This bug seems to be rather old (as old as sparsemem support itself),
but the exact incarnation depended on random details like configs, which
made this bug more prominent in v2.6.25-to-be.

An additional enhancement might be to print a warning about ignored or
trimmed memory ranges.

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Tested-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Cc: Yinghai Lu &lt;Yinghai.Lu@sun.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix section mismatch warning in sparse.c</title>
<updated>2008-02-05T17:44:19+00:00</updated>
<author>
<name>Sam Ravnborg</name>
<email>sam@ravnborg.org</email>
</author>
<published>2008-02-05T06:29:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a322f8ab66f50b6c0dcdb59abae84fede7a5fded'/>
<id>a322f8ab66f50b6c0dcdb59abae84fede7a5fded</id>
<content type='text'>
Fix following warning:
WARNING: mm/built-in.o(.text+0x22069): Section mismatch in reference from the function sparse_early_usemap_alloc() to the function .init.text:__alloc_bootmem_node()

static sparse_early_usemap_alloc() were used only by sparse_init()
and with sparse_init() annotated _init it is safe to
annotate sparse_early_usemap_alloc with __init too.

Signed-off-by: Sam Ravnborg &lt;sam@ravnborg.org&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix following warning:
WARNING: mm/built-in.o(.text+0x22069): Section mismatch in reference from the function sparse_early_usemap_alloc() to the function .init.text:__alloc_bootmem_node()

static sparse_early_usemap_alloc() were used only by sparse_init()
and with sparse_init() annotated _init it is safe to
annotate sparse_early_usemap_alloc with __init too.

Signed-off-by: Sam Ravnborg &lt;sam@ravnborg.org&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>is_vmalloc_addr(): Check if an address is within the vmalloc boundaries</title>
<updated>2008-02-05T17:44:14+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2008-02-05T06:28:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9e2779fa281cfda13ac060753d674bbcaa23367e'/>
<id>9e2779fa281cfda13ac060753d674bbcaa23367e</id>
<content type='text'>
Checking if an address is a vmalloc address is done in a couple of places.
Define a common version in mm.h and replace the other checks.

Again the include structures suck.  The definition of VMALLOC_START and
VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included
there.

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Checking if an address is a vmalloc address is done in a couple of places.
Define a common version in mm.h and replace the other checks.

Again the include structures suck.  The definition of VMALLOC_START and
VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included
there.

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/sparse.c: improve the error handling for sparse_add_one_section()</title>
<updated>2007-12-18T03:28:16+00:00</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2007-12-18T00:19:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bbd0682596f7a434467ee551fee18d5f0b818539'/>
<id>bbd0682596f7a434467ee551fee18d5f0b818539</id>
<content type='text'>
Improve the error handling for mm/sparse.c::sparse_add_one_section().  And I
see no reason to check 'usemap' until holding the 'pgdat_resize_lock'.

[geoffrey.levand@am.sony.com: sparse_index_init() returns -EEXIST]
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Acked-by: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: WANG Cong &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Geoff Levand &lt;geoffrey.levand@am.sony.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Improve the error handling for mm/sparse.c::sparse_add_one_section().  And I
see no reason to check 'usemap' until holding the 'pgdat_resize_lock'.

[geoffrey.levand@am.sony.com: sparse_index_init() returns -EEXIST]
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Acked-by: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: WANG Cong &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Geoff Levand &lt;geoffrey.levand@am.sony.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/sparse.c: check the return value of sparse_index_alloc()</title>
<updated>2007-12-18T03:28:16+00:00</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2007-12-18T00:19:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=af0cd5a7c3cded50c25e98acd94912d17a0eb914'/>
<id>af0cd5a7c3cded50c25e98acd94912d17a0eb914</id>
<content type='text'>
Since sparse_index_alloc() can return NULL on memory allocation failure,
we must deal with the failure condition when calling it.

Signed-off-by: WANG Cong &lt;xiyou.wangcong@gmail.com&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since sparse_index_alloc() can return NULL on memory allocation failure,
we must deal with the failure condition when calling it.

Signed-off-by: WANG Cong &lt;xiyou.wangcong@gmail.com&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "x86_64: allocate sparsemem memmap above 4G"</title>
<updated>2007-10-29T21:05:37+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@woody.linux-foundation.org</email>
</author>
<published>2007-10-29T18:36:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6a22c57b8d2a62dea7280a6b2ac807a539ef0716'/>
<id>6a22c57b8d2a62dea7280a6b2ac807a539ef0716</id>
<content type='text'>
This reverts commit 2e1c49db4c640b35df13889b86b9d62215ade4b6.

First off, testing in Fedora has shown it to cause boot failures,
bisected down by Martin Ebourne, and reported by Dave Jobes.  So the
commit will likely be reverted in the 2.6.23 stable kernels.

Secondly, in the 2.6.24 model, x86-64 has now grown support for
SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
bug is not visible any more, it's become invisible due to the code just
being irrelevant and no longer enabled on the only architecture that
this ever affected.

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Tested-by: Martin Ebourne &lt;fedora@ebourne.me.uk&gt;
Cc: Zou Nan hai &lt;nanhai.zou@intel.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 2e1c49db4c640b35df13889b86b9d62215ade4b6.

First off, testing in Fedora has shown it to cause boot failures,
bisected down by Martin Ebourne, and reported by Dave Jobes.  So the
commit will likely be reverted in the 2.6.23 stable kernels.

Secondly, in the 2.6.24 model, x86-64 has now grown support for
SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
bug is not visible any more, it's become invisible due to the code just
being irrelevant and no longer enabled on the only architecture that
this ever affected.

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Tested-by: Martin Ebourne &lt;fedora@ebourne.me.uk&gt;
Cc: Zou Nan hai &lt;nanhai.zou@intel.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memory hotplug: Hot-add with sparsemem-vmemmap</title>
<updated>2007-10-16T16:43:02+00:00</updated>
<author>
<name>Yasunori Goto</name>
<email>y-goto@jp.fujitsu.com</email>
</author>
<published>2007-10-16T08:26:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=98f3cfc1dc7a53b629d43b7844a9b3f786213048'/>
<id>98f3cfc1dc7a53b629d43b7844a9b3f786213048</id>
<content type='text'>
This patch is to avoid panic when memory hot-add is executed with
sparsemem-vmemmap.  Current vmemmap-sparsemem code doesn't support memory
hot-add.  Vmemmap must be populated when hot-add.  This is for
2.6.23-rc2-mm2.

Todo: # Even if this patch is applied, the message "[xxxx-xxxx] potential
        offnode page_structs" is displayed. To allocate memmap on its node,
        memmap (and pgdat) must be initialized itself like chicken and
        egg relationship.

      # vmemmap_unpopulate will be necessary for followings.
         - For cancel hot-add due to error.
         - For unplug.

Signed-off-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch is to avoid panic when memory hot-add is executed with
sparsemem-vmemmap.  Current vmemmap-sparsemem code doesn't support memory
hot-add.  Vmemmap must be populated when hot-add.  This is for
2.6.23-rc2-mm2.

Todo: # Even if this patch is applied, the message "[xxxx-xxxx] potential
        offnode page_structs" is displayed. To allocate memmap on its node,
        memmap (and pgdat) must be initialized itself like chicken and
        egg relationship.

      # vmemmap_unpopulate will be necessary for followings.
         - For cancel hot-add due to error.
         - For unplug.

Signed-off-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix corruption of memmap on IA64 SPARSEMEM when mem_section is not a power of 2</title>
<updated>2007-10-16T16:43:00+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mel@csn.ul.ie</email>
</author>
<published>2007-10-16T08:25:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5c0e3066474b57c56ff0d88ca31d95bd14232fee'/>
<id>5c0e3066474b57c56ff0d88ca31d95bd14232fee</id>
<content type='text'>
There are problems in the use of SPARSEMEM and pageblock flags that causes
problems on ia64.

The first part of the problem is that units are incorrect in
SECTION_BLOCKFLAGS_BITS computation.  This results in a map_section's
section_mem_map being treated as part of a bitmap which isn't good.  This
was evident with an invalid virtual address when mem_init attempted to free
bootmem pages while relinquishing control from the bootmem allocator.

The second part of the problem occurs because the pageblock flags bitmap is
be located with the mem_section.  The SECTIONS_PER_ROOT computation using
sizeof (mem_section) may not be a power of 2 depending on the size of the
bitmap.  This renders masks and other such things not power of 2 base.
This issue was seen with SPARSEMEM_EXTREME on ia64.  This patch moves the
bitmap outside of mem_section and uses a pointer instead in the
mem_section.  The bitmaps are allocated when the section is being
initialised.

Note that sparse_early_usemap_alloc() does not use alloc_remap() like
sparse_early_mem_map_alloc().  The allocation required for the bitmap on
x86, the only architecture that uses alloc_remap is typically smaller than
a cache line.  alloc_remap() pads out allocations to the cache size which
would be a needless waste.

Credit to Bob Picco for identifying the original problem and effecting a
fix for the SECTION_BLOCKFLAGS_BITS calculation.  Credit to Andy Whitcroft
for devising the best way of allocating the bitmaps only when required for
the section.

[wli@holomorphy.com: warning fix]
Signed-off-by: Bob Picco &lt;bob.picco@hp.com&gt;
Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Signed-off-by: William Irwin &lt;bill.irwin@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are problems in the use of SPARSEMEM and pageblock flags that causes
problems on ia64.

The first part of the problem is that units are incorrect in
SECTION_BLOCKFLAGS_BITS computation.  This results in a map_section's
section_mem_map being treated as part of a bitmap which isn't good.  This
was evident with an invalid virtual address when mem_init attempted to free
bootmem pages while relinquishing control from the bootmem allocator.

The second part of the problem occurs because the pageblock flags bitmap is
be located with the mem_section.  The SECTIONS_PER_ROOT computation using
sizeof (mem_section) may not be a power of 2 depending on the size of the
bitmap.  This renders masks and other such things not power of 2 base.
This issue was seen with SPARSEMEM_EXTREME on ia64.  This patch moves the
bitmap outside of mem_section and uses a pointer instead in the
mem_section.  The bitmaps are allocated when the section is being
initialised.

Note that sparse_early_usemap_alloc() does not use alloc_remap() like
sparse_early_mem_map_alloc().  The allocation required for the bitmap on
x86, the only architecture that uses alloc_remap is typically smaller than
a cache line.  alloc_remap() pads out allocations to the cache size which
would be a needless waste.

Credit to Bob Picco for identifying the original problem and effecting a
fix for the SECTION_BLOCKFLAGS_BITS calculation.  Credit to Andy Whitcroft
for devising the best way of allocating the bitmaps only when required for
the section.

[wli@holomorphy.com: warning fix]
Signed-off-by: Bob Picco &lt;bob.picco@hp.com&gt;
Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Signed-off-by: William Irwin &lt;bill.irwin@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Generic Virtual Memmap support for SPARSEMEM</title>
<updated>2007-10-16T16:42:51+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2007-10-16T08:24:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8f6aac419bd590f535fb110875a51f7db2b62b5b'/>
<id>8f6aac419bd590f535fb110875a51f7db2b62b5b</id>
<content type='text'>
SPARSEMEM is a pretty nice framework that unifies quite a bit of code over all
the arches.  It would be great if it could be the default so that we can get
rid of various forms of DISCONTIG and other variations on memory maps.  So far
what has hindered this are the additional lookups that SPARSEMEM introduces
for virt_to_page and page_address.  This goes so far that the code to do this
has to be kept in a separate function and cannot be used inline.

This patch introduces a virtual memmap mode for SPARSEMEM, in which the memmap
is mapped into a virtually contigious area, only the active sections are
physically backed.  This allows virt_to_page page_address and cohorts become
simple shift/add operations.  No page flag fields, no table lookups, nothing
involving memory is required.

The two key operations pfn_to_page and page_to_page become:

   #define __pfn_to_page(pfn)      (vmemmap + (pfn))
   #define __page_to_pfn(page)     ((page) - vmemmap)

By having a virtual mapping for the memmap we allow simple access without
wasting physical memory.  As kernel memory is typically already mapped 1:1
this introduces no additional overhead.  The virtual mapping must be big
enough to allow a struct page to be allocated and mapped for all valid
physical pages.  This vill make a virtual memmap difficult to use on 32 bit
platforms that support 36 address bits.

However, if there is enough virtual space available and the arch already maps
its 1-1 kernel space using TLBs (f.e.  true of IA64 and x86_64) then this
technique makes SPARSEMEM lookups even more efficient than CONFIG_FLATMEM.
FLATMEM needs to read the contents of the mem_map variable to get the start of
the memmap and then add the offset to the required entry.  vmemmap is a
constant to which we can simply add the offset.

This patch has the potential to allow us to make SPARSMEM the default (and
even the only) option for most systems.  It should be optimal on UP, SMP and
NUMA on most platforms.  Then we may even be able to remove the other memory
models: FLATMEM, DISCONTIG etc.

[apw@shadowen.org: config cleanups, resplit code etc]
[kamezawa.hiroyu@jp.fujitsu.com: Fix sparsemem_vmemmap init]
[apw@shadowen.org: vmemmap: remove excess debugging]
[apw@shadowen.org: simplify initialisation code and reduce duplication]
[apw@shadowen.org: pull out the vmemmap code into its own file]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SPARSEMEM is a pretty nice framework that unifies quite a bit of code over all
the arches.  It would be great if it could be the default so that we can get
rid of various forms of DISCONTIG and other variations on memory maps.  So far
what has hindered this are the additional lookups that SPARSEMEM introduces
for virt_to_page and page_address.  This goes so far that the code to do this
has to be kept in a separate function and cannot be used inline.

This patch introduces a virtual memmap mode for SPARSEMEM, in which the memmap
is mapped into a virtually contigious area, only the active sections are
physically backed.  This allows virt_to_page page_address and cohorts become
simple shift/add operations.  No page flag fields, no table lookups, nothing
involving memory is required.

The two key operations pfn_to_page and page_to_page become:

   #define __pfn_to_page(pfn)      (vmemmap + (pfn))
   #define __page_to_pfn(page)     ((page) - vmemmap)

By having a virtual mapping for the memmap we allow simple access without
wasting physical memory.  As kernel memory is typically already mapped 1:1
this introduces no additional overhead.  The virtual mapping must be big
enough to allow a struct page to be allocated and mapped for all valid
physical pages.  This vill make a virtual memmap difficult to use on 32 bit
platforms that support 36 address bits.

However, if there is enough virtual space available and the arch already maps
its 1-1 kernel space using TLBs (f.e.  true of IA64 and x86_64) then this
technique makes SPARSEMEM lookups even more efficient than CONFIG_FLATMEM.
FLATMEM needs to read the contents of the mem_map variable to get the start of
the memmap and then add the offset to the required entry.  vmemmap is a
constant to which we can simply add the offset.

This patch has the potential to allow us to make SPARSMEM the default (and
even the only) option for most systems.  It should be optimal on UP, SMP and
NUMA on most platforms.  Then we may even be able to remove the other memory
models: FLATMEM, DISCONTIG etc.

[apw@shadowen.org: config cleanups, resplit code etc]
[kamezawa.hiroyu@jp.fujitsu.com: Fix sparsemem_vmemmap init]
[apw@shadowen.org: vmemmap: remove excess debugging]
[apw@shadowen.org: simplify initialisation code and reduce duplication]
[apw@shadowen.org: pull out the vmemmap code into its own file]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sparsemem: record when a section has a valid mem_map</title>
<updated>2007-10-16T16:42:51+00:00</updated>
<author>
<name>Andy Whitcroft</name>
<email>apw@shadowen.org</email>
</author>
<published>2007-10-16T08:24:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=540557b9439ec19668553830c90222f9fb0c2e95'/>
<id>540557b9439ec19668553830c90222f9fb0c2e95</id>
<content type='text'>
We have flags to indicate whether a section actually has a valid mem_map
associated with it.  This is never set and we rely solely on the present bit
to indicate a section is valid.  By definition a section is not valid if it
has no mem_map and there is a window during init where the present bit is set
but there is no mem_map, during which pfn_valid() will return true
incorrectly.

Use the existing SECTION_HAS_MEM_MAP flag to indicate the presence of a valid
mem_map.  Switch valid_section{,_nr} and pfn_valid() to this bit.  Add a new
present_section{,_nr} and pfn_present() interfaces for those users who care to
know that a section is going to be valid.

[akpm@linux-foundation.org: coding-syle fixes]
Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have flags to indicate whether a section actually has a valid mem_map
associated with it.  This is never set and we rely solely on the present bit
to indicate a section is valid.  By definition a section is not valid if it
has no mem_map and there is a window during init where the present bit is set
but there is no mem_map, during which pfn_valid() will return true
incorrectly.

Use the existing SECTION_HAS_MEM_MAP flag to indicate the presence of a valid
mem_map.  Switch valid_section{,_nr} and pfn_valid() to this bit.  Add a new
present_section{,_nr} and pfn_present() interfaces for those users who care to
know that a section is going to be valid.

[akpm@linux-foundation.org: coding-syle fixes]
Signed-off-by: Andy Whitcroft &lt;apw@shadowen.org&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
