linux-stable.git/mm, branch linux-2.6.20.y

[PATCH] do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY

2007-08-25T15:24:13+00:00

Fix a bug in mm/mlock.c on 32-bit architectures that prevents a user from
locking more than 4GB of shared memory, or allocating more than 4GB of
shared memory in hugepages, when rlim[RLIMIT_MEMLOCK] is set to
RLIM_INFINITY.
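
The failure mode can be sketched in user space. This is a simplified model and not the kernel code: it assumes 4 KiB pages and an RLIM_INFINITY of all ones in a 32-bit unsigned long, it leaves out the CAP_IPC_LOCK check, and the names shm_lock_allowed_old/new, PAGE_SHIFT_SIM and RLIM_INFINITY_32 are made up for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_SIM   12u          /* assumption: 4 KiB pages */
#define RLIM_INFINITY_32 UINT32_MAX   /* assumption: ~0UL on a 32-bit arch */

/*
 * Model of the check before the fix: the limit is converted to pages
 * unconditionally, so "infinity" becomes 0xFFFFF pages, just under 4GB.
 */
static bool shm_lock_allowed_old(uint32_t new_pages, uint32_t locked_pages,
                                 uint32_t rlim_cur)
{
    uint32_t lock_limit = rlim_cur >> PAGE_SHIFT_SIM;

    return new_pages + locked_pages <= lock_limit;
}

/* Model of the check after the fix: RLIM_INFINITY is tested before the shift. */
static bool shm_lock_allowed_new(uint32_t new_pages, uint32_t locked_pages,
                                 uint32_t rlim_cur)
{
    if (rlim_cur == RLIM_INFINITY_32)
        return true;
    return shm_lock_allowed_old(new_pages, locked_pages, rlim_cur);
}
```

With 2GB already locked (0x80000 pages) and another 2GB requested, the old model refuses the request even though the limit is "infinite", while the new model allows it. Finite limits behave the same in both.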

Signed-off-by: Herbert van den Bergh 
Acked-by: Chris Mason 
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

[PATCH] hugetlb: fix race in alloc_fresh_huge_page()

2007-08-25T15:24:13+00:00

That static `nid' index needs locking.  Without it we can end up calling
alloc_pages_node() with an illegal node ID and the kernel crashes.
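
A user-space sketch of the idea, not the kernel code: the round-robin cursor is advanced and wrapped inside one critical section, so no other caller can observe the transient out-of-range value. NUM_NODES, prev_nid, nid_lock and next_huge_page_node() are hypothetical names, and a C11 atomic_flag stands in for a kernel spinlock.

```c
#include <stdatomic.h>

#define NUM_NODES 4   /* hypothetical number of online nodes */

static atomic_flag nid_lock = ATOMIC_FLAG_INIT;
static int prev_nid;  /* shared round-robin cursor, protected by nid_lock */

/* Return the node to try next; the result is always in [0, NUM_NODES). */
static int next_huge_page_node(void)
{
    int nid;

    /* Minimal spinlock: loop until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&nid_lock, memory_order_acquire))
        ;

    nid = prev_nid + 1;
    if (nid >= NUM_NODES)   /* wrap before anyone else can read the cursor */
        nid = 0;
    prev_nid = nid;

    atomic_flag_clear_explicit(&nid_lock, memory_order_release);
    return nid;
}
```

Without the lock, one caller could read the cursor between another caller's increment and its wrap-around, which is the kind of window that lets an illegal node ID escape.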

Acked-by: Gurudas Pai 
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

[PATCH] readahead: MIN_RA_PAGES/MAX_RA_PAGES macros

2007-08-25T15:24:10+00:00

Define two convenient macros for read-ahead:
	- MAX_RA_PAGES: rounded down counterpart of VM_MAX_READAHEAD
	- MIN_RA_PAGES: rounded _up_ counterpart of VM_MIN_READAHEAD

Note that the rounded up MIN_RA_PAGES will work flawlessly with _large_
page sizes like 64k.
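
The rounding can be checked with a small stand-alone sketch. It assumes VM_MAX_READAHEAD is 128 (kbytes) and VM_MIN_READAHEAD is 16 (kbytes), and uses plain functions with hypothetical names instead of the macros so that the page size can be varied.

```c
#define VM_MAX_READAHEAD_KB 128UL   /* assumption: value of VM_MAX_READAHEAD */
#define VM_MIN_READAHEAD_KB 16UL    /* assumption: value of VM_MIN_READAHEAD */

/* Rounded-down page count, mirroring what MAX_RA_PAGES is described to be. */
static unsigned long max_ra_pages(unsigned long page_size)
{
    return VM_MAX_READAHEAD_KB * 1024 / page_size;
}

/* Rounded-up page count, mirroring what MIN_RA_PAGES is described to be. */
static unsigned long min_ra_pages(unsigned long page_size)
{
    return (VM_MIN_READAHEAD_KB * 1024 + page_size - 1) / page_size;
}
```

Under these assumptions a 64k page size gives a minimum of one page, where rounding down would have given zero.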

Signed-off-by: Fengguang Wu 
Cc: Steven Pratt 
Cc: Ram Pai 
Cc: Rusty Russell 
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

[PATCH] mm: kill validate_anon_vma to avoid mapcount BUG

2007-08-15T08:02:33+00:00

validate_anon_vma gave a useful check on the integrity of the anon_vma list
when Andrea was developing obj rmap; but it was not enabled in SLES9
itself, nor in mainline, until Nick changed commented-out RMAP_DEBUG to
configurable CONFIG_DEBUG_VM in 2.6.17.  Now Petr Vandrovec reports that
its BUG_ON(mapcount > 100000) can easily crash a CONFIG_DEBUG_VM=y system.

That limit was just an arbitrary number to protect against an infinite
loop.  We could raise it to something enormous (depending on sizeof struct
vma and size of memory?); but I rather think validate_anon_vma has outlived
its usefulness, and is better just removed - which gives a magnificent
performance boost to anything like Petr's test program ;)

Of course, a very long anon_vma list is bad news for preemption latency,
and I believe there has been one recent report of such: let's not forget
that, but validate_anon_vma only makes it worse not better.

Signed-off-by: Hugh Dickins 
Cc: Petr Vandrovec 
Acked-by: Nick Piggin 
Cc: Andrea Arcangeli 
Signed-off-by: Andrew Morton 
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

[PATCH] x86_64: allocate sparsemem memmap above 4G

2007-08-15T08:02:24+00:00

On systems with a huge amount of physical memory, the VFS cache and the
memmap may eat all available system memory under 4G, after which the
system may fail to allocate the swiotlb bounce buffer.

There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix does
not cover the sparsemem model.

This patch adds the fix to the sparsemem model by first trying to allocate
the memmap above 4G.

Signed-off-by: Zou Nan hai 
Acked-by: Suresh Siddha 
Cc: Andi Kleen 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[chrisw: trivial backport]
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

[PATCH] s390: page_mkclean data corruption.

2007-06-11T18:37:10+00:00

The git commit c2fda5fed81eea077363b285b66eafce20dfd45a which
added the page_test_and_clear_dirty call to page_mkclean and the
git commit 7658cc289288b8ae7dd2c2224549a048431222b3 which fixes
the "nasty and subtle race in shared mmap'ed page writeback"
problem in clear_page_dirty_for_io cause data corruption on s390.

The effect of the two changes is that for every call to
clear_page_dirty_for_io a page_test_and_clear_dirty is done. If
the per page dirty bit is set, set_page_dirty is called. Strangely,
clear_page_dirty_for_io is called for not-uptodate pages, e.g.
over this call-chain:

[<000000000007c0f2>] clear_page_dirty_for_io+0x12a/0x130
[<000000000007c494>] generic_writepages+0x258/0x3e0
[<000000000007c692>] do_writepages+0x76/0x7c
[<00000000000c7a26>] __writeback_single_inode+0xba/0x3e4
[<00000000000c831a>] sync_sb_inodes+0x23e/0x398
[<00000000000c8802>] writeback_inodes+0x12e/0x140
[<000000000007b9ee>] wb_kupdate+0xd2/0x178
[<000000000007cca2>] pdflush+0x162/0x23c

The bad news now is that page_test_and_clear_dirty might claim
that a not-uptodate page is dirty since SetPageUptodate which
resets the per page dirty bit has not yet been called. The page
writeback that follows clobbers the data on disk.

The simplest solution to this problem is to move the call to
page_test_and_clear_dirty under the "if (page_mapped(page))".
If a file backed page is mapped it is uptodate.
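
The effect of moving the call can be modelled as follows. This is a toy model rather than the kernel function: struct sim_page and the mkclean_old/mkclean_new helpers are hypothetical, hw_dirty stands in for the s390 per page dirty bit, and the pte write-protection that the real page_mkclean performs is left out.

```c
#include <stdbool.h>

struct sim_page {
    bool mapped;     /* models page_mapped(page) */
    bool hw_dirty;   /* models the per page dirty bit */
};

/* Models page_test_and_clear_dirty(): report the bit and reset it. */
static bool sim_test_and_clear_dirty(struct sim_page *p)
{
    bool was_dirty = p->hw_dirty;

    p->hw_dirty = false;
    return was_dirty;
}

/* Before: the dirty bit is consulted for every page, mapped or not. */
static bool mkclean_old(struct sim_page *p)
{
    return sim_test_and_clear_dirty(p);
}

/* After: the dirty bit is consulted only for mapped pages. */
static bool mkclean_new(struct sim_page *p)
{
    if (p->mapped)
        return sim_test_and_clear_dirty(p);
    return false;
}
```

An unmapped page whose dirty bit is stale is reported dirty by the old variant and clean by the new one, while a mapped dirty page is still reported dirty.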

Signed-off-by: Martin Schwidefsky 
Signed-off-by: Heiko Carstens 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Chris Wright

[PATCH] oom: kill all threads that share mm with killed task

2007-06-11T18:37:08+00:00

oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
When finding other threads that share an mm with that task, we need to
kill each of those threads, not the selected task over again.

(Bug introduced by f2a2a7108aa0039ba7a5fe7a0d2ecef2219a7584)
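
A stand-alone sketch of the intended loop, using hypothetical types and names (struct sim_task, sim_kill, oom_kill_sharing_mm) in place of the kernel's task list. The condition on tgid is an assumption about how threads of other thread groups are selected.

```c
#include <stdbool.h>
#include <stddef.h>

struct sim_task {
    int  tgid;     /* thread group id */
    int  mm_id;    /* stands in for the task's mm pointer */
    bool killed;
};

static void sim_kill(struct sim_task *t)
{
    t->killed = true;
}

/* Kill the victim, then every task in another thread group sharing its mm. */
static void oom_kill_sharing_mm(struct sim_task *tasks, size_t n,
                                struct sim_task *victim)
{
    sim_kill(victim);

    for (size_t i = 0; i < n; i++) {
        struct sim_task *q = &tasks[i];

        if (q->mm_id == victim->mm_id && q->tgid != victim->tgid)
            sim_kill(q);   /* the bug amounted to killing victim here again */
    }
}
```

Passing the loop variable rather than the victim is the whole fix in this model: tasks that share the mm are marked killed, and unrelated tasks are left alone.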

Acked-by: William Irwin 
Acked-by: Christoph Lameter 
Cc: Nick Piggin 
Cc: Andrew Morton 
Cc: Andi Kleen 
Signed-off-by: David Rientjes 
Signed-off-by: Linus Torvalds 
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

page migration: fix NR_FILE_PAGES accounting

2007-05-02T00:06:01+00:00

NR_FILE_PAGES must be accounted for depending on the zone that the page
belongs to.  If we replace the page in the radix tree then we may have to
shift the count to another zone.
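
As a minimal sketch of what shifting the count means, with a hypothetical per-zone counter array standing in for the kernel's zone statistics:

```c
enum { SIM_NR_ZONES = 2 };   /* hypothetical number of zones */

/* Hypothetical per-zone NR_FILE_PAGES counters. */
static long nr_file_pages[SIM_NR_ZONES];

/*
 * When the new page replaces the old one in the radix tree, move one
 * unit of NR_FILE_PAGES from the old page's zone to the new page's zone.
 */
static void migrate_file_page_accounting(int old_zone, int new_zone)
{
    nr_file_pages[old_zone]--;
    nr_file_pages[new_zone]++;
}
```

The total across zones is unchanged. When both pages sit in the same zone the two updates cancel out.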

Suggested-by: Ethan Solomita 
Cc: Martin Bligh 
Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman

fix OOM killing processes wrongly thought MPOL_BIND

2007-05-02T00:06:00+00:00

I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog
to see lots of other processes killed with "No available memory
(MPOL_BIND)".  memhog is killed correctly once we initialize nodemask in
constrained_alloc().
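
Why an uninitialized nodemask produces this symptom can be shown with a simplified model. It is not the kernel function: the cpuset check is omitted, a plain unsigned bitmask replaces nodemask_t, and the starting mask is passed in as a parameter so that a garbage value can be simulated without reading uninitialized memory. All names are hypothetical.

```c
#include <stddef.h>

enum sim_constraint {
    SIM_CONSTRAINT_NONE,
    SIM_CONSTRAINT_MEMORY_POLICY
};

/*
 * start_mask:     the nodemask before the zonelist is walked; after the
 *                 fix this is a properly initialized set of nodes.
 * zonelist_nodes: node ids covered by the allocation's zonelist.
 *
 * Every node in the zonelist is cleared from the mask. Any bit left over
 * is read as "the allocation was restricted by a memory policy".
 */
static enum sim_constraint sim_constrained_alloc(unsigned start_mask,
                                                 const int *zonelist_nodes,
                                                 size_t n)
{
    unsigned nodes = start_mask;

    for (size_t i = 0; i < n; i++)
        nodes &= ~(1u << zonelist_nodes[i]);

    return nodes ? SIM_CONSTRAINT_MEMORY_POLICY : SIM_CONSTRAINT_NONE;
}
```

With a well-defined starting mask and a zonelist that covers every node, the result is "no constraint". With stray bits in the starting mask the same zonelist is misreported as policy-constrained, which matches the spurious "(MPOL_BIND)" kills described above.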

Signed-off-by: Hugh Dickins 
Acked-by: Christoph Lameter 
Acked-by: William Irwin 
Acked-by: KAMEZAWA Hiroyuki 
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman

holepunch: fix mmap_sem i_mutex deadlock

2007-05-02T00:05:56+00:00

sys_madvise has down_write of mmap_sem, then madvise_remove calls
vmtruncate_range which takes i_mutex and i_alloc_sem: no, we can
easily devise deadlocks from that ordering.

Have madvise_remove drop mmap_sem while calling vmtruncate_range: luckily,
since madvise_remove doesn't split or merge vmas, it's easy to handle
this case with a NULL prev, without restructuring sys_madvise.  (Though
sad to retake mmap_sem when it's unlikely to be needed, and certainly
down_read is sufficient for MADV_REMOVE, unlike the other madvices.)

Signed-off-by: Hugh Dickins 
Signed-off-by: Greg Kroah-Hartman