linux.git/mm/mlock.c, branch v3.9

Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs"

2013-03-29T00:45:51+00:00

This reverts commit 186930500985 ("mm: introduce VM_POPULATE flag to
better deal with racy userspace programs").

VM_POPULATE only has any effect when userspace plays racy games with
vmas by trying to unmap and remap memory regions that mmap or mlock are
operating on.

Also, the only effect of VM_POPULATE when userspace plays such games is
that it avoids populating new memory regions that get remapped into the
address range that was being operated on by the original mmap or mlock
calls.

Let's remove VM_POPULATE as there isn't any strong argument to mandate a
new vm_flag.

Signed-off-by: Michel Lespinasse 
Signed-off-by: Hugh Dickins 
Signed-off-by: Linus Torvalds

mm: accelerate munlock() treatment of THP pages

2013-02-28T03:10:09+00:00

munlock_vma_pages_range() always advanced the address one PAGE_SIZE at
a time.  When munlocking THP pages (or the huge zero page), this
resulted in taking the mm->page_table_lock 512 times in a row.

We can do better by making use of the page_mask returned by
follow_page_mask (for the huge zero page case), or the size of the page
munlock_vma_page() operated on (for the true THP page case).
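The stepping rule can be sketched in userspace, assuming 4 KiB base pages and a 2 MiB huge page (a page_mask of 511). The increment expression `1 + (~(start >> PAGE_SHIFT) & page_mask)` is our assumption about how the page_mask is applied: it jumps to the end of the current huge page. The SKETCH_* constants and count_lock_rounds() are hypothetical names, and each loop iteration stands in for one acquisition of mm->page_table_lock. This is not the kernel code.

```c
#include <assert.h>

/* Assumed for illustration: 4 KiB base pages, 2 MiB huge pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_HPAGE_MASK 511UL /* page_mask of a 2 MiB huge page */

/*
 * Count the loop iterations (one page_table_lock round each) needed to
 * walk [start, end) when every page is covered by a huge page with the
 * given page_mask.  page_mask == 0 models the old one-PAGE_SIZE step.
 */
static unsigned long count_lock_rounds(unsigned long start, unsigned long end,
				       unsigned long page_mask)
{
	unsigned long rounds = 0;

	/* page_mask + 1 must be a power of two (a whole number of pages). */
	assert(((page_mask + 1) & page_mask) == 0);

	while (start < end) {
		/* Base pages left until the end of the current huge page. */
		unsigned long page_increm =
			1 + (~(start >> SKETCH_PAGE_SHIFT) & page_mask);

		rounds++;
		start += page_increm * SKETCH_PAGE_SIZE;
	}
	return rounds;
}
```

With page_mask 0, one 2 MiB range takes 512 rounds. With page_mask 511 it takes one, even when the walk starts partway into the huge page.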

Signed-off-by: Michel Lespinasse 
Cc: Andrea Arcangeli 
Cc: Rik van Riel 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: use long type for page counts in mm_populate() and get_user_pages()

2013-02-24T01:50:22+00:00

Use long type for page counts in mm_populate() so as to avoid integer
overflow when running the following test code:

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
  /* 16 TiB of address space: 2^32 pages of 4 KiB */
  void *p = mmap(NULL, 0x100000000000, PROT_READ,
                 MAP_PRIVATE | MAP_ANON, -1, 0);
  printf("p: %p\n", p);
  mlockall(MCL_CURRENT);
  printf("done\n");
  return 0;
}
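The overflow is plain arithmetic. 0x100000000000 bytes is 16 TiB, which with 4 KiB pages is 2^32 pages, one more than a 32-bit counter can hold. The sketch below assumes 4 KiB pages and an LP64 target, where long is 64 bits. It stores the 32-bit count in an unsigned int rather than a signed int so that the wraparound is well-defined C. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Page count held in a 32-bit unsigned int: wraps modulo 2^32. */
static unsigned int nr_pages_32(uint64_t len)
{
	/* Lengths here are whole pages. */
	assert((len & ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)) == 0);
	return (unsigned int)(len >> SKETCH_PAGE_SHIFT);
}

/* Page count held in a long (64 bits on LP64 Linux): no wraparound. */
static long nr_pages_long(uint64_t len)
{
	assert((len & ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)) == 0);
	return (long)(len >> SKETCH_PAGE_SHIFT);
}
```

For the 16 TiB mapping above, the 32-bit count comes out as 0, while the long count is 4294967296.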

Signed-off-by: Michel Lespinasse 
Cc: Andrea Arcangeli 
Cc: Rik van Riel 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/mlock.c: document scary-looking stack expansion mlock chain

2013-02-24T01:50:20+00:00

mlock calls get_user_pages, and get_user_pages might call mlock when
expanding a stack; this looks like a potential recursion.

However, mlock makes sure the requested range is already contained
within a vma, so no stack expansion will actually happen from mlock.

Should this ever change: the stack expansion mlocks only the newly
expanded range and so will not result in recursive expansion.

Signed-off-by: Johannes Weiner 
Reported-by: Al Viro 
Cc: Hugh Dickins 
Acked-by: Michel Lespinasse 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: introduce VM_POPULATE flag to better deal with racy userspace programs

2013-02-24T01:50:11+00:00

The vm_populate() code populates user mappings without constantly
holding the mmap_sem.  This makes it susceptible to racy userspace
programs: the user mappings may change while vm_populate() is running,
and in this case vm_populate() may end up populating the new mapping
instead of the old one.

In order to reduce the possibility of userspace getting surprised by
this behavior, this change introduces the VM_POPULATE vma flag which
gets set on vmas we want vm_populate() to work on.  This way
vm_populate() may still end up populating the new mapping after such a
race, but only if the new mapping is also one that the user has
requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated.

Signed-off-by: Michel Lespinasse 
Acked-by: Rik van Riel 
Tested-by: Andy Lutomirski 
Cc: Greg Ungerer 
Cc: David Howells 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: directly use __mlock_vma_pages_range() in find_extend_vma()

2013-02-24T01:50:11+00:00

In find_extend_vma(), we don't need mlock_vma_pages_range() to verify
the vma type - we know we're working with a stack.  So, we can call
directly into __mlock_vma_pages_range(), and remove the last
make_pages_present() call site.

Note that we don't use mm_populate() here, so we can't release the
mmap_sem while allocating new stack pages.  This is deemed acceptable,
because the stack vmas grow by a bounded number of pages at a time, and
these are anon pages so we don't have to read from disk to populate
them.

Signed-off-by: Michel Lespinasse 
Acked-by: Rik van Riel 
Tested-by: Andy Lutomirski 
Cc: Greg Ungerer 
Cc: David Howells 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: introduce mm_populate() for populating new vmas

2013-02-24T01:50:10+00:00

When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or
with MCL_FUTURE in effect), we want to populate the pages within the
newly created vmas.  This may take a while as we may have to read pages
from disk, so ideally we want to do this outside of the write-locked
mmap_sem region.

This change introduces mm_populate(), which is used to defer populating
such mappings until after the mmap_sem write lock has been released.
This is implemented as a generalization of the former do_mlock_pages(),
which accomplished the same task but was only used during mlock() /
mlockall().
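From userspace, the effect of mm_populate() on a MAP_POPULATE mapping can be observed with mincore(): the pages are already resident when mmap() returns. The sketch below is minimal and assumes Linux with glibc. populated_resident_pages() is a hypothetical helper. The fact that the faulting happens after the mmap_sem write lock is dropped is not visible from userspace; only the end result is.

```c
#define _DEFAULT_SOURCE /* MAP_POPULATE, MAP_ANONYMOUS, mincore() in glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map npages of private anonymous memory with MAP_POPULATE and return how
 * many of them mincore() reports as resident, or -1 on error.
 */
static long populated_resident_pages(size_t npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t len;
	unsigned char *vec;
	long resident = 0;
	void *p;

	assert(psz > 0);
	len = npages * (size_t)psz;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	vec = malloc(npages);
	if (!vec || mincore(p, len, vec) != 0) {
		free(vec);
		munmap(p, len);
		return -1;
	}
	for (size_t i = 0; i < npages; i++)
		resident += vec[i] & 1; /* bit 0: page is resident */

	free(vec);
	munmap(p, len);
	return resident;
}
```

Without MAP_POPULATE, the same mapping would normally report no resident pages until it is first touched.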

Signed-off-by: Michel Lespinasse 
Reported-by: Andy Lutomirski 
Acked-by: Rik van Riel 
Tested-by: Andy Lutomirski 
Cc: Greg Ungerer 
Cc: David Howells 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: don't overwrite mm->def_flags in do_mlockall()

2013-02-12T22:34:00+00:00

With commit 8e72033f2a48 ("thp: make MADV_HUGEPAGE check for
mm->def_flags") the VM_NOHUGEPAGE flag may be set on s390 in
mm->def_flags for certain processes, to prevent future thp mappings.
This would be overwritten by do_mlockall(), which sets it back to 0 with
an optional VM_LOCKED flag set.

To fix this, instead of overwriting mm->def_flags in do_mlockall(), only
the VM_LOCKED flag should be set or cleared.
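The fix amounts to a read-modify-write of a single bit instead of an assignment. The sketch uses illustrative flag values: SKETCH_VM_LOCKED and SKETCH_VM_NOHUGEPAGE are stand-ins, not copied from the kernel headers. The helper names are hypothetical, and lock_future models whether MCL_FUTURE was requested.

```c
#include <assert.h>

/* Illustrative flag values for this sketch only. */
#define SKETCH_VM_LOCKED     0x00002000UL
#define SKETCH_VM_NOHUGEPAGE 0x40000000UL

/* Old behaviour: overwrite def_flags, losing every other bit. */
static unsigned long def_flags_overwrite(unsigned long def_flags, int lock_future)
{
	assert(lock_future == 0 || lock_future == 1);
	(void)def_flags; /* the previous value is discarded */
	return lock_future ? SKETCH_VM_LOCKED : 0;
}

/* Fixed behaviour: set or clear only VM_LOCKED, keep the other bits. */
static unsigned long def_flags_update(unsigned long def_flags, int lock_future)
{
	assert(lock_future == 0 || lock_future == 1);
	if (lock_future)
		def_flags |= SKETCH_VM_LOCKED;
	else
		def_flags &= ~SKETCH_VM_LOCKED;
	return def_flags;
}
```

Starting from a def_flags with only VM_NOHUGEPAGE set, the overwrite drops that bit. The update keeps it whether VM_LOCKED is being set or cleared.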

Signed-off-by: Gerald Schaefer 
Reported-by: Vivek Goyal 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, thp: fix mlock statistics

2012-10-09T07:23:03+00:00

NR_MLOCK is only accounted in single page units: there's no logic to
handle transparent hugepages.  This patch adjusts the statistics by the
number of pages the (un)locked page actually spans, so that the correct
amount of memory is reflected.

Currently:

		$ grep Mlocked /proc/meminfo
		Mlocked:           19636 kB

	#define MAP_SIZE	(4UL << 30)	/* 4GB */

	void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	mlock(ptr, MAP_SIZE);

		$ grep Mlocked /proc/meminfo
		Mlocked:           29844 kB

	munlock(ptr, MAP_SIZE);

		$ grep Mlocked /proc/meminfo
		Mlocked:           19636 kB

And with this patch:

		$ grep Mlock /proc/meminfo
		Mlocked:           19636 kB

	mlock(ptr, MAP_SIZE);

		$ grep Mlock /proc/meminfo
		Mlocked:         4213664 kB

	munlock(ptr, MAP_SIZE);

		$ grep Mlock /proc/meminfo
		Mlocked:           19636 kB
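The difference between the two runs is a matter of units: each 2 MiB THP is 512 base pages. The sketch below works through the arithmetic for the 4GB example. It assumes 4 KiB base pages and a region entirely backed by THPs. The helper and constants are hypothetical.

```c
#include <assert.h>

/* Assumed for illustration: 4 KiB base pages, 2 MiB THPs. */
#define SKETCH_PAGE_KB        4UL
#define SKETCH_HPAGE_NR_PAGES 512UL /* base pages per 2 MiB THP */

/*
 * kB added to Mlocked when mlocking nr_thps transparent huge pages.
 * counted_as_one == 1 models accounting each THP as a single page (the
 * bug); 0 models accounting it as SKETCH_HPAGE_NR_PAGES pages (the fix).
 */
static unsigned long mlocked_kb_for_thps(unsigned long nr_thps, int counted_as_one)
{
	unsigned long pages_per_thp;

	assert(counted_as_one == 0 || counted_as_one == 1);
	pages_per_thp = counted_as_one ? 1 : SKETCH_HPAGE_NR_PAGES;
	return nr_thps * pages_per_thp * SKETCH_PAGE_KB;
}
```

4GB is 2048 THPs. Counting each THP as a single page adds only 8192 kB. Counting its 512 base pages adds 4194304 kB, which is the full 4GB.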

Signed-off-by: David Rientjes 
Reported-by: Hugh Dickins 
Acked-by: Hugh Dickins 
Reviewed-by: Andrea Arcangeli 
Cc: Naoya Horiguchi 
Cc: KAMEZAWA Hiroyuki 
Cc: Johannes Weiner 
Reviewed-by: Michel Lespinasse 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: use clear_page_mlock() in page_remove_rmap()

2012-10-09T07:22:56+00:00

We had thought that pages could no longer get freed while still marked as
mlocked; but Johannes Weiner posted this program to demonstrate that
truncating an mlocked private file mapping containing COWed pages is still
mishandled:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	char *map;
	int fd;

	system("grep mlockfreed /proc/vmstat");
	/* O_CREAT requires a mode argument */
	fd = open("chigurh", O_CREAT|O_EXCL|O_RDWR, 0600);
	unlink("chigurh");
	ftruncate(fd, 4096);
	map = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE, fd, 0);
	map[0] = 11;
	mlock(map, sizeof(fd));
	ftruncate(fd, 0);
	close(fd);
	munlock(map, sizeof(fd));
	munmap(map, 4096);
	system("grep mlockfreed /proc/vmstat");
	return 0;
}

The anon COWed pages are not caught by truncation's clear_page_mlock() of
the pagecache pages; but unmap_mapping_range() unmaps them, so we ought to
look out for them there in page_remove_rmap().  Indeed, why should
truncation or invalidation be doing the clear_page_mlock() when removing
from pagecache?  mlock is a property of mapping in userspace, not a
property of pagecache: an mlocked unmapped page is nonsensical.

Reported-by: Johannes Weiner 
Signed-off-by: Hugh Dickins 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Michel Lespinasse 
Cc: Ying Han 
Acked-by: Johannes Weiner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds