<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/memory.c, branch linux-2.6.32.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm: avoid setting up anonymous pages into file mapping</title>
<updated>2015-09-18T11:52:16+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-07-06T20:18:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f85457f8be43e6fb2796d4728cc01806415acc51'/>
<id>f85457f8be43e6fb2796d4728cc01806415acc51</id>
<content type='text'>
commit 6b7339f4c31ad69c8e9c0b2859276e22cf72176d upstream.

Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops-&gt;fault() and the VMA wasn't fully populated
on -&gt;mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().

Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (-&gt;vm_ops == NULL) and make sure that the VMA is not
shared.

For file mappings without vm_ops-&gt;fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
(cherry picked from commit e2506476534cff7bb3697fbe0654fdefd101bc80)

Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6b7339f4c31ad69c8e9c0b2859276e22cf72176d upstream.

Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops-&gt;fault() and the VMA wasn't fully populated
on -&gt;mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().

Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (-&gt;vm_ops == NULL) and make sure that the VMA is not
shared.

For file mappings without vm_ops-&gt;fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
(cherry picked from commit e2506476534cff7bb3697fbe0654fdefd101bc80)

Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vm: add vm_iomap_memory() helper function</title>
<updated>2014-05-19T05:54:22+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-04-16T20:45:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dc734909be2b9689c3ef84a36c24832b32041d80'/>
<id>dc734909be2b9689c3ef84a36c24832b32041d80</id>
<content type='text'>
Various drivers end up replicating the code to mmap() their memory
buffers into user space, and our core memory remapping function may be
very flexible but it is unnecessarily complicated for the common cases
to use.

Our internal VM uses pfn's ("page frame numbers") which simplifies
things for the VM, and allows us to pass physical addresses around in a
denser and more efficient format than passing a "phys_addr_t" around,
and having to shift it up and down by the page size.  But it just means
that drivers end up doing that shifting instead at the interface level.

It also means that drivers end up mucking around with internal VM things
like the vma details (vm_pgoff, vm_start/end) way more than they really
need to.

So this just exports a function to map a certain physical memory range
into user space (using a phys_addr_t based interface that is much more
natural for a driver) and hides all the complexity from the driver.
Some drivers will still end up tweaking the vm_page_prot details for
things like prefetching or cacheability etc, but that's actually
relevant to the driver, rather than caring about what the page offset of
the mapping is into the particular IO memory region.

Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
(cherry picked from commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e)
[WT: only needed in 2.6.32 for next commit]
Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Various drivers end up replicating the code to mmap() their memory
buffers into user space, and our core memory remapping function may be
very flexible but it is unnecessarily complicated for the common cases
to use.

Our internal VM uses pfn's ("page frame numbers") which simplifies
things for the VM, and allows us to pass physical addresses around in a
denser and more efficient format than passing a "phys_addr_t" around,
and having to shift it up and down by the page size.  But it just means
that drivers end up doing that shifting instead at the interface level.

It also means that drivers end up mucking around with internal VM things
like the vma details (vm_pgoff, vm_start/end) way more than they really
need to.

So this just exports a function to map a certain physical memory range
into user space (using a phys_addr_t based interface that is much more
natural for a driver) and hides all the complexity from the driver.
Some drivers will still end up tweaking the vm_page_prot details for
things like prefetching or cacheability etc, but that's actually
relevant to the driver, rather than caring about what the page offset of
the mapping is into the particular IO memory region.

Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
(cherry picked from commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e)
[WT: only needed in 2.6.32 for next commit]
Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: prevent concurrent unmap_mapping_range() on the same inode</title>
<updated>2011-07-13T03:29:27+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@suse.cz</email>
</author>
<published>2011-02-23T12:49:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ffdd12eabed263a487ddc05fdf65be6e4fc717b4'/>
<id>ffdd12eabed263a487ddc05fdf65be6e4fc717b4</id>
<content type='text'>
commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream.

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular -&gt;d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Reported-by: Michael Leun &lt;lkml20101129@newton.leun.net&gt;
Reported-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Tested-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream.

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular -&gt;d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Reported-by: Michael Leun &lt;lkml20101129@newton.leun.net&gt;
Reported-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Tested-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>guard page for stacks that grow upwards</title>
<updated>2010-09-27T00:21:34+00:00</updated>
<author>
<name>Luck, Tony</name>
<email>tony.luck@intel.com</email>
</author>
<published>2010-08-24T18:44:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c837b58c0ea48760cc83a2e3ffddd92ac88dd156'/>
<id>c837b58c0ea48760cc83a2e3ffddd92ac88dd156</id>
<content type='text'>
commit 8ca3eb08097f6839b2206e2242db4179aee3cfb3 upstream.

pa-risc and ia64 have stacks that grow upwards. Check that
they do not run into other mappings. By making VM_GROWSUP
0x0 on architectures that do not ever use it, we can avoid
some unpleasant #ifdefs in check_stack_guard_page().

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: dann frazier &lt;dannf@debian.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8ca3eb08097f6839b2206e2242db4179aee3cfb3 upstream.

pa-risc and ia64 have stacks that grow upwards. Check that
they do not run into other mappings. By making VM_GROWSUP
0x0 on architectures that do not ever use it, we can avoid
some unpleasant #ifdefs in check_stack_guard_page().

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: dann frazier &lt;dannf@debian.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: make stack guard page logic use vm_prev pointer</title>
<updated>2010-08-26T23:41:45+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-08-20T23:49:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=42fd8fdce2d33d0d84ce850b4d63a2385b8d889a'/>
<id>42fd8fdce2d33d0d84ce850b4d63a2385b8d889a</id>
<content type='text'>
commit 0e8e50e20c837eeec8323bba7dcd25fe5479194c upstream.

Like the mlock() change previously, this makes the stack guard check
code use vma-&gt;vm_prev to see what the mapping below the current stack
is, rather than have to look it up with find_vma().

Also, accept an abutting stack segment, since that happens naturally if
you split the stack with mlock or mprotect.

Tested-by: Ian Campbell &lt;ijc@hellion.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0e8e50e20c837eeec8323bba7dcd25fe5479194c upstream.

Like the mlock() change previously, this makes the stack guard check
code use vma-&gt;vm_prev to see what the mapping below the current stack
is, rather than have to look it up with find_vma().

Also, accept an abutting stack segment, since that happens naturally if
you split the stack with mlock or mprotect.

Tested-by: Ian Campbell &lt;ijc@hellion.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix page table unmap for stack guard page properly</title>
<updated>2010-08-20T18:34:23+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-08-14T18:44:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=058daedc8311ab42702dfe29d3ff16dff7e7eaf8'/>
<id>058daedc8311ab42702dfe29d3ff16dff7e7eaf8</id>
<content type='text'>
commit 11ac552477e32835cb6970bf0a70c210807f5673 upstream.

We do in fact need to unmap the page table _before_ doing the whole
stack guard page logic, because if it is needed (mainly 32-bit x86 with
PAE and CONFIG_HIGHPTE, but other architectures may use it too) then it
will do a kmap_atomic/kunmap_atomic.

And those kmaps will create an atomic region that we cannot do
allocations in.  However, the whole stack expand code will need to do
anon_vma_prepare() and vma_lock_anon_vma() and they cannot do that in an
atomic region.

Now, a better model might actually be to do the anon_vma_prepare() when
_creating_ a VM_GROWSDOWN segment, and not have to worry about any of
this at page fault time.  But in the meantime, this is the
straightforward fix for the issue.

See https://bugzilla.kernel.org/show_bug.cgi?id=16588 for details.

Reported-by: Wylda &lt;wylda@volny.cz&gt;
Reported-by: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Reported-by: Mike Pagano &lt;mpagano@gentoo.org&gt;
Reported-by: François Valenduc &lt;francois.valenduc@tvcablenet.be&gt;
Tested-by: Ed Tomlinson &lt;edt@aei.ca&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 11ac552477e32835cb6970bf0a70c210807f5673 upstream.

We do in fact need to unmap the page table _before_ doing the whole
stack guard page logic, because if it is needed (mainly 32-bit x86 with
PAE and CONFIG_HIGHPTE, but other architectures may use it too) then it
will do a kmap_atomic/kunmap_atomic.

And those kmaps will create an atomic region that we cannot do
allocations in.  However, the whole stack expand code will need to do
anon_vma_prepare() and vma_lock_anon_vma() and they cannot do that in an
atomic region.

Now, a better model might actually be to do the anon_vma_prepare() when
_creating_ a VM_GROWSDOWN segment, and not have to worry about any of
this at page fault time.  But in the meantime, this is the
straightforward fix for the issue.

See https://bugzilla.kernel.org/show_bug.cgi?id=16588 for details.

Reported-by: Wylda &lt;wylda@volny.cz&gt;
Reported-by: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Reported-by: Mike Pagano &lt;mpagano@gentoo.org&gt;
Reported-by: François Valenduc &lt;francois.valenduc@tvcablenet.be&gt;
Tested-by: Ed Tomlinson &lt;edt@aei.ca&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix missing page table unmap for stack guard page failure case</title>
<updated>2010-08-13T20:20:27+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-08-13T16:24:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ab832422673d1774c4ce3941f2ac87743d73bded'/>
<id>ab832422673d1774c4ce3941f2ac87743d73bded</id>
<content type='text'>
commit 5528f9132cf65d4d892bcbc5684c61e7822b21e9 upstream.

.. which didn't show up in my tests because it's a no-op on x86-64 and
most other architectures.  But we enter the function with the last-level
page table mapped, and should unmap it at exit.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5528f9132cf65d4d892bcbc5684c61e7822b21e9 upstream.

.. which didn't show up in my tests because it's a no-op on x86-64 and
most other architectures.  But we enter the function with the last-level
page table mapped, and should unmap it at exit.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: keep a guard page below a grow-down stack segment</title>
<updated>2010-08-13T20:20:27+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-08-13T00:54:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7e281afe24330aeea86113ac241eabdac8ba2311'/>
<id>7e281afe24330aeea86113ac241eabdac8ba2311</id>
<content type='text'>
commit 320b2b8de12698082609ebbc1a17165727f4c893 upstream.

This is a rather minimally invasive patch to solve the problem of the
user stack growing into a memory mapped area below it.  Whenever we fill
the first page of the stack segment, expand the segment down by one
page.

Now, admittedly some odd application might _want_ the stack to grow down
into the preceding memory mapping, and so we may at some point need to
make this a process tunable (some people might also want to have more
than a single page of guarding), but let's try the minimal approach
first.

Tested with trivial application that maps a single page just below the
stack, and then starts recursing.  Without this, we will get a SIGSEGV
_after_ the stack has smashed the mapping.  With this patch, we'll get a
nice SIGBUS just as the stack touches the page just above the mapping.

Requested-by: Keith Packard &lt;keithp@keithp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 320b2b8de12698082609ebbc1a17165727f4c893 upstream.

This is a rather minimally invasive patch to solve the problem of the
user stack growing into a memory mapped area below it.  Whenever we fill
the first page of the stack segment, expand the segment down by one
page.

Now, admittedly some odd application might _want_ the stack to grow down
into the preceding memory mapping, and so we may at some point need to
make this a process tunable (some people might also want to have more
than a single page of guarding), but let's try the minimal approach
first.

Tested with trivial application that maps a single page just below the
stack, and then starts recursing.  Without this, we will get a SIGSEGV
_after_ the stack has smashed the mapping.  With this patch, we'll get a
nice SIGBUS just as the stack touches the page just above the mapping.

Requested-by: Keith Packard &lt;keithp@keithp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix ia64 crash when gcore reads gate area</title>
<updated>2010-08-10T17:20:35+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2010-07-30T17:58:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a03567653b63fc938c5179bed77a868e82f179ef'/>
<id>a03567653b63fc938c5179bed77a868e82f179ef</id>
<content type='text'>
commit de51257aa301652876ab6e8f13ea4eadbe4a3846 upstream.

Debian's ia64 autobuilders have been seeing kernel freeze or reboot
when running the gdb testsuite (Debian bug 588574): dannf bisected to
2.6.32 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1 "mm: ZERO_PAGE without
PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target.

I'd missed updating the gate_vma handling in __get_user_pages(): that
happens to use vm_normal_page() (nowadays failing on the zero page),
yet reported success even when it failed to get a page - boom when
access_process_vm() tried to copy that to its intermediate buffer.

Fix this, resisting cleanups: in particular, leave it for now reporting
success when not asked to get any pages - very probably safe to change,
but let's not risk it without testing exposure.

Why did ia64 crash with 16kB pages, but succeed with 64kB pages?
Because setup_gate() pads each 64kB of its gate area with zero pages.

Reported-by: Andreas Barth &lt;aba@not.so.argh.org&gt;
Bisected-by: dann frazier &lt;dannf@debian.org&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Tested-by: dann frazier &lt;dannf@dannf.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit de51257aa301652876ab6e8f13ea4eadbe4a3846 upstream.

Debian's ia64 autobuilders have been seeing kernel freeze or reboot
when running the gdb testsuite (Debian bug 588574): dannf bisected to
2.6.32 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1 "mm: ZERO_PAGE without
PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target.

I'd missed updating the gate_vma handling in __get_user_pages(): that
happens to use vm_normal_page() (nowadays failing on the zero page),
yet reported success even when it failed to get a page - boom when
access_process_vm() tried to copy that to its intermediate buffer.

Fix this, resisting cleanups: in particular, leave it for now reporting
success when not asked to get any pages - very probably safe to change,
but let's not risk it without testing exposure.

Why did ia64 crash with 16kB pages, but succeed with 64kB pages?
Because setup_gate() pads each 64kB of its gate area with zero pages.

Reported-by: Andreas Barth &lt;aba@not.so.argh.org&gt;
Bisected-by: dann frazier &lt;dannf@debian.org&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Tested-by: dann frazier &lt;dannf@dannf.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: sigbus instead of abusing oom</title>
<updated>2009-12-18T22:05:51+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hugh.dickins@tiscali.co.uk</email>
</author>
<published>2009-12-15T01:59:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1c723eb018de64d2159e9ce66ec6df0d1798e1e3'/>
<id>1c723eb018de64d2159e9ce66ec6df0d1798e1e3</id>
<content type='text'>
commit d99be1a8ecf377c2c9b3372d36411ad6547bbd4c upstream.

When do_nonlinear_fault() realizes that the page table must have been
corrupted for it to have been called, it does print_bad_pte() and returns
...  VM_FAULT_OOM, which is hard to understand.

It made some sense when I did it for 2.6.15, when do_page_fault() just
killed the current process; but nowadays it lets the OOM killer decide who
to kill - so page table corruption in one process would be liable to kill
another.

Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee that
the process will be killed, but is good enough for such a rare
abnormality, accompanied as it is by the "BUG: Bad page map" message.

And recent HWPOISON work has copied that code into do_swap_page(), when it
finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too.

Signed-off-by: Hugh Dickins &lt;hugh.dickins@tiscali.co.uk&gt;
Cc: Izik Eidus &lt;ieidus@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d99be1a8ecf377c2c9b3372d36411ad6547bbd4c upstream.

When do_nonlinear_fault() realizes that the page table must have been
corrupted for it to have been called, it does print_bad_pte() and returns
...  VM_FAULT_OOM, which is hard to understand.

It made some sense when I did it for 2.6.15, when do_page_fault() just
killed the current process; but nowadays it lets the OOM killer decide who
to kill - so page table corruption in one process would be liable to kill
another.

Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee that
the process will be killed, but is good enough for such a rare
abnormality, accompanied as it is by the "BUG: Bad page map" message.

And recent HWPOISON work has copied that code into do_swap_page(), when it
finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too.

Signed-off-by: Hugh Dickins &lt;hugh.dickins@tiscali.co.uk&gt;
Cc: Izik Eidus &lt;ieidus@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
</feed>
