<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/arch/x86/include/asm, branch linux-3.12.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>x86/vdso: Plug race between mapping and ELF header setup</title>
<updated>2017-04-28T17:30:42+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2017-04-10T15:14:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3ebb1b606820857986b4310e5012f78352478774'/>
<id>3ebb1b606820857986b4310e5012f78352478774</id>
<content type='text'>
commit 6fdc6dd90272ce7e75d744f71535cfbd8d77da81 upstream.

The vsyscall32 sysctl can racy against a concurrent fork when it switches
from disabled to enabled:

    arch_setup_additional_pages()
	if (vdso32_enabled)
           --&gt; No mapping
                                        sysctl.vsysscall32()
                                          --&gt; vdso32_enabled = true
    create_elf_tables()
      ARCH_DLINFO_IA32
        if (vdso32_enabled) {
           --&gt; Add VDSO entry with NULL pointer

Make ARCH_DLINFO_IA32 check whether the VDSO mapping has been set up for
the newly forked process or not.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Mathias Krause &lt;minipli@googlemail.com&gt;
Link: http://lkml.kernel.org/r/20170410151723.602367196@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6fdc6dd90272ce7e75d744f71535cfbd8d77da81 upstream.

The vsyscall32 sysctl can racy against a concurrent fork when it switches
from disabled to enabled:

    arch_setup_additional_pages()
	if (vdso32_enabled)
           --&gt; No mapping
                                        sysctl.vsysscall32()
                                          --&gt; vdso32_enabled = true
    create_elf_tables()
      ARCH_DLINFO_IA32
        if (vdso32_enabled) {
           --&gt; Add VDSO entry with NULL pointer

Make ARCH_DLINFO_IA32 check whether the VDSO mapping has been set up for
the newly forked process or not.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Mathias Krause &lt;minipli@googlemail.com&gt;
Link: http://lkml.kernel.org/r/20170410151723.602367196@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/apic: Order irq_enter/exit() calls correctly vs. ack_APIC_irq()</title>
<updated>2017-01-27T16:15:00+00:00</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpeng.li@hotmail.com</email>
</author>
<published>2016-09-18T11:34:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a2b17a74c58b0a6ac4c7bd1c1c64f00eaadadfce'/>
<id>a2b17a74c58b0a6ac4c7bd1c1c64f00eaadadfce</id>
<content type='text'>
commit b0f48706a176b71a6e54f399d7404bbeeaa7cfab upstream.

===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc6+ #5 Not tainted
-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/2/0.

stack backtrace:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.8.0-rc6+ #5
Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
 0000000000000000 ffff8d1bd6003f10 ffffffff94446949 ffff8d1bd4a68000
 0000000000000001 ffff8d1bd6003f40 ffffffff940e9247 ffff8d1bbdfcf3d0
 000000000000080b 0000000000000000 0000000000000000 ffff8d1bd6003f70
Call Trace:
 &lt;IRQ&gt;  [&lt;ffffffff94446949&gt;] dump_stack+0x99/0xd0
 [&lt;ffffffff940e9247&gt;] lockdep_rcu_suspicious+0xe7/0x120
 [&lt;ffffffff9448e0d5&gt;] do_trace_write_msr+0x135/0x140
 [&lt;ffffffff9406e750&gt;] native_write_msr+0x20/0x30
 [&lt;ffffffff9406503d&gt;] native_apic_msr_eoi_write+0x1d/0x30
 [&lt;ffffffff9405b17e&gt;] smp_trace_call_function_interrupt+0x1e/0x270
 [&lt;ffffffff948cb1d6&gt;] trace_call_function_interrupt+0x96/0xa0
 &lt;EOI&gt;  [&lt;ffffffff947200f4&gt;] ? cpuidle_enter_state+0xe4/0x360
 [&lt;ffffffff947200df&gt;] ? cpuidle_enter_state+0xcf/0x360
 [&lt;ffffffff947203a7&gt;] cpuidle_enter+0x17/0x20
 [&lt;ffffffff940df008&gt;] cpu_startup_entry+0x338/0x4d0
 [&lt;ffffffff9405bfc4&gt;] start_secondary+0x154/0x180

This can be reproduced readily by running ftrace test case of kselftest.

Move the irq_enter() call before ack_APIC_irq(), because irq_enter() tells
the RCU susbstems to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly. The same applies to
exiting_ack_irq() which calls ack_APIC_irq() after irq_exit().

[ tglx: Massaged changelog ]

Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Link: http://lkml.kernel.org/r/1474198491-3738-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Joerg Roedel &lt;jroedel@suse.de&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b0f48706a176b71a6e54f399d7404bbeeaa7cfab upstream.

===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc6+ #5 Not tainted
-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/2/0.

stack backtrace:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.8.0-rc6+ #5
Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
 0000000000000000 ffff8d1bd6003f10 ffffffff94446949 ffff8d1bd4a68000
 0000000000000001 ffff8d1bd6003f40 ffffffff940e9247 ffff8d1bbdfcf3d0
 000000000000080b 0000000000000000 0000000000000000 ffff8d1bd6003f70
Call Trace:
 &lt;IRQ&gt;  [&lt;ffffffff94446949&gt;] dump_stack+0x99/0xd0
 [&lt;ffffffff940e9247&gt;] lockdep_rcu_suspicious+0xe7/0x120
 [&lt;ffffffff9448e0d5&gt;] do_trace_write_msr+0x135/0x140
 [&lt;ffffffff9406e750&gt;] native_write_msr+0x20/0x30
 [&lt;ffffffff9406503d&gt;] native_apic_msr_eoi_write+0x1d/0x30
 [&lt;ffffffff9405b17e&gt;] smp_trace_call_function_interrupt+0x1e/0x270
 [&lt;ffffffff948cb1d6&gt;] trace_call_function_interrupt+0x96/0xa0
 &lt;EOI&gt;  [&lt;ffffffff947200f4&gt;] ? cpuidle_enter_state+0xe4/0x360
 [&lt;ffffffff947200df&gt;] ? cpuidle_enter_state+0xcf/0x360
 [&lt;ffffffff947203a7&gt;] cpuidle_enter+0x17/0x20
 [&lt;ffffffff940df008&gt;] cpu_startup_entry+0x338/0x4d0
 [&lt;ffffffff9405bfc4&gt;] start_secondary+0x154/0x180

This can be reproduced readily by running ftrace test case of kselftest.

Move the irq_enter() call before ack_APIC_irq(), because irq_enter() tells
the RCU susbstems to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly. The same applies to
exiting_ack_irq() which calls ack_APIC_irq() after irq_exit().

[ tglx: Massaged changelog ]

Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Link: http://lkml.kernel.org/r/1474198491-3738-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Joerg Roedel &lt;jroedel@suse.de&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/mm/xen: Suppress hugetlbfs in PV guests</title>
<updated>2016-11-24T15:23:39+00:00</updated>
<author>
<name>Jan Beulich</name>
<email>JBeulich@suse.com</email>
</author>
<published>2016-04-21T06:27:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b0363263e57199808d17e597df74cbdb28abbe55'/>
<id>b0363263e57199808d17e597df74cbdb28abbe55</id>
<content type='text'>
commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream.

Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode: 0000 [#1] SMP
  ...
  RIP: e030:[&lt;ffffffff811c333b&gt;]  [&lt;ffffffff811c333b&gt;] remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [&lt;ffffffff811c3415&gt;] hugetlbfs_evict_inode+0x15/0x40
   [&lt;ffffffff81167b3d&gt;] evict+0xbd/0x1b0
   [&lt;ffffffff8116514a&gt;] __dentry_kill+0x19a/0x1f0
   [&lt;ffffffff81165b0e&gt;] dput+0x1fe/0x220
   [&lt;ffffffff81150535&gt;] __fput+0x155/0x200
   [&lt;ffffffff81079fc0&gt;] task_work_run+0x60/0xa0
   [&lt;ffffffff81063510&gt;] do_exit+0x160/0x400
   [&lt;ffffffff810637eb&gt;] do_group_exit+0x3b/0xa0
   [&lt;ffffffff8106e8bd&gt;] get_signal+0x1ed/0x470
   [&lt;ffffffff8100f854&gt;] do_signal+0x14/0x110
   [&lt;ffffffff810030e9&gt;] prepare_exit_to_usermode+0xe9/0xf0
   [&lt;ffffffff814178a5&gt;] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Signed-off-by: Jan Beulich &lt;jbeulich@suse.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Juergen Gross &lt;JGross@suse.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luis R. Rodriguez &lt;mcgrof@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Toshi Kani &lt;toshi.kani@hp.com&gt;
Cc: xen-devel &lt;xen-devel@lists.xenproject.org&gt;
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream.

Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode: 0000 [#1] SMP
  ...
  RIP: e030:[&lt;ffffffff811c333b&gt;]  [&lt;ffffffff811c333b&gt;] remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [&lt;ffffffff811c3415&gt;] hugetlbfs_evict_inode+0x15/0x40
   [&lt;ffffffff81167b3d&gt;] evict+0xbd/0x1b0
   [&lt;ffffffff8116514a&gt;] __dentry_kill+0x19a/0x1f0
   [&lt;ffffffff81165b0e&gt;] dput+0x1fe/0x220
   [&lt;ffffffff81150535&gt;] __fput+0x155/0x200
   [&lt;ffffffff81079fc0&gt;] task_work_run+0x60/0xa0
   [&lt;ffffffff81063510&gt;] do_exit+0x160/0x400
   [&lt;ffffffff810637eb&gt;] do_group_exit+0x3b/0xa0
   [&lt;ffffffff8106e8bd&gt;] get_signal+0x1ed/0x470
   [&lt;ffffffff8100f854&gt;] do_signal+0x14/0x110
   [&lt;ffffffff810030e9&gt;] prepare_exit_to_usermode+0xe9/0xf0
   [&lt;ffffffff814178a5&gt;] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Signed-off-by: Jan Beulich &lt;jbeulich@suse.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Juergen Gross &lt;JGross@suse.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luis R. Rodriguez &lt;mcgrof@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Toshi Kani &lt;toshi.kani@hp.com&gt;
Cc: xen-devel &lt;xen-devel@lists.xenproject.org&gt;
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix potential infoleak in older kernels</title>
<updated>2016-11-24T15:20:50+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-11-08T10:17:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c0547ebf8db11de283c482276511d6cdfb747a8c'/>
<id>c0547ebf8db11de283c482276511d6cdfb747a8c</id>
<content type='text'>
Not upstream as it is not needed there.

So a patch something like this might be a safe way to fix the
potential infoleak in older kernels.

THIS IS UNTESTED. It's a very obvious patch, though, so if it compiles
it probably works. It just initializes the output variable with 0 in
the inline asm description, instead of doing it in the exception
handler.

It will generate slightly worse code (a few unnecessary ALU
operations), but it doesn't have any interactions with the exception
handler implementation.

Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Not upstream as it is not needed there.

So a patch something like this might be a safe way to fix the
potential infoleak in older kernels.

THIS IS UNTESTED. It's a very obvious patch, though, so if it compiles
it probably works. It just initializes the output variable with 0 in
the inline asm description, instead of doing it in the exception
handler.

It will generate slightly worse code (a few unnecessary ALU
operations), but it doesn't have any interactions with the exception
handler implementation.

Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "fix minor infoleak in get_user_ex()"</title>
<updated>2016-11-08T15:38:29+00:00</updated>
<author>
<name>Jiri Slaby</name>
<email>jslaby@suse.cz</email>
</author>
<published>2016-10-31T19:30:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2a6257bbdcfa57c6edf658e0d57f0c7c232bce18'/>
<id>2a6257bbdcfa57c6edf658e0d57f0c7c232bce18</id>
<content type='text'>
This reverts commit d42924ab1ec523c0671f5560d51750996be31d3a which is
1c109fabbd51863475cd12ac206bdd249aee35af upstream.

Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit d42924ab1ec523c0671f5560d51750996be31d3a which is
1c109fabbd51863475cd12ac206bdd249aee35af upstream.

Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fix minor infoleak in get_user_ex()</title>
<updated>2016-09-29T09:14:28+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@ZenIV.linux.org.uk</email>
</author>
<published>2016-09-15T01:35:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d42924ab1ec523c0671f5560d51750996be31d3a'/>
<id>d42924ab1ec523c0671f5560d51750996be31d3a</id>
<content type='text'>
commit 1c109fabbd51863475cd12ac206bdd249aee35af upstream.

get_user_ex(x, ptr) should zero x on failure.  It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
[ This sat in different branch from the uaccess fixes since mid-August ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 1c109fabbd51863475cd12ac206bdd249aee35af upstream.

get_user_ex(x, ptr) should zero x on failure.  It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
[ This sat in different branch from the uaccess fixes since mid-August ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/mm: Disable preemption during CR3 read+write</title>
<updated>2016-09-21T11:40:07+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2016-08-05T13:37:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=08180d21d4dc457bd6ed494c1363e942f518534c'/>
<id>08180d21d4dc457bd6ed494c1363e942f518534c</id>
<content type='text'>
commit 5cf0791da5c162ebc14b01eb01631cfa7ed4fa6e upstream.

There's a subtle preemption race on UP kernels:

Usually current-&gt;mm (and therefore mm-&gt;pgd) stays the same during the
lifetime of a task so it does not matter if a task gets preempted during
the read and write of the CR3.

But then, there is this scenario on x86-UP:

TaskA is in do_exit() and exit_mm() sets current-&gt;mm = NULL followed by:

 -&gt; mmput()
 -&gt; exit_mmap()
 -&gt; tlb_finish_mmu()
 -&gt; tlb_flush_mmu()
 -&gt; tlb_flush_mmu_tlbonly()
 -&gt; tlb_flush()
 -&gt; flush_tlb_mm_range()
 -&gt; __flush_tlb_up()
 -&gt; __flush_tlb()
 -&gt;  __native_flush_tlb()

At this point current-&gt;mm is NULL but current-&gt;active_mm still points to
the "old" mm.

Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
own mm so CR3 has changed.

Now preempt back to taskA. TaskA has no -&gt;mm set so it borrows taskB's
mm and so CR3 remains unchanged. Once taskA gets active it continues
where it was interrupted and that means it writes its old CR3 value
back. Everything is fine because userland won't need its memory
anymore.

Now the fun part:

Let's preempt taskA one more time and get back to taskB. This
time switch_mm() won't do a thing because oldmm (-&gt;active_mm)
is the same as mm (as per context_switch()). So we remain
with a bad CR3 / PGD and return to userland.

The next thing that happens is handle_mm_fault() with an address for
the execution of its code in userland. handle_mm_fault() realizes that
it has a PTE with proper rights so it returns doing nothing. But the
CPU looks at the wrong PGD and insists that something is wrong and
faults again. And again. And one more time…

This pagefault circle continues until the scheduler gets tired of it and
puts another task on the CPU. It gets little difficult if the task is a
RT task with a high priority. The system will either freeze or it gets
fixed by the software watchdog thread which usually runs at RT-max prio.
But waiting for the watchdog will increase the latency of the RT task
which is no good.

Fix this by disabling preemption across the critical code section.

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5cf0791da5c162ebc14b01eb01631cfa7ed4fa6e upstream.

There's a subtle preemption race on UP kernels:

Usually current-&gt;mm (and therefore mm-&gt;pgd) stays the same during the
lifetime of a task so it does not matter if a task gets preempted during
the read and write of the CR3.

But then, there is this scenario on x86-UP:

TaskA is in do_exit() and exit_mm() sets current-&gt;mm = NULL followed by:

 -&gt; mmput()
 -&gt; exit_mmap()
 -&gt; tlb_finish_mmu()
 -&gt; tlb_flush_mmu()
 -&gt; tlb_flush_mmu_tlbonly()
 -&gt; tlb_flush()
 -&gt; flush_tlb_mm_range()
 -&gt; __flush_tlb_up()
 -&gt; __flush_tlb()
 -&gt;  __native_flush_tlb()

At this point current-&gt;mm is NULL but current-&gt;active_mm still points to
the "old" mm.

Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
own mm so CR3 has changed.

Now preempt back to taskA. TaskA has no -&gt;mm set so it borrows taskB's
mm and so CR3 remains unchanged. Once taskA gets active it continues
where it was interrupted and that means it writes its old CR3 value
back. Everything is fine because userland won't need its memory
anymore.

Now the fun part:

Let's preempt taskA one more time and get back to taskB. This
time switch_mm() won't do a thing because oldmm (-&gt;active_mm)
is the same as mm (as per context_switch()). So we remain
with a bad CR3 / PGD and return to userland.

The next thing that happens is handle_mm_fault() with an address for
the execution of its code in userland. handle_mm_fault() realizes that
it has a PTE with proper rights so it returns doing nothing. But the
CPU looks at the wrong PGD and insists that something is wrong and
faults again. And again. And one more time…

This pagefault circle continues until the scheduler gets tired of it and
puts another task on the CPU. It gets little difficult if the task is a
RT task with a high priority. The system will either freeze or it gets
fixed by the software watchdog thread which usually runs at RT-max prio.
But waiting for the watchdog will increase the latency of the RT task
which is no good.

Fix this by disabling preemption across the critical code section.

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/mm: Improve switch_mm() barrier comments</title>
<updated>2016-08-19T07:51:05+00:00</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@kernel.org</email>
</author>
<published>2016-01-12T20:47:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=69744a2a0eceeb9212620fa43d084a12bff21857'/>
<id>69744a2a0eceeb9212620fa43d084a12bff21857</id>
<content type='text'>
commit 4eaffdd5a5fe6ff9f95e1ab4de1ac904d5e0fa8b upstream.

My previous comments were still a bit confusing and there was a
typo. Fix it up.

Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4eaffdd5a5fe6ff9f95e1ab4de1ac904d5e0fa8b upstream.

My previous comments were still a bit confusing and there was a
typo. Fix it up.

Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/mm: Add barriers and document switch_mm()-vs-flush synchronization</title>
<updated>2016-07-21T06:36:14+00:00</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@kernel.org</email>
</author>
<published>2016-01-06T20:21:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aa8f21d06e61b029341c51b17edd68ba15fe0e47'/>
<id>aa8f21d06e61b029341c51b17edd68ba15fe0e47</id>
<content type='text'>
commit 71b3c126e61177eb693423f2e18a1914205b165e upstream.

When switch_mm() activates a new PGD, it also sets a bit that
tells other CPUs that the PGD is in use so that TLB flush IPIs
will be sent.  In order for that to work correctly, the bit
needs to be visible prior to loading the PGD and therefore
starting to fill the local TLB.

Document all the barriers that make this work correctly and add
a couple that were missing.

Signed-off-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Charles (Chas) Williams &lt;ciwillia@brocade.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 71b3c126e61177eb693423f2e18a1914205b165e upstream.

When switch_mm() activates a new PGD, it also sets a bit that
tells other CPUs that the PGD is in use so that TLB flush IPIs
will be sent.  In order for that to work correctly, the bit
needs to be visible prior to loading the PGD and therefore
starting to fill the local TLB.

Document all the barriers that make this work correctly and add
a couple that were missing.

Signed-off-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Charles (Chas) Williams &lt;ciwillia@brocade.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()</title>
<updated>2016-04-11T14:44:01+00:00</updated>
<author>
<name>Dave Jones</name>
<email>davej@codemonkey.org.uk</email>
</author>
<published>2016-03-15T01:20:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f479018034abbc565fcaf93ce3e8ebf22a68fbad'/>
<id>f479018034abbc565fcaf93ce3e8ebf22a68fbad</id>
<content type='text'>
commit 7834c10313fb823e538f2772be78edcdeed2e6e3 upstream.

Since 4.4, I've been able to trigger this occasionally:

===============================
[ INFO: suspicious RCU usage. ]
4.5.0-rc7-think+ #3 Not tainted
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Link: http://lkml.kernel.org/r/20160315012054.GA17765@codemonkey.org.uk
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/3/0.

stack backtrace:
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.5.0-rc7-think+ #3
 ffffffff92f821e0 1f3e5c340597d7fc ffff880468e07f10 ffffffff92560c2a
 ffff880462145280 0000000000000001 ffff880468e07f40 ffffffff921376a6
 ffffffff93665ea0 0000cc7c876d28da 0000000000000005 ffffffff9383dd60
Call Trace:
 &lt;IRQ&gt;  [&lt;ffffffff92560c2a&gt;] dump_stack+0x67/0x9d
 [&lt;ffffffff921376a6&gt;] lockdep_rcu_suspicious+0xe6/0x100
 [&lt;ffffffff925ae7a7&gt;] do_trace_write_msr+0x127/0x1a0
 [&lt;ffffffff92061c83&gt;] native_apic_msr_eoi_write+0x23/0x30
 [&lt;ffffffff92054408&gt;] smp_trace_call_function_interrupt+0x38/0x360
 [&lt;ffffffff92d1ca60&gt;] trace_call_function_interrupt+0x90/0xa0
 &lt;EOI&gt;  [&lt;ffffffff92ac5124&gt;] ? cpuidle_enter_state+0x1b4/0x520

Move the entering_irq() call before ack_APIC_irq(), because entering_irq()
tells the RCU susbstems to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly.

Suggested-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Fixes: 4787c368a9bc "x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()"
Signed-off-by: Dave Jones &lt;davej@codemonkey.org.uk&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7834c10313fb823e538f2772be78edcdeed2e6e3 upstream.

Since 4.4, I've been able to trigger this occasionally:

===============================
[ INFO: suspicious RCU usage. ]
4.5.0-rc7-think+ #3 Not tainted
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Link: http://lkml.kernel.org/r/20160315012054.GA17765@codemonkey.org.uk
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/3/0.

stack backtrace:
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.5.0-rc7-think+ #3
 ffffffff92f821e0 1f3e5c340597d7fc ffff880468e07f10 ffffffff92560c2a
 ffff880462145280 0000000000000001 ffff880468e07f40 ffffffff921376a6
 ffffffff93665ea0 0000cc7c876d28da 0000000000000005 ffffffff9383dd60
Call Trace:
 &lt;IRQ&gt;  [&lt;ffffffff92560c2a&gt;] dump_stack+0x67/0x9d
 [&lt;ffffffff921376a6&gt;] lockdep_rcu_suspicious+0xe6/0x100
 [&lt;ffffffff925ae7a7&gt;] do_trace_write_msr+0x127/0x1a0
 [&lt;ffffffff92061c83&gt;] native_apic_msr_eoi_write+0x23/0x30
 [&lt;ffffffff92054408&gt;] smp_trace_call_function_interrupt+0x38/0x360
 [&lt;ffffffff92d1ca60&gt;] trace_call_function_interrupt+0x90/0xa0
 &lt;EOI&gt;  [&lt;ffffffff92ac5124&gt;] ? cpuidle_enter_state+0x1b4/0x520

Move the entering_irq() call before ack_APIC_irq(), because entering_irq()
tells the RCU susbstems to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly.

Suggested-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Fixes: 4787c368a9bc "x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()"
Signed-off-by: Dave Jones &lt;davej@codemonkey.org.uk&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</pre>
</div>
</content>
</entry>
</feed>
