<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/arch/x86/include/asm, branch linux-3.19.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance</title>
<updated>2015-05-06T20:01:39+00:00</updated>
<author>
<name>Len Brown</name>
<email>len.brown@intel.com</email>
</author>
<published>2014-01-15T05:37:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0e126cfab3928e8b01e17e1e707eec2bfdb4f1c2'/>
<id>0e126cfab3928e8b01e17e1e707eec2bfdb4f1c2</id>
<content type='text'>
commit b253149b843f89cd300cbdbea27ce1f847506f99 upstream.

In Linux-3.9 we removed the mwait_idle() loop:

  69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:

 1. Certain Core2-era machines support MWAIT-C1 and HALT only.
    MWAIT-C1 is preferred for optimal power and performance.
    But if they support just C1, cpuidle never loads and
    so they use the boot-time default idle loop forever.

 2. Some laptops will boot-hang if HALT is used,
    but will boot successfully if MWAIT is used.
    This appears to be a hidden assumption in BIOS SMI,
    that is presumably valid on the proprietary OS
    where the BIOS was validated.

       https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop.  However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:

  For 3.9, simply revert 69fb3676df
  for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in
  context For 3.11, 3.12, 3.13, this patch applies cleanly

Tested-by: Mike Galbraith &lt;bitbucket@online.de&gt;
Signed-off-by: Len Brown &lt;len.brown@intel.com&gt;
Acked-by: Mike Galbraith &lt;bitbucket@online.de&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Ian Malone &lt;ibmalone@gmail.com&gt;
Cc: Josh Boyer &lt;jwboyer@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b253149b843f89cd300cbdbea27ce1f847506f99 upstream.

In Linux-3.9 we removed the mwait_idle() loop:

  69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:

 1. Certain Core2-era machines support MWAIT-C1 and HALT only.
    MWAIT-C1 is preferred for optimal power and performance.
    But if they support just C1, cpuidle never loads and
    so they use the boot-time default idle loop forever.

 2. Some laptops will boot-hang if HALT is used,
    but will boot successfully if MWAIT is used.
    This appears to be a hidden assumption in BIOS SMI,
    that is presumably valid on the proprietary OS
    where the BIOS was validated.

       https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop.  However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:

  For 3.9, simply revert 69fb3676df
  for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in
  context For 3.11, 3.12, 3.13, this patch applies cleanly

Tested-by: Mike Galbraith &lt;bitbucket@online.de&gt;
Signed-off-by: Len Brown &lt;len.brown@intel.com&gt;
Acked-by: Mike Galbraith &lt;bitbucket@online.de&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Ian Malone &lt;ibmalone@gmail.com&gt;
Cc: Josh Boyer &lt;jwboyer@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/asm/decoder: Fix and enforce max instruction size in the insn decoder</title>
<updated>2015-05-06T20:01:39+00:00</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2015-01-28T00:06:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=30d7277d02c5073cd9e834b812ff996be202be00'/>
<id>30d7277d02c5073cd9e834b812ff996be202be00</id>
<content type='text'>
commit 91e5ed49fca09c2b83b262b9757d1376ee2b46c3 upstream.

x86 instructions cannot exceed 15 bytes, and the instruction
decoder should enforce that.  Prior to 6ba48ff46f76, the
instruction length limit was implicitly set to 16, which was an
approximation of 15, but there is currently no limit at all.

Fix MAX_INSN_SIZE (it should be 15, not 16), and fix the decoder
to reject instructions that exceed MAX_INSN_SIZE.

Other than potentially confusing some of the decoder sanity
checks, I'm not aware of any actual problems that omitting this
check would cause, nor am I aware of any practical problems
caused by the MAX_INSN_SIZE error.

Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Acked-by: Masami Hiramatsu &lt;masami.hiramatsu.pt@hitachi.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Fixes: 6ba48ff46f76 ("x86: Remove arbitrary instruction size limit ...
Link: http://lkml.kernel.org/r/f8f0bc9b8c58cfd6830f7d88400bf1396cbdcd0f.1422403511.git.luto@amacapital.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 91e5ed49fca09c2b83b262b9757d1376ee2b46c3 upstream.

x86 instructions cannot exceed 15 bytes, and the instruction
decoder should enforce that.  Prior to 6ba48ff46f76, the
instruction length limit was implicitly set to 16, which was an
approximation of 15, but there is currently no limit at all.

Fix MAX_INSN_SIZE (it should be 15, not 16), and fix the decoder
to reject instructions that exceed MAX_INSN_SIZE.

Other than potentially confusing some of the decoder sanity
checks, I'm not aware of any actual problems that omitting this
check would cause, nor am I aware of any practical problems
caused by the MAX_INSN_SIZE error.

Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Acked-by: Masami Hiramatsu &lt;masami.hiramatsu.pt@hitachi.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Fixes: 6ba48ff46f76 ("x86: Remove arbitrary instruction size limit ...
Link: http://lkml.kernel.org/r/f8f0bc9b8c58cfd6830f7d88400bf1396cbdcd0f.1422403511.git.luto@amacapital.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/fpu: Drop_fpu() should not assume that tsk equals current</title>
<updated>2015-03-26T12:59:47+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-03-13T08:53:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4acde8bf03d0aeb901cf46c4babefb45aec92f76'/>
<id>4acde8bf03d0aeb901cf46c4babefb45aec92f76</id>
<content type='text'>
commit f4c3686386393c120710dd34df2a74183ab805fd upstream.

drop_fpu() does clear_used_math() and usually this is correct
because tsk == current.

However switch_fpu_finish()-&gt;restore_fpu_checking() is called before
__switch_to() updates the "current_task" variable. If it fails,
we will wrongly clear the PF_USED_MATH flag of the previous task.

So use clear_stopped_child_used_math() instead.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Pekka Riikonen &lt;priikone@iki.fi&gt;
Cc: Quentin Casasnovas &lt;quentin.casasnovas@oracle.com&gt;
Cc: Suresh Siddha &lt;sbsiddha@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f4c3686386393c120710dd34df2a74183ab805fd upstream.

drop_fpu() does clear_used_math() and usually this is correct
because tsk == current.

However switch_fpu_finish()-&gt;restore_fpu_checking() is called before
__switch_to() updates the "current_task" variable. If it fails,
we will wrongly clear the PF_USED_MATH flag of the previous task.

So use clear_stopped_child_used_math() instead.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Pekka Riikonen &lt;priikone@iki.fi&gt;
Cc: Quentin Casasnovas &lt;quentin.casasnovas@oracle.com&gt;
Cc: Suresh Siddha &lt;sbsiddha@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/fpu/xsaves: Fix improper uses of __ex_table</title>
<updated>2015-03-18T13:10:57+00:00</updated>
<author>
<name>Quentin Casasnovas</name>
<email>quentin.casasnovas@oracle.com</email>
</author>
<published>2015-03-05T12:19:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8b6055fc682c2cbbdca5ff3a71af663af986f0e6'/>
<id>8b6055fc682c2cbbdca5ff3a71af663af986f0e6</id>
<content type='text'>
commit 06c8173eb92bbfc03a0fe8bb64315857d0badd06 upstream.

Commit:

  f31a9f7c7169 ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")

introduced alternative instructions for XSAVES/XRSTORS and commit:

  adb9d526e982 ("x86/xsaves: Add xsaves and xrstors support for booting time")

added support for the XSAVES/XRSTORS instructions at boot time.

Unfortunately both failed to properly protect them against faulting:

The 'xstate_fault' macro will use the closest label named '1'
backward and that ends up in the .altinstr_replacement section
rather than in .text. This means that the kernel will never find
in the __ex_table the .text address where this instruction might
fault, leading to serious problems if userspace manages to
trigger the fault.

Signed-off-by: Quentin Casasnovas &lt;quentin.casasnovas@oracle.com&gt;
Signed-off-by: Jamie Iles &lt;jamie.iles@oracle.com&gt;
[ Improved the changelog, fixed some whitespace noise. ]
Acked-by: Borislav Petkov &lt;bp@alien8.de&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Allan Xavier &lt;mr.a.xavier@gmail.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: adb9d526e982 ("x86/xsaves: Add xsaves and xrstors support for booting time")
Fixes: f31a9f7c7169 ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 06c8173eb92bbfc03a0fe8bb64315857d0badd06 upstream.

Commit:

  f31a9f7c7169 ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")

introduced alternative instructions for XSAVES/XRSTORS and commit:

  adb9d526e982 ("x86/xsaves: Add xsaves and xrstors support for booting time")

added support for the XSAVES/XRSTORS instructions at boot time.

Unfortunately both failed to properly protect them against faulting:

The 'xstate_fault' macro will use the closest label named '1'
backward and that ends up in the .altinstr_replacement section
rather than in .text. This means that the kernel will never find
in the __ex_table the .text address where this instruction might
fault, leading to serious problems if userspace manages to
trigger the fault.

Signed-off-by: Quentin Casasnovas &lt;quentin.casasnovas@oracle.com&gt;
Signed-off-by: Jamie Iles &lt;jamie.iles@oracle.com&gt;
[ Improved the changelog, fixed some whitespace noise. ]
Acked-by: Borislav Petkov &lt;bp@alien8.de&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Allan Xavier &lt;mr.a.xavier@gmail.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: adb9d526e982 ("x86/xsaves: Add xsaves and xrstors support for booting time")
Fixes: f31a9f7c7169 ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/spinlocks/paravirt: Fix memory corruption on unlock</title>
<updated>2015-03-06T22:57:42+00:00</updated>
<author>
<name>Raghavendra K T</name>
<email>raghavendra.kt@linux.vnet.ibm.com</email>
</author>
<published>2015-02-06T11:14:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dcc2e51a1fc91c2a631ed3ea5afed22c6aa13728'/>
<id>dcc2e51a1fc91c2a631ed3ea5afed22c6aa13728</id>
<content type='text'>
commit d6abfdb2022368d8c6c4be3f11a06656601a6cc2 upstream.

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:

                prev = *lock;
                add_smp(&amp;lock-&gt;tickets.head, TICKET_LOCK_INC);

                /* add_smp() is a full mb() */

                if (unlikely(lock-&gt;tickets.tail &amp; TICKET_SLOWPATH_FLAG))
                        __ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:

 1. Moving slowpath flag to head (Oleg):
    Unlocked locks don't care about the slowpath flag; therefore we can keep
    it set after the last unlock, and clear it again on the first (try)lock.
    -- this removes the write after unlock. note that keeping slowpath flag would
    result in unnecessary kicks.
    By moving the slowpath flag from the tail to the head ticket we also avoid
    the need to access both the head and tail tickets on unlock.

 2. use xadd to avoid read/write after unlock that checks the need for
    unlock_kick (Linus):
    We further avoid the need for a read-after-release by using xadd;
    the prev head value will include the slowpath flag and indicate if we
    need to do PV kicking of suspended spinners -- on modern chips xadd
    isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark overcommit %improve
 kernbench  1x           -0.13
 kernbench  2x            0.02
 dbench     1x           -1.77
 dbench     2x           -0.63

[Jeremy: Hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moved slowpath flag to head, ticket_equals idea]
[PeterZ: Added detailed changelog]

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrew Jones &lt;drjones@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Fernando Luis Vázquez Cao &lt;fernando_b1@lab.ntt.co.jp&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Masami Hiramatsu &lt;masami.hiramatsu.pt@hitachi.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Cc: Waiman Long &lt;Waiman.Long@hp.com&gt;
Cc: a.ryabinin@samsung.com
Cc: dave@stgolabs.net
Cc: hpa@zytor.com
Cc: jasowang@redhat.com
Cc: jeremy@goop.org
Cc: paul.gortmaker@windriver.com
Cc: riel@redhat.com
Cc: tglx@linutronix.de
Cc: waiman.long@hp.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20150215173043.GA7471@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d6abfdb2022368d8c6c4be3f11a06656601a6cc2 upstream.

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:

                prev = *lock;
                add_smp(&amp;lock-&gt;tickets.head, TICKET_LOCK_INC);

                /* add_smp() is a full mb() */

                if (unlikely(lock-&gt;tickets.tail &amp; TICKET_SLOWPATH_FLAG))
                        __ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:

 1. Moving slowpath flag to head (Oleg):
    Unlocked locks don't care about the slowpath flag; therefore we can keep
    it set after the last unlock, and clear it again on the first (try)lock.
    -- this removes the write after unlock. note that keeping slowpath flag would
    result in unnecessary kicks.
    By moving the slowpath flag from the tail to the head ticket we also avoid
    the need to access both the head and tail tickets on unlock.

 2. use xadd to avoid read/write after unlock that checks the need for
    unlock_kick (Linus):
    We further avoid the need for a read-after-release by using xadd;
    the prev head value will include the slowpath flag and indicate if we
    need to do PV kicking of suspended spinners -- on modern chips xadd
    isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark overcommit %improve
 kernbench  1x           -0.13
 kernbench  2x            0.02
 dbench     1x           -1.77
 dbench     2x           -0.63

[Jeremy: Hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moved slowpath flag to head, ticket_equals idea]
[PeterZ: Added detailed changelog]

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrew Jones &lt;drjones@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Fernando Luis Vázquez Cao &lt;fernando_b1@lab.ntt.co.jp&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Masami Hiramatsu &lt;masami.hiramatsu.pt@hitachi.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Cc: Waiman Long &lt;Waiman.Long@hp.com&gt;
Cc: a.ryabinin@samsung.com
Cc: dave@stgolabs.net
Cc: hpa@zytor.com
Cc: jasowang@redhat.com
Cc: jeremy@goop.org
Cc: paul.gortmaker@windriver.com
Cc: riel@redhat.com
Cc: tglx@linutronix.de
Cc: waiman.long@hp.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20150215173043.GA7471@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86, tls: Interpret an all-zero struct user_desc as "no segment"</title>
<updated>2015-01-22T20:45:07+00:00</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2015-01-22T19:27:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3669ef9fa7d35f573ec9c0e0341b29251c2734a7'/>
<id>3669ef9fa7d35f573ec9c0e0341b29251c2734a7</id>
<content type='text'>
The Witcher 2 did something like this to allocate a TLS segment index:

        struct user_desc u_info;
        bzero(&amp;u_info, sizeof(u_info));
        u_info.entry_number = (uint32_t)-1;

        syscall(SYS_set_thread_area, &amp;u_info);

Strictly speaking, this code was never correct.  It should have set
read_exec_only and seg_not_present to 1 to indicate that it wanted
to find a free slot without putting anything there, or it should
have put something sensible in the TLS slot if it wanted to allocate
a TLS entry for real.  The actual effect of this code was to
allocate a bogus segment that could be used to exploit espfix.

The set_thread_area hardening patches changed the behavior, causing
set_thread_area to return -EINVAL and crashing the game.

This changes set_thread_area to interpret this as a request to find
a free slot and to leave it empty, which isn't *quite* what the game
expects but should be close enough to keep it working.  In
particular, using the code above to allocate two segments will
allocate the same segment both times.

According to FrostbittenKing on Github, this fixes The Witcher 2.

If this somehow still causes problems, we could instead allocate
a limit==0 32-bit data segment, but that seems rather ugly to me.

Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix
Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: stable@vger.kernel.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The Witcher 2 did something like this to allocate a TLS segment index:

        struct user_desc u_info;
        bzero(&amp;u_info, sizeof(u_info));
        u_info.entry_number = (uint32_t)-1;

        syscall(SYS_set_thread_area, &amp;u_info);

Strictly speaking, this code was never correct.  It should have set
read_exec_only and seg_not_present to 1 to indicate that it wanted
to find a free slot without putting anything there, or it should
have put something sensible in the TLS slot if it wanted to allocate
a TLS entry for real.  The actual effect of this code was to
allocate a bogus segment that could be used to exploit espfix.

The set_thread_area hardening patches changed the behavior, causing
set_thread_area to return -EINVAL and crashing the game.

This changes set_thread_area to interpret this as a request to find
a free slot and to leave it empty, which isn't *quite* what the game
expects but should be close enough to keep it working.  In
particular, using the code above to allocate two segments will
allocate the same segment both times.

According to FrostbittenKing on Github, this fixes The Witcher 2.

If this somehow still causes problems, we could instead allocate
a limit==0 32-bit data segment, but that seems rather ugly to me.

Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix
Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: stable@vger.kernel.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, tls, ldt: Stop checking lm in LDT_empty</title>
<updated>2015-01-22T20:11:06+00:00</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2015-01-22T19:27:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e30ab185c490e9a9381385529e0fd32f0a399495'/>
<id>e30ab185c490e9a9381385529e0fd32f0a399495</id>
<content type='text'>
32-bit programs don't have an lm bit in their ABI, so they can't
reliably cause LDT_empty to return true without resorting to memset.
They shouldn't need to do this.

This should fix a longstanding, if minor, issue in all 64-bit kernels
as well as a potential regression in the TLS hardening code.

Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
32-bit programs don't have an lm bit in their ABI, so they can't
reliably cause LDT_empty to return true without resorting to memset.
They shouldn't need to do this.

This should fix a longstanding, if minor, issue in all 64-bit kernels
as well as a potential regression in the TLS hardening code.

Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, mpx: Fix potential performance issue on unmaps</title>
<updated>2015-01-22T20:11:06+00:00</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@linux.intel.com</email>
</author>
<published>2015-01-08T22:30:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c922228efeeefa32e57f875764bfa6ca8053a68a'/>
<id>c922228efeeefa32e57f875764bfa6ca8053a68a</id>
<content type='text'>
The 3.19 merge window saw some TLB modifications merged which caused a
performance regression. They were fixed in commit 045bbb9fa.

Once that fix was applied, I also noticed that there was a small
but intermittent regression still present.  It was not present
consistently enough to bisect reliably, but I'm fairly confident
that it came from (my own) MPX patches.  The source was reading
a relatively unused field in the mm_struct via arch_unmap.

I also noted that this code was in the main instruction flow of
do_munmap() and probably had more icache impact than we want.

This patch does two things:
1. Adds a static (via Kconfig) and dynamic (via cpuid) check
   for MPX with cpu_feature_enabled().  This keeps us from
   reading that cacheline in the mm and trades it for a check
   of the global CPUID variables at least on CPUs without MPX.
2. Adds an unlikely() to ensure that the MPX call ends up out
   of the main instruction flow in do_munmap().  I've added
   a detailed comment about why this was done and why we want
   it even on systems where MPX is present.

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: luto@amacapital.net
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The 3.19 merge window saw some TLB modifications merged which caused a
performance regression. They were fixed in commit 045bbb9fa.

Once that fix was applied, I also noticed that there was a small
but intermittent regression still present.  It was not present
consistently enough to bisect reliably, but I'm fairly confident
that it came from (my own) MPX patches.  The source was reading
a relatively unused field in the mm_struct via arch_unmap.

I also noted that this code was in the main instruction flow of
do_munmap() and probably had more icache impact than we want.

This patch does two things:
1. Adds a static (via Kconfig) and dynamic (via cpuid) check
   for MPX with cpu_feature_enabled().  This keeps us from
   reading that cacheline in the mm and trades it for a check
   of the global CPUID variables at least on CPUs without MPX.
2. Adds an unlikely() to ensure that the MPX call ends up out
   of the main instruction flow in do_munmap().  I've added
   a detailed comment about why this was done and why we want
   it even on systems where MPX is present.

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: luto@amacapital.net
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi</title>
<updated>2015-01-20T10:44:41+00:00</updated>
<author>
<name>Jiang Liu</name>
<email>jiang.liu@linux.intel.com</email>
</author>
<published>2015-01-20T02:21:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8abb850a03a3a8b11a0e92949e5b99d9cc178e35'/>
<id>8abb850a03a3a8b11a0e92949e5b99d9cc178e35</id>
<content type='text'>
Xen overrides __acpi_register_gsi and leaves __acpi_unregister_gsi as is.
That means, an IRQ allocated by acpi_register_gsi_xen_hvm() or
acpi_register_gsi_xen() will be freed by acpi_unregister_gsi_ioapic(),
which may cause undesired effects. So override __acpi_unregister_gsi to
NULL for safety.

Signed-off-by: Jiang Liu &lt;jiang.liu@linux.intel.com&gt;
Tested-by: Sander Eikelenboom &lt;linux@eikelenboom.it&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: xen-devel@lists.xenproject.org
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Graeme Gregory &lt;graeme.gregory@linaro.org&gt;
Cc: Lv Zheng &lt;lv.zheng@intel.com&gt;
Link: http://lkml.kernel.org/r/1421720467-7709-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Xen overrides __acpi_register_gsi and leaves __acpi_unregister_gsi as is.
That means, an IRQ allocated by acpi_register_gsi_xen_hvm() or
acpi_register_gsi_xen() will be freed by acpi_unregister_gsi_ioapic(),
which may cause undesired effects. So override __acpi_unregister_gsi to
NULL for safety.

Signed-off-by: Jiang Liu &lt;jiang.liu@linux.intel.com&gt;
Tested-by: Sander Eikelenboom &lt;linux@eikelenboom.it&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: xen-devel@lists.xenproject.org
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Graeme Gregory &lt;graeme.gregory@linaro.org&gt;
Cc: Lv Zheng &lt;lv.zheng@intel.com&gt;
Link: http://lkml.kernel.org/r/1421720467-7709-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'pr-20141223-x86-vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/urgent</title>
<updated>2015-01-01T21:21:22+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2015-01-01T21:21:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2aba73a6146bb85d4a42386ca41dec0f4aa4b3ad'/>
<id>2aba73a6146bb85d4a42386ca41dec0f4aa4b3ad</id>
<content type='text'>
Pull VDSO fix from Andy Lutomirski:

 "This is hopefully the last vdso fix for 3.19.  It should be very
  safe (it just adds a volatile).

  I don't think it fixes an actual bug (the __getcpu calls in the
  pvclock code may not have been needed in the first place), but
  discussion on that point is ongoing.

  It also fixes a big performance issue in 3.18 and earlier in which
  the lsl instructions in vclock_gettime got hoisted so far up the
  function that they happened even when the function they were in was
  never called.  n 3.19, the performance issue seems to be gone due to
  the whims of my compiler and some interaction with a branch that's
  now gone.

  I'll hopefully have a much bigger overhaul of the pvclock code
  for 3.20, but it needs careful review."

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull VDSO fix from Andy Lutomirski:

 "This is hopefully the last vdso fix for 3.19.  It should be very
  safe (it just adds a volatile).

  I don't think it fixes an actual bug (the __getcpu calls in the
  pvclock code may not have been needed in the first place), but
  discussion on that point is ongoing.

  It also fixes a big performance issue in 3.18 and earlier in which
  the lsl instructions in vclock_gettime got hoisted so far up the
  function that they happened even when the function they were in was
  never called.  n 3.19, the performance issue seems to be gone due to
  the whims of my compiler and some interaction with a branch that's
  now gone.

  I'll hopefully have a much bigger overhaul of the pvclock code
  for 3.20, but it needs careful review."

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
