<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/x86/kernel/static_call.c, branch v5.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>x86/static_call: Serialize __static_call_fixup() properly</title>
<updated>2022-07-12T12:23:32+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2022-07-12T12:01:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c27c753ea6fd1237f4f96abf8b623d7bab505513'/>
<id>c27c753ea6fd1237f4f96abf8b623d7bab505513</id>
<content type='text'>
__static_call_fixup() invokes __static_call_transform() without holding
text_mutex, which causes lockdep to complain in text_poke_bp().

Adding the proper locking cures that, but as this is either used during
early boot or during module finalizing, it's not required to use
text_poke_bp(). Add an argument to __static_call_transform() which tells
it to use text_poke_early() for it.

Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding")
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__static_call_fixup() invokes __static_call_transform() without holding
text_mutex, which causes lockdep to complain in text_poke_bp().

Adding the proper locking cures that, but as this is either used during
early boot or during module finalizing, it's not required to use
text_poke_bp(). Add an argument to __static_call_transform() which tells
it to use text_poke_early() for it.

Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding")
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/retbleed: Add fine grained Kconfig knobs</title>
<updated>2022-06-29T15:43:41+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-06-27T22:21:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f43b9876e857c739d407bc56df288b0ebe1a9164'/>
<id>f43b9876e857c739d407bc56df288b0ebe1a9164</id>
<content type='text'>
Do fine-grained Kconfig for all the various retbleed parts.

NOTE: if your compiler doesn't support return thunks this will
silently 'upgrade' your mitigation to IBPB, you might not like this.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do fine-grained Kconfig for all the various retbleed parts.

NOTE: if your compiler doesn't support return thunks this will
silently 'upgrade' your mitigation to IBPB, you might not like this.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86,static_call: Use alternative RET encoding</title>
<updated>2022-06-27T08:33:58+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-06-14T21:15:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ee88d363d15617ff50ac24fab0ffec11113b2aeb'/>
<id>ee88d363d15617ff50ac24fab0ffec11113b2aeb</id>
<content type='text'>
In addition to teaching static_call about the new way to spell 'RET',
there is an added complication in that static_call() is allowed to
rewrite text before it is known which particular spelling is required.

In order to deal with this; have a static_call specific fixup in the
apply_return() 'alternative' patching routine that will rewrite the
static_call trampoline to match the definite sequence.

This in turn creates the problem of uniquely identifying static call
trampolines. Currently trampolines are 8 bytes, the first 5 being the
jmp.d32/ret sequence and the final 3 a byte sequence that spells out
'SCT'.

This sequence is used in __static_call_validate() to ensure it is
patching a trampoline and not a random other jmp.d32. That is,
false-positives shouldn't be plenty, but aren't a big concern.

OTOH the new __static_call_fixup() must not have false-positives, and
'SCT' decodes to the somewhat weird but semi plausible sequence:

  push %rbx
  rex.XB push %r12

Additionally, there are SLS concerns with immediate jumps. Combined it
seems like a good moment to change the signature to a single 3 byte
trap instruction that is unique to this usage and will not ever get
generated by accident.

As such, change the signature to: '0x0f, 0xb9, 0xcc', which decodes
to:

  ud1 %esp, %ecx

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Josh Poimboeuf &lt;jpoimboe@kernel.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In addition to teaching static_call about the new way to spell 'RET',
there is an added complication in that static_call() is allowed to
rewrite text before it is known which particular spelling is required.

In order to deal with this; have a static_call specific fixup in the
apply_return() 'alternative' patching routine that will rewrite the
static_call trampoline to match the definite sequence.

This in turn creates the problem of uniquely identifying static call
trampolines. Currently trampolines are 8 bytes, the first 5 being the
jmp.d32/ret sequence and the final 3 a byte sequence that spells out
'SCT'.

This sequence is used in __static_call_validate() to ensure it is
patching a trampoline and not a random other jmp.d32. That is,
false-positives shouldn't be plenty, but aren't a big concern.

OTOH the new __static_call_fixup() must not have false-positives, and
'SCT' decodes to the somewhat weird but semi plausible sequence:

  push %rbx
  rex.XB push %r12

Additionally, there are SLS concerns with immediate jumps. Combined it
seems like a good moment to change the signature to a single 3 byte
trap instruction that is unique to this usage and will not ever get
generated by accident.

As such, change the signature to: '0x0f, 0xb9, 0xcc', which decodes
to:

  ud1 %esp, %ecx

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Josh Poimboeuf &lt;jpoimboe@kernel.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86,static_call: Fix __static_call_return0 for i386</title>
<updated>2022-04-05T07:59:37+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-03-18T20:24:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1cd5f059d956e6f614ba6666ecdbcf95db05d5f5'/>
<id>1cd5f059d956e6f614ba6666ecdbcf95db05d5f5</id>
<content type='text'>
Paolo reported that the instruction sequence that is used to replace:

    call __static_call_return0

namely:

    66 66 48 31 c0	data16 data16 xor %rax,%rax

decodes to something else on i386, namely:

    66 66 48		data16 dec %ax
    31 c0		xor    %eax,%eax

Which is a nonsensical sequence that happens to have the same outcome.
*However* an important distinction is that it consists of 2
instructions which is a problem when the thing needs to be overwriten
with a regular call instruction again.

As such, replace the instruction with something that decodes the same
on both i386 and x86_64.

Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()")
Reported-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20220318204419.GT8939@worktop.programming.kicks-ass.net
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Paolo reported that the instruction sequence that is used to replace:

    call __static_call_return0

namely:

    66 66 48 31 c0	data16 data16 xor %rax,%rax

decodes to something else on i386, namely:

    66 66 48		data16 dec %ax
    31 c0		xor    %eax,%eax

Which is a nonsensical sequence that happens to have the same outcome.
*However* an important distinction is that it consists of 2
instructions which is a problem when the thing needs to be overwriten
with a regular call instruction again.

As such, replace the instruction with something that decodes the same
on both i386 and x86_64.

Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()")
Reported-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20220318204419.GT8939@worktop.programming.kicks-ass.net
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Add straight-line-speculation mitigation</title>
<updated>2021-12-09T12:32:25+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-12-04T13:43:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e463a09af2f0677b9485a7e8e4e70b396b2ffb6f'/>
<id>e463a09af2f0677b9485a7e8e4e70b396b2ffb6f</id>
<content type='text'>
Make use of an upcoming GCC feature to mitigate
straight-line-speculation for x86:

  https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
  https://bugs.llvm.org/show_bug.cgi?id=52323

It's built tested on x86_64-allyesconfig using GCC-12 and GCC-11.

Maintenance overhead of this should be fairly low due to objtool
validation.

Size overhead of all these additional int3 instructions comes to:

     text	   data	    bss	    dec	    hex	filename
  22267751	6933356	2011368	31212475	1dc43bb	defconfig-build/vmlinux
  22804126	6933356	1470696	31208178	1dc32f2	defconfig-build/vmlinux.sls

Or roughly 2.4% additional text.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20211204134908.140103474@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make use of an upcoming GCC feature to mitigate
straight-line-speculation for x86:

  https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
  https://bugs.llvm.org/show_bug.cgi?id=52323

It's built tested on x86_64-allyesconfig using GCC-12 and GCC-11.

Maintenance overhead of this should be fairly low due to objtool
validation.

Size overhead of all these additional int3 instructions comes to:

     text	   data	    bss	    dec	    hex	filename
  22267751	6933356	2011368	31212475	1dc43bb	defconfig-build/vmlinux
  22804126	6933356	1470696	31208178	1dc32f2	defconfig-build/vmlinux.sls

Or roughly 2.4% additional text.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20211204134908.140103474@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>static_call,x86: Robustify trampoline patching</title>
<updated>2021-11-11T12:09:31+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-10-30T07:47:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2105a92748e83e2e3ee6be539da959706bbb3898'/>
<id>2105a92748e83e2e3ee6be539da959706bbb3898</id>
<content type='text'>
Add a few signature bytes after the static call trampoline and verify
those bytes match before patching the trampoline. This avoids patching
random other JMPs (such as CFI jump-table entries) instead.

These bytes decode as:

   d:   53                      push   %rbx
   e:   43 54                   rex.XB push %r12

And happen to spell "SCT".

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20211030074758.GT174703@worktop.programming.kicks-ass.net
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a few signature bytes after the static call trampoline and verify
those bytes match before patching the trampoline. This avoids patching
random other JMPs (such as CFI jump-table entries) instead.

These bytes decode as:

   d:   53                      push   %rbx
   e:   43 54                   rex.XB push %r12

And happen to spell "SCT".

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20211030074758.GT174703@worktop.programming.kicks-ass.net
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Remove dynamic NOP selection</title>
<updated>2021-03-15T15:24:59+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-03-12T11:32:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a89dfde3dc3c2dbf56910af75e2d8b11ec5308f6'/>
<id>a89dfde3dc3c2dbf56910af75e2d8b11ec5308f6</id>
<content type='text'>
This ensures that a NOP is a NOP and not a random other instruction that
is also a NOP. It allows simplification of dynamic code patching that
wants to verify existing code before writing new instructions (ftrace,
jump_label, static_call, etc..).

Differentiating on NOPs is not a feature.

This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs (CARELESS).
32bit is not a performance target.

Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is
fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL
being required for x86_64, all x86_64 CPUs can use NOPL. So stop
caring about NOPs, simplify things and get on with life.

[ The problem seems to be that some uarchs can only decode NOPL on a
single front-end port while others have severe decode penalties for
excessive prefixes. All modern uarchs can handle both, except Atom,
which has prefix penalties. ]

[ Also, much doubt you can actually measure any of this on normal
workloads. ]

After this, FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI.

 [ bp: Kernel build measurements showed ~0.3s slowdown on Sandybridge
   which is hardly a slowdown. Get rid of X86_FEATURE_K7, while at it. ]

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Acked-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt; # bpf
Acked-by: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Link: https://lkml.kernel.org/r/20210312115749.065275711@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This ensures that a NOP is a NOP and not a random other instruction that
is also a NOP. It allows simplification of dynamic code patching that
wants to verify existing code before writing new instructions (ftrace,
jump_label, static_call, etc..).

Differentiating on NOPs is not a feature.

This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs (CARELESS).
32bit is not a performance target.

Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is
fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL
being required for x86_64, all x86_64 CPUs can use NOPL. So stop
caring about NOPs, simplify things and get on with life.

[ The problem seems to be that some uarchs can only decode NOPL on a
single front-end port while others have severe decode penalties for
excessive prefixes. All modern uarchs can handle both, except Atom,
which has prefix penalties. ]

[ Also, much doubt you can actually measure any of this on normal
workloads. ]

After this, FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI.

 [ bp: Kernel build measurements showed ~0.3s slowdown on Sandybridge
   which is hardly a slowdown. Get rid of X86_FEATURE_K7, while at it. ]

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Acked-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt; # bpf
Acked-by: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Link: https://lkml.kernel.org/r/20210312115749.065275711@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>static_call/x86: Add __static_call_return0()</title>
<updated>2021-02-17T13:08:43+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-01-18T14:12:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3f2a8fc4b15de18644e8a80a09edda168676e22c'/>
<id>3f2a8fc4b15de18644e8a80a09edda168676e22c</id>
<content type='text'>
Provide a stub function that return 0 and wire up the static call site
patching to replace the CALL with a single 5 byte instruction that
clears %RAX, the return value register.

The function can be cast to any function pointer type that has a
single %RAX return (including pointers). Also provide a version that
returns an int for convenience. We are clearing the entire %RAX register
in any case, whether the return value is 32 or 64 bits, since %RAX is
always a scratch register anyway.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20210118141223.123667-2-frederic@kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Provide a stub function that return 0 and wire up the static call site
patching to replace the CALL with a single 5 byte instruction that
clears %RAX, the return value register.

The function can be cast to any function pointer type that has a
single %RAX return (including pointers). Also provide a version that
returns an int for convenience. We are clearing the entire %RAX register
in any case, whether the return value is 32 or 64 bits, since %RAX is
always a scratch register anyway.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20210118141223.123667-2-frederic@kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>static_call: Allow early init</title>
<updated>2020-09-01T07:58:06+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-08-18T13:57:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a945c8345ec0decb2f1a7f19a8c5e60bcb1dd1eb'/>
<id>a945c8345ec0decb2f1a7f19a8c5e60bcb1dd1eb</id>
<content type='text'>
In order to use static_call() to wire up x86_pmu, we need to
initialize earlier, specifically before memory allocation works; copy
some of the tricks from jump_label to enable this.

Primarily we overload key-&gt;next to store a sites pointer when there
are no modules, this avoids having to use kmalloc() to initialize the
sites and allows us to run much earlier.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Link: https://lore.kernel.org/r/20200818135805.220737930@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In order to use static_call() to wire up x86_pmu, we need to
initialize earlier, specifically before memory allocation works; copy
some of the tricks from jump_label to enable this.

Primarily we overload key-&gt;next to store a sites pointer when there
are no modules, this avoids having to use kmalloc() to initialize the
sites and allows us to run much earlier.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Link: https://lore.kernel.org/r/20200818135805.220737930@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>static_call: Add some validation</title>
<updated>2020-09-01T07:58:06+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-08-18T13:57:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6c3fce794e9d2a5ce3a948962d0808a459c40a84'/>
<id>6c3fce794e9d2a5ce3a948962d0808a459c40a84</id>
<content type='text'>
Verify the text we're about to change is as we expect it to be.

Requested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20200818135805.161974981@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Verify the text we're about to change is as we expect it to be.

Requested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20200818135805.161974981@infradead.org
</pre>
</div>
</content>
</entry>
</feed>
