linux.git/arch/x86/kernel/vmlinux.lds.S, branch v5.1

x86/build/lto: Fix truncated .bss with -fdata-sections

2019-04-16T06:20:55+00:00

With CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y, we compile the kernel with
-fdata-sections, which also splits the .bss section.

The new section, with a new .bss.* name, which pattern gets missed by the
main x86 linker script which only expects the '.bss' name. This results
in the discarding of the second part and a too small, truncated .bss
section and an unhappy, non-working kernel.

Use the common BSS_MAIN macro in the linker script to properly capture
and merge all the generated BSS sections.

Signed-off-by: Sami Tolvanen 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Kees Cook 
Cc: Borislav Petkov 
Cc: Kees Cook 
Cc: Linus Torvalds 
Cc: Nicholas Piggin 
Cc: Nick Desaulniers 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20190415164956.124067-1-samitolvanen@google.com
[ Extended the changelog. ]
Signed-off-by: Ingo Molnar

x86/build: Use the single-argument OUTPUT_FORMAT() linker script command

2019-01-16T11:21:53+00:00

The various x86 linker scripts use the three-argument linker script
command variant OUTPUT_FORMAT(DEFAULT, BIG, LITTLE) which specifies
three object file formats when the -EL and -EB linker command line
options are used. When -EB is specified, OUTPUT_FORMAT issues the BIG
object file format, when -EL, LITTLE, respectively, and when neither is
specified, DEFAULT.

However, those -E[LB] options are not used by arch/x86/ so switch to the
simple OUTPUT_FORMAT(BFDNAME) macro variant.

No functional changes.

Signed-off-by: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190109181531.27513-1-bp@alien8.de

x86/build: Mark per-CPU symbols as absolute explicitly for LLD

2019-01-09T10:22:35+00:00

Accessing per-CPU variables is done by finding the offset of the
variable in the per-CPU block and adding it to the address of the
respective CPU's block.

Section 3.10.8 of ld.bfd's documentation states:

  For expressions involving numbers, relative addresses and absolute
  addresses, ld follows these rules to evaluate terms:

  Other binary operations, that is, between two relative addresses
  not in the same section, or between a relative address and an
  absolute address, first convert any non-absolute term to an
  absolute address before applying the operator."

Note that LLVM's linker does not adhere to the GNU ld's implementation
and as such requires implicitly-absolute terms to be explicitly marked
as absolute in the linker script. If not, it fails currently with:

  ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute
  ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute
  Makefile:1040: recipe for target 'vmlinux' failed

This is not a functional change for ld.bfd which converts the term to an
absolute symbol anyways as specified above.

Based on a previous submission by Tri Vo .

Reported-by: Dmitry Golovin 
Signed-off-by: Rafael Ávila de Espíndola 
[ Update commit message per Boris' and Michael's suggestions. ]
Signed-off-by: Nick Desaulniers 
[ Massage commit message more, fix typos. ]
Signed-off-by: Borislav Petkov 
Tested-by: Dmitry Golovin 
Cc: "H. Peter Anvin" 
Cc: Andy Lutomirski 
Cc: Brijesh Singh 
Cc: Cao Jin 
Cc: Ingo Molnar 
Cc: Joerg Roedel 
Cc: Masahiro Yamada 
Cc: Masami Hiramatsu 
Cc: Thomas Gleixner 
Cc: Tri Vo 
Cc: dima@golovin.in
Cc: morbo@google.com
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.com

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2018-10-23T17:43:04+00:00

Pull x86 pti updates from Ingo Molnar:
 "The main changes:

   - Make the IBPB barrier more strict and add STIBP support (Jiri
     Kosina)

   - Micro-optimize and clean up the entry code (Andy Lutomirski)

   - ... plus misc other fixes"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Propagate information about RSB filling mitigation to sysfs
  x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
  x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
  x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
  x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION
  x86/pti/64: Remove the SYSCALL64 entry trampoline
  x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space
  x86/entry/64: Document idtentry

x86/mm: Add .bss..decrypted section to hold shared variables

2018-09-15T18:48:45+00:00

kvmclock defines few static variables which are shared with the
hypervisor during the kvmclock initialization.

When SEV is active, memory is encrypted with a guest-specific key, and
if the guest OS wants to share the memory region with the hypervisor
then it must clear the C-bit before sharing it.

Currently, we use kernel_physical_mapping_init() to split large pages
before clearing the C-bit on shared pages. But it fails when called from
the kvmclock initialization (mainly because the memblock allocator is
not ready that early during boot).

Add a __bss_decrypted section attribute which can be used when defining
such shared variable. The so-defined variables will be placed in the
.bss..decrypted section. This section will be mapped with C=0 early
during boot.

The .bss..decrypted section has a big chunk of memory that may be unused
when memory encryption is not active, free it when memory encryption is
not active.

Suggested-by: Thomas Gleixner 
Signed-off-by: Brijesh Singh 
Signed-off-by: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Paolo Bonzini 
Cc: Sean Christopherson 
Cc: Radim Krčmář
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/1536932759-12905-2-git-send-email-brijesh.singh@amd.com

x86/pti/64: Remove the SYSCALL64 entry trampoline

2018-09-12T19:33:53+00:00

The SYSCALL64 trampoline has a couple of nice properties:

 - The usual sequence of SWAPGS followed by two GS-relative accesses to
   set up RSP is somewhat slow because the GS-relative accesses need
   to wait for SWAPGS to finish.  The trampoline approach allows
   RIP-relative accesses to set up RSP, which avoids the stall.

 - The trampoline avoids any percpu access before CR3 is set up,
   which means that no percpu memory needs to be mapped in the user
   page tables.  This prevents using Meltdown to read any percpu memory
   outside the cpu_entry_area and prevents using timing leaks
   to directly locate the percpu areas.

The downsides of using a trampoline may outweigh the upsides, however.
It adds an extra non-contiguous I$ cache line to system calls, and it
forces an indirect jump to transfer control back to the normal kernel
text after CR3 is set up.  The latter is because x86 lacks a 64-bit
direct jump instruction that could jump from the trampoline to the entry
text.  With retpolines enabled, the indirect jump is extremely slow.

Change the code to map the percpu TSS into the user page tables to allow
the non-trampoline SYSCALL64 path to work under PTI.  This does not add a
new direct information leak, since the TSS is readable by Meltdown from the
cpu_entry_area alias regardless.  It does allow a timing attack to locate
the percpu area, but KASLR is more or less a lost cause against local
attack on CPUs vulnerable to Meltdown regardless.  As far as I'm concerned,
on current hardware, KASLR is only useful to mitigate remote attacks that
try to attack the kernel without first gaining RCE against a vulnerable
user process.

On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall
overhead from ~237ns to ~228ns.

There is a possible alternative approach: Move the trampoline within 2G of
the entry text and make a separate copy for each CPU.  This would allow a
direct jump to rejoin the normal entry path. There are pro's and con's for
this approach:

 + It avoids a pipeline stall

 - It executes from an extra page and read from another extra page during
   the syscall. The latter is because it needs to use a relative
   addressing mode to find sp1 -- it's the same *cacheline*, but accessed
   using an alias, so it's an extra TLB entry.

 - Slightly more memory. This would be one page per CPU for a simple
   implementation and 64-ish bytes per CPU or one page per node for a more
   complex implementation.

 - More code complexity.

The current approach is chosen for simplicity and because the alternative
does not provide a significant benefit, which makes it worth.

[ tglx: Added the alternative discussion to the changelog ]

Signed-off-by: Andy Lutomirski 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Cc: Josh Poimboeuf 
Cc: Joerg Roedel 
Cc: Jiri Olsa 
Cc: Andi Kleen 
Cc: Peter Zijlstra 
Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org

x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit

2018-07-19T23:11:44+00:00

The pti_clone_kernel_text() function references __end_rodata_hpage_align,
which is only present on x86-64.  This makes sense as the end of the rodata
section is not huge-page aligned on 32 bit.

Nevertheless a symbol is required for the function that points at the right
address for both 32 and 64 bit. Introduce __end_rodata_aligned for that
purpose and use it in pti_clone_kernel_text().

Signed-off-by: Joerg Roedel 
Signed-off-by: Thomas Gleixner 
Tested-by: Pavel Machek 
Cc: "H . Peter Anvin" 
Cc: linux-mm@kvack.org
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Josh Poimboeuf 
Cc: Juergen Gross 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Jiri Kosina 
Cc: Boris Ostrovsky 
Cc: Brian Gerst 
Cc: David Laight 
Cc: Denys Vlasenko 
Cc: Eduardo Valentin 
Cc: Greg KH 
Cc: Will Deacon 
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli 
Cc: Waiman Long 
Cc: "David H . Gutteridge" 
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-28-git-send-email-joro@8bytes.org

x86/build: Remove no-op macro VMLINUX_SYMBOL()

2018-05-13T13:11:34+00:00

VMLINUX_SYMBOL() is no-op unless CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX
is defined.  It has ever been selected only by BLACKFIN and METAG.
VMLINUX_SYMBOL() is unneeded for x86-specific code.

Signed-off-by: Masahiro Yamada 
Signed-off-by: Thomas Gleixner 
Cc: linux-arch 
Cc: "H. Peter Anvin" 
Link: https://lkml.kernel.org/r/1525852174-29022-1-git-send-email-yamada.masahiro@socionext.com

Merge branch 'linus' into x86/build to pick up dependencies

2018-03-20T09:56:18+00:00

x86/kprobes: Fix kernel crash when probing .entry_trampoline code

2018-03-09T08:58:36+00:00

Disable the kprobe probing of the entry trampoline:

.entry_trampoline is a code area that is used to ensure page table
isolation between userspace and kernelspace.

At the beginning of the execution of the trampoline, we load the
kernel's CR3 register. This has the effect of enabling the translation
of the kernel virtual addresses to physical addresses. Before this
happens most kernel addresses can not be translated because the running
process' CR3 is still used.

If a kprobe is placed on the trampoline code before that change of the
CR3 register happens the kernel crashes because int3 handling pages are
not accessible.

To fix this, add the .entry_trampoline section to the kprobe blacklist
to prohibit the probing of code before all the kernel pages are
accessible.

Signed-off-by: Francis Deslauriers 
Reviewed-by: Thomas Gleixner 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: mathieu.desnoyers@efficios.com
Cc: mhiramat@kernel.org
Link: http://lkml.kernel.org/r/1520565492-4637-2-git-send-email-francis.deslauriers@efficios.com
Signed-off-by: Ingo Molnar