<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/x86/kernel/relocate_kernel_64.S, branch v6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'x86-boot-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2025-01-24T13:54:26+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-01-24T13:54:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5b7f7234ff18a5f48e0d36c417a0d07c7fdfc61a'/>
<id>5b7f7234ff18a5f48e0d36c417a0d07c7fdfc61a</id>
<content type='text'>
Pull x86 boot updates from Ingo Molnar:

 - A large and involved preparatory series to pave the way to add
   exception handling for relocate_kernel - which will be a debugging
   facility that has aided in the field to debug an exceptionally hard
   to debug early boot bug. Plus assorted cleanups and fixes that were
   discovered along the way, by David Woodhouse:

      - Clean up and document register use in relocate_kernel_64.S
      - Use named labels in swap_pages in relocate_kernel_64.S
      - Only swap pages for ::preserve_context mode
      - Allocate PGD for x86_64 transition page tables separately
      - Copy control page into place in machine_kexec_prepare()
      - Invoke copy of relocate_kernel() instead of the original
      - Move relocate_kernel to kernel .data section
      - Add data section to relocate_kernel
      - Drop page_list argument from relocate_kernel()
      - Eliminate writes through kernel mapping of relocate_kernel page
      - Clean up register usage in relocate_kernel()
      - Mark relocate_kernel page as ROX instead of RWX
      - Disable global pages before writing to control page
      - Ensure preserve_context flag is set on return to kernel
      - Use correct swap page in swap_pages function
      - Fix stack and handling of re-entry point for ::preserve_context
      - Mark machine_kexec() with __nocfi
      - Cope with relocate_kernel() not being at the start of the page
      - Use typedef for relocate_kernel_fn function prototype
      - Fix location of relocate_kernel with -ffunction-sections (fix by Nathan Chancellor)

 - A series to remove the last remaining absolute symbol references from
   .head.text, and enforce this at build time, by Ard Biesheuvel:

      - Avoid WARN()s and panic()s in early boot code
      - Don't hang but terminate on failure to remap SVSM CA
      - Determine VA/PA offset before entering C code
      - Avoid intentional absolute symbol references in .head.text
      - Disable UBSAN in early boot code
      - Move ENTRY_TEXT to the start of the image
      - Move .head.text into its own output section
      - Reject absolute references in .head.text

 - The above build-time enforcement uncovered a handful of bugs of
   essentially non-working code, and a wrokaround for a toolchain bug,
   fixed by Ard Biesheuvel as well:

      - Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
      - Disable UBSAN on SEV code that may execute very early
      - Disable ftrace branch profiling in SEV startup code

 - And miscellaneous cleanups:

      - kexec_core: Add and update comments regarding the KEXEC_JUMP flow (Rafael J. Wysocki)
      - x86/sysfs: Constify 'struct bin_attribute' (Thomas Weißschuh)"

* tag 'x86-boot-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  x86/sev: Disable ftrace branch profiling in SEV startup code
  x86/kexec: Use typedef for relocate_kernel_fn function prototype
  x86/kexec: Cope with relocate_kernel() not being at the start of the page
  kexec_core: Add and update comments regarding the KEXEC_JUMP flow
  x86/kexec: Mark machine_kexec() with __nocfi
  x86/kexec: Fix location of relocate_kernel with -ffunction-sections
  x86/kexec: Fix stack and handling of re-entry point for ::preserve_context
  x86/kexec: Use correct swap page in swap_pages function
  x86/kexec: Ensure preserve_context flag is set on return to kernel
  x86/kexec: Disable global pages before writing to control page
  x86/sev: Don't hang but terminate on failure to remap SVSM CA
  x86/sev: Disable UBSAN on SEV code that may execute very early
  x86/boot/64: Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
  x86/sysfs: Constify 'struct bin_attribute'
  x86/kexec: Mark relocate_kernel page as ROX instead of RWX
  x86/kexec: Clean up register usage in relocate_kernel()
  x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page
  x86/kexec: Drop page_list argument from relocate_kernel()
  x86/kexec: Add data section to relocate_kernel
  x86/kexec: Move relocate_kernel to kernel .data section
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull x86 boot updates from Ingo Molnar:

 - A large and involved preparatory series to pave the way to add
   exception handling for relocate_kernel - which will be a debugging
   facility that has aided in the field to debug an exceptionally hard
   to debug early boot bug. Plus assorted cleanups and fixes that were
   discovered along the way, by David Woodhouse:

      - Clean up and document register use in relocate_kernel_64.S
      - Use named labels in swap_pages in relocate_kernel_64.S
      - Only swap pages for ::preserve_context mode
      - Allocate PGD for x86_64 transition page tables separately
      - Copy control page into place in machine_kexec_prepare()
      - Invoke copy of relocate_kernel() instead of the original
      - Move relocate_kernel to kernel .data section
      - Add data section to relocate_kernel
      - Drop page_list argument from relocate_kernel()
      - Eliminate writes through kernel mapping of relocate_kernel page
      - Clean up register usage in relocate_kernel()
      - Mark relocate_kernel page as ROX instead of RWX
      - Disable global pages before writing to control page
      - Ensure preserve_context flag is set on return to kernel
      - Use correct swap page in swap_pages function
      - Fix stack and handling of re-entry point for ::preserve_context
      - Mark machine_kexec() with __nocfi
      - Cope with relocate_kernel() not being at the start of the page
      - Use typedef for relocate_kernel_fn function prototype
      - Fix location of relocate_kernel with -ffunction-sections (fix by Nathan Chancellor)

 - A series to remove the last remaining absolute symbol references from
   .head.text, and enforce this at build time, by Ard Biesheuvel:

      - Avoid WARN()s and panic()s in early boot code
      - Don't hang but terminate on failure to remap SVSM CA
      - Determine VA/PA offset before entering C code
      - Avoid intentional absolute symbol references in .head.text
      - Disable UBSAN in early boot code
      - Move ENTRY_TEXT to the start of the image
      - Move .head.text into its own output section
      - Reject absolute references in .head.text

 - The above build-time enforcement uncovered a handful of bugs of
   essentially non-working code, and a wrokaround for a toolchain bug,
   fixed by Ard Biesheuvel as well:

      - Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
      - Disable UBSAN on SEV code that may execute very early
      - Disable ftrace branch profiling in SEV startup code

 - And miscellaneous cleanups:

      - kexec_core: Add and update comments regarding the KEXEC_JUMP flow (Rafael J. Wysocki)
      - x86/sysfs: Constify 'struct bin_attribute' (Thomas Weißschuh)"

* tag 'x86-boot-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  x86/sev: Disable ftrace branch profiling in SEV startup code
  x86/kexec: Use typedef for relocate_kernel_fn function prototype
  x86/kexec: Cope with relocate_kernel() not being at the start of the page
  kexec_core: Add and update comments regarding the KEXEC_JUMP flow
  x86/kexec: Mark machine_kexec() with __nocfi
  x86/kexec: Fix location of relocate_kernel with -ffunction-sections
  x86/kexec: Fix stack and handling of re-entry point for ::preserve_context
  x86/kexec: Use correct swap page in swap_pages function
  x86/kexec: Ensure preserve_context flag is set on return to kernel
  x86/kexec: Disable global pages before writing to control page
  x86/sev: Don't hang but terminate on failure to remap SVSM CA
  x86/sev: Disable UBSAN on SEV code that may execute very early
  x86/boot/64: Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
  x86/sysfs: Constify 'struct bin_attribute'
  x86/kexec: Mark relocate_kernel page as ROX instead of RWX
  x86/kexec: Clean up register usage in relocate_kernel()
  x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page
  x86/kexec: Drop page_list argument from relocate_kernel()
  x86/kexec: Add data section to relocate_kernel
  x86/kexec: Move relocate_kernel to kernel .data section
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kexec: Cope with relocate_kernel() not being at the start of the page</title>
<updated>2025-01-14T12:05:14+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2025-01-09T14:04:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e536057543791fbfa0d979f4e782933ea312c38c'/>
<id>e536057543791fbfa0d979f4e782933ea312c38c</id>
<content type='text'>
A few places in the kexec control code page make the assumption that the first
instruction of relocate_kernel is at the very start of the page.

To allow for Clang CFI information to be added to relocate_kernel(), as well
as the general principle of removing unwarranted assumptions, fix them to use
the external __relocate_kernel_start symbol that the linker adds. This means
using a separate addq and subq for calculating offsets, as the assembler can
no longer calculate the delta directly for itself and relocations aren't that
versatile. But those values can at least be used relative to a local label to
avoid absolute relocations.

Turn the jump from relocate_kernel() to identity_mapped() into a real indirect
'jmp *%rsi' too, while touching it. There was no real reason for it to be
a push+ret in the first place, and adding Clang CFI info will also give
objtool enough visibility to start complaining 'return with modified stack
frame' about it.

  [ bp: Massage commit message. ]

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-9-dwmw2@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A few places in the kexec control code page make the assumption that the first
instruction of relocate_kernel is at the very start of the page.

To allow for Clang CFI information to be added to relocate_kernel(), as well
as the general principle of removing unwarranted assumptions, fix them to use
the external __relocate_kernel_start symbol that the linker adds. This means
using a separate addq and subq for calculating offsets, as the assembler can
no longer calculate the delta directly for itself and relocations aren't that
versatile. But those values can at least be used relative to a local label to
avoid absolute relocations.

Turn the jump from relocate_kernel() to identity_mapped() into a real indirect
'jmp *%rsi' too, while touching it. There was no real reason for it to be
a push+ret in the first place, and adding Clang CFI info will also give
objtool enough visibility to start complaining 'return with modified stack
frame' about it.

  [ bp: Massage commit message. ]

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-9-dwmw2@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kexec: Fix location of relocate_kernel with -ffunction-sections</title>
<updated>2025-01-14T12:00:18+00:00</updated>
<author>
<name>Nathan Chancellor</name>
<email>nathan@kernel.org</email>
</author>
<published>2025-01-09T14:04:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=eeed9150411a63dd0611490cf31fdae681427918'/>
<id>eeed9150411a63dd0611490cf31fdae681427918</id>
<content type='text'>
After commit

  cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section"),

kernels configured with an option that uses -ffunction-sections, such as
CONFIG_LTO_CLANG, crash when kexecing because the value of relocate_kernel
does not match the value of __relocate_kernel_start so incorrect code gets
copied via machine_kexec_prepare().

  $ llvm-nm good-vmlinux &amp;| rg relocate_kernel
  ffffffff83280d41 T __relocate_kernel_end
  ffffffff83280b00 T __relocate_kernel_start
  ffffffff83280b00 T relocate_kernel

  $ llvm-nm bad-vmlinux &amp;| rg relocate_kernel
  ffffffff83266100 D __relocate_kernel_end
  ffffffff83266100 D __relocate_kernel_start
  ffffffff8120b0d8 T relocate_kernel

When -ffunction-sections is enabled, TEXT_MAIN matches on
'.text.[0-9a-zA-Z_]*' to coalesce the function specific functions back
into .text during link time after they have been optimized. Due to the
placement of TEXT_TEXT before KEXEC_RELOCATE_KERNEL in the x86 linker
script, the .text.relocate_kernel section ends up in .text instead of
.data.

Use a second dot in the relocate_kernel section name to avoid matching
on TEXT_MAIN, which matches a similar situation that happened in
commit

  79cd2a11224e ("x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG"),

which allows kexec to function properly.

While .data.relocate_kernel still ends up in the .data section via
DATA_MAIN -&gt; DATA_DATA, ensure it is located with the
.text.relocate_kernel section as intended by performing the same
transformation.

Fixes: cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section")
Fixes: 8dbec5c77bc3 ("x86/kexec: Add data section to relocate_kernel")
Signed-off-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-6-dwmw2@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit

  cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section"),

kernels configured with an option that uses -ffunction-sections, such as
CONFIG_LTO_CLANG, crash when kexecing because the value of relocate_kernel
does not match the value of __relocate_kernel_start so incorrect code gets
copied via machine_kexec_prepare().

  $ llvm-nm good-vmlinux &amp;| rg relocate_kernel
  ffffffff83280d41 T __relocate_kernel_end
  ffffffff83280b00 T __relocate_kernel_start
  ffffffff83280b00 T relocate_kernel

  $ llvm-nm bad-vmlinux &amp;| rg relocate_kernel
  ffffffff83266100 D __relocate_kernel_end
  ffffffff83266100 D __relocate_kernel_start
  ffffffff8120b0d8 T relocate_kernel

When -ffunction-sections is enabled, TEXT_MAIN matches on
'.text.[0-9a-zA-Z_]*' to coalesce the function specific functions back
into .text during link time after they have been optimized. Due to the
placement of TEXT_TEXT before KEXEC_RELOCATE_KERNEL in the x86 linker
script, the .text.relocate_kernel section ends up in .text instead of
.data.

Use a second dot in the relocate_kernel section name to avoid matching
on TEXT_MAIN, which matches a similar situation that happened in
commit

  79cd2a11224e ("x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG"),

which allows kexec to function properly.

While .data.relocate_kernel still ends up in the .data section via
DATA_MAIN -&gt; DATA_DATA, ensure it is located with the
.text.relocate_kernel section as intended by performing the same
transformation.

Fixes: cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section")
Fixes: 8dbec5c77bc3 ("x86/kexec: Add data section to relocate_kernel")
Signed-off-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-6-dwmw2@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kexec: Fix stack and handling of re-entry point for ::preserve_context</title>
<updated>2025-01-14T11:57:43+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2025-01-09T14:04:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2cacf7f23a024ab1fdc603ca6a4f4c8b2de9f64e'/>
<id>2cacf7f23a024ab1fdc603ca6a4f4c8b2de9f64e</id>
<content type='text'>
A ::preserve_context kimage can be invoked more than once, and the entry point
can be different every time. When the callee returns to the kernel, it leaves
the address of its entry point for next time on the stack.

That being the case, one might reasonably assume that the caller would
allocate space for it on the stack frame before actually performing the 'call'
into the callee.

Apparently not, though. Ever since the kjump code was first added in 2009, it
has set up a *new* stack at the top of the swap_page scratch page, then just
performed the 'call' without allocating any space for the re-entry address to
be returned. It then reads the re-entry point for next time from 0(%rsp) which
is actually the first qword of the page *after* the swap page, which might not
exist at all! And if the callee has written to that, then it will have
corrupted memory it doesn't own.

Correct this by pushing the entry point of the callee onto the stack before
calling it. The callee may then adjust it, or not, as it sees fit, and
subsequent invocations should work correctly either way.

Remove a stray push of zero to the *relocate_kernel* stack, which may have
been intended for this purpose, but which was actually just noise.

Also, loading the stack for the callee relied on the address of the swap page
being in %r10 without ever documenting that fact. Recent code changes made
that no longer true, so load it directly from the local kexec_pa_swap_page
variable instead.

Fixes: b3adabae8a96 ("x86/kexec: Drop page_list argument from relocate_kernel()")
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-5-dwmw2@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A ::preserve_context kimage can be invoked more than once, and the entry point
can be different every time. When the callee returns to the kernel, it leaves
the address of its entry point for next time on the stack.

That being the case, one might reasonably assume that the caller would
allocate space for it on the stack frame before actually performing the 'call'
into the callee.

Apparently not, though. Ever since the kjump code was first added in 2009, it
has set up a *new* stack at the top of the swap_page scratch page, then just
performed the 'call' without allocating any space for the re-entry address to
be returned. It then reads the re-entry point for next time from 0(%rsp) which
is actually the first qword of the page *after* the swap page, which might not
exist at all! And if the callee has written to that, then it will have
corrupted memory it doesn't own.

Correct this by pushing the entry point of the callee onto the stack before
calling it. The callee may then adjust it, or not, as it sees fit, and
subsequent invocations should work correctly either way.

Remove a stray push of zero to the *relocate_kernel* stack, which may have
been intended for this purpose, but which was actually just noise.

Also, loading the stack for the callee relied on the address of the swap page
being in %r10 without ever documenting that fact. Recent code changes made
that no longer true, so load it directly from the local kexec_pa_swap_page
variable instead.

Fixes: b3adabae8a96 ("x86/kexec: Drop page_list argument from relocate_kernel()")
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-5-dwmw2@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kexec: Use correct swap page in swap_pages function</title>
<updated>2025-01-14T11:54:36+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2025-01-09T14:04:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=85d724df8c82c060dcdeb8d0de0bd986e6c37b72'/>
<id>85d724df8c82c060dcdeb8d0de0bd986e6c37b72</id>
<content type='text'>
The swap_pages function expects the swap page to be in %r10, but there
was no documentation to that effect. Once upon a time the setup code
used to load its value from a kernel virtual address and save it to an
address which is accessible in the identity-mapped page tables, and
*happened* to use %r10 to do so, with no comment that it was left there
on *purpose* instead of just being a scratch register. Once that was no
longer necessary, %r10 just holds whatever the kernel happened to leave
in it.

Now that the original value passed by the kernel is accessible via
%rip-relative addressing, load directly from there instead of using %r10
for it. But document the other parameters that the swap_pages function
*does* expect in registers.

Fixes: b3adabae8a96 ("x86/kexec: Drop page_list argument from relocate_kernel()")
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-4-dwmw2@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The swap_pages function expects the swap page to be in %r10, but there
was no documentation to that effect. Once upon a time the setup code
used to load its value from a kernel virtual address and save it to an
address which is accessible in the identity-mapped page tables, and
*happened* to use %r10 to do so, with no comment that it was left there
on *purpose* instead of just being a scratch register. Once that was no
longer necessary, %r10 just holds whatever the kernel happened to leave
in it.

Now that the original value passed by the kernel is accessible via
%rip-relative addressing, load directly from there instead of using %r10
for it. But document the other parameters that the swap_pages function
*does* expect in registers.

Fixes: b3adabae8a96 ("x86/kexec: Drop page_list argument from relocate_kernel()")
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-4-dwmw2@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kexec: Ensure preserve_context flag is set on return to kernel</title>
<updated>2025-01-14T11:52:47+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2025-01-09T14:04:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4d5f1da98f5e94ecc39d6e7a990db2efe0ae3810'/>
<id>4d5f1da98f5e94ecc39d6e7a990db2efe0ae3810</id>
<content type='text'>
The swap_pages() function will only actually *swap*, as its name implies, if
the preserve_context flag in the %r11 register is non-zero. On the way back
from a ::preserve_context kexec, ensure that the %r11 register is non-zero so
that the pages get swapped back.

Fixes: 9e5683e2d0b5 ("x86/kexec: Only swap pages for ::preserve_context mode")
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-3-dwmw2@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The swap_pages() function will only actually *swap*, as its name implies, if
the preserve_context flag in the %r11 register is non-zero. On the way back
from a ::preserve_context kexec, ensure that the %r11 register is non-zero so
that the pages get swapped back.

Fixes: 9e5683e2d0b5 ("x86/kexec: Only swap pages for ::preserve_context mode")
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-3-dwmw2@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kexec: Disable global pages before writing to control page</title>
<updated>2025-01-14T11:46:17+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2025-01-09T14:04:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d144d8a65286fb4a9f06ca80dea6e02e7d846558'/>
<id>d144d8a65286fb4a9f06ca80dea6e02e7d846558</id>
<content type='text'>
The kernel switches to a new set of page tables during kexec. The global
mappings (_PAGE_GLOBAL==1) can remain in the TLB after this switch. This
is generally not a problem because the new page tables use a different
portion of the virtual address space than the normal kernel mappings.

The critical exception to that generalisation (and the only mapping
which isn't an identity mapping) is the kexec control page itself —
which was ROX in the original kernel mapping, but should be RWX in the
new page tables. If there is a global TLB entry for that in its prior
read-only state, it definitely needs to be flushed before attempting to
write through that virtual mapping.

It would be possible to just avoid writing to the virtual address of the
page and defer all writes until they can be done through the identity
mapping. But there's no good reason to keep the old TLB entries around,
as they can cause nothing but trouble.

Clear the PGE bit in %cr4 early, before storing data in the control page.

Fixes: 5a82223e0743 ("x86/kexec: Mark relocate_kernel page as ROX instead of RWX")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219592
Reported-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Reported-by: "Ning, Hongyu" &lt;hongyu.ning@linux.intel.com&gt;
Co-developed-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Tested-by: "Ning, Hongyu" &lt;hongyu.ning@linux.intel.com&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-2-dwmw2@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The kernel switches to a new set of page tables during kexec. The global
mappings (_PAGE_GLOBAL==1) can remain in the TLB after this switch. This
is generally not a problem because the new page tables use a different
portion of the virtual address space than the normal kernel mappings.

The critical exception to that generalisation (and the only mapping
which isn't an identity mapping) is the kexec control page itself —
which was ROX in the original kernel mapping, but should be RWX in the
new page tables. If there is a global TLB entry for that in its prior
read-only state, it definitely needs to be flushed before attempting to
write through that virtual mapping.

It would be possible to just avoid writing to the virtual address of the
page and defer all writes until they can be done through the identity
mapping. But there's no good reason to keep the old TLB entries around,
as they can cause nothing but trouble.

Clear the PGE bit in %cr4 early, before storing data in the control page.

Fixes: 5a82223e0743 ("x86/kexec: Mark relocate_kernel page as ROX instead of RWX")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219592
Reported-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Reported-by: "Ning, Hongyu" &lt;hongyu.ning@linux.intel.com&gt;
Co-developed-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Tested-by: "Ning, Hongyu" &lt;hongyu.ning@linux.intel.com&gt;
Link: https://lore.kernel.org/r/20250109140757.2841269-2-dwmw2@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Fix build regression with CONFIG_KEXEC_JUMP enabled</title>
<updated>2024-12-09T18:13:28+00:00</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2024-12-08T23:53:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=aeb68937614f4aeceaaa762bd7f0212ce842b797'/>
<id>aeb68937614f4aeceaaa762bd7f0212ce842b797</id>
<content type='text'>
Build 6.13-rc12 for x86_64 with gcc 14.2.1 fails with the error:

  ld: vmlinux.o: in function `virtual_mapped':
  linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc'

when CONFIG_KEXEC_JUMP is enabled.

This was introduced by commit 07fa619f2a40 ("x86/kexec: Restore GDT on
return from ::preserve_context kexec") which introduced a use of
saved_context_gdt_desc without a declaration for it.

Fix that by including asm/asm-offsets.h where saved_context_gdt_desc
is defined (indirectly in include/generated/asm-offsets.h which
asm/asm-offsets.h includes).

Fixes: 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec")
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Acked-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Acked-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202411270006.ZyyzpYf8-lkp@intel.com/
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Build 6.13-rc12 for x86_64 with gcc 14.2.1 fails with the error:

  ld: vmlinux.o: in function `virtual_mapped':
  linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc'

when CONFIG_KEXEC_JUMP is enabled.

This was introduced by commit 07fa619f2a40 ("x86/kexec: Restore GDT on
return from ::preserve_context kexec") which introduced a use of
saved_context_gdt_desc without a declaration for it.

Fix that by including asm/asm-offsets.h where saved_context_gdt_desc
is defined (indirectly in include/generated/asm-offsets.h which
asm/asm-offsets.h includes).

Fixes: 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec")
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Acked-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Acked-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202411270006.ZyyzpYf8-lkp@intel.com/
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kexec: Clean up register usage in relocate_kernel()</title>
<updated>2024-12-06T09:42:00+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2024-12-05T15:05:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=93e489ad7a4694bb2fe8110f5012f85bd3eee65a'/>
<id>93e489ad7a4694bb2fe8110f5012f85bd3eee65a</id>
<content type='text'>
The memory encryption flag is passed in %r8 because that's where the
calling convention puts it. Instead of moving it to %r12 and then using
%r8 for other things, just leave it in %r8 and use other registers
instead.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Link: https://lore.kernel.org/r/20241205153343.3275139-13-dwmw2@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The memory encryption flag is passed in %r8 because that's where the
calling convention puts it. Instead of moving it to %r12 and then using
%r8 for other things, just leave it in %r8 and use other registers
instead.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Link: https://lore.kernel.org/r/20241205153343.3275139-13-dwmw2@infradead.org
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page</title>
<updated>2024-12-06T09:42:00+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2024-12-05T15:05:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b7155dfd4999211247cce40be2665c71235ab094'/>
<id>b7155dfd4999211247cce40be2665c71235ab094</id>
<content type='text'>
All writes to the relocate_kernel control page are now done *after* the
%cr3 switch via simple %rip-relative addressing, which means the DATA()
macro with its pointer arithmetic can also now be removed.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Link: https://lore.kernel.org/r/20241205153343.3275139-12-dwmw2@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All writes to the relocate_kernel control page are now done *after* the
%cr3 switch via simple %rip-relative addressing, which means the DATA()
macro with its pointer arithmetic can also now be removed.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Link: https://lore.kernel.org/r/20241205153343.3275139-12-dwmw2@infradead.org
</pre>
</div>
</content>
</entry>
</feed>
