linux.git/arch/x86/kernel/cpu/microcode, branch v6.8

x86/microcode/intel: Set new revision only after a successful update

2023-12-03T10:49:53+00:00

This was meant to be done only when early microcode got updated
successfully. Move it into the if-branch.

Also, make sure the current revision is read unconditionally and only
once.
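
A minimal user-space sketch of the intended ordering, not the kernel code
itself. The names struct early_revs, struct fake_cpu, apply_early() and
load_bsp() are hypothetical stand-ins: the old revision is read once and
unconditionally, and the new revision is recorded only inside the success
branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the revisions handed back to the core code. */
struct early_revs {
	uint32_t old_rev;	/* revision found before the early update */
	uint32_t new_rev;	/* stays 0 unless the update succeeded */
};

/* Hypothetical CPU model: 'rev' is what a revision read would return. */
struct fake_cpu {
	uint32_t rev;
	bool update_will_succeed;
};

/* Pretend to apply a patch; the CPU revision changes only on success. */
static bool apply_early(struct fake_cpu *cpu, uint32_t patch_rev)
{
	if (!cpu->update_will_succeed)
		return false;
	cpu->rev = patch_rev;
	return true;
}

static void load_bsp(struct fake_cpu *cpu, uint32_t patch_rev,
		     struct early_revs *ed)
{
	/* Read the current revision unconditionally and only once. */
	ed->old_rev = cpu->rev;
	ed->new_rev = 0;

	/* Record a new revision only inside the success branch. */
	if (apply_early(cpu, patch_rev))
		ed->new_rev = cpu->rev;
}
```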

Fixes: 080990aa3344 ("x86/microcode: Rework early revisions reporting")
Reported-by: Ashok Raj 
Signed-off-by: Borislav Petkov (AMD) 
Tested-by: Ashok Raj 
Link: https://lore.kernel.org/r/ZWjVt5dNRjbcvlzR@a4bf019067fa.jf.intel.com

x86/microcode/intel: Remove redundant microcode late updated message

2023-12-01T17:52:01+00:00

After successful update, the late loading routine prints an update
summary similar to:

  microcode: load: updated on 128 primary CPUs with 128 siblings
  microcode: revision: 0x21000170 -> 0x21000190

Remove the redundant message in the Intel side of the driver.

  [ bp: Massage commit message. ]

Signed-off-by: Ashok Raj 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/r/ZWjYhedNfhAUmt0k@a4bf019067fa.jf.intel.com

x86/microcode: Rework early revisions reporting

2023-11-21T15:35:48+00:00

The AMD side of the loader issues the microcode revision for each
logical thread on the system, which can become really noisy on huge
machines. And doing that doesn't make a whole lot of sense - the
microcode revision is already in /proc/cpuinfo.

So in case one is interested in the theoretical support of mixed silicon
steppings on AMD, one can check there.

What is also missing on the AMD side - something which people have
requested before - is showing the microcode revision the CPU had
*before* the early update.

So abstract that up in the main code and have the BSP on each vendor
provide those revision numbers.

Then, dump them only once on driver init.
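
As a rough illustration of dumping the revisions once from a single place,
the sketch below formats a report from an old/new revision pair. The struct,
the function name and the message wording are assumptions made for this
example and are not taken from the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical pair of revisions reported by the boot CPU. */
struct early_revs {
	uint32_t old_rev;	/* revision before the early update */
	uint32_t new_rev;	/* 0 if no early update happened */
};

/* Format the one-time report into buf; returns the number of characters. */
static int format_report(const struct early_revs *ed, char *buf, size_t len)
{
	uint32_t cur = ed->new_rev ? ed->new_rev : ed->old_rev;
	int n = snprintf(buf, len, "Current revision: 0x%08x\n",
			 (unsigned int)cur);

	/* Mention the previous revision only if an early update happened. */
	if (ed->new_rev && n >= 0 && (size_t)n < len)
		n += snprintf(buf + n, len - (size_t)n,
			      "Updated early from: 0x%08x\n",
			      (unsigned int)ed->old_rev);
	return n;
}
```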

On Intel, do not dump the patch date - it is not needed.

Reported-by: Linus Torvalds 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/CAHk-=wg=%2B8rceshMkB4VnKxmRccVLtBLPBawnewZuuqyx5U=3A@mail.gmail.com

x86/microcode: Remove the driver announcement and version

2023-11-21T15:20:49+00:00

First of all, the print is useless. The driver will either load and say
which microcode revision the machine has or issue an error.

Then, the version number is meaningless and actively confusing, as Yazen
mentioned recently: when a subset of patches are backported to a distro
kernel, one can't assume the driver version is the same as the upstream
one. And besides, the version number of the loader hasn't been used and
incremented for a long time. So drop it.

Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20231115210212.9981-2-bp@alien8.de

x86/microcode/intel: Add a minimum required revision for late loading

2023-10-24T13:05:55+00:00

In general, users don't have the necessary information to determine
whether late loading of a new microcode version is safe and does not
modify anything which the currently running kernel uses already, e.g.
removal of CPUID bits or behavioural changes of MSRs.

To address this issue, Intel has added a "minimum required version"
field to a previously reserved field in the microcode header.  Microcode
updates should only be applied if the current microcode version is equal
to, or greater than this minimum required version.

Thomas made some suggestions on how meta-data in the microcode file could
provide Linux with information to decide if the new microcode is a suitable
candidate for late loading. But even the "simpler" option requires a lot of
metadata and corresponding kernel code to parse it, so the final suggestion
was to add the 'minimum required version' field in the header.

When microcode changes visible features, microcode will set the minimum
required version to its own revision which prevents late loading.

Old microcode blobs have the minimum revision field always set to 0, which
indicates that there is no information and the kernel considers it
unsafe.

This is a pure OS software mechanism. The hardware/firmware ignores this
header field.

For early loading there is no restriction because OS visible features
are enumerated after the early load and therefore a change has no
effect.

The check is always enabled, but by default not enforced. It can be
enforced via Kconfig or kernel command line.

If enforced, the kernel refuses to late load microcode with a minimum
required version field which is zero or when the currently loaded
microcode revision is smaller than the minimum required revision.

If not enforced, the load happens independently of the revision check to
stay compatible with the existing behaviour, but it influences the
decision whether the kernel is tainted or not. If the check signals that
the late load is safe, then the kernel is not tainted.

Early loading is not affected by this.
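
The revision check described above can be sketched as a small predicate.
This is a user-space illustration under stated assumptions, not the kernel
implementation: the function name is hypothetical and only the two rules
from the text (zero means no information, otherwise current >= minimum) are
modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is a late load safe according to the header field? */
static bool minrev_allows_late_load(uint32_t cur_rev, uint32_t min_req_ver)
{
	/* Old blobs carry 0 here: no information, so treat it as unsafe. */
	if (min_req_ver == 0)
		return false;

	/* Safe only if the running revision is at or above the minimum. */
	return cur_rev >= min_req_ver;
}
```

A blob which changes visible features sets the field to its own revision,
so any older running revision fails the second comparison.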

[ tglx: Massaged changelog and fixed up the implementation ]

Suggested-by: Thomas Gleixner 
Signed-off-by: Ashok Raj 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/r/20231002115903.776467264@linutronix.de

x86/microcode: Prepare for minimal revision check

2023-10-24T13:05:55+00:00

Applying microcode late can be fatal for the running kernel when the
update changes functionality which is in use already in a non-compatible
way, e.g. by removing a CPUID bit.

There is no way for admins who do not have access to the vendor's deep
technical support to decide whether late loading of such a microcode is
safe or not.

Intel has added a new field to the microcode header which tells the
minimal microcode revision which is required to be active in the CPU in
order to be safe.

Provide infrastructure for handling this in the core code and a command
line switch which allows enforcing it.

If the update is considered safe, the kernel is not tainted and the annoying
warning message is not emitted. If it's enforced and the currently loaded
microcode revision is not safe for late loading, then the load is aborted.
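
The resulting policy can be summarized in a few lines. The enum and function
names below are hypothetical and only model the three outcomes named in the
text; how the "safe" verdict is computed is left to the vendor code.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a late load request. */
enum late_action {
	LATE_ABORT,		/* enforced and not safe: refuse to load */
	LATE_LOAD_TAINT,	/* not enforced, not safe: load, taint, warn */
	LATE_LOAD_CLEAN,	/* safe: load without taint or warning */
};

static enum late_action late_load_policy(bool safe, bool enforce)
{
	/* A safe update never taints, regardless of enforcement. */
	if (safe)
		return LATE_LOAD_CLEAN;

	/* An unsafe update is aborted only when the check is enforced. */
	return enforce ? LATE_ABORT : LATE_LOAD_TAINT;
}
```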

Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/r/20231017211724.079611170@linutronix.de

x86/microcode: Handle "offline" CPUs correctly

2023-10-24T13:05:55+00:00

Offline CPUs need to be parked in a safe loop when a microcode update is
in progress on the primary CPU. Currently, offline CPUs are parked in
mwait_play_dead(), and for Intel CPUs, that is not safe because the MWAIT
instruction can be patched by the new microcode update, which can cause
instability.

  - Add a new microcode state 'UCODE_OFFLINE' to report status on a
    per-CPU basis.
  - Force NMI on the offline CPUs.

Wake up offline CPUs while the update is in progress and then return
them to mwait_play_dead() after the microcode update is complete.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/r/20231002115903.660850472@linutronix.de

x86/microcode: Protect against instrumentation

2023-10-24T13:05:55+00:00

The wait for control loop in which the siblings are waiting for the
microcode update on the primary thread must be protected against
instrumentation as instrumentation can end up in #INT3, #DB or #PF,
which then returns with IRET. That IRET reenables NMI which is the
opposite of what the NMI rendezvous is trying to achieve.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/r/20231002115903.545969323@linutronix.de

x86/microcode: Rendezvous and load in NMI

2023-10-24T13:05:55+00:00

stop_machine() does not prevent the spin-waiting sibling from handling
an NMI, which is obviously violating the whole concept of rendezvous.

Implement a static branch right in the beginning of the NMI handler
which is nopped out except when enabled by the late loading mechanism.

The late loader enables the static branch before stop_machine() is
invoked. Each CPU has an nmi_enable flag in its control structure which
indicates whether the CPU should go into the update routine.

This is required to bridge the gap between enabling the branch and
actually being at the point where it is required to enter the loader
wait loop.

Each CPU which arrives in the stopper thread function sets that flag and
issues a self NMI right after that. If the NMI function sees the flag
clear, it returns. If it's set it clears the flag and enters the
rendezvous.

This is safe against a real NMI which hits in between setting the flag
and sending the NMI to itself. The real NMI will be swallowed by the
microcode update and the self NMI will then let stuff continue.
Otherwise this would end up with a spurious NMI.
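
The flag handshake can be modelled in plain user-space C. This is only a
sketch under stated assumptions: struct cpu_ctrl, ucode_nmi_hook() and
stopper_arm() are hypothetical names, NMI delivery is simulated by calling
the hook directly, and the static branch is assumed to be enabled already.

```c
#include <stdbool.h>

/* Hypothetical per-CPU control structure. */
struct cpu_ctrl {
	bool nmi_enabled;	/* set by the stopper thread function */
	int rendezvous_count;	/* how often the rendezvous was entered */
};

/*
 * Model of the hook at the start of the NMI handler. Returns true if
 * this NMI was consumed by the microcode rendezvous.
 */
static bool ucode_nmi_hook(struct cpu_ctrl *c)
{
	/* Flag clear: not for the loader, return to normal NMI handling. */
	if (!c->nmi_enabled)
		return false;

	/* Flag set: clear it first, then enter the rendezvous. */
	c->nmi_enabled = false;
	c->rendezvous_count++;
	return true;
}

/* The stopper sets the flag; a self NMI would follow right after. */
static void stopper_arm(struct cpu_ctrl *c)
{
	c->nmi_enabled = true;
}
```

In this model, if a real NMI arrives between stopper_arm() and the self NMI,
the first call to the hook enters the rendezvous and the second one finds
the flag clear and returns, so the rendezvous runs exactly once.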

Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/r/20231002115903.489900814@linutronix.de

x86/microcode: Replace the all-in-one rendezvous handler

2023-10-24T13:05:55+00:00

with a new handler which just separates the control flow of primary and
secondary CPUs.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/r/20231002115903.433704135@linutronix.de