linux-stable.git/arch/x86/kernel/cpu/mcheck, branch linux-3.16.y

x86/MCE: Remove min interval polling limitation

2018-11-20T18:05:43+00:00

commit fbdb328c6bae0a7c78d75734a738b66b86dffc96 upstream.

commit b3b7c4795c ("x86/MCE: Serialize sysfs changes") introduced a min
interval limitation when setting the check interval for polled MCEs.
However, the logic is that 0 disables polling for corrected MCEs, see
Documentation/x86/x86_64/machinecheck. The limitation prevents disabling.

Remove this limitation and allow the value 0 to disable polling again.

Fixes: b3b7c4795c ("x86/MCE: Serialize sysfs changes")
Signed-off-by: Dewet Thibaut 
Signed-off-by: Alexander Sverdlin 
[ Massage commit message. ]
Signed-off-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
Cc: Tony Luck 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/20180716084927.24869-1-alexander.sverdlin@nokia.com
Signed-off-by: Ben Hutchings

x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()

2018-11-20T18:05:24+00:00

commit 1f74c8a64798e2c488f86efc97e308b85fb7d7aa upstream.

mce_no_way_out() does a quick check during #MC to see whether some of
the MCEs logged would require the kernel to panic immediately. And it
passes a struct mce where MCi_STATUS gets written.

However, after having saved a valid status value, the next iteration
of the loop which goes over the MCA banks on the CPU, overwrites the
valid status value because we're using struct mce as storage instead of
a temporary variable.

Which leads to MCE records with an empty status value:

  mce: [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 0: 0000000000000000
  mce: [Hardware Error]: RIP 10: {trigger_mce+0x7/0x10}

In order to prevent the loss of the status register value, return
immediately when severity is a panic one so that we can panic
immediately with the first fatal MCE logged. This is also the intention
of this function and not to noodle over the banks while a fatal MCE is
already logged.

Tony: read the rest of the MCA bank to populate the struct mce fully.

Suggested-by: Tony Luck 
Signed-off-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
Link: https://lkml.kernel.org/r/20180622095428.626-8-bp@alien8.de
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings

x86/MCE: Save microcode revision in machine check records

2018-06-16T21:22:35+00:00

commit fa94d0c6e0f3431523f5701084d799c77c7d4a4f upstream.

Updating microcode used to be relatively rare. Now that it has become
more common we should save the microcode version in a machine check
record to make sure that those people looking at the error have this
important information bundled with the rest of the logged information.

[ Borislav: Simplify a bit. ]

Signed-off-by: Tony Luck 
Signed-off-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
Cc: Yazen Ghannam 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/20180301233449.24311-1-tony.luck@intel.com
[bwh: Backported to 3.2:
 - Add other new fields to struct mce, to match upstream UAPI
 - Adjust context]
Signed-off-by: Ben Hutchings

x86/MCE: Serialize sysfs changes

2018-06-16T21:21:33+00:00

commit b3b7c4795ccab5be71f080774c45bbbcc75c2aaf upstream.

The check_interval file in

  /sys/devices/system/machinecheck/machinecheck

directory is a global timer value for MCE polling. If it is changed by one
CPU, mce_restart() broadcasts the event to other CPUs to delete and restart
the MCE polling timer and __mcheck_cpu_init_timer() reinitializes the
mce_timer variable.

If more than one CPU writes a specific value to the check_interval file
concurrently, mce_timer is not protected from such concurrent accesses and
all kinds of explosions happen. Since only root can write to those sysfs
variables, the issue is not a big deal security-wise.

However, concurrent writes to these configuration variables is void of
reason so the proper thing to do is to serialize the access with a mutex.

Boris:

 - Make store_int_with_restart() use device_store_ulong() to filter out
   negative intervals
 - Limit min interval to 1 second
 - Correct locking
 - Massage commit message

Signed-off-by: Seunghun Han 
Signed-off-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
Cc: Greg Kroah-Hartman 
Cc: Tony Luck 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/20180302202706.9434-1-kkamagui@gmail.com
Signed-off-by: Ben Hutchings

x86/mce: Make machine check speculation protected

2018-03-03T15:52:26+00:00

commit 6f41c34d69eb005e7848716bbcafc979b35037d5 upstream.

The machine check idtentry uses an indirect branch directly from the low
level code. This evades the speculation protection.

Replace it by a direct call into C code and issue the indirect call there
so the compiler can apply the proper speculation protection.

Signed-off-by: Thomas Gleixner 
Reviewed-by:Borislav Petkov 
Reviewed-by: David Woodhouse 
Niced-by: Peter Zijlstra 
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801181626290.1847@nanos
[bwh: Backported to 3.16
 - #include  in mce.c
 - Adjust filename, context]
Signed-off-by: Ben Hutchings

x86: Clean up cr4 manipulation

2018-01-09T00:35:07+00:00

commit 375074cc736ab1d89a708c0a8d7baa4a70d5d476 upstream.

CR4 manipulation was split, seemingly at random, between direct
(write_cr4) and using a helper (set/clear_in_cr4).  Unfortunately,
the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
which only a small subset of users actually wanted.

This patch replaces all cr4 access in functions that don't leave cr4
exactly the way they found it with new helpers cr4_set_bits,
cr4_clear_bits, and cr4_set_bits_and_update_boot.

Signed-off-by: Andy Lutomirski 
Reviewed-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrea Arcangeli 
Cc: Vince Weaver 
Cc: "hillf.zj" 
Cc: Valdis Kletnieks 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
Cc: Kees Cook 
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings

x86/mce: Ensure offline CPUs don't participate in rendezvous process

2016-01-25T10:44:03+00:00

commit d90167a941f62860f35eb960e1012aa2d30e7e94 upstream.

Intel's MCA implementation broadcasts MCEs to all CPUs on the
node. This poses a problem for offlined CPUs which cannot
participate in the rendezvous process:

  Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
  Kernel Offset: disabled
  Rebooting in 100 seconds..

More specifically, Linux does a soft offline of a CPU when
writing a 0 to /sys/devices/system/cpu/cpuX/online, which
doesn't prevent the #MC exception from being broadcasted to that
CPU.

Ensure that offline CPUs don't participate in the MCE rendezvous
and clear the RIP valid status bit so that a second MCE won't
cause a shutdown.

Without the patch, mce_start() will increment mce_callin and
wait for all CPUs. Offlined CPUs should avoid participating in
the rendezvous process altogether.

Signed-off-by: Ashok Raj 
[ Massage commit message. ]
Signed-off-by: Borislav Petkov 
Reviewed-by: Tony Luck 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/1449742346-21470-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar 
Signed-off-by: Thomas Gleixner 
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques

x86/mce: Reenable CMCI banks when swiching back to interrupt mode

2015-09-28T09:21:39+00:00

commit 1b48465500611a2dc5e75800c61ac352e22d41c3 upstream.

Zhang Liguang reported the following issue:

1) System detects a CMCI storm on the current CPU.

2) Kernel disables the CMCI interrupt on banks owned by the
   current CPU and switches to poll mode

3) After the CMCI storm subsides, kernel switches back to
   interrupt mode

4) We expect the system to reenable the CMCI interrupt on banks
   owned by the current CPU

   mce_intel_adjust_timer
   |-> cmci_reenable
       |-> cmci_discover     # owned banks are ignored here

  static void cmci_discover(int banks)
	...
	for (i = 0; i < banks; i++) {
		...
		if (test_bit(i, owned))	# ownd banks is ignore here
			continue;

So convert cmci_storm_disable_banks() to
cmci_toggle_interrupt_mode() which controls whether to enable or
disable CMCI interrupts with its argument.

NB: We cannot clear the owned bit because the banks won't be
polled, otherwise. See:

  27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")

for more info.

Reported-by: Zhang Liguang 
Signed-off-by: Xie XiuQi 
Signed-off-by: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: huawei.libin@huawei.com
Cc: linux-edac 
Cc: rui.xiang@huawei.com
Link: http://lkml.kernel.org/r/1439396985-12812-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar 
[ luis: backported to 3.16:
  - use __get_cpu_var() instead of this_cpu_ptr()
  - adjusted context ]
Signed-off-by: Luis Henriques

x86/mce: Fix MCE severity messages

2015-07-09T13:35:56+00:00

commit 17fea54bf0ab34fa09a06bbde2f58ed7bbdf9299 upstream.

Derek noticed that a critical MCE gets reported with the wrong
error type description:

  [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0
  [Hardware Error]: RIP !INEXACT! 10: {intel_idle+0xb1/0x170}
  [Hardware Error]: TSC 49587b8e321cb
  [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29
  [Hardware Error]: Some CPUs didn't answer in synchronization
  [Hardware Error]: Machine check: Invalid
				   ^^^^^^^

The last line with 'Invalid' should have printed the high level
MCE error type description we get from mce_severity, i.e.
something like:

  [Hardware Error]: Machine check: Action required: data load error in a user process

this happens due to the fact that mce_no_way_out() iterates over
all MCA banks and possibly overwrites the @msg argument which is
used in the panic printing later.

Change behavior to take the message of only and the (last)
critical MCE it detects.

Reported-by: Derek 
Signed-off-by: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar 
[ kamal: backport to 3.13-stable: context ]
Signed-off-by: Kamal Mostafa 
Signed-off-by: Luis Henriques

x86: MCE: Add raw_lock conversion again

2014-09-05T23:36:35+00:00

commit ed5c41d30ef2ce578fd6b6e2f7ec23f2a58b1eba upstream.

Commit ea431643d6c3 ("x86/mce: Fix CMCI preemption bugs") breaks RT by
the completely unrelated conversion of the cmci_discover_lock to a
regular (non raw) spinlock.  This lock was annotated in commit
59d958d2c7de ("locking, x86: mce: Annotate cmci_discover_lock as raw")
with a proper explanation why.

The argument for converting the lock back to a regular spinlock was:

 - it does percpu ops without disabling preemption. Preemption is not
   disabled due to the mistaken use of a raw spinlock.

Which is complete nonsense.  The raw_spinlock is disabling preemption in
the same way as a regular spinlock.  In mainline spinlock maps to
raw_spinlock, in RT spinlock becomes a "sleeping" lock.

raw_spinlock has on RT exactly the same semantics as in mainline.  And
because this lock is taken in non preemptible context it must be raw on
RT.

Undo the locking brainfart.

Reported-by: Clark Williams 
Reported-by: Steven Rostedt 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman