<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/powerpc/kernel/mce.c, branch v5.6</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>powerpc/64s/pseries: machine check convert to use common event code</title>
<updated>2019-08-30T00:32:35+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2019-08-02T10:56:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9ca766f9891d23743b4e1a7b1cafdc63723cd6a7'/>
<id>9ca766f9891d23743b4e1a7b1cafdc63723cd6a7</id>
<content type='text'>
The common machine_check_event data structures and queues are mostly
platform independent, with powernv decoding SRR1/DSISR/etc., into
machine_check_event objects.

This patch converts pseries to use this infrastructure by decoding
fwnmi/rtas data into machine_check_event objects.

This allows queueing to be used by a subsequent change to delay the
virtual mode handling of machine checks that occur in kernel space
where it is unsafe to switch immediately to virtual mode, similarly
to powernv.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Fix implicit fallthrough warnings in mce_handle_error()]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190802105709.27696-10-npiggin@gmail.com

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The common machine_check_event data structures and queues are mostly
platform independent, with powernv decoding SRR1/DSISR/etc., into
machine_check_event objects.

This patch converts pseries to use this infrastructure by decoding
fwnmi/rtas data into machine_check_event objects.

This allows queueing to be used by a subsequent change to delay the
virtual mode handling of machine checks that occur in kernel space
where it is unsafe to switch immediately to virtual mode, similarly
to powernv.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Fix implicit fallthrough warnings in mce_handle_error()]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190802105709.27696-10-npiggin@gmail.com

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s/powernv: machine check dump SLB contents</title>
<updated>2019-08-30T00:32:35+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2019-08-02T10:56:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7290f3b3d3e66b54720f23079ffc60e0b7bbb0cc'/>
<id>7290f3b3d3e66b54720f23079ffc60e0b7bbb0cc</id>
<content type='text'>
Re-use the code introduced in pseries to save and dump the contents
of the SLB in the case of an SLB involved machine check exception.

This patch also avoids allocating the SLB save array on pseries radix.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190802105709.27696-9-npiggin@gmail.com

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Re-use the code introduced in pseries to save and dump the contents
of the SLB in the case of an SLB involved machine check exception.

This patch also avoids allocating the SLB save array on pseries radix.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190802105709.27696-9-npiggin@gmail.com

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/mce: Handle UE event for memcpy_mcsafe</title>
<updated>2019-08-21T12:23:48+00:00</updated>
<author>
<name>Balbir Singh</name>
<email>bsingharora@gmail.com</email>
</author>
<published>2019-08-20T08:13:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=895e3dceeb97855dc9990136cbb80a842fe581aa'/>
<id>895e3dceeb97855dc9990136cbb80a842fe581aa</id>
<content type='text'>
If we take a UE on one of the instructions with a fixup entry, set nip
to continue execution at the fixup entry. Stop processing the event
further or print it.

Co-developed-by: Reza Arbab &lt;arbab@linux.ibm.com&gt;
Signed-off-by: Reza Arbab &lt;arbab@linux.ibm.com&gt;
Signed-off-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
Reviewed-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190820081352.8641-6-santosh@fossix.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we take a UE on one of the instructions with a fixup entry, set nip
to continue execution at the fixup entry. Stop processing the event
further or print it.

Co-developed-by: Reza Arbab &lt;arbab@linux.ibm.com&gt;
Signed-off-by: Reza Arbab &lt;arbab@linux.ibm.com&gt;
Signed-off-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
Reviewed-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190820081352.8641-6-santosh@fossix.org
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/mce: Make machine_check_ue_event() static</title>
<updated>2019-08-21T12:23:47+00:00</updated>
<author>
<name>Reza Arbab</name>
<email>arbab@linux.ibm.com</email>
</author>
<published>2019-08-20T08:13:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1a1715f516fd7fcfedffee5978ec4c07c5164470'/>
<id>1a1715f516fd7fcfedffee5978ec4c07c5164470</id>
<content type='text'>
The function doesn't get used outside this file, so make it static.

Signed-off-by: Reza Arbab &lt;arbab@linux.ibm.com&gt;
Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190820081352.8641-4-santosh@fossix.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The function doesn't get used outside this file, so make it static.

Signed-off-by: Reza Arbab &lt;arbab@linux.ibm.com&gt;
Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190820081352.8641-4-santosh@fossix.org
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/mce: Schedule work from irq_work</title>
<updated>2019-08-21T12:23:03+00:00</updated>
<author>
<name>Santosh Sivaraj</name>
<email>santosh@fossix.org</email>
</author>
<published>2019-08-20T08:13:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b5bda6263cad9a927e1a4edb7493d542da0c1410'/>
<id>b5bda6263cad9a927e1a4edb7493d542da0c1410</id>
<content type='text'>
schedule_work() cannot be called from MCE exception context as MCE can
interrupt even in interrupt disabled context.

Fixes: 733e4a4c4467 ("powerpc/mce: hookup memory_failure for UE errors")
Cc: stable@vger.kernel.org # v4.15+
Reviewed-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Acked-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190820081352.8641-2-santosh@fossix.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
schedule_work() cannot be called from MCE exception context as MCE can
interrupt even in interrupt disabled context.

Fixes: 733e4a4c4467 ("powerpc/mce: hookup memory_failure for UE errors")
Cc: stable@vger.kernel.org # v4.15+
Reviewed-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Acked-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190820081352.8641-2-santosh@fossix.org
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156</title>
<updated>2019-05-30T18:26:35+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-27T06:55:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1a59d1b8e05ea6ab45f7e18897de1ef0e6bc3da6'/>
<id>1a59d1b8e05ea6ab45f7e18897de1ef0e6bc3da6</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version this program is distributed in the
  hope that it will be useful but without any warranty without even
  the implied warranty of merchantability or fitness for a particular
  purpose see the gnu general public license for more details you
  should have received a copy of the gnu general public license along
  with this program if not write to the free software foundation inc
  59 temple place suite 330 boston ma 02111 1307 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 1334 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Reviewed-by: Richard Fontana &lt;rfontana@redhat.com&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version this program is distributed in the
  hope that it will be useful but without any warranty without even
  the implied warranty of merchantability or fitness for a particular
  purpose see the gnu general public license for more details you
  should have received a copy of the gnu general public license along
  with this program if not write to the free software foundation inc
  59 temple place suite 330 boston ma 02111 1307 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 1334 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Reviewed-by: Richard Fontana &lt;rfontana@redhat.com&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/powernv/mce: Print additional information about MCE error.</title>
<updated>2019-05-01T12:23:20+00:00</updated>
<author>
<name>Mahesh Salgaonkar</name>
<email>mahesh@linux.vnet.ibm.com</email>
</author>
<published>2019-04-29T18:16:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=50dbabe06a6e1c35980154ea1fac2ed6ad28652b'/>
<id>50dbabe06a6e1c35980154ea1fac2ed6ad28652b</id>
<content type='text'>
Print more information about MCE error whether it is an hardware or
software error.

Some of the MCE errors can be easily categorized as hardware or
software errors e.g. UEs are due to hardware error, where as error
triggered due to invalid usage of tlbie is a pure software bug. But
not all the MCE errors can be easily categorize into either software
or hardware. There are errors like multihit errors which are usually
result of a software bug, but in some rare cases a hardware failure
can cause a multihit error. In past, we have seen case where after
replacing faulty chip, multihit errors stopped occurring. Same with
parity errors, which are usually due to faulty hardware but there are
chances where multihit can also cause an parity error. Such errors are
difficult to determine what really caused it. Hence this patch
classifies MCE errors into following four categorize:

  1. Hardware error:
  	UE and Link timeout failure errors.
  2. Probable hardware error (some chance of software cause)
  	SLB/ERAT/TLB Parity errors.
  3. Software error
  	Invalid tlbie form.
  4. Probable software error (some chance of hardware cause)
  	SLB/ERAT/TLB Multihit errors.

Sample output:

  MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered]
  MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60]
  MCE: CPU80: Probable Software error (some chance of hardware cause)

Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Print more information about MCE error whether it is an hardware or
software error.

Some of the MCE errors can be easily categorized as hardware or
software errors e.g. UEs are due to hardware error, where as error
triggered due to invalid usage of tlbie is a pure software bug. But
not all the MCE errors can be easily categorize into either software
or hardware. There are errors like multihit errors which are usually
result of a software bug, but in some rare cases a hardware failure
can cause a multihit error. In past, we have seen case where after
replacing faulty chip, multihit errors stopped occurring. Same with
parity errors, which are usually due to faulty hardware but there are
chances where multihit can also cause an parity error. Such errors are
difficult to determine what really caused it. Hence this patch
classifies MCE errors into following four categorize:

  1. Hardware error:
  	UE and Link timeout failure errors.
  2. Probable hardware error (some chance of software cause)
  	SLB/ERAT/TLB Parity errors.
  3. Software error
  	Invalid tlbie form.
  4. Probable software error (some chance of hardware cause)
  	SLB/ERAT/TLB Multihit errors.

Sample output:

  MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered]
  MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60]
  MCE: CPU80: Probable Software error (some chance of hardware cause)

Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/powernv/mce: Print correct severity for MCE error.</title>
<updated>2019-05-01T12:22:51+00:00</updated>
<author>
<name>Mahesh Salgaonkar</name>
<email>mahesh@linux.vnet.ibm.com</email>
</author>
<published>2019-04-29T18:15:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cda6618d060b5e8afc93e691d4bcd987f3dd4393'/>
<id>cda6618d060b5e8afc93e691d4bcd987f3dd4393</id>
<content type='text'>
Currently all machine check errors are printed as severe errors which
isn't correct. Print soft errors as warning instead of severe errors.

Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently all machine check errors are printed as severe errors which
isn't correct. Print soft errors as warning instead of severe errors.

Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/powernv/mce: Reduce MCE console logs to lesser lines.</title>
<updated>2019-05-01T12:22:24+00:00</updated>
<author>
<name>Mahesh Salgaonkar</name>
<email>mahesh@linux.vnet.ibm.com</email>
</author>
<published>2019-04-29T18:15:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d6e8a150850601277039a548ffcdddd1bfe3e365'/>
<id>d6e8a150850601277039a548ffcdddd1bfe3e365</id>
<content type='text'>
Also add cpu number while displaying MCE log. This will help cleaner
logs when MCE hits on multiple cpus simultaneously.

Before the changes the MCE output was:

  Severe Machine check interrupt [Recovered]
    NIP [d00000000ba80280]: insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
    Initiator: CPU
    Error type: SLB [Multihit]
      Effective address: d00000000ba80280

After this patch series changes the MCE output will be:

  MCE: CPU80: machine check (Warning) Host SLB Multihit [Recovered]
  MCE: CPU80: NIP: [d00000000b550280] insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
  MCE: CPU80: Probable software error (some chance of hardware cause)

UE in host application:

  MCE: CPU48: machine check (Severe) Host UE Load/Store DAR: 00007fffc6079a80 paddr: 0000000f8e260000 [Not recovered]
  MCE: CPU48: PID: 4584 Comm: find NIP: [0000000010023368]
  MCE: CPU48: Hardware error

and for MCE in Guest:

  MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered]
  MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60]
  MCE: CPU80: Probable software error (some chance of hardware cause)

Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Also add cpu number while displaying MCE log. This will help cleaner
logs when MCE hits on multiple cpus simultaneously.

Before the changes the MCE output was:

  Severe Machine check interrupt [Recovered]
    NIP [d00000000ba80280]: insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
    Initiator: CPU
    Error type: SLB [Multihit]
      Effective address: d00000000ba80280

After this patch series changes the MCE output will be:

  MCE: CPU80: machine check (Warning) Host SLB Multihit [Recovered]
  MCE: CPU80: NIP: [d00000000b550280] insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
  MCE: CPU80: Probable software error (some chance of hardware cause)

UE in host application:

  MCE: CPU48: machine check (Severe) Host UE Load/Store DAR: 00007fffc6079a80 paddr: 0000000f8e260000 [Not recovered]
  MCE: CPU48: PID: 4584 Comm: find NIP: [0000000010023368]
  MCE: CPU48: Hardware error

and for MCE in Guest:

  MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered]
  MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60]
  MCE: CPU80: Probable software error (some chance of hardware cause)

Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s: Fix HV NMI vs HV interrupt recoverability test</title>
<updated>2019-02-26T12:28:24+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2019-02-26T08:51:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ccd477028a202993b9ddca5d2404fdaca3b7a55c'/>
<id>ccd477028a202993b9ddca5d2404fdaca3b7a55c</id>
<content type='text'>
HV interrupts that use HSRR registers do not enter with MSR[RI] clear,
but their entry code is not recoverable vs NMI, due to shared use of
HSPRG1 as a scratch register to save r13.

This means that a system reset or machine check that hits in HSRR
interrupt entry can cause r13 to be silently corrupted.

Fix this by marking NMIs non-recoverable if they land in HV interrupt
ranges.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
HV interrupts that use HSRR registers do not enter with MSR[RI] clear,
but their entry code is not recoverable vs NMI, due to shared use of
HSPRG1 as a scratch register to save r13.

This means that a system reset or machine check that hits in HSRR
interrupt entry can cause r13 to be silently corrupted.

Fix this by marking NMIs non-recoverable if they land in HV interrupt
ranges.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
</feed>
