<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/edac, branch v6.6.78</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>EDAC/amd64: Simplify ECC check on unified memory controllers</title>
<updated>2024-12-27T12:58:50+00:00</updated>
<author>
<name>Borislav Petkov (AMD)</name>
<email>bp@alien8.de</email>
</author>
<published>2024-12-11T11:07:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3e84704feefefdad6b7721c4f08091e8b4aa6ad5'/>
<id>3e84704feefefdad6b7721c4f08091e8b4aa6ad5</id>
<content type='text'>
commit 747367340ca6b5070728b86ae36ad6747f66b2fb upstream.

The intent of the check is to see whether at least one UMC has ECC
enabled. So do that instead of tracking which ones are enabled in masks
which are too small in size anyway and lead to not loading the driver on
Zen4 machines with UMCs enabled over UMC8.

Fixes: e2be5955a886 ("EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh")
Reported-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Reviewed-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Cc: &lt;stable@kernel.org&gt;
Link: https://lore.kernel.org/r/20241210212054.3895697-1-avadhut.naik@amd.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 747367340ca6b5070728b86ae36ad6747f66b2fb upstream.

The intent of the check is to see whether at least one UMC has ECC
enabled. So do that instead of tracking which ones are enabled in masks
which are too small in size anyway and lead to not loading the driver on
Zen4 machines with UMCs enabled over UMC8.

Fixes: e2be5955a886 ("EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh")
Reported-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Reviewed-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Cc: &lt;stable@kernel.org&gt;
Link: https://lore.kernel.org/r/20241210212054.3895697-1-avadhut.naik@amd.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>EDAC/igen6: Avoid segmentation fault on module unload</title>
<updated>2024-12-09T09:31:48+00:00</updated>
<author>
<name>Orange Kao</name>
<email>orange@aiven.io</email>
</author>
<published>2024-11-04T12:40:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=830cabb61113d92a425dd3038ccedbdfb3c8d079'/>
<id>830cabb61113d92a425dd3038ccedbdfb3c8d079</id>
<content type='text'>
[ Upstream commit fefaae90398d38a1100ccd73b46ab55ff4610fba ]

The segmentation fault happens because:

During modprobe:
1. In igen6_probe(), igen6_pvt will be allocated with kzalloc()
2. In igen6_register_mci(), mci-&gt;pvt_info will point to
   &amp;igen6_pvt-&gt;imc[mc]

During rmmod:
1. In mci_release() in edac_mc.c, it will kfree(mci-&gt;pvt_info)
2. In igen6_remove(), it will kfree(igen6_pvt);

Fix this issue by setting mci-&gt;pvt_info to NULL to avoid the double
kfree.

Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219360
Signed-off-by: Orange Kao &lt;orange@aiven.io&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lore.kernel.org/r/20241104124237.124109-2-orange@aiven.io
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit fefaae90398d38a1100ccd73b46ab55ff4610fba ]

The segmentation fault happens because:

During modprobe:
1. In igen6_probe(), igen6_pvt will be allocated with kzalloc()
2. In igen6_register_mci(), mci-&gt;pvt_info will point to
   &amp;igen6_pvt-&gt;imc[mc]

During rmmod:
1. In mci_release() in edac_mc.c, it will kfree(mci-&gt;pvt_info)
2. In igen6_remove(), it will kfree(igen6_pvt);

Fix this issue by setting mci-&gt;pvt_info to NULL to avoid the double
kfree.

Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219360
Signed-off-by: Orange Kao &lt;orange@aiven.io&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lore.kernel.org/r/20241104124237.124109-2-orange@aiven.io
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicator</title>
<updated>2024-12-09T09:31:47+00:00</updated>
<author>
<name>Qiuxu Zhuo</name>
<email>qiuxu.zhuo@intel.com</email>
</author>
<published>2024-10-15T07:22:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d9338b781fe9342e9e3c87c998d8260519da8753'/>
<id>d9338b781fe9342e9e3c87c998d8260519da8753</id>
<content type='text'>
[ Upstream commit a36667037a0c0e36c59407f8ae636295390239a5 ]

The Granite Rapids CPUs with Flat2LM memory configurations may
mistakenly report near-memory errors as far-memory errors, resulting
in the invalid decoded ADXL results:

  EDAC skx: Bad imc -1

Fix this incorrect far-memory error source indicator by prefetching the
decoded far-memory controller ID, and adjust the error source indicator
to near-memory if the far-memory controller ID is invalid.

Fixes: ba987eaaabf9 ("EDAC/i10nm: Add Intel Granite Rapids server support")
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Tested-by: Diego Garcia Rodriguez &lt;diego.garcia.rodriguez@intel.com&gt;
Link: https://lore.kernel.org/r/20241015072236.24543-3-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit a36667037a0c0e36c59407f8ae636295390239a5 ]

The Granite Rapids CPUs with Flat2LM memory configurations may
mistakenly report near-memory errors as far-memory errors, resulting
in the invalid decoded ADXL results:

  EDAC skx: Bad imc -1

Fix this incorrect far-memory error source indicator by prefetching the
decoded far-memory controller ID, and adjust the error source indicator
to near-memory if the far-memory controller ID is invalid.

Fixes: ba987eaaabf9 ("EDAC/i10nm: Add Intel Granite Rapids server support")
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Tested-by: Diego Garcia Rodriguez &lt;diego.garcia.rodriguez@intel.com&gt;
Link: https://lore.kernel.org/r/20241015072236.24543-3-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>EDAC/skx_common: Differentiate memory error sources</title>
<updated>2024-12-09T09:31:47+00:00</updated>
<author>
<name>Qiuxu Zhuo</name>
<email>qiuxu.zhuo@intel.com</email>
</author>
<published>2024-10-15T07:22:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=edf58d4bebc3a2cf2d3ed5e26967f65aa40a813d'/>
<id>edf58d4bebc3a2cf2d3ed5e26967f65aa40a813d</id>
<content type='text'>
[ Upstream commit 2397f795735219caa9c2fe61e7bcdd0652e670d3 ]

The current skx_common determines whether the memory error source is the
near memory of the 2LM system and then retrieves the decoded error results
from the ADXL components (near-memory vs. far-memory) accordingly.

However, some memory controllers may have limitations in correctly
reporting the memory error source, leading to the retrieval of incorrect
decoded parts from the ADXL.

To address these limitations, instead of simply determining whether the
memory error is from the near memory of the 2LM system, it is necessary to
distinguish the memory error source details as follows:

  Memory error from the near memory of the 2LM system.
  Memory error from the far memory of the 2LM system.
  Memory error from the 1LM system.
  Not a memory error.

This will enable the i10nm_edac driver to take appropriate actions for
those memory controllers that have limitations in reporting the memory
error source.

Fixes: ba987eaaabf9 ("EDAC/i10nm: Add Intel Granite Rapids server support")
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Tested-by: Diego Garcia Rodriguez &lt;diego.garcia.rodriguez@intel.com&gt;
Link: https://lore.kernel.org/r/20241015072236.24543-2-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 2397f795735219caa9c2fe61e7bcdd0652e670d3 ]

The current skx_common determines whether the memory error source is the
near memory of the 2LM system and then retrieves the decoded error results
from the ADXL components (near-memory vs. far-memory) accordingly.

However, some memory controllers may have limitations in correctly
reporting the memory error source, leading to the retrieval of incorrect
decoded parts from the ADXL.

To address these limitations, instead of simply determining whether the
memory error is from the near memory of the 2LM system, it is necessary to
distinguish the memory error source details as follows:

  Memory error from the near memory of the 2LM system.
  Memory error from the far memory of the 2LM system.
  Memory error from the 1LM system.
  Not a memory error.

This will enable the i10nm_edac driver to take appropriate actions for
those memory controllers that have limitations in reporting the memory
error source.

Fixes: ba987eaaabf9 ("EDAC/i10nm: Add Intel Granite Rapids server support")
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Tested-by: Diego Garcia Rodriguez &lt;diego.garcia.rodriguez@intel.com&gt;
Link: https://lore.kernel.org/r/20241015072236.24543-2-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>EDAC/fsl_ddr: Fix bad bit shift operations</title>
<updated>2024-12-09T09:31:47+00:00</updated>
<author>
<name>Priyanka Singh</name>
<email>priyanka.singh@nxp.com</email>
</author>
<published>2024-10-16T20:31:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=63a2d940c719fbded9d8fef94b5fb6a4933447f9'/>
<id>63a2d940c719fbded9d8fef94b5fb6a4933447f9</id>
<content type='text'>
[ Upstream commit 9ec22ac4fe766c6abba845290d5139a3fbe0153b ]

Fix undefined behavior caused by left-shifting a negative value in the
expression:

    cap_high ^ (1 &lt;&lt; (bad_data_bit - 32))

The variable bad_data_bit ranges from 0 to 63. When it is less than 32,
bad_data_bit - 32 becomes negative, and left-shifting by a negative
value in C is undefined behavior.

Fix this by combining cap_high and cap_low into a 64-bit variable.

  [ bp: Massage commit message, simplify error bits handling. ]

Fixes: ea2eb9a8b620 ("EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx")
Signed-off-by: Priyanka Singh &lt;priyanka.singh@nxp.com&gt;
Signed-off-by: Li Yang &lt;leoyang.li@nxp.com&gt;
Signed-off-by: Frank Li &lt;Frank.Li@nxp.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20241016-imx95_edac-v3-3-86ae6fc2756a@nxp.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 9ec22ac4fe766c6abba845290d5139a3fbe0153b ]

Fix undefined behavior caused by left-shifting a negative value in the
expression:

    cap_high ^ (1 &lt;&lt; (bad_data_bit - 32))

The variable bad_data_bit ranges from 0 to 63. When it is less than 32,
bad_data_bit - 32 becomes negative, and left-shifting by a negative
value in C is undefined behavior.

Fix this by combining cap_high and cap_low into a 64-bit variable.

  [ bp: Massage commit message, simplify error bits handling. ]

Fixes: ea2eb9a8b620 ("EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx")
Signed-off-by: Priyanka Singh &lt;priyanka.singh@nxp.com&gt;
Signed-off-by: Li Yang &lt;leoyang.li@nxp.com&gt;
Signed-off-by: Frank Li &lt;Frank.Li@nxp.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20241016-imx95_edac-v3-3-86ae6fc2756a@nxp.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>EDAC/bluefield: Fix potential integer overflow</title>
<updated>2024-12-09T09:31:47+00:00</updated>
<author>
<name>David Thompson</name>
<email>davthompson@nvidia.com</email>
</author>
<published>2024-09-30T15:10:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ac6ebb9edcdb7077e841862c402697c4c48a7c0a'/>
<id>ac6ebb9edcdb7077e841862c402697c4c48a7c0a</id>
<content type='text'>
[ Upstream commit 1fe774a93b46bb029b8f6fa9d1f25affa53f06c6 ]

The 64-bit argument for the "get DIMM info" SMC call consists of mem_ctrl_idx
left-shifted 16 bits and OR-ed with DIMM index.  With mem_ctrl_idx defined as
32-bits wide the left-shift operation truncates the upper 16 bits of
information during the calculation of the SMC argument.

The mem_ctrl_idx stack variable must be defined as 64-bits wide to prevent any
potential integer overflow, i.e. loss of data from upper 16 bits.

Fixes: 82413e562ea6 ("EDAC, mellanox: Add ECC support for BlueField DDR4")
Signed-off-by: David Thompson &lt;davthompson@nvidia.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Shravan Kumar Ramani &lt;shravankr@nvidia.com&gt;
Link: https://lore.kernel.org/r/20240930151056.10158-1-davthompson@nvidia.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 1fe774a93b46bb029b8f6fa9d1f25affa53f06c6 ]

The 64-bit argument for the "get DIMM info" SMC call consists of mem_ctrl_idx
left-shifted 16 bits and OR-ed with DIMM index.  With mem_ctrl_idx defined as
32-bits wide the left-shift operation truncates the upper 16 bits of
information during the calculation of the SMC argument.

The mem_ctrl_idx stack variable must be defined as 64-bits wide to prevent any
potential integer overflow, i.e. loss of data from upper 16 bits.

Fixes: 82413e562ea6 ("EDAC, mellanox: Add ECC support for BlueField DDR4")
Signed-off-by: David Thompson &lt;davthompson@nvidia.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Shravan Kumar Ramani &lt;shravankr@nvidia.com&gt;
Link: https://lore.kernel.org/r/20240930151056.10158-1-davthompson@nvidia.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>EDAC/igen6: Fix conversion of system address to physical memory address</title>
<updated>2024-10-04T14:29:56+00:00</updated>
<author>
<name>Qiuxu Zhuo</name>
<email>qiuxu.zhuo@intel.com</email>
</author>
<published>2024-08-14T06:10:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e0be8f2d64d65c8b7a93f10b25c5631abc892661'/>
<id>e0be8f2d64d65c8b7a93f10b25c5631abc892661</id>
<content type='text'>
commit 0ad875f442e95d69a1145a38aabac2fd29984fe3 upstream.

The conversion of system address to physical memory address (as viewed by
the memory controller) by igen6_edac is incorrect when the system address
is above the TOM (Total amount Of populated physical Memory) for Elkhart
Lake and Ice Lake (Neural Network Processor). Fix this conversion.

Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20240814061011.43545-1-qiuxu.zhuo%40intel.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0ad875f442e95d69a1145a38aabac2fd29984fe3 upstream.

The conversion of system address to physical memory address (as viewed by
the memory controller) by igen6_edac is incorrect when the system address
is above the TOM (Total amount Of populated physical Memory) for Elkhart
Lake and Ice Lake (Neural Network Processor). Fix this conversion.

Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20240814061011.43545-1-qiuxu.zhuo%40intel.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>EDAC/synopsys: Fix error injection on Zynq UltraScale+</title>
<updated>2024-10-04T14:28:49+00:00</updated>
<author>
<name>Shubhrajyoti Datta</name>
<email>shubhrajyoti.datta@amd.com</email>
</author>
<published>2024-07-11T10:06:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b3e360e00d21badf3c19da63110020cd03f60415'/>
<id>b3e360e00d21badf3c19da63110020cd03f60415</id>
<content type='text'>
[ Upstream commit 35e6dbfe1846caeafabb49b7575adb36b0aa2269 ]

The Zynq UltraScale+ MPSoC DDR has a disjoint memory from 2GB to 32GB.
The DDR host interface has a contiguous memory so while injecting
errors, the driver should remove the hole else the injection fails as
the address translation is incorrect.

Introduce a get_mem_info() function pointer and set it for Zynq
UltraScale+ platform to return host address.

Fixes: 1a81361f75d8 ("EDAC, synopsys: Add Error Injection support for ZynqMP DDR controller")
Signed-off-by: Shubhrajyoti Datta &lt;shubhrajyoti.datta@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20240711100656.31376-1-shubhrajyoti.datta@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 35e6dbfe1846caeafabb49b7575adb36b0aa2269 ]

The Zynq UltraScale+ MPSoC DDR has a disjoint memory from 2GB to 32GB.
The DDR host interface has a contiguous memory so while injecting
errors, the driver should remove the hole else the injection fails as
the address translation is incorrect.

Introduce a get_mem_info() function pointer and set it for Zynq
UltraScale+ platform to return host address.

Fixes: 1a81361f75d8 ("EDAC, synopsys: Add Error Injection support for ZynqMP DDR controller")
Signed-off-by: Shubhrajyoti Datta &lt;shubhrajyoti.datta@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20240711100656.31376-1-shubhrajyoti.datta@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>EDAC/synopsys: Fix ECC status and IRQ control race condition</title>
<updated>2024-10-04T14:28:48+00:00</updated>
<author>
<name>Serge Semin</name>
<email>fancer.lancer@gmail.com</email>
</author>
<published>2024-02-22T18:12:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=23752ababd7243622b4643926e85fd78d14a8392'/>
<id>23752ababd7243622b4643926e85fd78d14a8392</id>
<content type='text'>
[ Upstream commit 591c946675d88dcc0ae9ff54be9d5caaee8ce1e3 ]

The race condition around the ECCCLR register access happens in the IRQ
disable method called in the device remove() procedure and in the ECC IRQ
handler:

  1. Enable IRQ:
     a. ECCCLR = EN_CE | EN_UE
  2. Disable IRQ:
     a. ECCCLR = 0
  3. IRQ handler:
     a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT
     b. ECCCLR = 0
     c. ECCCLR = EN_CE | EN_UE

So if the IRQ disabling procedure is called concurrently with the IRQ
handler method the IRQ might be actually left enabled due to the
statement 3c.

The root cause of the problem is that ECCCLR register (which since
v3.10a has been called as ECCCTL) has intermixed ECC status data clear
flags and the IRQ enable/disable flags. Thus the IRQ disabling (clear EN
flags) and handling (write 1 to clear ECC status data) procedures must
be serialised around the ECCCTL register modification to prevent the
race.

So fix the problem described above by adding the spin-lock around the
ECCCLR modifications and preventing the IRQ-handler from modifying the
IRQs enable flags (there is no point in disabling the IRQ and then
re-enabling it again within a single IRQ handler call, see the
statements 3a/3b and 3c above).

Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
Signed-off-by: Serge Semin &lt;fancer.lancer@gmail.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20240222181324.28242-2-fancer.lancer@gmail.com
Stable-dep-of: 35e6dbfe1846 ("EDAC/synopsys: Fix error injection on Zynq UltraScale+")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 591c946675d88dcc0ae9ff54be9d5caaee8ce1e3 ]

The race condition around the ECCCLR register access happens in the IRQ
disable method called in the device remove() procedure and in the ECC IRQ
handler:

  1. Enable IRQ:
     a. ECCCLR = EN_CE | EN_UE
  2. Disable IRQ:
     a. ECCCLR = 0
  3. IRQ handler:
     a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT
     b. ECCCLR = 0
     c. ECCCLR = EN_CE | EN_UE

So if the IRQ disabling procedure is called concurrently with the IRQ
handler method the IRQ might be actually left enabled due to the
statement 3c.

The root cause of the problem is that ECCCLR register (which since
v3.10a has been called as ECCCTL) has intermixed ECC status data clear
flags and the IRQ enable/disable flags. Thus the IRQ disabling (clear EN
flags) and handling (write 1 to clear ECC status data) procedures must
be serialised around the ECCCTL register modification to prevent the
race.

So fix the problem described above by adding the spin-lock around the
ECCCLR modifications and preventing the IRQ-handler from modifying the
IRQs enable flags (there is no point in disabling the IRQ and then
re-enabling it again within a single IRQ handler call, see the
statements 3a/3b and 3c above).

Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
Signed-off-by: Serge Semin &lt;fancer.lancer@gmail.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20240222181324.28242-2-fancer.lancer@gmail.com
Stable-dep-of: 35e6dbfe1846 ("EDAC/synopsys: Fix error injection on Zynq UltraScale+")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>EDAC/skx_common: Allow decoding of SGX addresses</title>
<updated>2024-08-29T15:33:41+00:00</updated>
<author>
<name>Qiuxu Zhuo</name>
<email>qiuxu.zhuo@intel.com</email>
</author>
<published>2024-04-08T12:04:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6019283e1e35dbc28967536970746f519d44a6eb'/>
<id>6019283e1e35dbc28967536970746f519d44a6eb</id>
<content type='text'>
[ Upstream commit e0d335077831196bffe6a634ffe385fc684192ca ]

There are no "struct page" associations with SGX pages, causing the check
pfn_to_online_page() to fail. This results in the inability to decode the
SGX addresses and warning messages like:

  Invalid address 0x34cc9a98840 in IA32_MC17_ADDR

Add an additional check to allow the decoding of the error address and to
skip the warning message, if the error address is an SGX address.

Fixes: 1e92af09fab1 ("EDAC/skx_common: Filter out the invalid address")
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lore.kernel.org/r/20240408120419.50234-1-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit e0d335077831196bffe6a634ffe385fc684192ca ]

There are no "struct page" associations with SGX pages, causing the check
pfn_to_online_page() to fail. This results in the inability to decode the
SGX addresses and warning messages like:

  Invalid address 0x34cc9a98840 in IA32_MC17_ADDR

Add an additional check to allow the decoding of the error address and to
skip the warning message, if the error address is an SGX address.

Fixes: 1e92af09fab1 ("EDAC/skx_common: Filter out the invalid address")
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lore.kernel.org/r/20240408120419.50234-1-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
