<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/cxl, branch linux-6.16.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>cxl/edac: Fix wrong dpa checking for PPR operation</title>
<updated>2025-08-15T14:39:06+00:00</updated>
<author>
<name>Li Ming</name>
<email>ming.li@zohomail.com</email>
</author>
<published>2025-07-11T03:23:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9be45061dd610d3e5491ece3c674bbcf95a0a792'/>
<id>9be45061dd610d3e5491ece3c674bbcf95a0a792</id>
<content type='text'>
[ Upstream commit 03ff65c02559e8da32be231d7f10fe899233ceae ]

Per Table 8-143. "Get Partition Info Output Payload" in CXL r3.2 section
8.2.10.9.2.1 "Get Partition Info(Opcode 4100h)", DPA 0 is a valid
address of a CXL device. However, cxl_do_ppr() considers it as an
invalid address, so that user will get an -EINVAL when user calls the
sysfs interface of the edac driver to trigger a Post Package Repair(PPR)
operation for DPA 0 on a CXL device. The correct implementation should
be checking if the input DPA is in the DPA range of the CXL device.

Fixes: be9b359e056a ("cxl/edac: Add CXL memory device soft PPR control feature")
Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Tested-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Link: https://patch.msgid.link/20250711032357.127355-3-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 03ff65c02559e8da32be231d7f10fe899233ceae ]

Per Table 8-143. "Get Partition Info Output Payload" in CXL r3.2 section
8.2.10.9.2.1 "Get Partition Info(Opcode 4100h)", DPA 0 is a valid
address of a CXL device. However, cxl_do_ppr() considers it as an
invalid address, so that user will get an -EINVAL when user calls the
sysfs interface of the edac driver to trigger a Post Package Repair(PPR)
operation for DPA 0 on a CXL device. The correct implementation should
be checking if the input DPA is in the DPA range of the CXL device.

Fixes: be9b359e056a ("cxl/edac: Add CXL memory device soft PPR control feature")
Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Tested-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Link: https://patch.msgid.link/20250711032357.127355-3-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/core: Introduce a new helper cxl_resource_contains_addr()</title>
<updated>2025-08-15T14:39:06+00:00</updated>
<author>
<name>Li Ming</name>
<email>ming.li@zohomail.com</email>
</author>
<published>2025-07-11T03:23:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6c0d5ba3500b0a06b506f123128820ac71abfc33'/>
<id>6c0d5ba3500b0a06b506f123128820ac71abfc33</id>
<content type='text'>
[ Upstream commit 5b6031c832c2747d58d3f0130098d965ef050b9a ]

In CXL subsystem, many functions need to check an address availability
by checking if the resource range contains the address. Providing a new
helper function cxl_resource_contains_addr() to check if the resource
range contains the input address.

Suggested-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Tested-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Link: https://patch.msgid.link/20250711032357.127355-2-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Stable-dep-of: 03ff65c02559 ("cxl/edac: Fix wrong dpa checking for PPR operation")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 5b6031c832c2747d58d3f0130098d965ef050b9a ]

In CXL subsystem, many functions need to check an address availability
by checking if the resource range contains the address. Providing a new
helper function cxl_resource_contains_addr() to check if the resource
range contains the input address.

Suggested-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Tested-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Link: https://patch.msgid.link/20250711032357.127355-2-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Stable-dep-of: 03ff65c02559 ("cxl/edac: Fix wrong dpa checking for PPR operation")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/edac: Fix using wrong repair type to check dram event record</title>
<updated>2025-06-25T19:05:45+00:00</updated>
<author>
<name>Li Ming</name>
<email>ming.li@zohomail.com</email>
</author>
<published>2025-06-20T05:29:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0a46f60a9fe16f5596b6b4b3ee1a483ea7854136'/>
<id>0a46f60a9fe16f5596b6b4b3ee1a483ea7854136</id>
<content type='text'>
cxl_find_rec_dram() is used to find a DRAM event record based on the
inputted attributes. Different repair_type of the inputted attributes
will check the DRAM event record in different ways.
When EDAC driver is performing a memory rank sparing, it should use
CXL_RANK_SPARING rather than CXL_BANK_SPARING as repair_type for DRAM
event record checking.

Fixes: 588ca944c277 ("cxl/edac: Add CXL memory device memory sparing control feature")
Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Reviewed-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Fan Ni &lt;fan.ni@samsung.com&gt;
Link: https://patch.msgid.link/20250620052924.138892-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cxl_find_rec_dram() is used to find a DRAM event record based on the
inputted attributes. Different repair_type of the inputted attributes
will check the DRAM event record in different ways.
When EDAC driver is performing a memory rank sparing, it should use
CXL_RANK_SPARING rather than CXL_BANK_SPARING as repair_type for DRAM
event record checking.

Fixes: 588ca944c277 ("cxl/edac: Add CXL memory device memory sparing control feature")
Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Reviewed-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Fan Ni &lt;fan.ni@samsung.com&gt;
Link: https://patch.msgid.link/20250620052924.138892-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/ras: Fix CPER handler device confusion</title>
<updated>2025-06-13T16:02:04+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2025-06-12T19:20:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3c70ec71abdaf4e4fa48cd8fdfbbd864d78235a8'/>
<id>3c70ec71abdaf4e4fa48cd8fdfbbd864d78235a8</id>
<content type='text'>
By inspection, cxl_cper_handle_prot_err() is making a series of fragile
assumptions that can lead to crashes:

1/ It assumes that endpoints identified in the record are a CXL-type-3
   device, nothing guarantees that.

2/ It assumes that the device is bound to the cxl_pci driver, nothing
   guarantees that.

3/ Minor, it holds the device lock over the switch-port tracing for no
   reason as the trace is 100% generated from data in the record.

Correct those by checking that the PCIe endpoint parents a cxl_memdev
before assuming the format of the driver data, and move the lock to where
it is required. Consequently this also makes the implementation ready for
CXL accelerators that are not bound to cxl_pci.

Fixes: 36f257e3b0ba ("acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors")
Cc: Terry Bowman &lt;terry.bowman@amd.com&gt;
Cc: Li Ming &lt;ming.li@zohomail.com&gt;
Cc: Alison Schofield &lt;alison.schofield@intel.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Reviewed-by: Smita Koralahalli &lt;Smita.KoralahalliChannabasappa@amd.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Reviewed-by: Li Ming &lt;ming.li@zohomail.com&gt;
Link: https://patch.msgid.link/20250612192043.2254617-1-dan.j.williams@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
By inspection, cxl_cper_handle_prot_err() is making a series of fragile
assumptions that can lead to crashes:

1/ It assumes that endpoints identified in the record are a CXL-type-3
   device, nothing guarantees that.

2/ It assumes that the device is bound to the cxl_pci driver, nothing
   guarantees that.

3/ Minor, it holds the device lock over the switch-port tracing for no
   reason as the trace is 100% generated from data in the record.

Correct those by checking that the PCIe endpoint parents a cxl_memdev
before assuming the format of the driver data, and move the lock to where
it is required. Consequently this also makes the implementation ready for
CXL accelerators that are not bound to cxl_pci.

Fixes: 36f257e3b0ba ("acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors")
Cc: Terry Bowman &lt;terry.bowman@amd.com&gt;
Cc: Li Ming &lt;ming.li@zohomail.com&gt;
Cc: Alison Schofield &lt;alison.schofield@intel.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Reviewed-by: Smita Koralahalli &lt;Smita.KoralahalliChannabasappa@amd.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Reviewed-by: Li Ming &lt;ming.li@zohomail.com&gt;
Link: https://patch.msgid.link/20250612192043.2254617-1-dan.j.williams@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/edac: Fix potential memory leak issues</title>
<updated>2025-06-13T15:45:30+00:00</updated>
<author>
<name>Li Ming</name>
<email>ming.li@zohomail.com</email>
</author>
<published>2025-06-13T01:16:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a403fe6c0b17f472e01246eb350f5eef105243ac'/>
<id>a403fe6c0b17f472e01246eb350f5eef105243ac</id>
<content type='text'>
In cxl_store_rec_gen_media() and cxl_store_rec_dram(), use kmemdup() to
duplicate a cxl gen_media/dram event to store the event in a xarray by
xa_store(). The cxl gen_media/dram event allocated by kmemdup() should
be freed in the case that the xa_store() fails.

Fixes: 0b5ccb0de1e2 ("cxl/edac: Support for finding memory operation attributes from the current boot")
Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Tested-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Link: https://patch.msgid.link/20250613011648.102840-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In cxl_store_rec_gen_media() and cxl_store_rec_dram(), use kmemdup() to
duplicate a cxl gen_media/dram event to store the event in a xarray by
xa_store(). The cxl gen_media/dram event allocated by kmemdup() should
be freed in the case that the xa_store() fails.

Fixes: 0b5ccb0de1e2 ("cxl/edac: Support for finding memory operation attributes from the current boot")
Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Tested-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Link: https://patch.msgid.link/20250613011648.102840-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/edac: Fix the min_scrub_cycle of a region miscalculation</title>
<updated>2025-06-10T20:54:15+00:00</updated>
<author>
<name>Li Ming</name>
<email>ming.li@zohomail.com</email>
</author>
<published>2025-06-03T10:43:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fdc9be90929081ed5a9658583aa5f3121d5e8365'/>
<id>fdc9be90929081ed5a9658583aa5f3121d5e8365</id>
<content type='text'>
When trying to update the scrub_cycle value of a cxl region, which means
updating the scrub_cycle value of each memdev under a cxl region. cxl
driver needs to guarantee the new scrub_cycle value is greater than the
min_scrub_cycle value of a memdev, otherwise the updating operation will
fail(Per Table 8-223 in CXL r3.2 section 8.2.10.9.11.1).

Current implementation logic of getting the min_scrub_cycle value of a
cxl region is that getting the min_scrub_cycle value of each memdevs
under the cxl region, then using the minimum min_scrub_cycle value as
the region's min_scrub_cycle. Checking if the new scrub_cycle value is
greater than this value. If yes, updating the new scrub_cycle value to
each memdevs. The issue is that the new scrub_cycle value is possibly
greater than the minimum min_scrub_cycle value of all memdevs but less
than the maximum min_scrub_cycle value of all memdevs if memdevs have
a different min_scrub_cycle value. The updating operation will always
fail on these memdevs which have a greater min_scrub_cycle than the new
scrub_cycle.

The correct implementation logic is to get the maximum value of these
memdevs' min_scrub_cycle, check if the new scrub_cycle value is greater
than the value. If yes, the new scrub_cycle value is fit for the region.

The change also impacts the result of
cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
minimum min_scrub_cycle value among all memdevs under the region before
the change. The interface will return the maximum min_scrub_cycle value
among all memdevs under the region with the change.

Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Link: https://patch.msgid.link/20250603104314.25569-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When trying to update the scrub_cycle value of a cxl region, which means
updating the scrub_cycle value of each memdev under a cxl region. cxl
driver needs to guarantee the new scrub_cycle value is greater than the
min_scrub_cycle value of a memdev, otherwise the updating operation will
fail(Per Table 8-223 in CXL r3.2 section 8.2.10.9.11.1).

Current implementation logic of getting the min_scrub_cycle value of a
cxl region is that getting the min_scrub_cycle value of each memdevs
under the cxl region, then using the minimum min_scrub_cycle value as
the region's min_scrub_cycle. Checking if the new scrub_cycle value is
greater than this value. If yes, updating the new scrub_cycle value to
each memdevs. The issue is that the new scrub_cycle value is possibly
greater than the minimum min_scrub_cycle value of all memdevs but less
than the maximum min_scrub_cycle value of all memdevs if memdevs have
a different min_scrub_cycle value. The updating operation will always
fail on these memdevs which have a greater min_scrub_cycle than the new
scrub_cycle.

The correct implementation logic is to get the maximum value of these
memdevs' min_scrub_cycle, check if the new scrub_cycle value is greater
than the value. If yes, the new scrub_cycle value is fit for the region.

The change also impacts the result of
cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
minimum min_scrub_cycle value among all memdevs under the region before
the change. The interface will return the maximum min_scrub_cycle value
among all memdevs under the region with the change.

Signed-off-by: Li Ming &lt;ming.li@zohomail.com&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Link: https://patch.msgid.link/20250603104314.25569-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl: fix return value in cxlctl_validate_set_features()</title>
<updated>2025-06-09T16:18:15+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@linaro.org</email>
</author>
<published>2025-05-28T08:11:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=87b42c114cdda76c8ad3002f2096699ad5146cb3'/>
<id>87b42c114cdda76c8ad3002f2096699ad5146cb3</id>
<content type='text'>
The cxlctl_validate_set_features() function is type bool.  It's supposed
to return true for valid requests and false for invalid.  However, this
error path returns ERR_PTR(-EINVAL) which is true when it was intended to
return false.

The incorrect return will result in kernel failing to prevent a
incorrect op_size passed in from userspace to be detected.

[ dj: Add user impact to commit log ]

Fixes: f76e0bbc8bc3 ("cxl: Update prototype of function get_support_feature_info()")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Link: https://patch.msgid.link/aDbFPSCujpJLY1if@stanley.mountain
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cxlctl_validate_set_features() function is type bool.  It's supposed
to return true for valid requests and false for invalid.  However, this
error path returns ERR_PTR(-EINVAL) which is true when it was intended to
return false.

The incorrect return will result in kernel failing to prevent a
incorrect op_size passed in from userspace to be detected.

[ dj: Add user impact to commit log ]

Fixes: f76e0bbc8bc3 ("cxl: Update prototype of function get_support_feature_info()")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Link: https://patch.msgid.link/aDbFPSCujpJLY1if@stanley.mountain
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-6.16/cxl-features-ras' into cxl-for-next</title>
<updated>2025-05-23T20:26:24+00:00</updated>
<author>
<name>Dave Jiang</name>
<email>dave.jiang@intel.com</email>
</author>
<published>2025-05-23T20:26:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9f153b7fb5ae45c7d426851f896487927f40e501'/>
<id>9f153b7fb5ae45c7d426851f896487927f40e501</id>
<content type='text'>
Add CXL RAS Features support. Features include "patrol scrub control",
"error check scrub", "perform maintenance", and "memory sparing". This
support connects the RAS Featurs to EDAC.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add CXL RAS Features support. Features include "patrol scrub control",
"error check scrub", "perform maintenance", and "memory sparing". This
support connects the RAS Featurs to EDAC.
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/edac: Add CXL memory device soft PPR control feature</title>
<updated>2025-05-23T20:25:06+00:00</updated>
<author>
<name>Shiju Jose</name>
<email>shiju.jose@huawei.com</email>
</author>
<published>2025-05-21T12:47:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=be9b359e056a78bb6cc2e17cf457338f6aef57f9'/>
<id>be9b359e056a78bb6cc2e17cf457338f6aef57f9</id>
<content type='text'>
Post Package Repair (PPR) maintenance operations may be supported by CXL
devices that implement CXL.mem protocol. A PPR maintenance operation
requests the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support PPR features
may implement PPR Maintenance operations. DRAM components may support two
types of PPR, hard PPR (hPPR), for a permanent row repair, and Soft PPR
(sPPR), for a temporary row repair. Soft PPR is much faster than hPPR,
but the repair is lost with a power cycle.

During the execution of a PPR Maintenance operation, a CXL memory device:
- May or may not retain data
- May or may not be able to process CXL.mem requests correctly, including
the ones that target the DPA involved in the repair.
These CXL Memory Device capabilities are specified by Restriction Flags
in the sPPR Feature and hPPR Feature.

Soft PPR maintenance operation may be executed at runtime, if data is
retained and CXL.mem requests are correctly processed. For CXL devices with
DRAM components, hPPR maintenance operation may be executed only at boot
because typically data may not be retained with hPPR maintenance operation.

When a CXL device identifies error on a memory component, the device
may inform the host about the need for a PPR maintenance operation by using
an Event Record, where the Maintenance Needed flag is set. The Event Record
specifies the DPA that should be repaired. A CXL device may not keep track
of the requests that have already been sent and the information on which
DPA should be repaired may be lost upon power cycle.
The userspace tool requests for maintenance operation if the number of
corrected error reported on a CXL.mem media exceeds error threshold.

CXL spec 3.2 section 8.2.10.7.1.2 describes the device's sPPR (soft PPR)
maintenance operation and section 8.2.10.7.1.3 describes the device's
hPPR (hard PPR) maintenance operation feature.

CXL spec 3.2 section 8.2.10.7.2.1 describes the sPPR feature discovery and
configuration.

CXL spec 3.2 section 8.2.10.7.2.2 describes the hPPR feature discovery and
configuration.

Add support for controlling CXL memory device soft PPR (sPPR) feature.
Register with EDAC driver, which gets the memory repair attr descriptors
from the EDAC memory repair driver and exposes sysfs repair control
attributes for PRR to the userspace. For example CXL PPR control for the
CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/

Add checks to ensure the memory to be repaired is offline and originates
from a CXL DRAM or CXL gen_media error record reported in the current boot,
before requesting a PPR operation on the device.

Note: Tested with QEMU patch for CXL PPR feature.
https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m70b2b010f43f7f4a6f9acee5ec9008498bf292c3

Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Signed-off-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: https://patch.msgid.link/20250521124749.817-9-shiju.jose@huawei.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Post Package Repair (PPR) maintenance operations may be supported by CXL
devices that implement CXL.mem protocol. A PPR maintenance operation
requests the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support PPR features
may implement PPR Maintenance operations. DRAM components may support two
types of PPR, hard PPR (hPPR), for a permanent row repair, and Soft PPR
(sPPR), for a temporary row repair. Soft PPR is much faster than hPPR,
but the repair is lost with a power cycle.

During the execution of a PPR Maintenance operation, a CXL memory device:
- May or may not retain data
- May or may not be able to process CXL.mem requests correctly, including
the ones that target the DPA involved in the repair.
These CXL Memory Device capabilities are specified by Restriction Flags
in the sPPR Feature and hPPR Feature.

Soft PPR maintenance operation may be executed at runtime, if data is
retained and CXL.mem requests are correctly processed. For CXL devices with
DRAM components, hPPR maintenance operation may be executed only at boot
because typically data may not be retained with hPPR maintenance operation.

When a CXL device identifies error on a memory component, the device
may inform the host about the need for a PPR maintenance operation by using
an Event Record, where the Maintenance Needed flag is set. The Event Record
specifies the DPA that should be repaired. A CXL device may not keep track
of the requests that have already been sent and the information on which
DPA should be repaired may be lost upon power cycle.
The userspace tool requests for maintenance operation if the number of
corrected error reported on a CXL.mem media exceeds error threshold.

CXL spec 3.2 section 8.2.10.7.1.2 describes the device's sPPR (soft PPR)
maintenance operation and section 8.2.10.7.1.3 describes the device's
hPPR (hard PPR) maintenance operation feature.

CXL spec 3.2 section 8.2.10.7.2.1 describes the sPPR feature discovery and
configuration.

CXL spec 3.2 section 8.2.10.7.2.2 describes the hPPR feature discovery and
configuration.

Add support for controlling CXL memory device soft PPR (sPPR) feature.
Register with EDAC driver, which gets the memory repair attr descriptors
from the EDAC memory repair driver and exposes sysfs repair control
attributes for PRR to the userspace. For example CXL PPR control for the
CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/

Add checks to ensure the memory to be repaired is offline and originates
from a CXL DRAM or CXL gen_media error record reported in the current boot,
before requesting a PPR operation on the device.

Note: Tested with QEMU patch for CXL PPR feature.
https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m70b2b010f43f7f4a6f9acee5ec9008498bf292c3

Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Signed-off-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: https://patch.msgid.link/20250521124749.817-9-shiju.jose@huawei.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/edac: Add CXL memory device memory sparing control feature</title>
<updated>2025-05-23T20:24:53+00:00</updated>
<author>
<name>Shiju Jose</name>
<email>shiju.jose@huawei.com</email>
</author>
<published>2025-05-21T12:47:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=588ca944c27729c7f950d1f44c6d6700a919969a'/>
<id>588ca944c27729c7f950d1f44c6d6700a919969a</id>
<content type='text'>
Memory sparing is defined as a repair function that replaces a portion of
memory with a portion of functional memory at that same DPA. The subclasses
for this operation vary in terms of the scope of the sparing being
performed. The cacheline sparing subclass refers to a sparing action that
can replace a full cacheline. Row sparing is provided as an alternative to
PPR sparing functions and its scope is that of a single DDR row.
As per CXL r3.2 Table 8-125 foot note 1. Memory sparing is preferred over
PPR when possible.
Bank sparing allows an entire bank to be replaced. Rank sparing is defined
as an operation in which an entire DDR rank is replaced.

Memory sparing maintenance operations may be supported by CXL devices
that implement CXL.mem protocol. A sparing maintenance operation requests
the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support memory sparing
features may implement sparing maintenance operations.

The host may issue a query command by setting query resources flag in the
input payload (CXL spec 3.2 Table 8-120) to determine availability of
sparing resources for a given address. In response to a query request,
the device shall report the resource availability by producing the memory
sparing event record (CXL spec 3.2 Table 8-60) in which the Channel, Rank,
Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy
of the values specified in the request.

During the execution of a sparing maintenance operation, a CXL memory
device:
- may not retain data
- may not be able to process CXL.mem requests correctly.
These CXL memory device capabilities are specified by restriction flags
in the memory sparing feature readable attributes.

When a CXL device identifies error on a memory component, the device
may inform the host about the need for a memory sparing maintenance
operation by using DRAM event record, where the 'maintenance needed' flag
may set. The event record contains some of the DPA, Channel, Rank,
Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields that
should be repaired. The userspace tool requests for maintenance operation
if the 'maintenance needed' flag set in the CXL DRAM error record.

CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing
maintenance operation feature.

CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature
discovery and configuration.

Add support for controlling CXL memory device memory sparing feature.
Register with EDAC driver, which gets the memory repair attr descriptors
from the EDAC memory repair driver and exposes sysfs repair control
attributes for memory sparing to the userspace. For example CXL memory
sparing control for the CXL mem0 device is exposed in
/sys/bus/edac/devices/cxl_mem0/mem_repairX/

Use case
========
1. CXL device identifies a failure in a memory component, report to
   userspace in a CXL DRAM trace event with DPA and other attributes of
   memory to repair such as channel, rank, nibble mask, bank Group,
   bank, row, column, sub-channel.

2. Rasdaemon process the trace event and may issue query request in sysfs
check resources available for memory sparing if either of the following
conditions met.
 - 'maintenance needed' flag set in the event record.
 - 'threshold event' flag set for CVME threshold feature.
 - When the number of corrected error reported on a CXL.mem media to the
   userspace exceeds the threshold value for corrected error count defined
   by the userspace policy.

3. Rasdaemon process the memory sparing trace event and issue repair
   request for memory sparing.

Kernel CXL driver shall report memory sparing event record to the userspace
with the resource availability in order rasdaemon to process the event
record and issue a repair request in sysfs for the memory sparing operation
in the CXL device.

Note: Based on the feedbacks from the community 'query' sysfs attribute is
removed and reporting memory sparing error record to the userspace are not
supported. Instead userspace issues sparing operation and kernel does the
same to the CXL memory device, when 'maintenance needed' flag set in the
DRAM event record.

Add checks to ensure the memory to be repaired is offline and if online,
then originates from a CXL DRAM error record reported in the current boot
before requesting a memory sparing operation on the device.

Note: Tested memory sparing feature control with QEMU patch
      "hw/cxl: Add emulation for memory sparing control feature"
      https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m5f38512a95670d75739f9dad3ee91b95c7f5c8d6

Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Signed-off-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: https://patch.msgid.link/20250521124749.817-8-shiju.jose@huawei.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Memory sparing is defined as a repair function that replaces a portion of
memory with a portion of functional memory at that same DPA. The subclasses
for this operation vary in terms of the scope of the sparing being
performed. The cacheline sparing subclass refers to a sparing action that
can replace a full cacheline. Row sparing is provided as an alternative to
PPR sparing functions and its scope is that of a single DDR row.
As per CXL r3.2 Table 8-125 foot note 1. Memory sparing is preferred over
PPR when possible.
Bank sparing allows an entire bank to be replaced. Rank sparing is defined
as an operation in which an entire DDR rank is replaced.

Memory sparing maintenance operations may be supported by CXL devices
that implement CXL.mem protocol. A sparing maintenance operation requests
the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support memory sparing
features may implement sparing maintenance operations.

The host may issue a query command by setting query resources flag in the
input payload (CXL spec 3.2 Table 8-120) to determine availability of
sparing resources for a given address. In response to a query request,
the device shall report the resource availability by producing the memory
sparing event record (CXL spec 3.2 Table 8-60) in which the Channel, Rank,
Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy
of the values specified in the request.

During the execution of a sparing maintenance operation, a CXL memory
device:
- may not retain data
- may not be able to process CXL.mem requests correctly.
These CXL memory device capabilities are specified by restriction flags
in the memory sparing feature readable attributes.

When a CXL device identifies error on a memory component, the device
may inform the host about the need for a memory sparing maintenance
operation by using DRAM event record, where the 'maintenance needed' flag
may set. The event record contains some of the DPA, Channel, Rank,
Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields that
should be repaired. The userspace tool requests for maintenance operation
if the 'maintenance needed' flag set in the CXL DRAM error record.

CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing
maintenance operation feature.

CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature
discovery and configuration.

Add support for controlling CXL memory device memory sparing feature.
Register with EDAC driver, which gets the memory repair attr descriptors
from the EDAC memory repair driver and exposes sysfs repair control
attributes for memory sparing to the userspace. For example CXL memory
sparing control for the CXL mem0 device is exposed in
/sys/bus/edac/devices/cxl_mem0/mem_repairX/

Use case
========
1. CXL device identifies a failure in a memory component, report to
   userspace in a CXL DRAM trace event with DPA and other attributes of
   memory to repair such as channel, rank, nibble mask, bank Group,
   bank, row, column, sub-channel.

2. Rasdaemon process the trace event and may issue query request in sysfs
check resources available for memory sparing if either of the following
conditions met.
 - 'maintenance needed' flag set in the event record.
 - 'threshold event' flag set for CVME threshold feature.
 - When the number of corrected error reported on a CXL.mem media to the
   userspace exceeds the threshold value for corrected error count defined
   by the userspace policy.

3. Rasdaemon process the memory sparing trace event and issue repair
   request for memory sparing.

Kernel CXL driver shall report memory sparing event record to the userspace
with the resource availability in order rasdaemon to process the event
record and issue a repair request in sysfs for the memory sparing operation
in the CXL device.

Note: Based on the feedbacks from the community 'query' sysfs attribute is
removed and reporting memory sparing error record to the userspace are not
supported. Instead userspace issues sparing operation and kernel does the
same to the CXL memory device, when 'maintenance needed' flag set in the
DRAM event record.

Add checks to ensure the memory to be repaired is offline and if online,
then originates from a CXL DRAM error record reported in the current boot
before requesting a memory sparing operation on the device.

Note: Tested memory sparing feature control with QEMU patch
      "hw/cxl: Add emulation for memory sparing control feature"
      https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m5f38512a95670d75739f9dad3ee91b95c7f5c8d6

Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Signed-off-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: https://patch.msgid.link/20250521124749.817-8-shiju.jose@huawei.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
