linux-stable.git/drivers/cxl, branch linux-6.9.y

cxl/region: check interleave capability

2024-07-05T07:38:20+00:00

[ Upstream commit 84328c5acebc10c8cdcf17283ab6c6d548885bfc ]

Since interleave capability is not verified, if the interleave
capability of a target does not match the region need, committing decoder
should have failed at the device end.

In order to checkout this error as quickly as possible, driver needs
to check the interleave capability of target during attaching it to
region.

Per CXL specification r3.1(8.2.4.20.1 CXL HDM Decoder Capability Register),
bits 11 and 12 indicate the capability to establish interleaving in 3, 6,
12 and 16 ways. If these bits are not set, the target cannot be attached to
a region utilizing such interleave ways.

Additionally, bits 8 and 9 represent the capability of the bits used for
interleaving in the address, Linux tracks this in the cxl_port
interleave_mask.

Per CXL specification r3.1(8.2.4.20.13 Decoder Protection):
  eIW means encoded Interleave Ways.
  eIG means encoded Interleave Granularity.

  in HPA:
  if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used,
  the interleave bits are none, the following check is ignored.

  if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits
  start at bit position eIG + 8 and end at eIG + eIW + 8 - 1.

  if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits
  start at bit position eIG + 8 and end at eIG + eIW - 1.

  if the interleave mask is insufficient to cover the required interleave
  bits, the target cannot be attached to the region.

Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders")
Signed-off-by: Yao Xingtao 
Reviewed-by: Dan Williams 
Reviewed-by: Jonathan Cameron 
Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com
Signed-off-by: Dave Jiang 
Signed-off-by: Sasha Levin

cxl/region: Avoid null pointer dereference in region lookup

2024-07-05T07:38:20+00:00

[ Upstream commit 285f2a08841432fc3e498b1cd00cce5216cdf189 ]

cxl_dpa_to_region() looks up a region based on a memdev and DPA.
It wrongly assumes an endpoint found mapping the DPA is also of
a fully assembled region. When not true it leads to a null pointer
dereference looking up the region name.

This appears during testing of region lookup after a failure to
assemble a BIOS defined region or if the lookup raced with the
assembly of the BIOS defined region.

Failure to clean up BIOS defined regions that fail assembly is an
issue in itself and a fix to that problem will alleviate some of
the impact. It will not alleviate the race condition so let's harden
this path.

The behavior change is that the kernel oops due to a null pointer
dereference is replaced with a dev_dbg() message noting that an
endpoint was mapped.

Additional comments are added so that future users of this function
can more clearly understand what it provides.

Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region")
Signed-off-by: Alison Schofield 
Reviewed-by: Jonathan Cameron 
Link: https://patch.msgid.link/20240604003609.202682-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang 
Signed-off-by: Sasha Levin

cxl/region: Move cxl_dpa_to_region() work to the region driver

2024-07-05T07:38:19+00:00

[ Upstream commit b98d042698a32518c93e47730e9ad86b387a9c21 ]

This helper belongs in the region driver as it is only useful
with CONFIG_CXL_REGION. Add a stub in core.h for when the region
driver is not built.

Signed-off-by: Alison Schofield 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Ira Weiny 
Link: https://lore.kernel.org/r/05e30f788d62b3dd398aff2d2ea50a6aaa7c3313.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang 
Stable-dep-of: 285f2a088414 ("cxl/region: Avoid null pointer dereference in region lookup")
Signed-off-by: Sasha Levin

cxl/mem: Fix no cxl_nvd during pmem region auto-assembling

2024-07-05T07:38:19+00:00

[ Upstream commit 84ec985944ef34a34a1605b93ce401aa8737af96 ]

When CXL subsystem is auto-assembling a pmem region during cxl
endpoint port probing, always hit below calltrace.

BUG: kernel NULL pointer dereference, address: 0000000000000078
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
RIP: 0010:cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem]
Call Trace:

? __die+0x24/0x70
? page_fault_oops+0x82/0x160
? do_user_addr_fault+0x65/0x6b0
? exc_page_fault+0x7d/0x170
? asm_exc_page_fault+0x26/0x30
? cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem]
? cxl_pmem_region_probe+0x1ac/0x360 [cxl_pmem]
cxl_bus_probe+0x1b/0x60 [cxl_core]
really_probe+0x173/0x410
? __pfx___device_attach_driver+0x10/0x10
__driver_probe_device+0x80/0x170
driver_probe_device+0x1e/0x90
__device_attach_driver+0x90/0x120
bus_for_each_drv+0x84/0xe0
__device_attach+0xbc/0x1f0
bus_probe_device+0x90/0xa0
device_add+0x51c/0x710
devm_cxl_add_pmem_region+0x1b5/0x380 [cxl_core]
cxl_bus_probe+0x1b/0x60 [cxl_core]

The cxl_nvd of the memdev needs to be available during the pmem region
probe. Currently the cxl_nvd is registered after the endpoint port probe.
The endpoint probe, in the case of autoassembly of regions, can cause a
pmem region probe requiring the not yet available cxl_nvd. Adjust the
sequence so this dependency is met.

This requires adding a port parameter to cxl_find_nvdimm_bridge() that
can be used to query the ancestor root port. The endpoint port is not
yet available, but will share a common ancestor with its parent, so
start the query from there instead.

Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue")
Co-developed-by: Dan Williams
Signed-off-by: Dan Williams
Signed-off-by: Li Ming
Tested-by: Alison Schofield
Reviewed-by: Jonathan Cameron
Reviewed-by: Alison Schofield
Link: https://patch.msgid.link/20240612064423.2567625-1-ming4.li@intel.com
Signed-off-by: Dave Jiang
Signed-off-by: Sasha Levin

cxl/region: Convert cxl_pmem_region_alloc to scope-based resource management

2024-07-05T07:38:19+00:00

[ Upstream commit d357dd8ad2f154376e5cb930284e7bf4fe21ffaa ]

A recent bugfix to cxl_pmem_region_alloc() to fix an
error-unwind-memleak [1], highlighted a use case for scope-based resource
management.

Delete the goto for releasing @cxl_region_rwsem, and return error codes
directly from error condition paths.

The caller, devm_cxl_add_pmem_region(), is no longer given @cxlr_pmem
directly it must retrieve it from @cxlr->cxlr_pmem. This retrieval from
@cxlr was already in place for @cxlr->cxl_nvb, and converting
cxl_pmem_region_alloc() to return an int makes it less awkward to handle
no_free_ptr().

Cc: Li Zhijian 
Reported-by: Jonathan Cameron 
Closes: http://lore.kernel.org/r/20240430174540.000039ce@Huawei.com
Link: http://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com
Signed-off-by: Dan Williams 
Reviewed-by: Jonathan Cameron 
Link: https://lore.kernel.org/r/171451430965.1147997.15782562063090960666.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang 
Stable-dep-of: 84ec985944ef ("cxl/mem: Fix no cxl_nvd during pmem region auto-assembling")
Signed-off-by: Sasha Levin

cxl: Add post-reset warning if reset results in loss of previously committed HDM decoders

2024-06-27T11:52:18+00:00

[ Upstream commit 934edcd436dca0447e0d3691a908394ba16d06c3 ]

Secondary Bus Reset (SBR) is equivalent to a device being hot removed and
inserted again. Doing a SBR on a CXL type 3 device is problematic if the
exported device memory is part of system memory that cannot be offlined.
The event is equivalent to violently ripping out that range of memory from
the kernel. While the hardware requires the "Unmask SBR" bit set in the
Port Control Extensions register and the kernel currently does not unmask
it, user can unmask this bit via setpci or similar tool.

The driver does not have a way to detect whether a reset coming from the
PCI subsystem is a Function Level Reset (FLR) or SBR. The only way to
detect is to note if a decoder is marked as enabled in software but the
decoder control register indicates it's not committed.

Add a helper function to find discrepancy between the decoder software
state versus the hardware register state.

Suggested-by: Dan Williams 
Link: https://lore.kernel.org/r/20240502165851.1948523-6-dave.jiang@intel.com
Signed-off-by: Dave Jiang 
Signed-off-by: Bjorn Helgaas 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Dan Williams 
Signed-off-by: Sasha Levin

cxl/region: Fix memregion leaks in devm_cxl_add_region()

2024-06-21T12:40:15+00:00

[ Upstream commit 49ba7b515c4c0719b866d16f068e62d16a8a3dd1 ]

Move the mode verification to __create_region() before allocating the
memregion to avoid the memregion leaks.

Fixes: 6e099264185d ("cxl/region: Add volatile region creation support")
Signed-off-by: Li Zhijian 
Reviewed-by: Dan Williams 
Link: https://lore.kernel.org/r/20240507053421.456439-1-lizhijian@fujitsu.com
Signed-off-by: Dave Jiang 
Signed-off-by: Sasha Levin

cxl/region: Fix cxlr_pmem leaks

2024-06-12T09:39:33+00:00

[ Upstream commit 1c987cf22d6b65ade46145c03eef13f0e3e81d83 ]

Before this error path, cxlr_pmem pointed to a kzalloc() memory, free
it to avoid this memory leaking.

Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue")
Signed-off-by: Li Zhijian 
Reviewed-by: Dan Williams 
Reviewed-by: Jonathan Cameron 
Link: https://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com
Signed-off-by: Dave Jiang 
Signed-off-by: Sasha Levin

cxl/trace: Correct DPA field masks for general_media & dram events

2024-06-12T09:39:33+00:00

[ Upstream commit 2042d11cb57b7e0cbda7910e5ff80e9e8bf0ae17 ]

The length of Physical Address in General Media and DRAM event
records is 64-bit, so the field mask for extracting the DPA should
be 64-bit also, otherwise the trace event reports DPA's with the
upper 32 bits of a DPA address masked off. If users do DPA-to-HPA
translations this could lead to incorrect page retirement decisions.

Use GENMASK_ULL() for CXL_DPA_MASK to get all the DPA address bits.

Tidy up CXL_DPA_FLAGS_MASK by using GENMASK() to only mask the exact
flag bits.

These bits are defined as part of the event record physical address
descriptions of General Media and DRAM events in CXL Spec 3.1
Section 8.2.9.2 Events.

Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record")
Co-developed-by: Shiyang Ruan 
Signed-off-by: Shiyang Ruan 
Signed-off-by: Alison Schofield 
Reviewed-by: Ira Weiny 
Reviewed-by: Jonathan Cameron 
Link: https://lore.kernel.org/r/2867fc43c57720a4a15a3179431829b8dbd2dc16.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang 
Signed-off-by: Sasha Levin

cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCH

2024-04-29T16:03:26+00:00

Robert reported the following when booting a CXL host with Restricted CXL
Host (RCH) topology:
 [   39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
 [   39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]

... plus some related subsequent NULL pointer dereference:

 [   40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8

The iterator to walk the PCIe path did not account for RCH topology.
However RCH does not support hotplug and the memory exported by the
Restricted CXL Device (RCD) should be covered by HMAT and therefore no
access_coordinate is needed. Add check to see if the endpoint device is
RCD and skip calculation.

Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
to exercise the topology iterator. The dev_is_pci() check added is to help
with this test and should be harmless for normal operation.

Reported-by: Robert Richter 
Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
Reviewed-by: Dan Williams 
Tested-by: Robert Richter 
Reviewed-by: Robert Richter 
Link: https://lore.kernel.org/r/20240426224913.1027420-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang