<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/tools/testing/cxl, branch linux-6.11.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>cxl/port: Fix use-after-free, permit out-of-order decoder shutdown</title>
<updated>2024-11-08T15:30:56+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2024-10-23T01:43:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=78c8454fdce0eeee962be004eb6d99860c80dad1'/>
<id>78c8454fdce0eeee962be004eb6d99860c80dad1</id>
<content type='text'>
commit 101c268bd2f37e965a5468353e62d154db38838e upstream.

In support of investigating an initialization failure report [1],
cxl_test was updated to register mock memory-devices after the mock
root-port/bus device had been registered. That led to cxl_test crashing
with a use-after-free bug with the following signature:

    cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
    cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
    cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
1)  cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
    [..]
    cxld_unregister: cxl decoder14.0:
    cxl_region_decode_reset: cxl_region region3:
    mock_decoder_reset: cxl_port port3: decoder3.0 reset
2)  mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
    cxl_endpoint_decoder_release: cxl decoder14.0:
    [..]
    cxld_unregister: cxl decoder7.0:
3)  cxl_region_decode_reset: cxl_region region3:
    Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
    [..]
    RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
    [..]
    Call Trace:
     &lt;TASK&gt;
     cxl_region_decode_reset+0x69/0x190 [cxl_core]
     cxl_region_detach+0xe8/0x210 [cxl_core]
     cxl_decoder_kill_region+0x27/0x40 [cxl_core]
     cxld_unregister+0x5d/0x60 [cxl_core]

At 1) a region has been established with 2 endpoint decoders (7.0 and
14.0). Those endpoints share a common switch-decoder in the topology
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
the "out of order reset case" in the switch decoder. The effect though
is that region3 cleanup is aborted leaving it in-tact and
referencing decoder14.0. At 3) the second attempt to teardown region3
trips over the stale decoder14.0 object which has long since been
deleted.

The fix here is to recognize that the CXL specification places no
mandate on in-order shutdown of switch-decoders, the driver enforces
in-order allocation, and hardware enforces in-order commit. So, rather
than fail and leave objects dangling, always remove them.

In support of making cxl_region_decode_reset() always succeed,
cxl_region_invalidate_memregion() failures are turned into warnings.
Crashing the kernel is ok there since system integrity is at risk if
caches cannot be managed around physical address mutation events like
CXL region destruction.

A new device_for_each_child_reverse_from() is added to cleanup
port-&gt;commit_end after all dependent decoders have been disabled. In
other words if decoders are allocated 0-&gt;1-&gt;2 and disabled 1-&gt;2-&gt;0 then
port-&gt;commit_end only decrements from 2 after 2 has been disabled, and
it decrements all the way to zero since 1 was disabled previously.

Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Cc: stable@vger.kernel.org
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Alison Schofield &lt;alison.schofield@intel.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Zijun Hu &lt;quic_zijuhu@quicinc.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 101c268bd2f37e965a5468353e62d154db38838e upstream.

In support of investigating an initialization failure report [1],
cxl_test was updated to register mock memory-devices after the mock
root-port/bus device had been registered. That led to cxl_test crashing
with a use-after-free bug with the following signature:

    cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
    cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
    cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
1)  cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
    [..]
    cxld_unregister: cxl decoder14.0:
    cxl_region_decode_reset: cxl_region region3:
    mock_decoder_reset: cxl_port port3: decoder3.0 reset
2)  mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
    cxl_endpoint_decoder_release: cxl decoder14.0:
    [..]
    cxld_unregister: cxl decoder7.0:
3)  cxl_region_decode_reset: cxl_region region3:
    Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
    [..]
    RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
    [..]
    Call Trace:
     &lt;TASK&gt;
     cxl_region_decode_reset+0x69/0x190 [cxl_core]
     cxl_region_detach+0xe8/0x210 [cxl_core]
     cxl_decoder_kill_region+0x27/0x40 [cxl_core]
     cxld_unregister+0x5d/0x60 [cxl_core]

At 1) a region has been established with 2 endpoint decoders (7.0 and
14.0). Those endpoints share a common switch-decoder in the topology
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
the "out of order reset case" in the switch decoder. The effect though
is that region3 cleanup is aborted leaving it in-tact and
referencing decoder14.0. At 3) the second attempt to teardown region3
trips over the stale decoder14.0 object which has long since been
deleted.

The fix here is to recognize that the CXL specification places no
mandate on in-order shutdown of switch-decoders, the driver enforces
in-order allocation, and hardware enforces in-order commit. So, rather
than fail and leave objects dangling, always remove them.

In support of making cxl_region_decode_reset() always succeed,
cxl_region_invalidate_memregion() failures are turned into warnings.
Crashing the kernel is ok there since system integrity is at risk if
caches cannot be managed around physical address mutation events like
CXL region destruction.

A new device_for_each_child_reverse_from() is added to cleanup
port-&gt;commit_end after all dependent decoders have been disabled. In
other words if decoders are allocated 0-&gt;1-&gt;2 and disabled 1-&gt;2-&gt;0 then
port-&gt;commit_end only decrements from 2 after 2 has been disabled, and
it decrements all the way to zero since 1 was disabled previously.

Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Cc: stable@vger.kernel.org
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Alison Schofield &lt;alison.schofield@intel.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Zijun Hu &lt;quic_zijuhu@quicinc.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/test: Skip cxl_setup_parent_dport() for emulated dports</title>
<updated>2024-08-09T22:14:12+00:00</updated>
<author>
<name>Li Ming</name>
<email>ming4.li@intel.com</email>
</author>
<published>2024-08-09T08:27:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2c402bd2e85b44dc00ef85b5c0e217de684b5372'/>
<id>2c402bd2e85b44dc00ef85b5c0e217de684b5372</id>
<content type='text'>
The cxl_test unit test environment on qemu always hits below call trace
with KASAN enabled:

 BUG: KASAN: slab-out-of-bounds in cxl_setup_parent_dport+0x480/0x530 [cxl_core]
 Read of size 1 at addr ff110000676014f8 by task (udev-worker)/676[   24.424403] CPU: 2 PID: 676 Comm: (udev-worker) Tainted: G           O     N 6.10.0-qemucxl #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20240214-2.el9 02/14/2024
 Call Trace:
  &lt;TASK&gt;
  dump_stack_lvl+0xea/0x150
  print_report+0xce/0x610
  ? kasan_complete_mode_report_info+0x40/0x200
  kasan_report+0xcc/0x110
  __asan_report_load1_noabort+0x18/0x20
  cxl_setup_parent_dport+0x480/0x530 [cxl_core]
  cxl_mem_probe+0x49b/0xaa0 [cxl_mem]

cxl_test module models a CXL topology for testing, it creates some
emulated dports with platform devices in the CXL topology, so the
dport_dev of an emulated dport points to a platform device rather than a
pci device or a pci host bridge in the case. Currently,
cxl_setup_parent_dport() is used to set up RAS and AER capability on the
dport connected to the CXL memory device, but cxl_test does not support
RAS or AER functionality yet, so the fix is implementing a
__wrap_cxl_setup_parent_dport() to filter out all emulated dports,
guarantees only real dports can be handled by cxl_setup_parent_dport().

Fixes: f05fd10d138d ("cxl/pci: Add RCH downstream port AER register discovery")
Reported-by: Pengfei Xu &lt;pengfei.xu@intel.com&gt;
Closes: https://lore.kernel.org/linux-cxl/ZrHTBp2O+HtUe6kt@xpf.sh.intel.com/T/#t
Signed-off-by: Li Ming &lt;ming4.li@intel.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Tested-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Tested-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Link: https://patch.msgid.link/20240809082750.3015641-3-ming4.li@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cxl_test unit test environment on qemu always hits below call trace
with KASAN enabled:

 BUG: KASAN: slab-out-of-bounds in cxl_setup_parent_dport+0x480/0x530 [cxl_core]
 Read of size 1 at addr ff110000676014f8 by task (udev-worker)/676[   24.424403] CPU: 2 PID: 676 Comm: (udev-worker) Tainted: G           O     N 6.10.0-qemucxl #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20240214-2.el9 02/14/2024
 Call Trace:
  &lt;TASK&gt;
  dump_stack_lvl+0xea/0x150
  print_report+0xce/0x610
  ? kasan_complete_mode_report_info+0x40/0x200
  kasan_report+0xcc/0x110
  __asan_report_load1_noabort+0x18/0x20
  cxl_setup_parent_dport+0x480/0x530 [cxl_core]
  cxl_mem_probe+0x49b/0xaa0 [cxl_mem]

cxl_test module models a CXL topology for testing, it creates some
emulated dports with platform devices in the CXL topology, so the
dport_dev of an emulated dport points to a platform device rather than a
pci device or a pci host bridge in the case. Currently,
cxl_setup_parent_dport() is used to set up RAS and AER capability on the
dport connected to the CXL memory device, but cxl_test does not support
RAS or AER functionality yet, so the fix is implementing a
__wrap_cxl_setup_parent_dport() to filter out all emulated dports,
guarantees only real dports can be handled by cxl_setup_parent_dport().

Fixes: f05fd10d138d ("cxl/pci: Add RCH downstream port AER register discovery")
Reported-by: Pengfei Xu &lt;pengfei.xu@intel.com&gt;
Closes: https://lore.kernel.org/linux-cxl/ZrHTBp2O+HtUe6kt@xpf.sh.intel.com/T/#t
Signed-off-by: Li Ming &lt;ming4.li@intel.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Tested-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Tested-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Link: https://patch.msgid.link/20240809082750.3015641-3-ming4.li@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl</title>
<updated>2024-07-28T16:33:28+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-07-28T16:33:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e62f81bbd24db746c9b1aa29e7b6423211262ac4'/>
<id>e62f81bbd24db746c9b1aa29e7b6423211262ac4</id>
<content type='text'>
Pull CXL updates from Dave Jiang:
 "Core:

   - A CXL maturity map has been added to the documentation to detail
     the current state of CXL enabling.

     It provides the status of the current state of various CXL features
     to inform current and future contributors of where things are and
     which areas need contribution.

   - A notifier handler has been added in order for a newly created CXL
     memory region to trigger the abstract distance metrics calculation.

     This should bring parity for CXL memory to the same level vs
     hotplugged DRAM for NUMA abstract distance calculation. The
     abstract distance reflects relative performance used for memory
     tiering handling.

   - An addition for XOR math has been added to address the CXL DPA to
     SPA translation.

     CXL address translation did not support address interleave math
     with XOR prior to this change.

  Fixes:

   - Fix to address race condition in the CXL memory hotplug notifier

   - Add missing MODULE_DESCRIPTION() for CXL modules

   - Fix incorrect vendor debug UUID define

  Misc:

   - A warning has been added to inform users of an unsupported
     configuration when mixing CXL VH and RCH/RCD hierarchies

   - The ENXIO error code has been replaced with EBUSY for inject poison
     limit reached via debugfs and cxl-test support

   - Moving the PCI config read in cxl_dvsec_rr_decode() to avoid
     unnecessary PCI config reads

   - A refactor to a common struct for DRAM and general media CXL
     events"

* tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/core/pci: Move reading of control register to immediately before usage
  cxl: Remove defunct code calculating host bridge target positions
  cxl/region: Verify target positions using the ordered target list
  cxl: Restore XOR'd position bits during address translation
  cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
  cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
  cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
  cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy
  cxl/core: Fix incorrect vendor debug UUID define
  Documentation: CXL Maturity Map
  cxl/region: Simplify cxl_region_nid()
  cxl/region: Support to calculate memory tier abstract distance
  cxl/region: Fix a race condition in memory hotplug notifier
  cxl: add missing MODULE_DESCRIPTION() macros
  cxl/events: Use a common struct for DRAM and General Media events
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull CXL updates from Dave Jiang:
 "Core:

   - A CXL maturity map has been added to the documentation to detail
     the current state of CXL enabling.

     It provides the status of the current state of various CXL features
     to inform current and future contributors of where things are and
     which areas need contribution.

   - A notifier handler has been added in order for a newly created CXL
     memory region to trigger the abstract distance metrics calculation.

     This should bring parity for CXL memory to the same level vs
     hotplugged DRAM for NUMA abstract distance calculation. The
     abstract distance reflects relative performance used for memory
     tiering handling.

   - An addition for XOR math has been added to address the CXL DPA to
     SPA translation.

     CXL address translation did not support address interleave math
     with XOR prior to this change.

  Fixes:

   - Fix to address race condition in the CXL memory hotplug notifier

   - Add missing MODULE_DESCRIPTION() for CXL modules

   - Fix incorrect vendor debug UUID define

  Misc:

   - A warning has been added to inform users of an unsupported
     configuration when mixing CXL VH and RCH/RCD hierarchies

   - The ENXIO error code has been replaced with EBUSY for inject poison
     limit reached via debugfs and cxl-test support

   - Moving the PCI config read in cxl_dvsec_rr_decode() to avoid
     unnecessary PCI config reads

   - A refactor to a common struct for DRAM and general media CXL
     events"

* tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/core/pci: Move reading of control register to immediately before usage
  cxl: Remove defunct code calculating host bridge target positions
  cxl/region: Verify target positions using the ordered target list
  cxl: Restore XOR'd position bits during address translation
  cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
  cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
  cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
  cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy
  cxl/core: Fix incorrect vendor debug UUID define
  Documentation: CXL Maturity Map
  cxl/region: Simplify cxl_region_nid()
  cxl/region: Support to calculate memory tier abstract distance
  cxl/region: Fix a race condition in memory hotplug notifier
  cxl: add missing MODULE_DESCRIPTION() macros
  cxl/events: Use a common struct for DRAM and General Media events
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/test: Replace ENXIO with EBUSY for inject poison limit reached</title>
<updated>2024-07-11T00:12:42+00:00</updated>
<author>
<name>Alison Schofield</name>
<email>alison.schofield@intel.com</email>
</author>
<published>2024-07-07T01:53:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3a8617c7df6eb351227aad9b0df647f34a7ef423'/>
<id>3a8617c7df6eb351227aad9b0df647f34a7ef423</id>
<content type='text'>
The CXL driver was recently updated to return EBUSY rather than
ENXIO when the device reports that an injection request exceeds
the device's limit. That change to EBUSY allows debug users to
differentiate between limit reached and inject failures for any
other reason.

Change cxl-test to also return EBUSY and tidy up the dev_dbg()
messaging to emit the correct limit.

Reminder: the cxl-test per device injection limit is a configurable
attribute: /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max

Signed-off-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Tested-by: Xingtao Yao &lt;yaoxt.fnst@fujitsu.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Link: https://patch.msgid.link/ba1b80e1658b644d85d0d5e2287112d00a48b9cf.1720316188.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The CXL driver was recently updated to return EBUSY rather than
ENXIO when the device reports that an injection request exceeds
the device's limit. That change to EBUSY allows debug users to
differentiate between limit reached and inject failures for any
other reason.

Change cxl-test to also return EBUSY and tidy up the dev_dbg()
messaging to emit the correct limit.

Reminder: the cxl-test per device injection limit is a configurable
attribute: /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max

Signed-off-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Tested-by: Xingtao Yao &lt;yaoxt.fnst@fujitsu.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Link: https://patch.msgid.link/ba1b80e1658b644d85d0d5e2287112d00a48b9cf.1720316188.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/events: Use a common struct for DRAM and General Media events</title>
<updated>2024-07-02T19:52:25+00:00</updated>
<author>
<name>Fabio M. De Francesco</name>
<email>fabio.m.de.francesco@linux.intel.com</email>
</author>
<published>2024-06-07T14:43:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=675e979db473d08be346a3190c5f0db095a57153'/>
<id>675e979db473d08be346a3190c5f0db095a57153</id>
<content type='text'>
cxl_event_common was an unfortunate naming choice and caused confusion with
the existing Common Event Record. Furthermore, its fields didn't map all
the common information between DRAM and General Media Events.

Remove cxl_event_common and introduce cxl_event_media_hdr to record common
information between DRAM and General Media events.

cxl_event_media_hdr, which is embedded in both cxl_event_gen_media and
cxl_event_dram, leverages the commonalities between the two events to
simplify their respective handling.

Suggested-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Fabio M. De Francesco &lt;fabio.m.de.francesco@linux.intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Link: https://lore.kernel.org/r/20240607144423.48681-1-fabio.m.de.francesco@linux.intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cxl_event_common was an unfortunate naming choice and caused confusion with
the existing Common Event Record. Furthermore, its fields didn't map all
the common information between DRAM and General Media Events.

Remove cxl_event_common and introduce cxl_event_media_hdr to record common
information between DRAM and General Media events.

cxl_event_media_hdr, which is embedded in both cxl_event_gen_media and
cxl_event_dram, leverages the commonalities between the two events to
simplify their respective handling.

Suggested-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Fabio M. De Francesco &lt;fabio.m.de.francesco@linux.intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Link: https://lore.kernel.org/r/20240607144423.48681-1-fabio.m.de.francesco@linux.intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/region: check interleave capability</title>
<updated>2024-06-25T21:45:27+00:00</updated>
<author>
<name>Yao Xingtao</name>
<email>yaoxt.fnst@fujitsu.com</email>
</author>
<published>2024-06-14T08:47:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=84328c5acebc10c8cdcf17283ab6c6d548885bfc'/>
<id>84328c5acebc10c8cdcf17283ab6c6d548885bfc</id>
<content type='text'>
Since interleave capability is not verified, if the interleave
capability of a target does not match the region need, committing decoder
should have failed at the device end.

In order to checkout this error as quickly as possible, driver needs
to check the interleave capability of target during attaching it to
region.

Per CXL specification r3.1(8.2.4.20.1 CXL HDM Decoder Capability Register),
bits 11 and 12 indicate the capability to establish interleaving in 3, 6,
12 and 16 ways. If these bits are not set, the target cannot be attached to
a region utilizing such interleave ways.

Additionally, bits 8 and 9 represent the capability of the bits used for
interleaving in the address, Linux tracks this in the cxl_port
interleave_mask.

Per CXL specification r3.1(8.2.4.20.13 Decoder Protection):
  eIW means encoded Interleave Ways.
  eIG means encoded Interleave Granularity.

  in HPA:
  if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used,
  the interleave bits are none, the following check is ignored.

  if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits
  start at bit position eIG + 8 and end at eIG + eIW + 8 - 1.

  if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits
  start at bit position eIG + 8 and end at eIG + eIW - 1.

  if the interleave mask is insufficient to cover the required interleave
  bits, the target cannot be attached to the region.

Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders")
Signed-off-by: Yao Xingtao &lt;yaoxt.fnst@fujitsu.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since interleave capability is not verified, if the interleave
capability of a target does not match the region need, committing decoder
should have failed at the device end.

In order to checkout this error as quickly as possible, driver needs
to check the interleave capability of target during attaching it to
region.

Per CXL specification r3.1(8.2.4.20.1 CXL HDM Decoder Capability Register),
bits 11 and 12 indicate the capability to establish interleaving in 3, 6,
12 and 16 ways. If these bits are not set, the target cannot be attached to
a region utilizing such interleave ways.

Additionally, bits 8 and 9 represent the capability of the bits used for
interleaving in the address, Linux tracks this in the cxl_port
interleave_mask.

Per CXL specification r3.1(8.2.4.20.13 Decoder Protection):
  eIW means encoded Interleave Ways.
  eIG means encoded Interleave Granularity.

  in HPA:
  if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used,
  the interleave bits are none, the following check is ignored.

  if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits
  start at bit position eIG + 8 and end at eIG + eIW + 8 - 1.

  if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits
  start at bit position eIG + 8 and end at eIG + eIW - 1.

  if the interleave mask is insufficient to cover the required interleave
  bits, the target cannot be attached to the region.

Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders")
Signed-off-by: Yao Xingtao &lt;yaoxt.fnst@fujitsu.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/test: Add missing vmalloc.h for tools/testing/cxl/test/mem.c</title>
<updated>2024-05-28T23:08:04+00:00</updated>
<author>
<name>Dave Jiang</name>
<email>dave.jiang@intel.com</email>
</author>
<published>2024-05-28T22:55:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d55510527153d17a3af8cc2df69c04f95ae1350d'/>
<id>d55510527153d17a3af8cc2df69c04f95ae1350d</id>
<content type='text'>
tools/testing/cxl/test/mem.c uses vmalloc() and vfree() but does not
include linux/vmalloc.h. Kernel v6.10 made changes that causes the
currently included headers not depend on vmalloc.h and therefore
mem.c can no longer compile. Add linux/vmalloc.h to fix compile
issue.

  CC [M]  tools/testing/cxl/test/mem.o
tools/testing/cxl/test/mem.c: In function ‘label_area_release’:
tools/testing/cxl/test/mem.c:1428:9: error: implicit declaration of function ‘vfree’; did you mean ‘kvfree’? [-Werror=implicit-function-declaration]
 1428 |         vfree(lsa);
      |         ^~~~~
      |         kvfree
tools/testing/cxl/test/mem.c: In function ‘cxl_mock_mem_probe’:
tools/testing/cxl/test/mem.c:1466:22: error: implicit declaration of function ‘vmalloc’; did you mean ‘kmalloc’? [-Werror=implicit-function-declaration]
 1466 |         mdata-&gt;lsa = vmalloc(LSA_SIZE);
      |                      ^~~~~~~
      |                      kmalloc

Fixes: 7d3eb23c4ccf ("tools/testing/cxl: Introduce a mock memory device + driver")
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Link: https://lore.kernel.org/r/20240528225551.1025977-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tools/testing/cxl/test/mem.c uses vmalloc() and vfree() but does not
include linux/vmalloc.h. Kernel v6.10 made changes that causes the
currently included headers not depend on vmalloc.h and therefore
mem.c can no longer compile. Add linux/vmalloc.h to fix compile
issue.

  CC [M]  tools/testing/cxl/test/mem.o
tools/testing/cxl/test/mem.c: In function ‘label_area_release’:
tools/testing/cxl/test/mem.c:1428:9: error: implicit declaration of function ‘vfree’; did you mean ‘kvfree’? [-Werror=implicit-function-declaration]
 1428 |         vfree(lsa);
      |         ^~~~~
      |         kvfree
tools/testing/cxl/test/mem.c: In function ‘cxl_mock_mem_probe’:
tools/testing/cxl/test/mem.c:1466:22: error: implicit declaration of function ‘vmalloc’; did you mean ‘kmalloc’? [-Werror=implicit-function-declaration]
 1466 |         mdata-&gt;lsa = vmalloc(LSA_SIZE);
      |                      ^~~~~~~
      |                      kmalloc

Fixes: 7d3eb23c4ccf ("tools/testing/cxl: Introduce a mock memory device + driver")
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Link: https://lore.kernel.org/r/20240528225551.1025977-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'cxl-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl</title>
<updated>2024-05-15T21:32:27+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-05-15T21:32:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2e9250022e9f2c9cde3b98fd26dcad1c2a9aedf3'/>
<id>2e9250022e9f2c9cde3b98fd26dcad1c2a9aedf3</id>
<content type='text'>
Pull CXL updates from Dave Jiang:

 - Three CXL mailbox passthrough commands are added to support the
   populating and clearing of vendor debug logs:
     - Get Log Capabilities
     - Get Supported Log Sub-List Commands
     - Clear Log

 - Add support of Device Phyiscal Address (DPA) to Host Physical Address
   (HPA) translation for CXL events of cxl_dram and cxl_general media.

   This allows user space to figure out which CXL region the event
   occured via trace event.

 - Connect CXL to CPER reporting.

   If a device is configured for firmware first, CXL event records are
   not sent directly to the host. Those records are reported through EFI
   Common Platform Error Records (CPER). Add support to route the CPER
   records through the CXL sub-system in order to provide DPA to HPA
   translation and also event decoding and tracing. This is useful for
   users to determine which system issues may correspond to specific
   hardware events.

 - A number of misc cleanups and fixes:
     - Fix for compile warning of cxl_security_ops
     - Add debug message for invalid interleave granularity
     - Enhancement to cxl-test event testing
     - Add dev_warn() on unsupported mixed mode decoder
     - Fix use of phys_to_target_node() for x86
     - Use helper function for decoder enum instead of open coding
     - Include missing headers for cxl-event
     - Fix MAINTAINERS file entry
     - Fix cxlr_pmem memory leak
     - Cleanup __cxl_parse_cfmws via scope-based resource menagement
     - Convert cxl_pmem_region_alloc() to scope-based resource management

* tag 'cxl-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits)
  cxl/cper: Remove duplicated GUID defines
  cxl/cper: Fix non-ACPI-APEI-GHES build
  cxl/pci: Process CPER events
  acpi/ghes: Process CXL Component Events
  cxl/region: Convert cxl_pmem_region_alloc to scope-based resource management
  cxl/acpi: Cleanup __cxl_parse_cfmws()
  cxl/region: Fix cxlr_pmem leaks
  cxl/core: Add region info to cxl_general_media and cxl_dram events
  cxl/region: Move cxl_trace_hpa() work to the region driver
  cxl/region: Move cxl_dpa_to_region() work to the region driver
  cxl/trace: Correct DPA field masks for general_media &amp; dram events
  MAINTAINERS: repair file entry in COMPUTE EXPRESS LINK
  cxl/cxl-event: include missing &lt;linux/types.h&gt; and &lt;linux/uuid.h&gt;
  cxl/hdm: Debug, use decoder name function
  cxl: Fix use of phys_to_target_node() for x86
  cxl/hdm: dev_warn() on unsupported mixed mode decoder
  cxl/test: Enhance event testing
  cxl/hdm: Add debug message for invalid interleave granularity
  cxl: Fix compile warning for cxl_security_ops extern
  cxl/mbox: Add Clear Log mailbox command
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull CXL updates from Dave Jiang:

 - Three CXL mailbox passthrough commands are added to support the
   populating and clearing of vendor debug logs:
     - Get Log Capabilities
     - Get Supported Log Sub-List Commands
     - Clear Log

 - Add support of Device Phyiscal Address (DPA) to Host Physical Address
   (HPA) translation for CXL events of cxl_dram and cxl_general media.

   This allows user space to figure out which CXL region the event
   occured via trace event.

 - Connect CXL to CPER reporting.

   If a device is configured for firmware first, CXL event records are
   not sent directly to the host. Those records are reported through EFI
   Common Platform Error Records (CPER). Add support to route the CPER
   records through the CXL sub-system in order to provide DPA to HPA
   translation and also event decoding and tracing. This is useful for
   users to determine which system issues may correspond to specific
   hardware events.

 - A number of misc cleanups and fixes:
     - Fix for compile warning of cxl_security_ops
     - Add debug message for invalid interleave granularity
     - Enhancement to cxl-test event testing
     - Add dev_warn() on unsupported mixed mode decoder
     - Fix use of phys_to_target_node() for x86
     - Use helper function for decoder enum instead of open coding
     - Include missing headers for cxl-event
     - Fix MAINTAINERS file entry
     - Fix cxlr_pmem memory leak
     - Cleanup __cxl_parse_cfmws via scope-based resource menagement
     - Convert cxl_pmem_region_alloc() to scope-based resource management

* tag 'cxl-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits)
  cxl/cper: Remove duplicated GUID defines
  cxl/cper: Fix non-ACPI-APEI-GHES build
  cxl/pci: Process CPER events
  acpi/ghes: Process CXL Component Events
  cxl/region: Convert cxl_pmem_region_alloc to scope-based resource management
  cxl/acpi: Cleanup __cxl_parse_cfmws()
  cxl/region: Fix cxlr_pmem leaks
  cxl/core: Add region info to cxl_general_media and cxl_dram events
  cxl/region: Move cxl_trace_hpa() work to the region driver
  cxl/region: Move cxl_dpa_to_region() work to the region driver
  cxl/trace: Correct DPA field masks for general_media &amp; dram events
  MAINTAINERS: repair file entry in COMPUTE EXPRESS LINK
  cxl/cxl-event: include missing &lt;linux/types.h&gt; and &lt;linux/uuid.h&gt;
  cxl/hdm: Debug, use decoder name function
  cxl: Fix use of phys_to_target_node() for x86
  cxl/hdm: dev_warn() on unsupported mixed mode decoder
  cxl/test: Enhance event testing
  cxl/hdm: Add debug message for invalid interleave granularity
  cxl: Fix compile warning for cxl_security_ops extern
  cxl/mbox: Add Clear Log mailbox command
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/test: Enhance event testing</title>
<updated>2024-04-30T17:43:48+00:00</updated>
<author>
<name>Ira Weiny</name>
<email>ira.weiny@intel.com</email>
</author>
<published>2024-04-02T05:31:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=364ee9f3265e486772f445f437273a9c94b66d4b'/>
<id>364ee9f3265e486772f445f437273a9c94b66d4b</id>
<content type='text'>
An issue was found in the processing of event logs when the output
buffer length was not reset.[1]

This bug was not caught with cxl-test for 2 reasons.  First, the test
harness mbox_send command [mock_get_event()] does not set the output
size based on the amount of data returned like the hardware command
does.  Second, the simplistic event log testing always returned the same
number of elements per-get command.

Enhance the simulation of the event log mailbox to better match the bug
found with real hardware to cover potential regressions.

NOTE: These changes will cause cxl-events.sh in ndctl to fail without
the fix from Kwangjin.  However, no changes to the user space test was
required.  Therefore ndctl itself will be compatible with old or new
kernels once both patches land in the new kernel.

[1] Link: https://lore.kernel.org/all/20240401091057.1044-1-kwangjin.ko@sk.com/

Cc: Kwangjin Ko &lt;kwangjin.ko@sk.com&gt;
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Link: https://lore.kernel.org/r/20240401-enhance-event-test-v1-1-6669a524ed38@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
An issue was found in the processing of event logs when the output
buffer length was not reset.[1]

This bug was not caught with cxl-test for 2 reasons.  First, the test
harness mbox_send command [mock_get_event()] does not set the output
size based on the amount of data returned like the hardware command
does.  Second, the simplistic event log testing always returned the same
number of elements per-get command.

Enhance the simulation of the event log mailbox to better match the bug
found with real hardware to cover potential regressions.

NOTE: These changes will cause cxl-events.sh in ndctl to fail without
the fix from Kwangjin.  However, no changes to the user space test was
required.  Therefore ndctl itself will be compatible with old or new
kernels once both patches land in the new kernel.

[1] Link: https://lore.kernel.org/all/20240401091057.1044-1-kwangjin.ko@sk.com/

Cc: Kwangjin Ko &lt;kwangjin.ko@sk.com&gt;
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Link: https://lore.kernel.org/r/20240401-enhance-event-test-v1-1-6669a524ed38@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCH</title>
<updated>2024-04-29T16:03:26+00:00</updated>
<author>
<name>Dave Jiang</name>
<email>dave.jiang@intel.com</email>
</author>
<published>2024-04-26T22:47:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5d211c7090590033581175d6405ae40917ca3a06'/>
<id>5d211c7090590033581175d6405ae40917ca3a06</id>
<content type='text'>
Robert reported the following when booting a CXL host with Restricted CXL
Host (RCH) topology:
 [   39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
 [   39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]

... plus some related subsequent NULL pointer dereference:

 [   40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8

The iterator to walk the PCIe path did not account for RCH topology.
However RCH does not support hotplug and the memory exported by the
Restricted CXL Device (RCD) should be covered by HMAT and therefore no
access_coordinate is needed. Add check to see if the endpoint device is
RCD and skip calculation.

Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
to exercise the topology iterator. The dev_is_pci() check added is to help
with this test and should be harmless for normal operation.

Reported-by: Robert Richter &lt;rrichter@amd.com&gt;
Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Tested-by: Robert Richter &lt;rrichter@amd.com&gt;
Reviewed-by: Robert Richter &lt;rrichter@amd.com&gt;
Link: https://lore.kernel.org/r/20240426224913.1027420-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Robert reported the following when booting a CXL host with Restricted CXL
Host (RCH) topology:
 [   39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
 [   39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]

... plus some related subsequent NULL pointer dereference:

 [   40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8

The iterator to walk the PCIe path did not account for RCH topology.
However RCH does not support hotplug and the memory exported by the
Restricted CXL Device (RCD) should be covered by HMAT and therefore no
access_coordinate is needed. Add check to see if the endpoint device is
RCD and skip calculation.

Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
to exercise the topology iterator. The dev_is_pci() check added is to help
with this test and should be harmless for normal operation.

Reported-by: Robert Richter &lt;rrichter@amd.com&gt;
Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Tested-by: Robert Richter &lt;rrichter@amd.com&gt;
Reviewed-by: Robert Richter &lt;rrichter@amd.com&gt;
Link: https://lore.kernel.org/r/20240426224913.1027420-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
