<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/cxl, branch v6.7</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>cxl/pmu: Ensure put_device on pmu devices</title>
<updated>2023-12-15T05:54:45+00:00</updated>
<author>
<name>Ira Weiny</name>
<email>ira.weiny@intel.com</email>
</author>
<published>2023-10-16T23:25:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ef3d5cf9c59cccb012aa6b93d99f4c6eb5d6648e'/>
<id>ef3d5cf9c59cccb012aa6b93d99f4c6eb5d6648e</id>
<content type='text'>
The following kmemleaks were detected when removing the cxl module
stack:

unreferenced object 0xffff88822616b800 (size 1024):
...
  backtrace:
    [&lt;00000000bedc6f83&gt;] kmalloc_trace+0x26/0x90
    [&lt;00000000448d1afc&gt;] devm_cxl_pmu_add+0x3a/0x110 [cxl_core]
    [&lt;00000000ca3bfe16&gt;] 0xffffffffa105213b
    [&lt;00000000ba7f78dc&gt;] local_pci_probe+0x41/0x90
    [&lt;000000005bb027ac&gt;] pci_device_probe+0xb0/0x1c0
...
unreferenced object 0xffff8882260abcc0 (size 16):
...
  hex dump (first 16 bytes):
    70 6d 75 5f 6d 65 6d 30 2e 30 00 26 82 88 ff ff  pmu_mem0.0.&amp;....
  backtrace:
...
    [&lt;00000000152b5e98&gt;] dev_set_name+0x43/0x50
    [&lt;00000000c228798b&gt;] devm_cxl_pmu_add+0x102/0x110 [cxl_core]
    [&lt;00000000ca3bfe16&gt;] 0xffffffffa105213b
    [&lt;00000000ba7f78dc&gt;] local_pci_probe+0x41/0x90
    [&lt;000000005bb027ac&gt;] pci_device_probe+0xb0/0x1c0
...
unreferenced object 0xffff8882272af200 (size 256):
...
  backtrace:
    [&lt;00000000bedc6f83&gt;] kmalloc_trace+0x26/0x90
    [&lt;00000000a14d1813&gt;] device_add+0x4ea/0x890
    [&lt;00000000a3f07b47&gt;] devm_cxl_pmu_add+0xbe/0x110 [cxl_core]
    [&lt;00000000ca3bfe16&gt;] 0xffffffffa105213b
    [&lt;00000000ba7f78dc&gt;] local_pci_probe+0x41/0x90
    [&lt;000000005bb027ac&gt;] pci_device_probe+0xb0/0x1c0
...

devm_cxl_pmu_add() correctly registers a device remove function but it
only calls device_del() which is only part of device unregistration.

Properly call device_unregister() to free up the memory associated with
the device.

Fixes: 1ad3f701c399 ("cxl/pci: Find and register CXL PMU devices")
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Link: https://lore.kernel.org/r/20231016-pmu-unregister-fix-v1-1-1e2eb2fa3c69@intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following kmemleaks were detected when removing the cxl module
stack:

unreferenced object 0xffff88822616b800 (size 1024):
...
  backtrace:
    [&lt;00000000bedc6f83&gt;] kmalloc_trace+0x26/0x90
    [&lt;00000000448d1afc&gt;] devm_cxl_pmu_add+0x3a/0x110 [cxl_core]
    [&lt;00000000ca3bfe16&gt;] 0xffffffffa105213b
    [&lt;00000000ba7f78dc&gt;] local_pci_probe+0x41/0x90
    [&lt;000000005bb027ac&gt;] pci_device_probe+0xb0/0x1c0
...
unreferenced object 0xffff8882260abcc0 (size 16):
...
  hex dump (first 16 bytes):
    70 6d 75 5f 6d 65 6d 30 2e 30 00 26 82 88 ff ff  pmu_mem0.0.&amp;....
  backtrace:
...
    [&lt;00000000152b5e98&gt;] dev_set_name+0x43/0x50
    [&lt;00000000c228798b&gt;] devm_cxl_pmu_add+0x102/0x110 [cxl_core]
    [&lt;00000000ca3bfe16&gt;] 0xffffffffa105213b
    [&lt;00000000ba7f78dc&gt;] local_pci_probe+0x41/0x90
    [&lt;000000005bb027ac&gt;] pci_device_probe+0xb0/0x1c0
...
unreferenced object 0xffff8882272af200 (size 256):
...
  backtrace:
    [&lt;00000000bedc6f83&gt;] kmalloc_trace+0x26/0x90
    [&lt;00000000a14d1813&gt;] device_add+0x4ea/0x890
    [&lt;00000000a3f07b47&gt;] devm_cxl_pmu_add+0xbe/0x110 [cxl_core]
    [&lt;00000000ca3bfe16&gt;] 0xffffffffa105213b
    [&lt;00000000ba7f78dc&gt;] local_pci_probe+0x41/0x90
    [&lt;000000005bb027ac&gt;] pci_device_probe+0xb0/0x1c0
...

devm_cxl_pmu_add() correctly registers a device remove function but it
only calls device_del() which is only part of device unregistration.

Properly call device_unregister() to free up the memory associated with
the device.

Fixes: 1ad3f701c399 ("cxl/pci: Find and register CXL PMU devices")
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Link: https://lore.kernel.org/r/20231016-pmu-unregister-fix-v1-1-1e2eb2fa3c69@intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/cdat: Free correct buffer on checksum error</title>
<updated>2023-12-09T00:14:28+00:00</updated>
<author>
<name>Ira Weiny</name>
<email>ira.weiny@intel.com</email>
</author>
<published>2023-11-17T00:03:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c65efe3685f5d150eeca5599afeabdc85da899d1'/>
<id>c65efe3685f5d150eeca5599afeabdc85da899d1</id>
<content type='text'>
The new 6.7-rc1 kernel now checks the checksum on CDAT data.  While
using a branch of Fan's DCD qemu work (and specifying DCD devices), the
following splat was observed.

	WARNING: CPU: 1 PID: 1384 at drivers/base/devres.c:1064 devm_kfree+0x4f/0x60
	...
	RIP: 0010:devm_kfree+0x4f/0x60
	...
 	? devm_kfree+0x4f/0x60
 	read_cdat_data+0x1a0/0x2a0 [cxl_core]
 	cxl_port_probe+0xdf/0x200 [cxl_port]
	...

The issue in qemu is still unknown but the spat is a straight forward
bug in the CDAT checksum processing code.  Use a CDAT buffer variable to
ensure the devm_free() works correctly on error.

Fixes: 670e4e88f3b1 ("cxl: Add checksum verification to CDAT from CXL")
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Fan Ni &lt;fan.ni@samsung.com&gt;
Reviewed-by: Robert Richter &lt;rrichter@amd.com&gt;
Link: http://lore.kernel.org/r/20231116-fix-cdat-devm-free-v1-1-b148b40707d7@intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The new 6.7-rc1 kernel now checks the checksum on CDAT data.  While
using a branch of Fan's DCD qemu work (and specifying DCD devices), the
following splat was observed.

	WARNING: CPU: 1 PID: 1384 at drivers/base/devres.c:1064 devm_kfree+0x4f/0x60
	...
	RIP: 0010:devm_kfree+0x4f/0x60
	...
 	? devm_kfree+0x4f/0x60
 	read_cdat_data+0x1a0/0x2a0 [cxl_core]
 	cxl_port_probe+0xdf/0x200 [cxl_port]
	...

The issue in qemu is still unknown but the spat is a straight forward
bug in the CDAT checksum processing code.  Use a CDAT buffer variable to
ensure the devm_free() works correctly on error.

Fixes: 670e4e88f3b1 ("cxl: Add checksum verification to CDAT from CXL")
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Fan Ni &lt;fan.ni@samsung.com&gt;
Reviewed-by: Robert Richter &lt;rrichter@amd.com&gt;
Link: http://lore.kernel.org/r/20231116-fix-cdat-devm-free-v1-1-b148b40707d7@intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/hdm: Fix dpa translation locking</title>
<updated>2023-12-08T03:14:04+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2023-12-07T03:11:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6f5c4eca48ffe18307b4e1d375817691c9005c87'/>
<id>6f5c4eca48ffe18307b4e1d375817691c9005c87</id>
<content type='text'>
The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an
endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is
sufficient to assert that cxl_dpa_rwsem is held rather than acquire it
in the helper. Otherwise, it triggers multiple lockdep reports:

1/ Tracing callbacks are in an atomic context that can not acquire sleeping
locks:

    BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash
    preempt_count: 2, expected: 0
    RCU nest depth: 0, expected: 0
    [..]
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
    Call Trace:
     &lt;TASK&gt;
     dump_stack_lvl+0x71/0x90
     __might_resched+0x1b2/0x2c0
     down_read+0x1a/0x190
     cxl_dpa_resource_start+0x15/0x50 [cxl_core]
     cxl_trace_hpa+0x122/0x300 [cxl_core]
     trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]

2/ The rwsem is already held in the inject poison path:

    WARNING: possible recursive locking detected
    6.7.0-rc2+ #12 Tainted: G        W  OE    N
    --------------------------------------------
    bash/1288 is trying to acquire lock:
    ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]

    but task is already holding lock:
    ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core]
    [..]
    Call Trace:
     &lt;TASK&gt;
     dump_stack_lvl+0x71/0x90
     __might_resched+0x1b2/0x2c0
     down_read+0x1a/0x190
     cxl_dpa_resource_start+0x15/0x50 [cxl_core]
     cxl_trace_hpa+0x122/0x300 [cxl_core]
     trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
     __traceiter_cxl_poison+0x5c/0x80 [cxl_core]
     cxl_inject_poison+0x1bc/0x1e0 [cxl_core]

This appears to have been an issue since the initial implementation and
uncovered by the new cxl-poison.sh test [1]. That test is now passing with
these changes.

Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1]
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Alison Schofield &lt;alison.schofield@intel.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an
endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is
sufficient to assert that cxl_dpa_rwsem is held rather than acquire it
in the helper. Otherwise, it triggers multiple lockdep reports:

1/ Tracing callbacks are in an atomic context that can not acquire sleeping
locks:

    BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash
    preempt_count: 2, expected: 0
    RCU nest depth: 0, expected: 0
    [..]
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
    Call Trace:
     &lt;TASK&gt;
     dump_stack_lvl+0x71/0x90
     __might_resched+0x1b2/0x2c0
     down_read+0x1a/0x190
     cxl_dpa_resource_start+0x15/0x50 [cxl_core]
     cxl_trace_hpa+0x122/0x300 [cxl_core]
     trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]

2/ The rwsem is already held in the inject poison path:

    WARNING: possible recursive locking detected
    6.7.0-rc2+ #12 Tainted: G        W  OE    N
    --------------------------------------------
    bash/1288 is trying to acquire lock:
    ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]

    but task is already holding lock:
    ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core]
    [..]
    Call Trace:
     &lt;TASK&gt;
     dump_stack_lvl+0x71/0x90
     __might_resched+0x1b2/0x2c0
     down_read+0x1a/0x190
     cxl_dpa_resource_start+0x15/0x50 [cxl_core]
     cxl_trace_hpa+0x122/0x300 [cxl_core]
     trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
     __traceiter_cxl_poison+0x5c/0x80 [cxl_core]
     cxl_inject_poison+0x1bc/0x1e0 [cxl_core]

This appears to have been an issue since the initial implementation and
uncovered by the new cxl-poison.sh test [1]. That test is now passing with
these changes.

Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1]
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Alison Schofield &lt;alison.schofield@intel.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/memdev: Hold region_rwsem during inject and clear poison ops</title>
<updated>2023-11-30T01:23:03+00:00</updated>
<author>
<name>Alison Schofield</name>
<email>alison.schofield@intel.com</email>
</author>
<published>2023-11-27T00:09:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0e33ac9c3ffe5e4f55c68345f44cea7fec2fe750'/>
<id>0e33ac9c3ffe5e4f55c68345f44cea7fec2fe750</id>
<content type='text'>
Poison inject and clear are supported via debugfs where a privileged
user can inject and clear poison to a device physical address.

Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
added a lockdep assert that highlighted a gap in poison inject and
clear functions where holding the dpa_rwsem does not assure that a
a DPA is not added to a region.

The impact for inject and clear is that if the DPA address being
injected or cleared has been attached to a region, but not yet
committed, the dev_dbg() message intended to alert the debug user
that they are acting on a mapped address is not emitted. Also, the
cxl_poison trace event that serves as a log of the inject and clear
activity will not include region info.

Close this gap by snapshotting an unchangeable region state during
poison inject and clear operations. That means holding both the
region_rwsem and the dpa_rwsem during the inject and clear ops.

Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command")
Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command")
Signed-off-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Link: https://lore.kernel.org/r/08721dc1df0a51e4e38fecd02425c3475912dfd5.1701041440.git.alison.schofield@intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Poison inject and clear are supported via debugfs where a privileged
user can inject and clear poison to a device physical address.

Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
added a lockdep assert that highlighted a gap in poison inject and
clear functions where holding the dpa_rwsem does not assure that a
a DPA is not added to a region.

The impact for inject and clear is that if the DPA address being
injected or cleared has been attached to a region, but not yet
committed, the dev_dbg() message intended to alert the debug user
that they are acting on a mapped address is not emitted. Also, the
cxl_poison trace event that serves as a log of the inject and clear
activity will not include region info.

Close this gap by snapshotting an unchangeable region state during
poison inject and clear operations. That means holding both the
region_rwsem and the dpa_rwsem during the inject and clear ops.

Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command")
Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command")
Signed-off-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Link: https://lore.kernel.org/r/08721dc1df0a51e4e38fecd02425c3475912dfd5.1701041440.git.alison.schofield@intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/core: Always hold region_rwsem while reading poison lists</title>
<updated>2023-11-30T01:03:53+00:00</updated>
<author>
<name>Alison Schofield</name>
<email>alison.schofield@intel.com</email>
</author>
<published>2023-11-27T00:09:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5558b92e8d39e18aa19619be2ee37274e9592528'/>
<id>5558b92e8d39e18aa19619be2ee37274e9592528</id>
<content type='text'>
A read of a device poison list is triggered via a sysfs attribute
and the results are logged as kernel trace events of type cxl_poison.
The work is managed by either: a) the region driver when one of more
regions map the device, or by b) the memdev driver when no regions
map the device.

In the case of a) the region driver holds the region_rwsem while
reading the poison by committed endpoint decoder mappings and for
any unmapped resources. This makes sure that the cxl_poison trace
event trace reports valid region info. (Region name, HPA, and UUID).

In the case of b) the memdev driver holds the dpa_rwsem preventing
new DPA resources from being attached to a region. However, it leaves
a gap between region attach and decoder commit actions. If a DPA in
the gap is in the poison list, the cxl_poison trace event will omit
the region info.

Close the gap by holding the region_rwsem and the dpa_rwsem when
reading poison per memdev. Since both methods now hold both locks,
down_read both from the caller. Doing so also addresses the lockdep
assert that found this issue:
Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")

Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event")
Signed-off-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A read of a device poison list is triggered via a sysfs attribute
and the results are logged as kernel trace events of type cxl_poison.
The work is managed by either: a) the region driver when one of more
regions map the device, or by b) the memdev driver when no regions
map the device.

In the case of a) the region driver holds the region_rwsem while
reading the poison by committed endpoint decoder mappings and for
any unmapped resources. This makes sure that the cxl_poison trace
event trace reports valid region info. (Region name, HPA, and UUID).

In the case of b) the memdev driver holds the dpa_rwsem preventing
new DPA resources from being attached to a region. However, it leaves
a gap between region attach and decoder commit actions. If a DPA in
the gap is in the poison list, the cxl_poison trace event will omit
the region info.

Close the gap by holding the region_rwsem and the dpa_rwsem when
reading poison per memdev. Since both methods now hold both locks,
down_read both from the caller. Doing so also addresses the lockdep
assert that found this issue:
Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")

Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event")
Signed-off-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/hdm: Fix a benign lockdep splat</title>
<updated>2023-11-23T00:34:30+00:00</updated>
<author>
<name>Dave Jiang</name>
<email>dave.jiang@intel.com</email>
</author>
<published>2023-11-17T20:18:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=36a1c2ee50f573972aea3c3019555f47ee0094c0'/>
<id>36a1c2ee50f573972aea3c3019555f47ee0094c0</id>
<content type='text'>
The new helper "cxl_num_decoders_committed()" added a lockdep assertion
to validate that port-&gt;commit_end is protected against modification.
That assertion fires in init_hdm_decoder() where it is initializing
port-&gt;commit_end. Given that it is both accessing and writing that
property it obstensibly needs the lock.

In practice, CXL decoder commit rules (must commit in order) and the
in-order discovery of device decoders makes the manipulation of
-&gt;commit_end in init_hdm_decoder() safe. However, rather than rely on
the subtle rules of CXL hardware, just make the implementation obviously
correct from a software perspective.

The Fixes: tag is only for cleaning up a lockdep splat, there is no
functional issue addressed by this fix.

Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Link: https://lore.kernel.org/r/170025232811.2147250.16376901801315194121.stgit@djiang5-mobl3
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The new helper "cxl_num_decoders_committed()" added a lockdep assertion
to validate that port-&gt;commit_end is protected against modification.
That assertion fires in init_hdm_decoder() where it is initializing
port-&gt;commit_end. Given that it is both accessing and writing that
property it obstensibly needs the lock.

In practice, CXL decoder commit rules (must commit in order) and the
in-order discovery of device decoders makes the manipulation of
-&gt;commit_end in init_hdm_decoder() safe. However, rather than rely on
the subtle rules of CXL hardware, just make the implementation obviously
correct from a software perspective.

The Fixes: tag is only for cleaning up a lockdep splat, there is no
functional issue addressed by this fix.

Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Link: https://lore.kernel.org/r/170025232811.2147250.16376901801315194121.stgit@djiang5-mobl3
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/pci: Change CXL AER support check to use native AER</title>
<updated>2023-11-02T21:09:01+00:00</updated>
<author>
<name>Terry Bowman</name>
<email>terry.bowman@amd.com</email>
</author>
<published>2023-11-02T15:52:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b3741ac86c8e648709506102f7ab51905d50df43'/>
<id>b3741ac86c8e648709506102f7ab51905d50df43</id>
<content type='text'>
Native CXL protocol errors are delivered to the OS through AER
reporting. The owner of AER owns CXL Protocol error management with
respect to _OSC negotiation.[1] CXL device errors are handled by a
separate interrupt with native control gated by _OSC control field
'CXL Memory Error Reporting Control'.

The CXL driver incorrectly checks for 'CXL Memory Error Reporting
Control' before accessing AER registers and caching RCH downport
AER registers. Replace the current check in these 2 cases with
native AER checks.

[1] CXL 3.0 - 9.17.2 CXL _OSC, Table-9-26, Interpretation of CXL
_OSC Support Fields, p.641

Fixes: f05fd10d138d ("cxl/pci: Add RCH downstream port AER register discovery")
Signed-off-by: Terry Bowman &lt;terry.bowman@amd.com&gt;
Reviewed-by: Smita Koralahalli &lt;Smita.KoralahalliChannabasappa@amd.com&gt;
Link: https://lore.kernel.org/r/20231102155232.1421261-1-terry.bowman@amd.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Native CXL protocol errors are delivered to the OS through AER
reporting. The owner of AER owns CXL Protocol error management with
respect to _OSC negotiation.[1] CXL device errors are handled by a
separate interrupt with native control gated by _OSC control field
'CXL Memory Error Reporting Control'.

The CXL driver incorrectly checks for 'CXL Memory Error Reporting
Control' before accessing AER registers and caching RCH downport
AER registers. Replace the current check in these 2 cases with
native AER checks.

[1] CXL 3.0 - 9.17.2 CXL _OSC, Table-9-26, Interpretation of CXL
_OSC Support Fields, p.641

Fixes: f05fd10d138d ("cxl/pci: Add RCH downstream port AER register discovery")
Signed-off-by: Terry Bowman &lt;terry.bowman@amd.com&gt;
Reviewed-by: Smita Koralahalli &lt;Smita.KoralahalliChannabasappa@amd.com&gt;
Link: https://lore.kernel.org/r/20231102155232.1421261-1-terry.bowman@amd.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/hdm: Remove broken error path</title>
<updated>2023-10-31T21:10:04+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2023-10-31T21:09:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5d09c63f11f083707b60c8ea0bb420651c47740f'/>
<id>5d09c63f11f083707b60c8ea0bb420651c47740f</id>
<content type='text'>
Dan reports that cxl_decoder_commit() potentially leaks a hold of
cxl_dpa_rwsem. The potential error case is a "should not" happen
scenario, turn it into a "can not" happen scenario by adding the error
check to cxl_port_setup_targets() where other setting validation occurs.

Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Closes: http://lore.kernel.org/r/63295673-5d63-4919-b851-3b06d48734c0@moroto.mountain
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Dan reports that cxl_decoder_commit() potentially leaks a hold of
cxl_dpa_rwsem. The potential error case is a "should not" happen
scenario, turn it into a "can not" happen scenario by adding the error
check to cxl_port_setup_targets() where other setting validation occurs.

Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Closes: http://lore.kernel.org/r/63295673-5d63-4919-b851-3b06d48734c0@moroto.mountain
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cxl/hdm: Fix &amp;&amp; vs || bug</title>
<updated>2023-10-31T21:09:50+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@linaro.org</email>
</author>
<published>2023-10-31T09:53:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=69d56b15a7941680aba8c3175b165221ecdf54b6'/>
<id>69d56b15a7941680aba8c3175b165221ecdf54b6</id>
<content type='text'>
If "info" is NULL then this code will crash.  || was intended instead of
&amp;&amp;.

Fixes: 8ce520fdea24 ("cxl/hdm: Use stored Component Register mappings to map HDM decoder capability")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Reviewed-by: Robert Richter &lt;rrichter@amd.com&gt;
Link: https://lore.kernel.org/r/60028378-d3d5-4d6d-90fd-f915f061e731@moroto.mountain
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If "info" is NULL then this code will crash.  || was intended instead of
&amp;&amp;.

Fixes: 8ce520fdea24 ("cxl/hdm: Use stored Component Register mappings to map HDM decoder capability")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Reviewed-by: Robert Richter &lt;rrichter@amd.com&gt;
Link: https://lore.kernel.org/r/60028378-d3d5-4d6d-90fd-f915f061e731@moroto.mountain
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-6.7/cxl-commited' into cxl/next</title>
<updated>2023-10-31T18:00:08+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2023-10-31T18:00:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b3cfdbf6a062bcfb431153f92d6bc1ad20bfc687'/>
<id>b3cfdbf6a062bcfb431153f92d6bc1ad20bfc687</id>
<content type='text'>
Add the committed decoder sysfs attribute for v6.7.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the committed decoder sysfs attribute for v6.7.
</pre>
</div>
</content>
</entry>
</feed>
