<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/gpu/drm/xe/regs, branch for-next</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel</title>
<updated>2024-11-21T22:56:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-21T22:56:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=28eb75e178d389d325f1666e422bc13bbbb9804c'/>
<id>28eb75e178d389d325f1666e422bc13bbbb9804c</id>
<content type='text'>
Pull drm updates from Dave Airlie:
 "There's a lot of rework, the panic helper support is being added to
  more drivers, v3d gets support for HW superpages, scheduler
  documentation, drm client and video aperture reworks, some new
  MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel
  has some Pantherlake enablement and xe is getting some SRIOV bits, but
  just lots of stuff everywhere.

  core:
   - split DSC helpers from DP helpers
   - clang build fixes for drm/mm test
   - drop simple pipeline support for gem vram
   - document submission error signaling
   - move drm_rect to drm core module from kms helper
   - add default client setup to most drivers
   - move to video aperture helpers instead of drm ones

  tests:
   - new framebuffer tests

  ttm:
   - remove swapped and pinned BOs from TTM lru

  panic:
   - fix uninit spinlock
   - add ABGR2101010 support

  bridge:
   - add TI TDP158 support
   - use standard PM OPS

  dma-fence:
   - use read_trylock instead of read_lock to help lockdep

  scheduler:
   - add errno to sched start to report different errors
   - add locking to drm_sched_entity_modify_sched
   - improve documentation

  xe:
   - add drm_line_printer
   - lots of refactoring
   - Enable Xe2 + PES disaggregation
   - add new ARL PCI ID
   - SRIOV development work
   - fix exec unnecessary implicit fence
   - define and parse OA sync props
   - forcewake refactoring

  i915:
   - Enable BMG/LNL ultra joiner
   - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+
   - use DSB for plane/color mgmt
   - Arrow lake PCI IDs
   - lots of i915/xe display refactoring
   - enable PXP GuC autoteardown
   - Pantherlake (PTL) Xe3 LPD display enablement
   - Allow fastset HDR infoframe changes
   - write DP source OUI for non-eDP sinks
   - share PCI IDs between i915 and xe

  amdgpu:
   - SDMA queue reset support
   - SMU 13.0.6, JPEG 4.0.3 updates
   - Initial runtime repartitioning support
   - rework IP structs for multiple IP instances
   - Fetch EDID from _DDC if available
   - SMU13 zero rpm user control
   - lots of fixes/cleanups

  amdkfd:
   - Increase event FIFO size
   - add topology cap flag for per queue reset

  msm:
   - DPU:
      - SA8775P support
      - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support
      - Enable large framebuffer support
      - Drop MSM8998 and SDM845
   - DP:
      - SA8775P support
   - GPU:
      - a7xx preemption support
      - Adreno A663 support

  ast:
   - warn about unsupported TX chips

  ivpu:
   - add coredump
   - add pantherlake support

  rockchip:
   - 4K@60Hz display enablement
   - generate pll programming tables

  panthor:
   - add timestamp query API
   - add realtime group priority
   - add fdinfo support

  etnaviv:
   - improve handling of DMA address limits
   - improve GPU hangcheck

  exynos:
   - Decon Exynos7870 support

  mediatek:
   - add OF graph support

  omap:
   - locking fixes

  bochs:
   - convert to gem/shmem from simpledrm

  v3d:
   - support big/super pages
   - add gemfs

  vc4:
   - BCM2712 support refactoring
   - add YUV444 format support

  udmabuf:
   - folio related fixes

  nouveau:
   - add panic support on nv50+"

* tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel: (1583 commits)
  drm/xe/guc: Fix dereference before NULL check
  drm/amd: Fix initialization mistake for NBIO 7.7.0
  Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"
  drm/amd/display: Fix failure to read vram info due to static BP_RESULT
  drm/amdgpu: enable GTT fallback handling for dGPUs only
  drm/amd/amdgpu: limit single process inside MES
  drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X
  drm/amdgpu/mes12: correct kiq unmap latency
  drm/amdgpu: Support vcn and jpeg error info parsing
  drm/amd : Update MES API header file for v11 &amp; v12
  drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling
  drm/amdkfd: change kfd process kref count at creation
  drm/amdgpu: Cleanup shift coding style
  drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data
  drm/amdgpu: Implement virt req_ras_err_count
  drm/amdgpu: VF Query RAS Caps from Host if supported
  drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry
  drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support
  drm/amd/display: 3.2.309
  drm/amd/display: Adjust VSDB parser for replay feature
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull drm updates from Dave Airlie:
 "There's a lot of rework, the panic helper support is being added to
  more drivers, v3d gets support for HW superpages, scheduler
  documentation, drm client and video aperture reworks, some new
  MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel
  has some Pantherlake enablement and xe is getting some SRIOV bits, but
  just lots of stuff everywhere.

  core:
   - split DSC helpers from DP helpers
   - clang build fixes for drm/mm test
   - drop simple pipeline support for gem vram
   - document submission error signaling
   - move drm_rect to drm core module from kms helper
   - add default client setup to most drivers
   - move to video aperture helpers instead of drm ones

  tests:
   - new framebuffer tests

  ttm:
   - remove swapped and pinned BOs from TTM lru

  panic:
   - fix uninit spinlock
   - add ABGR2101010 support

  bridge:
   - add TI TDP158 support
   - use standard PM OPS

  dma-fence:
   - use read_trylock instead of read_lock to help lockdep

  scheduler:
   - add errno to sched start to report different errors
   - add locking to drm_sched_entity_modify_sched
   - improve documentation

  xe:
   - add drm_line_printer
   - lots of refactoring
   - Enable Xe2 + PES disaggregation
   - add new ARL PCI ID
   - SRIOV development work
   - fix exec unnecessary implicit fence
   - define and parse OA sync props
   - forcewake refactoring

  i915:
   - Enable BMG/LNL ultra joiner
   - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+
   - use DSB for plane/color mgmt
   - Arrow lake PCI IDs
   - lots of i915/xe display refactoring
   - enable PXP GuC autoteardown
   - Pantherlake (PTL) Xe3 LPD display enablement
   - Allow fastset HDR infoframe changes
   - write DP source OUI for non-eDP sinks
   - share PCI IDs between i915 and xe

  amdgpu:
   - SDMA queue reset support
   - SMU 13.0.6, JPEG 4.0.3 updates
   - Initial runtime repartitioning support
   - rework IP structs for multiple IP instances
   - Fetch EDID from _DDC if available
   - SMU13 zero rpm user control
   - lots of fixes/cleanups

  amdkfd:
   - Increase event FIFO size
   - add topology cap flag for per queue reset

  msm:
   - DPU:
      - SA8775P support
      - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support
      - Enable large framebuffer support
      - Drop MSM8998 and SDM845
   - DP:
      - SA8775P support
   - GPU:
      - a7xx preemption support
      - Adreno A663 support

  ast:
   - warn about unsupported TX chips

  ivpu:
   - add coredump
   - add pantherlake support

  rockchip:
   - 4K@60Hz display enablement
   - generate pll programming tables

  panthor:
   - add timestamp query API
   - add realtime group priority
   - add fdinfo support

  etnaviv:
   - improve handling of DMA address limits
   - improve GPU hangcheck

  exynos:
   - Decon Exynos7870 support

  mediatek:
   - add OF graph support

  omap:
   - locking fixes

  bochs:
   - convert to gem/shmem from simpledrm

  v3d:
   - support big/super pages
   - add gemfs

  vc4:
   - BCM2712 support refactoring
   - add YUV444 format support

  udmabuf:
   - folio related fixes

  nouveau:
   - add panic support on nv50+"

* tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel: (1583 commits)
  drm/xe/guc: Fix dereference before NULL check
  drm/amd: Fix initialization mistake for NBIO 7.7.0
  Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"
  drm/amd/display: Fix failure to read vram info due to static BP_RESULT
  drm/amdgpu: enable GTT fallback handling for dGPUs only
  drm/amd/amdgpu: limit single process inside MES
  drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X
  drm/amdgpu/mes12: correct kiq unmap latency
  drm/amdgpu: Support vcn and jpeg error info parsing
  drm/amd : Update MES API header file for v11 &amp; v12
  drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling
  drm/amdkfd: change kfd process kref count at creation
  drm/amdgpu: Cleanup shift coding style
  drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data
  drm/amdgpu: Implement virt req_ras_err_count
  drm/amdgpu: VF Query RAS Caps from Host if supported
  drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry
  drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support
  drm/amd/display: 3.2.309
  drm/amd/display: Adjust VSDB parser for replay feature
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe: Set mask bits for CCS_MODE register</title>
<updated>2024-11-04T16:03:40+00:00</updated>
<author>
<name>Balasubramani Vivekanandan</name>
<email>balasubramani.vivekanandan@intel.com</email>
</author>
<published>2024-10-08T07:36:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7fd3fa006fa56c0ec299c61ecf5c572c723adad5'/>
<id>7fd3fa006fa56c0ec299c61ecf5c572c723adad5</id>
<content type='text'>
CCS_MODE register requires setting mask bits from Xe2+ platforms. Set
the mask bits unconditionally, as those bits are unused for older
platforms.

Signed-off-by: Balasubramani Vivekanandan &lt;balasubramani.vivekanandan@intel.com&gt;
Cc: stable@vger.kernel.org # v6.11+
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241008073628.377433-2-balasubramani.vivekanandan@intel.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
(cherry picked from commit 23ea2c7572d4735ef66beb1e4feb8ae510b78247)
[ Fix conflict with mmio refactors ]
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
CCS_MODE register requires setting mask bits from Xe2+ platforms. Set
the mask bits unconditionally, as those bits are unused for older
platforms.

Signed-off-by: Balasubramani Vivekanandan &lt;balasubramani.vivekanandan@intel.com&gt;
Cc: stable@vger.kernel.org # v6.11+
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241008073628.377433-2-balasubramani.vivekanandan@intel.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
(cherry picked from commit 23ea2c7572d4735ef66beb1e4feb8ae510b78247)
[ Fix conflict with mmio refactors ]
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/guc: Capture all available bits of GuC timestamp</title>
<updated>2024-10-29T20:11:33+00:00</updated>
<author>
<name>John Harrison</name>
<email>John.C.Harrison@Intel.com</email>
</author>
<published>2024-10-24T00:25:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=833b2ec3bd5d18b85d8a3f416ca590a44bc4f58c'/>
<id>833b2ec3bd5d18b85d8a3f416ca590a44bc4f58c</id>
<content type='text'>
The extra bits are not hugely useful because the GuC log only uses
32bit time stamps. But they exist so might as well provide them.

Signed-off-by: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241024002554.1983101-2-John.C.Harrison@Intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The extra bits are not hugely useful because the GuC log only uses
32bit time stamps. But they exist so might as well provide them.

Signed-off-by: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241024002554.1983101-2-John.C.Harrison@Intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/bmg: improve cache flushing behaviour</title>
<updated>2024-10-16T14:00:22+00:00</updated>
<author>
<name>Matthew Auld</name>
<email>matthew.auld@intel.com</email>
</author>
<published>2024-10-07T07:45:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6df106e93f79fb7dc90546a2d93bb3776b42863e'/>
<id>6df106e93f79fb7dc90546a2d93bb3776b42863e</id>
<content type='text'>
The BSpec says that EN_L3_RW_CCS_CACHE_FLUSH must be toggled
on for manual global invalidation to take effect and actually flush
device cache, however this also turns on flushing for things like
pipecontrol, which occurs between submissions for compute/render. This
sounds like massive overkill for our needs, where we already have the
manual flushing on the display side with the global invalidation. Some
observations on BMG:

1. Disabling l2 caching for host writes and stubbing out the driver
   global invalidation but keeping EN_L3_RW_CCS_CACHE_FLUSH enabled, has
   no impact on wb-transient-vs-display IGT, which makes sense since the
   pipecontrol is now flushing the device cache after the render copy.
   Without EN_L3_RW_CCS_CACHE_FLUSH the test then fails, which is also
   expected since device cache is now dirty and display engine can't see
   the writes.

2. Disabling EN_L3_RW_CCS_CACHE_FLUSH, but keeping the driver global
   invalidation also has no impact on wb-transient-vs-display. This
   suggests that the global invalidation still works as expected and is
   flushing the device cache without EN_L3_RW_CCS_CACHE_FLUSH turned on.

With that drop EN_L3_RW_CCS_CACHE_FLUSH. This helps some workloads since
we no longer flush the device cache between submissions as part of
pipecontrol.

Edit: We now also have clarification from HW side that BSpec was indeed
wrong here.

v2:
  - Rebase and update commit message.

BSpec: 71718
Signed-off-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Vitasta Wattal &lt;vitasta.wattal@intel.com&gt;
Cc: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Cc: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Reviewed-by: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241007074541.33937-2-matthew.auld@intel.com
(cherry picked from commit 67ec9f87bd6c57db1251bb2244d242f7ca5a0b6a)
[ Fix conflict due to changed xe_mmio_write32() signature ]
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The BSpec says that EN_L3_RW_CCS_CACHE_FLUSH must be toggled
on for manual global invalidation to take effect and actually flush
device cache, however this also turns on flushing for things like
pipecontrol, which occurs between submissions for compute/render. This
sounds like massive overkill for our needs, where we already have the
manual flushing on the display side with the global invalidation. Some
observations on BMG:

1. Disabling l2 caching for host writes and stubbing out the driver
   global invalidation but keeping EN_L3_RW_CCS_CACHE_FLUSH enabled, has
   no impact on wb-transient-vs-display IGT, which makes sense since the
   pipecontrol is now flushing the device cache after the render copy.
   Without EN_L3_RW_CCS_CACHE_FLUSH the test then fails, which is also
   expected since device cache is now dirty and display engine can't see
   the writes.

2. Disabling EN_L3_RW_CCS_CACHE_FLUSH, but keeping the driver global
   invalidation also has no impact on wb-transient-vs-display. This
   suggests that the global invalidation still works as expected and is
   flushing the device cache without EN_L3_RW_CCS_CACHE_FLUSH turned on.

With that drop EN_L3_RW_CCS_CACHE_FLUSH. This helps some workloads since
we no longer flush the device cache between submissions as part of
pipecontrol.

Edit: We now also have clarification from HW side that BSpec was indeed
wrong here.

v2:
  - Rebase and update commit message.

BSpec: 71718
Signed-off-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Vitasta Wattal &lt;vitasta.wattal@intel.com&gt;
Cc: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Cc: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Reviewed-by: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241007074541.33937-2-matthew.auld@intel.com
(cherry picked from commit 67ec9f87bd6c57db1251bb2244d242f7ca5a0b6a)
[ Fix conflict due to changed xe_mmio_write32() signature ]
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/xe3: Add initial set of workarounds</title>
<updated>2024-10-09T13:41:46+00:00</updated>
<author>
<name>Gustavo Sousa</name>
<email>gustavo.sousa@intel.com</email>
</author>
<published>2024-10-08T20:46:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=081cb8948cfe322076cd23f22f85ba68f73e2c4b'/>
<id>081cb8948cfe322076cd23f22f85ba68f73e2c4b</id>
<content type='text'>
Implement the initial set of workarounds for Xe3 IPs.

Signed-off-by: Gustavo Sousa &lt;gustavo.sousa@intel.com&gt;
Signed-off-by: Matt Atwood &lt;matthew.s.atwood@intel.com&gt;
Reviewed-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241008204626.55802-2-matthew.s.atwood@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Implement the initial set of workarounds for Xe3 IPs.

Signed-off-by: Gustavo Sousa &lt;gustavo.sousa@intel.com&gt;
Signed-off-by: Matt Atwood &lt;matthew.s.atwood@intel.com&gt;
Reviewed-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241008204626.55802-2-matthew.s.atwood@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/bmg: improve cache flushing behaviour</title>
<updated>2024-10-09T08:01:42+00:00</updated>
<author>
<name>Matthew Auld</name>
<email>matthew.auld@intel.com</email>
</author>
<published>2024-10-07T07:45:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=67ec9f87bd6c57db1251bb2244d242f7ca5a0b6a'/>
<id>67ec9f87bd6c57db1251bb2244d242f7ca5a0b6a</id>
<content type='text'>
The BSpec says that EN_L3_RW_CCS_CACHE_FLUSH must be toggled
on for manual global invalidation to take effect and actually flush
device cache, however this also turns on flushing for things like
pipecontrol, which occurs between submissions for compute/render. This
sounds like massive overkill for our needs, where we already have the
manual flushing on the display side with the global invalidation. Some
observations on BMG:

1. Disabling l2 caching for host writes and stubbing out the driver
   global invalidation but keeping EN_L3_RW_CCS_CACHE_FLUSH enabled, has
   no impact on wb-transient-vs-display IGT, which makes sense since the
   pipecontrol is now flushing the device cache after the render copy.
   Without EN_L3_RW_CCS_CACHE_FLUSH the test then fails, which is also
   expected since device cache is now dirty and display engine can't see
   the writes.

2. Disabling EN_L3_RW_CCS_CACHE_FLUSH, but keeping the driver global
   invalidation also has no impact on wb-transient-vs-display. This
   suggests that the global invalidation still works as expected and is
   flushing the device cache without EN_L3_RW_CCS_CACHE_FLUSH turned on.

With that drop EN_L3_RW_CCS_CACHE_FLUSH. This helps some workloads since
we no longer flush the device cache between submissions as part of
pipecontrol.

Edit: We now also have clarification from HW side that BSpec was indeed
wrong here.

v2:
  - Rebase and update commit message.

BSpec: 71718
Signed-off-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Vitasta Wattal &lt;vitasta.wattal@intel.com&gt;
Cc: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Cc: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Reviewed-by: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241007074541.33937-2-matthew.auld@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The BSpec says that EN_L3_RW_CCS_CACHE_FLUSH must be toggled
on for manual global invalidation to take effect and actually flush
device cache, however this also turns on flushing for things like
pipecontrol, which occurs between submissions for compute/render. This
sounds like massive overkill for our needs, where we already have the
manual flushing on the display side with the global invalidation. Some
observations on BMG:

1. Disabling l2 caching for host writes and stubbing out the driver
   global invalidation but keeping EN_L3_RW_CCS_CACHE_FLUSH enabled, has
   no impact on wb-transient-vs-display IGT, which makes sense since the
   pipecontrol is now flushing the device cache after the render copy.
   Without EN_L3_RW_CCS_CACHE_FLUSH the test then fails, which is also
   expected since device cache is now dirty and display engine can't see
   the writes.

2. Disabling EN_L3_RW_CCS_CACHE_FLUSH, but keeping the driver global
   invalidation also has no impact on wb-transient-vs-display. This
   suggests that the global invalidation still works as expected and is
   flushing the device cache without EN_L3_RW_CCS_CACHE_FLUSH turned on.

With that drop EN_L3_RW_CCS_CACHE_FLUSH. This helps some workloads since
we no longer flush the device cache between submissions as part of
pipecontrol.

Edit: We now also have clarification from HW side that BSpec was indeed
wrong here.

v2:
  - Rebase and update commit message.

BSpec: 71718
Signed-off-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Vitasta Wattal &lt;vitasta.wattal@intel.com&gt;
Cc: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Cc: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Reviewed-by: Nirmoy Das &lt;nirmoy.das@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241007074541.33937-2-matthew.auld@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/guc: Prepare GuC register list and update ADS size for error capture</title>
<updated>2024-10-08T16:34:04+00:00</updated>
<author>
<name>Zhanjun Dong</name>
<email>zhanjun.dong@intel.com</email>
</author>
<published>2024-10-04T19:34:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9c8c7a7e6f1f55ec28cf0dbfe39a7a797f67be78'/>
<id>9c8c7a7e6f1f55ec28cf0dbfe39a7a797f67be78</id>
<content type='text'>
Add referenced registers defines and list of registers.
Update GuC ADS size allocation to include space for
the lists of error state capture register descriptors.

Then, populate GuC ADS with the lists of registers we want
GuC to report back to host on engine reset events. This list
should include global, engine-class and engine-instance
registers for every engine-class type on the current hardware.

Ensure we allocate a persistent storage for the register lists
that are populated into ADS so that we don't need to allocate
memory during GT resets when GuC is reloaded and ADS population
happens again.

Signed-off-by: Zhanjun Dong &lt;zhanjun.dong@intel.com&gt;
Reviewed-by: Alan Previn &lt;alan.previn.teres.alexis@intel.com&gt;
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add referenced registers defines and list of registers.
Update GuC ADS size allocation to include space for
the lists of error state capture register descriptors.

Then, populate GuC ADS with the lists of registers we want
GuC to report back to host on engine reset events. This list
should include global, engine-class and engine-instance
registers for every engine-class type on the current hardware.

Ensure we allocate a persistent storage for the register lists
that are populated into ADS so that we don't need to allocate
memory during GT resets when GuC is reloaded and ADS population
happens again.

Signed-off-by: Zhanjun Dong &lt;zhanjun.dong@intel.com&gt;
Reviewed-by: Alan Previn &lt;alan.previn.teres.alexis@intel.com&gt;
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/guc: Use a two stage dump for GuC logs and add more info</title>
<updated>2024-10-08T01:12:16+00:00</updated>
<author>
<name>John Harrison</name>
<email>John.C.Harrison@Intel.com</email>
</author>
<published>2024-10-03T00:46:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d8ce1a97722617317b04eb9f19ab8d6d95379f7a'/>
<id>d8ce1a97722617317b04eb9f19ab8d6d95379f7a</id>
<content type='text'>
Split the GuC log dump into a two stage snapshot and print mechanism.
This allows the log to be captured at the point of an error (which may
be in a restricted context) and then dump it out later (from a regular
context such as a worker function or a sysfs file handler).

Also add a bunch of other useful pieces of information that can help
(or are fundamentally required!) to decode and parse the log.

v2: Add kerneldoc and fix a couple of comment typos - review feedback
from Michal W.
v3: Move chunking code to this patch as it makes the deltas simpler.
Fix a bunch of kerneldoc issues.
v4: Move the CS frequency out of the coredump snapshot function into
the debugfs only code (as that info is already part of the main
devcoredump). Add a header to the debugfs log to match the one in the
devcoredump to aid processing by a unified tool. Add forcewake to the
GuC timestamp read so it actually works.
v6: Add colon to GuC version string (review feedback by Julia F).

Signed-off-by: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Reviewed-by: Julia Filipchuk &lt;julia.filipchuk@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-7-John.C.Harrison@Intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Split the GuC log dump into a two stage snapshot and print mechanism.
This allows the log to be captured at the point of an error (which may
be in a restricted context) and then dump it out later (from a regular
context such as a worker function or a sysfs file handler).

Also add a bunch of other useful pieces of information that can help
(or are fundamentally required!) to decode and parse the log.

v2: Add kerneldoc and fix a couple of comment typos - review feedback
from Michal W.
v3: Move chunking code to this patch as it makes the deltas simpler.
Fix a bunch of kerneldoc issues.
v4: Move the CS frequency out of the coredump snapshot function into
the debugfs only code (as that info is already part of the main
devcoredump). Add a header to the debugfs log to match the one in the
devcoredump to aid processing by a unified tool. Add forcewake to the
GuC timestamp read so it actually works.
v6: Add colon to GuC version string (review feedback by Julia F).

Signed-off-by: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Reviewed-by: Julia Filipchuk &lt;julia.filipchuk@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-7-John.C.Harrison@Intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/xe2: Add performance tuning for L3 cache flushing</title>
<updated>2024-10-03T06:13:55+00:00</updated>
<author>
<name>Gustavo Sousa</name>
<email>gustavo.sousa@intel.com</email>
</author>
<published>2024-09-20T21:13:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6ef5a04221aaeb858d1a825b2ecb7e200cac80f8'/>
<id>6ef5a04221aaeb858d1a825b2ecb7e200cac80f8</id>
<content type='text'>
A recommended performance tuning for LNL related to L3 cache flushing
was recently introduced in Bspec. Implement it.

Unlike the other existing tuning settings, we limit this one for LNL
only, since there is no info about whether this would be applicable to
other platforms yet. In the future we can come back and use IP version
ranges if applicable.

v2:
  - Fix reference to Bspec. (Sai Teja, Tejas)
  - Use correct register name for "Tuning: L3 RW flush all Cache". (Sai
    Teja)
  - Use SCRATCH3_LBCF (with the underscore) for better readability.
v3:
  - Limit setting to LNL only. (Matt)

Bspec: 72161
Cc: Sai Teja Pottumuttu &lt;sai.teja.pottumuttu@intel.com&gt;
Cc: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Cc: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Signed-off-by: Gustavo Sousa &lt;gustavo.sousa@intel.com&gt;
Reviewed-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Reviewed-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-5-gustavo.sousa@intel.com
(cherry picked from commit 876253165f3eaaacacb8c8bed16a9df4b6081479)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A recommended performance tuning for LNL related to L3 cache flushing
was recently introduced in Bspec. Implement it.

Unlike the other existing tuning settings, we limit this one for LNL
only, since there is no info about whether this would be applicable to
other platforms yet. In the future we can come back and use IP version
ranges if applicable.

v2:
  - Fix reference to Bspec. (Sai Teja, Tejas)
  - Use correct register name for "Tuning: L3 RW flush all Cache". (Sai
    Teja)
  - Use SCRATCH3_LBCF (with the underscore) for better readability.
v3:
  - Limit setting to LNL only. (Matt)

Bspec: 72161
Cc: Sai Teja Pottumuttu &lt;sai.teja.pottumuttu@intel.com&gt;
Cc: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Cc: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Signed-off-by: Gustavo Sousa &lt;gustavo.sousa@intel.com&gt;
Reviewed-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Reviewed-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-5-gustavo.sousa@intel.com
(cherry picked from commit 876253165f3eaaacacb8c8bed16a9df4b6081479)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/xe2: Extend performance tuning to media GT</title>
<updated>2024-10-03T06:13:55+00:00</updated>
<author>
<name>Gustavo Sousa</name>
<email>gustavo.sousa@intel.com</email>
</author>
<published>2024-09-20T21:13:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3bf90935aafc750c838c8831e96c3ac36cfd48d5'/>
<id>3bf90935aafc750c838c8831e96c3ac36cfd48d5</id>
<content type='text'>
With exception of "Tuning: L3 cache - media", we are currently applying
recommended performance tuning settings only for the primary GT. Let's
also implement them for the media GT when applicable.

According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist
only in Xe2_LPM and their offsets do not match their primary GT
counterparts. Furthermore, the range where CCCHKNREG1 belongs is not
listed as a multicast range on the media GT. As such, we need to have
Xe2_LPM-specific definitions for those registers and apply the setting
only for that specific IP.

Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the
offset on the media GT matches the one on the primary one. So we can
simply have a copy of "Tuning: Stateless compression control" for the
media GT.

v2:
  - Fix implementation with respect to multicast vs non-multicast
    registers. (Matt)
  - Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning:
    Compression Overfetch - media".
v3:
  - STATELESS_COMPRESSION_CTRL on Xe2_HPM is also a multicast register,
    do not define a XE2HPM_STATELESS_COMPRESSION_CTRL register. (Tejas)

Bspec: 72161
Cc: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Reviewed-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Signed-off-by: Gustavo Sousa &lt;gustavo.sousa@intel.com&gt;
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-3-gustavo.sousa@intel.com
(cherry picked from commit e1f813947ccf2326cfda4558b7d31430d7860c4b)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With exception of "Tuning: L3 cache - media", we are currently applying
recommended performance tuning settings only for the primary GT. Let's
also implement them for the media GT when applicable.

According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist
only in Xe2_LPM and their offsets do not match their primary GT
counterparts. Furthermore, the range where CCCHKNREG1 belongs is not
listed as a multicast range on the media GT. As such, we need to have
Xe2_LPM-specific definitions for those registers and apply the setting
only for that specific IP.

Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the
offset on the media GT matches the one on the primary one. So we can
simply have a copy of "Tuning: Stateless compression control" for the
media GT.

v2:
  - Fix implementation with respect to multicast vs non-multicast
    registers. (Matt)
  - Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning:
    Compression Overfetch - media".
v3:
  - STATELESS_COMPRESSION_CTRL on Xe2_HPM is also a multicast register,
    do not define a XE2HPM_STATELESS_COMPRESSION_CTRL register. (Tejas)

Bspec: 72161
Cc: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Reviewed-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Signed-off-by: Gustavo Sousa &lt;gustavo.sousa@intel.com&gt;
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-3-gustavo.sousa@intel.com
(cherry picked from commit e1f813947ccf2326cfda4558b7d31430d7860c4b)
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
