<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/thermal/intel, branch v6.2.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type()</title>
<updated>2023-01-25T14:37:21+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-01-25T12:17:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=acd7e9ee57c880b99671dd99680cb707b7b5b0ee'/>
<id>acd7e9ee57c880b99671dd99680cb707b7b5b0ee</id>
<content type='text'>
In order to prevent int340x_thermal_get_trip_type() from possibly
racing with int340x_thermal_read_trips() invoked by int3403_notify(),
add locking to it by analogy with int340x_thermal_get_trip_temp().

Fixes: 6757a7abe47b ("thermal: intel: int340x: Protect trip temperature from concurrent updates")
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In order to prevent int340x_thermal_get_trip_type() from possibly
racing with int340x_thermal_read_trips() invoked by int3403_notify(),
add locking to it by analogy with int340x_thermal_get_trip_temp().

Fixes: 6757a7abe47b ("thermal: intel: int340x: Protect trip temperature from concurrent updates")
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: int340x: Protect trip temperature from concurrent updates</title>
<updated>2023-01-24T20:28:19+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2023-01-23T17:21:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6757a7abe47bcb12cb2d45661067e182424b0ee3'/>
<id>6757a7abe47bcb12cb2d45661067e182424b0ee3</id>
<content type='text'>
Trip temperatures are read using ACPI methods and stored in memory
during zone initialization and when the firmware sends a change
notification. The stored trip temperature is returned when the thermal
core calls the get_trip_temp() callback.

However, the thermal core may read the trip temperature via the
get_trip_temp() callback while the in-memory copy of the trips is being
updated in response to a firmware change notification. Such a read may
return an invalid trip temperature.

To address this, add a mutex to protect against invalid temperature
reads in the get_trip_temp() callback and int340x_thermal_read_trips().
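
A minimal Python sketch of the locking pattern described above, as a
userspace analogy rather than the driver code. The TripTable class and
its read_trips() method are hypothetical names; the sketch also assumes
that an update never changes the number of trips.

```python
import threading


class TripTable:
    """Hypothetical in-memory copy of trip temperatures guarded by a mutex."""

    def __init__(self, temps):
        self._lock = threading.Lock()
        self._temps = list(temps)

    def read_trips(self, new_temps):
        # Updater path (analogous to handling a firmware change
        # notification): rewrite the cached trips entry by entry while
        # holding the lock. Assumes new_temps has the same length.
        with self._lock:
            for index, temp in enumerate(new_temps):
                self._temps[index] = temp

    def get_trip_temp(self, index):
        # Reader path (analogous to the get_trip_temp() callback): take the
        # same lock so a read never overlaps an update in progress.
        with self._lock:
            return self._temps[index]


table = TripTable([95000, 105000])
table.read_trips([90000, 100000])
print(table.get_trip_temp(0))
```

Both paths take the same lock, so a reader sees either the old or the
new table, never a partially updated one.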

Fixes: 5fbf7f27fa3d ("Thermal/int340x: Add common thermal zone handler")
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: 5.0+ &lt;stable@vger.kernel.org&gt; # 5.0+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Trip temperatures are read using ACPI methods and stored in memory
during zone initialization and when the firmware sends a change
notification. The stored trip temperature is returned when the thermal
core calls the get_trip_temp() callback.

However, the thermal core may read the trip temperature via the
get_trip_temp() callback while the in-memory copy of the trips is being
updated in response to a firmware change notification. Such a read may
return an invalid trip temperature.

To address this, add a mutex to protect against invalid temperature
reads in the get_trip_temp() callback and int340x_thermal_read_trips().

Fixes: 5fbf7f27fa3d ("Thermal/int340x: Add common thermal zone handler")
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: 5.0+ &lt;stable@vger.kernel.org&gt; # 5.0+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: int340x: Add missing attribute for data rate base</title>
<updated>2022-12-30T18:48:37+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2022-12-28T00:10:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b878d3ba9bb41cddb73ba4b56e5552f0a638daca'/>
<id>b878d3ba9bb41cddb73ba4b56e5552f0a638daca</id>
<content type='text'>
Commit 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM
driver") added the rfi_restriction_data_rate_base string, mmio details
and documentation, but missed adding the attribute to sysfs.

Add missing sysfs attribute.

Fixes: 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver")
Cc: 5.11+ &lt;stable@vger.kernel.org&gt; # v5.11+
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM
driver") added the rfi_restriction_data_rate_base string, mmio details
and documentation, but missed adding the attribute to sysfs.

Add missing sysfs attribute.

Fixes: 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver")
Cc: 5.11+ &lt;stable@vger.kernel.org&gt; # v5.11+
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'thermal-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2022-12-15T18:16:04+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-15T18:16:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=601c1aa855a686643259c4a34e96ee692cdaf01f'/>
<id>601c1aa855a686643259c4a34e96ee692cdaf01f</id>
<content type='text'>
Pull more thermal control updates from Rafael Wysocki:
 "These are updates of assorted thermal drivers, mostly for ARM
  platforms, generally isolated and fairly straightforward, and the
  recent Intel HFI driver fix for systems without HFI support.

  Specifics:

   - Avoid clearing the HFI status bit on systems without HFI support
     which triggers unchecked MSR access errors (Srinivas Pandruvada)

   - Add sm8450 and sm8550 QCom compatible string to DT bindings (Luca
     Weiss, Neil Armstrong)

   - Use devm_platform_get_and_ioremap_resource on the ST platform to
     group two calls into a single one (Minghao Chi)

   - Use GENMASK instead of bitmaps and validate the temperature after
     reading it in the imx8mm_thermal driver (Marcus Folkesson)

   - Convert generic-adc-thermal to DT schema (Rob Herring)

   - Fix debug print message with inverted logic in the k3_j72xx_bandgap
     driver (Keerthy)

   - Fix memory leak on thermal_of_zone_register() failure (Ido
     Schimmel)

   - Add support for IPQ8074 in the tsens thermal driver along with the
     DT bindings (Robert Marko)

   - Fix and rework the debugfs code in the tsens driver (Christian
     Marangi)

   - Add calibration and DT documentation for the imx8mm driver (Marek
     Vasut)

   - Add DT bindings and compatible for the Mediatek SoCs mt7981 and
     mt7986 (Daniel Golle)

   - Don't show an error message at probe time when the probe will be
     deferred on the QCom SPMI ADC driver (Johan Hovold)

   - Add HWMon support for the imx8mm board (Alexander Stein)

   - Remove pointless include from the power allocator governor
     (Christophe JAILLET)

   - Add interrupt DT bindings for QCom SoCs SC8280XP, SM6350 and SM8450
     (Krzysztof Kozlowski)

   - Fix inaccurate warning message for the QCom tsens gen2 (Luca Weiss)

   - Demote error log of thermal zone register to debug in the tsens
     QCom driver (Manivannan Sadhasivam)

   - Consolidate the efuse values and the errata handling in the TI
     Bandgap driver (Bryan Brattlof)

   - Document Renesas RZ/Five as compatible with RZ/G2UL in the DT
     bindings (Lad Prabhakar)

   - Fix the irq handler return value in the LMh driver (Bjorn
     Andersson)

   - Delete empty platform remove callback from imx_sc_thermal (Uwe
     Kleine-König)"

* tag 'thermal-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
  thermal/drivers/imx_sc_thermal: Drop empty platform remove function
  thermal/drivers/qcom/lmh: Fix irq handler return value
  dt-bindings: thermal: qcom-tsens: Add compatible for sm8550
  thermal/drivers/st: Use devm_platform_get_and_ioremap_resource()
  dt-bindings: thermal: rzg2l-thermal: Document RZ/Five SoC
  dt-bindings: thermal: k3-j72xx: conditionally require efuse reg range
  dt-bindings: thermal: k3-j72xx: elaborate on binding description
  thermal/drivers/k3_j72xx_bandgap: Map fuse_base only for erratum workaround
  thermal/drivers/k3_j72xx_bandgap: Remove fuse_base from structure
  thermal/drivers/k3_j72xx_bandgap: Use bool for i2128 erratum flag
  thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() function
  thermal/drivers/qcom: Demote error log of thermal zone register to debug
  thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2
  dt-bindings: thermal: qcom-tsens: narrow interrupts for SC8280XP, SM6350 and SM8450
  thermal/core/power allocator: Remove a useless include
  thermal/drivers/imx8mm: Add hwmon support
  thermal: qcom-spmi-adc-tm5: suppress probe-deferral error message
  dt-bindings: thermal: mediatek: add compatible string for MT7986 and MT7981 SoC
  thermal: ti-soc-thermal: Drop comma after SoC match table sentinel
  thermal/drivers/imx: Add support for loading calibration data from OCOTP
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull more thermal control updates from Rafael Wysocki:
 "These are updates of assorted thermal drivers, mostly for ARM
  platforms, generally isolated and fairly straightforward, and the
  recent Intel HFI driver fix for systems without HFI support.

  Specifics:

   - Avoid clearing the HFI status bit on systems without HFI support
     which triggers unchecked MSR access errors (Srinivas Pandruvada)

   - Add sm8450 and sm8550 QCom compatible string to DT bindings (Luca
     Weiss, Neil Armstrong)

   - Use devm_platform_get_and_ioremap_resource on the ST platform to
     group two calls into a single one (Minghao Chi)

   - Use GENMASK instead of bitmaps and validate the temperature after
     reading it in the imx8mm_thermal driver (Marcus Folkesson)

   - Convert generic-adc-thermal to DT schema (Rob Herring)

   - Fix debug print message with inverted logic in the k3_j72xx_bandgap
     driver (Keerthy)

   - Fix memory leak on thermal_of_zone_register() failure (Ido
     Schimmel)

   - Add support for IPQ8074 in the tsens thermal driver along with the
     DT bindings (Robert Marko)

   - Fix and rework the debugfs code in the tsens driver (Christian
     Marangi)

   - Add calibration and DT documentation for the imx8mm driver (Marek
     Vasut)

   - Add DT bindings and compatible for the Mediatek SoCs mt7981 and
     mt7986 (Daniel Golle)

   - Don't show an error message at probe time when the probe will be
     deferred on the QCom SPMI ADC driver (Johan Hovold)

   - Add HWMon support for the imx8mm board (Alexander Stein)

   - Remove pointless include from the power allocator governor
     (Christophe JAILLET)

   - Add interrupt DT bindings for QCom SoCs SC8280XP, SM6350 and SM8450
     (Krzysztof Kozlowski)

   - Fix inaccurate warning message for the QCom tsens gen2 (Luca Weiss)

   - Demote error log of thermal zone register to debug in the tsens
     QCom driver (Manivannan Sadhasivam)

   - Consolidate the efuse values and the errata handling in the TI
     Bandgap driver (Bryan Brattlof)

   - Document Renesas RZ/Five as compatible with RZ/G2UL in the DT
     bindings (Lad Prabhakar)

   - Fix the irq handler return value in the LMh driver (Bjorn
     Andersson)

   - Delete empty platform remove callback from imx_sc_thermal (Uwe
     Kleine-König)"

* tag 'thermal-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
  thermal/drivers/imx_sc_thermal: Drop empty platform remove function
  thermal/drivers/qcom/lmh: Fix irq handler return value
  dt-bindings: thermal: qcom-tsens: Add compatible for sm8550
  thermal/drivers/st: Use devm_platform_get_and_ioremap_resource()
  dt-bindings: thermal: rzg2l-thermal: Document RZ/Five SoC
  dt-bindings: thermal: k3-j72xx: conditionally require efuse reg range
  dt-bindings: thermal: k3-j72xx: elaborate on binding description
  thermal/drivers/k3_j72xx_bandgap: Map fuse_base only for erratum workaround
  thermal/drivers/k3_j72xx_bandgap: Remove fuse_base from structure
  thermal/drivers/k3_j72xx_bandgap: Use bool for i2128 erratum flag
  thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() function
  thermal/drivers/qcom: Demote error log of thermal zone register to debug
  thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2
  dt-bindings: thermal: qcom-tsens: narrow interrupts for SC8280XP, SM6350 and SM8450
  thermal/core/power allocator: Remove a useless include
  thermal/drivers/imx8mm: Add hwmon support
  thermal: qcom-spmi-adc-tm5: suppress probe-deferral error message
  dt-bindings: thermal: mediatek: add compatible string for MT7986 and MT7981 SoC
  thermal: ti-soc-thermal: Drop comma after SoC match table sentinel
  thermal/drivers/imx: Add support for loading calibration data from OCOTP
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: Don't set HFI status bit to 1</title>
<updated>2022-12-14T13:50:15+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2022-12-14T02:06:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=904f309ae7edaadc9fd0ee04be8281d7781d97e4'/>
<id>904f309ae7edaadc9fd0ee04be8281d7781d97e4</id>
<content type='text'>
When the CPU doesn't support HFI (Hardware Feedback Interface), don't
include bit 26 in the mask to prevent clearing. Otherwise this results in:
    unchecked MSR access error: WRMSR to 0x1b1
      (tried to write 0x0000000004000aa8)
      at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)

Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Tested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the CPU doesn't support HFI (Hardware Feedback Interface), don't
include bit 26 in the mask to prevent clearing. Otherwise this results in:
    unchecked MSR access error: WRMSR to 0x1b1
      (tried to write 0x0000000004000aa8)
      at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)

Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Tested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'thermal-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2022-12-12T21:45:21+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-12T21:45:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=691806e977a3a64895bd891878ed726cdbd282c0'/>
<id>691806e977a3a64895bd891878ed726cdbd282c0</id>
<content type='text'>
Pull thermal control updates from Rafael Wysocki:
 "These include thermal core fixes to protect thermal device operations
  against thermal device removal, other thermal core fixes and updates
  of Intel thermal control drivers.

  Specifics:

   - Fix race conditions related to thermal device operations that are
     not protected against thermal device removal (Guenter Roeck)

   - Fix error code in __thermal_cooling_device_register() (Dan
     Carpenter)

   - Validate new cooling device state (coming from user space) in
     cur_state_store() and reuse the max_state value from cooling device
     structure in the sysfs interface (Viresh Kumar)

   - Fix some possible name leaks in error paths in the thermal control
     core code (Yang Yingliang)

   - Detect TCC lock bit set in the intel_tcc_cooling driver and make it
     refuse to update the TCC offset in that case (Zhang Rui)

   - Add TCC cooling support for RaptorLake-S (Zhang Rui)

   - Prevent accidental clearing of HFI status by one of the other
     drivers using the same status register (Srinivas Pandruvada)

   - Protect clearing of thermal status bits in Intel thermal control
     drivers (Srinivas Pandruvada)

   - Allow the HFI thermal control driver to ACK an HFI event for the
     previously observed timestamp (Srinivas Pandruvada)

   - Remove a pointless die_id check from the HFI thermal driver and
     adjust the definition of a data structure used by it (Ricardo Neri)"

* tag 'thermal-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: hfi: Remove a pointless die_id check
  thermal: core: fix some possible name leaks in error paths
  thermal: intel: hfi: ACK HFI for the same timestamp
  thermal: intel: Protect clearing of thermal status bits
  thermal: intel: Prevent accidental clearing of HFI status
  thermal/core: Protect thermal device operations against thermal device removal
  thermal/core: Remove thermal_zone_set_trips()
  thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex
  thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex
  thermal/core: Introduce locked version of thermal_zone_device_update
  thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp
  thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp
  thermal/core: Delete device under thermal device zone lock
  thermal/core: Destroy thermal zone device mutex in release function
  thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S
  thermal: intel: intel_tcc_cooling: Detect TCC lock bit
  thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages
  thermal/core: fix error code in __thermal_cooling_device_register()
  thermal: sysfs: Reuse cdev-&gt;max_state
  thermal: Validate new state in cur_state_store()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull thermal control updates from Rafael Wysocki:
 "These include thermal core fixes to protect thermal device operations
  against thermal device removal, other thermal core fixes and updates
  of Intel thermal control drivers.

  Specifics:

   - Fix race conditions related to thermal device operations that are
     not protected against thermal device removal (Guenter Roeck)

   - Fix error code in __thermal_cooling_device_register() (Dan
     Carpenter)

   - Validate new cooling device state (coming from user space) in
     cur_state_store() and reuse the max_state value from cooling device
     structure in the sysfs interface (Viresh Kumar)

   - Fix some possible name leaks in error paths in the thermal control
     core code (Yang Yingliang)

   - Detect TCC lock bit set in the intel_tcc_cooling driver and make it
     refuse to update the TCC offset in that case (Zhang Rui)

   - Add TCC cooling support for RaptorLake-S (Zhang Rui)

   - Prevent accidental clearing of HFI status by one of the other
     drivers using the same status register (Srinivas Pandruvada)

   - Protect clearing of thermal status bits in Intel thermal control
     drivers (Srinivas Pandruvada)

   - Allow the HFI thermal control driver to ACK an HFI event for the
     previously observed timestamp (Srinivas Pandruvada)

   - Remove a pointless die_id check from the HFI thermal driver and
     adjust the definition of a data structure used by it (Ricardo Neri)"

* tag 'thermal-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: hfi: Remove a pointless die_id check
  thermal: core: fix some possible name leaks in error paths
  thermal: intel: hfi: ACK HFI for the same timestamp
  thermal: intel: Protect clearing of thermal status bits
  thermal: intel: Prevent accidental clearing of HFI status
  thermal/core: Protect thermal device operations against thermal device removal
  thermal/core: Remove thermal_zone_set_trips()
  thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex
  thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex
  thermal/core: Introduce locked version of thermal_zone_device_update
  thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp
  thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp
  thermal/core: Delete device under thermal device zone lock
  thermal/core: Destroy thermal zone device mutex in release function
  thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S
  thermal: intel: intel_tcc_cooling: Detect TCC lock bit
  thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages
  thermal/core: fix error code in __thermal_cooling_device_register()
  thermal: sysfs: Reuse cdev-&gt;max_state
  thermal: Validate new state in cur_state_store()
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Remove a pointless die_id check</title>
<updated>2022-12-02T19:47:52+00:00</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2022-11-28T16:20:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3a3073b69c76a8909374c5f9d610ea2f02ba3402'/>
<id>3a3073b69c76a8909374c5f9d610ea2f02ba3402</id>
<content type='text'>
die_id is a u16 quantity. On single-die systems the default value of
die_id is 0. No need to check for negative values.

Plus, removing this check makes Coverity happy.

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
die_id is a u16 quantity. On single-die systems the default value of
die_id is 0. No need to check for negative values.

Plus, removing this check makes Coverity happy.

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: hfi: ACK HFI for the same timestamp</title>
<updated>2022-11-23T19:13:22+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2022-11-16T23:14:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c0e3acdcdeb14099765de38224dfe0ad019c8482'/>
<id>c0e3acdcdeb14099765de38224dfe0ad019c8482</id>
<content type='text'>
Some processors issue more than one HFI interrupt with the same
timestamp. Each interrupt must be acknowledged to let the hardware issue
new HFI interrupts. But this can't be done without some additional flow
modification in the existing interrupt handling.

For background, the HFI interrupt is a package level thermal interrupt
delivered via an LVT. This LVT is common for both the CPU and package
level interrupts. Hence, all CPUs receive the HFI interrupts. But only
one CPU should process the interrupt and the others simply exit by
issuing EOI to the LAPIC.

The current HFI interrupt processing flow:

  1. Receive Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spinlock, one CPU will enter spinlock and others
     will simply return from here to issue EOI.
    (Let's assume CPU 4 is processing interrupt)
  4. Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.      ignore interrupt, unlock and return
  7. Copy the HFI message to local buffer
  8. unlock spinlock
  9. ACK HFI interrupt
 10. Queue the message for processing in a work-queue

It is tempting to simply acknowledge all the interrupts even if they
have the same timestamp. This may cause some interrupts to not be
processed.

Let's say CPU5 is slightly late and reaches step 4 while CPU4 is
between steps 8 and 9.

Currently we simply ignore interrupts with the same timestamp. No
issue here for CPU5. When CPU4 acknowledges the interrupt, the next
HFI interrupt can be delivered.

If we acknowledge interrupts with the same timestamp (at step 6), there
is a race condition. Under the same scenario, CPU 5 will acknowledge
the HFI interrupt. This lets hardware generate another HFI interrupt,
before CPU 4 starts executing step 9. Once CPU 4 completes step 9, it
will acknowledge the newly arrived HFI interrupt, without actually
processing it.

Acknowledge the interrupt when holding the spinlock. This avoids
contention of the interrupt acknowledgment.

Updated flow:

  1. Receive HFI Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spin-lock
     Let's assume CPU 4 is processing interrupt
  4.1 Read MSR_IA32_PACKAGE_THERM_STATUS and check HFI status bit
  4.2	If hfi status is 0
  4.3		unlock spinlock
  4.4		return
  4.5 Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.1      ACK HFI Interrupt,
  6.2	unlock spinlock
  6.3	return
  7. Copy the HFI message to local buffer
  8. ACK HFI interrupt
  9. unlock spinlock
 10. Queue the message for processing in a work-queue

To avoid taking the lock unnecessarily, intel_hfi_process_event() checks
the status of the HFI interrupt before taking the lock. If CPU5 is late,
when it starts processing the interrupt there are two scenarios:

 a) CPU4 acknowledged the HFI interrupt before CPU5 read
    MSR_IA32_THERM_STATUS. CPU5 exits.

 b) CPU5 reads MSR_IA32_THERM_STATUS before CPU4 has acknowledged the
    interrupt. CPU5 will take the lock if CPU4 has released it. It then
    re-reads MSR_IA32_THERM_STATUS. If there is not a new interrupt,
    the HFI status bit is clear and CPU5 exits. If a new HFI interrupt
    was generated it will find that the status bit is set and it will
    continue to process the interrupt. In this case even if timestamp
    is not changed, the ACK can be issued as this is a new interrupt.
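
The updated flow can be modeled as a small Python sketch. HfiModel,
process_event() and the returned strings are hypothetical names used
for illustration only; the real handler is intel_hfi_process_event(),
which works on MSRs and a spinlock that are merely simulated here.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class HfiModel:
    status_bit: bool = False    # simulated HFI status bit
    hw_timestamp: int = 0       # timestamp in the HFI memory table
    seen_timestamp: int = 0     # last timestamp that was processed
    lock: object = field(default_factory=threading.Lock)
    queued: list = field(default_factory=list)


def process_event(hfi):
    if not hfi.status_bit:                     # step 2: cheap check, no lock
        return "no-hfi"
    if not hfi.lock.acquire(blocking=False):   # step 3: only one CPU enters
        return "busy"
    try:
        if not hfi.status_bit:                 # steps 4.1 to 4.4: re-check
            return "already-acked"
        if hfi.hw_timestamp == hfi.seen_timestamp:   # steps 4.5 and 5
            hfi.status_bit = False             # step 6.1: ACK under the lock
            return "acked-duplicate"
        message = hfi.hw_timestamp             # step 7: copy the message
        hfi.seen_timestamp = message
        hfi.status_bit = False                 # step 8: ACK under the lock
    finally:
        hfi.lock.release()                     # step 9 (also 4.3 and 6.2)
    hfi.queued.append(message)                 # step 10: queue for later work
    return "queued"


hfi = HfiModel(status_bit=True, hw_timestamp=1)
print(process_event(hfi))
```

Because every ACK happens while the lock is held, a late CPU cannot
acknowledge an interrupt that another CPU is still in the middle of
processing.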

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Reviewed-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Tested-by: Arshad, Adeel &lt;adeel.arshad@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some processors issue more than one HFI interrupt with the same
timestamp. Each interrupt must be acknowledged to let the hardware issue
new HFI interrupts. But this can't be done without some additional flow
modification in the existing interrupt handling.

For background, the HFI interrupt is a package level thermal interrupt
delivered via an LVT. This LVT is common for both the CPU and package
level interrupts. Hence, all CPUs receive the HFI interrupts. But only
one CPU should process the interrupt and the others simply exit by
issuing EOI to the LAPIC.

The current HFI interrupt processing flow:

  1. Receive Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spinlock, one CPU will enter spinlock and others
     will simply return from here to issue EOI.
    (Let's assume CPU 4 is processing interrupt)
  4. Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.      ignore interrupt, unlock and return
  7. Copy the HFI message to local buffer
  8. unlock spinlock
  9. ACK HFI interrupt
 10. Queue the message for processing in a work-queue

It is tempting to simply acknowledge all the interrupts even if they
have the same timestamp. This may cause some interrupts to not be
processed.

Let's say CPU5 is slightly late and reaches step 4 while CPU4 is
between steps 8 and 9.

Currently we simply ignore interrupts with the same timestamp. No
issue here for CPU5. When CPU4 acknowledges the interrupt, the next
HFI interrupt can be delivered.

If we acknowledge interrupts with the same timestamp (at step 6), there
is a race condition. Under the same scenario, CPU 5 will acknowledge
the HFI interrupt. This lets hardware generate another HFI interrupt,
before CPU 4 starts executing step 9. Once CPU 4 completes step 9, it
will acknowledge the newly arrived HFI interrupt, without actually
processing it.

Acknowledge the interrupt when holding the spinlock. This avoids
contention of the interrupt acknowledgment.

Updated flow:

  1. Receive HFI Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spin-lock
     Let's assume CPU 4 is processing interrupt
  4.1 Read MSR_IA32_PACKAGE_THERM_STATUS and check HFI status bit
  4.2	If hfi status is 0
  4.3		unlock spinlock
  4.4		return
  4.5 Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.1      ACK HFI Interrupt,
  6.2	unlock spinlock
  6.3	return
  7. Copy the HFI message to local buffer
  8. ACK HFI interrupt
  9. unlock spinlock
 10. Queue the message for processing in a work-queue

To avoid taking the lock unnecessarily, intel_hfi_process_event() checks
the status of the HFI interrupt before taking the lock. If CPU5 is late,
when it starts processing the interrupt there are two scenarios:

 a) CPU4 acknowledged the HFI interrupt before CPU5 read
    MSR_IA32_THERM_STATUS. CPU5 exits.

 b) CPU5 reads MSR_IA32_THERM_STATUS before CPU4 has acknowledged the
    interrupt. CPU5 will take the lock if CPU4 has released it. It then
    re-reads MSR_IA32_THERM_STATUS. If there is not a new interrupt,
    the HFI status bit is clear and CPU5 exits. If a new HFI interrupt
    was generated it will find that the status bit is set and it will
    continue to process the interrupt. In this case even if timestamp
    is not changed, the ACK can be issued as this is a new interrupt.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Reviewed-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Tested-by: Arshad, Adeel &lt;adeel.arshad@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: Protect clearing of thermal status bits</title>
<updated>2022-11-23T19:09:06+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2022-11-16T02:54:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=930d06bf071aa746db11d68d2d75660b449deff3'/>
<id>930d06bf071aa746db11d68d2d75660b449deff3</id>
<content type='text'>
The clearing of the package thermal status is done by Read-Modify-Write
operation. This may result in clearing of some new status bits which are
being or about to be processed.

For example, while the HFI status is being cleared, the hardware may set
a new thermal status bit after the thermal status register has been
read. The write-back then writes 0 to the newly set bit and clears it.
So read-modify-write is not safe.

Since software can only clear the read-write thermal status bits (set
them to 0), not set them to 1, it is safe to write 1 to every bit that
is not being cleared.
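
As a rough illustration only (not the driver code), the difference can be
modelled in Python with sets of bit numbers. The names WRITABLE_BITS,
hw_write(), clear_with_rmw() and clear_with_ones(), and the choice of
bit 1 as the newly set bit, are assumptions made for this sketch. The only
hardware rule modelled is the one stated above: writing 0 clears a
read-write bit and writing 1 leaves it unchanged.

```python
# Hypothetical model: bit numbers of the read-write status bits, taken from
# the default mask 0x4000aaa quoted in this message (bit 26 is HFI status).
WRITABLE_BITS = frozenset({1, 3, 5, 7, 9, 11, 26})


def hw_write(status, written_ones):
    """Apply a write: writable bits written as 0 are cleared, 1s are ignored."""
    cleared = WRITABLE_BITS - written_ones
    return status - cleared


def clear_with_rmw(snapshot, status_at_write, bits_to_clear):
    """Read-modify-write: write back the stale snapshot minus the cleared bits."""
    return hw_write(status_at_write, snapshot - bits_to_clear)


def clear_with_ones(status_at_write, bits_to_clear):
    """Write 1 to every writable bit that is not being cleared."""
    return hw_write(status_at_write, WRITABLE_BITS - bits_to_clear)


# Software read the status while only the HFI bit (26) was set, then the
# hardware set bit 1 before the write reached the register.
snapshot = frozenset({26})
status_at_write = frozenset({1, 26})

after_rmw = clear_with_rmw(snapshot, status_at_write, frozenset({26}))
after_ones = clear_with_ones(status_at_write, frozenset({26}))

print(sorted(after_rmw))   # bit 1 was lost together with bit 26
print(sorted(after_ones))  # bit 1 survives
```

In this model the read-modify-write path loses the new bit because the
stale snapshot holds 0 for it, while the write of ones touches only the
bit that is meant to be cleared.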

Create a common interface for clearing package thermal status bits. Use
this interface to replace existing code to clear thermal package status
bits.

It is safe to call this interface from different CPUs without protection,
as there is no read-modify-write. Also, wrmsrl results in just a single
instruction. For example, suppose CPU 0 and CPU 3 are clearing bits 1 and
3 respectively. If CPU 3 wins the race, it will write 0x4000aa2, then
CPU 0 will write 0x4000aa8. The bits that are not being cleared are set
to 1. The default mask of bits that can be written here is 0x4000aaa.
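
The values in this example can be checked with a short Python sketch. The
helper names to_value() and clear_value() are hypothetical, and the bit
numbers are simply the ones set in the quoted default mask; this is not
the kernel interface itself.

```python
# Bit numbers that make up the default writable mask quoted above (0x4000aaa).
WRITABLE_BITS = frozenset({1, 3, 5, 7, 9, 11, 26})


def to_value(bits):
    """Convert a set of bit numbers to the integer with those bits set."""
    return sum(2 ** b for b in bits)


def clear_value(bits_to_clear):
    """Value to write: 0 in the bits to clear, 1 in all other writable bits."""
    return to_value(WRITABLE_BITS - frozenset(bits_to_clear))


print(hex(to_value(WRITABLE_BITS)))  # 0x4000aaa
print(hex(clear_value({3})))         # 0x4000aa2, clears bit 3
print(hex(clear_value({1})))         # 0x4000aa8, clears bit 1
```

Neither value depends on the current register contents, which is why the
order of the two writes does not matter.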

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Reviewed-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The package thermal status is cleared with a read-modify-write
operation. This may clear new status bits that are being processed or
are about to be processed.

For example, while the HFI status is being cleared, the hardware may set
a new thermal status bit after the thermal status register has been
read. The write-back then writes 0 to the newly set bit and clears it.
So read-modify-write is not safe.

Since software can only clear the read-write thermal status bits (set
them to 0), not set them to 1, it is safe to write 1 to every bit that
is not being cleared.

Create a common interface for clearing package thermal status bits. Use
this interface to replace existing code to clear thermal package status
bits.

It is safe to call this interface from different CPUs without protection,
as there is no read-modify-write. Also, wrmsrl results in just a single
instruction. For example, suppose CPU 0 and CPU 3 are clearing bits 1 and
3 respectively. If CPU 3 wins the race, it will write 0x4000aa2, then
CPU 0 will write 0x4000aa8. The bits that are not being cleared are set
to 1. The default mask of bits that can be written here is 0x4000aaa.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Reviewed-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: Prevent accidental clearing of HFI status</title>
<updated>2022-11-23T19:09:06+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2022-11-16T02:54:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6fe1e64b60269aa58fa00568807738025ae3bd05'/>
<id>6fe1e64b60269aa58fa00568807738025ae3bd05</id>
<content type='text'>
When there is a package thermal interrupt with a PROCHOT log, it is
processed and cleared. There may also be an active HFI event status
that is being processed or is about to be processed. Clearing the
PROCHOT log bit will also clear the HFI status bit. This means that the
hardware is free to update the HFI memory.

When clearing a package thermal interrupt, some processors will generate
a "general protection fault" when any read-only bit is set to 1.

The driver maintains a mask of all read-write bits which can be set.

This mask doesn't include the HFI status bit, so that bit is assumed to
be read-only and is also cleared. So, add HFI status bit 26 to the
mask.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Reviewed-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When there is a package thermal interrupt with a PROCHOT log, it is
processed and cleared. There may also be an active HFI event status
that is being processed or is about to be processed. Clearing the
PROCHOT log bit will also clear the HFI status bit. This means that the
hardware is free to update the HFI memory.

When clearing a package thermal interrupt, some processors will generate
a "general protection fault" when any read-only bit is set to 1.

The driver maintains a mask of all read-write bits which can be set.

This mask doesn't include the HFI status bit, so that bit is assumed to
be read-only and is also cleared. So, add HFI status bit 26 to the
mask.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Reviewed-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
