<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/thermal, branch v6.3</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>thermal: intel: Avoid updating unsupported THERM_STATUS_CLEAR mask bits</title>
<updated>2023-04-11T16:12:19+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2023-04-10T17:35:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=117e4e5bd9d47b89777dbf6b37a709dcfe59520f'/>
<id>117e4e5bd9d47b89777dbf6b37a709dcfe59520f</id>
<content type='text'>
Some older processors don't allow BIT(13) and BIT(15) in the current
mask set by "THERM_STATUS_CLEAR_CORE_MASK". This results in:

unchecked MSR access error: WRMSR to 0x19c (tried to
write 0x000000000000aaa8) at rIP: 0xffffffff816f66a6
(throttle_active_work+0xa6/0x1d0)

To avoid unchecked MSR issues, check CPUID for each relevant feature and
use that information to set the supported feature bits only in the
"clear" mask for cores. Do the same for the analogous package mask set
by "THERM_STATUS_CLEAR_PKG_MASK".

Introduce functions thermal_intr_init_core_clear_mask() and
thermal_intr_init_pkg_clear_mask() to set core and package mask bits,
respectively. These functions are called during initialization.

Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
Reported-by: Rui Salvaterra &lt;rsalvaterra@gmail.com&gt;
Link: https://lore.kernel.org/lkml/cdf43fb423368ee3994124a9e8c9b4f8d00712c6.camel@linux.intel.com/T/
Tested-by: Rui Salvaterra &lt;rsalvaterra@gmail.com&gt;
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: 6.2+ &lt;stable@kernel.org&gt; # 6.2+
[ rjw: Renamed 2 funtions and 2 static variables, edited subject and
  changelog ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some older processors don't allow BIT(13) and BIT(15) in the current
mask set by "THERM_STATUS_CLEAR_CORE_MASK". This results in:

unchecked MSR access error: WRMSR to 0x19c (tried to
write 0x000000000000aaa8) at rIP: 0xffffffff816f66a6
(throttle_active_work+0xa6/0x1d0)

To avoid unchecked MSR issues, check CPUID for each relevant feature and
use that information to set the supported feature bits only in the
"clear" mask for cores. Do the same for the analogous package mask set
by "THERM_STATUS_CLEAR_PKG_MASK".

Introduce functions thermal_intr_init_core_clear_mask() and
thermal_intr_init_pkg_clear_mask() to set core and package mask bits,
respectively. These functions are called during initialization.

Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
Reported-by: Rui Salvaterra &lt;rsalvaterra@gmail.com&gt;
Link: https://lore.kernel.org/lkml/cdf43fb423368ee3994124a9e8c9b4f8d00712c6.camel@linux.intel.com/T/
Tested-by: Rui Salvaterra &lt;rsalvaterra@gmail.com&gt;
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: 6.2+ &lt;stable@kernel.org&gt; # 6.2+
[ rjw: Renamed 2 funtions and 2 static variables, edited subject and
  changelog ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'thermal-intel-fixes'</title>
<updated>2023-03-31T10:02:46+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-03-31T10:02:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=896c5150edfd5c01ed7abfcf02612f4aac6296b3'/>
<id>896c5150edfd5c01ed7abfcf02612f4aac6296b3</id>
<content type='text'>
Merge Intel thermal driver fixes for 6.3-rc5:

 - Fix handling of two recently added module parameters in the Intel
   powerclamp thermal driver (David Arcari).

 - Fix one more deadlock in the int340x thermal driver (Srinivas
   Pandruvada).

* thermal-intel-fixes:
  thermal: intel: powerclamp: Fix cpumask and max_idle module parameters
  thermal: intel: int340x: processor_thermal: Fix additional deadlock
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge Intel thermal driver fixes for 6.3-rc5:

 - Fix handling of two recently added module parameters in the Intel
   powerclamp thermal driver (David Arcari).

 - Fix one more deadlock in the int340x thermal driver (Srinivas
   Pandruvada).

* thermal-intel-fixes:
  thermal: intel: powerclamp: Fix cpumask and max_idle module parameters
  thermal: intel: int340x: processor_thermal: Fix additional deadlock
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: powerclamp: Fix cpumask and max_idle module parameters</title>
<updated>2023-03-30T18:04:29+00:00</updated>
<author>
<name>David Arcari</name>
<email>darcari@redhat.com</email>
</author>
<published>2023-03-30T13:42:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ae817e618d4b5d221daae34d32a39476e4bdcb36'/>
<id>ae817e618d4b5d221daae34d32a39476e4bdcb36</id>
<content type='text'>
When cpumask is specified as a module parameter the value is
overwritten by the module init routine.  This can easily be fixed
by checking to see if the mask has already been allocated in the
init routine.

When max_idle is specified as a module parameter a panic will occur.
The problem is that the idle_injection_cpu_mask is not allocated until
the module init routine executes. This can easily be fixed by allocating
the cpumask if it's not already allocated.

Fixes: ebf519710218 ("thermal: intel: powerclamp: Add two module parameters")
Signed-off-by: David Arcari &lt;darcari@redhat.com&gt;
Reviewed-by: Srinivas Pandruvada&lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When cpumask is specified as a module parameter the value is
overwritten by the module init routine.  This can easily be fixed
by checking to see if the mask has already been allocated in the
init routine.

When max_idle is specified as a module parameter a panic will occur.
The problem is that the idle_injection_cpu_mask is not allocated until
the module init routine executes. This can easily be fixed by allocating
the cpumask if it's not already allocated.

Fixes: ebf519710218 ("thermal: intel: powerclamp: Add two module parameters")
Signed-off-by: David Arcari &lt;darcari@redhat.com&gt;
Reviewed-by: Srinivas Pandruvada&lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: int340x: processor_thermal: Fix additional deadlock</title>
<updated>2023-03-29T18:36:35+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2023-03-29T15:22:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a57cc2dbb3738930d9cb361b9b473f90c8ede0b8'/>
<id>a57cc2dbb3738930d9cb361b9b473f90c8ede0b8</id>
<content type='text'>
Commit 52f04f10b900 ("thermal: intel: int340x: processor_thermal: Fix
deadlock") addressed deadlock issue during user space trip update. But it
missed a case when thermal zone device is disabled when user writes 0.

Call to thermal_zone_device_disable() also causes deadlock as it also
tries to lock tz-&gt;lock, which is already claimed by trip_point_temp_store()
in the thermal core code.

Remove call to thermal_zone_device_disable() in the function
sys_set_trip_temp(), which is called from trip_point_temp_store().

Fixes: 52f04f10b900 ("thermal: intel: int340x: processor_thermal: Fix deadlock")
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: 6.2+ &lt;stable@vger.kernel.org&gt; # 6.2+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 52f04f10b900 ("thermal: intel: int340x: processor_thermal: Fix
deadlock") addressed deadlock issue during user space trip update. But it
missed a case when thermal zone device is disabled when user writes 0.

Call to thermal_zone_device_disable() also causes deadlock as it also
tries to lock tz-&gt;lock, which is already claimed by trip_point_temp_store()
in the thermal core code.

Remove call to thermal_zone_device_disable() in the function
sys_set_trip_temp(), which is called from trip_point_temp_store().

Fixes: 52f04f10b900 ("thermal: intel: int340x: processor_thermal: Fix deadlock")
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: 6.2+ &lt;stable@vger.kernel.org&gt; # 6.2+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: core: Drop excessive lockdep_assert_held() calls</title>
<updated>2023-03-28T18:49:47+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-03-28T18:43:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b57841fb0b564c61508222e885ac8f30a2811089'/>
<id>b57841fb0b564c61508222e885ac8f30a2811089</id>
<content type='text'>
The lockdep_assert_held() calls added to cooling_device_stats_setup()
and cooling_device_stats_destroy() by commit 790930f44289 ("thermal:
core: Introduce thermal_cooling_device_update()") trigger false-positive
lockdep reports in code paths that are not subject to race conditions
(before cooling device registration and after cooling device removal).

For this reason, remove the lockdep_assert_held() calls from both
cooling_device_stats_setup() and cooling_device_stats_destroy() and
add one to thermal_cooling_device_stats_reinit() that has to be called
under the cdev lock.

Fixes: 790930f44289 ("thermal: core: Introduce thermal_cooling_device_update()")
Link: https://lore.kernel.org/linux-acpi/ZCIDTLFt27Ei7+V6@ideak-desk.fi.intel.com
Reported-by: Imre Deak &lt;imre.deak@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The lockdep_assert_held() calls added to cooling_device_stats_setup()
and cooling_device_stats_destroy() by commit 790930f44289 ("thermal:
core: Introduce thermal_cooling_device_update()") trigger false-positive
lockdep reports in code paths that are not subject to race conditions
(before cooling device registration and after cooling device removal).

For this reason, remove the lockdep_assert_held() calls from both
cooling_device_stats_setup() and cooling_device_stats_destroy() and
add one to thermal_cooling_device_stats_reinit() that has to be called
under the cdev lock.

Fixes: 790930f44289 ("thermal: core: Introduce thermal_cooling_device_update()")
Link: https://lore.kernel.org/linux-acpi/ZCIDTLFt27Ei7+V6@ideak-desk.fi.intel.com
Reported-by: Imre Deak &lt;imre.deak@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'thermal-acpi'</title>
<updated>2023-03-24T16:11:27+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-03-24T16:11:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6babf38d894bec696761c10fbfccafceae76f4eb'/>
<id>6babf38d894bec696761c10fbfccafceae76f4eb</id>
<content type='text'>
Merge a fix for a recent thermal-related regression in the ACPI
processor driver.

* thermal-acpi:
  ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  thermal: core: Introduce thermal_cooling_device_update()
  thermal: core: Introduce thermal_cooling_device_present()
  ACPI: processor: Reorder acpi_processor_driver_init()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge a fix for a recent thermal-related regression in the ACPI
processor driver.

* thermal-acpi:
  ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  thermal: core: Introduce thermal_cooling_device_update()
  thermal: core: Introduce thermal_cooling_device_present()
  ACPI: processor: Reorder acpi_processor_driver_init()
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: core: Restore behavior regarding invalid trip points</title>
<updated>2023-03-22T18:59:08+00:00</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2023-03-14T15:50:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f1b80a3878b2d76ced46a275fdfd7fb80b4f083b'/>
<id>f1b80a3878b2d76ced46a275fdfd7fb80b4f083b</id>
<content type='text'>
Commit 7c3d5c20dc16 ("thermal/core: Add a generic thermal_zone_get_trip()
function") stopped marking trip points with a zero temperature as
disabled, behavior that was originally introduced in commit 81ad4276b505
("Thermal: Ignore invalid trip points").

When using the mlxsw driver we see that when such trip points are not
disabled, the thermal subsystem repeatedly tries to set the state of the
associated cooling devices to the maximum state.

Address this by restoring the original behavior and mark trip points
with a zero temperature as disabled.

Fixes: 7c3d5c20dc16 ("thermal/core: Add a generic thermal_zone_get_trip() function")
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 7c3d5c20dc16 ("thermal/core: Add a generic thermal_zone_get_trip()
function") stopped marking trip points with a zero temperature as
disabled, behavior that was originally introduced in commit 81ad4276b505
("Thermal: Ignore invalid trip points").

When using the mlxsw driver we see that when such trip points are not
disabled, the thermal subsystem repeatedly tries to set the state of the
associated cooling devices to the maximum state.

Address this by restoring the original behavior and mark trip points
with a zero temperature as disabled.

Fixes: 7c3d5c20dc16 ("thermal/core: Add a generic thermal_zone_get_trip() function")
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: core: Introduce thermal_cooling_device_update()</title>
<updated>2023-03-22T14:20:38+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-03-17T17:01:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=790930f44289c8209c57461b2db499fcc702e0b3'/>
<id>790930f44289c8209c57461b2db499fcc702e0b3</id>
<content type='text'>
Introduce a core thermal API function, thermal_cooling_device_update(),
for updating the max_state value for a cooling device and rearranging
its statistics in sysfs after a possible change of its -&gt;get_max_state()
callback return value.

That callback is now invoked only once, during cooling device
registration, to populate the max_state field in the cooling device
object, so if its return value changes, it needs to be invoked again
and the new return value needs to be stored as max_state.  Moreover,
the statistics presented in sysfs need to be rearranged in general,
because there may not be enough room in them to store data for all
of the possible states (in the case when max_state grows).

The new function takes care of that (and some other minor things
related to it), but some extra locking and lockdep annotations are
added in several places too to protect against crashes in the cases
when the statistics are not present or when a stale max_state value
might be used by sysfs attributes.

Note that the actual user of the new function will be added separately.

Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Tested-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Reviewed-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce a core thermal API function, thermal_cooling_device_update(),
for updating the max_state value for a cooling device and rearranging
its statistics in sysfs after a possible change of its -&gt;get_max_state()
callback return value.

That callback is now invoked only once, during cooling device
registration, to populate the max_state field in the cooling device
object, so if its return value changes, it needs to be invoked again
and the new return value needs to be stored as max_state.  Moreover,
the statistics presented in sysfs need to be rearranged in general,
because there may not be enough room in them to store data for all
of the possible states (in the case when max_state grows).

The new function takes care of that (and some other minor things
related to it), but some extra locking and lockdep annotations are
added in several places too to protect against crashes in the cases
when the statistics are not present or when a stale max_state value
might be used by sysfs attributes.

Note that the actual user of the new function will be added separately.

Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Tested-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Reviewed-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: core: Introduce thermal_cooling_device_present()</title>
<updated>2023-03-22T14:20:38+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-03-17T16:54:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c43198af05cffa5de8d4f356c40ce4bdca066272'/>
<id>c43198af05cffa5de8d4f356c40ce4bdca066272</id>
<content type='text'>
Introduce a helper function, thermal_cooling_device_present(), for
checking if the given cooling device is in the list of registered
cooling devices to avoid some code duplication in a subsequent
patch.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Tested-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Reviewed-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce a helper function, thermal_cooling_device_present(), for
checking if the given cooling device is in the list of registered
cooling devices to avoid some code duplication in a subsequent
patch.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Tested-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Reviewed-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: int340x: processor_thermal: Fix deadlock</title>
<updated>2023-03-03T19:34:49+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@intel.com</email>
</author>
<published>2023-03-03T16:19:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=52f04f10b9005ac4ce640da14a52ed7a146432fa'/>
<id>52f04f10b9005ac4ce640da14a52ed7a146432fa</id>
<content type='text'>
When user space updates the trip point there is a deadlock, which results
in caller gets blocked forever.

Commit 05eeee2b51b4 ("thermal/core: Protect sysfs accesses to thermal
operations with thermal zone mutex"), added a mutex for tz-&gt;lock in the
function trip_point_temp_store(). Hence, trip set callback() can't
call any thermal zone API as they are protected with the same mutex lock.

The callback here calling thermal_zone_device_enable(), which will result
in deadlock.

Move the thermal_zone_device_enable() to proc_thermal_pci_probe() to
avoid this deadlock.

Fixes: 05eeee2b51b4 ("thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex")
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@intel.com&gt;
Cc: 6.2+ &lt;stable@vger.kernel.org&gt; # 6.2+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When user space updates the trip point there is a deadlock, which results
in caller gets blocked forever.

Commit 05eeee2b51b4 ("thermal/core: Protect sysfs accesses to thermal
operations with thermal zone mutex"), added a mutex for tz-&gt;lock in the
function trip_point_temp_store(). Hence, trip set callback() can't
call any thermal zone API as they are protected with the same mutex lock.

The callback here calling thermal_zone_device_enable(), which will result
in deadlock.

Move the thermal_zone_device_enable() to proc_thermal_pci_probe() to
avoid this deadlock.

Fixes: 05eeee2b51b4 ("thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex")
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@intel.com&gt;
Cc: 6.2+ &lt;stable@vger.kernel.org&gt; # 6.2+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
