<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/thermal, branch v6.9.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>thermal/debugfs: Prevent use-after-free from occurring after cdev removal</title>
<updated>2024-04-26T12:57:50+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2024-04-26T09:10:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d351eb0ab04c3e8109895fc33250cebbce9c11da'/>
<id>d351eb0ab04c3e8109895fc33250cebbce9c11da</id>
<content type='text'>
Since thermal_debug_cdev_remove() does not run under cdev-&gt;lock, it can
run in parallel with thermal_debug_cdev_state_update() and it may free
the struct thermal_debugfs object used by the latter after it has been
checked against NULL.

If that happens, thermal_debug_cdev_state_update() will access memory
that has been freed already causing the kernel to crash.

Address this by using cdev-&gt;lock in thermal_debug_cdev_remove() around
the cdev-&gt;debugfs value check (in case the same cdev is removed at the
same time in two different threads) and its reset to NULL.

Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
Cc :6.8+ &lt;stable@vger.kernel.org&gt; # 6.8+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since thermal_debug_cdev_remove() does not run under cdev-&gt;lock, it can
run in parallel with thermal_debug_cdev_state_update() and it may free
the struct thermal_debugfs object used by the latter after it has been
checked against NULL.

If that happens, thermal_debug_cdev_state_update() will access memory
that has been freed already causing the kernel to crash.

Address this by using cdev-&gt;lock in thermal_debug_cdev_remove() around
the cdev-&gt;debugfs value check (in case the same cdev is removed at the
same time in two different threads) and its reset to NULL.

Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
Cc :6.8+ &lt;stable@vger.kernel.org&gt; # 6.8+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/debugfs: Fix two locking issues with thermal zone debug</title>
<updated>2024-04-26T09:11:30+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2024-04-25T18:00:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c7f7c37271787a7f77d7eedc132b0b419a76b4c8'/>
<id>c7f7c37271787a7f77d7eedc132b0b419a76b4c8</id>
<content type='text'>
With the current thermal zone locking arrangement in the debugfs code,
user space can open the "mitigations" file for a thermal zone before
the zone's debugfs pointer is set which will result in a NULL pointer
dereference in tze_seq_start().

Moreover, thermal_debug_tz_remove() is not called under the thermal
zone lock, so it can run in parallel with the other functions accessing
the thermal zone's struct thermal_debugfs object.  Then, it may clear
tz-&gt;debugfs after one of those functions has checked it and the
struct thermal_debugfs object may be freed prematurely.

To address the first problem, pass a pointer to the thermal zone's
struct thermal_debugfs object to debugfs_create_file() in
thermal_debug_tz_add() and make tze_seq_start(), tze_seq_next(),
tze_seq_stop(), and tze_seq_show() retrieve it from s-&gt;private
instead of a pointer to the thermal zone object.  This will ensure
that tz_debugfs will be valid across the "mitigations" file accesses
until thermal_debugfs_remove_id() called by thermal_debug_tz_remove()
removes that file.

To address the second problem, use tz-&gt;lock in thermal_debug_tz_remove()
around the tz-&gt;debugfs value check (in case the same thermal zone is
removed at the same time in two different threads) and its reset to NULL.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ &lt;stable@vger.kernel.org&gt; # 6.8+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the current thermal zone locking arrangement in the debugfs code,
user space can open the "mitigations" file for a thermal zone before
the zone's debugfs pointer is set which will result in a NULL pointer
dereference in tze_seq_start().

Moreover, thermal_debug_tz_remove() is not called under the thermal
zone lock, so it can run in parallel with the other functions accessing
the thermal zone's struct thermal_debugfs object.  Then, it may clear
tz-&gt;debugfs after one of those functions has checked it and the
struct thermal_debugfs object may be freed prematurely.

To address the first problem, pass a pointer to the thermal zone's
struct thermal_debugfs object to debugfs_create_file() in
thermal_debug_tz_add() and make tze_seq_start(), tze_seq_next(),
tze_seq_stop(), and tze_seq_show() retrieve it from s-&gt;private
instead of a pointer to the thermal zone object.  This will ensure
that tz_debugfs will be valid across the "mitigations" file accesses
until thermal_debugfs_remove_id() called by thermal_debug_tz_remove()
removes that file.

To address the second problem, use tz-&gt;lock in thermal_debug_tz_remove()
around the tz-&gt;debugfs value check (in case the same thermal zone is
removed at the same time in two different threads) and its reset to NULL.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ &lt;stable@vger.kernel.org&gt; # 6.8+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/debugfs: Free all thermal zone debug memory on zone removal</title>
<updated>2024-04-26T09:11:00+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2024-04-25T17:52:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=72c1afffa4c645fe0e0f1c03e5f34395ed65b5f4'/>
<id>72c1afffa4c645fe0e0f1c03e5f34395ed65b5f4</id>
<content type='text'>
Because thermal_debug_tz_remove() does not free all memory allocated for
thermal zone diagnostics, some of that memory becomes unreachable after
freeing the thermal zone's struct thermal_debugfs object.

Address this by making thermal_debug_tz_remove() free all of the memory
in question.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ &lt;stable@vger.kernel.org&gt; # 6.8+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Because thermal_debug_tz_remove() does not free all memory allocated for
thermal zone diagnostics, some of that memory becomes unreachable after
freeing the thermal zone's struct thermal_debugfs object.

Address this by making thermal_debug_tz_remove() free all of the memory
in question.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ &lt;stable@vger.kernel.org&gt; # 6.8+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/debugfs: Add missing count increment to thermal_debug_tz_trip_up()</title>
<updated>2024-04-19T13:08:19+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2024-04-15T19:02:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b552f63cd43735048bbe9bfbb7a9dcfce166fbdd'/>
<id>b552f63cd43735048bbe9bfbb7a9dcfce166fbdd</id>
<content type='text'>
The count field in struct trip_stats, representing the number of times
the zone temperature was above the trip point, needs to be incremented
in thermal_debug_tz_trip_up(), for two reasons.

First, if a trip point is crossed on the way up for the first time,
thermal_debug_update_temp() called from update_temperature() does
not see it because it has not been added to trips_crossed[] array
in the thermal zone's struct tz_debugfs object yet.  Therefore, when
thermal_debug_tz_trip_up() is called after that, the trip point's
count value is 0, and the attempt to divide by it during the average
temperature computation leads to a divide error which causes the kernel
to crash.  Setting the count to 1 before the division by incrementing it
fixes this problem.

Second, if a trip point is crossed on the way up, but it has been
crossed on the way up already before, its count value needs to be
incremented to make a record of the fact that the zone temperature is
above the trip now.  Without doing that, if the mitigations applied
after crossing the trip cause the zone temperature to drop below its
threshold, the count will not be updated for this episode at all and
the average temperature in the trip statistics record will be somewhat
higher than it should be.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ &lt;stable@vger.kernel.org&gt; # 6.8+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The count field in struct trip_stats, representing the number of times
the zone temperature was above the trip point, needs to be incremented
in thermal_debug_tz_trip_up(), for two reasons.

First, if a trip point is crossed on the way up for the first time,
thermal_debug_update_temp() called from update_temperature() does
not see it because it has not been added to trips_crossed[] array
in the thermal zone's struct tz_debugfs object yet.  Therefore, when
thermal_debug_tz_trip_up() is called after that, the trip point's
count value is 0, and the attempt to divide by it during the average
temperature computation leads to a divide error which causes the kernel
to crash.  Setting the count to 1 before the division by incrementing it
fixes this problem.

Second, if a trip point is crossed on the way up, but it has been
crossed on the way up already before, its count value needs to be
incremented to make a record of the fact that the zone temperature is
above the trip now.  Without doing that, if the mitigations applied
after crossing the trip cause the zone temperature to drop below its
threshold, the count will not be updated for this episode at all and
the average temperature in the trip statistics record will be somewhat
higher than it should be.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ &lt;stable@vger.kernel.org&gt; # 6.8+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: gov_power_allocator: Allow binding without trip points</title>
<updated>2024-04-03T14:32:15+00:00</updated>
<author>
<name>Nikita Travkin</name>
<email>nikita@trvn.ru</email>
</author>
<published>2024-04-03T11:31:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=da781936e7c301e6197eb6513775748e79fb2575'/>
<id>da781936e7c301e6197eb6513775748e79fb2575</id>
<content type='text'>
IPA probe function was recently refactored to perform extra error checks
and make sure the thermal zone has trip points necessary for the IPA
operation. With this change, if a thermal zone is probed such that it
has no trip points that IPA can use, IPA will fail and the TZ won't be
created. This is the case if a platform defines a TZ without cooling
devices and only with "hot"/"critical" trip points, often found on some
Qualcomm devices [1].

Documentation across IPA code (notably get_governor_trips() kerneldoc)
suggests that IPA is supposed to handle such TZ even if it won't
actually do anything.

This commit partially reverts the previous change to allow IPA to bind
to such "empty" thermal zones.

Fixes: e83747c2f8e3 ("thermal: gov_power_allocator: Set up trip points earlier")
Link: arch/arm64/boot/dts/qcom/sc7180.dtsi#n4776 # [1]
Signed-off-by: Nikita Travkin &lt;nikita@trvn.ru&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
IPA probe function was recently refactored to perform extra error checks
and make sure the thermal zone has trip points necessary for the IPA
operation. With this change, if a thermal zone is probed such that it
has no trip points that IPA can use, IPA will fail and the TZ won't be
created. This is the case if a platform defines a TZ without cooling
devices and only with "hot"/"critical" trip points, often found on some
Qualcomm devices [1].

Documentation across IPA code (notably get_governor_trips() kerneldoc)
suggests that IPA is supposed to handle such TZ even if it won't
actually do anything.

This commit partially reverts the previous change to allow IPA to bind
to such "empty" thermal zones.

Fixes: e83747c2f8e3 ("thermal: gov_power_allocator: Set up trip points earlier")
Link: arch/arm64/boot/dts/qcom/sc7180.dtsi#n4776 # [1]
Signed-off-by: Nikita Travkin &lt;nikita@trvn.ru&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: gov_power_allocator: Allow binding without cooling devices</title>
<updated>2024-04-03T14:32:14+00:00</updated>
<author>
<name>Nikita Travkin</name>
<email>nikita@trvn.ru</email>
</author>
<published>2024-04-03T11:31:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1057c4c36ef8b236a2e28edef301da0801338c5f'/>
<id>1057c4c36ef8b236a2e28edef301da0801338c5f</id>
<content type='text'>
IPA was recently refactored to split out memory allocation into a
separate funciton. That funciton was made to return -EINVAL if there is
zero power_actors and thus no memory to allocate. This causes IPA to
fail probing when the thermal zone has no attached cooling devices.

Since cooling devices can attach after the thermal zone is created and
the governer is attached to it, failing probe due to the lack of cooling
devices is incorrect.

Change the allocate_actors_buffer() to return success when there is no
cooling devices present.

Fixes: 912e97c67cc3 ("thermal: gov_power_allocator: Move memory allocation out of throttle()")
Signed-off-by: Nikita Travkin &lt;nikita@trvn.ru&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
IPA was recently refactored to split out memory allocation into a
separate funciton. That funciton was made to return -EINVAL if there is
zero power_actors and thus no memory to allocate. This causes IPA to
fail probing when the thermal zone has no attached cooling devices.

Since cooling devices can attach after the thermal zone is created and
the governer is attached to it, failing probe due to the lack of cooling
devices is incorrect.

Change the allocate_actors_buffer() to return success when there is no
cooling devices present.

Fixes: 912e97c67cc3 ("thermal: gov_power_allocator: Move memory allocation out of throttle()")
Signed-off-by: Nikita Travkin &lt;nikita@trvn.ru&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: devfreq_cooling: Fix perf state when calculate dfc res_util</title>
<updated>2024-03-27T15:27:39+00:00</updated>
<author>
<name>Ye Zhang</name>
<email>ye.zhang@rock-chips.com</email>
</author>
<published>2024-03-21T10:21:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a26de34b3c77ae3a969654d94be49e433c947e3b'/>
<id>a26de34b3c77ae3a969654d94be49e433c947e3b</id>
<content type='text'>
The issue occurs when the devfreq cooling device uses the EM power model
and the get_real_power() callback is provided by the driver.

The EM power table is sorted ascending，can't index the table by cooling
device state，so convert cooling state to performance state by
dfc-&gt;max_state - dfc-&gt;capped_state.

Fixes: 615510fe13bd ("thermal: devfreq_cooling: remove old power model and use EM")
Cc: 5.11+ &lt;stable@vger.kernel.org&gt; # 5.11+
Signed-off-by: Ye Zhang &lt;ye.zhang@rock-chips.com&gt;
Reviewed-by: Dhruva Gole &lt;d-gole@ti.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The issue occurs when the devfreq cooling device uses the EM power model
and the get_real_power() callback is provided by the driver.

The EM power table is sorted ascending，can't index the table by cooling
device state，so convert cooling state to performance state by
dfc-&gt;max_state - dfc-&gt;capped_state.

Fixes: 615510fe13bd ("thermal: devfreq_cooling: remove old power model and use EM")
Cc: 5.11+ &lt;stable@vger.kernel.org&gt; # 5.11+
Signed-off-by: Ye Zhang &lt;ye.zhang@rock-chips.com&gt;
Reviewed-by: Dhruva Gole &lt;d-gole@ti.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "thermal: core: Don't update trip points inside the hysteresis range"</title>
<updated>2024-03-26T12:18:13+00:00</updated>
<author>
<name>Daniel Lezcano</name>
<email>daniel.lezcano@linaro.org</email>
</author>
<published>2024-03-25T22:24:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f67cf45deedb118af302534643627ce59074e8eb'/>
<id>f67cf45deedb118af302534643627ce59074e8eb</id>
<content type='text'>
It has been reported the commit cf3986f8c01d3 introduced a regression
when the temperature is wavering in the hysteresis region. The
mitigation stops leading to an uncontrolled temperature increase until
reaching the critical trip point.

Here what happens:

 * 'throttle' is when the current temperature is greater than the trip
   point temperature
 * 'target' is the mitigation level
 * 'passive' is positive when there is a mitigation, zero otherwise
 * these values are computed in the step_wise governor

Configuration:

 trip point 1: temp=95°C, hyst=5°C (passive)
 trip point 2: temp=115°C, hyst=0°C (critical)
 governor: step_wise

 1. The temperature crosses the way up the trip point 1 at 95°C

   - trend=raising
   - throttle=1, target=1
   - passive=1
   - set_trips: low=90°C, high=115°C

 2. The temperature decreases but stays in the hysteresis region at
    93°C

   - trend=dropping
   - throttle=0, target=0
   - passive=1

   Before cf3986f8c01d3
   - set_trips: low=90°C, high=95°C

   After cf3986f8c01d3
   - set_trips: low=90°C, high=115°C

 3. The temperature increases a bit but stays in the hysteresis region
    at 94°C (so below the trip point 1 temp 95°C)

   - trend=raising
   - throttle=0, target=0
   - passive=1

   Before cf3986f8c01d3
   - set_trips: low=90°C, high=95°C

   After cf3986f8c01d3
   - set_trips: low=90°C, high=115°C

 4. The temperature decreases but stays in the hysteresis region at
    93°C

   - trend=dropping
   - throttle=0, target=THERMAL_NO_TARGET
   - passive=0

   Before cf3986f8c01d3
   - set_trips: low=90°C, high=95°C

   After cf3986f8c01d3
   - set_trips: low=90°C, high=115°C

At this point, the 'passive' value is zero, there is no mitigation,
the temperature is in the hysteresis region, the next trip point is
115°C. As 'passive' is zero, the timer to monitor the thermal zone is
disabled. Consequently if the temperature continues to increase, no
mitigation will happen and it will reach the 115°C trip point and
reboot.

Before the optimization, the high boundary would have been 95°C, thus
triggering the mitigation again and rearming the polling timer.

The optimization make sense but given the current implementation of
the step_wise governor collaborating via this 'passive' flag with the
core framework it can not work.

From a higher perspective it seems like there is a problem between the
governor which sets a variable to be used by the core framework. That
sounds akward and it would make much more sense if the core framework
controls the governor and not the opposite. But as the devil hides in
the details, there are some subtilities to be addressed before.

Elaborating those would be out of the scope this changelog. So let's
stay simple and revert the change first to fixup all broken mobile
platforms.

This reverts commit cf3986f8c01d3 ("thermal: core: Don't update trip
points inside the hysteresis range") and takes a conflict with commit
0c0c4740c9d26 ("0c0c4740c9d2 thermal: trip: Use for_each_trip() in
__thermal_zone_set_trips()") in drivers/thermal/thermal_trip.c into
account.

Fixes: cf3986f8c01d3 ("thermal: core: Don't update trip points inside the hysteresis range")
Reported-by: Manaf Meethalavalappu Pallikunhi &lt;quic_manafm@quicinc.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Acked-by: Nícolas F. R. A. Prado &lt;nfraprado@collabora.com&gt;
Cc: 6.7+ &lt;stable@vger.kernel.org&gt; # 6.7+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It has been reported the commit cf3986f8c01d3 introduced a regression
when the temperature is wavering in the hysteresis region. The
mitigation stops leading to an uncontrolled temperature increase until
reaching the critical trip point.

Here what happens:

 * 'throttle' is when the current temperature is greater than the trip
   point temperature
 * 'target' is the mitigation level
 * 'passive' is positive when there is a mitigation, zero otherwise
 * these values are computed in the step_wise governor

Configuration:

 trip point 1: temp=95°C, hyst=5°C (passive)
 trip point 2: temp=115°C, hyst=0°C (critical)
 governor: step_wise

 1. The temperature crosses the way up the trip point 1 at 95°C

   - trend=raising
   - throttle=1, target=1
   - passive=1
   - set_trips: low=90°C, high=115°C

 2. The temperature decreases but stays in the hysteresis region at
    93°C

   - trend=dropping
   - throttle=0, target=0
   - passive=1

   Before cf3986f8c01d3
   - set_trips: low=90°C, high=95°C

   After cf3986f8c01d3
   - set_trips: low=90°C, high=115°C

 3. The temperature increases a bit but stays in the hysteresis region
    at 94°C (so below the trip point 1 temp 95°C)

   - trend=raising
   - throttle=0, target=0
   - passive=1

   Before cf3986f8c01d3
   - set_trips: low=90°C, high=95°C

   After cf3986f8c01d3
   - set_trips: low=90°C, high=115°C

 4. The temperature decreases but stays in the hysteresis region at
    93°C

   - trend=dropping
   - throttle=0, target=THERMAL_NO_TARGET
   - passive=0

   Before cf3986f8c01d3
   - set_trips: low=90°C, high=95°C

   After cf3986f8c01d3
   - set_trips: low=90°C, high=115°C

At this point, the 'passive' value is zero, there is no mitigation,
the temperature is in the hysteresis region, the next trip point is
115°C. As 'passive' is zero, the timer to monitor the thermal zone is
disabled. Consequently if the temperature continues to increase, no
mitigation will happen and it will reach the 115°C trip point and
reboot.

Before the optimization, the high boundary would have been 95°C, thus
triggering the mitigation again and rearming the polling timer.

The optimization make sense but given the current implementation of
the step_wise governor collaborating via this 'passive' flag with the
core framework it can not work.

From a higher perspective it seems like there is a problem between the
governor which sets a variable to be used by the core framework. That
sounds akward and it would make much more sense if the core framework
controls the governor and not the opposite. But as the devil hides in
the details, there are some subtilities to be addressed before.

Elaborating those would be out of the scope this changelog. So let's
stay simple and revert the change first to fixup all broken mobile
platforms.

This reverts commit cf3986f8c01d3 ("thermal: core: Don't update trip
points inside the hysteresis range") and takes a conflict with commit
0c0c4740c9d26 ("0c0c4740c9d2 thermal: trip: Use for_each_trip() in
__thermal_zone_set_trips()") in drivers/thermal/thermal_trip.c into
account.

Fixes: cf3986f8c01d3 ("thermal: core: Don't update trip points inside the hysteresis range")
Reported-by: Manaf Meethalavalappu Pallikunhi &lt;quic_manafm@quicinc.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Acked-by: Nícolas F. R. A. Prado &lt;nfraprado@collabora.com&gt;
Cc: 6.7+ &lt;stable@vger.kernel.org&gt; # 6.7+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'thermal-v6.9-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux</title>
<updated>2024-03-13T19:35:48+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2024-03-13T19:35:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4e7193acdecd53e79e341b0f6ab7b19596266f35'/>
<id>4e7193acdecd53e79e341b0f6ab7b19596266f35</id>
<content type='text'>
Merge additional thermal control changes for 6.9-rc1 from Daniel Lezcano:

"- Fix memory leak in the error path at probe time in the Mediatek LVTS
   driver (Christophe Jaillet)

 - Fix control buffer enablement regression on Meditek MT7896 (Frank
   Wunderlich)

 - Drop spaces before TABs in different places: thermal-of, ST drivers
   and Makefile (Geert Uytterhoeven)

 - Adjust DT binding for NXP as fsl,tmu-range min/maxItems can vary
   among several SoC versions (Fabio Estevam)

 - Add support for H616 THS controller for the Sun8i platforms. Note
   that this change relies on another change in the SoC specific code
   which is included in this branch (Martin Botka)

 - Don't fail probe due to zone registration failure because there is
   no trip points defined in the DT (Mark Brown)

 - Support variable TMU array size for new platforms (Peng Fan)

 - Adjust the DT binding for thermal-of and make the polling time not
   required and assume it is zero when not found in the DT (Konrad
   Dybcio)

 - Add r8a779h0 support in both the DT and the driver (Geert Uytterhoeven)"

* tag 'thermal-v6.9-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/rcar_gen3: Add support for R-Car V4M
  dt-bindings: thermal: rcar-gen3-thermal: Add r8a779h0 support
  thermal/of: Assume polling-delay(-passive) 0 when absent
  dt-bindings: thermal-zones: Don't require polling-delay(-passive)
  thermal/drivers/qoriq: Fix getting tmu range
  thermal/drivers/sun8i: Don't fail probe due to zone registration failure
  thermal/drivers/sun8i: Add support for H616 THS controller
  thermal/drivers/sun8i: Add SRAM register access code
  thermal/drivers/sun8i: Extend H6 calibration to support 4 sensors
  thermal/drivers/sun8i: Explain unknown H6 register value
  dt-bindings: thermal: sun8i: Add H616 THS controller
  soc: sunxi: sram: export register 0 for THS on H616
  dt-bindings: thermal: qoriq-thermal: Adjust fsl,tmu-range min/maxItems
  thermal: Drop spaces before TABs
  thermal/drivers/mediatek: Fix control buffer enablement on MT7896
  thermal/drivers/mediatek/lvts_thermal: Fix a memory leak in an error handling path
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge additional thermal control changes for 6.9-rc1 from Daniel Lezcano:

"- Fix memory leak in the error path at probe time in the Mediatek LVTS
   driver (Christophe Jaillet)

 - Fix control buffer enablement regression on Meditek MT7896 (Frank
   Wunderlich)

 - Drop spaces before TABs in different places: thermal-of, ST drivers
   and Makefile (Geert Uytterhoeven)

 - Adjust DT binding for NXP as fsl,tmu-range min/maxItems can vary
   among several SoC versions (Fabio Estevam)

 - Add support for H616 THS controller for the Sun8i platforms. Note
   that this change relies on another change in the SoC specific code
   which is included in this branch (Martin Botka)

 - Don't fail probe due to zone registration failure because there is
   no trip points defined in the DT (Mark Brown)

 - Support variable TMU array size for new platforms (Peng Fan)

 - Adjust the DT binding for thermal-of and make the polling time not
   required and assume it is zero when not found in the DT (Konrad
   Dybcio)

 - Add r8a779h0 support in both the DT and the driver (Geert Uytterhoeven)"

* tag 'thermal-v6.9-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/rcar_gen3: Add support for R-Car V4M
  dt-bindings: thermal: rcar-gen3-thermal: Add r8a779h0 support
  thermal/of: Assume polling-delay(-passive) 0 when absent
  dt-bindings: thermal-zones: Don't require polling-delay(-passive)
  thermal/drivers/qoriq: Fix getting tmu range
  thermal/drivers/sun8i: Don't fail probe due to zone registration failure
  thermal/drivers/sun8i: Add support for H616 THS controller
  thermal/drivers/sun8i: Add SRAM register access code
  thermal/drivers/sun8i: Extend H6 calibration to support 4 sensors
  thermal/drivers/sun8i: Explain unknown H6 register value
  dt-bindings: thermal: sun8i: Add H616 THS controller
  soc: sunxi: sram: export register 0 for THS on H616
  dt-bindings: thermal: qoriq-thermal: Adjust fsl,tmu-range min/maxItems
  thermal: Drop spaces before TABs
  thermal/drivers/mediatek: Fix control buffer enablement on MT7896
  thermal/drivers/mediatek/lvts_thermal: Fix a memory leak in an error handling path
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'thermal-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2024-03-13T19:03:57+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-03-13T19:03:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=259f7d5e2baf87fcbb4fabc46526c9c47fed1914'/>
<id>259f7d5e2baf87fcbb4fabc46526c9c47fed1914</id>
<content type='text'>
Pull thermal control updates from Rafael Wysocki:
 "These mostly change the thermal core in a few ways allowing thermal
  drivers to be simplified, in particular in their removal and failing
  probe handling parts that are notoriously prone to errors, and
  propagate the changes to several drivers.

  Apart from that, support for a new platform is added (Intel Lunar
  Lake-M), some bugs are fixed and some code is cleaned up, as usual.

  Specifics:

   - Store zone trips table and zone operations directly in struct
     thermal_zone_device (Rafael Wysocki)

   - Fix up flex array initialization during thermal zone device
     registration (Nathan Chancellor)

   - Rework writable trip points handling in the thermal core and
     several drivers (Rafael Wysocki)

   - Thermal core code cleanups (Dan Carpenter, Flavio Suligoi)

   - Use thermal zone accessor functions in the int340x Intel thermal
     driver (Rafael Wysocki)

   - Add Lunar Lake-M PCI ID to the int340x Intel thermal driver
     (Srinivas Pandruvada)

   - Minor fixes for thermal governors (Rafael Wysocki, Di Shen)

   - Trip point handling fixes for the iwlwifi wireless driver (Rafael
     Wysocki)

   - Code cleanups (Rafael J. Wysocki, AngeloGioacchino Del Regno)"

* tag 'thermal-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (29 commits)
  thermal: core: remove unnecessary check in trip_point_hyst_store()
  thermal: intel: int340x_thermal: Use thermal zone accessor functions
  thermal: core: Remove excess empty line from a comment
  thermal: int340x: processor_thermal: Add Lunar Lake-M PCI ID
  thermal: core: Eliminate writable trip points masks
  thermal: of: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  thermal: imx: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  wifi: iwlwifi: mvm: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  mlxsw: core_thermal: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  thermal: intel: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  thermal: core: Drop the .set_trip_hyst() thermal zone operation
  thermal: core: Add flags to struct thermal_trip
  thermal: core: Move initial num_trips assignment before memcpy()
  thermal: Get rid of CONFIG_THERMAL_WRITABLE_TRIPS
  thermal: intel: Adjust ops handling during thermal zone registration
  thermal: ACPI: Constify acpi_thermal_zone_ops
  thermal: core: Store zone ops in struct thermal_zone_device
  thermal: intel: Discard trip tables after zone registration
  thermal: ACPI: Discard trips table after zone registration
  thermal: core: Store zone trips table in struct thermal_zone_device
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull thermal control updates from Rafael Wysocki:
 "These mostly change the thermal core in a few ways allowing thermal
  drivers to be simplified, in particular in their removal and failing
  probe handling parts that are notoriously prone to errors, and
  propagate the changes to several drivers.

  Apart from that, support for a new platform is added (Intel Lunar
  Lake-M), some bugs are fixed and some code is cleaned up, as usual.

  Specifics:

   - Store zone trips table and zone operations directly in struct
     thermal_zone_device (Rafael Wysocki)

   - Fix up flex array initialization during thermal zone device
     registration (Nathan Chancellor)

   - Rework writable trip points handling in the thermal core and
     several drivers (Rafael Wysocki)

   - Thermal core code cleanups (Dan Carpenter, Flavio Suligoi)

   - Use thermal zone accessor functions in the int340x Intel thermal
     driver (Rafael Wysocki)

   - Add Lunar Lake-M PCI ID to the int340x Intel thermal driver
     (Srinivas Pandruvada)

   - Minor fixes for thermal governors (Rafael Wysocki, Di Shen)

   - Trip point handling fixes for the iwlwifi wireless driver (Rafael
     Wysocki)

   - Code cleanups (Rafael J. Wysocki, AngeloGioacchino Del Regno)"

* tag 'thermal-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (29 commits)
  thermal: core: remove unnecessary check in trip_point_hyst_store()
  thermal: intel: int340x_thermal: Use thermal zone accessor functions
  thermal: core: Remove excess empty line from a comment
  thermal: int340x: processor_thermal: Add Lunar Lake-M PCI ID
  thermal: core: Eliminate writable trip points masks
  thermal: of: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  thermal: imx: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  wifi: iwlwifi: mvm: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  mlxsw: core_thermal: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  thermal: intel: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  thermal: core: Drop the .set_trip_hyst() thermal zone operation
  thermal: core: Add flags to struct thermal_trip
  thermal: core: Move initial num_trips assignment before memcpy()
  thermal: Get rid of CONFIG_THERMAL_WRITABLE_TRIPS
  thermal: intel: Adjust ops handling during thermal zone registration
  thermal: ACPI: Constify acpi_thermal_zone_ops
  thermal: core: Store zone ops in struct thermal_zone_device
  thermal: intel: Discard trip tables after zone registration
  thermal: ACPI: Discard trips table after zone registration
  thermal: core: Store zone trips table in struct thermal_zone_device
  ...
</pre>
</div>
</content>
</entry>
</feed>
