linux-stable.git/drivers/thermal, branch linux-3.16.y

thermal: Fix deadlock in thermal thermal_zone_device_check

2019-12-10T18:01:25+00:00

commit 163b00cde7cf2206e248789d2780121ad5e6a70b upstream.

1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone
device") changed cancel_delayed_work to cancel_delayed_work_sync to avoid
a use-after-free issue. However, cancel_delayed_work_sync could be called
insides the WQ causing deadlock.

[54109.642398] c0   1162 kworker/u17:1   D    0 11030      2 0x00000000
[54109.642437] c0   1162 Workqueue: thermal_passive_wq thermal_zone_device_check
[54109.642447] c0   1162 Call trace:
[54109.642456] c0   1162  __switch_to+0x138/0x158
[54109.642467] c0   1162  __schedule+0xba4/0x1434
[54109.642480] c0   1162  schedule_timeout+0xa0/0xb28
[54109.642492] c0   1162  wait_for_common+0x138/0x2e8
[54109.642511] c0   1162  flush_work+0x348/0x40c
[54109.642522] c0   1162  __cancel_work_timer+0x180/0x218
[54109.642544] c0   1162  handle_thermal_trip+0x2c4/0x5a4
[54109.642553] c0   1162  thermal_zone_device_update+0x1b4/0x25c
[54109.642563] c0   1162  thermal_zone_device_check+0x18/0x24
[54109.642574] c0   1162  process_one_work+0x3cc/0x69c
[54109.642583] c0   1162  worker_thread+0x49c/0x7c0
[54109.642593] c0   1162  kthread+0x17c/0x1b0
[54109.642602] c0   1162  ret_from_fork+0x10/0x18
[54109.643051] c0   1162 kworker/u17:2   D    0 16245      2 0x00000000
[54109.643067] c0   1162 Workqueue: thermal_passive_wq thermal_zone_device_check
[54109.643077] c0   1162 Call trace:
[54109.643085] c0   1162  __switch_to+0x138/0x158
[54109.643095] c0   1162  __schedule+0xba4/0x1434
[54109.643104] c0   1162  schedule_timeout+0xa0/0xb28
[54109.643114] c0   1162  wait_for_common+0x138/0x2e8
[54109.643122] c0   1162  flush_work+0x348/0x40c
[54109.643131] c0   1162  __cancel_work_timer+0x180/0x218
[54109.643141] c0   1162  handle_thermal_trip+0x2c4/0x5a4
[54109.643150] c0   1162  thermal_zone_device_update+0x1b4/0x25c
[54109.643159] c0   1162  thermal_zone_device_check+0x18/0x24
[54109.643167] c0   1162  process_one_work+0x3cc/0x69c
[54109.643177] c0   1162  worker_thread+0x49c/0x7c0
[54109.643186] c0   1162  kthread+0x17c/0x1b0
[54109.643195] c0   1162  ret_from_fork+0x10/0x18
[54109.644500] c0   1162 cat             D    0  7766      1 0x00000001
[54109.644515] c0   1162 Call trace:
[54109.644524] c0   1162  __switch_to+0x138/0x158
[54109.644536] c0   1162  __schedule+0xba4/0x1434
[54109.644546] c0   1162  schedule_preempt_disabled+0x80/0xb0
[54109.644555] c0   1162  __mutex_lock+0x3a8/0x7f0
[54109.644563] c0   1162  __mutex_lock_slowpath+0x14/0x20
[54109.644575] c0   1162  thermal_zone_get_temp+0x84/0x360
[54109.644586] c0   1162  temp_show+0x30/0x78
[54109.644609] c0   1162  dev_attr_show+0x5c/0xf0
[54109.644628] c0   1162  sysfs_kf_seq_show+0xcc/0x1a4
[54109.644636] c0   1162  kernfs_seq_show+0x48/0x88
[54109.644656] c0   1162  seq_read+0x1f4/0x73c
[54109.644664] c0   1162  kernfs_fop_read+0x84/0x318
[54109.644683] c0   1162  __vfs_read+0x50/0x1bc
[54109.644692] c0   1162  vfs_read+0xa4/0x140
[54109.644701] c0   1162  SyS_read+0xbc/0x144
[54109.644708] c0   1162  el0_svc_naked+0x34/0x38
[54109.845800] c0   1162 D 720.000s 1->7766->7766 cat [panic]

Fixes: 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device")
Signed-off-by: Wei Wang 
Signed-off-by: Zhang Rui 
Cc: Ido Schimmel 
Signed-off-by: Ben Hutchings

thermal: Fix use-after-free when unregistering thermal zone device

2019-12-10T18:01:25+00:00

commit 1851799e1d2978f68eea5d9dff322e121dcf59c1 upstream.

thermal_zone_device_unregister() cancels the delayed work that polls the
thermal zone, but it does not wait for it to finish. This is racy with
respect to the freeing of the thermal zone device, which can result in a
use-after-free [1].

Fix this by waiting for the delayed work to finish before freeing the
thermal zone device. Note that thermal_zone_device_set_polling() is
never invoked from an atomic context, so it is safe to call
cancel_delayed_work_sync() that can block.

[1]
[  +0.002221] ==================================================================
[  +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0
[  +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17

[  +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701
[  +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check
[  +0.000012] Call Trace:
[  +0.000021]  dump_stack+0xa9/0x10e
[  +0.000020]  print_address_description.cold.2+0x9/0x25e
[  +0.000018]  __kasan_report.cold.3+0x78/0x9d
[  +0.000016]  kasan_report+0xe/0x20
[  +0.000016]  __mutex_lock+0x1076/0x11c0
[  +0.000014]  step_wise_throttle+0x72/0x150
[  +0.000018]  handle_thermal_trip+0x167/0x760
[  +0.000019]  thermal_zone_device_update+0x19e/0x5f0
[  +0.000019]  process_one_work+0x969/0x16f0
[  +0.000017]  worker_thread+0x91/0xc40
[  +0.000014]  kthread+0x33d/0x400
[  +0.000015]  ret_from_fork+0x3a/0x50

[  +0.000020] Allocated by task 1:
[  +0.000015]  save_stack+0x19/0x80
[  +0.000015]  __kasan_kmalloc.constprop.4+0xc1/0xd0
[  +0.000014]  kmem_cache_alloc_trace+0x152/0x320
[  +0.000015]  thermal_zone_device_register+0x1b4/0x13a0
[  +0.000015]  mlxsw_thermal_init+0xc92/0x23d0
[  +0.000014]  __mlxsw_core_bus_device_register+0x659/0x11b0
[  +0.000013]  mlxsw_core_bus_device_register+0x3d/0x90
[  +0.000013]  mlxsw_pci_probe+0x355/0x4b0
[  +0.000014]  local_pci_probe+0xc3/0x150
[  +0.000013]  pci_device_probe+0x280/0x410
[  +0.000013]  really_probe+0x26a/0xbb0
[  +0.000013]  driver_probe_device+0x208/0x2e0
[  +0.000013]  device_driver_attach+0xfe/0x140
[  +0.000013]  __driver_attach+0x110/0x310
[  +0.000013]  bus_for_each_dev+0x14b/0x1d0
[  +0.000013]  driver_register+0x1c0/0x400
[  +0.000015]  mlxsw_sp_module_init+0x5d/0xd3
[  +0.000014]  do_one_initcall+0x239/0x4dd
[  +0.000013]  kernel_init_freeable+0x42b/0x4e8
[  +0.000012]  kernel_init+0x11/0x18b
[  +0.000013]  ret_from_fork+0x3a/0x50

[  +0.000015] Freed by task 581:
[  +0.000013]  save_stack+0x19/0x80
[  +0.000014]  __kasan_slab_free+0x125/0x170
[  +0.000013]  kfree+0xf3/0x310
[  +0.000013]  thermal_release+0xc7/0xf0
[  +0.000014]  device_release+0x77/0x200
[  +0.000014]  kobject_put+0x1a8/0x4c0
[  +0.000014]  device_unregister+0x38/0xc0
[  +0.000014]  thermal_zone_device_unregister+0x54e/0x6a0
[  +0.000014]  mlxsw_thermal_fini+0x184/0x35a
[  +0.000014]  mlxsw_core_bus_device_unregister+0x10a/0x640
[  +0.000013]  mlxsw_devlink_core_bus_device_reload+0x92/0x210
[  +0.000015]  devlink_nl_cmd_reload+0x113/0x1f0
[  +0.000014]  genl_family_rcv_msg+0x700/0xee0
[  +0.000013]  genl_rcv_msg+0xca/0x170
[  +0.000013]  netlink_rcv_skb+0x137/0x3a0
[  +0.000012]  genl_rcv+0x29/0x40
[  +0.000013]  netlink_unicast+0x49b/0x660
[  +0.000013]  netlink_sendmsg+0x755/0xc90
[  +0.000013]  __sys_sendto+0x3de/0x430
[  +0.000013]  __x64_sys_sendto+0xe2/0x1b0
[  +0.000013]  do_syscall_64+0xa4/0x4d0
[  +0.000013]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  +0.000017] The buggy address belongs to the object at ffff8881e48e0008
               which belongs to the cache kmalloc-2k of size 2048
[  +0.000012] The buggy address is located 1096 bytes inside of
               2048-byte region [ffff8881e48e0008, ffff8881e48e0808)
[  +0.000007] The buggy address belongs to the page:
[  +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0
[  +0.000020] flags: 0x200000000010200(slab|head)
[  +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0
[  +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[  +0.000007] page dumped because: kasan: bad access detected

[  +0.000012] Memory state around the buggy address:
[  +0.000012]  ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000012]  ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000008]                                                  ^
[  +0.000012]  ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000012]  ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000007] ==================================================================

Fixes: b1569e99c795 ("ACPI: move thermal trip handling to generic thermal layer")
Reported-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
Acked-by: Jiri Pirko 
Signed-off-by: Zhang Rui 
Signed-off-by: Ben Hutchings

thermal: rcar_thermal: Prevent doing work after unbind

2019-02-11T17:53:41+00:00

commit 697ee786f15d7b65c7f3045d45fe3a05d28e0911 upstream.

When testing bind/unbind on r8a7791/koelsch:

    WARNING: CPU: 1 PID: 697 at lib/debugobjects.c:329 debug_print_object+0x8c/0xb4
    ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10

This happens if the workqueue runs after the device has been unbound.
Fix this by cancelling any queued work during remove.

Fixes: e0a5172e9eec7f0d ("thermal: rcar: add interrupt support")
Signed-off-by: Geert Uytterhoeven 
Reviewed-by: Niklas Söderlund 
Signed-off-by: Eduardo Valentin 
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings

thermal: rcar: Make error and remove paths symmetrical with init

2019-02-11T17:53:41+00:00

commit ac71c7025ebc1ed25114b1be77dc60b7f8cb8544 upstream.

Swap interrupt disable and thermal zone unregistration in the error and
remove paths, to make them more symmetrical with the initialization
path.

Signed-off-by: Geert Uytterhoeven 
Acked-by: Kuninori Morimoto 
Signed-off-by: Eduardo Valentin 
Signed-off-by: Ben Hutchings

thermal: rcar_thermal: Prevent hardware access during system suspend

2019-02-11T17:53:41+00:00

commit 3a31386217628ffe2491695be2db933c25dde785 upstream.

On r8a7791/koelsch, sometimes the following message is printed during
system suspend:

    rcar_thermal e61f0000.thermal: thermal sensor was broken

This happens if the workqueue runs while the device is already
suspended.  Fix this by using the freezable system workqueue instead,
cfr. commit 51e20d0e3a60cf46 ("thermal: Prevent polling from happening
during system suspend").

Fixes: e0a5172e9eec7f0d ("thermal: rcar: add interrupt support")
Signed-off-by: Geert Uytterhoeven 
Reviewed-by: Niklas Söderlund 
Signed-off-by: Eduardo Valentin 
Signed-off-by: Ben Hutchings

thermal: imx: Fix race condition in imx_thermal_probe()

2018-10-21T07:44:55+00:00

commit cf1ba1d73a33944d8c1a75370a35434bf146b8a7 upstream.

When device boots with T > T_trip_1 and requests interrupt,
the race condition takes place. The interrupt comes before
THERMAL_DEVICE_ENABLED is set. This leads to an attempt to
reading sensor value from irq and disabling the sensor, based on
the data->mode field, which expected to be THERMAL_DEVICE_ENABLED,
but still stays as THERMAL_DEVICE_DISABLED. Afher this issue
sensor is never re-enabled, as the driver state is wrong.

Fix this problem by setting the 'data' members prior to
requesting the interrupts.

Fixes: 37713a1e8e4c ("thermal: imx: implement thermal alarm interrupt handling")
Signed-off-by: Mikhail Lappo 
Signed-off-by: Fabio Estevam 
Reviewed-by: Philipp Zabel 
Acked-by: Dong Aisheng 
Signed-off-by: Zhang Rui 
Signed-off-by: Ben Hutchings

thermal: imx: register irq handler later in probe

2018-10-21T07:44:54+00:00

commit 84866ee5818e95f6e97194656777c10ac24cb9d3 upstream.

The irq handler should be registered after the tempmon
module has been initialized in a known state and the
thermal_zone and cpu_cooling device have been registered
successfully. Otherwise, if the irq is triggled earlier
before thermal probe has been finished, it may lead to
'NULL' pointer kernel panic.

Signed-off-by: Bai Ping 
Signed-off-by: Eduardo Valentin 
Signed-off-by: Ben Hutchings

thermal: hwmon: Properly report critical temperature in sysfs

2017-03-16T02:26:26+00:00

commit f37fabb8643eaf8e3b613333a72f683770c85eca upstream.

In the critical sysfs entry the thermal hwmon was returning wrong
temperature to the user-space.  It was reporting the temperature of the
first trip point instead of the temperature of critical trip point.

For example:
	/sys/class/hwmon/hwmon0/temp1_crit:50000
	/sys/class/thermal/thermal_zone0/trip_point_0_temp:50000
	/sys/class/thermal/thermal_zone0/trip_point_0_type:active
	/sys/class/thermal/thermal_zone0/trip_point_3_temp:120000
	/sys/class/thermal/thermal_zone0/trip_point_3_type:critical

Since commit e68b16abd91d ("thermal: add hwmon sysfs I/F") the driver
have been registering a sysfs entry if get_crit_temp() callback was
provided.  However when accessed, it was calling get_trip_temp() instead
of the get_crit_temp().

Fixes: e68b16abd91d ("thermal: add hwmon sysfs I/F")
Signed-off-by: Krzysztof Kozlowski 
Signed-off-by: Zhang Rui 
Signed-off-by: Ben Hutchings

thermal: exynos: Fix unbalanced regulator disable on probe failure

2015-12-13T17:49:07+00:00

commit 824ead03b78403a21449cb7eb153a4344cd3b4c8 upstream.

During probe if the regulator could not be enabled, the error exit path
would still disable it. This could lead to unbalanced counter of
regulator enable/disable.

The patch moves code for getting and enabling the regulator from
exynos_map_dt_data() to probe function because it is really not a part
of getting Device Tree properties.

Acked-by: Lukasz Majewski 
Tested-by: Lukasz Majewski 
Reviewed-by: Alim Akhtar 
Signed-off-by: Krzysztof Kozlowski 
Fixes: 5f09a5cbd14a ("thermal: exynos: Disable the regulator on probe failure")
Signed-off-by: Eduardo Valentin 
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques

thermal: exynos: Disable the regulator on probe failure

2015-08-27T11:08:09+00:00

commit 5f09a5cbd14ae16e93866040fa44d930ff885650 upstream.

During probe the regulator (if present) was enabled but not disabled in
case of failure. So an unsuccessful probe lead to enabling the
regulator which was actually not needed because the device was not
enabled.

Additionally each deferred probe lead to increase of regulator enable
count so it would not be effectively disabled during removal of the
device.

Test HW: Exynos4412 - Trats2 board

Signed-off-by: Krzysztof Kozlowski 
Fixes: 498d22f616f6 ("thermal: exynos: Support for TMU regulator defined at device tree")
Reviewed-by: Javier Martinez Canillas 
Signed-off-by: Lukasz Majewski 
Tested-by: Lukasz Majewski 
Signed-off-by: Eduardo Valentin 
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques