<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/thermal/intel, branch v5.14.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>thermal/drivers/int340x/processor_thermal: Fix tcc setting</title>
<updated>2021-07-04T16:28:04+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2021-06-28T21:58:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fe6a6de6692e7f7159c1ff42b07ecd737df712b4'/>
<id>fe6a6de6692e7f7159c1ff42b07ecd737df712b4</id>
<content type='text'>
The following fixes are done for tcc sysfs interface:
- TCC is 6 bits only from bit 29-24
- TCC of 0 is valid
- When BIT(31) is set, this register is read only
- Check for invalid tcc value
- Error for negative values

Fixes: fdf4f2fb8e899 ("drivers: thermal: processor_thermal_device: Export sysfs interface for TCC offset")
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210628215803.75038-1-srinivas.pandruvada@linux.intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following fixes are done for tcc sysfs interface:
- TCC is 6 bits only from bit 29-24
- TCC of 0 is valid
- When BIT(31) is set, this register is read only
- Check for invalid tcc value
- Error for negative values

Fixes: fdf4f2fb8e899 ("drivers: thermal: processor_thermal_device: Export sysfs interface for TCC offset")
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210628215803.75038-1-srinivas.pandruvada@linux.intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/drivers/int340x/processor_thermal: Fix warning for return value</title>
<updated>2021-07-04T16:28:04+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2021-06-28T18:32:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ad079d981db6a4047b60c576df6430bed36bcd7d'/>
<id>ad079d981db6a4047b60c576df6430bed36bcd7d</id>
<content type='text'>
Fix smatch warnings:
drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c:258 proc_thermal_pci_probe() warn: missing error code 'ret'

Use PTR_ERR to return failure of thermal_zone_device_register().

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210628183232.62877-1-srinivas.pandruvada@linux.intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix smatch warnings:
drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c:258 proc_thermal_pci_probe() warn: missing error code 'ret'

Use PTR_ERR to return failure of thermal_zone_device_register().

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210628183232.62877-1-srinivas.pandruvada@linux.intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/drivers/int340x/processor_thermal: Add PCI MMIO based thermal driver</title>
<updated>2021-07-04T16:28:04+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2021-05-25T20:48:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=acd65d5d1cf4a3324c8970ba74632abe069fe23e'/>
<id>acd65d5d1cf4a3324c8970ba74632abe069fe23e</id>
<content type='text'>
Add a new PCI driver which register a thermal zone and allows to get
notification for threshold violation by a RW trip point. These
notifications are delivered from the device using MSI based
interrupt.

The main difference between this new PCI driver and the existing
one is that the temperature and trip points directly use PCI
MMIO instead of using ACPI methods.

This driver registers a thermal zone "TCPU_PCI" in addition to the
legacy processor thermal device, which uses ACPI companion device
to set name, temperature and trips.

This driver is enabled for AlderLake.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210525204811.3793651-3-srinivas.pandruvada@linux.intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a new PCI driver which register a thermal zone and allows to get
notification for threshold violation by a RW trip point. These
notifications are delivered from the device using MSI based
interrupt.

The main difference between this new PCI driver and the existing
one is that the temperature and trip points directly use PCI
MMIO instead of using ACPI methods.

This driver registers a thermal zone "TCPU_PCI" in addition to the
legacy processor thermal device, which uses ACPI companion device
to set name, temperature and trips.

This driver is enabled for AlderLake.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210525204811.3793651-3-srinivas.pandruvada@linux.intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/drivers/int340x/processor_thermal: Split enumeration and processing part</title>
<updated>2021-07-04T16:28:04+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2021-05-25T20:48:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8fe145f7ceda7b8909cd471e1e75dd20dfbf41a5'/>
<id>8fe145f7ceda7b8909cd471e1e75dd20dfbf41a5</id>
<content type='text'>
Remove enumeration part from the processor_thermal_device to two
different modules. One for ACPI and one for PCI:
	ACPI enumeration: int3401_thermal
	PCI part: processor_thermal_device_pci_legacy

The current processor_thermal_device now just implements interface
functions to be used by the ACPI and PCI enumeration module. This is
done by:
1. Make functions proc_thermal_add() and proc_thermal_remove() non static
and export them for usage in other processor_thermal_device_pci_legacy.c
and in int3401_thermal.c.

2. Move the sysfs file creation for TCC offset and power limit attribute
group to the proc_thermal_add() from the individual enumeration callbacks
for PCI and ACPI.

3. Create new interface functions proc_thermal_mmio_add() and
proc_thermal_mmio_remove() which will be called from the
processor_thermal_device_pci_legacy module.

4. Export proc_thermal_resume(), so that it can be used by power
management callbacks.

5. Remove special check for double enumeration as it never happens.

While here, fix some cleanup on error conditions in proc_thermal_add().

No functional changes are expected with this change.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210525204811.3793651-2-srinivas.pandruvada@linux.intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove enumeration part from the processor_thermal_device to two
different modules. One for ACPI and one for PCI:
	ACPI enumeration: int3401_thermal
	PCI part: processor_thermal_device_pci_legacy

The current processor_thermal_device now just implements interface
functions to be used by the ACPI and PCI enumeration module. This is
done by:
1. Make functions proc_thermal_add() and proc_thermal_remove() non static
and export them for usage in other processor_thermal_device_pci_legacy.c
and in int3401_thermal.c.

2. Move the sysfs file creation for TCC offset and power limit attribute
group to the proc_thermal_add() from the individual enumeration callbacks
for PCI and ACPI.

3. Create new interface functions proc_thermal_mmio_add() and
proc_thermal_mmio_remove() which will be called from the
processor_thermal_device_pci_legacy module.

4. Export proc_thermal_resume(), so that it can be used by power
management callbacks.

5. Remove special check for double enumeration as it never happens.

While here, fix some cleanup on error conditions in proc_thermal_add().

No functional changes are expected with this change.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210525204811.3793651-2-srinivas.pandruvada@linux.intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/drivers/intel/intel_soc_dts_iosf: Switch to use find_first_zero_bit()</title>
<updated>2021-07-04T16:28:04+00:00</updated>
<author>
<name>Andy Shevchenko</name>
<email>andriy.shevchenko@linux.intel.com</email>
</author>
<published>2021-06-18T15:34:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=da5e562fbc5676902d9007db4b05af9de7a890d5'/>
<id>da5e562fbc5676902d9007db4b05af9de7a890d5</id>
<content type='text'>
Switch to use find_first_zero_bit() instead of open-coded variant.

Signed-off-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210618153451.89246-1-andriy.shevchenko@linux.intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Switch to use find_first_zero_bit() instead of open-coded variant.

Signed-off-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210618153451.89246-1-andriy.shevchenko@linux.intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/drivers/int340x: processor_thermal: Export additional attributes</title>
<updated>2021-06-11T09:55:47+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2021-05-17T06:14:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5d6fbc96bd36cfec7bbc0578ad3986affdeb70de'/>
<id>5d6fbc96bd36cfec7bbc0578ad3986affdeb70de</id>
<content type='text'>
Export additional attributes:

ddr_data_rate (RO) : Show current DDR (Double Data Rate) data rate.
rfi_restriction (RW) : Show or set current state for RFI (Radio
			Frequency Interference) protection.

These attributes use mailbox commands to get/set information. Here
command codes are:
0x0007: Read RFI restriction
0x0107: Read DDR data rate
0x0008: Write RFI restriction

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210517061441.1921901-3-srinivas.pandruvada@linux.intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Export additional attributes:

ddr_data_rate (RO) : Show current DDR (Double Data Rate) data rate.
rfi_restriction (RW) : Show or set current state for RFI (Radio
			Frequency Interference) protection.

These attributes use mailbox commands to get/set information. Here
command codes are:
0x0007: Read RFI restriction
0x0107: Read DDR data rate
0x0008: Write RFI restriction

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210517061441.1921901-3-srinivas.pandruvada@linux.intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/drivers/int340x: processor_thermal: Export mailbox interface</title>
<updated>2021-06-11T09:54:42+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2021-05-17T06:14:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fb5a6ec80367bcccb659a421850d757a5a81767a'/>
<id>fb5a6ec80367bcccb659a421850d757a5a81767a</id>
<content type='text'>
Export the mailbox interface to be used by other modules. Also change
command id and response from u8 to u32 data type.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210517061441.1921901-2-srinivas.pandruvada@linux.intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Export the mailbox interface to be used by other modules. Also change
command id and response from u8 to u32 data type.

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210517061441.1921901-2-srinivas.pandruvada@linux.intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2021-06-06T19:25:43+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-06-06T19:25:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=773ac53bbfcebb58ce03577d94ce471cadf3ea18'/>
<id>773ac53bbfcebb58ce03577d94ce471cadf3ea18</id>
<content type='text'>
Pull x86 fixes from Borislav Petkov:
 "A bunch of x86/urgent stuff accumulated for the last two weeks so
  lemme unload it to you.

  It should be all totally risk-free, of course. :-)

   - Fix out-of-spec hardware (1st gen Hygon) which does not implement
     MSR_AMD64_SEV even though the spec clearly states so, and check
     CPUID bits first.

   - Send only one signal to a task when it is a SEGV_PKUERR si_code
     type.

   - Do away with all the wankery of reserving X amount of memory in the
     first megabyte to prevent BIOS corrupting it and simply and
     unconditionally reserve the whole first megabyte.

   - Make alternatives NOP optimization work at an arbitrary position
     within the patched sequence because the compiler can put
     single-byte NOPs for alignment anywhere in the sequence (32-bit
     retpoline), vs our previous assumption that the NOPs are only
     appended.

   - Force-disable ENQCMD[S] instructions support and remove
     update_pasid() because of insufficient protection against FPU state
     modification in an interrupt context, among other xstate horrors
     which are being addressed at the moment. This one limits the
     fallout until proper enablement.

   - Use cpu_feature_enabled() in the idxd driver so that it can be
     build-time disabled through the defines in disabled-features.h.

   - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
     LVT value is read before APIC initialization so that softlockups
     during boot do not happen at least on one machine.

   - Mark all legacy interrupts as legacy vectors when the IO-APIC is
     disabled and when all legacy interrupts are routed through the PIC"

* tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Check SME/SEV support in CPUID first
  x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
  x86/setup: Always reserve the first 1M of RAM
  x86/alternative: Optimize single-byte NOPs at an arbitrary position
  x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
  dmaengine: idxd: Use cpu_feature_enabled()
  x86/thermal: Fix LVT thermal setup for SMI delivery mode
  x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull x86 fixes from Borislav Petkov:
 "A bunch of x86/urgent stuff accumulated for the last two weeks so
  lemme unload it to you.

  It should be all totally risk-free, of course. :-)

   - Fix out-of-spec hardware (1st gen Hygon) which does not implement
     MSR_AMD64_SEV even though the spec clearly states so, and check
     CPUID bits first.

   - Send only one signal to a task when it is a SEGV_PKUERR si_code
     type.

   - Do away with all the wankery of reserving X amount of memory in the
     first megabyte to prevent BIOS corrupting it and simply and
     unconditionally reserve the whole first megabyte.

   - Make alternatives NOP optimization work at an arbitrary position
     within the patched sequence because the compiler can put
     single-byte NOPs for alignment anywhere in the sequence (32-bit
     retpoline), vs our previous assumption that the NOPs are only
     appended.

   - Force-disable ENQCMD[S] instructions support and remove
     update_pasid() because of insufficient protection against FPU state
     modification in an interrupt context, among other xstate horrors
     which are being addressed at the moment. This one limits the
     fallout until proper enablement.

   - Use cpu_feature_enabled() in the idxd driver so that it can be
     build-time disabled through the defines in disabled-features.h.

   - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
     LVT value is read before APIC initialization so that softlockups
     during boot do not happen at least on one machine.

   - Mark all legacy interrupts as legacy vectors when the IO-APIC is
     disabled and when all legacy interrupts are routed through the PIC"

* tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Check SME/SEV support in CPUID first
  x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
  x86/setup: Always reserve the first 1M of RAM
  x86/alternative: Optimize single-byte NOPs at an arbitrary position
  x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
  dmaengine: idxd: Use cpu_feature_enabled()
  x86/thermal: Fix LVT thermal setup for SMI delivery mode
  x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/thermal: Fix LVT thermal setup for SMI delivery mode</title>
<updated>2021-05-31T20:32:26+00:00</updated>
<author>
<name>Borislav Petkov</name>
<email>bp@suse.de</email>
</author>
<published>2021-05-27T09:02:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9a90ed065a155d13db0d0ffeaad5cc54e51c90c6'/>
<id>9a90ed065a155d13db0d0ffeaad5cc54e51c90c6</id>
<content type='text'>
There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):

  [    0.033858] read lvtthmr: 0x330, val: 0x200

which means:

 - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
 - bits [10:8] have 010b which means SMI delivery mode.

Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:

  setup_local_APIC:

  	...

	/*
	 * If this comes from kexec/kcrash the APIC might be enabled in
	 * SPIV. Soft disable it before doing further initialization.
	 */
	value = apic_read(APIC_SPIV);
	value &amp;= ~APIC_SPIV_APIC_ENABLED;
	apic_write(APIC_SPIV, value);

which means (from the SDM):

"10.4.7.2 Local APIC State After It Has Been Software Disabled

...

* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."

And this happens too:

  [    0.124111] APIC: Switch to symmetric I/O mode setup
  [    0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
  [    0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0

This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...

Now, before

  4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()")

due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.

But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.

Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.

Fixes: 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney &lt;james@nurealm.net&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Zhang Rui &lt;rui.zhang@intel.com&gt;
Cc: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):

  [    0.033858] read lvtthmr: 0x330, val: 0x200

which means:

 - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
 - bits [10:8] have 010b which means SMI delivery mode.

Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:

  setup_local_APIC:

  	...

	/*
	 * If this comes from kexec/kcrash the APIC might be enabled in
	 * SPIV. Soft disable it before doing further initialization.
	 */
	value = apic_read(APIC_SPIV);
	value &amp;= ~APIC_SPIV_APIC_ENABLED;
	apic_write(APIC_SPIV, value);

which means (from the SDM):

"10.4.7.2 Local APIC State After It Has Been Software Disabled

...

* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."

And this happens too:

  [    0.124111] APIC: Switch to symmetric I/O mode setup
  [    0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
  [    0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0

This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...

Now, before

  4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()")

due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.

But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.

Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.

Fixes: 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney &lt;james@nurealm.net&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Zhang Rui &lt;rui.zhang@intel.com&gt;
Cc: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/drivers/intel: Initialize RW trip to THERMAL_TEMP_INVALID</title>
<updated>2021-05-14T07:37:27+00:00</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2021-04-30T12:23:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=eb8500b874cf295971a6a2a04e14eb0854197a3c'/>
<id>eb8500b874cf295971a6a2a04e14eb0854197a3c</id>
<content type='text'>
After commit 81ad4276b505 ("Thermal: Ignore invalid trip points") all
user_space governor notifications via RW trip point is broken in intel
thermal drivers. This commits marks trip_points with value of 0 during
call to thermal_zone_device_register() as invalid. RW trip points can be
0 as user space will set the correct trip temperature later.

During driver init, x86_package_temp and all int340x drivers sets RW trip
temperature as 0. This results in all these trips marked as invalid by
the thermal core.

To fix this initialize RW trips to THERMAL_TEMP_INVALID instead of 0.

Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210430122343.1789899-1-srinivas.pandruvada@linux.intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 81ad4276b505 ("Thermal: Ignore invalid trip points") all
user_space governor notifications via RW trip point is broken in intel
thermal drivers. This commits marks trip_points with value of 0 during
call to thermal_zone_device_register() as invalid. RW trip points can be
0 as user space will set the correct trip temperature later.

During driver init, x86_package_temp and all int340x drivers sets RW trip
temperature as 0. This results in all these trips marked as invalid by
the thermal core.

To fix this initialize RW trips to THERMAL_TEMP_INVALID instead of 0.

Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20210430122343.1789899-1-srinivas.pandruvada@linux.intel.com
</pre>
</div>
</content>
</entry>
</feed>
