linux-stable.git/arch, branch v3.4.100

perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-07-28T14:06:46+00:00

commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f upstream.

Currently, any NMI is falsely handled by a NMI handler of NMI watchdog
if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set.

For example, we use external NMI to make system panic to get crash
dump, but in this case, the external NMI is falsely handled do to the
issue.

This commit deals with the issue simply by ignoring CondChgd bit.

Here is explanation in detail.

On x86 NMI watchdog uses performance monitoring feature to
periodically signal NMI each time performance counter gets overflowed.

intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
owner of a given NMI by looking at overflow status bits in
MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
handles the given NMI as its own NMI.

The problem is that the intel_pmu_handle_irq() doesn't distinguish
CondChgd bit from other bits. Unlike the other status bits, CondChgd
bit doesn't represent overflow status for performance counters. Thus,
CondChgd bit cannot be thought of as a mark indicating a given NMI is
NMI watchdog's.

As a result, if CondChgd bit is set, any NMI is falsely handled by the
NMI handler of NMI watchdog. Also, if type of the falsely handled NMI
is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding
action is never performed until CondChgd bit is cleared.

I noticed this behavior on systems with Ivy Bridge processors: Intel
Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
in the beginning at boot. Then the CondChgd bit is immediately cleared
by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
0.

On the other hand, on older processors such as Nehalem, Xeon E7540,
CondChgd bit is not set in the beginning at boot.

I'm not sure about exact behavior of CondChgd bit, in particular when
this bit is set. Although I read Intel System Programmer's Manual to
figure out that, the descriptions I found are:

  In 18.9.1:

  "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to
   indicate changes to the state of performancmonitoring hardware"

  In Table 35-2 IA-32 Architectural MSRs

  63 CondChg: status bits of this register has changed.

These are different from the bahviour I see on the actual system as I
explained above.

At least, I think ignoring CondChgd bit should be enough for NMI
watchdog perspective.

Signed-off-by: HATAYAMA Daisuke 
Acked-by: Don Zickus 
Signed-off-by: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86, ioremap: Speed up check for RAM pages

2014-07-17T22:39:50+00:00

commit c81c8a1eeede61e92a15103748c23d100880cc8a upstream.

In __ioremap_caller() (the guts of ioremap), we loop over the range of
pfns being remapped and checks each one individually with page_is_ram().
For large ioremaps, this can be very slow.  For example, we have a
device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
seconds -- sometimes long enough to trigger the soft lockup detector!

Internally, page_is_ram() calls walk_system_ram_range() on a single
page.  Instead, we can make a single call to walk_system_ram_range()
from __ioremap_caller(), and do our further checks only for any RAM
pages that we find.  For the common case of MMIO, this saves an enormous
amount of work, since the range being ioremapped doesn't intersect
system RAM at all.

With this change, ioremap on our 256 GiB BAR takes less than 1 second.

Signed-off-by: Roland Dreier 
Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.org
Signed-off-by: H. Peter Anvin 
Signed-off-by: Greg Kroah-Hartman

powerpc/perf: Never program book3s PMCs with values >= 0x80000000

2014-07-17T22:39:50+00:00

commit f56029410a13cae3652d1f34788045c40a13ffc7 upstream.

We are seeing a lot of PMU warnings on POWER8:

    Can't find PMC that caused IRQ

Looking closer, the active PMC is 0 at this point and we took a PMU
exception on the transition from negative to 0. Some versions of POWER8
have an issue where they edge detect and not level detect PMC overflows.

A number of places program the PMC with (0x80000000 - period_left),
where period_left can be negative. We can either fix all of these or
just ensure that period_left is always >= 1.

This patch takes the second option.

Signed-off-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Greg Kroah-Hartman

powerpc/sysfs: Disable writing to PURR in guest mode

2014-07-09T17:51:22+00:00

commit d1211af3049f4c9c1d8d4eb8f8098cc4f4f0d0c7 upstream.

arch/powerpc/kernel/sysfs.c exports PURR with write permission.
This may be valid for kernel in phyp mode. But writing to
the file in guest mode causes crash due to a priviledge violation

Signed-off-by: Madhavan Srinivasan 
Signed-off-by: Benjamin Herrenschmidt 
[Backported to 3.4: adjust context]
Signed-off-by: Yijing Wang 
Signed-off-by: Greg Kroah-Hartman

powerpc/pseries: Duplicate dtl entries sometimes sent to userspace

2014-07-09T17:51:22+00:00

commit 84b073868b9d9e754ae48b828337633d1b386482 upstream.

When reading from the dispatch trace log (dtl) userspace interface, I
sometimes see duplicate entries. One example:

# hexdump -C dtl.out

00000000  07 04 00 0c 00 00 48 44  00 00 00 00 00 00 00 00
00000010  00 0c a0 b4 16 83 6d 68  00 00 00 00 00 00 00 00
00000020  00 00 00 00 10 00 13 50  80 00 00 00 00 00 d0 32

00000030  07 04 00 0c 00 00 48 44  00 00 00 00 00 00 00 00
00000040  00 0c a0 b4 16 83 6d 68  00 00 00 00 00 00 00 00
00000050  00 00 00 00 10 00 13 50  80 00 00 00 00 00 d0 32

The problem is in scan_dispatch_log() where we call dtl_consumer()
but bail out before incrementing the index.

To fix this I moved dtl_consumer() after the timebase comparison.

Signed-off-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
Cc: Yijing Wang 
Signed-off-by: Greg Kroah-Hartman

powerpc/pseries/lparcfg: Fix possible overflow are more than 1026

2014-07-09T17:51:21+00:00

commit 5676005acf26ab7e924a8438ea4746e47d405762 upstream.

need set '\0' for 'local_buffer'.

SPLPAR_MAXLENGTH is 1026, RTAS_DATA_BUF_SIZE is 4096. so the contents of
rtas_data_buf may truncated in memcpy.

if contents are really truncated.
  the splpar_strlen is more than 1026. the next while loop checking will
  not find the end of buffer. that will cause memory access violation.

Signed-off-by: Chen Gang 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Ben Hutchings 
Cc: Yijing Wang 
Signed-off-by: Greg Kroah-Hartman

powerpc: Restore registers on error exit from csum_partial_copy_generic()

2014-07-09T17:51:21+00:00

commit 8f21bd0090052e740944f9397e2be5ac7957ded7 upstream.

The csum_partial_copy_generic() function saves the PowerPC non-volatile
r14, r15, and r16 registers for the main checksum-and-copy loop.
Unfortunately, it fails to restore them upon error exit from this loop,
which results in silent corruption of these registers in the presumably
rare event of an access exception within that loop.

This commit therefore restores these register on error exit from the loop.

Signed-off-by: Paul E. McKenney 
Signed-off-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt 
[bwh: Backported to 3.2: register name macros use lower-case 'r']
Signed-off-by: Ben Hutchings 
Cc: Yijing Wang 
Signed-off-by: Greg Kroah-Hartman

powerpc: Don't Oops when accessing /proc/powerpc/lparcfg without hypervisor

2014-07-09T17:51:21+00:00

commit f5f6cbb61610b7bf9d9d96db9c3979d62a424bab upstream.

/proc/powerpc/lparcfg is an ancient facility (though still actively used)
which allows access to some informations relative to the partition when
running underneath a PAPR compliant hypervisor.

It makes no sense on non-pseries machines. However, currently, not only
can it be created on these if the kernel has pseries support, but accessing
it on such a machine will crash due to trying to do hypervisor calls.

In fact, it should also not do HV calls on older pseries that didn't have
an hypervisor either.

Finally, it has the plumbing to be a module but is a "bool" Kconfig option.

This fixes the whole lot by turning it into a machine_device_initcall
that is only created on pseries, and adding the necessary hypervisor
check before calling the H_GET_EM_PARMS hypercall

Signed-off-by: Benjamin Herrenschmidt 
[bwh: Backported to 3.2: lparcfg_cleanup() was a bit different]
Signed-off-by: Ben Hutchings 
Cc: Yijing Wang 
Signed-off-by: Greg Kroah-Hartman

powerpc/smp: Section mismatch from smp_release_cpus to __initdata spinning_secondaries

2014-07-09T17:51:21+00:00

commit 8246aca7058f3f2c2ae503081777965cd8df7b90 upstream.

the smp_release_cpus is a normal funciton and called in normal environments,
  but it calls the __initdata spinning_secondaries.
  need modify spinning_secondaries to match smp_release_cpus.

the related warning:
  (the linker report boot_paca.33377, but it should be spinning_secondaries)

-----------------------------------------------------------------------------

WARNING: arch/powerpc/kernel/built-in.o(.text+0x23176): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377
The function .smp_release_cpus() references
the variable __initdata boot_paca.33377.
This is often because .smp_release_cpus lacks a __initdata
annotation or the annotation of boot_paca.33377 is wrong.

WARNING: arch/powerpc/kernel/built-in.o(.text+0x231fe): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377
The function .smp_release_cpus() references
the variable __initdata boot_paca.33377.
This is often because .smp_release_cpus lacks a __initdata
annotation or the annotation of boot_paca.33377 is wrong.

-----------------------------------------------------------------------------

Signed-off-by: Chen Gang 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Ben Hutchings 
Cc: Yijing Wang 
Signed-off-by: Greg Kroah-Hartman

powerpc: Fix emulation of illegal instructions on PowerNV platform

2014-07-09T17:51:21+00:00

commit bf593907f7236e95698a76b7c7a2bbf8b1165327 upstream.

Normally, the kernel emulates a few instructions that are unimplemented
on some processors (e.g. the old dcba instruction), or privileged (e.g.
mfpvr).  The emulation of unimplemented instructions is currently not
working on the PowerNV platform.  The reason is that on these machines,
unimplemented and illegal instructions cause a hypervisor emulation
assist interrupt, rather than a program interrupt as on older CPUs.
Our vector for the emulation assist interrupt just calls
program_check_exception() directly, without setting the bit in SRR1
that indicates an illegal instruction interrupt.  This fixes it by
making the emulation assist interrupt set that bit before calling
program_check_interrupt().  With this, old programs that use no-longer
implemented instructions such as dcba now work again.

Signed-off-by: Paul Mackerras 
Signed-off-by: Benjamin Herrenschmidt 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
Cc: Yijing Wang 
Signed-off-by: Greg Kroah-Hartman