linux.git/arch/powerpc/lib/Makefile, branch v5.9

powerpc/64s: Implement queued spinlocks and rwlocks

2020-07-26T14:01:23+00:00

These have shown significantly improved performance and fairness when
spinlock contention is moderate to high on very large systems.
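A minimal sketch of the queueing idea behind these locks, assuming C11
atomics: a textbook MCS lock in which each waiter spins on its own node
and ownership is handed over in FIFO order. This is not the kernel's
qspinlock, which builds on the same idea but packs a locked byte, a
pending bit and an encoded queue tail into one 32-bit word with per-CPU
queue nodes. All names below (mcs_lock, mcs_node, mcs_acquire, ...) are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-waiter queue node: each waiter spins only on its own 'locked' flag. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

/* The lock itself is just a pointer to the tail of the waiter queue. */
struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_init(struct mcs_lock *lock)
{
    atomic_init(&lock->tail, NULL);
}

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Uncontended path: one atomic exchange; an empty queue means we own it. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;

    /* Contended path: link in behind the old tail, then spin on our own
     * node until the previous owner hands the lock over (FIFO order). */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ; /* a real kernel would add cpu_relax() here */
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *next =
        atomic_load_explicit(&me->next, memory_order_acquire);

    if (next == NULL) {
        /* No known successor: try to reset the queue to empty. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected,
                NULL, memory_order_release, memory_order_relaxed))
            return;
        /* A successor swapped itself in but has not linked yet; wait. */
        while ((next = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Hand ownership directly to the next waiter in the queue. */
    atomic_store_explicit(&next->locked, false, memory_order_release);
}
```

Because each waiter spins on a private node, contention does not make
every CPU hammer the same cache line, and the direct FIFO hand-off is the
usual explanation for the fairness improvement.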

With this series, including subsequent patches, a stress test such as
same-file open/close from all CPUs on a 16-socket, 1536-thread POWER9
gets big speedups: 11620 op/s aggregate with simple spinlocks vs
384158 op/s with queued spinlocks (33x faster), while the difference in
throughput between the fastest and slowest thread goes from 7x to 1.4x.

Thanks to the fast path being identical in terms of atomics and
barriers (after a subsequent optimisation patch), single threaded
performance is not changed (no measurable difference).

On smaller systems, performance and fairness seem to be generally
improved. Using dbench on tmpfs as a test (which starts to run into
kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was
tested with bare metal and KVM guest configurations. Results can be
found here:

https://github.com/linuxppc/issues/issues/305#issuecomment-663487453

Observations are:

- Queued spinlocks perform the same as simple spinlocks when contention
  is insignificant, as expected and as measured with microbenchmarks.

- When there is contention, on bare metal queued spinlocks have better
  throughput and max latency at all points.

- When virtualised, queued spinlocks are slightly worse approaching
  peak throughput, but have significantly better throughput and max
  latency at all points beyond peak, until queued spinlock maximum
  latency rises once clients reach 2x vCPUs.

The regressions haven't been analysed in depth yet, and there are a lot
of things that can be tuned, particularly the paravirtualised locking,
but the numbers already look like a good net win even on relatively
small systems.

Signed-off-by: Nicholas Piggin 
Acked-by: Peter Zijlstra (Intel) 
Acked-by: Waiman Long 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200724131423.1362108-4-npiggin@gmail.com

powerpc: Test prefixed code patching

2020-05-18T14:11:02+00:00

Expand the code-patching self-tests to include tests for patching
prefixed instructions.

Signed-off-by: Jordan Niethe 
[mpe: Use CONFIG_PPC64 not __powerpc64__]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200506034050.24806-25-jniethe5@gmail.com

powerpc: Add a probe_user_read_inst() function

2020-05-18T14:10:37+00:00

Introduce a probe_user_read_inst() function to use in cases where
probe_user_read() is used for getting an instruction. This will be
more useful for prefixed instructions, which span two 32-bit words.
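A minimal userspace sketch of why a dedicated helper helps with prefixed
instructions, assuming the Power ISA v3.1 rule that a word with primary
opcode 1 is a prefix followed by a 32-bit suffix. The names (struct inst,
read_inst) and the buffer standing in for user memory are hypothetical;
this is not the kernel's implementation. Following the maintainer's note
on this commit, *inst is only written once every read has succeeded.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical instruction type: a word plus an optional suffix. */
struct inst {
    uint32_t val;
    uint32_t suffix;
};

/* Primary opcode is the top 6 bits; opcode 1 marks a prefix word. */
static bool inst_is_prefix(uint32_t word)
{
    return (word >> 26) == 1;
}

/* 'mem'/'len' stand in for user memory; reading past 'len' stands in for a
 * faulting user access and returns -EFAULT. */
static int read_inst(struct inst *inst, const uint8_t *mem, size_t len,
                     size_t off)
{
    uint32_t val, suffix = 0;

    if (off > len || len - off < sizeof(val))
        return -EFAULT;
    memcpy(&val, mem + off, sizeof(val));

    if (inst_is_prefix(val)) {
        if (len - off < 2 * sizeof(val))
            return -EFAULT;          /* suffix unreadable: *inst untouched */
        memcpy(&suffix, mem + off + sizeof(val), sizeof(suffix));
    }

    /* Publish the result only after all reads succeeded. */
    inst->val = val;
    inst->suffix = suffix;
    return 0;
}
```

A caller that previously fetched one word with probe_user_read() would
otherwise need to repeat the prefix check and second read itself.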

Signed-off-by: Jordan Niethe 
Reviewed-by: Alistair Popple 
[mpe: Don't write to *inst on error, fold in __user annotations]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200506034050.24806-14-jniethe5@gmail.com

powerpc/memcpy: Add memcpy_mcsafe for pmem

2019-08-21T12:23:48+00:00

The pmem infrastructure uses memcpy_mcsafe() in the pmem layer so that a
machine check exception encountered during the copy is converted into a
return value indicating failure. The return value is the number of bytes
remaining to be copied.
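A minimal model of the return-value contract described above, assuming a
hypothetical 'poison_off' parameter that marks the first source byte that
would raise a machine check. Real hardware errors cannot be simulated this
way, and the actual assembly routine may detect the fault at a different
point in its copy loop; the model only shows what "bytes remaining" means
to a caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: copies up to the poisoned byte, then "faults".
 * Pass poison_off >= len to model a copy with no machine check. */
static size_t copy_mcsafe_model(void *dst, const void *src, size_t len,
                                size_t poison_off)
{
    size_t ok = poison_off < len ? poison_off : len;

    memcpy(dst, src, ok);   /* bytes before the bad location are copied */
    return len - ok;        /* 0 on success, else bytes NOT copied */
}
```

A caller treats a non-zero return as an I/O error and can use
len - ret to see how much of the destination is valid.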

This patch largely borrows from the copyuser_power7 logic and does not add
the VMX optimizations, mostly to keep the patch simple. If needed, those
optimizations can be folded in.

Signed-off-by: Balbir Singh 
[arbab@linux.ibm.com: Added symbol export]
Co-developed-by: Santosh Sivaraj 
Signed-off-by: Santosh Sivaraj 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190820081352.8641-7-santosh@fossix.org

powerpc/32: activate ARCH_HAS_PMEM_API and ARCH_HAS_UACCESS_FLUSHCACHE

2019-08-05T08:53:04+00:00

PPC32 also has flush_dcache_range(), so it can also support
ARCH_HAS_PMEM_API and ARCH_HAS_UACCESS_FLUSHCACHE without changes.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/a682a2f9db308c5cfe77e45aa3352e41bc9f4e33.1564554634.git.christophe.leroy@c-s.fr

powerpc/lib: only build ldstfp.o when CONFIG_PPC_FPU is set

2019-05-28T02:08:11+00:00

The entire code in ldstfp.o is enclosed in #ifdef CONFIG_PPC_FPU,
so there is no point in building it when this config is not selected.

Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined")
Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman

powerpc/lib: fix redundant inclusion of quad.o

2019-05-28T02:08:11+00:00

quad.o is only for PPC64 and is already included in obj64-y,
so it doesn't have to be in obj-y.

Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman

powerpc: disable KASAN instrumentation on early/critical files.

2019-05-02T15:20:26+00:00

All files containing functions run before kasan_early_init() is called
must have KASAN instrumentation disabled.

For those files, branch profiling also has to be disabled, otherwise
each if () generates a call to ftrace_likely_update().

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman

powerpc: prepare string/mem functions for KASAN

2019-05-02T15:20:25+00:00

CONFIG_KASAN implements wrappers for memcpy(), memmove() and memset().
Those wrappers perform the verification and then call __memcpy(),
__memmove() and __memset() respectively. Architectures are therefore
expected to rename their optimised functions accordingly.

For files on which KASAN is inhibited, #defines are used to allow
them to directly call optimised versions of the functions without
going through the KASAN wrappers.
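A minimal userspace sketch of the wrapper/rename pattern described above.
All names are hypothetical stand-ins: arch_memcpy() plays the role of the
renamed optimised __memcpy(), kasan_memcpy() plays the instrumented
memcpy() wrapper, check_range() replaces the real shadow-memory check with
a simple counter, and FILE_KASAN_DISABLED models a file built with KASAN
inhibited.

```c
#include <stddef.h>

/* Counts how many ranges went through the (pretend) KASAN check. */
static unsigned long kasan_checks;

/* Hypothetical check: real KASAN consults shadow memory and reports bad
 * accesses; here we only count calls. */
static void check_range(const void *addr, size_t size)
{
    (void)addr;
    (void)size;
    kasan_checks++;
}

/* Stand-in for the arch's optimised routine, renamed as KASAN expects. */
static void *arch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Instrumented wrapper: verify both ranges, then call the optimised copy. */
static void *kasan_memcpy(void *dst, const void *src, size_t n)
{
    check_range(src, n);
    check_range(dst, n);
    return arch_memcpy(dst, src, n);
}

/* In a file with KASAN inhibited, a #define routes calls straight to the
 * optimised version, skipping the wrapper and its checks. */
#ifndef FILE_KASAN_DISABLED
#define FILE_KASAN_DISABLED 1
#endif
#if FILE_KASAN_DISABLED
#define early_copy(d, s, n) arch_memcpy(d, s, n)
#else
#define early_copy(d, s, n) kasan_memcpy(d, s, n)
#endif
```

Routing through a macro rather than a different function name means the
uninstrumented file's source does not have to change call sites.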

See commit 393f203f5fd5 ("x86_64: kasan: add interceptors for
memset/memmove/memcpy functions") for details.

Other string/mem functions do not (yet) have KASAN wrappers, so we
have to fall back to the generic versions when KASAN is active;
otherwise KASAN checks would be skipped.

Signed-off-by: Christophe Leroy 
[mpe: Fixups to keep selftests working]
Signed-off-by: Michael Ellerman

powerpc: sstep: Add tests for compute type instructions

2019-02-23T10:04:31+00:00

This enhances the current selftest framework for validating
the in-kernel instruction emulation infrastructure by adding
support for compute-type instructions, i.e. integer ALU-based
instructions. Originally, this framework was limited to
testing load and store instructions.

While most of the GPRs can be validated, support for SPRs is
limited to LR, CR and XER for now.

When writing the test cases, one must ensure that neither the Stack
Pointer (GPR1) nor the Thread Pointer (GPR13) is touched by any
means, as these are vital non-volatile registers.
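A minimal sketch of what validating one compute-type instruction involves,
assuming the XO-form encoding of add (primary opcode 31, extended opcode
266). The emulator, register struct and helper names are hypothetical and
cover only GPRs and plain add (no OE/Rc, no SPRs); the real selftest
compares the in-kernel emulation against actually executing the
instruction. The helper also rejects test cases that would write r1 or
r13, per the rule above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal register file: GPRs only (LR, CR and XER omitted). */
struct regs {
    uint64_t gpr[32];
};

/* XO-form field extraction from a 32-bit instruction word. */
#define PRIMARY_OP(i) ((i) >> 26)
#define RT(i) (((i) >> 21) & 0x1f)
#define RA(i) (((i) >> 16) & 0x1f)
#define RB(i) (((i) >> 11) & 0x1f)
#define OE(i) (((i) >> 10) & 1)
#define XO(i) (((i) >> 1) & 0x1ff)
#define RC(i) ((i) & 1)

/* Encode 'add RT,RA,RB' (opcode 31, XO 266, OE = Rc = 0). */
static uint32_t encode_add(unsigned rt, unsigned ra, unsigned rb)
{
    return (31u << 26) | (rt << 21) | (ra << 16) | (rb << 11) | (266u << 1);
}

/* Test cases must never write the stack pointer or thread pointer. */
static bool safe_target(unsigned r)
{
    return r != 1 && r != 13;
}

/* Emulate plain 'add' only; returns false for anything else. */
static bool emulate_add(struct regs *r, uint32_t insn)
{
    if (PRIMARY_OP(insn) != 31 || XO(insn) != 266 || OE(insn) || RC(insn))
        return false;
    if (!safe_target(RT(insn)))
        return false;
    /* For add, RA = 0 means GPR0, not the literal value 0. */
    r->gpr[RT(insn)] = r->gpr[RA(insn)] + r->gpr[RB(insn)];
    return true;
}
```

A test then sets up input registers, emulates the instruction and checks
the destination against the expected result, e.g. add r3,r4,r5 encodes as
0x7c642a14.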

Signed-off-by: Sandipan Das 
[mpe: Use patch_site for the code patching]
Signed-off-by: Michael Ellerman