linux.git/arch/powerpc/include/asm/processor.h, branch v5.1

powerpc: regain entire stack space

2019-02-23T11:31:40+00:00

thread_info is no longer in the stack, so the entire stack
can now be used.

There is also no longer any risk of corrupting task_cpu(p) with a
stack overflow, so the patch removes the test.

When doing this, an explicit test for a NULL stack pointer is
needed in validate_sp(), as it is no longer implicitly covered
by the sizeof(thread_info) gap.
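
To illustrate why the NULL test becomes necessary, here is a minimal user-space sketch. It is not the kernel's validate_sp(); the name sketch_validate_sp, the SKETCH_THREAD_SIZE value, and the stack_base parameter are hypothetical stand-ins. Without a reserved gap at the bottom of the stack, a zero sp would pass the range check whenever the stack base is itself zero (for example, an unallocated stack), so it has to be rejected explicitly.

```c
#include <stdbool.h>

/* Hypothetical stack size, chosen only for this sketch. */
#define SKETCH_THREAD_SIZE 16384UL

/*
 * Return true if [sp, sp + nbytes) lies within the stack starting at
 * stack_base. There is no reserved gap at the bottom of the stack, so
 * a zero sp is rejected explicitly instead of by the range check.
 */
bool sketch_validate_sp(unsigned long sp, unsigned long stack_base,
                        unsigned long nbytes)
{
    if (sp == 0)
        return false;               /* explicit NULL stack pointer test */
    if (nbytes > SKETCH_THREAD_SIZE)
        return false;               /* frame cannot fit at all */

    return sp >= stack_base &&
           sp <= stack_base + SKETCH_THREAD_SIZE - nbytes;
}
```

With stack_base equal to 0 and the first test removed, sp == 0 would satisfy both comparisons, which is the case the commit message describes.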

Meanwhile, since the previous patch, pointers to the stacks are no
longer pointers to thread_info, so this patch changes them
to void *.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman

powerpc: Use linux/thread_info.h in processor.h

2019-02-23T11:31:40+00:00

When we enable THREAD_INFO_IN_TASK we will remove our definition of
current_thread_info(). Instead it will come from linux/thread_info.h.

So switch processor.h to include the latter, so that it can continue
to find current_thread_info().

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman

powerpc: Use sizeof(struct thread_info) in INIT_SP_LIMIT

2019-02-23T11:31:40+00:00

Currently INIT_SP_LIMIT uses sizeof(init_thread_info), but that symbol
won't exist when we enable THREAD_INFO_IN_TASK. So just use sizeof on
the type, which gives the same value but will continue to work.
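
The equivalence relied on here can be shown with a small sketch. The struct layout and every sketch_ name below are made up for illustration and do not reflect the real thread_info or INIT_SP_LIMIT definitions; the point is only that sizeof applied to an object and to its type yields the same value, while the type-based form does not need the object to exist.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct thread_info. */
struct sketch_thread_info {
    unsigned long flags;
    int preempt_count;
    int cpu;
};

/* Hypothetical stand-in for the init_thread_info object. */
struct sketch_thread_info sketch_init_thread_info;

/* Depends on the object: breaks if the object is removed. */
#define SKETCH_LIMIT_FROM_OBJECT(base) \
    ((base) + sizeof(sketch_init_thread_info))

/* Depends only on the type: keeps working without the object. */
#define SKETCH_LIMIT_FROM_TYPE(base) \
    ((base) + sizeof(struct sketch_thread_info))
```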

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman

powerpc: Avoid circular header inclusion in mmu-hash.h

2019-02-23T11:31:39+00:00

When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes
asm/current.h. This generates a circular dependency. To avoid that,
asm/processor.h shall not be included in mmu-hash.h.

In order to do that, this patch moves all the TASK_SIZE related
constants into a new header called asm/task_size_64/32.h, which can
then be included in mmu-hash.h directly.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out all the TASK_SIZE constants not just 64-bit ones]
Signed-off-by: Michael Ellerman

powerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS

2019-02-21T13:10:16+00:00

When calling RTAS, the stack pointer is stored in SPRN_SPRG2
in order to be able to restore it in case of machine check in RTAS.

As machine check is not a performance critical path, this patch
frees SPRN_SPRG2 by using a field in the thread struct instead.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman

treewide: remove current_text_addr

2018-10-31T15:54:12+00:00

Prefer _THIS_IP_ defined in linux/kernel.h.

Most definitions of current_text_addr were the same as _THIS_IP_, but
a few archs had inline assembly instead.

This patch removes the final call site of current_text_addr, making all
of the definitions dead code.
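
For readers unfamiliar with the technique, the following sketch shows how a _THIS_IP_-style macro can obtain the current code address without inline assembly. It relies on GNU C extensions (statement expressions, local labels, and the && label-address operator), so it assumes GCC or Clang. The names SKETCH_THIS_IP and sketch_where_am_i are hypothetical and exist only for this illustration.

```c
/*
 * Evaluate to the address of a label placed at the point of use.
 * Uses GNU C extensions, so this needs GCC or Clang.
 */
#define SKETCH_THIS_IP \
    ({ __label__ here; here: (unsigned long)&&here; })

/* Return an address located inside this function's code. */
unsigned long sketch_where_am_i(void)
{
    return SKETCH_THIS_IP;
}
```

Because the address is taken from a label, the same source works on any architecture the compiler supports, which is what makes per-arch inline assembly unnecessary.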

[akpm@linux-foundation.org: fix arch/csky/include/asm/processor.h]
Link: http://lkml.kernel.org/r/20180911182413.180715-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers 
Cc: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

powerpc/64s/hash: Add a SLB preload cache

2018-10-14T07:04:09+00:00

When switching processes, currently all user SLBEs are cleared, and a
few (exec_base, pc, and stack) are preloaded. In trivial testing with
small apps, this tends to miss the heap and low 256MB segments, and it
will also miss commonly accessed segments on large memory workloads.

Add a simple round-robin preload cache that just inserts the last SLB
miss into the head of the cache and preloads those at context switch
time. Every 256 context switches, the oldest entry is removed from the
cache to shrink the cache and require fewer slbmte if they are unused.
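
The scheme described above can be sketched as a small ring buffer. This is an illustration under stated assumptions, not the kernel implementation: the capacity of 16, the use of a 32-bit segment identifier, and all sketch_ names are hypothetical. Only the 256-switch ageing period comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PRELOAD_NR 16U   /* assumed capacity, not from the text */
#define SKETCH_AGE_PERIOD 256U  /* ageing period stated in the text */

struct sketch_preload_cache {
    uint32_t esid[SKETCH_PRELOAD_NR]; /* recently missed segments */
    unsigned int tail;                /* index of the oldest entry */
    unsigned int nr;                  /* number of valid entries */
    unsigned int switches;            /* context switches seen */
};

/* Return true if the segment is already cached. */
bool sketch_contains(const struct sketch_preload_cache *c, uint32_t esid)
{
    for (unsigned int i = 0; i < c->nr; i++)
        if (c->esid[(c->tail + i) % SKETCH_PRELOAD_NR] == esid)
            return true;
    return false;
}

/* Record an SLB miss at the head; return true if newly inserted. */
bool sketch_record_miss(struct sketch_preload_cache *c, uint32_t esid)
{
    if (sketch_contains(c, esid))
        return false;

    unsigned int head = (c->tail + c->nr) % SKETCH_PRELOAD_NR;
    c->esid[head] = esid;
    if (c->nr == SKETCH_PRELOAD_NR)
        c->tail = (c->tail + 1) % SKETCH_PRELOAD_NR; /* oldest overwritten */
    else
        c->nr++;
    return true;
}

/*
 * Called at context switch: every SKETCH_AGE_PERIOD switches, drop the
 * oldest entry. Returns how many entries would be preloaded.
 */
unsigned int sketch_context_switch(struct sketch_preload_cache *c)
{
    if (++c->switches % SKETCH_AGE_PERIOD == 0 && c->nr > 0) {
        c->tail = (c->tail + 1) % SKETCH_PRELOAD_NR;
        c->nr--;
    }
    return c->nr;
}
```

A caller would zero-initialise the structure, call sketch_record_miss() from the miss path, and walk the valid entries after sketch_context_switch() to decide which segments to preload.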

Much more could go into this, including into the SLB entry reclaim
side to track some LRU information etc, which would require a study of
large memory workloads. But this is a simple thing we can do now that
is an obvious win for common workloads.

With the full series, process switching speed on the context_switch
benchmark on POWER9/hash (with kernel speculation security measures
disabled) increases from 140K/s to 178K/s (27%).

POWER8 does not change much (within 1%); it is unclear why it does not
see a big gain like POWER9.

Booting to busybox init with 256MB segments has SLB misses go down
from 945 to 69, and with 1T segments 900 to 21. These could almost all
be eliminated by preloading a bit more carefully with ELF binary
loading.

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman

powerpc/64: Interrupts save PPR on stack rather than thread_struct

2018-10-14T07:04:09+00:00

PPR is the odd register out when it comes to interrupt handling: it is
saved in current->thread.ppr while all others are saved on the stack.

The difficulty with this is that accessing thread.ppr can cause an SLB
fault, but the change that implemented the SLB fault handler in C
assumed that the normal exception entry handlers would not cause an
SLB fault.

Fix this by allocating room in the interrupt stack to save PPR.

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman

Revert "convert SLB miss handlers to C" and subsequent commits

2018-10-03T05:32:49+00:00

This reverts commits:
  5e46e29e6a97 ("powerpc/64s/hash: convert SLB miss handlers to C")
  8fed04d0f6ae ("powerpc/64s/hash: remove user SLB data from the paca")
  655deecf67b2 ("powerpc/64s/hash: SLB allocation status bitmaps")
  2e1626744e8d ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup")
  89ca4e126a3f ("powerpc/64s/hash: Add a SLB preload cache")

This series had a few bugs, and the fixes are not all trivial. So
revert most of it for now.

Signed-off-by: Michael Ellerman

powerpc/64s/hash: Add a SLB preload cache

2018-09-19T12:01:56+00:00

When switching processes, currently all user SLBEs are cleared, and a
few (exec_base, pc, and stack) are preloaded. In trivial testing with
small apps, this tends to miss the heap and low 256MB segments, and it
will also miss commonly accessed segments on large memory workloads.

Add a simple round-robin preload cache that just inserts the last SLB
miss into the head of the cache and preloads those at context switch
time. Every 256 context switches, the oldest entry is removed from the
cache to shrink the cache and require fewer slbmte if they are unused.

Much more could go into this, including into the SLB entry reclaim
side to track some LRU information etc, which would require a study of
large memory workloads. But this is a simple thing we can do now that
is an obvious win for common workloads.

With the full series, process switching speed on the context_switch
benchmark on POWER9/hash (with kernel speculation security measures
disabled) increases from 140K/s to 178K/s (27%).

POWER8 does not change much (within 1%); it is unclear why it does not
see a big gain like POWER9.

Booting to busybox init with 256MB segments has SLB misses go down
from 945 to 69, and with 1T segments 900 to 21. These could almost all
be eliminated by preloading a bit more carefully with ELF binary
loading.

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman