linux.git/arch/powerpc/lib/string_64.S, branch v4.11

powerpc/64: Fix naming of cache block vs. cache line

2017-02-06T08:46:04+00:00

In a number of places we called "cache line size" what is actually
the cache block size, which in the powerpc architecture means the
effective size to use with cache management instructions (it can
be different from the actual cache line size).

We fix the naming across the board and properly retrieve both
pieces of information when available in the device-tree.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Michael Ellerman

powerpc: EX_TABLE macro for exception tables

2016-11-14T00:11:51+00:00

This macro is taken from s390, and allows more flexibility in
changing exception table format.

mpe: Put it in ppc_asm.h and only define one version using
stringify_in_c(). Add some empty definitions and headers to keep the
selftests happy.

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman

ppc: move exports to definitions

2016-08-08T03:50:09+00:00

Signed-off-by: Al Viro

powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()

2014-06-05T03:20:41+00:00

__clear_user and copy_page load from the TOC and are also exported
to modules. This means we have to use _GLOBAL_TOC() so that we
create the global entry point that sets up the TOC.

Signed-off-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt

powerpc: Optimise the 64bit optimised __clear_user

2012-07-03T04:14:48+00:00

I blame Mikey for this. He elevated my slightly dubious testcase:

# dd if=/dev/zero of=/dev/null bs=1M count=10000

to benchmark status. And naturally we need to be number 1 at creating
zeros. So let's improve __clear_user some more.

As Paul suggests we can use dcbz for large lengths. This patch gets
the destination cacheline aligned then uses dcbz on whole cachelines.
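
As a rough illustration of the align-then-dcbz idea, here is a minimal
user-space sketch in C. It is not the kernel's assembly: zero_block() is a
hypothetical stand-in for one dcbz, the block size is passed in by the caller
(128 bytes is only an assumed example value), and the "large lengths"
threshold and user-access fault handling are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a single dcbz: zero one whole, aligned block. */
static void zero_block(unsigned char *p, size_t block)
{
    memset(p, 0, block);
}

/*
 * Zero n bytes at dst in three phases:
 *   1. ordinary stores until dst is block aligned,
 *   2. one zero_block() per whole block,
 *   3. ordinary stores for the tail.
 * Returns how many whole blocks went through phase 2.
 */
static size_t clear_blocks_sketch(void *dst, size_t n, size_t block)
{
    /* The alignment mask below only works for power-of-two block sizes. */
    assert(block != 0 && (block & (block - 1)) == 0);

    unsigned char *p = dst;

    /* Bytes needed to reach the next block boundary (0 if already aligned). */
    size_t head = (size_t)(-(uintptr_t)p & (block - 1));
    if (head > n)
        head = n;
    memset(p, 0, head);
    p += head;
    n -= head;

    size_t blocks = 0;
    while (n >= block) {
        zero_block(p, block);
        p += block;
        n -= block;
        blocks++;
    }

    /* Fewer than one block left. */
    memset(p, 0, n);
    return blocks;
}
```

For a destination 5 bytes past a 128-byte boundary and a length of 600, the
sketch clears 123 bytes to reach alignment, three whole blocks, and a 93-byte
tail.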

Before:
10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s

After:
10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s

39 GB/s, a new record.

Signed-off-by: Anton Blanchard 
Tested-by: Olof Johansson 
Acked-by: Olof Johansson 
Signed-off-by: Benjamin Herrenschmidt

powerpc: 64bit optimised __clear_user

2012-07-03T04:14:41+00:00

I noticed __clear_user high up in a profile of one of my RAID stress
tests. The testcase was doing a dd from /dev/zero which ends up
calling __clear_user.

__clear_user is basically a loop with a single 4 byte store, which
is horribly slow. We can do much better by aligning the destination
and doing 32 bytes of 8 byte stores in a loop.
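
The shape of that loop can be sketched in portable C. This is only an
illustration under assumptions, not the assembly in string_64.S:
clear_sketch() and store_zero64() are hypothetical names, and the exception
table handling the real routine needs for faulting user addresses is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 8-byte store of zero; memcpy keeps this free of aliasing problems. */
static void store_zero64(unsigned char *p)
{
    const uint64_t zero = 0;
    memcpy(p, &zero, sizeof zero);
}

/* Hypothetical helper: zero n bytes starting at dst. */
static void clear_sketch(void *dst, size_t n)
{
    unsigned char *p = dst;

    /* Byte stores until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 7) != 0) {
        *p++ = 0;
        n--;
    }
    assert(n == 0 || ((uintptr_t)p & 7) == 0);

    /* Main loop: four 8-byte stores, 32 bytes per iteration. */
    while (n >= 32) {
        store_zero64(p);
        store_zero64(p + 8);
        store_zero64(p + 16);
        store_zero64(p + 24);
        p += 32;
        n -= 32;
    }

    /* Up to three remaining 8-byte stores. */
    while (n >= 8) {
        store_zero64(p);
        p += 8;
        n -= 8;
    }

    /* Up to seven trailing bytes. */
    while (n > 0) {
        *p++ = 0;
        n--;
    }
}
```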

The following testcase was used to verify the patch:

http://ozlabs.org/~anton/junkcode/stress_clear_user.c

To show the improvement in performance I ran a dd from /dev/zero
to /dev/null on a POWER7 box:

Before:

# dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s

After:

# time dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s

Over 5x faster.
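
The figures can be cross-checked from the byte count and elapsed times that
dd printed. A small sketch, assuming dd's GB means 10^9 bytes; the helper
names are made up for illustration:

```c
#include <assert.h>

/* Throughput in GB/s, with 1 GB taken as 10^9 bytes. */
static double gb_per_second(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1e9;
}

/* Ratio of elapsed times for the same amount of work. */
static double speedup(double before_seconds, double after_seconds)
{
    assert(after_seconds > 0.0);
    return before_seconds / after_seconds;
}
```

With 10485760000 bytes, 3.72379 s gives roughly 2.8 GB/s and 0.728318 s gives
roughly 14.4 GB/s; the ratio of the two times is about 5.1.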

Signed-off-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt