linux.git/arch/powerpc/lib/string.S, branch v4.11

powerpc: EX_TABLE macro for exception tables

2016-11-14T00:11:51+00:00

This macro is taken from s390, and allows more flexibility in
changing exception table format.

mpe: Put it in ppc_asm.h and only define one version using
stringify_in_c(). Add some empty definitions and headers to keep the
selftests happy.

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman

ppc: move exports to definitions

2016-08-08T03:50:09+00:00

Signed-off-by: Al Viro

powerpc: Align hot loops of some string functions

2016-06-14T03:58:25+00:00

Align the hot loops in our assembly implementation of strncpy(),
strncmp() and memchr().

Signed-off-by: Anton Blanchard 
Signed-off-by: Michael Ellerman

powerpc: Remove assembly versions of strcpy, strcat, strlen and strcmp

2016-06-14T03:58:25+00:00

A number of our assembly implementations of string functions do not
align their hot loops. I was going to align them manually, but I
realised that they are almost instruction for instruction
identical to what gcc produces, with the advantage that gcc does
align them.

In light of that, let's just remove the assembly versions.

Signed-off-by: Anton Blanchard 
Signed-off-by: Michael Ellerman

powerpc: Add 64bit optimised memcmp

2015-01-23T03:02:55+00:00

I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.
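
The two paths described above can be sketched in portable C. This is an
illustration only, not the assembly from the patch: memcmp_sketch(),
load_be64() and cmp_u64() are hypothetical names, and the sketch does
not attempt the modulo scheduling of the real loop. The words are
assembled big-endian so that an unsigned compare orders them the same
way a bytewise compare would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a big-endian 64-bit value from 8 bytes so
 * that integer ordering matches bytewise (memcmp) ordering. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Three-way compare of two unsigned words: -1, 0 or 1. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;

	/* Long path: at least 32 bytes and both pointers 8 byte aligned. */
	if (n >= 32 && (((uintptr_t)a | (uintptr_t)b) & 7) == 0) {
		while (n >= 32) {
			/* Unrolled: four 8 byte compares per iteration. */
			uint64_t a0 = load_be64(a),      b0 = load_be64(b);
			uint64_t a1 = load_be64(a + 8),  b1 = load_be64(b + 8);
			uint64_t a2 = load_be64(a + 16), b2 = load_be64(b + 16);
			uint64_t a3 = load_be64(a + 24), b3 = load_be64(b + 24);

			if (a0 != b0)
				return cmp_u64(a0, b0);
			if (a1 != b1)
				return cmp_u64(a1, b1);
			if (a2 != b2)
				return cmp_u64(a2, b2);
			if (a3 != b3)
				return cmp_u64(a3, b3);
			a += 32;
			b += 32;
			n -= 32;
		}
	}

	/* Short, unaligned or leftover bytes: one byte at a time. */
	for (; n; n--, a++, b++) {
		if (*a != *b)
			return *a < *b ? -1 : 1;
	}
	return 0;
}
```

On a big-endian machine load_be64() would collapse to a single 8 byte
load; it is written out bytewise here only so the sketch behaves the
same on any host.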

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rldicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard 
Signed-off-by: Michael Ellerman

powerpc: 64bit optimised __clear_user

2012-07-03T04:14:41+00:00

I noticed __clear_user high up in a profile of one of my RAID stress
tests. The testcase was doing a dd from /dev/zero which ends up
calling __clear_user.

__clear_user is basically a loop with a single 4 byte store which
is horribly slow. We can do much better by aligning the destination
and doing 32 bytes of 8 byte stores in a loop.
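
As a rough userspace illustration of that shape (align the destination,
then clear 32 bytes per iteration with 8 byte stores), consider the
sketch below. clear_sketch() and store_zero8() are hypothetical names.
The real __clear_user also has to cope with faults on the user address
and returns the number of bytes it could not clear; this sketch has no
fault handling and so always returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one 8 byte store of zero. memcpy() of a fixed
 * size is used to stay clear of aliasing problems; compilers normally
 * turn it into a single store. */
static void store_zero8(unsigned char *p)
{
	const uint64_t zero = 0;

	memcpy(p, &zero, sizeof(zero));
}

/* Returns the number of bytes NOT cleared (always 0 in this sketch). */
size_t clear_sketch(void *dst, size_t n)
{
	unsigned char *p = dst;

	/* Head: byte stores until the destination is 8 byte aligned. */
	while (n && ((uintptr_t)p & 7)) {
		*p++ = 0;
		n--;
	}

	/* Body: 32 bytes per iteration as four 8 byte stores. */
	while (n >= 32) {
		store_zero8(p);
		store_zero8(p + 8);
		store_zero8(p + 16);
		store_zero8(p + 24);
		p += 32;
		n -= 32;
	}

	/* Tail: whatever is left, one byte at a time. */
	while (n) {
		*p++ = 0;
		n--;
	}
	return 0;
}
```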

The following testcase was used to verify the patch:

http://ozlabs.org/~anton/junkcode/stress_clear_user.c

To show the improvement in performance I ran a dd from /dev/zero
to /dev/null on a POWER7 box:

Before:

# dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s

After:

# time dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s

Over 5x faster.

Signed-off-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt

powerpc: Use the new generic strncpy_from_user() and strnlen_user()

2012-05-28T04:00:07+00:00

This is much the same as for SPARC except that we can do the find_zero()
function more efficiently using the count-leading-zeroes instructions.
Tested on 32-bit and 64-bit PowerPC.
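
To illustrate the count-leading-zeroes idea, here is a minimal sketch in
C. It is not the kernel's word-at-a-time code: zero_byte_mask() and
find_zero_sketch() are hypothetical names, the word is assembled
big-endian by hand so the result is the same on any host, and
__builtin_clzll() is assumed to be available (GCC and Clang provide it).

```c
#include <assert.h>
#include <stdint.h>

/* Produce 0x80 in every byte position where v has a zero byte and 0x00
 * everywhere else. */
static uint64_t zero_byte_mask(uint64_t v)
{
	const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;

	return ~(((v & low7) + low7) | v | low7);
}

/* Index (0..7, in memory order) of the first zero byte in the 8 bytes
 * at p, or 8 if there is none. */
unsigned int find_zero_sketch(const unsigned char *p)
{
	uint64_t v = 0, mask;

	/* Big-endian view: the first byte in memory is most significant. */
	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];

	mask = zero_byte_mask(v);
	if (!mask)
		return 8;

	/* Each byte is 8 bits wide, so leading zero bits / 8 is the number
	 * of non-zero bytes before the first zero byte. */
	return (unsigned int)__builtin_clzll(mask) >> 3;
}
```

With the first byte in the most significant position, a single
count-leading-zeroes plus a shift locates the terminator, with no
multiply or table lookup.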

Signed-off-by: Paul Mackerras 
Acked-by: David S. Miller 
Signed-off-by: Linus Torvalds

powerpc: Fix string library functions

2010-05-21T07:31:08+00:00

The powerpc strncmp implementation does not correctly handle a zero
length, despite the claim in 0119536cd314ef95553604208c25bc35581f7f0a
(Add hand-coded assembly strcmp).

Additionally, all the length arguments are size_t, not int, so use
PPC_LCMPI and eq instead of cmpwi and le throughout.
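
The difference between the two checks can be modelled in C. The sketch
below is an illustration with hypothetical function names; it uses
uint64_t to stand in for a 64-bit size_t. A 32-bit signed "less than or
equal to zero" test looks only at the low word and treats its top bit as
a sign, so it misjudges some perfectly valid non-zero lengths.

```c
#include <assert.h>
#include <stdint.h>

/* Models "cmpwi len,0; ble": a signed compare of the low 32 bits only. */
int len_is_zero_cmpwi_le(uint64_t len)
{
	uint32_t low = (uint32_t)len;

	/* Signed <= 0 means the low word is zero or has its sign bit set. */
	return low == 0 || (low & 0x80000000u) != 0;
}

/* Models a full-width compare against zero followed by "beq". */
int len_is_zero_fixed(uint64_t len)
{
	return len == 0;
}
```

For example, a length of 0x100000000 has an all-zero low word, so the
first check wrongly reports it as empty while the second does not.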

Signed-off-by: Andreas Schwab 
Acked-by: Paul Mackerras 
Signed-off-by: Benjamin Herrenschmidt

powerpc: Fix handling of strncmp with zero len

2010-04-07T08:00:39+00:00

Commit 0119536c, which added the assembly version of strncmp to
powerpc, mentions that it adds two instructions to the version from
boot/string.S to allow it to handle len=0. Unfortunately, it doesn't
always return 0 when that is the case. The length is passed in r5, but
the return value is passed back in r3. In certain cases, this will
happen to work. Otherwise it will pass back the address of the first
string as the return value.

This patch lifts the len <= 0 handling code from memcpy to handle that
case.
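
In C terms the intended behaviour looks like the sketch below.
strncmp_sketch() is a hypothetical name, not the assembly from the
patch. The point is the explicit early return of 0 for a zero length,
rather than returning whatever happens to be left in the result
register.

```c
#include <assert.h>
#include <stddef.h>

int strncmp_sketch(const char *s1, const char *s2, size_t n)
{
	const unsigned char *a = (const unsigned char *)s1;
	const unsigned char *b = (const unsigned char *)s2;

	/* Zero length: the strings compare equal by definition. */
	if (n == 0)
		return 0;

	for (; n; n--, a++, b++) {
		if (*a != *b)
			return (int)*a - (int)*b;
		if (*a == 0)
			break;	/* both strings ended together */
	}
	return 0;
}
```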

Reported-by: Christian_Sellars@symantec.com
Signed-off-by: Jeff Mahoney 
CC: 
Signed-off-by: Benjamin Herrenschmidt

powerpc: Use PPC_LONG and PPC_LONG_ALIGN in lib/string.S

2008-07-22T00:39:35+00:00

Replace ifdef clutter with the PPC_LONG and PPC_LONG_ALIGN macros
for readability.

No change to the generated code.

Signed-off-by: Michael Ellerman 
Signed-off-by: Benjamin Herrenschmidt