<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/arch/powerpc/lib/string.S, branch v4.14.331</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>powerpc: Fix __clear_user() with KUAP enabled</title>
<updated>2020-11-22T09:00:23+00:00</updated>
<author>
<name>Andrew Donnellan</name>
<email>ajd@linux.ibm.com</email>
</author>
<published>2020-11-19T23:52:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6f830a304bf5fb768395642e49fb4e9375469e48'/>
<id>6f830a304bf5fb768395642e49fb4e9375469e48</id>
<content type='text'>
commit 61e3acd8c693a14fc69b824cb5b08d02cb90a6e7 upstream.

The KUAP implementation adds calls in clear_user() to enable and
disable access to userspace memory. However, it doesn't add these to
__clear_user(), which is used in the ptrace regset code.

As there's only one direct user of __clear_user() (the regset code),
and the time taken to set the AMR for KUAP purposes is going to
dominate the cost of a quick access_ok(), there's not much point
having a separate path.

Rename __clear_user() to __arch_clear_user(), and make __clear_user()
just call clear_user().

Reported-by: syzbot+f25ecf4b2982d8c7a640@syzkaller-ppc64.appspotmail.com
Reported-by: Daniel Axtens &lt;dja@axtens.net&gt;
Suggested-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection")
Signed-off-by: Andrew Donnellan &lt;ajd@linux.ibm.com&gt;
[mpe: Use __arch_clear_user() for the asm version like arm64 &amp; nds32]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20191209132221.15328-1-ajd@linux.ibm.com
Signed-off-by: Daniel Axtens &lt;dja@axtens.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 61e3acd8c693a14fc69b824cb5b08d02cb90a6e7 upstream.

The KUAP implementation adds calls in clear_user() to enable and
disable access to userspace memory. However, it doesn't add these to
__clear_user(), which is used in the ptrace regset code.

As there's only one direct user of __clear_user() (the regset code),
and the time taken to set the AMR for KUAP purposes is going to
dominate the cost of a quick access_ok(), there's not much point
having a separate path.

Rename __clear_user() to __arch_clear_user(), and make __clear_user()
just call clear_user().

Reported-by: syzbot+f25ecf4b2982d8c7a640@syzkaller-ppc64.appspotmail.com
Reported-by: Daniel Axtens &lt;dja@axtens.net&gt;
Suggested-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection")
Signed-off-by: Andrew Donnellan &lt;ajd@linux.ibm.com&gt;
[mpe: Use __arch_clear_user() for the asm version like arm64 &amp; nds32]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20191209132221.15328-1-ajd@linux.ibm.com
Signed-off-by: Daniel Axtens &lt;dja@axtens.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/lib: Adjust .balign inside string functions for PPC32</title>
<updated>2018-08-03T05:50:24+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2018-05-18T13:01:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=47b3561450178b538b99ba26a4989e0f846f2e8e'/>
<id>47b3561450178b538b99ba26a4989e0f846f2e8e</id>
<content type='text'>
[ Upstream commit 1128bb7813a896bd608fb622eee3c26aaf33b473 ]

commit 87a156fb18fe1 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless
nops

A simple benchmark on an 8xx calling 100000x a memchr() that
matches the first byte runs in 41668 TB ticks before this patch
and in 35986 TB ticks after this patch. So this gives an
improvement of approx 10%

Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
after this patch, so regardless on the number of loops, removing
those useless nops improves the test by 5683 TB ticks.

Fixes: 87a156fb18fe1 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 1128bb7813a896bd608fb622eee3c26aaf33b473 ]

commit 87a156fb18fe1 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless
nops

A simple benchmark on an 8xx calling 100000x a memchr() that
matches the first byte runs in 41668 TB ticks before this patch
and in 35986 TB ticks after this patch. So this gives an
improvement of approx 10%

Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
after this patch, so regardless on the number of loops, removing
those useless nops improves the test by 5683 TB ticks.

Fixes: 87a156fb18fe1 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: EX_TABLE macro for exception tables</title>
<updated>2016-11-14T00:11:51+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2016-10-13T05:42:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=24bfa6a9e0d4fe414dfc4ad06c93e10c4c37194e'/>
<id>24bfa6a9e0d4fe414dfc4ad06c93e10c4c37194e</id>
<content type='text'>
This macro is taken from s390, and allows more flexibility in
changing exception table format.

mpe: Put it in ppc_asm.h and only define one version using
stringinfy_in_c(). Add some empty definitions and headers to keep the
selftests happy.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This macro is taken from s390, and allows more flexibility in
changing exception table format.

mpe: Put it in ppc_asm.h and only define one version using
stringinfy_in_c(). Add some empty definitions and headers to keep the
selftests happy.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ppc: move exports to definitions</title>
<updated>2016-08-08T03:50:09+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2016-01-14T04:33:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9445aa1a3062a75a4d9de78026816ebc941e7b99'/>
<id>9445aa1a3062a75a4d9de78026816ebc941e7b99</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Align hot loops of some string functions</title>
<updated>2016-06-14T03:58:25+00:00</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2016-05-25T22:39:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=87a156fb18fe15d012c3db506b6b8b001af2e58d'/>
<id>87a156fb18fe15d012c3db506b6b8b001af2e58d</id>
<content type='text'>
Align the hot loops in our assembly implementation of strncpy(),
strncmp() and memchr().

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Align the hot loops in our assembly implementation of strncpy(),
strncmp() and memchr().

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Remove assembly versions of strcpy, strcat, strlen and strcmp</title>
<updated>2016-06-14T03:58:25+00:00</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2016-05-25T22:38:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3ece16632b64120df2ef566ce32afbdb4aa8af1e'/>
<id>3ece16632b64120df2ef566ce32afbdb4aa8af1e</id>
<content type='text'>
A number of our assembly implementations of string functions do not
align their hot loops. I was going to align them manually, but I
realised that they are are almost instruction for instruction
identical to what gcc produces, with the advantage that gcc does
align them.

In light of that, let's just remove the assembly versions.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A number of our assembly implementations of string functions do not
align their hot loops. I was going to align them manually, but I
realised that they are are almost instruction for instruction
identical to what gcc produces, with the advantage that gcc does
align them.

In light of that, let's just remove the assembly versions.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Add 64bit optimised memcmp</title>
<updated>2015-01-23T03:02:55+00:00</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2015-01-21T01:27:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=15c2d45d17418cc4a712608c78ff3b5f0583d83b'/>
<id>15c2d45d17418cc4a712608c78ff3b5f0583d83b</id>
<content type='text'>
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rdlicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rdlicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: 64bit optimised __clear_user</title>
<updated>2012-07-03T04:14:41+00:00</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2012-05-27T19:54:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=17968fbbd19f1bb281ee4eb2548764ac5664c4ec'/>
<id>17968fbbd19f1bb281ee4eb2548764ac5664c4ec</id>
<content type='text'>
I noticed __clear_user high up in a profile of one of my RAID stress
tests. The testcase was doing a dd from /dev/zero which ends up
calling __clear_user.

__clear_user is basically a loop with a single 4 byte store which
is horribly slow. We can do much better by aligning the desination
and doing 32 bytes of 8 byte stores in a loop.

The following testcase was used to verify the patch:

http://ozlabs.org/~anton/junkcode/stress_clear_user.c

To show the improvement in performance I ran a dd from /dev/zero
to /dev/null on a POWER7 box:

Before:

# dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s

After:

# time dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s

Over 5x faster.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I noticed __clear_user high up in a profile of one of my RAID stress
tests. The testcase was doing a dd from /dev/zero which ends up
calling __clear_user.

__clear_user is basically a loop with a single 4 byte store which
is horribly slow. We can do much better by aligning the desination
and doing 32 bytes of 8 byte stores in a loop.

The following testcase was used to verify the patch:

http://ozlabs.org/~anton/junkcode/stress_clear_user.c

To show the improvement in performance I ran a dd from /dev/zero
to /dev/null on a POWER7 box:

Before:

# dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s

After:

# time dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s

Over 5x faster.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Use the new generic strncpy_from_user() and strnlen_user()</title>
<updated>2012-05-28T04:00:07+00:00</updated>
<author>
<name>Paul Mackerras</name>
<email>paulus@samba.org</email>
</author>
<published>2012-05-28T03:03:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1629372caaaf7ef744d3b983be56b99468a68ff8'/>
<id>1629372caaaf7ef744d3b983be56b99468a68ff8</id>
<content type='text'>
This is much the same as for SPARC except that we can do the find_zero()
function more efficiently using the count-leading-zeroes instructions.
Tested on 32-bit and 64-bit PowerPC.

Signed-off-by: Paul Mackerras &lt;paulus@samba.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is much the same as for SPARC except that we can do the find_zero()
function more efficiently using the count-leading-zeroes instructions.
Tested on 32-bit and 64-bit PowerPC.

Signed-off-by: Paul Mackerras &lt;paulus@samba.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Fix string library functions</title>
<updated>2010-05-21T07:31:08+00:00</updated>
<author>
<name>Andreas Schwab</name>
<email>schwab@linux-m68k.org</email>
</author>
<published>2010-05-18T08:15:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ca5d0674c37840366f04a7bbfbf78e7b5f3ce0a4'/>
<id>ca5d0674c37840366f04a7bbfbf78e7b5f3ce0a4</id>
<content type='text'>
The powerpc strncmp implementation does not correctly handle a zero
length, despite the claim in 0119536cd314ef95553604208c25bc35581f7f0a
(Add hand-coded assembly strcmp).

Additionally, all the length arguments are size_t, not int, so use
PPC_LCMPI and eq instead of cmpwi and le throughout.

Signed-off-by: Andreas Schwab &lt;schwab@linux-m68k.org&gt;
Acked-by: Paul Mackerras &lt;paulus@samba.org&gt;
Signed-off-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The powerpc strncmp implementation does not correctly handle a zero
length, despite the claim in 0119536cd314ef95553604208c25bc35581f7f0a
(Add hand-coded assembly strcmp).

Additionally, all the length arguments are size_t, not int, so use
PPC_LCMPI and eq instead of cmpwi and le throughout.

Signed-off-by: Andreas Schwab &lt;schwab@linux-m68k.org&gt;
Acked-by: Paul Mackerras &lt;paulus@samba.org&gt;
Signed-off-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
