<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/powerpc/lib, branch v4.6-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next</title>
<updated>2016-03-14T09:05:14+00:00</updated>
<author>
<name>Michael Ellerman</name>
<email>mpe@ellerman.id.au</email>
</author>
<published>2016-03-14T09:05:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a1b5344620a3e6291afaf7542714ba9c391ef1c7'/>
<id>a1b5344620a3e6291afaf7542714ba9c391ef1c7</id>
<content type='text'>
Freescale updates from Scott:

"Highlights include 8xx optimizations, 32-bit checksum optimizations,
86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt
bits, and minor fixes/cleanup."
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Freescale updates from Scott:

"Highlights include 8xx optimizations, 32-bit checksum optimizations,
86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt
bits, and minor fixes/cleanup."
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: optimise csum_partial() call when len is constant</title>
<updated>2016-03-09T16:44:18+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2016-03-07T17:44:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7e393220b6e1ecfa5520d1b2ca31150b7588f458'/>
<id>7e393220b6e1ecfa5520d1b2ca31150b7588f458</id>
<content type='text'>
csum_partial is often called for small fixed length packets
for which it is suboptimal to use the generic csum_partial()
function.

For instance, in my configuration, I got:
* One place calling it with constant len 4
* Seven places calling it with constant len 8
* Three places calling it with constant len 14
* One place calling it with constant len 20
* One place calling it with constant len 24
* One place calling it with constant len 32

This patch renames csum_partial() to __csum_partial() and
implements csum_partial() as a wrapper inline function which
* uses csum_add() for small 16bits multiple constant length
* uses ip_fast_csum() for other 32bits multiple constant
* uses __csum_partial() in all other cases

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
csum_partial is often called for small fixed length packets
for which it is suboptimal to use the generic csum_partial()
function.

For instance, in my configuration, I got:
* One place calling it with constant len 4
* Seven places calling it with constant len 8
* Three places calling it with constant len 14
* One place calling it with constant len 20
* One place calling it with constant len 24
* One place calling it with constant len 32

This patch renames csum_partial() to __csum_partial() and
implements csum_partial() as a wrapper inline function which
* uses csum_add() for small 16bits multiple constant length
* uses ip_fast_csum() for other 32bits multiple constant
* uses __csum_partial() in all other cases

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/ftrace: Use $(CC_FLAGS_FTRACE) when disabling ftrace</title>
<updated>2016-03-07T03:53:55+00:00</updated>
<author>
<name>Torsten Duwe</name>
<email>duwe@lst.de</email>
</author>
<published>2016-03-03T04:26:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9a7841ae8d6ce9b7a7cf879c9968fcf4c9545563'/>
<id>9a7841ae8d6ce9b7a7cf879c9968fcf4c9545563</id>
<content type='text'>
Rather than open-coding -pg whereever we want to disable ftrace, use the
existing $(CC_FLAGS_FTRACE) variable.

This has the advantage that it will work in future when we use a
different set of flags to enable ftrace.

Signed-off-by: Torsten Duwe &lt;duwe@suse.de&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rather than open-coding -pg whereever we want to disable ftrace, use the
existing $(CC_FLAGS_FTRACE) variable.

This has the advantage that it will work in future when we use a
different set of flags to enable ftrace.

Signed-off-by: Torsten Duwe &lt;duwe@suse.de&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc32: optimise csum_partial() loop</title>
<updated>2016-03-05T05:03:45+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2015-09-22T14:34:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f867d556dd8525fe6ff0d22a34249528e590f994'/>
<id>f867d556dd8525fe6ff0d22a34249528e590f994</id>
<content type='text'>
On the 8xx, load latency is 2 cycles and taking branches also takes
2 cycles. So let's unroll the loop.

This patch improves csum_partial() speed by around 10% on both:
* 8xx (single issue processor with parallel execution)
* 83xx (superscalar 6xx processor with dual instruction fetch
and parallel execution)

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On the 8xx, load latency is 2 cycles and taking branches also takes
2 cycles. So let's unroll the loop.

This patch improves csum_partial() speed by around 10% on both:
* 8xx (single issue processor with parallel execution)
* 83xx (superscalar 6xx processor with dual instruction fetch
and parallel execution)

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc32: optimise a few instructions in csum_partial()</title>
<updated>2016-03-05T05:00:52+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2015-09-22T14:34:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=48821a34b1bdc5d89505cb814b3f7c166940f200'/>
<id>48821a34b1bdc5d89505cb814b3f7c166940f200</id>
<content type='text'>
r5 does contain the value to be updated, so lets use r5 all way long
for that. It makes the code more readable.

To avoid confusion, it is better to use adde instead of addc

The first addition is useless. Its only purpose is to clear carry.
As r4 is a signed int that is always positive, this can be done by
using srawi instead of srwi

Let's also remove the comment about bdnz having no overhead as it
is not correct on all powerpc, at least on MPC8xx

In the last part, in our situation, the remaining quantity of bytes
to be proceeded is between 0 and 3. Therefore, we can base that part
on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3
then proceding on comparisons and substractions.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
r5 does contain the value to be updated, so lets use r5 all way long
for that. It makes the code more readable.

To avoid confusion, it is better to use adde instead of addc

The first addition is useless. Its only purpose is to clear carry.
As r4 is a signed int that is always positive, this can be done by
using srawi instead of srwi

Let's also remove the comment about bdnz having no overhead as it
is not correct on all powerpc, at least on MPC8xx

In the last part, in our situation, the remaining quantity of bytes
to be proceeded is between 0 and 3. Therefore, we can base that part
on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3
then proceding on comparisons and substractions.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()</title>
<updated>2016-03-05T04:53:27+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2015-09-22T14:34:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7aef4136566b0539a1a98391181e188905e33401'/>
<id>7aef4136566b0539a1a98391181e188905e33401</id>
<content type='text'>
csum_partial_copy_generic() does the same as copy_tofrom_user and also
calculates the checksum during the copy. Unlike copy_tofrom_user(),
the existing version of csum_partial_copy_generic() doesn't take
benefit of the cache.

This patch is a rewrite of csum_partial_copy_generic() based on
copy_tofrom_user().
The previous version of csum_partial_copy_generic() was handling
errors. Now we have the checksum wrapper functions to handle the error
case like in powerpc64 so we can make the error case simple:
just return -EFAULT.
copy_tofrom_user() only has r12 available =&gt; we use it for the
checksum r7 and r8 which contains pointers to error feedback are used,
so we stack them.

On a TCP benchmark using socklib on the loopback interface on which
checksum offload and scatter/gather have been deactivated, we get
about 20% performance increase.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
csum_partial_copy_generic() does the same as copy_tofrom_user and also
calculates the checksum during the copy. Unlike copy_tofrom_user(),
the existing version of csum_partial_copy_generic() doesn't take
benefit of the cache.

This patch is a rewrite of csum_partial_copy_generic() based on
copy_tofrom_user().
The previous version of csum_partial_copy_generic() was handling
errors. Now we have the checksum wrapper functions to handle the error
case like in powerpc64 so we can make the error case simple:
just return -EFAULT.
copy_tofrom_user() only has r12 available =&gt; we use it for the
checksum r7 and r8 which contains pointers to error feedback are used,
so we stack them.

On a TCP benchmark using socklib on the loopback interface on which
checksum offload and scatter/gather have been deactivated, we get
about 20% performance increase.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: inline ip_fast_csum()</title>
<updated>2016-03-05T03:49:49+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2015-09-22T14:34:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=37e08cad8f177f7bf6226a9b3724234ac3d3c81d'/>
<id>37e08cad8f177f7bf6226a9b3724234ac3d3c81d</id>
<content type='text'>
In several architectures, ip_fast_csum() is inlined
There are functions like ip_send_check() which do nothing
much more than calling ip_fast_csum().
Inlining ip_fast_csum() allows the compiler to optimise better

Suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
[scottwood: whitespace and cast fixes]
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In several architectures, ip_fast_csum() is inlined
There are functions like ip_send_check() which do nothing
much more than calling ip_fast_csum().
Inlining ip_fast_csum() allows the compiler to optimise better

Suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
[scottwood: whitespace and cast fixes]
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc32: checksum_wrappers_64 becomes checksum_wrappers</title>
<updated>2016-03-05T03:47:47+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2015-09-22T14:34:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=03bc8b0fc87616c75c6dd060a2191e7fc8faacb6'/>
<id>03bc8b0fc87616c75c6dd060a2191e7fc8faacb6</id>
<content type='text'>
The powerpc64 checksum wrapper functions adds csum_and_copy_to_user()
which otherwise is implemented in include/net/checksum.h by using
csum_partial() then copy_to_user()

Those two wrapper fonctions are also applicable to powerpc32 as it is
based on the use of csum_partial_copy_generic() which also
exists on powerpc32

This patch renames arch/powerpc/lib/checksum_wrappers_64.c to
arch/powerpc/lib/checksum_wrappers.c and
makes it non-conditional to CONFIG_WORD_SIZE

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The powerpc64 checksum wrapper functions adds csum_and_copy_to_user()
which otherwise is implemented in include/net/checksum.h by using
csum_partial() then copy_to_user()

Those two wrapper fonctions are also applicable to powerpc32 as it is
based on the use of csum_partial_copy_generic() which also
exists on powerpc32

This patch renames arch/powerpc/lib/checksum_wrappers_64.c to
arch/powerpc/lib/checksum_wrappers.c and
makes it non-conditional to CONFIG_WORD_SIZE

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: unexport csum_tcpudp_magic</title>
<updated>2016-03-05T03:47:22+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2015-09-22T14:34:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e0f82bdf2dd04c08a167b5e9b47d65834ac84ba9'/>
<id>e0f82bdf2dd04c08a167b5e9b47d65834ac84ba9</id>
<content type='text'>
csum_tcpudp_magic is now an inline function, so there is
nothing to export

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
csum_tcpudp_magic is now an inline function, so there is
nothing to export

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Scott Wood &lt;oss@buserror.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Create disable_kernel_{fp,altivec,vsx,spe}()</title>
<updated>2015-12-01T02:52:25+00:00</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2015-10-29T00:44:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dc4fbba11e4661a6a77a1f89ba32f9082e6395ff'/>
<id>dc4fbba11e4661a6a77a1f89ba32f9082e6395ff</id>
<content type='text'>
The enable_kernel_*() functions leave the relevant MSR bits enabled
until we exit the kernel sometime later. Create disable versions
that wrap the kernel use of FP, Altivec VSX or SPE.

While we don't want to disable it normally for performance reasons
(MSR writes are slow), it will be used for a debug boot option that
does this and catches bad uses in other areas of the kernel.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The enable_kernel_*() functions leave the relevant MSR bits enabled
until we exit the kernel sometime later. Create disable versions
that wrap the kernel use of FP, Altivec VSX or SPE.

While we don't want to disable it normally for performance reasons
(MSR writes are slow), it will be used for a debug boot option that
does this and catches bad uses in other areas of the kernel.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
</feed>
