<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/powerpc/lib/Makefile, branch v5.9</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>powerpc/64s: Implement queued spinlocks and rwlocks</title>
<updated>2020-07-26T14:01:23+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2020-07-24T13:14:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=aa65ff6b18e0366db1790609956a4ac7308c5668'/>
<id>aa65ff6b18e0366db1790609956a4ac7308c5668</id>
<content type='text'>
These have shown significantly improved performance and fairness when
spinlock contention is moderate to high on very large systems.

With this series including subsequent patches, on a 16 socket 1536
thread POWER9, a stress test such as same-file open/close from all
CPUs gets big speedups, 11620op/s aggregate with simple spinlocks vs
384158op/s (33x faster), where the difference in throughput between
the fastest and slowest thread goes from 7x to 1.4x.

Thanks to the fast path being identical in terms of atomics and
barriers (after a subsequent optimisation patch), single threaded
performance is not changed (no measurable difference).

On smaller systems, performance and fairness seems to be generally
improved. Using dbench on tmpfs as a test (that starts to run into
kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was
tested with bare metal and KVM guest configurations. Results can be
found here:

https://github.com/linuxppc/issues/issues/305#issuecomment-663487453

Observations are:

- Queued spinlocks are equal when contention is insignificant, as
  expected and as measured with microbenchmarks.

- When there is contention, on bare metal queued spinlocks have better
  throughput and max latency at all points.

- When virtualised, queued spinlocks are slightly worse approaching
  peak throughput, but significantly better throughput and max latency
  at all points beyond peak, until queued spinlock maximum latency
  rises when clients are 2x vCPUs.

The regressions haven't been analysed very well yet, there are a lot
of things that can be tuned, particularly the paravirtualised locking,
but the numbers already look like a good net win even on relatively
small systems.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200724131423.1362108-4-npiggin@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These have shown significantly improved performance and fairness when
spinlock contention is moderate to high on very large systems.

With this series including subsequent patches, on a 16 socket 1536
thread POWER9, a stress test such as same-file open/close from all
CPUs gets big speedups, 11620op/s aggregate with simple spinlocks vs
384158op/s (33x faster), where the difference in throughput between
the fastest and slowest thread goes from 7x to 1.4x.

Thanks to the fast path being identical in terms of atomics and
barriers (after a subsequent optimisation patch), single threaded
performance is not changed (no measurable difference).

On smaller systems, performance and fairness seems to be generally
improved. Using dbench on tmpfs as a test (that starts to run into
kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was
tested with bare metal and KVM guest configurations. Results can be
found here:

https://github.com/linuxppc/issues/issues/305#issuecomment-663487453

Observations are:

- Queued spinlocks are equal when contention is insignificant, as
  expected and as measured with microbenchmarks.

- When there is contention, on bare metal queued spinlocks have better
  throughput and max latency at all points.

- When virtualised, queued spinlocks are slightly worse approaching
  peak throughput, but significantly better throughput and max latency
  at all points beyond peak, until queued spinlock maximum latency
  rises when clients are 2x vCPUs.

The regressions haven't been analysed very well yet, there are a lot
of things that can be tuned, particularly the paravirtualised locking,
but the numbers already look like a good net win even on relatively
small systems.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200724131423.1362108-4-npiggin@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Test prefixed code patching</title>
<updated>2020-05-18T14:11:02+00:00</updated>
<author>
<name>Jordan Niethe</name>
<email>jniethe5@gmail.com</email>
</author>
<published>2020-05-06T03:40:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f77f8ff7f13e6411c2e0ba25bb7e012a5ae6c927'/>
<id>f77f8ff7f13e6411c2e0ba25bb7e012a5ae6c927</id>
<content type='text'>
Expand the code-patching self-tests to includes tests for patching
prefixed instructions.

Signed-off-by: Jordan Niethe &lt;jniethe5@gmail.com&gt;
[mpe: Use CONFIG_PPC64 not __powerpc64__]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200506034050.24806-25-jniethe5@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Expand the code-patching self-tests to includes tests for patching
prefixed instructions.

Signed-off-by: Jordan Niethe &lt;jniethe5@gmail.com&gt;
[mpe: Use CONFIG_PPC64 not __powerpc64__]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200506034050.24806-25-jniethe5@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Add a probe_user_read_inst() function</title>
<updated>2020-05-18T14:10:37+00:00</updated>
<author>
<name>Jordan Niethe</name>
<email>jniethe5@gmail.com</email>
</author>
<published>2020-05-06T03:40:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7ba68b2172c19031fdc2a2caf37328edd146e299'/>
<id>7ba68b2172c19031fdc2a2caf37328edd146e299</id>
<content type='text'>
Introduce a probe_user_read_inst() function to use in cases where
probe_user_read() is used for getting an instruction. This will be
more useful for prefixed instructions.

Signed-off-by: Jordan Niethe &lt;jniethe5@gmail.com&gt;
Reviewed-by: Alistair Popple &lt;alistair@popple.id.au&gt;
[mpe: Don't write to *inst on error, fold in __user annotations]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200506034050.24806-14-jniethe5@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce a probe_user_read_inst() function to use in cases where
probe_user_read() is used for getting an instruction. This will be
more useful for prefixed instructions.

Signed-off-by: Jordan Niethe &lt;jniethe5@gmail.com&gt;
Reviewed-by: Alistair Popple &lt;alistair@popple.id.au&gt;
[mpe: Don't write to *inst on error, fold in __user annotations]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200506034050.24806-14-jniethe5@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/memcpy: Add memcpy_mcsafe for pmem</title>
<updated>2019-08-21T12:23:48+00:00</updated>
<author>
<name>Balbir Singh</name>
<email>bsingharora@gmail.com</email>
</author>
<published>2019-08-20T08:13:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4d4a273854c9eac1a1b163830658dc825e183ad5'/>
<id>4d4a273854c9eac1a1b163830658dc825e183ad5</id>
<content type='text'>
The pmem infrastructure uses memcpy_mcsafe in the pmem layer so as to
convert machine check exceptions into a return value on failure in case
a machine check exception is encountered during the memcpy. The return
value is the number of bytes remaining to be copied.

This patch largely borrows from the copyuser_power7 logic and does not add
the VMX optimizations, largely to keep the patch simple. If needed those
optimizations can be folded in.

Signed-off-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
[arbab@linux.ibm.com: Added symbol export]
Co-developed-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190820081352.8641-7-santosh@fossix.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The pmem infrastructure uses memcpy_mcsafe in the pmem layer so as to
convert machine check exceptions into a return value on failure in case
a machine check exception is encountered during the memcpy. The return
value is the number of bytes remaining to be copied.

This patch largely borrows from the copyuser_power7 logic and does not add
the VMX optimizations, largely to keep the patch simple. If needed those
optimizations can be folded in.

Signed-off-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
[arbab@linux.ibm.com: Added symbol export]
Co-developed-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190820081352.8641-7-santosh@fossix.org
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/32: activate ARCH_HAS_PMEM_API and ARCH_HAS_UACCESS_FLUSHCACHE</title>
<updated>2019-08-05T08:53:04+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2019-07-31T06:31:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=461cef2a676e7c578d0ef62969dbcb8237c8631d'/>
<id>461cef2a676e7c578d0ef62969dbcb8237c8631d</id>
<content type='text'>
PPC32 also have flush_dcache_range() so it can also support
ARCH_HAS_PMEM_API and ARCH_HAS_UACCESS_FLUSHCACHE without changes.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/a682a2f9db308c5cfe77e45aa3352e41bc9f4e33.1564554634.git.christophe.leroy@c-s.fr

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PPC32 also have flush_dcache_range() so it can also support
ARCH_HAS_PMEM_API and ARCH_HAS_UACCESS_FLUSHCACHE without changes.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/a682a2f9db308c5cfe77e45aa3352e41bc9f4e33.1564554634.git.christophe.leroy@c-s.fr

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/lib: only build ldstfp.o when CONFIG_PPC_FPU is set</title>
<updated>2019-05-28T02:08:11+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2019-05-13T10:00:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3e3ebed3fef4878e6f1680ff98088db1a9688831'/>
<id>3e3ebed3fef4878e6f1680ff98088db1a9688831</id>
<content type='text'>
The entire code in ldstfp.o is enclosed into #ifdef CONFIG_PPC_FPU,
so there is no point in building it when this config is not selected.

Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined")
Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The entire code in ldstfp.o is enclosed into #ifdef CONFIG_PPC_FPU,
so there is no point in building it when this config is not selected.

Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined")
Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/lib: fix redundant inclusion of quad.o</title>
<updated>2019-05-28T02:08:11+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2019-05-13T10:00:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f8e0d0fddf87f26aed576983f752329de4c0900f'/>
<id>f8e0d0fddf87f26aed576983f752329de4c0900f</id>
<content type='text'>
quad.o is only for PPC64, and already included in obj64-y,
so it doesn't have to be in obj-y

Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
quad.o is only for PPC64, and already included in obj64-y,
so it doesn't have to be in obj-y

Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: disable KASAN instrumentation on early/critical files.</title>
<updated>2019-05-02T15:20:26+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2019-04-26T16:23:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f072015c7b742c42aa5649d22f43163cd0eb7024'/>
<id>f072015c7b742c42aa5649d22f43163cd0eb7024</id>
<content type='text'>
All files containing functions run before kasan_early_init() is called
must have KASAN instrumentation disabled.

For those file, branch profiling also have to be disabled otherwise
each if () generates a call to ftrace_likely_update().

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All files containing functions run before kasan_early_init() is called
must have KASAN instrumentation disabled.

For those file, branch profiling also have to be disabled otherwise
each if () generates a call to ftrace_likely_update().

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: prepare string/mem functions for KASAN</title>
<updated>2019-05-02T15:20:25+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@c-s.fr</email>
</author>
<published>2019-04-26T16:23:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=26deb04342e343ac58ab05bc7d2345ff0be9b667'/>
<id>26deb04342e343ac58ab05bc7d2345ff0be9b667</id>
<content type='text'>
CONFIG_KASAN implements wrappers for memcpy() memmove() and memset()
Those wrappers are doing the verification then call respectively
__memcpy() __memmove() and __memset(). The arches are therefore
expected to rename their optimised functions that way.

For files on which KASAN is inhibited, #defines are used to allow
them to directly call optimised versions of the functions without
going through the KASAN wrappers.

See commit 393f203f5fd5 ("x86_64: kasan: add interceptors for
memset/memmove/memcpy functions") for details.

Other string / mem functions do not (yet) have kasan wrappers,
we therefore have to fallback to the generic versions when
KASAN is active, otherwise KASAN checks will be skipped.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
[mpe: Fixups to keep selftests working]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
CONFIG_KASAN implements wrappers for memcpy() memmove() and memset()
Those wrappers are doing the verification then call respectively
__memcpy() __memmove() and __memset(). The arches are therefore
expected to rename their optimised functions that way.

For files on which KASAN is inhibited, #defines are used to allow
them to directly call optimised versions of the functions without
going through the KASAN wrappers.

See commit 393f203f5fd5 ("x86_64: kasan: add interceptors for
memset/memmove/memcpy functions") for details.

Other string / mem functions do not (yet) have kasan wrappers,
we therefore have to fallback to the generic versions when
KASAN is active, otherwise KASAN checks will be skipped.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
[mpe: Fixups to keep selftests working]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: sstep: Add tests for compute type instructions</title>
<updated>2019-02-23T10:04:31+00:00</updated>
<author>
<name>Sandipan Das</name>
<email>sandipan@linux.ibm.com</email>
</author>
<published>2019-02-20T06:56:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=84022ac17327c5383917f46c162fd943cf79583f'/>
<id>84022ac17327c5383917f46c162fd943cf79583f</id>
<content type='text'>
This enhances the current selftest framework for validating
the in-kernel instruction emulation infrastructure by adding
support for compute type instructions i.e. integer ALU-based
instructions. Originally, this framework was limited to only
testing load and store instructions.

While most of the GPRs can be validated, support for SPRs is
limited to LR, CR and XER for now.

When writing the test cases, one must ensure that the Stack
Pointer (GPR1) or the Thread Pointer (GPR13) are not touched
by any means as these are vital non-volatile registers.

Signed-off-by: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
[mpe: Use patch_site for the code patching]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This enhances the current selftest framework for validating
the in-kernel instruction emulation infrastructure by adding
support for compute type instructions i.e. integer ALU-based
instructions. Originally, this framework was limited to only
testing load and store instructions.

While most of the GPRs can be validated, support for SPRs is
limited to LR, CR and XER for now.

When writing the test cases, one must ensure that the Stack
Pointer (GPR1) or the Thread Pointer (GPR13) are not touched
by any means as these are vital non-volatile registers.

Signed-off-by: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
[mpe: Use patch_site for the code patching]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
</feed>
