<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/powerpc/kernel/process.c, branch v5.9</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode</title>
<updated>2020-08-27T07:41:39+00:00</updated>
<author>
<name>Michael Ellerman</name>
<email>mpe@ellerman.id.au</email>
</author>
<published>2020-08-25T09:34:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b91eb5182405b01a8aeb42e9b5207831767e97ee'/>
<id>b91eb5182405b01a8aeb42e9b5207831767e97ee</id>
<content type='text'>
The recent commit 01eb01877f33 ("powerpc/64s: Fix restore_math
unnecessarily changing MSR") changed some of the handling of floating
point/vector restore.

In particular it caused current-&gt;thread.fpexc_mode to be copied into
the current MSR (via msr_check_and_set()), rather than just into
regs-&gt;msr (which is moved into MSR on return to userspace).

This can lead to a crash in the kernel if we take a floating point
exception when restoring FPSCR:

  Oops: Exception in kernel mode, sig: 8 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 3 PID: 101213 Comm: ld64.so.2 Not tainted 5.9.0-rc1-00098-g18445bf405cb-dirty #9
  NIP:  c00000000000fbb4 LR: c00000000001a7ac CTR: c000000000183570
  REGS: c0000016b7cfb3b0 TRAP: 0700   Not tainted  (5.9.0-rc1-00098-g18445bf405cb-dirty)
  MSR:  900000000290b933 &lt;SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE&gt;  CR: 44002444  XER: 00000000
  CFAR: c00000000001a7a8 IRQMASK: 1
  GPR00: c00000000001ae40 c0000016b7cfb640 c0000000011b7f00 c000001542a0f740
  GPR04: c000001542a0f720 c000001542a0eb00 0000000000000900 c000001542a0eb00
  GPR08: 000000000000000a 0000000000002000 9000000000009033 0000000000000000
  GPR12: 0000000000004000 c0000017ffffd900 0000000000000001 c000000000df5a58
  GPR16: c000000000e19c18 c0000000010e1123 0000000000000001 c000000000e1a638
  GPR20: 0000000000000000 c0000000044b1d00 0000000000000000 c000001542a0f2a0
  GPR24: 00000016c7fe0000 c000001542a0f720 c000000001c93da0 c000000000fe5f28
  GPR28: c000001542a0f720 0000000000800000 c0000016b7cfbe90 0000000002802900
  NIP load_fp_state+0x4/0x214
  LR  restore_math+0x17c/0x1f0
  Call Trace:
    0xc0000016b7cfb680 (unreliable)
    __switch_to+0x330/0x460
    __schedule+0x318/0x920
    schedule+0x74/0x140
    schedule_timeout+0x318/0x3f0
    wait_for_completion+0xc8/0x210
    call_usermodehelper_exec+0x234/0x280
    do_coredump+0xedc/0x13c0
    get_signal+0x1d4/0xbe0
    do_notify_resume+0x1a0/0x490
    interrupt_exit_user_prepare+0x1c4/0x230
    interrupt_return+0x14/0x1c0
  Instruction dump:
  ebe10168 e88101a0 7c8ff120 382101e0 e8010010 7c0803a6 4e800020 790605c4
  782905c4 7c0008a8 7c0008a8 c8030200 &lt;fffe058e&gt; 48000088 c8030000 c8230010

Fix it by only loading the fpexc_mode value into regs-&gt;msr.

Also add a comment to explain that although VSX is subject to the
value of fpexc_mode, we don't have to handle that separately because
we only allow VSX to be enabled if FP is also enabled.

Fixes: 01eb01877f33 ("powerpc/64s: Fix restore_math unnecessarily changing MSR")
Reported-by: Milton Miller &lt;miltonm@us.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Link: https://lore.kernel.org/r/20200825093424.3967813-1-mpe@ellerman.id.au
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The recent commit 01eb01877f33 ("powerpc/64s: Fix restore_math
unnecessarily changing MSR") changed some of the handling of floating
point/vector restore.

In particular it caused current-&gt;thread.fpexc_mode to be copied into
the current MSR (via msr_check_and_set()), rather than just into
regs-&gt;msr (which is moved into MSR on return to userspace).

This can lead to a crash in the kernel if we take a floating point
exception when restoring FPSCR:

  Oops: Exception in kernel mode, sig: 8 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 3 PID: 101213 Comm: ld64.so.2 Not tainted 5.9.0-rc1-00098-g18445bf405cb-dirty #9
  NIP:  c00000000000fbb4 LR: c00000000001a7ac CTR: c000000000183570
  REGS: c0000016b7cfb3b0 TRAP: 0700   Not tainted  (5.9.0-rc1-00098-g18445bf405cb-dirty)
  MSR:  900000000290b933 &lt;SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE&gt;  CR: 44002444  XER: 00000000
  CFAR: c00000000001a7a8 IRQMASK: 1
  GPR00: c00000000001ae40 c0000016b7cfb640 c0000000011b7f00 c000001542a0f740
  GPR04: c000001542a0f720 c000001542a0eb00 0000000000000900 c000001542a0eb00
  GPR08: 000000000000000a 0000000000002000 9000000000009033 0000000000000000
  GPR12: 0000000000004000 c0000017ffffd900 0000000000000001 c000000000df5a58
  GPR16: c000000000e19c18 c0000000010e1123 0000000000000001 c000000000e1a638
  GPR20: 0000000000000000 c0000000044b1d00 0000000000000000 c000001542a0f2a0
  GPR24: 00000016c7fe0000 c000001542a0f720 c000000001c93da0 c000000000fe5f28
  GPR28: c000001542a0f720 0000000000800000 c0000016b7cfbe90 0000000002802900
  NIP load_fp_state+0x4/0x214
  LR  restore_math+0x17c/0x1f0
  Call Trace:
    0xc0000016b7cfb680 (unreliable)
    __switch_to+0x330/0x460
    __schedule+0x318/0x920
    schedule+0x74/0x140
    schedule_timeout+0x318/0x3f0
    wait_for_completion+0xc8/0x210
    call_usermodehelper_exec+0x234/0x280
    do_coredump+0xedc/0x13c0
    get_signal+0x1d4/0xbe0
    do_notify_resume+0x1a0/0x490
    interrupt_exit_user_prepare+0x1c4/0x230
    interrupt_return+0x14/0x1c0
  Instruction dump:
  ebe10168 e88101a0 7c8ff120 382101e0 e8010010 7c0803a6 4e800020 790605c4
  782905c4 7c0008a8 7c0008a8 c8030200 &lt;fffe058e&gt; 48000088 c8030000 c8230010

Fix it by only loading the fpexc_mode value into regs-&gt;msr.

Also add a comment to explain that although VSX is subject to the
value of fpexc_mode, we don't have to handle that separately because
we only allow VSX to be enabled if FP is also enabled.

Fixes: 01eb01877f33 ("powerpc/64s: Fix restore_math unnecessarily changing MSR")
Reported-by: Milton Miller &lt;miltonm@us.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Link: https://lore.kernel.org/r/20200825093424.3967813-1-mpe@ellerman.id.au
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux</title>
<updated>2020-08-07T17:33:50+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-08-07T17:33:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=25d8d4eecace9de5a6a2193e4df1917afbdd3052'/>
<id>25d8d4eecace9de5a6a2193e4df1917afbdd3052</id>
<content type='text'>
Pull powerpc updates from Michael Ellerman:

 - Add support for (optionally) using queued spinlocks &amp; rwlocks.

 - Support for a new faster system call ABI using the scv instruction on
   Power9 or later.

 - Drop support for the PROT_SAO mmap/mprotect flag as it will be
   unsupported on Power10 and future processors, leaving us with no way
   to implement the functionality it requests. This risks breaking
   userspace, though we believe it is unused in practice.

 - A bug fix for, and then the removal of, our custom stack expansion
   checking. We now allow stack expansion up to the rlimit, like other
   architectures.

 - Remove the remnants of our (previously disabled) topology update
   code, which tried to react to NUMA layout changes on virtualised
   systems, but was prone to crashes and other problems.

 - Add PMU support for Power10 CPUs.

 - A change to our signal trampoline so that we don't unbalance the link
   stack (branch return predictor) in the signal delivery path.

 - Lots of other cleanups, refactorings, smaller features and so on as
   usual.

Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
Wei Yongjun, Wen Xiong, YueHaibing.

* tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
  selftests/powerpc: Fix pkey syscall redefinitions
  powerpc: Fix circular dependency between percpu.h and mmu.h
  powerpc/powernv/sriov: Fix use of uninitialised variable
  selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
  powerpc/40x: Fix assembler warning about r0
  powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
  powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  cpuidle: pseries: Fixup exit latency for CEDE(0)
  cpuidle: pseries: Add function to parse extended CEDE records
  cpuidle: pseries: Set the latency-hint before entering CEDE
  selftests/powerpc: Fix online CPU selection
  powerpc/perf: Consolidate perf_callchain_user_[64|32]()
  powerpc/pseries/hotplug-cpu: Remove double free in error path
  powerpc/pseries/mobility: Add pr_debug() for device tree changes
  powerpc/pseries/mobility: Set pr_fmt()
  powerpc/cacheinfo: Warn if cache object chain becomes unordered
  powerpc/cacheinfo: Improve diagnostics about malformed cache lists
  powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
  powerpc/cacheinfo: Set pr_fmt()
  powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull powerpc updates from Michael Ellerman:

 - Add support for (optionally) using queued spinlocks &amp; rwlocks.

 - Support for a new faster system call ABI using the scv instruction on
   Power9 or later.

 - Drop support for the PROT_SAO mmap/mprotect flag as it will be
   unsupported on Power10 and future processors, leaving us with no way
   to implement the functionality it requests. This risks breaking
   userspace, though we believe it is unused in practice.

 - A bug fix for, and then the removal of, our custom stack expansion
   checking. We now allow stack expansion up to the rlimit, like other
   architectures.

 - Remove the remnants of our (previously disabled) topology update
   code, which tried to react to NUMA layout changes on virtualised
   systems, but was prone to crashes and other problems.

 - Add PMU support for Power10 CPUs.

 - A change to our signal trampoline so that we don't unbalance the link
   stack (branch return predictor) in the signal delivery path.

 - Lots of other cleanups, refactorings, smaller features and so on as
   usual.

Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
Wei Yongjun, Wen Xiong, YueHaibing.

* tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
  selftests/powerpc: Fix pkey syscall redefinitions
  powerpc: Fix circular dependency between percpu.h and mmu.h
  powerpc/powernv/sriov: Fix use of uninitialised variable
  selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
  powerpc/40x: Fix assembler warning about r0
  powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
  powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  cpuidle: pseries: Fixup exit latency for CEDE(0)
  cpuidle: pseries: Add function to parse extended CEDE records
  cpuidle: pseries: Set the latency-hint before entering CEDE
  selftests/powerpc: Fix online CPU selection
  powerpc/perf: Consolidate perf_callchain_user_[64|32]()
  powerpc/pseries/hotplug-cpu: Remove double free in error path
  powerpc/pseries/mobility: Add pr_debug() for device tree changes
  powerpc/pseries/mobility: Set pr_fmt()
  powerpc/cacheinfo: Warn if cache object chain becomes unordered
  powerpc/cacheinfo: Improve diagnostics about malformed cache lists
  powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
  powerpc/cacheinfo: Set pr_fmt()
  powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'scv' support into next</title>
<updated>2020-07-23T07:43:44+00:00</updated>
<author>
<name>Michael Ellerman</name>
<email>mpe@ellerman.id.au</email>
</author>
<published>2020-07-23T07:43:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=335aca5f65f1a39670944930131da5f2276f888f'/>
<id>335aca5f65f1a39670944930131da5f2276f888f</id>
<content type='text'>
From Nick's cover letter:

Linux powerpc new system call instruction and ABI

System Call Vectored (scv) ABI
==============================

The scv instruction is introduced with POWER9 / ISA3, it comes with an
rfscv counter-part. The benefit of these instructions is
performance (trading slower SRR0/1 with faster LR/CTR registers, and
entering the kernel with MSR[EE] and MSR[RI] left enabled, which can
reduce MSR updates. The scv instruction has 128 levels (not enough to
cover the Linux system call space).

Assignment and advertisement
----------------------------
The proposal is to assign scv levels conservatively, and advertise
them with HWCAP feature bits as we add support for more.

Linux has not enabled FSCR[SCV] yet, so executing the scv instruction
will cause the kernel to log a "SCV facility unavilable" message, and
deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a
HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it.

This change allocates the zero level ('scv 0'), advertised with
PPC_FEATURE2_SCV, which will be used to provide normal Linux system
calls (equivalent to 'sc').

Attempting to execute scv with other levels will cause a SIGILL to be
delivered the same as before, but will not log a "SCV facility
unavailable" message (because the processor facility is enabled).

Calling convention
------------------
The proposal is for scv 0 to provide the standard Linux system call
ABI with the following differences from sc convention[1]:

- LR is to be volatile across scv calls. This is necessary because the
  scv instruction clobbers LR. From previous discussion, this should
  be possible to deal with in GCC clobbers and CFI.

- cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow
  the kernel system call exit to avoid restoring the volatile cr
  registers (although we probably still would anyway to avoid
  information leaks).

- Error handling: The consensus among kernel, glibc, and musl is to
  move to using negative return values in r3 rather than CR0[SO]=1 to
  indicate error, which matches most other architectures, and is
  closer to a function call.

Notes
-----
- r0,r4-r8 are documented as volatile in the ABI, but the kernel patch
  as submitted currently preserves them. This is to leave room for
  deciding which way to go with these. Some small benefit was found by
  preserving them[1] but I'm not convinced it's worth deviating from
  the C function call ABI just for this. Release code should follow
  the ABI.

Previous discussions:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html

[1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
From Nick's cover letter:

Linux powerpc new system call instruction and ABI

System Call Vectored (scv) ABI
==============================

The scv instruction is introduced with POWER9 / ISA3, it comes with an
rfscv counter-part. The benefit of these instructions is
performance (trading slower SRR0/1 with faster LR/CTR registers, and
entering the kernel with MSR[EE] and MSR[RI] left enabled, which can
reduce MSR updates. The scv instruction has 128 levels (not enough to
cover the Linux system call space).

Assignment and advertisement
----------------------------
The proposal is to assign scv levels conservatively, and advertise
them with HWCAP feature bits as we add support for more.

Linux has not enabled FSCR[SCV] yet, so executing the scv instruction
will cause the kernel to log a "SCV facility unavilable" message, and
deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a
HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it.

This change allocates the zero level ('scv 0'), advertised with
PPC_FEATURE2_SCV, which will be used to provide normal Linux system
calls (equivalent to 'sc').

Attempting to execute scv with other levels will cause a SIGILL to be
delivered the same as before, but will not log a "SCV facility
unavailable" message (because the processor facility is enabled).

Calling convention
------------------
The proposal is for scv 0 to provide the standard Linux system call
ABI with the following differences from sc convention[1]:

- LR is to be volatile across scv calls. This is necessary because the
  scv instruction clobbers LR. From previous discussion, this should
  be possible to deal with in GCC clobbers and CFI.

- cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow
  the kernel system call exit to avoid restoring the volatile cr
  registers (although we probably still would anyway to avoid
  information leaks).

- Error handling: The consensus among kernel, glibc, and musl is to
  move to using negative return values in r3 rather than CR0[SO]=1 to
  indicate error, which matches most other architectures, and is
  closer to a function call.

Notes
-----
- r0,r4-r8 are documented as volatile in the ABI, but the kernel patch
  as submitted currently preserves them. This is to leave room for
  deciding which way to go with these. Some small benefit was found by
  preserving them[1] but I'm not convinced it's worth deviating from
  the C function call ABI just for this. Release code should follow
  the ABI.

Previous discussions:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html

[1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s: system call support for scv/rfscv instructions</title>
<updated>2020-07-22T13:00:27+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2020-06-11T08:12:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7fa95f9adaee7e5cbb195d3359741120829e488b'/>
<id>7fa95f9adaee7e5cbb195d3359741120829e488b</id>
<content type='text'>
Add support for the scv instruction on POWER9 and later CPUs.

For now this implements the zeroth scv vector 'scv 0', as identical to
'sc' system calls, with the exception that LR is not preserved, nor
are volatile CR registers, and error is not indicated with CR0[SO],
but by returning a negative errno.

rfscv is implemented to return from scv type system calls. It can not
be used to return from sc system calls because those are defined to
preserve LR.

getpid syscall throughput on POWER9 is improved by 26% (428 to 318
cycles), largely due to reducing mtmsr and mtspr.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Fix ppc64e build]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support for the scv instruction on POWER9 and later CPUs.

For now this implements the zeroth scv vector 'scv 0', as identical to
'sc' system calls, with the exception that LR is not preserved, nor
are volatile CR registers, and error is not indicated with CR0[SO],
but by returning a negative errno.

rfscv is implemented to return from scv type system calls. It can not
be used to return from sc system calls because those are defined to
preserve LR.

getpid syscall throughput on POWER9 is improved by 26% (428 to 318
cycles), largely due to reducing mtmsr and mtspr.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Fix ppc64e build]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s: Fix restore_math unnecessarily changing MSR</title>
<updated>2020-07-16T03:00:24+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2020-06-23T23:41:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=01eb01877f3386d4bd5de75909abdd0af45a5fa2'/>
<id>01eb01877f3386d4bd5de75909abdd0af45a5fa2</id>
<content type='text'>
Before returning to user, if there are missing FP/VEC/VSX bits from the
user MSR then those registers had been saved and must be restored again
before use. restore_math will decide whether to restore immediately, or
skip the restore and let fp/vec/vsx unavailable faults demand load the
registers.

Each time restore_math restores one of the FP/VSX or VEC register sets
is loaded, an 8-bit counter is incremented (load_fp and load_vec). When
these wrap to zero, restore_math no longer restores that register set
until after they are next demand faulted.

It's quite usual for those counters to have different values, so if one
wraps to zero and restore_math no longer restores its registers or user
MSR bit but the other is not zero yet does not need to be restored
(because the kernel is not frequently using the FPU), then restore_math
will be called and it will also not return in the early exit check.
This causes msr_check_and_set to test and set the MSR at every kernel
exit despite having no work to do.

This can cause workloads (e.g., a NULL syscall microbenchmark) to run
fast for a time while both counters are non-zero, then slow down when
one of the counters reaches zero, then speed up again after the second
counter reaches zero. The cost is significant, about 10% slowdown on a
NULL syscall benchmark, and the jittery behaviour is very undesirable.

Fix this by having restore_math test all conditions first, and only
update MSR if we will be loading registers.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200623234139.2262227-2-npiggin@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before returning to user, if there are missing FP/VEC/VSX bits from the
user MSR then those registers had been saved and must be restored again
before use. restore_math will decide whether to restore immediately, or
skip the restore and let fp/vec/vsx unavailable faults demand load the
registers.

Each time restore_math restores one of the FP/VSX or VEC register sets
is loaded, an 8-bit counter is incremented (load_fp and load_vec). When
these wrap to zero, restore_math no longer restores that register set
until after they are next demand faulted.

It's quite usual for those counters to have different values, so if one
wraps to zero and restore_math no longer restores its registers or user
MSR bit but the other is not zero yet does not need to be restored
(because the kernel is not frequently using the FPU), then restore_math
will be called and it will also not return in the early exit check.
This causes msr_check_and_set to test and set the MSR at every kernel
exit despite having no work to do.

This can cause workloads (e.g., a NULL syscall microbenchmark) to run
fast for a time while both counters are non-zero, then slow down when
one of the counters reaches zero, then speed up again after the second
counter reaches zero. The cost is significant, about 10% slowdown on a
NULL syscall benchmark, and the jittery behaviour is very undesirable.

Fix this by having restore_math test all conditions first, and only
update MSR if we will be loading registers.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200623234139.2262227-2-npiggin@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s: restore_math remove TM test</title>
<updated>2020-07-16T03:00:24+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2020-06-23T23:41:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=891b4fe8fe3d09f20948b391f24c9fc5b7580a2b'/>
<id>891b4fe8fe3d09f20948b391f24c9fc5b7580a2b</id>
<content type='text'>
The TM test in restore_math added by commit dc16b553c949e ("powerpc:
Always restore FPU/VEC/VSX if hardware transactional memory in use") is
no longer necessary after commit a8318c13e79ba ("powerpc/tm: Fix
restoring FP/VMX facility incorrectly on interrupts"), which removed
the cases where restore_math has to restore if TM is active.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200623234139.2262227-1-npiggin@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The TM test in restore_math added by commit dc16b553c949e ("powerpc:
Always restore FPU/VEC/VSX if hardware transactional memory in use") is
no longer necessary after commit a8318c13e79ba ("powerpc/tm: Fix
restoring FP/VMX facility incorrectly on interrupts"), which removed
the cases where restore_math has to restore if TM is active.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200623234139.2262227-1-npiggin@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>arch: rename copy_thread_tls() back to copy_thread()</title>
<updated>2020-07-04T21:41:37+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2020-06-11T09:04:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=714acdbd1c94e7e3ab90f6b6938f1ccb27b662f0'/>
<id>714acdbd1c94e7e3ab90f6b6938f1ccb27b662f0</id>
<content type='text'>
Now that HAVE_COPY_THREAD_TLS has been removed, rename copy_thread_tls()
back simply copy_thread(). It's a simpler name, and doesn't imply that only
tls is copied here. This finishes an outstanding chunk of internal process
creation work since we've added clone3().

Cc: linux-arch@vger.kernel.org
Acked-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;A
Acked-by: Stafford Horne &lt;shorne@gmail.com&gt;
Acked-by: Greentime Hu &lt;green.hu@gmail.com&gt;
Acked-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;A
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that HAVE_COPY_THREAD_TLS has been removed, rename copy_thread_tls()
back simply copy_thread(). It's a simpler name, and doesn't imply that only
tls is copied here. This finishes an outstanding chunk of internal process
creation work since we've added clone3().

Cc: linux-arch@vger.kernel.org
Acked-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;A
Acked-by: Stafford Horne &lt;shorne@gmail.com&gt;
Acked-by: Greentime Hu &lt;green.hu@gmail.com&gt;
Acked-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;A
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux</title>
<updated>2020-06-21T17:02:53+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-06-21T17:02:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=75613939084f59c0848b146e54ba463dc494c433'/>
<id>75613939084f59c0848b146e54ba463dc494c433</id>
<content type='text'>
Pull powerpc fixes from Michael Ellerman:

 - One fix for the interrupt rework we did last release which broke
   KVM-PR

 - Three commits fixing some fallout from the READ_ONCE() changes
   interacting badly with our 8xx 16K pages support, which uses a pte_t
   that is a structure of 4 actual PTEs

 - A cleanup of the 8xx pte_update() to use the newly added pmd_off()

 - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is
   enabled

 - A minor fix for the SPU syscall generation

Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike
Rapoport, Nicholas Piggin.

* tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/8xx: Provide ptep_get() with 16k pages
  mm: Allow arches to provide ptep_get()
  mm/gup: Use huge_ptep_get() in gup_hugepte()
  powerpc/syscalls: Use the number when building SPU syscall table
  powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()
  powerpc/64s: Fix KVM interrupt using wrong save area
  powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull powerpc fixes from Michael Ellerman:

 - One fix for the interrupt rework we did last release which broke
   KVM-PR

 - Three commits fixing some fallout from the READ_ONCE() changes
   interacting badly with our 8xx 16K pages support, which uses a pte_t
   that is a structure of 4 actual PTEs

 - A cleanup of the 8xx pte_update() to use the newly added pmd_off()

 - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is
   enabled

 - A minor fix for the SPU syscall generation

Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike
Rapoport, Nicholas Piggin.

* tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/8xx: Provide ptep_get() with 16k pages
  mm: Allow arches to provide ptep_get()
  mm/gup: Use huge_ptep_get() in gup_hugepte()
  powerpc/syscalls: Use the number when building SPU syscall table
  powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()
  powerpc/64s: Fix KVM interrupt using wrong save area
  powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
</pre>
</div>
</content>
</entry>
<entry>
<title>maccess: rename probe_kernel_address to get_kernel_nofault</title>
<updated>2020-06-18T18:14:40+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-06-17T07:37:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=25f12ae45fc1931a1dce3cc59f9989a9d87834b0'/>
<id>25f12ae45fc1931a1dce3cc59f9989a9d87834b0</id>
<content type='text'>
Better describe what this helper does, and match the naming of
copy_from_kernel_nofault.

Also switch the argument order around, so that it acts and looks
like get_user().

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Better describe what this helper does, and match the naming of
copy_from_kernel_nofault.

Also switch the argument order around, so that it acts and looks
like get_user().

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault</title>
<updated>2020-06-17T17:57:41+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-06-17T07:37:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c0ee37e85e0e47402b8bbe35b6cec8e06937ca58'/>
<id>c0ee37e85e0e47402b8bbe35b6cec8e06937ca58</id>
<content type='text'>
Better describe what these functions do.

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Better describe what these functions do.

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
