linux.git/arch/arm64/include/asm/processor.h, branch v5.15

arm64: move preemption disablement to prctl handlers

2021-07-28T17:33:49+00:00

In the next patch, we will start reading sctlr_user from
mte_update_sctlr_user and subsequently writing a new value based on the
task's TCF setting and potentially the per-CPU TCF preference. This
means that we need to be careful to disable preemption around any
code sequences that read from sctlr_user and subsequently write to
sctlr_user and/or SCTLR_EL1, so that we don't end up writing a stale
value (based on the previous CPU's TCF preference) to either of them.

We currently have four such sequences, in the prctl handlers for
PR_SET_TAGGED_ADDR_CTRL and PR_PAC_SET_ENABLED_KEYS, as well as in
the task initialization code that resets the prctl settings. Change
the prctl handlers to disable preemption in the handlers themselves
rather than the functions that they call, and change the task
initialization code to call the respective prctl handlers instead of
setting sctlr_user directly.

As a result of this change, we no longer need the helper function
set_task_sctlr_el1, nor does its behavior make sense any more, so
remove it.

Signed-off-by: Peter Collingbourne 
Link: https://linux-review.googlesource.com/id/Ic0e8a0c00bb47d786c1e8011df0b7fe99bee4bb5
Link: https://lore.kernel.org/r/20210727205300.2554659-4-pcc@google.com
Acked-by: Will Deacon 
Signed-off-by: Catalin Marinas

arm64: mte: change ASYNC and SYNC TCF settings into bitfields

2021-07-28T17:33:43+00:00

Allow the user program to specify both ASYNC and SYNC TCF modes by
repurposing the existing constants as bitfields. This will allow the
kernel to select one of the modes on behalf of the user program. With
this patch the kernel will always select async mode, but a subsequent
patch will make this configurable.

Link: https://linux-review.googlesource.com/id/Icc5923c85a8ea284588cc399ae74fd19ec291230
Signed-off-by: Peter Collingbourne 
Reviewed-by: Catalin Marinas 
Link: https://lore.kernel.org/r/20210727205300.2554659-3-pcc@google.com
Acked-by: Will Deacon 
Signed-off-by: Catalin Marinas

arm64: mte: rename gcr_user_excl to mte_ctrl

2021-07-28T17:33:19+00:00

We are going to use this field to store more data. To prepare for
that, rename it and change the users to rely on the bit position of
gcr_user_excl in mte_ctrl.

Link: https://linux-review.googlesource.com/id/Ie1fd18e480100655f5d22137f5b22f4f3a9f9e2e
Signed-off-by: Peter Collingbourne 
Reviewed-by: Catalin Marinas 
Link: https://lore.kernel.org/r/20210727205300.2554659-2-pcc@google.com
Acked-by: Will Deacon 
Signed-off-by: Catalin Marinas

Merge branch 'for-next/ptrauth' into for-next/core

2021-06-24T13:06:23+00:00

Allow Pointer Authentication to be configured independently for kernel
and userspace.

* for-next/ptrauth:
  arm64: Conditionally configure PTR_AUTH key of the kernel.
  arm64: Add ARM64_PTR_AUTH_KERNEL config option

Merge branch 'for-next/entry' into for-next/core

2021-06-24T13:01:55+00:00

The never-ending entry.S refactoring continues, putting us in a much
better place wrt compiler instrumentation whilst moving more of the code
into C.

* for-next/entry:
  arm64: idle: don't instrument idle code with KCOV
  arm64: entry: don't instrument entry code with KCOV
  arm64: entry: make NMI entry/exit functions static
  arm64: entry: split SDEI entry
  arm64: entry: split bad stack entry
  arm64: entry: fold el1_inv() into el1h_64_sync_handler()
  arm64: entry: handle all vectors with C
  arm64: entry: template the entry asm functions
  arm64: entry: improve bad_mode()
  arm64: entry: move bad_mode() to entry-common.c
  arm64: entry: consolidate EL1 exception returns
  arm64: entry: organise entry vectors consistently
  arm64: entry: organise entry handlers consistently
  arm64: entry: convert IRQ+FIQ handlers to C
  arm64: entry: add a call_on_irq_stack helper
  arm64: entry: move NMI preempt logic to C
  arm64: entry: move arm64_preempt_schedule_irq to entry-common.c
  arm64: entry: convert SError handlers to C
  arm64: entry: unmask IRQ+FIQ after EL0 handling
  arm64: remove redundant local_daif_mask() in bad_mode()

arm64: Conditionally configure PTR_AUTH key of the kernel.

2021-06-15T10:32:31+00:00

If the kernel is not compiled with CONFIG_ARM64_PTR_AUTH_KERNEL=y,
then no PACI/AUTI instructions are expected while the kernel is running
so the kernel's key will not be used. Write of a system registers
is expensive therefore avoid if not required.

Signed-off-by: Daniel Kiss 
Reviewed-by: Catalin Marinas 
Link: https://lore.kernel.org/r/20210613092632.93591-3-daniel.kiss@arm.com
Signed-off-by: Will Deacon

arm64: entry: convert IRQ+FIQ handlers to C

2021-06-07T10:35:55+00:00

For various reasons we'd like to convert the bulk of arm64's exception
triage logic to C. As a step towards that, this patch converts the EL1
and EL0 IRQ+FIQ triage logic to C.

Separate C functions are added for the native and compat cases so that
in subsequent patches we can handle native/compat differences in C.

Since the triage functions can now call arm64_apply_bp_hardening()
directly, the do_el0_irq_bp_hardening() wrapper function is removed.

Since the user_exit_irqoff macro is now unused, it is removed. The
user_enter_irqoff macro is still used by the ret_to_user code, and
cannot be removed at this time.

Signed-off-by: Mark Rutland 
Acked-by: Catalin Marinas 
Acked-by: Marc Zyngier 
Reviewed-by: Joey Gouly 
Cc: James Morse 
Cc: Will Deacon 
Link: https://lore.kernel.org/r/20210607094624.34689-8-mark.rutland@arm.com
Signed-off-by: Will Deacon

arm64: Change the on_*stack functions to take a size argument

2021-05-26T19:01:17+00:00

unwind_frame() was previously implicitly checking that the frame
record is in bounds of the stack by enforcing that FP is both aligned
to 16 and in bounds of the stack. Once the FP alignment requirement
is relaxed to 8 this will not be sufficient because it does not
account for the case where FP points to 8 bytes before the end of the
stack.

Make the check explicit by changing the on_*stack functions to take a
size argument and adjusting the callers to pass the appropriate sizes.

Signed-off-by: Peter Collingbourne 
Link: https://linux-review.googlesource.com/id/Ib7a3eb3eea41b0687ffaba045ceb2012d077d8b4
Reviewed-by: Mark Rutland 
Tested-by: Mark Rutland 
Link: https://lore.kernel.org/r/20210526174927.2477847-1-pcc@google.com
Signed-off-by: Will Deacon

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

2021-04-26T17:25:03+00:00

Pull arm64 updates from Catalin Marinas:

- MTE asynchronous support for KASan. Previously only synchronous
(slower) mode was supported. Asynchronous is faster but does not
allow precise identification of the illegal access.

- Run kernel mode SIMD with softirqs disabled. This allows using NEON
in softirq context for crypto performance improvements. The
conditional yield support is modified to take softirqs into account
and reduce the latency.

- Preparatory patches for Apple M1: handle CPUs that only have the VHE
mode available (host kernel running at EL2), add FIQ support.

- arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers,
new functions for the HiSilicon HHA and L3C PMU, cleanups.

- Re-introduce support for execute-only user permissions but only when
the EPAN (Enhanced Privileged Access Never) architecture feature is
available.

- Disable fine-grained traps at boot and improve the documented boot
requirements.

- Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC).

- Add hierarchical eXecute Never permissions for all page tables.

- Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs
to control which PAC keys are enabled in a particular task.

- arm64 kselftests for BTI and some improvements to the MTE tests.

- Minor improvements to the compat vdso and sigpage.

- Miscellaneous cleanups.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (86 commits)
arm64/sve: Add compile time checks for SVE hooks in generic functions
arm64/kernel/probes: Use BUG_ON instead of if condition followed by BUG.
arm64: pac: Optimize kernel entry/exit key installation code paths
arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)
arm64: mte: make the per-task SCTLR_EL1 field usable elsewhere
arm64/sve: Remove redundant system_supports_sve() tests
arm64: fpsimd: run kernel mode NEON with softirqs disabled
arm64: assembler: introduce wxN aliases for wN registers
arm64: assembler: remove conditional NEON yield macros
kasan, arm64: tests supports for HW_TAGS async mode
arm64: mte: Report async tag faults before suspend
arm64: mte: Enable async tag check fault
arm64: mte: Conditionally compile mte_enable_kernel_*()
arm64: mte: Enable TCO in functions that can read beyond buffer limits
kasan: Add report for async mode
arm64: mte: Drop arch_enable_tagging()
kasan: Add KASAN mode kernel parameter
arm64: mte: Add asynchronous mode support
arm64: Get rid of CONFIG_ARM64_VHE
arm64: Cope with CPUs stuck in VHE mode
...

arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)

2021-04-13T16:31:44+00:00

This change introduces a prctl that allows the user program to control
which PAC keys are enabled in a particular task. The main reason
why this is useful is to enable a userspace ABI that uses PAC to
sign and authenticate function pointers and other pointers exposed
outside of the function, while still allowing binaries conforming
to the ABI to interoperate with legacy binaries that do not sign or
authenticate pointers.

The idea is that a dynamic loader or early startup code would issue
this prctl very early after establishing that a process may load legacy
binaries, but before executing any PAC instructions.

This change adds a small amount of overhead to kernel entry and exit
due to additional required instruction sequences.

On a DragonBoard 845c (Cortex-A75) with the powersave governor, the
overhead of similar instruction sequences was measured as 4.9ns when
simulating the common case where IA is left enabled, or 43.7ns when
simulating the uncommon case where IA is disabled. These numbers can
be seen as the worst case scenario, since in more realistic scenarios
a better performing governor would be used and a newer chip would be
used that would support PAC unlike Cortex-A75 and would be expected
to be faster than Cortex-A75.

On an Apple M1 under a hypervisor, the overhead of the entry/exit
instruction sequences introduced by this patch was measured as 0.3ns
in the case where IA is left enabled, and 33.0ns in the case where
IA is disabled.

Signed-off-by: Peter Collingbourne 
Reviewed-by: Dave Martin 
Link: https://linux-review.googlesource.com/id/Ibc41a5e6a76b275efbaa126b31119dc197b927a5
Link: https://lore.kernel.org/r/d6609065f8f40397a4124654eb68c9f490b4d477.1616123271.git.pcc@google.com
Signed-off-by: Catalin Marinas