linux.git/arch/arm64/kernel/module.c, branch v6.6

arm64: module: rework module VA range selection

2023-06-06T16:39:06+00:00

Currently, the modules region is 128M in size, which is a problem for
some large modules. Shanker reports [1] that the NVIDIA GPU driver alone
can consume 110M of module space in some configurations. We'd like to
make the modules region a full 2G such that we can always make use of a
2G range.

It's possible to build kernel images which are larger than 128M in some
configurations, such as when many debug options are selected and many
drivers are built in. In these configurations, we can't legitimately
select a base for a 128M module region, though we currently select a
value for which allocation will fail. It would be nicer to have a
diagnostic message in this case.

Similarly, in theory it's possible to build a kernel image which is
larger than 2G and which cannot support modules. While this isn't likely
to be the case for any realistic kernel deplyed in the field, it would
be nice if we could print a diagnostic in this case.

This patch reworks the module VA range selection to use a 2G range, and
improves handling of cases where we cannot select legitimate module
regions. We now attempt to select a 128M region and a 2G region:

* The 128M region is selected such that modules can use direct branches
  (with JUMP26/CALL26 relocations) to branch to kernel code and other
  modules, and so that modules can reference data and text (using PREL32
  relocations) anywhere in the kernel image and other modules.

  This region covers the entire kernel image (rather than just the text)
  to ensure that all PREL32 relocations are in range even when the
  kernel data section is absurdly large. Where we cannot allocate from
  this region, we'll fall back to the full 2G region.

* The 2G region is selected such that modules can use direct branches
  with PLTs to branch to kernel code and other modules, and so that
  modules can use reference data and text (with PREL32 relocations) in
  the kernel image and other modules.

  This region covers the entire kernel image, and the 128M region (if
  one is selected).

The two module regions are randomized independently while ensuring the
constraints described above.

[1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/

Signed-off-by: Mark Rutland 
Reviewed-by: Ard Biesheuvel 
Cc: Shanker Donthineni 
Cc: Will Deacon 
Tested-by: Shanker Donthineni 
Link: https://lore.kernel.org/r/20230530110328.2213762-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas

arm64: module: mandate MODULE_PLTS

2023-06-06T16:39:05+00:00

Contemporary kernels and modules can be relatively large, especially
when common debug options are enabled. Using GCC 12.1.0, a v6.3-rc7
defconfig kernel is ~38M, and with PROVE_LOCKING + KASAN_INLINE enabled
this expands to ~117M. Shanker reports [1] that the NVIDIA GPU driver
alone can consume 110M of module space in some configurations.

Both KASLR and ARM64_ERRATUM_843419 select MODULE_PLTS, so anyone
wanting a kernel to have KASLR or run on Cortex-A53 will have
MODULE_PLTS selected. This is the case in defconfig and distribution
kernels (e.g. Debian, Android, etc).

Practically speaking, this means we're very likely to need MODULE_PLTS
and while it's almost guaranteed that MODULE_PLTS will be selected, it
is possible to disable support, and we have to maintain some awkward
special cases for such unusual configurations.

This patch removes the MODULE_PLTS config option, with the support code
always enabled if MODULES is selected. This results in a slight
simplification, and will allow for further improvement in subsequent
patches.

For any config which currently selects MODULE_PLTS, there will be no
functional change as a result of this patch.

[1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/

Signed-off-by: Mark Rutland 
Reviewed-by: Ard Biesheuvel 
Cc: Shanker Donthineni 
Cc: Will Deacon 
Tested-by: Shanker Donthineni 
Link: https://lore.kernel.org/r/20230530110328.2213762-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas

arm64: module: move module randomization to module.c

2023-06-06T16:39:05+00:00

When CONFIG_RANDOMIZE_BASE=y, module_alloc_base is a variable which is
configured by kaslr_module_init() in kaslr.c, and otherwise it is an
expression defined in module.h.

As kaslr_module_init() is no longer tightly coupled with the KASLR
initialization code, we can centralize this in module.c.

This patch moves kaslr_module_init() to module.c, making
module_alloc_base a static variable, and removing redundant includes from
kaslr.c. For the defintion of struct arm64_ftr_override we must include
, which was previously included transitively via
another header.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland 
Reviewed-by: Ard Biesheuvel 
Cc: Will Deacon 
Tested-by: Shanker Donthineni 
Link: https://lore.kernel.org/r/20230530110328.2213762-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas

arm64: module: remove old !KASAN_VMALLOC logic

2023-06-06T16:39:05+00:00

Historically, KASAN could be selected with or without KASAN_VMALLOC, and
we had to be very careful where to place modules when KASAN_VMALLOC was
not selected.

However, since commit:

  f6f37d9320a11e90 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS modes")

Selecting CONFIG_KASAN on arm64 will also select CONFIG_KASAN_VMALLOC,
and so the logic for handling CONFIG_KASAN without CONFIG_KASAN_VMALLOC
is redundant and can be removed.

Note: the "kasan.vmalloc={on,off}" option which only exists for HW_TAGS
changes whether the vmalloc region is given non-match-all tags, and does
not affect the page table manipulation code.

The VM_DEFER_KMEMLEAK flag was only necessary for !CONFIG_KASAN_VMALLOC
as described in its introduction in commit:

  60115fa54ad7b913 ("mm: defer kmemleak object creation of module_alloc()")

... and therefore it can also be removed.

Remove the redundant logic for !CONFIG_KASAN_VMALLOC. At the same time,
add the missing braces around the multi-line conditional block in
arch/arm64/kernel/module.c.

Suggested-by: Ard Biesheuvel 
Signed-off-by: Mark Rutland 
Reviewed-by: Ard Biesheuvel 
Cc: Alexander Potapenko 
Cc: Andrew Morton 
Cc: Andrey Konovalov 
Cc: Andrey Ryabinin 
Cc: Dmitry Vyukov 
Cc: Will Deacon 
Tested-by: Shanker Donthineni 
Link: https://lore.kernel.org/r/20230530110328.2213762-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas

Merge branch 'for-next/ftrace' into for-next/core

2022-12-06T11:07:39+00:00

* for-next/ftrace:
  ftrace: arm64: remove static ftrace
  ftrace: arm64: move from REGS to ARGS
  ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
  ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()
  ftrace: pass fregs to arch_ftrace_set_direct_caller()

ftrace: arm64: move from REGS to ARGS

2022-11-18T13:56:41+00:00

This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).

FTRACE_WITH_REGS has been supported on arm64 since commit:

  3b23e4991fb66f6d ("arm64: implement ftrace with regs")

As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:

(1) To make it possible to use the ftrace graph tracer with pointer
    authentication, where it's necessary to snapshot/manipulate the LR
    before it is signed by the instrumented function.

(2) To make it possible to implement LIVEPATCH in future, where we need
    to hook function entry before an instrumented function manipulates
    the stack or argument registers. Practically speaking, we need to
    preserve the argument/return registers, PC, LR, and SP.

Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:

  https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst

Per AAPCS64, all function call argument and return values are held in
the following GPRs:

* X0 - X7 : parameter / result registers
* X8      : indirect result location register
* SP      : stack pointer (AKA SP)

Additionally, ad function call boundaries, the following GPRs hold
context/return information:

* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)

... and for ftrace we need to capture the instrumented address:

 * PC  : program counter

No other GPRs are relevant, as none of the other arguments hold
parameters or return values:

* X9  - X17 : temporaries, may be clobbered
* X18       : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved

This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.

This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.

As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.

I've tested this with the ftrace selftests, where there are no
unexpected failures.

Co-developed-by: Florent Revest 
Signed-off-by: Mark Rutland 
Signed-off-by: Florent Revest 
Cc: Catalin Marinas 
Cc: Masami Hiramatsu 
Cc: Steven Rostedt 
Cc: Will Deacon 
Reviewed-by: Masami Hiramatsu (Google) 
Reviewed-by: Steven Rostedt (Google) 
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon

arm64: implement dynamic shadow call stack for Clang

2022-11-09T18:06:35+00:00

Implement dynamic shadow call stack support on Clang, by parsing the
unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
instructions, and replacing them with the shadow call stack push and pop
instructions, respectively.

This is useful because the overhead of the shadow call stack is
difficult to justify on hardware that implements pointer authentication
(PAC), and given that the PAC instructions are executed as NOPs on
hardware that doesn't, we can just replace them without breaking
anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
manipulations of the return address, replacing them 1:1 with shadow call
stack pushes and pops is guaranteed to result in the desired behavior.

Signed-off-by: Ard Biesheuvel 
Reviewed-by: Sami Tolvanen 
Tested-by: Sami Tolvanen 
Link: https://lore.kernel.org/r/20221027155908.1940624-4-ardb@kernel.org
Signed-off-by: Will Deacon

arm64: module: move find_section to header

2022-09-09T11:27:25+00:00

Move it to the header so that the implementation can be shared
by the alternatives code.

Signed-off-by: Joey Gouly 
Cc: Will Deacon 
Acked-by: Mark Rutland 
Link: https://lore.kernel.org/r/20220830104833.34636-2-joey.gouly@arm.com
Signed-off-by: Catalin Marinas

kasan, arm64: don't tag executable vmalloc allocations

2022-03-25T02:06:48+00:00

Besides asking vmalloc memory to be executable via the prot argument of
__vmalloc_node_range() (see the previous patch), the kernel can skip that
bit and instead mark memory as executable via set_memory_x().

Once tag-based KASAN modes start tagging vmalloc allocations, executing
code from such allocations will lead to the PC register getting a tag,
which is not tolerated by the kernel.

Generic kernel code typically allocates memory via module_alloc() if it
intends to mark memory as executable.  (On arm64 module_alloc() uses
__vmalloc_node_range() without setting the executable bit).

Thus, reset pointer tags of pointers returned from module_alloc().

However, on arm64 there's an exception: the eBPF subsystem.  Instead of
using module_alloc(), it uses vmalloc() (via bpf_jit_alloc_exec()) to
allocate its JIT region.

Thus, reset pointer tags of pointers returned from bpf_jit_alloc_exec().

Resetting tags for these pointers results in untagged pointers being
passed to set_memory_x().  This causes conflicts in arithmetic checks in
change_memory_common(), as vm_struct->addr pointer returned by
find_vm_area() is tagged.

Reset pointer tag of find_vm_area(addr)->addr in change_memory_common().

Link: https://lkml.kernel.org/r/b7b2595423340cd7d76b770e5d519acf3b72f0ab.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov 
Acked-by: Catalin Marinas 
Acked-by: Marco Elver 
Cc: Alexander Potapenko 
Cc: Andrey Ryabinin 
Cc: Dmitry Vyukov 
Cc: Evgenii Stepanov 
Cc: Mark Rutland 
Cc: Peter Collingbourne 
Cc: Vincenzo Frascino 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

kasan, x86, arm64, s390: rename functions for modules shadow

2022-03-25T02:06:47+00:00

Rename kasan_free_shadow to kasan_free_module_shadow and
kasan_module_alloc to kasan_alloc_module_shadow.

These functions are used to allocate/free shadow memory for kernel modules
when KASAN_VMALLOC is not enabled.  The new names better reflect their
purpose.

Also reword the comment next to their declaration to improve clarity.

Link: https://lkml.kernel.org/r/36db32bde765d5d0b856f77d2d806e838513fe84.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov 
Acked-by: Catalin Marinas 
Acked-by: Marco Elver 
Cc: Alexander Potapenko 
Cc: Andrey Ryabinin 
Cc: Dmitry Vyukov 
Cc: Evgenii Stepanov 
Cc: Mark Rutland 
Cc: Peter Collingbourne 
Cc: Vincenzo Frascino 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds