| author | Arnd Bergmann <arnd@arndb.de> | 2026-06-11 14:59:39 +0200 |
|---|---|---|
| committer | Eric Biggers <ebiggers@kernel.org> | 2026-06-11 12:57:49 -0700 |
| commit | 065f978a0e015c4dd9f536f5c08078a37f5509c1 (patch) | |
| tree | 155437a7217a72397d27d3a3c7f8c7f4ac539756 /scripts/Makefile.thinlto | |
| parent | cf52058dcdd96420cfc38ee284c5ac077901ea61 (diff) | |
lib/crypto: gf128hash: mark clmul32() as noinline_for_stack
During randconfig testing, I came across a lot of warnings for the newly
added carryless multiplication function triggering excessive stack usage
from spilling temporary variables to the stack:

    lib/crypto/gf128hash.c:166:1: error: stack frame size (1192) exceeds limit (1024) in 'polyval_mul_generic' [-Werror,-Wframe-larger-than]

In addition to the possible risk of overflowing the kernel stack,
the generated object code surely performs very poorly.
This only happens on architectures that don't provide uint128_t
(which should be all 32-bit architectures on modern compilers).
Although I tested random x86 and arm configs, I only saw it with arm's
CONFIG_THUMB2_KERNEL, which puts more pressure on the register allocator.
The testing was done using clang-22; I don't know if gcc has the same
problem. Experimentally, marking clmul32() as noinline_for_stack
completely solves the problem in all of the affected builds, reducing
the stack usage to a few bytes as expected.
Since u64 arithmetic frequently leads to compilers badly optimizing
32-bit targets, keeping clmul32 out of line is likely to help on
other 32-bit configurations as well when they run into this problem,
though it may also result in a small performance degradation in
configurations that would benefit from inlining.
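
For illustration only, the shape of the change can be sketched in userspace
C. This is not the code in lib/crypto/gf128hash.c: the bit-by-bit clmul32()
body, the clmul64() caller and clmul_selftest() are hypothetical stand-ins,
and the macro below assumes a GCC- or Clang-compatible compiler (in the
kernel, noinline_for_stack is an annotation that expands to noinline).

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel annotation (assumption: GCC/Clang). */
#define noinline_for_stack __attribute__((__noinline__))

/*
 * Hypothetical carryless 32 x 32 -> 64 bit multiply, one bit at a time.
 * Keeping it out of line gives it its own small stack frame instead of
 * letting its u64 temporaries be spilled in every inlined copy.
 */
static noinline_for_stack uint64_t clmul32(uint32_t a, uint32_t b)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 32; i++)
		if (b & (1u << i))
			r ^= (uint64_t)a << i;
	return r;
}

/*
 * Hypothetical caller: 64 x 64 -> 128 bit carryless product built from
 * four clmul32() calls (schoolbook split, XOR instead of addition).
 */
static void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
	uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
	uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);
	uint64_t ll = clmul32(a0, b0);
	uint64_t hh = clmul32(a1, b1);
	uint64_t mid = clmul32(a0, b1) ^ clmul32(a1, b0);

	*lo = ll ^ (mid << 32);
	*hi = hh ^ (mid >> 32);
}

/* Small self-check: (x + 1)^2 = x^2 + 1 over GF(2), and x^63 * x = x^64. */
static void clmul_selftest(void)
{
	uint64_t lo, hi;

	assert(clmul32(3, 3) == 5);
	clmul64(1ULL << 63, 2, &lo, &hi);
	assert(lo == 0 && hi == 1);
}
```

Whether the out-of-line call costs measurable time depends on the target;
the sketch only shows where the annotation goes, not its effect on any
particular configuration.
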
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260611125952.3387258-1-arnd@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
