| Age | Commit message (Collapse) | Author |
|
Remove the "ghash-neon" crypto_shash algorithm. Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.
This makes the GHASH library be optimized on arm64 (though only with
NEON, not PMULL; for now the goal is just parity with crypto_shash). It
greatly reduces the amount of arm64-specific glue code that is needed,
and it fixes the issue where this optimization was disabled by default.
To integrate the assembly code correctly with the library, make the
following tweaks:
- Change the type of 'blocks' from int to size_t
- Change the types of 'dg' and 'h' to polyval_elem. Note that this
simply reflects the format that the code was already using.
- Remove the 'head' argument, which is no longer needed.
- Remove the CFI stubs, as indirect calls are no longer used.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Remove the "ghash-neon" crypto_shash algorithm. Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.
This makes the GHASH library be optimized on arm (though only with NEON,
not PMULL; for now the goal is just parity with crypto_shash). It
greatly reduces the amount of arm-specific glue code that is needed, and
it fixes the issue where this optimization was disabled by default.
To integrate the assembly code correctly with the library, make the
following tweaks:
- Change the type of 'blocks' from int to size_t.
- Change the types of 'dg' and 'h' to polyval_elem. Note that this
simply reflects the format that the code was already using, at least
on little endian CPUs. For big endian CPUs, add byte-swaps.
- Remove the 'head' argument, which is no longer needed.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add a KUnit test suite for the GHASH library functions.
It closely mirrors the POLYVAL test suite.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add GHASH support to the gf128hash module.
This will replace the GHASH support in the crypto_shash API. It will be
used by the "gcm" template and by the AES-GCM library (when an
arch-optimized implementation of the full AES-GCM is unavailable).
This consists of a simple API that mirrors the existing POLYVAL API, a
generic implementation of that API based on the existing efficient and
side-channel-resistant polyval_mul_generic(), and the framework for
architecture-optimized implementations of the GHASH functions.
The GHASH accumulator is stored in POLYVAL format rather than GHASH
format, since this is what most modern GHASH implementations actually
need. The few implementations that expect the accumulator in GHASH
format will just convert the accumulator to/from GHASH format
temporarily. (Supporting architecture-specific accumulator formats
would be possible, but doesn't seem worth the complexity.)
However, architecture-specific formats of struct ghash_key will be
supported, since a variety of formats will be needed there anyway. The
default format is just the key in POLYVAL format.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Currently, some architectures (arm64 and x86) have optimized code for
both GHASH and POLYVAL. Others (arm, powerpc, riscv, and s390) have
optimized code only for GHASH. While POLYVAL support could be
implemented on these other architectures, until then we need to support
the case where arch-optimized functions are present only for GHASH.
Therefore, update the support for arch-optimized POLYVAL functions to
allow architectures to opt into supporting these functions individually.
The new meaning of CONFIG_CRYPTO_LIB_GF128HASH_ARCH is that some level
of GHASH and/or POLYVAL acceleration is provided.
Also provide an implementation of polyval_mul() based on
polyval_blocks_arch(), for when polyval_mul_arch() isn't implemented.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Currently, the standalone GHASH code is coupled with crypto_shash. This
has resulted in unnecessary complexity and overhead, as well as the code
being unavailable to library code such as the AES-GCM library. Like was
done with POLYVAL, it needs to find a new home in lib/crypto/.
GHASH and POLYVAL are closely related and can each be implemented in
terms of each other. Optimized code for one can be reused with the
other. But also since GHASH tends to be difficult to implement directly
due to its unnatural bit order, most modern GHASH implementations
(including the existing arm, arm64, powerpc, and x86 optimized GHASH
code, and the new generic GHASH code I'll be adding) actually
reinterpret the GHASH computation as an equivalent POLYVAL computation,
pre and post-processing the inputs and outputs to map to/from POLYVAL.
Given this close relationship, it makes sense to group the GHASH and
POLYVAL code together in the same module. This gives us a wide range of
options for implementing them, reusing code between the two and properly
utilizing whatever instructions each architecture provides.
Thus, GHASH support will be added to the library module that is
currently called "polyval". Rename it to an appropriate name:
"gf128hash". Rename files, options, functions, etc. where appropriate
to reflect the upcoming sharing with GHASH. (Note: polyval_kunit is not
renamed, as ghash_kunit will be added alongside it instead.)
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Bitmap API handles nbits == 0 in most cases correctly, i.e. it doesn't
dereferene underlying bitmap and returns a sane value where convenient,
or implementation defined, or undef.
Implicitly testing nbits == 0 case, however, may make an impression that
this is a regular case. This is wrong. In most cases nbits == 0 is a
sign of an error on a client side. The tests should not make such an
implression.
This patch reworks the existing tests to not test nbits == 0. The
following patch adds an explicit test for it with an appropriate
precaution.
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
Test the function for correctness when some bits are set in the last word
of bitmap beyond nbits. This is motivated by commit a9dadc1c512807f9
("powerpc/xive: Fix the size of the cpumask used in
xive_find_target_in_mask()").
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
count_leading_zeros() is based on fls(), which is defined for x == 0,
contrary to __ffs() family. The comment in crypto/mpi erroneously
states that the function may return undef in such case.
Fix the comment together with the outdated function signature, and now
that COUNT_LEADING_ZEROS_0 is not referenced in the codebase, get rid of
it too.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
The function calculates a Hamming weight of a bitmap starting from an
arbitrary bit.
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
test_find_first_bit()
test_find_first_bit() searches for a set bit from the beginning of the
test bitmap and clears it repeatedly, eventually clearing the entire
bitmap.
After test_find_first_bit() is executed, test_find_first_and_bit() and
test_find_next_and_bit() are executed without randomly reinitializing the
cleared bitmap.
In the first phase (testing find_bit() with a random-filled bitmap),
test_find_first_bit() only operates on 1/10 of the entire size of the
testing bitmap, so this isn't a big problem.
However, in the second phase (testing find_bit() with a sparse bitmap),
test_find_first_bit() clears the entire test bitmap, so the subsequent
test_find_first_and_bit() and test_find_next_and_bit() will not find any
set bits. This is probably not the intended benchmark.
To fix this issue, test_find_first_bit() operates on a duplicated bitmap
and does not clear the original test bitmap.
The same is already done in test_find_first_and_bit().
While we're at it, add const qualifiers to the bitmap pointer arguments
in the test functions.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
Remove find_nth_andnot_bit() leftovers.
CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Fixes: b0c85e99458a ("cpumask: Remove unnecessary cpumask_nth_andnot()")
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
Different subtests print output in slightly different formats. Unify the
format for better visual representation.
The test output before:
[ 0.553474] test_bitmap: parselist: 14: input is '0-2047:128/256' OK, Time: 202
[ 0.555121] test_bitmap: bitmap_print_to_pagebuf: input is '0-32767
[ 0.555121] ', Time: 1278
[ 0.578392] test_bitmap: Time spent in test_bitmap_read_perf: 427864
[ 0.580137] test_bitmap: Time spent in test_bitmap_write_perf: 793554
[ 0.581957] test_bitmap: all 390447 tests passed
And after:
[ 0.314982] test_bitmap: parselist('0-2047:128/256'): 135
[ 0.315517] test_bitmap: scnprintf("%*pbl", '0-32767'): 342
[ 0.330045] test_bitmap: test_bitmap_read_perf: 252294
[ 0.331132] test_bitmap: test_bitmap_write_perf: 539001
[ 0.332163] test_bitmap: all 390447 tests passed
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
scnprintf("%*pbl") is more verbose than bitmap_print_to_pagebuf().
Switch the test to using it. This also improves the test output
because bitmap_print_to_pagebuf() adds \n at the end of the printed
bitmap, which breaks the test format.
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
Make sure that bitmap_scatter() and bitmap_gather() do not modify
the bits outside of the given nbits span.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig fixes from Masami Hiramatsu:
- Check error code of xbc_init_node() in override value path in
xbc_parse_kv()
- Fix fd leak in load_xbc_file() on fstat failure
* tag 'bootconfig-fixes-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/bootconfig: fix fd leak in load_xbc_file() on fstat failure
lib/bootconfig: check xbc_init_node() return in override path
|
|
Resolve conflict between this change in the upstream kernel:
4c652a47722f ("rseq: Mark rseq_arm_slice_extension_timer() __always_inline")
... and this pending change in timers/core:
0e98eb14814e ("entry: Prepare for deferred hrtimer rearming")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
CONFIG_KERNEL_MODE_NEON is always enabled on arm64, and it always has
been since its introduction in 2013. Given that and the fact that the
usefulness of kernel-mode NEON has only been increasing over time,
checking for this option in arm64-specific code is unnecessary. Remove
these checks from lib/crypto/ to simplify the code and prevent any
future bugs where e.g. code gets disabled due to a typo in this logic.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260314175049.26931-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Defaulting the crypto KUnit tests to KUNIT_ALL_TESTS || CRYPTO_SELFTESTS
instead of simply KUNIT_ALL_TESTS was originally intended to make it
easy to enable all the crypto KUnit tests. This additional default is
nonstandard for KUnit tests, though, and it can cause all the KUnit
tests to be built-in unexpectedly if CRYPTO_SELFTESTS is set. It also
constitutes a back-reference to crypto/ from lib/crypto/, which is
something that we should be avoiding in order to get clean layering.
Now that we provide a lib/crypto/.kunitconfig file that enables all
crypto KUnit tests, let's consider that to be the supported way to
enable all these tests, and drop the default of CRYPTO_SELFTESTS.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260317040626.5697-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
For kunit.py to run all the crypto library tests when passed the
--alltests option, tools/testing/kunit/configs/all_tests.config needs to
enable options that satisfy the test dependencies.
This is the same as what lib/crypto/.kunitconfig already does.
However, the strategy that lib/crypto/.kunitconfig currently uses to
select all the hidden library options isn't going to scale up well when
it needs to be repeated in two places.
Instead let's go ahead and introduce an option
CRYPTO_LIB_ENABLE_ALL_FOR_KUNIT that depends on KUNIT and selects all
the crypto library options that have corresponding KUnit tests.
Update lib/crypto/.kunitconfig to use this option.
Link: https://lore.kernel.org/r/20260314035927.51351-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
lib/bootconfig.c:136:21: warning: conversion from 'long int' to
'int' may change value [-Wconversion]
lib/bootconfig.c:308:33: warning: conversion from 'int' to 'uint16_t'
may change value [-Wconversion]
lib/bootconfig.c:467:37: warning: conversion from 'int' to 'uint16_t'
may change value [-Wconversion]
lib/bootconfig.c:469:40: warning: conversion from 'int' to 'uint16_t'
may change value [-Wconversion]
lib/bootconfig.c:472:54: warning: conversion from 'int' to 'uint16_t'
may change value [-Wconversion]
lib/bootconfig.c:476:45: warning: conversion from 'int' to 'uint16_t'
may change value [-Wconversion]
xbc_node_index() returns the position of a node in the xbc_nodes array,
which has at most XBC_NODE_MAX (8192) entries, well within uint16_t
range. Every caller stores the result in a uint16_t field (node->parent,
node->child, node->next, or the keys[] array in compose_key_after), so
the int return type causes narrowing warnings at all six call sites.
Change the return type to uint16_t and add an explicit cast on the
pointer subtraction to match the storage width and eliminate the
warnings.
Link: https://lore.kernel.org/all/20260318155919.78168-14-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
lib/bootconfig.c:839:24: warning: conversion from 'size_t' to 'int'
may change value [-Wconversion]
lib/bootconfig.c:860:32: warning: conversion from 'size_t' to 'int'
may change value [-Wconversion]
lib/bootconfig.c:860:29: warning: conversion to 'size_t' from 'int'
may change the sign of the result [-Wsign-conversion]
The key length variables len and wlen accumulate strlen() results but
were declared as int, causing truncation and sign-conversion warnings.
Change both to size_t to match the strlen() return type and avoid
mixed-sign arithmetic.
Link: https://lore.kernel.org/all/20260318155919.78168-13-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
lib/bootconfig.c:415:32: warning: conversion to 'long unsigned int'
from 'long int' may change the sign of the result [-Wsign-conversion]
Pointer subtraction yields ptrdiff_t (signed), which was stored in
unsigned long. The original unsigned type implicitly caught a negative
offset (data < xbc_data) because the wrapped value would exceed
XBC_DATA_MAX. Make this intent explicit by using a signed long and
adding an offset < 0 check to the WARN_ON condition.
Link: https://lore.kernel.org/all/20260318155919.78168-12-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
lib/bootconfig.c:198:19: warning: conversion from 'size_t' to 'int'
may change value [-Wconversion]
lib/bootconfig.c:200:33: warning: conversion to '__kernel_size_t'
from 'int' may change the sign of the result [-Wsign-conversion]
strlen() returns size_t but the result was stored in an int. The value
is then passed back to strncmp() which expects size_t, causing a second
sign-conversion warning on the round-trip. Use size_t throughout to
match the API types.
Link: https://lore.kernel.org/all/20260318155919.78168-11-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
lib/bootconfig.c:188:28: warning: comparison of integer expressions
of different signedness: 'int' and 'size_t' [-Wsign-compare]
The local variable 'offset' is declared as int, but xbc_data_size is
size_t. Change the type to size_t to match and eliminate the warning.
Link: https://lore.kernel.org/all/20260318155919.78168-10-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
xbc_verify_tree() validates that each node's next index is within
bounds, but does not check the child index. Add the same bounds
check for the child field.
Without this check, a corrupt bootconfig that passes next-index
validation could still trigger an out-of-bounds memory access via an
invalid child index when xbc_node_get_child() is called during tree
traversal at boot time.
Link: https://lore.kernel.org/all/20260318155919.78168-9-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
linux/kernel.h is a legacy catch-all header. Replace it with the
specific headers actually needed: linux/cache.h for SMP_CACHE_BYTES,
linux/compiler.h for unlikely(), and linux/sprintf.h for snprintf().
Link: https://lore.kernel.org/all/20260318155919.78168-8-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
memblock_alloc() already returns zeroed memory, so the explicit memset
in xbc_init() is redundant. Switch the userspace xbc_alloc_mem() from
malloc() to calloc() so both paths return zeroed memory, and remove
the separate memset call.
Link: https://lore.kernel.org/all/20260318155919.78168-6-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Move the xbc_node_num increment to after xbc_init_node() so a failed
init does not leave a partially initialized node counted in the array.
If xbc_init_node() fails on a data offset at the boundary of a
maximum-size bootconfig, the pre-incremented count causes subsequent
tree verification and traversal to consider the uninitialized node as
valid, potentially leading to an out-of-bounds read or unpredictable
boot behavior.
Link: https://lore.kernel.org/all/20260318155919.78168-5-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Valid node indices are 0 to xbc_node_num-1, so a next value equal to
xbc_node_num is out of bounds. Use >= instead of > to catch this.
A malformed or corrupt bootconfig could pass tree verification with
an out-of-bounds next index. On subsequent tree traversal at boot
time, xbc_node_get_next() would return a pointer past the allocated
xbc_nodes array, causing an out-of-bounds read of kernel memory.
Link: https://lore.kernel.org/all/20260318155919.78168-4-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The flag parameter in the node creation helpers only ever carries
XBC_KEY (0) or XBC_VALUE (0x8000), both of which fit in uint16_t.
Using uint16_t matches the width of xbc_node.data where the flag is
ultimately stored.
Link: https://lore.kernel.org/all/20260318155919.78168-3-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Fixes kerneldoc typos ("initiized" and "uder") and adds a missing blank line.
Also fixes inconsistent if/else bracing in __xbc_add_key() and elsewhere.
Link: https://lore.kernel.org/all/20260318155919.78168-2-objecting@objecting.org/
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The ':=' override path in xbc_parse_kv() calls xbc_init_node() to
re-initialize an existing value node but does not check the return
value. If xbc_init_node() fails (data offset out of range), parsing
silently continues with stale node data.
Add the missing error check to match the xbc_add_node() call path
which already checks for failure.
In practice, a bootconfig using ':=' to override a value near the
32KB data limit could silently retain the old value, meaning a
security-relevant boot parameter override (e.g., a trace filter or
debug setting) would not take effect as intended.
Link: https://lore.kernel.org/all/20260318155847.78065-2-objecting@objecting.org/
Fixes: e5efaeb8a8f5 ("bootconfig: Support mixing a value and subkeys under a key")
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:
- Disable the "padlock" SHA-1 and SHA-256 driver on Zhaoxin
processors, since it does not compute hash values correctly
- Make a generated file be removed by 'make clean'
- Fix excessive stack usage in some of the arm64 AES code
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: powerpc: Add powerpc/aesp8-ppc.S to clean-files
crypto: padlock-sha - Disable for Zhaoxin processor
crypto: arm64/aes-neonbs - Move key expansion off the stack
|
|
CONFIG_KERNEL_MODE_NEON is always enabled on arm64, and it always has
been since its introduction in 2013. Given that and the fact that the
usefulness of kernel-mode NEON has only been increasing over time,
checking for this option in arm64-specific code is unnecessary. Remove
this check from lib/crc/ to simplify the code and prevent any future
bugs where e.g. code gets disabled due to a typo in this logic.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260314175744.30620-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Make the generated file powerpc/aesp8-ppc.S be removed by 'make clean'.
Fixes: 7cf2082e74ce ("lib/crypto: powerpc/aes: Migrate POWER8 optimized code into library")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260317044925.104184-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
KASAN-enabled kernels with LOCKDEP and PREEMPT_FULL hit
"BUG: MAX_STACK_TRACE_ENTRIES too low!" within 9-23 hours of normal
desktop use.
The root cause is a feedback loop between KASAN slab tracking and
lockdep: every KASAN-tracked slab allocation saves a stack trace via
stack_trace_save() -> arch_stack_walk(). The unwinder calls
is_bpf_text_address(), which under PREEMPT_FULL can trigger RCU
deferred quiescent-state processing -> swake_up_one() -> lock_acquire()
-> lockdep validate_chain() -> save_trace(). This means KASAN's own
stack captures indirectly generate new lockdep dependency chains,
consuming the buffer from both directions.
/proc/lockdep_stats at the moment of overflow confirms that
stack-trace entries is the sole exhausted resource:
stack-trace entries: 524288 [max: 524288] <- 100% full
number of stack traces: 22080 <- unique after dedup
dependency chains: 164665 [max: 524288] <- only 31% used
direct dependencies: 45270 [max: 65536] <- 69%
lock-classes: 2811 [max: 8192] <- 34%
22080 genuinely unique traces averaging ~24 frames each fill the
buffer in under a day. The hash-based deduplication (12593b7467f9) is
working correctly -- the traces are simply all different due to the
deep and varied call stacks from GPU + filesystem + Wine/Proton + KASAN
instrumentation.
Raise the LOCKDEP_STACK_TRACE_BITS default from 19 to 21 when KASAN is
enabled (2M entries, +12MB). This is negligible compared to KASAN's
own shadow memory overhead (~12.5% of total RAM). Scale
LOCKDEP_STACK_TRACE_HASH_BITS accordingly to maintain dedup efficiency.
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260313171118.1702954-2-mikhail.v.gavrilov@gmail.com
|
|
Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its CPU
instructions by PHE(Padlock Hash Engine) Extensions, including XSHA1,
XSHA256, XSHA384 and XSHA512 instructions. The instruction specification
is available at the following link.
(https://gitee.com/openzhaoxin/zhaoxin_specifications/blob/20260227/ZX_Padlock_Reference.pdf)
With the help of implementation of SHA in hardware instead of software,
can develop applications with higher performance, more security and more
flexibility.
This patch includes the XSHA256 instruction optimized implementation of
SHA-256 transform function.
The table below shows the benchmark results before and after applying
this patch by using CRYPTO_LIB_BENCHMARK on Zhaoxin KX-7000 platform,
highlighting the achieved speedups.
+---------+--------------------------+
| | SHA256 |
+---------+--------+-----------------+
| Len | Before | After |
+---------+--------+-----------------+
| 1* | 2 | 7 (3.50x) |
| 16 | 35 | 119 (3.40x) |
| 64 | 74 | 280 (3.78x) |
| 127 | 99 | 387 (3.91x) |
| 128 | 103 | 427 (4.15x) |
| 200 | 123 | 537 (4.37x) |
| 256 | 128 | 582 (4.55x) |
| 511 | 144 | 679 (4.72x) |
| 512 | 146 | 714 (4.89x) |
| 1024 | 157 | 796 (5.07x) |
| 3173 | 167 | 883 (5.28x) |
| 4096 | 166 | 876 (5.28x) |
| 16384 | 169 | 899 (5.32x) |
+---------+--------+-----------------+
*: The length of each data block to be processed by one complete SHA
sequence.
**: The throughput of processing data blocks, unit is Mb/s.
After applying this patch, the SHA256 KUnit test suite passes on Zhaoxin
platforms. Detailed test logs are shown below.
[ 7.767257] # Subtest: sha256
[ 7.770542] # module: sha256_kunit
[ 7.770544] 1..15
[ 7.777383] ok 1 test_hash_test_vectors
[ 7.788563] ok 2 test_hash_all_lens_up_to_4096
[ 7.806090] ok 3 test_hash_incremental_updates
[ 7.813553] ok 4 test_hash_buffer_overruns
[ 7.822384] ok 5 test_hash_overlaps
[ 7.829388] ok 6 test_hash_alignment_consistency
[ 7.833843] ok 7 test_hash_ctx_zeroization
[ 7.915191] ok 8 test_hash_interrupt_context_1
[ 8.362312] ok 9 test_hash_interrupt_context_2
[ 8.401607] ok 10 test_hmac
[ 8.415458] ok 11 test_sha256_finup_2x
[ 8.419397] ok 12 test_sha256_finup_2x_defaultctx
[ 8.424107] ok 13 test_sha256_finup_2x_hugelen
[ 8.451289] # benchmark_hash: len=1: 7 MB/s
[ 8.465372] # benchmark_hash: len=16: 119 MB/s
[ 8.481760] # benchmark_hash: len=64: 280 MB/s
[ 8.499344] # benchmark_hash: len=127: 387 MB/s
[ 8.515800] # benchmark_hash: len=128: 427 MB/s
[ 8.531970] # benchmark_hash: len=200: 537 MB/s
[ 8.548241] # benchmark_hash: len=256: 582 MB/s
[ 8.564838] # benchmark_hash: len=511: 679 MB/s
[ 8.580872] # benchmark_hash: len=512: 714 MB/s
[ 8.596858] # benchmark_hash: len=1024: 796 MB/s
[ 8.612567] # benchmark_hash: len=3173: 883 MB/s
[ 8.628546] # benchmark_hash: len=4096: 876 MB/s
[ 8.644482] # benchmark_hash: len=16384: 899 MB/s
[ 8.649773] ok 14 benchmark_hash
[ 8.655505] ok 15 benchmark_sha256_finup_2x # SKIP not relevant
[ 8.659065] # sha256: pass:14 fail:0 skip:1 total:15
[ 8.665276] # Totals: pass:14 fail:0 skip:1 total:15
[ 8.670195] ok 7 sha256
Signed-off-by: AlanSong-oc <AlanSong-oc@zhaoxin.com>
Link: https://lore.kernel.org/r/20260313080150.9393-3-AlanSong-oc@zhaoxin.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
snprintf() returns the number of characters that would have been
written excluding the NUL terminator. Output is truncated when the
return value is >= the buffer size, not just > the buffer size.
When ret == size, the current code takes the non-truncated path,
advancing buf by ret and reducing size to 0. This is wrong because
the output was actually truncated (the last character was replaced by
NUL). Fix by using >= so the truncation path is taken correctly.
Link: https://lore.kernel.org/all/20260312191143.28719-4-objecting@objecting.org/
Fixes: 76db5a27a827 ("bootconfig: Add Extra Boot Config support")
Cc: stable@vger.kernel.org
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The bounds check for brace_index happens after the array write.
While the current call pattern prevents an actual out-of-bounds
access (the previous call would have returned an error), the
write-before-check pattern is fragile and would become a real
out-of-bounds write if the error return were ever not propagated.
Move the bounds check before the array write so the function is
self-contained and safe regardless of caller behavior.
Link: https://lore.kernel.org/all/20260312191143.28719-3-objecting@objecting.org/
Fixes: ead1e19ad905 ("lib/bootconfig: Fix a bug of breaking existing tree nodes")
Cc: stable@vger.kernel.org
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
__xbc_open_brace() pushes entries with post-increment
(open_brace[brace_index++]), so brace_index always points one past
the last valid entry. xbc_verify_tree() reads open_brace[brace_index]
to report which brace is unclosed, but this is one past the last
pushed entry and contains stale/zero data, causing the error message
to reference the wrong node.
Use open_brace[brace_index - 1] to correctly identify the unclosed
brace. brace_index is known to be > 0 here since we are inside the
if (brace_index) guard.
Link: https://lore.kernel.org/all/20260312191143.28719-2-objecting@objecting.org/
Fixes: ead1e19ad905 ("lib/bootconfig: Fix a bug of breaking existing tree nodes")
Cc: stable@vger.kernel.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
vdso/datapage.h is useful without pulling in the architecture-specific
gettimeofday() helpers.
Move the include to the only users which needs it.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260227-vdso-header-cleanups-v2-12-35d60acf7410@linutronix.de
|
|
Various used symbols are only visible through transitive includes.
These transitive includes are about to go away.
Explicitly include the necessary headers.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260227-vdso-header-cleanups-v2-10-35d60acf7410@linutronix.de
|
|
Various used symbols are only visible through transitive includes.
These transitive includes are about to go away.
Explicitly include the necessary headers.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20260227-vdso-header-cleanups-v2-9-35d60acf7410@linutronix.de
|
|
All callers of vdso_read_retry() test its return value with unlikely().
Move the unlikely into the helper to make the code easier to read.
This is equivalent to the retry function of non-vDSO seqlocks.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260227-vdso-cleanups-v1-4-c848b4bc4850@linutronix.de
|
|
Currently this logic is duplicate multiple times.
Add a helper for it to make the code more readable.
[ bp: Add a missing clocksource.h include, see
https://lore.kernel.org/r/20260311113435-f72f81d8-33a6-4a0f-bd80-4997aad068cc@linutronix.de ]
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260227-vdso-cleanups-v1-3-c848b4bc4850@linutronix.de
|
|
namespace aware clock
Currently there are three different open-coded variants of a time
namespace aware variant of vdso_read_begin(). They make the code hard to
read and introduce an inconsistency, as only the first copy uses
unlikely().
Split the code into a shared helper function.
Move that next to the definition of the regular vdso_read_begin(), so
that any future changes can be kept in sync easily.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260227-vdso-cleanups-v1-2-c848b4bc4850@linutronix.de
|
|
These functions are used from the very same file,
so this annotation is unnecessary.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260227-vdso-cleanups-v1-1-c848b4bc4850@linutronix.de
|
|
Allocating the data pages as part of the kernel image does not work on
SPARC. The MMU will raise a fault when userspace tries to access them.
Allocate the data pages through the page allocator instead.
Unused pages in the vDSO VMA are still allocated to keep the virtual
addresses aligned. Switch the mapping from PFNs to 'struct page' as that is
required for dynamically allocated pages. This also aligns the allocation
of the datapages with the code pages and is a prerequisite for mlockall()
support.
VM_MIXEDMAP is necessary for the call to vmf_insert_page() in the timens
prefault path to work.
The data pages need to be order-0, non-compound pages so that the mapping
to userspace and the different orderings work.
These pages are also used by the timekeeping, random pool and architecture
initialization code. Some of these are running before the page allocator is
available. To keep these subsytems working without changes, introduce
early, statically data storage which will then replaced by the real one as
soon as that is available.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://patch.msgid.link/20260304-vdso-sparc64-generic-2-v6-3-d8eb3b0e1410@linutronix.de
|
|
This header is unnecessary and together with some upcoming changes would
introduce compiler warnings.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Andreas Larsson <andreas@gaisler.com>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://lore.kernel.org/lkml/20250916-mm-rcuwait-v1-1-39a3beea6ec3@linutronix.de/
Link: https://patch.msgid.link/20260304-vdso-sparc64-generic-2-v6-2-d8eb3b0e1410@linutronix.de
|