author	Demian Shulhan <demyansh@gmail.com>	2026-03-29 07:43:38 +0000
committer	Eric Biggers <ebiggers@kernel.org>	2026-03-29 13:22:13 -0700
commit	63432fd625372a0e79fb00a4009af204f4edc013 (patch)
tree	a6db54c5b5044e13a5779f230359c0f93abeaee3
parent	6e4d63e8993c681e1cec7d564b4e018e21e658d0 (diff)
lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation
Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR software implementation is slow, which creates a bottleneck in NVMe and other storage subsystems. The acceleration is implemented using C intrinsics (<arm_neon.h>) rather than raw assembly for better readability and maintainability.

Key highlights of this implementation:

- Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency spikes on large buffers.
- Pre-calculates and loads fold constants via vld1q_u64() to minimize register spilling.
- Benchmarks show the break-even point against the generic implementation is around 128 bytes; the PMULL path is enabled only for len >= 128.

Performance results (kunit crc_benchmark on Cortex-A72):

- Generic (len=4096): ~268 MB/s
- PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)

Signed-off-by: Demian Shulhan <demyansh@gmail.com>
Link: https://lore.kernel.org/r/20260329074338.1053550-1-demyansh@gmail.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Diffstat
0 files changed, 0 insertions, 0 deletions