author    Muhammad Usama Anjum <usama.anjum@arm.com>    2026-03-11 17:50:50 +0000
committer Catalin Marinas <catalin.marinas@arm.com>     2026-04-09 18:56:22 +0100
commit    249bf9733198fe0527542ba6acf4bd8f12b7e5fc (patch)
tree      131448d67d1707dd6b49b61365d26578c4bae3be
parent    abed23c3c44f565dc812563ac015be70dd61e97b (diff)
arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
With KASAN_HW_TAGS (MTE) in synchronous mode, tag check faults are reported
as immediate Data Abort exceptions. The TFSR_EL1.TF1 bit is never set since
faults never go through the asynchronous path. Therefore, reading TFSR_EL1
and executing data and instruction barriers on kernel entry, exit, context
switch and suspend is unnecessary overhead.

As with the check_mte_async_tcf and clear_mte_async_tcf paths for
TFSRE0_EL1, extend the same optimisation to kernel entry/exit, context
switch and suspend.

All mte kselftests pass. The kunit tests show the same results before and
after the patch.

A selection of test_vmalloc benchmarks running on an arm64 machine. v6.19
is the baseline. (>0 is faster, <0 is slower, (R)/(I) = statistically
significant Regression/Improvement). Based on significance and ignoring
the noise, the benchmarks improved.

* 77 result classes were considered, with 9 wins, 0 losses and 68 ties

Results of fastpath [1] on v6.19 vs this patch:

+----------------------------+----------------------------------------------------------+------------+
| Benchmark                  | Result Class                                             | barriers   |
+============================+==========================================================+============+
| micromm/fork               | fork: p:1, d:10 (seconds)                                |  (I) 2.75% |
|                            | fork: p:512, d:10 (seconds)                              |      0.96% |
+----------------------------+----------------------------------------------------------+------------+
| micromm/munmap             | munmap: p:1, d:10 (seconds)                              |     -1.78% |
|                            | munmap: p:512, d:10 (seconds)                            |      5.02% |
+----------------------------+----------------------------------------------------------+------------+
| micromm/vmalloc            | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |     -0.56% |
|                            | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |      0.70% |
|                            | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |      1.18% |
|                            | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |     -5.01% |
|                            | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |     13.81% |
|                            | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |      6.51% |
|                            | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |     32.87% |
|                            | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |      4.17% |
|                            | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |      8.40% |
|                            | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |     -0.48% |
|                            | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |     -0.74% |
|                            | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |      0.53% |
|                            | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |     -2.81% |
|                            | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |     -2.06% |
|                            | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |     -0.56% |
|                            | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |     -0.41% |
|                            | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |      0.89% |
|                            | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |      1.71% |
|                            | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |      0.83% |
+----------------------------+----------------------------------------------------------+------------+
| schbench/thread-contention | -m 16 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |      0.05% |
|                            | -m 16 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |      0.60% |
|                            | -m 16 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 16 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.34% |
|                            | -m 16 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |     -0.58% |
|                            | -m 16 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      9.09% |
|                            | -m 16 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.74% |
|                            | -m 16 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |     -1.40% |
|                            | -m 16 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.00% |
|                            | -m 16 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |     -0.78% |
|                            | -m 16 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |     -0.11% |
|                            | -m 16 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.11% |
|                            | -m 16 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      2.64% |
|                            | -m 16 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      3.15% |
|                            | -m 16 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     17.54% |
|                            | -m 32 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |     -1.22% |
|                            | -m 32 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |      0.85% |
|                            | -m 32 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 32 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.34% |
|                            | -m 32 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |      1.05% |
|                            | -m 32 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 32 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.41% |
|                            | -m 32 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |      0.58% |
|                            | -m 32 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      2.13% |
|                            | -m 32 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |      0.67% |
|                            | -m 32 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |      2.07% |
|                            | -m 32 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |     -1.28% |
|                            | -m 32 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      1.01% |
|                            | -m 32 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      0.69% |
|                            | -m 32 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     13.12% |
|                            | -m 64 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |     -0.25% |
|                            | -m 64 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |     -0.48% |
|                            | -m 64 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |     10.53% |
|                            | -m 64 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.06% |
|                            | -m 64 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |      0.00% |
|                            | -m 64 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 64 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.36% |
|                            | -m 64 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |      0.52% |
|                            | -m 64 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.11% |
|                            | -m 64 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |      0.52% |
|                            | -m 64 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |      3.53% |
|                            | -m 64 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |     -0.10% |
|                            | -m 64 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      2.53% |
|                            | -m 64 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      1.82% |
|                            | -m 64 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     -5.80% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/getpid             | mean (ns)                                                | (I) 15.98% |
|                            | p99 (ns)                                                 | (I) 11.11% |
|                            | p99.9 (ns)                                               | (I) 16.13% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/getppid            | mean (ns)                                                | (I) 14.82% |
|                            | p99 (ns)                                                 | (I) 17.86% |
|                            | p99.9 (ns)                                               |  (I) 9.09% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/invalid            | mean (ns)                                                | (I) 17.78% |
|                            | p99 (ns)                                                 | (I) 11.11% |
|                            | p99.9 (ns)                                               |     13.33% |
+----------------------------+----------------------------------------------------------+------------+

[1] https://gitlab.arm.com/tooling/fastpath

Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>