summaryrefslogtreecommitdiff
path: root/scripts/Makefile.thinlto
diff options
context:
space:
mode:
authorHeiko Carstens <hca@linux.ibm.com>2026-05-26 07:56:56 +0200
committerAlexander Gordeev <agordeev@linux.ibm.com>2026-06-03 15:32:46 +0200
commita737737cdb9c94e40a9926cdc2320f874c05d709 (patch)
tree9f6dae4e8ed70cead922cbb59c6cf7805dd1c600 /scripts/Makefile.thinlto
parenta2b94afeb5166989753109949c6be6da0215739e (diff)
s390/percpu: Infrastructure for more efficient this_cpu operations
With the intended removal of PREEMPT_NONE this_cpu operations based on atomic instructions, guarded with preempt_disable()/preempt_enable() pairs become more expensive: the preempt_disable() / preempt_enable() pairs are not optimized away anymore during compile time. In particular the conditional call to preempt_schedule_notrace() after preempt_enable() adds additional code and register pressure. E.g. this simple C code sequence DEFINE_PER_CPU(long, foo); long bar(long a) { return this_cpu_add_return(foo, a); } generates this code: 11a976: eb af f0 68 00 24 stmg %r10,%r15,104(%r15) 11a97c: b9 04 00 ef lgr %r14,%r15 11a980: b9 04 00 b2 lgr %r11,%r2 11a984: e3 f0 ff c8 ff 71 lay %r15,-56(%r15) 11a98a: e3 e0 f0 98 00 24 stg %r14,152(%r15) 11a990: eb 01 03 a8 00 6a asi 936,1 <- __preempt_count_add(1) 11a996: c0 10 00 d2 ac b5 larl %r1,1b70300 <- address of percpu var 11a9a0: e3 10 23 b8 00 08 ag %r1,952 <- add percpu offset 11a9a6: eb ab 10 00 00 e8 laag %r10,%r11,0(%r1) <- atomic op 11a9ac: eb ff 03 a8 00 6e alsi 936,-1 <- __preempt_count_dec_and_test() 11a9b2: a7 54 00 05 jnhe 11a9bc <bar+0x4c> 11a9b6: c0 e5 00 76 d1 bd brasl %r14,ff4d30 <preempt_schedule_notrace> 11a9bc: b9 e8 b0 2a agrk %r2,%r10,%r11 11a9c0: eb af f0 a0 00 04 lmg %r10,%r15,160(%r15) 11a9c6 07 fe br %r14 Even though the above example is more or less the worst case, since the branch to preempt_schedule_notrace() requires a stackframe, which otherwise wouldn't be necessary, there is also the conditional jnhe branch instruction. Get rid of the conditional branch with the following code sequence: 11a8e6: c0 30 00 d0 c5 0d larl %r3,1b33300 11a8ec: b9 04 00 43 lgr %r4,%r3 11a8f0: eb 00 43 c0 00 52 mviy 960,4 11a8f6: e3 40 03 b8 00 08 ag %r4,952 11a8fc: eb 52 40 00 00 e8 laag %r5,%r2,0(%r4) 11a902: eb 00 03 c0 00 52 mviy 960,0 11a908: b9 08 00 25 agr %r2,%r5 11a90c 07 fe br %r14 The general idea is that this_cpu operations based on atomic instructions are guarded with mviy instructions: - The first mviy instruction writes the register number, which contains the percpu address variable to lowcore. This also indicates that a percpu code section is executed. - The first instruction following the mviy instruction must be the ag instruction which adds the percpu offset to the percpu address register. - Afterwards the atomic percpu operation follows. - Then a second mviy instruction writes a zero to lowcore, which indicates the end of the percpu code section. - In case of an interrupt/exception/nmi the register number which was written to lowcore is copied to the exception frame (pt_regs), and a zero is written to lowcore. - On return to the previous context it is checked if a percpu code section was executed (saved register number not zero), and if the process was migrated to a different cpu. If the percpu offset was already added to the percpu address register (instruction address does _not_ point to the ag instruction) the content of the percpu address register is adjusted so it points to percpu variable of the new cpu. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Diffstat (limited to 'scripts/Makefile.thinlto')
0 files changed, 0 insertions, 0 deletions