<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/loongarch/lib/Makefile, branch master</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'loongarch-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson</title>
<updated>2026-04-24T16:54:45+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-24T16:54:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ff57d59200baadfdb41f94a49fed7d161a9a8124'/>
<id>ff57d59200baadfdb41f94a49fed7d161a9a8124</id>
<content type='text'>
Pull LoongArch updates from Huacai Chen:

 - Adjust build infrastructure for 32BIT/64BIT

 - Add HIGHMEM (PKMAP and FIX_KMAP) support

 - Show and handle CPU vulnerabilites correctly

 - Batch the icache maintenance for jump_label

 - Add more atomic instructions support for BPF JIT

 - Add more features (e.g. fsession) support for BPF trampoline

 - Some bug fixes and other small changes

* tag 'loongarch-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (21 commits)
  selftests/bpf: Enable CAN_USE_LOAD_ACQ_STORE_REL for LoongArch
  LoongArch: BPF: Add fsession support for trampolines
  LoongArch: BPF: Introduce emit_store_stack_imm64() helper
  LoongArch: BPF: Support up to 12 function arguments for trampoline
  LoongArch: BPF: Support small struct arguments for trampoline
  LoongArch: BPF: Open code and remove invoke_bpf_mod_ret()
  LoongArch: BPF: Support load-acquire and store-release instructions
  LoongArch: BPF: Support 8 and 16 bit read-modify-write instructions
  LoongArch: BPF: Add the default case in emit_atomic() and rename it
  LoongArch: Define instruction formats for AM{SWAP/ADD}.{B/H} and DBAR
  LoongArch: Batch the icache maintenance for jump_label
  LoongArch: Add flush_icache_all()/local_flush_icache_all()
  LoongArch: Add spectre boundry for syscall dispatch table
  LoongArch: Show CPU vulnerabilites correctly
  LoongArch: Make arch_irq_work_has_interrupt() true only if IPI HW exist
  LoongArch: Use get_random_canary() for stack canary init
  LoongArch: Improve the logging of disabling KASLR
  LoongArch: Align FPU register state to 32 bytes
  LoongArch: Handle CONFIG_32BIT in syscall_get_arch()
  LoongArch: Add HIGHMEM (PKMAP and FIX_KMAP) support
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull LoongArch updates from Huacai Chen:

 - Adjust build infrastructure for 32BIT/64BIT

 - Add HIGHMEM (PKMAP and FIX_KMAP) support

 - Show and handle CPU vulnerabilites correctly

 - Batch the icache maintenance for jump_label

 - Add more atomic instructions support for BPF JIT

 - Add more features (e.g. fsession) support for BPF trampoline

 - Some bug fixes and other small changes

* tag 'loongarch-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (21 commits)
  selftests/bpf: Enable CAN_USE_LOAD_ACQ_STORE_REL for LoongArch
  LoongArch: BPF: Add fsession support for trampolines
  LoongArch: BPF: Introduce emit_store_stack_imm64() helper
  LoongArch: BPF: Support up to 12 function arguments for trampoline
  LoongArch: BPF: Support small struct arguments for trampoline
  LoongArch: BPF: Open code and remove invoke_bpf_mod_ret()
  LoongArch: BPF: Support load-acquire and store-release instructions
  LoongArch: BPF: Support 8 and 16 bit read-modify-write instructions
  LoongArch: BPF: Add the default case in emit_atomic() and rename it
  LoongArch: Define instruction formats for AM{SWAP/ADD}.{B/H} and DBAR
  LoongArch: Batch the icache maintenance for jump_label
  LoongArch: Add flush_icache_all()/local_flush_icache_all()
  LoongArch: Add spectre boundry for syscall dispatch table
  LoongArch: Show CPU vulnerabilites correctly
  LoongArch: Make arch_irq_work_has_interrupt() true only if IPI HW exist
  LoongArch: Use get_random_canary() for stack canary init
  LoongArch: Improve the logging of disabling KASLR
  LoongArch: Align FPU register state to 32 bytes
  LoongArch: Handle CONFIG_32BIT in syscall_get_arch()
  LoongArch: Add HIGHMEM (PKMAP and FIX_KMAP) support
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Adjust build infrastructure for 32BIT/64BIT</title>
<updated>2026-04-22T07:44:26+00:00</updated>
<author>
<name>Huacai Chen</name>
<email>chenhuacai@loongson.cn</email>
</author>
<published>2026-04-22T07:44:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3d9aba6618d115750729bba2d1f8af180bd7d3bd'/>
<id>3d9aba6618d115750729bba2d1f8af180bd7d3bd</id>
<content type='text'>
Adjust build infrastructure (Kconfig, Makefile and ld scripts) to let
us enable both 32BIT/64BIT kernel build.

Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Jiaxun Yang &lt;jiaxun.yang@flygoat.com&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Adjust build infrastructure (Kconfig, Makefile and ld scripts) to let
us enable both 32BIT/64BIT kernel build.

Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Jiaxun Yang &lt;jiaxun.yang@flygoat.com&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>loongarch: move the XOR code to lib/raid/</title>
<updated>2026-04-03T06:36:18+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2026-03-27T06:16:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=033bee3e49631bd0c7e081aeafeadc7623495107'/>
<id>033bee3e49631bd0c7e081aeafeadc7623495107</id>
<content type='text'>
Move the optimized XOR into lib/raid and include it it in xor.ko instead
of always building it into the main kernel image.

Link: https://lkml.kernel.org/r/20260327061704.3707577-15-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Tested-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Alexandre Ghiti &lt;alex@ghiti.fr&gt;
Cc: Andreas Larsson &lt;andreas@gaisler.com&gt;
Cc: Anton Ivanov &lt;anton.ivanov@cambridgegreys.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "Borislav Petkov (AMD)" &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jason A. Donenfeld &lt;jason@zx2c4.com&gt;
Cc: Johannes Berg &lt;johannes@sipsolutions.net&gt;
Cc: Li Nan &lt;linan122@huawei.com&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Cc: Magnus Lindholm &lt;linmag7@gmail.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Ted Ts'o &lt;tytso@mit.edu&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: WANG Xuerui &lt;kernel@xen0n.name&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the optimized XOR into lib/raid and include it it in xor.ko instead
of always building it into the main kernel image.

Link: https://lkml.kernel.org/r/20260327061704.3707577-15-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Tested-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Alexandre Ghiti &lt;alex@ghiti.fr&gt;
Cc: Andreas Larsson &lt;andreas@gaisler.com&gt;
Cc: Anton Ivanov &lt;anton.ivanov@cambridgegreys.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "Borislav Petkov (AMD)" &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jason A. Donenfeld &lt;jason@zx2c4.com&gt;
Cc: Johannes Berg &lt;johannes@sipsolutions.net&gt;
Cc: Li Nan &lt;linan122@huawei.com&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Cc: Magnus Lindholm &lt;linmag7@gmail.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Ted Ts'o &lt;tytso@mit.edu&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: WANG Xuerui &lt;kernel@xen0n.name&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/crc: loongarch: Migrate optimized CRC code into lib/crc/</title>
<updated>2025-06-30T16:31:57+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-06-07T20:04:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b10d2d20d9786c14133c76cc6aee13e46664dabc'/>
<id>b10d2d20d9786c14133c76cc6aee13e46664dabc</id>
<content type='text'>
Move the loongarch-optimized CRC code from arch/loongarch/lib/crc* into
its new location in lib/crc/loongarch/, and wire it up in the new way.
This new way of organizing the CRC code eliminates the need to
artificially split the code for each CRC variant into separate arch and
generic modules, enabling better inlining and dead code elimination.
For more details, see "lib/crc: Prepare for arch-optimized code in
subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: "Jason A. Donenfeld" &lt;Jason@zx2c4.com&gt;
Link: https://lore.kernel.org/r/20250607200454.73587-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the loongarch-optimized CRC code from arch/loongarch/lib/crc* into
its new location in lib/crc/loongarch/, and wire it up in the new way.
This new way of organizing the CRC code eliminates the need to
artificially split the code for each CRC variant into separate arch and
generic modules, enabling better inlining and dead code elimination.
For more details, see "lib/crc: Prepare for arch-optimized code in
subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: "Jason A. Donenfeld" &lt;Jason@zx2c4.com&gt;
Link: https://lore.kernel.org/r/20250607200454.73587-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>loongarch/crc32: expose CRC32 functions through lib</title>
<updated>2024-12-02T01:23:01+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2024-12-02T01:08:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=72f51a4f4b076d2ea2751432dee07600aea4c4c4'/>
<id>72f51a4f4b076d2ea2751432dee07600aea4c4c4</id>
<content type='text'>
Move the loongarch CRC32 assembly code into the lib directory and wire
it up to the library interface.  This allows it to be used without going
through the crypto API.  It remains usable via the crypto API too via
the shash algorithms that use the library interface.  Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Note: to see the diff from arch/loongarch/crypto/crc32-loongarch.c to
arch/loongarch/lib/crc32-loongarch.c, view this commit with
'git show -M10'.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: WangYuli &lt;wangyuli@uniontech.com&gt;
Link: https://lore.kernel.org/r/20241202010844.144356-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the loongarch CRC32 assembly code into the lib directory and wire
it up to the library interface.  This allows it to be used without going
through the crypto API.  It remains usable via the crypto API too via
the shash algorithms that use the library interface.  Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Note: to see the diff from arch/loongarch/crypto/crc32-loongarch.c to
arch/loongarch/lib/crc32-loongarch.c, view this commit with
'git show -M10'.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: WangYuli &lt;wangyuli@uniontech.com&gt;
Link: https://lore.kernel.org/r/20241202010844.144356-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Select ARCH_SUPPORTS_INT128 if CC_HAS_INT128</title>
<updated>2024-05-14T04:24:18+00:00</updated>
<author>
<name>Xi Ruoyao</name>
<email>xry111@xry111.site</email>
</author>
<published>2024-05-14T04:24:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5125d033c8af733ee4d52e3e3c6ebf5784976e46'/>
<id>5125d033c8af733ee4d52e3e3c6ebf5784976e46</id>
<content type='text'>
This allows compiling a full 128-bit product of two 64-bit integers as a
mul/mulh pair, instead of a nasty long sequence of 20+ instructions.

However, after selecting ARCH_SUPPORTS_INT128, when optimizing for size
the compiler generates calls to __ashlti3, __ashrti3, and __lshrti3 for
shifting __int128 values, causing a link failure:

    loongarch64-unknown-linux-gnu-ld: kernel/sched/fair.o: in
    function `mul_u64_u32_shr':
    &lt;PATH&gt;/include/linux/math64.h:161:(.text+0x5e4): undefined
    reference to `__lshrti3'

So provide the implementation of these functions if ARCH_SUPPORTS_INT128.

Closes: https://lore.kernel.org/loongarch/CAAhV-H5EZ=7OF7CSiYyZ8_+wWuenpo=K2WT8-6mAT4CvzUC_4g@mail.gmail.com/
Signed-off-by: Xi Ruoyao &lt;xry111@xry111.site&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This allows compiling a full 128-bit product of two 64-bit integers as a
mul/mulh pair, instead of a nasty long sequence of 20+ instructions.

However, after selecting ARCH_SUPPORTS_INT128, when optimizing for size
the compiler generates calls to __ashlti3, __ashrti3, and __lshrti3 for
shifting __int128 values, causing a link failure:

    loongarch64-unknown-linux-gnu-ld: kernel/sched/fair.o: in
    function `mul_u64_u32_shr':
    &lt;PATH&gt;/include/linux/math64.h:161:(.text+0x5e4): undefined
    reference to `__lshrti3'

So provide the implementation of these functions if ARCH_SUPPORTS_INT128.

Closes: https://lore.kernel.org/loongarch/CAAhV-H5EZ=7OF7CSiYyZ8_+wWuenpo=K2WT8-6mAT4CvzUC_4g@mail.gmail.com/
Signed-off-by: Xi Ruoyao &lt;xry111@xry111.site&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Add SIMD-optimized XOR routines</title>
<updated>2023-09-06T14:53:55+00:00</updated>
<author>
<name>WANG Xuerui</name>
<email>git@xen0n.name</email>
</author>
<published>2023-09-06T14:53:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=75ded18a5e8e51ca2d26d55f010d60ae9aab652c'/>
<id>75ded18a5e8e51ca2d26d55f010d60ae9aab652c</id>
<content type='text'>
Add LSX and LASX implementations of xor operations, operating on 64
bytes (one L1 cache line) at a time, for a balance between memory
utilization and instruction mix. Huacai confirmed that all future
LoongArch implementations by Loongson (that we care) will likely also
feature 64-byte cache lines, and experiments show no throughput
improvement with further unrolling.

Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:

&gt; 8regs           : 12702 MB/sec
&gt; 8regs_prefetch  : 10920 MB/sec
&gt; 32regs          : 12686 MB/sec
&gt; 32regs_prefetch : 10918 MB/sec
&gt; lsx             : 17589 MB/sec
&gt; lasx            : 26116 MB/sec

Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: WANG Xuerui &lt;git@xen0n.name&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add LSX and LASX implementations of xor operations, operating on 64
bytes (one L1 cache line) at a time, for a balance between memory
utilization and instruction mix. Huacai confirmed that all future
LoongArch implementations by Loongson (that we care) will likely also
feature 64-byte cache lines, and experiments show no throughput
improvement with further unrolling.

Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:

&gt; 8regs           : 12702 MB/sec
&gt; 8regs_prefetch  : 10920 MB/sec
&gt; 32regs          : 12686 MB/sec
&gt; 32regs_prefetch : 10918 MB/sec
&gt; lsx             : 17589 MB/sec
&gt; lasx            : 26116 MB/sec

Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: WANG Xuerui &lt;git@xen0n.name&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Add support for function error injection</title>
<updated>2023-05-01T09:19:52+00:00</updated>
<author>
<name>Tiezhu Yang</name>
<email>yangtiezhu@loongson.cn</email>
</author>
<published>2023-05-01T09:19:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8b5ee2c66d5c4c1312fd193d4138e6963160ba43'/>
<id>8b5ee2c66d5c4c1312fd193d4138e6963160ba43</id>
<content type='text'>
Inspired by the commit 42d038c4fb00f ("arm64: Add support for function
error injection") and the commit ee55ff803b383 ("riscv: Add support for
function error injection"), this patch supports function error injection
for LoongArch.

Mainly implement two functions:
(1) regs_set_return_value() which is used to overwrite the return value,
(2) override_function_with_return() which is used to override the probed
function returning and jump to its caller.

Here is a simple test under CONFIG_FUNCTION_ERROR_INJECTION and
CONFIG_FAIL_FUNCTION:

  # echo sys_clone &gt; /sys/kernel/debug/fail_function/inject
  # echo 100 &gt; /sys/kernel/debug/fail_function/probability
  # dmesg
  bash: fork: Invalid argument
  # dmesg
  ...
  FAULT_INJECTION: forcing a failure.
  name fail_function, interval 1, probability 100, space 0, times 1
  ...
  Call Trace:
  [&lt;90000000002238f4&gt;] show_stack+0x5c/0x180
  [&lt;90000000012e384c&gt;] dump_stack_lvl+0x60/0x88
  [&lt;9000000000b1879c&gt;] should_fail_ex+0x1b0/0x1f4
  [&lt;900000000032ead4&gt;] fei_kprobe_handler+0x28/0x6c
  [&lt;9000000000230970&gt;] kprobe_breakpoint_handler+0xf0/0x118
  [&lt;90000000012e3e60&gt;] do_bp+0x2c4/0x358
  [&lt;9000000002241924&gt;] exception_handlers+0x1924/0x10000
  [&lt;900000000023b7d0&gt;] sys_clone+0x0/0x4
  [&lt;90000000012e4744&gt;] do_syscall+0x7c/0x94
  [&lt;9000000000221e44&gt;] handle_syscall+0xc4/0x160

Tested-by: Hengqi Chen &lt;hengqi.chen@gmail.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Tiezhu Yang &lt;yangtiezhu@loongson.cn&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Inspired by the commit 42d038c4fb00f ("arm64: Add support for function
error injection") and the commit ee55ff803b383 ("riscv: Add support for
function error injection"), this patch supports function error injection
for LoongArch.

Mainly implement two functions:
(1) regs_set_return_value() which is used to overwrite the return value,
(2) override_function_with_return() which is used to override the probed
function returning and jump to its caller.

Here is a simple test under CONFIG_FUNCTION_ERROR_INJECTION and
CONFIG_FAIL_FUNCTION:

  # echo sys_clone &gt; /sys/kernel/debug/fail_function/inject
  # echo 100 &gt; /sys/kernel/debug/fail_function/probability
  # dmesg
  bash: fork: Invalid argument
  # dmesg
  ...
  FAULT_INJECTION: forcing a failure.
  name fail_function, interval 1, probability 100, space 0, times 1
  ...
  Call Trace:
  [&lt;90000000002238f4&gt;] show_stack+0x5c/0x180
  [&lt;90000000012e384c&gt;] dump_stack_lvl+0x60/0x88
  [&lt;9000000000b1879c&gt;] should_fail_ex+0x1b0/0x1f4
  [&lt;900000000032ead4&gt;] fei_kprobe_handler+0x28/0x6c
  [&lt;9000000000230970&gt;] kprobe_breakpoint_handler+0xf0/0x118
  [&lt;90000000012e3e60&gt;] do_bp+0x2c4/0x358
  [&lt;9000000002241924&gt;] exception_handlers+0x1924/0x10000
  [&lt;900000000023b7d0&gt;] sys_clone+0x0/0x4
  [&lt;90000000012e4744&gt;] do_syscall+0x7c/0x94
  [&lt;9000000000221e44&gt;] handle_syscall+0xc4/0x160

Tested-by: Hengqi Chen &lt;hengqi.chen@gmail.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Tiezhu Yang &lt;yangtiezhu@loongson.cn&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Add checksum optimization for 64-bit system</title>
<updated>2023-05-01T09:19:43+00:00</updated>
<author>
<name>Bibo Mao</name>
<email>maobibo@loongson.cn</email>
</author>
<published>2023-05-01T09:19:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=69e3a6aa6be21de6aaf38130fad97ecde34a193c'/>
<id>69e3a6aa6be21de6aaf38130fad97ecde34a193c</id>
<content type='text'>
LoongArch platform is 64-bit system, which supports 8-bytes memory
accessing, but generic checksum functions use 4-byte memory access.
So add 8-bytes memory access optimization for checksum functions on
LoongArch. And the code comes from arm64 system.

When network hw checksum is disabled, iperf performance improves about
10% with this patch.

Signed-off-by: Bibo Mao &lt;maobibo@loongson.cn&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
LoongArch platform is 64-bit system, which supports 8-bytes memory
accessing, but generic checksum functions use 4-byte memory access.
So add 8-bytes memory access optimization for checksum functions on
LoongArch. And the code comes from arm64 system.

When network hw checksum is disabled, iperf performance improves about
10% with this patch.

Signed-off-by: Bibo Mao &lt;maobibo@loongson.cn&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Use alternative to optimize libraries</title>
<updated>2022-12-14T00:36:11+00:00</updated>
<author>
<name>Huacai Chen</name>
<email>chenhuacai@loongson.cn</email>
</author>
<published>2022-12-10T14:39:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a275a82dcd4024c75337db15d59ed039c31e21da'/>
<id>a275a82dcd4024c75337db15d59ed039c31e21da</id>
<content type='text'>
Use the alternative to optimize common libraries according whether CPU
has UAL (hardware unaligned access support) feature, including memset(),
memcopy(), memmove(), copy_user() and clear_user().

We have tested UnixBench on a Loongson-3A5000 quad-core machine (1.6GHz):

1, One copy, before patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    9566582.0    819.8
Double-Precision Whetstone                       55.0       2805.3    510.1
Execl Throughput                                 43.0       2120.0    493.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     209833.0    529.9
File Copy 256 bufsize 500 maxblocks            1655.0      89400.0    540.2
File Copy 4096 bufsize 8000 maxblocks          5800.0     320036.0    551.8
Pipe Throughput                               12440.0     340624.0    273.8
Pipe-based Context Switching                   4000.0     109939.1    274.8
Process Creation                                126.0       4728.7    375.3
Shell Scripts (1 concurrent)                     42.4       2223.1    524.3
Shell Scripts (8 concurrent)                      6.0        883.1   1471.9
System Call Overhead                          15000.0     518639.1    345.8
                                                                   ========
System Benchmarks Index Score                                         500.2

2, One copy, after patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    9567674.7    819.9
Double-Precision Whetstone                       55.0       2805.5    510.1
Execl Throughput                                 43.0       2392.7    556.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     417804.0   1055.1
File Copy 256 bufsize 500 maxblocks            1655.0     112909.5    682.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    1255207.4   2164.2
Pipe Throughput                               12440.0     555712.0    446.7
Pipe-based Context Switching                   4000.0      99964.5    249.9
Process Creation                                126.0       5192.5    412.1
Shell Scripts (1 concurrent)                     42.4       2302.4    543.0
Shell Scripts (8 concurrent)                      6.0        919.6   1532.6
System Call Overhead                          15000.0     511159.3    340.8
                                                                   ========
System Benchmarks Index Score                                         640.1

3, Four copies, before patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   38268610.5   3279.2
Double-Precision Whetstone                       55.0      11222.2   2040.4
Execl Throughput                                 43.0       7892.0   1835.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     235149.6    593.8
File Copy 256 bufsize 500 maxblocks            1655.0      74959.6    452.9
File Copy 4096 bufsize 8000 maxblocks          5800.0     545048.5    939.7
Pipe Throughput                               12440.0    1337359.0   1075.0
Pipe-based Context Switching                   4000.0     473663.9   1184.2
Process Creation                                126.0      17491.2   1388.2
Shell Scripts (1 concurrent)                     42.4       6865.7   1619.3
Shell Scripts (8 concurrent)                      6.0       1015.9   1693.1
System Call Overhead                          15000.0    1899535.2   1266.4
                                                                   ========
System Benchmarks Index Score                                        1278.3

4, Four copies, after patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   38272815.5   3279.6
Double-Precision Whetstone                       55.0      11222.8   2040.5
Execl Throughput                                 43.0       8839.2   2055.6
File Copy 1024 bufsize 2000 maxblocks          3960.0     313912.9    792.7
File Copy 256 bufsize 500 maxblocks            1655.0      80976.1    489.3
File Copy 4096 bufsize 8000 maxblocks          5800.0    1176594.3   2028.6
Pipe Throughput                               12440.0    2100941.9   1688.9
Pipe-based Context Switching                   4000.0     476696.4   1191.7
Process Creation                                126.0      18394.7   1459.9
Shell Scripts (1 concurrent)                     42.4       7172.2   1691.6
Shell Scripts (8 concurrent)                      6.0       1058.3   1763.9
System Call Overhead                          15000.0    1874714.7   1249.8
                                                                   ========
System Benchmarks Index Score                                        1488.8

Signed-off-by: Jun Yi &lt;yijun@loongson.cn&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use the alternative to optimize common libraries according whether CPU
has UAL (hardware unaligned access support) feature, including memset(),
memcopy(), memmove(), copy_user() and clear_user().

We have tested UnixBench on a Loongson-3A5000 quad-core machine (1.6GHz):

1, One copy, before patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    9566582.0    819.8
Double-Precision Whetstone                       55.0       2805.3    510.1
Execl Throughput                                 43.0       2120.0    493.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     209833.0    529.9
File Copy 256 bufsize 500 maxblocks            1655.0      89400.0    540.2
File Copy 4096 bufsize 8000 maxblocks          5800.0     320036.0    551.8
Pipe Throughput                               12440.0     340624.0    273.8
Pipe-based Context Switching                   4000.0     109939.1    274.8
Process Creation                                126.0       4728.7    375.3
Shell Scripts (1 concurrent)                     42.4       2223.1    524.3
Shell Scripts (8 concurrent)                      6.0        883.1   1471.9
System Call Overhead                          15000.0     518639.1    345.8
                                                                   ========
System Benchmarks Index Score                                         500.2

2, One copy, after patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    9567674.7    819.9
Double-Precision Whetstone                       55.0       2805.5    510.1
Execl Throughput                                 43.0       2392.7    556.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     417804.0   1055.1
File Copy 256 bufsize 500 maxblocks            1655.0     112909.5    682.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    1255207.4   2164.2
Pipe Throughput                               12440.0     555712.0    446.7
Pipe-based Context Switching                   4000.0      99964.5    249.9
Process Creation                                126.0       5192.5    412.1
Shell Scripts (1 concurrent)                     42.4       2302.4    543.0
Shell Scripts (8 concurrent)                      6.0        919.6   1532.6
System Call Overhead                          15000.0     511159.3    340.8
                                                                   ========
System Benchmarks Index Score                                         640.1

3, Four copies, before patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   38268610.5   3279.2
Double-Precision Whetstone                       55.0      11222.2   2040.4
Execl Throughput                                 43.0       7892.0   1835.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     235149.6    593.8
File Copy 256 bufsize 500 maxblocks            1655.0      74959.6    452.9
File Copy 4096 bufsize 8000 maxblocks          5800.0     545048.5    939.7
Pipe Throughput                               12440.0    1337359.0   1075.0
Pipe-based Context Switching                   4000.0     473663.9   1184.2
Process Creation                                126.0      17491.2   1388.2
Shell Scripts (1 concurrent)                     42.4       6865.7   1619.3
Shell Scripts (8 concurrent)                      6.0       1015.9   1693.1
System Call Overhead                          15000.0    1899535.2   1266.4
                                                                   ========
System Benchmarks Index Score                                        1278.3

4, Four copies, after patch:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   38272815.5   3279.6
Double-Precision Whetstone                       55.0      11222.8   2040.5
Execl Throughput                                 43.0       8839.2   2055.6
File Copy 1024 bufsize 2000 maxblocks          3960.0     313912.9    792.7
File Copy 256 bufsize 500 maxblocks            1655.0      80976.1    489.3
File Copy 4096 bufsize 8000 maxblocks          5800.0    1176594.3   2028.6
Pipe Throughput                               12440.0    2100941.9   1688.9
Pipe-based Context Switching                   4000.0     476696.4   1191.7
Process Creation                                126.0      18394.7   1459.9
Shell Scripts (1 concurrent)                     42.4       7172.2   1691.6
Shell Scripts (8 concurrent)                      6.0       1058.3   1763.9
System Call Overhead                          15000.0    1874714.7   1249.8
                                                                   ========
System Benchmarks Index Score                                        1488.8

Signed-off-by: Jun Yi &lt;yijun@loongson.cn&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</pre>
</div>
</content>
</entry>
</feed>
