<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/arm64/lib, branch v6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>arm64/crc-t10dif: expose CRC-T10DIF function through lib</title>
<updated>2024-12-02T01:23:13+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2024-12-02T01:20:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2051da858534a73589cdb27af914fe1c03b9ee98'/>
<id>2051da858534a73589cdb27af914fe1c03b9ee98</id>
<content type='text'>
Move the arm64 CRC-T10DIF assembly code into the lib directory and wire
it up to the library interface.  This allows it to be used without going
through the crypto API.  It remains usable via the crypto API too via
the shash algorithms that use the library interface.  Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Note: to see the diff from arch/arm64/crypto/crct10dif-ce-glue.c to
arch/arm64/lib/crc-t10dif-glue.c, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Link: https://lore.kernel.org/r/20241202012056.209768-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the arm64 CRC-T10DIF assembly code into the lib directory and wire
it up to the library interface.  This allows it to be used without going
through the crypto API.  It remains usable via the crypto API too via
the shash algorithms that use the library interface.  Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Note: to see the diff from arch/arm64/crypto/crct10dif-ce-glue.c to
arch/arm64/lib/crc-t10dif-glue.c, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Link: https://lore.kernel.org/r/20241202012056.209768-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/crc32: expose whether the lib is really optimized at runtime</title>
<updated>2024-12-02T01:23:01+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2024-12-02T01:08:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b5ae12e0ee099e4c458f7814f0317f4e2cbf105e'/>
<id>b5ae12e0ee099e4c458f7814f0317f4e2cbf105e</id>
<content type='text'>
Make the CRC32 library export a function crc32_optimizations() which
returns flags that indicate which CRC32 functions are actually executing
optimized code at runtime.

This will be used to determine whether the crc32[c]-$arch shash
algorithms should be registered in the crypto API.  btrfs could also
start using these flags instead of the hack that it currently uses where
it parses the crypto_shash_driver_name.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20241202010844.144356-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make the CRC32 library export a function crc32_optimizations() which
returns flags that indicate which CRC32 functions are actually executing
optimized code at runtime.

This will be used to determine whether the crc32[c]-$arch shash
algorithms should be registered in the crypto API.  btrfs could also
start using these flags instead of the hack that it currently uses where
it parses the crypto_shash_driver_name.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20241202010844.144356-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/crc32: improve support for arch-specific overrides</title>
<updated>2024-12-02T01:23:01+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2024-12-02T01:08:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d36cebe03c3ae4ea1fde20cfc797fab8729c3ab5'/>
<id>d36cebe03c3ae4ea1fde20cfc797fab8729c3ab5</id>
<content type='text'>
Currently the CRC32 library functions are defined as weak symbols, and
the arm64 and riscv architectures override them.

This method of arch-specific overrides has the limitation that it only
works when both the base and arch code is built-in.  Also, it makes the
arch-specific code be silently not used if it is accidentally built with
lib-y instead of obj-y; unfortunately the RISC-V code does this.

This commit reorganizes the code to have explicit *_arch() functions
that are called when they are enabled, similar to how some of the crypto
library code works (e.g. chacha_crypt() calls chacha_crypt_arch()).

Make the existing kconfig choice for the CRC32 implementation also
control whether the arch-optimized implementation (if one is available)
is enabled or not.  Make it enabled by default if CRC32 is also enabled.

The result is that arch-optimized CRC32 library functions will be
included automatically when appropriate, but it is now possible to
disable them.  They can also now be built as a loadable module if the
CRC32 library functions happen to be used only by loadable modules, in
which case the arch and base CRC32 modules will be automatically loaded
via direct symbol dependency when appropriate.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20241202010844.144356-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the CRC32 library functions are defined as weak symbols, and
the arm64 and riscv architectures override them.

This method of arch-specific overrides has the limitation that it only
works when both the base and arch code is built-in.  Also, it makes the
arch-specific code be silently not used if it is accidentally built with
lib-y instead of obj-y; unfortunately the RISC-V code does this.

This commit reorganizes the code to have explicit *_arch() functions
that are called when they are enabled, similar to how some of the crypto
library code works (e.g. chacha_crypt() calls chacha_crypt_arch()).

Make the existing kconfig choice for the CRC32 implementation also
control whether the arch-optimized implementation (if one is available)
is enabled or not.  Make it enabled by default if CRC32 is also enabled.

The result is that arch-optimized CRC32 library functions will be
included automatically when appropriate, but it is now possible to
disable them.  They can also now be built as a loadable module if the
CRC32 library functions happen to be used only by loadable modules, in
which case the arch and base CRC32 modules will be automatically loaded
via direct symbol dependency when appropriate.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20241202010844.144356-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/crc32: drop leading underscores from __crc32c_le_base</title>
<updated>2024-12-02T01:23:01+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2024-12-02T01:08:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0a499a7e9819e7a0980408f18df68160a0b55f2e'/>
<id>0a499a7e9819e7a0980408f18df68160a0b55f2e</id>
<content type='text'>
Remove the leading underscores from __crc32c_le_base().

This is in preparation for adding crc32c_le_arch() and eventually
renaming __crc32c_le() to crc32c_le().

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20241202010844.144356-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the leading underscores from __crc32c_le_base().

This is in preparation for adding crc32c_le_arch() and eventually
renaming __crc32c_le() to crc32c_le().

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20241202010844.144356-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-next/mops' into for-next/core</title>
<updated>2024-11-14T12:07:28+00:00</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2024-11-14T12:07:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=437330d90c507be109a161667a77eaf61be0edac'/>
<id>437330d90c507be109a161667a77eaf61be0edac</id>
<content type='text'>
* for-next/mops:
  : More FEAT_MOPS (memcpy instructions) uses - in-kernel routines
  arm64: mops: Document requirements for hypervisors
  arm64: lib: Use MOPS for copy_page() and clear_page()
  arm64: lib: Use MOPS for memcpy() routines
  arm64: mops: Document booting requirement for HCR_EL2.MCE2
  arm64: mops: Handle MOPS exceptions from EL1
  arm64: probes: Disable kprobes/uprobes on MOPS instructions

# Conflicts:
#	arch/arm64/kernel/entry-common.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* for-next/mops:
  : More FEAT_MOPS (memcpy instructions) uses - in-kernel routines
  arm64: mops: Document requirements for hypervisors
  arm64: lib: Use MOPS for copy_page() and clear_page()
  arm64: lib: Use MOPS for memcpy() routines
  arm64: mops: Document booting requirement for HCR_EL2.MCE2
  arm64: mops: Handle MOPS exceptions from EL1
  arm64: probes: Disable kprobes/uprobes on MOPS instructions

# Conflicts:
#	arch/arm64/kernel/entry-common.c
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64/crc32: Implement 4-way interleave using PMULL</title>
<updated>2024-10-22T10:54:43+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2024-10-18T07:53:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a6478d69cf56d5deb4c28a6486376d9c7895abec'/>
<id>a6478d69cf56d5deb4c28a6486376d9c7895abec</id>
<content type='text'>
Now that kernel mode NEON no longer disables preemption, using FP/SIMD
in library code which is not obviously part of the crypto subsystem is
no longer problematic, as it will no longer incur unexpected latencies.

So accelerate the CRC-32 library code on arm64 to use a 4-way
interleave, using PMULL instructions to implement the folding.

On Apple M2, this results in a speedup of 2 - 2.8x when using input
sizes of 1k - 8k. For smaller sizes, the overhead of preserving and
restoring the FP/SIMD register file may not be worth it, so 1k is used
as a threshold for choosing this code path.

The coefficient tables were generated using code provided by Eric. [0]

[0] https://github.com/ebiggers/libdeflate/blob/master/scripts/gen_crc32_multipliers.c

Cc: Eric Biggers &lt;ebiggers@kernel.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Link: https://lore.kernel.org/r/20241018075347.2821102-8-ardb+git@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that kernel mode NEON no longer disables preemption, using FP/SIMD
in library code which is not obviously part of the crypto subsystem is
no longer problematic, as it will no longer incur unexpected latencies.

So accelerate the CRC-32 library code on arm64 to use a 4-way
interleave, using PMULL instructions to implement the folding.

On Apple M2, this results in a speedup of 2 - 2.8x when using input
sizes of 1k - 8k. For smaller sizes, the overhead of preserving and
restoring the FP/SIMD register file may not be worth it, so 1k is used
as a threshold for choosing this code path.

The coefficient tables were generated using code provided by Eric. [0]

[0] https://github.com/ebiggers/libdeflate/blob/master/scripts/gen_crc32_multipliers.c

Cc: Eric Biggers &lt;ebiggers@kernel.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Link: https://lore.kernel.org/r/20241018075347.2821102-8-ardb+git@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64/crc32: Reorganize bit/byte ordering macros</title>
<updated>2024-10-22T10:54:43+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2024-10-18T07:53:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b98b23e19492f4009070761c53b755f623f60e49'/>
<id>b98b23e19492f4009070761c53b755f623f60e49</id>
<content type='text'>
In preparation for a new user, reorganize the bit/byte ordering macros
that are used to parameterize the crc32 template code and instantiate
CRC-32, CRC-32c and 'big endian' CRC-32.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Link: https://lore.kernel.org/r/20241018075347.2821102-7-ardb+git@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for a new user, reorganize the bit/byte ordering macros
that are used to parameterize the crc32 template code and instantiate
CRC-32, CRC-32c and 'big endian' CRC-32.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Link: https://lore.kernel.org/r/20241018075347.2821102-7-ardb+git@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64/lib: Handle CRC-32 alternative in C code</title>
<updated>2024-10-22T10:54:43+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2024-10-18T07:53:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fc7454107d1b7c27bb98d3b109e5f44a8a46d7f8'/>
<id>fc7454107d1b7c27bb98d3b109e5f44a8a46d7f8</id>
<content type='text'>
In preparation for adding another code path for performing CRC-32, move
the alternative patching for ARM64_HAS_CRC32 into C code. The logic for
deciding whether to use this new code path will be implemented in C too.

Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20241018075347.2821102-6-ardb+git@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for adding another code path for performing CRC-32, move
the alternative patching for ARM64_HAS_CRC32 into C code. The logic for
deciding whether to use this new code path will be implemented in C too.

Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20241018075347.2821102-6-ardb+git@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: lib: Use MOPS for copy_page() and clear_page()</title>
<updated>2024-10-17T15:42:51+00:00</updated>
<author>
<name>Kristina Martsenko</name>
<email>kristina.martsenko@arm.com</email>
</author>
<published>2024-09-30T16:10:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ce6b5ff5f16dd9267d62d09b3af3f0c7dc3c24f0'/>
<id>ce6b5ff5f16dd9267d62d09b3af3f0c7dc3c24f0</id>
<content type='text'>
Similarly to what was done to the memcpy() routines, make copy_page()
and clear_page() also use the Armv8.8 FEAT_MOPS instructions.

Note: For copy_page() this uses the CPY* instructions instead of CPYF*
as CPYF* doesn't allow src and dst to be equal. It's not clear if
copy_page() needs to allow equal src and dst but it has worked so far
with the current implementation and there is no documentation forbidding
it.

Note, the unoptimized version of copy_page() in assembler.h is left as
it is.

Signed-off-by: Kristina Martsenko &lt;kristina.martsenko@arm.com&gt;
Link: https://lore.kernel.org/r/20240930161051.3777828-6-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Similarly to what was done to the memcpy() routines, make copy_page()
and clear_page() also use the Armv8.8 FEAT_MOPS instructions.

Note: For copy_page() this uses the CPY* instructions instead of CPYF*
as CPYF* doesn't allow src and dst to be equal. It's not clear if
copy_page() needs to allow equal src and dst but it has worked so far
with the current implementation and there is no documentation forbidding
it.

Note, the unoptimized version of copy_page() in assembler.h is left as
it is.

Signed-off-by: Kristina Martsenko &lt;kristina.martsenko@arm.com&gt;
Link: https://lore.kernel.org/r/20240930161051.3777828-6-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: lib: Use MOPS for memcpy() routines</title>
<updated>2024-10-17T15:42:51+00:00</updated>
<author>
<name>Kristina Martsenko</name>
<email>kristina.martsenko@arm.com</email>
</author>
<published>2024-09-30T16:10:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=836ed3c4e473fef9e0814a2ba6dd40f9656c03f1'/>
<id>836ed3c4e473fef9e0814a2ba6dd40f9656c03f1</id>
<content type='text'>
Make memcpy(), memmove() and memset() use the Armv8.8 FEAT_MOPS
instructions when implemented on the CPU.

The CPY*/SET* instructions copy or set a block of memory of arbitrary
size and alignment. They can be interrupted by the CPU and the copying
resumed later. Their performance is expected to be close to the best
generic copy/set sequence of loads/stores for a given CPU. Using them in
the kernel's copy/set routines therefore avoids the need to periodically
rewrite the routines to optimize for new microarchitectures. It could
also lead to a performance improvement for some CPUs and systems.

With this change the kernel will always use the instructions if they are
implemented on the CPU (and have not been disabled by the arm64.nomops
command line parameter). When not implemented the usual routines will be
used (patched via alternatives). Note, we need to patch B/NOP instead of
the whole sequence to avoid executing a partially patched sequence in
case the compiler generates a mem*() call inside the alternatives
patching code.

Note that MOPS instructions have relaxed behavior on Device memory, but
it is expected that these routines are not generally used on MMIO.

Note: For memcpy(), this uses the CPY* instructions instead of CPYF*, as
CPY* allows overlaps between the source and destination buffers, and
despite contradicting the C standard, compilers require that memcpy()
work on exactly overlapping source and destination:
  https://gcc.gnu.org/onlinedocs/gcc/Standards.html#C-Language
  https://reviews.llvm.org/D86993

Signed-off-by: Kristina Martsenko &lt;kristina.martsenko@arm.com&gt;
Link: https://lore.kernel.org/r/20240930161051.3777828-5-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make memcpy(), memmove() and memset() use the Armv8.8 FEAT_MOPS
instructions when implemented on the CPU.

The CPY*/SET* instructions copy or set a block of memory of arbitrary
size and alignment. They can be interrupted by the CPU and the copying
resumed later. Their performance is expected to be close to the best
generic copy/set sequence of loads/stores for a given CPU. Using them in
the kernel's copy/set routines therefore avoids the need to periodically
rewrite the routines to optimize for new microarchitectures. It could
also lead to a performance improvement for some CPUs and systems.

With this change the kernel will always use the instructions if they are
implemented on the CPU (and have not been disabled by the arm64.nomops
command line parameter). When not implemented the usual routines will be
used (patched via alternatives). Note, we need to patch B/NOP instead of
the whole sequence to avoid executing a partially patched sequence in
case the compiler generates a mem*() call inside the alternatives
patching code.

Note that MOPS instructions have relaxed behavior on Device memory, but
it is expected that these routines are not generally used on MMIO.

Note: For memcpy(), this uses the CPY* instructions instead of CPYF*, as
CPY* allows overlaps between the source and destination buffers, and
despite contradicting the C standard, compilers require that memcpy()
work on exactly overlapping source and destination:
  https://gcc.gnu.org/onlinedocs/gcc/Standards.html#C-Language
  https://reviews.llvm.org/D86993

Signed-off-by: Kristina Martsenko &lt;kristina.martsenko@arm.com&gt;
Link: https://lore.kernel.org/r/20240930161051.3777828-5-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
