linux.git/arch/arm/crypto, branch v5.11

crypto: arm/chacha-neon - add missing counter increment

2021-01-02T21:35:35+00:00

Commit 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block
size multiples") refactored the chacha block handling in the glue code in
a way that may result in the counter increment to be omitted when calling
chacha_block_xor_neon() to process a full block. This violates the skcipher
API, which requires that the output IV is suitable for handling more input
as long as the preceding input has been presented in round multiples of the
block size. Also, the same code is exposed via the chacha library interface
whose callers may actually rely on this increment to occur even for final
blocks that are smaller than the chacha block size.

So increment the counter after calling chacha_block_xor_neon().

Fixes: 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block size multiples")
Reported-by: Eric Biggers 
Signed-off-by: Ard Biesheuvel 
Signed-off-by: Herbert Xu

crypto: arm/aes-ce - work around Cortex-A57/A72 silion errata

2020-12-04T07:13:14+00:00

ARM Cortex-A57 and Cortex-A72 cores running in 32-bit mode are affected
by silicon errata #1742098 and #1655431, respectively, where the second
instruction of a AES instruction pair may execute twice if an interrupt
is taken right after the first instruction consumes an input register of
which a single 32-bit lane has been updated the last time it was modified.

This is not such a rare occurrence as it may seem: in counter mode, only
the least significant 32-bit word is incremented in the absence of a
carry, which makes our counter mode implementation susceptible to these
errata.

So let's shuffle the counter assignments around a bit so that the most
recent updates when the AES instruction pair executes are 128-bit wide.

[0] ARM-EPM-049219 v23 Cortex-A57 MPCore Software Developers Errata Notice
[1] ARM-EPM-012079 v11.0 Cortex-A72 MPCore Software Developers Errata Notice

Cc:  # v5.4+
Signed-off-by: Ard Biesheuvel 
Signed-off-by: Herbert Xu

crypto: sha - split sha.h into sha1.h and sha2.h

2020-11-20T03:45:33+00:00

Currently  contains declarations for both SHA-1 and SHA-2,
and  contains declarations for SHA-3.

This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure.  So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.

Therefore, split  into two headers  and
, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1.  It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers 
Acked-by: Ard Biesheuvel 
Acked-by: Jason A. Donenfeld 
Signed-off-by: Herbert Xu

crypto: arm/chacha-neon - optimize for non-block size multiples

2020-11-13T09:38:44+00:00

The current NEON based ChaCha implementation for ARM is optimized for
multiples of 4x the ChaCha block size (64 bytes). This makes sense for
block encryption, but given that ChaCha is also often used in the
context of networking, it makes sense to consider arbitrary length
inputs as well.

For example, WireGuard typically uses 1420 byte packets, and performing
ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
and 3 invocations of chacha_block_xor_neon(), where the last one also
involves a memcpy() using a buffer on the stack to process the final
chunk of 1420 % 64 == 12 bytes.

Let's optimize for this case as well, by letting chacha_4block_xor_neon()
deal with any input size between 64 and 256 bytes, using NEON permutation
instructions and overlapping loads and stores. This way, the 140 byte
tail of a 1420 byte input buffer can simply be processed in one go.

This results in the following performance improvements for 1420 byte
blocks, without significant impact on power-of-2 input sizes. (Note
that Raspberry Pi is widely used in combination with a 32-bit kernel,
even though the core is 64-bit capable)

   Cortex-A8  (BeagleBone)       :   7%
   Cortex-A15 (Calxeda Midway)   :  21%
   Cortex-A53 (Raspberry Pi 3)   :   3%
   Cortex-A72 (Raspberry Pi 4)   :  19%

Cc: Eric Biggers 
Cc: "Jason A . Donenfeld" 
Signed-off-by: Ard Biesheuvel 
Signed-off-by: Herbert Xu

crypto: arm/aes-neonbs - fix usage of cbc(aes) fallback

2020-11-06T03:31:15+00:00

Loading the module deadlocks since:
-local cbc(aes) implementation needs a fallback and
-crypto API tries to find one but the request_module() resolves back to
the same module

Fix this by changing the module alias for cbc(aes) and
using the NEED_FALLBACK flag when requesting for a fallback algorithm.

Fixes: 00b99ad2bac2 ("crypto: arm/aes-neonbs - Use generic cbc encryption path")
Signed-off-by: Horia Geantă 
Signed-off-by: Herbert Xu

crypto: arm/aes-neonbs - use typed init/exit routines for XTS

2020-09-25T07:48:15+00:00

Use the typed skcipher init/exit routines instead of the generic
cra_init/_exit routines when instantiating/releasing the XTS
skciphers.

Signed-off-by: Ard Biesheuvel 
Signed-off-by: Herbert Xu

crypto: arm/aes-neonbs - avoid loading reorder argument on encryption

2020-09-25T07:48:15+00:00

Reordering the tweak is never necessary for encryption, so avoid the
argument load on the encryption path.

Signed-off-by: Ard Biesheuvel 
Signed-off-by: Herbert Xu

crypto: arm/aes-neonbs - avoid hacks to prevent Thumb2 mode switches

2020-09-25T07:48:14+00:00

Instead of using a homegrown macrofied version of the adr instruction
that sets the Thumb bit in the output value, only to ensure that any
bx instructions consuming that value will not switch out of Thumb mode
when branching, use non-interworking mov (to PC) instructions, which
achieve the same thing.

Signed-off-by: Ard Biesheuvel 
Signed-off-by: Herbert Xu

crypto: arm/sha512-neon - avoid ADRL pseudo instruction

2020-09-25T07:48:14+00:00

The ADRL pseudo instruction is not an architectural construct, but a
convenience macro that was supported by the ARM proprietary assembler
and adopted by binutils GAS as well, but only when assembling in 32-bit
ARM mode. Therefore, it can only be used in assembler code that is known
to assemble in ARM mode only, but as it turns out, the Clang assembler
does not implement ADRL at all, and so it is better to get rid of it
entirely.

So replace the ADRL instruction with a ADR instruction that refers to
a nearer symbol, and apply the delta explicitly using an additional
instruction.

Signed-off-by: Ard Biesheuvel 
Tested-by: Nick Desaulniers 
Signed-off-by: Herbert Xu

crypto: arm/sha256-neon - avoid ADRL pseudo instruction

2020-09-25T07:48:13+00:00

The ADRL pseudo instruction is not an architectural construct, but a
convenience macro that was supported by the ARM proprietary assembler
and adopted by binutils GAS as well, but only when assembling in 32-bit
ARM mode. Therefore, it can only be used in assembler code that is known
to assemble in ARM mode only, but as it turns out, the Clang assembler
does not implement ADRL at all, and so it is better to get rid of it
entirely.

So replace the ADRL instruction with a ADR instruction that refers to
a nearer symbol, and apply the delta explicitly using an additional
instruction.

Signed-off-by: Ard Biesheuvel 
Tested-by: Nick Desaulniers 
Signed-off-by: Herbert Xu