<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/arm/crypto, branch v4.12</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>crypto: arm/aes-neonbs - resolve fallback cipher at runtime</title>
<updated>2017-03-09T10:34:16+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-02-14T21:51:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b56f5cbc7e08ec7d31c42fc41e5247677f20b143'/>
<id>b56f5cbc7e08ec7d31c42fc41e5247677f20b143</id>
<content type='text'>
Currently, the bit sliced NEON AES code for ARM has a link time
dependency on the scalar ARM asm implementation, which it uses as a
fallback to perform CBC encryption and the encryption of the initial
XTS tweak.

The bit sliced NEON code is both fast and time invariant, which makes
it a reasonable default on hardware that supports it. However, the
ARM asm code it pulls in is not time invariant, and due to the way it
is linked in, cannot be overridden by the new generic time invariant
driver. In fact, it will not be used at all, given that the ARM asm
code registers itself as a cipher with a priority that exceeds the
priority of the fixed time cipher.

So remove the link time dependency, and allocate the fallback cipher
via the crypto API. Note that this requires this driver's module_init
call to be replaced with late_initcall, so that the (possibly generic)
fallback cipher is guaranteed to be available when the builtin test
is performed at registration time.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, the bit sliced NEON AES code for ARM has a link time
dependency on the scalar ARM asm implementation, which it uses as a
fallback to perform CBC encryption and the encryption of the initial
XTS tweak.

The bit sliced NEON code is both fast and time invariant, which makes
it a reasonable default on hardware that supports it. However, the
ARM asm code it pulls in is not time invariant, and due to the way it
is linked in, cannot be overridden by the new generic time invariant
driver. In fact, it will not be used at all, given that the ARM asm
code registers itself as a cipher with a priority that exceeds the
priority of the fixed time cipher.

So remove the link time dependency, and allocate the fallback cipher
via the crypto API. Note that this requires this driver's module_init
call to be replaced with late_initcall, so that the (possibly generic)
fallback cipher is guaranteed to be available when the builtin test
is performed at registration time.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm/crc32 - add build time test for CRC instruction support</title>
<updated>2017-03-01T11:47:53+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-02-28T14:36:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=efa7cebdbfde8506b54acd8947822394768cd476'/>
<id>efa7cebdbfde8506b54acd8947822394768cd476</id>
<content type='text'>
The accelerated CRC32 module for ARM may use either the scalar CRC32
instructions, the NEON 64x64 to 128 bit polynomial multiplication
(vmull.p64) instruction, or both, depending on what the current CPU
supports.

However, this also requires support in binutils, and as it turns out,
versions of binutils exist that support the vmull.p64 instruction but
not the crc32 instructions.

So refactor the Makefile logic so that this module only gets built if
binutils has support for both.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Acked-by: Jon Hunter &lt;jonathanh@nvidia.com&gt;
Tested-by: Jon Hunter &lt;jonathanh@nvidia.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The accelerated CRC32 module for ARM may use either the scalar CRC32
instructions, the NEON 64x64 to 128 bit polynomial multiplication
(vmull.p64) instruction, or both, depending on what the current CPU
supports.

However, this also requires support in binutils, and as it turns out,
versions of binutils exist that support the vmull.p64 instruction but
not the crc32 instructions.

So refactor the Makefile logic so that this module only gets built if
binutils has support for both.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Acked-by: Jon Hunter &lt;jonathanh@nvidia.com&gt;
Tested-by: Jon Hunter &lt;jonathanh@nvidia.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm/crc32 - fix build error with outdated binutils</title>
<updated>2017-03-01T11:47:51+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-02-28T14:36:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1fb1683cb343d80736625f3048de2107cf5bbf79'/>
<id>1fb1683cb343d80736625f3048de2107cf5bbf79</id>
<content type='text'>
Annotate a vmov instruction with an explicit element size of 32 bits.
This is inferred by recent toolchains, but apparently, older versions
need some help figuring this out.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Annotate a vmov instruction with an explicit element size of 32 bits.
This is inferred by recent toolchains, but apparently, older versions
need some help figuring this out.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm/aes - don't use IV buffer to return final keystream block</title>
<updated>2017-02-03T10:16:21+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-02-02T11:38:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1a20b96612656b3ff2d6967c3111bec0e21904a8'/>
<id>1a20b96612656b3ff2d6967c3111bec0e21904a8</id>
<content type='text'>
The ARM bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.

However, as it turns out, the CCM driver does expect the IV to retain
a value that is equal to the original IV except for the counter value,
and even interprets byte zero as a length indicator, which may result
in memory corruption if the IV is overwritten with something else.

So use a separate buffer to return the final keystream block.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The ARM bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.

However, as it turns out, the CCM driver does expect the IV to retain
a value that is equal to the original IV except for the counter value,
and even interprets byte zero as a length indicator, which may result
in memory corruption if the IV is overwritten with something else.

So use a separate buffer to return the final keystream block.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm/chacha20 - remove cra_alignmask</title>
<updated>2017-02-03T10:16:19+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-01-28T23:25:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4a70b52620357a680ac71d5b63ceb533eb6216cb'/>
<id>4a70b52620357a680ac71d5b63ceb533eb6216cb</id>
<content type='text'>
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm/aes-ce - remove cra_alignmask</title>
<updated>2017-02-03T10:16:16+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-01-28T23:25:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1465fb13d3599e465b3b202f8ebbb5e1ca9b1970'/>
<id>1465fb13d3599e465b3b202f8ebbb5e1ca9b1970</id>
<content type='text'>
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm/aes-neonbs - fix issue with v2.22 and older assembler</title>
<updated>2017-01-23T14:50:25+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-01-19T12:23:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=13954e788dc7e3a47cc00419c120a63756be4eb1'/>
<id>13954e788dc7e3a47cc00419c120a63756be4eb1</id>
<content type='text'>
The GNU assembler for ARM version 2.22 or older fails to infer the
element size from the vmov instructions, and aborts the build in
the following way;

.../aes-neonbs-core.S: Assembler messages:
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7'

Fix this by setting the element size explicitly, by replacing vmov with
vmov.32.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The GNU assembler for ARM version 2.22 or older fails to infer the
element size from the vmov instructions, and aborts the build in
the following way;

.../aes-neonbs-core.S: Assembler messages:
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7'

Fix this by setting the element size explicitly, by replacing vmov with
vmov.32.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm/aes - avoid reserved 'tt' mnemonic in asm code</title>
<updated>2017-01-13T10:47:21+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-01-13T08:33:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=658fa754cd0700e8e96d61898f0953c4514859c6'/>
<id>658fa754cd0700e8e96d61898f0953c4514859c6</id>
<content type='text'>
The ARMv8-M architecture introduces 'tt' and 'ttt' instructions,
which means we can no longer use 'tt' as a register alias on recent
versions of binutils for ARM. So replace the alias with 'ttab'.

Fixes: 81edb4262975 ("crypto: arm/aes - replace scalar AES cipher")
Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The ARMv8-M architecture introduces 'tt' and 'ttt' instructions,
which means we can no longer use 'tt' as a register alias on recent
versions of binutils for ARM. So replace the alias with 'ttab'.

Fixes: 81edb4262975 ("crypto: arm/aes - replace scalar AES cipher")
Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm/aes - replace bit-sliced OpenSSL NEON code</title>
<updated>2017-01-13T10:27:31+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-01-11T16:41:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cc477bf645736739e69d31fdf715281ef0dd5f9b'/>
<id>cc477bf645736739e69d31fdf715281ef0dd5f9b</id>
<content type='text'>
This replaces the unwieldy generated implementation of bit-sliced AES
in CBC/CTR/XTS modes that originated in the OpenSSL project with a
new version that is heavily based on the OpenSSL implementation, but
has a number of advantages over the old version:
- it does not rely on the scalar AES cipher that also originated in the
  OpenSSL project and contains redundant lookup tables and key schedule
  generation routines (which we already have in crypto/aes_generic.)
- it uses the same expanded key schedule for encryption and decryption,
  reducing the size of the per-key data structure by 1696 bytes
- it adds an implementation of AES in ECB mode, which can be wrapped by
  other generic chaining mode implementations
- it moves the handling of corner cases that are non critical to performance
  to the glue layer written in C
- it was written directly in assembler rather than generated from a Perl
  script

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This replaces the unwieldy generated implementation of bit-sliced AES
in CBC/CTR/XTS modes that originated in the OpenSSL project with a
new version that is heavily based on the OpenSSL implementation, but
has a number of advantages over the old version:
- it does not rely on the scalar AES cipher that also originated in the
  OpenSSL project and contains redundant lookup tables and key schedule
  generation routines (which we already have in crypto/aes_generic.)
- it uses the same expanded key schedule for encryption and decryption,
  reducing the size of the per-key data structure by 1696 bytes
- it adds an implementation of AES in ECB mode, which can be wrapped by
  other generic chaining mode implementations
- it moves the handling of corner cases that are non critical to performance
  to the glue layer written in C
- it was written directly in assembler rather than generated from a Perl
  script

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm/aes - replace scalar AES cipher</title>
<updated>2017-01-12T16:26:50+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2017-01-11T16:41:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=81edb42629758bacdf813dd5e4542ae26e3ad73a'/>
<id>81edb42629758bacdf813dd5e4542ae26e3ad73a</id>
<content type='text'>
This replaces the scalar AES cipher that originates in the OpenSSL project
with a new implementation that is ~15% (*) faster (on modern cores), and
reuses the lookup tables and the key schedule generation routines from the
generic C implementation (which is usually compiled in anyway due to
networking and other subsystems depending on it).

Note that the bit sliced NEON code for AES still depends on the scalar cipher
that this patch replaces, so it is not removed entirely yet.

* On Cortex-A57, the performance increases from 17.0 to 14.9 cycles per byte
  for 128-bit keys.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This replaces the scalar AES cipher that originates in the OpenSSL project
with a new implementation that is ~15% (*) faster (on modern cores), and
reuses the lookup tables and the key schedule generation routines from the
generic C implementation (which is usually compiled in anyway due to
networking and other subsystems depending on it).

Note that the bit sliced NEON code for AES still depends on the scalar cipher
that this patch replaces, so it is not removed entirely yet.

* On Cortex-A57, the performance increases from 17.0 to 14.9 cycles per byte
  for 128-bit keys.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
</feed>
