author | Eric Biggers <ebiggers@google.com> | 2024-03-29 09:03:53 +0100 |
---|---|---|
committer | Herbert Xu <herbert@gondor.apana.org.au> | 2024-04-05 09:46:33 +0200 |
commit | ee63fea005be4a84a50603269ac9ca2c9bf9c6ca | |
tree | 493c83c81b37eb8a26f632170ce1b25abc7d6a1b /arch/x86/kernel/fred.c | |
parent | crypto: x86/aes-xts - wire up VAES + AVX2 implementation | |
crypto: x86/aes-xts - wire up VAES + AVX10/256 implementation

Add an AES-XTS implementation "xts-aes-vaes-avx10_256" for x86_64 CPUs
with the VAES, VPCLMULQDQ, and either AVX10/256 or AVX512BW + AVX512VL
extensions. This implementation avoids using zmm registers, instead
using ymm registers to operate on two AES blocks at a time. The
assembly code is instantiated using a macro so that most of the source
code is shared with other implementations.
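
The feature requirement stated above can be written as a single predicate. The following is a minimal sketch only: `struct cpu_features` and `can_use_xts_vaes_avx10_256()` are hypothetical names chosen for illustration and are not the kernel's actual data structures or detection code.

```c
#include <stdbool.h>

/* Hypothetical feature flags for illustration; not a kernel structure. */
struct cpu_features {
	bool vaes;
	bool vpclmulqdq;
	bool avx10_256;
	bool avx512bw;
	bool avx512vl;
};

/*
 * Returns true when the CPU has VAES, VPCLMULQDQ, and either AVX10/256
 * or both AVX512BW and AVX512VL, as the commit message requires for
 * xts-aes-vaes-avx10_256.
 */
bool can_use_xts_vaes_avx10_256(const struct cpu_features *f)
{
	return f->vaes && f->vpclmulqdq &&
	       (f->avx10_256 || (f->avx512bw && f->avx512vl));
}
```

Note that AVX512BW alone, without AVX512VL, does not satisfy the predicate, because the implementation needs the AVX512 instructions to be usable at the 256-bit vector length.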

This is the optimal implementation on CPUs that support VAES and AVX512
but where the zmm registers should not be used due to downclocking
effects, for example Intel's Ice Lake. It should also be the optimal
implementation on future CPUs that support AVX10/256 but not AVX10/512.

The performance is slightly better than that of xts-aes-vaes-avx2, which
uses the same 256-bit vector length, due to factors such as being able
to use ymm16-ymm31 to cache the AES round keys, and being able to use
the vpternlogd instruction to do XORs more efficiently. For example, on
Ice Lake, the throughput of decrypting 4096-byte messages with
AES-256-XTS is 6.6% higher with xts-aes-vaes-avx10_256 than with
xts-aes-vaes-avx2. While this is a small improvement, it is
straightforward to provide this implementation (xts-aes-vaes-avx10_256)
as long as we are providing xts-aes-vaes-avx2 and xts-aes-vaes-avx10_512
anyway, due to the way the _aes_xts_crypt macro is structured.
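
To illustrate why vpternlogd helps with XORs: the instruction evaluates an arbitrary three-input boolean function, selected by an 8-bit truth table, so a three-way XOR takes one instruction instead of two. The sketch below is a plain C software model of one 32-bit lane, not the assembly from this patch. The names `ternlog32()` and `xor3()` are hypothetical, and the use of the truth table 0x96 (three-input XOR) is an assumption about how the instruction would be applied here, since the commit message does not give the immediate.

```c
#include <stdint.h>

/*
 * Software model of the bitwise ternary-logic operation on one 32-bit
 * lane: at every bit position, the three input bits form a 3-bit index
 * into the 8-bit truth table imm8, and the selected table bit becomes
 * the result bit.
 */
uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
	uint32_t result = 0;

	for (int bit = 0; bit < 32; bit++) {
		unsigned int idx = (((a >> bit) & 1) << 2) |
				   (((b >> bit) & 1) << 1) |
				   ((c >> bit) & 1);

		result |= (uint32_t)((imm8 >> idx) & 1) << bit;
	}
	return result;
}

/*
 * 0x96 has table bits 1, 2, 4 and 7 set, i.e. the entries where an odd
 * number of inputs are 1, which is the three-input XOR.
 */
uint32_t xor3(uint32_t a, uint32_t b, uint32_t c)
{
	return ternlog32(a, b, c, 0x96);
}
```

Without a three-input instruction, the same result needs two dependent XORs, as in `(a ^ b) ^ c`.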

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Diffstat (limited to 'arch/x86/kernel/fred.c')
0 files changed, 0 insertions, 0 deletions