summaryrefslogtreecommitdiffstats
path: root/arch/arm/lib
diff options
context:
space:
mode:
authorArd Biesheuvel <ardb@kernel.org>2022-02-05 16:23:46 +0100
committerHerbert Xu <herbert@gondor.apana.org.au>2022-02-11 10:39:39 +0100
commita69cb445f7d129abf7c50d48c8a8eca7c8d5df15 (patch)
tree71e2dfdd16647c73ee8523c2d01604778a33b94b /arch/arm/lib
parentlib/xor: make xor prototypes more friendly to compiler vectorization (diff)
downloadlinux-a69cb445f7d129abf7c50d48c8a8eca7c8d5df15.tar.xz
linux-a69cb445f7d129abf7c50d48c8a8eca7c8d5df15.zip
crypto: arm/xor - make vectorized C code Clang-friendly
The ARM version of the accelerated XOR routines are simply the 8-way C routines passed through the auto-vectorizer with SIMD codegen enabled. This used to require GCC version 4.6 at least, but given that 5.1 is now the baseline, this check is no longer necessary, and actually misidentifies Clang as GCC < 4.6 as Clang defines the GCC major/minor as well, but makes no attempt at doing this in a way that conveys feature parity with a certain version of GCC (which would not be a great idea in the first place). So let's drop the version check, and make the auto-vectorize pragma (which is based on a GCC-specific command line option) GCC-only. Since Clang performs SIMD auto-vectorization by default at -O2, no pragma is necessary here. Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/496 Link: https://github.com/ClangBuiltLinux/linux/issues/503 Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Diffstat (limited to 'arch/arm/lib')
-rw-r--r--arch/arm/lib/xor-neon.c12
1 files changed, 3 insertions, 9 deletions
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index b99dd8e1c93f..522510baed49 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -17,17 +17,11 @@ MODULE_LICENSE("GPL");
/*
* Pull in the reference implementations while instructing GCC (through
* -ftree-vectorize) to attempt to exploit implicit parallelism and emit
- * NEON instructions.
+ * NEON instructions. Clang does this by default at O2 so no pragma is
+ * needed.
*/
-#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
+#ifdef CONFIG_CC_IS_GCC
#pragma GCC optimize "tree-vectorize"
-#else
-/*
- * While older versions of GCC do not generate incorrect code, they fail to
- * recognize the parallel nature of these functions, and emit plain ARM code,
- * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
- */
-#warning This code requires at least version 4.6 of GCC
#endif
#pragma GCC diagnostic ignored "-Wunused-variable"