crypto: arm/xor - make vectorized C code Clang-friendly

The ARM version of the accelerated XOR routines are simply the 8-way C routines passed through the auto-vectorizer with SIMD codegen enabled. This used to require GCC version 4.6 at least, but given that 5.1 is now the baseline, this check is no longer necessary, and actually misidentifies Clang as GCC < 4.6 as Clang defines the GCC major/minor as well, but makes no attempt at doing this in a way that conveys feature parity with a certain version of GCC (which would not be a great idea in the first place). So let's drop the version check, and make the auto-vectorize pragma (which is based on a GCC-specific command line option) GCC-only. Since Clang performs SIMD auto-vectorization by default at -O2, no pragma is necessary here. Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/496 Link: https://github.com/ClangBuiltLinux/linux/issues/503 Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
author: Ard Biesheuvel <ardb@kernel.org> 2022-02-05 16:23:46 +0100
committer: Herbert Xu <herbert@gondor.apana.org.au> 2022-02-11 10:39:39 +0100
commit: a69cb445f7d129abf7c50d48c8a8eca7c8d5df15 (patch)
tree: 71e2dfdd16647c73ee8523c2d01604778a33b94b /arch/arm/lib
parent: lib/xor: make xor prototypes more friendly to compiler vectorization (diff)
download: linux-a69cb445f7d129abf7c50d48c8a8eca7c8d5df15.tar.xz
linux-a69cb445f7d129abf7c50d48c8a8eca7c8d5df15.zip
1 files changed, 3 insertions, 9 deletions
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index b99dd8e1c93f..522510baed49 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -17,17 +17,11 @@ MODULE_LICENSE("GPL");
 /*
  * Pull in the reference implementations while instructing GCC (through
  * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
- * NEON instructions.
+ * NEON instructions. Clang does this by default at O2 so no pragma is
+ * needed.
  */
-#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
+#ifdef CONFIG_CC_IS_GCC
 #pragma GCC optimize "tree-vectorize"
-#else
-/*
- * While older versions of GCC do not generate incorrect code, they fail to
- * recognize the parallel nature of these functions, and emit plain ARM code,
- * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
- */
-#warning This code requires at least version 4.6 of GCC
 #endif
 
 #pragma GCC diagnostic ignored "-Wunused-variable"
author	Ard Biesheuvel <ardb@kernel.org>	2022-02-05 16:23:46 +0100
committer	Herbert Xu <herbert@gondor.apana.org.au>	2022-02-11 10:39:39 +0100
commit	a69cb445f7d129abf7c50d48c8a8eca7c8d5df15 (patch)
tree	71e2dfdd16647c73ee8523c2d01604778a33b94b /arch/arm/lib
parent	lib/xor: make xor prototypes more friendly to compiler vectorization (diff)
download	linux-a69cb445f7d129abf7c50d48c8a8eca7c8d5df15.tar.xz linux-a69cb445f7d129abf7c50d48c8a8eca7c8d5df15.zip