author | Matt Brown <matthew.brown.dev@gmail.com> | 2017-08-04 05:42:32 +0200 |
---|---|---|
committer | Michael Ellerman <mpe@ellerman.id.au> | 2018-03-20 06:47:25 +0100 |
commit | 751ba79cc552c146595cd439b21c4ff8998c3b69 | |
tree | fc7aa71ed1ca788ab3a9c553021f7c876ccd4115 /arch/powerpc/include/asm/ppc-opcode.h | |
parent | powerpc/5200: dts: digsy_mtc.dts: fix rv3029 compatible | |
lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndrome
This patch uses the vpermxor instruction to optimise generation of the
raid6 Q syndrome. The instruction was introduced with POWER8 (ISA version
2.07) and combines a vperm and a vxor into a single instruction. The
implementation has been tested for correctness on a ppc64le VM with a
basic RAID6 setup of 5 drives.
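To illustrate what the combined instruction computes, here is a minimal scalar C model. It assumes big-endian byte numbering and follows my reading of the ISA 2.07 description of vpermxor: for each byte of VRC, the high nibble selects a byte of VRA, the low nibble selects a byte of VRB, and the two selected bytes are XORed. The helper names (`vpermxor_model`, `gf_mul2`, `mul2_vec`, `gen_q16`) are hypothetical and are not the kernel's code. The sketch shows why one vpermxor, given two 16-entry nibble tables, can multiply 16 bytes by 2 in GF(2^8), which is the inner step of Q syndrome generation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar model of vpermxor (assumed semantics, big-endian byte order):
 * vrt[i] = vra[high nibble of vrc[i]] ^ vrb[low nibble of vrc[i]].
 */
static void vpermxor_model(uint8_t vrt[16], const uint8_t vra[16],
			   const uint8_t vrb[16], const uint8_t vrc[16])
{
	uint8_t out[16];

	for (int i = 0; i < 16; i++) {
		unsigned hi = vrc[i] >> 4;	/* selects a byte of vra */
		unsigned lo = vrc[i] & 0x0f;	/* selects a byte of vrb */
		out[i] = vra[hi] ^ vrb[lo];
	}
	for (int i = 0; i < 16; i++)	/* temp buffer lets vrt alias vrc */
		vrt[i] = out[i];
}

/* Multiply by 2 in GF(2^8) using the RAID6 field polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t x)
{
	return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0x00));
}

/*
 * Multiplication by a constant is linear over XOR, so
 * 2*x == 2*(x & 0xf0) ^ 2*(x & 0x0f). Tabulating both halves for
 * all 16 nibble values gives two 16-byte lookup tables.
 */
static void build_mul2_tables(uint8_t high[16], uint8_t low[16])
{
	for (unsigned n = 0; n < 16; n++) {
		high[n] = gf_mul2((uint8_t)(n << 4));
		low[n] = gf_mul2((uint8_t)n);
	}
}

/* Multiply 16 bytes by 2 with a single (modelled) vpermxor. */
static void mul2_vec(uint8_t data[16])
{
	uint8_t high[16], low[16];

	build_mul2_tables(high, low);
	/* The data bytes themselves are the permute control vector. */
	vpermxor_model(data, high, low, data);
}

/*
 * Q syndrome for one 16-byte column by Horner's rule with g = 2:
 * start from the highest data disk, then q = 2*q ^ D_z down to disk 0.
 */
static void gen_q16(uint8_t q[16], uint8_t data[][16], int ndisks)
{
	assert(ndisks >= 1);
	for (int i = 0; i < 16; i++)
		q[i] = data[ndisks - 1][i];
	for (int z = ndisks - 2; z >= 0; z--) {
		mul2_vec(q);
		for (int i = 0; i < 16; i++)
			q[i] ^= data[z][i];
	}
}
```

Because the data vector doubles as the control operand, the table lookup and the XOR that recombines the two nibble products happen in one step. The real implementation operates on vector registers rather than byte arrays, so this model is only a way to check the arithmetic.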
The performance benchmarks below were produced by raid6test in the
lib/raid6/test directory on an IBM Firestone machine (ppc64le). They
show a 35% speed increase over the best existing powerpc algorithm
(altivec). raid6test has also been run on a big-endian ppc64 VM to
confirm that the code works on big-endian systems as well.
Performance benchmarks:
raid6: altivecx4 gen() 18773 MB/s
raid6: altivecx8 gen() 19438 MB/s
raid6: vpermxor4 gen() 25112 MB/s
raid6: vpermxor8 gen() 26279 MB/s
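As a quick check of the 35% figure, compare the 8-way and 4-way variants of each algorithm from the results above:

$$
\frac{26279\ \text{MB/s}}{19438\ \text{MB/s}} \approx 1.352,
\qquad
\frac{25112\ \text{MB/s}}{18773\ \text{MB/s}} \approx 1.338
$$

So vpermxor8 is about 35% faster than altivecx8, and vpermxor4 is about 34% faster than altivecx4.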
Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
[mpe: Add VPERMXOR macro so we can build with old binutils]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
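The old-binutils note works because the macro never names the vpermxor mnemonic: it hands the assembler a `.long` directive holding a pre-encoded instruction word. The sketch below re-declares local copies of the helper macros (`stringify_in_c`, `___PPC_RT` and friends). I believe these match the kernel's C-side definitions, but that is an assumption and should be checked against the tree. It expands `VPERMXOR` with fixed register numbers instead of the `%0`..`%3` operands that inline asm would supply. The function names `vpermxor_asm_text` and `check_vpermxor_asm_text` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror the kernel's C-side stringify helpers. */
#define __stringify_in_c(...)	#__VA_ARGS__
#define stringify_in_c(...)	__stringify_in_c(__VA_ARGS__) " "

/* Assumed register-field helpers (5-bit fields). */
#define ___PPC_RS(s)	(((s) & 0x1f) << 21)
#define ___PPC_RT(t)	___PPC_RS(t)
#define ___PPC_RA(a)	(((a) & 0x1f) << 16)
#define ___PPC_RB(b)	(((b) & 0x1f) << 11)

/* Copied from the patch. */
#define PPC_INST_VPERMXOR	0x1000002d
#define VPERMXOR(vrt, vra, vrb, vrc)				\
	stringify_in_c(.long (PPC_INST_VPERMXOR |		\
			      ___PPC_RT(vrt) | ___PPC_RA(vra) |	\
			      ___PPC_RB(vrb) | (((vrc) & 0x1f) << 6)))

/* Fixed register numbers stand in for inline-asm operands here. */
static const char *vpermxor_asm_text(void)
{
	return VPERMXOR(2, 3, 4, 5);
}

/*
 * The two-level stringify expands the opcode macro before quoting,
 * so the assembler only ever sees a numeric .long expression.
 */
static void check_vpermxor_asm_text(void)
{
	const char *s = vpermxor_asm_text();

	assert(strncmp(s, ".long", 5) == 0);
	assert(strstr(s, "0x1000002d") != NULL);
	assert(strstr(s, "PPC_INST_VPERMXOR") == NULL);
}
```

Passing the result of `vpermxor_asm_text()` to `puts()` shows the exact directive text an assembler would receive; no vpermxor support in the assembler is required to encode it.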
Diffstat (limited to 'arch/powerpc/include/asm/ppc-opcode.h')
-rw-r--r-- | arch/powerpc/include/asm/ppc-opcode.h | 6 |
1 files changed, 6 insertions, 0 deletions
```diff
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index f1083bcf449c..7370da18035e 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -271,6 +271,7 @@
 #define PPC_INST_TLBSRX_DOT		0x7c0006a5
 #define PPC_INST_VPMSUMW		0x10000488
 #define PPC_INST_VPMSUMD		0x100004c8
+#define PPC_INST_VPERMXOR		0x1000002d
 #define PPC_INST_XXLOR			0xf0000490
 #define PPC_INST_XXSWAPD		0xf0000250
 #define PPC_INST_XVCPSGNDP		0xf0000780
@@ -517,6 +518,11 @@
 #define XVCPSGNDP(t, a, b)	stringify_in_c(.long (PPC_INST_XVCPSGNDP | \
 					       VSX_XX3((t), (a), (b))))
 
+#define VPERMXOR(vrt, vra, vrb, vrc)				\
+	stringify_in_c(.long (PPC_INST_VPERMXOR |		\
+			      ___PPC_RT(vrt) | ___PPC_RA(vra) |	\
+			      ___PPC_RB(vrb) | (((vrc) & 0x1f) << 6)))
+
 #define PPC_NAP			stringify_in_c(.long PPC_INST_NAP)
 #define PPC_SLEEP		stringify_in_c(.long PPC_INST_SLEEP)
 #define PPC_WINKLE		stringify_in_c(.long PPC_INST_WINKLE)
```
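The field layout used by the `VPERMXOR` macro can be checked numerically. This hypothetical encoder/decoder pair (`vpermxor_encode`, `vpermxor_decode`, `is_vpermxor` are illustrative names, not kernel functions) builds the same 32-bit word the macro emits. VRT, VRA and VRB go in the 5-bit register fields at bits 21, 16 and 11, and VRC goes at bit 6. That leaves the top six bits for primary opcode 4 and the low six bits for the extended opcode 0x2d (45), both already present in `PPC_INST_VPERMXOR`. It is a sketch for inspection only.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode template from the patch: primary opcode 4, extended opcode 0x2d. */
#define PPC_INST_VPERMXOR	0x1000002dU

/* Primary opcode in the top 6 bits, extended opcode in the low 6 bits. */
static int is_vpermxor(uint32_t insn)
{
	return (insn >> 26) == 4 && (insn & 0x3f) == 0x2d;
}

/* Same field placement as ___PPC_RT/RA/RB plus the vrc shift in VPERMXOR. */
static uint32_t vpermxor_encode(unsigned vrt, unsigned vra, unsigned vrb,
				unsigned vrc)
{
	return PPC_INST_VPERMXOR |
	       ((vrt & 0x1f) << 21) |
	       ((vra & 0x1f) << 16) |
	       ((vrb & 0x1f) << 11) |
	       ((vrc & 0x1f) << 6);
}

/* Extract the four register operands from an encoded vpermxor word. */
static void vpermxor_decode(uint32_t insn, unsigned *vrt, unsigned *vra,
			    unsigned *vrb, unsigned *vrc)
{
	assert(is_vpermxor(insn));
	*vrt = (insn >> 21) & 0x1f;
	*vra = (insn >> 16) & 0x1f;
	*vrb = (insn >> 11) & 0x1f;
	*vrc = (insn >> 6) & 0x1f;
}
```

For example, `vpermxor_encode(1, 2, 3, 4)` yields `0x1022192d`. Because the VRC field sits directly above the 6-bit extended opcode, masking it to 5 bits (as the macro does) keeps it from spilling into neighbouring fields.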