path: root/lib/raid6
Commit message  [Author, Age, Files, Lines]
* lib/raid6: fix unnecessary rebuild of vpermxor*.c  [Masahiro Yamada, 2019-07-31, 1 file, -1/+1]
    The following four files are rebuilt every time:

      UNROLL  lib/raid6/vpermxor1.c
      UNROLL  lib/raid6/vpermxor2.c
      UNROLL  lib/raid6/vpermxor4.c
      UNROLL  lib/raid6/vpermxor8.c

    Fix the suffixes in the targets.

    Fixes: 72ad21075df8 ("lib/raid6: refactor unroll rules with pattern rules")
    Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* Merge tag 'kbuild-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild  [Linus Torvalds, 2019-07-13, 1 file, -87/+11]
|\
    Pull Kbuild updates from Masahiro Yamada:

     - remove headers_{install,check}_all targets
     - remove unreasonable 'depends on !UML' from CONFIG_SAMPLES
     - re-implement 'make headers_install' more cleanly
     - add new header-test-y syntax to compile-test headers
     - compile-test exported headers to ensure they are compilable in
       user-space
     - compile-test headers under include/ to ensure they are self-contained
     - remove -Waggregate-return, -Wno-uninitialized, -Wno-unused-value flags
     - add -Werror=unknown-warning-option for Clang
     - add 128-bit built-in types support to genksyms
     - fix missed rebuild of modules.builtin
     - propagate 'No space left on device' error in fixdep to Make
     - allow Clang to use its integrated assembler
     - improve some coccinelle scripts
     - add a new flag KBUILD_ABS_SRCTREE to request Kbuild to use absolute
       path for $(srctree).
     - do not ignore errors when compression utility is missing
     - misc cleanups

    * tag 'kbuild-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (49 commits)
      kbuild: use -- separater intead of $(filter-out ...) for cc-cross-prefix
      kbuild: Inform user to pass ARCH= for make mrproper
      kbuild: fix compression errors getting ignored
      kbuild: add a flag to force absolute path for srctree
      kbuild: replace KBUILD_SRCTREE with boolean building_out_of_srctree
      kbuild: remove src and obj from the top Makefile
      scripts/tags.sh: remove unused environment variables from comments
      scripts/tags.sh: drop SUBARCH support for ARM
      kbuild: compile-test kernel headers to ensure they are self-contained
      kheaders: include only headers into kheaders_data.tar.xz
      kheaders: remove meaningless -R option of 'ls'
      kbuild: support header-test-pattern-y
      kbuild: do not create wrappers for header-test-y
      kbuild: compile-test exported headers to ensure they are self-contained
      init/Kconfig: add CONFIG_CC_CAN_LINK
      kallsyms: exclude kasan local symbols on s390
      kbuild: add more hints about SUBDIRS replacement
      coccinelle: api/stream_open: treat all wait_.*() calls as blocking
      coccinelle: put_device: Add a cast to an expression for an assignment
      coccinelle: put_device: Adjust a message construction
      ...
| * lib/raid6: refactor unroll rules with pattern rules  [Masahiro Yamada, 2019-06-23, 1 file, -86/+11]
      This Makefile repeats very similar rules.

      Let's use pattern rules. $(UNROLL) can be replaced with $*.

      No intended change in behavior.

      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
| * lib/raid6: remove duplicated CFLAGS_REMOVE_altivec8.o  [Masahiro Yamada, 2019-06-23, 1 file, -1/+0]
      No intended change in behavior.

      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* | Merge tag 's390-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  [Linus Torvalds, 2019-07-08, 1 file, -1/+1]
|\ \
      Pull s390 updates from Vasily Gorbik:

       - Improve stop_machine wait logic: replace cpu_relax_yield call in
         generic stop_machine function with a weak stop_machine_yield
         function. This is overridden on s390, which yields the current cpu
         to the neighbouring cpu after a couple of retries, instead of
         blindly giving up the cpu to the hypervisor. This significantly
         improves stop_machine performance on s390 in overcommitted
         scenarios.

         This includes common code changes which have been Acked by Peter
         Zijlstra and Thomas Gleixner.

       - Improve jump label transformation speed: transform jump labels
         without using stop_machine.

       - Refactoring of the vfio-ccw cp handling, simplifying the code and
         avoiding unneeded allocating/copying.

       - Various vfio-ccw fixes (ccw translation, state machine).

       - Add support for vfio-ap queue interrupt control in the guest. This
         includes s390 kvm changes which have been Acked by Christian
         Borntraeger.

       - Add protected virtualization support for virtio-ccw.

       - Enforce both CONFIG_SMP and CONFIG_HOTPLUG_CPU, which allows to
         remove some code which most likely isn't working at all, besides
         that s390 didn't even compile for !CONFIG_SMP.

       - Support for special flagged EP11 CPRBs for zcrypt.

       - Handle PCI devices with no support for new MIO instructions.

       - Avoid KASAN false positives in reworked stack unwinder.

       - Couple of fixes for the QDIO layer.

       - Convert s390 specific documentation to ReST format.

       - Let s390 crypto modules return -ENODEV instead of -EOPNOTSUPP if
         hardware is missing. This way our modules behave like most other
         modules and which is also what systemd's
         systemd-modules-load.service expects.

       - Replace defconfig with performance_defconfig, so there is one
         config file less to maintain.

       - Remove the SCLP call home device driver, which was never useful.

       - Cleanups all over the place.

      * tag 's390-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (83 commits)
        docs: s390: s390dbf: typos and formatting, update crash command
        docs: s390: unify and update s390dbf kdocs at debug.c
        docs: s390: restore important non-kdoc parts of s390dbf.rst
        vfio-ccw: Fix the conversion of Format-0 CCWs to Format-1
        s390/pci: correctly handle MIO opt-out
        s390/pci: deal with devices that have no support for MIO instructions
        s390: ap: kvm: Enable PQAP/AQIC facility for the guest
        s390: ap: implement PAPQ AQIC interception in kernel
        vfio: ap: register IOMMU VFIO notifier
        s390: ap: kvm: add PQAP interception for AQIC
        s390/unwind: cleanup unused READ_ONCE_TASK_STACK
        s390/kasan: avoid false positives during stack unwind
        s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
        s390/qdio: (re-)initialize tiqdio list entries
        s390/dasd: Fix a precision vs width bug in dasd_feature_list()
        s390/cio: introduce driver_override on the css bus
        vfio-ccw: make convert_ccw0_to_ccw1 static
        vfio-ccw: Remove copy_ccw_from_iova()
        vfio-ccw: Factor out the ccw0-to-ccw1 transition
        vfio-ccw: Copy CCW data outside length calculation
        ...
| * | RAID/s390: remove invalid 'r' inline asm operand modifier  [Vasily Gorbik, 2019-06-11, 1 file, -1/+1]
      gcc silently ignores unsupported inline asm operand modifiers,
      effectively turning '%r0' into '%0', but upcoming clang 9 complains
      about them:

        lib/raid6/s390vx8.c:63:16: error: invalid operand in inline asm: 'VLM $2,$3,0,${1:r}'
                asm volatile ("VLM %2,%3,0,%r1"
                              ^

      Clean up what looks like a typo 'r' inline asm operand modifier usage.

      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* | | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500  [Thomas Gleixner, 2019-06-19, 1 file, -4/+1]
| |/
|/|
      Based on 2 normalized pattern(s):

        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license version 2 as
        published by the free software foundation

        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license version 2 as
        published by the free software foundation #

      extracted by the scancode license scanner the SPDX license identifier

        GPL-2.0-only

      has been chosen to replace the boilerplate/reference in 4122 file(s).

      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Enrico Weigelt <info@metux.net>
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Allison Randal <allison@lohutok.net>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441  [Thomas Gleixner, 2019-06-05, 5 files, -26/+5]
|/
    Based on 1 normalized pattern(s):

      this program is free software you can redistribute it and or modify
      it under the terms of the gnu general public license as published by
      the free software foundation version 2 of the license

    extracted by the scancode license scanner the SPDX license identifier

      GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 315 file(s).

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Allison Randal <allison@lohutok.net>
    Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 83  [Thomas Gleixner, 2019-05-24, 2 files, -8/+2]
    Based on 1 normalized pattern(s):

      this file is part of the linux kernel and is made available under
      the terms of the gnu general public license version 2 or at your
      option any later version incorporated herein by reference

    extracted by the scancode license scanner the SPDX license identifier

      GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 18 file(s).

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Richard Fontana <rfontana@redhat.com>
    Reviewed-by: Allison Randal <allison@lohutok.net>
    Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
    Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520075211.321157221@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 48  [Thomas Gleixner, 2019-05-24, 8 files, -49/+8]
    Based on 1 normalized pattern(s):

      this program is free software you can redistribute it and or modify
      it under the terms of the gnu general public license as published by
      the free software foundation inc 53 temple place ste 330 boston ma
      02111 1307 usa either version 2 of the license or at your option any
      later version incorporated herein by reference

    extracted by the scancode license scanner the SPDX license identifier

      GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 13 file(s).

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Allison Randal <allison@lohutok.net>
    Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520170858.645641371@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm  [Linus Torvalds, 2019-03-15, 1 file, -1/+1]
|\
    Pull ARM updates from Russell King:

     - An improvement from Ard Biesheuvel, who noted that the identity map
       setup was taking a long time due to flush_cache_louis().

     - Update a comment about dma_ops from Wolfram Sang.

     - Remove use of "-p" with ld, where this flag has been a no-op since
       2004.

     - Remove the printing of the virtual memory layout, which is no longer
       useful since we hide pointers.

     - Correct SCU help text.

     - Remove legacy TWD registration method.

     - Add pgprot_device() implementation for mapping PCI sysfs resource
       files.

     - Initialise PFN limits earlier for kmemleak.

     - Fix argument count to match macro definition (affects clang builds)

     - Use unified assembler language almost everywhere for clang, and
       other clang improvements (from Stefan Agner, Nathan Chancellor).

     - Support security extension for noMMU and other noMMU cleanups (from
       Vladimir Murzin).

     - Remove unnecessary SMP bringup code (which was incorrectly
       copy'n'pasted from the ARM platform implementations) and remove it
       from the arch code to discourage further copies of it appearing.

     - Add Cortex A9 erratum preventing kexec working on some SoCs.

     - AMBA bus identification updates from Mike Leach.

     - More use of raw spinlocks to avoid -RT kernel issues (from Yang Shi
       and Sebastian Andrzej Siewior).

     - MCPM hyp/svc mode mismatch fixes from Marek Szyprowski.

    * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (32 commits)
      ARM: 8849/1: NOMMU: Fix encodings for PMSAv8's PRBAR4/PRLAR4
      ARM: 8848/1: virt: Align GIC version check with arm64 counterpart
      ARM: 8847/1: pm: fix HYP/SVC mode mismatch when MCPM is used
      ARM: 8845/1: use unified assembler in c files
      ARM: 8844/1: use unified assembler in assembly files
      ARM: 8843/1: use unified assembler in headers
      ARM: 8841/1: use unified assembler in macros
      ARM: 8840/1: use a raw_spinlock_t in unwind
      ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
      ARM: 8837/1: coresight: etmv4: Update ID register table to add UCI support
      ARM: 8836/1: drivers: amba: Update component matching to use the CoreSight UCI values.
      ARM: 8838/1: drivers: amba: Updates to component identification for driver matching.
      ARM: 8833/1: Ensure that NEON code always compiles with Clang
      ARM: avoid Cortex-A9 livelock on tight dmb loops
      ARM: smp: remove arch-provided "pen_release"
      ARM: actions: remove boot_lock and pen_release
      ARM: oxnas: remove CPU hotplug implementation
      ARM: qcom: remove unnecessary boot_lock
      ARM: 8832/1: NOMMU: Limit visibility for CONFIG_FLASH_{MEM_BASE,SIZE}
      ARM: 8831/1: NOMMU: pmsa-v8: remove unneeded semicolon
      ...
| * ARM: 8833/1: Ensure that NEON code always compiles with Clang  [Nathan Chancellor, 2019-02-12, 1 file, -1/+1]
      While building arm32 allyesconfig, I ran into the following errors:

        arch/arm/lib/xor-neon.c:17:2: error: You should compile this file with
        '-mfloat-abi=softfp -mfpu=neon'

        In file included from lib/raid6/neon1.c:27:
        /home/nathan/cbl/prebuilt/lib/clang/8.0.0/include/arm_neon.h:28:2:
        error: "NEON support not enabled"

      Building V=1 showed NEON_FLAGS getting passed along to Clang but
      __ARM_NEON__ was not getting defined. Ultimately, it boils down to Clang
      only defining __ARM_NEON__ when targeting armv7, rather than armv6k,
      which is the '-march' value for allyesconfig.

      From lib/Basic/Targets/ARM.cpp in the Clang source:

        // This only gets set when Neon instructions are actually available, unlike
        // the VFP define, hence the soft float and arch check. This is subtly
        // different from gcc, we follow the intent which was that it should be set
        // when Neon instructions are actually available.
        if ((FPU & NeonFPU) && !SoftFloat && ArchVersion >= 7) {
          Builder.defineMacro("__ARM_NEON", "1");
          Builder.defineMacro("__ARM_NEON__");
          // current AArch32 NEON implementations do not support double-precision
          // floating-point even when it is present in VFP.
          Builder.defineMacro("__ARM_NEON_FP",
                              "0x" + Twine::utohexstr(HW_FP & ~HW_FP_DP));
        }

      Ard Biesheuvel recommended explicitly adding '-march=armv7-a' at the
      beginning of the NEON_FLAGS definitions so that __ARM_NEON__ always gets
      defined by Clang. This doesn't functionally change anything because
      that code will only run where NEON is supported, which is implicitly
      armv7.

      Link: https://github.com/ClangBuiltLinux/linux/issues/287
      Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Stefan Agner <stefan@agner.ch>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
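The failure described in the commit above comes down to whether the compiler predefines the NEON macros for the selected target. A minimal sketch of such a check follows; the function name `neon_macros_defined` is made up for illustration and is not taken from the kernel or from Clang.

```c
/*
 * Hypothetical helper (not kernel code): reports whether the compiler
 * predefines the NEON feature macros for the current target and flags.
 * Per the commit message, Clang only defines them when NEON is enabled,
 * soft-float is off, and the architecture version is at least 7.
 */
int neon_macros_defined(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return 1;   /* headers such as arm_neon.h will accept this build */
#else
    return 0;   /* arm_neon.h would stop with "NEON support not enabled" */
#endif
}
```

On a non-ARM host this simply returns 0; on ARM the result depends on the `-march`, `-mfpu` and `-mfloat-abi` flags in effect, which is why the commit pins `-march=armv7-a` in NEON_FLAGS.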
* | lib/raid6: arm: optimize away a mask operation in NEON recovery routine  [Ard Biesheuvel, 2019-02-28, 1 file, -6/+6]
      The NEON recovery code was modeled after the x86 SIMD code, and for
      some reason, that code uses a 16 bit wide signed shift and a mask to
      perform what amounts to a 8 bit unsigned shift. So fold the ops
      together.

      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
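The equivalence that commit relies on can be checked in plain scalar C. The sketch below is not the NEON or x86 code; it models one 16-bit lane holding two packed bytes, assumes a shift count of 4 and a 0x0f0f mask for illustration, and uses invented function names.

```c
#include <stdint.h>

/*
 * x86-style sequence on one 16-bit lane holding two packed bytes:
 * signed (arithmetic) shift right by 4, then mask each byte with 0x0f.
 * The arithmetic shift is emulated so the result does not depend on how
 * the compiler shifts negative values.
 */
uint16_t shift16_signed_then_mask(uint16_t lane)
{
    uint16_t shifted = (uint16_t)(lane >> 4);
    if (lane & 0x8000)
        shifted |= 0xf000;              /* sign bits an arithmetic shift would copy in */
    return (uint16_t)(shifted & 0x0f0f); /* drop everything that leaked across bytes */
}

/* Folded form: shift each byte on its own as an unsigned 8-bit value. */
uint16_t shift8_unsigned(uint16_t lane)
{
    uint8_t lo = (uint8_t)(lane & 0xff);
    uint8_t hi = (uint8_t)(lane >> 8);
    lo = (uint8_t)(lo >> 4);
    hi = (uint8_t)(hi >> 4);
    return (uint16_t)((hi << 8) | lo);
}
```

Both functions agree for every 16-bit input, which is why the separate mask operation can be dropped when a per-byte unsigned shift is available.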
* | lib/raid6: use vdupq_n_u8 to avoid endianness warnings  [ndesaulniers@google.com, 2019-02-28, 2 files, -8/+4]
|/
    Clang warns: vector initializers are not compatible with NEON intrinsics
    in big endian mode [-Wnonportable-vector-initialization]

    While this is usually the case, it's not an issue for this case since
    we're initializing the uint8x16_t (16x uint8_t's) with the same value.

    Instead, use vdupq_n_u8 which both compilers lower into a single movi
    instruction: https://godbolt.org/z/vBrgzt

    This avoids the static storage for a constant value.

    Link: https://github.com/ClangBuiltLinux/linux/issues/214
    Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
    Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* Merge tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild  [Linus Torvalds, 2019-01-07, 1 file, -3/+2]
|\
    Pull more Kbuild updates from Masahiro Yamada:

     - improve boolinit.cocci and use_after_iter.cocci semantic patches
     - fix alignment for kallsyms
     - move 'asm goto' compiler test to Kconfig and clean up jump_label
       CONFIG option
     - generate asm-generic wrappers automatically if arch does not
       implement mandatory UAPI headers
     - remove redundant generic-y defines
     - misc cleanups

    * tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
      kconfig: rename generated .*conf-cfg to *conf-cfg
      kbuild: remove unnecessary stubs for archheader and archscripts
      kbuild: use assignment instead of define ... endef for filechk_* rules
      arch: remove redundant UAPI generic-y defines
      kbuild: generate asm-generic wrappers if mandatory headers are missing
      arch: remove stale comments "UAPI Header export list"
      riscv: remove redundant kernel-space generic-y
      kbuild: change filechk to surround the given command with { }
      kbuild: remove redundant target cleaning on failure
      kbuild: clean up rule_dtc_dt_yaml
      kbuild: remove UIMAGE_IN and UIMAGE_OUT
      jump_label: move 'asm goto' support test to Kconfig
      kallsyms: lower alignment on ARM
      scripts: coccinelle: boolinit: drop warnings on named constants
      scripts: coccinelle: check for redeclaration
      kconfig: remove unused "file" field of yylval union
      nds32: remove redundant kernel-space generic-y
      nios2: remove unneeded HAS_DMA define
| * kbuild: remove redundant target cleaning on failure  [Masahiro Yamada, 2019-01-06, 1 file, -3/+2]
      Since commit 9c2af1c7377a ("kbuild: add .DELETE_ON_ERROR special
      target"), the target file is automatically deleted on failure.

      The boilerplate code

        ... || { rm -f $@; false; }

      is unneeded.

      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* | Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md into for-linus  [Jens Axboe, 2019-01-03, 2 files, -38/+46]
|\ \
| |/
|/|
      Pull the pending 4.21 changes for md from Shaohua.

      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
        md: fix raid10 hang issue caused by barrier
        raid10: refactor common wait code from regular read/write request
        md: remvoe redundant condition check
        lib/raid6: add option to skip algo benchmarking
        lib/raid6: sort algos in rough performance order
        lib/raid6: check for assembler SSSE3 support
        lib/raid6: avoid __attribute_const__ redefinition
        lib/raid6: add missing include for raid6test
        md: remove set but not used variable 'bi_rdev'
| * lib/raid6: add option to skip algo benchmarking  [Daniel Verkamp, 2018-12-20, 1 file, -0/+5]
      This is helpful for systems where fast startup time is important.
      It is especially nice to avoid benchmarking RAID functions that are
      never used (for example, BTRFS selects RAID6_PQ even if the parity RAID
      mode is not in use).

      This saves 250+ milliseconds of boot time on modern x86 and ARM systems
      with a dozen or more available implementations.

      The new option is defaulted to 'y' to match the previous behavior of
      always benchmarking on init.

      Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
| * lib/raid6: sort algos in rough performance order  [Daniel Verkamp, 2018-12-20, 1 file, -38/+38]
      Sort the list of RAID6 algorithms in roughly decreasing order of
      expected performance: newer instruction sets first (within each
      architecture) and wider unrollings first.

      This doesn't make any difference right now, since all functions are
      benchmarked; a follow-up change will make use of this by optionally
      choosing the first valid function rather than testing all of them.

      The Itanium raid6_intx{16,32} entries are also moved down to be near
      the other raid6_intx entries for clarity.

      Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
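The interaction between the sorted table and the skip-benchmark option can be sketched as below. Everything here is a simplified assumption for illustration: `struct algo`, `pick_algo` and the `demo_*` entries are invented names, and the kernel's real descriptor and selection routine carry more fields and logic.

```c
#include <stddef.h>

/* Hypothetical, simplified algorithm descriptor (not the kernel's struct). */
struct algo {
    const char *name;
    int (*valid)(void);           /* non-zero if usable on this CPU; NULL = always */
    unsigned long (*bench)(void); /* measured throughput, arbitrary units */
};

/*
 * Pick an algorithm from a NULL-terminated table sorted fastest-first.
 * With benchmark == 0 the first valid entry wins; otherwise every valid
 * entry is measured and the best score wins.
 */
const struct algo *pick_algo(const struct algo *const *table, int benchmark)
{
    const struct algo *best = NULL;
    unsigned long best_score = 0;

    for (size_t i = 0; table[i] != NULL; i++) {
        const struct algo *a = table[i];

        if (a->valid && !a->valid())
            continue;               /* not supported on this machine */
        if (!benchmark)
            return a;               /* trust the table order */

        unsigned long score = a->bench();
        if (best == NULL || score > best_score) {
            best = a;
            best_score = score;
        }
    }
    return best;
}

/* Demo entries with made-up numbers, only to exercise both paths. */
static int never_valid(void)  { return 0; }
static int always_valid(void) { return 1; }
static unsigned long bench_900(void) { return 900; }
static unsigned long bench_100(void) { return 100; }
static unsigned long bench_300(void) { return 300; }

const struct algo demo_simd     = { "simd",     never_valid,  bench_900 };
const struct algo demo_unrolled = { "unrolled", always_valid, bench_100 };
const struct algo demo_plain    = { "plain",    always_valid, bench_300 };

const struct algo *const demo_table[] = {
    &demo_simd, &demo_unrolled, &demo_plain, NULL
};
```

With the demo table, skipping the benchmark returns `demo_unrolled` (first valid entry), while benchmarking returns `demo_plain`. That gap is the trade-off the option accepts: the order is only a rough estimate, in exchange for a faster boot.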
| * lib/raid6: check for assembler SSSE3 support  [Daniel Verkamp, 2018-12-20, 1 file, -0/+3]
      Allow the x86 SSSE3 recovery function to be tested in raid6test.

      Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
* | raid6/ppc: Fix build for clang  [Joel Stanley, 2018-12-20, 1 file, -0/+15]
|/
    We cannot build these files with clang as it does not allow altivec
    instructions in assembly when -msoft-float is passed.

    Jinsong Ji <jji@us.ibm.com> wrote:
    > We currently disable Altivec/VSX support when enabling soft-float. So
    > any usage of vector builtins will break.
    >
    > Enable Altivec/VSX with soft-float may need quite some clean up work, so
    > I guess this is currently a limitation.
    >
    > Removing -msoft-float will make it work (and we are lucky that no
    > floating point instructions will be generated as well).

    This is a workaround until the issue is resolved in clang.

    Link: https://bugs.llvm.org/show_bug.cgi?id=31177
    Link: https://github.com/ClangBuiltLinux/linux/issues/239
    Signed-off-by: Joel Stanley <joel@jms.id.au>
    Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* lib/raid6: Fix arm64 test build  [Jeremy Linton, 2018-11-06, 1 file, -2/+2]
    The lib/raid6/test fails to build the neon objects on arm64 because the
    correct machine type is 'aarch64'.

    Once this is correctly enabled, the neon recovery objects need to be
    added to the build.

    Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* RAID/s390: Remove VLA usage  [Kees Cook, 2018-07-04, 1 file, -16/+18]
    In the quest to remove all stack VLA usage from the kernel[1], this
    moves the "$#" replacement from being an argument to being inside the
    function, which avoids generating VLAs.

    [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
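The general idea of that commit, moving a size from a run-time argument to a compile-time constant inside the function, can be sketched as follows. This is not the s390 code: `UNROLL` stands in for the "$#" token that the unroll step substitutes, and `copy_block` is an invented function.

```c
#include <stdint.h>
#include <string.h>

#define UNROLL 4   /* assumption: stands in for the substituted "$#" value */

/*
 * Before (sketch only): the element count arrives as an argument, so the
 * array size is not a constant expression and the compiler emits a VLA.
 *
 *     void copy_block(uint8_t *dst, const uint8_t *src, int n)
 *     {
 *         uint8_t buf[16 * n];
 *         ...
 *     }
 */

/* After: the count is a compile-time constant used inside the function. */
size_t copy_block(uint8_t *dst, const uint8_t *src)
{
    uint8_t buf[16 * UNROLL];      /* fixed-size array, no VLA */

    memcpy(buf, src, sizeof(buf));
    memcpy(dst, buf, sizeof(buf));
    return sizeof(buf);
}
```

Because each unrolled variant is generated as a separate source file, each one gets its own constant, so nothing is lost by dropping the argument.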
* Merge tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  [Linus Torvalds, 2018-04-07, 6 files, -5/+157]
|\
    Pull powerpc updates from Michael Ellerman:
     "Notable changes:

       - Support for 4PB user address space on 64-bit, opt-in via mmap().

       - Removal of POWER4 support, which was accidentally broken in 2016
         and no one noticed, and blocked use of some modern instructions.

       - Workarounds so that the hypervisor can enable Transactional Memory
         on Power9.

       - A series to disable the DAWR (Data Address Watchpoint Register) on
         Power9.

       - More information displayed in the meltdown/spectre_v1/v2 sysfs
         files.

       - A vpermxor (Power8 Altivec) implementation for the raid6 Q
         Syndrome.

       - A big series to make the allocation of our pacas (per cpu area),
         kernel page tables, and per-cpu stacks NUMA aware when using the
         Radix MMU on Power9.

      And as usual many fixes, reworks and cleanups.

      Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy,
      Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual,
      Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
      Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic
      Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook,
      Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe,
      Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring,
      Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira,
      Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras,
      Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher
      Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev
      Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav
      Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun"

    * tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits)
      powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
      powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
      powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit
      powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
      Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"
      powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
      powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
      cxl: Fix possible deadlock when processing page faults from cxllib
      powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
      powerpc/mm/radix: Update command line parsing for disable_radix
      powerpc/mm/radix: Parse disable_radix commandline correctly.
      powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb
      powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
      powerpc/mm/keys: Update documentation and remove unnecessary check
      powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
      powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
      powerpc/powernv: Always stop secondaries before reboot/shutdown
      powerpc: hard disable irqs in smp_send_stop loop
      powerpc: use NMI IPI for smp_send_stop
      powerpc/powernv: Fix SMT4 forcing idle code
      ...
| * lib/raid6: Build proper raid6test files on powerpc  [Matt Brown, 2018-03-20, 2 files, -2/+6]
      Previously the raid6 test Makefile did not build the POWER specific
      files (altivec and vpermxor). This patch fixes the bug, so that all
      appropriate files for powerpc are built.

      This patch also fixes the missing and mismatched ifdef statements to
      allow the altivec.uc file to be built correctly.

      Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndrome  [Matt Brown, 2018-03-20, 5 files, -3/+151]
      This patch uses the vpermxor instruction to optimise the raid6 Q
      syndrome. This instruction was made available with POWER8, ISA version
      2.07. It allows for both vperm and vxor instructions to be done in a
      single instruction. This has been tested for correctness on a ppc64le
      vm with a basic RAID6 setup containing 5 drives.

      The performance benchmarks are from the raid6test in the
      /lib/raid6/test directory. These results are from an IBM Firestone
      machine with ppc64le architecture. The benchmark results show a 35%
      speed increase over the best existing algorithm for powerpc (altivec).
      The raid6test has also been run on a big-endian ppc64 vm to ensure it
      also works for big-endian architectures.

      Performance benchmarks:
        raid6: altivecx4 gen() 18773 MB/s
        raid6: altivecx8 gen() 19438 MB/s

        raid6: vpermxor4 gen() 25112 MB/s
        raid6: vpermxor8 gen() 26279 MB/s

      Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      [mpe: Add VPERMXOR macro so we can build with old binutils]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
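For readers unfamiliar with what the Q syndrome is, here is a scalar sketch of the quantity every raid6 gen() routine computes, for one byte position across the data disks. It is not the vpermxor implementation from the commit: it is a plain byte-at-a-time loop using the standard RAID-6 field GF(2^8) with polynomial 0x11d, and the function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8), reducing with the RAID-6 polynomial 0x11d. */
uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0x00));
}

/*
 * For one byte column across ndisks data disks, compute
 *   P = D_0 ^ D_1 ^ ... ^ D_(n-1)
 *   Q = D_0 ^ 2*D_1 ^ 4*D_2 ^ ... ^ 2^(n-1)*D_(n-1)   (arithmetic in GF(2^8))
 * Q is evaluated with Horner's rule, starting from the highest disk.
 */
void gen_syndrome_byte(const uint8_t *data, size_t ndisks,
                       uint8_t *p, uint8_t *q)
{
    uint8_t wp = 0;
    uint8_t wq = 0;

    for (size_t i = ndisks; i-- > 0; ) {
        wp ^= data[i];
        wq = (uint8_t)(gf_mul2(wq) ^ data[i]);
    }
    *p = wp;
    *q = wq;
}
```

The SIMD variants perform the same multiply-and-XOR on many bytes at once. The commit's benchmark figures are consistent with its stated gain: 26279 / 19438 is about 1.35.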
* | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial  [Linus Torvalds, 2018-04-05, 1 file, -7/+7]
|\ \
      Pull trivial tree updates from Jiri Kosina.

      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
        kfifo: fix inaccurate comment
        tools/thermal: tmon: fix for segfault
        net: Spelling s/stucture/structure/
        edd: don't spam log if no EDD information is present
        Documentation: Fix early-microcode.txt references after file rename
        tracing: Block comments should align the * on each line
        treewide: Fix typos in printk
        GenWQE: Fix a typo in two comments
        treewide: Align function definition open/close braces
| * | treewide: Align function definition open/close braces  [Joe Perches, 2018-03-26, 1 file, -7/+7]
| |/
      Some function definitions have either the initial open brace and/or
      the closing brace outside of column 1.

      Move those braces to column 1.

      This allows various function analyzers like gnu complexity to work
      properly for these modified functions.

      Signed-off-by: Joe Perches <joe@perches.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* / raid: remove tile specific raid6 implementation  [Arnd Bergmann, 2018-03-26, 4 files, -103/+0]
|/
    The Tile architecture is getting removed, so we no longer need this
    either.

    Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* License cleanup: add SPDX GPL-2.0 license identifier to files with no license (Greg Kroah-Hartman, 2017-11-02, 4 files, -0/+4)

Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases:
 - file had no licensing information in it,
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information.

Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5 lines of source
 - File already had some variant of a license header in it (even if <5 lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license identifiers to apply.

 - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became the concluded license(s).

 - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time.

In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there were new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related.

Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files.

In the initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with:
 - a full scancode scan run, collecting the matched texts, detected license ids and scores
 - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md (Linus Torvalds, 2017-09-07, 1 file, -1/+1)

Pull MD updates from Shaohua Li:
 "This update mainly fixes bugs:
  - Make raid5 ppl support several ppl from Pawel
  - Several raid5-cache bug fixes from Song
  - Bitmap fixes from Neil and Me
  - One raid1/10 regression fix since 4.12 from Me
  - Other small fixes and cleanup"

* tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md/bitmap: disable bitmap_resize for file-backed bitmaps.
  raid5-ppl: Recovery support for multiple partial parity logs
  md: Runtime support for multiple ppls
  md/raid0: attach correct cgroup info in bio
  lib/raid6: align AVX512 constants to 512 bits, not bytes
  raid5: remove raid5_build_block
  md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
  md: replace seq_release_private with seq_release
  md: notify about new spare disk in the container
  md/raid1/10: reset bio allocated from mempool
  md/raid5: release/flush io in raid5_do_work()
  md/bitmap: copy correct data for bitmap super
| * lib/raid6: align AVX512 constants to 512 bits, not bytes (Denys Vlasenko, 2017-08-25, 1 file, -1/+1)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: mingo@redhat.com
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Megha Dey <megha.dey@linux.intel.com>
Cc: Gayatri Kammela <gayatri.kammela@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Shaohua Li <shli@fb.com>
* | md/raid6: implement recovery using ARM NEON intrinsics (Ard Biesheuvel, 2017-08-09, 4 files, -1/+233)

Provide a NEON accelerated implementation of the recovery algorithm, which supersedes the default byte-by-byte one.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | md/raid6: use faster multiplication for ARM NEON delta syndrome (Ard Biesheuvel, 2017-08-09, 1 file, -3/+30)

The P/Q left side optimization in the delta syndrome simply involves repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given that 'x * x * x * x' equals 'x^4' even in the polynomial world, we can accelerate this substantially by performing up to 4 such operations at once, using the NEON instructions for polynomial multiplication.

Results on a Cortex-A57 running in 64-bit mode:

Before:
-------
raid6: neonx1   xor()  1680 MB/s
raid6: neonx2   xor()  2286 MB/s
raid6: neonx4   xor()  3162 MB/s
raid6: neonx8   xor()  3389 MB/s

After:
------
raid6: neonx1   xor()  2281 MB/s
raid6: neonx2   xor()  3362 MB/s
raid6: neonx4   xor()  3787 MB/s
raid6: neonx8   xor()  4239 MB/s

While we're at it, simplify MASK() by using a signed shift rather than a vector compare involving a temp register.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
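
The idea of folding several multiplications by 'x' into one step can be illustrated with a minimal scalar sketch in Python. This is not the NEON code from the patch: it assumes the usual RAID6 field polynomial 0x11d, works on a single byte rather than a vector, and the helper names (mul_x, clmul, mul_x_pow) are hypothetical.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed RAID6 field polynomial)


def mul_x(v):
    """Multiply one GF(2^8) element by x, reducing when bit 8 is set."""
    v <<= 1
    if v & 0x100:
        v ^= POLY
    return v


def clmul(a, b):
    """Carry-less (polynomial) multiplication of two non-negative integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def mul_x_pow(v, n):
    """Multiply by x^n in one step; only valid for 1 <= n <= 4."""
    assert 1 <= n <= 4
    lo = (v << n) & 0xFF   # bits that stay inside the byte
    hi = v >> (8 - n)      # bits shifted out, at most 4 of them
    # x^8 is congruent to 0x1d, so the overflow folds back as hi * 0x1d.
    # With deg(hi) <= 3 and deg(0x1d) = 4 the product still fits in 8 bits.
    return lo ^ clmul(hi, POLY & 0xFF)


def mul_x_repeated(v, n):
    """Reference: apply the single-step multiplication n times."""
    for _ in range(n):
        v = mul_x(v)
    return v
```

In this sketch the limit of four follows from the degree argument in the comment: with five or more overflow bits the folded product would itself need a second reduction.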
* lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position (Anup Patel, 2017-05-16, 1 file, -0/+20)

The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The Linux async_tx framework passes values from raid6_gfexp as coefficients for each source to the prep_dma_pq() callback of a DMA channel with PQ capability. This creates a problem for RAID6 offload engines (such as Broadcom SBA) which take the disk position (i.e. log of {2}) instead of multiplicative coefficients from the raid6_gfexp table.

This patch adds the raid6_gflog table, holding the log-of-2 value for any given x such that 0 <= x < 256. For any given disk coefficient x, the corresponding disk position is given by raid6_gflog[x]. The RAID6 offload engine driver can use this newly added raid6_gflog table to get the disk position from the multiplicative coefficient.

Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Acked-by: Shaohua Li <shli@fb.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
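
The relationship between an exponent table and its inverse log table can be sketched in a few lines of Python. This is an illustration only, not the kernel's table generator: it assumes the field polynomial 0x11d with generator {2}, builds only the 255 distinct powers, and the names build_gf_tables, gfexp and gflog are hypothetical stand-ins for raid6_gfexp and raid6_gflog.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed RAID6 field polynomial)


def build_gf_tables():
    """Return (gfexp, gflog) with gfexp[n] = {2}^n and gflog[gfexp[n]] = n."""
    gfexp = [0] * 255     # {2}^255 wraps back to 1, so 255 entries suffice here
    gflog = [None] * 256  # the logarithm of 0 is undefined; left as None
    v = 1
    for n in range(255):
        gfexp[n] = v
        gflog[v] = n
        v <<= 1           # multiply by {2}
        if v & 0x100:
            v ^= POLY     # reduce modulo the field polynomial
    return gfexp, gflog


gfexp, gflog = build_gf_tables()

# A driver that is handed a coefficient can recover the disk position:
coefficient = gfexp[5]           # what an exponent table would supply for disk 5
position = gflog[coefficient]    # the position an offload engine would want
```

The lookup in the last two lines mirrors the use described in the commit message: the coefficient goes in, the disk position comes out.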
* lib/raid6: Add AVX2 optimized xor_syndrome functions (Gayatri Kammela, 2016-11-08, 1 file, -3/+229)

Implement the AVX2 optimization of the RAID6 xor_syndrome functions, which is simply based on sse2.c written by hpa.

Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Yuanhan Liu <yuanhan.liu@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
* Merge tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md (Linus Torvalds, 2016-10-07, 7 files, -5/+988)

Pull MD updates from Shaohua Li:
 "This update includes:
  - new AVX512 instruction based raid6 gen/recovery algorithm
  - a couple of md-cluster related bug fixes
  - fix a potential deadlock
  - set nonrotational bit for raid array with SSD
  - set correct max_hw_sectors for raid5/6, which hopefully can improve performance a little bit
  - other minor fixes"

* tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md: set rotational bit
  raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
  raid5: handle register_shrinker failure
  raid5: fix to detect failure of register_shrinker
  md: fix a potential deadlock
  md/bitmap: fix wrong cleanup
  raid5: allow arbitrary max_hw_sectors
  lib/raid6: Add AVX512 optimized xor_syndrome functions
  lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
  lib/raid6: Add AVX512 optimized recovery functions
  lib/raid6: Add AVX512 optimized gen_syndrome functions
  md-cluster: make resync lock also could be interruptted
  md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  md-cluster: convert the completion to wait queue
  md-cluster: protect md_find_rdev_nr_rcu with rcu lock
  md-cluster: clean related infos of cluster
  md: changes for MD_STILL_CLOSED flag
  md-cluster: remove some unnecessary dlm_unlock_sync
  md-cluster: use FORCEUNLOCK in lockres_free
  md-cluster: call md_kick_rdev_from_array once ack failed
| * raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays (Gayatri Kammela, 2016-09-27, 1 file, -3/+4)

Specify the aligned attributes for the char data[NDISKS][PAGE_SIZE], char recovi[PAGE_SIZE] and char recovj[PAGE_SIZE] arrays, so that all malloc memory is page boundary aligned.

Without these alignment attributes, the test causes a segfault in userspace when NDISKS is changed from 16 to 4.

The RAID stripes will be page aligned anyway, so we want to test what the kernel actually will execute.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
| * lib/raid6: Add AVX512 optimized xor_syndrome functions (Gayatri Kammela, 2016-09-21, 1 file, -3/+278)

Optimize the RAID6 xor_syndrome functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512.

The AVX512 optimized xor_syndrome functions are simply based on sse2.c written by hpa.

The patch was tested and benchmarked before submission on hardware that has the AVX512 flags to support such instructions.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
| * lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions (Gayatri Kammela, 2016-09-21, 1 file, -1/+4)

Add the avx512 gen_syndrome and recovery functions so that the code can be compiled and tested successfully in userspace.

This patch was tested in userspace and an improvement in performance was observed.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
| * lib/raid6: Add AVX512 optimized recovery functions (Gayatri Kammela, 2016-09-21, 3 files, -1/+392)

Optimize the RAID6 recovery functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512.

The AVX512 optimized recovery functions are simply based on recov_avx2.c written by Jim Kukunas.

This patch was tested and benchmarked before submission on hardware that has the AVX512 flags to support such instructions.

Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
| * lib/raid6: Add AVX512 optimized gen_syndrome functions (Gayatri Kammela, 2016-09-21, 4 files, -1/+314)

Optimize the RAID6 gen_syndrome functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512.

The AVX512 optimized gen_syndrome functions are simply based on avx2.c written by Yuanhan Liu and sse2.c written by hpa.

The patch was tested and benchmarked before submission on hardware that has the AVX512 flags to support such instructions.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
* | RAID/s390: provide raid6 recovery optimization (Martin Schwidefsky, 2016-09-01, 3 files, -1/+120)

The XC instruction can be used to improve the speed of the raid6 recovery. The loops now operate on blocks of 256 bytes.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | RAID/s390: add SIMD implementation for raid6 gen/xor (Martin Schwidefsky, 2016-08-29, 4 files, -0/+178)

Using vector registers is slightly faster:

raid6: vx128x8  gen() 19705 MB/s
raid6: vx128x8  xor() 11886 MB/s
raid6: using algorithm vx128x8 gen() 19705 MB/s
raid6: .... xor() 11886 MB/s, rmw enabled

vs the software algorithms:

raid6: int64x1  gen()  3018 MB/s
raid6: int64x1  xor()  1429 MB/s
raid6: int64x2  gen()  4661 MB/s
raid6: int64x2  xor()  3143 MB/s
raid6: int64x4  gen()  5392 MB/s
raid6: int64x4  xor()  3509 MB/s
raid6: int64x8  gen()  4441 MB/s
raid6: int64x8  xor()  3207 MB/s
raid6: using algorithm int64x4 gen() 5392 MB/s
raid6: .... xor() 3509 MB/s, rmw enabled

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* powerpc: Create disable_kernel_{fp,altivec,vsx,spe}() (Anton Blanchard, 2015-12-01, 1 file, -0/+1)

The enable_kernel_*() functions leave the relevant MSR bits enabled until we exit the kernel sometime later. Create disable versions that wrap the kernel use of FP, Altivec, VSX or SPE.

While we don't want to disable it normally for performance reasons (MSR writes are slow), it will be used for a debug boot option that does this and catches bad uses in other areas of the kernel.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* md/raid6: delta syndrome for ARM NEON (Ard Biesheuvel, 2015-08-31, 2 files, -1/+58)

This implements XOR syndrome calculation using NEON intrinsics. As before, the module can be built for ARM and arm64 from the same source.

Relative performance on a Cortex-A57 based system:

raid6: int64x1  gen()   905 MB/s
raid6: int64x1  xor()   881 MB/s
raid6: int64x2  gen()  1343 MB/s
raid6: int64x2  xor()  1286 MB/s
raid6: int64x4  gen()  1896 MB/s
raid6: int64x4  xor()  1321 MB/s
raid6: int64x8  gen()  1773 MB/s
raid6: int64x8  xor()  1165 MB/s
raid6: neonx1   gen()  1834 MB/s
raid6: neonx1   xor()  1278 MB/s
raid6: neonx2   gen()  2528 MB/s
raid6: neonx2   xor()  1942 MB/s
raid6: neonx4   gen()  2888 MB/s
raid6: neonx4   xor()  2334 MB/s
raid6: neonx8   gen()  2957 MB/s
raid6: neonx8   xor()  2232 MB/s
raid6: using algorithm neonx8 gen() 2957 MB/s
raid6: .... xor() 2232 MB/s, rmw enabled

Cc: Markus Stockhausen <stockhausen@collogia.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NeilBrown <neilb@suse.com>
* Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux (Linus Torvalds, 2015-06-24, 1 file, -1/+1)

Pull powerpc updates from Michael Ellerman:
 - disable the 32-bit vdso when building LE, so we can build with a 64-bit only toolchain.
 - EEH fixes from Gavin & Richard.
 - enable the sys_kcmp syscall from Laurent.
 - sysfs control for fastsleep workaround from Shreyas.
 - expose OPAL events as an irq chip by Alistair.
 - MSI ops moved to pci_controller_ops by Daniel.
 - fix for kernel to userspace backtraces for perf from Anton.
 - merge pseries and pseries_le defconfigs from Cyril.
 - CXL in-kernel API from Mikey.
 - OPAL prd driver from Jeremy.
 - fix for DSCR handling & tests from Anshuman.
 - Powernv flash mtd driver from Cyril.
 - dynamic DMA Window support on powernv from Alexey.
 - LLVM clang fixes & workarounds from Anton.
 - reworked version of the patch to abort syscalls when transactional.
 - fix the swap encoding to support 4TB, from Aneesh.
 - various fixes as usual.
 - Freescale updates from Scott: Highlights include more 8xx optimizations, an e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and various fixes and cleanup.

* tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits)
  cxl: Fix typo in debug print
  cxl: Add CXL_KERNEL_API config option
  powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
  powerpc/mm: Change the swap encoding in pte.
  powerpc/mm: PTE_RPN_MAX is not used, remove the same
  powerpc/tm: Abort syscalls in active transactions
  powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
  powerpc/include: Add opal-prd to installed uapi headers
  powerpc/powernv: fix construction of opal PRD messages
  powerpc/powernv: Increase opal-irqchip initcall priority
  powerpc: Make doorbell check preemption safe
  powerpc/powernv: pnv_init_idle_states() should only run on powernv
  macintosh/nvram: Remove as unused
  powerpc: Don't use gcc specific options on clang
  powerpc: Don't use -mno-strict-align on clang
  powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it
  powerpc: Only use -mabi=altivec if toolchain supports it
  powerpc: Fix duplicate const clang warning in user access code
  vfio: powerpc/spapr: Support Dynamic DMA windows
  vfio: powerpc/spapr: Register memory and define IOMMU v2
  ...
| * powerpc: Only use -mabi=altivec if toolchain supports it (Anton Blanchard, 2015-06-11, 1 file, -1/+1)

The -mabi=altivec option is not recognised on LLVM, so call cc-option to check for support.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | x86/fpu: Rename i387.h to fpu/api.h (Ingo Molnar, 2015-05-19, 1 file, -1/+1)

We already have fpu/types.h, move i387.h to fpu/api.h.

The file name has become a misnomer anyway: it offers generic FPU APIs, but is not limited to i387 functionality.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* md/raid6 algorithms: xor_syndrome() for SSE2 (Markus Stockhausen, 2015-04-22, 1 file, -3/+227)

The second (and last) optimized XOR syndrome calculation. This version supports right and left side optimization. All CPUs with architecture older than Haswell will benefit from it.

It should be noted that SSE2 movntdq kills performance for memory areas that are read and written simultaneously in chunks smaller than cache line size. So use movdqa instead for P/Q writes in the sse21 and sse22 XOR functions.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: NeilBrown <neilb@suse.de>
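
What an XOR syndrome update computes can be shown with a minimal single-byte sketch in Python. This is not the SSE2 code: it assumes the field polynomial 0x11d, treats each disk as one byte, and assumes that only disks in the range start..stop have changed, so that disks above stop can be skipped and disks below start only contribute further multiplications by {2}. The function names and the delta argument are hypothetical.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed RAID6 field polynomial)


def mul_x(v):
    """Multiply one GF(2^8) element by {2}."""
    v <<= 1
    if v & 0x100:
        v ^= POLY
    return v


def gen_syndrome(data):
    """Full P/Q computation over all data bytes (Horner's scheme, highest disk first)."""
    p = q = 0
    for d in reversed(data):
        p ^= d
        q = mul_x(q) ^ d
    return p, q


def xor_syndrome(p, q, delta, start, stop):
    """Update P/Q in place of a full recomputation.

    delta[z] is old_data[z] XOR new_data[z]; only start..stop are read.
    """
    wp = wq = 0
    # Disks above 'stop' are unchanged, so the loop begins at 'stop'.
    for z in range(stop, start - 1, -1):
        wp ^= delta[z]
        wq = mul_x(wq) ^ delta[z]
    # Disks below 'start' are unchanged too: only the multiplications remain.
    for _ in range(start):
        wq = mul_x(wq)
    return p ^ wp, q ^ wq


old = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]
new = [0x11, 0x22, 0xA5, 0x5A, 0x55, 0x66]   # disks 2 and 3 rewritten
delta = [o ^ n for o, n in zip(old, new)]
p_old, q_old = gen_syndrome(old)
updated = xor_syndrome(p_old, q_old, delta, 2, 3)
```

Because both P and Q are linear in the data, the updated pair equals what gen_syndrome would produce for the new data, while touching only the changed disks.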