path: root/arch

* crypto: arm/crc32 - fix build error with outdated binutils (Ard Biesheuvel, 2017-03-01; 1 file, -1/+1)

  Annotate a vmov instruction with an explicit element size of 32 bits.
  This is inferred by recent toolchains, but apparently, older versions
  need some help figuring this out.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm64/aes - add NEON/Crypto Extensions CBCMAC/CMAC/XCBC driver (Ard Biesheuvel, 2017-02-11; 2 files, -2/+267)

  On ARMv8 implementations that do not support the Crypto Extensions,
  such as the Raspberry Pi 3, the CCM driver falls back to the generic
  table based AES implementation to perform the MAC part of the
  algorithm, which is slow and not time invariant. So add a CBCMAC
  implementation to the shared glue code between NEON AES and Crypto
  Extensions AES, so that it can be used instead now that the CCM driver
  has been updated to look for CBCMAC implementations other than the one
  it supplies itself.

  Also, given how these algorithms mostly only differ in the way the key
  handling and the final encryption are implemented, expose CMAC and
  XCBC algorithms as well based on the same core update code.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: sha512-mb - Protect sha512 mb ctx mgr access (Tim Chen, 2017-02-11; 1 file, -22/+42)

  The flusher and regular multi-buffer computation via mcryptd may race
  with one another. Add a lock and turn off interrupts to protect access
  to the multi-buffer computation state cstate->mgr before a round of
  computation. This should prevent the flusher code from jumping in.

  Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

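  A minimal C sketch of the locking pattern described above. The struct
  layout, the work_lock name, and sha512_mb_do_one_round() are
  illustrative stand-ins, not the actual sha512-mb code:

      #include <linux/spinlock.h>

      struct sha512_ctx_mgr;                  /* opaque multi-buffer state */

      struct mcryptd_alg_cstate {
              spinlock_t work_lock;           /* serializes flusher vs. regular path */
              struct sha512_ctx_mgr *mgr;     /* shared computation state */
      };

      /* hypothetical helper for one round of computation */
      extern void sha512_mb_do_one_round(struct sha512_ctx_mgr *mgr);

      static void sha512_mb_run_locked(struct mcryptd_alg_cstate *cstate)
      {
              unsigned long flags;

              /* take the lock with interrupts off so the flusher cannot
               * jump in mid-computation */
              spin_lock_irqsave(&cstate->work_lock, flags);
              sha512_mb_do_one_round(cstate->mgr);
              spin_unlock_irqrestore(&cstate->work_lock, flags);
      }
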
* crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers (Ard Biesheuvel, 2017-02-11; 5 files, -312/+41)

  The PMULL based CRC32 implementation already contains code based on
  the separate, optional CRC32 instructions to fall back to when
  operating on small quantities of data. We can expose these routines
  directly on systems that lack the 64x64 PMULL instructions but do
  implement the CRC32 ones, which makes the driver that is based solely
  on those CRC32 instructions redundant. So remove it.

  Note that this aligns arm64 with ARM, whose accelerated CRC32 driver
  also combines the CRC32 extension based and the PMULL based versions.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Tested-by: Matthias Brugger <mbrugger@suse.com>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm/aes - don't use IV buffer to return final keystream block (Ard Biesheuvel, 2017-02-03; 2 files, -11/+14)

  The ARM bit sliced AES core code uses the IV buffer to pass the final
  keystream block back to the glue code if the input is not a multiple
  of the block size, so that the asm code does not have to deal with
  anything except 16 byte blocks. This is done under the assumption that
  the outgoing IV is meaningless anyway in this case, given that
  chaining is no longer possible under these circumstances.

  However, as it turns out, the CCM driver does expect the IV to retain
  a value that is equal to the original IV except for the counter value,
  and even interprets byte zero as a length indicator, which may result
  in memory corruption if the IV is overwritten with something else.

  So use a separate buffer to return the final keystream block.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

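  A hedged C-level sketch of the idea (the real fix is at the asm
  interface; the buffer and function names here are illustrative). The
  final keystream block goes into its own buffer, so the IV handed down
  by CCM is never clobbered:

      #include <linux/types.h>

      #define AES_BLOCK_SIZE 16

      static void ctr_encrypt_tail(u8 *dst, const u8 *src, unsigned int tail,
                                   const u8 *iv, /* left untouched */
                                   void (*asm_core)(u8 *ks, const u8 *iv))
      {
              u8 ks[AES_BLOCK_SIZE];  /* separate keystream buffer */
              unsigned int i;

              asm_core(ks, iv);       /* keystream for the final partial block */
              for (i = 0; i < tail; i++)
                      dst[i] = src[i] ^ ks[i];
      }
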
* crypto: arm64/aes - don't use IV buffer to return final keystream block (Ard Biesheuvel, 2017-02-03; 2 files, -18/+28)

  The arm64 bit sliced AES core code uses the IV buffer to pass the
  final keystream block back to the glue code if the input is not a
  multiple of the block size, so that the asm code does not have to deal
  with anything except 16 byte blocks. This is done under the assumption
  that the outgoing IV is meaningless anyway in this case, given that
  chaining is no longer possible under these circumstances.

  However, as it turns out, the CCM driver does expect the IV to retain
  a value that is equal to the original IV except for the counter value,
  and even interprets byte zero as a length indicator, which may result
  in memory corruption if the IV is overwritten with something else.

  So use a separate buffer to return the final keystream block.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm64/aes - replace scalar fallback with plain NEON fallback (Ard Biesheuvel, 2017-02-03; 2 files, -11/+29)

  The new bitsliced NEON implementation of AES uses a fallback in two
  places: CBC encryption (which is strictly sequential, whereas this
  driver can only operate efficiently on 8 blocks at a time), and the
  XTS tweak generation, which involves encrypting a single AES block
  with a different key schedule.

  The plain (i.e., non-bitsliced) NEON code is more suitable as a
  fallback, given that it is faster than scalar on low end cores (which
  is what the NEON implementations target, since high end cores have
  dedicated instructions for AES), and shows similar behavior in terms
  of D-cache footprint and sensitivity to cache timing attacks. So
  switch the fallback handling to the plain NEON driver.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm64/aes-neon-blk - tweak performance for low end cores (Ard Biesheuvel, 2017-02-03; 2 files, -135/+102)

  The non-bitsliced AES implementation using the NEON is highly
  sensitive to micro-architectural details, and, as it turns out, the
  Cortex-A53 on the Raspberry Pi 3 is a core that can benefit from this
  code, given that its scalar AES performance is abysmal (32.9 cycles
  per byte).

  The new bitsliced AES code manages 19.8 cycles per byte on this core,
  but can only operate on 8 blocks at a time, which is not supported by
  all chaining modes. With a bit of tweaking, we can get the plain NEON
  code to run at 22.0 cycles per byte, making it useful for sequential
  modes like CBC encryption. (Like bitsliced NEON, the plain NEON
  implementation does not use any lookup tables, which makes it easy on
  the D-cache, and invulnerable to cache timing attacks.)

  So tweak the plain NEON AES code to use tbl instructions rather than
  shl/sri pairs, and to avoid the need to reload permutation vectors or
  other constants from memory in every round. Also, improve the
  decryption performance by switching to 16x8 pmul instructions for
  performing the multiplications in GF(2^8).

  To allow the ECB and CBC encrypt routines to be reused by the
  bitsliced NEON code in a subsequent patch, export them from the
  module.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm64/aes - performance tweak (Ard Biesheuvel, 2017-02-03; 1 file, -33/+19)

  Shuffle some instructions around in the __hround macro to shave off
  0.1 cycles per byte on Cortex-A57.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm64/aes - avoid literals for cross-module symbol references (Ard Biesheuvel, 2017-02-03; 1 file, -5/+2)

  Using simple adrp/add pairs to refer to the AES lookup tables exposed
  by the generic AES driver (which could be loaded far away from this
  driver when KASLR is in effect) was unreliable at module load time
  before commit 41c066f2c4d4 ("arm64: assembler: make adr_l work in
  modules under KASLR"), which is why the AES code used literals
  instead.

  So now we can get rid of the literals, and switch to the adr_l macro.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm64/chacha20 - remove cra_alignmask (Ard Biesheuvel, 2017-02-03; 1 file, -1/+0)

  Remove the unnecessary alignmask: it is much more efficient to deal
  with the misalignment in the core algorithm than relying on the crypto
  API to copy the data to a suitably aligned buffer. (See the sketch
  below; the same rationale applies to the next four commits.)

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

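  As a hedged illustration of "dealing with misalignment in the core" in
  C (the actual drivers do this in NEON asm or via unaligned loads; the
  helper name is made up):

      #include <asm/unaligned.h>
      #include <linux/types.h>

      /* instead of setting cra_alignmask and letting the crypto API copy
       * data into an aligned buffer, the core loads and stores words with
       * unaligned accessors, which are essentially free on these CPUs */
      static void xor_keystream_word(u8 *dst, const u8 *src, u32 ks)
      {
              put_unaligned_le32(get_unaligned_le32(src) ^ ks, dst);
      }
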
* crypto: arm64/aes-blk - remove cra_alignmask (Ard Biesheuvel, 2017-02-03; 2 files, -15/+9)

  Remove the unnecessary alignmask: it is much more efficient to deal
  with the misalignment in the core algorithm than relying on the crypto
  API to copy the data to a suitably aligned buffer.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm64/aes-ce-ccm - remove cra_alignmask (Ard Biesheuvel, 2017-02-03; 1 file, -1/+0)

  Remove the unnecessary alignmask: it is much more efficient to deal
  with the misalignment in the core algorithm than relying on the crypto
  API to copy the data to a suitably aligned buffer.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm/chacha20 - remove cra_alignmask (Ard Biesheuvel, 2017-02-03; 1 file, -1/+0)

  Remove the unnecessary alignmask: it is much more efficient to deal
  with the misalignment in the core algorithm than relying on the crypto
  API to copy the data to a suitably aligned buffer.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm/aes-ce - remove cra_alignmask (Ard Biesheuvel, 2017-02-03; 2 files, -52/+47)

  Remove the unnecessary alignmask: it is much more efficient to deal
  with the misalignment in the core algorithm than relying on the crypto
  API to copy the data to a suitably aligned buffer.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 (Herbert Xu, 2017-02-03; 2 files, -51/+48)

  Merge the crypto tree to pick up the arm64 output IV patch.

| * crypto: aesni - Fix failure when pcbc module is absent (Herbert Xu, 2017-02-03; 1 file, -4/+4)

    When aesni is built as a module together with pcbc, the pcbc module
    must be present for aesni to load. However, the pcbc module may not
    be present, for example because it is missing from the initramfs.
    This patch allows aesni to function even if the pcbc module is
    enabled but not present.

    Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

| * crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes (Ard Biesheuvel, 2017-01-23; 1 file, -46/+42)

    Update the ARMv8 Crypto Extensions and the plain NEON AES
    implementations in CBC and CTR modes to return the next IV back to
    the skcipher API client. This is necessary for chaining to work
    correctly.

    Note that for CTR, this is only done if the request length is a
    multiple of the block size, since otherwise, chaining is impossible
    anyway.

    Cc: <stable@vger.kernel.org> # v3.16+
    Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

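    A hedged C-level sketch of honouring iv_out in CBC (illustrative
    only; the real change is in the arm64 asm and glue code). After CBC
    encryption, the next IV is simply the last ciphertext block:

        #include <linux/string.h>
        #include <linux/types.h>

        #define AES_BLOCK_SIZE 16

        /* copy the last ciphertext block back into the IV buffer so a
         * chained follow-up request can continue where this one ended */
        static void cbc_save_iv_out(u8 *iv, const u8 *dst, unsigned int nblocks)
        {
                if (nblocks)
                        memcpy(iv, dst + (nblocks - 1) * AES_BLOCK_SIZE,
                               AES_BLOCK_SIZE);
        }
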
| * crypto: aesni - Fix failure when built-in with modular pcbc (Herbert Xu, 2016-12-30; 1 file, -1/+2)

    If aesni is built-in but pcbc is built as a module, then aesni will
    fail completely, because the pcbc template is not available when
    aesni tries to register the pcbc variant of aes.

    This patch fixes the problem by modifying the pcbc presence test so
    that if aesni is built-in, then pcbc must also be built-in for it to
    be used by aesni.

    Fixes: 85671860caac ("crypto: aesni - Convert to skcipher")
    Reported-by: Stephan Müller <smueller@chronox.de>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

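    One way to express this "built-in requires built-in" presence test
    is the kernel's IS_REACHABLE() macro; this is an illustration of the
    semantics described above, not necessarily the exact hunk from the
    patch:

        #include <linux/kconfig.h>

        /*
         * IS_REACHABLE(CONFIG_CRYPTO_PCBC) is true when pcbc is built-in,
         * or when both pcbc and the enclosing code are modules. It is
         * false when this code is built-in but pcbc is only a module,
         * which is exactly the case the patch has to exclude.
         */
        static bool aesni_can_use_pcbc(void)
        {
                return IS_REACHABLE(CONFIG_CRYPTO_PCBC);
        }
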
* | crypto: x86 - make constants readonly, allow linker to merge them (Denys Vlasenko, 2017-01-23; 33 files, -74/+229)

  A lot of asm-optimized routines in arch/x86/crypto/ keep their
  constants in .data. This is wrong; they should be in .rodata.

  Many of these constants are the same in different modules. For
  example, the 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F
  exists in at least half a dozen places. There is a way to let the
  linker merge them and use just one copy. The rules are as follows:
  mergeable objects of different sizes should not share sections. You
  can't put them all in one .rodata section; they will lose
  "mergeability". GCC puts its mergeable constants in ".rodata.cstSIZE"
  sections, or ".rodata.cstSIZE.<object_name>" if -fdata-sections is
  used. This patch does the same:

      .section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16

  It is important that all data in such a section consists of 16-byte
  elements, not larger ones, and that there is no implicit use of one
  element from another. When this is not the case, use a non-mergeable
  section:

      .section .rodata[.VAR_NAME], "a", @progbits

  This reduces .data by ~15 kbytes:

          text     data      bss       dec     hex filename
      11097415  2705840  2630712  16433967  fac32f vmlinux-prev.o
      11112095  2690672  2630712  16433479  fac147 vmlinux.o

  Merged objects are visible in System.map:

      ffffffff81a28810 r POLY
      ffffffff81a28810 r POLY
      ffffffff81a28820 r TWOONE
      ffffffff81a28820 r TWOONE
      ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK <- merged regardless of
      ffffffff81a28830 r SHUF_MASK   <------------- the name difference
      ffffffff81a28830 r SHUF_MASK
      ..
      ffffffff81a28d00 r K512 <- merged three identical 640-byte tables
      ffffffff81a28d00 r K512
      ffffffff81a28d00 r K512

  Use of object names in section name suffixes is not strictly
  necessary, but might help if someday the link stage uses garbage
  collection to eliminate unused sections (ld --gc-sections).

  Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
  CC: Herbert Xu <herbert@gondor.apana.org.au>
  CC: Josh Poimboeuf <jpoimboe@redhat.com>
  CC: Xiaodong Liu <xiaodong.liu@intel.com>
  CC: Megha Dey <megha.dey@intel.com>
  CC: linux-crypto@vger.kernel.org
  CC: x86@kernel.org
  CC: linux-kernel@vger.kernel.org
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* | crypto: x86/crc32c - fix %progbits -> @progbits (Denys Vlasenko, 2017-01-23; 1 file, -1/+1)

  The %progbits form is used on ARM (where @ is a comment char). x86
  consistently uses @progbits everywhere else.

  Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
  CC: Herbert Xu <herbert@gondor.apana.org.au>
  CC: Josh Poimboeuf <jpoimboe@redhat.com>
  CC: Xiaodong Liu <xiaodong.liu@intel.com>
  CC: Megha Dey <megha.dey@intel.com>
  CC: George Spelvin <linux@horizon.com>
  CC: linux-crypto@vger.kernel.org
  CC: x86@kernel.org
  CC: linux-kernel@vger.kernel.org
  Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* | crypto: arm/aes-neonbs - fix issue with v2.22 and older assembler (Ard Biesheuvel, 2017-01-23; 1 file, -4/+4)

  The GNU assembler for ARM version 2.22 or older fails to infer the
  element size from the vmov instructions, and aborts the build in the
  following way:

      .../aes-neonbs-core.S: Assembler messages:
      .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10'
      .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9'
      .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8'
      .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7'
      .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10'
      .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9'
      .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8'
      .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7'

  Fix this by setting the element size explicitly, by replacing vmov
  with vmov.32.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* | crypto: arm/aes - avoid reserved 'tt' mnemonic in asm code (Ard Biesheuvel, 2017-01-13; 1 file, -5/+5)

  The ARMv8-M architecture introduces 'tt' and 'ttt' instructions, which
  means we can no longer use 'tt' as a register alias on recent versions
  of binutils for ARM. So replace the alias with 'ttab'.

  Fixes: 81edb4262975 ("crypto: arm/aes - replace scalar AES cipher")
  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* | crypto: arm/aes - replace bit-sliced OpenSSL NEON code (Ard Biesheuvel, 2017-01-13; 9 files, -6499/+1429)

  This replaces the unwieldy generated implementation of bit-sliced AES
  in CBC/CTR/XTS modes that originated in the OpenSSL project with a new
  version that is heavily based on the OpenSSL implementation, but has a
  number of advantages over the old version:

  - it does not rely on the scalar AES cipher that also originated in
    the OpenSSL project and contains redundant lookup tables and key
    schedule generation routines (which we already have in
    crypto/aes_generic)
  - it uses the same expanded key schedule for encryption and
    decryption, reducing the size of the per-key data structure by 1696
    bytes
  - it adds an implementation of AES in ECB mode, which can be wrapped
    by other generic chaining mode implementations
  - it moves the handling of corner cases that are non-critical to
    performance to the glue layer written in C
  - it was written directly in assembler rather than generated from a
    Perl script

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* | crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64 (Ard Biesheuvel, 2017-01-12; 4 files, -0/+1393)

  This is a reimplementation of the NEON version of the bit-sliced AES
  algorithm. This code is heavily based on Andy Polyakov's OpenSSL
  version for ARM, which is also available in the kernel. This is an
  alternative to the existing NEON implementation for arm64 authored by
  me, which suffers from poor performance due to its reliance on the
  pathologically slow four register variant of the tbl/tbx NEON
  instruction.

  This version is about ~30% (*) faster than the generic C code, but
  only in cases where the input can be 8x interleaved (this is a
  fundamental property of bit slicing). For this reason, only the
  chaining modes ECB, XTS and CTR are implemented. (The significance of
  ECB is that it could potentially be used by other chaining modes.)

  * Measured on Cortex-A57. Note that this is still an order of
    magnitude slower than the implementations that use the dedicated AES
    instructions introduced in ARMv8, but those are part of an optional
    extension, and so it is good to have a fallback.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* | crypto: arm/aes - replace scalar AES cipher (Ard Biesheuvel, 2017-01-12; 5 files, -119/+256)

  This replaces the scalar AES cipher that originates in the OpenSSL
  project with a new implementation that is ~15% (*) faster (on modern
  cores), and reuses the lookup tables and the key schedule generation
  routines from the generic C implementation (which is usually compiled
  in anyway due to networking and other subsystems depending on it).

  Note that the bit sliced NEON code for AES still depends on the scalar
  cipher that this patch replaces, so it is not removed entirely yet.

  * On Cortex-A57, performance improves from 17.0 to 14.9 cycles per
    byte for 128-bit keys.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* | crypto: arm64/aes - add scalar implementation (Ard Biesheuvel, 2017-01-12; 4 files, -0/+203)

  This adds a scalar implementation of AES, based on the precomputed
  tables that are exposed by the generic AES code. Since rotates are
  cheap on arm64, this implementation only uses the 4 core tables (of
  1 KB each), and avoids the prerotated ones, reducing the D-cache
  footprint by 75%.

  On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34%
  faster than the generic C code. (Note that this is still >13x slower
  than the code that uses the optional ARMv8 Crypto Extensions, which
  manages <1 cycle per byte.)

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

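  A hedged C illustration of the single-table-plus-rotations idea (the
  real implementation is arm64 assembler; crypto_ft_tab stands in for
  the generic driver's forward table, and the rotation amounts are
  illustrative):

      #include <linux/bitops.h>   /* rol32() */
      #include <linux/types.h>

      /* forward table exposed by the generic AES driver */
      extern const u32 crypto_ft_tab[4][256];

      /* one round column: instead of four pre-rotated 1 KB tables, use
       * table 0 and rotate, since rotates are effectively free on arm64 */
      static u32 aes_round_column(u8 b0, u8 b1, u8 b2, u8 b3)
      {
              return crypto_ft_tab[0][b0] ^
                     rol32(crypto_ft_tab[0][b1], 8) ^
                     rol32(crypto_ft_tab[0][b2], 16) ^
                     rol32(crypto_ft_tab[0][b3], 24);
      }
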
* | crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as well (Ard Biesheuvel, 2017-01-12; 1 file, -2/+23)

  In addition to wrapping the AES-CTR cipher into the async SIMD
  wrapper, which exposes it as an async skcipher that defers processing
  to process context, expose our AES-CTR implementation directly as a
  synchronous cipher as well, but with a lower priority.

  This makes the AES-CTR transform usable in places where synchronous
  transforms are required, such as the MAC802.11 encryption code, which
  executes in softirq context, where SIMD processing is allowed on
  arm64. Users of the async transform will keep the existing behavior.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

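  A hedged sketch of registering the same implementation twice via the
  skcipher API, once async behind the SIMD helper and once synchronous
  at a lower priority (driver name, priority value, and omitted fields
  are illustrative):

      #include <crypto/internal/skcipher.h>

      /* the synchronous variant: same encrypt/decrypt handlers, but no
       * CRYPTO_ALG_ASYNC flag and a lower cra_priority, so it is picked
       * only when the caller insists on a synchronous transform */
      static struct skcipher_alg ctr_sync_alg = {
              .base = {
                      .cra_name        = "ctr(aes)",
                      .cra_driver_name = "ctr-aes-neon-sync", /* illustrative */
                      .cra_priority    = 299,  /* below the async wrapper */
                      .cra_blocksize   = 1,
              },
              /* .setkey/.encrypt/.decrypt would point at the NEON routines */
      };
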
* | crypto: arm/chacha20 - implement NEON version based on SSE3 code (Ard Biesheuvel, 2017-01-12; 4 files, -0/+659)

  This is a straight port to ARM/NEON of the x86 SSE3 implementation of
  the ChaCha20 stream cipher. It uses the new skcipher walksize
  attribute to process the input in strides of 4x the block size.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* | crypto: arm64/chacha20 - implement NEON version based on SSE3 code (Ard Biesheuvel, 2017-01-12; 4 files, -0/+586)

  This is a straight port to arm64/NEON of the x86 SSE3 implementation
  of the ChaCha20 stream cipher. It uses the new skcipher walksize
  attribute to process the input in strides of 4x the block size.

  Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* | crypto: x86/chacha20 - Manually align stack buffer (Herbert Xu, 2017-01-12; 1 file, -1/+4)

  The kernel on x86-64 cannot use the gcc align attribute to align a
  stack buffer to a 16-byte boundary. This patch reverts to the old way
  of aligning it by hand.

  Fixes: 9ae433bc79f9 ("crypto: chacha20 - convert generic and...")
  Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

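  A minimal sketch of aligning a stack buffer by hand with the kernel's
  PTR_ALIGN() macro (buffer name and size constant are illustrative, not
  the exact hunk from the patch):

      #include <linux/kernel.h>   /* PTR_ALIGN() */
      #include <linux/types.h>

      #define CHACHA20_STATE_SIZE (16 * sizeof(u32))

      static void chacha20_do_block(void)
      {
              /* over-allocate, then round the pointer up to 16 bytes;
               * attribute((aligned(16))) cannot be trusted for stack
               * variables on x86-64 kernels */
              u8 buf[CHACHA20_STATE_SIZE + 15];
              u32 *state = (u32 *)PTR_ALIGN(buf, 16);

              /* ... use state[] with loads that need 16-byte alignment ... */
              (void)state;
      }
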
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux (Herbert Xu, 2017-01-12; 80 files, -373/+477)

  Merging 4.10-rc3 so that the cryptodev tree builds on ARM64.

| * Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds, 2017-01-07; 4 files, -15/+17)

    Pull KVM fixes from Radim Krčmář:

     MIPS:
      - fix host kernel crashes when receiving a signal with 64-bit
        userspace
      - flush instruction cache on all vcpus after generating entry code
        (both for stable)

     x86:
      - fix NULL dereference in MMU caused by SMM transitions (for
        stable)
      - correct guest instruction pointer after emulating some VMX
        errors
      - minor cleanup

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
      KVM: VMX: remove duplicated declaration
      KVM: MIPS: Flush KVM entry code from icache globally
      KVM: MIPS: Don't clobber CP0_Status.UX
      KVM: x86: reset MMU on KVM_SET_VCPU_EVENTS
      KVM: nVMX: fix instruction skipping during emulated vm-entry

| | * KVM: VMX: remove duplicated declaration (Jan Dakinevich, 2017-01-05; 1 file, -6/+0)

      The declaration of VMX_VPID_EXTENT_SUPPORTED_MASK occurs twice in
      the code. It probably crept in via an unsuccessful merge.

      Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

| | * KVM: MIPS: Flush KVM entry code from icache globally (James Hogan, 2017-01-05; 1 file, -2/+2)

      Flush the KVM entry code from the icache on all CPUs, not just the
      one that built the entry code.

      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.16.x-
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

| | * KVM: MIPS: Don't clobber CP0_Status.UX (James Hogan, 2017-01-05; 1 file, -1/+4)

      On 64-bit kernels, MIPS KVM will clear CP0_Status.UX to prevent
      the guest (running in user mode) from accessing the 64-bit memory
      segments. However the previous value of CP0_Status.UX is never
      restored when exiting from the guest.

      If the user process uses 64-bit addressing (the n64 ABI) this can
      result in address error exceptions from the kernel if it needs to
      deliver a signal before returning to user mode, as the kernel will
      need to write a sigframe to high user addresses on the user stack
      which are disallowed by CP0_Status.UX=0.

      This is fixed by explicitly setting SX and UX again when exiting
      from the guest, and explicitly clearing those bits when returning
      to the guest. Having the SX and UX bits set when handling guest
      exits (rather than only when exiting to userland) will be helpful
      when we support VZ, since we shouldn't need to directly read or
      write guest memory, so it will be valid for cache management IPIs
      to access host user addresses.

      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 4.8.x-
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

| | * KVM: x86: reset MMU on KVM_SET_VCPU_EVENTS (Xiao Guangrong, 2016-12-24; 1 file, -2/+7)

      Otherwise, a mismatch between the smm bit in hflags and the MMU
      role can cause a NULL pointer dereference.

      Cc: stable@vger.kernel.org
      Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

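      A hedged sketch of the guard described above, assuming the x86 KVM
      internal headers (HF_SMM_MASK, kvm_mmu_reset_context); treat the
      exact condition and helper name as illustrative:

          /* if userspace toggles the SMM flag via KVM_SET_VCPU_EVENTS,
           * the cached MMU role no longer matches hflags, so rebuild the
           * MMU context instead of using stale role-indexed state */
          static void set_events_update_smm(struct kvm_vcpu *vcpu, bool smm)
          {
                  unsigned long old = vcpu->arch.hflags;

                  if (smm)
                          vcpu->arch.hflags |= HF_SMM_MASK;
                  else
                          vcpu->arch.hflags &= ~HF_SMM_MASK;

                  if ((vcpu->arch.hflags ^ old) & HF_SMM_MASK)
                          kvm_mmu_reset_context(vcpu);
          }
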
| | * KVM: nVMX: fix instruction skipping during emulated vm-entry (David Matlack, 2016-12-21; 1 file, -4/+4)

      kvm_skip_emulated_instruction() should not be called after
      emulating a VM-entry failure during or after loading guest state
      (nested_vmx_entry_failure()). Otherwise the L1 hypervisor is
      resumed some number of bytes past vmcs->host_rip.

      Fixes: eb2775621701e6ee3ea2a474437d04e93ccdcb2f
      Signed-off-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux (Linus Torvalds, 2017-01-07; 2 files, -5/+13)

    Pull arm64 fixes from Catalin Marinas:

     - re-introduce the arm64 get_current() optimisation
     - KERN_CONT fallout fix in show_pte()

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
      arm64: restore get_current() optimisation
      arm64: mm: fix show_pte KERN_CONT fallout

| | * arm64: restore get_current() optimisation (Mark Rutland, 2017-01-04; 1 file, -1/+9)

      Commit c02433dd6de32f04 ("arm64: split thread_info from task
      stack") inverted the relationship between get_current() and
      current_thread_info(), with sp_el0 now holding the current
      task_struct rather than the current thread_info. The new
      implementation of get_current() prevents the compiler from being
      able to optimize repeated calls to either, resulting in a
      noticeable penalty in some microbenchmarks.

      This patch restores the previous optimisation by implementing
      get_current() in the same way as our old current_thread_info(),
      using a non-volatile asm statement.

      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

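      The restored implementation is, in essence, a non-volatile inline
      asm read of sp_el0; a sketch based on the commit description (the
      in-tree code may differ in detail):

          #include <linux/compiler.h>

          struct task_struct;

          /* arm64: sp_el0 holds the current task_struct pointer; a plain
           * (non-volatile) asm lets the compiler CSE repeated calls */
          static __always_inline struct task_struct *get_current(void)
          {
                  unsigned long sp_el0;

                  asm ("mrs %0, sp_el0" : "=r" (sp_el0));
                  return (struct task_struct *)sp_el0;
          }
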
| | * arm64: mm: fix show_pte KERN_CONT fallout (Mark Rutland, 2017-01-04; 1 file, -4/+4)

      Recent changes made KERN_CONT mandatory for continued lines. In
      the absence of KERN_CONT, a newline may be implicitly inserted by
      the core printk code.

      In show_pte, we (erroneously) use printk without KERN_CONT for
      continued prints, resulting in output being split across a number
      of lines, and not matching the intended output, e.g.

          [ff000000000000] *pgd=00000009f511b003 , *pud=00000009f4a80003 , *pmd=0000000000000000

      Fix this by using pr_cont() for all the continuations.

      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

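      An illustration of the printk continuation rule (a sketch, not the
      exact show_pte hunk; values are placeholders):

          #include <linux/printk.h>

          static void show_pte_entry(unsigned long pgd, unsigned long pud)
          {
                  /* without KERN_CONT / pr_cont(), each printk call may
                   * start a new line in the log */
                  pr_alert("[%016lx] *pgd=%016lx", pgd, pgd);
                  pr_cont(", *pud=%016lx", pud);
                  pr_cont("\n");
          }
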
| * Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb (Linus Torvalds, 2017-01-06; 4 files, -7/+7)

    Pull swiotlb fixes from Konrad Rzeszutek Wilk:
     "This has one fix to make i915 work when using Xen SWIOTLB, and a
      feature from Geert to aid in debugging of devices that can't do
      DMA outside the 32-bit address space.

      The feature from Geert is on top of the v4.10 merge window commit
      (specifically you pulling my previous branch), as his changes were
      dependent on the Documentation/ movement patches. I figured it
      would just be easier than trying to cherry-pick the Documentation
      patches to satisfy git.

      The patches have been soaking since 12/20, though I updated the
      last patch due to linux-next catching a compiler error and adding
      a Tested-and-Reported-by tag"

    * 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
      swiotlb: Export swiotlb_max_segment to users
      swiotlb: Add swiotlb=noforce debug option
      swiotlb: Convert swiotlb_force from int to enum
      x86, swiotlb: Simplify pci_swiotlb_detect_override()

| | * swiotlb: Convert swiotlb_force from int to enum (Geert Uytterhoeven, 2016-12-19; 4 files, -4/+6)

      Convert the flag swiotlb_force from an int to an enum, to prepare
      for the advent of more possible values.

      Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

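      The replacement enum looks roughly like this (value names match
      the mainline kernel; treat the comments as a sketch):

          /* was: int swiotlb_force, with implicit 0/1 semantics */
          enum swiotlb_force {
                  SWIOTLB_NORMAL,      /* default: bounce only when needed */
                  SWIOTLB_FORCE,       /* swiotlb=force: bounce everything */
                  SWIOTLB_NO_FORCE,    /* swiotlb=noforce (added in the
                                        * companion patch): keep it off */
          };

          extern enum swiotlb_force swiotlb_force;
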
| | * x86, swiotlb: Simplify pci_swiotlb_detect_override() (Geert Uytterhoeven, 2016-12-19; 1 file, -3/+1)

      At the end of the function, the local variable use_swiotlb always
      has the same value as the global variable swiotlb. Hence drop the
      local variable completely.

      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

| * Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (Linus Torvalds, 2017-01-05; 52 files, -263/+314)

    Pull ARM SoC fixes from Arnd Bergmann:
     "This is a rather large set of bugfixes, as we just returned from
      the Christmas break. Most of these are relatively unimportant
      fixes for regressions introduced during the merge window, and
      about half of the changes are for mach-omap2. A couple of patches
      are just cleanups and dead code removal that I would not normally
      have considered for merging after -rc2, but I decided to take them
      along with the fixes this time.

      Notable fixes include:

       - removing the skeleton.dtsi include broke a number of machines,
         and we have to put empty /chosen nodes back to be able to pass
         kernel command lines as before

       - enabling Samsung platforms no longer hardwires CONFIG_HZ to
         200, as it had been for no good reason for a long time"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (46 commits)
      MAINTAINERS: extend PSCI entry to cover the newly add PSCI checker code
      drivers: psci: annotate timer on stack to silence odebug messages
      ARM64: defconfig: enable DRM_MESON as module
      ARM64: dts: meson-gx: Add Graphic Controller nodes
      ARM64: dts: meson-gxl: fix GPIO include
      ARM: dts: imx6: Disable "weim" node in the dtsi files
      ARM: dts: qcom: apq8064: Add missing scm clock
      ARM: davinci: da8xx: Fix sleeping function called from invalid context
      ARM: davinci: Make __clk_{enable,disable} functions public
      ARM: davinci: da850: don't add emac clock to lookup table twice
      ARM: davinci: da850: fix infinite loop in clk_set_rate()
      ARM: i.MX: remove map_io callback
      ARM: dts: vf610-zii-dev-rev-b: Add missing newline
      ARM: dts: imx6qdl-nitrogen6x: remove duplicate iomux entry
      ARM: dts: imx31: fix AVIC base address
      ARM: dts: am572x-idk: Add gpios property to control PCIE_RESETn
      arm64: dts: vexpress: Support GICC_DIR operations
      ARM: dts: vexpress: Support GICC_DIR operations
      firmware: arm_scpi: fix reading sensor values on pre-1.0 SCPI firmwares
      arm64: dts: msm8996: Add required memory carveouts
      ...

| | * Merge tag 'davinci-fixes-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes (Arnd Bergmann, 2017-01-04; 4 files, -27/+53)

      Pull "DaVinci fixes for v4.10" from Sekhar Nori:

      This pull request contains fixes for the following issues:

      1) Fix two instances of infinite loop occurring in the clock list
         for DA850. This fixes kernel hangs in some instances, and so
         these have been marked for stable kernels.
      2) Fix for sleeping function called from atomic context with the
         USB 2.0 clock management code introduced in the v4.10 merge
         window.

      * tag 'davinci-fixes-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
        ARM: davinci: da8xx: Fix sleeping function called from invalid context
        ARM: davinci: Make __clk_{enable,disable} functions public
        ARM: davinci: da850: don't add emac clock to lookup table twice
        ARM: davinci: da850: fix infinite loop in clk_set_rate()

| | | * ARM: davinci: da8xx: Fix sleeping function called from invalid context (Alexandre Bailon, 2017-01-02; 1 file, -19/+15)

        Every time the usb20 phy is enabled, there is a "sleeping
        function called from invalid context" BUG. In addition, there is
        recursive locking because of the recursive call to clk_enable().

        clk_enable() in arch/arm/mach-davinci/clock.c uses
        spin_lock_irqsave() before invoking the callback
        usb20_phy_clk_enable(). usb20_phy_clk_enable() uses clk_get()
        and clk_prepare_enable(), which may sleep.

        Replace clk_prepare_enable() by davinci_clk_enable(). (See the
        sketch below.)

        Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
        Suggested-by: David Lechner <david@lechnology.com>
        [nsekhar@ti.com: minor commit description adjustment]
        Signed-off-by: Sekhar Nori <nsekhar@ti.com>

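        A hedged sketch of why the callback runs in atomic context
        (struct layout, lock name, and the callback field are
        illustrative stand-ins for the old mach-davinci clock code):

            #include <linux/spinlock.h>

            struct clk {
                    void (*clk_enable)(struct clk *clk); /* platform callback */
            };

            /* mach-davinci clock framework lock (illustrative) */
            static DEFINE_SPINLOCK(clockfw_lock);

            void davinci_clk_enable_locked(struct clk *clk)
            {
                    unsigned long flags;

                    /* the lock is held with IRQs off, so the callback
                     * runs in atomic context and must not sleep: no
                     * clk_get(), no clk_prepare_enable() in there */
                    spin_lock_irqsave(&clockfw_lock, flags);
                    if (clk->clk_enable)
                            clk->clk_enable(clk);
                    spin_unlock_irqrestore(&clockfw_lock, flags);
            }
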
| | | * ARM: davinci: Make __clk_{enable,disable} functions public (Alexandre Bailon, 2017-01-02; 2 files, -6/+8)

        In some cases, there is a need to enable a clock as part of the
        clock enable callback of a different clock. For example, USB 2.0
        PHY clock enable requires USB 2.0 clock to be enabled. In this
        case, it is safe to instead call __clk_enable() since the clock
        framework lock is already taken. Calling clk_enable() causes a
        recursive locking error. A similar case arises in the clock
        disable path.

        To enable such usage, make __clk_{enable,disable} functions
        publicly available outside of clock.c. Also, call them
        davinci_clk_{enable|disable} now to be consistent with how other
        davinci-specific clock functions are named.

        Note that these functions are not exported to drivers. They are
        meant for usage in platform specific clock management code.

        Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
        Suggested-by: David Lechner <david@lechnology.com>
        Signed-off-by: Sekhar Nori <nsekhar@ti.com>

| | | * ARM: davinci: da850: don't add emac clock to lookup table twice (Bartosz Golaszewski, 2017-01-02; 1 file, -1/+11)

        Similarly to the aemif clock, this screws up the linked list of
        clock children. Create a separate clock for mdio inheriting the
        rate from emac_clk.

        Cc: <stable@vger.kernel.org> # 3.12.x-
        Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
        [nsekhar@ti.com: add a comment over mdio_clk to explain its
        existence + commit headline updates]
        Signed-off-by: Sekhar Nori <nsekhar@ti.com>

| | | * ARM: davinci: da850: fix infinite loop in clk_set_rate() (Bartosz Golaszewski, 2017-01-02; 1 file, -1/+19)

        The aemif clock is added twice to the lookup table in da850.c.
        This breaks the children list of pll0_sysclk3, as we're using
        the same list links in struct clk. When calling clk_set_rate(),
        we get stuck in propagate_rate().

        Create a separate clock for nand, inheriting the rate of the
        aemif clock, and retrieve it in the davinci_nand module.

        Cc: <stable@vger.kernel.org> # 4.9.x
        Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
        Signed-off-by: Sekhar Nori <nsekhar@ti.com>

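        A hedged sketch of why registering one struct clk under two
        lookup names corrupts the children list, and the shape of the
        fix (field names follow the old mach-davinci clock framework;
        illustrative only):

            #include <linux/list.h>

            struct clk {
                    struct clk *parent;
                    struct list_head children;  /* this clock's children */
                    struct list_head childnode; /* ONE set of links into the
                                                 * parent's children list */
            };

            /*
             * Registering the same struct clk under two lookup-table
             * names runs list_add(&clk->childnode, ...) twice on the
             * same links, so the parent's children list loops back on
             * itself and propagate_rate() never terminates. The fix:
             * give NAND its own clk that merely inherits the aemif
             * clock's rate, so each clk node is registered exactly once.
             */
            static struct clk aemif_clk;
            static struct clk nand_clk = {
                    .parent = &aemif_clk, /* separate node, same rate */
            };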