Commit log: subject (author, date, files changed, lines removed/added)

* crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper (Ard Biesheuvel, 2021-01-08, 3 files, -145/+356)

The AES-NI driver implements XTS via the glue helper, which consumes a struct with sets of function pointers which are invoked on chunks of input data of the appropriate size, as annotated in the struct. Let's get rid of this indirection, so that we can perform direct calls to the assembler helpers. Instead, let's adopt the arm64 strategy, i.e., provide a helper which can consume inputs of any size, provided that the penultimate, full block is passed via the last call if ciphertext stealing needs to be applied.

This also allows us to enable the XTS mode for i386.

Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: x86/aes-ni-xts - use direct calls to and 4-way stride (Ard Biesheuvel, 2021-01-08, 2 files, -56/+84)

The XTS asm helper arrangement is a bit odd: the 8-way stride helper consists of back-to-back calls to the 4-way core transforms, which are called indirectly, based on a boolean that indicates whether we are performing encryption or decryption.

Given how costly indirect calls are on x86, let's switch to direct calls, and given how the 8-way stride doesn't really add anything substantial, use a 4-way stride instead, and make the asm core routine deal with any multiple of 4 blocks. Since 512 byte sectors or 4 KB blocks are the typical quantities XTS operates on, increase the stride exported to the glue helper to 512 bytes as well.

As a result, the number of indirect calls is reduced from 3 per 64 bytes of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup when operating on 1 KB blocks (measured on an Intel(R) Core(TM) i7-8650U CPU).

Fixes: 9697fa39efd3f ("x86/retpoline/crypto: Convert crypto assembler indirect jumps")
Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: picoxcell - Remove PicoXcell driver (Rob Herring, 2021-01-02, 4 files, -1941/+0)

PicoXcell has had nothing but treewide cleanups for at least the last 8 years and no signs of activity. The most recent activity is a yocto vendor kernel based on v3.0 in 2015.

Cc: Jamie Iles <jamie@jamieiles.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm/blake2b - add NEON-accelerated BLAKE2b (Eric Biggers, 2021-01-02, 4 files, -0/+464)

Add a NEON-accelerated implementation of BLAKE2b.

On Cortex-A7 (which these days is the most common ARM processor that doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as SHA-256, and slightly faster than SHA-1. It is also almost three times as fast as the generic implementation of BLAKE2b:

    Algorithm            Cycles per byte (on 4096-byte messages)
    ===================  =======================================
    blake2b-256-neon     14.0
    sha1-neon            16.3
    blake2s-256-arm      18.8
    sha1-asm             20.8
    blake2s-256-generic  26.0
    sha256-neon          28.9
    sha256-asm           32.0
    blake2b-256-generic  38.9

This implementation isn't directly based on any other implementation, but it borrows some ideas from previous NEON code I've written as well as from chacha-neon-core.S. At least on Cortex-A7, it is faster than the other NEON implementations of BLAKE2b I'm aware of (the implementation in the BLAKE2 official repository using intrinsics, and Andrew Moon's implementation which can be found in SUPERCOP). It does only one block at a time, so it performs well on short messages too.

NEON-accelerated BLAKE2b is useful because there is interest in using BLAKE2b-256 for dm-verity on low-end Android devices (specifically, devices that lack the ARMv8 Crypto Extensions) to replace SHA-1. On these devices, the performance cost of upgrading to SHA-256 may be unacceptable, whereas BLAKE2b-256 would actually improve performance.

Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which is intended for 32-bit platforms), on 32-bit ARM processors with NEON, BLAKE2b is actually faster than BLAKE2s. This is because NEON supports 64-bit operations, and because BLAKE2s's block size is too small for NEON to be helpful for it. The best I've been able to do with BLAKE2s on Cortex-A7 is 18.8 cpb with an optimized scalar implementation. (I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but they're more complex as they require running multiple hashes at once. Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7, so I expect that any speedup from BLAKE2sp or BLAKE3 would come only from the smaller number of rounds, not from the extra parallelism.)

For now this BLAKE2b implementation is only wired up to the shash API, since there is no library API for BLAKE2b yet. However, I've tried to keep things consistent with BLAKE2s, e.g. by defining blake2b_compress_arch() which is analogous to blake2s_compress_arch() and could be exported for use by the library API later if needed.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: blake2b - update file comment (Eric Biggers, 2021-01-02, 1 file, -13/+10)

The file comment for blake2b_generic.c makes it sound like it's the reference implementation of BLAKE2b with only minor changes. But it's actually been changed a lot. Update the comment to make this clearer.

Reviewed-by: David Sterba <dsterba@suse.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: blake2b - sync with blake2s implementation (Eric Biggers, 2021-01-02, 3 files, -178/+230)

Sync the BLAKE2b code with the BLAKE2s code as much as possible:

- Move a lot of code into new headers <crypto/blake2b.h> and <crypto/internal/blake2b.h>, and adjust it to be like the corresponding BLAKE2s code, i.e. like <crypto/blake2s.h> and <crypto/internal/blake2s.h>.
- Rename constants, e.g. BLAKE2B_*_DIGEST_SIZE => BLAKE2B_*_HASH_SIZE.
- Use a macro BLAKE2B_ALG() to define the shash_alg structs.
- Export blake2b_compress_generic() for use as a fallback.

This makes it much easier to add optimized implementations of BLAKE2b, as optimized implementations can use the helper functions crypto_blake2b_{setkey,init,update,final}() and blake2b_compress_generic(). The ARM implementation will use these.

But this change is also helpful because it eliminates unnecessary differences between the BLAKE2b and BLAKE2s code, so that the same improvements can easily be made to both. (The two algorithms are basically identical, except for the word size and constants.) It also makes it straightforward to add a library API for BLAKE2b in the future if/when it's needed.

This change does make the BLAKE2b code slightly more complicated than it needs to be, as it doesn't actually provide a library API yet. For example, __blake2b_update() doesn't really need to exist yet; it could just be inlined into crypto_blake2b_update(). But I believe this is outweighed by the benefits of keeping the code in sync.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* wireguard: Kconfig: select CRYPTO_BLAKE2S_ARM (Eric Biggers, 2021-01-02, 1 file, -0/+1)

When available, select the new implementation of BLAKE2s for 32-bit ARM. This is faster than the generic C implementation.

Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm/blake2s - add ARM scalar optimized BLAKE2s (Eric Biggers, 2021-01-02, 4 files, -0/+374)

Add an ARM scalar optimized implementation of BLAKE2s.

NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too small for NEON to help. Each NEON instruction would depend on the previous one, resulting in poor performance. With scalar instructions, on the other hand, we can take advantage of ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an implementation that runs much faster than the C implementation.

Performance results on Cortex-A7 in cycles per byte using the shash API:

    4096-byte messages:
        blake2s-256-arm:     18.8
        blake2s-256-generic: 26.0
    500-byte messages:
        blake2s-256-arm:     20.3
        blake2s-256-generic: 27.9
    100-byte messages:
        blake2s-256-arm:     29.7
        blake2s-256-generic: 39.2
    32-byte messages:
        blake2s-256-arm:     50.6
        blake2s-256-generic: 66.2

Except on very short messages, this is still slower than the NEON implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8, and 76.1 cpb on 4096, 500, 100, and 32-byte messages, respectively. However, optimized BLAKE2s is useful for cases where BLAKE2s is used instead of BLAKE2b, such as WireGuard.

This new implementation is added in the form of a new module blake2s-arm.ko, which is analogous to blake2s-x86_64.ko in that it provides blake2s_compress_arch() for use by the library API as well as optionally registering the algorithms with the shash API.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: blake2s - include <linux/bug.h> instead of <asm/bug.h> (Eric Biggers, 2021-01-02, 1 file, -2/+1)

Address the following checkpatch warning:

    WARNING: Use #include <linux/bug.h> instead of <asm/bug.h>

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: blake2s - adjust include guard naming (Eric Biggers, 2021-01-02, 2 files, -6/+6)

Use the full path in the include guards for the BLAKE2s headers to avoid ambiguity and to match the convention for most files in include/crypto/.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: blake2s - add comment for blake2s_state fields (Eric Biggers, 2021-01-02, 1 file, -0/+1)

The first three fields of 'struct blake2s_state' are used in assembly code, which isn't immediately obvious, so add a comment to this effect.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: blake2s - optimize blake2s initialization (Eric Biggers, 2021-01-02, 2 files, -30/+28)

If no key was provided, then don't waste time initializing the block buffer, as its initial contents won't be used.

Also, make crypto_blake2s_init() and blake2s() call a single internal function __blake2s_init() which treats the key as optional, rather than conditionally calling blake2s_init() or blake2s_init_key(). This reduces the compiled code size, as previously both blake2s_init() and blake2s_init_key() were being inlined into these two callers, except when the key size passed to blake2s() was a compile-time constant.

These optimizations aren't that significant for BLAKE2s. However, the equivalent optimizations will be more significant for BLAKE2b, as everything is twice as big in BLAKE2b. And it's good to keep things consistent rather than making optimizations for BLAKE2b but not BLAKE2s.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: blake2s - share the "shash" API boilerplate code (Eric Biggers, 2021-01-02, 3 files, -139/+76)

Add helper functions for shash implementations of BLAKE2s to include/crypto/internal/blake2s.h, taking advantage of __blake2s_update() and __blake2s_final() that were added by the previous patch to share more code between the library and shash implementations.

crypto_blake2s_setkey() and crypto_blake2s_init() are usable as shash_alg::setkey and shash_alg::init directly, while crypto_blake2s_update() and crypto_blake2s_final() take an extra 'blake2s_compress_t' function pointer parameter. This allows the implementation of the compression function to be overridden, which is the only part that optimized implementations really care about.

The new functions are inline functions (similar to those in sha1_base.h, sha256_base.h, and sm3_base.h) because this avoids needing to add a new module blake2s_helpers.ko, they aren't *too* long, and this avoids indirect calls which are expensive these days. Note that they can't go in blake2s_generic.ko, as that would require selecting CRYPTO_BLAKE2S from CRYPTO_BLAKE2S_X86, which would cause a recursive dependency.

Finally, use these new helper functions in the x86 implementation of BLAKE2s. (This part should be a separate patch, but unfortunately the x86 implementation used the exact same function names like "crypto_blake2s_update()", so it had to be updated at the same time.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

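The wiring described above can be sketched in a few lines: an architecture-specific shash driver reuses the shared inline helpers and only supplies its own compression routine. The helper signatures are paraphrased from the commit text and the function names are assumptions, not quoted from the tree.

    #include <crypto/internal/blake2s.h>
    #include <crypto/internal/hash.h>

    /* Sketch: the per-arch driver only provides the compress function;
     * everything else is the shared helpers the entry above describes. */
    static int blake2s_arch_update(struct shash_desc *desc,
                                   const u8 *in, unsigned int inlen)
    {
            return crypto_blake2s_update(desc, in, inlen, blake2s_compress_arch);
    }

    static int blake2s_arch_final(struct shash_desc *desc, u8 *out)
    {
            return crypto_blake2s_final(desc, out, blake2s_compress_arch);
    }
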
* crypto: blake2s - move update and final logic to internal/blake2s.h (Eric Biggers, 2021-01-02, 2 files, -40/+49)

Move most of blake2s_update() and blake2s_final() into new inline functions __blake2s_update() and __blake2s_final() in include/crypto/internal/blake2s.h so that this logic can be shared by the shash helper functions. This will avoid duplicating this logic between the library and shash implementations.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: blake2s - remove unneeded includes (Eric Biggers, 2021-01-02, 1 file, -2/+0)

It doesn't make sense for the generic implementation of BLAKE2s to include <crypto/internal/simd.h> and <linux/jump_label.h>, as these are things that would only be useful in an architecture-specific implementation. Remove these unnecessary includes.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: x86/blake2s - define shash_alg structs using macros (Eric Biggers, 2021-01-02, 1 file, -61/+23)

The shash_alg structs for the four variants of BLAKE2s are identical except for the algorithm name, driver name, and digest size. So, avoid code duplication by using a macro to define these structs.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: blake2s - define shash_alg structs using macros (Eric Biggers, 2021-01-02, 1 file, -61/+27)

The shash_alg structs for the four variants of BLAKE2s are identical except for the algorithm name, driver name, and digest size. So, avoid code duplication by using a macro to define these structs.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

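A hedged sketch of what such a macro can look like for the generic driver (this entry and the x86 one above describe the same pattern). Field values such as the flags, the context types, and the _generic wrapper names are illustrative assumptions, not copied from the tree.

    #define BLAKE2S_ALG(name, driver_name, digest_size)                       \
            {                                                                 \
                    .base.cra_name          = name,                           \
                    .base.cra_driver_name   = driver_name,                    \
                    .base.cra_flags         = CRYPTO_ALG_OPTIONAL_KEY,        \
                    .base.cra_blocksize     = BLAKE2S_BLOCK_SIZE,             \
                    .base.cra_ctxsize       = sizeof(struct blake2s_tfm_ctx), \
                    .base.cra_module        = THIS_MODULE,                    \
                    .digestsize             = digest_size,                    \
                    .setkey                 = crypto_blake2s_setkey,          \
                    .init                   = crypto_blake2s_init,            \
                    .update                 = crypto_blake2s_update_generic,  \
                    .final                  = crypto_blake2s_final_generic,   \
                    .descsize               = sizeof(struct blake2s_state),   \
            }

    /* The four variants then collapse into one short array: */
    static struct shash_alg blake2s_algs[] = {
            BLAKE2S_ALG("blake2s-128", "blake2s-128-generic", BLAKE2S_128_HASH_SIZE),
            BLAKE2S_ALG("blake2s-160", "blake2s-160-generic", BLAKE2S_160_HASH_SIZE),
            BLAKE2S_ALG("blake2s-224", "blake2s-224-generic", BLAKE2S_224_HASH_SIZE),
            BLAKE2S_ALG("blake2s-256", "blake2s-256-generic", BLAKE2S_256_HASH_SIZE),
    };
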
* hwrng: ingenic - Fix a resource leak in an error handling path (Christophe JAILLET, 2021-01-02, 1 file, -1/+5)

In case of error, we should call 'clk_disable_unprepare()' to undo a previous 'clk_prepare_enable()' call, as already done in the remove function.

Fixes: 406346d22278 ("hwrng: ingenic - Add hardware TRNG for Ingenic X1830")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* hwrng: iproc-rng200 - Move enable/disable in separate function (Matthias Brugger, 2021-01-02, 1 file, -19/+16)

We call the same code to enable and disable the block in various parts of the driver. Put that code into a new function to reduce code duplication.

Signed-off-by: Matthias Brugger <mbrugger@suse.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* hwrng: iproc-rng200 - Fix disable of the block. (Matthias Brugger, 2021-01-02, 1 file, -3/+0)

When trying to disable the block, we bitwise OR the control register with the value zero. This is confusing, as a bitwise OR with zero has no effect at all. Drop it, as we already clear the enable bit by applying the inverted RNG_RBGEN_MASK.

Signed-off-by: Matthias Brugger <mbrugger@suse.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm64/aes-ctr - improve tail handling (Ard Biesheuvel, 2021-01-02, 2 files, -74/+137)

Counter mode is a stream cipher chaining mode that is typically used with inputs of arbitrary length, so a tail block which is smaller than a full AES block is the rule rather than the exception.

The current ctr(aes) implementation for arm64 always makes a separate call into the assembler routine to process this tail block, which is suboptimal, given that it requires reloading of the AES round keys, and prevents us from handling this tail block using the 5-way stride that we use for better performance on deep pipelines.

So let's update the assembler routine so it can handle any input size, and uses NEON permutation instructions and overlapping loads and stores to handle the tail block. This results in a ~16% speedup for 1420 byte blocks on cores with deep pipelines such as ThunderX2.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled (Ard Biesheuvel, 2021-01-02, 1 file, -2/+2)

Commit 69b6f2e817e5b ("crypto: arm64/aes-neon - limit exposed routines if faster driver is enabled") intended to hide modes from the plain NEON driver that are also implemented by the faster bit sliced NEON one if both are enabled. However, the defined() CPP function does not detect if the bit sliced NEON driver is enabled as a module. So instead, let's use IS_ENABLED() here.

Fixes: 69b6f2e817e5b ("crypto: arm64/aes-neon - limit exposed routines if ...")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

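For context, the difference the fix relies on, in a hedged nutshell (the Kconfig symbol below is an illustrative guess at the bit sliced NEON driver's option, not quoted from the patch): defined(CONFIG_FOO) only sees the built-in case, whereas IS_ENABLED(CONFIG_FOO) is true for both =y and =m.

    #include <linux/kconfig.h>

    /* Assume the bit sliced driver is built as a module (=m): */

    #if defined(CONFIG_CRYPTO_AES_ARM64_BS)
    /* not taken: for =m only CONFIG_CRYPTO_AES_ARM64_BS_MODULE is defined */
    #endif

    #if IS_ENABLED(CONFIG_CRYPTO_AES_ARM64_BS)
    /* taken for both =y and =m, which is what hiding the slower modes needs */
    #endif
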
* MAINTAINERS: Add maintainers for Keem Bay OCS HCU driver (Daniele Alessandrelli, 2021-01-02, 1 file, -0/+11)

Add maintainers for the Intel Keem Bay Offload Crypto Subsystem (OCS) Hash Control Unit (HCU) crypto driver.

Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Acked-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: keembay-ocs-hcu - Add optional support for sha224 (Daniele Alessandrelli, 2021-01-02, 2 files, -0/+75)

Add optional support of sha224 and hmac(sha224).

Co-developed-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: keembay-ocs-hcu - Add HMAC support (Daniele Alessandrelli, 2021-01-02, 4 files, -9/+544)

Add HMAC support to the Keem Bay OCS HCU driver, thus making it provide the following additional transformations:

- hmac(sha256)
- hmac(sha384)
- hmac(sha512)
- hmac(sm3)

The Keem Bay OCS HCU hardware does not allow "context-switch" for HMAC operations, i.e., it does not support computing a partial HMAC, saving its state and then continuing it later. Therefore, full hardware acceleration is provided only when possible (e.g., when crypto_ahash_digest() is called); in all other cases hardware acceleration is only partial (OPAD and IPAD calculation is done in software, while hashing is hardware accelerated).

Co-developed-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: keembay - Add Keem Bay OCS HCU driver (Declan Murphy, 2021-01-02, 5 files, -0/+1632)

Add support for the Hashing Control Unit (HCU) included in the Offload Crypto Subsystem (OCS) of the Intel Keem Bay SoC, thus enabling hardware-accelerated hashing on the Keem Bay SoC for the following algorithms:

- sha256
- sha384
- sha512
- sm3

The driver is composed of two files:

- 'ocs-hcu.c', which interacts with the hardware and abstracts it by providing an API following the usual paradigm used in hashing drivers / libraries (e.g., hash_init(), hash_update(), hash_final(), etc.). NOTE: this API can block and sleep, since completions are used to wait for the HW to complete the hashing.

- 'keembay-ocs-hcu-core.c', which exports the functionality provided by 'ocs-hcu.c' as an ahash crypto driver. The crypto engine is used to provide asynchronous behavior. 'keembay-ocs-hcu-core.c' also takes care of the DMA mapping of the input sg list.

The driver passes crypto manager self-tests, including the extra tests (CRYPTO_MANAGER_EXTRA_TESTS=y).

Signed-off-by: Declan Murphy <declan.murphy@intel.com>
Co-developed-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* dt-bindings: crypto: Add Keem Bay OCS HCU bindings (Declan Murphy, 2021-01-02, 1 file, -0/+46)

Add device-tree bindings for the Intel Keem Bay Offload Crypto Subsystem (OCS) Hashing Control Unit (HCU) crypto driver.

Signed-off-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: sun4i-ss - add SPDX header and remove blank lines (Corentin Labbe, 2021-01-02, 2 files, -3/+1)

This patch fixes some remaining style issues.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: sun4i-ss - enabled stats via debugfs (Corentin Labbe, 2021-01-02, 6 files, -0/+98)

This patch enables access to usage stats for each algorithm.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: sun4i-ss - fix kmap usage (Corentin Labbe, 2021-01-02, 1 file, -44/+65)

With the recent kmap change, some tests which were conditional on CONFIG_DEBUG_HIGHMEM are now enabled by default. This made it possible to detect a problem in sun4i-ss's usage of kmap.

sun4i-ss uses two kmaps via sg_miter (one for input, one for output), but using two kmaps at the same time is hard: "the ordering has to be correct and with sg_miter that's probably hard to get right." (quoting Tglx)

So the easiest solution is to never have two sg_miter/kmap mappings open at the same time. After each use of sg_miter, I store the current index so that sg_miter can be resumed at the right place.

Fixes: 6298e948215f ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

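A hedged sketch of the "one mapping at a time" pattern described above, using the kernel's sg_miter API; the helper, its arguments, and the offset bookkeeping are illustrative, not the driver's actual code.

    #include <linux/scatterlist.h>
    #include <linux/minmax.h>
    #include <linux/string.h>

    /* Copy the next chunk out of an sg list, then fully stop the iterator
     * (dropping its kmap) before any other sg_miter/kmap is opened. */
    static size_t copy_next_chunk(struct scatterlist *sg, unsigned int nents,
                                  size_t *resume_off, void *buf, size_t len)
    {
            struct sg_mapping_iter mi;
            size_t copied = 0;

            sg_miter_start(&mi, sg, nents, SG_MITER_FROM_SG);
            if (sg_miter_skip(&mi, *resume_off) && sg_miter_next(&mi)) {
                    copied = min_t(size_t, mi.length, len);
                    memcpy(buf, mi.addr, copied);
                    *resume_off += copied;          /* remember where to resume */
            }
            sg_miter_stop(&mi);     /* unmap before touching the other sg list */
            return copied;
    }
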
* crypto: sun4i-ss - initialize need_fallback (Corentin Labbe, 2021-01-02, 1 file, -1/+1)

need_fallback is never initialized and seems to always be true at runtime, so all hardware operations are always bypassed.

Fixes: 0ae1f46c55f87 ("crypto: sun4i-ss - fallback when length is not multiple of blocksize")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: sun4i-ss - handle BigEndian for cipher (Corentin Labbe, 2021-01-02, 1 file, -6/+6)

Ciphers produce invalid results on big-endian (BE) systems. The key and IV need to be written in little-endian (LE) byte order.

Fixes: 6298e948215f2 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: sun4i-ss - IV register does not work on A10 and A13 (Corentin Labbe, 2021-01-02, 1 file, -6/+28)

The Allwinner A10 and A13 SoCs have a version of the SS which produces invalid IVs in the IVx registers. Instead of adding a variant for those, let's convert the SS driver to produce the IV directly from the data.

Fixes: 6298e948215f2 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: sun4i-ss - checking sg length is not sufficient (Corentin Labbe, 2021-01-02, 1 file, -2/+2)

The optimized cipher function needs a length that is a multiple of 4 bytes, but it sometimes gets an odd length. This is because SG data can be stored with an offset. So the fix is to also check that the offset is 4-byte aligned; checking the SG length alone is not sufficient.

Fixes: 6298e948215f2 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

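The fix described above boils down to one extra alignment test before taking the optimized path; a minimal sketch (the flag name is borrowed from the neighbouring sun4i-ss entries and is illustrative):

    /* Both the sg length and the sg offset must be 4-byte aligned,
     * otherwise fall back to the generic cipher path. */
    if (!IS_ALIGNED(sg->length, 4) || !IS_ALIGNED(sg->offset, 4))
            need_fallback = true;
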
* crypto: sun4i-ss - linearize buffers content must be kept (Corentin Labbe, 2021-01-02, 2 files, -8/+6)

When running the non-optimized cipher function, the SS produces partially random output. This is due to the linearize buffers being reset after each loop. To preserve stack usage, instead of moving them back to the start of the function, I move them into sun4i_ss_ctx.

Fixes: 8d3bcb9900ca ("crypto: sun4i-ss - reduce stack usage")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: inside-secure - fix platform_get_irq.cocci warnings (Tian Tao, 2021-01-02, 1 file, -4/+1)

Remove dev_err() messages after platform_get_irq*() failures. In drivers/crypto/inside-secure/safexcel.c, line 1161 is redundant because platform_get_irq() already prints an error.

Generated by: scripts/coccinelle/api/platform_get_irq.cocci

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Acked-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: remove cipher routines from public crypto API (Ard Biesheuvel, 2021-01-02, 33 files, -207/+273)

The cipher routines in the crypto API are mostly intended for templates implementing skcipher modes generically in software, and shouldn't be used outside of the crypto subsystem. So move the prototypes and all related definitions to a new header file under include/crypto/internal. Also, let's use the new module namespace feature to move the symbol exports into a new namespace CRYPTO_INTERNAL.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

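For orientation, the namespace move described above follows the standard symbol-namespace mechanism: the core tags its exports with the namespace, and any module that still legitimately needs the internal cipher API must import it explicitly. A generic sketch; the exported symbol shown is one plausible example, not a list taken from the patch.

    /* In the crypto core, when exporting an internal-only symbol: */
    EXPORT_SYMBOL_NS_GPL(crypto_cipher_encrypt_one, CRYPTO_INTERNAL);

    /* In a module that is still allowed to use the internal cipher API: */
    MODULE_IMPORT_NS(CRYPTO_INTERNAL);
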
* chcr_ktls: use AES library for single use cipher (Ard Biesheuvel, 2021-01-02, 2 files, -12/+8)

Allocating a cipher via the crypto API only to free it again after using it to encrypt a single block is unnecessary in cases where the algorithm is known at compile time. So replace this pattern with a call to the AES library.

Cc: Ayush Sawal <ayush.sawal@chelsio.com>
Cc: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Cc: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

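A hedged sketch of the replacement pattern: a one-off block encryption through the AES library instead of a crypto_alloc_cipher()/crypto_free_cipher() round trip. The wrapper function is hypothetical; aes_expandkey(), aes_encrypt(), and memzero_explicit() are existing kernel helpers.

    #include <crypto/aes.h>
    #include <linux/string.h>

    /* Hypothetical helper: encrypt a single 16-byte block with a key
     * that is known at this point in the code. */
    static int encrypt_one_block(const u8 *key, unsigned int keylen,
                                 const u8 in[AES_BLOCK_SIZE],
                                 u8 out[AES_BLOCK_SIZE])
    {
            struct crypto_aes_ctx ctx;
            int err;

            err = aes_expandkey(&ctx, key, keylen);
            if (!err)
                    aes_encrypt(&ctx, out, in);

            memzero_explicit(&ctx, sizeof(ctx));    /* don't leave round keys behind */
            return err;
    }
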
* crypto: ccree - remove unused including <linux/version.h> (Tian Tao, 2021-01-02, 1 file, -1/+0)

Remove the include of <linux/version.h>; it is not needed.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: sahara - Remove unused .id_table support (Fabio Estevam, 2021-01-02, 1 file, -7/+0)

Since 5.10-rc1 i.MX is a devicetree-only platform and the existing .id_table support in this driver was only useful for old non-devicetree platforms. Remove the unused .id_table support.

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: tcrypt - avoid signed overflow in byte count (Ard Biesheuvel, 2021-01-02, 1 file, -10/+10)

The signed long type used for printing the number of bytes processed in tcrypt benchmarks limits the range to -/+ 2 GiB, which is not sufficient to cover the performance of common accelerated ciphers such as AES-NI when benchmarked with sec=1. So switch to u64 instead.

While at it, fix up a missing printk->pr_cont conversion in the AEAD benchmark.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: aesni - implement support for cts(cbc(aes)) (Ard Biesheuvel, 2021-01-02, 2 files, -1/+261)

Follow the same approach as the arm64 driver for implementing a version of AES-NI in CBC mode that supports ciphertext stealing.

This results in a ~2x speed increase for relatively short inputs (less than 256 bytes), which is relevant given that AES-CBC with ciphertext stealing is used for filename encryption in the fscrypt layer. For larger inputs, the speedup is still significant (~25% on decryption, ~6% on encryption).

Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

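A tiny usage sketch for orientation: callers such as fscrypt request the mode by name through the skcipher API, and the accelerated implementation is then selected automatically by priority. Nothing below is specific to the new AES-NI code; it is the generic allocation pattern.

    #include <crypto/skcipher.h>
    #include <linux/err.h>

    static int try_cts_cbc_aes(void)
    {
            struct crypto_skcipher *tfm;

            tfm = crypto_alloc_skcipher("cts(cbc(aes))", 0, 0);
            if (IS_ERR(tfm))
                    return PTR_ERR(tfm);

            /* ... crypto_skcipher_setkey(), skcipher_request_alloc(), etc. ... */

            crypto_free_skcipher(tfm);
            return 0;
    }
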
* MAINTAINERS: crypto: s5p-sss: drop Kamil Konieczny (Krzysztof Kozlowski, 2021-01-02, 4 files, -4/+0)

E-mails to Kamil Konieczny at his Samsung address bounce with 550 (User unknown). Kamil no longer takes care of the Samsung S5P SSS driver, so remove the invalid email address from:

- mailmap,
- bindings maintainer entries,
- the maintainers entry for the S5P Security Subsystem crypto accelerator.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: mediatek - remove obsolete driver (Vic Wu, 2021-01-02, 8 files, -3650/+0)

The crypto mediatek driver has been replaced by the inside-secure driver now. Remove this driver to avoid having duplicate drivers.

Signed-off-by: Vic Wu <vic.wu@mediatek.com>
Acked-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

* crypto: ecdh - avoid buffer overflow in ecdh_set_secret() (Ard Biesheuvel, 2021-01-02, 1 file, -1/+2)

Pavel reports that commit 17858b140bf4 ("crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()") fixes one problem but introduces another: the unconditional memcpy() introduced by that commit may overflow the target buffer if the source data is invalid, which could be the result of intentional tampering.

So check params.key_size explicitly against the size of the target buffer before validating the key further.

Fixes: 17858b140bf4 ("crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()")
Reported-by: Pavel Machek <pavel@denx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

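The shape of the fix, as a hedged fragment from the setkey path; 'params' follows the commit text, while the context struct and its private_key field are assumptions about the implementation's internals.

    /* Reject an attacker-controlled size before the unconditional copy. */
    if (params.key_size > sizeof(ctx->private_key))
            return -EINVAL;

    memcpy(ctx->private_key, params.key, params.key_size);
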
* crypto: arm/chacha-neon - add missing counter increment (Ard Biesheuvel, 2021-01-02, 1 file, -0/+1)

Commit 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block size multiples") refactored the chacha block handling in the glue code in a way that may result in the counter increment being omitted when calling chacha_block_xor_neon() to process a full block. This violates the skcipher API, which requires that the output IV is suitable for handling more input as long as the preceding input has been presented in round multiples of the block size.

Also, the same code is exposed via the chacha library interface, whose callers may actually rely on this increment to occur even for final blocks that are smaller than the chacha block size.

So increment the counter after calling chacha_block_xor_neon().

Fixes: 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block size multiples")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

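The ordering fix reduces to a couple of lines in the glue code; a hedged sketch, with the function name taken from the commit text and the counter assumed to live in word 12 of the ChaCha state (per RFC 7539):

    /* Process one full block, then advance the block counter unconditionally
     * so the returned IV stays valid for any follow-up input. */
    chacha_block_xor_neon(state, dst, src, nrounds);
    state[12]++;
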
* Linux 5.11-rc1 (tag: v5.11-rc1) (Linus Torvalds, 2020-12-28, 1 file, -2/+2)

* proc mountinfo: make splice available again (Linus Torvalds, 2020-12-27, 1 file, -3/+6)

Since commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") we've required that file operation structures explicitly enable splice support, rather than falling back to the default handlers.

Most /proc files use the indirect 'struct proc_ops' to describe their file operations, and were fixed up to support splice earlier in commits 40be821d627c..b24c30c67863, but the mountinfo files interact with the VFS directly using their own 'struct file_operations' and got missed as a result.

This adds the necessary support for splice to work for /proc/*/mountinfo and friends.

Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Reported-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209971
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

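Since the commit cited above, splice only works for file_operations that opt in explicitly; for a seq_file-backed /proc entry that typically means the iter-based read together with the generic splice helper. A hedged sketch, not the literal hunk from this fix; the open/release helpers are assumed names.

    static const struct file_operations mountinfo_fops = {
            .open           = mountinfo_open,               /* assumed helper */
            .read_iter      = seq_read_iter,
            .splice_read    = generic_file_splice_read,
            .llseek         = seq_lseek,
            .release        = mounts_release,               /* assumed helper */
    };
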
* Merge tag 'ntb-5.11' of git://github.com/jonmason/ntb (Linus Torvalds, 2020-12-27, 4 files, -3/+44)

Pull NTB fixes from Jon Mason:
 "Bug fix for IDT NTB and Intel NTB LTR management support"

* tag 'ntb-5.11' of git://github.com/jonmason/ntb:
  ntb: intel: add Intel NTB LTR vendor support for gen4 NTB
  ntb: idt: fix error check in ntb_hw_idt.c

| * ntb: intel: add Intel NTB LTR vendor support for gen4 NTB (Dave Jiang, 2020-12-07, 3 files, -1/+42)

Intel NTB device has custom LTR management that is not compliant with the PCIe standard. Add support to set LTR status triggered by link status change.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>