summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* crypto: fix a memory leak in rsa-kcs1pad's encryption modeDan Aloni2018-09-281-9/+0
| | | | | | | | | | | | The encryption mode of pkcs1pad never uses out_sg and out_buf, so there's no need to allocate the buffer, which presently is not even being freed. CC: Herbert Xu <herbert@gondor.apana.org.au> CC: linux-crypto@vger.kernel.org CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: s5p-sss: Add aes-ctr supportChristoph Manszewski2018-09-281-5/+40
| | | | | | | | | | | | | | | | | | | Add support for aes counter(ctr) block cipher mode of operation for Exynos Hardware. In contrast to ecb and cbc modes, aes-ctr allows encyption/decryption for request sizes not being a multiple of 16(bytes). Hardware requires block sizes being a multiple of 16(bytes). In order to achieve this, copy request source and destination memory, and align it's size to 16. That way hardware processes additional bytes, that are omitted when copying the result back to its original destination. Tested on Odroid-U3 with Exynos 4412 CPU, kernel 4.19-rc2 with crypto run-time self test testmgr. Signed-off-by: Christoph Manszewski <c.manszewski@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Kamil Konieczny <k.konieczny@partner.samsung.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: s5p-sss: Minor code cleanupChristoph Manszewski2018-09-281-37/+17
| | | | | | | | | | | | Modifications in s5p-sss.c: - remove unnecessary 'goto' statements (making code shorter), - change uint_8 and uint_32 to u8 and u32 types (for consistency in the driver and making code shorter), Signed-off-by: Christoph Manszewski <c.manszewski@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Kamil Konieczny <k.konieczny@partner.samsung.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: s5p-sss: Fix Fix argument list alignmentChristoph Manszewski2018-09-281-2/+2
| | | | | | | | | Fix misalignment of continued argument list. Signed-off-by: Christoph Manszewski <c.manszewski@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Kamil Konieczny <k.konieczny@partner.samsung.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: s5p-sss: Fix race in error handlingChristoph Manszewski2018-09-281-5/+7
| | | | | | | | | | | | | | | | | Remove a race condition introduced by error path in functions: s5p_aes_interrupt and s5p_aes_crypt_start. Setting the busy field of struct s5p_aes_dev to false made it possible for s5p_tasklet_cb to change the req field, before s5p_aes_complete was called. Change the first parameter of s5p_aes_complete to struct ablkcipher_request. Before spin_unlock, make a copy of the currently handled request, to ensure s5p_aes_complete function call with the correct request. Signed-off-by: Christoph Manszewski <c.manszewski@samsung.com> Acked-by: Kamil Konieczny <k.konieczny@partner.samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm/crc32 - avoid warning when compiling with ClangStefan Agner2018-09-211-1/+1
| | | | | | | | | | | | | | | | | | | The table id (second) argument to MODULE_DEVICE_TABLE is often referenced otherwise. This is not the case for CPU features. This leads to a warning when building the kernel with Clang: arch/arm/crypto/crc32-ce-glue.c:239:33: warning: variable 'crc32_cpu_feature' is not needed and will not be emitted [-Wunneeded-internal-declaration] static const struct cpu_feature crc32_cpu_feature[] = { ^ Avoid warnings by using __maybe_unused, similar to commit 1f318a8bafcf ("modules: mark __inittest/__exittest as __maybe_unused"). Fixes: 2a9faf8b7e43 ("crypto: arm/crc32 - enable module autoloading based on CPU feature bits") Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* cpufeature: avoid warning when compiling with clangStefan Agner2018-09-211-1/+1
| | | | | | | | | | | | | | | | | | | The table id (second) argument to MODULE_DEVICE_TABLE is often referenced otherwise. This is not the case for CPU features. This leads to warnings when building the kernel with Clang: arch/arm/crypto/aes-ce-glue.c:450:1: warning: variable 'cpu_feature_match_AES' is not needed and will not be emitted [-Wunneeded-internal-declaration] module_cpu_feature_match(AES, aes_init); ^ Avoid warnings by using __maybe_unused, similar to commit 1f318a8bafcf ("modules: mark __inittest/__exittest as __maybe_unused"). Fixes: 67bad2fdb754 ("cpu: add generic support for CPU feature based module autoloading") Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: ccp - Allow SEV firmware to be chosen based on Family and ModelJanakarajan Natarajan2018-09-211-4/+40
| | | | | | | | | | | | | | | | | | During PSP initialization, there is an attempt to update the SEV firmware by looking in /lib/firmware/amd/. Currently, sev.fw is the expected name of the firmware blob. This patch will allow for firmware filenames based on the family and model of the processor. Model specific firmware files are given highest priority. Followed by firmware for a subset of models. Lastly, failing the previous two options, fallback to looking for sev.fw. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: ccp - Fix static checker warningJanakarajan Natarajan2018-09-211-1/+1
| | | | | | | | | | | | | Under certain configuration SEV functions can be defined as no-op. In such a case error can be uninitialized. Initialize the variable to 0. Cc: Dan Carpenter <Dan.Carpenter@oracle.com> Reported-by: Dan Carpenter <Dan.Carpenter@oracle.com> Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: lrw - Do not use auxiliary bufferOndrej Mosnacek2018-09-211-229/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch simplifies the LRW template to recompute the LRW tweaks from scratch in the second pass and thus also removes the need to allocate a dynamic buffer using kmalloc(). As discussed at [1], the use of kmalloc causes deadlocks with dm-crypt. PERFORMANCE MEASUREMENTS (x86_64) Performed using: https://gitlab.com/omos/linux-crypto-bench Crypto driver used: lrw(ecb-aes-aesni) The results show that the new code has about the same performance as the old code. For 512-byte message it seems to be even slightly faster, but that might be just noise. Before: ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns) lrw(aes) 256 64 200 203 lrw(aes) 320 64 202 204 lrw(aes) 384 64 204 205 lrw(aes) 256 512 415 415 lrw(aes) 320 512 432 440 lrw(aes) 384 512 449 451 lrw(aes) 256 4096 1838 1995 lrw(aes) 320 4096 2123 1980 lrw(aes) 384 4096 2100 2119 lrw(aes) 256 16384 7183 6954 lrw(aes) 320 16384 7844 7631 lrw(aes) 384 16384 8256 8126 lrw(aes) 256 32768 14772 14484 lrw(aes) 320 32768 15281 15431 lrw(aes) 384 32768 16469 16293 After: ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns) lrw(aes) 256 64 197 196 lrw(aes) 320 64 200 197 lrw(aes) 384 64 203 199 lrw(aes) 256 512 385 380 lrw(aes) 320 512 401 395 lrw(aes) 384 512 415 415 lrw(aes) 256 4096 1869 1846 lrw(aes) 320 4096 2080 1981 lrw(aes) 384 4096 2160 2109 lrw(aes) 256 16384 7077 7127 lrw(aes) 320 16384 7807 7766 lrw(aes) 384 16384 8108 8357 lrw(aes) 256 32768 14111 14454 lrw(aes) 320 32768 15268 15082 lrw(aes) 384 32768 16581 16250 [1] https://lkml.org/lkml/2018/8/23/1315 Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: lrw - Optimize tweak computationOndrej Mosnacek2018-09-211-24/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch rewrites the tweak computation to a slightly simpler method that performs less bswaps. Based on performance measurements the new code seems to provide slightly better performance than the old one. PERFORMANCE MEASUREMENTS (x86_64) Performed using: https://gitlab.com/omos/linux-crypto-bench Crypto driver used: lrw(ecb-aes-aesni) Before: ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns) lrw(aes) 256 64 204 286 lrw(aes) 320 64 227 203 lrw(aes) 384 64 208 204 lrw(aes) 256 512 441 439 lrw(aes) 320 512 456 455 lrw(aes) 384 512 469 483 lrw(aes) 256 4096 2136 2190 lrw(aes) 320 4096 2161 2213 lrw(aes) 384 4096 2295 2369 lrw(aes) 256 16384 7692 7868 lrw(aes) 320 16384 8230 8691 lrw(aes) 384 16384 8971 8813 lrw(aes) 256 32768 15336 15560 lrw(aes) 320 32768 16410 16346 lrw(aes) 384 32768 18023 17465 After: ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns) lrw(aes) 256 64 200 203 lrw(aes) 320 64 202 204 lrw(aes) 384 64 204 205 lrw(aes) 256 512 415 415 lrw(aes) 320 512 432 440 lrw(aes) 384 512 449 451 lrw(aes) 256 4096 1838 1995 lrw(aes) 320 4096 2123 1980 lrw(aes) 384 4096 2100 2119 lrw(aes) 256 16384 7183 6954 lrw(aes) 320 16384 7844 7631 lrw(aes) 384 16384 8256 8126 lrw(aes) 256 32768 14772 14484 lrw(aes) 320 32768 15281 15431 lrw(aes) 384 32768 16469 16293 Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: testmgr - Add test for LRW counter wrap-aroundOndrej Mosnacek2018-09-211-0/+21
| | | | | | | | | This patch adds a test vector for lrw(aes) that triggers wrap-around of the counter, which is a tricky corner case. Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: lrw - Fix out-of bounds access on counter overflowOndrej Mosnacek2018-09-211-1/+6
| | | | | | | | | | | | When the LRW block counter overflows, the current implementation returns 128 as the index to the precomputed multiplication table, which has 128 entries. This patch fixes it to return the correct value (127). Fixes: 64470f1b8510 ("[CRYPTO] lrw: Liskov Rivest Wagner, a tweakable narrow block cipher mode") Cc: <stable@vger.kernel.org> # 2.6.20+ Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: tcrypt - fix ghash-generic speed testHoria Geantă2018-09-211-0/+3
| | | | | | | | | | | | | | | | ghash is a keyed hash algorithm, thus setkey needs to be called. Otherwise the following error occurs: $ modprobe tcrypt mode=318 sec=1 testing speed of async ghash-generic (ghash-generic) tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): tcrypt: hashing failed ret=-126 Cc: <stable@vger.kernel.org> # 4.6+ Fixes: 0660511c0bee ("crypto: tcrypt - Use ahash") Tested-by: Franck Lenormand <franck.lenormand@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* arm64: defconfig: enable CAAM crypto engine on QorIQ DPAA2 SoCsHoria Geantă2018-09-211-0/+1
| | | | | | | | | | Enable CAAM (Cryptographic Accelerator and Assurance Module) driver for QorIQ Data Path Acceleration Architecture (DPAA) v2. It handles DPSECI (Data Path SEC Interface) DPAA2 objects that sit on the Management Complex (MC) fsl-mc bus. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: caam/qi2 - add support for ahash algorithmsHoria Geantă2018-09-213-1/+1750
| | | | | | | Add support for unkeyed and keyed (hmac) md5, sha algorithms. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: caam - export ahash shared descriptor generationHoria Geantă2018-09-215-70/+114
| | | | | | | | caam/qi2 driver will support ahash algorithms, thus move ahash descriptors generation in a shared location. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: caam/qi2 - add skcipher algorithmsHoria Geantă2018-09-213-1/+582
| | | | | | | | | | | Add support to submit the following skcipher algorithms via the DPSECI backend: cbc({aes,des,des3_ede}) ctr(aes), rfc3686(ctr(aes)) xts(aes) Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: caam/qi2 - add DPAA2-CAAM driverHoria Geantă2018-09-217-17/+3109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add CAAM driver that works using the DPSECI backend, i.e. manages DPSECI DPAA2 objects sitting on the Management Complex (MC) fsl-mc bus. Data transfers (crypto requests) are sent/received to/from CAAM crypto engine via Queue Interface (v2), this being similar to existing caam/qi. OTOH, configuration/setup (obtaining virtual queue IDs, authorization etc.) is done by sending commands to the MC f/w. Note that the CAAM accelerator included in DPAA2 platforms still has Job Rings. However, the driver being added does not handle access via this backend. Kconfig & Makefile are updated such that DPAA2-CAAM (a.k.a. "caam/qi2") driver does not depend on caam/jr or caam/qi backends - which rely on platform bus support (ctrl.c). Support for the following aead and authenc algorithms is also added in this patch: -aead: gcm(aes) rfc4106(gcm(aes)) rfc4543(gcm(aes)) -authenc: authenc(hmac({md5,sha*}),cbc({aes,des,des3_ede})) echainiv(authenc(hmac({md5,sha*}),cbc({aes,des,des3_ede}))) authenc(hmac({md5,sha*}),rfc3686(ctr(aes)) seqiv(authenc(hmac({md5,sha*}),rfc3686(ctr(aes))) Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: caam - add Queue Interface v2 error codesHoria Geantă2018-09-213-4/+79
| | | | | | | | | Add support to translate error codes returned by QI v2, i.e. Queue Interface present on DataPath Acceleration Architecture v2 (DPAA2). Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: caam - add DPAA2-CAAM (DPSECI) backend APIHoria Geantă2018-09-213-0/+908
| | | | | | | | | | Add the low-level API that allows to manage DPSECI DPAA2 objects that sit on the Management Complex (MC) fsl-mc bus. The API is compatible with MC firmware 10.2.0+. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: caam - fix implicit casts in endianness helpersHoria Geantă2018-09-211-14/+14
| | | | | | | | | | | | | | | | | | | Fix the following sparse endianness warnings: drivers/crypto/caam/regs.h:95:1: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __le32unsigned int @@ drivers/crypto/caam/regs.h:95:1: expected unsigned int drivers/crypto/caam/regs.h:95:1: got restricted __le32 [usertype] <noident> drivers/crypto/caam/regs.h:95:1: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __be32unsigned int @@ drivers/crypto/caam/regs.h:95:1: expected unsigned int drivers/crypto/caam/regs.h:95:1: got restricted __be32 [usertype] <noident> drivers/crypto/caam/regs.h:92:1: sparse: cast to restricted __le32 drivers/crypto/caam/regs.h:92:1: sparse: cast to restricted __be32 Fixes: 261ea058f016 ("crypto: caam - handle core endianness != caam endianness") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* soc: fsl: dpio: add congestion notification supportHoria Geantă2018-09-211-0/+15
| | | | | | | | | | Add support for Congestion State Change Notifications (CSCN), which allow DPIO users to be notified when a congestion group changes its state (due to hitting the entrance / exit threshold). Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* soc: fsl: dpio: add frame list format supportHoria Geantă2018-09-211-0/+242
| | | | | | | | | | | | | | | Add support for dpaa2_fd_list format, i.e. dpaa2_fl_entry structure and accessors. Frame list entries (FLEs) are similar, but not identical to FDs: + "F" (final) bit - FMT[b'01] is reserved - DD, SC, DROPP bits (covered by "FD compatibility" field in FLE case) - FLC[5:0] not used for stashing Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* soc: fsl: dpio: add back some frame queue functionsHoria Geantă2018-09-212-0/+62
| | | | | | | | | | This commit adds back functions removed in commit a211c8170b3c ("staging: fsl-mc/dpio: remove couple of unused functions") since dpseci object will make use of them. Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* bus: fsl-mc: add support for dpseci device typeHoria Geantă2018-09-212-0/+11
| | | | | | Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Acked-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: chacha20 - Fix chacha20_block() keystream alignment (again)Eric Biggers2018-09-214-20/+20
| | | | | | | | | | | | | | | | | | | | | In commit 9f480faec58c ("crypto: chacha20 - Fix keystream alignment for chacha20_block()"), I had missed that chacha20_block() can be called directly on the buffer passed to get_random_bytes(), which can have any alignment. So, while my commit didn't break anything, it didn't fully solve the alignment problems. Revert my solution and just update chacha20_block() to use put_unaligned_le32(), so the output buffer need not be aligned. This is simpler, and on many CPUs it's the same speed. But, I kept the 'tmp' buffers in extract_crng_user() and _get_random_bytes() 4-byte aligned, since that alignment is actually needed for _crng_backtrack_protect() too. Reported-by: Stephan Müller <smueller@chronox.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: xts - Drop use of auxiliary bufferOndrej Mosnacek2018-09-211-223/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit acb9b159c784 ("crypto: gf128mul - define gf128mul_x_* in gf128mul.h"), the gf128mul_x_*() functions are very fast and therefore caching the computed XTS tweaks has only negligible advantage over computing them twice. In fact, since the current caching implementation limits the size of the calls to the child ecb(...) algorithm to PAGE_SIZE (usually 4096 B), it is often actually slower than the simple recomputing implementation. This patch simplifies the XTS template to recompute the XTS tweaks from scratch in the second pass and thus also removes the need to allocate a dynamic buffer using kmalloc(). As discussed at [1], the use of kmalloc causes deadlocks with dm-crypt. PERFORMANCE RESULTS I measured time to encrypt/decrypt a memory buffer of varying sizes with xts(ecb-aes-aesni) using a tool I wrote ([2]) and the results suggest that after this patch the performance is either better or comparable for both small and large buffers. Note that there is a lot of noise in the measurements, but the overall difference is easy to see. Old code: ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns) xts(aes) 256 64 331 328 xts(aes) 384 64 332 333 xts(aes) 512 64 338 348 xts(aes) 256 512 889 920 xts(aes) 384 512 1019 993 xts(aes) 512 512 1032 990 xts(aes) 256 4096 2152 2292 xts(aes) 384 4096 2453 2597 xts(aes) 512 4096 3041 2641 xts(aes) 256 16384 9443 8027 xts(aes) 384 16384 8536 8925 xts(aes) 512 16384 9232 9417 xts(aes) 256 32768 16383 14897 xts(aes) 384 32768 17527 16102 xts(aes) 512 32768 18483 17322 New code: ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns) xts(aes) 256 64 328 324 xts(aes) 384 64 324 319 xts(aes) 512 64 320 322 xts(aes) 256 512 476 473 xts(aes) 384 512 509 492 xts(aes) 512 512 531 514 xts(aes) 256 4096 2132 1829 xts(aes) 384 4096 2357 2055 xts(aes) 512 4096 2178 2027 xts(aes) 256 16384 6920 6983 xts(aes) 384 16384 8597 7505 xts(aes) 512 16384 7841 8164 xts(aes) 256 32768 13468 12307 xts(aes) 384 32768 14808 13402 xts(aes) 512 32768 15753 14636 [1] https://lkml.org/lkml/2018/8/23/1315 [2] https://gitlab.com/omos/linux-crypto-bench Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-blk - improve XTS mask handlingArd Biesheuvel2018-09-213-19/+32
| | | | | | | | | | | | | | | | | | | The Crypto Extension instantiation of the aes-modes.S collection of skciphers uses only 15 NEON registers for the round key array, whereas the pure NEON flavor uses 16 NEON registers for the AES S-box. This means we have a spare register available that we can use to hold the XTS mask vector, removing the need to reload it at every iteration of the inner loop. Since the pure NEON version does not permit this optimization, tweak the macros so we can factor out this functionality. Also, replace the literal load with a short sequence to compose the mask vector. On Cortex-A53, this results in a ~4% speedup. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-blk - add support for CTS-CBC modeArd Biesheuvel2018-09-212-1/+243
| | | | | | | | | | | | | Currently, we rely on the generic CTS chaining mode wrapper to instantiate the cts(cbc(aes)) skcipher. Due to the high performance of the ARMv8 Crypto Extensions AES instructions (~1 cycles per byte), any overhead in the chaining mode layers is amplified, and so it pays off considerably to fold the CTS handling into the SIMD routines. On Cortex-A53, this results in a ~50% speedup for smaller input sizes. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-blk - revert NEON yield for skciphersArd Biesheuvel2018-09-211-173/+108
| | | | | | | | | | | | | | The reasoning of commit f10dc56c64bb ("crypto: arm64 - revert NEON yield for fast AEAD implementations") applies equally to skciphers: the walk API already guarantees that the input size of each call into the NEON code is bounded to the size of a page, and so there is no need for an additional TIF_NEED_RESCHED flag check inside the inner loop. So revert the skcipher changes to aes-modes.S (but retain the mac ones) This partially reverts commit 0c8f838a52fe9fd82761861a934f16ef9896b4e5. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-blk - remove pointless (u8 *) castsArd Biesheuvel2018-09-211-24/+23
| | | | | | | | | | For some reason, the asmlinkage prototypes of the NEON routines take u8[] arguments for the round key arrays, while the actual round keys are arrays of u32, and so passing them into those routines requires u8* casts at each occurrence. Fix that. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: cavium/nitrox - use dma_pool_zalloc()Srikanth Jampala2018-09-211-2/+2
| | | | | | | | use dma_pool_zalloc() instead of dma_pool_alloc with __GFP_ZERO flag. crypto dma pool renamed to "nitrox-context". Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Herbert Xu2018-09-216-7/+2
|\ | | | | | | Merge crypto-2.6 to resolve caam conflict with skcipher conversion.
| * crypto: caam/jr - fix ablkcipher_edesc pointer arithmeticHoria Geantă2018-09-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases the zero-length hw_desc array at the end of ablkcipher_edesc struct requires for 4B of tail padding. Due to tail padding and the way pointers to S/G table and IV are computed: edesc->sec4_sg = (void *)edesc + sizeof(struct ablkcipher_edesc) + desc_bytes; iv = (u8 *)edesc->hw_desc + desc_bytes + sec4_sg_bytes; first 4 bytes of IV are overwritten by S/G table. Update computation of pointer to S/G table to rely on offset of hw_desc member and not on sizeof() operator. Cc: <stable@vger.kernel.org> # 4.13+ Fixes: 115957bb3e59 ("crypto: caam - fix IV DMA mapping and updating") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2Ondrej Mosnacek2018-09-145-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | It turns out OSXSAVE needs to be checked only for AVX, not for SSE. Without this patch the affected modules refuse to load on CPUs with SSE2 but without AVX support. Fixes: 877ccce7cbe8 ("crypto: x86/aegis,morus - Fix and simplify CPUID checks") Cc: <stable@vger.kernel.org> # 4.18 Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ccp - add timeout support in the SEV commandBrijesh Singh2018-09-131-5/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the CCP driver assumes that the SEV command issued to the PSP will always return (i.e. it will never hang). But recently, firmware bugs have shown that a command can hang. Since of the SEV commands are used in probe routines, this can cause boot hangs and/or loss of virtualization capabilities. To protect against firmware bugs, add a timeout in the SEV command execution flow. If a command does not complete within the specified timeout then return -ETIMEOUT and stop the driver from executing any further commands since the state of the SEV firmware is unknown. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Gary Hook <Gary.Hook@amd.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: linux-kernel@vger.kernel.org Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: cavium/nitrox - Added support for SR-IOV configuration.Srikanth Jampala2018-09-148-30/+236
| | | | | | | | | | | | | | | | | | | | | | Added support to configure SR-IOV using sysfs interface. Supported VF modes are 16, 32, 64 and 128. Grouped the hardware configuration functions to "nitrox_hal.h" file. Changed driver version to "1.1". Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com> Reviewed-by: Gadam Sreerama <sgadam@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: aesni - don't use GFP_ATOMIC allocation if the request doesn't cross ↵Mikulas Patocka2018-09-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a page in gcm This patch fixes gcmaes_crypt_by_sg so that it won't use memory allocation if the data doesn't cross a page boundary. Authenticated encryption may be used by dm-crypt. If the encryption or decryption fails, it would result in I/O error and filesystem corruption. The function gcmaes_crypt_by_sg is using GFP_ATOMIC allocation that can fail anytime. This patch fixes the logic so that it won't attempt the failing allocation if the data doesn't cross a page boundary. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crc-t10dif: crc_t10dif_mutex can be statickbuild test robot2018-09-141-1/+1
| | | | | | | | | | | | Fixes: b76377543b73 ("crc-t10dif: Pick better transform if one becomes available") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | dm: Remove VLA usage from hashesKees Cook2018-09-142-7/+21
| | | | | | | | | | | | | | | | | | | | | | | | In the quest to remove all stack VLA usage from the kernel[1], this uses the new HASH_MAX_DIGESTSIZE from the crypto layer to allocate the upper bounds on stack usage. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm/chacha20 - faster 8-bit rotations and other optimizationsEric Biggers2018-09-041-134/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimize ChaCha20 NEON performance by: - Implementing the 8-bit rotations using the 'vtbl.8' instruction. - Streamlining the part that adds the original state and XORs the data. - Making some other small tweaks. On ARM Cortex-A7, these optimizations improve ChaCha20 performance from about 12.08 cycles per byte to about 11.37 -- a 5.9% improvement. There is a tradeoff involved with the 'vtbl.8' rotation method since there is at least one CPU (Cortex-A53) where it's not fastest. But it seems to be a better default; see the added comment. Overall, this patch reduces Cortex-A53 performance by less than 0.5%. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crc-t10dif: Allow current transform to be inspected in sysfsMartin K. Petersen2018-09-041-0/+11
| | | | | | | | | | | | | | | | | | Add a way to print the currently active CRC algorithm in: /sys/module/crc_t10dif/parameters/transform Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crc-t10dif: Pick better transform if one becomes availableMartin K. Petersen2018-09-042-2/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | T10 CRC library is linked into the kernel thanks to block and SCSI. The crypto accelerators are typically loaded later as modules and are therefore not available when the T10 CRC library is initialized. Use the crypto notifier facility to trigger a switch to a better algorithm if one becomes available after the initial hash has been registered. Use RCU to protect the original transform while the new one is being set up. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: api - Introduce notifier for new crypto algorithmsMartin K. Petersen2018-09-044-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a facility that can be used to receive a notification callback when a new algorithm becomes available. This can be used by existing crypto registrations to trigger a switch from a software-only algorithm to a hardware-accelerated version. A new CRYPTO_MSG_ALG_LOADED state is introduced to the existing crypto notification chain, and the register/unregister functions are exported so they can be called by subsystems outside of crypto. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/crct10dif - implement non-Crypto Extensions alternativeArd Biesheuvel2018-09-042-2/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit polynomial multiplication instructions, which are optional in the architecture, and if these instructions are not available, we fall back to the C routine which is slow and inefficient. So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that uses a sequence of ~40 instructions involving 8x8 bit PMULL and some shifting and masking. This is a lot slower than the original, but it is still twice as fast as the current [unoptimized] C code on Cortex-A53, and it is time invariant and much easier on the D-cache. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/crct10dif - preparatory refactor for 8x8 PMULL versionArd Biesheuvel2018-09-042-76/+90
| | | | | | | | | | | | | | | | | | Reorganize the CRC-T10DIF asm routine so we can easily instantiate an alternative version based on 8x8 polynomial multiplication in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/crc32 - remove PMULL based CRC32 driverArd Biesheuvel2018-09-045-540/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the scalar fallbacks have been moved out of this driver into the core crc32()/crc32c() routines, we are left with a CRC32 crypto API driver for arm64 that is based only on 64x64 polynomial multiplication, which is an optional instruction in the ARMv8 architecture, and is less and less likely to be available on cores that do not also implement the CRC32 instructions, given that those are mandatory in the architecture as of ARMv8.1. Since the scalar instructions do not require the special handling that SIMD instructions do, and since they turn out to be considerably faster on some cores (Cortex-A53) as well, there is really no point in keeping this code around so let's just remove it. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/aes-modes - get rid of literal load of addend vectorArd Biesheuvel2018-09-041-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the literal load of the addend vector with a sequence that performs each add individually. This sequence is only 2 instructions longer than the original, and 2% faster on Cortex-A53. This is an improvement by itself, but also works around a Clang issue, whose integrated assembler does not implement the GNU ARM asm syntax completely, and does not support the =literal notation for FP registers (more info at https://bugs.llvm.org/show_bug.cgi?id=38642) Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm/ghash-ce - implement support for 4-way aggregationArd Biesheuvel2018-09-043-16/+131
| | | | | | | | | | | | | | | | | | | | | | | | Speed up the GHASH algorithm based on 64-bit polynomial multiplication by adding support for 4-way aggregation. This improves throughput by ~85% on Cortex-A53, from 1.7 cycles per byte to 0.9 cycles per byte. When combined with AES into GCM, throughput improves by ~25%, from 3.8 cycles per byte to 3.0 cycles per byte. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>