path: root/crypto/xor.c
Commit log (each entry: commit message, author, date, files changed, lines removed/added):
* crypto: xor - Remove unused variable count in do_xor_speed (Nathan Chancellor, 2020-10-08, 1 file, -3/+1)
    Clang warns:

        crypto/xor.c:101:4: warning: variable 'count' is uninitialized when used here [-Wuninitialized]
                        count++;
                        ^~~~~
        crypto/xor.c:86:17: note: initialize the variable 'count' to silence this warning
                int i, j, count;
                              ^
                               = 0
        1 warning generated.

    After the refactoring to use ktime that happened in this function, count is
    only assigned, never read. Just remove the variable to get rid of the warning.

    Fixes: c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
    Link: https://github.com/ClangBuiltLinux/linux/issues/1171
    Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Acked-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
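A minimal standalone reproduction of this class of diagnostic (an assumed example, not the kernel code itself): the first write to 'count' is an increment of a never-initialized value, so clang reports -Wuninitialized, and because the result is never read the right fix is deletion rather than initialization.

    /* build with: clang -Wuninitialized -c warn.c */
    static void benchmark(void)
    {
            int i, count;           /* clang: 'count' is uninitialized when used here */

            for (i = 0; i < 800; i++)
                    count++;        /* written on every pass, never read afterwards */

            /* fix: drop 'count' entirely; the loop bound already determines
             * how much work is done, so nothing needs counting */
    }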
* crypto: xor - use ktime for template benchmarking (Ard Biesheuvel, 2020-10-02, 1 file, -22/+16)
    Currently, we use the jiffies counter as a time source, by staring at it
    until a HZ period elapses, and then staring at it again while performing as
    many XOR operations as we can until another HZ period elapses, so that we
    can calculate the throughput. This takes longer than necessary, and depends
    on HZ, which is undesirable, since HZ is system dependent.

    Let's use the ktime interface instead, and use it to time a fixed number of
    XOR operations, which can be done much faster, and makes the time spent
    depend on the performance level of the system itself, which is much more
    reasonable.

    To ensure that we have the resolution we need even on systems with 32 kHz
    time sources, while not spending too much time in the benchmark on a slow
    CPU, let's switch to 3 attempts of 800 repetitions each: that way, we will
    only misidentify algorithms that perform within 10% of each other as the
    fastest if they are faster than 10 GB/s to begin with, which is not expected
    to occur on systems with such coarse clocks.

    On ThunderX2, I get the following results:

    Before:

        [72625.956765] xor: measuring software checksum speed
        [72625.993104]    8regs      : 10169.000 MB/sec
        [72626.033099]    32regs     : 12050.000 MB/sec
        [72626.073095]    arm64_neon : 11100.000 MB/sec
        [72626.073097] xor: using function: 32regs (12050.000 MB/sec)

    After:

        [72599.650216] xor: measuring software checksum speed
        [72599.651188]    8regs      : 10491 MB/sec
        [72599.652006]    32regs     : 12345 MB/sec
        [72599.652871]    arm64_neon : 11402 MB/sec
        [72599.652873] xor: using function: 32regs (12345 MB/sec)

    Link: https://lore.kernel.org/linux-crypto/20200923182230.22715-3-ardb@kernel.org/
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
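The description above maps to roughly the following measurement loop. This is a simplified sketch of the approach, not the exact kernel function, and BENCH_SIZE/REPS values are assumptions: time three attempts of 800 fixed-size XOR passes with ktime_get(), keep the fastest attempt, and convert bytes per nanosecond into MB/s.

    #define BENCH_SIZE      PAGE_SIZE       /* assumed benchmark buffer size */
    #define REPS            800

    static void __init do_xor_speed_sketch(struct xor_block_template *tmpl,
                                           void *b1, void *b2)
    {
            ktime_t min, start, diff;
            int i, j;

            min = (ktime_t)S64_MAX;
            for (i = 0; i < 3; i++) {               /* 3 attempts */
                    start = ktime_get();
                    for (j = 0; j < REPS; j++)      /* fixed amount of work */
                            tmpl->do_2(BENCH_SIZE, b1, b2);
                    diff = ktime_sub(ktime_get(), start);
                    if (diff < min)
                            min = diff;             /* keep the fastest attempt */
            }

            /* bytes per nanosecond == GB/s; scale by 1000 to report MB/s */
            tmpl->speed = (1000 * REPS * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
            pr_info("   %-16s: %5d MB/sec\n", tmpl->name, tmpl->speed);
    }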
* crypto: xor - defer load time benchmark to a later time (Ard Biesheuvel, 2020-10-02, 1 file, -1/+28)
    Currently, the XOR module performs its boot time benchmark at core initcall
    time when it is built-in, to ensure that the RAID code can make use of it
    when it is built-in as well.

    Let's defer this to a later stage during the boot, to avoid impacting the
    overall boot time of the system. Instead, just pick an arbitrary
    implementation from the list, and use that as the preliminary default.

    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
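A rough sketch of the deferral pattern being described (the function names below are illustrative, not necessarily the symbols the patch introduces; calibrate_xor_blocks() and template_list are existing names in crypto/xor.c): register a cheap default at core initcall time so built-in RAID code has something to call, and run the real benchmark from a later initcall.

    static struct xor_block_template *active_template;

    static int __init xor_pick_preliminary_default(void)
    {
            /* cheap: take the first registered implementation as-is */
            active_template = template_list;
            return 0;
    }
    core_initcall(xor_pick_preliminary_default);

    static int __init xor_run_benchmark(void)
    {
            /* expensive: time every template, then switch active_template
             * to the winner */
            calibrate_xor_blocks();
            return 0;
    }
    late_initcall(xor_run_benchmark);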
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 47 (Thomas Gleixner, 2019-05-24, 1 file, -9/+1)
    Based on 1 normalized pattern(s):

        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 or at your option any
        later version you should have received a copy of the gnu general
        public license for example usr src linux copying if not write to the
        free software foundation inc 675 mass ave cambridge ma 02139 usa

    extracted by the scancode license scanner the SPDX license identifier

        GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 20 file(s).

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Allison Randal <allison@lohutok.net>
    Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520170858.552543146@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK (Levin, Alexander (Sasha Levin), 2017-11-16, 1 file, -6/+1)
    Convert all allocations that used a NOTRACK flag to stop using it.

    Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tim Hansen <devtimhansen@gmail.com>
    Cc: Vegard Nossum <vegardno@ifi.uio.no>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
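The conversion is mechanical; for crypto/xor.c the shape of the change is presumably along these lines (a sketch, not the exact diff): the benchmark pages are still allocated the same way, just without the NOTRACK hint that kmemcheck used to honor.

    /* before: pages excluded from kmemcheck tracking */
    b1 = (void *)__get_free_pages(GFP_KERNEL | __GFP_NOTRACK, 2);

    /* after: plain allocation, since the flag (and kmemcheck) is going away */
    b1 = (void *)__get_free_pages(GFP_KERNEL, 2);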
* crypto: xor - Fix warning when XOR_SELECT_TEMPLATE is unset (Herbert Xu, 2016-08-31, 1 file, -4/+5)
    This patch fixes an unused label warning triggered when the macro
    XOR_SELECT_TEMPLATE is not set.

    Fixes: 39457acda913 ("crypto: xor - skip speed test if the xor...")
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
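The warning pattern and its usual cure look roughly like this (an illustrative reconstruction, not the literal patch): a label only reachable from code guarded by XOR_SELECT_TEMPLATE becomes unused when the macro is absent, so the file either guards the label as well or supplies a no-op fallback definition such as the one below.

    /* fallback when the architecture does not override the selection */
    #ifndef XOR_SELECT_TEMPLATE
    #define XOR_SELECT_TEMPLATE(fastest)    (fastest)
    #endif

    static struct xor_block_template *pick_template(struct xor_block_template *fastest)
    {
            fastest = XOR_SELECT_TEMPLATE(fastest);    /* arch hook, or identity */
            return fastest;
    }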
* crypto: xor - skip speed test if the xor function is selected automatically (Martin Schwidefsky, 2016-08-24, 1 file, -21/+19)
    If the architecture selected the xor function with XOR_SELECT_TEMPLATE, the
    speed result of the do_xor_speed benchmark is of limited value. The speed
    measurement increases the bootup time a little, which can make a difference
    for kernels used in container-like virtual machines.

    Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
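In terms of control flow, the skip described above amounts to something like the following (a simplified sketch of calibrate_xor_blocks(), not a verbatim excerpt; run_benchmark_and_pick_fastest() is an illustrative stand-in for the timing path): if the arch hook hands back a template, print it and bypass the benchmark entirely.

    static int __init calibrate_xor_blocks_sketch(void)
    {
            struct xor_block_template *fastest;

            fastest = XOR_SELECT_TEMPLATE(NULL);
            if (fastest) {
                    /* the architecture made the choice; skip the benchmark */
                    printk(KERN_INFO
                           "xor: automatically using best checksumming function   %-10s\n",
                           fastest->name);
            } else {
                    /* otherwise allocate two benchmark pages, run do_xor_speed()
                     * on every compiled-in template and keep the fastest */
                    fastest = run_benchmark_and_pick_fastest();  /* illustrative helper */
            }

            active_template = fastest;
            return 0;
    }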
* add further __init annotations to crypto/xor.c (Jan Beulich, 2012-10-11, 1 file, -2/+2)
    This allows do_xor_speed() in particular to be discarded post-init.

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
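For reference, the annotation itself is tiny; a sketch of what such a change looks like (assumed shape, not the exact diff):

    /* __init places the function in .init.text, which the kernel frees once
     * boot completes; the calibration code never runs again */
    static void __init
    do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
    {
            /* ... benchmark one template ... */
    }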
* Merge tag 'md-3.5' of git://neil.brown.name/md (Linus Torvalds, 2012-05-24, 1 file, -3/+10)
    Pull md updates from NeilBrown:
     "It's been a busy cycle for md - lots of fun stuff here.. if you like this
      kind of thing :-)

      Main features:
       - RAID10 arrays can be reshaped - adding and removing devices and
         changing chunks (not 'far' array though)
       - allow RAID5 arrays to be reshaped with a backup file (not tested yet,
         but the principle works fine for RAID10).
       - arrays can be reshaped while a bitmap is present - you no longer need
         to remove it first
       - SSSE3 support for RAID6 syndrome calculations

      and of course a number of minor fixes etc."

    * tag 'md-3.5' of git://neil.brown.name/md: (56 commits)
      md/bitmap: record the space available for the bitmap in the superblock.
      md/raid10: Remove extras after reshape to smaller number of devices.
      md/raid5: improve removal of extra devices after reshape.
      md: check the return of mddev_find()
      MD RAID1: Further conditionalize 'fullsync'
      DM RAID: Use md_error() in place of simply setting Faulty bit
      DM RAID: Record and handle missing devices
      DM RAID: Set recovery flags on resume
      md/raid5: Allow reshape while a bitmap is present.
      md/raid10: resize bitmap when required during reshape.
      md: allow array to be resized while bitmap is present.
      md/bitmap: make sure reshape request are reflected in superblock.
      md/bitmap: add bitmap_resize function to allow bitmap resizing.
      md/bitmap: use DIV_ROUND_UP instead of open-code
      md/bitmap: create a 'struct bitmap_counts' substructure of 'struct bitmap'
      md/bitmap: make bitmap bitops atomic.
      md/bitmap: make _page_attr bitops atomic.
      md/bitmap: merge bitmap_file_unmap and bitmap_file_put.
      md/bitmap: remove async freeing of bitmap file.
      md/bitmap: convert some spin_lock_irqsave to spin_lock_irq
      ...
* crypto: disable preemption while benchmarking RAID5 xor checksumming (Jim Kukunas, 2012-05-22, 1 file, -0/+5)
    With CONFIG_PREEMPT=y, we need to disable preemption while benchmarking
    RAID5 xor checksumming to ensure we're actually measuring what we think
    we're measuring.

    Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
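A sketch of the guarded measurement loop (the 2012-era, jiffies-based shape is assumed here and simplified; BENCH_SIZE is an assumption): the whole timed region sits between preempt_disable() and preempt_enable(), so the scheduler cannot take the CPU away in the middle of the window being measured.

    #define BENCH_SIZE      PAGE_SIZE       /* assumed benchmark buffer size */

    static void __init do_xor_speed_guarded(struct xor_block_template *tmpl,
                                            void *b1, void *b2)
    {
            unsigned long now, count = 0;

            preempt_disable();

            now = jiffies;
            while (jiffies == now) {                /* one full tick of work */
                    mb();                           /* keep the loop from being optimized away */
                    tmpl->do_2(BENCH_SIZE, b1, b2);
                    mb();
                    count++;
                    mb();
            }

            preempt_enable();

            /* iterations per tick are then scaled into a MB/sec figure */
            tmpl->speed = count * (HZ * BENCH_SIZE / 1024);
    }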
* crypto: wait for a full jiffy in do_xor_speed (Jim Kukunas, 2012-05-22, 1 file, -3/+5)
    In the existing do_xor_speed(), there is no guarantee that we actually run
    do_2() for a full jiffy. We get the current jiffy, then run do_2() until
    the next jiffy. Instead, let's get the current jiffy, then wait until the
    next jiffy to start our test.

    Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
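The "wait for the next jiffy" step is just a short spin before the timed loop starts; a minimal fragment (not the exact patch), reusing the variables from the do_xor_speed_guarded() sketch shown above:

    unsigned long start;

    /* synchronize to a tick boundary so the measured window is a full jiffy */
    now = jiffies;
    while (jiffies == now)
            cpu_relax();            /* spin until the counter ticks over */

    start = jiffies;                /* timing begins exactly on the tick */
    while (jiffies == start) {
            tmpl->do_2(BENCH_SIZE, b1, b2);
            count++;
    }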
* crypto, xor: Sanitize checksumming function selection output (Borislav Petkov, 2012-04-09, 1 file, -2/+3)
    Currently, it says

        [    1.015541] xor: automatically using best checksumming function: generic_sse
        [    1.040769]    generic_sse:  6679.000 MB/sec
        [    1.045377] xor: using function: generic_sse (6679.000 MB/sec)

    and repeats the function name three times unnecessarily. Change it into

        [    1.015115] xor: automatically using best checksumming function:
        [    1.040794]    generic_sse:  6680.000 MB/sec

    and save us a line in dmesg.

    No functional change.

    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h (Tejun Heo, 2010-03-30, 1 file, -0/+1)
    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which in
    turn includes gfp.h, making everything defined by the two files universally
    available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for this
    change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion needs
    to touch a large number of source files, the following script is used as
    the basis of conversion.

        http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that only
      the necessary includes are there, i.e. if only gfp is used, gfp.h, if
      slab is used, slab.h.

    * When the script inserts a new include, it looks at the include blocks
      and tries to put the new include such that its order conforms to its
      surroundings. It's put in the include block which contains core kernel
      includes, in the same order that the rest are ordered: alphabetical,
      Christmas tree, rev-Xmas-tree, or at the end if there doesn't seem to be
      any matching order.

    * If the script can't find a place to put a new include (mostly because
      the file doesn't have a fitting include block), it prints out an error
      message indicating which .h file needs to be added to the file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly over
       4000 files, deleting around 700 includes and adding ~480 gfp.h and
       ~3000 slab.h inclusions. The script emitted errors for ~400 files.

    2. Each error was manually checked. Some didn't need the inclusion, some
       needed manual addition, while adding it to the implementation .h or
       embedding .c file was more appropriate for others. This step added
       inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits from
       #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed,
       e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs
       requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically editing
       them, as sprinkling gfp.h and slab.h inclusions around .h files could
       easily lead to inclusion dependency hell. Most gfp.h inclusion
       directives were ignored, as stuff from gfp.h was usually widely
       available and often used in preprocessor macros. Each slab.h inclusion
       directive was examined and added manually as necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures were
       fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
       distributed build env didn't work with gcov compiles) and a few more
       options had to be turned off depending on archs to make things build
       (like ipr on powerpc/64, which failed due to missing writeq).

       * x86 and x86_64 UP and SMP allmodconfig and a custom test config
       * powerpc and powerpc64 SMP allmodconfig
       * sparc and sparc64 SMP allmodconfig
       * ia64 SMP allmodconfig
       * s390 SMP allmodconfig
       * alpha SMP allmodconfig
       * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as a
       separate patch and serve as a bisection point.

    Given the fact that I had only a couple of failures from tests on step 6,
    I'm fairly confident about the coverage of this conversion patch. If there
    is a breakage, it's likely to be something in one of the arch headers which
    should be easily discoverable on most builds of the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
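For crypto/xor.c itself the diffstat is a single added line; since the file allocates its benchmark buffers with __get_free_pages(), the added line is presumably just the direct gfp include (an assumption based on the description above, not a quoted diff):

    #include <linux/gfp.h>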
* crypto: don't track xor test pages with kmemcheck (Vegard Nossum, 2009-06-15, 1 file, -1/+6)
    The xor tests are run on uninitialized data, because it doesn't really
    matter what the underlying data is. Annotate this false-positive warning.

    Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
    Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
* md: move lots of #include lines out of .h files and into .c (NeilBrown, 2009-03-31, 1 file, -1/+1)
    This makes the includes more explicit, and is preparation for moving
    md_k.h to drivers/md/md.h.

    Remove include/raid/md.h as its only remaining use was to #include other
    files.

    Signed-off-by: NeilBrown <neilb@suse.de>
* async_tx: add the async_tx api (Dan Williams, 2007-07-13, 1 file, -15/+14)
    The async_tx api provides methods for describing a chain of asynchronous
    bulk memory transfers/transforms with support for inter-transactional
    dependencies. It is implemented as a dmaengine client that smooths over
    the details of different hardware offload engine implementations. Code
    that is written to the api can optimize for asynchronous operation and the
    api will fit the chain of operations to the available offload resources.

        I imagine that any piece of ADMA hardware would register with the
        'async_*' subsystem, and a call to async_X would be routed as
        appropriate, or be run in-line. - Neil Brown

    async_tx exploits the capabilities of struct dma_async_tx_descriptor to
    provide an api of the following general format:

        struct dma_async_tx_descriptor *
        async_<operation>(..., struct dma_async_tx_descriptor *depend_tx,
                          dma_async_tx_callback cb_fn, void *cb_param)
        {
                struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>);
                struct dma_device *device = chan ? chan->device : NULL;
                int int_en = cb_fn ? 1 : 0;
                struct dma_async_tx_descriptor *tx = device ?
                        device->device_prep_dma_<operation>(chan, len, int_en) : NULL;

                if (tx) { /* run <operation> asynchronously */
                        ...
                        tx->tx_set_dest(addr, tx, index);
                        ...
                        tx->tx_set_src(addr, tx, index);
                        ...
                        async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param);
                } else { /* run <operation> synchronously */
                        ...
                        <operation>
                        ...
                        async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param);
                }

                return tx;
        }

    async_tx_find_channel() returns a capable channel from its pool. The
    channel pool is organized as a per-cpu array of channel pointers. The
    async_tx_rebalance() routine is tasked with managing these arrays. In the
    uniprocessor case async_tx_rebalance() tries to spread responsibility
    evenly over channels of similar capabilities. For example, if there are two
    copy+xor channels, one will handle copy operations and the other will
    handle xor. In the SMP case async_tx_rebalance() attempts to spread the
    operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor
    channel0 while cpu1 gets copy channel 1 and xor channel 1. When a
    dependency is specified, async_tx_find_channel defaults to keeping the
    operation on the same channel. A xor->copy->xor chain will stay on one
    channel if it supports both operation types, otherwise the transaction will
    transition between a copy and a xor resource.

    Currently the raid5 implementation in the MD raid456 driver has been
    converted to the async_tx api. A driver for the offload engines on the
    Intel Xscale series of I/O processors, iop-adma, is provided in a later
    commit. With the iop-adma driver and async_tx, raid456 is able to offload
    copy, xor, and xor-zero-sum operations to hardware engines.

    On iop342 tiobench showed higher throughput for sequential writes (20 - 30%
    improvement) and sequential reads to a degraded array (40 - 55%
    improvement). For the other cases performance was roughly equal, +/- a few
    percentage points. On a x86-smp platform the performance of the async_tx
    implementation (in synchronous mode) was also +/- a few percentage points
    of the original implementation. According to 'top' on iop342 CPU
    utilization drops from ~50% to ~15% during a 'resync' while the speed
    according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s.

    The tiobench command line used for testing was:

        tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5

    * iop342 had 1GB of memory available

    Details:
    * if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making
      async_tx_find_channel a static inline routine that always returns NULL
    * when a callback is specified for a given transaction an interrupt will
      fire at operation completion time and the callback will occur in a
      tasklet. if the channel does not support interrupts then a live polling
      wait will be performed
    * the api is written as a dmaengine client that requests all available
      channels
    * In support of dependencies the api implicitly schedules channel-switch
      interrupts. The interrupt triggers the cleanup tasklet which causes
      pending operations to be scheduled on the next channel
    * Xor engines treat an xor destination address differently than a software
      xor routine. To the software routine the destination address is an
      implied source, whereas engines treat it as a write-only destination.
      This patch modifies the xor_blocks routine to take an explicit
      destination address to mirror the hardware.

    Changelog:
    * fixed a leftover debug print
    * don't allow callbacks in async_interrupt_cond
    * fixed xor_block changes
    * fixed usage of ASYNC_TX_XOR_DROP_DEST
    * drop dma mapping methods, suggested by Chris Leech
    * printk warning fixups from Andrew Morton
    * don't use inline in C files, Adrian Bunk
    * select the API when MD is enabled
    * BUG_ON xor source counts <= 1
    * implicitly handle hardware concerns like channel switching and
      interrupts, Neil Brown
    * remove the per operation type list, and distribute operation capabilities
      evenly amongst the available channels
    * simplify async_tx_find_channel to optimize the fast path
    * introduce the channel_table_initialized flag to prevent early calls to
      the api
    * reorganize the code to mimic crypto
    * include mm.h as not all archs include it in dma-mapping.h
    * make the Kconfig options non-user visible, Adrian Bunk
    * move async_tx under crypto since it is meant as 'core' functionality, and
      the two may share algorithms in the future
    * move large inline functions into c files
    * checkpatch.pl fixes
    * gpl v2 only correction

    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Acked-By: NeilBrown <neilb@suse.de>
* xor: make 'xor_blocks' a library routine for use with async_tx (Dan Williams, 2007-07-13, 1 file, -0/+156)
    The async_tx api tries to use a dma engine for an operation, but will fall
    back to an optimized software routine otherwise. Xor support is implemented
    using the raid5 xor routines. For organizational purposes this routine is
    moved to a common area.

    The following fixes are also made:
    * rename xor_block => xor_blocks, suggested by Adrian Bunk
    * ensure that xor.o initializes before md.o in the built-in case
    * checkpatch.pl fixes
    * mark calibrate_xor_blocks __init, Adrian Bunk

    Cc: Adrian Bunk <bunk@stusta.de>
    Cc: NeilBrown <neilb@suse.de>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
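A hedged usage sketch of the new library routine (the call signature matches the declaration in the raid xor header; the buffer setup is purely illustrative): XOR two source blocks into an explicit destination, as the reworked interface requires.

    #include <linux/gfp.h>
    #include <linux/raid/xor.h>

    static int __init xor_blocks_demo(void)
    {
            void *dest = (void *)__get_free_page(GFP_KERNEL);
            void *src0 = (void *)__get_free_page(GFP_KERNEL);
            void *src1 = (void *)__get_free_page(GFP_KERNEL);
            void *srcs[2] = { src0, src1 };

            if (dest && src0 && src1)
                    /* dest ^= src0 ^ src1 over PAGE_SIZE bytes */
                    xor_blocks(2, PAGE_SIZE, dest, srcs);

            free_page((unsigned long)dest);
            free_page((unsigned long)src0);
            free_page((unsigned long)src1);
            return 0;
    }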