path: root/raid6check.c
Commit message | Author | Age | Files | Lines
* Replace error prone signal() with sigaction() | Lukasz Florczak | 2022-04-04 | 1 | -10/+15
  Until now signal() was used, whose implementation can vary [1]. The sigaction() call is preferred. This commit replaces signal() with sigaction() through the signal_s() wrapper. Also remove redundant signal.h header includes.

  [1] https://man7.org/linux/man-pages/man2/signal.2.html

  Signed-off-by: Lukasz Florczak <lukasz.florczak@linux.intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Get failed disk count from array state | Tomasz Majchrzak | 2017-06-05 | 1 | -1/+1
  A recent commit changed the way failed disks are counted. It breaks recovery for external metadata arrays, as failed disks are not part of the array and have no corresponding entries in sysfs (they are only reported for containers), so degraded arrays show no failed disks.

  The recent commit overwrites the GET_DEGRADED result prior to GET_STATE, and it is not set again if GET_STATE has not been requested. As GET_STATE provides the same information as GET_DEGRADED, the latter is not needed anymore. Remove the GET_DEGRADED option and replace it with the GET_STATE option.

  Don't count the number of failed disks by looking at sysfs entries, but calculate it at the end. Do it only for arrays, as containers report no disks, just spares.

  Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* raid6check.c: fix "misleading-indentation" error | Yilong Ren | 2016-10-26 | 1 | -1/+2
  To fix the following error info:

    root@vm-lkp-nex04-8G-7 /tmp/mdadm# make test
    cc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"3.4-43-g1dcee1c\" -DVERS_DATE="\"06th April 2016\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\" -c -o raid6check.o raid6check.c
    raid6check.c: In function 'manual_repair':
    raid6check.c:267:4: error: this 'else' clause does not guard... [-Werror=misleading-indentation]
        else
        ^~~~
    raid6check.c:269:5: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'else'
         printf("Repairing D(%d) and P\n", failed_data);
         ^~~~~~
    cc1: all warnings being treated as errors
    <builtin>: recipe for target 'raid6check.o' failed
    make: *** [raid6check.o] Error 1
    root@vm-lkp-nex04-8G-7 /tmp/mdadm#

  Cc: NeilBrown <neilb@suse.com>
  Cc: linux-raid <linux-raid@vger.kernel.org>
  Cc: LKP <lkp@eclists.intel.com>
  Reviewed-by: Jes Sorensen <Jes.Sorensen@redhat.com>
  Signed-off-by: Yilong Ren <yilongx.ren@intel.com>
  Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
* raid6check: don't ignore return value from posix_memalign. | NeilBrown | 2015-08-05 | 1 | -1/+2
  Compilers don't like that.

  Signed-off-by: NeilBrown <neilb@suse.de>
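A minimal sketch of checking the posix_memalign() result, as this commit asks for. The helper name `alloc_aligned` is hypothetical, and raid6check itself may handle the failure differently (for example by exiting).

```c
#include <stdlib.h>

/*
 * Hypothetical helper: allocate `size` bytes aligned to `alignment`.
 * posix_memalign() returns 0 on success and an error number otherwise,
 * so the result is tested instead of being discarded.
 */
static void *alloc_aligned(size_t alignment, size_t size)
{
	void *buf = NULL;

	if (posix_memalign(&buf, alignment, size) != 0)
		return NULL;	/* caller decides how to report the failure */
	return buf;
}
```

The alignment must be a power of two and a multiple of sizeof(void *); buffers used with O_DIRECT are typically aligned to the page or sector size.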
* raid6check: use O_DIRECT instead of O_SYNC. | NeilBrown | 2015-07-20 | 1 | -2/+3
  O_DIRECT is more direct and is faster. This requires aligned memory allocation, but that isn't hard.

  Signed-off-by: NeilBrown <neilb@suse.de>
* restripe: fix data block order in raid6_2_data_recov | NeilBrown | 2015-07-20 | 1 | -5/+0
  ... rather than relying on the caller getting them in the correct order.

  This is better engineering and fixes a bug, because the failed_slotX numbers are used later with the assumption that they weren't swapped.

  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check: various cleanup/fixes | NeilBrown | 2015-07-20 | 1 | -121/+148
  - document meaning of various arrays. In particular: stripes[], blocks[], blocks_page[], block_index_for_slot[]. It needs to be clear if these are indexed by raid_disk number or syndrome number.
  - changed meaning of block_index_for_slot[]. It didn't seem to be used consistently. It also made use of the block numbers in array data ordering, which is not directly relevant for syndrome calculations.
  - reduced number of args to autorepair and manual_repair. They don't need both stripes[] and blocks[]. And they don't need diskP or diskQ. blocks[-1] is the P chunk, blocks[-2] is the Q chunk. block_index_for_slot[] can be used to find the target device for a particular syndrome block.
  - remove stripe locking from within manual_repair, and instead use the global stripe locking used for check and autorepair.
  - this necessitated changes to raid6_datap_recov and raid5_2data_reov so the P and Q blocks could be before or after the data blocks.

  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check | NeilBrown | 2015-07-16 | 1 | -19/+34
  fix checking of DDF layouts. Stuff probably still broken.

  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check: get device ordering correct for syndrome calculation. | NeilBrown | 2015-07-16 | 1 | -6/+15
  The order of devices used for the syndrome calculation is not the same as the order of data in the array. The D block immediately after Q is first, then they continue cyclically in raid-disk order, skipping over the P disk if it is seen.

  This gets the 'check' right for all layouts other than DDF, which is quite different.

  I haven't confirmed that this doesn't break repair.

  Signed-off-by: NeilBrown <neilb@suse.de>
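The ordering rule described above can be sketched as follows. This is only an illustration for non-DDF layouts; the function name `syndrome_order` and its parameters are hypothetical and do not come from raid6check.c.

```c
/*
 * Fill `order` with raid-disk slots in syndrome order: start at the
 * slot right after Q, walk cyclically through the raid disks, and
 * skip the P slot when it is met. Q itself is never visited because
 * the walk stops one step before wrapping back to it.
 * Returns the number of data slots written (raid_disks - 2).
 */
static int syndrome_order(int raid_disks, int p_slot, int q_slot, int *order)
{
	int n = 0;

	for (int i = 1; i < raid_disks; i++) {
		int slot = (q_slot + i) % raid_disks;

		if (slot == p_slot)
			continue;
		order[n++] = slot;
	}
	return n;
}
```

For a 5-disk array with P in slot 0 and Q in slot 1, this yields slots 2, 3, 4 as syndrome blocks 0, 1, 2.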
* raid6check: report role of suspect device. | NeilBrown | 2015-07-10 | 1 | -2/+3
  i.e. -2 for Q, -1 for P, 0-N for data.

  Signed-off-by: NeilBrown <neilb@suse.de>
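As an illustration of that role numbering, a small sketch that turns a role into a label. The helper `role_name` and the "D(n)" label format are assumptions made for this example, not the tool's actual output code.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a device role into `buf`.
 * -2 is the Q block, -1 is the P block, 0..N are data blocks.
 */
static const char *role_name(int role, char *buf, size_t len)
{
	if (role == -2)
		snprintf(buf, len, "Q");
	else if (role == -1)
		snprintf(buf, len, "P");
	else
		snprintf(buf, len, "D(%d)", role);
	return buf;
}
```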
* Add "Name" defines to some ancillary programs | NeilBrown | 2015-05-07 | 1 | -0/+2
  All programs now need to declare their "Name".

  Signed-off-by: NeilBrown <neilb@suse.de>
  Fixes: d56dd607ba43 ("Change way of printing name of a process")
* Don't break long strings onto multiple lines. | NeilBrown | 2015-02-12 | 1 | -2/+1
  It is best to keep strings all together so that they are easier to search for in the source code.

  If a string is so long that it looks ugly on one line, then maybe it should be broken into multiple lines for display too.

  Only strings which contain a newline can be broken into multiple lines:

    "It is OK to\n"
    "break this string\n"

  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check.c: move manual repair code to separate function | Piergiorgio Sartor | 2014-03-31 | 1 | -84/+110
  This patch cleans up the code a bit by moving the second repair mode, that is the manual repair, to a separate function.

  Signed off: piergiorgio.sartor@nexgo.de
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check.c: move autorepair code to separate function | Piergiorgio Sartor | 2014-03-31 | 1 | -49/+65
  This patch cleans up the code a bit by moving the autorepair part into a separate function.

  Signed off: piergiorgio.sartor@nexgo.de
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check.c: lock the stripe until necessary | Piergiorgio Sartor | 2014-03-31 | 1 | -58/+58
  The stripe locking mechanism must be atomic between the check and the potential autorepair. For this reason, the autorepair code needs to be just after the check, and both parts (check and autorepair) must be executed under the stripe lock. Of course, the manual repair can operate as before.

  This patch reorganizes the code and provides the single, atomic stripe lock.

  It should be confirmed that this new locking is not too demanding. In case it is, some other solutions will be required (suggestions welcome).

  Signed off: piergiorgio.sartor@nexgo.de
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check.c: reduce verbosity | Piergiorgio Sartor | 2014-02-05 | 1 | -9/+1
  This patch will remove some legacy code. It is part of the verbosity "cleanup". In any case, if information about the P and Q parity mismatches is required, it should go inside the code handling page size blocks, not full stripe size.

  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check.c: add O_SYNC to open | Piergiorgio Sartor | 2014-02-04 | 1 | -1/+1
  It could be better to make sure the data reaches the disks, so open the drives with the O_SYNC flag.

  Signed off: piergiorgio.sartor@nexgo.de
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check.c: fix Q parity generation | Piergiorgio Sartor | 2014-02-04 | 1 | -1/+1
  In the transition to 4K page processing, the Q parity generation had a wrong offset in the buffer. This patch fixes this.

  Signed off: piergiorgio.sartor@nexgo.de
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check.c: fix position printout | Piergiorgio Sartor | 2014-02-04 | 1 | -2/+2
  This patch makes the position in the disk where an error is found a bit clearer.

  Signed off: piergiorgio.sartor@nexgo.de
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check.c: reduce verbosity | Piergiorgio Sartor | 2014-02-04 | 1 | -3/+0
  This patch removes some printouts which are not really useful here. These could be re-added later, in case a verbosity parameter is provided.

  Signed off: piergiorgio.sartor@nexgo.de
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check.c add page size check and repair | Piergiorgio Sartor | 2014-01-23 | 1 | -33/+74
  raid6check currently performs checks and repair on a whole chunk at a time. This is often not ideal, as corruption can happen with smaller granularity.

  This patch changes raid6check to use a page-size (4K) granularity. We still process a chunk at a time, but within each chunk we process a page at a time.

  Signed-off-by: NeilBrown <neilb@suse.de>
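To illustrate the chunk-versus-page granularity, here is a minimal sketch that checks only the P (XOR) parity one 4K page at a time within a chunk. The names `PAGE_BYTES` and `count_bad_p_pages` are hypothetical; the real tool also checks Q and works on its own buffers.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096	/* assumed page size, as in the commit text */

/*
 * Walk one chunk page by page and count the pages whose P block does
 * not equal the XOR of the corresponding data pages. A mismatch is
 * thus narrowed down to a 4K page rather than the whole chunk.
 */
static int count_bad_p_pages(uint8_t **data, int ndata,
			     const uint8_t *p, size_t chunk_size)
{
	int bad = 0;

	for (size_t off = 0; off + PAGE_BYTES <= chunk_size; off += PAGE_BYTES) {
		int mismatch = 0;

		for (size_t i = 0; i < PAGE_BYTES && !mismatch; i++) {
			uint8_t x = 0;

			for (int d = 0; d < ndata; d++)
				x ^= data[d][off + i];
			if (x != p[off + i])
				mismatch = 1;
		}
		bad += mismatch;
	}
	return bad;
}
```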
* Remove lots of unnecessary white space. | NeilBrown | 2013-06-19 | 1 | -4/+0
  Now that I am using white-space mode in Emacs I can see all of this, and I don't like it :-)

  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check: Check return value of lseek64() | Bernd Schubert | 2013-06-19 | 1 | -4/+28
  If lseek64() failed it was still writing to the disks, which would introduce data corruption.

  Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
  Signed-off-by: NeilBrown <neilb@suse.de>
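A minimal sketch of the pattern this commit describes: never write when the seek failed. The helper `write_at` is hypothetical, and it uses plain lseek() rather than lseek64() to stay portable.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: seek to `offset` and write `len` bytes.
 * If the seek fails, return an error without touching the device,
 * so data is never written at the wrong position.
 */
static int write_at(int fd, off_t offset, const void *buf, size_t len)
{
	if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len)
		return -1;
	return 0;
}
```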
* raid6check: Fix compiler warnings. | Bernd Schubert | 2013-06-19 | 1 | -6/+26
  Fix some compiler warnings appearing with optimization levels.

  Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check: Use enums for repair type | Bernd Schubert | 2013-06-19 | 1 | -6/+12
  Using hard coded numbers is error prone and hard to read by humans.

  Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
  Signed-off-by: NeilBrown <neilb@suse.de>
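A sketch of how an enum can replace hard-coded repair numbers. The enum and function names here are hypothetical, and the "repair" keyword is an assumption; only "autorepair" is named elsewhere in this log.

```c
#include <string.h>

/* Hypothetical repair types replacing bare 0/1/2 constants. */
enum repair_mode {
	NO_REPAIR = 0,
	MANUAL_REPAIR,
	AUTO_REPAIR
};

/* Map a command-line word to a repair mode; unknown words mean none. */
static enum repair_mode parse_repair_mode(const char *arg)
{
	if (arg == NULL)
		return NO_REPAIR;
	if (strcmp(arg, "repair") == 0)		/* assumed keyword */
		return MANUAL_REPAIR;
	if (strcmp(arg, "autorepair") == 0)
		return AUTO_REPAIR;
	return NO_REPAIR;
}
```

Comparisons such as `mode == AUTO_REPAIR` then read more clearly than `mode == 2`.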
* raid6check: Fix memory leaks detected by valgrind | Bernd Schubert | 2013-06-19 | 1 | -0/+2
    ==2389947== 24 bytes in 1 blocks are definitely lost in loss record 1 of 10
    ==2389947==    at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==2389947==    by 0x408067: xmalloc (xmalloc.c:36)
    ==2389947==    by 0x401B19: check_stripes (raid6check.c:151)
    ==2389947==    by 0x4030C6: main (raid6check.c:521)
    ==2389947==
    ==2389947== 24 bytes in 1 blocks are definitely lost in loss record 2 of 10
    ==2389947==    at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==2389947==    by 0x408067: xmalloc (xmalloc.c:36)
    ==2389947==    by 0x401B67: check_stripes (raid6check.c:155)
    ==2389947==    by 0x4030C6: main (raid6check.c:521)
    ==2389947==

  Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check: Fix build of raid6check | Bernd Schubert | 2013-06-19 | 1 | -1/+1
  After a recent git pull, 'make raid6check' did not work anymore, as sysfs_read() was called with a wrong argument and as check_env() was used by use_udev(), but not defined.

  Replace sysfs_read(..., -1, ...) by sysfs_read(..., NULL, ...)
  Move check_env() from util.c to lib.c

  Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check: Auto-repair mode | Robert Buchholz | 2012-09-10 | 1 | -1/+32
  When calling raid6check in regular scanning mode, specifying "autorepair" as the last positional parameter will cause it to automatically repair any single slot failures it identifies.

  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check: Extract (un)locking into functions | Robert Buchholz | 2012-09-10 | 1 | -43/+47
  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check: Repair mode used geo_map incorrectly | Robert Buchholz | 2012-09-10 | 1 | -11/+13
  In repair mode, the data block indices to be repaired were calculated using geo_map(), which returns the disk slot for a data block index and not the reverse. Now we simply store the reverse of that calculation when we do it anyway.

  Signed-off-by: NeilBrown <neilb@suse.de>
* raid6check: Fix off-by-one in argument check | Robert Buchholz | 2012-09-10 | 1 | -2/+2
  In repair mode, specifying a failed slot that is equal to the number of devices in the raid could cause a segfault.

  Signed-off-by: NeilBrown <neilb@suse.de>
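The bounds check implied here can be sketched as below. The helper `slot_is_valid` is hypothetical; the point is that valid slots run from 0 to raid_disks - 1, so a slot equal to raid_disks must be rejected.

```c
/*
 * Hypothetical argument check: a failed-slot index is usable only if
 * it addresses an existing device, i.e. 0 <= slot < raid_disks.
 * Using `<=` for the upper bound would be the off-by-one.
 */
static int slot_is_valid(int slot, int raid_disks)
{
	return slot >= 0 && slot < raid_disks;
}
```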
* Repair mode for raid6 | Robert Buchholz | 2012-07-09 | 1 | -5/+128
  In repair mode, raid6check will rewrite one single stripe by regenerating the data (or parity) of two raid devices that are specified via the command line. If you need to rewrite just one slot, pick any other slot at random.

  Note that the repair option will change data on the disks directly, so both the md layer above as well as any layers above md (such as filesystems) may be accessing the stripe data from cached buffers. Either instruct the kernel to drop the caches or reassemble the raid after repair.

  Signed-off-by: NeilBrown <neilb@suse.de>
* Remove scattered checks for malloc success. | NeilBrown | 2012-07-09 | 1 | -30/+10
  malloc should never fail, and if it does it is unlikely that anything else useful can be done. Best approach is to abort and let some super-daemon restart.

  So define xmalloc, xcalloc, xrealloc, xstrdup which don't fail but just print a message and exit. Then use those, removing all the tests for failure.

  Also replace all "malloc;memset" sequences with 'xcalloc'.

  Signed-off-by: NeilBrown <neilb@suse.de>
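A minimal sketch of the "allocate or exit" idea described above. The `_sketch` names, the message text and the exit code are assumptions for illustration; the real xmalloc.c may differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate or die: callers never need to test the result. */
static void *xmalloc_sketch(size_t len)
{
	void *p = malloc(len);

	if (p)
		return p;
	fprintf(stderr, "memory allocation failure - aborting\n");
	exit(4);	/* assumed exit code */
}

/* Zeroed allocation, replacing a malloc() + memset() pair. */
static void *xcalloc_sketch(size_t num, size_t size)
{
	void *p = calloc(num, size);

	if (p)
		return p;
	fprintf(stderr, "memory allocation failure - aborting\n");
	exit(4);	/* assumed exit code */
}
```

With such wrappers a call site shrinks to a single line, which is what allows the scattered failure tests to be removed.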
* Introduce pr_err for printing error messages. | NeilBrown | 2012-07-09 | 1 | -1/+1
  'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": '

  cont_err() is also available.

  Signed-off-by: NeilBrown <neilb@suse.de>
* RAID-6 check standalone suspend array | Piergiorgio Sartor | 2011-05-16 | 1 | -3/+38
  Signed-off-by: NeilBrown <neilb@suse.de>
* RAID-6 check standalone fix component list parsing | Piergiorgio Sartor | 2011-04-14 | 1 | -12/+23
  Fix the parsing of the component list, i.e. skipping the "spare" one. I also added a check in case the array is degraded.

  Signed-off-by: NeilBrown <neilb@suse.de>
* RAID-6 check standalone code cleanup | Piergiorgio Sartor | 2011-04-05 | 1 | -46/+91
  The major change is code cleanup and simplification. Furthermore, there is better error handling and a couple of bug fixes. Last but not least, the command line parameters are changed from "bytes" to "stripes", which is more convenient, I guess.

  Signed-off-by: NeilBrown <neilb@suse.de>
* RAID-6 check standalone md device | Piergiorgio Sartor | 2011-04-05 | 1 | -30/+80
  Allow RAID-6 check to be passed only the MD device, start and length. The three parameters are mandatory.

  All necessary information is collected using the "sysfs_read()" call. Furthermore, if "length" is "0", then the check is performed until the end of the array.

  Some checks are done, for example if the md device is really a RAID-6. Nevertheless I guess it is not bullet proof...

  Next patch will include the "suspend" action. My idea is to do it "per stripe", please let me know if you've some better options.

  Signed-off-by: NeilBrown <neilb@suse.de>
* RAID-6 check standalone | Piergiorgio Sartor | 2011-03-21 | 1 | -0/+262
  Hi Neil,

  please find attached a patch, to mdadm-3.2 base, including a standalone version of the raid-6 check.

  This is basically a re-working (and hopefully improvement) of the already implemented check in "restripe.c". I split the check function into "collect" and "stats", so that the second one could be easily replaced. The API is also simplified.

  The command line options are reduced, since the only level is raid-6, but the ":offset" option is included.

  The output reports the block/stripe rotation, P/Q errors and the possible HDD (or unknown).

  BTW, the patch applies also to the already patched "restripe.c", including the last ":offset" patch (which is not yet in git).

  Another item is that, due to "sysfs.c" linking (see below), the "Makefile" needed some changes. I hope this is not a problem.

  Next steps (a TODO list, if you like) would be:

  1) Add the "sysfs.c" code in order to retrieve the HDDs info from the MD device. It is already linked, together with the whole (mdadm) universe, since it seems it cannot live alone. I'll need some advice or hint on how to use it. I checked "sysfs.c", but before I dig deep into it maybe better to have some advice (maybe just one function call will do it).

  2) Add the suspend lo/hi control. Fellow John Robinson was suggesting to look into "Grow.c", which I did, but I guess the same story as 1) is valid: better to have some hint on where to look before wasting time.

  3) Add a repair option (future). This should have different levels, like "all", "disk", "stripe". That is, fix everything (more or less like "repair"), fix only if a disk is clearly having problems, fix each stripe which clearly has a problem (but maybe different stripes may belong to different HDDs).

  So, for points 1) and 2) it would be nice to have some more detail on where to look for what. Point 3) we will discuss later.

  Thanks, please consider for inclusion,

  bye,

  pg

  Signed-off-by: NeilBrown <neilb@suse.de>