md: Set MD_BROKEN for RAID1 and RAID10

There is no direct mechanism to determine raid failure outside the
personality. It is done by checking rdev->flags after executing
md_error(). If the "faulty" flag is not set, then -EBUSY is returned to
userspace. -EBUSY means that the array will be failed after drive removal.

Mdadm has a special routine to handle array failure, and it is executed
if -EBUSY is returned by md.

There are at least two known reasons not to consider this mechanism
correct:
1. A drive can be removed even if the array will be failed[1].
2. -EBUSY seems to be the wrong status. The array is not busy, but the
   removal process cannot proceed safely.

The -EBUSY expectation cannot be removed without breaking compatibility
with userspace. In this patch the first issue is resolved by adding support
for the MD_BROKEN flag for RAID1 and RAID10. Support for RAID456 is added in
the next commit.

The idea is to set MD_BROKEN if we are sure that the raid is now in a
failed state. This is done in each error_handler(). In md_error() the
MD_BROKEN flag is checked. If it is set, then -EBUSY is returned to
userspace.

As in the previous commit, this makes "mdadm --set-faulty" able to fail
the array. The previously proposed workaround is valid if the optional
functionality[1] is disabled.

[1] commit 9a567843f7ce ("md: allow last device to be forcibly removed from
    RAID1/RAID10.")

Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
author: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> 2022-03-22 16:23:38 +0100
committer: Song Liu <song@kernel.org> 2022-04-25 23:00:34 +0200
commit: 9631abdbf406c764f2a5d8305eac063bc3396a0a (patch)
tree: 562765b208e282d83742f8c90b84d6bc5fcf934e /drivers/md/raid10.c
parent: block/rnbd-clt: Avoid flush_workqueue(system_long_wq) usage (diff)
1 files changed, 24 insertions, 16 deletions
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 834eb3ba95a6..dfa576cdf11c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1970,32 +1970,40 @@ static int enough(struct r10conf *conf, int ignore)
 		_enough(conf, 1, ignore);
 }
 
+/**
+ * raid10_error() - RAID10 error handler.
+ * @mddev: affected md device.
+ * @rdev: member device to fail.
+ *
+ * The routine acknowledges &rdev failure and determines new @mddev state.
+ * If it failed, then:
+ *	- &MD_BROKEN flag is set in &mddev->flags.
+ * Otherwise, it must be degraded:
+ *	- recovery is interrupted.
+ *	- &mddev->degraded is bumped.
+ *
+ * @rdev is marked as &Faulty excluding case when array is failed and
+ * &mddev->fail_last_dev is off.
+ */
 static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 {
 	char b[BDEVNAME_SIZE];
 	struct r10conf *conf = mddev->private;
 	unsigned long flags;
 
-	/*
-	 * If it is not operational, then we have already marked it as dead
-	 * else if it is the last working disks with "fail_last_dev == false",
-	 * ignore the error, let the next level up know.
-	 * else mark the drive as failed
-	 */
 	spin_lock_irqsave(&conf->device_lock, flags);
-	if (test_bit(In_sync, &rdev->flags) && !mddev->fail_last_dev
-	    && !enough(conf, rdev->raid_disk)) {
-		/*
-		 * Don't fail the drive, just return an IO error.
-		 */
-		spin_unlock_irqrestore(&conf->device_lock, flags);
-		return;
+
+	if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) {
+		set_bit(MD_BROKEN, &mddev->flags);
+
+		if (!mddev->fail_last_dev) {
+			spin_unlock_irqrestore(&conf->device_lock, flags);
+			return;
+		}
 	}
 	if (test_and_clear_bit(In_sync, &rdev->flags))
 		mddev->degraded++;
-	/*
-	 * If recovery is running, make sure it aborts.
-	 */
+
 	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 	set_bit(Blocked, &rdev->flags);
 	set_bit(Faulty, &rdev->flags);
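
The branch structure of the patched raid10_error() above, together with the -EBUSY check described in the commit message, can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: struct model_mddev, struct model_rdev, model_error() and model_set_faulty() are hypothetical names, the still_enough parameter stands in for the result of enough(conf, rdev->raid_disk), and locking, the Blocked flag and recovery interruption are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_rdev {
	bool in_sync;		/* stands in for the In_sync bit */
	bool faulty;		/* stands in for the Faulty bit */
};

struct model_mddev {
	bool broken;		/* stands in for MD_BROKEN in mddev->flags */
	bool fail_last_dev;	/* stands in for mddev->fail_last_dev */
	int degraded;		/* stands in for mddev->degraded */
};

/*
 * Follows the branches of the patched raid10_error().
 * still_enough is an assumption: it replaces enough(conf, rdev->raid_disk).
 */
void model_error(struct model_mddev *mddev, struct model_rdev *rdev,
		 bool still_enough)
{
	assert(mddev && rdev);

	if (rdev->in_sync && !still_enough) {
		/* Losing this member fails the array. */
		mddev->broken = true;

		/* Without fail_last_dev the member itself is left untouched. */
		if (!mddev->fail_last_dev)
			return;
	}

	if (rdev->in_sync) {
		rdev->in_sync = false;
		mddev->degraded++;
	}
	rdev->faulty = true;
}

/*
 * Assumed caller-side behaviour, based on the commit message:
 * after the error handler runs, a set MD_BROKEN maps to -EBUSY.
 */
int model_set_faulty(struct model_mddev *mddev, struct model_rdev *rdev,
		     bool still_enough)
{
	model_error(mddev, rdev, still_enough);
	return mddev->broken ? -EBUSY : 0;
}
```

In this model, failing a member while enough devices remain returns 0 and bumps degraded. Failing the last usable member returns -EBUSY in both configurations; the member is marked faulty only when fail_last_dev is on.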