gfs2: Warn when a journal replay overwrites a rgrp with buffers

This patch adds some instrumentation in gfs2's journal replay that indicates when we're about to overwrite a rgrp for which we already have a valid buffer_head. When this problem occurs, it's a situation in which this node has been granted a rgrp glock and subsequently read in buffer_heads for it, and possibly even made changes to the rgrp bits and/or allocation values. But now another node has failed and forced us to replay its journal, but its journal contains a copy of the same rgrp, without a revoke, which means we're about to overwrite a rgrp that we now rightfully own, with an obsolete copy. That is always a problem. It means the other node (which failed and left its journal to be replayed) failed to flush out its rgrp buffers, write out the revoke, and invalidate its copy before it released the glock to our possession. No node should ever release a glock until its metadata has been written to the journal and revoked and invalidated.. We also kludge around the problem and refuse to replace our good copy with the journals bad copy by not marking the buffer dirty, but never do it silently. That's wallpapering over a larger problem that still exists. IOW, if this situation can happen to this node, it can also happen to a different node and we wouldn't even know it or be able to circumvent it: Suppose we have a 3-node cluster: Node 1 fails, leaving an obsolete rgrp block in its journal without a revoke. Node 2 grabs the rgrp as soon as the rgrp glock is released and starts making changes, allocating and freeing blocks from the rgrp, etc. Node 3 replays the journal from node 1, oblivious and unaware that it's about to overwrite node 2's changes. So we still need to be vocal and log the error to make it apparent that a corruption path still exists in gfs2. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
author: Bob Peterson <rpeterso@redhat.com> 2019-02-27 21:26:59 +0100
committer: Andreas Gruenbacher <agruenba@redhat.com> 2019-06-27 21:04:07 +0200
commit: d14e1ca305fc27dbceabad64bf5158b35d8864c8 (patch)
tree: c795c802e098945407c7976c3dabb8546da7eec2 /fs/gfs2
parent: gfs2: log which portion of the journal is replayed (diff)
download: linux-d14e1ca305fc27dbceabad64bf5158b35d8864c8.tar.xz
linux-d14e1ca305fc27dbceabad64bf5158b35d8864c8.zip
1 files changed, 20 insertions, 2 deletions
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 1921cda034fd..8c3678f42746 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -759,9 +759,27 @@ static int buf_lo_scan_elements(struct gfs2_jdesc *jd, u32 start,
 
 		if (gfs2_meta_check(sdp, bh_ip))
 			error = -EIO;
-		else
+		else {
+			struct gfs2_meta_header *mh =
+				(struct gfs2_meta_header *)bh_ip->b_data;
+
+			if (mh->mh_type == cpu_to_be32(GFS2_METATYPE_RG)) {
+				struct gfs2_rgrpd *rgd;
+
+				rgd = gfs2_blk2rgrpd(sdp, blkno, false);
+				if (rgd && rgd->rd_addr == blkno &&
+				    rgd->rd_bits && rgd->rd_bits->bi_bh) {
+					fs_info(sdp, "Replaying 0x%llx but we "
+						"already have a bh!\n",
+						(unsigned long long)blkno);
+					fs_info(sdp, "busy:%d, pinned:%d\n",
+						buffer_busy(rgd->rd_bits->bi_bh) ? 1 : 0,
+						buffer_pinned(rgd->rd_bits->bi_bh));
+					gfs2_dump_glock(NULL, rgd->rd_gl);
+				}
+			}
 			mark_buffer_dirty(bh_ip);
-
+		}
 		brelse(bh_log);
 		brelse(bh_ip);
author	Bob Peterson <rpeterso@redhat.com>	2019-02-27 21:26:59 +0100
committer	Andreas Gruenbacher <agruenba@redhat.com>	2019-06-27 21:04:07 +0200
commit	d14e1ca305fc27dbceabad64bf5158b35d8864c8 (patch)
tree	c795c802e098945407c7976c3dabb8546da7eec2 /fs/gfs2
parent	gfs2: log which portion of the journal is replayed (diff)
download	linux-d14e1ca305fc27dbceabad64bf5158b35d8864c8.tar.xz linux-d14e1ca305fc27dbceabad64bf5158b35d8864c8.zip