xfs: properly serialise fallocate against AIO+DIO

AIO+DIO can extend the file size on IO completion, and it holds
no inode locks while the IO is in flight. Therefore, a race
condition exists in file size updates if we do something like this:

aio-thread                      fallocate-thread

lock inode
submit IO beyond inode->i_size
unlock inode
.....
                                lock inode
                                break layouts
                                if (off + len > inode->i_size)
                                        new_size = off + len
                                .....
                                inode_dio_wait()
                                <blocks>
.....
completes
inode->i_size updated
inode_dio_done()
....
                                <wakes>
                                <does stuff no longer beyond EOF>
                                if (new_size)
                                        xfs_vn_setattr(inode, new_size)

Yup, that attempt to extend the file size in the fallocate code
turns into a truncate - it removes whatever the aio write
allocated and put to disk, and reduces the inode size back down to
where the fallocate operation ends.

Fundamentally, xfs_file_fallocate() is not compatible with racing
AIO+DIO completions, so we need to move the inode_dio_wait() call
up to where we lock the inode and break the layouts.

Secondly, storing the inode size and then using it unchecked without
holding the ILOCK is not safe; we can only do such a thing if we've
locked out and drained all IO and other modification operations,
which we don't do initially in xfs_file_fallocate.

It should be noted that some of the fallocate operations are
compound operations - they are made up of multiple manipulations
that may zero data, and so we may need to flush and invalidate the
file multiple times during an operation. However, we only need to
lock out IO and other space manipulation operations once, as that
lockout is maintained until the entire fallocate operation has been
completed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
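The ordering problem described above can be illustrated with a small, single-threaded userspace model. This is only a sketch under stated assumptions, not kernel code: struct toy_inode, toy_dio_wait(), toy_fallocate_old() and toy_fallocate_new() are hypothetical names invented for the illustration, and the "wait" simply applies the size update that the pending AIO+DIO completion would have made.

```c
#include <assert.h>

/* Toy stand-in for an inode; NOT the kernel's struct inode. */
struct toy_inode {
	long i_size;        /* current file size in bytes */
	int  dio_in_flight; /* nonzero while a modelled AIO+DIO write is pending */
	long dio_end;       /* end offset the pending write extends the file to */
};

/*
 * Models inode_dio_wait(): by the time it returns, the pending write has
 * completed and its completion handler has updated i_size.
 */
static void toy_dio_wait(struct toy_inode *ip)
{
	if (!ip->dio_in_flight)
		return;
	if (ip->dio_end > ip->i_size)
		ip->i_size = ip->dio_end;
	ip->dio_in_flight = 0;
}

/* Old ordering: sample i_size first, drain DIO later, then apply new_size. */
static long toy_fallocate_old(struct toy_inode *ip, long off, long len)
{
	long new_size = 0;

	assert(off >= 0 && len > 0);
	if (off + len > ip->i_size)
		new_size = off + len;   /* decided against a stale size */
	toy_dio_wait(ip);               /* i_size may grow here */
	if (new_size)
		ip->i_size = new_size;  /* may shrink the file: a truncate */
	return ip->i_size;
}

/* New ordering: drain DIO up front, then sample i_size. */
static long toy_fallocate_new(struct toy_inode *ip, long off, long len)
{
	long new_size = 0;

	assert(off >= 0 && len > 0);
	toy_dio_wait(ip);               /* no IO in flight past this point */
	if (off + len > ip->i_size)
		new_size = off + len;
	if (new_size)
		ip->i_size = new_size;  /* can only extend the file */
	return ip->i_size;
}
```

With i_size at 4096 and a pending write ending at 1048576, an 8192-byte fallocate at offset 0 leaves the file at 8192 bytes under the old ordering (the AIO data is cut off) and at 1048576 bytes under the new one.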
author: Dave Chinner <dchinner@redhat.com> 2019-10-29 21:04:32 +0100
committer: Darrick J. Wong <darrick.wong@oracle.com> 2019-10-31 17:17:55 +0100
commit: 249bd9087a5264d2b8a974081870e2e27671b4dc
tree: b1d0df80c930dea916e409b2c90100267ed3ccc4 /fs/xfs/xfs_bmap_util.c
parent: xfs: merge xfs_showargs into xfs_fs_show_options
1 files changed, 1 insertions, 7 deletions
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index b16081c9d646..036719b29461 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -930,6 +930,7 @@ out_trans_cancel:
 	goto out_unlock;
 }
 
+/* Caller must first wait for the completion of any pending DIOs if required. */
 int
 xfs_flush_unmap_range(
 	struct xfs_inode	*ip,
@@ -941,9 +942,6 @@ xfs_flush_unmap_range(
 	xfs_off_t		rounding, start, end;
 	int			error;
 
-	/* wait for the completion of any pending DIOs */
-	inode_dio_wait(inode);
-
 	rounding = max_t(xfs_off_t, 1 << mp->m_sb.sb_blocklog, PAGE_SIZE);
 	start = round_down(offset, rounding);
 	end = round_up(offset + len, rounding) - 1;
@@ -975,10 +973,6 @@ xfs_free_file_space(
 	if (len <= 0)	/* if nothing being freed */
 		return 0;
 
-	error = xfs_flush_unmap_range(ip, offset, len);
-	if (error)
-		return error;
-
 	startoffset_fsb = XFS_B_TO_FSB(mp, offset);
 	endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);