xfs: add missing ilock around dio write last extent alignment

The iomap codepath (via get_blocks()) acquires and release the inode lock in the case of a direct write that requires block allocation. This is because xfs_iomap_write_direct() allocates a transaction, which means the ilock must be dropped and reacquired after the transaction is allocated and reserved. xfs_iomap_write_direct() invokes xfs_iomap_eof_align_last_fsb() before the transaction is created and thus before the ilock is reacquired. This can lead to calls to xfs_iread_extents() and reads of the in-core extent list without any synchronization (via xfs_bmap_eof() and xfs_bmap_last_extent()). xfs_iread_extents() assert fails if the ilock is not held, but this is not currently seen in practice as the current callers had already invoked xfs_bmapi_read(). What has been seen in practice are reports of crashes down in the xfs_bmap_eof() codepath on direct writes due to seemingly bogus pointer references from xfs_iext_get_ext(). While an explicit reproducer is not currently available to confirm the cause of the problem, crash analysis and code inspection from David Jeffrey had identified the insufficient locking. xfs_iomap_eof_align_last_fsb() is called from other contexts with the inode lock already held, so we cannot acquire it therein. __xfs_get_blocks() acquires and drops the ilock with variable flags to cover the event that the extent list must be read in. The common case is that __xfs_get_blocks() acquires the shared ilock. To provide locking around the last extent alignment call without adding more lock cycles to the dio path, update xfs_iomap_write_direct() to expect the shared ilock held on entry and do the extent alignment under its protection. Demote the lock, if necessary, from __xfs_get_blocks() and push the xfs_qm_dqattach() call outside of the shared lock critical section. Also, add an assert to document that the extent list is always expected to be present in this path. Otherwise, we risk a call to xfs_iread_extents() while under the shared ilock. This is safe as all current callers have executed an xfs_bmapi_read() call under the current iolock context. Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
author: Brian Foster <bfoster@redhat.com> 2015-10-12 06:34:20 +0200
committer: Dave Chinner <david@fromorbit.com> 2015-10-12 06:34:20 +0200
commit: 009c6e871e98aa23bc2e58474c3d9feb05dd1ae6 (patch)
tree: 5af6b1c62320e3643bfbc895f9ee9fbe93103ba4 /fs/xfs/xfs_pnfs.c
parent: cancel the setfilesize transation when io error happen (diff)
download: linux-009c6e871e98aa23bc2e58474c3d9feb05dd1ae6.tar.xz
linux-009c6e871e98aa23bc2e58474c3d9feb05dd1ae6.zip
1 files changed, 5 insertions, 0 deletions
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index ab4a6066f7ca..dc6221942b85 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -181,6 +181,11 @@ xfs_fs_map_blocks(
 		ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
 
 		if (!nimaps || imap.br_startblock == HOLESTARTBLOCK) {
+			/*
+			 * xfs_iomap_write_direct() expects to take ownership of
+			 * the shared ilock.
+			 */
+			xfs_ilock(ip, XFS_ILOCK_SHARED);
 			error = xfs_iomap_write_direct(ip, offset, length,
 						       &imap, nimaps);
 			if (error)
author	Brian Foster <bfoster@redhat.com>	2015-10-12 06:34:20 +0200
committer	Dave Chinner <david@fromorbit.com>	2015-10-12 06:34:20 +0200
commit	009c6e871e98aa23bc2e58474c3d9feb05dd1ae6 (patch)
tree	5af6b1c62320e3643bfbc895f9ee9fbe93103ba4 /fs/xfs/xfs_pnfs.c
parent	cancel the setfilesize transation when io error happen (diff)
download	linux-009c6e871e98aa23bc2e58474c3d9feb05dd1ae6.tar.xz linux-009c6e871e98aa23bc2e58474c3d9feb05dd1ae6.zip