author | Darrick J. Wong <djwong@kernel.org> | 2024-05-22 08:01:45 +0200 |
---|---|---|
committer | Chandan Babu R <chandanbabu@kernel.org> | 2024-05-27 12:25:52 +0200 |
commit | 2b3f004d3d518ec7a392066d935fd85c81412e33 (patch) | |
tree | 3fc22e3dbb79ba0044fdebe16978dca9649ab704 /fs/xfs | |
parent | xfs: Stop using __maybe_unused in xfs_alloc.c (diff) | |
xfs: drop xfarray sortinfo folio on error
Chandan Babu reports the following livelock in xfs/708:
run fstests xfs/708 at 2024-05-04 15:35:29
XFS (loop16): EXPERIMENTAL online scrub feature in use. Use at your own risk!
XFS (loop5): Mounting V5 Filesystem e96086f0-a2f9-4424-a1d5-c75d53d823be
XFS (loop5): Ending clean mount
XFS (loop5): Quotacheck needed: Please wait.
XFS (loop5): Quotacheck: Done.
XFS (loop5): EXPERIMENTAL online scrub feature in use. Use at your own risk!
INFO: task xfs_io:143725 blocked for more than 122 seconds.
Not tainted 6.9.0-rc4+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:xfs_io state:D stack:0 pid:143725 tgid:143725 ppid:117661 flags:0x00004006
Call Trace:
<TASK>
__schedule+0x69c/0x17a0
schedule+0x74/0x1b0
io_schedule+0xc4/0x140
folio_wait_bit_common+0x254/0x650
shmem_undo_range+0x9d5/0xb40
shmem_evict_inode+0x322/0x8f0
evict+0x24e/0x560
__dentry_kill+0x17d/0x4d0
dput+0x263/0x430
__fput+0x2fc/0xaa0
task_work_run+0x132/0x210
get_signal+0x1a8/0x1910
arch_do_signal_or_restart+0x7b/0x2f0
syscall_exit_to_user_mode+0x1c2/0x200
do_syscall_64+0x72/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The shmem code is trying to drop all the folios attached to a shmem
file and gets stuck on a locked folio after a bnobt repair. It looks
like the process has a signal pending, so I started looking for places
where we lock an xfile folio and then deal with a fatal signal.
I found a bug in xfarray_sort_scan via code inspection. This function
is called to set up the scanning phase of a quicksort operation, which
may involve grabbing a locked xfile folio. If we exit the function with
an error code, the caller does not call xfarray_sort_scan_done to put
the xfile folio. If _sort_scan returns an error code while si->folio is
set, we leak the reference and never unlock the folio.
Therefore, change xfarray_sort to call _scan_done on exit. This is safe
to call multiple times because it sets si->folio to NULL and ignores a
NULL si->folio. Also change _sort_scan to use an intermediate variable
so that we never pollute si->folio with an errptr.
Fixes: 232ea052775f9 ("xfs: enable sorting of xfile-backed arrays")
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Diffstat (limited to 'fs/xfs')
-rw-r--r-- | fs/xfs/scrub/xfarray.c | 9 |
1 files changed, 6 insertions, 3 deletions
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 9185ae7088d4..cdd13ed9c569 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -822,12 +822,14 @@ xfarray_sort_scan(
 	/* Grab the first folio that backs this array element. */
 	if (!si->folio) {
+		struct folio	*folio;
 		loff_t		next_pos;
-		si->folio = xfile_get_folio(si->array->xfile, idx_pos,
+		folio = xfile_get_folio(si->array->xfile, idx_pos,
 				si->array->obj_size, XFILE_ALLOC);
-		if (IS_ERR(si->folio))
-			return PTR_ERR(si->folio);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
+		si->folio = folio;
 		si->first_folio_idx = xfarray_idx(si->array,
 				folio_pos(si->folio) + si->array->obj_size - 1);
@@ -1048,6 +1050,7 @@ xfarray_sort(
 out_free:
 	trace_xfarray_sort_stats(si, error);
+	xfarray_sort_scan_done(si);
 	kvfree(si);
 	return error;
 }
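The two ideas in this patch (keep an errptr out of the cached folio pointer, and make the release helper safe to call unconditionally on the exit path) can be modelled in userspace. The following is only a sketch under stated assumptions, not the kernel code: `struct folio` is reduced to a lock flag and a refcount, the `ERR_PTR`/`IS_ERR`/`PTR_ERR` helpers are simplified stand-ins for the kernel macros, and every `model_*` name and the `fail_lookup`/`fail_later` knobs are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's errptr helpers. */
#define MAX_ERRNO	4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical model of a locked, refcounted folio. */
struct folio {
	int	locked;
	int	refcount;
};

/* Hypothetical model of the sort state: folio is NULL or a held folio. */
struct model_sortinfo {
	struct folio	*folio;
};

static struct folio the_folio;

/* Hypothetical lookup: returns a locked folio, or an errptr on failure. */
static struct folio *model_get_folio(int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	the_folio.locked = 1;
	the_folio.refcount++;
	return &the_folio;
}

/* Idempotent release: a NULL folio is ignored, and the pointer is cleared. */
static void model_scan_done(struct model_sortinfo *si)
{
	if (!si->folio)
		return;
	assert(si->folio->locked);
	si->folio->locked = 0;
	si->folio->refcount--;
	si->folio = NULL;
}

/* May return an error either before or after si->folio has been set. */
static int model_scan(struct model_sortinfo *si, int fail_lookup,
		int fail_later)
{
	if (!si->folio) {
		/* Use a local so si->folio never holds an errptr. */
		struct folio *folio = model_get_folio(fail_lookup);

		if (IS_ERR(folio))
			return (int)PTR_ERR(folio);
		si->folio = folio;
	}
	if (fail_later)
		return -EINTR;	/* error exit while the folio is still held */
	return 0;
}

static int model_sort(int fail_lookup, int fail_later)
{
	struct model_sortinfo si = { .folio = NULL };
	int error;

	error = model_scan(&si, fail_lookup, fail_later);
	if (error)
		goto out_free;
	model_scan_done(&si);	/* normal-path release */
out_free:
	model_scan_done(&si);	/* safe even if already released */
	return error;
}
```

In this model, dropping the `model_scan_done()` call under `out_free` leaves `the_folio` locked with a nonzero refcount whenever `fail_later` is set, which corresponds to the leak described in the commit message. Because the release helper clears the pointer and ignores NULL, the exit path does not need to track whether the folio was already released.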