summaryrefslogtreecommitdiffstats
path: root/drivers/md/md.c
diff options
context:
space:
mode:
authorColy Li <colyli@suse.de>2020-04-09 16:17:20 +0200
committerSong Liu <songliubraving@fb.com>2020-05-13 20:22:31 +0200
commit78f57ef9d50a75326da73d352d7c27828495229a (patch)
tree862bd498328a72db8b0b74b5d2948805e558829f /drivers/md/md.c
parentmd: remove the extra line for ->hot_add_disk (diff)
downloadlinux-78f57ef9d50a75326da73d352d7c27828495229a.tar.xz
linux-78f57ef9d50a75326da73d352d7c27828495229a.zip
md: use memalloc scope APIs in mddev_suspend()/mddev_resume()
In raid5.c:resize_chunk(), scribble_alloc() is called with GFP_NOIO flag, then it is sent into kvmalloc_array() inside scribble_alloc(). The problem is kvmalloc_array() eventually calls kvmalloc_node() which does not accept non GFP_KERNEL compatible flag like GFP_NOIO, then kmalloc_node() is called indeed to allocate physically continuous pages. When system memory is under heavy pressure, and the requesting size is large, there is high probability that allocating continueous pages will fail. But simply using GFP_KERNEL flag to call kvmalloc_array() is also progblematic. In the code path where scribble_alloc() is called, the raid array is suspended, if kvmalloc_node() triggers memory reclaim I/Os and such I/Os go back to the suspend raid array, deadlock will happen. What is desired here is to allocate non-physically (a.k.a virtually) continuous pages and avoid memory reclaim I/Os. Michal Hocko suggests to use the mmealloc sceope APIs to restrict memory reclaim I/O in allocating context, specifically to call memalloc_noio_save() when suspend the raid array and to call memalloc_noio_restore() when resume the raid array. This patch adds the memalloc scope APIs in mddev_suspend() and mddev_resume(), to restrict memory reclaim I/Os during the raid array is suspended. The benifit of adding the memalloc scope API in the unified entry point mddev_suspend()/mddev_resume() is, no matter which md raid array type (personality), we are sure the deadlock by recursive memory reclaim I/O won't happen on the suspending context. Please notice that the memalloc scope APIs only take effect on the raid array suspending context, if the memory allocation is from another new created kthread after raid array suspended, the recursive memory reclaim I/Os won't be restricted. The mddev_suspend()/mddev_resume() entries are used for the critical section where the raid metadata is modifying, creating a kthread to allocate memory inside the critical section is queer and very probably being buggy. Fixes: b330e6a49dc3 ("md: convert to kvmalloc") Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Song Liu <songliubraving@fb.com>
Diffstat (limited to 'drivers/md/md.c')
-rw-r--r--drivers/md/md.c4
1 files changed, 4 insertions, 0 deletions
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8ff690ac6247..1b3316c79b7b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -528,11 +528,15 @@ void mddev_suspend(struct mddev *mddev)
wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
del_timer_sync(&mddev->safemode_timer);
+ /* restrict memory reclaim I/O during raid array is suspend */
+ mddev->noio_flag = memalloc_noio_save();
}
EXPORT_SYMBOL_GPL(mddev_suspend);
void mddev_resume(struct mddev *mddev)
{
+ /* entred the memalloc scope from mddev_suspend() */
+ memalloc_noio_restore(mddev->noio_flag);
lockdep_assert_held(&mddev->reconfig_mutex);
if (--mddev->suspended)
return;