mm: vmscan: do not iterate all mem cgroups for global direct reclaim

In the current implementation, both kswapd and direct reclaim have to iterate all mem cgroups. This was not a problem before offline mem cgroups could be iterated, but now that offline mem cgroups are included, the iteration can be very time consuming. In our workloads we saw over 400K mem cgroups accumulated in some cases, of which only a few hundred were online memcgs. Although kswapd can help reduce the number of memcgs, direct reclaim still ends up iterating a large number of offline memcgs in some cases. We occasionally experienced responsiveness problems because of this.

A simple test with perf shows it may take around 220ms to iterate 8K memcgs in direct reclaim:

    dd 13873 [011] 578.542919: vmscan:mm_vmscan_direct_reclaim_begin
    dd 13873 [011] 578.758689: vmscan:mm_vmscan_direct_reclaim_end

So for 400K, it may take around 11 seconds to iterate all memcgs.

Here just break the iteration once it reclaims enough pages, as memcg direct reclaim does. This may hurt the fairness among memcgs, but the cached iterator cookie could help to achieve the fairness more or less.

Link: http://lkml.kernel.org/r/1548799877-10949-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Yang Shi <yang.shi@linux.alibaba.com> 2019-03-06 00:48:05 +0100
committer: Linus Torvalds <torvalds@linux-foundation.org> 2019-03-06 06:07:19 +0100
commit: 2bb0f34fe3c1f04196cbcf8aa86b0a9371f6938d (patch)
tree: bd2d2c8cf7bb7d3e76592363e4704927393fcbee /mm
parent: mm: swap: use mem_cgroup_is_root() instead of deferencing css->parent (diff)
1 files changed, 3 insertions, 4 deletions
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a1c2f78cb78c..07a68dcd5f58 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2747,16 +2747,15 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 				   sc->nr_reclaimed - reclaimed);
 
 			/*
-			 * Direct reclaim and kswapd have to scan all memory
-			 * cgroups to fulfill the overall scan target for the
-			 * node.
+			 * Kswapd have to scan all memory cgroups to fulfill
+			 * the overall scan target for the node.
 			 *
 			 * Limit reclaim, on the other hand, only cares about
 			 * nr_to_reclaim pages to be reclaimed and it will
 			 * retry with decreasing priority if one round over the
 			 * whole hierarchy is not sufficient.
 			 */
-			if (!global_reclaim(sc) &&
+			if (!current_is_kswapd() &&
 					sc->nr_reclaimed >= sc->nr_to_reclaim) {
 				mem_cgroup_iter_break(root, memcg);
 				break;
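
The effect of the changed condition can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct scan_model`, `walk_memcgs()`, `count_visited()` and all constants are hypothetical stand-ins chosen for illustration (8K memcgs as in the test above, a few hundred of them online, and an assumed reclaim target of 32 pages). It shows that a non-kswapd caller stops walking as soon as `nr_reclaimed >= nr_to_reclaim`, while a kswapd-like caller still visits every memcg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MEMCGS  8192 /* illustrative: the 8K memcgs from the perf test */
#define NR_ONLINE  300  /* illustrative: "a few hundred" online memcgs */
#define PAGES_EACH 8    /* assumption: pages reclaimed per online memcg */

/* Hypothetical stand-in for the two scan_control fields used in the hunk. */
struct scan_model {
	unsigned long nr_to_reclaim;
	unsigned long nr_reclaimed;
};

/*
 * Walk the memcgs in order, "reclaiming" pages[i] from each one.
 * Mirrors the patched check: only a non-kswapd caller breaks out once
 * the reclaim target is met. Returns the number of memcgs visited.
 */
static size_t walk_memcgs(struct scan_model *sc, const unsigned long *pages,
			  size_t nr_memcgs, bool is_kswapd)
{
	size_t visited = 0;

	assert(sc && pages);
	for (size_t i = 0; i < nr_memcgs; i++) {
		sc->nr_reclaimed += pages[i];
		visited++;
		if (!is_kswapd && sc->nr_reclaimed >= sc->nr_to_reclaim)
			break;
	}
	return visited;
}

/* Build the illustrative scenario and report how many memcgs were walked. */
static size_t count_visited(bool is_kswapd)
{
	static unsigned long pages[NR_MEMCGS];
	struct scan_model sc = { .nr_to_reclaim = 32, .nr_reclaimed = 0 };

	/* Online memcgs come first and yield pages; offline ones yield none. */
	for (size_t i = 0; i < NR_MEMCGS; i++)
		pages[i] = i < NR_ONLINE ? PAGES_EACH : 0;

	return walk_memcgs(&sc, pages, NR_MEMCGS, is_kswapd);
}
```

Under these assumed numbers, `count_visited(false)` stops after 4 memcgs (4 * 8 = 32 pages), whereas `count_visited(true)` walks all 8192. The model always starts from the first memcg; it does not attempt to reproduce the cached iterator position that the changelog relies on for fairness.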