diff options
author | Shakeel Butt <shakeelb@google.com> | 2022-03-22 22:40:28 +0100 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-03-22 23:57:02 +0100 |
commit | c9afe31ec443ea6d81d556159abc7ef0bc462ac0 (patch) | |
tree | 6ba30ac979e9aff11107a46069c7982c787de853 /mm/memcontrol.c | |
parent | selftests: memcg: test high limit for single entry allocation (diff) | |
download | linux-c9afe31ec443ea6d81d556159abc7ef0bc462ac0.tar.xz linux-c9afe31ec443ea6d81d556159abc7ef0bc462ac0.zip |
memcg: synchronously enforce memory.high for large overcharges
The high limit is used to throttle the workload without invoking the
oom-killer. Recently we tried to use the high limit to right size our
internal workloads. More specifically dynamically adjusting the limits
of the workload without letting the workload get oom-killed. However
due to the limitation of the implementation of high limit enforcement,
we observed the mechanism fails for some real workloads.
The high limit is enforced on return-to-userspace i.e. the kernel let
the usage goes over the limit and when the execution returns to
userspace, the high reclaim is triggered and the process can get
throttled as well. However this mechanism fails for workloads which do
large allocations in a single kernel entry e.g. applications that
mlock() a large chunk of memory in a single syscall. Such applications
bypass the high limit and can trigger the oom-killer.
To make high limit enforcement more robust, this patch makes the limit
enforcement synchronous only if the accumulated overcharge becomes
larger than MEMCG_CHARGE_BATCH. So, most of the allocations would still
be throttled on the return-to-userspace path but only the extreme
allocations which accumulates large amount of overcharge without
returning to the userspace will be throttled synchronously. The value
MEMCG_CHARGE_BATCH is a bit arbitrary but most of other places in the
memcg codebase uses this constant therefore for now uses the same one.
Link: https://lkml.kernel.org/r/20220211064917.2028469-5-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/memcontrol.c')
-rw-r--r-- | mm/memcontrol.c | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 0e8a58d6e374..17398e7601f6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2704,6 +2704,11 @@ done_restock: } } while ((memcg = parent_mem_cgroup(memcg))); + if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH && + !(current->flags & PF_MEMALLOC) && + gfpflags_allow_blocking(gfp_mask)) { + mem_cgroup_handle_over_high(); + } return 0; } |