slub: make dead caches discard free slabs immediately

To speed up further allocations SLUB may store empty slabs in per cpu/node partial lists instead of freeing them immediately. This prevents per memcg caches destruction, because kmem caches created for a memory cgroup are only destroyed after the last page charged to the cgroup is freed. To fix this issue, this patch resurrects approach first proposed in [1]. It forbids SLUB to cache empty slabs after the memory cgroup that the cache belongs to was destroyed. It is achieved by setting kmem_cache's cpu_partial and min_partial constants to 0 and tuning put_cpu_partial() so that it would drop frozen empty slabs immediately if cpu_partial = 0. The runtime overhead is minimal. From all the hot functions, we only touch relatively cold put_cpu_partial(): we make it call unfreeze_partials() after freezing a slab that belongs to an offline memory cgroup. Since slab freezing exists to avoid moving slabs from/to a partial list on free/alloc, and there can't be allocations from dead caches, it shouldn't cause any overhead. We do have to disable preemption for put_cpu_partial() to achieve that though. The original patch was accepted well and even merged to the mm tree. However, I decided to withdraw it due to changes happening to the memcg core at that time. I had an idea of introducing per-memcg shrinkers for kmem caches, but now, as memcg has finally settled down, I do not see it as an option, because SLUB shrinker would be too costly to call since SLUB does not keep free slabs on a separate list. Besides, we currently do not even call per-memcg shrinkers for offline memcgs. Overall, it would introduce much more complexity to both SLUB and memcg than this small patch. Regarding to SLAB, there's no problem with it, because it shrinks per-cpu/node caches periodically. Thanks to list_lru reparenting, we no longer keep entries for offline cgroups in per-memcg arrays (such as memcg_cache_params->memcg_caches), so we do not have to bother if a per-memcg cache will be shrunk a bit later than it could be. [1] http://thread.gmane.org/gmane.linux.kernel.mm/118649/focus=118650 Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Vladimir Davydov <vdavydov@parallels.com> 2015-02-12 23:59:47 +0100
committer: Linus Torvalds <torvalds@linux-foundation.org> 2015-02-13 03:54:10 +0100
commit: d6e0b7fa11862433773d986b5f995ffdf47ce672 (patch)
tree: 031830bb978d8861c3089941480de8effe9ccc6a /mm/slab.c
parent: slub: fix kmem_cache_shrink return value (diff)
download: linux-d6e0b7fa11862433773d986b5f995ffdf47ce672.tar.xz
linux-d6e0b7fa11862433773d986b5f995ffdf47ce672.zip
1 files changed, 2 insertions, 2 deletions
diff --git a/mm/slab.c b/mm/slab.c
index 7894017bc160..c4b89eaf4c96 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2382,7 +2382,7 @@ out:
 	return nr_freed;
 }
 
-int __kmem_cache_shrink(struct kmem_cache *cachep)
+int __kmem_cache_shrink(struct kmem_cache *cachep, bool deactivate)
 {
 	int ret = 0;
 	int node;
@@ -2404,7 +2404,7 @@ int __kmem_cache_shutdown(struct kmem_cache *cachep)
 {
 	int i;
 	struct kmem_cache_node *n;
-	int rc = __kmem_cache_shrink(cachep);
+	int rc = __kmem_cache_shrink(cachep, false);
 
 	if (rc)
 		return rc;
author	Vladimir Davydov <vdavydov@parallels.com>	2015-02-12 23:59:47 +0100
committer	Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 03:54:10 +0100
commit	d6e0b7fa11862433773d986b5f995ffdf47ce672 (patch)
tree	031830bb978d8861c3089941480de8effe9ccc6a /mm/slab.c
parent	slub: fix kmem_cache_shrink return value (diff)
download	linux-d6e0b7fa11862433773d986b5f995ffdf47ce672.tar.xz linux-d6e0b7fa11862433773d986b5f995ffdf47ce672.zip