From ac39cf8cb86c45eeac6a592ce0d58f9021a97235 Mon Sep 17 00:00:00 2001 From: "akpm@linux-foundation.org" Date: Wed, 26 May 2010 14:42:46 -0700 Subject: memcg: fix mis-accounting of file mapped racy with migration FILE_MAPPED per memcg of migrated file cache is not properly updated, because our hook in page_add_file_rmap() can't know to which memcg FILE_MAPPED should be counted. Basically, this patch is for fixing the bug but includes some big changes to fix up other messes. Now, at migrating mapped file, events happen in following sequence. 1. allocate a new page. 2. get memcg of an old page. 3. charge ageinst a new page before migration. But at this point, no changes to new page's page_cgroup, no commit for the charge. (IOW, PCG_USED bit is not set.) 4. page migration replaces radix-tree, old-page and new-page. 5. page migration remaps the new page if the old page was mapped. 6. Here, the new page is unlocked. 7. memcg commits the charge for newpage, Mark the new page's page_cgroup as PCG_USED. Because "commit" happens after page-remap, we can count FILE_MAPPED at "5", because we should avoid to trust page_cgroup->mem_cgroup. if PCG_USED bit is unset. (Note: memcg's LRU removal code does that but LRU-isolation logic is used for helping it. When we overwrite page_cgroup->mem_cgroup, page_cgroup is not on LRU or page_cgroup->mem_cgroup is NULL.) We can lose file_mapped accounting information at 5 because FILE_MAPPED is updated only when mapcount changes 0->1. So we should catch it. BTW, historically, above implemntation comes from migration-failure of anonymous page. Because we charge both of old page and new page with mapcount=0, we can't catch - the page is really freed before remap. - migration fails but it's freed before remap or .....corner cases. New migration sequence with memcg is: 1. allocate a new page. 2. mark PageCgroupMigration to the old page. 3. charge against a new page onto the old page's memcg. (here, new page's pc is marked as PageCgroupUsed.) 4. page migration replaces radix-tree, page table, etc... 5. At remapping, new page's page_cgroup is now makrked as "USED" We can catch 0->1 event and FILE_MAPPED will be properly updated. And we can catch SWAPOUT event after unlock this and freeing this page by unmap() can be caught. 7. Clear PageCgroupMigration of the old page. So, FILE_MAPPED will be correctly updated. Then, for what MIGRATION flag is ? Without it, at migration failure, we may have to charge old page again because it may be fully unmapped. "charge" means that we have to dive into memory reclaim or something complated. So, it's better to avoid charge it again. Before this patch, __commit_charge() was working for both of the old/new page and fixed up all. But this technique has some racy condtion around FILE_MAPPED and SWAPOUT etc... Now, the kernel use MIGRATION flag and don't uncharge old page until the end of migration. I hope this change will make memcg's page migration much simpler. This page migration has caused several troubles. Worth to add a flag for simplification. Reviewed-by: Daisuke Nishimura Tested-by: Daisuke Nishimura Reported-by: Daisuke Nishimura Signed-off-by: KAMEZAWA Hiroyuki Cc: Balbir Singh Cc: Christoph Lameter Cc: "Kirill A. Shutemov" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 6 ++++-- include/linux/page_cgroup.h | 5 +++++ 2 files changed, 9 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 05894795fdc1..9411d32840b0 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -90,7 +90,8 @@ int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup) extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem); extern int -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr); +mem_cgroup_prepare_migration(struct page *page, + struct page *newpage, struct mem_cgroup **ptr); extern void mem_cgroup_end_migration(struct mem_cgroup *mem, struct page *oldpage, struct page *newpage); @@ -227,7 +228,8 @@ static inline struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem) } static inline int -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr) +mem_cgroup_prepare_migration(struct page *page, struct page *newpage, + struct mem_cgroup **ptr) { return 0; } diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h index aef22ae2af47..5bb13b3db84d 100644 --- a/include/linux/page_cgroup.h +++ b/include/linux/page_cgroup.h @@ -40,6 +40,7 @@ enum { PCG_USED, /* this object is in use. */ PCG_ACCT_LRU, /* page has been accounted for */ PCG_FILE_MAPPED, /* page is accounted as "mapped" */ + PCG_MIGRATION, /* under page migration */ }; #define TESTPCGFLAG(uname, lname) \ @@ -79,6 +80,10 @@ SETPCGFLAG(FileMapped, FILE_MAPPED) CLEARPCGFLAG(FileMapped, FILE_MAPPED) TESTPCGFLAG(FileMapped, FILE_MAPPED) +SETPCGFLAG(Migration, MIGRATION) +CLEARPCGFLAG(Migration, MIGRATION) +TESTPCGFLAG(Migration, MIGRATION) + static inline int page_cgroup_nid(struct page_cgroup *pc) { return page_to_nid(pc->page); -- cgit v1.2.3