author | Hugh Dickins <hughd@google.com> | 2011-10-19 21:50:35 +0200
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2011-10-20 08:42:58 +0200
commit | 486cf46f3f9be5f2a966016c1a8fe01e32cde09e (patch)
tree | 98a6e2376507dee6ea89a9b0073511c703d940dc /mm
parent | Merge branch 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus (diff)
download | linux-486cf46f3f9be5f2a966016c1a8fe01e32cde09e.tar.xz linux-486cf46f3f9be5f2a966016c1a8fe01e32cde09e.zip
mm: fix race between mremap and removing migration entry
I don't usually pay much attention to the stale "? " addresses in
stack backtraces, but this lucky report from Pawel Sikora hints that
mremap's move_ptes() has inadequate locking against page migration.
3.0's BUG_ON(!PageLocked(p)) in migration_entry_to_page():
kernel BUG at include/linux/swapops.h:105!
RIP: 0010:[<ffffffff81127b76>] [<ffffffff81127b76>]
migration_entry_wait+0x156/0x160
[<ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
[<ffffffff810feee2>] ? __pte_alloc+0x42/0x120
[<ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
[<ffffffff81102a31>] handle_mm_fault+0x181/0x310
[<ffffffff81106097>] ? vma_adjust+0x537/0x570
[<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
[<ffffffff81109a05>] ? do_mremap+0x2d5/0x570
[<ffffffff81421d5f>] page_fault+0x1f/0x30
mremap's down_write of mmap_sem, together with i_mmap_mutex or lock,
and pagetable locks, were good enough before page migration (with its
requirement that every migration entry be found) came in, and enough
while migration always held mmap_sem; but not enough nowadays, when
there's memory hotremove and compaction.
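For reference, this is roughly the locking that move_ptes() does take (a condensed sketch of mm/mremap.c from that era, trimmed for readability and not verbatim); note that nothing in it touches the anon_vma lock:

static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
		unsigned long old_addr, unsigned long old_end,
		struct vm_area_struct *new_vma, pmd_t *new_pmd,
		unsigned long new_addr)
{
	struct mm_struct *mm = vma->vm_mm;
	pte_t *old_pte, *new_pte, pte;
	spinlock_t *old_ptl, *new_ptl;

	if (vma->vm_file)	/* file-backed: rmap walk serialized by i_mmap_mutex */
		mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);

	/* both pagetable locks are held across the whole move */
	old_pte = pte_offset_map_lock(mm, old_pmd, old_addr, &old_ptl);
	new_pte = pte_offset_map(new_pmd, new_addr);
	new_ptl = pte_lockptr(mm, new_pmd);
	if (new_ptl != old_ptl)
		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);

	for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
				   new_pte++, new_addr += PAGE_SIZE) {
		if (pte_none(*old_pte))
			continue;
		/* a migration entry vanishes here and reappears over there */
		pte = ptep_clear_flush(vma, old_addr, old_pte);
		set_pte_at(mm, new_addr, new_pte, pte);
	}
	/* unlocking elided */
}

For anonymous pages there is no i_mmap_mutex to take, so an rmap walker that inspects ptes without first taking the pagetable lock has nothing to serialize against.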
The danger is that move_ptes() lets a migration entry dodge around
behind remove_migration_pte()'s back, so it's in the old location when
looking at the new, then in the new location when looking at the old.
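To make that dodge concrete, here is a hypothetical userspace model of it (my illustration, not kernel code): a "migration entry" moves between two slots under a lock, while a scanner peeks at both slots without taking that lock, in exactly the unlucky order described above.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;	/* models the pte lock */
static int old_slot = 1;	/* 1 means the "migration entry" is here */
static int new_slot = 0;

static void *mover(void *unused)	/* models mremap's move_ptes() */
{
	pthread_mutex_lock(&ptl);
	old_slot = 0;		/* entry leaves the old location... */
	new_slot = 1;		/* ...and lands in the new one */
	pthread_mutex_unlock(&ptl);
	return NULL;
}

int main(void)	/* models remove_migration_pte()'s unlocked peek */
{
	pthread_t t;
	bool seen;

	seen = (new_slot != 0);		/* looks at new: entry still in old */

	pthread_create(&t, NULL, mover, NULL);
	pthread_join(t, NULL);		/* the move happens right here */

	seen = seen || (old_slot != 0);	/* looks at old: entry now in new */

	printf("migration entry found: %s\n", seen ? "yes" : "no, it dodged");
	return 0;
}

The join forces the unlucky interleaving so the miss is deterministic. Taking ptl before either read would serialize the scan against the move, so the entry would always be found in one slot or the other; that is exactly what checking is_swap_pte() only under the lock buys.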
Either mremap's move_ptes() must additionally take the anon_vma lock, or
migration's remove_migration_pte() must stop peeking at is_swap_pte()
before it takes the pagetable lock.
Consensus chooses the latter: we prefer to add overhead to migration
rather than to mremapping, which gets used by JVMs and by exec's stack setup.
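The hunk below simply deletes the unlocked peek; after the patch the is_swap_pte() test is reached only with the lock held, so it is serialized against move_ptes(). Roughly (condensed sketch, not the verbatim result):

	ptep = pte_offset_map(pmd, addr);
	ptl = pte_lockptr(mm, pmd);

	spin_lock(ptl);		/* move_ptes() holds this across the move */
	pte = *ptep;
	if (!is_swap_pte(pte))	/* now checked under the lock */
		goto unlock;

The migration path pays an extra lock/unlock on ptes that turn out not to be migration entries, which is the overhead the consensus above accepts.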
Reported-and-tested-by: Paweł Sikora <pluto@agmk.net>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat
-rw-r--r-- | mm/migrate.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 666e4e677414..14d0a6a632f6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -120,10 +120,10 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
 
 		ptep = pte_offset_map(pmd, addr);
 
-		if (!is_swap_pte(*ptep)) {
-			pte_unmap(ptep);
-			goto out;
-		}
+		/*
+		 * Peek to check is_swap_pte() before taking ptlock?  No, we
+		 * can race mremap's move_ptes(), which skips anon_vma lock.
+		 */
 
 		ptl = pte_lockptr(mm, pmd);
 	}