mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns

Commit 5d1904204c99 ("mremap: fix race between mremap() and page cleanning") fixed races between mremap and other operations for both file-backed and anonymous mappings. The file-backed was the most critical as it allowed the possibility that data could be changed on a physical page after page_mkclean returned which could trigger data loss or data integrity issues. A customer reported that the cost of the TLBs for anonymous regressions was excessive and resulting in a 30-50% drop in performance overall since this commit on a microbenchmark. Unfortunately I neither have access to the test-case nor can I describe what it does other than saying that mremap operations dominate heavily. This patch removes the LATENCY_LIMIT to handle TLB flushes on a PMD boundary instead of every 64 pages to reduce the number of TLB shootdowns by a factor of 8 in the ideal case. LATENCY_LIMIT was almost certainly used originally to limit the PTL hold times but the latency savings are likely offset by the cost of IPIs in many cases. This patch is not reported to completely restore performance but gets it within an acceptable percentage. The given metric here is simply described as "higher is better". Baseline that was known good 002: Metric: 91.05 004: Metric: 109.45 008: Metric: 73.08 016: Metric: 58.14 032: Metric: 61.09 064: Metric: 57.76 128: Metric: 55.43 Current 001: Metric: 54.98 002: Metric: 56.56 004: Metric: 41.22 008: Metric: 35.96 016: Metric: 36.45 032: Metric: 35.71 064: Metric: 35.73 128: Metric: 34.96 With patch 001: Metric: 61.43 002: Metric: 81.64 004: Metric: 67.92 008: Metric: 51.67 016: Metric: 50.47 032: Metric: 52.29 064: Metric: 50.01 128: Metric: 49.04 So for low threads, it's not restored but for larger number of threads, it's closer to the "known good" baseline. Using a different mremap-intensive workload that is not representative of the real workload there is little difference observed outside of noise in the headline metrics However, the TLB shootdowns are reduced by 11% on average and at the peak, TLB shootdowns were reduced by 21%. Interrupts were sampled every second while the workload ran to get those figures. It's known that the figures will vary as the non-representative load is non-deterministic. An alternative patch was posted that should have significantly reduced the TLB flushes but unfortunately it does not perform as well as this version on the customer test case. If revisited, the two patches can stack on top of each other. Link: http://lkml.kernel.org/r/20180606183803.k7qaw2xnbvzshv34@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Mel Gorman <mgorman@techsingularity.net> 2018-06-15 00:26:41 +0200
committer: Linus Torvalds <torvalds@linux-foundation.org> 2018-06-15 00:55:24 +0200
commit: 37a4094e828f3c7673aa9c60f8b2b9d1019db81b (patch)
tree: f02fe812118fbf7bf8e560566ca6e83574a3c2a5 /mm
parent: mm/memblock: add missing include <linux/bootmem.h> (diff)
download: linux-37a4094e828f3c7673aa9c60f8b2b9d1019db81b.tar.xz
linux-37a4094e828f3c7673aa9c60f8b2b9d1019db81b.zip
1 files changed, 0 insertions, 4 deletions
diff --git a/mm/mremap.c b/mm/mremap.c
index 049470aa1e3e..5c2e18505f75 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -191,8 +191,6 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		drop_rmap_locks(vma);
 }
 
-#define LATENCY_LIMIT	(64 * PAGE_SIZE)
-
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
@@ -247,8 +245,6 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		next = (new_addr + PMD_SIZE) & PMD_MASK;
 		if (extent > next - new_addr)
 			extent = next - new_addr;
-		if (extent > LATENCY_LIMIT)
-			extent = LATENCY_LIMIT;
 		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
 			  new_pmd, new_addr, need_rmap_locks, &need_flush);
 	}
author	Mel Gorman <mgorman@techsingularity.net>	2018-06-15 00:26:41 +0200
committer	Linus Torvalds <torvalds@linux-foundation.org>	2018-06-15 00:55:24 +0200
commit	37a4094e828f3c7673aa9c60f8b2b9d1019db81b (patch)
tree	f02fe812118fbf7bf8e560566ca6e83574a3c2a5 /mm
parent	mm/memblock: add missing include <linux/bootmem.h> (diff)
download	linux-37a4094e828f3c7673aa9c60f8b2b9d1019db81b.tar.xz linux-37a4094e828f3c7673aa9c60f8b2b9d1019db81b.zip