author    Jesper Dangaard Brouer <brouer@redhat.com>	2019-04-12 17:07:48 +0200
committer Alexei Starovoitov <ast@kernel.org>	2019-04-18 04:09:25 +0200
commit    86d231459d6dc9094e70c35c3517f4ef860b2f1e (patch)
tree      ac3d55b7b5731f7e19fd2e27b242135b3fd453d2 /kernel/bpf
parent    bpf: cpumap do bulk allocation of SKBs (diff)
bpf: cpumap memory prefetchw optimizations for struct page
A lot of the performance gain comes from this patch.

While analysing performance overhead it was found that the largest CPU
stalls were caused when touching the struct page area. It is first read
with a READ_ONCE from build_skb_around via page_is_pfmemalloc(), and
when freed written by the page_frag_free() call.

Measurements show that the prefetchw (W) variant operation is needed to
achieve the performance gain. We believe this optimization is twofold:
first, the W-variant saves one step in the cache-coherency protocol;
second, it helps us avoid the non-temporal prefetch HW optimizations and
brings this into all cache levels. It might be worth investigating if
prefetch into L2 will have the same benefit.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
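Not part of the commit itself: below is a minimal userspace C sketch of the
write-intent prefetch idea described above, assuming GCC/Clang builtins. The
fake_page_header struct, the BATCH size and the __builtin_prefetch() call are
illustrative stand-ins; the actual patch uses the kernel's prefetchw() from
<linux/prefetch.h> on the real struct page, as shown in the diff below.

/* prefetchw_sketch.c - build with: gcc -O2 -o prefetchw_sketch prefetchw_sketch.c */
#include <stdio.h>
#include <stdlib.h>

#define BATCH 8	/* illustrative batch size, stands in for CPUMAP_BATCH */

struct fake_page_header {	/* illustrative stand-in for struct page */
	unsigned long flags;
	int refcount;
};

int main(void)
{
	struct fake_page_header *hdrs[BATCH];
	int i;

	for (i = 0; i < BATCH; i++) {
		hdrs[i] = calloc(1, sizeof(*hdrs[i]));
		if (!hdrs[i])
			return 1;
		hdrs[i]->refcount = 1;
	}

	/* Prefetch each header with write intent (rw=1) before the later
	 * read-modify-write, mirroring the prefetchw() loop the patch adds
	 * in cpu_map_kthread_run(). On x86 this may emit PREFETCHW, asking
	 * for the cache line in a writeable state up front.
	 */
	for (i = 0; i < BATCH; i++)
		__builtin_prefetch(hdrs[i], 1, 3);

	/* Later accesses: a read (like page_is_pfmemalloc()) followed by a
	 * write (like the refcount update in page_frag_free()).
	 */
	for (i = 0; i < BATCH; i++) {
		if (!(hdrs[i]->flags & 1UL))
			hdrs[i]->refcount--;
	}

	for (i = 0; i < BATCH; i++) {
		printf("hdr[%d] refcount=%d\n", i, hdrs[i]->refcount);
		free(hdrs[i]);
	}
	return 0;
}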
Diffstat (limited to 'kernel/bpf')
-rw-r--r--  kernel/bpf/cpumap.c | 12
1 file changed, 12 insertions, 0 deletions
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 732d6ced3987..cf727d77c6c6 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -280,6 +280,18 @@ static int cpu_map_kthread_run(void *data)
 		 * consume side valid as no-resize allowed of queue.
 		 */
 		n = ptr_ring_consume_batched(rcpu->queue, frames, CPUMAP_BATCH);
+
+		for (i = 0; i < n; i++) {
+			void *f = frames[i];
+			struct page *page = virt_to_page(f);
+
+			/* Bring struct page memory area to curr CPU. Read by
+			 * build_skb_around via page_is_pfmemalloc(), and when
+			 * freed written by page_frag_free call.
+			 */
+			prefetchw(page);
+		}
+
 		m = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, n, skbs);
 		if (unlikely(m == 0)) {
 			for (i = 0; i < n; i++)