author		Ryan Roberts <ryan.roberts@arm.com>	2024-04-12 15:19:08 +0200
committer	Will Deacon <will@kernel.org>		2024-04-12 17:45:05 +0200
commit		0e9df1c905d8293d333ace86c13d147382f5caf9
tree		5aba9294b7fff5af242c040ecb733a410c44997e /arch/arm64/include/asm/pgtable.h
parent		arm64: mm: Batch dsb and isb when populating pgtables
arm64: mm: Don't remap pgtables for allocate vs populate
During linear map pgtable creation, each pgtable is fixmapped/fixunmapped
twice; once during allocation to zero the memory, and again during
population to write the entries. This means each table has 2 TLB
invalidations issued against it. Let's fix this so that each table is
only fixmapped/fixunmapped once, halving the number of TLBIs, and
improving performance.

Achieve this by separating allocation and initialization (zeroing) of
the page. The allocated page is now fixmapped directly by the walker and
initialized, before being populated and finally fixunmapped.

This approach keeps the change small, but has the side effect that late
allocations (using __get_free_page()) must also go through the generic
memory clearing routine. So let's tell __get_free_page() not to zero the
memory to avoid duplication.

Additionally this approach means that fixmap/fixunmap is still used for
late pgtable modifications. That's not technically needed since the
memory is all mapped in the linear map by that point. That's left as a
possible future optimization if found to be needed.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |   ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |    11   (0%)|   161   (0%)|   656   (0%)|  1654   (0%)
after          |    10 (-11%)|   104 (-35%)|   438 (-33%)|  1223 (-26%)

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240412131908.433043-4-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
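In pseudo-C, the shape of the change is roughly as follows. This is a
minimal sketch of the flow the changelog describes, not the exact mmu.c
code: the helper names pgtable_alloc_zeroed(), walker_init_and_populate()
and init_clear_pgtable() are illustrative stand-ins, and only
pte_set_fixmap()/pte_clear_fixmap() and memblock_phys_alloc() are real
kernel interfaces.

	/*
	 * Before: the allocator fixmapped the fresh table just to zero
	 * it, and the walker fixmapped it a second time to write the
	 * entries, costing two TLBIs per table.
	 */
	static phys_addr_t pgtable_alloc_zeroed(void)
	{
		phys_addr_t phys = memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
		void *ptr = pte_set_fixmap(phys);

		memset(ptr, 0, PAGE_SIZE);
		pte_clear_fixmap();		/* TLBI #1 */
		return phys;			/* walker maps again: TLBI #2 */
	}

	/*
	 * After: allocation returns uninitialized memory; the walker
	 * fixmaps the table once, zeroes it in place, writes the entries
	 * and unmaps, costing a single TLBI per table. Late allocations
	 * via __get_free_page() correspondingly drop __GFP_ZERO so the
	 * page is not cleared twice.
	 */
	static void walker_init_and_populate(phys_addr_t phys)
	{
		pte_t *ptep = pte_set_fixmap(phys);

		init_clear_pgtable(ptep);	/* zero through the fixmap */
		/* ... write the new entries through ptep ... */
		pte_clear_fixmap();		/* the only TLBI */
	}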
Diffstat (limited to 'arch/arm64/include/asm/pgtable.h')
 arch/arm64/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 105a95a8845c..92c9aed5e7af 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1010,6 +1010,8 @@ static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
 
 static inline bool pgtable_l5_enabled(void) { return false; }
 
+#define p4d_index(addr)		(((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
+
 /* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
 #define p4d_set_fixmap(addr)			NULL
 #define p4d_set_fixmap_offset(p4dp, addr)	((p4d_t *)p4dp)
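The two added lines define p4d_index() for the folded-p4d configuration
(where pgtable_l5_enabled() returns false): shifting the address right by
P4D_SHIFT discards the lower-level offset bits, and the mask keeps only
the bits that select one of the PTRS_PER_P4D slots. A standalone
illustration of the arithmetic follows; the shift and table-size values
are made up for the example, since the real ones depend on the VA and
page-size configuration.

	#include <stdint.h>
	#include <stdio.h>

	/* Illustrative stand-in values, not arm64's actual configuration. */
	#define P4D_SHIFT	39
	#define PTRS_PER_P4D	512

	/* Same expression as the macro added by the patch. */
	#define p4d_index(addr)	(((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))

	int main(void)
	{
		uint64_t addr = 0xffff800080000000ULL;

		/* With the values above, bits [47:39] select the slot,
		 * so this prints 256. */
		printf("p4d_index(%#llx) = %llu\n",
		       (unsigned long long)addr,
		       (unsigned long long)p4d_index(addr));
		return 0;
	}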