diff options
author | Ryan Roberts <ryan.roberts@arm.com> | 2024-04-12 15:19:08 +0200 |
---|---|---|
committer | Will Deacon <will@kernel.org> | 2024-04-12 17:45:05 +0200 |
commit | 0e9df1c905d8293d333ace86c13d147382f5caf9 (patch) | |
tree | 5aba9294b7fff5af242c040ecb733a410c44997e /arch/arm64/include/asm/pgtable.h | |
parent | arm64: mm: Batch dsb and isb when populating pgtables (diff) | |
download | linux-0e9df1c905d8293d333ace86c13d147382f5caf9.tar.xz linux-0e9df1c905d8293d333ace86c13d147382f5caf9.zip |
arm64: mm: Don't remap pgtables for allocate vs populate
During linear map pgtable creation, each pgtable is fixmapped /
fixunmapped twice; once during allocation to zero the memory, and a
again during population to write the entries. This means each table has
2 TLB invalidations issued against it. Let's fix this so that each table
is only fixmapped/fixunmapped once, halving the number of TLBIs, and
improving performance.
Achieve this by separating allocation and initialization (zeroing) of
the page. The allocated page is now fixmapped directly by the walker and
initialized, before being populated and finally fixunmapped.
This approach keeps the change small, but has the side effect that late
allocations (using __get_free_page()) must also go through the generic
memory clearing routine. So let's tell __get_free_page() not to zero the
memory to avoid duplication.
Additionally this approach means that fixmap/fixunmap is still used for
late pgtable modifications. That's not technically needed since the
memory is all mapped in the linear map by that point. That's left as a
possible future optimization if found to be needed.
Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:
| Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
| VM, 16G | VM, 64G | VM, 256G | Metal, 512G
---------------|-------------|-------------|-------------|-------------
| ms (%) | ms (%) | ms (%) | ms (%)
---------------|-------------|-------------|-------------|-------------
before | 11 (0%) | 161 (0%) | 656 (0%) | 1654 (0%)
after | 10 (-11%) | 104 (-35%) | 438 (-33%) | 1223 (-26%)
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240412131908.433043-4-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Diffstat (limited to 'arch/arm64/include/asm/pgtable.h')
-rw-r--r-- | arch/arm64/include/asm/pgtable.h | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 105a95a8845c..92c9aed5e7af 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -1010,6 +1010,8 @@ static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr) static inline bool pgtable_l5_enabled(void) { return false; } +#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1)) + /* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */ #define p4d_set_fixmap(addr) NULL #define p4d_set_fixmap_offset(p4dp, addr) ((p4d_t *)p4dp) |