diff options
author | Nadav Amit <namit@vmware.com> | 2022-05-10 03:20:50 +0200 |
---|---|---|
committer | Andrew Morton <akpm@linux-foundation.org> | 2022-05-13 16:20:05 +0200 |
commit | 4f83145721f362c2f4d312edc4755269a2069488 (patch) | |
tree | 4a13cb87519b0bd3659d91f6f3a19df4575d4f96 /mm | |
parent | mm/mprotect: do not flush when not required architecturally (diff) | |
download | linux-4f83145721f362c2f4d312edc4755269a2069488.tar.xz linux-4f83145721f362c2f4d312edc4755269a2069488.zip |
mm: avoid unnecessary flush on change_huge_pmd()
Calls to change_protection_range() on THP can trigger, at least on x86,
two TLB flushes for one page: one immediately, when pmdp_invalidate() is
called by change_huge_pmd(), and then another one later (that can be
batched) when change_protection_range() finishes.
The first TLB flush is only necessary to prevent the dirty bit (and with a
lesser importance the access bit) from changing while the PTE is modified.
However, this is not necessary as the x86 CPUs set the dirty-bit
atomically with an additional check that the PTE is (still) present. One
caveat is Intel's Knights Landing that has a bug and does not do so.
Leverage this behavior to eliminate the unnecessary TLB flush in
change_huge_pmd(). Introduce a new arch specific pmdp_invalidate_ad()
that only invalidates the access and dirty bit from further changes.
Link: https://lkml.kernel.org/r/20220401180821.1986781-4-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'mm')
-rw-r--r-- | mm/huge_memory.c | 4 | ||||
-rw-r--r-- | mm/pgtable-generic.c | 8 |
2 files changed, 10 insertions, 2 deletions
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2befa9cfb46e..6f37f77eb48c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1801,10 +1801,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, * The race makes MADV_DONTNEED miss the huge pmd and don't clear it * which may break userspace. * - * pmdp_invalidate() is required to make sure we don't miss + * pmdp_invalidate_ad() is required to make sure we don't miss * dirty/young flags set by hardware. */ - oldpmd = pmdp_invalidate(vma, addr, pmd); + oldpmd = pmdp_invalidate_ad(vma, addr, pmd); entry = pmd_modify(oldpmd, newprot); if (preserve_write) diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 6523fda274e5..90ab721a12a8 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -201,6 +201,14 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, } #endif +#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD +pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp) +{ + return pmdp_invalidate(vma, address, pmdp); +} +#endif + #ifndef pmdp_collapse_flush pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) |