* Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds, 2022-12-17, 55 files, -395/+356)

    Pull x86 mm updates from Dave Hansen:

     "New Feature:
        - Randomize the per-cpu entry areas

      Cleanups:
        - Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it
        - Move to "native" set_memory_rox() helper
        - Clean up pmd_get_atomic() and i386-PAE
        - Remove some unused page table size macros"

    * tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
      x86/mm: Ensure forced page table splitting
      x86/kasan: Populate shadow for shared chunk of the CPU entry area
      x86/kasan: Add helpers to align shadow addresses up and down
      x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
      x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
      x86/mm: Recompute physical address for every page of per-CPU CEA mapping
      x86/mm: Rename __change_page_attr_set_clr(.checkalias)
      x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
      x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
      x86/mm: Add a few comments
      x86/mm: Fix CR3_ADDR_MASK
      x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
      mm: Convert __HAVE_ARCH_P..P_GET to the new style
      mm: Remove pointless barrier() after pmdp_get_lockless()
      x86/mm/pae: Get rid of set_64bit()
      x86_64: Remove pointless set_64bit() usage
      x86/mm/pae: Be consistent with pXXp_get_and_clear()
      x86/mm/pae: Use WRITE_ONCE()
      x86/mm/pae: Don't (ab)use atomic64
      mm/gup: Fix the lockless PMD access
      ...
| * x86/mm: Ensure forced page table splitting (Dave Hansen, 2022-12-15, 1 file, -1/+2)

    There are a few kernel users like kfence that require 4k pages to work
    correctly and do not support large mappings. They use set_memory_4k()
    to break down those large mappings. That, in turn, relies on the
    cpa_data->force_split option to indicate to the set_memory code that it
    should split page tables regardless of whether they need to be.

    But, a recent change added an optimization which would return early if
    a set_memory request came in that did not change permissions. It did
    not consult ->force_split and would mistakenly optimize away the
    splitting that set_memory_4k() needs. This broke kfence.

    Skip the same-permission optimization when ->force_split is set.

    Fixes: 127960a05548 ("x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()")
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Tested-by: Marco Elver <elver@google.com>
    Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/all/CA+G9fYuFxZTxkeS35VTZMXwQvohu73W3xbZ5NtjebsVvH6hCuA@mail.gmail.com/
| * x86/kasan: Populate shadow for shared chunk of the CPU entry area (Sean Christopherson, 2022-12-15, 1 file, -1/+11)

    Populate the shadow for the shared portion of the CPU entry area, i.e.
    the read-only IDT mapping, during KASAN initialization. A recent change
    modified KASAN to map the per-CPU areas on-demand, but forgot to keep a
    shadow for the common area that is shared amongst all CPUs.

    Map the common area in KASAN init instead of letting idt_map_in_cea()
    do the dirty work so that it Just Works in the unlikely event more
    shared data is shoved into the CPU entry area.

    The bug manifests as a not-present #PF when software attempts to lookup
    an IDT entry, e.g. when KVM is handling IRQs on Intel CPUs (KVM performs
    direct CALL to the IRQ handler to avoid the overhead of INTn):

      BUG: unable to handle page fault for address: fffffbc0000001d8
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 16c03a067 P4D 16c03a067 PUD 0
      Oops: 0000 [#1] PREEMPT SMP KASAN
      CPU: 5 PID: 901 Comm: repro Tainted: G     W     6.1.0-rc3+ #410
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kasan_check_range+0xdf/0x190
       vmx_handle_exit_irqoff+0x152/0x290 [kvm_intel]
       vcpu_run+0x1d89/0x2bd0 [kvm]
       kvm_arch_vcpu_ioctl_run+0x3ce/0xa70 [kvm]
       kvm_vcpu_ioctl+0x349/0x900 [kvm]
       __x64_sys_ioctl+0xb8/0xf0
       do_syscall_64+0x2b/0x50
       entry_SYSCALL_64_after_hwframe+0x46/0xb0

    Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
    Reported-by: syzbot+8cdd16fd5a6c0565e227@syzkaller.appspotmail.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221110203504.1985010-6-seanjc@google.com
| * x86/kasan: Add helpers to align shadow addresses up and down (Sean Christopherson, 2022-12-15, 1 file, -18/+22)

    Add helpers to dedup code for aligning shadow addresses up/down to page
    boundaries when translating an address to its shadow.

    No functional change intended.

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Link: https://lkml.kernel.org/r/20221110203504.1985010-5-seanjc@google.com
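As a rough illustration of what such helpers buy (a sketch only; the function names below are illustrative and not necessarily those added by the patch), the duplicated translate-then-align pattern collapses into:

    #include <linux/kasan.h>
    #include <linux/mm.h>

    /* Illustrative helpers: translate an address to its KASAN shadow and
     * round the result down/up to a page boundary in one place. */
    static unsigned long shadow_page_down(unsigned long va)
    {
            void *shadow = kasan_mem_to_shadow((void *)va);

            return round_down((unsigned long)shadow, PAGE_SIZE);
    }

    static unsigned long shadow_page_up(unsigned long va)
    {
            void *shadow = kasan_mem_to_shadow((void *)va);

            return round_up((unsigned long)shadow, PAGE_SIZE);
    }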
| * x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names (Sean Christopherson, 2022-12-15, 1 file, -11/+11)

    Rename the CPU entry area variables in kasan_init() to shorten their
    names; a future fix will reference the beginning of the per-CPU portion
    of the CPU entry area, and shadow_cpu_entry_per_cpu_begin is a bit much.

    No functional change intended.

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Link: https://lkml.kernel.org/r/20221110203504.1985010-4-seanjc@google.com
| * x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area (Sean Christopherson, 2022-12-15, 1 file, -5/+3)

    Populate a KASAN shadow for the entire possible per-CPU range of the CPU
    entry area instead of requiring that each individual chunk map a shadow.
    Mapping shadows individually is error prone, e.g. the per-CPU GDT
    mapping was left behind, which can lead to not-present page faults
    during KASAN validation if the kernel performs a software lookup into
    the GDT. The DS buffer is also likely affected.

    The motivation for mapping the per-CPU areas on-demand was to avoid
    mapping the entire 512GiB range that's reserved for the CPU entry area;
    shaving a few bytes by not creating shadows for potentially unused
    memory was not a goal.

    The bug is most easily reproduced by doing a sigreturn with a garbage
    CS in the sigcontext, e.g.

      int main(void)
      {
              struct sigcontext regs;

              syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
              syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
              syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

              memset(&regs, 0, sizeof(regs));
              regs.cs = 0x1d0;

              syscall(__NR_rt_sigreturn);
              return 0;
      }

    to coerce the kernel into doing a GDT lookup to compute CS.base when
    reading the instruction bytes on the subsequent #GP to determine whether
    or not the #GP is something the kernel should handle, e.g. to fixup UMIP
    violations or to emulate CLI/STI for IOPL=3 applications.

      BUG: unable to handle page fault for address: fffffbc8379ace00
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
      Oops: 0000 [#1] PREEMPT SMP KASAN
      CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kasan_check_range+0xdf/0x190
      Call Trace:
       <TASK>
       get_desc+0xb0/0x1d0
       insn_get_seg_base+0x104/0x270
       insn_fetch_from_user+0x66/0x80
       fixup_umip_exception+0xb1/0x530
       exc_general_protection+0x181/0x210
       asm_exc_general_protection+0x22/0x30
      RIP: 0003:0x0
      Code: Unable to access opcode bytes at 0xffffffffffffffd6.
      RSP: 0003:0000000000000000 EFLAGS: 00000202
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>

    Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
    Reported-by: syzbot+ffb4f000dc2872c93f62@syzkaller.appspotmail.com
    Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Link: https://lkml.kernel.org/r/20221110203504.1985010-3-seanjc@google.com
| * x86/mm: Recompute physical address for every page of per-CPU CEA mapping (Sean Christopherson, 2022-12-15, 1 file, -1/+1)

    Recompute the physical address for each per-CPU page in the CPU entry
    area; a recent commit inadvertently modified cea_map_percpu_pages() such
    that every PTE is mapped to the physical address of the first page.

    Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Link: https://lkml.kernel.org/r/20221110203504.1985010-2-seanjc@google.com
| * x86/mm: Rename __change_page_attr_set_clr(.checkalias) (Peter Zijlstra, 2022-12-15, 1 file, -4/+4)

    Now that the checkalias functionality is taken by CPA_NO_CHECK_ALIAS,
    rename the argument to better match its remaining purpose: primary,
    matching __change_page_attr().

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221110125544.661001508%40infradead.org
| * x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias() (Peter Zijlstra, 2022-12-15, 1 file, -5/+23)

    There is a kludge in change_page_attr_set_clr() that inhibits
    propagating NX changes to the aliases (directmap and highmap) -- this
    kludge is twofold:

     - it also inhibits the primary checks in __change_page_attr();
     - it hard depends on single bit changes.

    The introduction of set_memory_rox() triggered this last issue for
    clearing both _PAGE_RW and _PAGE_NX.

    Explicitly ignore _PAGE_NX in cpa_process_alias() instead.

    Fixes: b38994948567 ("x86/mm: Implement native set_memory_rox()")
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Debugged-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221110125544.594991716%40infradead.org
| * x86/mm: Untangle __change_page_attr_set_clr(.checkalias) (Peter Zijlstra, 2022-12-15, 1 file, -19/+11)

    The .checkalias argument to __change_page_attr_set_clr() is overloaded
    and serves two different purposes:

     - it inhibits the call to cpa_process_alias() -- as suggested by the
       name; however,
     - it also serves as 'primary' indicator for __change_page_attr()
       (which in turn also serves as a recursion terminator for
       cpa_process_alias()).

    Untangle these by extending the use of CPA_NO_CHECK_ALIAS to all
    callsites that currently use .checkalias=0 for this purpose.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221110125544.527267183%40infradead.org
| * x86/mm: Add a few comments (Peter Zijlstra, 2022-12-15, 1 file, -0/+20)

    It's a shame to hide useful comments in Changelogs, add some to the code.

    Shamelessly stolen from commit:

      c40a56a7818c ("x86/mm/init: Remove freed kernel image areas from alias mapping")

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221110125544.460677011%40infradead.org
| * x86/mm: Fix CR3_ADDR_MASK (Kirill A. Shutemov, 2022-12-15, 1 file, -1/+1)

    The mask must not include bits above the physical address mask. These
    bits are reserved and can be used for other things. Bits 61 and 62 are
    used for Linear Address Masking.

    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Reviewed-by: Alexander Potapenko <glider@google.com>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Tested-by: Alexander Potapenko <glider@google.com>
    Link: https://lore.kernel.org/all/20221109165140.9137-2-kirill.shutemov%40linux.intel.com
| * x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros (Pasha Tatashin, 2022-12-15, 7 files, -26/+20)

    Other architectures and the common mm/ use P*D_MASK and P*D_SIZE.
    Remove the duplicated P*D_PAGE_MASK and P*D_PAGE_SIZE, which are only
    used in x86/*.

    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Acked-by: Mike Rapoport <rppt@linux.ibm.com>
    Link: https://lore.kernel.org/r/20220516185202.604654-1-tatashin@google.com
| * mm: Convert __HAVE_ARCH_P..P_GET to the new style (Peter Zijlstra, 2022-12-15, 2 files, -3/+3)

    Since __HAVE_ARCH_* style guards have been deprecated in favour of
    defining the function name onto itself, convert pxxp_get().

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/Y2EUEBlQXNgaJgoI@hirez.programming.kicks-ass.net
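For illustration, the difference between the two guard styles looks roughly like this sketch (ptep_get() is used as the example; the snippets stand in for code living in an arch header and in the generic header respectively):

    /* Old style (deprecated): a separate feature macro guards the arch
     * override, e.g. in the arch header:
     *
     *     #define __HAVE_ARCH_PTEP_GET
     *     ...arch-specific ptep_get()...
     */

    /* New style, in the arch header: define the function name onto itself. */
    #define ptep_get ptep_get
    static inline pte_t ptep_get(pte_t *ptep)
    {
            return READ_ONCE(*ptep);
    }

    /* ...and the generic header supplies the fallback under #ifndef: */
    #ifndef ptep_get
    static inline pte_t ptep_get(pte_t *ptep)
    {
            return READ_ONCE(*ptep);
    }
    #endif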
| * mm: Remove pointless barrier() after pmdp_get_lockless() (Peter Zijlstra, 2022-12-15, 2 files, -4/+0)

    pmdp_get_lockless() should itself imply any ordering required.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114425.298833095%40infradead.org
| * x86/mm/pae: Get rid of set_64bit() (Peter Zijlstra, 2022-12-15, 2 files, -39/+12)

    Recognise that set_64bit() is a special case of our previously
    introduced pxx_xchg64(), so use that and get rid of set_64bit().

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114425.233481884%40infradead.org
| * x86_64: Remove pointless set_64bit() usage (Peter Zijlstra, 2022-12-15, 3 files, -21/+5)

    The use of set_64bit() in X86_64-only code is pretty pointless, seeing
    how it's a direct assignment. Remove all this nonsense.

    [nathanchance: unbreak irte]

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114425.168036718%40infradead.org
| * x86/mm/pae: Be consistent with pXXp_get_and_clear() (Peter Zijlstra, 2022-12-15, 1 file, -50/+17)

    Given that ptep_get_and_clear() uses cmpxchg8b, and that should be by
    far the most common case, there's no point in having an optimized
    variant for pmd/pud.

    Introduce the pxx_xchg64() helper to implement the common logic once.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114425.103392961%40infradead.org
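A minimal sketch of the underlying idea (not the actual kernel helper, which covers the pte/pmd/pud levels in one place): atomically exchange a 64-bit page-table entry on 32-bit PAE using the cmpxchg8b-backed cmpxchg64().

    #include <linux/atomic.h>
    #include <linux/types.h>

    /* Sketch only: exchange a 64-bit entry and return the old value. */
    static inline u64 pxx_xchg64_sketch(u64 *entryp, u64 new)
    {
            u64 old;

            do {
                    old = READ_ONCE(*entryp);
            } while (cmpxchg64(entryp, old, new) != old);

            return old;
    }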
| * x86/mm/pae: Use WRITE_ONCE() (Peter Zijlstra, 2022-12-15, 1 file, -6/+6)

    Disallow write-tearing, that would be really unfortunate.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114425.038102604%40infradead.org
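A sketch of the pattern, assuming the i386-PAE pte layout with pte_low/pte_high halves (illustrative, not the literal patch):

    /* Sketch: store each 32-bit half of a PAE pte with WRITE_ONCE() so the
     * compiler cannot tear the stores; the ordering still matters so a
     * concurrent walker never sees a present low word paired with a stale
     * high word. */
    static inline void pae_set_pte_sketch(pte_t *ptep, pte_t pte)
    {
            WRITE_ONCE(ptep->pte_high, pte.pte_high);
            smp_wmb();
            WRITE_ONCE(ptep->pte_low, pte.pte_low);
    }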
| * x86/mm/pae: Don't (ab)use atomic64 (Peter Zijlstra, 2022-12-15, 1 file, -5/+4)

    PAE implies CX8, write readable code.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114424.971450128%40infradead.org
| * mm/gup: Fix the lockless PMD access (Peter Zijlstra, 2022-12-15, 2 files, -2/+2)

    On architectures where the PTE/PMD is larger than the native word size
    (i386-PAE for example), READ_ONCE() can do the wrong thing. Use
    pmdp_get_lockless() just like we use ptep_get_lockless().

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114424.906110403%40infradead.org
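Roughly, the affected lookup pattern changes as in this sketch (a hypothetical walker, not the literal GUP diff):

    /* Sketch: take a stable snapshot of the pmd before inspecting it.
     * READ_ONCE(*pmdp) may tear on i386-PAE, where pmd_t is 64 bits wide
     * but the native word is only 32 bits. */
    static bool pmd_present_lockless(pmd_t *pmdp)
    {
            pmd_t pmd = pmdp_get_lockless(pmdp);    /* was: READ_ONCE(*pmdp) */

            return pmd_present(pmd);
    }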
| * mm: Rename pmd_read_atomic() (Peter Zijlstra, 2022-12-15, 7 files, -14/+9)

    There's no point in having the identical routines for PTE/PMD have
    different names.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114424.841277397%40infradead.org
| * mm: Rename GUP_GET_PTE_LOW_HIGH (Peter Zijlstra, 2022-12-15, 5 files, -6/+6)

    Since it no longer applies to only PTEs, rename it to PXX.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114424.776404066%40infradead.org
| * mm: Fix pmd_read_atomic() (Peter Zijlstra, 2022-12-15, 2 files, -66/+37)

    AFAICT there's no reason to do anything different than what we do for
    PTEs. Make it so (also affects SH).

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114424.711181252%40infradead.org
| * sh/mm: Make pmd_t similar to pte_t (Peter Zijlstra, 2022-12-15, 1 file, -2/+8)

    Just like 64bit pte_t, have a low/high split in pmd_t.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114424.645657294%40infradead.org
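The shape of such a split, as a sketch only (field names and exact layout are illustrative, not the literal sh definition):

    /* Sketch: a 64-bit pmd_t carried as two native 32-bit words, mirroring
     * the existing low/high layout of a 64-bit pte_t, so each half can be
     * loaded and stored with native-width accesses. */
    typedef union {
            struct {
                    unsigned long pmd_low;
                    unsigned long pmd_high;
            };
            unsigned long long pmd;
    } pmd_t;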
| * x86/mm/pae: Make pmd_t similar to pte_t (Peter Zijlstra, 2022-12-15, 4 files, -31/+23)

    Instead of mucking about with at least 2 different ways of fudging it,
    do the same thing we do for pte_t.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114424.580310787%40infradead.org
| * mm: Update ptep_get_lockless()'s comment (Peter Zijlstra, 2022-12-15, 1 file, -9/+6)

    Improve the comment.

    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221022114424.515572025%40infradead.org
| * x86/mm: Implement native set_memory_rox() (Peter Zijlstra, 2022-12-15, 3 files, -0/+15)

    Provide a native implementation of set_memory_rox(), avoiding the
    double set_memory_ro(); set_memory_x(); calls.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| * mm: Introduce set_memory_rox() (Peter Zijlstra, 2022-12-15, 12 files, -42/+30)

    Because endlessly repeating:

        set_memory_ro()
        set_memory_x()

    is getting tedious.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net
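A generic fallback for such a helper would look roughly like this sketch (the x86 commit above additionally provides a native version that applies both changes in a single pass):

    #include <asm/set_memory.h>

    /* Sketch of a combined helper: make a range read-only and executable,
     * instead of open-coding the two calls at every call site. */
    static inline int set_memory_rox_sketch(unsigned long addr, int numpages)
    {
            int ret = set_memory_ro(addr, numpages);

            if (ret)
                    return ret;
            return set_memory_x(addr, numpages);
    }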
| * x86/mm: Do verify W^X at boot up (Peter Zijlstra, 2022-12-15, 1 file, -4/+0)

    Straight up revert of commit:

      a970174d7a10 ("x86/mm: Do not verify W^X at boot up")

    now that the root cause has been fixed.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221025201058.011279208@infradead.org
| * x86/ftrace: Remove SYSTEM_BOOTING exceptions (Peter Zijlstra, 2022-12-15, 2 files, -12/+1)

    Now that text_poke is available before ftrace, remove the
    SYSTEM_BOOTING exceptions.

    Specifically, this cures a W+X case during boot.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221025201057.945960823@infradead.org
| * x86/mm: Initialize text poking earlier (Peter Zijlstra, 2022-12-15, 1 file, -2/+1)

    Move poking_init() up a bunch; specifically move it right after
    mm_init() which is right before ftrace_init().

    This will allow simplifying ftrace text poking which currently has a
    bunch of exceptions for early boot.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221025201057.881703081@infradead.org
| * x86/mm: Use mm_alloc() in poking_init() (Peter Zijlstra, 2022-12-15, 3 files, -7/+1)

    Instead of duplicating init_mm, allocate a fresh mm. The advantage is
    that mm_alloc() has much simpler dependencies. Additionally it makes
    more conceptual sense: init_mm has no (and must not have) user state to
    duplicate.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org
| * mm: Move mm_cachep initialization to mm_init() (Peter Zijlstra, 2022-12-15, 3 files, -14/+20)

    In order to allow using mm_alloc() much earlier, move initializing
    mm_cachep into mm_init().

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221025201057.751153381@infradead.org
| * x86/mm: Randomize per-cpu entry area (Peter Zijlstra, 2022-12-15, 4 files, -10/+50)

    Seth found that the CPU-entry-area, the piece of per-cpu data that is
    mapped into the userspace page-tables for kPTI, is not subject to any
    randomization -- irrespective of kASLR settings.

    On x86_64 a whole P4D (512 GB) of virtual address space is reserved for
    this structure, which is plenty large enough to randomize things a
    little.

    As such, use a straightforward randomization scheme that avoids
    duplicates to spread the existing CPUs over the available space.

    [ bp: Fix le build. ]

    Reported-by: Seth Jenkins <sethjenkins@google.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
| * x86/kasan: Map shadow for percpu pages on demand (Andrey Ryabinin, 2022-12-15, 3 files, -4/+22)

    KASAN maps shadow for the entire CPU-entry-area:

      [CPU_ENTRY_AREA_BASE, CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE]

    This will explode once the per-cpu entry areas are randomized since it
    will increase CPU_ENTRY_AREA_MAP_SIZE to 512 GB and KASAN fails to
    allocate shadow for such a big area.

    Fix this by allocating KASAN shadow only for really used cpu entry area
    addresses mapped by cea_map_percpu_pages().

    Thanks to the 0day folks for finding and reporting this to be an issue.

    [ dhansen: tweak changelog since this will get committed before peterz's
      actual cpu-entry-area randomization ]

    Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Tested-by: Yujie Liu <yujie.liu@intel.com>
    Cc: kernel test robot <yujie.liu@intel.com>
    Link: https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@intel.com
* | Merge tag 'msi-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms (Linus Torvalds, 2022-12-17, 6 files, -3/+9)

    Pull MSI fixes from Marc Zyngier:

     "Thomas tasked me with sending out a few urgent fixes after the giant
      MSI rework that landed in 6.2, as both s390 and powerpc ended up
      suffering from it (they do not use the full core code infrastructure,
      leading to these previously undetected issues):

       - Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when the
         architecture does not make use of irq domains instead of returning
         0, which is pretty limiting.

       - Check for the presence of an irq domain when validating the MSI
         iterator, as s390/powerpc won't have one.

       - Fix powerpc's MSI backends which fail to clear the descriptor's
         IRQ field on teardown, leading to a splat and leaked descriptors"

    * tag 'msi-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms:
      powerpc/msi: Fix deassociation of MSI descriptors
      genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present
      genirq/msi: Check for the presence of an irq domain when validating msi_ctrl
| * | powerpc/msi: Fix deassociation of MSI descriptors (Marc Zyngier, 2022-12-17, 5 files, -0/+5)

    Since 2f2940d16823 ("genirq/msi: Remove filter from
    msi_free_descs_free_range()"), the core MSI code relies on the
    msi_desc->irq field to have been cleared before the descriptor can be
    freed, as it indicates that there is no association with a device
    anymore.

    The irq domain code provides this guarantee, and so does s390, which is
    one of the two architectures not using irq domains for MSIs.

    Powerpc, however, is missing this particular requirement, leading to a
    splat and leaked MSI descriptors.

    Adding the now required irq reset to the handful of powerpc backends
    that implement MSIs fixes that particular problem.

    Reported-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/70dab88e-6119-0c12-7c6a-61bcbe239f66@roeck-us.net
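The teardown pattern this amounts to is sketched below (the loop shape is illustrative and per-backend details vary; only the cleared irq field is the point):

    #include <linux/msi.h>
    #include <linux/pci.h>
    #include <linux/irqdomain.h>

    /* Sketch: when tearing down MSIs, dispose of the mapping and clear
     * desc->irq so the MSI core knows the descriptor is no longer
     * associated with a device and is allowed to free it. */
    static void teardown_msi_irqs_sketch(struct pci_dev *pdev)
    {
            struct msi_desc *desc;

            msi_for_each_desc(desc, &pdev->dev, MSI_DESC_ASSOCIATED) {
                    irq_dispose_mapping(desc->irq);
                    desc->irq = 0;
            }
    }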
| * | genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present (Thomas Gleixner, 2022-12-16, 1 file, -2/+2)

    On architectures such as s390 that do not use irq domains for MSI,
    returning 0 as the maximum MSI index is a bit counter-productive, as it
    indicates that no MSI can be allocated. Bad idea.

    Instead, return the maximum we're willing to support in the MSI backing
    store (MSI_XA_DOMAIN_SIZE), and let the arch code do its usual thing.

    Thanks to Matthew Rosato for fixing the fix.

    Reported-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    [maz: commit message]
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/87fsdgzpqs.ffs@tglx
| * | genirq/msi: Check for the presence of an irq domain when validating msi_ctrl (Marc Zyngier, 2022-12-16, 1 file, -1/+2)

    For architectures such as s390 and powerpc that do not use irq domains
    for MSIs, dev->msi.domain is always NULL, so the per-device, per-bus
    MSI domain is also guaranteed to be NULL.

    So checking one without checking the other is bound to result in a
    splat, followed by a memory leak as we don't free the MSI descriptors.

    Add the missing check.

    Reported-by: Matthew Rosato <mjrosato@linux.ibm.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/e570e70d-19bc-101b-0481-ff9a3cab3504@linux.ibm.com
* | | Merge tag 'hsi-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi (Linus Torvalds, 2022-12-17, 4 files, -34/+14)

    Pull HSI updates from Sebastian Reichel:

     - misc small fixes

    * tag 'hsi-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
      HSI: omap_ssi_core: Fix error handling in ssi_init()
      headers: Remove some left-over license text in include/uapi/linux/hsi/
      HSI: omap_ssi_core: fix possible memory leak in ssi_probe()
      HSI: omap_ssi_core: fix unbalanced pm_runtime_disable()
      HSI: ssi_protocol: Fix return type of ssip_pn_xmit()
| * | | HSI: omap_ssi_core: Fix error handling in ssi_init() (Yuan Can, 2022-11-25, 1 file, -1/+7)

    ssi_init() returns the result of the last platform_driver_register()
    directly without checking its return value; if that
    platform_driver_register() fails, ssi_pdriver is not unregistered.

    Fix this by unregistering ssi_pdriver when the last
    platform_driver_register() fails.

    Fixes: 0fae198988b8 ("HSI: omap_ssi: built omap_ssi and omap_ssi_port into one module")
    Signed-off-by: Yuan Can <yuancan@huawei.com>
    Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
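The fix amounts to the usual two-driver module-init pattern, sketched here (ssi_pdriver is named in the changelog; the second driver variable name and the surrounding definitions are assumed from the driver context):

    #include <linux/platform_device.h>

    /* ssi_pdriver and ssi_port_pdriver are the driver's two
     * struct platform_driver instances, defined elsewhere in the file. */
    static int __init ssi_init(void)
    {
            int ret;

            ret = platform_driver_register(&ssi_pdriver);
            if (ret)
                    return ret;

            ret = platform_driver_register(&ssi_port_pdriver);
            if (ret)
                    /* undo the first registration if the second one fails */
                    platform_driver_unregister(&ssi_pdriver);

            return ret;
    }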
| * | | headers: Remove some left-over license text in include/uapi/linux/hsi/ (Christophe JAILLET, 2022-11-17, 2 files, -28/+0)

    Remove some left-over license text from commit e2be04c7f995 ("License
    cleanup: add SPDX license identifier to uapi header files with a
    license").

    When the SPDX-License-Identifier tag was added, the corresponding
    license text was not removed.

    Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Acked-by: Kai Vehmanen <kvcontact@nosignal.fi>
    Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
    Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
| * | | HSI: omap_ssi_core: fix possible memory leak in ssi_probe() (Yang Yingliang, 2022-11-17, 1 file, -1/+3)

    If ssi_add_controller() returns an error, it should call
    hsi_put_controller() to give up the reference that was set in
    hsi_alloc_controller(), so that hsi_controller_release() can free the
    controller and ports that were allocated in hsi_alloc_controller().

    Fixes: b209e047bc74 ("HSI: Introduce OMAP SSI driver")
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
| * | | HSI: omap_ssi_core: fix unbalanced pm_runtime_disable() (Yang Yingliang, 2022-11-14, 1 file, -1/+1)

    In the error path at label 'out1' in ssi_probe(), pm_runtime_enable()
    has not been called yet, so pm_runtime_disable() is not needed.

    Fixes: b209e047bc74 ("HSI: Introduce OMAP SSI driver")
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
| * | | HSI: ssi_protocol: Fix return type of ssip_pn_xmit() (Nathan Chancellor, 2022-11-14, 1 file, -3/+3)

    With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
    indirect call targets are validated against the expected function
    pointer prototype to make sure the call target is valid to help
    mitigate ROP attacks. If they are not identical, there is a failure at
    run time, which manifests as either a kernel panic or thread getting
    killed. A proposed warning in clang aims to catch these at compile
    time, which reveals:

      drivers/hsi/clients/ssi_protocol.c:1053:20: error: incompatible function pointer types
      initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx
      (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *,
      struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
              .ndo_start_xmit         = ssip_pn_xmit,
                                        ^~~~~~~~~~~~
      1 error generated.

    ->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
    'netdev_tx_t', not 'int'. Adjust the return type of ssip_pn_xmit() to
    match the prototype's to resolve the warning and CFI failure.
    Additionally, use the enum 'NETDEV_TX_OK' instead of a raw '0' for the
    return value of ssip_pn_xmit().

    Link: https://github.com/ClangBuiltLinux/linux/issues/1750
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
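The resulting shape of the handler is roughly the following (a sketch with the body elided, not the driver's actual transmit logic):

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* Sketch: .ndo_start_xmit implementations must return netdev_tx_t. */
    static netdev_tx_t ssip_pn_xmit(struct sk_buff *skb, struct net_device *dev)
    {
            /* ... queue the skb towards the HSI link ... */
            return NETDEV_TX_OK;    /* enum value instead of a raw 0 */
    }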
* | | | Merge tag 'for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply (Linus Torvalds, 2022-12-17, 43 files, -285/+372)

    Pull power supply and reset updates from Sebastian Reichel:

     - bq25890: add charge voltage/current support
     - bd99954: convert to linear range
     - convert all i2c drivers to use probe_new
     - misc fixes and cleanups

    * tag 'for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (51 commits)
      power: supply: fix null pointer dereferencing in power_supply_get_battery_info
      power: supply: bq25890: Fix usb-notifier probe and remove races
      power: supply: bq25890: Ensure pump_express_work is cancelled on remove
      power: supply: Fix refcount leak in rk817_charger_probe
      power: supply: bq25890: Only use pdata->regulator_init_data for vbus
      power: supply: ab8500: Fix error handling in ab8500_charger_init()
      power: supply: cw2015: Fix potential null-ptr-deref in cw_bat_probe()
      power: supply: z2_battery: Fix possible memleak in z2_batt_probe()
      power: supply: z2_battery: Convert to i2c's .probe_new()
      power: supply: ucs1002: Convert to i2c's .probe_new()
      power: supply: smb347: Convert to i2c's .probe_new()
      power: supply: sbs-manager: Convert to i2c's .probe_new()
      power: supply: sbs: Convert to i2c's .probe_new()
      power: supply: rt9455: Convert to i2c's .probe_new()
      power: supply: rt5033_battery: Convert to i2c's .probe_new()
      power: supply: max17042_battery: Convert to i2c's .probe_new()
      power: supply: max17040: Convert to i2c's .probe_new()
      power: supply: max14656: Convert to i2c's .probe_new()
      power: supply: ltc4162-l: Convert to i2c's .probe_new()
      power: supply: ltc2941: Convert to i2c's .probe_new()
      ...
| * | | | power: supply: fix null pointer dereferencing in power_supply_get_battery_info (ruanjinjie, 2022-12-05, 1 file, -0/+5)

    When kmalloc() fails to allocate memory in kasprintf(), propname will
    be NULL, and strcmp() called by of_get_property() will cause a null
    pointer dereference.

    So return -ENOMEM if kasprintf() returns a NULL pointer.

    Fixes: 3afb50d7125b ("power: supply: core: Add some helpers to use the battery OCV capacity table")
    Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
    Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
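The added guard is essentially this pattern, shown here inside a hypothetical helper (the property name follows the OCV capacity table naming referenced in the Fixes tag and is illustrative):

    #include <linux/of.h>
    #include <linux/slab.h>
    #include <linux/string.h>

    /* Sketch: bail out with -ENOMEM when kasprintf() fails, before the name
     * is handed to of_get_property(), which would dereference NULL. */
    static int ocv_table_len_sketch(struct device_node *np, int index, int *lenp)
    {
            const void *prop;
            char *propname;

            propname = kasprintf(GFP_KERNEL, "ocv-capacity-table-%d", index);
            if (!propname)
                    return -ENOMEM;

            prop = of_get_property(np, propname, lenp);
            kfree(propname);

            return prop ? 0 : -EINVAL;
    }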
| * | | | power: supply: bq25890: Fix usb-notifier probe and remove races (Hans de Goede, 2022-12-03, 1 file, -18/+12)

    There are 2 races surrounding the usb-notifier:

    1. The notifier, and thus usb_work, may run before the bq->charger
       power_supply class device is registered. But usb_work may call
       power_supply_changed() which relies on the psy device being
       registered.

    2. usb_work may be pending/running at remove() time, so it needs to be
       cancelled on remove after unregistering the usb-notifier.

    Fix 1. by moving usb-notifier registration to after the power_supply
    registration.

    Fix 2. by adding a cancel_work_sync() call directly after the
    usb-notifier unregistration.

    Reviewed-by: Marek Vasut <marex@denx.de>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
| * | | | power: supply: bq25890: Ensure pump_express_work is cancelled on remove (Hans de Goede, 2022-12-03, 1 file, -0/+15)

    The pump_express_work which gets queued from an external_power_changed
    callback might be pending / running on remove() (or on probe failure).

    Add a devm action cancelling the work, to ensure that it is cancelled.

    Note the devm action is added before devm_power_supply_register(),
    making it run after devm unregisters the power_supply, so that the work
    cannot be queued anymore (this is also why a devm action is used for
    this).

    Fixes: 48f45b094dbb ("power: supply: bq25890: Support higher charging voltages through Pump Express+ protocol")
    Reviewed-by: Marek Vasut <marex@denx.de>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
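The ordering trick described above looks roughly like this sketch (callback name is hypothetical; the struct and field names are assumed from the changelog, including that pump_express_work is a delayed_work):

    /* Sketch: a devm action registered before devm_power_supply_register()
     * runs after the psy is unregistered on remove/probe-failure, so the
     * work can no longer be requeued once it has been cancelled here. */
    static void bq25890_cancel_work(void *data)
    {
            struct bq25890_device *bq = data;

            cancel_delayed_work_sync(&bq->pump_express_work);
    }

    /* in probe(), before devm_power_supply_register(): */
    ret = devm_add_action_or_reset(dev, bq25890_cancel_work, bq);
    if (ret)
            return ret;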