summaryrefslogtreecommitdiffstats
path: root/kernel/numa.c (unfollow)
Commit message (Collapse)AuthorFilesLines
2024-04-08timekeeping: Consolidate timekeeping helpersAdrian Hunter1-6/+8
Consolidate timekeeping helpers, making use of timekeeping_cycles_to_ns() in preference to directly using timekeeping_delta_to_ns(). No functional change. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-15-adrian.hunter@intel.com
2024-04-08timekeeping: Refactor timekeeping helpersAdrian Hunter1-14/+6
Simplify the usage of timekeeping sanity checking, in preparation for consolidating timekeeping helpers. This works towards eliminating timekeeping_delta_to_ns() in favour of timekeeping_cycles_to_ns(). No functional change. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-14-adrian.hunter@intel.com
2024-04-08timekeeping: Reuse timekeeping_cycles_to_ns()Adrian Hunter1-4/+1
Simplify __timekeeping_get_ns() by reusing timekeeping_cycles_to_ns(). No functional change. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-13-adrian.hunter@intel.com
2024-04-08timekeeping: Tidy timekeeping_cycles_to_ns() slightlyAdrian Hunter1-3/+2
Put together declaration and initialization of the local variable 'delta'. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-12-adrian.hunter@intel.com
2024-04-08timekeeping: Rename fast_tk_get_delta_ns() to __timekeeping_get_ns()Adrian Hunter1-3/+3
Rename fast_tk_get_delta_ns() to __timekeeping_get_ns() to prepare for its reuse as a general timekeeping helper function. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-11-adrian.hunter@intel.com
2024-04-08timekeeping: Move timekeeping helper functionsAdrian Hunter1-13/+13
Move timekeeping helper functions to prepare for their reuse. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-10-adrian.hunter@intel.com
2024-04-08x86/vdso: Make delta calculation overflow safeAdrian Hunter2-9/+23
Kernel timekeeping is designed to keep the change in cycles (since the last timer interrupt) below max_cycles, which prevents multiplication overflow when converting cycles to nanoseconds. However, if timer interrupts stop, the calculation will eventually overflow. Add protection against that. Select GENERIC_VDSO_OVERFLOW_PROTECT so that max_cycles is made available in the VDSO data page. Check against max_cycles, falling back to a slower higher precision calculation. Take advantage of the opportunity to move masking and negative motion check into the slow path. The result is a calculation that has similar performance as before. Newer machines showed performance benefit, whereas older Skylake-based hardware such as Intel Kaby Lake was seen <1% worse. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-9-adrian.hunter@intel.com
2024-04-08vdso: Make delta calculation overflow safeAdrian Hunter1-1/+16
Kernel timekeeping is designed to keep the change in cycles (since the last timer interrupt) below max_cycles, which prevents multiplication overflow when converting cycles to nanoseconds. However, if timer interrupts stop, the calculation will eventually overflow. Add protection against that, enabled by config option CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT. Check against max_cycles, falling back to a slower higher precision calculation. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-8-adrian.hunter@intel.com
2024-04-08vdso: Add vdso_data:: Max_cyclesAdrian Hunter2-0/+10
Add vdso_data::max_cycles in preparation to use it to detect potential multiplication overflow. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-7-adrian.hunter@intel.com
2024-04-08vdso, math64: Provide mul_u64_u32_add_u64_shr()Adrian Hunter2-1/+39
Provide mul_u64_u32_add_u64_shr() which is a calculation that will be used by timekeeping and VDSO. Place #include <vdso/math64.h> after #include <asm/div64.h> to allow architecture-specific overrides, at least for the kernel. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-6-adrian.hunter@intel.com
2024-04-08math64: Tidy up mul_u64_u32_shr()Adrian Hunter1-5/+1
Put together declaration and initialization of local variables. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-5-adrian.hunter@intel.com
2024-04-08vdso: Add CONFIG_GENERIC_VDSO_OVERFLOW_PROTECTAdrian Hunter1-0/+7
Add CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT in preparation to add multiplication overflow protection to the VDSO time getter functions. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-4-adrian.hunter@intel.com
2024-04-08vdso: Consolidate nanoseconds calculationAdrian Hunter2-33/+27
Consolidate nanoseconds calculation to simplify and reduce code duplication. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-3-adrian.hunter@intel.com
2024-04-08vdso: Consolidate vdso_calc_delta()Adrian Hunter3-21/+21
Consolidate vdso_calc_delta(), in preparation for further simplification. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-2-adrian.hunter@intel.com
2024-04-08hrtimer: Remove unused functionJiapeng Chong1-19/+1
The function is defined, but not called anywhere: kernel/time/hrtimer.c:1880:20: warning: unused function '__hrtimer_peek_ahead_timers'. Remove it. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240322070441.29646-1-jiapeng.chong@linux.alibaba.com Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8611
2024-04-07Linux 6.9-rc3v6.9-rc3Linus Torvalds1-1/+1
2024-04-06x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunkBorislav Petkov (AMD)1-0/+1
srso_alias_untrain_ret() is special code, even if it is a dummy which is called in the !SRSO case, so annotate it like its real counterpart, to address the following objtool splat: vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0 Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org
2024-04-06firewire: ohci: mask bus reset interrupts between ISR and bottom halfAdam Goldman1-1/+5
In the FireWire OHCI interrupt handler, if a bus reset interrupt has occurred, mask bus reset interrupts until bus_reset_work has serviced and cleared the interrupt. Normally, we always leave bus reset interrupts masked. We infer the bus reset from the self-ID interrupt that happens shortly thereafter. A scenario where we unmask bus reset interrupts was introduced in 2008 in a007bb857e0b26f5d8b73c2ff90782d9c0972620: If OHCI_PARAM_DEBUG_BUSRESETS (8) is set in the debug parameter bitmask, we will unmask bus reset interrupts so we can log them. irq_handler logs the bus reset interrupt. However, we can't clear the bus reset event flag in irq_handler, because we won't service the event until later. irq_handler exits with the event flag still set. If the corresponding interrupt is still unmasked, the first bus reset will usually freeze the system due to irq_handler being called again each time it exits. This freeze can be reproduced by loading firewire_ohci with "modprobe firewire_ohci debug=-1" (to enable all debugging output). Apparently there are also some cases where bus_reset_work will get called soon enough to clear the event, and operation will continue normally. This freeze was first reported a few months after a007bb85 was committed, but until now it was never fixed. The debug level could safely be set to -1 through sysfs after the module was loaded, but this would be ineffectual in logging bus reset interrupts since they were only unmasked during initialization. irq_handler will now leave the event flag set but mask bus reset interrupts, so irq_handler won't be called again and there will be no freeze. If OHCI_PARAM_DEBUG_BUSRESETS is enabled, bus_reset_work will unmask the interrupt after servicing the event, so future interrupts will be caught as desired. As a side effect to this change, OHCI_PARAM_DEBUG_BUSRESETS can now be enabled through sysfs in addition to during initial module loading. However, when enabled through sysfs, logging of bus reset interrupts will be effective only starting with the second bus reset, after bus_reset_work has executed. Signed-off-by: Adam Goldman <adamg@pobox.com> Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-04-05stackdepot: rename pool_index to pool_index_plus_1Peter Collingbourne2-6/+5
Commit 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle") changed the meaning of the pool_index field to mean "the pool index plus 1". This made the code accessing this field less self-documenting, as well as causing debuggers such as drgn to not be able to easily remain compatible with both old and new kernels, because they typically do that by testing for presence of the new field. Because stackdepot is a debugging tool, we should make sure that it is debugger friendly. Therefore, give the field a different name to improve readability as well as enabling debugger backwards compatibility. This is needed in 6.9, which would otherwise become an odd release with the new semantics and old name so debuggers wouldn't recognize the new semantics there. Fixes: 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle") Link: https://lkml.kernel.org/r/20240402001500.53533-1-pcc@google.com Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88 Signed-off-by: Peter Collingbourne <pcc@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-05x86/mm/pat: fix VM_PAT handling in COW mappingsDavid Hildenbrand2-14/+39
PAT handling won't do the right thing in COW mappings: the first PTE (or, in fact, all PTEs) can be replaced during write faults to point at anon folios. Reliably recovering the correct PFN and cachemode using follow_phys() from PTEs will not work in COW mappings. Using follow_phys(), we might just get the address+protection of the anon folio (which is very wrong), or fail on swap/nonswap entries, failing follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and track_pfn_copy(), not properly calling free_pfn_range(). In free_pfn_range(), we either wouldn't call memtype_free() or would call it with the wrong range, possibly leaking memory. To fix that, let's update follow_phys() to refuse returning anon folios, and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings if we run into that. We will now properly handle untrack_pfn() with COW mappings, where we don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if the first page was replaced by an anon folio, though: we'd have to store the cachemode in the VMA to make this work, likely growing the VMA size. For now, lets keep it simple and let track_pfn_copy() just fail in that case: it would have failed in the past with swap/nonswap entries already, and it would have done the wrong thing with anon folios. Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn(): <--- C reproducer ---> #include <stdio.h> #include <sys/mman.h> #include <unistd.h> #include <liburing.h> int main(void) { struct io_uring_params p = {}; int ring_fd; size_t size; char *map; ring_fd = io_uring_setup(1, &p); if (ring_fd < 0) { perror("io_uring_setup"); return 1; } size = p.sq_off.array + p.sq_entries * sizeof(unsigned); /* Map the submission queue ring MAP_PRIVATE */ map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, ring_fd, IORING_OFF_SQ_RING); if (map == MAP_FAILED) { perror("mmap"); return 1; } /* We have at least one page. Let's COW it. */ *map = 0; pause(); return 0; } <--- C reproducer ---> On a system with 16 GiB RAM and swap configured: # ./iouring & # memhog 16G # killall iouring [ 301.552930] ------------[ cut here ]------------ [ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100 [ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g [ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1 [ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4 [ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100 [ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000 [ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282 [ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047 [ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200 [ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000 [ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000 [ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000 [ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000 [ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0 [ 301.565725] PKRU: 55555554 [ 301.565944] Call Trace: [ 301.566148] <TASK> [ 301.566325] ? untrack_pfn+0xf4/0x100 [ 301.566618] ? __warn+0x81/0x130 [ 301.566876] ? untrack_pfn+0xf4/0x100 [ 301.567163] ? report_bug+0x171/0x1a0 [ 301.567466] ? handle_bug+0x3c/0x80 [ 301.567743] ? exc_invalid_op+0x17/0x70 [ 301.568038] ? asm_exc_invalid_op+0x1a/0x20 [ 301.568363] ? untrack_pfn+0xf4/0x100 [ 301.568660] ? untrack_pfn+0x65/0x100 [ 301.568947] unmap_single_vma+0xa6/0xe0 [ 301.569247] unmap_vmas+0xb5/0x190 [ 301.569532] exit_mmap+0xec/0x340 [ 301.569801] __mmput+0x3e/0x130 [ 301.570051] do_exit+0x305/0xaf0 ... Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Wupeng Ma <mawupeng1@huawei.com> Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines") Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3") Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-05MAINTAINERS: change vmware.com addresses to broadcom.comAlexey Makhalov2-23/+28
Update all remaining vmware.com email addresses to actual broadcom.com. Add corresponding .mailmap entries for maintainers who contributed in the past as the vmware.com address will start bouncing soon. Maintainership update. Jeff Sipek has left VMware, Nick Shi will be maintaining VMware PTP. Link: https://lkml.kernel.org/r/20240402232334.33167-1-alexey.makhalov@broadcom.com Signed-off-by: Alexey Makhalov <alexey.makhalov@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Acked-by: Ajay Kaher <ajay.kaher@broadcom.com> Acked-by: Ronak Doshi <ronak.doshi@broadcom.com> Acked-by: Nick Shi <nick.shi@broadcom.com> Acked-by: Bryan Tan <bryan-bt.tan@broadcom.com> Acked-by: Vishnu Dasa <vishnu.dasa@broadcom.com> Acked-by: Vishal Bhakta <vishal.bhakta@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-05selftests/mm: include strings.h for ffslEdward Liaw1-1/+1
Got a compilation error on Android for ffsl after 91b80cc5b39f ("selftests: mm: fix map_hugetlb failure on 64K page size systems") included vm_util.h. Link: https://lkml.kernel.org/r/20240329185814.16304-1-edliaw@google.com Fixes: af605d26a8f2 ("selftests/mm: merge util.h into vm_util.h") Signed-off-by: Edward Liaw <edliaw@google.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-05mm: vmalloc: fix lockdep warningUladzislau Rezki (Sony)1-30/+43
A lockdep reports a possible deadlock in the find_vmap_area_exceed_addr_lock() function: ============================================ WARNING: possible recursive locking detected 6.9.0-rc1-00060-ged3ccc57b108-dirty #6140 Not tainted -------------------------------------------- drgn/455 is trying to acquire lock: ffff0000c00131d0 (&vn->busy.lock/1){+.+.}-{2:2}, at: find_vmap_area_exceed_addr_lock+0x64/0x124 but task is already holding lock: ffff0000c0011878 (&vn->busy.lock/1){+.+.}-{2:2}, at: find_vmap_area_exceed_addr_lock+0x64/0x124 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&vn->busy.lock/1); lock(&vn->busy.lock/1); *** DEADLOCK *** indeed it can happen if the find_vmap_area_exceed_addr_lock() gets called concurrently because it tries to acquire two nodes locks. It was done to prevent removing a lowest VA found on a previous step. To address this a lowest VA is found first without holding a node lock where it resides. As a last step we check if a VA still there because it can go away, if removed, proceed with next lowest. [akpm@linux-foundation.org: fix comment typos, per Baoquan] Link: https://lkml.kernel.org/r/20240328140330.4747-1-urezki@gmail.com Fixes: 53becf32aec1 ("mm: vmalloc: support multiple nodes in vread_iter") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Jens Axboe <axboe@kernel.dk> Tested-by: Omar Sandoval <osandov@fb.com> Reported-by: Jens Axboe <axboe@kernel.dk> Cc: Baoquan He <bhe@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-05mm: vmalloc: bail out early in find_vmap_area() if vmap is not initUladzislau Rezki (Sony)1-0/+3
During the boot the s390 system triggers "spinlock bad magic" messages if the spinlock debugging is enabled: [ 0.465445] BUG: spinlock bad magic on CPU#0, swapper/0 [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1 [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux) [ 0.466270] Call Trace: [ 0.466470] [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8 [ 0.466516] [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108 [ 0.466545] [<000000000042146c>] find_vmap_area+0x6c/0x108 [ 0.466572] [<000000000042175a>] find_vm_area+0x22/0x40 [ 0.466597] [<000000000012f152>] __set_memory+0x132/0x150 [ 0.466624] [<0000000001cc0398>] vmem_map_init+0x40/0x118 [ 0.466651] [<0000000001cc0092>] paging_init+0x22/0x68 [ 0.466677] [<0000000001cbbed2>] setup_arch+0x52a/0x708 [ 0.466702] [<0000000001cb6140>] start_kernel+0x80/0x5c8 [ 0.466727] [<0000000000100036>] startup_continue+0x36/0x40 it happens because such system tries to access some vmap areas whereas the vmalloc initialization is not even yet done: [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1 [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux) [ 0.466270] Call Trace: [ 0.466470] dump_stack_lvl (lib/dump_stack.c:117) [ 0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115) [ 0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364) [ 0.466572] find_vm_area (mm/vmalloc.c:3150) [ 0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393) [ 0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660) [ 0.466651] paging_init (arch/s390/mm/init.c:97) [ 0.466677] setup_arch (arch/s390/kernel/setup.c:972) [ 0.466702] start_kernel (init/main.c:899) [ 0.466727] startup_continue (arch/s390/kernel/head64.S:35) [ 0.466811] INFO: lockdep is turned off. ... [ 0.718250] vmalloc init - busy lock init 0000000002871860 [ 0.718328] vmalloc init - busy lock init 00000000028731b8 Some background. It worked before because the lock that is in question was statically defined and initialized. As of now, the locks and data structures are initialized in the vmalloc_init() function. To address that issue add the check whether the "vmap_initialized" variable is set, if not find_vmap_area() bails out on entry returning NULL. Link: https://lkml.kernel.org/r/20240323141544.4150-1-urezki@gmail.com Fixes: 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-05init: open output files from cpio unpacking with O_LARGEFILEJohn Sperbeck1-1/+1
If a member of a cpio archive for an initrd or initrams is larger than 2Gb, we'll eventually fail to write to that file when we get to that limit, unless O_LARGEFILE is set. The problem can be seen with this recipe, assuming that BLK_DEV_RAM is not configured: cd /tmp dd if=/dev/zero of=BIGFILE bs=1048576 count=2200 echo BIGFILE | cpio -o -H newc -R root:root > initrd.img kexec -l /boot/vmlinuz-$(uname -r) --initrd=initrd.img --reuse-cmdline kexec -e The console will show 'Initramfs unpacking failed: write error'. With the patch, the error is gone. Link: https://lkml.kernel.org/r/20240323152934.3307391-1-jsperbeck@google.com Signed-off-by: John Sperbeck <jsperbeck@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-05mm/secretmem: fix GUP-fast succeeding on secretmem foliosDavid Hildenbrand1-2/+2
folio_is_secretmem() currently relies on secretmem folios being LRU folios, to save some cycles. However, folios might reside in a folio batch without the LRU flag set, or temporarily have their LRU flag cleared. Consequently, the LRU flag is unreliable for this purpose. In particular, this is the case when secretmem_fault() allocates a fresh page and calls filemap_add_folio()->folio_add_lru(). The folio might be added to the per-cpu folio batch and won't get the LRU flag set until the batch was drained using e.g., lru_add_drain(). Consequently, folio_is_secretmem() might not detect secretmem folios and GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel when we would later try reading/writing to the folio, because the folio has been unmapped from the directmap. Fix it by removing that unreliable check. Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/ Debugged-by: Miklos Szeredi <miklos@szeredi.hu> Tested-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-05nfsd: hold a lighter-weight client reference over CB_RECALL_ANYJeff Layton1-5/+2
Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the client. While a callback job is technically an RPC that counter is really more for client-driven RPCs, and this has the effect of preventing the client from being unhashed until the callback completes. If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we can end up in a situation where the callback can't complete on the (now dead) callback channel, but the new client can't connect because the old client can't be unhashed. This usually manifests as a NFS4ERR_DELAY return on the CREATE_SESSION operation. The job is only holding a reference to the client so it can clear a flag after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of reference when dealing with the nfsdfs info files, but it should work appropriately here to ensure that the nfs4_client doesn't disappear. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Vladimir Benes <vbenes@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-05aio: Fix null ptr deref in aio_complete() wakeupKent Overstreet1-1/+1
list_del_init_careful() needs to be the last access to the wait queue entry - it effectively unlocks access. Previously, finish_wait() would see the empty list head and skip taking the lock, and then we'd return - but the completion path would still attempt to do the wakeup after the task_struct pointer had been overwritten. Fixes: 71eb6b6b0ba9 ("fs/aio: obey min_nr when doing wakeups") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-fsdevel/CAHTA-ubfwwB51A5Wg5M6H_rPEQK9pNf8FkAGH=vr=FEkyRrtqw@mail.gmail.com/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/stable/20240331215212.522544-1-kent.overstreet%40linux.dev Link: https://lore.kernel.org/r/20240331215212.522544-1-kent.overstreet@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-05timers/migration: Return early on deactivationAnna-Maria Behnsen1-0/+27
Commit 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on deactivation") removed the logic to return early in tmigr_update_events() on deactivation. With this the problem with a not properly updated first global event in a hierarchy containing only a single group was fixed. But when having a look at this code path with a hierarchy with more than a single level, now unnecessary work is done (example is partially copied from the message of the commit mentioned above): [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = T0:0i, T0:1 / \ [GRP0:0] [GRP0:1] migrator = 0 migrator = NONE active = 0 active = NONE nextevt = T0i, T1 nextevt = T2 / \ / \ 0 (T0i) 1 (T1) 2 (T2) 3 active idle idle idle 0) CPU 0 is active thus its event is ignored (the letter 'i') and so are upper levels' events. CPU 1 is idle and has the timer T1 enqueued. CPU 2 also has a timer. The expiry order is T0 (ignored) < T1 < T2 [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = T0:0i, T0:1 / \ [GRP0:0] [GRP0:1] migrator = NONE migrator = NONE active = NONE active = NONE nextevt = T1 nextevt = T2 / \ / \ 0 (T0i) 1 (T1) 2 (T2) 3 idle idle idle idle 1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is pushed as its next expiry and its own event kept as "ignore". Without this early return the following steps happen in tmigr_update_events() when child = null and group = GRP0:0 : lock(GRP0:0->lock); timerqueue_del(GRP0:0, T0i); unlock(GRP0:0->lock); [GRP1:0] migrator = NONE active = NONE nextevt = T0:0, T0:1 / \ [GRP0:0] [GRP0:1] migrator = NONE migrator = NONE active = NONE active = NONE nextevt = T1 nextevt = T2 / \ / \ 0 (T0i) 1 (T1) 2 (T2) 3 idle idle idle idle 2) The change now propagates up to the top. Then tmigr_update_events() updates the group event of GRP0:0 and executes the following steps (child = GRP0:0 and group = GRP0:0): lock(GRP0:0->lock); lock(GRP1:0->lock); evt = tmigr_next_groupevt(GRP0:0); -> this removes the ignored events in GRP0:0 ... update GRP1:0 group event and timerqueue ... unlock(GRP1:0->lock); unlock(GRP0:0->lock); So the dance in 1) with locking the GRP0:0->lock and removing the T0i from the timerqueue is redundand as this is done nevertheless in 2) when tmigr_next_groupevt(GRP0:0) is executed. Revert commit 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on deactivation") and add a condition into return path to skip the return only, when hierarchy contains a single group. Adapt comments accordingly. Fixes: 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on deactivation") Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/87cyr49on2.fsf@somnus
2024-04-05timers/migration: Fix ignored event due to missing CPU updateFrederic Weisbecker1-1/+4
When a group event is updated with its expiry unchanged but a different CPU, that target change may go unnoticed and the event may be propagated up with a stale CPU value. The following depicts a scenario that has been actually observed: [GRP2:0] migrator = GRP1:1 active = GRP1:1 nextevt = TGRP1:0 (T0) / \ [GRP1:0] [GRP1:1] migrator = NONE [...] active = NONE nextevt = TGRP0:0 (T0) / \ [GRP0:0] [...] migrator = NONE active = NONE nextevt = T0 / \ 0 (T0) 1 (T1) idle idle 0) The hierarchy has 3 levels. The left part (GRP1:0) is all idle, including CPU 0 and CPU 1 which have a timer each: T0 and T1. They have the same expiry value. [GRP2:0] migrator = GRP1:1 active = GRP1:1 nextevt = KTIME_MAX / \ [GRP1:0] [GRP1:1] migrator = NONE [...] active = NONE nextevt = TGRP0:0 (T0) / \ [GRP0:0] [...] migrator = NONE active = NONE nextevt = T0 / \ 0 (T0) 1 (T1) idle idle 1) The migrator in GRP1:1 handles remotely T0. The event is dequeued from the top and T0 executed. [GRP2:0] migrator = GRP1:1 active = GRP1:1 nextevt = KTIME_MAX / \ [GRP1:0] [GRP1:1] migrator = NONE [...] active = NONE nextevt = TGRP0:0 (T0) / \ [GRP0:0] [...] migrator = NONE active = NONE nextevt = T1 / \ 0 1 (T1) idle idle 2) The migrator in GRP1:1 fetches the next timer for CPU 0 and finds none. But it updates the events from its groups, starting with GRP0:0 which now has T1 as its next event. So far so good. [GRP2:0] migrator = GRP1:1 active = GRP1:1 nextevt = KTIME_MAX / \ [GRP1:0] [GRP1:1] migrator = NONE [...] active = NONE nextevt = TGRP0:0 (T0) / \ [GRP0:0] [...] migrator = NONE active = NONE nextevt = T1 / \ 0 1 (T1) idle idle 3) The migrator in GRP1:1 proceeds upward and updates the events in GRP1:0. The child event TGRP0:0 is found queued with the same expiry as before. And therefore it is left unchanged. However the target CPU is not the same but that fact is ignored so TGRP0:0 still points to CPU 0 when it should point to CPU 1. [GRP2:0] migrator = GRP1:1 active = GRP1:1 nextevt = TGRP1:0 (T0) / \ [GRP1:0] [GRP1:1] migrator = NONE [...] active = NONE nextevt = TGRP0:0 (T0) / \ [GRP0:0] [...] migrator = NONE active = NONE nextevt = T1 / \ 0 1 (T1) idle idle 4) The propagation has reached the top level and TGRP1:0, having TGRP0:0 as its first event, also wrongly points to CPU 0. TGRP1:0 is added to the top level group. [GRP2:0] migrator = GRP1:1 active = GRP1:1 nextevt = KTIME_MAX / \ [GRP1:0] [GRP1:1] migrator = NONE [...] active = NONE nextevt = TGRP0:0 (T0) / \ [GRP0:0] [...] migrator = NONE active = NONE nextevt = T1 / \ 0 1 (T1) idle idle 5) The migrator in GRP1:1 dequeues the next event in top level pointing to CPU 0. But since it actually doesn't see any real event in CPU 0, it early returns. 6) T1 is left unhandled until either CPU 0 or CPU 1 wake up. Some other bad scenario may involve trees with just two levels. Fix this with unconditionally updating the CPU of the child event before considering to early return while updating a queued event with an unchanged expiry value. Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Link: https://lore.kernel.org/r/Zg2Ct6M2RJAYHgCB@localhost.localdomain
2024-04-05x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined wordSean Christopherson2-0/+4
Add CPUID_LNX_5 to track cpufeatures' word 21, and add the appropriate compile-time assert in KVM to prevent direct lookups on the features in CPUID_LNX_5. KVM uses X86_FEATURE_* flags to manage guest CPUID, and so must translate features that are scattered by Linux from the Linux-defined bit to the hardware-defined bit, i.e. should never try to directly access scattered features in guest CPUID. Opportunistically add NR_CPUID_WORDS to enum cpuid_leafs, along with a compile-time assert in KVM's CPUID infrastructure to ensure that future additions update cpuid_leafs along with NCAPINTS. No functional change intended. Fixes: 7f274e609f3d ("x86/cpufeatures: Add new word for scattered features") Cc: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-04-04riscv: process: Fix kernel gp leakageStefan O'Rear1-3/+0
childregs represents the registers which are active for the new thread in user context. For a kernel thread, childregs->gp is never used since the kernel gp is not touched by switch_to. For a user mode helper, the gp value can be observed in user space after execve or possibly by other means. [From the email thread] The /* Kernel thread */ comment is somewhat inaccurate in that it is also used for user_mode_helper threads, which exec a user process, e.g. /sbin/init or when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have PF_KTHREAD set and are valid targets for ptrace etc. even before they exec. childregs is the *user* context during syscall execution and it is observable from userspace in at least five ways: 1. kernel_execve does not currently clear integer registers, so the starting register state for PID 1 and other user processes started by the kernel has sp = user stack, gp = kernel __global_pointer$, all other integer registers zeroed by the memset in the patch comment. This is a bug in its own right, but I'm unwilling to bet that it is the only way to exploit the issue addressed by this patch. 2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread before it execs, but ptrace requires SIGSTOP to be delivered which can only happen at user/kernel boundaries. 3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for user_mode_helpers before the exec completes, but gp is not one of the registers it returns. 4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses are also exposed via PERF_SAMPLE_REGS_USER which is permitted under LOCKDOWN_PERF. I have not attempted to write exploit code. 5. Much of the tracing infrastructure allows access to user registers. I have not attempted to determine which forms of tracing allow access to user registers without already allowing access to kernel registers. Fixes: 7db91e57a0ac ("RISC-V: Task implementation") Cc: stable@vger.kernel.org Signed-off-by: Stefan O'Rear <sorear@fastmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-04riscv: Disable preemption when using patch_map()Alexandre Ghiti1-0/+8
patch_map() uses fixmap mappings to circumvent the non-writability of the kernel text mapping. The __set_fixmap() function only flushes the current cpu tlb, it does not emit an IPI so we must make sure that while we use a fixmap mapping, the current task is not migrated on another cpu which could miss the newly introduced fixmap mapping. So in order to avoid any task migration, disable the preemption. Reported-by: Andrea Parri <andrea@rivosinc.com> Closes: https://lore.kernel.org/all/ZcS+GAaM25LXsBOl@andrea/ Reported-by: Andy Chiu <andy.chiu@sifive.com> Closes: https://lore.kernel.org/linux-riscv/CABgGipUMz3Sffu-CkmeUB1dKVwVQ73+7=sgC45-m0AE9RCjOZg@mail.gmail.com/ Fixes: cad539baa48f ("riscv: implement a memset like function for text") Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock") Co-developed-by: Andy Chiu <andy.chiu@sifive.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240326203017.310422-3-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-04riscv: Fix warning by declaring arch_cpu_idle() as noinstrAlexandre Ghiti1-1/+1
The following warning appears when using ftrace: [89855.443413] RCU not on for: arch_cpu_idle+0x0/0x1c [89855.445640] WARNING: CPU: 5 PID: 0 at include/linux/trace_recursion.h:162 arch_ftrace_ops_list_func+0x208/0x228 [89855.445824] Modules linked in: xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) nft_compat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) cfg80211(E) nls_iso8859_1(E) ofpart(E) redboot(E) cmdlinepart(E) cfi_cmdset_0001(E) virtio_net(E) cfi_probe(E) cfi_util(E) 9pnet_virtio(E) gen_probe(E) net_failover(E) virtio_rng(E) failover(E) 9pnet(E) physmap(E) map_funcs(E) chipreg(E) mtd(E) uio_pdrv_genirq(E) uio(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) drm(E) efi_pstore(E) backlight(E) ip_tables(E) x_tables(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E) async_tx(E) raid6_pq(E) raid1(E) raid0(E) virtio_blk(E) [89855.451563] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G E 6.8.0-rc6ubuntu-defconfig #2 [89855.451726] Hardware name: riscv-virtio,qemu (DT) [89855.451899] epc : arch_ftrace_ops_list_func+0x208/0x228 [89855.452016] ra : arch_ftrace_ops_list_func+0x208/0x228 [89855.452119] epc : ffffffff8016b216 ra : ffffffff8016b216 sp : ffffaf808090fdb0 [89855.452171] gp : ffffffff827c7680 tp : ffffaf808089ad40 t0 : ffffffff800c0dd8 [89855.452216] t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffffaf808090fe30 [89855.452306] s1 : 0000000000000000 a0 : 0000000000000026 a1 : ffffffff82cd6ac8 [89855.452423] a2 : ffffffff800458c8 a3 : ffffaf80b1870640 a4 : 0000000000000000 [89855.452646] a5 : 0000000000000000 a6 : 00000000ffffffff a7 : ffffffffffffffff [89855.452698] s2 : ffffffff82766872 s3 : ffffffff80004caa s4 : ffffffff80ebea90 [89855.452743] s5 : ffffaf808089bd40 s6 : 8000000a00006e00 s7 : 0000000000000008 [89855.452787] s8 : 0000000000002000 s9 : 0000000080043700 s10: 0000000000000000 [89855.452831] s11: 0000000000000000 t3 : 0000000000100000 t4 : 0000000000000064 [89855.452874] t5 : 000000000000000c t6 : ffffaf80b182dbfc [89855.452929] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [89855.453053] [<ffffffff8016b216>] arch_ftrace_ops_list_func+0x208/0x228 [89855.453191] [<ffffffff8000e082>] ftrace_call+0x8/0x22 [89855.453265] [<ffffffff800a149c>] do_idle+0x24c/0x2ca [89855.453357] [<ffffffff8000da54>] return_to_handler+0x0/0x26 [89855.453429] [<ffffffff8000b716>] smp_callin+0x92/0xb6 [89855.453785] ---[ end trace 0000000000000000 ]--- To fix this, mark arch_cpu_idle() as noinstr, like it is done in commit a9cbc1b471d2 ("s390/idle: mark arch_cpu_idle() noinstr"). Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Closes: https://lore.kernel.org/linux-riscv/51f21b87-ebed-4411-afbc-c00d3dea2bab@yadro.com/ Fixes: cfbc4f81c9d0 ("riscv: Select ARCH_WANTS_NO_INSTR") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Andy Chiu <andy.chiu@sifive.com> Acked-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240326203017.310422-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-04riscv: use KERN_INFO in do_trapAndreas Schwab1-1/+1
Print the instruction dump with info instead of emergency level. The unhandled signal message is only for informational purpose. Fixes: b8a03a634129 ("riscv: add userland instruction dump to RISC-V splats") Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/mvmy1aegrhm.fsf@suse.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-04ASoC: SOF: Core: Add remove_late() to sof_init_environment failure pathChaitanya Kumar Borah1-6/+8
In cases where the sof driver is unable to find the firmware and/or topology file [1], it exits without releasing the i915 runtime pm wakeref [2]. This results in dmesg warnings[3] during suspend/resume or driver unbind. Add remove_late() to the failure path of sof_init_environment so that i915 wakeref is released appropriately [1] [ 8.990366] sof-audio-pci-intel-mtl 0000:00:1f.3: SOF firmware and/or topology file not found. [ 8.990396] sof-audio-pci-intel-mtl 0000:00:1f.3: Supported default profiles [ 8.990398] sof-audio-pci-intel-mtl 0000:00:1f.3: - ipc type 1 (Requested): [ 8.990399] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware file: intel/sof-ipc4/mtl/sof-mtl.ri [ 8.990401] sof-audio-pci-intel-mtl 0000:00:1f.3: Topology file: intel/sof-ace-tplg/sof-mtl-rt711-2ch.tplg [ 8.990402] sof-audio-pci-intel-mtl 0000:00:1f.3: Check if you have 'sof-firmware' package installed. [ 8.990403] sof-audio-pci-intel-mtl 0000:00:1f.3: Optionally it can be manually downloaded from: [ 8.990404] sof-audio-pci-intel-mtl 0000:00:1f.3: https://github.com/thesofproject/sof-bin/ [ 8.999088] sof-audio-pci-intel-mtl 0000:00:1f.3: error: sof_probe_work failed err: -2 [2] ref_tracker: 0000:00:02.0@ffff9b8511b6a378 has 1/5 users at track_intel_runtime_pm_wakeref.part.0+0x36/0x70 [i915] __intel_runtime_pm_get+0x51/0xb0 [i915] intel_runtime_pm_get+0x17/0x20 [i915] intel_display_power_get+0x2f/0x70 [i915] i915_audio_component_get_power+0x23/0x120 [i915] snd_hdac_display_power+0x89/0x130 [snd_hda_core] hda_codec_i915_init+0x3f/0x50 [snd_sof_intel_hda] hda_dsp_probe_early+0x170/0x250 [snd_sof_intel_hda_common] snd_sof_device_probe+0x224/0x320 [snd_sof] sof_pci_probe+0x15b/0x220 [snd_sof_pci] hda_pci_intel_probe+0x30/0x70 [snd_sof_intel_hda_common] local_pci_probe+0x4c/0xb0 pci_device_probe+0xcc/0x250 really_probe+0x18e/0x420 __driver_probe_device+0x7e/0x170 driver_probe_device+0x23/0xa0 [3] [ 484.105070] ------------[ cut here ]------------ [ 484.108238] thunderbolt 0000:00:0d.2: PM: pci_pm_suspend_late+0x0/0x50 returned 0 after 0 usecs [ 484.117106] i915 0000:00:02.0: i915 raw-wakerefs=1 wakelocks=1 on cleanup [ 484.792005] WARNING: CPU: 2 PID: 2405 at drivers/gpu/drm/i915/intel_runtime_pm.c:444 intel_runtime_pm_driver_release+0x6c/0x80 Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://github.com/thesofproject/linux/pull/4878 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://msgid.link/r/20240404184813.134566-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-04PM: EM: fix wrong utilization estimation in em_cpu_energy()Vincent Guittot1-1/+0
Commit 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division") has added back map_util_perf() in em_cpu_energy() computation which has been removed with the rework of scheduler/cpufreq interface. This is wrong because sugov_effective_cpu_perf() already takes care of mapping the utilization to a performance level. Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-04gpio: cdev: fix missed label sanitizing in debounce_setup()Kent Gibson1-21/+28
When adding sanitization of the label, the path through edge_detector_setup() that leads to debounce_setup() was overlooked. A request taking this path does not allocate a new label and the request label is freed twice when the request is released, resulting in memory corruption. Add label sanitization to debounce_setup(). Cc: stable@vger.kernel.org Fixes: b34490879baa ("gpio: cdev: sanitize the label before requesting the interrupt") Signed-off-by: Kent Gibson <warthog618@gmail.com> [Bartosz: rebased on top of the fix for empty GPIO labels] Co-developed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-04-04netfilter: validate user input for expected lengthEric Dumazet4-0/+18
I got multiple syzbot reports showing old bugs exposed by BPF after commit 20f2505fb436 ("bpf: Try to avoid kzalloc in cgroup/{s,g}etsockopt") setsockopt() @optlen argument should be taken into account before copying data. BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline] BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627 Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238 CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105 copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] copy_from_sockptr include/linux/sockptr.h:55 [inline] do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline] do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627 nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101 do_sock_setsockopt+0x3af/0x720 net/socket.c:2311 __sys_setsockopt+0x1ae/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x72/0x7a RIP: 0033:0x7fd22067dde9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9 RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000 R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8 </TASK> Allocated by task 7238: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:370 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slub.c:4069 [inline] __kmalloc_noprof+0x200/0x410 mm/slub.c:4082 kmalloc_noprof include/linux/slab.h:664 [inline] __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869 do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293 __sys_setsockopt+0x1ae/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x72/0x7a The buggy address belongs to the object at ffff88802cd73da0 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 0 bytes inside of allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73 flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff) page_type: 0xffffefff(slab) raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122 raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490 prep_new_page mm/page_alloc.c:1498 [inline] get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712 __alloc_pages_node_noprof include/linux/gfp.h:244 [inline] alloc_pages_node_noprof include/linux/gfp.h:271 [inline] alloc_slab_page+0x5f/0x120 mm/slub.c:2249 allocate_slab+0x5a/0x2e0 mm/slub.c:2412 new_slab mm/slub.c:2465 [inline] ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615 __slab_alloc+0x58/0xa0 mm/slub.c:3705 __slab_alloc_node mm/slub.c:3758 [inline] slab_alloc_node mm/slub.c:3936 [inline] __do_kmalloc_node mm/slub.c:4068 [inline] kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089 kstrdup+0x3a/0x80 mm/util.c:62 device_rename+0xb5/0x1b0 drivers/base/core.c:4558 dev_change_name+0x275/0x860 net/core/dev.c:1232 do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864 __rtnl_newlink net/core/rtnetlink.c:3680 [inline] rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727 rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361 page last free pid 5146 tgid 5146 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1110 [inline] free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617 discard_slab mm/slub.c:2511 [inline] __put_partials+0xeb/0x130 mm/slub.c:2980 put_cpu_partial+0x17c/0x250 mm/slub.c:3055 __slab_free+0x2ea/0x3d0 mm/slub.c:4254 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3888 [inline] slab_alloc_node mm/slub.c:3948 [inline] __do_kmalloc_node mm/slub.c:4068 [inline] __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076 kmalloc_node_noprof include/linux/slab.h:681 [inline] kvmalloc_node_noprof+0x72/0x190 mm/util.c:634 bucket_table_alloc lib/rhashtable.c:186 [inline] rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367 rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 Memory state around the buggy address: ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc >ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc ^ ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net/sched: act_skbmod: prevent kernel-infoleakEric Dumazet1-5/+5
syzbot found that tcf_skbmod_dump() was copying four bytes from kernel stack to user space [1]. The issue here is that 'struct tc_skbmod' has a four bytes hole. We need to clear the structure before filling fields. [1] BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline] BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline] BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline] BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185 instrument_copy_to_user include/linux/instrumented.h:114 [inline] copy_to_user_iter lib/iov_iter.c:24 [inline] iterate_ubuf include/linux/iov_iter.h:29 [inline] iterate_and_advance2 include/linux/iov_iter.h:245 [inline] iterate_and_advance include/linux/iov_iter.h:271 [inline] _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185 copy_to_iter include/linux/uio.h:196 [inline] simple_copy_to_iter net/core/datagram.c:532 [inline] __skb_datagram_iter+0x185/0x1000 net/core/datagram.c:420 skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546 skb_copy_datagram_msg include/linux/skbuff.h:4050 [inline] netlink_recvmsg+0x432/0x1610 net/netlink/af_netlink.c:1962 sock_recvmsg_nosec net/socket.c:1046 [inline] sock_recvmsg+0x2c4/0x340 net/socket.c:1068 __sys_recvfrom+0x35a/0x5f0 net/socket.c:2242 __do_sys_recvfrom net/socket.c:2260 [inline] __se_sys_recvfrom net/socket.c:2256 [inline] __x64_sys_recvfrom+0x126/0x1d0 net/socket.c:2256 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was stored to memory at: pskb_expand_head+0x30f/0x19d0 net/core/skbuff.c:2253 netlink_trim+0x2c2/0x330 net/netlink/af_netlink.c:1317 netlink_unicast+0x9f/0x1260 net/netlink/af_netlink.c:1351 nlmsg_unicast include/net/netlink.h:1144 [inline] nlmsg_notify+0x21d/0x2f0 net/netlink/af_netlink.c:2610 rtnetlink_send+0x73/0x90 net/core/rtnetlink.c:741 rtnetlink_maybe_send include/linux/rtnetlink.h:17 [inline] tcf_add_notify net/sched/act_api.c:2048 [inline] tcf_action_add net/sched/act_api.c:2071 [inline] tc_ctl_action+0x146e/0x19d0 net/sched/act_api.c:2119 rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595 netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559 rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 ____sys_sendmsg+0x877/0xb60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was stored to memory at: __nla_put lib/nlattr.c:1041 [inline] nla_put+0x1c6/0x230 lib/nlattr.c:1099 tcf_skbmod_dump+0x23f/0xc20 net/sched/act_skbmod.c:256 tcf_action_dump_old net/sched/act_api.c:1191 [inline] tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227 tcf_action_dump+0x1fd/0x460 net/sched/act_api.c:1251 tca_get_fill+0x519/0x7a0 net/sched/act_api.c:1628 tcf_add_notify_msg net/sched/act_api.c:2023 [inline] tcf_add_notify net/sched/act_api.c:2042 [inline] tcf_action_add net/sched/act_api.c:2071 [inline] tc_ctl_action+0x1365/0x19d0 net/sched/act_api.c:2119 rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595 netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559 rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 ____sys_sendmsg+0x877/0xb60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Local variable opt created at: tcf_skbmod_dump+0x9d/0xc20 net/sched/act_skbmod.c:244 tcf_action_dump_old net/sched/act_api.c:1191 [inline] tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227 Bytes 188-191 of 248 are uninitialized Memory access of size 248 starts at ffff888117697680 Data copied to user address 00007ffe56d855f0 Fixes: 86da71b57383 ("net_sched: Introduce skbmod action") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20240403130908.93421-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net: usb: ax88179_178a: avoid the interface always configured as random addressJose Ignacio Tornos Martinez1-0/+2
After the commit d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets"), reset is not executed from bind operation and mac address is not read from the device registers or the devicetree at that moment. Since the check to configure if the assigned mac address is random or not for the interface, happens after the bind operation from usbnet_probe, the interface keeps configured as random address, although the address is correctly read and set during open operation (the only reset now). In order to keep only one reset for the device and to avoid the interface always configured as random address, after reset, configure correctly the suitable field from the driver, if the mac address is read successfully from the device registers or the devicetree. Take into account if a locally administered address (random) was previously stored. cc: stable@vger.kernel.org # 6.6+ Fixes: d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets") Reported-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240403132158.344838-1-jtornosm@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04nvme-fc: rename free_ctrl callback to match name patternDaniel Wagner1-2/+2
Rename nvme_fc_nvme_ctrl_freed to nvme_fc_free_ctrl to match the name pattern for the callback. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-04-04nvmet-fc: move RCU read lock to nvmet_fc_assoc_existsDaniel Wagner1-7/+10
The RCU lock is only needed for the lookup loop and not for list_ad_tail_rcu call. Thus move it down the call chain into nvmet_fc_assoc_exists. While at it also fix the name typo of the function. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-04-04nvmet: implement unique discovery NQNHannes Reinecke2-0/+54
Unique discovery NQNs allow to differentiate between discovery services from (typically physically separate) NVMe-oF subsystems. This is required for establishing secured connections as otherwise the credentials won't be unique and the integrity of the connection cannot be guaranteed. This patch adds a configfs attribute 'discovery_nqn' in the 'nvmet' configfs directory to specify the unique discovery NQN. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-04-04nvme: don't create a multipath node for zero capacity devicesChristoph Hellwig1-1/+1
Apparently there are nvme controllers around that report namespaces in the namespace list which have zero capacity. Return -ENXIO instead of -ENODEV from nvme_update_ns_info_block so we don't create a hidden multipath node for these namespaces but entirely ignore them. Fixes: 46e7422cda84 ("nvme: move common logic into nvme_update_ns_info") Reported-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-04-04x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()Borislav Petkov (AMD)1-1/+3
Modifying a MCA bank's MCA_CTL bits which control which error types to be reported is done over /sys/devices/system/machinecheck/ ├── machinecheck0 │   ├── bank0 │   ├── bank1 │   ├── bank10 │   ├── bank11 ... sysfs nodes by writing the new bit mask of events to enable. When the write is accepted, the kernel deletes all current timers and reinits all banks. Doing that in parallel can lead to initializing a timer which is already armed and in the timer wheel, i.e., in use already: ODEBUG: init active (active state 0) object: ffff888063a28000 object type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642 WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514 debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514 Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code does. Reported by: Yue Sun <samsun1006219@gmail.com> Reported by: xingwei lee <xrivendell7@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%2Bx_-PqMv98gKig55=2vbzffRw@mail.gmail.com
2024-04-04gpio: cdev: check for NULL labels when sanitizing them for irqsBartosz Golaszewski1-5/+14
We need to take into account that a line's consumer label may be NULL and not try to kstrdup() it in that case but rather pass the NULL pointer up the stack to the interrupt request function. To that end: let make_irq_label() return NULL as a valid return value and use ERR_PTR() instead to signal an allocation failure to callers. Cc: stable@vger.kernel.org Fixes: b34490879baa ("gpio: cdev: sanitize the label before requesting the interrupt") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/lkml/20240402093534.212283-1-naresh.kamboju@linaro.org/ Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Tested-by: Anders Roxell <anders.roxell@linaro.org>
2024-04-04SUNRPC: Fix a slow server-side memory leak with RPC-over-TCPChuck Lever1-9/+1
Jan Schunk reports that his small NFS servers suffer from memory exhaustion after just a few days. A bisect shows that commit e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call") is the first bad commit. That commit assumed that sock_sendmsg() releases all the pages in the underlying bio_vec array, but the reality is that it doesn't. svc_xprt_release() releases the rqst's response pages, but the record marker page fragment isn't one of those, so it is never released. This is a narrow fix that can be applied to stable kernels. A more extensive fix is in the works. Reported-by: Jan Schunk <scpcom@gmx.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218671 Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call") Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: David Howells <dhowells@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-04drm/xe: Use ordered wq for preempt fence waitingMatthew Brost3-2/+14
Preempt fences can sleep waiting for an exec queue suspend operation to complete. If the system_unbound_wq is used for waiting and the number of waiters exceeds max_active this will result in other users of the system_unbound_wq getting starved. Use a device private work queue for preempt fences to avoid starvation of the system_unbound_wq. Even though suspend operations can complete out-of-order, all suspend operations within a VM need to complete before the preempt rebind worker can start. With that, use a device private ordered wq for preempt fence waiting. v2: - Add comment about cleanup on failure (Matt R) - Update commit message (Lucas) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240401221913.139672-2-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 37c15c4aae1fe3f67efd2641db8d8c25c2d524ab) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-04-04drm/xe: Move vma rebinding to the drm_exec locking loopThomas Hellström5-75/+83
Rebinding might allocate page-table bos, causing evictions. To support blocking locking during these evictions, perform the rebinding in the drm_exec locking loop. Also Reserve fence slots where actually needed rather than trying to predict how many fence slots will be needed over a complete wound-wait transaction. v2: - Remove a leftover call to xe_vm_rebind() (Matt Brost) - Add a helper function xe_vm_validate_rebind() (Matt Brost) v3: - Add comments and squash with previous patch (Matt Brost) Fixes: 24f947d58fe5 ("drm/xe: Use DRM GPUVM helpers for external- and evicted objects") Fixes: 29f424eb8702 ("drm/xe/exec: move fence reservation") Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240327091136.3271-5-thomas.hellstrom@linux.intel.com (cherry picked from commit 7ee7dd6f301341d5b1204fc19fa620d7f7f7e90d) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>