summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_contextDavid Howells2022-06-0940-261/+227
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While randstruct was satisfied with using an open-coded "void *" offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12: In file included from include/linux/string.h:253, from include/linux/ceph/ceph_debug.h:7, from fs/ceph/inode.c:2: In function 'fortify_memset_chk', inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2, inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2: include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems. Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()). Most of the changes were done with: perl -p -i -e 's/vfs_inode/netfs.inode/'g \ `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]` Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things. Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4]. Version #2: - Fix a couple of missed name changes due to a disabled cifs option. - Rename nfs_i_context to nfs_inode - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs. [ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ] Fixes: bc899ee1c898 ("netfs: Add a netfs inode context") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> cc: Jonathan Corbet <corbet@lwn.net> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <smfrench@gmail.com> cc: William Kucharski <william.kucharski@oracle.com> cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> cc: Dave Chinner <david@fromorbit.com> cc: linux-doc@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2] Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3] Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'fs_for_v5.19-rc2' of ↵Linus Torvalds2022-06-094-11/+40
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, writeback, and quota fixes and cleanups from Jan Kara: "A fix for race in writeback code and two cleanups in quota and ext2" * tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Prevent memory allocation recursion while holding dq_lock writeback: Fix inode->i_io_list not be protected by inode->i_lock error fs: Fix syntax errors in comments
| * quota: Prevent memory allocation recursion while holding dq_lockMatthew Wilcox (Oracle)2022-06-061-0/+10
| | | | | | | | | | | | | | | | | | | | | | As described in commit 02117b8ae9c0 ("f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read"), we must not enter filesystem reclaim while holding the dq_lock. Prevent this more generally by using memalloc_nofs_save() while holding the lock. Link: https://lore.kernel.org/r/20220605143815.2330891-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>
| * writeback: Fix inode->i_io_list not be protected by inode->i_lock errorJchao Sun2022-06-062-10/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b35250c0816c ("writeback: Protect inode->i_io_list with inode->i_lock") made inode->i_io_list not only protected by wb->list_lock but also inode->i_lock, but inode_io_list_move_locked() was missed. Add lock there and also update comment describing things protected by inode->i_lock. This also fixes a race where __mark_inode_dirty() could move inode under flush worker's hands and thus sync(2) could miss writing some inodes. Fixes: b35250c0816c ("writeback: Protect inode->i_io_list with inode->i_lock") Link: https://lore.kernel.org/r/20220524150540.12552-1-sunjunchao2870@gmail.com CC: stable@vger.kernel.org Signed-off-by: Jchao Sun <sunjunchao2870@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * fs: Fix syntax errors in commentsXiang wangx2022-06-061-1/+1
| | | | | | | | | | | | | | | | Delete the redundant word 'not'. Link: https://lore.kernel.org/r/20220605125509.14837-1-wangxiang@cdjrlc.com Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Jan Kara <jack@suse.cz>
* | Merge tag 'powerpc-5.19-2' of ↵Linus Torvalds2022-06-0911-21/+38
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - On 32-bit fix overread/overwrite of thread_struct via ptrace PEEK/POKE. - Fix softirqs not switching to the softirq stack since we moved irq_exit(). - Force thread size increase when KASAN is enabled to avoid stack overflows. - On Book3s 64 mark more code as not to be instrumented by KASAN to avoid crashes. - Exempt __get_wchan() from KASAN checking, as it's inherently racy. - Fix a recently introduced crash in the papr_scm driver in some configurations. - Remove include of <generated/compile.h> which is forbidden. Thanks to Ariel Miculas, Chen Jingwen, Christophe Leroy, Erhard Furtner, He Ying, Kees Cook, Masahiro Yamada, Nageswara R Sastry, Paul Mackerras, Sachin Sant, Vaibhav Jain, and Wanming Hu. * tag 'powerpc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32: Fix overread/overwrite of thread_struct via ptrace powerpc/book3e: get rid of #include <generated/compile.h> powerpc/kasan: Force thread size increase with KASAN powerpc/papr_scm: don't requests stats with '0' sized stats buffer powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK powerpc/kasan: Silence KASAN warnings in __get_wchan() powerpc/kasan: Mark more real-mode code as not to be instrumented
| * | powerpc/32: Fix overread/overwrite of thread_struct via ptraceMichael Ellerman2022-06-092-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process to read/write registers of another process. To get/set a register, the API takes an index into an imaginary address space called the "USER area", where the registers of the process are laid out in some fashion. The kernel then maps that index to a particular register in its own data structures and gets/sets the value. The API only allows a single machine-word to be read/written at a time. So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels. The way floating point registers (FPRs) are addressed is somewhat complicated, because double precision float values are 64-bit even on 32-bit CPUs. That means on 32-bit kernels each FPR occupies two word-sized locations in the USER area. On 64-bit kernels each FPR occupies one word-sized location in the USER area. Internally the kernel stores the FPRs in an array of u64s, or if VSX is enabled, an array of pairs of u64s where one half of each pair stores the FPR. Which half of the pair stores the FPR depends on the kernel's endianness. To handle the different layouts of the FPRs depending on VSX/no-VSX and big/little endian, the TS_FPR() macro was introduced. Unfortunately the TS_FPR() macro does not take into account the fact that the addressing of each FPR differs between 32-bit and 64-bit kernels. It just takes the index into the "USER area" passed from userspace and indexes into the fp_state.fpr array. On 32-bit there are 64 indexes that address FPRs, but only 32 entries in the fp_state.fpr array, meaning the user can read/write 256 bytes past the end of the array. Because the fp_state sits in the middle of the thread_struct there are various fields than can be overwritten, including some pointers. As such it may be exploitable. It has also been observed to cause systems to hang or otherwise misbehave when using gdbserver, and is probably the root cause of this report which could not be easily reproduced: https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@keymile.com/ Rather than trying to make the TS_FPR() macro even more complicated to fix the bug, or add more macros, instead add a special-case for 32-bit kernels. This is more obvious and hopefully avoids a similar bug happening again in future. Note that because 32-bit kernels never have VSX enabled the code doesn't need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to ensure that 32-bit && VSX is never enabled. Fixes: 87fec0514f61 ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds") Cc: stable@vger.kernel.org # v3.13+ Reported-by: Ariel Miculas <ariel.miculas@belden.com> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au
| * | powerpc/book3e: get rid of #include <generated/compile.h>Masahiro Yamada2022-06-061-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | You cannot include <generated/compile.h> here because it is generated in init/Makefile but there is no guarantee that it happens before arch/powerpc/mm/nohash/kaslr_booke.c is compiled for parallel builds. The places where you can reliably include <generated/compile.h> are: - init/ (because init/Makefile can specify the dependency) - arch/*/boot/ (because it is compiled after vmlinux) Commit f231e4333312 ("hexagon: get rid of #include <generated/compile.h>") fixed the last breakage at that time, but powerpc re-added this. <generated/compile.h> was unneeded because 'build_str' is almost the same as 'linux_banner' defined in init/version.c Let's copy the solution from MIPS. (get_random_boot() in arch/mips/kernel/relocate.c) Fixes: 6a38ea1d7b94 ("powerpc/fsl_booke/32: randomize the kernel image offset") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220604085050.4078927-1-masahiroy@kernel.org
| * | powerpc/kasan: Force thread size increase with KASANMichael Ellerman2022-06-022-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KASAN causes increased stack usage, which can lead to stack overflows. The logic in Kconfig to suggest a larger default doesn't work if a user has CONFIG_EXPERT enabled and has an existing .config with a smaller value. Follow the lead of x86 and arm64, and force the thread size to be increased when KASAN is enabled. That also has the effect of enlarging the stack for 64-bit KASAN builds, which is also desirable. Fixes: edbadaf06710 ("powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Use MIN_THREAD_SHIFT as suggested by Christophe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220601143114.133524-1-mpe@ellerman.id.au
| * | powerpc/papr_scm: don't requests stats with '0' sized stats bufferVaibhav Jain2022-05-311-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sachin reported [1] that on a POWER-10 lpar he is seeing a kernel panic being reported with vPMEM when papr_scm probe is being called. The panic is of the form below and is observed only with following option disabled(profile) for the said LPAR 'Enable Performance Information Collection' in the HMC: Kernel attempted to write user page (1c) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on write at 0x0000001c Faulting instruction address: 0xc008000001b90844 Oops: Kernel access of bad area, sig: 11 [#1] <snip> NIP [c008000001b90844] drc_pmem_query_stats+0x5c/0x270 [papr_scm] LR [c008000001b92794] papr_scm_probe+0x2ac/0x6ec [papr_scm] Call Trace: 0xc00000000941bca0 (unreliable) papr_scm_probe+0x2ac/0x6ec [papr_scm] platform_probe+0x98/0x150 really_probe+0xfc/0x510 __driver_probe_device+0x17c/0x230 <snip> ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception On investigation looks like this panic was caused due to a 'stat_buffer' of size==0 being provided to drc_pmem_query_stats() to fetch all performance stats-ids of an NVDIMM. However drc_pmem_query_stats() shouldn't have been called since the vPMEM NVDIMM doesn't support and performance stat-id's. This was caused due to missing check for 'p->stat_buffer_len' at the beginning of papr_scm_pmu_check_events() which indicates that the NVDIMM doesn't support performance-stats. Fix this by introducing the check for 'p->stat_buffer_len' at the beginning of papr_scm_pmu_check_events(). [1] https://lore.kernel.org/all/6B3A522A-6A5F-4CC9-B268-0C63AA6E07D3@linux.ibm.com Fixes: 0e0946e22f3665d2732 ("powerpc/papr_scm: Fix leaking nvdimm_events_map elements") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220524112353.1718454-1-vaibhav@linux.ibm.com
| * | powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACKMichael Ellerman2022-05-291-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The HAVE_IRQ_EXIT_ON_IRQ_STACK option tells generic code that irq_exit() is called while still running on the hard irq stack (hardirq_ctx[] in the powerpc code). Selecting the option means the generic code will *not* switch to the softirq stack before running softirqs, because the code is already running on the (mostly empty) hard irq stack. But since commit 1b1b6a6f4cc0 ("powerpc: handle irq_enter/irq_exit in interrupt handler wrappers"), irq_exit() is now called on the regular task stack, not the hard irq stack. That's because previously irq_exit() was called in __do_irq() which is run on the hard irq stack, but now it is called in interrupt_async_exit_prepare() which is called from do_irq() constructed by the wrapper macro, which is after the switch back to the task stack. So drop HAVE_IRQ_EXIT_ON_IRQ_STACK from the Kconfig. This will mean an extra stack switch when processing some interrupts, but should significantly reduce the likelihood of stack overflow. It also means the softirq stack will be used for running softirqs from other interrupts that don't use the hard irq stack, eg. timer interrupts. Fixes: 1b1b6a6f4cc0 ("powerpc: handle irq_enter/irq_exit in interrupt handler wrappers") Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220525032639.1947280-1-mpe@ellerman.id.au
| * | powerpc/kasan: Silence KASAN warnings in __get_wchan()He Ying2022-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following KASAN warning was reported in our kernel. BUG: KASAN: stack-out-of-bounds in get_wchan+0x188/0x250 Read of size 4 at addr d216f958 by task ps/14437 CPU: 3 PID: 14437 Comm: ps Tainted: G O 5.10.0 #1 Call Trace: [daa63858] [c0654348] dump_stack+0x9c/0xe4 (unreliable) [daa63888] [c035cf0c] print_address_description.constprop.3+0x8c/0x570 [daa63908] [c035d6bc] kasan_report+0x1ac/0x218 [daa63948] [c00496e8] get_wchan+0x188/0x250 [daa63978] [c0461ec8] do_task_stat+0xce8/0xe60 [daa63b98] [c0455ac8] proc_single_show+0x98/0x170 [daa63bc8] [c03cab8c] seq_read_iter+0x1ec/0x900 [daa63c38] [c03cb47c] seq_read+0x1dc/0x290 [daa63d68] [c037fc94] vfs_read+0x164/0x510 [daa63ea8] [c03808e4] ksys_read+0x144/0x1d0 [daa63f38] [c005b1dc] ret_from_syscall+0x0/0x38 --- interrupt: c00 at 0x8fa8f4 LR = 0x8fa8cc The buggy address belongs to the page: page:98ebcdd2 refcount:0 mapcount:0 mapping:00000000 index:0x2 pfn:0x1216f flags: 0x0() raw: 00000000 00000000 01010122 00000000 00000002 00000000 ffffffff 00000000 raw: 00000000 page dumped because: kasan: bad access detected Memory state around the buggy address: d216f800: 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 d216f880: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >d216f900: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 ^ d216f980: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 d216fa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 After looking into this issue, I find the buggy address belongs to the task stack region. It seems KASAN has something wrong. I look into the code of __get_wchan in x86 architecture and find the same issue has been resolved by the commit f7d27c35ddff ("x86/mm, kasan: Silence KASAN warnings in get_wchan()"). The solution could be applied to powerpc architecture too. As Andrey Ryabinin said, get_wchan() is racy by design, it may access volatile stack of running task, thus it may access redzone in a stack frame and cause KASAN to warn about this. Use READ_ONCE_NOCHECK() to silence these warnings. Reported-by: Wanming Hu <huwanming@huaweil.com> Signed-off-by: He Ying <heying24@huawei.com> Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220121014418.155675-1-heying24@huawei.com
| * | powerpc/kasan: Mark more real-mode code as not to be instrumentedPaul Mackerras2022-05-294-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This marks more files and functions that can possibly be called in real mode as not to be instrumented by KASAN. Most were found by inspection, except for get_pseries_errorlog() which was reported as causing a crash in testing. Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/YoX1kZPnmUX4RZEK@cleo
* | | Merge tag 'net-5.19-rc2' of ↵Linus Torvalds2022-06-0944-189/+367
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - eth: amt: fix possible null-ptr-deref in amt_rcv() Previous releases - regressions: - tcp: use alloc_large_system_hash() to allocate table_perturb - af_unix: fix a data-race in unix_dgram_peer_wake_me() - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF Previous releases - always broken: - ipv6: fix signed integer overflow in __ip6_append_data - netfilter: - nat: really support inet nat without l3 address - nf_tables: memleak flow rule from commit path - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs - openvswitch: fix misuse of the cached connection on tuple changes - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred - eth: altera: fix refcount leak in altera_tse_mdio_create Misc: - add Quentin Monnet to bpftool maintainers" * tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) net: amd-xgbe: fix clang -Wformat warning tcp: use alloc_large_system_hash() to allocate table_perturb net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY net: dsa: mv88e6xxx: correctly report serdes link failure net: dsa: mv88e6xxx: fix BMSR error to be consistent with others net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete net: altera: Fix refcount leak in altera_tse_mdio_create net: openvswitch: fix misuse of the cached connection on tuple changes net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag ip_gre: test csum_start instead of transport header au1000_eth: stop using virt_to_bus() ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg ipv6: Fix signed integer overflow in __ip6_append_data nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION net: ipv6: unexport __init-annotated seg6_hmac_init() net: xfrm: unexport __init-annotated xfrm4_protocol_init() net: mdio: unexport __init-annotated mdio_bus_init() ...
| * | | net: amd-xgbe: fix clang -Wformat warningJustin Stitt2022-06-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | see warning: | drivers/net/ethernet/amd/xgbe/xgbe-drv.c:2787:43: warning: format specifies | type 'unsigned short' but the argument has type 'int' [-Wformat] | netdev_dbg(netdev, "Protocol: %#06hx\n", ntohs(eth->h_proto)); | ~~~~~~ ^~~~~~~~~~~~~~~~~~~ Variadic functions (printf-like) undergo default argument promotion. Documentation/core-api/printk-formats.rst specifically recommends using the promoted-to-type's format flag. Also, as per C11 6.3.1.1: (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf) `If an int can represent all values of the original type ..., the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.` Since the argument is a u16 it will get promoted to an int and thus it is most accurate to use the %x format specifier here. It should be noted that the `#06` formatting sugar does not alter the promotion rules. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Justin Stitt <jstitt007@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20220607191119.20686-1-jstitt007@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | tcp: use alloc_large_system_hash() to allocate table_perturbMuchun Song2022-06-091-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In our server, there may be no high order (>= 6) memory since we reserve lots of HugeTLB pages when booting. Then the system panic. So use alloc_large_system_hash() to allocate table_perturb. Fixes: e9261476184b ("tcp: dynamically allocate the perturb table used by source ports") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220607070214.94443-1-songmuchun@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHYAlvin Šipraga2022-06-091-29/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit a18e6521a7d9 ("net: phylink: handle NA interface mode in phylink_fwnode_phy_connect()"), phylib defaults to GMII when no phy-mode or phy-connection-type property is specified in a DSA port node of the device tree. The same commit caused a regression in rtl8365mb whereby phylink would fail to connect, because the driver did not advertise support for GMII for ports with internal PHY. It should be noted that the aforementioned regression is not because the blamed commit was incorrect: on the contrary, the blamed commit is correcting the previous behaviour whereby unspecified phy-mode would cause the internal interface mode to be PHY_INTERFACE_MODE_NA. The rtl8365mb driver only worked by accident before because it _did_ advertise support for PHY_INTERFACE_MODE_NA, despite NA being reserved for internal use by phylink. With one mistake fixed, the other was exposed. Commit a5dba0f207e5 ("net: dsa: rtl8365mb: add GMII as user port mode") then introduced implicit support for GMII mode on ports with internal PHY to allow a PHY connection for device trees where the phy-mode is not explicitly set to "internal". At this point everything was working OK again. Subsequently, commit 6ff6064605e9 ("net: dsa: realtek: convert to phylink_generic_validate()") broke this behaviour again by discarding the usage of rtl8365mb_phy_mode_supported() - where this GMII support was indicated - while switching to the new .phylink_get_caps API. With the new API, rtl8365mb_phy_mode_supported() is no longer needed. Remove it altogether and add back the GMII capability - this time to rtl8365mb_phylink_get_caps() - so that the above default behaviour works for ports with internal PHY again. Fixes: 6ff6064605e9 ("net: dsa: realtek: convert to phylink_generic_validate()") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20220607184624.417641-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | Merge branch '10GbE' of ↵Jakub Kicinski2022-06-091-4/+4
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-06-07 This series contains updates to ixgbe driver only. Olivier Matz resolves an issue so that broadcast packets can still be received when VF removes promiscuous settings and removes setting of VLAN promiscuous, in promiscuous mode, to prevent a loop when VFs are bridged. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ixgbe: fix unexpected VLAN Rx in promisc mode on VF ixgbe: fix bcast packets Rx on VF after promisc removal ==================== Link: https://lore.kernel.org/r/20220607181538.748786-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | ixgbe: fix unexpected VLAN Rx in promisc mode on VFOlivier Matz2022-06-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the promiscuous mode is enabled on a VF, the IXGBE_VMOLR_VPE bit (VLAN Promiscuous Enable) is set. This means that the VF will receive packets whose VLAN is not the same than the VLAN of the VF. For instance, in this situation: ┌────────┐ ┌────────┐ ┌────────┐ │ │ │ │ │ │ │ │ │ │ │ │ │ VF0├────┤VF1 VF2├────┤VF3 │ │ │ │ │ │ │ └────────┘ └────────┘ └────────┘ VM1 VM2 VM3 vf 0: vlan 1000 vf 1: vlan 1000 vf 2: vlan 1001 vf 3: vlan 1001 If we tcpdump on VF3, we see all the packets, even those transmitted on vlan 1000. This behavior prevents to bridge VF1 and VF2 in VM2, because it will create a loop: packets transmitted on VF1 will be received by VF2 and vice-versa, and bridged again through the software bridge. This patch remove the activation of VLAN Promiscuous when a VF enables the promiscuous mode. However, the IXGBE_VMOLR_UPE bit (Unicast Promiscuous) is kept, so that a VF receives all packets that has the same VLAN, whatever the destination MAC address. Fixes: 8443c1a4b192 ("ixgbe, ixgbevf: Add new mbox API xcast mode") Cc: stable@vger.kernel.org Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| | * | | ixgbe: fix bcast packets Rx on VF after promisc removalOlivier Matz2022-06-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a VF requested to remove the promiscuous flag on an interface, the broadcast packets are not received anymore. This breaks some protocols like ARP. In ixgbe_update_vf_xcast_mode(), we should keep the IXGBE_VMOLR_BAM bit (Broadcast Accept) on promiscuous removal. This flag is already set by default in ixgbe_set_vmolr() on VF reset. Fixes: 8443c1a4b192 ("ixgbe, ixgbevf: Add new mbox API xcast mode") Cc: stable@vger.kernel.org Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | | | Merge branch 'mv88e6xxx-fixes-for-reading-serdes-state'Jakub Kicinski2022-06-091-16/+19
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Russell King says: ==================== mv88e6xxx: fixes for reading serdes state These are some low-priority fixes to the mv88e6xxx serdes code. Patch 1 fixes the reporting of an_complete, which is used in the emulation of a conventional C22 PHY. Patch from Marek. Patch 2 makes one of the error messages in patch 2 to be consistent with the other error messages in this function. Patch 3 ensures that we do not miss a link-failure event. ==================== Link: https://lore.kernel.org/r/Yp82TyoLon9jz6k3@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | net: dsa: mv88e6xxx: correctly report serdes link failureRussell King (Oracle)2022-06-091-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Phylink wants to know if the link has dropped since the last time state was retrieved, and the BMSR gives us that. Read the BMSR and use it when deciding the link state. Fill in the an_complete member as well for the emulated PHY state. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | net: dsa: mv88e6xxx: fix BMSR error to be consistent with othersRussell King (Oracle)2022-06-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Other errors accessing the registers in mv88e6352_serdes_pcs_get_state() print "PHY " before the register name, except for the BMSR. Make this consistent with the other error messages. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_completeMarek Behún2022-06-091-16/+11
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ede359d8843a ("net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed") added the ability to link if AN was bypassed, and added filling of state->an_complete field, but set it to true if AN was enabled in BMCR, not when AN was reported complete in BMSR. This was done because for some reason, when I wanted to use BMSR value to infer an_complete, I was looking at BMSR_ANEGCAPABLE bit (which was always 1), instead of BMSR_ANEGCOMPLETE bit. Use BMSR_ANEGCOMPLETE for filling state->an_complete. Fixes: ede359d8843a ("net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | net: altera: Fix refcount leak in altera_tse_mdio_createMiaoqian Lin2022-06-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every iteration of for_each_child_of_node() decrements the reference count of the previous node. When break from a for_each_child_of_node() loop, we need to explicitly call of_node_put() on the child node when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: bbd2190ce96d ("Altera TSE: Add main and header file for Altera Ethernet Driver") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220607041144.7553-1-linmq006@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | net: openvswitch: fix misuse of the cached connection on tuple changesIlya Maximets2022-06-092-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If packet headers changed, the cached nfct is no longer relevant for the packet and attempt to re-use it leads to the incorrect packet classification. This issue is causing broken connectivity in OpenStack deployments with OVS/OVN due to hairpin traffic being unexpectedly dropped. The setup has datapath flows with several conntrack actions and tuple changes between them: actions:ct(commit,zone=8,mark=0/0x1,nat(src)), set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)), set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)), ct(zone=8),recirc(0x4) After the first ct() action the packet headers are almost fully re-written. The next ct() tries to re-use the existing nfct entry and marks the packet as invalid, so it gets dropped later in the pipeline. Clearing the cached conntrack entry whenever packet tuple is changed to avoid the issue. The flow key should not be cleared though, because we should still be able to match on the ct_state if the recirculation happens after the tuple change but before the next ct() action. Cc: stable@vger.kernel.org Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Frode Nordahl <frode.nordahl@canonical.com> Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface ↵Chen Lin2022-06-091-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netdev[napi]_alloc_frag When rx_flag == MTK_RX_FLAGS_HWLRO, rx_data_len = MTK_MAX_LRO_RX_LENGTH(4096 * 3) > PAGE_SIZE. netdev_alloc_frag is for alloction of page fragment only. Reference to other drivers and Documentation/vm/page_frags.rst Branch to use __get_free_pages when ring->frag_size > PAGE_SIZE. Signed-off-by: Chen Lin <chen45464546@163.com> Link: https://lore.kernel.org/r/1654692413-2598-1-git-send-email-chen45464546@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | ip_gre: test csum_start instead of transport headerWillem de Bruijn2022-06-091-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GRE with TUNNEL_CSUM will apply local checksum offload on CHECKSUM_PARTIAL packets. ipgre_xmit must validate csum_start after an optional skb_pull, else lco_csum may trigger an overflow. The original check was if (csum && skb_checksum_start(skb) < skb->data) return -EINVAL; This had false positives when skb_checksum_start is undefined: when ip_summed is not CHECKSUM_PARTIAL. A discussed refinement was straightforward if (csum && skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_start(skb) < skb->data) return -EINVAL; But was eventually revised more thoroughly: - restrict the check to the only branch where needed, in an uncommon GRE path that uses header_ops and calls skb_pull. - test skb_transport_header, which is set along with csum_start in skb_partial_csum_set in the normal header_ops datapath. Turns out skbs can arrive in this branch without the transport header set, e.g., through BPF redirection. Revise the check back to check csum_start directly, and only if CHECKSUM_PARTIAL. Do leave the check in the updated location. Check field regardless of whether TUNNEL_CSUM is configured. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Link: https://lore.kernel.org/all/20210902193447.94039-2-willemdebruijn.kernel@gmail.com/T/#u Fixes: 8a0ed250f911 ("ip_gre: validate csum_start only on pull") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20220606132107.3582565-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski2022-06-098-15/+49
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2022-06-09 We've added 6 non-merge commits during the last 2 day(s) which contain a total of 8 files changed, 49 insertions(+), 15 deletions(-). The main changes are: 1) Fix an illegal copy_to_user() attempt seen by syzkaller through arm64 BPF JIT compiler, from Eric Dumazet. 2) Fix calling global functions from BPF_PROG_TYPE_EXT programs by using the correct program context type, from Toke Høiland-Jørgensen. 3) Fix XSK TX batching invalid descriptor handling, from Maciej Fijalkowski. 4) Fix potential integer overflows in multi-kprobe link code by using safer kvmalloc_array() allocation helpers, from Dan Carpenter. 5) Add Quentin as bpftool maintainer, from Quentin Monnet. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: MAINTAINERS: Add a maintainer for bpftool xsk: Fix handling of invalid descriptors in XSK TX batching API selftests/bpf: Add selftest for calling global functions from freplace bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programs bpf: Use safer kvmalloc_array() where possible bpf, arm64: Clear prog->jited_len along prog->jited ==================== Link: https://lore.kernel.org/r/20220608234133.32265-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | MAINTAINERS: Add a maintainer for bpftoolQuentin Monnet2022-06-081-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've been contributing and reviewing patches for bpftool for some time, and I'm taking care of its external mirror. On Alexei, KP, and Daniel's suggestion, I would like to step forwards and become a maintainer for the tool. This patch adds a dedicated entry to MAINTAINERS. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: KP Singh <kpsingh@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220608121428.69708-1-quentin@isovalent.com
| | * | | | xsk: Fix handling of invalid descriptors in XSK TX batching APIMaciej Fijalkowski2022-06-082-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xdpxceiver run on a AF_XDP ZC enabled driver revealed a problem with XSK Tx batching API. There is a test that checks how invalid Tx descriptors are handled by AF_XDP. Each valid descriptor is followed by invalid one on Tx side whereas the Rx side expects only to receive a set of valid descriptors. In current xsk_tx_peek_release_desc_batch() function, the amount of available descriptors is hidden inside xskq_cons_peek_desc_batch(). This can be problematic in cases where invalid descriptors are present due to the fact that xskq_cons_peek_desc_batch() returns only a count of valid descriptors. This means that it is impossible to properly update XSK ring state when calling xskq_cons_release_n(). To address this issue, pull out the contents of xskq_cons_peek_desc_batch() so that callers (currently only xsk_tx_peek_release_desc_batch()) will always be able to update the state of ring properly, as total count of entries is now available and use this value as an argument in xskq_cons_release_n(). By doing so, xskq_cons_peek_desc_batch() can be dropped altogether. Fixes: 9349eb3a9d2a ("xsk: Introduce batched Tx descriptor interfaces") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220607142200.576735-1-maciej.fijalkowski@intel.com
| | * | | | selftests/bpf: Add selftest for calling global functions from freplaceToke Høiland-Jørgensen2022-06-072-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a selftest that calls a global function with a context object parameter from an freplace function to check that the program context type is correctly converted to the freplace target when fetching the context type from the kernel BTF. v2: - Trim includes - Get rid of global function - Use __noinline Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220606075253.28422-2-toke@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programsToke Høiland-Jørgensen2022-06-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The verifier allows programs to call global functions as long as their argument types match, using BTF to check the function arguments. One of the allowed argument types to such global functions is PTR_TO_CTX; however the check for this fails on BPF_PROG_TYPE_EXT functions because the verifier uses the wrong type to fetch the vmlinux BTF ID for the program context type. This failure is seen when an XDP program is loaded using libxdp (which loads it as BPF_PROG_TYPE_EXT and attaches it to a global XDP type program). Fix the issue by passing in the target program type instead of the BPF_PROG_TYPE_EXT type to bpf_prog_get_ctx() when checking function argument compatibility. The first Fixes tag refers to the latest commit that touched the code in question, while the second one points to the code that first introduced the global function call verification. v2: - Use resolve_prog_type() Fixes: 3363bd0cfbb8 ("bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support") Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification") Reported-by: Simon Sundberg <simon.sundberg@kau.se> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220606075253.28422-1-toke@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | bpf: Use safer kvmalloc_array() where possibleDan Carpenter2022-06-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kvmalloc_array() function is safer because it has a check for integer overflows. These sizes come from the user and I was not able to see any bounds checking so an integer overflow seems like a realistic concern. Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/Yo9VRVMeHbALyjUH@kili Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | bpf, arm64: Clear prog->jited_len along prog->jitedEric Dumazet2022-06-071-0/+1
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported an illegal copy_to_user() attempt from bpf_prog_get_info_by_fd() [1] There was no repro yet on this bug, but I think that commit 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns") is exposing a prior bug in bpf arm64. bpf_prog_get_info_by_fd() looks at prog->jited_len to determine if the JIT image can be copied out to user space. My theory is that syzbot managed to get a prog where prog->jited_len has been set to 43, while prog->bpf_func has ben cleared. It is not clear why copy_to_user(uinsns, NULL, ulen) is triggering this particular warning. I thought find_vma_area(NULL) would not find a vm_struct. As we do not hold vmap_area_lock spinlock, it might be possible that the found vm_struct was garbage. [1] usercopy: Kernel memory exposure attempt detected from vmalloc (offset 792633534417210172, size 43)! kernel BUG at mm/usercopy.c:101! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 25002 Comm: syz-executor.1 Not tainted 5.18.0-syzkaller-10139-g8291eaafed36 #0 Hardware name: linux,dummy-virt (DT) pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : usercopy_abort+0x90/0x94 mm/usercopy.c:101 lr : usercopy_abort+0x90/0x94 mm/usercopy.c:89 sp : ffff80000b773a20 x29: ffff80000b773a30 x28: faff80000b745000 x27: ffff80000b773b48 x26: 0000000000000000 x25: 000000000000002b x24: 0000000000000000 x23: 00000000000000e0 x22: ffff80000b75db67 x21: 0000000000000001 x20: 000000000000002b x19: ffff80000b75db3c x18: 00000000fffffffd x17: 2820636f6c6c616d x16: 76206d6f72662064 x15: 6574636574656420 x14: 74706d6574746120 x13: 2129333420657a69 x12: 73202c3237313031 x11: 3237313434333533 x10: 3336323937207465 x9 : 657275736f707865 x8 : ffff80000a30c550 x7 : ffff80000b773830 x6 : ffff80000b773830 x5 : 0000000000000000 x4 : ffff00007fbbaa10 x3 : 0000000000000000 x2 : 0000000000000000 x1 : f7ff000028fc0000 x0 : 0000000000000064 Call trace: usercopy_abort+0x90/0x94 mm/usercopy.c:89 check_heap_object mm/usercopy.c:186 [inline] __check_object_size mm/usercopy.c:252 [inline] __check_object_size+0x198/0x36c mm/usercopy.c:214 check_object_size include/linux/thread_info.h:199 [inline] check_copy_size include/linux/thread_info.h:235 [inline] copy_to_user include/linux/uaccess.h:159 [inline] bpf_prog_get_info_by_fd.isra.0+0xf14/0xfdc kernel/bpf/syscall.c:3993 bpf_obj_get_info_by_fd+0x12c/0x510 kernel/bpf/syscall.c:4253 __sys_bpf+0x900/0x2150 kernel/bpf/syscall.c:4956 __do_sys_bpf kernel/bpf/syscall.c:5021 [inline] __se_sys_bpf kernel/bpf/syscall.c:5019 [inline] __arm64_sys_bpf+0x28/0x40 kernel/bpf/syscall.c:5019 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:52 el0_svc_common.constprop.0+0x44/0xec arch/arm64/kernel/syscall.c:142 do_el0_svc+0xa0/0xc0 arch/arm64/kernel/syscall.c:206 el0_svc+0x44/0xb0 arch/arm64/kernel/entry-common.c:624 el0t_64_sync_handler+0x1ac/0x1b0 arch/arm64/kernel/entry-common.c:642 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:581 Code: aa0003e3 d00038c0 91248000 97fff65f (d4210000) Fixes: db496944fdaa ("bpf: arm64: add JIT support for multi-function programs") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220531215113.1100754-1-eric.dumazet@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | au1000_eth: stop using virt_to_bus()Arnd Bergmann2022-06-082-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion to the dma-mapping API in linux-2.6.11 was incomplete and left a virt_to_bus() call around. There have been a number of fixes for DMA mapping API abuse in this driver, but this one always slipped through. Change it to just use the existing dma_addr_t pointer, and make it use the correct types throughout the driver to make it easier to understand the virtual vs dma address spaces. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Manuel Lauss <manuel.lauss@gmail.com> Link: https://lore.kernel.org/r/20220607090206.19830-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | ipv6: Fix signed integer overflow in l2tp_ip6_sendmsgWang Yufen2022-06-081-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When len >= INT_MAX - transhdrlen, ulen = len + transhdrlen will be overflow. To fix, we can follow what udpv6 does and subtract the transhdrlen from the max. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-2-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | ipv6: Fix signed integer overflow in __ip6_append_dataWang Yufen2022-06-082-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resurrect ubsan overflow checks and ubsan report this warning, fix it by change the variable [length] type to size_t. UBSAN: signed-integer-overflow in net/ipv6/ip6_output.c:1489:19 2147479552 + 8567 cannot be represented in type 'int' CPU: 0 PID: 253 Comm: err Not tainted 5.16.0+ #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x214/0x230 show_stack+0x30/0x78 dump_stack_lvl+0xf8/0x118 dump_stack+0x18/0x30 ubsan_epilogue+0x18/0x60 handle_overflow+0xd0/0xf0 __ubsan_handle_add_overflow+0x34/0x44 __ip6_append_data.isra.48+0x1598/0x1688 ip6_append_data+0x128/0x260 udpv6_sendmsg+0x680/0xdd0 inet6_sendmsg+0x54/0x90 sock_sendmsg+0x70/0x88 ____sys_sendmsg+0xe8/0x368 ___sys_sendmsg+0x98/0xe0 __sys_sendmmsg+0xf4/0x3b8 __arm64_sys_sendmmsg+0x34/0x48 invoke_syscall+0x64/0x160 el0_svc_common.constprop.4+0x124/0x300 do_el0_svc+0x44/0xc8 el0_svc+0x3c/0x1e8 el0t_64_sync_handler+0x88/0xb0 el0t_64_sync+0x16c/0x170 Changes since v1: -Change the variable [length] type to unsigned, as Eric Dumazet suggested. Changes since v2: -Don't change exthdrlen type in ip6_make_skb, as Paolo Abeni suggested. Changes since v3: -Don't change ulen type in udpv6_sendmsg and l2tp_ip6_sendmsg, as Jakub Kicinski suggested. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-1-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferredXiaohui Zhang2022-06-081-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to the handling of play_deferred in commit 19cfe912c37b ("Bluetooth: btusb: Fix memory leak in play_deferred"), we thought a patch might be needed here as well. Currently usb_submit_urb is called directly to submit deferred tx urbs after unanchor them. So the usb_giveback_urb_bh would failed to unref it in usb_unanchor_urb and cause memory leak. Put those urbs in tx_anchor to avoid the leak, and also fix the error handling. Signed-off-by: Xiaohui Zhang <xiaohuizhang@ruc.edu.cn> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220607083230.6182-1-xiaohuizhang@ruc.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | Merge branch 'split-nfc-st21nfca-refactor-evt_transaction-into-3'Jakub Kicinski2022-06-081-23/+30
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Martin Faltesek says: ==================== Split "nfc: st21nfca: Refactor EVT_TRANSACTION" into 3 v2: https://lore.kernel.org/netdev/20220401180939.2025819-1-mfaltesek@google.com/ v1: https://lore.kernel.org/netdev/20220329175431.3175472-1-mfaltesek@google.com/ ==================== Link: https://lore.kernel.org/r/20220607025729.1673212-1-mfaltesek@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTIONMartin Faltesek2022-06-081-30/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The transaction buffer is allocated by using the size of the packet buf, and subtracting two which seem intended to remove the two tags which are not present in the target structure. This calculation leads to under counting memory because of differences between the packet contents and the target structure. The aid_len field is a u8 in the packet, but a u32 in the structure, resulting in at least 3 bytes always being under counted. Further, the aid data is a variable length field in the packet, but fixed in the structure, so if this field is less than the max, the difference is added to the under counting. The last validation check for transaction->params_len is also incorrect since it employs the same accounting error. To fix, perform validation checks progressively to safely reach the next field, to determine the size of both buffers and verify both tags. Once all validation checks pass, allocate the buffer and copy the data. This eliminates freeing memory on the error path, as those checks are moved ahead of memory allocation. Fixes: 26fc6c7f02cb ("NFC: st21nfca: Add HCI transaction event support") Fixes: 4fbcc1a4cb20 ("nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION") Cc: stable@vger.kernel.org Signed-off-by: Martin Faltesek <mfaltesek@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handlingMartin Faltesek2022-06-081-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Error paths do not free previously allocated memory. Add devm_kfree() to those failure paths. Fixes: 26fc6c7f02cb ("NFC: st21nfca: Add HCI transaction event support") Fixes: 4fbcc1a4cb20 ("nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION") Cc: stable@vger.kernel.org Signed-off-by: Martin Faltesek <mfaltesek@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTIONMartin Faltesek2022-06-081-1/+1
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first validation check for EVT_TRANSACTION has two different checks tied together with logical AND. One is a check for minimum packet length, and the other is for a valid aid_tag. If either condition is true (fails), then an error should be triggered. The fix is to change && to ||. Fixes: 26fc6c7f02cb ("NFC: st21nfca: Add HCI transaction event support") Cc: stable@vger.kernel.org Signed-off-by: Martin Faltesek <mfaltesek@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | Merge branch 'net-unexport-some-symbols-that-are-annotated-__init'Jakub Kicinski2022-06-083-3/+0
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Masahiro Yamada says: ==================== net: unexport some symbols that are annotated __init This patch set fixes odd combinations of EXPORT_SYMBOL and __init. Checking this in modpost is a good thing and I really wanted to do it, but Linus Torvalds imposes a very strict rule, "No new warning". I'd like the maintainer to kindly pick this up and send a fixes pull request. Unless I eliminate all the sites of warnings beforehand, Linus refuses to re-enable the modpost check. [1] [1]: https://lore.kernel.org/linux-kbuild/CAK7LNATmd0bigp7HQ4fTzHw5ugZMkSb3UXG7L4fxpGbqkRKESA@mail.gmail.com/T/#m5e50cc2da17491ba210c72b5efdbc0ce76e0217f ==================== Link: https://lore.kernel.org/r/20220606045355.4160711-1-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | net: ipv6: unexport __init-annotated seg6_hmac_init()Masahiro Yamada2022-06-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the caller (net/ipv6/seg6.c) and the callee (net/ipv6/seg6_hmac.c) belong to the same module. It seems an internal function call in ipv6.ko. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | net: xfrm: unexport __init-annotated xfrm4_protocol_init()Masahiro Yamada2022-06-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the only in-tree call-site, net/ipv4/xfrm4_policy.c is never compiled as modular. (CONFIG_XFRM is boolean) Fixes: 2f32b51b609f ("xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | net: mdio: unexport __init-annotated mdio_bus_init()Masahiro Yamada2022-06-081-1/+0
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the only in-tree call-site, drivers/net/phy/phy_device.c is never compiled as modular. (CONFIG_PHYLIB is boolean) Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | net/mlx4_en: Fix wrong return value on ioctl EEPROM query failureGal Pressman2022-06-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ioctl EEPROM query wrongly returns success on read failures, fix that by returning the appropriate error code. Fixes: 7202da8b7f71 ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220606115718.14233-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | net: dsa: lantiq_gswip: Fix refcount leak in gswip_gphy_fw_listMiaoqian Lin2022-06-081-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every iteration of for_each_available_child_of_node() decrements the reference count of the previous node. when breaking early from a for_each_available_child_of_node() loop, we need to explicitly call of_node_put() on the gphy_fw_np. Add missing of_node_put() to avoid refcount leak. Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220605072335.11257-1-linmq006@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski2022-06-088-35/+98
| |\ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix NAT support for NFPROTO_INET without layer 3 address, from Florian Westphal. 2) Use kfree_rcu(ptr, rcu) variant in nf_tables clean_net path. 3) Use list to collect flowtable hooks to be deleted. 4) Initialize list of hook field in flowtable transaction. 5) Release hooks on error for flowtable updates. 6) Memleak in hardware offload rule commit and abort paths. 7) Early bail out in case device does not support for hardware offload. This adds a new interface to net/core/flow_offload.c to check if the flow indirect block list is empty. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: bail out early if hardware offload is not supported netfilter: nf_tables: memleak flow rule from commit path netfilter: nf_tables: release new hooks on unsupported flowtable flags netfilter: nf_tables: always initialize flowtable hook list in transaction netfilter: nf_tables: delete flowtable hooks via transaction list netfilter: nf_tables: use kfree_rcu(ptr, rcu) to release hooks in clean_net path netfilter: nat: really support inet nat without l3 address ==================== Link: https://lore.kernel.org/r/20220606212055.98300-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>