summaryrefslogtreecommitdiffstats
path: root/arch/x86/um (unfollow)
Commit message (Collapse)AuthorFilesLines
2022-05-22afs: Fix afs_getattr() to refetch file status if callback break occurredDavid Howells1-1/+13
If a callback break occurs (change notification), afs_getattr() needs to issue an FS.FetchStatus RPC operation to update the status of the file being examined by the stat-family of system calls. Fix afs_getattr() to do this if AFS_VNODE_CB_PROMISED has been cleared on a vnode by a callback break. Skip this if AT_STATX_DONT_SYNC is set. This can be tested by appending to a file on one AFS client and then using "stat -L" to examine its length on a machine running kafs. This can also be watched through tracing on the kafs machine. The callback break is seen: kworker/1:1-46 [001] ..... 978.910812: afs_cb_call: c=0000005f YFSCB.CallBack kworker/1:1-46 [001] ...1. 978.910829: afs_cb_break: 100058:23b4c:242d2c2 b=2 s=1 break-cb kworker/1:1-46 [001] ..... 978.911062: afs_call_done: c=0000005f ret=0 ab=0 [0000000082994ead] And then the stat command generated no traffic if unpatched, but with this change a call to fetch the status can be observed: stat-4471 [000] ..... 986.744122: afs_make_fs_call: c=000000ab 100058:023b4c:242d2c2 YFS.FetchStatus stat-4471 [000] ..... 986.745578: afs_call_done: c=000000ab ret=0 ab=0 [0000000087fc8c84] Fixes: 08e0e7c82eea ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Tested-by: Markus Suvanto <markus.suvanto@gmail.com> Tested-by: kafs-testing+fedora34_64checkkafs-build-496@auristor.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216010 Link: https://lore.kernel.org/r/165308359800.162686.14122417881564420962.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-21perf session: Fix Intel LBR callstack entries and nr print messageChengdong Li1-5/+21
When generating callstack information from branch_stack(Intel LBR), the actual number of callstack entry should be bigger than the number of branch_stack, for example: branch_stack records: B() -> C() A() -> B() converted callstack records should be: C() B() A() though, the number of callstack equals to the number of branch stack plus 1. This patch fixes above issue in branch_stack__printf(). For example, # echo 'scale=2000; 4*a(1)' > cmd # perf record --call-graph lbr bc -l < cmd Before applying this patch, `perf script -D` output: 1220022677386876 0x2a40 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x4002): 17990/17990: 0x40a6d6 period: 894172 addr: 0 ... LBR call chain: nr:8 ..... 0: fffffffffffffe00 ..... 1: 000000000040a410 ..... 2: 000000000040573c ..... 3: 0000000000408650 ..... 4: 00000000004022f2 ..... 5: 00000000004015f5 ..... 6: 00007f5ed6dcb553 ..... 7: 0000000000401698 ... FP chain: nr:2 ..... 0: fffffffffffffe00 ..... 1: 000000000040a6d8 ... branch callstack: nr:6 # which is not consistent with LBR records. ..... 0: 000000000040a410 ..... 1: 0000000000408650 # ditto ..... 2: 00000000004022f2 ..... 3: 00000000004015f5 ..... 4: 00007f5ed6dcb553 ..... 5: 0000000000401698 ... thread: bc:17990 ...... dso: /usr/bin/bc bc 17990 1220022.677386: 894172 cycles: 40a410 [unknown] (/usr/bin/bc) 40573c [unknown] (/usr/bin/bc) 408650 [unknown] (/usr/bin/bc) 4022f2 [unknown] (/usr/bin/bc) 4015f5 [unknown] (/usr/bin/bc) 7f5ed6dcb553 __libc_start_main+0xf3 (/usr/lib64/libc-2.17.so) 401698 [unknown] (/usr/bin/bc) After applied: 1220022677386876 0x2a40 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x4002): 17990/17990: 0x40a6d6 period: 894172 addr: 0 ... LBR call chain: nr:8 ..... 0: fffffffffffffe00 ..... 1: 000000000040a410 ..... 2: 000000000040573c ..... 3: 0000000000408650 ..... 4: 00000000004022f2 ..... 5: 00000000004015f5 ..... 6: 00007f5ed6dcb553 ..... 7: 0000000000401698 ... FP chain: nr:2 ..... 0: fffffffffffffe00 ..... 1: 000000000040a6d8 ... branch callstack: nr:7 ..... 0: 000000000040a410 ..... 1: 000000000040573c ..... 2: 0000000000408650 ..... 3: 00000000004022f2 ..... 4: 00000000004015f5 ..... 5: 00007f5ed6dcb553 ..... 6: 0000000000401698 ... thread: bc:17990 ...... dso: /usr/bin/bc bc 17990 1220022.677386: 894172 cycles: 40a410 [unknown] (/usr/bin/bc) 40573c [unknown] (/usr/bin/bc) 408650 [unknown] (/usr/bin/bc) 4022f2 [unknown] (/usr/bin/bc) 4015f5 [unknown] (/usr/bin/bc) 7f5ed6dcb553 __libc_start_main+0xf3 (/usr/lib64/libc-2.17.so) 401698 [unknown] (/usr/bin/bc) Change from v1: - refined code style according to Jiri's review comments. Signed-off-by: Chengdong Li <chengdongli@tencent.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: likexu@tencent.com Link: https://lore.kernel.org/r/20220517015726.96131-1-chengdongli@tencent.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21perf test bpf: Skip test if clang is not presentAthira Rajeev1-4/+6
Perf BPF filter test fails in environment where "clang" is not installed. Test failure logs: <<>> 42: BPF filter : 42.1: Basic BPF filtering : Skip 42.2: BPF pinning : FAILED! 42.3: BPF prologue generation : FAILED! <<>> Enabling verbose option provided debug logs which says clang/llvm needs to be installed. Snippet of verbose logs: <<>> 42.2: BPF pinning : --- start --- test child forked, pid 61423 ERROR: unable to find clang. Hint: Try to install latest clang/llvm to support BPF. Check your $PATH <<logs_here>> Failed to compile test case: 'Basic BPF llvm compile' Unable to get BPF object, fix kbuild first test child finished with -1 ---- end ---- BPF filter subtest 2: FAILED! <<>> Here subtests, "BPF pinning" and "BPF prologue generation" failed and logs shows clang/llvm is needed. After installing clang, testcase passes. Reason on why subtest failure happens though logs has proper debug information: Main function __test__bpf calls test_llvm__fetch_bpf_obj by passing 4th argument as true ( 4th arguments maps to parameter "force" in test_llvm__fetch_bpf_obj ). But this will cause test_llvm__fetch_bpf_obj to skip the check for clang/llvm. Snippet of code part which checks for clang based on parameter "force" in test_llvm__fetch_bpf_obj: <<>> if (!force && (!llvm_param.user_set_param && <<>> Since force is set to "false", test won't get skipped and fails to compile test case. The BPF code compilation needs clang, So pass the fourth argument as "false" and also skip the test if reason for return is "TEST_SKIP" After the patch: <<>> 42: BPF filter : 42.1: Basic BPF filtering : Skip 42.2: BPF pinning : Skip 42.3: BPF prologue generation : Skip <<>> Fixes: ba1fae431e74bb42 ("perf test: Add 'perf test BPF'") Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lore.kernel.org/r/20220511115438.84032-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21perf test session topology: Fix test to skip the test in guest environmentAthira Rajeev1-0/+11
The session topology test fails in powerpc pSeries platform. Test logs: <<>> Session topology : FAILED! <<>> This testcases tests cpu topology by checking the core_id and socket_id stored in perf_env from perf session. The data from perf session is compared with the cpu topology information from "/sys/devices/system/cpu/cpuX/topology" like core_id, physical_package_id. In case of virtual environment, detail like physical_package_id is restricted to be exposed. Hence physical_package_id is set to -1. The testcase fails on such platforms since socket_id can't be fetched from topology info. Skip the testcase in powerpc if physical_package_id returns -1. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>--- Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20220511114959.84002-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21perf bench numa: Address compiler error on s390Thomas Richter1-1/+1
The compilation on s390 results in this error: # make DEBUG=y bench/numa.o ... bench/numa.c: In function ‘__bench_numa’: bench/numa.c:1749:81: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size between 10 and 20 [-Werror=format-truncation=] 1749 | snprintf(tname, sizeof(tname), "process%d:thread%d", p, t); ^~ ... bench/numa.c:1749:64: note: directive argument in the range [-2147483647, 2147483646] ... # The maximum length of the %d replacement is 11 characters because of the negative sign. Therefore extend the array by two more characters. Output after: # make DEBUG=y bench/numa.o > /dev/null 2>&1; ll bench/numa.o -rw-r--r-- 1 root root 418320 May 19 09:11 bench/numa.o # Fixes: 3aff8ba0a4c9c919 ("perf bench numa: Avoid possible truncation when using snprintf()") Suggested-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20220520081158.2990006-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21perf test: Avoid shell test description infinite loopIan Rogers1-2/+6
for_each_shell_test() is already strict in expecting tests to be files and executable. It is sometimes possible when it iterates over all files that it finds one that is executable and lacks a newline character. When this happens the loop never terminates as it doesn't check for EOF. Add the EOF check to make this loop at least bounded by the file size. If the description is returned as NULL then also skip the test. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220517204144.645913-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21perf regs x86: Fix arch__intr_reg_mask() for the hybrid platformKan Liang1-0/+12
The X86 specific arch__intr_reg_mask() is to check whether the kernel and hardware can collect XMM registers. But it doesn't work on some hybrid platform. Without the patch on ADL-N: $ perf record -I? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 The config of the test event doesn't contain the PMU information. The kernel may fail to initialize it on the correct hybrid PMU and return the wrong non-supported information. Add the PMU information into the config for the hybrid platform. The same register set is supported among different hybrid PMUs. Checking the first available one is good enough. With the patch on ADL-N: $ perf record -I? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9 XMM10 XMM11 XMM12 XMM13 XMM14 XMM15 Fixes: 6466ec14aaf44ff1 ("perf regs x86: Add X86 specific arch__intr_reg_mask()") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220518145125.1494156-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21perf test: Fix "all PMU test" to skip hv_24x7/hv_gpci tests on powerpcAthira Rajeev1-0/+10
"perf all PMU test" picks the input events from "perf list --raw-dump pmu" list and runs "perf stat -e" for each of the event in the list. In case of powerpc, the PowerVM environment supports events from hv_24x7 and hv_gpci PMU which is of example format like below: - hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/ - hv_gpci/event,partition_id=?/ The value for "?" needs to be filled in depending on system and respective event. CPM_ADJUNCT_INST needs have core value and domain value. hv_gpci event needs partition_id. Similarly, there are other events for hv_24x7 and hv_gpci having "?" in event format. Hence skip these events on powerpc platform since values like partition_id, domain is specific to system and event. Fixes: 3d5ac9effcc640d5 ("perf test: Workload test of all PMUs") Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Link: https://lore.kernel.org/r/20220520101236.17249-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllersPiyush Malgujar1-0/+1
Due to i2c->adap.dev.fwnode not being set, ACPI_COMPANION() wasn't properly found for TWSI controllers. Signed-off-by: Szymon Balcerak <sbalcerak@marvell.com> Signed-off-by: Piyush Malgujar <pmalgujar@marvell.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-05-21i2c: ismt: Provide a DMA buffer for Interrupt Cause LoggingMika Westerberg1-0/+14
Before sending a MSI the hardware writes information pertinent to the interrupt cause to a memory location pointed by SMTICL register. This memory holds three double words where the least significant bit tells whether the interrupt cause of master/target/error is valid. The driver does not use this but we need to set it up because otherwise it will perform DMA write to the default address (0) and this will cause an IOMMU fault such as below: DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Write] Request device [00:12.0] PASID ffffffff fault addr 0 [fault reason 05] PTE Write access is not set To prevent this from happening, provide a proper DMA buffer for this that then gets mapped by the IOMMU accordingly. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: From: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-05-21i2c: mt7621: fix missing clk_disable_unprepare() on error in mtk_i2c_probe()Yang Yingliang1-2/+8
Fix the missing clk_disable_unprepare() before return from mtk_i2c_probe() in the error handling case. Fixes: d04913ec5f89 ("i2c: mt7621: Add MediaTek MT7621/7628/7688 I2C driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Stefan Roese <sr@denx.de> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-05-20perf: Fix sys_perf_event_open() race against selfPeter Zijlstra1-0/+14
Norbert reported that it's possible to race sys_perf_event_open() such that the looser ends up in another context from the group leader, triggering many WARNs. The move_group case checks for races against itself, but the !move_group case doesn't, seemingly relying on the previous group_leader->ctx == ctx check. However, that check is racy due to not holding any locks at that time. Therefore, re-check the result after acquiring locks and bailing if they no longer match. Additionally, clarify the not_move_group case from the move_group-vs-move_group race. Fixes: f63a8daa5812 ("perf: Fix event->ctx locking") Reported-by: Norbert Slusarek <nslusarek@gmx.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-20KVM: x86/mmu: fix NULL pointer dereference on guest INVPCIDPaolo Bonzini1-2/+4
With shadow paging enabled, the INVPCID instruction results in a call to kvm_mmu_invpcid_gva. If INVPCID is executed with CR0.PG=0, the invlpg callback is not set and the result is a NULL pointer dereference. Fix it trivially by checking for mmu->invlpg before every call. There are other possibilities: - check for CR0.PG, because KVM (like all Intel processors after P5) flushes guest TLB on CR0.PG changes so that INVPCID/INVLPG are a nop with paging disabled - check for EFER.LMA, because KVM syncs and flushes when switching MMU contexts outside of 64-bit mode All of these are tricky, go for the simple solution. This is CVE-2022-1789. Reported-by: Yongkang Jia <kangel@zju.edu.cn> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20KVM: x86: hyper-v: fix type of valid_bank_maskYury Norov1-2/+2
In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long, but is used as u64, which is wrong for i386, and has been spotted by LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with hweight64()" https://lore.kernel.org/lkml/20220510154750.212913-12-yury.norov@gmail.com/ But it's wrong even without that patch because now bitmap_weight() dereferences a word after valid_bank_mask on i386. >> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type +[-Wshift-count-overflow] 21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32)) | ^~ include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8' 10 | ((!!((w) & (1ULL << 0))) + \ | ^ include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16' 20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16)) | ^~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32' 21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32)) | ^~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64' 29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w)) | ^~~~~~~~~~~~~~~~~ arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64' 1983 | if (hc->var_cnt != hweight64(valid_bank_mask)) | ^~~~~~~~~ CC: Borislav Petkov <bp@alien8.de> CC: Dave Hansen <dave.hansen@linux.intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Ingo Molnar <mingo@redhat.com> CC: Jim Mattson <jmattson@google.com> CC: Joerg Roedel <joro@8bytes.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Sean Christopherson <seanjc@google.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Wanpeng Li <wanpengli@tencent.com> CC: kvm@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: x86@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Message-Id: <20220519171504.1238724-1-yury.norov@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20KVM: Free new dirty bitmap if creating a new memslot failsSean Christopherson1-1/+1
Fix a goof in kvm_prepare_memory_region() where KVM fails to free the new memslot's dirty bitmap during a CREATE action if kvm_arch_prepare_memory_region() fails. The logic is supposed to detect if the bitmap was allocated and thus needs to be freed, versus if the bitmap was inherited from the old memslot and thus needs to be kept. If there is no old memslot, then obviously the bitmap can't have been inherited The bug was exposed by commit 86931ff7207b ("KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR"), which made it trivally easy for syzkaller to trigger failure during kvm_arch_prepare_memory_region(), but the bug can be hit other ways too, e.g. due to -ENOMEM when allocating x86's memslot metadata. The backtrace from kmemleak: __vmalloc_node_range+0xb40/0xbd0 mm/vmalloc.c:3195 __vmalloc_node mm/vmalloc.c:3232 [inline] __vmalloc+0x49/0x50 mm/vmalloc.c:3246 __vmalloc_array mm/util.c:671 [inline] __vcalloc+0x49/0x70 mm/util.c:694 kvm_alloc_dirty_bitmap virt/kvm/kvm_main.c:1319 kvm_prepare_memory_region virt/kvm/kvm_main.c:1551 kvm_set_memslot+0x1bd/0x690 virt/kvm/kvm_main.c:1782 __kvm_set_memory_region+0x689/0x750 virt/kvm/kvm_main.c:1949 kvm_set_memory_region virt/kvm/kvm_main.c:1962 kvm_vm_ioctl_set_memory_region virt/kvm/kvm_main.c:1974 kvm_vm_ioctl+0x377/0x13a0 virt/kvm/kvm_main.c:4528 vfs_ioctl fs/ioctl.c:51 __do_sys_ioctl fs/ioctl.c:870 __se_sys_ioctl fs/ioctl.c:856 __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae And the relevant sequence of KVM events: ioctl(3, KVM_CREATE_VM, 0) = 4 ioctl(4, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=KVM_MEM_LOG_DIRTY_PAGES, guest_phys_addr=0x10000000000000, memory_size=4096, userspace_addr=0x20fe8000} ) = -1 EINVAL (Invalid argument) Fixes: 244893fa2859 ("KVM: Dynamically allocate "new" memslots from the get-go") Cc: stable@vger.kernel.org Reported-by: syzbot+8606b8a9cc97a63f1c87@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220518003842.1341782-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20gpio: mvebu/pwm: Refuse requests with inverted polarityUwe Kleine-König1-0/+3
The driver doesn't take struct pwm_state::polarity into account when configuring the hardware, so refuse requests for inverted polarity. Fixes: 757642f9a584 ("gpio: mvebu: Add limited PWM support") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-05-20gpio: gpio-vf610: do not touch other bits when set the target bitHaibo Chen1-2/+6
For gpio controller contain register PDDR, when set one target bit, current logic will clear all other bits, this is wrong. Use operator '|=' to fix it. Fixes: 659d8a62311f ("gpio: vf610: add imx7ulp support") Reviewed-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-05-20perf stat: Fix and validate CPU map inputs in synthetic PERF_RECORD_STAT eventsIan Rogers1-3/+14
Stat events can come from disk and so need a degree of validation. They contain a CPU which needs looking up via CPU map to access a counter. Add the CPU to index translation, alongside validity checking. Discussion thread: https://lore.kernel.org/linux-perf-users/CAP-5=fWQR=sCuiSMktvUtcbOLidEpUJLCybVF6=BRvORcDOq+g@mail.gmail.com/ Fixes: 7ac0089d138f80dc ("perf evsel: Pass cpu not cpu map index to synthesize") Reported-by: Michael Petlan <mpetlan@redhat.com> Suggested-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Lv Ruyi <lv.ruyi@zte.com.cn> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: netdev@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20220519032005.1273691-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20KVM: eventfd: Fix false positive RCU usage warningWanpeng Li1-1/+2
The splat below can be seen when running kvm-unit-test: ============================= WARNING: suspicious RCU usage 5.18.0-rc7 #5 Tainted: G IOE ----------------------------- /home/kernel/linux/arch/x86/kvm/../../../virt/kvm/eventfd.c:80 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by qemu-system-x86/35124: #0: ffff9725391d80b8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x710 [kvm] #1: ffffbd25cfb2a0b8 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0xdeb/0x1900 [kvm] #2: ffffbd25cfb2b920 (&kvm->irq_srcu){....}-{0:0}, at: kvm_hv_notify_acked_sint+0x79/0x1e0 [kvm] #3: ffffbd25cfb2b920 (&kvm->irq_srcu){....}-{0:0}, at: irqfd_resampler_ack+0x5/0x110 [kvm] stack backtrace: CPU: 2 PID: 35124 Comm: qemu-system-x86 Tainted: G IOE 5.18.0-rc7 #5 Call Trace: <TASK> dump_stack_lvl+0x6c/0x9b irqfd_resampler_ack+0xfd/0x110 [kvm] kvm_notify_acked_gsi+0x32/0x90 [kvm] kvm_hv_notify_acked_sint+0xc5/0x1e0 [kvm] kvm_hv_set_msr_common+0xec1/0x1160 [kvm] kvm_set_msr_common+0x7c3/0xf60 [kvm] vmx_set_msr+0x394/0x1240 [kvm_intel] kvm_set_msr_ignored_check+0x86/0x200 [kvm] kvm_emulate_wrmsr+0x4f/0x1f0 [kvm] vmx_handle_exit+0x6fb/0x7e0 [kvm_intel] vcpu_enter_guest+0xe5a/0x1900 [kvm] kvm_arch_vcpu_ioctl_run+0x16e/0xac0 [kvm] kvm_vcpu_ioctl+0x279/0x710 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae resampler-list is protected by irq_srcu (see kvm_irqfd_assign), so fix the false positive by using list_for_each_entry_srcu(). Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1652950153-12489-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20perf build: Fix check for btf__load_from_kernel_by_id() in libbpfArnaldo Carvalho de Melo5-1/+22
Avi Kivity reported a problem where the __weak btf__load_from_kernel_by_id() in tools/perf/util/bpf-event.c was being used and it called btf__get_from_id() in tools/lib/bpf/btf.c that in turn called back to btf__load_from_kernel_by_id(), resulting in an endless loop. Fix this by adding a feature test to check if btf__load_from_kernel_by_id() is available when building perf with LIBBPF_DYNAMIC=1, and if not then provide the fallback to the old btf__get_from_id(), that doesn't call back to btf__load_from_kernel_by_id() since at that time it didn't exist at all. Tested on Fedora 35 where we have libbpf-devel 0.4.0 with LIBBPF_DYNAMIC where we don't have btf__load_from_kernel_by_id() and thus its feature test fail, not defining HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID: $ cat /tmp/build/perf-urgent/feature/test-libbpf-btf__load_from_kernel_by_id.make.output test-libbpf-btf__load_from_kernel_by_id.c: In function ‘main’: test-libbpf-btf__load_from_kernel_by_id.c:6:16: error: implicit declaration of function ‘btf__load_from_kernel_by_id’ [-Werror=implicit-function-declaration] 6 | return btf__load_from_kernel_by_id(20151128, NULL); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors $ $ nm /tmp/build/perf-urgent/perf | grep btf__load_from_kernel_by_id 00000000005ba180 T btf__load_from_kernel_by_id $ $ objdump --disassemble=btf__load_from_kernel_by_id -S /tmp/build/perf-urgent/perf /tmp/build/perf-urgent/perf: file format elf64-x86-64 <SNIP> 00000000005ba180 <btf__load_from_kernel_by_id>: #include "record.h" #include "util/synthetic-events.h" #ifndef HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID struct btf *btf__load_from_kernel_by_id(__u32 id) { 5ba180: 55 push %rbp 5ba181: 48 89 e5 mov %rsp,%rbp 5ba184: 48 83 ec 10 sub $0x10,%rsp 5ba188: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax 5ba18f: 00 00 5ba191: 48 89 45 f8 mov %rax,-0x8(%rbp) 5ba195: 31 c0 xor %eax,%eax struct btf *btf; #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wdeprecated-declarations" int err = btf__get_from_id(id, &btf); 5ba197: 48 8d 75 f0 lea -0x10(%rbp),%rsi 5ba19b: e8 a0 57 e5 ff call 40f940 <btf__get_from_id@plt> 5ba1a0: 89 c2 mov %eax,%edx #pragma GCC diagnostic pop return err ? ERR_PTR(err) : btf; 5ba1a2: 48 98 cltq 5ba1a4: 85 d2 test %edx,%edx 5ba1a6: 48 0f 44 45 f0 cmove -0x10(%rbp),%rax } <SNIP> Fixes: 218e7b775d368f38 ("perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions") Reported-by: Avi Kivity <avi@scylladb.com> Link: https://lore.kernel.org/linux-perf-users/f0add43b-3de5-20c5-22c4-70aff4af959f@scylladb.com Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/linux-perf-users/YobjjFOblY4Xvwo7@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20selftests: kvm/x86: Verify the pmu event filter matches the correct eventAaron Lewis1-0/+19
Add a test to demonstrate that when the guest programs an event select it is matched correctly in the pmu event filter and not inadvertently filtered. This could happen on AMD if the high nybble[1] in the event select gets truncated away only leaving the bottom byte[2] left for matching. This is a contrived example used for the convenience of demonstrating this issue, however, this can be applied to event selects 0x28A (OC Mode Switch) and 0x08A (L1 BTB Correction), where 0x08A could end up being denied when the event select was only set up to deny 0x28A. [1] bits 35:32 in the event select register and bits 11:8 in the event select. [2] bits 7:0 in the event select register and bits 7:0 in the event select. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20220517051238.2566934-3-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20selftests: kvm/x86: Add the helper function create_pmu_event_filterAaron Lewis1-4/+14
Add a helper function that creates a pmu event filter given an event list. Currently, a pmu event filter can only be created with the same hard coded event list. Add a way to create one given a different event list. Also, rename make_pmu_event_filter to alloc_pmu_event_filter to clarify it's purpose given the introduction of create_pmu_event_filter. No functional changes intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20220517051238.2566934-2-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20kvm: x86/pmu: Fix the compare function used by the pmu event filterAaron Lewis1-2/+5
When returning from the compare function the u64 is truncated to an int. This results in a loss of the high nybble[1] in the event select and its sign if that nybble is in use. Switch from using a result that can end up being truncated to a result that can only be: 1, 0, -1. [1] bits 35:32 in the event select register and bits 11:8 in the event select. Fixes: 7ff775aca48ad ("KVM: x86/pmu: Use binary search to check filtered events") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220517051238.2566934-1-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20scsi: ufs: core: Fix referencing invalid rsp fieldDaejun Park1-12/+7
Fix referencing sense data when it is invalid. When the length of the data segment is 0, there is no valid information in the rsp field, so ufshpb_rsp_upiu() is returned without additional operation. Link: https://lore.kernel.org/r/252651381.41652940482659.JavaMail.epsvc@epcpadp4 Fixes: 4b5f49079c52 ("scsi: ufs: ufshpb: L2P map management for HPB read") Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-20riscv: dts: microchip: fix gpio1 reg property typoConor Paxton1-1/+1
Fix reg address typo in the gpio1 stanza. Signed-off-by: Conor Paxton <conor.paxton@microchip.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree") Link: https://lore.kernel.org/r/20220517104058.2004734-1-conor.paxton@microchip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-19riscv: dts: sifive: fu540-c000: align dma node name with dtschemaKrzysztof Kozlowski1-1/+1
Fixes dtbs_check warnings like: dma@3000000: $nodename:0: 'dma@3000000' does not match '^dma-controller(@.*)?$' Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20220407193856.18223-1-krzysztof.kozlowski@linaro.org Fixes: c5ab54e9945b ("riscv: dts: add support for PDMA device of HiFive Unleashed Rev A00") Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-19mmc: core: Fix busy polling for MMC_SEND_OP_COND againUlf Hansson1-1/+1
It turned out that polling period for MMC_SEND_OP_COND, that currently is set to 1ms, still isn't sufficient. In particular a Micron eMMC on a Beaglebone platform, is reported to sometimes fail to initialize. Additional test, shows that extending the period to 4ms is working fine, so let's make that change. Reported-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de> Tested-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de> Fixes: 1760fdb6fe9f (mmc: core: Restore (almost) the busy polling for MMC_SEND_OP_COND") Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20220517101046.27512-1-ulf.hansson@linaro.org
2022-05-19drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomapMaarten Lankhorst1-4/+2
When removing short term pins, I've changed the the batch buffer pinning for relocation to use __i915_vma_pin, because i915_gem_object_ggtt_pin_ww was destroying the old vma. This caused regressions, because the functions are not identical. Fix the regressions by calling i915_gem_object_ggtt_pin_ww() again on ggtt-only platforms, but only if the batch can be pinned without being moved. Fixes: b5cfe6f7a6e1 ("drm/i915: Remove short-term pins from execbuf, v6.") Cc: Matthew Auld <matthew.auld@intel.com> Reported-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5806 Link: https://patchwork.freedesktop.org/patch/msgid/20220511115219.46507-1-maarten.lankhorst@linux.intel.com (cherry picked from commit 451374eef622fca6f00eeeda89aaccb45a30a149) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2022-05-19riscv/efi_stub: Add support for RISCV_EFI_BOOT_PROTOCOLSunil V L3-6/+32
Add support for getting the boot hart ID from the Linux EFI stub using RISCV_EFI_BOOT_PROTOCOL. This method is preferred over the existing DT based approach since it works irrespective of DT or ACPI. The specification of the protocol is hosted at: https://github.com/riscv-non-isa/riscv-uefi Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Link: https://lore.kernel.org/r/20220519051512.136724-2-sunilvl@ventanamicro.com [ardb: minor tweaks for coding style and whitespace] Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-05-19net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.Andrew Lunn1-0/+7
It is possible to stack bridges on top of each other. Consider the following which makes use of an Ethernet switch: br1 / \ / \ / \ br0.11 wlan0 | br0 / | \ p1 p2 p3 br0 is offloaded to the switch. Above br0 is a vlan interface, for vlan 11. This vlan interface is then a slave of br1. br1 also has a wireless interface as a slave. This setup trunks wireless lan traffic over the copper network inside a VLAN. A frame received on p1 which is passed up to the bridge has the skb->offload_fwd_mark flag set to true, indicating that the switch has dealt with forwarding the frame out ports p2 and p3 as needed. This flag instructs the software bridge it does not need to pass the frame back down again. However, the flag is not getting reset when the frame is passed upwards. As a result br1 sees the flag, wrongly interprets it, and fails to forward the frame to wlan0. When passing a frame upwards, clear the flag. This is the Rx equivalent of br_switchdev_frame_unmark() in br_dev_xmit(). Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220518005840.771575-1-andrew@lunn.ch Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-19ptp: ocp: change sysfs attr group handlingJonathan Lemon1-15/+42
In the detach path, the driver calls sysfs_remove_group() for the groups it believes has been registered. However, if the group was never previously registered, then this causes a splat. Instead, compute the groups that should be registered in advance, and then call sysfs_create_groups(), which registers them all at once. Update the error handling appropriately. Fixes: c205d53c4923 ("ptp: ocp: Add firmware capability bits for feature gating") Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/r/20220517214600.10606-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19selftests: forwarding: fix missing backslashJoachim Wiberg1-1/+1
Fix missing backslash, introduced in f62c5acc800ee. Causes all tests to not be installed. Fixes: f62c5acc800e ("selftests/net/forwarding: add missing tests to Makefile") Signed-off-by: Joachim Wiberg <troglobit@gmail.com> Acked-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20220518151630.2747773-1-troglobit@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-18Input: ili210x - use one common reset implementationMarek Vasut1-12/+8
Rename ili251x_hardware_reset() to ili210x_hardware_reset(), change its parameter from struct device * to struct gpio_desc *, and use it as one single consistent reset implementation all over the driver. Also increase the minimum reset duration to 12ms, to make sure the reset is really within the spec. Signed-off-by: Marek Vasut <marex@denx.de> Link: https://lore.kernel.org/r/20220518210423.106555-1-marex@denx.de Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-18Input: ili210x - fix reset timingMarek Vasut1-2/+2
According to Ilitek "231x & ILI251x Programming Guide" Version: 2.30 "2.1. Power Sequence", "T4 Chip Reset and discharge time" is minimum 10ms and "T2 Chip initial time" is maximum 150ms. Adjust the reset timings such that T4 is 12ms and T2 is 160ms to fit those figures. This prevents sporadic touch controller start up failures when some systems with at least ILI251x controller boot, without this patch the systems sometimes fail to communicate with the touch controller. Fixes: 201f3c803544c ("Input: ili210x - add reset GPIO support") Signed-off-by: Marek Vasut <marex@denx.de> Link: https://lore.kernel.org/r/20220518204901.93534-1-marex@denx.de Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-18drm/amd: Don't reset dGPUs if the system is going to s2idleMario Limonciello3-1/+17
An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC reset for handling aborted suspend can't work with s2idle. This functionality was introduced in commit daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)"). A few other commits have gone on top of the ASIC reset, but this still doesn't work on the A+A configuration in s2idle. Avoid doing the reset on dGPUs specifically when using s2idle. Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-05-18libceph: fix misleading ceph_osdc_cancel_request() commentIlya Dryomov1-2/+7
cancel_request() never guaranteed that after its return the OSD client would be completely done with the OSD request. The callback (if specified) can still be invoked and a ref can still be held. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com>
2022-05-18libceph: fix potential use-after-free on linger ping and resendsIlya Dryomov2-183/+122
request_reinit() is not only ugly as the comment rightfully suggests, but also unsafe. Even though it is called with osdc->lock held for write in all cases, resetting the OSD request refcount can still race with handle_reply() and result in use-after-free. Taking linger ping as an example: handle_timeout thread handle_reply thread down_read(&osdc->lock) req = lookup_request(...) ... finish_request(req) # unregisters up_read(&osdc->lock) __complete_request(req) linger_ping_cb(req) # req->r_kref == 2 because handle_reply still holds its ref down_write(&osdc->lock) send_linger_ping(lreq) req = lreq->ping_req # same req # cancel_linger_request is NOT # called - handle_reply already # unregistered request_reinit(req) WARN_ON(req->r_kref != 1) # fires request_init(req) kref_init(req->r_kref) # req->r_kref == 1 after kref_init ceph_osdc_put_request(req) kref_put(req->r_kref) # req->r_kref == 0 after kref_put, req is freed <further req initialization/use> !!! This happens because send_linger_ping() always (re)uses the same OSD request for watch ping requests, relying on cancel_linger_request() to unregister it from the OSD client and rip its messages out from the messenger. send_linger() does the same for watch/notify registration and watch reconnect requests. Unfortunately cancel_request() doesn't guarantee that after it returns the OSD client would be completely done with the OSD request -- a ref could still be held and the callback (if specified) could still be invoked too. The original motivation for request_reinit() was inability to deal with allocation failures in send_linger() and send_linger_ping(). Switching to using osdc->req_mempool (currently only used by CephFS) respects that and allows us to get rid of request_reinit(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org>
2022-05-18Fix double fget() in vhost_net_set_backend()Al Viro1-8/+7
Descriptor table is a shared resource; two fget() on the same descriptor may return different struct file references. get_tap_ptr_ring() is called after we'd found (and pinned) the socket we'll be using and it tries to find the private tun/tap data structures associated with it. Redoing the lookup by the same file descriptor we'd used to get the socket is racy - we need to same struct file. Thanks to Jason for spotting a braino in the original variant of patch - I'd missed the use of fd == -1 for disabling backend, and in that case we can end up with sock == NULL and sock != oldsock. Cc: stable@kernel.org Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-05-18vdpa/mlx5: Use consistent RQT sizeEli Cohen1-40/+21
The current code evaluates RQT size based on the configured number of virtqueues. This can raise an issue in the following scenario: Assume MQ was negotiated. 1. mlx5_vdpa_set_map() gets called. 2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower than the configured max VQs. 3. A second set_map gets called, but now a smaller number of VQs is used to evaluate the size of the RQT. 4. handle_ctrl_mq() is called with a value larger than what the RQT can hold. This will emit errors and the driver state is compromised. To fix this, we use a new field in struct mlx5_vdpa_net to hold the required number of entries in the RQT. This value is evaluated in mlx5_vdpa_set_driver_features() where we have the negotiated features all set up. In addition to that, we take into consideration the max capability of RQT entries early when the device is added so we don't need to take consider it when creating the RQT. Last, we remove the use of mlx5_vdpa_max_qps() which just returns the max_vas / 2 and make the code clearer. Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-18netfilter: nf_tables: disable expression reduction infraPablo Neira Ayuso1-10/+1
Either userspace or kernelspace need to pre-fetch keys inconditionally before comparisons for this to work. Otherwise, register tracking data is misleading and it might result in reducing expressions which are not yet registers. First expression is also guaranteed to be evaluated always, however, certain expressions break before writing data to registers, before comparing the data, leaving the register in undetermined state. This patch disables this infrastructure by now. Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-18netfilter: flowtable: move dst_check to packet pathRitaro Takenaka2-22/+20
Fixes sporadic IPv6 packet loss when flow offloading is enabled. IPv6 route GC and flowtable GC are not synchronized. When dst_cache becomes stale and a packet passes through the flow before the flowtable GC teardowns it, the packet can be dropped. So, it is necessary to check dst every time in packet path. Fixes: 227e1e4d0d6c ("netfilter: nf_flowtable: skip device lookup from interface index") Signed-off-by: Ritaro Takenaka <ritarot634@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-18netfilter: flowtable: fix TCP flow teardownPablo Neira Ayuso2-27/+9
This patch addresses three possible problems: 1. ct gc may race to undo the timeout adjustment of the packet path, leaving the conntrack entry in place with the internal offload timeout (one day). 2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE timeout is reached before the flow offload del. 3. tcp ct is always set to ESTABLISHED with a very long timeout in flow offload teardown/delete even though the state might be already CLOSED. Also as a remark we cannot assume that the FIN or RST packet is hitting flow table teardown as the packet might get bumped to the slow path in nftables. This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN state transition. Moreover, teturn the connection's ownership to conntrack upon teardown by clearing the offload flag and fixing the established timeout value. The flow table GC thread will asynchonrnously free the flow table and hardware offload entries. Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows on which is also misleading since the flow is back to classic conntrack path. If nf_ct_delete() removes the entry from the conntrack table, then it calls nf_ct_put() which decrements the refcnt. This is not a problem because the flowtable holds a reference to the conntrack object from flow_offload_alloc() path which is released via flow_offload_free(). This patch also updates nft_flow_offload to skip packets in SYN_RECV state. Since we might miss or bump packets to slow path, we do not know what will happen there while we are still in SYN_RECV, this patch postpones offload up to the next packet which also aligns to the existing behaviour in tc-ct. flow_offload_teardown() does not reset the existing tcp state from flow_offload_fixup_tcp() to ESTABLISHED anymore, packets bump to slow path might have already update the state to CLOSE/FIN. Joint work with Oz and Sven. Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-18net: ftgmac100: Disable hardware checksum on AST2600Joel Stanley1-0/+5
The AST2600 when using the i210 NIC over NC-SI has been observed to produce incorrect checksum results with specific MTU values. This was first observed when sending data across a long distance set of networks. On a local network, the following test was performed using a 1MB file of random data. On the receiver run this script: #!/bin/bash while [ 1 ]; do # Zero the stats nstat -r > /dev/null nc -l 9899 > test-file # Check for checksum errors TcpInCsumErrors=$(nstat | grep TcpInCsumErrors) if [ -z "$TcpInCsumErrors" ]; then echo No TcpInCsumErrors else echo TcpInCsumErrors = $TcpInCsumErrors fi done On an AST2600 system: # nc <IP of receiver host> 9899 < test-file The test was repeated with various MTU values: # ip link set mtu 1410 dev eth0 The observed results: 1500 - good 1434 - bad 1400 - good 1410 - bad 1420 - good The test was repeated after disabling tx checksumming: # ethtool -K eth0 tx-checksumming off And all MTU values tested resulted in transfers without error. An issue with the driver cannot be ruled out, however there has been no bug discovered so far. David has done the work to take the original bug report of slow data transfer between long distance connections and triaged it down to this test case. The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers to the patch that introduced AST2600 support. Reported-by: David Wilder <wilder@us.ibm.com> Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18igb: skip phy status check where unavailableKevin Mitchell1-1/+2
igb_read_phy_reg() will silently return, leaving phy_data untouched, if hw->ops.read_reg isn't set. Depending on the uninitialized value of phy_data, this led to the phy status check either succeeding immediately or looping continuously for 2 seconds before emitting a noisy err-level timeout. This message went out to the console even though there was no actual problem. Instead, first check if there is read_reg function pointer. If not, proceed without trying to check the phy status register. Fixes: b72f3f72005d ("igb: When GbE link up, wait for Remote receiver status condition") Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18nfc: pn533: Fix buggy cleanup orderLin Ma1-2/+3
When removing the pn533 device (i2c or USB), there is a logic error. The original code first cancels the worker (flush_delayed_work) and then destroys the workqueue (destroy_workqueue), leaving the timer the last one to be deleted (del_timer). This result in a possible race condition in a multi-core preempt-able kernel. That is, if the cleanup (pn53x_common_clean) is concurrently run with the timer handler (pn533_listen_mode_timer), the timer can queue the poll_work to the already destroyed workqueue, causing use-after-free. This patch reorder the cleanup: it uses the del_timer_sync to make sure the handler is finished before the routine will destroy the workqueue. Note that the timer cannot be activated by the worker again. static void pn533_wq_poll(struct work_struct *work) ... rc = pn533_send_poll_frame(dev); if (rc) return; if (cur_mod->len == 0 && dev->poll_mod_count > 1) mod_timer(&dev->listen_timer, ...); That is, the mod_timer can be called only when pn533_send_poll_frame() returns no error, which is impossible because the device is detaching and the lower driver should return ENODEV code. Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18mptcp: Do TCP fallback on early DSS checksum failureMat Martineau2-4/+20
RFC 8684 section 3.7 describes several opportunities for a MPTCP connection to "fall back" to regular TCP early in the connection process, before it has been confirmed that MPTCP options can be successfully propagated on all SYN, SYN/ACK, and data packets. If a peer acknowledges the first received data packet with a regular TCP header (no MPTCP options), fallback is allowed. If the recipient of that first data packet finds a MPTCP DSS checksum error, this provides an opportunity to fail gracefully with a TCP fallback rather than resetting the connection (as might happen if a checksum failure were detected later). This commit modifies the checksum failure code to attempt fallback on the initial subflow of a MPTCP connection, only if it's a failure in the first data mapping. In cases where the peer initiates the connection, requests checksums, is the first to send data, and the peer is sending incorrect checksums (see https://github.com/multipath-tcp/mptcp_net-next/issues/275), this allows the connection to proceed as TCP rather than reset. Fixes: dd8bcd1768ff ("mptcp: validate the data checksum") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18mptcp: fix checksum byte orderPaolo Abeni3-14/+26
The MPTCP code typecasts the checksum value to u16 and then converts it to big endian while storing the value into the MPTCP option. As a result, the wire encoding for little endian host is wrong, and that causes interoperabilty interoperability issues with other implementation or host with different endianness. Address the issue writing in the packet the unmodified __sum16 value. MPTCP checksum is disabled by default, interoperating with systems with bad mptcp-level csum encoding should cause fallback to TCP. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/275 Fixes: c5b39e26d003 ("mptcp: send out checksum for DSS") Fixes: 390b95a5fb84 ("mptcp: receive checksum for DSS") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18ARM: 9197/1: spectre-bhb: fix loop8 sequence for Thumb2Ard Biesheuvel1-1/+1
In Thumb2, 'b . + 4' produces a branch instruction that uses a narrow encoding, and so it does not jump to the following instruction as expected. So use W(b) instead. Fixes: 6c7cb60bff7a ("ARM: fix Thumb2 regression with Spectre BHB") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-18ARM: 9196/1: spectre-bhb: enable for Cortex-A15Ard Biesheuvel1-0/+1
The Spectre-BHB mitigations were inadvertently left disabled for Cortex-A15, due to the fact that cpu_v7_bugs_init() is not called in that case. So fix that. Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-18net: af_key: check encryption module availability consistencyThomas Bartschies1-3/+3
Since the recent introduction supporting the SM3 and SM4 hash algos for IPsec, the kernel produces invalid pfkey acquire messages, when these encryption modules are disabled. This happens because the availability of the algos wasn't checked in all necessary functions. This patch adds these checks. Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>