| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-05-13
We've added 119 non-merge commits during the last 14 day(s) which contain
a total of 134 files changed, 9462 insertions(+), 4742 deletions(-).
The main changes are:
1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi.
2) Add BPF range computation improvements to the verifier in particular
around XOR and OR operators, refactoring of checks for range computation
and relaxing MUL range computation so that src_reg can also be an unknown
scalar, from Cupertino Miranda.
3) Add support to attach kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. Session mode is a common use-case for tetragon and bpftrace,
from Jiri Olsa.
4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf
as well as BPF selftest's struct_ops handling, from Andrii Nakryiko.
5) Improvements to BPF selftests in context of BPF gcc backend,
from Jose E. Marchesi & David Faust.
6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test-
-style in order to retire the old test, run it in BPF CI and additionally
expand test coverage, from Jordan Rife.
7) Big batch for BPF selftest refactoring in order to remove duplicate code
around common network helpers, from Geliang Tang.
8) Another batch of improvements to BPF selftests to retire obsolete
bpf_tcp_helpers.h as everything is available vmlinux.h,
from Martin KaFai Lau.
9) Fix BPF map tear-down to not walk the map twice on free when both timer
and wq is used, from Benjamin Tissoires.
10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL,
from Alexei Starovoitov.
11) Change BTF build scripts to using --btf_features for pahole v1.26+,
from Alan Maguire.
12) Small improvements to BPF reusing struct_size() and krealloc_array(),
from Andy Shevchenko.
13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions,
from Ilya Leoshkevich.
14) Extend TCP ->cong_control() callback in order to feed in ack and
flag parameters and allow write-access to tp->snd_cwnd_stamp
from BPF program, from Miao Xu.
15) Add support for internal-only per-CPU instructions to inline
bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs,
from Puranjay Mohan.
16) Follow-up to remove the redundant ethtool.h from tooling infrastructure,
from Tushar Vyavahare.
17) Extend libbpf to support "module:<function>" syntax for tracing
programs, from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
bpf: make list_for_each_entry portable
bpf: ignore expected GCC warning in test_global_func10.c
bpf: disable strict aliasing in test_global_func9.c
selftests/bpf: Free strdup memory in xdp_hw_metadata
selftests/bpf: Fix a few tests for GCC related warnings.
bpf: avoid gcc overflow warning in test_xdp_vlan.c
tools: remove redundant ethtool.h from tooling infra
selftests/bpf: Expand ATTACH_REJECT tests
selftests/bpf: Expand getsockname and getpeername tests
sefltests/bpf: Expand sockaddr hook deny tests
selftests/bpf: Expand sockaddr program return value tests
selftests/bpf: Retire test_sock_addr.(c|sh)
selftests/bpf: Remove redundant sendmsg test cases
selftests/bpf: Migrate ATTACH_REJECT test cases
selftests/bpf: Migrate expected_attach_type tests
selftests/bpf: Migrate wildcard destination rewrite test
selftests/bpf: Migrate sendmsg6 v4 mapped address tests
selftests/bpf: Migrate sendmsg deny test cases
selftests/bpf: Migrate WILDCARD_IP test
selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
...
====================
Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The BPF atomic operations with the BPF_FETCH modifier along with
BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements
all atomic operations except BPF_CMPXCHG with relaxed ordering.
Section 8.1 of the "The RISC-V Instruction Set Manual Volume I:
Unprivileged ISA" [1], titled, "Specifying Ordering of Atomic
Instructions" says:
| To provide more efficient support for release consistency [5], each
| atomic instruction has two bits, aq and rl, used to specify additional
| memory ordering constraints as viewed by other RISC-V harts.
and
| If only the aq bit is set, the atomic memory operation is treated as
| an acquire access.
| If only the rl bit is set, the atomic memory operation is treated as a
| release access.
|
| If both the aq and rl bits are set, the atomic memory operation is
| sequentially consistent.
Fix this by setting both aq and rl bits as 1 for operations with
BPF_FETCH and BPF_XCHG.
[1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We can use either "instruction" or "insn" in the comment.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240507111618.437121-1-xiao.w.wang@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
BPF_ATOMIC_OP() macro documentation states that "BPF_ADD | BPF_FETCH"
should be the same as atomic_fetch_add(), which is currently not the
case on s390x: the serialization instruction "bcr 14,0" is missing.
This applies to "and", "or" and "xor" variants too.
s390x is allowed to reorder stores with subsequent fetches from
different addresses, so code relying on BPF_FETCH acting as a barrier,
for example:
stw [%r0], 1
afadd [%r1], %r2
ldxw %r3, [%r4]
may be broken. Fix it by emitting "bcr 14,0".
Note that a separate serialization instruction is not needed for
BPF_XCHG and BPF_CMPXCHG, because COMPARE AND SWAP performs
serialization itself.
Fixes: ba3b86b9cef0 ("s390/bpf: Implement new atomic ops")
Reported-by: Puranjay Mohan <puranjay12@gmail.com>
Closes: https://lore.kernel.org/bpf/mb61p34qvq3wf.fsf@kernel.org/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240507000557.12048-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Inline calls to bpf_get_smp_processor_id() helper in the JIT by emitting
a read from struct thread_info. The SP_EL0 system register holds the
pointer to the task_struct and thread_info is the first member of this
struct. We can read the cpu number from the thread_info.
Here is how the ARM64 JITed assembly changes after this commit:
ARM64 JIT
===========
BEFORE AFTER
-------- -------
int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
mov x10, #0xfffffffffffff4d0 mrs x10, sp_el0
movk x10, #0x802b, lsl #16 ldr w7, [x10, #24]
movk x10, #0x8000, lsl #32
blr x10
add x7, x0, #0x0
Performance improvement using benchmark[1]
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+-------------------+-------------------+--------------+
| Name | Before | After | % change |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s | + 10.74% |
| arr-inc | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s | + 5.37% |
| hash-inc | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s | + 2.08% |
+---------------+-------------------+-------------------+--------------+
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use them directly. They will only be used for
internal inlining optimizations for now between BPF verifier and BPF
JITs.
Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
access using tpidr_el1"), the per-cpu offset for the CPU is stored in
the tpidr_el1/2 register of that CPU.
To support this BPF instruction in the ARM64 JIT, the following ARM64
instructions are emitted:
mov dst, src // Move src to dst, if src != dst
mrs tmp, tpidr_el1/2 // Move per-cpu offset of the current cpu in tmp.
add dst, dst, tmp // Add the per cpu offset to the dst.
To measure the performance improvement provided by this change, the
benchmark in [1] was used:
Before:
glob-arr-inc : 23.597 ± 0.012M/s
arr-inc : 23.173 ± 0.019M/s
hash-inc : 12.186 ± 0.028M/s
After:
glob-arr-inc : 23.819 ± 0.034M/s
arr-inc : 23.285 ± 0.017M/s
hash-inc : 12.419 ± 0.011M/s
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
RISCV saves the pointer to the CPU's task_struct in the TP (thread
pointer) register. This makes it trivial to get the CPU's processor id.
As thread_info is the first member of task_struct, we can read the
processor id from TP + offsetof(struct thread_info, cpu).
RISCV64 JIT output for `call bpf_get_smp_processor_id`
======================================================
Before After
-------- -------
auipc t1,0x848c ld a5,32(tp)
jalr 604(t1)
mv a5,a0
Benchmark using [1] on Qemu.
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+------------------+------------------+--------------+
| Name | Before | After | % change |
|---------------+------------------+------------------+--------------|
| glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% |
| arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% |
| hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% |
+---------------+------------------+------------------+--------------+
NOTE: This benchmark includes changes from this patch and the previous
patch that implemented the per-cpu insn.
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use them directly. They will only be used for
internal inlining optimizations for now between BPF verifier and BPF
JITs.
RISC-V uses generic per-cpu implementation where the offsets for CPUs
are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores
the address of the task_struct in TP register. The first element in
task_struct is struct thread_info, and we can get the cpu number by
reading from the TP register + offsetof(struct thread_info, cpu).
Once we have the cpu number in a register we read the offset for that
cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this
offset to the destination register.
To measure the improvement from this change, the benchmark in [1] was
used on Qemu:
Before:
glob-arr-inc : 1.127 ± 0.013M/s
arr-inc : 1.121 ± 0.004M/s
hash-inc : 0.681 ± 0.052M/s
After:
glob-arr-inc : 1.138 ± 0.011M/s
arr-inc : 1.366 ± 0.006M/s
hash-inc : 0.676 ± 0.001M/s
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.
The test_bpf.ko reports 2-10 fold improvements in execution time of its
tests. For instance:
test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS
Deployment and structure
------------------------
The related codes are added to "arch/arc/net":
- bpf_jit.h -- The interface that a back-end translator must provide
- bpf_jit_core.c -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic
The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".
1. Normal pass # The necessary pass
1a. Dry run # Get the whole JIT length, epilogue offset, etc.
1b. Emit phase # Allocate memory and start emitting instructions
2. Extra pass # Only needed if there are relocations to be fixed
2a. Patch relocations
Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:
- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)
The result of "test_bpf" test suite on an HSDK board is:
hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
All the failing test cases are due to the ones that were not JIT'ed.
Categorically, they can be represented as:
.-----------.------------.-------------.
| test type | opcodes | # of cases |
|-----------+------------+-------------|
| atomic | 0xC3, 0xDB | 149 |
| div64 | 0x37, 0x3F | 22 |
| mod64 | 0x97, 0x9F | 15 |
`-----------^------------+-------------|
| (total) 186 |
`-------------'
Setup: build config
-------------------
The following configs must be set to have a working JIT test:
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_TEST_BPF=m
The following options are not necessary for the tests module,
but are good to have:
CONFIG_DEBUG_INFO=y # prerequisite for below
CONFIG_DEBUG_INFO_BTF=y # so bpftool can generate vmlinux.h
CONFIG_FTRACE=y #
CONFIG_BPF_SYSCALL=y # all these options lead to
CONFIG_KPROBE_EVENTS=y # having CONFIG_BPF_EVENTS=y
CONFIG_PERF_EVENTS=y #
Some BPF programs provide data through /sys/kernel/debug:
CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug
Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.
[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7
Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:
pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats
Or else, the build will fail:
$ make V=1
...
BTF .btf.vmlinux.bin.o
pahole -J --btf_gen_floats \
-j --lang_exclude=rust \
--skip_encoding_btf_inconsistent_proto \
--btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
...
BTFIDS vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available
This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.
Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
...
test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support
Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When LSE atomics are available, BPF atomic instructions are implemented
as single ARM64 atomic instructions, therefore it is easy to enable
these in bpf_arena using the currently available exception handling
setup.
LL_SC atomics use loops and therefore would need more work to enable in
bpf_arena.
Enable LSE atomics based instructions in bpf_arena and use the
bpf_jit_supports_insn() callback to reject atomics in bpf_arena if LSE
atomics are not available.
All atomics and arena_atomics selftests are passing:
[root@ip-172-31-2-216 bpf]# ./test_progs -a atomics,arena_atomics
#3/1 arena_atomics/add:OK
#3/2 arena_atomics/sub:OK
#3/3 arena_atomics/and:OK
#3/4 arena_atomics/or:OK
#3/5 arena_atomics/xor:OK
#3/6 arena_atomics/cmpxchg:OK
#3/7 arena_atomics/xchg:OK
#3 arena_atomics:OK
#10/1 atomics/add:OK
#10/2 atomics/sub:OK
#10/3 atomics/and:OK
#10/4 atomics/or:OK
#10/5 atomics/xor:OK
#10/6 atomics/cmpxchg:OK
#10/7 atomics/xchg:OK
#10 atomics:OK
Summary: 2/14 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240426161116.441-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
35d92abfbad8 ("net: hns3: fix kernel crash when devlink reload during initialization")
2a1a1a7b5fd7 ("net: hns3: add command queue trace for hns3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and IPsec.
The bridge patch is actually a follow-up to a recent fix in the same
area. We have a pending v6.8 AF_UNIX regression; it should be solved
soon, but not in time for this PR.
Current release - regressions:
- eth: ks8851: Queue RX packets in IRQ handler instead of disabling
BHs
- net: bridge: fix corrupted ethernet header on multicast-to-unicast
Current release - new code bugs:
- xfrm: fix possible bad pointer derferencing in error path
Previous releases - regressionis:
- core: fix out-of-bounds access in ops_init
- ipv6:
- fix potential uninit-value access in __ip6_make_skb()
- fib6_rules: avoid possible NULL dereference in fib6_rule_action()
- tcp: use refcount_inc_not_zero() in tcp_twsk_unique().
- rtnetlink: correct nested IFLA_VF_VLAN_LIST attribute validation
- rxrpc: fix congestion control algorithm
- bluetooth:
- l2cap: fix slab-use-after-free in l2cap_connect()
- msft: fix slab-use-after-free in msft_do_close()
- eth: hns3: fix kernel crash when devlink reload during
initialization
- eth: dsa: mv88e6xxx: add phylink_get_caps for the mv88e6320/21
family
Previous releases - always broken:
- xfrm: preserve vlan tags for transport mode software GRO
- tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets
- eth: hns3: keep using user config after hardware reset"
* tag 'net-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: dsa: mv88e6xxx: read cmode on mv88e6320/21 serdes only ports
net: dsa: mv88e6xxx: add phylink_get_caps for the mv88e6320/21 family
net: hns3: fix kernel crash when devlink reload during initialization
net: hns3: fix port vlan filter not disabled issue
net: hns3: use appropriate barrier function after setting a bit value
net: hns3: release PTP resources if pf initialization failed
net: hns3: change type of numa_node_mask as nodemask_t
net: hns3: direct return when receive a unknown mailbox message
net: hns3: using user configure after hardware reset
net/smc: fix neighbour and rtable leak in smc_ib_find_route()
ipv6: prevent NULL dereference in ip6_output()
hsr: Simplify code for announcing HSR nodes timer setup
ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()
dt-bindings: net: mediatek: remove wrongly added clocks and SerDes
rxrpc: Only transmit one ACK per jumbo packet received
rxrpc: Fix congestion control algorithm
selftests: test_bridge_neigh_suppress.sh: Fix failures due to duplicate MAC
ipv6: Fix potential uninit-value access in __ip6_make_skb()
net: phy: marvell-88q2xxx: add support for Rev B1 and B2
appletalk: Improve handling of broadcast packets
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Bluetooth is not a random device connected to the MMC/SD controller. It
is function 2 of the SDIO device.
Fix the address of the bluetooth node. Also fix the node name and drop
the label.
Fixes: 055ef10ccdd4 ("arm64: dts: mt8183: Add jacuzzi pico/pico6 board")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull ARM fix from Russell King:
- clear stale KASan stack poison when a CPU resumes
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9381/1: kasan: clear stale stack poison
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We found below OOB crash:
[ 33.452494] ==================================================================
[ 33.453513] BUG: KASAN: stack-out-of-bounds in refresh_cpu_vm_stats.constprop.0+0xcc/0x2ec
[ 33.454660] Write of size 164 at addr c1d03d30 by task swapper/0/0
[ 33.455515]
[ 33.455767] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 6.1.25-mainline #1
[ 33.456880] Hardware name: Generic DT based system
[ 33.457555] unwind_backtrace from show_stack+0x18/0x1c
[ 33.458326] show_stack from dump_stack_lvl+0x40/0x4c
[ 33.459072] dump_stack_lvl from print_report+0x158/0x4a4
[ 33.459863] print_report from kasan_report+0x9c/0x148
[ 33.460616] kasan_report from kasan_check_range+0x94/0x1a0
[ 33.461424] kasan_check_range from memset+0x20/0x3c
[ 33.462157] memset from refresh_cpu_vm_stats.constprop.0+0xcc/0x2ec
[ 33.463064] refresh_cpu_vm_stats.constprop.0 from tick_nohz_idle_stop_tick+0x180/0x53c
[ 33.464181] tick_nohz_idle_stop_tick from do_idle+0x264/0x354
[ 33.465029] do_idle from cpu_startup_entry+0x20/0x24
[ 33.465769] cpu_startup_entry from rest_init+0xf0/0xf4
[ 33.466528] rest_init from arch_post_acpi_subsys_init+0x0/0x18
[ 33.467397]
[ 33.467644] The buggy address belongs to stack of task swapper/0/0
[ 33.468493] and is located at offset 112 in frame:
[ 33.469172] refresh_cpu_vm_stats.constprop.0+0x0/0x2ec
[ 33.469917]
[ 33.470165] This frame has 2 objects:
[ 33.470696] [32, 76) 'global_zone_diff'
[ 33.470729] [112, 276) 'global_node_diff'
[ 33.471294]
[ 33.472095] The buggy address belongs to the physical page:
[ 33.472862] page:3cd72da8 refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x41d03
[ 33.473944] flags: 0x1000(reserved|zone=0)
[ 33.474565] raw: 00001000 ed741470 ed741470 00000000 00000000 00000000 ffffffff 00000001
[ 33.475656] raw: 00000000
[ 33.476050] page dumped because: kasan: bad access detected
[ 33.476816]
[ 33.477061] Memory state around the buggy address:
[ 33.477732] c1d03c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.478630] c1d03c80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
[ 33.479526] >c1d03d00: 00 04 f2 f2 f2 f2 00 00 00 00 00 00 f1 f1 f1 f1
[ 33.480415] ^
[ 33.481195] c1d03d80: 00 00 00 00 00 00 00 00 00 00 04 f3 f3 f3 f3 f3
[ 33.482088] c1d03e00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[ 33.482978] ==================================================================
We find the root cause of this OOB is that arm does not clear stale stack
poison in the case of cpuidle.
This patch refer to arch/arm64/kernel/sleep.S to resolve this issue.
From cited commit [1] that explain the problem
Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.
In the case of cpuidle, CPUs exit the kernel a number of levels deep in
C code. Any instrumented functions on this critical path will leave
portions of the stack shadow poisoned.
If CPUs lose context and return to the kernel via a cold path, we
restore a prior context saved in __cpu_suspend_enter are forgotten, and
we never remove the poison they placed in the stack shadow area by
functions calls between this and the actual exit of the kernel.
Thus, (depending on stackframe layout) subsequent calls to instrumented
functions may hit this stale poison, resulting in (spurious) KASAN
splats to the console.
To avoid this, clear any stale poison from the idle thread for a CPU
prior to bringing a CPU online.
From cited commit [2]
Extend to check for CONFIG_KASAN_STACK
[1] commit 0d97e6d8024c ("arm64: kasan: clear stale stack poison")
[2] commit d56a9ef84bd0 ("kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK")
Signed-off-by: Boy Wu <boy.wu@mediatek.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Fixes: 5615f69bc209 ("ARM: 9016/2: Initialize the mapping of KASan shadow memory")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are a couple of last minute fixes that came in over the previous
week, addressing:
- A pin configuration bug on a qualcomm board that caused issues with
ethernet and mmc
- Two minor code fixes for misleading console output in the microchip
firmware driver
- A build warning in the sifive cache driver"
* tag 'soc-fixes-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
firmware: microchip: clarify that sizes and addresses are in hex
firmware: microchip: don't unconditionally print validation success
arm64: dts: qcom: sa8155p-adp: fix SDHC2 CD pin configuration
cache: sifive_ccache: Silence unused variable warning
|
| | |\ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes
One more Qualcomm Arm64 DeviceTree fix for v6.9
On ths SA8155P automotive platform, the wrong gpio controller is defined
for the SD-card detect pin, which depending on probe ordering of things
cause ethernet to be broken. The card detect pin reference is corrected
to solve this problem.
* tag 'qcom-arm64-fixes-for-6.9-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sa8155p-adp: fix SDHC2 CD pin configuration
Link: https://lore.kernel.org/r/20240427153817.1430382-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
There are two issues with SDHC2 configuration for SA8155P-ADP,
which prevent use of SDHC2 and causes issues with ethernet:
- Card Detect pin for SHDC2 on SA8155P-ADP is connected to gpio4 of
PMM8155AU_1, not to SoC itself. SoC's gpio4 is used for DWMAC
TX. If sdhc driver probes after dwmac driver, it reconfigures
gpio4 and this breaks Ethernet MAC.
- pinctrl configuration mentions gpio96 as CD pin. It seems it was
copied from some SM8150 example, because as mentioned above,
correct CD pin is gpio4 on PMM8155AU_1.
This patch fixes both mentioned issues by providing correct pin handle
and pinctrl configuration.
Fixes: 0deb2624e2d0 ("arm64: dts: qcom: sa8155p-adp: Add support for uSD card")
Cc: stable@vger.kernel.org
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20240412190310.1647893-1-volodymyr_babchuk@epam.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
| |\ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix incorrect delay handling in the plpks (keystore) code
- Fix a panic when an LPAR boots with a frozen PE
Thanks to Andrew Donnellan, Gaurav Batra, Nageswara R Sastry, and Nayna
Jain.
* tag 'powerpc-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE
powerpc/pseries: make max polling consistent for longer H_CALLs
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
At the time of LPAR boot up, partition firmware provides Open Firmware
property ibm,dma-window for the PE. This property is provided on the PCI
bus the PE is attached to.
There are execptions where the partition firmware might not provide this
property for the PE at the time of LPAR boot up. One of the scenario is
where the firmware has frozen the PE due to some error condition. This
PE is frozen for 24 hours or unless the whole system is reinitialized.
Within this time frame, if the LPAR is booted, the frozen PE will be
presented to the LPAR but ibm,dma-window property could be missing.
Today, under these circumstances, the LPAR oopses with NULL pointer
dereference, when configuring the PCI bus the PE is attached to.
BUG: Kernel NULL pointer dereference on read at 0x000000c8
Faulting instruction address: 0xc0000000001024c0
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
NIP: c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450
REGS: c0000000037db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default)
MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000822 XER: 00000000
CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0
...
NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
Call Trace:
pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
pcibios_setup_bus_self+0x1c0/0x370
__of_scan_bus+0x2f8/0x330
pcibios_scan_phb+0x280/0x3d0
pcibios_init+0x88/0x12c
do_one_initcall+0x60/0x320
kernel_init_freeable+0x344/0x3e4
kernel_init+0x34/0x1d0
ret_from_kernel_user_thread+0x14/0x1c
Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422205141.10662-1-gbatra@linux.ibm.com
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Currently, plpks_confirm_object_flushed() function polls for 5msec in
total instead of 5sec.
Keep max polling time consistent for all the H_CALLs, which take longer
than expected, to be 5sec. Also, make use of fsleep() everywhere to
insert delay.
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240418031230.170954-1-nayna@linux.ibm.com
|
| |\ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Remove the broken vsyscall emulation code from
the page fault code
- Fix kexec crash triggered by certain SEV RMP
table layouts
- Fix unchecked MSR access error when disabling
the x2APIC via iommu=off
* tag 'x86-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove broken vsyscall emulation code from the page fault code
x86/apic: Don't access the APIC when disabling x2APIC
x86/sev: Add callback to apply RMP table fixups for kexec
x86/e820: Add a new e820 table update helper
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:
https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com
... and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall
emulation.
- the second page fault is from __do_sys_gettimeofday(), and that should
just have caused the exception that then sets the return value to
-EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
preempt_schedule() -> trace_sched_switch(), which then causes a BPF
trace program to run, which does that bpf_probe_read_compat(), which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
The problem is literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken
code.
The attached patch looks like the ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to this commit in 2011:
4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")
... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:
This fixes issues with UML when vsyscall=emulate.
... and so my patch to remove this garbage will probably break UML in this
situation.
I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.
Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
With 'iommu=off' on the kernel command line and x2APIC enabled by the BIOS
the code which disables the x2APIC triggers an unchecked MSR access error:
RDMSR from 0x802 at rIP: 0xffffffff94079992 (native_apic_msr_read+0x12/0x50)
This is happens because default_acpi_madt_oem_check() selects an x2APIC
driver before the x2APIC is disabled.
When the x2APIC is disabled because interrupt remapping cannot be enabled
due to 'iommu=off' on the command line, x2apic_disable() invokes
apic_set_fixmap() which in turn tries to read the APIC ID. This triggers
the MSR warning because x2APIC is disabled, but the APIC driver is still
x2APIC based.
Prevent that by adding an argument to apic_set_fixmap() which makes the
APIC ID read out conditional and set it to false from the x2APIC disable
path. That's correct as the APIC ID has already been read out during early
discovery.
Fixes: d10a904435fa ("x86/apic: Consolidate boot_cpu_physical_apicid initialization sites")
Reported-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/875xw5t6r7.ffs@tglx
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Handle cases where the RMP table placement in the BIOS is not 2M aligned
and the kexec-ed kernel could try to allocate from within that chunk
which then causes a fatal RMP fault.
The kexec failure is illustrated below:
SEV-SNP: RMP table physical range [0x0000007ffe800000 - 0x000000807f0fffff]
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000008efff] usable
BIOS-e820: [mem 0x000000000008f000-0x000000000008ffff] ACPI NVS
...
BIOS-e820: [mem 0x0000004080000000-0x0000007ffe7fffff] usable
BIOS-e820: [mem 0x0000007ffe800000-0x000000807f0fffff] reserved
BIOS-e820: [mem 0x000000807f100000-0x000000807f1fefff] usable
As seen here in the e820 memory map, the end range of the RMP table is not
aligned to 2MB and not reserved but it is usable as RAM.
Subsequently, kexec -s (KEXEC_FILE_LOAD syscall) loads it's purgatory
code and boot_param, command line and other setup data into this RAM
region as seen in the kexec logs below, which leads to fatal RMP fault
during kexec boot.
Loaded purgatory at 0x807f1fa000
Loaded boot_param, command line and misc at 0x807f1f8000 bufsz=0x1350 memsz=0x2000
Loaded 64bit kernel at 0x7ffae00000 bufsz=0xd06200 memsz=0x3894000
Loaded initrd at 0x7ff6c89000 bufsz=0x4176014 memsz=0x4176014
E820 memmap:
0000000000000000-000000000008efff (1)
000000000008f000-000000000008ffff (4)
0000000000090000-000000000009ffff (1)
...
0000004080000000-0000007ffe7fffff (1)
0000007ffe800000-000000807f0fffff (2)
000000807f100000-000000807f1fefff (1)
000000807f1ff000-000000807fffffff (2)
nr_segments = 4
segment[0]: buf=0x00000000e626d1a2 bufsz=0x4000 mem=0x807f1fa000 memsz=0x5000
segment[1]: buf=0x0000000029c67bd6 bufsz=0x1350 mem=0x807f1f8000 memsz=0x2000
segment[2]: buf=0x0000000045c60183 bufsz=0xd06200 mem=0x7ffae00000 memsz=0x3894000
segment[3]: buf=0x000000006e54f08d bufsz=0x4176014 mem=0x7ff6c89000 memsz=0x4177000
kexec_file_load: type:0, start:0x807f1fa150 head:0x1184d0002 flags:0x0
Check if RMP table start and end physical range in the e820 tables are
not aligned to 2MB and in that case map this range to reserved in all
the three e820 tables.
[ bp: Massage. ]
Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/df6e995ff88565262c2c7c69964883ff8aa6fc30.1714090302.git.ashish.kalra@amd.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Add a new API helper e820__range_update_table() with which to update an
arbitrary e820 table. Move all current users of
e820__range_update_kexec() to this new helper.
[ bp: Massage. ]
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/b726af213ad55053f8a7a1e793b01bb3f1ca9dd5.1714090302.git.ashish.kalra@amd.com
|
| |\ \ \ \ \ \ \ \
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two fixes when running as Xen PV guests for issues introduced in the
6.9 merge window, both related to apic id handling"
* tag 'for-linus-6.9a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: return a sane initial apic id when running as PV guest
x86/xen/smp_pv: Register the boot CPU APIC properly
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
With recent sanity checks for topology information added, there are now
warnings issued for APs when running as a Xen PV guest:
[Firmware Bug]: CPU 1: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0001
This is due to the initial APIC ID obtained via CPUID for PV guests is
always 0.
Avoid the warnings by synthesizing the CPUID data to contain the same
initial APIC ID as xen_pv_smp_config() is using for registering the
APIC IDs of all CPUs.
Fixes: 52128a7a21f7 ("86/cpu/topology: Make the APIC mismatch warnings complete")
Signed-off-by: Juergen Gross <jgross@suse.com>
|
| | |/ / / / / / /
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
The topology core expects the boot APIC to be registered from earhy APIC
detection first and then again when the firmware tables are evaluated. This
is used for detecting the real BSP CPU on a kexec kernel.
The recent conversion of XEN/PV to register fake APIC IDs failed to
register the boot CPU APIC correctly as it only registers it once. This
causes the BSP detection mechanism to trigger wrongly:
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
Additionally this results in one CPU being ignored.
Register the boot CPU APIC twice so that the XEN/PV fake enumeration
behaves like real firmware.
Reported-by: Juergen Gross <jgross@suse.com>
Fixes: e75307023466 ("x86/xen/smp_pv: Register fake APICs")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/87a5l8s2fg.ffs@tglx
Signed-off-by: Juergen Gross <jgross@suse.com>
|
| |\ \ \ \ \ \ \ \
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- The function __storage_key_init_range() expects the end address to be
the first byte outside the range to be initialized. Fix the callers
that provide the last byte within the range instead.
- 3270 Channel Command Word (CCW) may contain zero data address in case
there is no data in the request. Add data availability check to avoid
erroneous non-zero value as result of virt_to_dma32(NULL) application
in cases there is no data
- Add missing CFI directives for an unwinder to restore the return
address in the vDSO assembler code
- NUL-terminate kernel buffer when duplicating user space memory region
on Channel IO (CIO) debugfs write inject
- Fix wrong format string in zcrypt debug output
- Return -EBUSY code when a CCA card is temporarily unavailabile
- Restore a loop that retries derivation of a protected key from a
secure key in cases the low level reports temporarily unavailability
with -EBUSY code
* tag 's390-6.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/paes: Reestablish retry loop in paes
s390/zcrypt: Use EBUSY to indicate temp unavailability
s390/zcrypt: Handle ep11 cprb return code
s390/zcrypt: Fix wrong format string in debug feature printout
s390/cio: Ensure the copied buf is NUL terminated
s390/vdso: Add CFI for RA register to asm macro vdso_func
s390/3270: Fix buffer assignment
s390/mm: Fix clearing storage keys for huge pages
s390/mm: Fix storage key clearing for guest huge pages
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
With commit ed6776c96c60 ("s390/crypto: remove retry
loop with sleep from PAES pkey invocation") the retry
loop to retry derivation of a protected key from a
secure key has been removed. This was based on the
assumption that theses retries are not needed any
more as proper retries are done in the zcrypt layer.
However, tests have revealed that there exist some
cases with master key change in the HSM and immediately
(< 1 second) attempt to derive a protected key from a
secure key with exact this HSM may eventually fail.
The low level functions in zcrypt_ccamisc.c and
zcrypt_ep11misc.c detect and report this temporary
failure and report it to the caller as -EBUSY. The
re-established retry loop in the paes implementation
catches exactly this -EBUSY and eventually may run
some retries.
Fixes: ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The return-address (RA) register r14 is specified as volatile in the
s390x ELF ABI [1]. Nevertheless proper CFI directives must be provided
for an unwinder to restore the return address, if the RA register
value is changed from its value at function entry, as it is the case.
[1]: s390x ELF ABI, https://github.com/IBM/s390x-abi/releases
Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO")
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The function __storage_key_init_range() expects the end address to be
the first byte outside the range to be initialized. I.e. end - start
should be the size of the area to be initialized.
The current code works because __storage_key_init_range() will still loop
over every page in the range, but it is slower than using sske_frame().
Fixes: 3afdfca69870 ("s390/mm: Clear skeys for newly mapped huge guest pmds")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20240416114220.28489-3-imbrenda@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The function __storage_key_init_range() expects the end address to be
the first byte outside the range to be initialized. I.e. end - start
should be the size of the area to be initialized.
The current code works because __storage_key_init_range() will still loop
over every page in the range, but it is slower than using sske_frame().
Fixes: 964c2c05c9f3 ("s390/mm: Clear huge page storage keys on enable_skey")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20240416114220.28489-2-imbrenda@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
| |\ \ \ \ \ \ \ \ \
| | |_|_|_|_|_|_|/ /
| |/| | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Pull xtensa fixes from Max Filippov:
- fix unused variable warning caused by empty flush_dcache_page()
definition
- fix stack unwinding on windowed noMMU XIP configurations
- fix Coccinelle warning 'opportunity for min()' in xtensa ISS platform
code
* tag 'xtensa-20240502' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: remove redundant flush_dcache_page and ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE macros
tty: xtensa/iss: Use min() to fix Coccinelle warning
xtensa: fix MAKE_PC_FROM_RA second argument
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE macros
xtensa's flush_dcache_page() can be a no-op sometimes. There is a
generic implementation for this case in include/asm-generic/
cacheflush.h.
#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
static inline void flush_dcache_page(struct page *page)
{
}
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
#endif
So remove the superfluous flush_dcache_page() definition, which also
helps silence potential build warnings complaining the page variable
passed to flush_dcache_page() is not used.
In file included from crypto/scompress.c:12:
include/crypto/scatterwalk.h: In function 'scatterwalk_pagedone':
include/crypto/scatterwalk.h:76:30: warning: variable 'page' set but not used [-Wunused-but-set-variable]
76 | struct page *page;
| ^~~~
crypto/scompress.c: In function 'scomp_acomp_comp_decomp':
>> crypto/scompress.c:174:38: warning: unused variable 'dst_page' [-Wunused-variable]
174 | struct page *dst_page = sg_page(req->dst);
|
The issue was originally reported on LoongArch by kernel test
robot (Huacai fixed it on LoongArch), then reported by Guenter
and me on xtensa.
This patch also removes lots of redundant macros which have
been defined by asm-generic/cacheflush.h.
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202403091614.NeUw5zcv-lkp@intel.com/
Reported-by: Barry Song <v-songbaohua@oppo.com>
Closes: https://lore.kernel.org/all/CAGsJ_4yDk1+axbte7FKQEwD7X2oxUCFrEc9M5YOS1BobfDFXPA@mail.gmail.com/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/aaa8b7d7-5abe-47bf-93f6-407942436472@roeck-us.net/
Fixes: 77292bb8ca69 ("crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Message-Id: <20240319010920.125192-1-21cnbao@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Inline strlen() and use min() to fix the following Coccinelle/coccicheck
warning reported by minmax.cocci:
WARNING opportunity for min()
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Message-Id: <20240404075811.6936-3-thorsten.blum@toblux.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Xtensa has two-argument MAKE_PC_FROM_RA macro to convert a0 to an actual
return address because when windowed ABI is used call{,x}{4,8,12}
opcodes stuff encoded window size into the top 2 bits of the register
that becomes a return address in the called function. Second argument of
that macro is supposed to be an address having these 2 topmost bits set
correctly, but the comment suggested that that could be the stack
address. However the stack doesn't have to be in the same 1GByte region
as the code, especially in noMMU XIP configurations.
Fix the comment and use either _text or regs->pc as the second argument
for the MAKE_PC_FROM_RA macro.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|\| | | | | | | | |
| |_|_|_|_|_|_|_|/
|/| | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Cross-merge networking fixes after downstream PR.
Conflicts:
include/linux/filter.h
kernel/bpf/core.c
66e13b615a0c ("bpf: verifier: prevent userspace memory access")
d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |\ \ \ \ \ \ \ \
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.
Relatively calm week, likely due to public holiday in most places. No
known outstanding regressions.
Current release - regressions:
- rxrpc: fix wrong alignmask in __page_frag_alloc_align()
- eth: e1000e: change usleep_range to udelay in PHY mdic access
Previous releases - regressions:
- gro: fix udp bad offset in socket lookup
- bpf: fix incorrect runtime stat for arm64
- tipc: fix UAF in error path
- netfs: fix a potential infinite loop in extract_user_to_sg()
- eth: ice: ensure the copied buf is NUL terminated
- eth: qeth: fix kernel panic after setting hsuid
Previous releases - always broken:
- bpf:
- verifier: prevent userspace memory access
- xdp: use flags field to disambiguate broadcast redirect
- bridge: fix multicast-to-unicast with fraglist GSO
- mptcp: ensure snd_nxt is properly initialized on connect
- nsh: fix outer header access in nsh_gso_segment().
- eth: bcmgenet: fix racing registers access
- eth: vxlan: fix stats counters.
Misc:
- a bunch of MAINTAINERS file updates"
* tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
MAINTAINERS: mark MYRICOM MYRI-10G as Orphan
MAINTAINERS: remove Ariel Elior
net: gro: add flush check in udp_gro_receive_segment
net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb
ipv4: Fix uninit-value access in __ip_make_skb()
s390/qeth: Fix kernel panic after setting hsuid
vxlan: Pull inner IP header in vxlan_rcv().
tipc: fix a possible memleak in tipc_buf_append
tipc: fix UAF in error path
rxrpc: Clients must accept conn from any address
net: core: reject skb_copy(_expand) for fraglist GSO skbs
net: bridge: fix multicast-to-unicast with fraglist GSO
mptcp: ensure snd_nxt is properly initialized on connect
e1000e: change usleep_range to udelay in PHY mdic access
net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341
cxgb4: Properly lock TX queue for the selftest.
rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()
vxlan: Add missing VNI filter counter update in arp_reduce().
vxlan: Fix racy device stats updates.
net: qede: use return from qede_parse_actions()
...
|
| | |\ \ \ \ \ \ \ \
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-04-26
We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 14 files changed, 168 insertions(+), 72 deletions(-).
The main changes are:
1) Fix BPF_PROBE_MEM in verifier and JIT to skip loads from vsyscall page,
from Puranjay Mohan.
2) Fix a crash in XDP with devmap broadcast redirect when the latter map
is in process of being torn down, from Toke Høiland-Jørgensen.
3) Fix arm64 and riscv64 BPF JITs to properly clear start time for BPF
program runtime stats, from Xu Kuohai.
4) Fix a sockmap KCSAN-reported data race in sk_psock_skb_ingress_enqueue,
from Jason Xing.
5) Fix BPF verifier error message in resolve_pseudo_ldimm64,
from Anton Protopopov.
6) Fix missing DEBUG_INFO_BTF_MODULES Kconfig menu item,
from Andrii Nakryiko.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64
bpf, x86: Fix PROBE_MEM runtime load check
bpf: verifier: prevent userspace memory access
xdp: use flags field to disambiguate broadcast redirect
arm32, bpf: Reimplement sign-extension mov instruction
riscv, bpf: Fix incorrect runtime stats
bpf, arm64: Fix incorrect runtime stats
bpf: Fix a verifier verbose message
bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue
MAINTAINERS: bpf: Add Lehui and Puranjay as riscv64 reviewers
MAINTAINERS: Update email address for Puranjay Mohan
bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition
====================
Link: https://lore.kernel.org/r/20240426224248.26197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the
address being loaded from is not necessarily valid. The BPF jit sets up
exception handlers for each such load which catch page faults and 0 out
the destination register.
If the address for the load is outside kernel address space, the load
will escape the exception handling and crash the kernel. To prevent this
from happening, the emits some instruction to verify that addr is > end
of userspace addresses.
x86 has a legacy vsyscall ABI where a page at address 0xffffffffff600000
is mapped with user accessible permissions. The addresses in this page
are considered userspace addresses by the fault handler. Therefore, a
BPF program accessing this page will crash the kernel.
This patch fixes the runtime checks to also check that the PROBE_MEM
address is below VSYSCALL_ADDR.
Example BPF program:
SEC("fentry/tcp_v4_connect")
int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk)
{
*(volatile unsigned long *)&sk->sk_tsq_flags;
return 0;
}
BPF Assembly:
0: (79) r1 = *(u64 *)(r1 +0)
1: (79) r1 = *(u64 *)(r1 +344)
2: (b7) r0 = 0
3: (95) exit
x86-64 JIT
==========
BEFORE AFTER
------ -----
0: nopl 0x0(%rax,%rax,1) 0: nopl 0x0(%rax,%rax,1)
5: xchg %ax,%ax 5: xchg %ax,%ax
7: push %rbp 7: push %rbp
8: mov %rsp,%rbp 8: mov %rsp,%rbp
b: mov 0x0(%rdi),%rdi b: mov 0x0(%rdi),%rdi
-------------------------------------------------------------------------------
f: movabs $0x100000000000000,%r11 f: movabs $0xffffffffff600000,%r10
19: add $0x2a0,%rdi 19: mov %rdi,%r11
20: cmp %r11,%rdi 1c: add $0x2a0,%r11
23: jae 0x0000000000000029 23: sub %r10,%r11
25: xor %edi,%edi 26: movabs $0x100000000a00000,%r10
27: jmp 0x000000000000002d 30: cmp %r10,%r11
29: mov 0x0(%rdi),%rdi 33: ja 0x0000000000000039
--------------------------------\ 35: xor %edi,%edi
2d: xor %eax,%eax \ 37: jmp 0x0000000000000040
2f: leave \ 39: mov 0x2a0(%rdi),%rdi
30: ret \--------------------------------------------
40: xor %eax,%eax
42: leave
43: ret
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240424100210.11982-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To
thwart invalid memory accesses, the JITs add an exception table entry
for all such accesses. But in case the src_reg + offset is a userspace
address, the BPF program might read that memory if the user has
mapped it.
Make the verifier add guard instructions around such memory accesses and
skip the load if the address falls into the userspace region.
The JITs need to implement bpf_arch_uaddress_limit() to define where
the userspace addresses end for that architecture or TASK_SIZE is taken
as default.
The implementation is as follows:
REG_AX = SRC_REG
if(offset)
REG_AX += offset;
REG_AX >>= 32;
if (REG_AX <= (uaddress_limit >> 32))
DST_REG = 0;
else
DST_REG = *(size *)(SRC_REG + offset);
Comparing just the upper 32 bits of the load address with the upper
32 bits of uaddress_limit implies that the values are being aligned down
to a 4GB boundary before comparison.
The above means that all loads with address <= uaddress_limit + 4GB are
skipped. This is acceptable because there is a large hole (much larger
than 4GB) between userspace and kernel space memory, therefore a
correctly functioning BPF program should not access this 4GB memory
above the userspace.
Let's analyze what this patch does to the following fentry program
dereferencing an untrusted pointer:
SEC("fentry/tcp_v4_connect")
int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk)
{
*(volatile long *)sk;
return 0;
}
BPF Program before | BPF Program after
------------------ | -----------------
0: (79) r1 = *(u64 *)(r1 +0) 0: (79) r1 = *(u64 *)(r1 +0)
-----------------------------------------------------------------------
1: (79) r1 = *(u64 *)(r1 +0) --\ 1: (bf) r11 = r1
----------------------------\ \ 2: (77) r11 >>= 32
2: (b7) r0 = 0 \ \ 3: (b5) if r11 <= 0x8000 goto pc+2
3: (95) exit \ \-> 4: (79) r1 = *(u64 *)(r1 +0)
\ 5: (05) goto pc+1
\ 6: (b7) r1 = 0
\--------------------------------------
7: (b7) r0 = 0
8: (95) exit
As you can see from above, in the best case (off=0), 5 extra instructions
are emitted.
Now, we analyze the same program after it has gone through the JITs of
ARM64 and RISC-V architectures. We follow the single load instruction
that has the untrusted pointer and see what instrumentation has been
added around it.
x86-64 JIT
==========
JIT's Instrumentation
(upstream)
---------------------
0: nopl 0x0(%rax,%rax,1)
5: xchg %ax,%ax
7: push %rbp
8: mov %rsp,%rbp
b: mov 0x0(%rdi),%rdi
---------------------------------
f: movabs $0x800000000000,%r11
19: cmp %r11,%rdi
1c: jb 0x000000000000002a
1e: mov %rdi,%r11
21: add $0x0,%r11
28: jae 0x000000000000002e
2a: xor %edi,%edi
2c: jmp 0x0000000000000032
2e: mov 0x0(%rdi),%rdi
---------------------------------
32: xor %eax,%eax
34: leave
35: ret
The x86-64 JIT already emits some instructions to protect against user
memory access. This patch doesn't make any changes for the x86-64 JIT.
ARM64 JIT
=========
No Intrumentation Verifier's Instrumentation
(upstream) (This patch)
----------------- --------------------------
0: add x9, x30, #0x0 0: add x9, x30, #0x0
4: nop 4: nop
8: paciasp 8: paciasp
c: stp x29, x30, [sp, #-16]! c: stp x29, x30, [sp, #-16]!
10: mov x29, sp 10: mov x29, sp
14: stp x19, x20, [sp, #-16]! 14: stp x19, x20, [sp, #-16]!
18: stp x21, x22, [sp, #-16]! 18: stp x21, x22, [sp, #-16]!
1c: stp x25, x26, [sp, #-16]! 1c: stp x25, x26, [sp, #-16]!
20: stp x27, x28, [sp, #-16]! 20: stp x27, x28, [sp, #-16]!
24: mov x25, sp 24: mov x25, sp
28: mov x26, #0x0 28: mov x26, #0x0
2c: sub x27, x25, #0x0 2c: sub x27, x25, #0x0
30: sub sp, sp, #0x0 30: sub sp, sp, #0x0
34: ldr x0, [x0] 34: ldr x0, [x0]
--------------------------------------------------------------------------------
38: ldr x0, [x0] ----------\ 38: add x9, x0, #0x0
-----------------------------------\\ 3c: lsr x9, x9, #32
3c: mov x7, #0x0 \\ 40: cmp x9, #0x10, lsl #12
40: mov sp, sp \\ 44: b.ls 0x0000000000000050
44: ldp x27, x28, [sp], #16 \\--> 48: ldr x0, [x0]
48: ldp x25, x26, [sp], #16 \ 4c: b 0x0000000000000054
4c: ldp x21, x22, [sp], #16 \ 50: mov x0, #0x0
50: ldp x19, x20, [sp], #16 \---------------------------------------
54: ldp x29, x30, [sp], #16 54: mov x7, #0x0
58: add x0, x7, #0x0 58: mov sp, sp
5c: autiasp 5c: ldp x27, x28, [sp], #16
60: ret 60: ldp x25, x26, [sp], #16
64: nop 64: ldp x21, x22, [sp], #16
68: ldr x10, 0x0000000000000070 68: ldp x19, x20, [sp], #16
6c: br x10 6c: ldp x29, x30, [sp], #16
70: add x0, x7, #0x0
74: autiasp
78: ret
7c: nop
80: ldr x10, 0x0000000000000088
84: br x10
There are 6 extra instructions added in ARM64 in the best case. This will
become 7 in the worst case (off != 0).
RISC-V JIT (RISCV_ISA_C Disabled)
==========
No Intrumentation Verifier's Instrumentation
(upstream) (This patch)
----------------- --------------------------
0: nop 0: nop
4: nop 4: nop
8: li a6, 33 8: li a6, 33
c: addi sp, sp, -16 c: addi sp, sp, -16
10: sd s0, 8(sp) 10: sd s0, 8(sp)
14: addi s0, sp, 16 14: addi s0, sp, 16
18: ld a0, 0(a0) 18: ld a0, 0(a0)
---------------------------------------------------------------
1c: ld a0, 0(a0) --\ 1c: mv t0, a0
--------------------------\ \ 20: srli t0, t0, 32
20: li a5, 0 \ \ 24: lui t1, 4096
24: ld s0, 8(sp) \ \ 28: sext.w t1, t1
28: addi sp, sp, 16 \ \ 2c: bgeu t1, t0, 12
2c: sext.w a0, a5 \ \--> 30: ld a0, 0(a0)
30: ret \ 34: j 8
\ 38: li a0, 0
\------------------------------
3c: li a5, 0
40: ld s0, 8(sp)
44: addi sp, sp, 16
48: sext.w a0, a5
4c: ret
There are 7 extra instructions added in RISC-V.
Fixes: 800834285361 ("bpf, arm64: Add BPF exception tables")
Reported-by: Breno Leitao <leitao@debian.org>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20240424100210.11982-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
The current implementation of the mov instruction with sign extension has the
following problems:
1. It clobbers the source register if it is not stacked because it
sign extends the source and then moves it to the destination.
2. If the dst_reg is stacked, the current code doesn't write the value
back in case of 64-bit mov.
3. There is room for improvement by emitting fewer instructions.
The steps for fixing this and the instructions emitted by the JIT are explained
below with examples in all combinations:
Case A: offset == 32:
=====================
Case A.1: src and dst are stacked registers:
--------------------------------------------
1. Load src_lo into tmp_lo
2. Store tmp_lo into dst_lo
3. Sign extend tmp_lo into tmp_hi
4. Store tmp_hi to dst_hi
Example: r3 = (s32)r3
r3 is a stacked register
ldr r6, [r11, #-16] // Load r3_lo into tmp_lo
// str to dst_lo is not emitted because src_lo == dst_lo
asr r7, r6, #31 // Sign extend tmp_lo into tmp_hi
str r7, [r11, #-12] // Store tmp_hi into r3_hi
Case A.2: src is stacked but dst is not:
----------------------------------------
1. Load src_lo into dst_lo
2. Sign extend dst_lo into dst_hi
Example: r6 = (s32)r3
r6 maps to {ARM_R5, ARM_R4} and r3 is stacked
ldr r4, [r11, #-16] // Load r3_lo into r6_lo
asr r5, r4, #31 // Sign extend r6_lo into r6_hi
Case A.3: src is not stacked but dst is stacked:
------------------------------------------------
1. Store src_lo into dst_lo
2. Sign extend src_lo into tmp_hi
3. Store tmp_hi to dst_hi
Example: r3 = (s32)r6
r3 is stacked and r6 maps to {ARM_R5, ARM_R4}
str r4, [r11, #-16] // Store r6_lo to r3_lo
asr r7, r4, #31 // Sign extend r6_lo into tmp_hi
str r7, [r11, #-12] // Store tmp_hi to dest_hi
Case A.4: Both src and dst are not stacked:
-------------------------------------------
1. Mov src_lo into dst_lo
2. Sign extend src_lo into dst_hi
Example: (bf) r6 = (s32)r6
r6 maps to {ARM_R5, ARM_R4}
// Mov not emitted because dst == src
asr r5, r4, #31 // Sign extend r6_lo into r6_hi
Case B: offset != 32:
=====================
Case B.1: src and dst are stacked registers:
--------------------------------------------
1. Load src_lo into tmp_lo
2. Sign extend tmp_lo according to offset.
3. Store tmp_lo into dst_lo
4. Sign extend tmp_lo into tmp_hi
5. Store tmp_hi to dst_hi
Example: r9 = (s8)r3
r9 and r3 are both stacked registers
ldr r6, [r11, #-16] // Load r3_lo into tmp_lo
lsl r6, r6, #24 // Sign extend tmp_lo
asr r6, r6, #24 // ..
str r6, [r11, #-56] // Store tmp_lo to r9_lo
asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi
str r7, [r11, #-52] // Store tmp_hi to r9_hi
Case B.2: src is stacked but dst is not:
----------------------------------------
1. Load src_lo into dst_lo
2. Sign extend dst_lo according to offset.
3. Sign extend tmp_lo into dst_hi
Example: r6 = (s8)r3
r6 maps to {ARM_R5, ARM_R4} and r3 is stacked
ldr r4, [r11, #-16] // Load r3_lo to r6_lo
lsl r4, r4, #24 // Sign extend r6_lo
asr r4, r4, #24 // ..
asr r5, r4, #31 // Sign extend r6_lo into r6_hi
Case B.3: src is not stacked but dst is stacked:
------------------------------------------------
1. Sign extend src_lo into tmp_lo according to offset.
2. Store tmp_lo into dst_lo.
3. Sign extend src_lo into tmp_hi.
4. Store tmp_hi to dst_hi.
Example: r3 = (s8)r1
r3 is stacked and r1 maps to {ARM_R3, ARM_R2}
lsl r6, r2, #24 // Sign extend r1_lo to tmp_lo
asr r6, r6, #24 // ..
str r6, [r11, #-16] // Store tmp_lo to r3_lo
asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi
str r7, [r11, #-12] // Store tmp_hi to r3_hi
Case B.4: Both src and dst are not stacked:
-------------------------------------------
1. Sign extend src_lo into dst_lo according to offset.
2. Sign extend dst_lo into dst_hi.
Example: r6 = (s8)r1
r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2}
lsl r4, r2, #24 // Sign extend r1_lo to r6_lo
asr r4, r4, #24 // ..
asr r5, r4, #31 // Sign extend r6_lo to r6_hi
Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction")
Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com
Link: https://lore.kernel.org/bpf/20240419182832.27707-1-puranjay@kernel.org
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
When __bpf_prog_enter() returns zero, the s1 register is not set to zero,
resulting in incorrect runtime stats. Fix it by setting s1 immediately upon
the return of __bpf_prog_enter().
Fixes: 49b5e77ae3e2 ("riscv, bpf: Add bpf trampoline support for RV64")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240416064208.2919073-3-xukuohai@huaweicloud.com
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
When __bpf_prog_enter() returns zero, the arm64 register x20 that stores
prog start time is not assigned to zero, causing incorrect runtime stats.
To fix it, assign the return value of bpf_prog_enter() to x20 register
immediately upon its return.
Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240416064208.2919073-2-xukuohai@huaweicloud.com
|
| |\ \ \ \ \ \ \ \ \ \
| | |_|_|_|_|/ / / / /
| |/| | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.9, part #2
- Fix + test for a NULL dereference resulting from unsanitised user
input in the vgic-v2 device attribute accessors
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
vgic_v2_parse_attr() is responsible for finding the vCPU that matches
the user-provided CPUID, which (of course) may not be valid. If the ID
is invalid, kvm_get_vcpu_by_id() returns NULL, which isn't handled
gracefully.
Similar to the GICv3 uaccess flow, check that kvm_get_vcpu_by_id()
actually returns something and fail the ioctl if not.
Cc: stable@vger.kernel.org
Fixes: 7d450e282171 ("KVM: arm/arm64: vgic-new: Add userland access to VGIC dist registers")
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240424173959.3776798-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
| |\ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Make the CPU_MITIGATIONS=n interaction with conflicting
mitigation-enabling boot parameters a bit saner.
- Re-enable CPU mitigations by default on non-x86
- Fix TDX shared bit propagation on mprotect()
- Fix potential show_regs() system hang when PKE initialization
is not fully finished yet.
- Add the 0x10-0x1f model IDs to the Zen5 range
- Harden #VC instruction emulation some more
* tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
cpu: Re-enable CPU mitigations by default for !X86 architectures
x86/tdx: Preserve shared bit on mprotect()
x86/cpu: Fix check for RDPKRU in __show_regs()
x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range
x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Explicitly disallow enabling mitigations at runtime for kernels that were
built with CONFIG_CPU_MITIGATIONS=n, as some architectures may omit code
entirely if mitigations are disabled at compile time.
E.g. on x86, a large pile of Kconfigs are buried behind CPU_MITIGATIONS,
and trying to provide sane behavior for retroactively enabling mitigations
is extremely difficult, bordering on impossible. E.g. page table isolation
and call depth tracking require build-time support, BHI mitigations will
still be off without additional kernel parameters, etc.
[ bp: Touchups. ]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240420000556.2645001-3-seanjc@google.com
|