| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Move the bpf verifier trace check into the new switch statement in
HEAD.
Resolve the overlapping changes in hinic, where bug fixes overlap
the addition of VF support.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull networking fixes from David Miller:
1) Fix sk_psock reference count leak on receive, from Xiyu Yang.
2) CONFIG_HNS should be invisible, from Geert Uytterhoeven.
3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this,
from Maciej Żenczykowski.
4) ipv4 route redirect backoff wasn't actually enforced, from Paolo
Abeni.
5) Fix netprio cgroup v2 leak, from Zefan Li.
6) Fix infinite loop on rmmod in conntrack, from Florian Westphal.
7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet.
8) Various bpf probe handling fixes, from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
selftests: mptcp: pm: rm the right tmp file
dpaa2-eth: properly handle buffer size restrictions
bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range
bpf: Restrict bpf_probe_read{, str}() only to archs where they work
MAINTAINERS: Mark networking drivers as Maintained.
ipmr: Add lockdep expression to ipmr_for_each_table macro
ipmr: Fix RCU list debugging warning
drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c
net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810
tcp: fix error recovery in tcp_zerocopy_receive()
MAINTAINERS: Add Jakub to networking drivers.
MAINTAINERS: another add of Karsten Graul for S390 networking
drivers: ipa: fix typos for ipa_smp2p structure doc
pppoe: only process PADT targeted at local interfaces
selftests/bpf: Enforce returning 0 for fentry/fexit programs
bpf: Enforce returning 0 for fentry/fexit progs
net: stmmac: fix num_por initialization
security: Fix the default value of secid_to_secctx hook
libbpf: Fix register naming in PT_REGS s390 macros
...
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Alexei Starovoitov says:
====================
pull-request: bpf 2020-05-15
The following pull-request contains BPF updates for your *net* tree.
We've added 9 non-merge commits during the last 2 day(s) which contain
a total of 14 files changed, 137 insertions(+), 43 deletions(-).
The main changes are:
1) Fix secid_to_secctx LSM hook default value, from Anders.
2) Fix bug in mmap of bpf array, from Andrii.
3) Restrict bpf_probe_read to archs where they work, from Daniel.
4) Enforce returning 0 for fentry/fexit progs, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There are a few fentry/fexit programs returning non-0.
The tests with these programs will break with the previous
patch which enfoced return-0 rules. Fix them properly.
Fixes: ac065870d928 ("selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200514053207.1298479-1-yhs@fb.com
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix register naming in PT_REGS s390 macros
Fixes: b8ebce86ffe6 ("libbpf: Provide CO-RE variants of PT_REGS macros")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513154414.29972-1-sumanthk@linux.ibm.com
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
mmap() subsystem allows user-space application to memory-map region with
initial page offset. This wasn't taken into account in initial implementation
of BPF array memory-mapping. This would result in wrong pages, not taking into
account requested page shift, being memory-mmaped into user-space. This patch
fixes this gap and adds a test for such scenario.
Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512235925.3817805-1-andriin@fb.com
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
"$err" is a variable pointing to a temp file. "$out" is not: only used
as a local variable in "check()" and representing the output of a
command line.
Fixes: eedbc685321b (selftests: add PM netlink functional tests)
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
- lkdtm runner fixes to prevent dmesg clearing and shellcheck errors
- ftrace test handling when test module doesn't exist
- nsfs test fix to replace zero-length array with flexible-array
- dmabuf-heaps test fix to return clear error value
* tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/lkdtm: Use grep -E instead of egrep
selftests/lkdtm: Don't clear dmesg when running tests
selftests/ftrace: mark irqsoff_tracer.tc test as unresolved if the test module does not exist
tools/testing: Replace zero-length array with flexible-array
kselftests: dmabuf-heaps: Fix confused return value on expected error testing
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
shellcheck complains that egrep is deprecated, and the grep man page
agrees. Use grep -E instead.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It is Very Rude to clear dmesg in test scripts. That's because the
script may be part of a larger test run, and clearing dmesg
potentially destroys the output of other tests.
We can avoid using dmesg -c by saving the content of dmesg before the
test, and then using diff to compare that to the dmesg afterward,
producing a log with just the added lines.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
module does not exist
The UNRESOLVED state is much more apporiate than the UNSUPPORTED state
for the absence of the test module, as it matches "test was set up
incorrectly" situation in the README file.
A possible scenario is that the function was enabled (supported by the
kernel) but the module was not installed properly, in this case we
cannot call this as UNSUPPORTED.
This change also make it consistent with other module-related tests
in ftrace.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When I added the expected error testing, I forgot I need to set
the return to zero when we successfully see an error.
Without this change we only end up testing a single heap
before the test quits.
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Brian Starkey <brian.starkey@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: "Andrew F. Davis" <afd@ti.com>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Fixes to previous fixes.
Unfortunately, the last set of fixes introduced some minor bugs:
- The bootconfig apply_xbc() leak fix caused the application to
return a positive number on success, when it should have returned
zero.
- The preempt_irq_delay_thread fix to make the creation code wait for
the kthread to finish to prevent it from executing after module
unload, can now cause the kthread to exit before it even executes
(preventing it to run its tests).
- The fix to the bootconfig that fixed the initrd to remove the
bootconfig from causing the kernel to panic, now prints a warning
that the bootconfig is not found, even when bootconfig is not on
the command line"
* tag 'trace-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
bootconfig: Fix to prevent warning message if no bootconfig option
tracing: Wait for preempt irq delay thread to execute
tools/bootconfig: Fix apply_xbc() to return zero on success
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The return of apply_xbc() returns the result of the last write() call, which
is not what is expected. It should only return zero on success.
Link: https://lore.kernel.org/r/20200508093059.GF9365@kadam
Fixes: 8842604446d1 ("tools/bootconfig: Fix resource leak in apply_xbc()")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Ensure that direct mapping alias is always flushed when changing
page attributes. The optimization for small ranges failed to do so
when the virtual address was in the vmalloc or module space.
- Unbreak the trace event registration for syscalls without arguments
caused by the refactoring of the SYSCALL_DEFINE0() macro.
- Move the printk in the TSC deadline timer code to a place where it
is guaranteed to only be called once during boot and cannot be
rearmed by clearing warn_once after boot. If it's invoked post boot
then lockdep rightfully complains about a potential deadlock as the
calling context is different.
- A series of fixes for objtool and the ORC unwinder addressing
variety of small issues:
- Stack offset tracking for indirect CFAs in objtool ignored
subsequent pushs and pops
- Repair the unwind hints in the register clearing entry ASM code
- Make the unwinding in the low level exit to usermode code stop
after switching to the trampoline stack. The unwind hint is no
longer valid and the ORC unwinder emits a warning as it can't
find the registers anymore.
- Fix unwind hints in switch_to_asm() and rewind_stack_do_exit()
which caused objtool to generate bogus ORC data.
- Prevent unwinder warnings when dumping the stack of a
non-current task as there is no way to be sure about the
validity because the dumped stack can be a moving target.
- Make the ORC unwinder behave the same way as the frame pointer
unwinder when dumping an inactive tasks stack and do not skip
the first frame.
- Prevent ORC unwinding before ORC data has been initialized
- Immediately terminate unwinding when a unknown ORC entry type
is found.
- Prevent premature stop of the unwinder caused by IRET frames.
- Fix another infinite loop in objtool caused by a negative
offset which was not catched.
- Address a few build warnings in the ORC unwinder and add
missing static/ro_after_init annotations"
* tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
x86/apic: Move TSC deadline timer debug printk
ftrace/x86: Fix trace event registration for syscalls without arguments
x86/mm/cpa: Flush direct map alias during cpa
objtool: Fix infinite loop in for_offset_range()
x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
x86/unwind/orc: Fix error path for bad ORC entry type
x86/unwind/orc: Prevent unwinding before ORC initialization
x86/unwind/orc: Don't skip the first frame for inactive tasks
x86/unwind: Prevent false warnings for non-current tasks
x86/unwind/orc: Convert global variables to static
x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
x86/entry/64: Fix unwind hints in __switch_to_asm()
x86/entry/64: Fix unwind hints in kernel exit path
x86/entry/64: Fix unwind hints in register clearing code
objtool: Fix stack offset tracking for indirect CFAs
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Randy reported that objtool got stuck in an infinite loop when
processing drivers/i2c/busses/i2c-parport.o. It was caused by the
following code:
00000000000001fd <line_set>:
1fd: 48 b8 00 00 00 00 00 movabs $0x0,%rax
204: 00 00 00
1ff: R_X86_64_64 .rodata-0x8
207: 41 55 push %r13
209: 41 89 f5 mov %esi,%r13d
20c: 41 54 push %r12
20e: 49 89 fc mov %rdi,%r12
211: 55 push %rbp
212: 48 89 d5 mov %rdx,%rbp
215: 53 push %rbx
216: 0f b6 5a 01 movzbl 0x1(%rdx),%ebx
21a: 48 8d 34 dd 00 00 00 lea 0x0(,%rbx,8),%rsi
221: 00
21e: R_X86_64_32S .rodata
222: 48 89 f1 mov %rsi,%rcx
225: 48 29 c1 sub %rax,%rcx
find_jump_table() saw the .rodata reference and tried to find a jump
table associated with it (though there wasn't one). The -0x8 rela
addend is unusual. It caused find_jump_table() to send a negative
table_offset (unsigned 0xfffffffffffffff8) to find_rela_by_dest().
The negative offset should have been harmless, but it actually threw
for_offset_range() for a loop... literally. When the mask value got
incremented past the end value, it also wrapped to zero, causing the
loop exit condition to remain true forever.
Prevent this scenario from happening by ensuring the incremented value
is always >= the starting value.
Fixes: 74b873e49d92 ("objtool: Optimize find_rela_by_dest_range()")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Julien Thierry <jthierry@redhat.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/02b719674b031800b61e33c30b2e823183627c19.1587842122.git.jpoimboe@redhat.com
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When the current frame address (CFA) is stored on the stack (i.e.,
cfa->base == CFI_SP_INDIRECT), objtool neglects to adjust the stack
offset when there are subsequent pushes or pops. This results in bad
ORC data at the end of the ENTER_IRQ_STACK macro, when it puts the
previous stack pointer on the stack and does a subsequent push.
This fixes the following unwinder warning:
WARNING: can't dereference registers at 00000000f0a6bdba for ip interrupt_entry+0x9f/0xa0
Fixes: 627fce14809b ("objtool: Add ORC unwind table generation")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Dave Jones <dsj@fb.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/853d5d691b29e250333332f09b8e27410b2d9924.1587808742.git.jpoimboe@redhat.com
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Thomas Gleixner:
"A single fix for objtool to prevent an infinite loop in the
jump table search which can be triggered when building the
kernel with '-ffunction-sections'"
* tag 'objtool-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix infinite loop in find_jump_table()
|
| | | |_|/ /
| | |/| | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Kristen found a hang in objtool when building with -ffunction-sections.
It was caused by evergreen_pcie_gen2_enable.cold() being laid out
immediately before evergreen_pcie_gen2_enable(). Since their "pfunc" is
always the same, find_jump_table() got into an infinite loop because it
didn't recognize the boundary between the two functions.
Fix that with a new prev_insn_same_sym() helper, which doesn't cross
subfunction boundaries.
Reported-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/378b51c9d9c894dc3294bc460b4b0869e950b7c5.1588110291.git.jpoimboe@redhat.com
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Pull block fixes from Jens Axboe:
- a small series fixing a use-after-free of bdi name (Christoph,Yufen)
- NVMe fix for a regression with the smaller CQ update (Alexey)
- NVMe fix for a hang at namespace scanning error recovery (Sagi)
- fix race with blk-iocost iocg->abs_vdebt updates (Tejun)
* tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-block:
nvme: fix possible hang when ns scanning fails during error recovery
nvme-pci: fix "slimmer CQ head update"
bdi: add a ->dev_name field to struct backing_dev_info
bdi: use bdi_dev_name() to get device name
bdi: move bdi_dev_name out of line
vboxsf: don't use the source name in the bdi name
iocost: protect iocg->abs_vdebt with iocg->waitq.lock
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup
is and controls the activation of use_delay mechanism. Once a cgroup goes
over budget from forced IOs, it has to pay it back with its future budget.
The progress guarantee on debt paying comes from the iocg being active -
active iocgs are processed by the periodic timer, which ensures that as time
passes the debts dissipate and the iocg returns to normal operation.
However, both iocg activation and vdebt handling are asynchronous and a
sequence like the following may happen.
1. The iocg is in the process of being deactivated by the periodic timer.
2. A bio enters ioc_rqos_throttle(), calls iocg_activate() which returns
without anything because it still sees that the iocg is already active.
3. The iocg is deactivated.
4. The bio from #2 is over budget but needs to be forced. It increases
abs_vdebt and goes over the threshold and enables use_delay.
5. IO control is enabled for the iocg's subtree and now IOs are attributed
to the descendant cgroups and the iocg itself no longer issues IOs.
This leaves the iocg with stuck abs_vdebt - it has debt but inactive and no
further IOs which can activate it. This can end up unduly punishing all the
descendants cgroups.
The usual throttling path has the same issue - the iocg must be active while
throttled to ensure that future event will wake it up - and solves the
problem by synchronizing the throttling path with a spinlock. abs_vdebt
handling is another form of overage handling and shares a lot of
characteristics including the fact that it isn't in the hottest path.
This patch fixes the above and other possible races by strictly
synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vlad Dmitriev <vvd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Fixes: e1518f63f246 ("blk-iocost: Don't let merges push vtime into the future")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| |\ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Merge misc fixes from Andrew Morton:
"14 fixes and one selftest to verify the ipc fixes herein"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: limit boost_watermark on small zones
ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
epoll: atomically remove wait entry on wake up
kselftests: introduce new epoll60 testcase for catching lost wakeups
percpu: make pcpu_alloc() aware of current gfp context
mm/slub: fix incorrect interpretation of s->offset
scripts/gdb: repair rb_first() and rb_last()
eventpoll: fix missing wakeup for ovflist in ep_poll_callback
arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
scripts/decodecode: fix trapping instruction formatting
kernel/kcov.c: fix typos in kcov_remote_start documentation
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
mm, memcg: fix error return value of mem_cgroup_css_alloc()
ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The documentation for UBSAN_ALIGNMENT already mentions that it should
not be used on all*config builds (and for efficient-unaligned-access
architectures), so just refactor the Kconfig to correctly implement this
so randconfigs will stop creating insane images that freak out objtool
under CONFIG_UBSAN_TRAP (due to the false positives producing functions
that never return, etc).
Link: http://lkml.kernel.org/r/202005011433.C42EA3E2D@keescook
Fixes: 0887a7ebc977 ("ubsan: add trap instrumentation option")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/linux-next/202004231224.D6B3B650@keescook/
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |_|_|_|/
| | |/| | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This test case catches lost wake up introduced by commit 339ddb53d373
("fs/epoll: remove unnecessary wakeups of nested epoll")
The test is simple: we have 10 threads and 10 event fds. Each thread
can harvest only 1 event. 1 producer fires all 10 events at once and
waits that all 10 events will be observed by 10 threads.
In case of lost wakeup epoll_wait() will timeout and 0 will be returned.
Test case catches two sort of problems: forgotten wakeup on event, which
hits the ->ovflist list, this problem was fixed by:
5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback")
the other problem is when several sequential events hit the same waiting
thread, thus other waiters get no wakeups. Problem is fixed in the
following patch.
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Heiher <r@hev.cc>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20200430130326.1368509-1-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |\ \ \ \ \ \
| | | |_|_|_|/
| | |/| | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix bootconfig causing kernels to fail with CONFIG_BLK_DEV_RAM
enabled
- Fix allocation leaks in bootconfig tool
- Fix a double initialization of a variable
- Fix API bootconfig usage from kprobe boot time events
- Reject NULL location for kprobes
- Fix crash caused by preempt delay module not cleaning up kthread
correctly
- Add vmalloc_sync_mappings() to prevent x86_64 page faults from
recursively faulting from tracing page faults
- Fix comment in gpu/trace kerneldoc header
- Fix documentation of how to create a trace event class
- Make the local tracing_snapshot_instance_cond() function static
* tag 'trace-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tools/bootconfig: Fix resource leak in apply_xbc()
tracing: Make tracing_snapshot_instance_cond() static
tracing: Fix doc mistakes in trace sample
gpu/trace: Minor comment updates for gpu_mem_total tracepoint
tracing: Add a vmalloc_sync_mappings() for safe measure
tracing: Wait for preempt irq delay thread to finish
tracing/kprobes: Reject new event if loc is NULL
tracing/boottime: Fix kprobe event API usage
tracing/kprobes: Fix a double initialization typo
bootconfig: Fix to remove bootconfig data from initrd while boot
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Fix the @data and @fd allocations that are leaked in the error path of
apply_xbc().
Link: http://lkml.kernel.org/r/583a49c9-c27a-931d-e6c2-6f63a4b18bea@huawei.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
| |\ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"ftrace test fixes and a fix to kvm Makefile for relocatable
native/cross builds and installs"
* tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: fix kvm relocatable native/cross builds and installs
selftests/ftrace: Make XFAIL green color
ftrace/selftest: make unresolved cases cause failure if --fail-unresolved set
ftrace/selftests: workaround cgroup RT scheduling issues
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
kvm test Makefile doesn't fully support cross-builds and installs.
UNAME_M = $(shell uname -m) variable is used to define the target
programs and libraries to be built from arch specific sources in
sub-directories.
For cross-builds to work, UNAME_M has to map to ARCH and arch specific
directories and targets in this Makefile.
UNAME_M variable to used to run the compiles pointing to the right arch
directories and build the right targets for these supported architectures.
TEST_GEN_PROGS and LIBKVM are set using UNAME_M variable.
LINUX_TOOL_ARCH_INCLUDE is set using ARCH variable.
x86_64 targets are named to include x86_64 as a suffix and directories
for includes are in x86_64 sub-directory. s390x and aarch64 follow the
same convention. "uname -m" doesn't result in the correct mapping for
s390x and aarch64. Fix it to set UNAME_M correctly for s390x and aarch64
cross-builds.
In addition, Makefile doesn't create arch sub-directories in the case of
relocatable builds and test programs under s390x and x86_64 directories
fail to build. This is a problem for native and cross-builds. Fix it to
create all necessary directories keying off of TEST_GEN_PROGS.
The following use-cases work with this change:
Native x86_64:
make O=/tmp/kselftest -C tools/testing/selftests TARGETS=kvm install \
INSTALL_PATH=$HOME/x86_64
arm64 cross-build:
make O=$HOME/arm64_build/ ARCH=arm64 HOSTCC=gcc \
CROSS_COMPILE=aarch64-linux-gnu- defconfig
make O=$HOME/arm64_build/ ARCH=arm64 HOSTCC=gcc \
CROSS_COMPILE=aarch64-linux-gnu- all
make kselftest-install TARGETS=kvm O=$HOME/arm64_build ARCH=arm64 \
HOSTCC=gcc CROSS_COMPILE=aarch64-linux-gnu-
s390x cross-build:
make O=$HOME/s390x_build/ ARCH=s390 HOSTCC=gcc \
CROSS_COMPILE=s390x-linux-gnu- defconfig
make O=$HOME/s390x_build/ ARCH=s390 HOSTCC=gcc \
CROSS_COMPILE=s390x-linux-gnu- all
make kselftest-install TARGETS=kvm O=$HOME/s390x_build/ ARCH=s390 \
HOSTCC=gcc CROSS_COMPILE=s390x-linux-gnu- all
No regressions in the following use-cases:
make -C tools/testing/selftests TARGETS=kvm
make kselftest-all TARGETS=kvm
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Since XFAIL (Expected Failure) is expected to fail the test, which
means that test case works as we expected. IOW, XFAIL is same as
PASS. So make it green.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Currently, ftracetest will return 1 (failure) if any unresolved cases
are encountered. The unresolved status results from modules and
programs not being available, and as such does not indicate any
issues with ftrace itself. As such, change the behaviour of
ftracetest in line with unsupported cases; if unsupported cases
happen, ftracetest still returns 0 unless --fail-unsupported. Here
--fail-unresolved is added and the default is to return 0 if
unresolved results occur.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
wakeup_rt.tc and wakeup.tc tests in tracers/ subdirectory
fail due to the chrt command returning:
chrt: failed to set pid 0's policy: Operation not permitted.
To work around this, temporarily disable grout RT scheduling
during ftracetest execution. Restore original value on
test run completion. With these changes in place, both
tests consistently pass.
Fixes: c575dea2c1a5 ("selftests/ftrace: Add wakeup_rt tracer testcase")
Fixes: c1edd060b413 ("selftests/ftrace: Add wakeup tracer testcase")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
| |\ \ \ \ \ \ \
| | |_|_|/ / / /
| |/| | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, mostly for ARM and AMD, and more documentation.
Slightly bigger than usual because I couldn't send out what was
pending for rc4, but there is nothing worrisome going on. I have more
fixes pending for guest debugging support (gdbstub) but I will send
them next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
KVM: selftests: Fix build for evmcs.h
kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
docs/virt/kvm: Document configuring and running nested guests
KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
KVM: x86: Fixes posted interrupt check for IRQs delivery modes
KVM: SVM: fill in kvm_run->debug.arch.dr[67]
KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
KVM: arm64: Fix 32bit PC wrap-around
KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
KVM: arm64: Save/restore sp_el0 as part of __guest_enter
KVM: arm64: Delete duplicated label in invalid_vector
KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
...
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
I got this error when building kvm selftests:
/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: multiple definition of `current_evmcs'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: first defined here
/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: multiple definition of `current_vp_assist'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: first defined here
I think it's because evmcs.h is included both in a test file and a lib file so
the structs have multiple declarations when linking. After all it's not a good
habit to declare structs in the header files.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200504220607.99627-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|\ \ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-15
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 1 day(s) which contain
a total of 67 files changed, 741 insertions(+), 252 deletions(-).
The main changes are:
1) bpf_xdp_adjust_tail() now allows to grow the tail as well, from Jesper.
2) bpftool can probe CONFIG_HZ, from Daniel.
3) CAP_BPF is introduced to isolate user processes that use BPF infra and
to secure BPF networking services by dropping CAP_SYS_ADMIN requirement
in certain cases, from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Make all test_verifier test exercise CAP_BPF and CAP_PERFMON
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200513230355.7858-4-alexei.starovoitov@gmail.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
In Cilium we've recently switched to make use of bpf_jiffies64() for
parts of our tc and XDP datapath since bpf_ktime_get_ns() is more
expensive and high-precision is not needed for our timeouts we have
anyway. Our agent has a probe manager which picks up the json of
bpftool's feature probe and we also use the macro output in our C
programs e.g. to have workarounds when helpers are not available on
older kernels.
Extend the kernel config info dump to also include the kernel's
CONFIG_HZ, and rework the probe_kernel_image_config() for allowing a
macro dump such that CONFIG_HZ can be propagated to BPF C code as a
simple define if available via config. Latter allows to have _compile-
time_ resolution of jiffies <-> sec conversion in our code since all
are propagated as known constants.
Given we cannot generally assume availability of kconfig everywhere,
we also have a kernel hz probe [0] as a fallback. Potentially, bpftool
could have an integrated probe fallback as well, although to derive it,
we might need to place it under 'bpftool feature probe full' or similar
given it would slow down the probing process overall. Yet 'full' doesn't
fit either for us since we don't want to pollute the kernel log with
warning messages from bpf_probe_write_user() and bpf_trace_printk() on
agent startup; I've left it out for the time being.
[0] https://github.com/cilium/cilium/blob/master/bpf/cilium-probe-kernel-hz.c
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200513075849.20868-1-daniel@iogearbox.net
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Extend BPF selftest xdp_adjust_tail with grow tail tests, which is added
as subtest's. The first grow test stays in same form as original shrink
test. The second grow test use the newer bpf_prog_test_run_xattr() calls,
and does extra checking of data contents.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158945350567.97035.9632611946765811876.stgit@firesoul
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Current selftest for BPF-helper xdp_adjust_tail only shrink tail.
Make it more clear that this is a shrink test case.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158945350058.97035.17280775016196207372.stgit@firesoul
|
|/ / / / / / / /
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Implement two basic tests to verify terse dump functionality of flower
classifier:
- Test that verifies that terse dump works.
- Test that verifies that terse dump doesn't print filter key.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-14
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON.
2) support for narrow loads in bpf_sock_addr progs and additional
helpers in cg-skb progs, from Andrey.
3) bpf benchmark runner, from Andrii.
4) arm and riscv JIT optimizations, from Luke.
5) bpf iterator infrastructure, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Test bpf_sk_lookup_tcp, bpf_sk_release, bpf_sk_cgroup_id and
bpf_sk_ancestor_cgroup_id helpers from cgroup skb program.
The test creates a testing cgroup, starts a TCPv6 server inside the
cgroup and creates two client sockets: one inside testing cgroup and one
outside.
Then it attaches cgroup skb program to the cgroup that checks all TCP
segments coming to the server and allows only those coming from the
cgroup of the server. If a segment comes from a peer outside of the
cgroup, it'll be dropped.
Finally the test checks that client from inside testing cgroup can
successfully connect to the server, but client outside the cgroup fails
to connect by timeout.
The main goal of the test is to check newly introduced
bpf_sk_{,ancestor_}cgroup_id helpers.
It also checks a couple of socket lookup helpers (tcp & release), but
lookup helpers were introduced much earlier and covered by other tests.
Here it's mostly checked that they can be called from cgroup skb.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/171f4c5d75e8ff4fe1c4e8c1c12288b5240a4549.1589486450.git.rdna@fb.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Add two new network helpers.
connect_fd_to_fd connects an already created client socket fd to address
of server fd. Sometimes it's useful to separate client socket creation
and connecting this socket to a server, e.g. if client socket has to be
created in a cgroup different from that of server cgroup.
Additionally connect_to_fd is now implemented using connect_fd_to_fd,
both helpers don't treat EINPROGRESS as an error and let caller decide
how to proceed with it.
connect_wait is a helper to work with non-blocking client sockets so
that if connect_to_fd or connect_fd_to_fd returned -1 with errno ==
EINPROGRESS, caller can wait for connect to finish or for connection
timeout. The helper returns -1 on error, 0 on timeout (1sec,
hard-coded), and positive number on success.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1403fab72300f379ca97ead4820ae43eac4414ef.1589486450.git.rdna@fb.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
With having ability to lookup sockets in cgroup skb programs it becomes
useful to access cgroup id of retrieved sockets so that policies can be
implemented based on origin cgroup of such socket.
For example, a container running in a cgroup can have cgroup skb ingress
program that can lookup peer socket that is sending packets to a process
inside the container and decide whether those packets should be allowed
or denied based on cgroup id of the peer.
More specifically such ingress program can implement intra-host policy
"allow incoming packets only from this same container and not from any
other container on same host" w/o relying on source IP addresses since
quite often it can be the case that containers share same IP address on
the host.
Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and
bpf_sk_ancestor_cgroup_id().
These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id
helpers with the only difference that sk is used to get cgroup id
instead of skb, and share code with them.
See documentation in UAPI for more details.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
There is a spelling mistake in an error message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200514121529.259668-1-colin.king@canonical.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Test 1,2,4-byte loads from bpf_sock_addr.user_port in sock_addr
programs.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/e5c734a58cca4041ab30cb5471e644246f8cdb5a.1589420814.git.rdna@fb.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly
code in BPF programs, like:
volatile __u32 user_port = ctx->user_port;
__u16 port = bpf_ntohs(user_port);
Since otherwise clang may optimize the load to be 2-byte and it's
rejected by verifier.
Add support for 1- and 2-byte loads same way as it's supported for other
fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This is to be consistent with tracing and lsm programs
which have prefix "bpf_trace_" and "bpf_lsm_" respectively.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513180216.2949387-1-yhs@fb.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Commit 6879c042e105 ("tools/bpf: selftests: Add bpf_iter selftests")
added self tests for bpf_iter feature. But two subtests
ipv6_route and netlink needs llvm latest 10.x release branch
or trunk due to a bug in llvm BPF backend. This patch added
the file README.rst to document these two failures
so people using llvm 10.0.0 can be aware of them.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200513180215.2949237-1-yhs@fb.com
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
It is sometimes desirable to be able to trigger BPF program from user-space
with minimal overhead. sys_enter would seem to be a good candidate, yet in
a lot of cases there will be a lot of noise from syscalls triggered by other
processes on the system. So while searching for low-overhead alternative, I've
stumbled upon getpgid() syscall, which seems to be specific enough to not
suffer from accidental syscall by other apps.
This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe,
fentry and fmod_ret with returning error (so that syscall would not be
executed), to determine the lowest-overhead way. Here are results on my
machine (using benchs/run_bench_trigger.sh script):
base : 9.200 ± 0.319M/s
tp : 6.690 ± 0.125M/s
rawtp : 8.571 ± 0.214M/s
kprobe : 6.431 ± 0.048M/s
fentry : 8.955 ± 0.241M/s
fmodret : 8.903 ± 0.135M/s
So it seems like fmodret doesn't give much benefit for such lightweight
syscall. Raw tracepoint is pretty decent despite additional filtering logic,
but it will be called for any other syscall in the system, which rules it out.
Fentry, though, seems to be adding the least amoung of overhead and achieves
97.3% of performance of baseline no-BPF-attached syscall.
Using getpgid() seems to be preferable to set_task_comm() approach from
test_overhead, as it's about 2.35x faster in a baseline performance.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
|