| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| | |
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* pm-cpufreq:
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
Revert "cpufreq: schedutil: Reduce frequencies slower"
* pm-cpuidle:
cpuidle: dt: Add missing 'of_node_put()'
* pm-devfreq:
PM / devfreq: exynos-ppmu: Staticize event list
PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
|
| | | |\
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Ingo Molnar:
"An error handling corner case fix"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Drop the device lock on error
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If a custom CPU target is specified and that one is not available _or_
can't be interrupted then the code returns to userland without dropping a
lock as notices by lockdep:
|echo 133 > /sys/devices/system/cpu/cpu7/hotplug/target
| ================================================
| [ BUG: lock held when returning to user space! ]
| ------------------------------------------------
| bash/503 is leaving the kernel with locks still held!
| 1 lock held by bash/503:
| #0: (device_hotplug_lock){+.+...}, at: [<ffffffff815b5650>] lock_device_hotplug_sysfs+0x10/0x40
So release the lock then.
Fixes: 757c989b9994 ("cpu/hotplug: Make target state writeable")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170602142714.3ogo25f2wbq6fjpj@linutronix.de
|
| | | |\ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU fixes from Ingo Molnar:
"Fix an SRCU bug affecting KVM IRQ injection"
* 'rcu-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
srcu: Allow use of Classic SRCU from both process and interrupt context
srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context
|
| | | | |\ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into rcu/urgent
Pull RCU fix from Paul E. McKenney:
" This series enables srcu_read_lock() and srcu_read_unlock() to be used from
interrupt handlers, which fixes a bug in KVM's use of SRCU in delivery
of interrupts to guest OSes. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device. This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq(). If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.
The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case. KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods. It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).
However, the docs are overly conservative. You can have an SRCU instance
only has users in irq context, and you can mix process and irq context
as long as process context users disable interrupts. In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU. For those two implementations, only srcu_read_lock()
is unsafe.
When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller. Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.
Cc: stable@vger.kernel.org
Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: Linu Cherian <linuc.decode@gmail.com>
Suggested-by: Linu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
| | | | | |/
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device. This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq(). If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.
The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case. KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods. It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).
However, the docs are overly conservative. You can have an SRCU instance
only has users in irq context, and you can mix process and irq context
as long as process context users disable interrupts. In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU. For those two implementations, only srcu_read_lock()
is unsafe.
When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller. Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.
Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on
a single variable. Therefore, as Peter Zijlstra pointed out, Tiny SRCU's
implementation already supports mixed-context use of srcu_read_lock()
and srcu_read_unlock(), at least as long as uses of srcu_read_lock()
and srcu_read_unlock() in each handler are nested and paired properly.
In other words, it is still illegal to (say) invoke srcu_read_lock()
in an interrupt handler and to invoke the matching srcu_read_unlock()
in a softirq handler. Therefore, the only change required for Tiny SRCU
is to its comments.
Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: Linu Cherian <linuc.decode@gmail.com>
Suggested-by: Linu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | |\ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This is mostly tooling fixes, plus an instruction pointer filtering
fix.
It's more fixes than usual - Arnaldo got back from a longer vacation
and there was a backlog"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
perf symbols: Kill dso__build_id_is_kmod()
perf symbols: Keep DSO->symtab_type after decompress
perf tests: Decompress kernel module before objdump
perf tools: Consolidate error path in __open_dso()
perf tools: Decompress kernel module when reading DSO data
perf annotate: Use dso__decompress_kmodule_path()
perf tools: Introduce dso__decompress_kmodule_{fd,path}
perf tools: Fix a memory leak in __open_dso()
perf annotate: Fix symbolic link of build-id cache
perf/core: Drop kernel samples even though :u is specified
perf script python: Remove dups in documentation examples
perf script python: Updated trace_unhandled() signature
perf script python: Fix wrong code snippets in documentation
perf script: Fix documentation errors
perf script: Fix outdated comment for perf-trace-python
perf probe: Fix examples section of documentation
perf report: Ensure the perf DSO mapping matches what libdw sees
perf report: Include partial stacks unwound with libdw
perf annotate: Add missing powerpc triplet
perf test: Disable breakpoint signal tests for powerpc
...
|
| | | | |/ /
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When doing sampling, for example:
perf record -e cycles:u ...
On workloads that do a lot of kernel entry/exits we see kernel
samples, even though :u is specified. This is due to skid existing.
This might be a security issue because it can leak kernel addresses even
though kernel sampling support is disabled.
The patch drops the kernel samples if exclude_kernel is specified.
For example, test on Haswell desktop:
perf record -e cycles:u <mgen>
perf report --stdio
Before patch applied:
99.77% mgen mgen [.] buf_read
0.20% mgen mgen [.] rand_buf_init
0.01% mgen [kernel.vmlinux] [k] apic_timer_interrupt
0.00% mgen mgen [.] last_free_elem
0.00% mgen libc-2.23.so [.] __random_r
0.00% mgen libc-2.23.so [.] _int_malloc
0.00% mgen mgen [.] rand_array_init
0.00% mgen [kernel.vmlinux] [k] page_fault
0.00% mgen libc-2.23.so [.] __random
0.00% mgen libc-2.23.so [.] __strcasestr
0.00% mgen ld-2.23.so [.] strcmp
0.00% mgen ld-2.23.so [.] _dl_start
0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4
0.00% mgen ld-2.23.so [.] _start
We can see kernel symbols apic_timer_interrupt and page_fault.
After patch applied:
99.79% mgen mgen [.] buf_read
0.19% mgen mgen [.] rand_buf_init
0.00% mgen libc-2.23.so [.] __random_r
0.00% mgen mgen [.] rand_array_init
0.00% mgen mgen [.] last_free_elem
0.00% mgen libc-2.23.so [.] vfprintf
0.00% mgen libc-2.23.so [.] rand
0.00% mgen libc-2.23.so [.] __random
0.00% mgen libc-2.23.so [.] _int_malloc
0.00% mgen libc-2.23.so [.] _IO_doallocbuf
0.00% mgen ld-2.23.so [.] do_lookup_x
0.00% mgen ld-2.23.so [.] open_verify.constprop.7
0.00% mgen ld-2.23.so [.] _dl_important_hwcaps
0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4
0.00% mgen ld-2.23.so [.] _start
There are only userspace symbols.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: kan.liang@intel.com
Cc: mark.rutland@arm.com
Cc: will.deacon@arm.com
Cc: yao.jin@intel.com
Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | |\ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These revert one problematic commit related to system sleep and fix
one recent intel_pstate regression.
Specifics:
- Revert a recent commit that attempted to avoid spurious wakeups
from suspend-to-idle via ACPI SCI, but introduced regressions on
some systems (Rafael Wysocki).
We will get back to the problem it tried to address in the next
cycle.
- Fix a possible division by 0 during intel_pstate initialization
due to a missing check (Rafael Wysocki)"
* tag 'pm-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle"
cpufreq: intel_pstate: Avoid division by 0 in min_perf_pct_min()
|
| | | | | \ \ | |
| | | | | \ \ | |
| | | | |\ \ \ \
| | | | | |_|_|/
| | | | |/| | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
* intel_pstate:
cpufreq: intel_pstate: Avoid division by 0 in min_perf_pct_min()
* pm-sleep:
Revert "ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle"
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Revert commit eed4d47efe95 (ACPI / sleep: Ignore spurious SCI wakeups
from suspend-to-idle) as it turned out to be premature and triggered
a number of different issues on various systems.
That includes, but is not limited to, premature suspend-to-RAM aborts
on Dell XPS 13 (9343) reported by Dominik.
The issue the commit in question attempted to address is real and
will need to be taken care of going forward, but evidently more work
is needed for this purpose.
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | |\ \ \ \ \
| | | | |_|_|_|/
| | | |/| | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk fix from Petr Mladek:
"This reverts a fix added into 4.12-rc1. It caused the kernel log to be
printed on another console when two consoles of the same type were
defined, e.g. console=ttyS0 console=ttyS1.
This configuration was never supported by kernel itself, but it
started to make sense with systemd. In other words, the commit broke
userspace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
Revert "printk: fix double printing with earlycon"
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
This reverts commit cf39bf58afdaabc0b86f141630fb3fd18190294e.
The commit regression to users that define both console=ttyS1
and console=ttyS0 on the command line, see
https://lkml.kernel.org/r/20170509082915.GA13236@bistromath.localdomain
The kernel log messages always appeared only on one serial port. It is
even documented in Documentation/admin-guide/serial-console.rst:
"Note that you can only define one console per device type (serial,
video)."
The above mentioned commit changed the order in which the command line
parameters are searched. As a result, the kernel log messages go to
the last mentioned ttyS* instead of the first one.
We long thought that using two console=ttyS* on the command line
did not make sense. But then we realized that console= parameters
were handled also by systemd, see
http://0pointer.de/blog/projects/serial-console.html
"By default systemd will instantiate one serial-getty@.service on
the main kernel console, if it is not a virtual terminal."
where
"[4] If multiple kernel consoles are used simultaneously, the main
console is the one listed first in /sys/class/tty/console/active,
which is the last one listed on the kernel command line."
This puts the original report into another light. The system is running
in qemu. The first serial port is used to store the messages into a file.
The second one is used to login to the system via a socket. It depends
on systemd and the historic kernel behavior.
By other words, systemd causes that it makes sense to define both
console=ttyS1 console=ttyS0 on the command line. The kernel fix
caused regression related to userspace (systemd) and need to be
reverted.
In addition, it went out that the fix helped only partially.
The messages still were duplicated when the boot console was
removed early by late_initcall(printk_late_init). Then the entire
log was replayed when the same console was registered as a normal one.
Link: 20170606160339.GC7604@pathway.suse.cz
Cc: Aleksey Makarov <aleksey.makarov@linaro.org>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Robin Murphy <robin.murphy@arm.com>,
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Nair, Jayachandran" <Jayachandran.Nair@cavium.com>
Cc: linux-serial@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Revert commit 39b64aa1c007 (cpufreq: schedutil: Reduce frequencies
slower) that introduced unintentional changes in behavior leading
to adverse effects on some systems.
Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Currently, verifier will reject a program if it contains an
narrower load from the bpf context structure. For example,
__u8 h = __sk_buff->hash, or
__u16 p = __sk_buff->protocol
__u32 sample_period = bpf_perf_event_data->sample_period
which are narrower loads of 4-byte or 8-byte field.
This patch solves the issue by:
. Introduce a new parameter ctx_field_size to carry the
field size of narrower load from prog type
specific *__is_valid_access validator back to verifier.
. The non-zero ctx_field_size for a memory access indicates
(1). underlying prog type specific convert_ctx_accesses
supporting non-whole-field access
(2). the current insn is a narrower or whole field access.
. In verifier, for such loads where load memory size is
less than ctx_field_size, verifier transforms it
to a full field load followed by proper masking.
. Currently, __sk_buff and bpf_perf_event_data->sample_period
are supporting narrowing loads.
. Narrower stores are still not allowed as typical ctx stores
are just normal stores.
Because of this change, some tests in verifier will fail and
these tests are removed. As a bonus, rename some out of bound
__sk_buff->cb access to proper field name and remove two
redundant "skb cb oob" tests.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Right now, we don't reset the id of spilled registers in case of
clear_all_pkt_pointers(). Given pkt_pointers are highly likely to
contain an id, do so by reusing __mark_reg_unknown_value().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Whenever we set the register to the type CONST_IMM, we currently don't
reset the id to 0. id member is not used in CONST_IMM case, so don't
let it become stale, where pruning won't be able to match later on.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
spilled_regs[] state is only used for stack slots of type STACK_SPILL,
never for STACK_MISC. Right now, in states_equal(), even if we have
old and current stack state of type STACK_MISC, we compare spilled_regs[]
for that particular offset. Just skip these like we do everywhere else.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
perf_sample_data consumes 386 bytes on stack, reduce excessive stack
usage and move it to per cpu buffer. It's allowed due to preemption
being disabled for tracing, xdp and tc programs, thus at all times
only one program can run on a specific CPU and programs cannot run
from interrupt. We similarly also handle bpf_pt_regs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \ \ \ \ \
| | |_|/ / / / /
| |/| | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Just some simple overlapping changes in marvell PHY driver
and the DSA core code.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \ \ \ \
| | |_|_|_|/ / /
| |/| | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Two cgroup fixes. One to address RCU delay of cpuset removal affecting
userland visible behaviors. The other fixes a race condition between
controller disable and cgroup removal"
* 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: consider dying css as offline
cgroup: Prevent kill_css() from being called more than once
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
In most cases, a cgroup controller don't care about the liftimes of
cgroups. For the controller, a css becomes online when ->css_online()
is called on it and offline when ->css_offline() is called.
However, cpuset is special in that the user interface it exposes cares
whether certain cgroups exist or not. Combined with the RCU delay
between cgroup removal and css offlining, this can lead to user
visible behavior oddities where operations which should succeed after
cgroup removals fail for some time period. The effects of cgroup
removals are delayed when seen from userland.
This patch adds css_is_dying() which tests whether offline is pending
and updates is_cpuset_online() so that the function returns false also
while offline is pending. This gets rid of the userland visible
delays.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |_|_|/ /
| | |/| | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The kill_css() function may be called more than once under the condition
that the css was killed but not physically removed yet followed by the
removal of the cgroup that is hosting the css. This patch prevents any
harmm from being done when that happens.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v4.5+
|
| |\ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fix from Jiri Kosina:
"Kconfig dependency fix for livepatching infrastructure from Miroslav
Benes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
If TRIM_UNUSED_KSYMS is enabled, all unneeded exported symbols are made
unexported. Two-pass build of the kernel is done to find out which
symbols are needed based on a configuration. This effectively
complicates things for out-of-tree modules.
Livepatch exports functions to (un)register and enable/disable a live
patch. The only in-tree module which uses these functions is a sample in
samples/livepatch/. If the sample is disabled, the functions are
trimmed and out-of-tree live patches cannot be built.
Note that live patches are intended to be built out-of-tree.
Suggested-by: Michal Marek <mmarek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| |\ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixlet from Thomas Gleixner:
"Silence dmesg spam by making the posix cpu timer printks depend on
print_fatal_signals"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-timers: Make signal printks conditional
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
A recent commit added extra printks for CPU/RT limits. This can result in
excessive spam in dmesg.
Make the printks conditional on print_fatal_signals.
Fixes: e7ea7c9806a2 ("rlimits: Print more information when CPU/RT limits are exceeded")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arun Raghavan <arun@arunraghavan.net>
|
| |\ \ \ \ \ \ \ \
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"A fix for a state leak which was introduced in the recent rework of
futex/rtmutex interaction"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()
|
| | |/ / / / / / /
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Markus reported that the glibc/nptl/tst-robustpi8 test was failing after
commit:
cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")
The following trace shows the problem:
ld-linux-x86-64-2161 [019] .... 410.760971: SyS_futex: 00007ffbeb76b028: 80000875 op=FUTEX_LOCK_PI
ld-linux-x86-64-2161 [019] ...1 410.760972: lock_pi_update_atomic: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000875 ret=0
ld-linux-x86-64-2165 [011] .... 410.760978: SyS_futex: 00007ffbeb76b028: 80000875 op=FUTEX_UNLOCK_PI
ld-linux-x86-64-2165 [011] d..1 410.760979: do_futex: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000871 ret=0
ld-linux-x86-64-2165 [011] .... 410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=0000
ld-linux-x86-64-2161 [019] .... 410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=ETIMEDOUT
Task 2165 does an UNLOCK_PI, assigning the lock to the waiter task 2161
which then returns with -ETIMEDOUT. That wrecks the lock state, because now
the owner isn't aware it acquired the lock and removes the pending robust
list entry.
If 2161 is killed, the robust list will not clear out this futex and the
subsequent acquire on this futex will then (correctly) result in -ESRCH
which is unexpected by glibc, triggers an internal assertion and dies.
Task 2161 Task 2165
rt_mutex_wait_proxy_lock()
timeout();
/* T2161 is still queued in the waiter list */
return -ETIMEDOUT;
futex_unlock_pi()
spin_lock(hb->lock);
rtmutex_unlock()
remove_rtmutex_waiter(T2161);
mark_lock_available();
/* Make the next waiter owner of the user space side */
futex_uval = 2161;
spin_unlock(hb->lock);
spin_lock(hb->lock);
rt_mutex_cleanup_proxy_lock()
if (rtmutex_owner() !== current)
...
return FAIL;
....
return -ETIMEOUT;
This means that rt_mutex_cleanup_proxy_lock() needs to call
try_to_take_rt_mutex() so it can take over the rtmutex correctly which was
assigned by the waker. If the rtmutex is owned by some other task then this
call is harmless and just confirmes that the waiter is not able to acquire
it.
While there, fix what looks like a merge error which resulted in
rt_mutex_cleanup_proxy_lock() having two calls to
fixup_rt_mutex_waiters() and rt_mutex_wait_proxy_lock() not having any.
Both should have one, since both potentially touch the waiter list.
Fixes: 38d589f2fd08 ("futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()")
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Bug-Spotted-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Link: http://lkml.kernel.org/r/20170519154850.mlomgdsd26drq5j6@hirez.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| |\ \ \ \ \ \ \ \
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kthread fix from Thomas Gleixner:
"A single fix which prevents a use after free when kthread fork fails"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kthread: Fix use-after-free if kthread fork fails
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
If a kthread forks (e.g. usermodehelper since commit 1da5c46fa965) but
fails in copy_process() between calling dup_task_struct() and setting
p->set_child_tid, then the value of p->set_child_tid will be inherited
from the parent and get prematurely freed by free_kthread_struct().
kthread()
- worker_thread()
- process_one_work()
| - call_usermodehelper_exec_work()
| - kernel_thread()
| - _do_fork()
| - copy_process()
| - dup_task_struct()
| - arch_dup_task_struct()
| - tsk->set_child_tid = current->set_child_tid // implied
| - ...
| - goto bad_fork_*
| - ...
| - free_task(tsk)
| - free_kthread_struct(tsk)
| - kfree(tsk->set_child_tid)
- ...
- schedule()
- __schedule()
- wq_worker_sleeping()
- kthread_data(task)->flags // UAF
The problem started showing up with commit 1da5c46fa965 since it reused
->set_child_tid for the kthread worker data.
A better long-term solution might be to get rid of the ->set_child_tid
abuse. The comment in set_kthread_struct() also looks slightly wrong.
Debugged-by: Jamie Iles <jamie.iles@oracle.com>
Fixes: 1da5c46fa965 ("kthread: Make struct kthread kmalloc'ed")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jamie Iles <jamie.iles@oracle.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| |\ \ \ \ \ \ \ \ \
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fixes from Steven Rostedt:
"There's been a few memory issues found with ftrace.
One was simply a memory leak where not all was being freed that should
have been in releasing a file pointer on set_graph_function.
Then Thomas found that the ftrace trampolines were marked for
read/write as well as execute. To shrink the possible attack surface,
he added calls to set them to ro. Which also uncovered some other
issues with freeing module allocated memory that had its permissions
changed.
Kprobes had a similar issue which is fixed and a selftest was added to
trigger that issue again"
* tag 'trace-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
x86/ftrace: Make sure that ftrace trampolines are not RWX
x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range()
selftests/ftrace: Add a testcase for many kprobe events
kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
ftrace: Fix memory leak in ftrace_graph_release()
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
Fix kprobes to set(recover) RWX bits correctly on trampoline
buffer before releasing it. Releasing readonly page to
module_memfree() crash the kernel.
Without this fix, if kprobes user register a bunch of kprobes
in function body (since kprobes on function entry usually
use ftrace) and unregister it, kernel hits a BUG and crash.
Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devbox
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: d0381c81c2f7 ("kprobes/x86: Set kprobes pages read-only")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
| | | |/ / / / / / /
| | |/| | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
ftrace_hash is being kfree'ed in ftrace_graph_release(), however the
->buckets field is not. This results in a memory leak that is easily
captured by kmemleak:
unreferenced object 0xffff880038afe000 (size 8192):
comm "trace-cmd", pid 238, jiffies 4294916898 (age 9.736s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff815f561e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff8113964d>] __kmalloc+0x12d/0x1a0
[<ffffffff810bf6d1>] alloc_ftrace_hash+0x51/0x80
[<ffffffff810c0523>] __ftrace_graph_open.isra.39.constprop.46+0xa3/0x100
[<ffffffff810c05e8>] ftrace_graph_open+0x68/0xa0
[<ffffffff8114003d>] do_dentry_open.isra.1+0x1bd/0x2d0
[<ffffffff81140df7>] vfs_open+0x47/0x60
[<ffffffff81150f95>] path_openat+0x2a5/0x1020
[<ffffffff81152d6a>] do_filp_open+0x8a/0xf0
[<ffffffff811411df>] do_sys_open+0x12f/0x200
[<ffffffff811412ce>] SyS_open+0x1e/0x20
[<ffffffff815fa6e0>] entry_SYSCALL_64_fastpath+0x13/0x94
[<ffffffffffffffff>] 0xffffffffffffffff
Link: http://lkml.kernel.org/r/20170525152038.7661-1-lhenriques@suse.com
Cc: stable@vger.kernel.org
Fixes: b9b0c831bed2 ("ftrace: Convert graph filter to use hash tables")
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Commit fb9a307d11d6 ("bpf: Allow CGROUP_SKB eBPF program to
access sk_buff") enabled programs of BPF_PROG_TYPE_CGROUP_SKB
type to use ld_abs/ind instructions. However, at this point,
we cannot use them, since offsets relative to SKF_LL_OFF will
end up pointing skb_mac_header(skb) out of bounds since in the
egress path it is not yet set at that point in time, but only
after __dev_queue_xmit() did a general reset on the mac header.
bpf_internal_load_pointer_neg_helper() will then end up reading
data from a wrong offset.
BPF_PROG_TYPE_CGROUP_SKB programs can use bpf_skb_load_bytes()
already to access packet data, which is also more flexible than
the insns carried over from cBPF.
Fixes: fb9a307d11d6 ("bpf: Allow CGROUP_SKB eBPF program to access sk_buff")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
A single BPF_OBJ_GET_INFO_BY_FD cmd is used to obtain the info
for both bpf_prog and bpf_map. The kernel can figure out the
fd is associated with a bpf_prog or bpf_map.
The suggested struct bpf_prog_info and struct bpf_map_info are
not meant to be a complete list and it is not the goal of this patch.
New fields can be added in the future patch.
The focus of this patch is to create the interface,
BPF_OBJ_GET_INFO_BY_FD cmd for exposing the bpf_prog's and
bpf_map's info.
The obj's info, which will be extended (and get bigger) over time, is
separated from the bpf_attr to avoid bloating the bpf_attr.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Add BPF_MAP_GET_FD_BY_ID command to allow user to get a fd
from a bpf_map's ID.
bpf_map_inc_not_zero() is added and is called with map_idr_lock
held.
__bpf_map_put() is also added which has the 'bool do_idr_lock'
param to decide if the map_idr_lock should be acquired when
freeing the map->id.
In the error path of bpf_map_inc_not_zero(), it may have to
call __bpf_map_put(map, false) which does not need
to take the map_idr_lock when freeing the map->id.
It is currently limited to CAP_SYS_ADMIN which we can
consider to lift it in followup patches.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Add BPF_PROG_GET_FD_BY_ID command to allow user to get a fd
from a bpf_prog's ID.
bpf_prog_inc_not_zero() is added and is called with prog_idr_lock
held.
__bpf_prog_put() is also added which has the 'bool do_idr_lock'
param to decide if the prog_idr_lock should be acquired when
freeing the prog->id.
In the error path of bpf_prog_inc_not_zero(), it may have to
call __bpf_prog_put(map, false) which does not need
to take the prog_idr_lock when freeing the prog->id.
It is currently limited to CAP_SYS_ADMIN which we can
consider to lift it in followup patches.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
This patch adds BPF_PROG_GET_NEXT_ID and BPF_MAP_GET_NEXT_ID
to allow userspace to iterate all bpf_prog IDs and bpf_map IDs.
The API is trying to be consistent with the existing
BPF_MAP_GET_NEXT_KEY.
It is currently limited to CAP_SYS_ADMIN which we can
consider to lift it in followup patches.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
This patch generates an unique ID for each created bpf_map.
The approach is similar to the earlier patch for bpf_prog ID.
It is worth to note that the bpf_map's ID and bpf_prog's ID
are in two independent ID spaces and both have the same valid range:
[1, INT_MAX).
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
This patch generates an unique ID for each BPF_PROG_LOAD-ed prog.
It is worth to note that each BPF_PROG_LOAD-ed prog will have
a different ID even they have the same bpf instructions.
The ID is generated by the existing idr_alloc_cyclic().
The ID is ranged from [1, INT_MAX). It is allocated in cyclic manner,
so an ID will get reused every 2 billion BPF_PROG_LOAD.
The bpf_prog_alloc_id() is done after bpf_prog_select_runtime()
because the jit process may have allocated a new prog. Hence,
we need to ensure the value of pointer 'prog' will not be changed
any more before storing the prog to the prog_idr.
After bpf_prog_select_runtime(), the prog is read-only. Hence,
the id is stored in 'struct bpf_prog_aux'.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all
perf_event types, including HW_CACHE, RAW, and dynamic pmu events.
Only tracepoint/kprobe events are treated differently which require
BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly.
Also add support for reading all event counters using
bpf_perf_event_read() helper.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Currently loading a cgroup skb eBPF program require a CAP_SYS_ADMIN
capability while attaching the program to a cgroup only requires the
user have CAP_NET_ADMIN privilege. We can escape the capability
check when load the program just like socket filter program to make
the capability requirement consistent.
Change since v1:
Change the code style in order to be compliant with checkpatch.pl
preference
Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
This allows cgroup eBPF program to classify packet based on their
protocol or other detail information. Currently program need
CAP_NET_ADMIN privilege to attach a cgroup eBPF program, and A
process with CAP_NET_ADMIN can already see all packets on the system,
for example, by creating an iptables rules that causes the packet to
be passed to userspace via NFLOG.
Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
16 __bpf_prog_run() interpreters for various stack sizes add .text
but not a lot comparing to run-time stack savings
text data bss dec hex filename
26350 10328 624 37302 91b6 kernel/bpf/core.o.before_split
25777 10328 624 36729 8f79 kernel/bpf/core.o.after_split
26970 10328 624 37922 9422 kernel/bpf/core.o.now
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The next set of patches will take advantage of stack_depth tracking,
so make sure that the program that does bpf_tail_call() has
stack depth large enough for the callee.
We could have tracked the stack depth of the prog_array owner program
and only allow insertion of the programs with stack depth less
than the owner, but it will break existing applications.
Some of them have trivial root bpf program that only does
multiple bpf_tail_calls and at init time the prog array is empty.
In the future we may add a flag to do such tracking optionally,
but for now play simple and safe.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
teach verifier to track bpf program stack depth
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
split __bpf_prog_run() interpreter into stack allocation and execution parts.
The code section shrinks which helps interpreter performance in some cases.
text data bss dec hex filename
26350 10328 624 37302 91b6 kernel/bpf/core.o.before
25777 10328 624 36729 8f79 kernel/bpf/core.o.after
Very short programs got slower (due to extra function call):
Before:
test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 7 PASS
test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 8 PASS
test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 7 PASS
test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 11 PASS
test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 7 PASS
After:
test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 11 PASS
test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 11 PASS
test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 11 PASS
test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 14 PASS
test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 10 PASS
Longer programs got faster:
Before:
test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 20286 20513 PASS
test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 31853 31768 PASS
test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9815 PASS
test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 6 PASS
test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13959 PASS
test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 210 PASS
test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 21724 PASS
test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19118 PASS
After:
test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 19008 18827 PASS
test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 29238 28450 PASS
test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9485 PASS
test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 12 PASS
test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13257 PASS
test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 213 PASS
test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 19389 PASS
test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19583 PASS
For real world production programs the difference is noise.
This patch is first step towards reducing interpreter stack consumption.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|