summaryrefslogtreecommitdiffstats
path: root/kernel (follow)
Commit message (Collapse)AuthorAgeFilesLines
* alarmtimer: Switch over to generic set/get/rearm routineThomas Gleixner2017-06-043-110/+29
| | | | | | | | | | | | All required callbacks are in place. Switch the alarm timer based posix interval timer callbacks to the common implementation and remove the incorrect private implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.825471962@linutronix.de
* alarmtimer: Implement arm callbackThomas Gleixner2017-06-041-0/+22
| | | | | | | | | | Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.747567162@linutronix.de
* alarmtimer: Implement try_to_cancel callbackThomas Gleixner2017-06-041-0/+10
| | | | | | | | | | Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.670026824@linutronix.de
* alarmtimer: Implement remaining callbackThomas Gleixner2017-06-041-9/+22
| | | | | | | | | | Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.592676753@linutronix.de
* alarmtimer: Implement forward callbackThomas Gleixner2017-06-041-0/+13
| | | | | | | | | | Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.513694229@linutronix.de
* alarmtimer: Implement timer_rearm() callbackThomas Gleixner2017-06-041-1/+14
| | | | | | | | | | Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.434598989@linutronix.de
* posix-timers: Make use of cancel/arm callbacksThomas Gleixner2017-06-041-81/+100
| | | | | | | | | | | | Replace the hrtimer calls by calls to the new try_to_cancel()/arm() kclock callbacks and move the hrtimer specific implementation into the corresponding callback functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.355396667@linutronix.de
* posix-timers: Add cancel/arm callbacksThomas Gleixner2017-06-041-0/+3
| | | | | | | | | | | | Add timer_try_to_cancel() and timer_arm() callbacks to kclock which allow to make common_timer_set() usable by both hrtimer and alarmtimer based clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.278022962@linutronix.de
* posix-timers: Zero settings value in common codeThomas Gleixner2017-06-042-6/+2
| | | | | | | | | | | Zero out the settings struct in the common code so the callbacks do not have to do it themself. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.200870713@linutronix.de
* posix-timers: Make use of forward/remaining callbacksThomas Gleixner2017-06-041-15/+49
| | | | | | | | | | | | Replace the hrtimer calls by calls to the new forward/remaining kclock callbacks and move the hrtimer specific implementation into the corresponding callback functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.121437232@linutronix.de
* posix-timers: Add forward/remaining callbacksThomas Gleixner2017-06-041-0/+2
| | | | | | | | | | | Add two callbacks to kclock which allow using common_)timer_get() for both hrtimer and alarm timer based clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.044915536@linutronix.de
* posix-timers: Add active flag to k_itimerThomas Gleixner2017-06-041-1/+7
| | | | | | | | | | | | Keep track of the activation state of posix timers. This is a preparatory change for making common_timer_get() usable by both hrtimer and alarm timer implementations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.967783982@linutronix.de
* posix-timers: Use timer_rearm() callback in posixtimer_rearm()Thomas Gleixner2017-06-043-10/+11
| | | | | | | | | | | | | Use the new timer_rearm() callback to replace the conditional hardcoded calls into the hrtimer and cpu timer code. This allows later to bring the same logic to alarmtimers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.889661919@linutronix.de
* posix-timers: Rename do_schedule_next_timerThomas Gleixner2017-06-043-7/+7
| | | | | | | | | | | That function is a misnomer. Rename it with a proper prefix to posixtimer_rearm(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.811362578@linutronix.de
* posix-timers: Add timer_rearm() callbackThomas Gleixner2017-06-041-15/+18
| | | | | | | | | | | Add a timer_rearm() callback which is used to make the rescheduling of posix interval timers independent of the underlying clock implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.732632167@linutronix.de
* posix-timers: Store k_clock pointer in k_itimerThomas Gleixner2017-06-042-3/+6
| | | | | | | | | | | | Having the k_clock pointer in the k_itimer struct avoids the lookup in several code pathes and makes the next steps of unification of the hrtimer and alarmtimer based posix timers simpler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.641222072@linutronix.de
* posix-timers: Move interval out of the unionThomas Gleixner2017-06-042-17/+16
| | | | | | | | | | | | | | Preparatory patch to unify the alarm timer and hrtimer based posix interval timer handling. The interval is used as a criteria for rearming decisions so moving it out of the clock specific data structures allows later unification. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.563922908@linutronix.de
* posix-timers: Unify overrun/requeue_pending handlingThomas Gleixner2017-06-042-18/+15
| | | | | | | | | | | | | hrtimer based posix-timers and posix-cpu-timers handle the update of the rearming and overflow related status fields differently. Move that update to the common rearming code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.484936964@linutronix.de
* posix-timers: Move posix-timer internals to coreThomas Gleixner2017-06-045-0/+36
| | | | | | | | | | | | None of these declarations is required outside of kernel/time. Move them to an internal header. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170530211656.394803853@linutronix.de
* posix-timers: Avoid gazillions of forward declarationsThomas Gleixner2017-06-041-101/+89
| | | | | | | | | | | Move it below the actual implementations as there are new callbacks coming which would require even more forward declarations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.238209952@linutronix.de
* posix-clocks: Remove interval timer facility and mmap/fasync callbacksThomas Gleixner2017-06-041-113/+0
| | | | | | | | | | | | | | | | | | | The only user of this facility is ptp_clock, which does not implement any of those functions. Remove them to prevent accidental users. Especially the interval timer interfaces are now more or less impossible to implement because the necessary infrastructure has been confined to the core code. Aside of that it's really complex to make these callbacks implemented according to spec as the alarm timer implementation demonstrates. If at all then a nanosleep callback might be a reasonable extension. For now keep just what ptp_clock needs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.145036286@linutronix.de
* posix-timers: Remove unused export of posix_timer_event()Thomas Gleixner2017-06-041-1/+0
| | | | | | | | | | Since the removal of the mmtimer driver the export is not longer needed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.052744418@linutronix.de
* alarmtimer: Remove pointless config conditionalThomas Gleixner2017-06-041-2/+1
| | | | | | | | | | | Having a IF_ENABLED(CONFIG_POSIX_TIMERS) inside of a #ifdef CONFIG_POSIX_TIMERS section is pointless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211655.975218056@linutronix.de
* Merge branch 'timers/urgent' into WIP.timersThomas Gleixner2017-06-0414-70/+114
|\ | | | | | | Pick up urgent fixes to avoid conflicts.
| * alarmtimer: Rate limit periodic intervalsThomas Gleixner2017-06-041-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The alarmtimer code has another source of potentially rearming itself too fast. Interval timers with a very samll interval have a similar CPU hog effect as the previously fixed overflow issue. The reason is that alarmtimers do not implement the normal protection against this kind of problem which the other posix timer use: timer expires -> queue signal -> deliver signal -> rearm timer This scheme brings the rearming under scheduler control and prevents permanently firing timers which hog the CPU. Bringing this scheme to the alarm timer code is a major overhaul because it lacks all the necessary mechanisms completely. So for a quick fix limit the interval to one jiffie. This is not problematic in practice as alarmtimers are usually backed by an RTC for suspend which have 1 second resolution. It could be therefor argued that the resolution of this clock should be set to 1 second in general, but that's outside the scope of this fix. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kostya Serebryany <kcc@google.com> Cc: syzkaller <syzkaller@googlegroups.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170530211655.896767100@linutronix.de
| * alarmtimer: Prevent overflow of relative timersThomas Gleixner2017-06-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrey reported a alartimer related RCU stall while fuzzing the kernel with syzkaller. The reason for this is an overflow in ktime_add() which brings the resulting time into negative space and causes immediate expiry of the timer. The following rearm with a small interval does not bring the timer back into positive space due to the same issue. This results in a permanent firing alarmtimer which hogs the CPU. Use ktime_add_safe() instead which detects the overflow and clamps the result to KTIME_SEC_MAX. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kostya Serebryany <kcc@google.com> Cc: syzkaller <syzkaller@googlegroups.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170530211655.802921648@linutronix.de
| * Merge branch 'for-linus' of ↵Linus Torvalds2017-06-021-0/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fix from Jiri Kosina: "Kconfig dependency fix for livepatching infrastructure from Miroslav Benes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS
| | * livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMSMiroslav Benes2017-05-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If TRIM_UNUSED_KSYMS is enabled, all unneeded exported symbols are made unexported. Two-pass build of the kernel is done to find out which symbols are needed based on a configuration. This effectively complicates things for out-of-tree modules. Livepatch exports functions to (un)register and enable/disable a live patch. The only in-tree module which uses these functions is a sample in samples/livepatch/. If the sample is disabled, the functions are trimmed and out-of-tree live patches cannot be built. Note that live patches are intended to be built out-of-tree. Suggested-by: Michal Marek <mmarek@suse.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Jessica Yu <jeyu@redhat.com> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| * | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2017-05-271-8/+16
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixlet from Thomas Gleixner: "Silence dmesg spam by making the posix cpu timer printks depend on print_fatal_signals" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-timers: Make signal printks conditional
| | * | posix-timers: Make signal printks conditionalThomas Gleixner2017-05-231-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent commit added extra printks for CPU/RT limits. This can result in excessive spam in dmesg. Make the printks conditional on print_fatal_signals. Fixes: e7ea7c9806a2 ("rlimits: Print more information when CPU/RT limits are exceeded") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arun Raghavan <arun@arunraghavan.net>
| * | | Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2017-05-271-6/+18
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Thomas Gleixner: "A fix for a state leak which was introduced in the recent rework of futex/rtmutex interaction" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()
| | * | | futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()Peter Zijlstra2017-05-221-6/+18
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Markus reported that the glibc/nptl/tst-robustpi8 test was failing after commit: cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()") The following trace shows the problem: ld-linux-x86-64-2161 [019] .... 410.760971: SyS_futex: 00007ffbeb76b028: 80000875 op=FUTEX_LOCK_PI ld-linux-x86-64-2161 [019] ...1 410.760972: lock_pi_update_atomic: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000875 ret=0 ld-linux-x86-64-2165 [011] .... 410.760978: SyS_futex: 00007ffbeb76b028: 80000875 op=FUTEX_UNLOCK_PI ld-linux-x86-64-2165 [011] d..1 410.760979: do_futex: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000871 ret=0 ld-linux-x86-64-2165 [011] .... 410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=0000 ld-linux-x86-64-2161 [019] .... 410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=ETIMEDOUT Task 2165 does an UNLOCK_PI, assigning the lock to the waiter task 2161 which then returns with -ETIMEDOUT. That wrecks the lock state, because now the owner isn't aware it acquired the lock and removes the pending robust list entry. If 2161 is killed, the robust list will not clear out this futex and the subsequent acquire on this futex will then (correctly) result in -ESRCH which is unexpected by glibc, triggers an internal assertion and dies. Task 2161 Task 2165 rt_mutex_wait_proxy_lock() timeout(); /* T2161 is still queued in the waiter list */ return -ETIMEDOUT; futex_unlock_pi() spin_lock(hb->lock); rtmutex_unlock() remove_rtmutex_waiter(T2161); mark_lock_available(); /* Make the next waiter owner of the user space side */ futex_uval = 2161; spin_unlock(hb->lock); spin_lock(hb->lock); rt_mutex_cleanup_proxy_lock() if (rtmutex_owner() !== current) ... return FAIL; .... return -ETIMEOUT; This means that rt_mutex_cleanup_proxy_lock() needs to call try_to_take_rt_mutex() so it can take over the rtmutex correctly which was assigned by the waker. If the rtmutex is owned by some other task then this call is harmless and just confirmes that the waiter is not able to acquire it. While there, fix what looks like a merge error which resulted in rt_mutex_cleanup_proxy_lock() having two calls to fixup_rt_mutex_waiters() and rt_mutex_wait_proxy_lock() not having any. Both should have one, since both potentially touch the waiter list. Fixes: 38d589f2fd08 ("futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()") Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Bug-Spotted-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Link: http://lkml.kernel.org/r/20170519154850.mlomgdsd26drq5j6@hirez.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2017-05-271-5/+12
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kthread fix from Thomas Gleixner: "A single fix which prevents a use after free when kthread fork fails" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kthread: Fix use-after-free if kthread fork fails
| | * | | kthread: Fix use-after-free if kthread fork failsVegard Nossum2017-05-221-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a kthread forks (e.g. usermodehelper since commit 1da5c46fa965) but fails in copy_process() between calling dup_task_struct() and setting p->set_child_tid, then the value of p->set_child_tid will be inherited from the parent and get prematurely freed by free_kthread_struct(). kthread() - worker_thread() - process_one_work() | - call_usermodehelper_exec_work() | - kernel_thread() | - _do_fork() | - copy_process() | - dup_task_struct() | - arch_dup_task_struct() | - tsk->set_child_tid = current->set_child_tid // implied | - ... | - goto bad_fork_* | - ... | - free_task(tsk) | - free_kthread_struct(tsk) | - kfree(tsk->set_child_tid) - ... - schedule() - __schedule() - wq_worker_sleeping() - kthread_data(task)->flags // UAF The problem started showing up with commit 1da5c46fa965 since it reused ->set_child_tid for the kthread worker data. A better long-term solution might be to get rid of the ->set_child_tid abuse. The comment in set_kthread_struct() also looks slightly wrong. Debugged-by: Jamie Iles <jamie.iles@oracle.com> Fixes: 1da5c46fa965 ("kthread: Make struct kthread kmalloc'ed") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jamie Iles <jamie.iles@oracle.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | Merge tag 'trace-v4.12-rc2' of ↵Linus Torvalds2017-05-272-2/+2
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fixes from Steven Rostedt: "There's been a few memory issues found with ftrace. One was simply a memory leak where not all was being freed that should have been in releasing a file pointer on set_graph_function. Then Thomas found that the ftrace trampolines were marked for read/write as well as execute. To shrink the possible attack surface, he added calls to set them to ro. Which also uncovered some other issues with freeing module allocated memory that had its permissions changed. Kprobes had a similar issue which is fixed and a selftest was added to trigger that issue again" * tag 'trace-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: x86/ftrace: Make sure that ftrace trampolines are not RWX x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range() selftests/ftrace: Add a testcase for many kprobe events kprobes/x86: Fix to set RWX bits correctly before releasing trampoline ftrace: Fix memory leak in ftrace_graph_release()
| | * | | | kprobes/x86: Fix to set RWX bits correctly before releasing trampolineMasami Hiramatsu2017-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix kprobes to set(recover) RWX bits correctly on trampoline buffer before releasing it. Releasing readonly page to module_memfree() crash the kernel. Without this fix, if kprobes user register a bunch of kprobes in function body (since kprobes on function entry usually use ftrace) and unregister it, kernel hits a BUG and crash. Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: d0381c81c2f7 ("kprobes/x86: Set kprobes pages read-only") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | * | | | ftrace: Fix memory leak in ftrace_graph_release()Luis Henriques2017-05-271-1/+1
| | | |/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ftrace_hash is being kfree'ed in ftrace_graph_release(), however the ->buckets field is not. This results in a memory leak that is easily captured by kmemleak: unreferenced object 0xffff880038afe000 (size 8192): comm "trace-cmd", pid 238, jiffies 4294916898 (age 9.736s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff815f561e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8113964d>] __kmalloc+0x12d/0x1a0 [<ffffffff810bf6d1>] alloc_ftrace_hash+0x51/0x80 [<ffffffff810c0523>] __ftrace_graph_open.isra.39.constprop.46+0xa3/0x100 [<ffffffff810c05e8>] ftrace_graph_open+0x68/0xa0 [<ffffffff8114003d>] do_dentry_open.isra.1+0x1bd/0x2d0 [<ffffffff81140df7>] vfs_open+0x47/0x60 [<ffffffff81150f95>] path_openat+0x2a5/0x1020 [<ffffffff81152d6a>] do_filp_open+0x8a/0xf0 [<ffffffff811411df>] do_sys_open+0x12f/0x200 [<ffffffff811412ce>] SyS_open+0x1e/0x20 [<ffffffff815fa6e0>] entry_SYSCALL_64_fastpath+0x13/0x94 [<ffffffffffffffff>] 0xffffffffffffffff Link: http://lkml.kernel.org/r/20170525152038.7661-1-lhenriques@suse.com Cc: stable@vger.kernel.org Fixes: b9b0c831bed2 ("ftrace: Convert graph filter to use hash tables") Signed-off-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2017-05-264-30/+29
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix state pruning in bpf verifier wrt. alignment, from Daniel Borkmann. 2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide Caratti. 3) Fix bit field definitions for rss_hash_type of descriptors in mlx5 driver, from Jesper Brouer. 4) Defer slave->link updates until bonding is ready to do a full commit to the new settings, from Nithin Sujir. 5) Properly reference count ipv4 FIB metrics to avoid use after free situations, from Eric Dumazet and several others including Cong Wang and Julian Anastasov. 6) Fix races in llc_ui_bind(), from Lin Zhang. 7) Fix regression of ESP UDP encapsulation for TCP packets, from Steffen Klassert. 8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap. 9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter Dawson. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) ipv4: add reference counting to metrics net: ethernet: ax88796: don't call free_irq without request_irq first ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets sctp: fix ICMP processing if skb is non-linear net: llc: add lock_sock in llc_ui_bind to avoid a race condition bonding: Don't update slave->link until ready to commit test_bpf: Add a couple of tests for BPF_JSGE. bpf: add various verifier test cases bpf: fix wrong exposure of map_flags into fdinfo for lpm bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data bpf: properly reset caller saved regs after helper call and ld_abs/ind bpf: fix incorrect pruning decision when alignment must be tracked arp: fixed -Wuninitialized compiler warning tcp: avoid fastopen API to be used on AF_UNSPEC net: move somaxconn init from sysctl code net: fix potential null pointer dereference geneve: fix fill_info when using collect_metadata virtio-net: enable TSO/checksum offloads for Q-in-Q vlans be2net: Fix offload features for Q-in-Q packets vlan: Fix tcp checksum offloads in Q-in-Q vlans ...
| | * | | | bpf: fix wrong exposure of map_flags into fdinfo for lpmDaniel Borkmann2017-05-253-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via attr->map_flags, since it does not support preallocation yet. We check the flag, but we never copy the flag into trie->map.map_flags, which is later on exposed into fdinfo and used by loaders such as iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test whether a pinned map has the same spec as the one from the BPF obj file and if not, bails out, which is currently the case for lpm since it exposes always 0 as flags. Also copy over flags in array_map_alloc() and stack_map_alloc(). They always have to be 0 right now, but we should make sure to not miss to copy them over at a later point in time when we add actual flags for them to use. Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation") Reported-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | bpf: properly reset caller saved regs after helper call and ld_abs/indDaniel Borkmann2017-05-251-21/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, after performing helper calls, we clear all caller saved registers, that is r0 - r5 and fill r0 depending on struct bpf_func_proto specification. The way we reset these regs can affect pruning decisions in later paths, since we only reset register's imm to 0 and type to NOT_INIT. However, we leave out clearing of other variables such as id, min_value, max_value, etc, which can later on lead to pruning mismatches due to stale data. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | bpf: fix incorrect pruning decision when alignment must be trackedDaniel Borkmann2017-05-251-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when we enforce alignment tracking on direct packet access, the verifier lets the following program pass despite doing a packet write with unaligned access: 0: (61) r2 = *(u32 *)(r1 +76) 1: (61) r3 = *(u32 *)(r1 +80) 2: (61) r7 = *(u32 *)(r1 +8) 3: (bf) r0 = r2 4: (07) r0 += 14 5: (25) if r7 > 0x1 goto pc+4 R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp 6: (2d) if r0 > r3 goto pc+1 R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14) R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp 7: (63) *(u32 *)(r0 -4) = r0 8: (b7) r0 = 0 9: (95) exit from 6 to 8: R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp 8: (b7) r0 = 0 9: (95) exit from 5 to 10: R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R7=inv,min_value=2 R10=fp 10: (07) r0 += 1 11: (05) goto pc-6 6: safe <----- here, wrongly found safe processed 15 insns However, if we enforce a pruning mismatch by adding state into r8 which is then being mismatched in states_equal(), we find that for the otherwise same program, the verifier detects a misaligned packet access when actually walking that path: 0: (61) r2 = *(u32 *)(r1 +76) 1: (61) r3 = *(u32 *)(r1 +80) 2: (61) r7 = *(u32 *)(r1 +8) 3: (b7) r8 = 1 4: (bf) r0 = r2 5: (07) r0 += 14 6: (25) if r7 > 0x1 goto pc+4 R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R7=inv,min_value=0,max_value=1 R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp 7: (2d) if r0 > r3 goto pc+1 R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14) R3=pkt_end R7=inv,min_value=0,max_value=1 R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp 8: (63) *(u32 *)(r0 -4) = r0 9: (b7) r0 = 0 10: (95) exit from 7 to 9: R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R7=inv,min_value=0,max_value=1 R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp 9: (b7) r0 = 0 10: (95) exit from 6 to 11: R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R7=inv,min_value=2 R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp 11: (07) r0 += 1 12: (b7) r8 = 0 13: (05) goto pc-7 <----- mismatch due to r8 7: (2d) if r0 > r3 goto pc+1 R0=pkt(id=0,off=15,r=15) R1=ctx R2=pkt(id=0,off=0,r=15) R3=pkt_end R7=inv,min_value=2 R8=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp 8: (63) *(u32 *)(r0 -4) = r0 misaligned packet access off 2+15+-4 size 4 The reason why we fail to see it in states_equal() is that the third test in compare_ptrs_to_packet() ... if (old->off <= cur->off && old->off >= old->range && cur->off >= cur->range) return true; ... will let the above pass. The situation we run into is that old->off <= cur->off (14 <= 15), meaning that prior walked paths went with smaller offset, which was later used in the packet access after successful packet range check and found to be safe already. For example: Given is R0=pkt(id=0,off=0,r=0). Adding offset 14 as in above program to it, results in R0=pkt(id=0,off=14,r=0) before the packet range test. Now, testing this against R3=pkt_end with 'if r0 > r3 goto out' will transform R0 into R0=pkt(id=0,off=14,r=14) for the case when we're within bounds. A write into the packet at offset *(u32 *)(r0 -4), that is, 2 + 14 -4, is valid and aligned (2 is for NET_IP_ALIGN). After processing this with all fall-through paths, we later on check paths from branches. When the above skb->mark test is true, then we jump near the end of the program, perform r0 += 1, and jump back to the 'if r0 > r3 goto out' test we've visited earlier already. This time, R0 is of type R0=pkt(id=0,off=15,r=0), and we'll prune that part because this time we'll have a larger safe packet range, and we already found that with off=14 all further insn were already safe, so it's safe as well with a larger off. However, the problem is that the subsequent write into the packet with 2 + 15 -4 is then unaligned, and not caught by the alignment tracking. Note that min_align, aux_off, and aux_off_align were all 0 in this example. Since we cannot tell at this time what kind of packet access was performed in the prior walk and what minimal requirements it has (we might do so in the future, but that requires more complexity), fix it to disable this pruning case for strict alignment for now, and let the verifier do check such paths instead. With that applied, the test cases pass and reject the program due to misalignment. Fixes: d1174416747d ("bpf: Track alignment of register values in the verifier.") Reference: http://patchwork.ozlabs.org/patch/761909/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | Merge branch 'for-linus' of ↵Linus Torvalds2017-05-241-7/+13
| |\ \ \ \ \ | | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace fix from Eric Biederman: "This fixes a brown paper bag bug. When I fixed the ptrace interaction with user namespaces I added a new field ptracer_cred in struct_task and I failed to properly initialize it on fork. This dangling pointer wound up breaking runing setuid applications run from the enlightenment window manager. As this is the worst sort of bug. A regression breaking user space for no good reason let's get this fixed" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ptrace: Properly initialize ptracer_cred on fork
| | * | | | ptrace: Properly initialize ptracer_cred on forkEric W. Biederman2017-05-231-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I introduced ptracer_cred I failed to consider the weirdness of fork where the task_struct copies the old value by default. This winds up leaving ptracer_cred set even when a process forks and the child process does not wind up being ptraced. Because ptracer_cred is not set on non-ptraced processes whose parents were ptraced this has broken the ability of the enlightenment window manager to start setuid children. Fix this by properly initializing ptracer_cred in ptrace_init_task This must be done with a little bit of care to preserve the current value of ptracer_cred when ptrace carries through fork. Re-reading the ptracer_cred from the ptracing process at this point is inconsistent with how PT_PTRACE_CAP has been maintained all of these years. Tested-by: Takashi Iwai <tiwai@suse.de> Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * | | | | Merge tag 'pm-4.12-rc3' of ↵Linus Torvalds2017-05-232-5/+4
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix RTC wakeup from suspend-to-idle broken recently, fix CPU idleness detection condition in the schedutil cpufreq governor, fix a cpufreq driver build failure, fix an error code path in the power capping framework, clean up the hibernate core and update the intel_pstate documentation. Specifics: - Fix RTC wakeup from suspend-to-idle broken by the recent rework of ACPI wakeup handling (Rafael Wysocki). - Update intel_pstate driver documentation to reflect the current code and explain how it works in more detail (Rafael Wysocki). - Fix an issue related to CPU idleness detection on systems with shared cpufreq policies in the schedutil governor (Juri Lelli). - Fix a possible build issue in the dbx500 cpufreq driver (Arnd Bergmann). - Fix a function in the power capping framework core to return an error code instead of 0 when there's an error (Dan Carpenter). - Clean up variable definition in the hibernation core (Pushkar Jambhlekar)" * tag 'pm-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: dbx500: add a Kconfig symbol PM / hibernate: Declare variables as static PowerCap: Fix an error code in powercap_register_zone() RTC: rtc-cmos: Fix wakeup from suspend-to-idle PM / wakeup: Fix up wakeup_source_report_event() cpufreq: intel_pstate: Document the current behavior and user interface cpufreq: schedutil: use now as reference when aggregating shared policy requests
| | | \ \ \ \
| | | \ \ \ \
| | *-. \ \ \ \ Merge branches 'pm-sleep' and 'powercap'Rafael J. Wysocki2017-05-221-1/+1
| | |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-sleep: PM / hibernate: Declare variables as static RTC: rtc-cmos: Fix wakeup from suspend-to-idle PM / wakeup: Fix up wakeup_source_report_event() * powercap: PowerCap: Fix an error code in powercap_register_zone()
| | | * | | | | | PM / hibernate: Declare variables as staticPushkar Jambhlekar2017-05-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixing sparse warnings: 'symbol not declared. Should it be static?' Signed-off-by: Pushkar Jambhlekar <pushkar.iit@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | | | | | |
| | | \ \ \ \ \ \
| | | \ \ \ \ \ \
| | | \ \ \ \ \ \
| | *---. \ \ \ \ \ \ Merge branches 'intel_pstate', 'pm-cpufreq' and 'pm-cpufreq-sched'Rafael J. Wysocki2017-05-221-4/+3
| | |\ \ \ \ \ \ \ \ \ | | | |_|_|_|_|_|/ / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * intel_pstate: cpufreq: intel_pstate: Document the current behavior and user interface * pm-cpufreq: cpufreq: dbx500: add a Kconfig symbol * pm-cpufreq-sched: cpufreq: schedutil: use now as reference when aggregating shared policy requests
| | | | | * | | | | | cpufreq: schedutil: use now as reference when aggregating shared policy requestsJuri Lelli2017-05-051-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, sugov_next_freq_shared() uses last_freq_update_time as a reference to decide when to start considering CPU contributions as stale. However, since last_freq_update_time is set by the last CPU that issued a frequency transition, this might cause problems in certain cases. In practice, the detection of stale utilization values fails whenever the CPU with such values was the last to update the policy. For example (and please note again that the SCHED_CPUFREQ_RT flag is not the problem here, but only the detection of after how much time that flag has to be considered stale), suppose a policy with 2 CPUs: CPU0 | CPU1 | | RT task scheduled | SCHED_CPUFREQ_RT is set | CPU1->last_update = now | freq transition to max | last_freq_update_time = now | more than TICK_NSEC nsecs | a small CFS wakes up | CPU0->last_update = now1 | delta_ns(CPU0) < TICK_NSEC* | CPU0's util is considered | delta_ns(CPU1) = | last_freq_update_time - | CPU1->last_update = 0 | < TICK_NSEC | CPU1 is still considered | CPU1->SCHED_CPUFREQ_RT is set | we stay at max (until CPU1 | exits from idle) | * delta_ns is actually negative as now1 > last_freq_update_time While last_freq_update_time is a sensible reference for rate limiting, it doesn't seem to be useful for working around stale CPU states. Fix the problem by always considering now (time) as the reference for deciding when CPUs have stale contributions. Signed-off-by: Juri Lelli <juri.lelli@arm.com> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2017-05-221-4/+8
| |\ \ \ \ \ \ \ \ \ \ | | |/ / / / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "Mostly netfilter bug fixes in here, but we have some bits elsewhere as well. 1) Don't do SNAT replies for non-NATed connections in IPVS, from Julian Anastasov. 2) Don't delete conntrack helpers while they are still in use, from Liping Zhang. 3) Fix zero padding in xtables's xt_data_to_user(), from Willem de Bruijn. 4) Add proper RCU protection to nf_tables_dump_set() because we cannot guarantee that we hold the NFNL_SUBSYS_NFTABLES lock. From Liping Zhang. 5) Initialize rcv_mss in tcp_disconnect(), from Wei Wang. 6) smsc95xx devices can't handle IPV6 checksums fully, so don't advertise support for offloading them. From Nisar Sayed. 7) Fix out-of-bounds access in __ip6_append_data(), from Eric Dumazet. 8) Make atl2_probe() propagate the error code properly on failures, from Alexey Khoroshilov. 9) arp_target[] in bond_check_params() is used uninitialized. This got changes from a global static to a local variable, which is how this mistake happened. Fix from Jarod Wilson. 10) Fix fallout from unnecessary NULL check removal in cls_matchall, from Jiri Pirko. This is definitely brown paper bag territory..." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) net: sched: cls_matchall: fix null pointer dereference vsock: use new wait API for vsock_stream_sendmsg() bonding: fix randomly populated arp target array net: Make IP alignment calulations clearer. bonding: fix accounting of active ports in 3ad net: atheros: atl2: don't return zero on failure path in atl2_probe() ipv6: fix out of bound writes in __ip6_append_data() bridge: start hello_timer when enabling KERNEL_STP in br_stp_start smsc95xx: Support only IPv4 TCP/UDP csum offload arp: always override existing neigh entries with gratuitous ARP arp: postpone addr_type calculation to as late as possible arp: decompose is_garp logic into a separate function arp: fixed error in a comment tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0 netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT ebtables: arpreply: Add the standard target sanity check netfilter: nf_tables: revisit chain/object refcounting from elements netfilter: nf_tables: missing sanitization in data from userspace netfilter: nf_tables: can't assume lock is acquired when dumping set elems netfilter: synproxy: fix conntrackd interaction ...
| | * | | | | | | | | net: Make IP alignment calulations clearer.David S. Miller2017-05-221-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assignmnet: ip_align = strict ? 2 : NET_IP_ALIGN; in compare_pkt_ptr_alignment() trips up Coverity because we can only get to this code when strict is true, therefore ip_align will always be 2 regardless of NET_IP_ALIGN's value. So just assign directly to '2' and explain the situation in the comment above. Reported-by: "Gustavo A. R. Silva" <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>