summaryrefslogtreecommitdiffstats
path: root/kernel/stop_machine.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-mingo' of ↵Ingo Molnar2019-10-311-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM changes from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Linux-kernel memory consistency model updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * stop_machine: Provide RCU quiescent state in multi_cpu_stop()Paul E. McKenney2019-10-051-0/+1
| | | | | | | | | | | | | | | | | | | | When multi_cpu_stop() loops waiting for other tasks, it can trigger an RCU CPU stall warning. This can be misleading because what is instead needed is information on whatever task is blocking multi_cpu_stop(). This commit therefore inserts an RCU quiescent state into the multi_cpu_stop() function's waitloop. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
* | stop_machine: Avoid potential race behaviourMark Rutland2019-10-171-4/+6
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both multi_cpu_stop() and set_state() access multi_stop_data::state racily using plain accesses. These are subject to compiler transformations which could break the intended behaviour of the code, and this situation is detected by KCSAN on both arm64 and x86 (splats below). Improve matters by using READ_ONCE() and WRITE_ONCE() to ensure that the compiler cannot elide, replay, or tear loads and stores. In multi_cpu_stop() the two loads of multi_stop_data::state are expected to be a consistent value, so snapshot the value into a temporary variable to ensure this. The state transitions are serialized by atomic manipulation of multi_stop_data::num_threads, and other fields in multi_stop_data are not modified while subject to concurrent reads. KCSAN splat on arm64: | BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0 | | write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3: | set_state+0x80/0xb0 | multi_cpu_stop+0x16c/0x198 | cpu_stopper_thread+0x170/0x298 | smpboot_thread_fn+0x40c/0x560 | kthread+0x1a8/0x1b0 | ret_from_fork+0x10/0x18 | | read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1: | multi_cpu_stop+0xa8/0x198 | cpu_stopper_thread+0x170/0x298 | smpboot_thread_fn+0x40c/0x560 | kthread+0x1a8/0x1b0 | ret_from_fork+0x10/0x18 | | Reported by Kernel Concurrency Sanitizer on: | CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3 | Hardware name: linux,dummy-virt (DT) KCSAN splat on x86: | write to 0xffffb0bac0013e18 of 4 bytes by task 19 on cpu 2: | set_state kernel/stop_machine.c:170 [inline] | ack_state kernel/stop_machine.c:177 [inline] | multi_cpu_stop+0x1a4/0x220 kernel/stop_machine.c:227 | cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516 | smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165 | kthread+0x1b5/0x200 kernel/kthread.c:255 | ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352 | | read to 0xffffb0bac0013e18 of 4 bytes by task 44 on cpu 7: | multi_cpu_stop+0xb4/0x220 kernel/stop_machine.c:213 | cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516 | smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165 | kthread+0x1b5/0x200 kernel/kthread.c:255 | ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352 | | Reported by Kernel Concurrency Sanitizer on: | CPU: 7 PID: 44 Comm: migration/7 Not tainted 5.3.0+ #1 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20191007104536.27276-1-mark.rutland@arm.com
* stop_machine: Fix stop_cpus_in_progress orderingPeter Zijlstra2019-08-081-0/+2
| | | | | | | | | | | | | Make sure the entire for loop has stop_cpus_in_progress set. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aaron Lu <aaron.lwe@gmail.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: mingo@kernel.org Cc: Phil Auld <pauld@redhat.com> Cc: Julien Desfossez <jdesfossez@digitalocean.com> Cc: Nishanth Aravamudan <naravamudan@digitalocean.com> Link: https://lkml.kernel.org/r/0fd8fd4b99b9b9aa88d8b2dff897f7fd0d88f72c.1559129225.git.vpillai@digitalocean.com
* processor: get rid of cpu_relax_yieldHeiko Carstens2019-06-151-1/+6
| | | | | | | | | | | stop_machine is the only user left of cpu_relax_yield. Given that it now has special semantics which are tied to stop_machine introduce a weak stop_machine_yield function which architectures can override, and get rid of the generic cpu_relax_yield implementation. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* s390: improve wait logic of stop_machineMartin Schwidefsky2019-06-151-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The stop_machine loop to advance the state machine and to wait for all affected CPUs to check-in calls cpu_relax_yield in a tight loop until the last missing CPUs acknowledged the state transition. On a virtual system where not all logical CPUs are backed by real CPUs all the time it can take a while for all CPUs to check-in. With the current definition of cpu_relax_yield a diagnose 0x44 is done which tells the hypervisor to schedule *some* other CPU. That can be any CPU and not necessarily one of the CPUs that need to run in order to advance the state machine. This can lead to a pretty bad diagnose 0x44 storm until the last missing CPU finally checked-in. Replace the undirected cpu_relax_yield based on diagnose 0x44 with a directed yield. Each CPU in the wait loop will pick up the next CPU in the cpumask of stop_machine. The diagnose 0x9c is used to tell the hypervisor to run this next CPU instead of the current one. If there is only a limited number of real CPUs backing the virtual CPUs we end up with the real CPUs passed around in a round-robin fashion. [heiko.carstens@de.ibm.com]: Use cpumask_next_wrap as suggested by Peter Zijlstra. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 38Thomas Gleixner2019-05-241-2/+1
| | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this file is released under the gplv2 and any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170857.732920462@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* treewide: Switch printk users from %pf and %pF to %ps and %pS, respectivelySakari Ailus2019-04-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | %pF and %pf are functionally equivalent to %pS and %ps conversion specifiers. The former are deprecated, therefore switch the current users to use the preferred variant. The changes have been produced by the following command: git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \ while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done And verifying the result. Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: xen-devel@lists.xenproject.org Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: drbd-dev@lists.linbit.com Cc: linux-block@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvdimm@lists.01.org Cc: linux-pci@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-btrfs@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-mm@kvack.org Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: David Sterba <dsterba@suse.com> (for btrfs) Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c) Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci) Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
* Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2018-08-131-18/+23
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Cleanup and improvement of NUMA balancing - Refactoring and improvements to the PELT (Per Entity Load Tracking) code - Watchdog simplification and related cleanups - The usual pile of small incremental fixes and improvements * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) watchdog: Reduce message verbosity stop_machine: Reflow cpu_stop_queue_two_works() sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() sched/numa: Use group_weights to identify if migration degrades locality sched/numa: Update the scan period without holding the numa_group lock sched/numa: Remove numa_has_capacity() sched/numa: Modify migrate_swap() to accept additional parameters sched/numa: Remove unused task_capacity from 'struct numa_stats' sched/numa: Skip nodes that are at 'hoplimit' sched/debug: Reverse the order of printing faults sched/numa: Use task faults only if numa_group is not yet set up sched/numa: Set preferred_node based on best_cpu sched/numa: Simplify load_too_imbalanced() sched/numa: Evaluate move once per node sched/numa: Remove redundant field sched/debug: Show the sum wait time of a task group sched/fair: Remove #ifdefs from scale_rt_capacity() sched/core: Remove get_cpu() from sched_fork() sched/cpufreq: Clarify sugov_get_util() sched/sysctl: Remove unused sched_time_avg_ms sysctl ...
| * stop_machine: Reflow cpu_stop_queue_two_works()Peter Zijlstra2018-08-021-18/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by lifting the preempt_disable() to the top to create more natural nesting wrt the spinlocks and make the wake_up_q() and preempt_enable() unconditional at the end. Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait with preemption enabled. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: isaacm@codeaurora.org Cc: matt@codeblueprint.co.uk Cc: psodagud@codeaurora.org Cc: gregkh@linuxfoundation.org Cc: pkondeti@codeaurora.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
* | stop_machine: Atomically queue and wake stopper threadsPrasad Sodagudi2018-08-051-0/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When cpu_stop_queue_work() releases the lock for the stopper thread that was queued into its wake queue, preemption is enabled, which leads to the following deadlock: CPU0 CPU1 sched_setaffinity(0, ...) __set_cpus_allowed_ptr() stop_one_cpu(0, ...) stop_two_cpus(0, 1, ...) cpu_stop_queue_work(0, ...) cpu_stop_queue_two_works(0, ..., 1, ...) -grabs lock for migration/0- -spins with preemption disabled, waiting for migration/0's lock to be released- -adds work items for migration/0 and queues migration/0 to its wake_q- -releases lock for migration/0 and preemption is enabled- -current thread is preempted, and __set_cpus_allowed_ptr has changed the thread's cpu allowed mask to CPU1 only- -acquires migration/0 and migration/1's locks- -adds work for migration/0 but does not add migration/0 to wake_q, since it is already in a wake_q- -adds work for migration/1 and adds migration/1 to its wake_q- -releases migration/0 and migration/1's locks, wakes migration/1, and enables preemption- -since migration/1 is requested to run, migration/1 begins to run and waits on migration/0, but migration/0 will never be able to run, since the thread that can wake it is affine to CPU1- Disable preemption in cpu_stop_queue_work() before queueing works for stopper threads, and queueing the stopper thread in the wake queue, to ensure that the operation of queueing the works and waking the stopper threads is atomic. Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock") Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: matt@codeblueprint.co.uk Cc: bigeasy@linutronix.de Cc: gregkh@linuxfoundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org Co-Developed-by: Isaac J. Manjarres <isaacm@codeaurora.org>
* stop_machine: Disable preemption after queueing stopper threadsIsaac J. Manjarres2018-07-251-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads") does not fully address the race condition that can occur as follows: On one CPU, call it CPU 3, thread 1 invokes cpu_stop_queue_two_works(2, 3,...), and the execution is such that thread 1 queues the works for migration/2 and migration/3, and is preempted after releasing the locks for migration/2 and migration/3, but before waking the threads. Then, On CPU 2, a kworker, call it thread 2, is running, and it invokes cpu_stop_queue_two_works(1, 2,...), such that thread 2 queues the works for migration/1 and migration/2. Meanwhile, on CPU 3, thread 1 resumes execution, and wakes migration/2 and migration/3. This means that when CPU 2 releases the locks for migration/1 and migration/2, but before it wakes those threads, it can be preempted by migration/2. If thread 2 is preempted by migration/2, then migration/2 will execute the first work item successfully, since migration/3 was woken up by CPU 3, but when it goes to execute the second work item, it disables preemption, calls multi_cpu_stop(), and thus, CPU 2 will wait forever for migration/1, which should have been woken up by thread 2. However migration/1 cannot be woken up by thread 2, since it is a kworker, so it is affine to CPU 2, but CPU 2 is running migration/2 with preemption disabled, so thread 2 will never run. Disable preemption after queueing works for stopper threads to ensure that the operation of queueing the works and waking the stopper threads is atomic. Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org> Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: gregkh@linuxfoundation.org Cc: matt@codeblueprint.co.uk Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads") Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Disable preemption when waking two stopper threadsIsaac J. Manjarres2018-07-151-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When cpu_stop_queue_two_works() begins to wake the stopper threads, it does so without preemption disabled, which leads to the following race condition: The source CPU calls cpu_stop_queue_two_works(), with cpu1 as the source CPU, and cpu2 as the destination CPU. When adding the stopper threads to the wake queue used in this function, the source CPU stopper thread is added first, and the destination CPU stopper thread is added last. When wake_up_q() is invoked to wake the stopper threads, the threads are woken up in the order that they are queued in, so the source CPU's stopper thread is woken up first, and it preempts the thread running on the source CPU. The stopper thread will then execute on the source CPU, disable preemption, and begin executing multi_cpu_stop(), and wait for an ack from the destination CPU's stopper thread, with preemption still disabled. Since the worker thread that woke up the stopper thread on the source CPU is affine to the source CPU, and preemption is disabled on the source CPU, that thread will never run to dequeue the destination CPU's stopper thread from the wake queue, and thus, the destination CPU's stopper thread will never run, causing the source CPU's stopper thread to wait forever, and stall. Disable preemption when waking the stopper threads in cpu_stop_queue_two_works(). Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock") Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: matt@codeblueprint.co.uk Cc: bigeasy@linutronix.de Cc: gregkh@linuxfoundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1530655334-4601-1-git-send-email-isaacm@codeaurora.org
* Merge tag 'v4.17-rc5' into locking/core, to pick up fixesIngo Molnar2018-05-151-5/+14
|\ | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlockPeter Zijlstra2018-05-031-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Matt reported the following deadlock: CPU0 CPU1 schedule(.prev=migrate/0) <fault> pick_next_task() ... idle_balance() migrate_swap() active_balance() stop_two_cpus() spin_lock(stopper0->lock) spin_lock(stopper1->lock) ttwu(migrate/0) smp_cond_load_acquire() -- waits for schedule() stop_one_cpu(1) spin_lock(stopper1->lock) -- waits for stopper lock Fix this deadlock by taking the wakeups out from under stopper->lock. This allows the active_balance() to queue the stop work and finish the context switch, which in turn allows the wakeup from migrate_swap() to observe the context and complete the wakeup. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180420095005.GH4064@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | stop_machine: Use raw spinlocksThomas Gleixner2018-04-271-12/+12
|/ | | | | | | | | | | | Use raw-locks in stop_machine() to allow locking in irq-off and preempt-disabled regions on -RT. This also documents the possible locking context in general. [bigeasy: update patch description.] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20180423191635.6014-1-bigeasy@linutronix.de
* stop_machine: Provide stop_machine_cpuslocked()Sebastian Andrzej Siewior2017-05-261-4/+7
| | | | | | | | | | | | | | | | | | | | | Some call sites of stop_machine() are within a get_online_cpus() protected region. stop_machine() calls get_online_cpus() as well, which is possible in the current implementation but prevents converting the hotplug locking to a percpu rwsem. Provide stop_machine_cpuslocked() to avoid nested calls to get_online_cpus(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081547.400700852@linutronix.de
* locking/core, stop_machine: Yield the CPU during stop machine()Christian Borntraeger2016-11-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some time ago the following commit: 57f2ffe14fd125c2 ("s390: remove diag 44 calls from cpu_relax()") ... stopped cpu_relax() on s390 yielding to the hypervisor. As it turns out this made stop_machine() run really slow on virtualized overcommited systems. For example the kprobes test during bootup took several seconds instead of just running unnoticed with large guests. Therefore, yielding was reintroduced with commit: 4d92f50249eb ("s390: reintroduce diag 44 calls for cpu_relax()") ... but in fact the stop machine code seems to be the only place where this yielding was really necessary. This place is probably the most important one as it makes all but one guest CPUs wait for one guest CPU. As we now have cpu_relax_yield(), we can use this in multi_cpu_stop(). For now lets only add it here. We can add it later in other places when necessary. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1477386195-32736-3-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2016-10-031-0/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes are: - irqtime accounting cleanups and enhancements. (Frederic Weisbecker) - schedstat debugging enhancements, make it more broadly runtime available. (Josh Poimboeuf) - More work on asymmetric topology/capacity scheduling. (Morten Rasmussen) - sched/wait fixes and cleanups. (Oleg Nesterov) - PELT (per entity load tracking) improvements. (Peter Zijlstra) - Rewrite and enhance select_idle_siblings(). (Peter Zijlstra) - sched/numa enhancements/fixes (Rik van Riel) - sched/cputime scalability improvements (Stanislaw Gruszka) - Load calculation arithmetics fixes. (Dietmar Eggemann) - sched/deadline enhancements (Tommaso Cucinotta) - Fix utilization accounting when switching to the SCHED_NORMAL policy. (Vincent Guittot) - ... plus misc cleanups and enhancements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) sched/irqtime: Consolidate irqtime flushing code sched/irqtime: Consolidate accounting synchronization with u64_stats API u64_stats: Introduce IRQs disabled helpers sched/irqtime: Remove needless IRQs disablement on kcpustat update sched/irqtime: No need for preempt-safe accessors sched/fair: Fix min_vruntime tracking sched/debug: Add SCHED_WARN_ON() sched/core: Fix set_user_nice() sched/fair: Introduce set_curr_task() helper sched/core, ia64: Rename set_curr_task() sched/core: Fix incorrect utilization accounting when switching to fair class sched/core: Optimize SCHED_SMT sched/core: Rewrite and improve select_idle_siblings() sched/core: Replace sd_busy/nr_busy_cpus with sched_domain_shared sched/core: Introduce 'struct sched_domain_shared' sched/core: Restructure destroy_sched_domain() sched/core: Remove unused @cpu argument from destroy_sched_domain*() sched/wait: Introduce init_wait_entry() sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock() sched/wait: Avoid abort_exclusive_wait() in ___wait_event() ...
| * stop_machine: Avoid a sleep and wakeup in stop_one_cpu()Cheng Chao2016-09-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case @cpu == smp_proccessor_id(), we can avoid a sleep+wakeup cycle by doing a preemption. Callers such as sched_exec() can benefit from this change. Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: chris@chris-wilson.co.uk Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1473818510-6779-1-git-send-email-cs.os.kernel@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()Oleg Nesterov2016-09-221-16/+26
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stop_two_cpus() and stop_cpus() use stop_cpus_lock to avoid the deadlock, we need to ensure that the stopper functions can't be queued "backwards" from one another. This doesn't look nice; if we use lglock then we do not really need stopper->lock, cpu_stop_queue_work() could use lg_local_lock() under local_irq_save(). OTOH it would be even better to avoid lglock in stop_machine.c and remove lg_double_lock(). This patch adds "bool stop_cpus_in_progress" set/cleared by queue_stop_cpus_work(), and changes cpu_stop_queue_two_works() to busy wait until it is cleared. queue_stop_cpus_work() sets stop_cpus_in_progress = T lockless, but after it queues a work on CPU1 it must be visible to stop_two_cpus(CPU1, CPU2) which checks it under the same lock. And since stop_two_cpus() holds the 2nd lock too, queue_stop_cpus_work() can not clear stop_cpus_in_progress if it is also going to queue a work on CPU2, it needs to take that 2nd lock to do this. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151121181148.GA433@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPAREOleg Nesterov2016-07-271-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Suppose that stop_machine(fn) hangs because fn() hangs. In this case NMI hard-lockup can be triggered on another CPU which does nothing wrong and the trace from nmi_panic() won't help to investigate the problem. And this change "fixes" the problem we (seem to) hit in practice. - stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0. - CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft lockup and tries to report the problem. - show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop() which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts. - CPU_1 spends more than 10 seconds trying to flush the log buffer to the slow serial console. - NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic(). Reported-by: Wang Shu <shuwang@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Anderson <anderson@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20160726185736.GB4088@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* kernel/stop_machine.c: remove CONFIG_SMP dependenciesAndrew Morton2016-01-161-4/+0
| | | | | | | | | | | | stop_machine.o is only built if CONFIG_SMP=y, so this ifdef always evaluates to true. [akpm@linux-foundation.org: remove now-unneeded ifdef] Reported-by: Valentin Rothberg <valentinrothberg@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'sched/urgent' into sched/core, to pick up fixes before merging ↵Ingo Molnar2016-01-061-2/+2
|\ | | | | | | | | | | new patches Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * kernel: remove stop_machine() Kconfig dependencyChris Wilson2015-12-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the full stop_machine() routine is only enabled on SMP if module unloading is enabled, or if the CPUs are hotpluggable. This leads to configurations where stop_machine() is broken as it will then only run the callback on the local CPU with irqs disabled, and not stop the other CPUs or run the callback on them. For example, this breaks MTRR setup on x86 in certain configs since ea8596bb2d8d379 ("kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions") as the MTRR is only established on the boot CPU. This patch removes the Kconfig option for STOP_MACHINE and uses the SMP and HOTPLUG_CPU config options to compile the correct stop_machine() for the architecture, removing the false dependency on MODULE_UNLOAD in the process. Link: https://lkml.org/lkml/2014/10/8/124 References: https://bugs.freedesktop.org/show_bug.cgi?id=84794 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | stop_machine: Clean up the usage of the preemption counter in ↵Oleg Nesterov2015-11-231-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_stopper_thread() 1. Change this code to use preempt_count_inc/preempt_count_dec; this way it works even if CONFIG_PREEMPT_COUNT=n, and we avoid the unnecessary __preempt_schedule() check (stop_sched_class is not preemptible). And this makes clear that we only want to make preempt_count() != 0 for __might_sleep() / schedule_debug(). 2. Change WARN_ONCE() to use %pf to print the function name and remove kallsyms_lookup/ksym_buf. 3. Move "int ret" into the "if (work)" block, this looks more consistent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Milos Vyletel <milos@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151115193332.GA8281@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | stop_machine: Shift the 'done != NULL' check from cpu_stop_signal_done() to ↵Oleg Nesterov2015-11-231-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | callers Change cpu_stop_queue_work() and cpu_stopper_thread() to check done != NULL before cpu_stop_signal_done(done). This makes the code more clean imo, note that cpu_stopper_thread() has to do this check anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Milos Vyletel <milos@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151115193329.GA8274@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | stop_machine: Kill cpu_stop_done->executedOleg Nesterov2015-11-231-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that cpu_stop_done->executed becomes write-only (ignoring WARN_ON() checks) we can remove it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Milos Vyletel <milos@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151115193326.GA8269@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | stop_machine: Change __stop_cpus() to rely on cpu_stop_queue_work()Oleg Nesterov2015-11-231-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change queue_stop_cpus_work() to return true if it queues at least one work, this means that the caller should wait. __stop_cpus() can check the value returned by queue_stop_cpus_work() and avoid done.executed, just like stop_one_cpu() does. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Milos Vyletel <milos@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151115193323.GA8262@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | stop_machine: Change stop_one_cpu() to rely on cpu_stop_queue_work()Oleg Nesterov2015-11-231-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change stop_one_cpu() to return -ENOENT if cpu_stop_queue_work() fails. Otherwise we know that ->executed must be true after wait_for_completion() so we can just return done.ret. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Milos Vyletel <milos@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151115193320.GA8259@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | stop_machine: Make cpu_stop_queue_work() and stop_one_cpu_nowait() return boolOleg Nesterov2015-11-231-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change cpu_stop_queue_work() to return true if the work was queued and change stop_one_cpu_nowait() to return the result of cpu_stop_queue_work(). This makes it more useful, for example now you can alloc cpu_stop_work for stop_one_cpu_nowait() and free it in the callback or if stop_one_cpu_nowait() fails, currently this is impossible because you can't know if @fn will be called or not. Also, this allows to kill cpu_stop_done->executed, see the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Milos Vyletel <milos@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151117170523.GA13955@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | stop_machine: Don't disable preemption in stop_two_cpus()Oleg Nesterov2015-11-231-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that stop_two_cpus() path does not check cpu_active() we can remove preempt_disable(), it was only needed to ensure that stop_machine() can not be called after we observe cpu_active() == T and before we queue the new work. Also, turn the pointless and confusing ->executed check into WARN_ON(). We know that both works must be executed, otherwise we have a bug. And in fact I think that done->executed should die, see the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Milos Vyletel <milos@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151115193314.GA8249@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | stop_machine: Fix possible cpu_stopper_thread() crashOleg Nesterov2015-11-231-1/+1
|/ | | | | | | | | | | | | | | | | | | | stop_one_cpu_nowait(fn) will crash the kernel if the callback returns nonzero, work->done == NULL in this case. This needs more cleanups, cpu_stop_signal_done() is called right after we check done != NULL and it does the same check. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Milos Vyletel <milos@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151115193311.GA8242@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()Peter Zijlstra2015-10-201-9/+0
| | | | | | | | | | | | | | The cpu_active() tests are not fundamentally part of stop_two_cpus(), move then into the scheduler where they belong. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()Oleg Nesterov2015-10-201-11/+1
| | | | | | | | | | | | | | | | | | | | | | Now that we always use stop_machine_unpark() to wake the stopper threas up, we can kill ->setup() and fold cpu_stop_unpark() into stop_machine_unpark(). And we do not need stopper->lock to set stopper->enabled = true. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151009160051.GA10169@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce ↵Oleg Nesterov2015-10-201-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stop_machine_unpark() 1. Change smpboot_unpark_thread() to check ->selfparking, just like smpboot_park_thread() does. 2. Introduce stop_machine_unpark() which sets ->enabled and calls kthread_unpark(). 3. Change smpboot_thread_call() and cpu_stop_init() to call stop_machine_unpark() by hand. This way: - IMO the ->selfparking logic becomes more consistent. - We can kill the smp_hotplug_thread->pre_unpark() method. - We can easily unpark the stopper thread earlier. Say, we can move stop_machine_unpark() from smpboot_thread_call() to sched_cpu_active() as Peter suggests. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151009160049.GA10166@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabledOleg Nesterov2015-10-201-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | Change cpu_stop_queue_two_works() to ensure that both CPU's have stopper->enabled == T or fail otherwise. This way stop_two_cpus() no longer needs to check cpu_active() to avoid the deadlock. This patch doesn't remove these checks, we will do this later. Note: we need to take both stopper->lock's at the same time, but this will also help to remove lglock from stop_machine.c, so I hope this is fine. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151008170141.GA25537@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()Oleg Nesterov2015-10-201-11/+26
| | | | | | | | | | | | | | | | | | | | Preparation to simplify the review of the next change. Add two simple helpers, __cpu_stop_queue_work() and cpu_stop_queue_two_works() which simply take a bit of code from their callers. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151008145134.GA18146@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Ensure that a queued callback will be called before ↵Oleg Nesterov2015-10-201-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_stop_park() cpu_stop_queue_work() checks stopper->enabled before it queues the work, but ->enabled == T can only guarantee cpu_stop_signal_done() if we race with cpu_down(). This is not enough for stop_two_cpus() or stop_machine(), they will deadlock if multi_cpu_stop() won't be called by one of the target CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex. But stop_two_cpus() has to check cpu_active() to avoid the same race with hotplug, and this check is very unobvious and probably not even correct if we race with cpu_up(). Change cpu_down() pass to clear ->enabled before cpu_stopper_thread() flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set. Note also that smpboot_thread_call() calls cpu_stop_unpark() which sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until cpu_stopper_thread() is called at least once. This all means that if cpu_stop_queue_work() succeeds, we know that work->fn() will be called. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151008145131.GA18139@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Remove cpu_stop_work's from list in cpu_stop_park()Oleg Nesterov2015-08-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | cpu_stop_park() does cpu_stop_signal_done() but leaves the work on stopper->works. The owner of this work can free/reuse this memory right after that and corrupt the list, so if this CPU becomes online again cpu_stopper_thread() will crash. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: viro@ZenIV.linux.org.uk Link: http://lkml.kernel.org/r/20150630012958.GA23944@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Use 'cpu_stop_fn_t' where possibleOleg Nesterov2015-08-031-4/+4
| | | | | | | | | | | | | | | | | | | | Cosmetic, but 'cpu_stop_fn_t' actually makes the code more readable and it doesn't break cscope. And most of the declarations already use it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: viro@ZenIV.linux.org.uk Link: http://lkml.kernel.org/r/20150630012955.GA23937@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Unexport __stop_machine()Oleg Nesterov2015-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | The only caller outside of stop_machine.c is _cpu_down(), it can use stop_machine(). get_online_cpus() is fine under cpu_hotplug_begin(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: viro@ZenIV.linux.org.uk Link: http://lkml.kernel.org/r/20150630012951.GA23934@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Don't do for_each_cpu() twice in queue_stop_cpus_work()Oleg Nesterov2015-08-031-10/+7
| | | | | | | | | | | | | | | | | | | queue_stop_cpus_work() can do everything in one for_each_cpu() loop. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: viro@ZenIV.linux.org.uk Link: http://lkml.kernel.org/r/20150630012948.GA23927@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Move 'cpu_stopper_task' and 'stop_cpus_work' into 'struct ↵Oleg Nesterov2015-08-031-8/+9
| | | | | | | | | | | | | | | | | | | | | | cpu_stopper' Multpiple DEFINE_PER_CPU's do not make sense, move all the per-cpu variables into 'struct cpu_stopper'. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: viro@ZenIV.linux.org.uk Link: http://lkml.kernel.org/r/20150630012944.GA23924@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/stop_machine: Fix deadlock between multiple stop_two_cpus()Peter Zijlstra2015-06-191-37/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri reported a machine stuck in multi_cpu_stop() with migrate_swap_stop() as function and with the following src,dst cpu pairs: {11, 4} {13, 11} { 4, 13} 4 11 13 cpuM: queue(4 ,13) *Ma cpuN: queue(13,11) *N Na *M Mb cpuO: queue(11, 4) *O Oa *Nb *Ob Where *X denotes the cpu running the queueing of cpu-X and X[ab] denotes the first/second queued work. You'll observe the top of the workqueue for each cpu: 4,11,13 to be work from cpus: M, O, N resp. IOW. deadlock. Do away with the queueing trickery and introduce lg_double_lock() to lock both CPUs and fully serialize the stop_two_cpus() callers instead of the partial (and buggy) serialization we have now. Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150605153023.GH19282@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* kernel/stop_machine.c: kernel-doc warning fixFabian Frederick2014-06-051-0/+1
| | | | | | | Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()Peter Zijlstra2014-03-111-1/+1
| | | | | | | | | | | | | | | | | | We must use smp_call_function_single(.wait=1) for the irq_cpu_stop_queue_work() to ensure the queueing is actually done under stop_cpus_lock. Without this we could have dropped the lock by the time we do the queueing and get the race we tried to fix. Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20140228123905.GK3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Fix race between stop_two_cpus() and stop_cpus()Rik van Riel2013-11-111-2/+13
| | | | | | | | | | | | | | | | | | | There is a race between stop_two_cpus, and the global stop_cpus. It is possible for two CPUs to get their stopper functions queued "backwards" from one another, resulting in the stopper threads getting stuck, and the system hanging. This can happen because queuing up stoppers is not synchronized. This patch adds synchronization between stop_cpus (a rare operation), and stop_two_cpus. Reported-and-Tested-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/20131101104146.03d1e043@annuminas.surriel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched: Remove get_online_cpus() usagePeter Zijlstra2013-10-161-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove get_online_cpus() usage from the scheduler; there's 4 sites that use it: - sched_init_smp(); where its completely superfluous since we're in 'early' boot and there simply cannot be any hotplugging. - sched_getaffinity(); we already take a raw spinlock to protect the task cpus_allowed mask, this disables preemption and therefore also stabilizes cpu_online_mask as that's modified using stop_machine. However switch to active mask for symmetry with sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active mask stability by inserting sync_rcu/sched() into _cpu_down. - sched_setaffinity(); we don't appear to need get_online_cpus() either, there's two sites where hotplug appears relevant: * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask, for the cpuset case we hold task_lock, which is a spinlock and thus for mainline disables preemption (might cause pain on RT). * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has preemption properly disabled; also it already deals with hotplug races explicitly where it releases them. - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for us with a little trickery. By adding a sync_sched/rcu() after the CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for cpu_active_mask. Use these to validate that both our cpus are active when queueing the stop work before we queue the stop_machine works for take_cpu_down(). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* stop_machine: Introduce stop_two_cpus()Peter Zijlstra2013-10-091-98/+174
| | | | | | | | | | | | | | | | | | | | | | | | Introduce stop_two_cpus() in order to allow controlled swapping of two tasks. It repurposes the stop_machine() state machine but only stops the two cpus which we can do with on-stack structures and avoid machine wide synchronization issues. The ordering of CPUs is important to avoid deadlocks. If unordered then two cpus calling stop_two_cpus on each other simultaneously would attempt to queue in the opposite order on each CPU causing an AB-BA style deadlock. By always having the lowest number CPU doing the queueing of works, we can guarantee that works are always queued in the same order, and deadlocks are avoided. Signed-off-by: Peter Zijlstra <peterz@infradead.org> [ Implemented deadlock avoidance. ] Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/1381141781-10992-38-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>