summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* srcu: Improve rcu_seq grace-period-counter abstractionPaul E. McKenney2017-04-182-9/+29
| | | | | | | | | | | | The expedited grace-period code contains several open-coded shifts know the format of an rcu_seq grace-period counter, which is not particularly good style. This commit therefore creates a new rcu_seq_ctr() function that extracts the counter portion of the counter, and an rcu_seq_state() function that extracts the low-order state bit. This commit prepares for SRCU callback parallelization, which will require two state bits. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Fix bogus try_check_zero() commentPaul E. McKenney2017-04-181-4/+3
| | | | Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Make num_rcu_lvl[] array be externalPaul E. McKenney2017-04-182-1/+2
| | | | | | | This commit makes the num_rcu_lvl[] array external so that SRCU can make use of it for initializing its upcoming srcu_node tree. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Move rcu_node traversal macros to rcu.hPaul E. McKenney2017-04-182-35/+35
| | | | | | | | | | This commit moves rcu_for_each_node_breadth_first(), rcu_for_each_nonleaf_node_breadth_first(), and rcu_for_each_leaf_node() from kernel/rcu/tree.h to kernel/rcu/rcu.h so that SRCU can access them. This commit is code-movement only. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Remove redundant levelcnt[] array from rcu_init_one()Paul E. McKenney2017-04-181-6/+3
| | | | | | | The levelcnt[] array is identical to num_rcu_lvl[], so this commit removes levelcnt[]. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Move rcu_init_levelspread() to rcu_tree_node.hPaul E. McKenney2017-04-185-28/+38
| | | | | | | | | This commit moves the rcu_init_levelspread() function from kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it. This is another step towards enabling SRCU to create its own combining tree. This commit is code-movement only, give or take knock-on adjustments. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Move combining-tree definitions for SRCU's benefitPaul E. McKenney2017-04-182-70/+103
| | | | | | | | | | | | | | This commit moves the C preprocessor code that defines the default shape of the rcu_node combining tree to a new include/linux/rcu_node_tree.h file as a first step towards enabling SRCU to create its own combining tree, which in turn enables SRCU to implement per-CPU callback handling, thus avoiding contention on the lock currently guarding the single list of callbacks. Note that users of SRCU still need to know the size of the srcu_struct structure, hence include/linux rather than kernel/rcu. This commit is code-movement only. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Use rcu_segcblist to track SRCU callbacksPaul E. McKenney2017-04-185-149/+52
| | | | | | | | | This commit switches SRCU from custom-built callback queues to the new rcu_segcblist structure. This change associates grace-period sequence numbers with groups of callbacks, which will be needed for efficient processing of per-CPU callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Add grace-period sequence numbersPaul E. McKenney2017-04-182-4/+24
| | | | | | | This commit adds grace-period sequence numbers, which will be used to handle mid-boot grace periods and per-CPU callback lists. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Move to state-based grace-period sequencingPaul E. McKenney2017-04-182-52/+69
| | | | | | | | | | | | | | | | | The current SRCU grace-period processing might never reach the last portion of srcu_advance_batches(). This is OK given the current implementation, as the first portion, up to the try_check_zero() following the srcu_flip() is sufficient to drive grace periods forward. However, it has the unfortunate side-effect of making it impossible to determine when a given grace period has ended, and it will be necessary to efficiently trace ends of grace periods in order to efficiently handle per-CPU SRCU callback lists. This commit therefore adds states to the SRCU grace-period processing, so that the end of a given SRCU grace period is marked by the transition to the SRCU_STATE_DONE state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Push srcu_advance_batches() fastpath into common casePaul E. McKenney2017-04-181-20/+7
| | | | | | | | | | | | This commit simplifies the SRCU state machine by pushing the srcu_advance_batches() idle-SRCU fastpath into the common case. This is done by giving srcu_reschedule() a delay parameter, which is zero in the call from srcu_advance_batches(). This commit is a step towards numbering callbacks in order to efficiently handle per-CPU callback lists. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Fix warning in rcu_seq_end()Dmitry Vyukov2017-04-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rcu_seq_end() function increments seq signifying completion of a grace period, after that checks that the seq is even and wakes _synchronize_rcu_expedited(). The _synchronize_rcu_expedited() function uses wait_event() to wait for even seq. The problem is that wait_event() can return as soon as seq becomes even without waiting for the wakeup. In such case the warning in rcu_seq_end() can falsely fire if the next expedited grace period starts before the check. Check that seq has good value before incrementing it. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: syzkaller@googlegroups.com Cc: linux-kernel@vger.kernel.org Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: josh@joshtriplett.org Cc: jiangshanlai@gmail.com Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> --- syzkaller-triggered warning: WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events wait_rcu_exp_gp Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 panic+0x1fb/0x412 kernel/panic.c:179 __warn+0x1c4/0x1e0 kernel/panic.c:540 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 kthread+0x326/0x3f0 kernel/kthread.c:227 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 ---
* rcu: Expedited wakeups need to be fully orderedPaul E. McKenney2017-04-181-0/+2
| | | | | | | | | | | Expedited grace periods use workqueue handlers that wake up the requesters, but there is no lock mediating this wakeup. Therefore, memory barriers are required to ensure that the handler's memory references are seen by all to occur before synchronize_*_expedited() returns to its caller. Possibly detected by syzkaller. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Move rcu_seq_start() and friends to rcu.hPaul E. McKenney2017-04-182-35/+40
| | | | | | | | | | This commit moves rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), and rcu_seq_done() from kernel/rcu/tree.c to kernel/rcu/rcu.h. This will allow SRCU to use these functions, which in turn will allow SRCU to move from a single global callback queue to a per-CPU callback queue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Add single-element dequeue functions to rcu_segcblistPaul E. McKenney2017-04-181-0/+45
| | | | | | | | | This commit adds single-element dequeue functions to rcu_segcblist. These are less efficient than using the extract and insert functions, but allow more precise debugging code. These functions are thus expected to be used only in debug builds, for example, CONFIG_PROVE_RCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Allow early boot use of synchronize_srcu()Paul E. McKenney2017-04-181-0/+2
| | | | | | | This commit checks for pre-scheduler state, and if that early in the boot process, synchronize_srcu() and friends are no-ops. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Allow SRCU to access rcu_scheduler_activePaul E. McKenney2017-04-185-38/+43
| | | | | | | This is primarily a code-movement commit in preparation for allowing SRCU to handle early-boot SRCU grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Abstract multi-tail callback list handlingPaul E. McKenney2017-04-185-309/+780
| | | | | | | | | | | | | | | | | | RCU has only one multi-tail callback list, which is implemented via the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the rcu_data structure, and whose operations are open-code throughout the Tree RCU implementation. This has been more or less OK in the past, but upcoming callback-list optimizations in SRCU could really use a multi-tail callback list there as well. This commit therefore abstracts the multi-tail callback list handling into a new kernel/rcu/rcu_segcblist.h file, and uses this new API. The simple head-and-tail pointer callback list is also abstracted and applied everywhere except for the NOCB callback-offload lists. (Yes, the plan is to apply them there as well, but this commit is already bigger than would be good.) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Default RCU_FANOUT_LEAF to 16 unless explicitly changedPaul E. McKenney2017-04-181-5/+1
| | | | | | | | | | | | | If the RCU_EXPERT Kconfig option is not set (the default), then the RCU_FANOUT_LEAF Kconfig option will not be defined, which will cause the leaf-level rcu_node tree fanout to default to 32 on 32-bit systems and 64 on 64-bit systems. This can result in excessive lock contention. This commit therefore changes the computation of the leaf-level rcu_node tree fanout so that the result will be 16 unless an explicit Kconfig or kernel-boot setting says otherwise. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actionsPaul E. McKenney2017-04-185-18/+44
| | | | | | | | | | | | | The rcu_all_qs() and rcu_note_context_switch() do a series of checks, taking various actions to supply RCU with quiescent states, depending on the outcomes of the various checks. This is a bit much for scheduling fastpaths, so this commit creates a separate ->rcu_urgent_qs field in the rcu_dynticks structure that acts as a global guard for these checks. Thus, in the common case, rcu_all_qs() and rcu_note_context_switch() check the ->rcu_urgent_qs field, find it false, and simply return. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>
* rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()Paul E. McKenney2017-04-183-54/+15
| | | | | | | | | | The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking that one of them still needs a quiescent state before doing an expensive atomic operation on the ->dynticks counter. However, this check reduces overhead only after a rare race condition, and increases complexity. This commit therefore removes the scan and the mechanism enabling the scan. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Pull rcu_qs_ctr into rcu_dynticks structurePaul E. McKenney2017-04-184-15/+19
| | | | | | | | The rcu_qs_ctr variable is yet another isolated per-CPU variable, so this commit pulls it into the pre-existing rcu_dynticks per-CPU structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Pull rcu_sched_qs_mask into rcu_dynticks structurePaul E. McKenney2017-04-183-8/+14
| | | | | | | | The rcu_sched_qs_mask variable is yet another isolated per-CPU variable, so this commit pulls it into the pre-existing rcu_dynticks per-CPU structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Semicolon inside RCU_TRACE() for tree.cPaul E. McKenney2017-04-181-4/+4
| | | | | | | | | | The current use of "RCU_TRACE(statement);" can cause odd bugs, especially where "statement" is a local-variable declaration, as it can leave a misplaced ";" in the source code. This commit therefore converts these to "RCU_TRACE(statement;)", which avoids the misplaced ";". Reported-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Semicolon inside RCU_TRACE() for Tiny RCUPaul E. McKenney2017-04-182-12/+12
| | | | | | | | | | The current use of "RCU_TRACE(statement);" can cause odd bugs, especially where "statement" is a local-variable declaration, as it can leave a misplaced ";" in the source code. This commit therefore converts these to "RCU_TRACE(statement;)", which avoids the misplaced ";". Reported-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Semicolon inside RCU_TRACE() for rcu.hPaul E. McKenney2017-04-181-2/+2
| | | | | | | | | | The current use of "RCU_TRACE(statement);" can cause odd bugs, especially where "statement" is a local-variable declaration, as it can leave a misplaced ";" in the source code. This commit therefore converts these to "RCU_TRACE(statement;)", which avoids the misplaced ";". Reported-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* srcu: Check for tardy grace-period activity in cleanup_srcu_struct()Paul E. McKenney2017-04-181-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users of SRCU are obliged to complete all grace-period activity before invoking cleanup_srcu_struct(). This means that all calls to either synchronize_srcu() or synchronize_srcu_expedited() must have returned, and all calls to call_srcu() must have returned, and the last call to call_srcu() must have been followed by a call to srcu_barrier(). Furthermore, the caller must have done something to prevent any further calls to synchronize_srcu(), synchronize_srcu_expedited(), and call_srcu(). Therefore, if there has ever been an invocation of call_srcu() on the srcu_struct in question, the sequence of events must be as follows: 1. Prevent any further calls to call_srcu(). 2. Wait for any pre-existing call_srcu() invocations to return. 3. Invoke srcu_barrier(). 4. It is now safe to invoke cleanup_srcu_struct(). On the other hand, if there has ever been a call to synchronize_srcu() or synchronize_srcu_expedited(), the sequence of events must be as follows: 1. Prevent any further calls to synchronize_srcu() or synchronize_srcu_expedited(). 2. Wait for any pre-existing synchronize_srcu() or synchronize_srcu_expedited() invocations to return. 3. It is now safe to invoke cleanup_srcu_struct(). If there have been calls to all both types of functions (call_srcu() and either of synchronize_srcu() and synchronize_srcu_expedited()), then the caller must do the first three steps of the call_srcu() procedure above and the first two steps of the synchronize_s*() procedure above, and only then invoke cleanup_srcu_struct(). Note that cleanup_srcu_struct() does some probabilistic checks for the caller failing to follow these procedures, in which case cleanup_srcu_struct() does WARN_ON() and avoids freeing the per-CPU structures associated with the specified srcu_struct structure. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* srcu: Consolidate batch checking into rcu_all_batches_empty()Paul E. McKenney2017-04-181-8/+13
| | | | | | | | | | | The srcu_reschedule() function invokes rcu_batch_empty() on each of the four rcu_batch structures in the srcu_struct in question twice. Given that this check will also be needed in cleanup_srcu_struct(), this commit consolidates these four checks into a new rcu_all_batches_empty() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Make arch select smp_mb__after_unlock_lock() strengthPaul E. McKenney2017-04-183-3/+7
| | | | | | | | | | | | | | | | | | | | | The definition of smp_mb__after_unlock_lock() is currently smp_mb() for CONFIG_PPC and a no-op otherwise. It would be better to instead provide an architecture-selectable Kconfig option, and select the strength of smp_mb__after_unlock_lock() based on that option. This commit therefore creates ARCH_WEAK_RELEASE_ACQUIRE, has PPC select it, and bases the definition of smp_mb__after_unlock_lock() on this new ARCH_WEAK_RELEASE_ACQUIRE Kconfig option. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Boqun Feng <boqun.feng@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <linuxppc-dev@lists.ozlabs.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Maintain special bits at bottom of ->dynticks counterPaul E. McKenney2017-04-183-16/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, IPIs are used to force other CPUs to invalidate their TLBs in response to a kernel virtual-memory mapping change. This works, but degrades both battery lifetime (for idle CPUs) and real-time response (for nohz_full CPUs), and in addition results in unnecessary IPIs due to the fact that CPUs executing in usermode are unaffected by stale kernel mappings. It would be better to cause a CPU executing in usermode to wait until it is entering kernel mode to do the flush, first to avoid interrupting usemode tasks and second to handle multiple flush requests with a single flush in the case of a long-running user task. This commit therefore reserves a bit at the bottom of the ->dynticks counter, which is checked upon exit from extended quiescent states. If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is invoked, which, if not supplied, is an empty single-pass do-while loop. If this bottom bit is set on -entry- to an extended quiescent state, then a WARN_ON_ONCE() triggers. This bottom bit may be set using a new rcu_eqs_special_set() function, which returns true if the bit was set, or false if the CPU turned out to not be in an extended quiescent state. Please note that this function refuses to set the bit for a non-nohz_full CPU when that CPU is executing in usermode because usermode execution is tracked by RCU as a dyntick-idle extended quiescent state only for nohz_full CPUs. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* Linux 4.11-rc2v4.11-rc2Linus Torvalds2017-03-121-1/+1
|
* Merge branch 'for-linus' of ↵Linus Torvalds2017-03-1211-29/+51
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: - four patches to get the new cputime code in shape for s390 - add the new statx system call - a few bug fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: wire up statx system call KVM: s390: Fix guest migration for huge guests resulting in panic s390/ipl: always use load normal for CCW-type re-IPL s390/timex: micro optimization for tod_to_ns s390/cputime: provide archicture specific cputime_to_nsecs s390/cputime: reset all accounting fields on fork s390/cputime: remove last traces of cputime_t s390: fix in-kernel program checks s390/crypt: fix missing unlock in ctr_paes_crypt on error path
| * s390: wire up statx system callHeiko Carstens2017-03-103-1/+6
| | | | | | | | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * KVM: s390: Fix guest migration for huge guests resulting in panicJanosch Frank2017-03-021-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While we can technically not run huge page guests right now, we can setup a guest with huge pages. Trying to migrate it will trigger a VM_BUG_ON and, if the kernel is not configured to panic on a BUG, it will happily try to work on non-existing page table entries. With this patch, we always return "dirty" if we encounter a large page when migrating. This at least fixes the immediate problem until we have proper handling for both kind of pages. Fixes: 15f36eb ("KVM: s390: Add proper dirty bitmap support to S390 kvm.") Cc: <stable@vger.kernel.org> # 3.16+ Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/ipl: always use load normal for CCW-type re-IPLHeiko Carstens2017-03-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | commit 14890678687c ("s390/ipl: use load normal for LPAR re-ipl") missed to convert one code path to use load normal semantics for re-IPL. Convert the missing code path as well. Fixes: 14890678687c ("s390/ipl: use load normal for LPAR re-ipl") Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/timex: micro optimization for tod_to_nsMartin Schwidefsky2017-03-011-8/+4
| | | | | | | | | | | | | | | | The conversion of a TOD value to nano-seconds currently uses a 32/32 bit split with the calculation for "nsecs = (TOD * 125) >> 9". Using a 55/9 bit split saves an instruction. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/cputime: provide archicture specific cputime_to_nsecsMartin Schwidefsky2017-03-011-1/+7
| | | | | | | | | | | | | | | | | | The generic cputime_to_nsecs function first converts the cputime to micro-seconds and then multiplies the result with 1000. This looses some bits of accuracy, provide our own version of cputime_to_nsecs that does not loose precision. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/cputime: reset all accounting fields on forkMartin Schwidefsky2017-03-011-0/+3
| | | | | | | | | | | | | | copy_thread has to reset all cputime related field in the task struct, not only user_timer and system_timer. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/cputime: remove last traces of cputime_tMartin Schwidefsky2017-03-012-13/+3
| | | | | | | | | | | | | | The cputime_t type is a thing of the past, replace the last occurences of the type in the s390 code with a simple u64. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390: fix in-kernel program checksMartin Schwidefsky2017-03-011-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A program check inside the kernel takes a slightly different path in entry.S compare to a normal user fault. A recent change moved the store of the breaking event address into the path taken for in-kernel program checks as well, but %r14 has not been setup to point to the correct location. A wild store is the consequence. Move the store of the breaking event address to the code path for user space faults. Fixes: 34525e1f7e8d ("s390: store breaking event address only for program checks") Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/crypt: fix missing unlock in ctr_paes_crypt on error pathMartin Schwidefsky2017-03-011-1/+4
| | | | | | | | | | | | | | | | | | The ctr mode of protected key aes uses the ctrblk page if the ctrblk_lock could be acquired. If the protected key has to be reestablished and this operation fails the unlock for the ctrblk_lock is missing. Add it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2017-03-1212-48/+80
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - a fix for the kexec/purgatory regression which was introduced in the merge window via an innocent sparse fix. We could have reverted that commit, but on deeper inspection it turned out that the whole machinery is neither documented nor robust. So a proper cleanup was done instead - the fix for the TLB flush issue which was discovered recently - a simple typo fix for a reboot quirk * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix tlb flushing when lguest clears PGE kexec, x86/purgatory: Unbreak it and clean it up x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk
| * | x86/tlb: Fix tlb flushing when lguest clears PGEDaniel Borkmann2017-03-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fengguang reported random corruptions from various locations on x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY config") and 9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set") that uses the former. While x86-32 doesn't have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module support built in and therefore never had the DEBUG_SET_MODULE_RONX setting enabled. After investigating the crashes further, it turned out that using set_memory_ro() and set_memory_rw() didn't have the desired effect, for example, setting the pages as read-only on x86-32 would still let probe_kernel_write() succeed without error. This behavior would manifest itself in situations where the vmalloc'ed buffer was accessed prior to set_memory_*() such as in case of bpf_prog_alloc(). In cases where it wasn't, the page attribute changes seemed to have taken effect, leading to the conclusion that a TLB invalidate didn't happen. Moreover, it turned out that this issue reproduced with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(), though. There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush. For "-cpu host" case in my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually worked fine, and further investigating the cr4 one turned out that X86_CR4_PGE bit was not set in cr4 register, meaning the __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same value instead of clearing X86_CR4_PGE in the first write to trigger the flush. It turned out that X86_CR4_PGE was cleared from cr4 during init from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also cleared from there due to concerns of using PGE in guest kernel that can lead to hard to trace bugs (see bff672e630a0 ("lguest: documentation V: Host") in init()). The CPU feature bits are cleared in dynamic boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB flushing to use, meaning they still used the old setting of the host kernel. Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate to static_cpu_has() checks is too late at this point as sections have been patched already, so for now, it seems reasonable to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via cr3 as originally intended, properly makes the new page attributes visible and thus fixes the crashes seen by Fengguang. Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: bp@suse.de Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: lkp@01.org Cc: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | kexec, x86/purgatory: Unbreak it and clean it upThomas Gleixner2017-03-1010-46/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purgatory code defines global variables which are referenced via a symbol lookup in the kexec code (core and arch). A recent commit addressing sparse warnings made these static and thereby broke kexec_file. Why did this happen? Simply because the whole machinery is undocumented and lacks any form of forward declarations. The variable names are unspecific and lack a prefix, so adding forward declarations creates shadow variables in the core code. Aside of that the code relies on magic constants and duplicate struct definitions with no way to ensure that these things stay in sync. The section placement of the purgatory variables happened by chance and not by design. Unbreak kexec and cleanup the mess: - Add proper forward declarations and document the usage - Use common struct definition - Use the proper common defines instead of magic constants - Add a purgatory_ prefix to have a proper name space - Use ARRAY_SIZE() instead of a homebrewn reimplementation - Add proper sections to the purgatory variables [ From Mike ] Fixes: 72042a8c7b01 ("x86/purgatory: Make functions and variables static") Reported-by: Mike Galbraith <<efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Tobin C. Harding" <me@tobin.cc> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirkMatjaz Hegedic2017-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reboot quirk for ASUS EeeBook X205TA contains a typo in DMI_PRODUCT_NAME, improperly referring to X205TAW instead of X205TA, which prevents the quirk from being triggered. The model X205TAW already has a reboot quirk of its own. This fix simply removes the inappropriate final letter W. Fixes: 90b28ded88dd ("x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk") Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Link: http://lkml.kernel.org/r/1489064417-7445-1-git-send-email-matjaz.hegedic@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2017-03-125-4/+35
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: - a workaround for a GIC erratum - a missing stub function for CONFIG_IRQDOMAIN=n - fixes for a couple of type inconsistencies * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/crossbar: Fix incorrect type of register size irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065 irqdomain: Add empty irq_domain_check_msi_remap irqchip/crossbar: Fix incorrect type of local variables
| * \ \ Merge tag 'irq-fixes-4.11-rc2' of ↵Thomas Gleixner2017-03-092159-24786/+57596
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip/irqdomain updates for 4.11-rc2 from Marc Zyngier - irqchip/crossbar: Some type tidying up - irqchip/gicv3-its: Workaround for a Qualcomm erratum - irqdomain: Compile for for systems that don't use CONFIG_IRQ_DOMAIN Fixed up minor conflict in the crossbar driver.
| | * | | irqchip/crossbar: Fix incorrect type of register sizeFranck Demathieu2017-03-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'size' variable is unsigned according to the dt-bindings. As this variable is used as integer in other places, create a new variable that allows to fix the following sparse issue (-Wtypesign): drivers/irqchip/irq-crossbar.c:279:52: warning: incorrect type in argument 3 (different signedness) drivers/irqchip/irq-crossbar.c:279:52: expected unsigned int [usertype] *out_value drivers/irqchip/irq-crossbar.c:279:52: got int *<noident> Signed-off-by: Franck Demathieu <fdemathieu@gmail.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065Shanker Donthineni2017-03-073-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Qualcomm Datacenter Technologies QDF2400 SoCs, the ITS hardware implementation uses 16Bytes for Interrupt Translation Entry (ITE), but reports an incorrect value of 8Bytes in GITS_TYPER.ITTE_size. It might cause kernel memory corruption depending on the number of MSI(x) that are configured and the amount of memory that has been allocated for ITEs in its_create_device(). This patch fixes the potential memory corruption by setting the correct ITE size to 16Bytes. Cc: stable@vger.kernel.org Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | irqdomain: Add empty irq_domain_check_msi_remapMian Yousaf Kaukab2017-03-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix following build error for s390: drivers/vfio/vfio_iommu_type1.c: In function 'vfio_iommu_type1_attach_group': drivers/vfio/vfio_iommu_type1.c:1290:25: error: implicit declaration of function 'irq_domain_check_msi_remap' Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>