Commit message (Collapse)          Author          Age          Files          Lines
* locking/pvqspinlock, x86: Enable PV qspinlock for KVMWaiman Long2015-05-082-1/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the necessary KVM specific code to allow KVM to support the CPU halting and kicking operations needed by the queue spinlock PV code. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-11-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/pvqspinlock, x86: Implement the paravirt qspinlock call patchingPeter Zijlstra (Intel)2015-05-088-12/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use the regular paravirt call patching to switch between: native_queued_spin_lock_slowpath() __pv_queued_spin_lock_slowpath() native_queued_spin_unlock() __pv_queued_spin_unlock() We use a callee saved call for the unlock function which reduces the i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions again. We further optimize the unlock path by patching the direct call with a "movb $0,%arg1" if we are indeed using the native unlock code. This makes the unlock code almost as fast as the !PARAVIRT case. This significantly lowers the overhead of having CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
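To illustrate the dispatch that gets patched here, the following stand-alone C sketch models the two unlock variants and the switch between them. The function pointer stands in for the patched call site, and the single byte store is the "movb $0,%arg1" form mentioned above; the names, the lock-word layout and the little-endian assumption are illustrative, not the kernel's actual pv-ops machinery.

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    struct qspinlock { _Atomic uint32_t val; };     /* low byte = locked */

    /*
     * Native unlock: a single byte store.  This is what the kernel patches
     * the unlock call site into when it detects at boot that it is not
     * running paravirtualized.  Little-endian layout assumed here.
     */
    static void native_queued_spin_unlock(struct qspinlock *lock)
    {
            atomic_store_explicit((_Atomic uint8_t *)&lock->val, 0,
                                  memory_order_release);
    }

    /* PV unlock: does the store and would also kick a halted waiter vCPU. */
    static void pv_queued_spin_unlock(struct qspinlock *lock)
    {
            native_queued_spin_unlock(lock);
            /* ... hypercall to kick the next waiter would go here ... */
    }

    /*
     * Stand-in for the patched call site.  The kernel rewrites the call
     * instruction itself (no indirection left at runtime); a function
     * pointer is just the closest portable C analogue for this sketch.
     */
    static void (*queued_spin_unlock)(struct qspinlock *) =
            native_queued_spin_unlock;

    int main(void)
    {
            struct qspinlock lock = { 1 };          /* held */

            queued_spin_unlock(&lock);
            printf("lock word after native unlock: %u\n",
                   (unsigned)atomic_load(&lock.val));

            atomic_store(&lock.val, 1);
            queued_spin_unlock = pv_queued_spin_unlock; /* what a PV guest selects */
            queued_spin_unlock(&lock);
            printf("lock word after pv unlock: %u\n",
                   (unsigned)atomic_load(&lock.val));
            return 0;
    }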
* locking/pvqspinlock: Implement simple paravirt support for the qspinlockWaiman Long2015-05-082-1/+392
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a separate (second) version of the spin_lock_slowpath for paravirt along with a special unlock path. The second slowpath is generated by adding a few pv hooks to the normal slowpath, but where those will compile away for the native case, they expand into special wait/wake code for the pv version. The actual MCS queue can use extra storage in the mcs_nodes[] array to keep track of state and therefore uses directed wakeups. The head contender has no such storage directly visible to the unlocker. So the unlocker searches a hash table with open addressing using a simple binary Galois linear feedback shift register. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1429901803-29771-9-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
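The unlock-side lookup described above can be illustrated with a small stand-alone sketch: an open-addressed table probed by a binary Galois LFSR. The table size, tap polynomial, entry layout and function names below are invented for the example and are not the kernel's.

    #include <stdint.h>
    #include <stdio.h>

    #define PV_HASH_BITS    6                       /* illustrative table size */
    #define PV_HASH_SIZE    (1u << PV_HASH_BITS)

    struct pv_hash_entry {
            void *lock;                             /* key: the lock's address  */
            int   cpu;                              /* value: head waiter's CPU */
    };

    static struct pv_hash_entry pv_hash[PV_HASH_SIZE];

    /*
     * One step of a binary Galois LFSR over 6 bits using the primitive
     * polynomial x^6 + x^5 + 1; it walks the 63 non-zero states, which is
     * what makes it usable as an open-addressing probe sequence.
     */
    static uint32_t lfsr_step(uint32_t x)
    {
            uint32_t lsb = x & 1;

            x >>= 1;
            if (lsb)
                    x ^= 0x30;
            return x;
    }

    static uint32_t hash_lock(void *lock)
    {
            uint32_t h = (uint32_t)((uintptr_t)lock >> 4) & (PV_HASH_SIZE - 1);

            return h ? h : 1;                       /* the LFSR never visits 0 */
    }

    static void pv_hash_insert(void *lock, int cpu)
    {
            uint32_t idx = hash_lock(lock);

            while (pv_hash[idx].lock)               /* probe until a free slot */
                    idx = lfsr_step(idx);
            pv_hash[idx].lock = lock;
            pv_hash[idx].cpu  = cpu;
    }

    static int pv_hash_find(void *lock)
    {
            uint32_t idx = hash_lock(lock);

            while (pv_hash[idx].lock != lock)       /* caller guarantees presence */
                    idx = lfsr_step(idx);
            return pv_hash[idx].cpu;
    }

    int main(void)
    {
            int dummy_a, dummy_b;

            pv_hash_insert(&dummy_a, 3);
            pv_hash_insert(&dummy_b, 7);
            printf("%d %d\n", pv_hash_find(&dummy_a), pv_hash_find(&dummy_b));
            return 0;
    }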
* locking/qspinlock: Revert to test-and-set on hypervisorsPeter Zijlstra (Intel)2015-05-083-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we detect a hypervisor (!paravirt, see qspinlock paravirt support patches), revert to a simple test-and-set lock to avoid the horrors of queue preemption. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/qspinlock: Use a simple write to grab the lockWaiman Long2015-05-081-16/+50
Currently, atomic_cmpxchg() is used to get the lock. However, this is not really necessary if there is more than one task in the queue and the queue head doesn't need to reset the tail code. For that case, a simple write to set the lock bit is enough as the queue head will be the only one eligible to get the lock as long as it checks that both the lock and pending bits are not set. The current pending bit waiting code will ensure that the bit will not be set as soon as the tail code in the lock is set.

With that change, there is a slight improvement in the performance of the queued spinlock in the 5M loop micro-benchmark run on a 4-socket Westmere-EX machine, as shown in the tables below.

  [Standalone/Embedded - same node]
  # of tasks    Before patch    After patch     %Change
  ----------    ------------    -----------     -------
       3         2324/2321       2248/2265      -3%/-2%
       4         2890/2896       2819/2831      -2%/-2%
       5         3611/3595       3522/3512      -2%/-2%
       6         4281/4276       4173/4160      -3%/-3%
       7         5018/5001       4875/4861      -3%/-3%
       8         5759/5750       5563/5568      -3%/-3%

  [Standalone/Embedded - different nodes]
  # of tasks    Before patch    After patch     %Change
  ----------    ------------    -----------     -------
       3        12242/12237     12087/12093     -1%/-1%
       4        10688/10696     10507/10521     -2%/-2%

It was also found that this change produced a much bigger performance improvement in the newer IvyBridge-EX chip and essentially closed the performance gap between the ticket spinlock and the queued spinlock.

The disk workload of the AIM7 benchmark was run on a 4-socket Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users on a 3.14 based kernel. The results of the test runs were:

  AIM7 XFS Disk Test
  kernel          JPM       Real Time   Sys Time   Usr Time
  ------          ---       ---------   --------   --------
  ticketlock      5678233      3.17       96.61      5.81
  qspinlock       5750799      3.13       94.83      5.97

  AIM7 EXT4 Disk Test
  kernel          JPM       Real Time   Sys Time   Usr Time
  ------          ---       ---------   --------   --------
  ticketlock      1114551     16.15      509.72      7.11
  qspinlock       2184466      8.24      232.99      6.01

The ext4 filesystem run had a much higher spinlock contention than the xfs filesystem run.

The "ebizzy -m" test was also run with the following results:

  kernel          records/s   Real Time   Sys Time   Usr Time
  ------          ---------   ---------   --------   --------
  ticketlock        2075        10.00      216.35      3.49
  qspinlock         3023        10.00      198.20      4.80

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-7-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
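A minimal stand-alone sketch (C11 atomics) of the acquisition path this patch changes; the bit layout, constants and function names are illustrative rather than the kernel's exact definitions.

    #include <stdatomic.h>
    #include <stdint.h>

    /*
     * Illustrative 32-bit lock word (little-endian):
     *   bits  0-7   locked byte
     *   bits  8-15  pending
     *   bits 16-31  tail (identifies the last queued waiter)
     */
    #define _Q_LOCKED_VAL   1u
    #define _Q_TAIL_MASK    0xffff0000u

    struct qspinlock { _Atomic uint32_t val; };

    /*
     * Plain byte store of the locked value.  Safe whenever another waiter
     * is queued behind us: only the last waiter may ever clear the tail,
     * and that is not us.  Ordering is provided by the load-acquire that
     * observed lock and pending clear before this is called.
     */
    static void set_locked(struct qspinlock *lock)
    {
            atomic_store_explicit((_Atomic uint8_t *)&lock->val, _Q_LOCKED_VAL,
                                  memory_order_relaxed);
    }

    /*
     * Queue-head acquisition, called once the locked and pending bits have
     * been observed clear.  Only when the tail still points at us must the
     * tail be cleared together with setting the lock, which needs cmpxchg;
     * otherwise the simple write above is enough.
     */
    static void lock_as_queue_head(struct qspinlock *lock, uint32_t my_tail)
    {
            uint32_t val = atomic_load_explicit(&lock->val, memory_order_acquire);

            for (;;) {
                    if ((val & _Q_TAIL_MASK) != my_tail) {
                            set_locked(lock);       /* someone queued behind us */
                            return;
                    }
                    /* last waiter: clear tail and take the lock in one go */
                    if (atomic_compare_exchange_strong_explicit(&lock->val, &val,
                                    _Q_LOCKED_VAL, memory_order_acquire,
                                    memory_order_acquire))
                            return;
            }
    }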
* locking/qspinlock: Optimize for smaller NR_CPUSPeter Zijlstra (Intel)2015-05-082-1/+81
When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16 bits. This means we can use xchg16 for the tail part and do away with all the repeated cmpxchg() operations.

This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change. And because both locked and pending are now a full byte we can use simple stores for the state transition, obviating one atomic operation entirely.

This optimization is needed to make the qspinlock achieve performance parity with the ticket spinlock at light load.

All this is horribly broken on Alpha pre EV56 (and any other arch that cannot do single-copy atomic byte stores).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
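The new layout and the cheaper operations it enables can be sketched as below. This assumes a little-endian machine with single-copy-atomic byte and halfword stores (the Alpha caveat above); the names and the exact field split are illustrative.

    #include <stdatomic.h>
    #include <stdint.h>

    /*
     * Lock word layout once NR_CPUS < 16K:
     *    byte 0     locked
     *    byte 1     pending   (grown from a single bit to a byte)
     *    bytes 2-3  tail      (now fits in 16 bits)
     */
    union qspinlock {
            _Atomic uint32_t val;
            struct {
                    _Atomic uint8_t  locked;
                    _Atomic uint8_t  pending;
                    _Atomic uint16_t tail;
            };
            struct {
                    _Atomic uint16_t locked_pending;
                    _Atomic uint16_t tail16;
            };
    };

    #define _Q_LOCKED_VAL   1u

    /* The tail swap becomes one 16-bit exchange instead of a cmpxchg() loop. */
    static uint16_t xchg_tail(union qspinlock *lock, uint16_t tail)
    {
            return atomic_exchange_explicit(&lock->tail, tail,
                                            memory_order_acq_rel);
    }

    /*
     * Pending-owner handover: with locked and pending each a full byte,
     * "clear pending + set locked" is a single halfword store, no atomic
     * read-modify-write needed.
     */
    static void clear_pending_set_locked(union qspinlock *lock)
    {
            atomic_store_explicit(&lock->locked_pending, _Q_LOCKED_VAL,
                                  memory_order_relaxed);
    }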
* locking/qspinlock: Extract out code snippets for the next patchWaiman Long2015-05-082-31/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a preparatory patch that extracts out the following 2 code snippets to prepare for the next performance optimization patch. 1) the logic for the exchange of new and previous tail code words into a new xchg_tail() function. 2) the logic for clearing the pending bit and setting the locked bit into a new clear_pending_set_locked() function. This patch also simplifies the trylock operation before queuing by calling queued_spin_trylock() directly. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
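For reference, the two extracted helpers in their generic cmpxchg()-loop form look roughly like the following C11 sketch; bit positions are illustrative, and the next patch in the series specializes both for small NR_CPUS.

    #include <stdatomic.h>
    #include <stdint.h>

    #define _Q_LOCKED_VAL    (1u << 0)
    #define _Q_PENDING_VAL   (1u << 8)      /* illustrative layout */
    #define _Q_TAIL_MASK     0xffff0000u

    struct qspinlock { _Atomic uint32_t val; };

    /* Clear the pending bit and set the locked bit, given the last
     * observed value of the lock word. */
    static void clear_pending_set_locked(struct qspinlock *lock, uint32_t val)
    {
            uint32_t new;

            do {
                    new = (val & ~_Q_PENDING_VAL) | _Q_LOCKED_VAL;
            } while (!atomic_compare_exchange_weak_explicit(&lock->val, &val,
                            new, memory_order_acquire, memory_order_relaxed));
    }

    /* Publish a new tail code word and return the previous one. */
    static uint32_t xchg_tail(struct qspinlock *lock, uint32_t tail)
    {
            uint32_t val = atomic_load_explicit(&lock->val, memory_order_relaxed);
            uint32_t new;

            do {
                    new = (val & ~_Q_TAIL_MASK) | tail;
            } while (!atomic_compare_exchange_weak_explicit(&lock->val, &val,
                            new, memory_order_acq_rel, memory_order_relaxed));

            return val & _Q_TAIL_MASK;
    }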
* locking/qspinlock: Add pending bitPeter Zijlstra (Intel)2015-05-082-24/+107
Because the qspinlock needs to touch a second cacheline (the per-cpu mcs_nodes[]), add a pending bit and allow a single in-word spinner before we punt to the second cacheline.

It is possible to observe the pending bit without the locked bit when the last owner has just released but the pending owner has not yet taken ownership. In this case we would normally queue -- because the pending bit is already taken. However, in this case the pending bit is guaranteed to be released 'soon', therefore wait for it and avoid queueing.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
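The pending-bit fast path amounts to the following stand-alone C11 sketch; bit positions are illustrative, and the "pending set but locked clear" wait described above is left out for brevity.

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define _Q_LOCKED_VAL    (1u << 0)
    #define _Q_PENDING_VAL   (1u << 8)      /* illustrative bit position */

    struct qspinlock { _Atomic uint32_t val; };

    /*
     * Single in-word spinner: try to take the pending bit instead of
     * queueing on the per-cpu mcs_nodes[] array, which would pull in a
     * second cacheline.  Returns false if we must queue after all.
     */
    static bool trylock_pending(struct qspinlock *lock)
    {
            uint32_t val = atomic_load_explicit(&lock->val, memory_order_relaxed);

            for (;;) {
                    /* pending or tail already set: give up and go queue */
                    if (val & ~_Q_LOCKED_VAL)
                            return false;

                    if (atomic_compare_exchange_weak_explicit(&lock->val, &val,
                                    val | _Q_PENDING_VAL,
                                    memory_order_acquire, memory_order_relaxed))
                            break;
            }

            /* We own pending: wait for the current owner to drop the lock. */
            while (atomic_load_explicit(&lock->val, memory_order_acquire) &
                   _Q_LOCKED_VAL)
                    ;                       /* cpu_relax() in the kernel */

            /*
             * Take the lock and clear pending, leaving intact any tail that
             * waiters queueing behind us may have set in the meantime.
             */
            val = atomic_load_explicit(&lock->val, memory_order_relaxed);
            while (!atomic_compare_exchange_weak_explicit(&lock->val, &val,
                            (val & ~_Q_PENDING_VAL) | _Q_LOCKED_VAL,
                            memory_order_acquire, memory_order_relaxed))
                    ;
            return true;
    }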
* locking/qspinlock, x86: Enable x86-64 to use queued spinlocksWaiman Long2015-05-084-0/+30
This patch makes the necessary changes at the x86 architecture-specific layer to enable the use of queued spinlocks for x86-64. As x86-32 machines are typically not multi-socket, the benefit of queued spinlocks may not be apparent, so they are not enabled there.

Currently, there are some incompatibilities between the para-virtualized spinlock code (which hard-codes the use of the ticket spinlock) and the queued spinlocks. Therefore, the use of queued spinlocks is disabled when the para-virtualized spinlock is enabled.

The arch/x86/include/asm/qspinlock.h header file includes some x86-specific optimizations which make the queued spinlock code perform better than the generic implementation.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/qspinlock: Introduce a simple generic 4-byte queued spinlockWaiman Long2015-05-086-0/+408
This patch introduces a new generic queued spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queued spinlock should be almost as fair. It has about the same speed in the single-threaded case and can be much faster in high-contention situations, especially when the spinlock is embedded within the data structure to be protected. Only in light to moderate contention, where the average queue depth is around 1-3, will this queued spinlock potentially be a bit slower due to the higher slowpath overhead.

This queued spinlock is especially suited to NUMA machines with a large number of cores, as the chance of spinlock contention is much higher in those machines. The cost of contention is also higher because of slower inter-node memory traffic.

Due to the fact that spinlocks are acquired with preemption disabled, the process will not be migrated to another CPU while it is trying to get a spinlock. Ignoring interrupt handling, a CPU can only be contending on one spinlock at any one time. Counting soft IRQ, hard IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting activities. By allocating a set of per-cpu queue nodes and using them to form a waiting queue, we can encode the queue node address into a much smaller 24-bit size (including CPU number and queue node index), leaving one byte for the lock.

Please note that the queue node is only needed when waiting for the lock. Once the lock is acquired, the queue node can be released to be used later.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
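The 24-bit tail encoding described above packs (CPU number + 1, per-CPU node index) into the upper bits of the 32-bit lock word, the +1 making a zero tail mean "no waiter". A small stand-alone sketch, with an arbitrary CPU count and field offsets picked for the example:

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_NODES        4      /* task, softirq, hardirq, NMI */
    #define _Q_TAIL_OFFSET   8      /* low byte is the lock itself */
    #define _Q_TAIL_IDX_BITS 2      /* log2(MAX_NODES)             */

    struct mcs_node {
            struct mcs_node *next;
            int locked;
            int count;
    };

    /* Per-CPU node pool: a CPU can wait on at most MAX_NODES locks at
     * once (task + softirq + hardirq + NMI), so 4 nodes per CPU suffice. */
    static struct mcs_node mcs_nodes[64][MAX_NODES];    /* 64 CPUs, illustrative */

    /* Pack (cpu + 1, per-cpu node index) into the upper 24 bits. */
    static uint32_t encode_tail(int cpu, int idx)
    {
            return ((uint32_t)(cpu + 1) << (_Q_TAIL_OFFSET + _Q_TAIL_IDX_BITS)) |
                   ((uint32_t)idx       <<  _Q_TAIL_OFFSET);
    }

    /* Recover the queue node from a non-zero tail. */
    static struct mcs_node *decode_tail(uint32_t tail)
    {
            int idx = (int)(tail >> _Q_TAIL_OFFSET) & (MAX_NODES - 1);
            int cpu = (int)(tail >> (_Q_TAIL_OFFSET + _Q_TAIL_IDX_BITS)) - 1;

            return &mcs_nodes[cpu][idx];
    }

    int main(void)
    {
            uint32_t tail = encode_tail(5, 2);

            printf("tail=0x%x node=%p\n", tail, (void *)decode_tail(tail));
            return 0;
    }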
* kernel: Replace reference to ASSIGN_ONCE() with WRITE_ONCE() in commentPreeti U Murthy2015-05-081-1/+1
| | | | | | | | | | | | | | | | | | | Looks like commit : 43239cbe79fc ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)") left behind a reference to ASSIGN_ONCE(). Update this to WRITE_ONCE(). Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: borntraeger@de.ibm.com Cc: dave@stgolabs.net Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20150430115721.22278.94082.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write()Waiman Long2015-05-082-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In up_write()/up_read(), rwsem_wake() will be called whenever it detects that some writers/readers are waiting. The rwsem_wake() function will take the wait_lock and call __rwsem_do_wake() to do the real wakeup. For a heavily contended rwsem, doing a spin_lock() on wait_lock will cause further contention on the heavily contended rwsem cacheline resulting in delay in the completion of the up_read/up_write operations. This patch makes the wait_lock taking and the call to __rwsem_do_wake() optional if at least one spinning writer is present. The spinning writer will be able to take the rwsem and call rwsem_wake() later when it calls up_write(). With the presence of a spinning writer, rwsem_wake() will now try to acquire the lock using trylock. If that fails, it will just quit. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Jason Low <jason.low2@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
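The shape of the optimization is roughly the following pthread-based sketch; the spinning_writer field is a stand-in for the kernel's spinner check, and everything else is schematic rather than the actual rwsem code.

    #include <pthread.h>
    #include <stdatomic.h>

    struct rwsem {
            pthread_spinlock_t wait_lock;
            _Atomic(void *)    spinning_writer;  /* stand-in for the spinner check */
            /* ... count, wait list ... */
    };

    static void do_wake(struct rwsem *sem)
    {
            (void)sem;      /* wake the waiters queued on the wait list */
    }

    /* Called from up_read()/up_write() when waiters were observed. */
    static void rwsem_wake(struct rwsem *sem)
    {
            /*
             * A spinning writer will acquire the semaphore shortly and issue
             * the wakeup from its own up_write(); fighting for wait_lock here
             * would only add traffic on an already hot cacheline.  So with a
             * spinner present, only a trylock is attempted.
             */
            if (atomic_load_explicit(&sem->spinning_writer, memory_order_acquire)) {
                    if (pthread_spin_trylock(&sem->wait_lock) != 0)
                            return;         /* leave the wakeup to the spinner */
            } else {
                    pthread_spin_lock(&sem->wait_lock);
            }

            do_wake(sem);
            pthread_spin_unlock(&sem->wait_lock);
    }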
* Merge tag 'pm+acpi-4.1-rc3' of ↵Linus Torvalds2015-05-086-7/+51
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These include three regression fixes (PCI resources management, ACPI/PNP device enumeration, ACPI SBS on MacBook) and two ACPI documentation fixes related to GPIO. Specifics: - Fix for a PCI resources management regression introduced during the 4.0 cycle and related to the handling of ACPI resources' Producer/Consumer flags that turn out to be useless (Jiang Liu) - Fix for a MacBook regression related to the Smart Battery Subsystem (SBS) driver causing various problems (stalls on boot, failure to detect or report battery) to happen and introduced during the 3.18 cycle (Chris Bainbridge) - Fix for an ACPI/PNP device enumeration regression introduced during the 3.16 cycle caused by failing to include two PNP device IDs into the list of IDs that PNP device objects need to be created for (Witold Szczeponik) - Fixes for two minor mistakes in the ACPI GPIO properties documentation (Antonio Ospite, Rafael J Wysocki)" * tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PNP: add two IDs to list for PNPACPI device enumeration ACPI / documentation: Fix ambiguity in the GPIO properties document ACPI / documentation: fix a sentence about GPIO resources ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus
| *---. Merge branches 'acpi-resources', 'acpi-battery', 'acpi-doc' and 'acpi-pnp'Rafael J. Wysocki2015-05-07230-1649/+2700
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpi-resources: x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus * acpi-battery: ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook * acpi-doc: ACPI / documentation: Fix ambiguity in the GPIO properties document ACPI / documentation: fix a sentence about GPIO resources * acpi-pnp: ACPI / PNP: add two IDs to list for PNPACPI device enumeration
| | | | * ACPI / PNP: add two IDs to list for PNPACPI device enumerationWitold Szczeponik2015-05-041-0/+2
Commit eec15edbb0e1 (ACPI / PNP: use device ID list for PNPACPI device enumeration) changed the way ACPI devices are enumerated and when they are added to the PNP bus. However, it broke the sound card support on (at least) a vintage IBM ThinkPad 600E: with said commit applied, two of the necessary "CSC01xx" devices are not added to the PNP bus and hence cannot be found during the initialization of the "snd-cs4236" module. As a consequence, loading "snd-cs4236" causes null pointer exceptions.

This patch fixes the problem and re-enables sound on the IBM ThinkPad 600E.

Fixes: eec15edbb0e1 (ACPI / PNP: use device ID list for PNPACPI device enumeration)
Signed-off-by: Witold Szczeponik <Witold.Szczeponik@gmx.net>
Cc: 3.16+ <stable@vger.kernel.org> # 3.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | ACPI / documentation: Fix ambiguity in the GPIO properties documentRafael J. Wysocki2015-05-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first paragraph in Documentation/acpi/gpio-properties.txt is ambiguous, so make it more clear. Reported-by: Antonio Ospite <ao2@ao2.it> Acked-by: Antonio Ospite <ao2@ao2.it> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | ACPI / documentation: fix a sentence about GPIO resourcesAntonio Ospite2015-04-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sentence "These resources are used be used to pass ..." contains a suspicious repetition, likely the author meant "These resources can be used to pass ...". Simplify the wording. Signed-off-by: Antonio Ospite <ao2@ao2.it> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBookChris Bainbridge2015-04-301-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7bc5a2bad0b8 'ACPI: Support _OSI("Darwin") correctly' caused the MacBook firmware to expose the SBS, resulting in intermittent hangs of several minutes on boot, and failure to detect or report the battery. Fix this by adding a 5 us delay to the start of each SMBUS transaction. This timing is the result of experimentation - hangs were observed with 3 us but never with 5 us. Fixes: 7bc5a2bad0b8 'ACPI: Support _OSI("Darwin") correctly' Link: https://bugzilla.kernel.org/show_bug.cgi?id=94651 Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Cc: 3.18+ <stable@vger.kernel.org> # 3.18+ [ rjw: Subject and changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI busJiang Liu2015-04-302-3/+23
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An IO port or MMIO resource assigned to a PCI host bridge may be consumed by the host bridge itself or available to its child bus/devices. The ACPI specification defines a bit (Producer/Consumer) to tell whether the resource is consumed by the host bridge itself, but firmware hasn't used that bit consistently, so we can't rely on it. Before commit 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation"), arch/x86/pci/acpi.c ignored all IO port resources defined by acpi_resource_io and acpi_resource_fixed_io to filter out IO ports consumed by the host bridge itself. Commit 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation") started accepting all IO port and MMIO resources, which caused a regression that IO port resources consumed by the host bridge itself became available to its child devices. Then commit 63f1789ec716 ("x86/PCI/ACPI: Ignore resources consumed by host bridge itself") ignored resources consumed by the host bridge itself by checking the IORESOURCE_WINDOW flag, which accidently removed MMIO resources defined by acpi_resource_memory24, acpi_resource_memory32 and acpi_resource_fixed_memory32. On x86 and IA64 platforms, all IO port and MMIO resources are assumed to be available to child bus/devices except one special case: IO port [0xCF8-0xCFF] is consumed by the host bridge itself to access PCI configuration space. So explicitly filter out PCI CFG IO ports[0xCF8-0xCFF]. This solution will also ease the way to consolidate ACPI PCI host bridge common code from x86, ia64 and ARM64. Related ACPI table are archived at: https://bugzilla.kernel.org/show_bug.cgi?id=94221 Related discussions at: http://patchwork.ozlabs.org/patch/461633/ https://lkml.org/lkml/2015/3/29/304 Fixes: 63f1789ec716 (Ignore resources consumed by host bridge itself) Reported-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: 4.0+ <stable@vger.kernel.org> # 4.0+ Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | Merge tag 'for-f2fs-4.1-rc3' of ↵Linus Torvalds2015-05-074-5/+12
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "Fix a performance regression and a bug" * tag 'for-f2fs-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: fix wrong error hanlder in f2fs_follow_link Revert "f2fs: enhance multi-threads performance"
| * | | | f2fs: fix wrong error hanlder in f2fs_follow_linkJaegeuk Kim2015-05-041-5/+3
The page_follow_link_light() call returns NULL, but its error pointer remained in nd->path.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | | | Revert "f2fs: enhance multi-threads performance"Jaegeuk Kim2015-05-043-0/+9
Yuanhan Liu reported a performance regression with this patch. The basic idea was to reduce a one-point mutex, but it turns out this causes other contention, such as context switches.

  https://lkml.org/lkml/2015/4/21/11

Until the analysis of this issue is finished, I'd like to revert this for a while.

This reverts commit 78373b7319abdf15050af5b1632c4c8b8b398f33.
* | | | Merge tag 'pinctrl-v4.1-3' of ↵Linus Torvalds2015-05-077-13/+16
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Here is a smallish set of pin control fixes for the v4.1 cycle, collected over the last two weeks:

  - fix a real nasty legacy bug that has screwed up the protection of adding pinctrl maps dynamically. Normally this didn't happen so much, but Doug Anderson ran into it and fixed it, kudos!

  - minor driver fixes for Qualcomm spmi, mediatek and Marvell drivers"

* tag 'pinctrl-v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: Don't just pretend to protect pinctrl_maps, do it for real
  pinctrl: mediatek: mtk-common: initialize unmask
  pinctrl: qcom-spmi-mpp: Fix input value report
  pinctrl: qcom-spmi: Fix pin direction configuration
  pinctrl: mvebu: Fix mapping of pin 63 (gpo -> gpio)
| * | | | pinctrl: Don't just pretend to protect pinctrl_maps, do it for realDoug Anderson2015-05-063-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Way back, when the world was a simpler place and there was no war, no evil, and no kernel bugs, there was just a single pinctrl lock. That was how the world was when (57291ce pinctrl: core device tree mapping table parsing support) was written. In that case, there were instances where the pinctrl mutex was already held when pinctrl_register_map() was called, hence a "locked" parameter was passed to the function to indicate that the mutex was already locked (so we shouldn't lock it again). A few years ago in (42fed7b pinctrl: move subsystem mutex to pinctrl_dev struct), we switched to a separate pinctrl_maps_mutex. ...but (oops) we forgot to re-think about the whole "locked" parameter for pinctrl_register_map(). Basically the "locked" parameter appears to still refer to whether the bigger pinctrl_dev mutex is locked, but we're using it to skip locks of our (now separate) pinctrl_maps_mutex. That's kind of a bad thing(TM). Probably nobody noticed because most of the calls to pinctrl_register_map happen at boot time and we've got synchronous device probing. ...and even cases where we're asynchronous don't end up actually hitting the race too often. ...but after banging my head against the wall for a bug that reproduced 1 out of 1000 reboots and lots of looking through kgdb, I finally noticed this. Anyway, we can now safely remove the "locked" parameter and go back to a war-free, evil-free, and kernel-bug-free world. Fixes: 42fed7ba44e4 ("pinctrl: move subsystem mutex to pinctrl_dev struct") Signed-off-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * | | | pinctrl: mediatek: mtk-common: initialize unmaskColin Ian King2015-05-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cppcheck detected an uninitialized variable: [drivers/pinctrl/mediatek/pinctrl-mtk-common.c:897]: (error) Uninitialized variable: unmask unmask should be initialized to zero to ensure unmasking only occurs if a previous mask occurred. The current situation is that the unmask variable could contain any random garbage causing random unexpected unmasking. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * | | | pinctrl: qcom-spmi-mpp: Fix input value reportIvan T. Ivanov2015-04-281-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix interpretation of the pmic_mpp_read() return code, negative value means an error. Signed-off-by: Ivan T. Ivanov <ivan.ivanov@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * | | | pinctrl: qcom-spmi: Fix pin direction configurationIvan T. Ivanov2015-04-272-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pin direction configuration was incorrectly overwritten by output and function values in set_mux(). Fix this. Signed-off-by: Ivan T. Ivanov <ivan.ivanov@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * | | | pinctrl: mvebu: Fix mapping of pin 63 (gpo -> gpio)Andrew Andrianov2015-04-271-1/+1
| | |/ / | |/| | | | | | | | | | | | | | Signed-off-by: Andrew Andrianov <andrew@ncrmnt.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
* | | | Merge tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfioLinus Torvalds2015-05-072-4/+25
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull vfio fixes from Alex Williamson: "Fix some undesirable behavior with the vfio device request interface: - increase verbosity of device request channel (Alex Williamson) - fix runaway interruptible timeout (Alex Williamson)" * tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfio: vfio: Fix runaway interruptible timeout vfio-pci: Log device requests more verbosely
| * | | | vfio: Fix runaway interruptible timeoutAlex Williamson2015-05-021-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 13060b64b819 ("vfio: Add and use device request op for vfio bus drivers") incorrectly makes use of an interruptible timeout. When interrupted, the signal remains pending resulting in subsequent timeouts occurring instantly. This makes the loop spin at a much higher rate than intended. Instead of making this completely non-interruptible, we can change this into a sort of interruptible-once behavior and use the "once" to log debug information. The driver API doesn't allow us to abort and return an error code. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Fixes: 13060b64b819 Cc: stable@vger.kernel.org # v4.0
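The failure mode and the fix follow the pattern below. This is a toy user-space model in which a "pending signal" makes the interruptible wait return immediately without sleeping, so the loop switches to an uninterruptible wait (and logs once) after the first interruption; the kernel primitives are of course different.

    #include <stdbool.h>
    #include <stdio.h>

    static bool signal_pending_flag = true;   /* assume the user hit Ctrl-C once */
    static int  device_users = 3;

    /* Toy model of the bug: an "interruptible" timed wait returns
     * immediately, without sleeping, while a signal stays pending. */
    static long wait_interruptible_timeout(long timeout)
    {
            (void)timeout;
            if (signal_pending_flag)
                    return -1;                /* -ERESTARTSYS: no time elapsed */
            device_users--;                   /* pretend the holder made progress */
            return 0;
    }

    static long wait_uninterruptible_timeout(long timeout)
    {
            (void)timeout;
            device_users--;                   /* always really sleeps */
            return 0;
    }

    int main(void)
    {
            bool interrupted = false;
            long attempts = 0;

            while (device_users > 0) {
                    long ret = interrupted ? wait_uninterruptible_timeout(10)
                                           : wait_interruptible_timeout(10);

                    if (ret < 0 && !interrupted) {
                            interrupted = true;   /* log once, then block for real */
                            fprintf(stderr, "device still held, waiting...\n");
                    }
                    attempts++;
            }
            printf("released after %ld wait attempts\n", attempts);
            return 0;
    }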
| * | | | vfio-pci: Log device requests more verboselyAlex Williamson2015-05-011-1/+7
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Log some clues indicating whether the user is receiving device request interfaces or not listening. This can help indicate why a driver unbind is blocked or explain why QEMU automatically unplugged a device from the VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | | | Merge tag 'for-linus' of git://github.com/dledford/linuxLinus Torvalds2015-05-0731-219/+568
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull infiniband updates from Doug Ledford: "Minor updates for 4.1-rc Most of the changes are fairly small and well confined. The iWARP address reporting changes are the only ones that are a medium size. I had these queued up prior to rc1, but due to the shuffle in maintainers, they did not get submitted when I expected. My apologies for that. I feel comfortable with them however due to the testing they've received, so I left them in this submission" * tag 'for-linus' of git://github.com/dledford/linux: MAINTAINERS: Update InfiniBand subsystem maintainer MAINTAINERS: add include/rdma/ to InfiniBand subsystem IPoIB/CM: Fix indentation level iw_cxgb4: Remove negative advice dmesg warnings IB/core: Fix unaligned accesses IB/core: change rdma_gid2ip into void function as it always return zero IB/qib: use arch_phys_wc_add() IB/qib: add acounting for MTRR IB/core: dma unmap optimizations IB/core: dma map/unmap locking optimizations RDMA/cxgb4: Report the actual address of the remote connecting peer RDMA/nes: Report the actual address of the remote connecting peer RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients iw_cxgb4: enforce qp/cq id requirements iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQs iw_cxgb4: 32b platform fixes iw_cxgb4: Cleanup register defines/MACROS RDMA/CMA: Canonize IPv4 on IPV6 sockets properly
| * | | | MAINTAINERS: Update InfiniBand subsystem maintainerDoug Ledford2015-05-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since Roland stepped down, the community asked me to take his place, and the nomination was followed by sufficient votes and no dissensions that we can move forward with the change. Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | MAINTAINERS: add include/rdma/ to InfiniBand subsystemYann Droneaud2015-05-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most headers for InfiniBand/RDMA are located under include/rdma/ and include/uapi/rdma. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IPoIB/CM: Fix indentation levelBart Van Assche2015-05-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | See also patch "IPoIB/cm: Add connected mode support for devices without SRQs" (commit ID 68e995a29572). Detected by smatch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | iw_cxgb4: Remove negative advice dmesg warningsHariprasad S2015-05-053-9/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove these log messages in favor of per-endpoint counters as well as device-global counters that can be inspected via debugfs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/core: Fix unaligned accessesDavid Ahern2015-05-053-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Addresses the following kernel logs seen during boot of sparc systems: Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Signed-off-by: David Ahern <david.ahern@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/core: change rdma_gid2ip into void function as it always return zeroHonggang LI2015-05-052-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Honggang Li <honli@redhat.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/qib: use arch_phys_wc_add()Luis R. Rodriguez2015-05-057-79/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This driver already makes use of ioremap_wc() on PIO buffers, so convert it to use arch_phys_wc_add(). The qib driver uses a mmap() special case for when PAT is not used, this behaviour used to be determined with a module parameter but since we have been asked to just remove that module parameter this checks for the WC cookie, if not set we can assume PAT was used. If its set we do what we used to do for the mmap for when MTRR was enabled. The removal of the module parameter is OK given that Andy notes that even if users of module parameter are still around it will not prevent loading of the module on recent kernels. Cc: Doug Ledford <dledford@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Roland Dreier <roland@purestorage.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: infinipath@intel.com Cc: linux-rdma@vger.kernel.org Cc: linux-fbdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xensource.com Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/qib: add acounting for MTRRLuis R. Rodriguez2015-05-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no good reason not to, we eventually delete it as well. Cc: Toshi Kani <toshi.kani@hp.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Airlie <airlied@redhat.com> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Mike Marciniszyn <infinipath@intel.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Cc: linux-fbdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/core: dma unmap optimizationsGuy Shapiro2015-05-051-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While unmapping an ODP writable page, the dirty bit of the page is set. In order to do so, the head of the compound page is found. Currently, the compound head is found even on non-writable pages, where it is never used, leading to unnecessary cpu barrier that impacts performance. This patch moves the search for the compound head to be done only when needed. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Acked-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/core: dma map/unmap locking optimizationsGuy Shapiro2015-05-051-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, while mapping or unmapping pages for ODP, the umem mutex is locked and unlocked once for each page. Such lock/unlock operation take few tens to hundreds of nsecs. This makes a significant impact when mapping or unmapping few MBs of memory. To avoid this, the mutex should be locked only once per operation, and not per page. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Acked-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
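The change boils down to hoisting the lock out of the per-page loop, as in this generic pthread sketch (structure and names are illustrative, not the ODP code):

    #include <pthread.h>
    #include <stddef.h>

    struct umem {
            pthread_mutex_t lock;
            void          **pages;
            size_t          npages;
    };

    static void unmap_one(struct umem *u, size_t i)
    {
            u->pages[i] = NULL;               /* stand-in for the real unmap */
    }

    /* Before: one lock/unlock round-trip per page, tens to hundreds of ns
     * each, which dominates when unmapping a few MB worth of pages. */
    static void unmap_range_slow(struct umem *u, size_t start, size_t end)
    {
            for (size_t i = start; i < end; i++) {
                    pthread_mutex_lock(&u->lock);
                    unmap_one(u, i);
                    pthread_mutex_unlock(&u->lock);
            }
    }

    /* After: one lock round-trip for the whole operation. */
    static void unmap_range_fast(struct umem *u, size_t start, size_t end)
    {
            pthread_mutex_lock(&u->lock);
            for (size_t i = start; i < end; i++)
                    unmap_one(u, i);
            pthread_mutex_unlock(&u->lock);
    }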
| * | | | RDMA/cxgb4: Report the actual address of the remote connecting peerSteve Wise2015-05-052-4/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get the actual (non-mapped) ip/tcp address of the connecting peer from the port mapper Also setup the passive side endpoint to correctly display the actual and mapped addresses for the new connection. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | RDMA/nes: Report the actual address of the remote connecting peerTatyana Nikolova2015-05-052-17/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get the actual (non-mapped) ip/tcp address of the connecting peer from the port mapper and report the address info to the user space application at the time of connection establishment Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the ↵Tatyana Nikolova2015-05-055-34/+288
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | connecting peer to its clients Add functionality to enable the port mapper on the passive side to provide to its clients the actual (non-mapped) ip/tcp address information of the connecting peer 1) Adding remote_info_cb() to process the address info of the connecting peer The address info is provided by the user space port mapper service when the connection is initiated by the peer 2) Adding a hash list to store the remote address info 3) Adding functionality to add/remove the remote address info After the info has been provided to the port mapper client, it is removed from the hash list Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | iw_cxgb4: enforce qp/cq id requirementsHariprasad S2015-05-051-0/+23
Currently the iw_cxgb4 implementation requires the qp and cq qid densities to match, as well as the qp and cq id ranges. So fail a device open if the device configuration doesn't meet the requirements.

The reason for these restrictions has to do with the fact that IQ qid X has a UGTS register in the same bar2 page as EQ qid X. Thus both qids need to be allocated to the same user process for security reasons. The logic that does this (the qpid allocator in iw_cxgb4/resource.c) handles this but requires the above restrictions.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQsHariprasad S2015-05-052-7/+15
For T5, we must not use the kdb/kgts registers, in order to avoid db drops under extreme loads.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | iw_cxgb4: 32b platform fixesHariprasad S2015-05-055-16/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - get_dma_mr() was using ~0UL which is should be ~0ULL. This causes the DMA MR to get setup incorrectly in hardware. - wr_log_show() needed a 64b divide function div64_u64() instead of doing division directly. - fixed warnings about recasting a pointer to a u64 Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | iw_cxgb4: Cleanup register defines/MACROSHariprasad S2015-05-052-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cleanup macros and register defines for consistency Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | RDMA/CMA: Canonize IPv4 on IPV6 sockets properlyJason Gunthorpe2015-05-051-10/+17
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When accepting a new IPv4 connect to an IPv6 socket, the CMA tries to canonize the address family to IPv4, but does not properly process the listening sockaddr to get the listening port, and does not properly set the address family of the canonized sockaddr. Fixes: e51060f08a61 ("IB: IP address based RDMA connection manager") Cc: <stable@vger.kernel.org> Reported-By: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>