summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'locking-core-for-linus' of ↵Linus Torvalds2016-12-12116-1523/+674
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The tree got pretty big in this development cycle, but the net effect is pretty good: 115 files changed, 673 insertions(+), 1522 deletions(-) The main changes were: - Rework and generalize the mutex code to remove per arch mutex primitives. (Peter Zijlstra) - Add vCPU preemption support: add an interface to query the preemption status of vCPUs and use it in locking primitives - this optimizes paravirt performance. (Pan Xinhui, Juergen Gross, Christian Borntraeger) - Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to clean up and improve the s390 lock yielding machinery and its core kernel impact. (Christian Borntraeger) - Micro-optimize mutexes some more. (Waiman Long) - Reluctantly add the to-be-deprecated mutex_trylock_recursive() interface on a temporary basis, to give the DRM code more time to get rid of its locking hacks. Any other users will be NAK-ed on sight. (We turned off the deprecation warning for the time being to not pollute the build log.) (Peter Zijlstra) - Improve the rtmutex code a bit, in light of recent long lived bugs/races. (Thomas Gleixner) - Misc fixes, cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/paravirt: Fix bool return type for PVOP_CALL() x86/paravirt: Fix native_patch() locking/ww_mutex: Use relaxed atomics locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock() Documentation/virtual/kvm: Support the vCPU preemption check x86/xen: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check kvm: Introduce kvm_write_guest_offset_cached() locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests locking/spinlocks, s390: Implement vcpu_is_preempted(cpu) locking/core, powerpc: Implement vcpu_is_preempted(cpu) sched/core: Introduce the vcpu_is_preempted(cpu) interface sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q locking/core: Provide common cpu_relax_yield() definition locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily ...
| * x86/paravirt: Fix bool return type for PVOP_CALL()Peter Zijlstra2016-12-111-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") introduced a paravirt op with bool return type [*] It turns out that the PVOP_CALL*() macros miscompile when rettype is bool. Code that looked like: 83 ef 01 sub $0x1,%edi ff 15 32 a0 d8 00 callq *0xd8a032(%rip) # ffffffff81e28120 <pv_lock_ops+0x20> 84 c0 test %al,%al ended up looking like so after PVOP_CALL1() was applied: 83 ef 01 sub $0x1,%edi 48 63 ff movslq %edi,%rdi ff 14 25 20 81 e2 81 callq *0xffffffff81e28120 48 85 c0 test %rax,%rax Note how it tests the whole of %rax, even though a typical bool return function only sets %al, like: 0f 95 c0 setne %al c3 retq This is because ____PVOP_CALL() does: __ret = (rettype)__eax; and while regular integer type casts truncate the result, a cast to bool tests for any !0 value. Fix this by explicitly truncating to sizeof(rettype) before casting. [*] The actual bug should've been exposed in commit: 446f3dc8cc0a ("locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests") but that didn't properly implement the paravirt call. Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") Link: http://lkml.kernel.org/r/20161208154349.346057680@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/paravirt: Fix native_patch()Peter Zijlstra2016-12-112-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While chasing a regression I noticed we potentially patch the wrong code in native_patch(). If we do not select the native code sequence, we must use the default patcher, not fall-through the switch case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel test robot <xiaolong.ye@intel.com> Fixes: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") Link: http://lkml.kernel.org/r/20161208154349.270616999@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar2016-12-11446-1540/+3435
| |\ | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | locking/ww_mutex: Use relaxed atomicsPeter Zijlstra2016-12-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The stamp is a sequence number, we don't care about memory ordering. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | locking/rtmutex: Explain locking rules for ↵Thomas Gleixner2016-12-021-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rt_mutex_proxy_unlock()/init_proxy_locked() While debugging the unlock vs. dequeue race which resulted in state corruption of futexes the lockless nature of rt_mutex_proxy_unlock() caused some confusion. Add commentry to explain why it is safe to do this lockless. Add matching comments to rt_mutex_init_proxy_locked() for completeness sake. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20161130210030.591941927@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALLThomas Gleixner2016-12-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a left over from the original rtmutex implementation which used both bit0 and bit1 in the owner pointer. Commit: 8161239a8bcc ("rtmutex: Simplify PI algorithm and make highest prio task get lock") ... removed the usage of bit1, but kept the extra mask around. This is confusing at best. Remove it and just use RT_MUTEX_HAS_WAITERS for the masking. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20161130210030.509567906@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | Merge branch 'locking/urgent' into locking/core, to pick up dependent fixesIngo Molnar2016-12-0238-150/+367
| |\ \ | | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()Peter Zijlstra2016-11-229-27/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid the pointless function call to pv_lock_ops.vcpu_is_preempted() when a paravirt spinlock enabled kernel is ran on native hardware. Do this by patching out the CALL instruction with "XOR %RAX,%RAX" which has the same effect (0 return value). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/mutex: Break out of expensive busy-loop on ↵Pan Xinhui2016-11-222-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted An over-committed guest with more vCPUs than pCPUs has a heavy overload in the two spin_on_owner. This blames on the lock holder preemption issue. Break out of the loop if the vCPU is preempted: if vcpu_is_preempted(cpu) is true. test-case: perf record -a perf bench sched messaging -g 400 -p && perf report before patch: 20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner 8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock 4.12% sched-messaging [kernel.vmlinux] [k] system_call 3.01% sched-messaging [kernel.vmlinux] [k] system_call_common 2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7 2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner 2.00% sched-messaging [kernel.vmlinux] [k] osq_lock after patch: 9.99% sched-messaging [kernel.vmlinux] [k] mutex_unlock 5.28% sched-messaging [unknown] [H] 0xc0000000000768e0 4.27% sched-messaging [kernel.vmlinux] [k] __copy_tofrom_user_power7 3.77% sched-messaging [kernel.vmlinux] [k] copypage_power7 3.24% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq 3.02% sched-messaging [kernel.vmlinux] [k] system_call 2.69% sched-messaging [kernel.vmlinux] [k] wait_consider_task Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-4-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU ↵Pan Xinhui2016-11-221-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in osq_lock() An over-committed guest with more vCPUs than pCPUs has a heavy overload in osq_lock(). This is because if vCPU-A holds the osq lock and yields out, vCPU-B ends up waiting for per_cpu node->locked to be set. IOW, vCPU-B waits for vCPU-A to run and unlock the osq lock. Use the new vcpu_is_preempted(cpu) interface to detect if a vCPU is currently running or not, and break out of the spin-loop if so. test case: $ perf record -a perf bench sched messaging -g 400 -p && perf report before patch: 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is 2.49% sched-messaging [kernel.vmlinux] [k] system_call after patch: 20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner 8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock 4.12% sched-messaging [kernel.vmlinux] [k] system_call 3.01% sched-messaging [kernel.vmlinux] [k] system_call_common 2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7 2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner 2.00% sched-messaging [kernel.vmlinux] [k] osq_lock Suggested-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-3-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Translated to English. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Documentation/virtual/kvm: Support the vCPU preemption checkPan Xinhui2016-11-221-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ("x86/kvm: support vCPU preemption check") added a new struct kvm_steal_time::preempted field. This field tells us if a vCPU is running or not. It is zero if some old KVM does not support this field or if the vCPU is not preempted. Other values means the vCPU has been preempted. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Radim Krčmář <rkrcmar@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-12-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Various typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86/xen: Support the vCPU preemption checkJuergen Gross2016-11-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support the vcpu_is_preempted() functionality under Xen. This will enhance lock performance on overcommitted hosts (more runnable vCPUs than physical CPUs in the system) as doing busy waits for preempted vCPUs will hurt system performance far worse than early yielding. A quick test (4 vCPUs on 1 physical CPU doing a parallel build job with "make -j 8") reduced system time by about 5% with this patch. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-11-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86/kvm: Support the vCPU preemption checkPan Xinhui2016-11-221-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support the vcpu_is_preempted() functionality under KVM. This will enhance lock performance on overcommitted hosts (more runnable vCPUs than physical CPUs in the system) as doing busy waits for preempted vCPUs will hurt system performance far worse than early yielding. struct kvm_steal_time::preempted indicates that if one vCPU is running or not after commit "x86, kvm/x86.c: support vCPU preempted check". unix benchmark result: host: kernel 4.8.1, i5-4570, 4 cpus guest: kernel 4.8.1, 8 vcpus test-case after-patch before-patch Execl Throughput | 18307.9 lps | 11701.6 lps File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps File Copy 256 bufsize 500 maxblocks | 367555.6 KBps | 222867.7 KBps File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps Pipe Throughput | 11872208.7 lps | 11855628.9 lps Pipe-based Context Switching | 1495126.5 lps | 1490533.9 lps Process Creation | 29881.2 lps | 28572.8 lps Shell Scripts (1 concurrent) | 23224.3 lpm | 22607.4 lpm Shell Scripts (8 concurrent) | 3531.4 lpm | 3211.9 lpm System Call Overhead | 10385653.0 lps | 10419979.0 lps Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-10-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86/kvm: Support the vCPU preemption checkPan Xinhui2016-11-222-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support the vcpu_is_preempted() functionality under KVM. This will enhance lock performance on overcommitted hosts (more runnable vCPUs than physical CPUs in the system) as doing busy waits for preempted vCPUs will hurt system performance far worse than early yielding. Use struct kvm_steal_time::preempted to indicate that if a vCPU is running or not. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-9-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | kvm: Introduce kvm_write_guest_offset_cached()Pan Xinhui2016-11-222-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It allows us to update some status or field of a structure partially. We can also save a kvm_read_guest_cached() call if we just update one fild of the struct regardless of its current value. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-8-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen ↵Pan Xinhui2016-11-223-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | guests Optimize spinlock and mutex busy-loops by providing a vcpu_is_preempted(cpu) function on KVM and Xen platforms. Extend the pv_lock_ops interface accordingly and implement the callbacks on KVM and Xen. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Translated to English. ] Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-7-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/spinlocks, s390: Implement vcpu_is_preempted(cpu)Christian Borntraeger2016-11-223-19/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements the s390 version for vcpu_is_preempted(cpu), by reworking the existing smp_vcpu_scheduled() function into arch_vcpu_is_preempted(). We can then also get rid of the local cpu_is_preempted() function by moving the CIF_ENABLED_WAIT test into arch_vcpu_is_preempted(). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-6-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/core, powerpc: Implement vcpu_is_preempted(cpu)Pan Xinhui2016-11-221-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimize spinlock and mutex busy-loops by providing a vcpu_is_preempted(cpu) function on pSeries. We do not support PowerNV. All this can be achieved by using lppaca->yield_count, which is zero on PowerNV. Suggested-by: Boqun Feng <boqun.feng@gmail.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-5-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | sched/core: Introduce the vcpu_is_preempted(cpu) interfacePan Xinhui2016-11-221-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is the first step to add support to improve lock holder preemption beaviour. vcpu_is_preempted(cpu) does the obvious thing: it tells us whether a vCPU is preempted or not. Defaults to false on architectures that don't support it. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Translated the changelog to English. ] Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-2-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar2016-11-22575-3292/+6532
| |\ \ \ | | | | | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_QWaiman Long2016-11-217-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the wake_q data structure is defined by the WAKE_Q() macro. This macro, however, looks like a function doing something as "wake" is a verb. Even checkpatch.pl was confused as it reported warnings like WARNING: Missing a blank line after declarations #548: FILE: kernel/futex.c:3665: + int ret; + WAKE_Q(wake_q); This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q() which clarifies what the macro is doing and eliminates the checkpatch.pl warnings. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com [ Resolved conflict and added missing rename. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/core: Provide common cpu_relax_yield() definitionChristian Borntraeger2016-11-1733-38/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to duplicate the same define everywhere. Since the only user is stop-machine and the only provider is s390, we can use a default implementation of cpu_relax_yield() in sched.h. Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-s390 <linux-s390@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1479298985-191589-1-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarilyIngo Molnar2016-11-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until the DRM drivers are fixed to not use mutex_trylock_recursive(), allyes/modconfig builds will emit an API deprecation warning: drivers/gpu/drm/i915/i915_gem_shrinker.c: In function ‘i915_gem_shrinker_lock’: drivers/gpu/drm/i915/i915_gem_shrinker.c:230:2: warning: ‘mutex_trylock_recursive’ is deprecated [-Wdeprecated-declarations] switch (mutex_trylock_recursive(&dev->struct_mutex)) { ^ Don't pollute the kernel log until the DRM code is fixed. Hopefully the checkpatch warning is enough to keep people from using this new API, and we'll be NAK-ing new users as well. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Jason Low <jason.low2@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Clark <robdclark@gmail.com> Cc: Terry Rudd <terry.rudd@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <Will.Deacon@arm.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/core, arch: Remove cpu_relax_lowlatency()Christian Borntraeger2016-11-1632-33/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As there are no users left, we can remove cpu_relax_lowlatency() implementations from every architecture. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Cc: <linux-arch@vger.kernel.org> Link: http://lkml.kernel.org/r/1477386195-32736-6-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/core: Remove cpu_relax_lowlatency() usersChristian Borntraeger2016-11-168-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the s390 special case of a yielding cpu_relax() implementation gone, we can now remove all users of cpu_relax_lowlatency() and replace them with cpu_relax(). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/core, s390: Make cpu_relax() a barrier againChristian Borntraeger2016-11-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stop_machine() seemed to be the only important place for yielding during cpu_relax(). This was fixed by using cpu_relax_yield(). Therefore, we can now redefine cpu_relax() to be a barrier instead on s390, making s390 identical to all other architectures. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1477386195-32736-4-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/core, stop_machine: Yield the CPU during stop machine()Christian Borntraeger2016-11-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some time ago the following commit: 57f2ffe14fd125c2 ("s390: remove diag 44 calls from cpu_relax()") ... stopped cpu_relax() on s390 yielding to the hypervisor. As it turns out this made stop_machine() run really slow on virtualized overcommited systems. For example the kprobes test during bootup took several seconds instead of just running unnoticed with large guests. Therefore, yielding was reintroduced with commit: 4d92f50249eb ("s390: reintroduce diag 44 calls for cpu_relax()") ... but in fact the stop machine code seems to be the only place where this yielding was really necessary. This place is probably the most important one as it makes all but one guest CPUs wait for one guest CPU. As we now have cpu_relax_yield(), we can use this in multi_cpu_stop(). For now lets only add it here. We can add it later in other places when necessary. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1477386195-32736-3-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/core: Introduce cpu_relax_yield()Christian Borntraeger2016-11-1633-3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For spinning loops people do often use barrier() or cpu_relax(). For most architectures cpu_relax and barrier are the same, but on some architectures cpu_relax can add some latency. For example on power,sparc64 and arc, cpu_relax can shift the CPU towards other hardware threads in an SMT environment. On s390 cpu_relax does even more, it uses an hypercall to the hypervisor to give up the timeslice. In contrast to the SMT yielding this can result in larger latencies. In some places this latency is unwanted, so another variant "cpu_relax_lowlatency" was introduced. Before this is used in more and more places, lets revert the logic and provide a cpu_relax_yield that can be called in places where yielding is more important than latency. By default this is the same as cpu_relax on all architectures. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1477386195-32736-2-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/mutex, drm: Introduce mutex_trylock_recursive()Peter Zijlstra2016-11-154-7/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By popular DRM demand, introduce mutex_trylock_recursive() to fix up the two GEM users. Without this it is very easy for these drivers to get stuck in low-memory situations and trigger OOM. Work is in progress to remove the need for this in at least i915. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Jason Low <jason.low2@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Clark <robdclark@gmail.com> Cc: Terry Rudd <terry.rudd@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <Will.Deacon@arm.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/lockdep: Remove unused parameter from the add_lock_to_list() functionTahsin Erdogan2016-11-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'class' parameter is not used, remove it. n Signed-off-by: Tahsin Erdogan <tahsin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1478592127-4376-1-git-send-email-tahsin@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar2016-11-11746-5823/+8678
| |\ \ \ \ | | | | | | | | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | locking/drm: Fix i915_gem_shrinker_lock() lockingIngo Molnar2016-11-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mike Krinkin reported hangs in the DRM code and bisected it to: 3ab7c086d5ec72585ef0 ("locking/drm: Kill mutex trickery") Hugh Dickins observed: "i915_gem_shrinker_lock() is broken: but copy the pattern from msm_gem_shrinker_lock() and it's okay - patch below." Pick up the fix in isolation to make sure the bug is fixed, cleanup patch will follow up. Originally-From: Hugh Dickins <hughd@google.com> Reported-by: Hugh Dickins <hughd@google.com> Reported-by: Mike Krinkin <krinkin.m.u@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Airlie <airlied@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: chris@chris-wilson.co.uk Cc: jason.low2@hpe.com Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1610301645180.28429@eggly.anvils Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | locking/mutex: Enable optimistic spinning of woken waiterWaiman Long2016-10-251-23/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the waiter that sets the HANDOFF flag start spinning instead of sleeping until the handoff is complete or the owner sleeps. Otherwise, the handoff will cause the optimistic spinners to abort spinning as the handed-off owner may not be running. Tested-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <Will.Deacon@arm.com> Link: http://lkml.kernel.org/r/1472254509-27508-2-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | locking/mutex: Simplify some ww_mutex code in __mutex_lock_common()Waiman Long2016-10-251-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes some of the redundant ww_mutex code in __mutex_lock_common(). Tested-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <Will.Deacon@arm.com> Link: http://lkml.kernel.org/r/1472254509-27508-1-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | locking/mutex: Restructure wait loopPeter Zijlstra2016-10-251-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Doesn't really matter yet, but pull the HANDOFF and trylock out from under the wait_lock. The intention is to add an optimistic spin loop here, which requires we do not hold the wait_lock, so shuffle code around in preparation. Also clarify the purpose of taking the wait_lock in the wait loop, its tempting to want to avoid it altogether, but the cancellation cases need to to avoid losing wakeups. Suggested-by: Waiman Long <waiman.long@hpe.com> Tested-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | locking/mutex: Add lock handoff to avoid starvationPeter Zijlstra2016-10-251-23/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement lock handoff to avoid lock starvation. Lock starvation is possible because mutex_lock() allows lock stealing, where a running (or optimistic spinning) task beats the woken waiter to the acquire. Lock stealing is an important performance optimization because waiting for a waiter to wake up and get runtime can take a significant time, during which everyboy would stall on the lock. The down-side is of course that it allows for starvation. This patch has the waiter requesting a handoff if it fails to acquire the lock upon waking. This re-introduces some of the wait time, because once we do a handoff we have to wait for the waiter to wake up again. A future patch will add a round of optimistic spinning to attempt to alleviate this penalty, but if that turns out to not be enough, we can add a counter and only request handoff after multiple failed wakeups. There are a few tricky implementation details: - accepting a handoff must only be done in the wait-loop. Since the handoff condition is owner == current, it can easily cause recursive locking trouble. - accepting the handoff must be careful to provide the ACQUIRE semantics. - having the HANDOFF bit set on unlock requires care, we must not clear the owner. - we must be careful to not leave HANDOFF set after we've acquired the lock. The tricky scenario is setting the HANDOFF bit on an unlocked mutex. Tested-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <Waiman.Long@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | locking/mutex: Allow MUTEX_SPIN_ON_OWNER when DEBUG_MUTEXESPeter Zijlstra2016-10-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that mutex::count and mutex::owner are the same field, we can allow SPIN_ON_OWNER while DEBUG_MUTEX. Tested-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | locking/mutex: Kill arch specific codePeter Zijlstra2016-10-2538-1026/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Its all generic atomic_long_t stuff now. Tested-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | locking/mutex: Rework mutex::ownerPeter Zijlstra2016-10-257-305/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current mutex implementation has an atomic lock word and a non-atomic owner field. This disparity leads to a number of issues with the current mutex code as it means that we can have a locked mutex without an explicit owner (because the owner field has not been set, or already cleared). This leads to a number of weird corner cases, esp. between the optimistic spinning and debug code. Where the optimistic spinning code needs the owner field updated inside the lock region, the debug code is more relaxed because the whole lock is serialized by the wait_lock. Also, the spinning code itself has a few corner cases where we need to deal with a held lock without an owner field. Furthermore, it becomes even more of a problem when trying to fix starvation cases in the current code. We end up stacking special case on special case. To solve this rework the basic mutex implementation to be a single atomic word that contains the owner and uses the low bits for extra state. This matches how PI futexes and rt_mutex already work. By having the owner an integral part of the lock state a lot of the problems dissapear and we get a better option to deal with starvation cases, direct owner handoff. Changing the basic mutex does however invalidate all the arch specific mutex code; this patch leaves that unused in-place, a later patch will remove that. Tested-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | locking/drm: Kill mutex trickeryPeter Zijlstra2016-10-252-39/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Poking at lock internals is not cool. Since I'm going to change the implementation this will break, take it out. Tested-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Clark <robdclark@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | Merge branch 'efi-core-for-linus' of ↵Linus Torvalds2016-12-1224-62/+872
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this development cycle were: - Implement EFI dev path parser and other changes to fully support thunderbolt devices on Apple Macbooks (Lukas Wunner) - Add RNG seeding via the EFI stub, on ARM/arm64 (Ard Biesheuvel) - Expose EFI framebuffer configuration to user-space, to improve tooling (Peter Jones) - Misc fixes and cleanups (Ivan Hu, Wei Yongjun, Yisheng Xie, Dan Carpenter, Roy Franz)" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit thunderbolt: Compile on x86 only thunderbolt, efi: Fix Kconfig dependencies harder thunderbolt, efi: Fix Kconfig dependencies thunderbolt: Use Device ROM retrieved from EFI x86/efi: Retrieve and assign Apple device properties efi: Allow bitness-agnostic protocol calls efi: Add device path parser efi/arm*/libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table efi/libstub: Add random.c to ARM build efi: Add support for seeding the RNG from a UEFI config table MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem efi/libstub: Fix allocation size calculations efi/efivar_ssdt_load: Don't return success on allocation failure efifb: Show framebuffer layout as device attributes efi/efi_test: Use memdup_user() as a cleanup efi/efi_test: Fix uninitialized variable 'rv' efi/efi_test: Fix uninitialized variable 'datasize' efi/arm*: Fix efi_init() error handling efi: Remove unused include of <linux/version.h>
| * | | | | | efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bitArd Biesheuvel2016-11-251-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The UEFI stub executes in the context of the firmware, which identity maps the available system RAM, which implies that only memory below 4 GB can be used for allocations on 32-bit architectures, even on [L]PAE capable hardware. So ignore any reported memory above 4 GB in efi_random_alloc(). This also fixes a reported build problem on ARM under -Os, where the 64-bit logical shift relies on a software routine that the ARM decompressor does not provide. A second [minor] issue is also fixed, where the '+ 1' is moved out of the shift, where it belongs: the reason for its presence is that a memory region where start == end should count as a single slot, given that 'end' takes the desired size and alignment of the allocation into account. To clarify the code in this regard, rename start/end to 'first_slot' and 'last_slot', respectively, and introduce 'region_end' to describe the last usable address of the current region. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1480010543-25709-2-git-send-email-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | thunderbolt: Compile on x86 onlyLukas Wunner2016-11-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far Thunderbolt is (unfortunately) an Intel proprietary technology that is only available on x86, so compiling on other arches is pointless except for testing purposes. Amend Kconfig accordingly. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Andreas Noever <andreas.noever@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/7dfda728d3ee8a33c80c49b224da7359c6015eea.1479456179.git.lukas@wunner.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | thunderbolt, efi: Fix Kconfig dependencies harderLukas Wunner2016-11-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit c9cc3aaa0281 ("thunderbolt: Use Device ROM retrieved from EFI"), the THUNDERBOLT config option selects APPLE_PROPERTIES. This broke the build for certain configs because APPLE_PROPERTIES is located in a menu which depends on EFI: If EFI is not enabled, the prerequisites needed for APPLE_PROPERTIES are not selected: Those are EFI_DEV_PATH_PARSER and UCS2_STRING. Additionally EFI_DEV_PATH_PARSER won't compile unless ACPI is enabled. Commit 79f9cd35b05e ("thunderbolt, efi: Fix Kconfig dependencies") sought to fix the breakage by making THUNDERBOLT select APPLE_PROPERTIES only if EFI_STUB is enabled. On x86, EFI_STUB depends on EFI and EFI depends on ACPI, so this fixed the build at least on this architecture. However on arm and arm64, EFI_STUB does not depend on EFI, so once again the prerequisites needed for APPLE_PROPERTIES are not selected. Additionally ACPI is not available on arm and optional on arm64, therefore EFI_DEV_PATH_PARSER won't compile. Fix by selecting APPLE_PROPERTIES only on x86. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Andreas Noever <andreas.noever@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/5c241cf92eb1dc2421218c1204c6a9d22c9f847b.1479456179.git.lukas@wunner.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | thunderbolt, efi: Fix Kconfig dependenciesLukas Wunner2016-11-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix this EFI build failure on certain (rand)configs: drivers/firmware/efi/apple-properties.c:149:9: error: implicit declaration of function ???efi_get_device_by_path??? [-Werror=implicit-function-declaration] which is due to: warning: (THUNDERBOLT) selects APPLE_PROPERTIES which has unmet direct dependencies (EFI && EFI_STUB && X86) Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Andreas Noever <andreas.noever@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Pedro Vilaça <reverser@put.as> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pierre Moreau <pierre.morrow@free.fr> [MacBookPro11,3] Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20161114151033.GA10141@wunner.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | thunderbolt: Use Device ROM retrieved from EFILukas Wunner2016-11-133-1/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Macs with Thunderbolt 1 do not have a unit-specific DROM: The DROM is empty with uid 0x1000000000000. (Apple started factory-burning a unit- specific DROM with Thunderbolt 2.) Instead, the NHI EFI driver supplies a DROM in a device property. Use it if available. It's only available when booting with the efistub. If it's not available, silently fall back to our hardcoded DROM. The size of the DROM is always 256 bytes. The number is hardcoded into the NHI EFI driver. This commit can deal with an arbitrary size however, just in case they ever change that. Background information: The EFI firmware volume contains ROM files for the NHI, GMUX and several other chips as well as key material. This strategy allows Apple to deploy ROM or key updates by simply publishing an EFI firmware update on their website. Drivers do not access those files directly but rather through a file server via EFI protocol AC5E4829-A8FD-440B-AF33-9FFE013B12D8. Files are identified by GUID, the NHI DROM has 339370BD-CFC6-4454-8EF7-704653120818. The NHI EFI driver amends that file with a unit-specific uid. The uid has 64 bit but its entropy is much lower: 24 bit represent the model, 24 bit are taken from a serial number, 16 bit are fixed. The NHI EFI driver obtains the serial number via the DataHub protocol, copies it into the DROM, calculates the CRC and submits the result as a device property. A modification is needed in the resume code where we currently read the uid of all switches in the hierarchy to detect plug events that occurred during sleep. On Thunderbolt 1 root switches this will now lead to a mismatch between the uid of the empty DROM and the EFI DROM. Exempt the root switch from this check: It's built in, so the uid should never change. However we continue to *read* the uid of the root switch, this seems like a good way to test its reachability after resume. Tested-by: Lukas Wunner <lukas@wunner.de> [MacBookPro9,1] Tested-by: Pierre Moreau <pierre.morrow@free.fr> [MacBookPro11,3] Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Andreas Noever <andreas.noever@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pedro Vilaça <reverser@put.as> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20161112213237.8804-10-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | x86/efi: Retrieve and assign Apple device propertiesLukas Wunner2016-11-137-0/+350
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Apple's EFI drivers supply device properties which are needed to support Macs optimally. They contain vital information which cannot be obtained any other way (e.g. Thunderbolt Device ROM). They're also used to convey the current device state so that OS drivers can pick up where EFI drivers left (e.g. GPU mode setting). There's an EFI driver dubbed "AAPL,PathProperties" which implements a per-device key/value store. Other EFI drivers populate it using a custom protocol. The macOS bootloader /System/Library/CoreServices/boot.efi retrieves the properties with the same protocol. The kernel extension AppleACPIPlatform.kext subsequently merges them into the I/O Kit registry (see ioreg(8)) where they can be queried by other kernel extensions and user space. This commit extends the efistub to retrieve the device properties before ExitBootServices is called. It assigns them to devices in an fs_initcall so that they can be queried with the API in <linux/property.h>. Note that the device properties will only be available if the kernel is booted with the efistub. Distros should adjust their installers to always use the efistub on Macs. grub with the "linux" directive will not work unless the functionality of this commit is duplicated in grub. (The "linuxefi" directive should work but is not included upstream as of this writing.) The custom protocol has GUID 91BD12FE-F6C3-44FB-A5B7-5122AB303AE0 and looks like this: typedef struct { unsigned long version; /* 0x10000 */ efi_status_t (*get) ( IN struct apple_properties_protocol *this, IN struct efi_dev_path *device, IN efi_char16_t *property_name, OUT void *buffer, IN OUT u32 *buffer_len); /* EFI_SUCCESS, EFI_NOT_FOUND, EFI_BUFFER_TOO_SMALL */ efi_status_t (*set) ( IN struct apple_properties_protocol *this, IN struct efi_dev_path *device, IN efi_char16_t *property_name, IN void *property_value, IN u32 property_value_len); /* allocates copies of property name and value */ /* EFI_SUCCESS, EFI_OUT_OF_RESOURCES */ efi_status_t (*del) ( IN struct apple_properties_protocol *this, IN struct efi_dev_path *device, IN efi_char16_t *property_name); /* EFI_SUCCESS, EFI_NOT_FOUND */ efi_status_t (*get_all) ( IN struct apple_properties_protocol *this, OUT void *buffer, IN OUT u32 *buffer_len); /* EFI_SUCCESS, EFI_BUFFER_TOO_SMALL */ } apple_properties_protocol; Thanks to Pedro Vilaça for this blog post which was helpful in reverse engineering Apple's EFI drivers and bootloader: https://reverse.put.as/2016/06/25/apple-efi-firmware-passwords-and-the-scbo-myth/ If someone at Apple is reading this, please note there's a memory leak in your implementation of the del() function as the property struct is freed but the name and value allocations are not. Neither the macOS bootloader nor Apple's EFI drivers check the protocol version, but we do to avoid breakage if it's ever changed. It's been the same since at least OS X 10.6 (2009). The get_all() function conveniently fills a buffer with all properties in marshalled form which can be passed to the kernel as a setup_data payload. The number of device properties is dynamic and can change between a first invocation of get_all() (to determine the buffer size) and a second invocation (to retrieve the actual buffer), hence the peculiar loop which does not finish until the buffer size settles. The macOS bootloader does the same. The setup_data payload is later on unmarshalled in an fs_initcall. The idea is that most buses instantiate devices in "subsys" initcall level and drivers are usually bound to these devices in "device" initcall level, so we assign the properties in-between, i.e. in "fs" initcall level. This assumes that devices to which properties pertain are instantiated from a "subsys" initcall or earlier. That should always be the case since on macOS, AppleACPIPlatformExpert::matchEFIDevicePath() only supports ACPI and PCI nodes and we've fully scanned those buses during "subsys" initcall level. The second assumption is that properties are only needed from a "device" initcall or later. Seems reasonable to me, but should this ever not work out, an alternative approach would be to store the property sets e.g. in a btree early during boot. Then whenever device_add() is called, an EFI Device Path would have to be constructed for the newly added device, and looked up in the btree. That way, the property set could be assigned to the device immediately on instantiation. And this would also work for devices instantiated in a deferred fashion. It seems like this approach would be more complicated and require more code. That doesn't seem justified without a specific use case. For comparison, the strategy on macOS is to assign properties to objects in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()). That approach is definitely wrong as it fails for devices not present in the namespace: The NHI EFI driver supplies properties for attached Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device level behind the host controller is described in the namespace. Consequently macOS cannot assign properties for chained devices. With Thunderbolt 2 they started to describe three device levels behind host controllers in the namespace but this grossly inflates the SSDT and still fails if the user daisy-chained more than three devices. We copy the property names and values from the setup_data payload to swappable virtual memory and afterwards make the payload available to the page allocator. This is just for the sake of good housekeeping, it wouldn't occupy a meaningful amount of physical memory (4444 bytes on my machine). Only the payload is freed, not the setup_data header since otherwise we'd break the list linkage and we cannot safely update the predecessor's ->next link because there's no locking for the list. The payload is currently not passed on to kexec'ed kernels, same for PCI ROMs retrieved by setup_efi_pci(). This can be added later if there is demand by amending setup_efi_state(). The payload can then no longer be made available to the page allocator of course. Tested-by: Lukas Wunner <lukas@wunner.de> [MacBookPro9,1] Tested-by: Pierre Moreau <pierre.morrow@free.fr> [MacBookPro11,3] Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andreas Noever <andreas.noever@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pedro Vilaça <reverser@put.as> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: grub-devel@gnu.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20161112213237.8804-9-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | efi: Allow bitness-agnostic protocol callsLukas Wunner2016-11-133-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already have a macro to invoke boot services which on x86 adapts automatically to the bitness of the EFI firmware: efi_call_early(). The macro allows sharing of functions across arches and bitness variants as long as those functions only call boot services. However in practice functions in the EFI stub contain a mix of boot services calls and protocol calls. Add an efi_call_proto() macro for bitness-agnostic protocol calls to allow sharing more code across arches as well as deduplicating 32 bit and 64 bit code paths. On x86, implement it using a new efi_table_attr() macro for bitness- agnostic table lookups. Refactor efi_call_early() to make use of the same macro. (The resulting object code remains identical.) Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andreas Noever <andreas.noever@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20161112213237.8804-8-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | efi: Add device path parserLukas Wunner2016-11-134-0/+229
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're about to extended the efistub to retrieve device properties from EFI on Apple Macs. The properties use EFI Device Paths to indicate the device they belong to. This commit adds a parser which, given an EFI Device Path, locates the corresponding struct device and returns a reference to it. Initially only ACPI and PCI Device Path nodes are supported, these are the only types needed for Apple device properties (the corresponding macOS function AppleACPIPlatformExpert::matchEFIDevicePath() does not support any others). Further node types can be added with little to moderate effort. Apple device properties is currently the only use case of this parser, but Peter Jones intends to use it to match up devices with the ConInDev/ConOutDev/ErrOutDev variables and add sysfs attributes to these devices to say the hardware supports using them as console. Thus, make this parser a separate component which can be selected with config option EFI_DEV_PATH_PARSER. It can in principle be compiled as a module if acpi_get_first_physical_node() and acpi_bus_type are exported (and efi_get_device_by_path() itself is exported). The dependency on CONFIG_ACPI is needed for acpi_match_device_ids(). It can be removed if an empty inline stub is added for that function. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andreas Noever <andreas.noever@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20161112213237.8804-7-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>