| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.
Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull KVM fixes from Paolo Bonzini:
"s390:
- SRCU fix
PPC:
- host crash fixes
x86:
- bugfixes, including making nested posted interrupts really work
Generic:
- tweaks to kvm_stat and to uevents"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Fix reentrancy issues with preempt notifiers
tools/kvm_stat: add '-f help' to get the available event list
tools/kvm_stat: use variables instead of hard paths in help output
KVM: nVMX: Fix loss of L2's NMI blocking state
KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode
x86: irq: Define a global vector for nested posted interrupts
KVM: x86: do mask out upper bits of PAE CR3
KVM: make pid available for uevents without debugfs
KVM: s390: take srcu lock when getting/setting storage keys
KVM: VMX: remove unused field
KVM: PPC: Book3S HV: Fix host crash on changing HPT size
KVM: PPC: Book3S HV: Enable TM before accessing TM registers
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master
Two commits which fix host crashes.
Signed-off-by: Paolo BOnzini <pbonzini@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit f98a8bf9ee20 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB
ioctl() to change HPT size", 2016-12-20) changed the behaviour of
the KVM_PPC_ALLOCATE_HTAB ioctl so that it now allocates a new HPT
and new revmap array if there was a previously-allocated HPT of a
different size from the size being requested. In this case, we need
to reset the rmap arrays of the memslots, because the rmap arrays
will contain references to HPTEs which are no longer valid. Worse,
these references are also references to slots in the new revmap
array (which parallels the HPT), and the new revmap array contains
random contents, since it doesn't get zeroed on allocation.
The effect of having these stale references to slots in the revmap
array that contain random contents is that subsequent calls to
functions such as kvmppc_add_revmap_chain will crash because they
will interpret the non-zero contents of the revmap array as HPTE
indexes and thus index outside of the revmap array. This leads to
host crashes such as the following.
[ 7072.862122] Unable to handle kernel paging request for data at address 0xd000000c250c00f8
[ 7072.862218] Faulting instruction address: 0xc0000000000e1c78
[ 7072.862233] Oops: Kernel access of bad area, sig: 11 [#1]
[ 7072.862286] SMP NR_CPUS=1024
[ 7072.862286] NUMA
[ 7072.862325] PowerNV
[ 7072.862378] Modules linked in: kvm_hv vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iw_cxgb3 mlx5_ib ib_core ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler powernv_op_panel i2c_opal nfsd auth_rpcgss oid_registry
[ 7072.863085] nfs_acl lockd grace sunrpc kvm_pr kvm xfs libcrc32c scsi_dh_alua dm_service_time radeon lpfc nvme_fc nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit tg3 drm_kms_helper ptp pps_core syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm dm_multipath i2c_core cxgb3 mlx5_core mdio [last unloaded: kvm_hv]
[ 7072.863381] CPU: 72 PID: 56929 Comm: qemu-system-ppc Not tainted 4.12.0-kvm+ #59
[ 7072.863457] task: c000000fe29e7600 task.stack: c000001e3ffec000
[ 7072.863520] NIP: c0000000000e1c78 LR: c0000000000e2e3c CTR: c0000000000e25f0
[ 7072.863596] REGS: c000001e3ffef560 TRAP: 0300 Not tainted (4.12.0-kvm+)
[ 7072.863658] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>
[ 7072.863667] CR: 44082882 XER: 20000000
[ 7072.863767] CFAR: c0000000000e2e38 DAR: d000000c250c00f8 DSISR: 42000000 SOFTE: 1
GPR00: c0000000000e2e3c c000001e3ffef7e0 c000000001407d00 d000000c250c00f0
GPR04: d00000006509fb70 d00000000b3d2048 0000000003ffdfb7 0000000000000000
GPR08: 00000001007fdfb7 00000000c000000f d0000000250c0000 000000000070f7bf
GPR12: 0000000000000008 c00000000fdad000 0000000010879478 00000000105a0d78
GPR16: 00007ffaf4080000 0000000000001190 0000000000000000 0000000000010000
GPR20: 4001ffffff000415 d00000006509fb70 0000000004091190 0000000ee1881190
GPR24: 0000000003ffdfb7 0000000003ffdfb7 00000000007fdfb7 c000000f5c958000
GPR28: d00000002d09fb70 0000000003ffdfb7 d00000006509fb70 d00000000b3d2048
[ 7072.864439] NIP [c0000000000e1c78] kvmppc_add_revmap_chain+0x88/0x130
[ 7072.864503] LR [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0
[ 7072.864566] Call Trace:
[ 7072.864594] [c000001e3ffef7e0] [c000001e3ffef830] 0xc000001e3ffef830 (unreliable)
[ 7072.864671] [c000001e3ffef830] [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0
[ 7072.864751] [c000001e3ffef920] [d00000000b38d878] kvmppc_map_vrma+0x168/0x200 [kvm_hv]
[ 7072.864831] [c000001e3ffef9e0] [d00000000b38a684] kvmppc_vcpu_run_hv+0x1284/0x1300 [kvm_hv]
[ 7072.864914] [c000001e3ffefb30] [d00000000f465664] kvmppc_vcpu_run+0x44/0x60 [kvm]
[ 7072.865008] [c000001e3ffefb60] [d00000000f461864] kvm_arch_vcpu_ioctl_run+0x114/0x290 [kvm]
[ 7072.865152] [c000001e3ffefbe0] [d00000000f453c98] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
[ 7072.865292] [c000001e3ffefd40] [c000000000389328] do_vfs_ioctl+0xd8/0x8c0
[ 7072.865410] [c000001e3ffefde0] [c000000000389be4] SyS_ioctl+0xd4/0x130
[ 7072.865526] [c000001e3ffefe30] [c00000000000b760] system_call+0x58/0x6c
[ 7072.865644] Instruction dump:
[ 7072.865715] e95b2110 793a0020 7b4926e4 7f8a4a14 409e0098 807c000c 786326e4 7c6a1a14
[ 7072.865857] 935e0008 7bbd0020 813c000c 913e000c <93a30008> 93bc000c 48000038 60000000
[ 7072.866001] ---[ end trace 627b6e4bf8080edc ]---
Note that to trigger this, it is necessary to use a recent upstream
QEMU (or other userspace that resizes the HPT at CAS time), specify
a maximum memory size substantially larger than the current memory
size, and boot a guest kernel that does not support HPT resizing.
This fixes the problem by resetting the rmap arrays when the old HPT
is freed.
Fixes: f98a8bf9ee20 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size")
Cc: stable@vger.kernel.org # v4.11+
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit 46a704f8409f ("KVM: PPC: Book3S HV: Preserve userspace HTM state
properly", 2017-06-15) added code to read transactional memory (TM)
registers but forgot to enable TM before doing so. The result is
that if userspace does have live values in the TM registers, a KVM_RUN
ioctl will cause a host kernel crash like this:
[ 181.328511] Unrecoverable TM Unavailable Exception f60 at d00000001e7d9980
[ 181.328605] Oops: Unrecoverable TM Unavailable Exception, sig: 6 [#1]
[ 181.328613] SMP NR_CPUS=2048
[ 181.328613] NUMA
[ 181.328618] PowerNV
[ 181.328646] Modules linked in: vhost_net vhost tap nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 dns_resolver nfs
+fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
+nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables
+ip6table_filter ip6_tables iptable_filter bridge stp llc kvm_hv kvm nfsd ses enclosure scsi_transport_sas ghash_generic
+auth_rpcgss gf128mul xts sg ctr nfs_acl lockd vmx_crypto shpchp ipmi_powernv i2c_opal grace ipmi_devintf i2c_core
+powernv_rng sunrpc ipmi_msghandler ibmpowernv uio_pdrv_genirq uio leds_powernv powernv_op_panel ip_tables xfs sd_mod
+lpfc ipr bnx2x libata mdio ptp pps_core scsi_transport_fc libcrc32c dm_mirror dm_region_hash dm_log dm_mod
[ 181.329278] CPU: 40 PID: 9926 Comm: CPU 0/KVM Not tainted 4.12.0+ #1
[ 181.329337] task: c000003fc6980000 task.stack: c000003fe4d80000
[ 181.329396] NIP: d00000001e7d9980 LR: d00000001e77381c CTR: d00000001e7d98f0
[ 181.329465] REGS: c000003fe4d837e0 TRAP: 0f60 Not tainted (4.12.0+)
[ 181.329523] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 181.329527] CR: 24022448 XER: 00000000
[ 181.329608] CFAR: d00000001e773818 SOFTE: 1
[ 181.329608] GPR00: d00000001e77381c c000003fe4d83a60 d00000001e7ef410 c000003fdcfe0000
[ 181.329608] GPR04: c000003fe4f00000 0000000000000000 0000000000000000 c000003fd7954800
[ 181.329608] GPR08: 0000000000000001 c000003fc6980000 0000000000000000 d00000001e7e2880
[ 181.329608] GPR12: d00000001e7d98f0 c000000007b19000 00000001295220e0 00007fffc0ce2090
[ 181.329608] GPR16: 0000010011886608 00007fff8c89f260 0000000000000001 00007fff8c080028
[ 181.329608] GPR20: 0000000000000000 00000100118500a6 0000010011850000 0000010011850000
[ 181.329608] GPR24: 00007fffc0ce1b48 0000010011850000 00000000d673b901 0000000000000000
[ 181.329608] GPR28: 0000000000000000 c000003fdcfe0000 c000003fdcfe0000 c000003fe4f00000
[ 181.330199] NIP [d00000001e7d9980] kvmppc_vcpu_run_hv+0x90/0x6b0 [kvm_hv]
[ 181.330264] LR [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
[ 181.330322] Call Trace:
[ 181.330351] [c000003fe4d83a60] [d00000001e773478] kvmppc_set_one_reg+0x48/0x340 [kvm] (unreliable)
[ 181.330437] [c000003fe4d83b30] [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
[ 181.330513] [c000003fe4d83b50] [d00000001e7700b4] kvm_arch_vcpu_ioctl_run+0x114/0x2a0 [kvm]
[ 181.330586] [c000003fe4d83bd0] [d00000001e7642f8] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
[ 181.330658] [c000003fe4d83d40] [c0000000003451b8] do_vfs_ioctl+0xc8/0x8b0
[ 181.330717] [c000003fe4d83de0] [c000000000345a64] SyS_ioctl+0xc4/0x120
[ 181.330776] [c000003fe4d83e30] [c00000000000b004] system_call+0x58/0x6c
[ 181.330833] Instruction dump:
[ 181.330869] e92d0260 e9290b50 e9290108 792807e3 41820058 e92d0260 e9290b50 e9290108
[ 181.330941] 792ae8a4 794a1f87 408204f4 e92d0260 <7d4022a6> f9490ff0 e92d0260 7d4122a6
[ 181.331013] ---[ end trace 6f6ddeb4bfe92a92 ]---
The fix is just to turn on the TM bit in the MSR before accessing the
registers.
Cc: stable@vger.kernel.org # v3.14+
Fixes: 46a704f8409f ("KVM: PPC: Book3S HV: Preserve userspace HTM state properly")
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Preempt can occur in the preemption timer expiration handler:
CPU0 CPU1
preemption timer vmexit
handle_preemption_timer(vCPU0)
kvm_lapic_expired_hv_timer
hv_timer_is_use == true
sched_out
sched_in
kvm_arch_vcpu_load
kvm_lapic_restart_hv_timer
restart_apic_timer
start_hv_timer
already-expired timer or sw timer triggerd in the window
start_sw_timer
cancel_hv_timer
/* back in kvm_lapic_expired_hv_timer */
cancel_hv_timer
WARN_ON(!apic->lapic_timer.hv_timer_in_use); ==> Oops
This can be reproduced if CONFIG_PREEMPT is enabled.
------------[ cut here ]------------
WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1563 kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G OE 4.13.0-rc2+ #16
RIP: 0010:kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
Call Trace:
handle_preemption_timer+0xe/0x20 [kvm_intel]
vmx_handle_exit+0xb8/0xd70 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
? kvm_arch_vcpu_load+0x47/0x230 [kvm]
? kvm_arch_vcpu_load+0x62/0x230 [kvm]
kvm_vcpu_ioctl+0x340/0x700 [kvm]
? kvm_vcpu_ioctl+0x340/0x700 [kvm]
? __fget+0xfc/0x210
do_vfs_ioctl+0xa4/0x6a0
? __fget+0x11d/0x210
SyS_ioctl+0x79/0x90
do_syscall_64+0x81/0x220
entry_SYSCALL64_slow_path+0x25/0x25
------------[ cut here ]------------
WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1498 cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G W OE 4.13.0-rc2+ #16
RIP: 0010:cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
Call Trace:
kvm_lapic_expired_hv_timer+0x3e/0xb0 [kvm]
handle_preemption_timer+0xe/0x20 [kvm_intel]
vmx_handle_exit+0xb8/0xd70 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
? kvm_arch_vcpu_load+0x47/0x230 [kvm]
? kvm_arch_vcpu_load+0x62/0x230 [kvm]
kvm_vcpu_ioctl+0x340/0x700 [kvm]
? kvm_vcpu_ioctl+0x340/0x700 [kvm]
? __fget+0xfc/0x210
do_vfs_ioctl+0xa4/0x6a0
? __fget+0x11d/0x210
SyS_ioctl+0x79/0x90
do_syscall_64+0x81/0x220
entry_SYSCALL64_slow_path+0x25/0x25
This patch fixes it by making the caller of cancel_hv_timer, start_hv_timer
and start_sw_timer be in preemption-disabled regions, which trivially
avoid any reentrancy issue with preempt notifier.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Add more WARNs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: fixup missing srcu lock
We need to hold the srcu lock when accessing memory slots
during migration
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The following warning was triggered by missing srcu locks around
the storage key handling functions.
=============================
WARNING: suspicious RCU usage
4.12.0+ #56 Not tainted
-----------------------------
./include/linux/kvm_host.h:572 suspicious rcu_dereference_check() usage!
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by live_migration/4936:
#0: (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
kvm_arch_vm_ioctl+0x6b8/0x22d0
CPU: 8 PID: 4936 Comm: live_migration Not tainted 4.12.0+ #56
Hardware name: IBM 2964 NC9 704 (LPAR)
Call Trace:
([<000000000011378a>] show_stack+0xea/0xf0)
[<000000000055cc4c>] dump_stack+0x94/0xd8
[<000000000012ee70>] gfn_to_memslot+0x1a0/0x1b8
[<0000000000130b76>] gfn_to_hva+0x2e/0x48
[<0000000000141c3c>] kvm_arch_vm_ioctl+0x714/0x22d0
[<000000000013306c>] kvm_vm_ioctl+0x11c/0x7b8
[<000000000037e2c0>] do_vfs_ioctl+0xa8/0x6c8
[<000000000037e984>] SyS_ioctl+0xa4/0xb8
[<00000000008b20a4>] system_call+0xc4/0x27c
1 lock held by live_migration/4936:
#0: (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
kvm_arch_vm_ioctl+0x6b8/0x22d0
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Run kvm-unit-tests/eventinj.flat in L1 w/ ept=0 on both L0 and L1:
Before NMI IRET test
Sending NMI to self
NMI isr running stack 0x461000
Sending nested NMI to self
After nested NMI to self
Nested NMI isr running rip=40038e
After iret
After NMI to self
FAIL: NMI
Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately
for each VMCS) tracks NMI blocking state separately for vmcs01 and
vmcs02. However it is not enough:
- The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault
on IRET, so the L2 can generate #PF which can be intercepted by L0.
- L0 walks L1's guest page table and sees the mapping is invalid, it
resumes the L1 guest and injects the #PF into L1. At this point the
vmcs02 has nmi_known_unmasked=true.
- L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field
of vmcs12 (and fixes the shadow page table) before resuming L2 guest.
- L1 executes VMRESUME to resume L2, causing a vmexit to L0
- during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
interruptibility-state field of vmcs02, but nmi_known_unmasked is
still true.
- L2 immediately exits to L0 with another page fault, because L0 still has
not updated the NGVA->HPA page tables. However, nmi_known_unmasked is
true so vmx_recover_nmi_blocking does not do anything.
The fix is to update nmi_known_unmasked when preparing vmcs02 from vmcs12.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The PI vector for L0 and L1 must be different. If dest vcpu0
is in guest mode while vcpu1 is delivering a non-nested PI to
vcpu0, there wont't be any vmexit so that the non-nested interrupt
will be delayed.
Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We are using the same vector for nested/non-nested posted
interrupts delivery, this may cause interrupts latency in
L1 since we can't kick the L2 vcpu out of vmx-nonroot mode.
This patch introduces a new vector which is only for nested
posted interrupts to solve the problems above.
Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This reverts the change of commit f85c758dbee54cc3612a6e873ef7cecdb66ebee5,
as the behavior it modified was intended.
The VM is running in 32-bit PAE mode, and Table 4-7 of the Intel manual
says:
Table 4-7. Use of CR3 with PAE Paging
Bit Position(s) Contents
4:0 Ignored
31:5 Physical address of the 32-Byte aligned
page-directory-pointer table used for linear-address
translation
63:32 Ignored (these bits exist only on processors supporting
the Intel-64 architecture)
To placate the static checker, write the mask explicitly as an
unsigned long constant instead of using a 32-bit unsigned constant.
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: f85c758dbee54cc3612a6e873ef7cecdb66ebee5
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | |/
| |/|
| | |
| | | |
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I'd been collecting these whilst we debugged a CPU hotplug failure,
but we ended up diagnosing that one to tglx, who has taken a fix via
the -tip tree separately.
We're seeing some NFS issues that we haven't gotten to the bottom of
yet, and we've uncovered some issues with our backtracing too so there
might be another fixes pull before we're done.
Summary:
- Ensure we have a guard page after the kernel image in vmalloc
- Fix incorrect prefetch stride in copy_page
- Ensure irqs are disabled in die()
- Fix for event group validation in QCOM L2 PMU driver
- Fix requesting of PMU IRQs on AMD Seattle
- Minor cleanups and fixes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mmu: Place guard page after mapping of kernel image
drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
arm64: sysreg: Fix unprotected macro argmuent in write_sysreg
perf: qcom_l2: fix column exclusion check
arm64/lib: copy_page: use consistent prefetch stride
arm64/numa: Drop duplicate message
perf: Convert to using %pOF instead of full_name
arm64: Convert to using %pOF instead of full_name
arm64: traps: disable irq in die()
arm64: atomics: Remove '&' from '+&' asm constraint in lse atomics
arm64: uaccess: Remove redundant __force from addr cast in __range_ok
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The vast majority of virtual allocations in the vmalloc region are followed
by a guard page, which can help to avoid overruning on vma into another,
which may map a read-sensitive device.
This patch adds a guard page to the end of the kernel image mapping (i.e.
following the data/bss segments).
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Since the PMU register interface is banked per CPU, CPU PMU interrrupts
cannot be handled by a CPU other than the one with the PMU asserting the
interrupt. This means that migrating PMU SPIs, as we do during a CPU
hotplug operation doesn't make any sense and can lead to the IRQ being
disabled entirely if we route a spurious IRQ to the new affinity target.
This has been observed in practice on AMD Seattle, where CPUs on the
non-boot cluster appear to take a spurious PMU IRQ when coming online,
which is routed to CPU0 where it cannot be handled.
This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their
affinity prior to requesting them, ensuring that they cannot
be migrated during hotplug events. This interacts badly with the DB8500
erratum workaround that ping-pongs the interrupt affinity from the handler,
so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags
to be overridden in the platdata.
Fixes: 3cf7ee98b848 ("drivers/perf: arm_pmu: move irq request/free into probe")
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
write_sysreg() may misparse the value argument because it is used
without parentheses to protect it.
This patch adds the ( ) in order to avoid any surprises.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[will: same change to write_sysreg_s]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The optional prefetch instructions in the copy_page() routine are
inconsistent: at the start of the function, two cachelines are
prefetched beyond the one being loaded in the first iteration, but
in the loop, the prefetch is one more line ahead. This appears to
be unintentional, so let's fix it.
While at it, fix the comment style and white space.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When booting linux on a system without CONFIG_NUMA enabled, the
following messages are printed during boot -
NUMA: Faking a node at [mem 0x0000000000000000-0x00000083ffffffff]
NUMA: Adding memblock [0x8000000000 - 0x8000e7ffff] on node 0
NUMA: Adding memblock [0x8000e80000 - 0x83f65cffff] on node 0
NUMA: Adding memblock [0x83f65d0000 - 0x83f665ffff] on node 0
NUMA: Adding memblock [0x83f6660000 - 0x83f676ffff] on node 0
NUMA: Adding memblock [0x83f6770000 - 0x83f678ffff] on node 0
NUMA: Adding memblock [0x83f6790000 - 0x83fb82ffff] on node 0
NUMA: Adding memblock [0x83fb830000 - 0x83fbc0ffff] on node 0
NUMA: Adding memblock [0x83fbc10000 - 0x83fbdfffff] on node 0
NUMA: Adding memblock [0x83fbe00000 - 0x83fbffffff] on node 0
NUMA: Adding memblock [0x83fc000000 - 0x83fffbffff] on node 0
NUMA: Adding memblock [0x83fffc0000 - 0x83fffdffff] on node 0
NUMA: Adding memblock [0x83fffe0000 - 0x83ffffffff] on node 0
NUMA: Initmem setup node 0 [mem 0x8000000000-0x83ffffffff]
NUMA: NODE_DATA [mem 0x83fffec500-0x83fffedfff]
The information is then duplicated by core kernel messages right after
the above output.
Early memory node ranges
node 0: [mem 0x0000008000000000-0x0000008000e7ffff]
node 0: [mem 0x0000008000e80000-0x00000083f65cffff]
node 0: [mem 0x00000083f65d0000-0x00000083f665ffff]
node 0: [mem 0x00000083f6660000-0x00000083f676ffff]
node 0: [mem 0x00000083f6770000-0x00000083f678ffff]
node 0: [mem 0x00000083f6790000-0x00000083fb82ffff]
node 0: [mem 0x00000083fb830000-0x00000083fbc0ffff]
node 0: [mem 0x00000083fbc10000-0x00000083fbdfffff]
node 0: [mem 0x00000083fbe00000-0x00000083fbffffff]
node 0: [mem 0x00000083fc000000-0x00000083fffbffff]
node 0: [mem 0x00000083fffc0000-0x00000083fffdffff]
node 0: [mem 0x00000083fffe0000-0x00000083ffffffff]
Initmem setup node 0 [mem 0x0000008000000000-0x00000083ffffffff]
Remove the duplication of memblock layout information printed during
boot by dropping the messages from arm64 numa initialisation.
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In current die(), the irq is disabled for __die() handle, not
including the possible panic() handling. Since the log in __die()
can take several hundreds ms, new irq might come and interrupt
current die().
If the process calling die() holds some critical resource, and some
other process scheduled later also needs it, then it would deadlock.
The first panic will not be executed.
So here disable irq for the whole flow of die().
Signed-off-by: Qiao Zhou <qiaozhou@asrmicro.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The lse implementation of atomic64_dec_if_positive uses the '+&' constraint,
but the '&' is redundant and confusing in this case, since early clobber
on a read/write operand is a strange concept.
Replace the constraint with '+'.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Casting a pointer to an integral type doesn't require a __force
attribute, because you'll need to cast back to a pointer in order to
dereference the thing anyway.
This patch removes the redundant __force cast from __range_ok.
Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"The highlight is Ben's patch to work around a host killing bug when
running KVM guests with the Radix MMU on Power9. See the long change
log of that commit for more detail.
And then three fairly minor fixes:
- fix of_node_put() underflow during reconfig remove, using old DLPAR
tools.
- fix recently introduced ld version check with 64-bit LE-only
toolchain.
- free the subpage_prot_table correctly, avoiding a memory leak.
Thanks to: Aneesh Kumar K.V, Benjamin Herrenschmidt, Laurent Vivier"
* tag 'powerpc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/hash: Free the subpage_prot_table correctly
powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain
powerpc/pseries: Fix of_node_put() underflow during reconfig remove
powerpc/mm/radix: Workaround prefetch issue with KVM
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fixes: dad6f37c2602e ("powerpc: subpage_protect: Increase the array size to take care of 64TB")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In commit efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions
for modules"), we added an ld version check early in the powerpc
top-level Makefile.
Because the Makefile runs before the kernel config is setup, the
checks for CONFIG_CPU_LITTLE_ENDIAN etc. all take the default case. So
we end up configuring ld for 32-bit big endian.
That would be OK, except that for historical (or perhaps no) reason,
we use 'override LD' to add the endian flags to the LD variable
itself, rather than the normal approach of adding them to LDFLAGS.
The end result is that when we check the ld version we run it as:
$(CROSS_COMPILE)ld -EB -m elf32ppc --version
This often works, unless you are using a 64-bit only and/or little
endian only, toolchain. In which case you see something like:
$ make defconfig
powerpc64le-linux-ld: unrecognised emulation mode: elf32ppc
Supported emulations: elf64lppc elf32lppc elf32lppclinux elf32lppcsim
/bin/sh: 1: [: -ge: unexpected operator
The proper fix is to stop using 'override LD', but that will require a
fair bit of testing. Instead we can fix it for now just by reordering
the Makefile to do the version check earlier.
Fixes: efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions for modules")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As for commit 68baf692c435 ("powerpc/pseries: Fix of_node_put()
underflow during DLPAR remove"), the call to of_node_put() must be
removed from pSeries_reconfig_remove_node().
dlpar_detach_node() and pSeries_reconfig_remove_node() both call
of_detach_node(), and thus the node should not be released in both
cases.
Fixes: 0829f6d1f69e ("of: device_node kobject lifecycle fixes")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
There's a somewhat architectural issue with Radix MMU and KVM.
When coming out of a guest with AIL (Alternate Interrupt Location, ie,
MMU enabled), we start executing hypervisor code with the PID register
still containing whatever the guest has been using.
The problem is that the CPU can (and will) then start prefetching or
speculatively load from whatever host context has that same PID (if
any), thus bringing translations for that context into the TLB, which
Linux doesn't know about.
This can cause stale translations and subsequent crashes.
Fixing this in a way that is neither racy nor a huge performance
impact is difficult. We could just make the host invalidations always
use broadcast forms but that would hurt single threaded programs for
example.
We chose to fix it instead by partitioning the PID space between guest
and host. This is possible because today Linux only use 19 out of the
20 bits of PID space, so existing guests will work if we make the host
use the top half of the 20 bits space.
We additionally add support for a property to indicate to Linux the
size of the PID register which will be useful if we eventually have
processors with a larger PID space available.
There is still an issue with malicious guests purposefully setting the
PID register to a value in the hosts PID range. Hopefully future HW
can prevent that, but in the meantime, we handle it with a pair of
kludges:
- On the way out of a guest, before we clear the current VCPU in the
PACA, we check the PID and if it's outside of the permitted range
we flush the TLB for that PID.
- When context switching, if the mm is "new" on that CPU (the
corresponding bit was set for the first time in the mm cpumask), we
check if any sibling thread is in KVM (has a non-NULL VCPU pointer
in the PACA). If that is the case, we also flush the PID for that
CPU (core).
This second part is needed to handle the case where a process is
migrated (or starts a new pthread) on a sibling thread of the CPU
coming out of KVM, as there's a window where stale translations can
exist before we detect it and flush them out.
A future optimization could be added by keeping track of whether the
PID has ever been used and avoid doing that for completely fresh PIDs.
We could similarily mark PIDs that have been the subject of a global
invalidation as "fresh". But for now this will do.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
unneeded include of kvm_book3s_asm.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a couple of mmc fixes intended for v4.13-rc1.
I have also included a couple of cleanup patches in this pull request
for OMAP2+, related to the omap_hsmmc driver. The reason is because of
the changes are also depending on OMAP SoC specific code, so this
simplifies how to deal with this.
Summary:
MMC host:
- sunxi: Correct time phase settings
- omap_hsmmc: Clean up some dead code
- dw_mmc: Fix message printed for deprecated num-slots DT binding
- dw_mmc: Fix DT documentation"
* tag 'mmc-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Documentation: dw-mshc: deprecate num-slots
mmc: dw_mmc: fix the wrong condition check of getting num-slots from DT
mmc: host: omap_hsmmc: remove unused platform callbacks
ARM: OMAP2+: hsmmc.c: Remove dead code
mmc: sunxi: Keep default timing phase settings for new timing mode
|
| | |/ / /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Most platforms using OMAP hsmmc driver have switched to device tree
for passing platform data to omap_hsmmc.c driver.
The hsmmc.c file in mach-omap2 exists only to support pandora board
which uses wl1251 driver in legacy platform data mode.
Hence, remove the dead code not used by the pandora board.
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- The majority of lines changed are due to regenerated defconfig files.
- The support for the Page Deallocation Table (PDT) which was merged in
the merge window for 4.13 contained a bug which crashes the kernel if
a bad page is reported by firmware. This is now fixed and the kernel
messages will show which memory slot holds the broken DIMM.
- Commit 3a166fc2d4ef ("kbuild: handle libs-y archives separately from
built-in.o archives") broke linking the parisc kernel due to
millicode symbols which can't be reached then any longer. This was
fixed by modifying the parisc vmlinux.lds linker script.
- If the stack checker panics on stack overflow, avoid recursive
panics.
- Some parisc machines can't physically power off and thus instead
start after some time to flood the console by presumably detected
soft lockups. Avoid this by disabling the lockup detectors before
entering the endless for-next loop.
- Dave Anglin provided fixes which prevents TLB speculation on flushed
pages on PA8800/PA9000 CPUs.
- Arvind Yadav sent a trivial patch to constify the attribute_group
structure in our firmware on-board-flash storage driver
(pdc_stable.c)
* 'parisc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Extend disabled preemption in copy_user_page
parisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent aliases
parisc: Suspend lockup detectors before system halt
parisc: Show DIMM slot number which holds broken memory module
parisc: Add function to return DIMM slot of physical address
parisc: Fix crash when calling PDC_PAT_MEM PDT firmware function
parisc: regenerate defconfig files
parisc: pdc_stable: constify attribute_group structures.
parisc: Merge millicode routines via linker script
parisc: Disable further stack checks when panic occurs during stack check
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
It's always bothered me that we only disable preemption in
copy_user_page around the call to flush_dcache_page_asm.
This patch extends this to after the copy.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
equivalent aliases
Helge noticed that we flush the TLB page in flush_cache_page but not in
flush_cache_range or flush_cache_mm.
For a long time, we have had random segmentation faults building
packages on machines with PA8800/8900 processors. These machines only
support equivalent aliases. We don't see these faults on machines that
don't require strict coherency. So, it appears TLB speculation
sometimes leads to cache corruption on machines that require coherency.
This patch adds TLB flushes to flush_cache_range and flush_cache_mm when
coherency is required. We only flush the TLB in flush_cache_page when
coherency is required.
The patch also optimizes flush_cache_range. It turns out we always have
the right context to use flush_user_dcache_range_asm and
flush_user_icache_range_asm.
The patch has been tested for some time on rp3440, rp3410 and A500-44.
It's been boot tested on c8000. No random segmentation faults were
observed during testing.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Some machines can't power off the machine, so disable the lockup detectors to
avoid this watchdog BUG to show up every few seconds:
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd-shutdow:1]
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.9+
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The Page Deallocation Table (PDT) holds the physical addresses of all broken
memory addresses. With the physical address we now are able to show which DIMM
slot (e.g. 1a, 3c) actually holds the broken memory module so that users are
able to replace it.
Signed-off-by: Helge Deller <deller@gmx.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Add a firmware wrapper function, which asks PDC firmware for the DIMM slot of a
physical address. This is needed to show users which DIMM module needs
replacement in case a broken DIMM was encountered.
Signed-off-by: Helge Deller <deller@gmx.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Commit c9c2877d08d9 ("parisc: Add Page Deallocation Table (PDT) support")
introduced the pdc_pat_mem_read_pd_pdt() firmware helper function, which
crashed the system because it trashed the stack if the
pdc_pat_mem_read_pd_retinfo struct was located on the stack (and which is
in size less than the required 32 64-bit values).
Fix it by using the pdc_result struct instead when calling firmware and copy
the return values back into the result struct when finished sucessfully.
While debugging this code I noticed that the pdc_type wasn't set correctly
either, so let's fix that too.
Fixes: c9c2877d08d9 ("parisc: Add Page Deallocation Table (PDT) support")
Signed-off-by: Helge Deller <deller@gmx.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Signed-off-by: Helge Deller <deller@gmx.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When compiling the 4.13-rc kernel I got those linker errors:
libgcc2.c:(.text+0x110): relocation truncated to fit: R_PARISC_PCREL22F against symbol `$$divU'
defined in .text.div section in /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_divU.o)
hppa64-linux-gnu-ld: /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_moddi3.o)(.text+0x174): cannot reach $$divU
Avoid such errors by bundling the millicode routines in the linker script.
Signed-off-by: Helge Deller <deller@gmx.de>
|
| | |_|/ /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Before the irq handler detects a low stack and then panics the kernel, disable
further stack checks to avoid recursive panics.
Reported-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull ARM fixes from Russell King:
"Two areas addressed by these fixes:
- Fixes from Dave Martin for the signal frames that were broken with
certain configurations. No one noticed until recently.
- More kexec fixes to ensure that the crashkernel region is correctly
allocated, and a fix for the location of the device tree when
several kexec kernels are loaded"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8687/1: signal: Fix unparseable iwmmxt_sigframe in uc_regspace[]
ARM: 8686/1: iwmmxt: Add missing __user annotations to sigframe accessors
ARM: kexec: fix failure to boot crash kernel
ARM: kexec: avoid allocating crashkernel region outside lowmem
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In kernels with CONFIG_IWMMXT=y running on non-iWMMXt hardware, the
signal frame can be left partially uninitialised in such a way
that userspace cannot parse uc_regspace[] safely. In particular,
this means that the VFP registers cannot be located reliably in the
signal frame when a multi_v7_defconfig kernel is run on the
majority of platforms.
The cause is that the uc_regspace[] is laid out statically based on
the kernel config, but the decision of whether to save/restore the
iWMMXt registers must be a runtime decision.
To minimise breakage of software that may assume a fixed layout,
this patch emits a dummy block of the same size as iwmmxt_sigframe,
for non-iWMMXt threads. However, the magic and size of this block
are now filled in to help parsers skip over it. A new DUMMY_MAGIC
is defined for this purpose.
It is probably legitimate (if non-portable) for userspace to
manufacture its own sigframe for sigreturn, and there is no obvious
reason why userspace should be required to insert a DUMMY_MAGIC
block when running on non-iWMMXt hardware, when omitting it has
worked just fine forever in other configurations. So in this case,
sigreturn does not require this block to be present.
Reported-by: Edmund Grimley-Evans <Edmund.Grimley-Evans@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
preserve_iwmmxt_context() and restore_iwmmxt_context() lack __user
accessors on their arguments pointing to the user signal frame.
There does not be appear to be a bug here, but this omission is
inconsistent with the crunch and vfp sigframe access functions.
This patch adds the annotations, for consistency.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When kexec was converted to DTB, the dtb address was passed between
machine_kexec_prepare() and machine_kexec() using a static variable.
This is bad news if you load a crash kernel followed by a normal
kernel or vice versa - the last loaded kernel overwrites the dtb
address.
This can result in kexec failures, as (eg) we try to boot the crash
kernel with the last loaded dtb. For example, with:
the crash kernel fails to find the dtb.
Avoid this by defining a kimage architecture structure, and store
the address to be passed in r2 there, which will either be the ATAGs
or the dtb blob.
Fixes: 4cabd1d9625c ("ARM: 7539/1: kexec: scan for dtb magic in segments")
Fixes: 42d720d1731a ("ARM: kexec: Make .text R/W in machine_kexec")
Reported-by: Keerthy <j-keerthy@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Allocating the crashkernel region outside lowmem causes the kernel to
oops while trying to kexec into the new kernel:
Loading crashdump kernel...
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = edd70000
[00000000] *pgd=de19e835
Internal error: Oops: 817 [#2] SMP ARM
Modules linked in: ...
CPU: 0 PID: 689 Comm: sh Not tainted 4.12.0-rc3-next-20170601-04015-gc3a5a20
Hardware name: Generic DRA74X (Flattened Device Tree)
task: edb32f00 task.stack: edf18000
PC is at memcpy+0x50/0x330
LR is at 0xe3c34001
pc : [<c04baf30>] lr : [<e3c34001>] psr: 800c0193
sp : edf19c2c ip : 0a000001 fp : c0553170
r10: c055316e r9 : 00000001 r8 : e3130001
r7 : e4903004 r6 : 0a000014 r5 : e3500000 r4 : e59f106c
r3 : e59f0074 r2 : ffffffe8 r1 : c010fb88 r0 : 00000000
Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: add7006a DAC: 00000051
Process sh (pid: 689, stack limit = 0xedf18218)
Stack: (0xedf19c2c to 0xedf1a000)
...
[<c04baf30>] (memcpy) from [<c010fae0>] (machine_kexec+0xa8/0x12c)
[<c010fae0>] (machine_kexec) from [<c01e4104>] (__crash_kexec+0x5c/0x98)
[<c01e4104>] (__crash_kexec) from [<c01e419c>] (crash_kexec+0x5c/0x68)
[<c01e419c>] (crash_kexec) from [<c010c5c0>] (die+0x228/0x490)
[<c010c5c0>] (die) from [<c011e520>] (__do_kernel_fault.part.0+0x54/0x1e4)
[<c011e520>] (__do_kernel_fault.part.0) from [<c082412c>] (do_page_fault+0x1e8/0x400)
[<c082412c>] (do_page_fault) from [<c010135c>] (do_DataAbort+0x38/0xb8)
[<c010135c>] (do_DataAbort) from [<c0823584>] (__dabt_svc+0x64/0xa0)
This is caused by image->control_code_page being a highmem page, so
page_address(image->control_code_page) returns NULL. In any case, we
don't want the control page to be a highmem page.
We already limit the crash kernel region to the top of 32-bit physical
memory space. Also limit it to the top of lowmem in physical space.
Reported-by: Keerthy <j-keerthy@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Pull dma mapping fixes from Christoph Hellwig:
"split the global dma coherent pool from the per-device pool.
This fixes a regression in the earlier 4.13 pull requests where the
global pool would override a per-device CMA pool (Vladimir Murzin)"
* tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mapping:
ARM: NOMMU: Wire-up default DMA interface
dma-coherent: introduce interface for default DMA pool
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The way how default DMA pool is exposed has changed and now we need to
use dedicated interface to work with it. This patch makes alloc/release
operations to use such interface. Since, default DMA pool is not
handled by generic code anymore we have to implement our own mmap
operation.
Tested-by: Andras Szemzo <sza@esh.hu>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| | |_|/ / /
| |/| | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Christoph noticed [1] that default DMA pool in current form overload
the DMA coherent infrastructure. In reply, Robin suggested [2] to
split the per-device vs. global pool interfaces, so allocation/release
from default DMA pool is driven by dma ops implementation.
This patch implements Robin's idea and provide interface to
allocate/release/mmap the default (aka global) DMA pool.
To make it clear that existing *_from_coherent routines work on
per-device pool rename them to *_from_dev_coherent.
[1] https://lkml.org/lkml/2017/7/7/370
[2] https://lkml.org/lkml/2017/7/7/431
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Andras Szemzo <sza@esh.hu>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|\ \ \ \ \ \
| |_|_|/ / /
|/| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Three bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: set change and reference bit on lazy key enablement
s390: chp: handle CRW_ERC_INIT for channel-path status change
s390/perf: fix problem state detection
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When we enable storage keys for a guest lazily, we reset the ACC and F
values. That is correct assuming that these are 0 on a clear reset and
the guest obviously has not used any key setting instruction.
We also zero out the change and reference bit. This is not correct as
the architecture prefers over-indication instead of under-indication
for the keyless->keyed transition.
This patch fixes the behaviour and always sets guest change and guest
reference for all guest storage keys on the keyless -> keyed switch.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|