diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2017-07-07 04:11:24 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2017-07-07 04:11:24 +0200 |
commit | 6e6c5b960644125b6f2fc2cd04e62bff0771923e (patch) | |
tree | 6d839565616684904fd17f1207c8294b30cb162c | |
parent | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (diff) | |
parent | x86/xen: allow userspace access during hypercalls (diff) | |
download | linux-6e6c5b960644125b6f2fc2cd04e62bff0771923e.tar.xz linux-6e6c5b960644125b6f2fc2cd04e62bff0771923e.zip |
Merge tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"Other than fixes and cleanups it contains:
- support > 32 VCPUs at domain restore
- support for new sysfs nodes related to Xen
- some performance tuning for Linux running as Xen guest"
* tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: allow userspace access during hypercalls
x86: xen: remove unnecessary variable in xen_foreach_remap_area()
xen: allocate page for shared info page from low memory
xen: avoid deadlock in xenbus driver
xen: add sysfs node for hypervisor build id
xen: sync include/xen/interface/version.h
xen: add sysfs node for guest type
doc,xen: document hypervisor sysfs nodes for xen
xen/vcpu: Handle xen_vcpu_setup() failure at boot
xen/vcpu: Handle xen_vcpu_setup() failure in hotplug
xen/pv: Fix OOPS on restore for a PV, !SMP domain
xen/pvh*: Support > 32 VCPUs at domain restore
xen/vcpu: Simplify xen_vcpu related code
xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU
xen: avoid type warning in xchg_xen_ulong
xen: fix HYPERVISOR_dm_op() prototype
xen: don't print error message in case of missing Xenstore entry
arm/xen: Adjust one function call together with a variable assignment
arm/xen: Delete an error message for a failed memory allocation in __set_phys_to_machine_multi()
arm/xen: Improve a size determination in __set_phys_to_machine_multi()
25 files changed, 542 insertions, 167 deletions
diff --git a/Documentation/ABI/stable/sysfs-hypervisor-xen b/Documentation/ABI/stable/sysfs-hypervisor-xen new file mode 100644 index 000000000000..3cf5cdfcd9a8 --- /dev/null +++ b/Documentation/ABI/stable/sysfs-hypervisor-xen @@ -0,0 +1,119 @@ +What: /sys/hypervisor/compilation/compile_date +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Contains the build time stamp of the Xen hypervisor + Might return "<denied>" in case of special security settings + in the hypervisor. + +What: /sys/hypervisor/compilation/compiled_by +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Contains information who built the Xen hypervisor + Might return "<denied>" in case of special security settings + in the hypervisor. + +What: /sys/hypervisor/compilation/compiler +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Compiler which was used to build the Xen hypervisor + Might return "<denied>" in case of special security settings + in the hypervisor. + +What: /sys/hypervisor/properties/capabilities +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Space separated list of supported guest system types. Each type + is in the format: <class>-<major>.<minor>-<arch> + With: + <class>: "xen" -- x86: paravirtualized, arm: standard + "hvm" -- x86 only: fully virtualized + <major>: major guest interface version + <minor>: minor guest interface version + <arch>: architecture, e.g.: + "x86_32": 32 bit x86 guest without PAE + "x86_32p": 32 bit x86 guest with PAE + "x86_64": 64 bit x86 guest + "armv7l": 32 bit arm guest + "aarch64": 64 bit arm guest + +What: /sys/hypervisor/properties/changeset +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Changeset of the hypervisor (git commit) + Might return "<denied>" in case of special security settings + in the hypervisor. + +What: /sys/hypervisor/properties/features +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Features the Xen hypervisor supports for the guest as defined + in include/xen/interface/features.h printed as a hex value. + +What: /sys/hypervisor/properties/pagesize +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Default page size of the hypervisor printed as a hex value. + Might return "0" in case of special security settings + in the hypervisor. + +What: /sys/hypervisor/properties/virtual_start +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Virtual address of the hypervisor as a hex value. + +What: /sys/hypervisor/type +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Type of hypervisor: + "xen": Xen hypervisor + +What: /sys/hypervisor/uuid +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + UUID of the guest as known to the Xen hypervisor. + +What: /sys/hypervisor/version/extra +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + The Xen version is in the format <major>.<minor><extra> + This is the <extra> part of it. + Might return "<denied>" in case of special security settings + in the hypervisor. + +What: /sys/hypervisor/version/major +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + The Xen version is in the format <major>.<minor><extra> + This is the <major> part of it. + +What: /sys/hypervisor/version/minor +Date: March 2009 +KernelVersion: 2.6.30 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + The Xen version is in the format <major>.<minor><extra> + This is the <minor> part of it. diff --git a/Documentation/ABI/testing/sysfs-hypervisor-pmu b/Documentation/ABI/testing/sysfs-hypervisor-xen index 224faa105e18..53b7b2ea7515 100644 --- a/Documentation/ABI/testing/sysfs-hypervisor-pmu +++ b/Documentation/ABI/testing/sysfs-hypervisor-xen @@ -1,8 +1,19 @@ +What: /sys/hypervisor/guest_type +Date: June 2017 +KernelVersion: 4.13 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Type of guest: + "Xen": standard guest type on arm + "HVM": fully virtualized guest (x86) + "PV": paravirtualized guest (x86) + "PVH": fully virtualized guest without legacy emulation (x86) + What: /sys/hypervisor/pmu/pmu_mode Date: August 2015 KernelVersion: 4.3 Contact: Boris Ostrovsky <boris.ostrovsky@oracle.com> -Description: +Description: If running under Xen: Describes mode that Xen's performance-monitoring unit (PMU) uses. Accepted values are "off" -- PMU is disabled @@ -17,7 +28,16 @@ What: /sys/hypervisor/pmu/pmu_features Date: August 2015 KernelVersion: 4.3 Contact: Boris Ostrovsky <boris.ostrovsky@oracle.com> -Description: +Description: If running under Xen: Describes Xen PMU features (as an integer). A set bit indicates that the corresponding feature is enabled. See include/xen/interface/xenpmu.h for available features + +What: /sys/hypervisor/properties/buildid +Date: June 2017 +KernelVersion: 4.13 +Contact: xen-devel@lists.xenproject.org +Description: If running under Xen: + Build id of the hypervisor, needed for hypervisor live patching. + Might return "<denied>" in case of special security settings + in the hypervisor. diff --git a/MAINTAINERS b/MAINTAINERS index 1c1d106a3347..c8f7952b8bbe 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14228,6 +14228,8 @@ F: drivers/xen/ F: arch/x86/include/asm/xen/ F: include/xen/ F: include/uapi/xen/ +F: Documentation/ABI/stable/sysfs-hypervisor-xen +F: Documentation/ABI/testing/sysfs-hypervisor-xen XEN HYPERVISOR ARM M: Stefano Stabellini <sstabellini@kernel.org> diff --git a/arch/arm/include/asm/xen/events.h b/arch/arm/include/asm/xen/events.h index 71e473d05fcc..620dc75362e5 100644 --- a/arch/arm/include/asm/xen/events.h +++ b/arch/arm/include/asm/xen/events.h @@ -16,7 +16,7 @@ static inline int xen_irqs_disabled(struct pt_regs *regs) return raw_irqs_disabled_flags(regs->ARM_cpsr); } -#define xchg_xen_ulong(ptr, val) atomic64_xchg(container_of((ptr), \ +#define xchg_xen_ulong(ptr, val) atomic64_xchg(container_of((long long*)(ptr),\ atomic64_t, \ counter), (val)) diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c index 0ed01f2d5ee4..e71eefa2e427 100644 --- a/arch/arm/xen/p2m.c +++ b/arch/arm/xen/p2m.c @@ -144,17 +144,17 @@ bool __set_phys_to_machine_multi(unsigned long pfn, return true; } - p2m_entry = kzalloc(sizeof(struct xen_p2m_entry), GFP_NOWAIT); - if (!p2m_entry) { - pr_warn("cannot allocate xen_p2m_entry\n"); + p2m_entry = kzalloc(sizeof(*p2m_entry), GFP_NOWAIT); + if (!p2m_entry) return false; - } + p2m_entry->pfn = pfn; p2m_entry->nr_pages = nr_pages; p2m_entry->mfn = mfn; write_lock_irqsave(&p2m_lock, irqflags); - if ((rc = xen_add_phys_to_mach_entry(p2m_entry)) < 0) { + rc = xen_add_phys_to_mach_entry(p2m_entry); + if (rc < 0) { write_unlock_irqrestore(&p2m_lock, irqflags); return false; } diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h index f6d20f6cca12..11071fcd630e 100644 --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -43,6 +43,7 @@ #include <asm/page.h> #include <asm/pgtable.h> +#include <asm/smap.h> #include <xen/interface/xen.h> #include <xen/interface/sched.h> @@ -50,6 +51,8 @@ #include <xen/interface/platform.h> #include <xen/interface/xen-mca.h> +struct xen_dm_op_buf; + /* * The hypercall asms have to meet several constraints: * - Work on 32- and 64-bit. @@ -214,10 +217,12 @@ privcmd_call(unsigned call, __HYPERCALL_DECLS; __HYPERCALL_5ARG(a1, a2, a3, a4, a5); + stac(); asm volatile("call *%[call]" : __HYPERCALL_5PARAM : [call] "a" (&hypercall_page[call]) : __HYPERCALL_CLOBBER5); + clac(); return (long)__res; } @@ -474,9 +479,13 @@ HYPERVISOR_xenpmu_op(unsigned int op, void *arg) static inline int HYPERVISOR_dm_op( - domid_t dom, unsigned int nr_bufs, void *bufs) + domid_t dom, unsigned int nr_bufs, struct xen_dm_op_buf *bufs) { - return _hypercall3(int, dm_op, dom, nr_bufs, bufs); + int ret; + stac(); + ret = _hypercall3(int, dm_op, dom, nr_bufs, bufs); + clac(); + return ret; } static inline void diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index a5ffcbb20cc0..0e7ef69e8531 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -106,15 +106,83 @@ int xen_cpuhp_setup(int (*cpu_up_prepare_cb)(unsigned int), return rc >= 0 ? 0 : rc; } -static void clamp_max_cpus(void) +static int xen_vcpu_setup_restore(int cpu) { -#ifdef CONFIG_SMP - if (setup_max_cpus > MAX_VIRT_CPUS) - setup_max_cpus = MAX_VIRT_CPUS; -#endif + int rc = 0; + + /* Any per_cpu(xen_vcpu) is stale, so reset it */ + xen_vcpu_info_reset(cpu); + + /* + * For PVH and PVHVM, setup online VCPUs only. The rest will + * be handled by hotplug. + */ + if (xen_pv_domain() || + (xen_hvm_domain() && cpu_online(cpu))) { + rc = xen_vcpu_setup(cpu); + } + + return rc; +} + +/* + * On restore, set the vcpu placement up again. + * If it fails, then we're in a bad state, since + * we can't back out from using it... + */ +void xen_vcpu_restore(void) +{ + int cpu, rc; + + for_each_possible_cpu(cpu) { + bool other_cpu = (cpu != smp_processor_id()); + bool is_up; + + if (xen_vcpu_nr(cpu) == XEN_VCPU_ID_INVALID) + continue; + + /* Only Xen 4.5 and higher support this. */ + is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, + xen_vcpu_nr(cpu), NULL) > 0; + + if (other_cpu && is_up && + HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(cpu), NULL)) + BUG(); + + if (xen_pv_domain() || xen_feature(XENFEAT_hvm_safe_pvclock)) + xen_setup_runstate_info(cpu); + + rc = xen_vcpu_setup_restore(cpu); + if (rc) + pr_emerg_once("vcpu restore failed for cpu=%d err=%d. " + "System will hang.\n", cpu, rc); + /* + * In case xen_vcpu_setup_restore() fails, do not bring up the + * VCPU. This helps us avoid the resulting OOPS when the VCPU + * accesses pvclock_vcpu_time via xen_vcpu (which is NULL.) + * Note that this does not improve the situation much -- now the + * VM hangs instead of OOPSing -- with the VCPUs that did not + * fail, spinning in stop_machine(), waiting for the failed + * VCPUs to come up. + */ + if (other_cpu && is_up && (rc == 0) && + HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL)) + BUG(); + } } -void xen_vcpu_setup(int cpu) +void xen_vcpu_info_reset(int cpu) +{ + if (xen_vcpu_nr(cpu) < MAX_VIRT_CPUS) { + per_cpu(xen_vcpu, cpu) = + &HYPERVISOR_shared_info->vcpu_info[xen_vcpu_nr(cpu)]; + } else { + /* Set to NULL so that if somebody accesses it we get an OOPS */ + per_cpu(xen_vcpu, cpu) = NULL; + } +} + +int xen_vcpu_setup(int cpu) { struct vcpu_register_vcpu_info info; int err; @@ -123,11 +191,11 @@ void xen_vcpu_setup(int cpu) BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info); /* - * This path is called twice on PVHVM - first during bootup via - * smp_init -> xen_hvm_cpu_notify, and then if the VCPU is being - * hotplugged: cpu_up -> xen_hvm_cpu_notify. - * As we can only do the VCPUOP_register_vcpu_info once lets - * not over-write its result. + * This path is called on PVHVM at bootup (xen_hvm_smp_prepare_boot_cpu) + * and at restore (xen_vcpu_restore). Also called for hotplugged + * VCPUs (cpu_init -> xen_hvm_cpu_prepare_hvm). + * However, the hypercall can only be done once (see below) so if a VCPU + * is offlined and comes back online then let's not redo the hypercall. * * For PV it is called during restore (xen_vcpu_restore) and bootup * (xen_setup_vcpu_info_placement). The hotplug mechanism does not @@ -135,42 +203,44 @@ void xen_vcpu_setup(int cpu) */ if (xen_hvm_domain()) { if (per_cpu(xen_vcpu, cpu) == &per_cpu(xen_vcpu_info, cpu)) - return; + return 0; } - if (xen_vcpu_nr(cpu) < MAX_VIRT_CPUS) - per_cpu(xen_vcpu, cpu) = - &HYPERVISOR_shared_info->vcpu_info[xen_vcpu_nr(cpu)]; - if (!xen_have_vcpu_info_placement) { - if (cpu >= MAX_VIRT_CPUS) - clamp_max_cpus(); - return; + if (xen_have_vcpu_info_placement) { + vcpup = &per_cpu(xen_vcpu_info, cpu); + info.mfn = arbitrary_virt_to_mfn(vcpup); + info.offset = offset_in_page(vcpup); + + /* + * Check to see if the hypervisor will put the vcpu_info + * structure where we want it, which allows direct access via + * a percpu-variable. + * N.B. This hypercall can _only_ be called once per CPU. + * Subsequent calls will error out with -EINVAL. This is due to + * the fact that hypervisor has no unregister variant and this + * hypercall does not allow to over-write info.mfn and + * info.offset. + */ + err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, + xen_vcpu_nr(cpu), &info); + + if (err) { + pr_warn_once("register_vcpu_info failed: cpu=%d err=%d\n", + cpu, err); + xen_have_vcpu_info_placement = 0; + } else { + /* + * This cpu is using the registered vcpu info, even if + * later ones fail to. + */ + per_cpu(xen_vcpu, cpu) = vcpup; + } } - vcpup = &per_cpu(xen_vcpu_info, cpu); - info.mfn = arbitrary_virt_to_mfn(vcpup); - info.offset = offset_in_page(vcpup); - - /* Check to see if the hypervisor will put the vcpu_info - structure where we want it, which allows direct access via - a percpu-variable. - N.B. This hypercall can _only_ be called once per CPU. Subsequent - calls will error out with -EINVAL. This is due to the fact that - hypervisor has no unregister variant and this hypercall does not - allow to over-write info.mfn and info.offset. - */ - err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu), - &info); + if (!xen_have_vcpu_info_placement) + xen_vcpu_info_reset(cpu); - if (err) { - printk(KERN_DEBUG "register_vcpu_info failed: err=%d\n", err); - xen_have_vcpu_info_placement = 0; - clamp_max_cpus(); - } else { - /* This cpu is using the registered vcpu info, even if - later ones fail to. */ - per_cpu(xen_vcpu, cpu) = vcpup; - } + return ((per_cpu(xen_vcpu, cpu) == NULL) ? -ENODEV : 0); } void xen_reboot(int reason) diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c index a6d014f47e52..87d791356ea9 100644 --- a/arch/x86/xen/enlighten_hvm.c +++ b/arch/x86/xen/enlighten_hvm.c @@ -1,5 +1,6 @@ #include <linux/cpu.h> #include <linux/kexec.h> +#include <linux/memblock.h> #include <xen/features.h> #include <xen/events.h> @@ -10,9 +11,11 @@ #include <asm/reboot.h> #include <asm/setup.h> #include <asm/hypervisor.h> +#include <asm/e820/api.h> #include <asm/xen/cpuid.h> #include <asm/xen/hypervisor.h> +#include <asm/xen/page.h> #include "xen-ops.h" #include "mmu.h" @@ -20,37 +23,34 @@ void __ref xen_hvm_init_shared_info(void) { - int cpu; struct xen_add_to_physmap xatp; - static struct shared_info *shared_info_page; + u64 pa; + + if (HYPERVISOR_shared_info == &xen_dummy_shared_info) { + /* + * Search for a free page starting at 4kB physical address. + * Low memory is preferred to avoid an EPT large page split up + * by the mapping. + * Starting below X86_RESERVE_LOW (usually 64kB) is fine as + * the BIOS used for HVM guests is well behaved and won't + * clobber memory other than the first 4kB. + */ + for (pa = PAGE_SIZE; + !e820__mapped_all(pa, pa + PAGE_SIZE, E820_TYPE_RAM) || + memblock_is_reserved(pa); + pa += PAGE_SIZE) + ; + + memblock_reserve(pa, PAGE_SIZE); + HYPERVISOR_shared_info = __va(pa); + } - if (!shared_info_page) - shared_info_page = (struct shared_info *) - extend_brk(PAGE_SIZE, PAGE_SIZE); xatp.domid = DOMID_SELF; xatp.idx = 0; xatp.space = XENMAPSPACE_shared_info; - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; + xatp.gpfn = virt_to_pfn(HYPERVISOR_shared_info); if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) BUG(); - - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; - - /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info - * page, we use it in the event channel upcall and in some pvclock - * related functions. We don't need the vcpu_info placement - * optimizations because we don't use any pv_mmu or pv_irq op on - * HVM. - * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is - * online but xen_hvm_init_shared_info is run at resume time too and - * in that case multiple vcpus might be online. */ - for_each_online_cpu(cpu) { - /* Leave it to be NULL. */ - if (xen_vcpu_nr(cpu) >= MAX_VIRT_CPUS) - continue; - per_cpu(xen_vcpu, cpu) = - &HYPERVISOR_shared_info->vcpu_info[xen_vcpu_nr(cpu)]; - } } static void __init init_hvm_pv_info(void) @@ -106,7 +106,7 @@ static void xen_hvm_crash_shutdown(struct pt_regs *regs) static int xen_cpu_up_prepare_hvm(unsigned int cpu) { - int rc; + int rc = 0; /* * This can happen if CPU was offlined earlier and @@ -121,7 +121,9 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu) per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu); else per_cpu(xen_vcpu_id, cpu) = cpu; - xen_vcpu_setup(cpu); + rc = xen_vcpu_setup(cpu); + if (rc) + return rc; if (xen_have_vector_callback && xen_feature(XENFEAT_hvm_safe_pvclock)) xen_setup_timer(cpu); @@ -130,9 +132,8 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu) if (rc) { WARN(1, "xen_smp_intr_init() for CPU %d failed: %d\n", cpu, rc); - return rc; } - return 0; + return rc; } static int xen_cpu_dead_hvm(unsigned int cpu) @@ -154,6 +155,13 @@ static void __init xen_hvm_guest_init(void) xen_hvm_init_shared_info(); + /* + * xen_vcpu is a pointer to the vcpu_info struct in the shared_info + * page, we use it in the event channel upcall and in some pvclock + * related functions. + */ + xen_vcpu_info_reset(0); + xen_panic_handler_init(); if (xen_feature(XENFEAT_hvm_callback_vector)) diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index f33eef4ebd12..811e4ddb3f37 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -89,8 +89,6 @@ void *xen_initial_gdt; -RESERVE_BRK(shared_info_page_brk, PAGE_SIZE); - static int xen_cpu_up_prepare_pv(unsigned int cpu); static int xen_cpu_dead_pv(unsigned int cpu); @@ -107,35 +105,6 @@ struct tls_descs { */ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc); -/* - * On restore, set the vcpu placement up again. - * If it fails, then we're in a bad state, since - * we can't back out from using it... - */ -void xen_vcpu_restore(void) -{ - int cpu; - - for_each_possible_cpu(cpu) { - bool other_cpu = (cpu != smp_processor_id()); - bool is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, xen_vcpu_nr(cpu), - NULL); - - if (other_cpu && is_up && - HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(cpu), NULL)) - BUG(); - - xen_setup_runstate_info(cpu); - - if (xen_have_vcpu_info_placement) - xen_vcpu_setup(cpu); - - if (other_cpu && is_up && - HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL)) - BUG(); - } -} - static void __init xen_banner(void) { unsigned version = HYPERVISOR_xen_version(XENVER_version, NULL); @@ -960,30 +929,43 @@ void xen_setup_shared_info(void) HYPERVISOR_shared_info = (struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP); -#ifndef CONFIG_SMP - /* In UP this is as good a place as any to set up shared info */ - xen_setup_vcpu_info_placement(); -#endif - xen_setup_mfn_list_list(); - /* - * Now that shared info is set up we can start using routines that - * point to pvclock area. - */ - if (system_state == SYSTEM_BOOTING) + if (system_state == SYSTEM_BOOTING) { +#ifndef CONFIG_SMP + /* + * In UP this is as good a place as any to set up shared info. + * Limit this to boot only, at restore vcpu setup is done via + * xen_vcpu_restore(). + */ + xen_setup_vcpu_info_placement(); +#endif + /* + * Now that shared info is set up we can start using routines + * that point to pvclock area. + */ xen_init_time_ops(); + } } /* This is called once we have the cpu_possible_mask */ -void xen_setup_vcpu_info_placement(void) +void __ref xen_setup_vcpu_info_placement(void) { int cpu; for_each_possible_cpu(cpu) { /* Set up direct vCPU id mapping for PV guests. */ per_cpu(xen_vcpu_id, cpu) = cpu; - xen_vcpu_setup(cpu); + + /* + * xen_vcpu_setup(cpu) can fail -- in which case it + * falls back to the shared_info version for cpus + * where xen_vcpu_nr(cpu) < MAX_VIRT_CPUS. + * + * xen_cpu_up_prepare_pv() handles the rest by failing + * them in hotplug. + */ + (void) xen_vcpu_setup(cpu); } /* @@ -1332,9 +1314,17 @@ asmlinkage __visible void __init xen_start_kernel(void) */ acpi_numa = -1; #endif - /* Don't do the full vcpu_info placement stuff until we have a - possible map and a non-dummy shared_info. */ - per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0]; + /* Let's presume PV guests always boot on vCPU with id 0. */ + per_cpu(xen_vcpu_id, 0) = 0; + + /* + * Setup xen_vcpu early because start_kernel needs it for + * local_irq_disable(), irqs_disabled(). + * + * Don't do the full vcpu_info placement stuff until we have + * the cpu_possible_mask and a non-dummy shared_info. + */ + xen_vcpu_info_reset(0); WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_pv, xen_cpu_dead_pv)); @@ -1431,9 +1421,7 @@ asmlinkage __visible void __init xen_start_kernel(void) #endif xen_raw_console_write("about to get started...\n"); - /* Let's presume PV guests always boot on vCPU with id 0. */ - per_cpu(xen_vcpu_id, 0) = 0; - + /* We need this for printk timestamps */ xen_setup_runstate_info(0); xen_efi_init(); @@ -1451,6 +1439,9 @@ static int xen_cpu_up_prepare_pv(unsigned int cpu) { int rc; + if (per_cpu(xen_vcpu, cpu) == NULL) + return -ENODEV; + xen_setup_timer(cpu); rc = xen_smp_intr_init(cpu); diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index a5bf7c451435..c81046323ebc 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -499,7 +499,7 @@ static unsigned long __init xen_foreach_remap_area(unsigned long nr_pages, void __init xen_remap_memory(void) { unsigned long buf = (unsigned long)&xen_remap_buf; - unsigned long mfn_save, mfn, pfn; + unsigned long mfn_save, pfn; unsigned long remapped = 0; unsigned int i; unsigned long pfn_s = ~0UL; @@ -515,8 +515,7 @@ void __init xen_remap_memory(void) pfn = xen_remap_buf.target_pfn; for (i = 0; i < xen_remap_buf.size; i++) { - mfn = xen_remap_buf.mfns[i]; - xen_update_mem_tables(pfn, mfn); + xen_update_mem_tables(pfn, xen_remap_buf.mfns[i]); remapped++; pfn++; } @@ -530,8 +529,6 @@ void __init xen_remap_memory(void) pfn_s = xen_remap_buf.target_pfn; len = xen_remap_buf.size; } - - mfn = xen_remap_mfn; xen_remap_mfn = xen_remap_buf.next_area_mfn; } diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c index 82ac611f2fc1..e7f02eb73727 100644 --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -1,4 +1,5 @@ #include <linux/smp.h> +#include <linux/cpu.h> #include <linux/slab.h> #include <linux/cpumask.h> #include <linux/percpu.h> @@ -114,6 +115,36 @@ int xen_smp_intr_init(unsigned int cpu) return rc; } +void __init xen_smp_cpus_done(unsigned int max_cpus) +{ + int cpu, rc, count = 0; + + if (xen_hvm_domain()) + native_smp_cpus_done(max_cpus); + + if (xen_have_vcpu_info_placement) + return; + + for_each_online_cpu(cpu) { + if (xen_vcpu_nr(cpu) < MAX_VIRT_CPUS) + continue; + + rc = cpu_down(cpu); + + if (rc == 0) { + /* + * Reset vcpu_info so this cpu cannot be onlined again. + */ + xen_vcpu_info_reset(cpu); + count++; + } else { + pr_warn("%s: failed to bring CPU %d down, error %d\n", + __func__, cpu, rc); + } + } + WARN(count, "%s: brought %d CPUs offline\n", __func__, count); +} + void xen_smp_send_reschedule(int cpu) { xen_send_IPI_one(cpu, XEN_RESCHEDULE_VECTOR); diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h index 8ebb6acca64a..87d3c76cba37 100644 --- a/arch/x86/xen/smp.h +++ b/arch/x86/xen/smp.h @@ -14,6 +14,8 @@ extern void xen_smp_intr_free(unsigned int cpu); int xen_smp_intr_init_pv(unsigned int cpu); void xen_smp_intr_free_pv(unsigned int cpu); +void xen_smp_cpus_done(unsigned int max_cpus); + void xen_smp_send_reschedule(int cpu); void xen_smp_send_call_function_ipi(const struct cpumask *mask); void xen_smp_send_call_function_single_ipi(int cpu); diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c index f18561bbf5c9..fd60abedf658 100644 --- a/arch/x86/xen/smp_hvm.c +++ b/arch/x86/xen/smp_hvm.c @@ -12,7 +12,8 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void) native_smp_prepare_boot_cpu(); /* - * Setup vcpu_info for boot CPU. + * Setup vcpu_info for boot CPU. Secondary CPUs get their vcpu_info + * in xen_cpu_up_prepare_hvm(). */ xen_vcpu_setup(0); @@ -27,10 +28,20 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void) static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus) { + int cpu; + native_smp_prepare_cpus(max_cpus); WARN_ON(xen_smp_intr_init(0)); xen_init_lock_cpu(0); + + for_each_possible_cpu(cpu) { + if (cpu == 0) + continue; + + /* Set default vcpu_id to make sure that we don't use cpu-0's */ + per_cpu(xen_vcpu_id, cpu) = XEN_VCPU_ID_INVALID; + } } #ifdef CONFIG_HOTPLUG_CPU @@ -60,4 +71,5 @@ void __init xen_hvm_smp_init(void) smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi; smp_ops.send_call_func_single_ipi = xen_smp_send_call_function_single_ipi; smp_ops.smp_prepare_boot_cpu = xen_hvm_smp_prepare_boot_cpu; + smp_ops.smp_cpus_done = xen_smp_cpus_done; } diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c index aae32535f4ec..1ea598e5f030 100644 --- a/arch/x86/xen/smp_pv.c +++ b/arch/x86/xen/smp_pv.c @@ -371,10 +371,6 @@ static int xen_pv_cpu_up(unsigned int cpu, struct task_struct *idle) return 0; } -static void xen_pv_smp_cpus_done(unsigned int max_cpus) -{ -} - #ifdef CONFIG_HOTPLUG_CPU static int xen_pv_cpu_disable(void) { @@ -469,7 +465,7 @@ static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id) static const struct smp_ops xen_smp_ops __initconst = { .smp_prepare_boot_cpu = xen_pv_smp_prepare_boot_cpu, .smp_prepare_cpus = xen_pv_smp_prepare_cpus, - .smp_cpus_done = xen_pv_smp_cpus_done, + .smp_cpus_done = xen_smp_cpus_done, .cpu_up = xen_pv_cpu_up, .cpu_die = xen_pv_cpu_die, diff --git a/arch/x86/xen/suspend_hvm.c b/arch/x86/xen/suspend_hvm.c index 01afcadde50a..484999416d8b 100644 --- a/arch/x86/xen/suspend_hvm.c +++ b/arch/x86/xen/suspend_hvm.c @@ -8,15 +8,10 @@ void xen_hvm_post_suspend(int suspend_cancelled) { - int cpu; - - if (!suspend_cancelled) + if (!suspend_cancelled) { xen_hvm_init_shared_info(); + xen_vcpu_restore(); + } xen_callback_vector(); xen_unplug_emulated_devices(); - if (xen_feature(XENFEAT_hvm_safe_pvclock)) { - for_each_online_cpu(cpu) { - xen_setup_runstate_info(cpu); - } - } } diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index 9a440a42c618..0d5004477db6 100644 --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -78,7 +78,8 @@ bool xen_vcpu_stolen(int vcpu); extern int xen_have_vcpu_info_placement; -void xen_vcpu_setup(int cpu); +int xen_vcpu_setup(int cpu); +void xen_vcpu_info_reset(int cpu); void xen_setup_vcpu_info_placement(void); #ifdef CONFIG_SMP diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c index 2e567d8433b3..b241bfa529ce 100644 --- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -1303,10 +1303,9 @@ void rebind_evtchn_irq(int evtchn, int irq) } /* Rebind an evtchn so that it gets delivered to a specific cpu */ -static int rebind_irq_to_cpu(unsigned irq, unsigned tcpu) +int xen_rebind_evtchn_to_cpu(int evtchn, unsigned tcpu) { struct evtchn_bind_vcpu bind_vcpu; - int evtchn = evtchn_from_irq(irq); int masked; if (!VALID_EVTCHN(evtchn)) @@ -1338,12 +1337,13 @@ static int rebind_irq_to_cpu(unsigned irq, unsigned tcpu) return 0; } +EXPORT_SYMBOL_GPL(xen_rebind_evtchn_to_cpu); static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest, bool force) { unsigned tcpu = cpumask_first_and(dest, cpu_online_mask); - int ret = rebind_irq_to_cpu(data->irq, tcpu); + int ret = xen_rebind_evtchn_to_cpu(evtchn_from_irq(data->irq), tcpu); if (!ret) irq_data_update_effective_affinity(data, cpumask_of(tcpu)); diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c index 10f1ef582659..9729a64ea1a9 100644 --- a/drivers/xen/evtchn.c +++ b/drivers/xen/evtchn.c @@ -421,6 +421,36 @@ static void evtchn_unbind_from_user(struct per_user_data *u, del_evtchn(u, evtchn); } +static DEFINE_PER_CPU(int, bind_last_selected_cpu); + +static void evtchn_bind_interdom_next_vcpu(int evtchn) +{ + unsigned int selected_cpu, irq; + struct irq_desc *desc; + unsigned long flags; + + irq = irq_from_evtchn(evtchn); + desc = irq_to_desc(irq); + + if (!desc) + return; + + raw_spin_lock_irqsave(&desc->lock, flags); + selected_cpu = this_cpu_read(bind_last_selected_cpu); + selected_cpu = cpumask_next_and(selected_cpu, + desc->irq_common_data.affinity, cpu_online_mask); + + if (unlikely(selected_cpu >= nr_cpu_ids)) + selected_cpu = cpumask_first_and(desc->irq_common_data.affinity, + cpu_online_mask); + + this_cpu_write(bind_last_selected_cpu, selected_cpu); + + /* unmask expects irqs to be disabled */ + xen_rebind_evtchn_to_cpu(evtchn, selected_cpu); + raw_spin_unlock_irqrestore(&desc->lock, flags); +} + static long evtchn_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -478,8 +508,10 @@ static long evtchn_ioctl(struct file *file, break; rc = evtchn_bind_to_user(u, bind_interdomain.local_port); - if (rc == 0) + if (rc == 0) { rc = bind_interdomain.local_port; + evtchn_bind_interdom_next_vcpu(rc); + } break; } diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c index 9e35032351a0..c425d03d37d2 100644 --- a/drivers/xen/manage.c +++ b/drivers/xen/manage.c @@ -278,8 +278,16 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path, err = xenbus_transaction_start(&xbt); if (err) return; - if (xenbus_scanf(xbt, "control", "sysrq", "%c", &sysrq_key) < 0) { - pr_err("Unable to read sysrq code in control/sysrq\n"); + err = xenbus_scanf(xbt, "control", "sysrq", "%c", &sysrq_key); + if (err < 0) { + /* + * The Xenstore watch fires directly after registering it and + * after a suspend/resume cycle. So ENOENT is no error but + * might happen in those cases. + */ + if (err != -ENOENT) + pr_err("Error %d reading sysrq code in control/sysrq\n", + err); xenbus_transaction_end(xbt, 1); return; } diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c index 84106f9c456c..9d314bba7c4e 100644 --- a/drivers/xen/sys-hypervisor.c +++ b/drivers/xen/sys-hypervisor.c @@ -50,6 +50,35 @@ static int __init xen_sysfs_type_init(void) return sysfs_create_file(hypervisor_kobj, &type_attr.attr); } +static ssize_t guest_type_show(struct hyp_sysfs_attr *attr, char *buffer) +{ + const char *type; + + switch (xen_domain_type) { + case XEN_NATIVE: + /* ARM only. */ + type = "Xen"; + break; + case XEN_PV_DOMAIN: + type = "PV"; + break; + case XEN_HVM_DOMAIN: + type = xen_pvh_domain() ? "PVH" : "HVM"; + break; + default: + return -EINVAL; + } + + return sprintf(buffer, "%s\n", type); +} + +HYPERVISOR_ATTR_RO(guest_type); + +static int __init xen_sysfs_guest_type_init(void) +{ + return sysfs_create_file(hypervisor_kobj, &guest_type_attr.attr); +} + /* xen version attributes */ static ssize_t major_show(struct hyp_sysfs_attr *attr, char *buffer) { @@ -327,12 +356,40 @@ static ssize_t features_show(struct hyp_sysfs_attr *attr, char *buffer) HYPERVISOR_ATTR_RO(features); +static ssize_t buildid_show(struct hyp_sysfs_attr *attr, char *buffer) +{ + ssize_t ret; + struct xen_build_id *buildid; + + ret = HYPERVISOR_xen_version(XENVER_build_id, NULL); + if (ret < 0) { + if (ret == -EPERM) + ret = sprintf(buffer, "<denied>"); + return ret; + } + + buildid = kmalloc(sizeof(*buildid) + ret, GFP_KERNEL); + if (!buildid) + return -ENOMEM; + + buildid->len = ret; + ret = HYPERVISOR_xen_version(XENVER_build_id, buildid); + if (ret > 0) + ret = sprintf(buffer, "%s", buildid->buf); + kfree(buildid); + + return ret; +} + +HYPERVISOR_ATTR_RO(buildid); + static struct attribute *xen_properties_attrs[] = { &capabilities_attr.attr, &changeset_attr.attr, &virtual_start_attr.attr, &pagesize_attr.attr, &features_attr.attr, + &buildid_attr.attr, NULL }; @@ -471,6 +528,9 @@ static int __init hyper_sysfs_init(void) ret = xen_sysfs_type_init(); if (ret) goto out; + ret = xen_sysfs_guest_type_init(); + if (ret) + goto guest_type_out; ret = xen_sysfs_version_init(); if (ret) goto version_out; @@ -502,6 +562,8 @@ uuid_out: comp_out: sysfs_remove_group(hypervisor_kobj, &version_group); version_out: + sysfs_remove_file(hypervisor_kobj, &guest_type_attr.attr); +guest_type_out: sysfs_remove_file(hypervisor_kobj, &type_attr.attr); out: return ret; diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c index 856ada5d39c9..5b081a01779d 100644 --- a/drivers/xen/xenbus/xenbus_comms.c +++ b/drivers/xen/xenbus/xenbus_comms.c @@ -299,17 +299,7 @@ static int process_msg(void) mutex_lock(&xb_write_mutex); list_for_each_entry(req, &xs_reply_list, list) { if (req->msg.req_id == state.msg.req_id) { - if (req->state == xb_req_state_wait_reply) { - req->msg.type = state.msg.type; - req->msg.len = state.msg.len; - req->body = state.body; - req->state = xb_req_state_got_reply; - list_del(&req->list); - req->cb(req); - } else { - list_del(&req->list); - kfree(req); - } + list_del(&req->list); err = 0; break; } @@ -317,6 +307,15 @@ static int process_msg(void) mutex_unlock(&xb_write_mutex); if (err) goto out; + + if (req->state == xb_req_state_wait_reply) { + req->msg.type = state.msg.type; + req->msg.len = state.msg.len; + req->body = state.body; + req->state = xb_req_state_got_reply; + req->cb(req); + } else + kfree(req); } mutex_unlock(&xs_response_mutex); diff --git a/include/xen/arm/hypercall.h b/include/xen/arm/hypercall.h index 73db4b2eeb89..b40485e54d80 100644 --- a/include/xen/arm/hypercall.h +++ b/include/xen/arm/hypercall.h @@ -39,6 +39,8 @@ #include <xen/interface/sched.h> #include <xen/interface/platform.h> +struct xen_dm_op_buf; + long privcmd_call(unsigned call, unsigned long a1, unsigned long a2, unsigned long a3, unsigned long a4, unsigned long a5); @@ -53,7 +55,8 @@ int HYPERVISOR_physdev_op(int cmd, void *arg); int HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args); int HYPERVISOR_tmem_op(void *arg); int HYPERVISOR_vm_assist(unsigned int cmd, unsigned int type); -int HYPERVISOR_dm_op(domid_t domid, unsigned int nr_bufs, void *bufs); +int HYPERVISOR_dm_op(domid_t domid, unsigned int nr_bufs, + struct xen_dm_op_buf *bufs); int HYPERVISOR_platform_op_raw(void *arg); static inline int HYPERVISOR_platform_op(struct xen_platform_op *op) { diff --git a/include/xen/events.h b/include/xen/events.h index 88da2abaf535..f442ca5fcd82 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -58,6 +58,7 @@ void evtchn_put(unsigned int evtchn); void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); void rebind_evtchn_irq(int evtchn, int irq); +int xen_rebind_evtchn_to_cpu(int evtchn, unsigned tcpu); static inline void notify_remote_via_evtchn(int port) { diff --git a/include/xen/interface/version.h b/include/xen/interface/version.h index 7ff6498679a3..145f12f9ecec 100644 --- a/include/xen/interface/version.h +++ b/include/xen/interface/version.h @@ -63,4 +63,19 @@ struct xen_feature_info { /* arg == xen_domain_handle_t. */ #define XENVER_guest_handle 8 +#define XENVER_commandline 9 +struct xen_commandline { + char buf[1024]; +}; + +/* + * Return value is the number of bytes written, or XEN_Exx on error. + * Calling with empty parameter returns the size of build_id. + */ +#define XENVER_build_id 10 +struct xen_build_id { + uint32_t len; /* IN: size of buf[]. */ + unsigned char buf[]; +}; + #endif /* __XEN_PUBLIC_VERSION_H__ */ diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h index c44a2ee8c8f8..218e6aae5433 100644 --- a/include/xen/xen-ops.h +++ b/include/xen/xen-ops.h @@ -15,6 +15,8 @@ static inline uint32_t xen_vcpu_nr(int cpu) return per_cpu(xen_vcpu_id, cpu); } +#define XEN_VCPU_ID_INVALID U32_MAX + void xen_arch_pre_suspend(void); void xen_arch_post_suspend(int suspend_cancelled); |