summaryrefslogtreecommitdiffstats
path: root/fs/bfs (unfollow)
Commit message (Collapse)AuthorFilesLines
2017-12-20xen/balloon: Mark unallocated host memory as UNUSABLEBoris Ostrovsky4-13/+144
Commit f5775e0b6116 ("x86/xen: discard RAM regions above the maximum reservation") left host memory not assigned to dom0 as available for memory hotplug. Unfortunately this also meant that those regions could be used by others. Specifically, commit fa564ad96366 ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") may try to map those addresses as MMIO. To prevent this mark unallocated host memory as E820_TYPE_UNUSABLE (thus effectively reverting f5775e0b6116) and keep track of that region as a hostmem resource that can be used for the hotplug. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
2017-12-19x86-64/Xen: eliminate W+X mappingsJan Beulich2-0/+15
A few thousand such pages are usually left around due to the re-use of L1 tables having been provided by the hypervisor (Dom0) or tool stack (DomU). Set NX in the direct map variant, which needs to be done in L2 due to the dual use of the re-used L1s. For x86_configure_nx() to actually do what it is supposed to do, call get_cpu_cap() first. This was broken by commit 4763ed4d45 ("x86, mm: Clean up and simplify NX enablement") when switching away from the direct EFER read. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-12-12xen: XEN_ACPI_PROCESSOR is Dom0-onlyJan Beulich1-1/+1
Add a respective dependency. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-12-12x86/Xen: don't report ancient LAPIC versionJan Beulich1-1/+1
Unconditionally reporting a value seen on the P4 or older invokes functionality like io_apic_get_unique_id() on 32-bit builds, resulting in a panic() with sufficiently many CPUs and/or IO-APICs. Doing what that function does would be the hypervisor's responsibility anyway, so makes no sense to be used when running on Xen. Uniformly report a more modern version; this shouldn't matter much as both LAPIC and IO-APIC are being managed entirely / mostly by the hypervisor. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-12-06xen/pvcalls: Fix a check in pvcalls_front_remove()Dan Carpenter1-1/+1
bedata->ref can't be less than zero because it's unsigned. This affects certain error paths in probe. We first set ->ref = -1 and then we set it to a valid value later. Fixes: 219681909913 ("xen/pvcalls: connect to the backend") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-12-06xen/pvcalls: check for xenbus_read() errorsDan Carpenter1-0/+2
Smatch complains that "len" is uninitialized if xenbus_read() fails so let's add some error handling. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-15xen/pvcalls: fix potential endless loop in pvcalls-front.cStefano Stabellini1-6/+5
mutex_trylock() returns 1 if you take the lock and 0 if not. Assume you take in_mutex on the first try, but you can't take out_mutex. Next times you call mutex_trylock() in_mutex is going to fail. It's an endless loop. Solve the problem by waiting until the global refcount is 1 instead (the refcount is 1 when the only active pvcalls frontend function is pvcalls_front_release). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-15xen/pvcalls: Add MODULE_LICENSE()Boris Ostrovsky2-0/+8
Since commit ba1029c9cbc5 ("modpost: detect modules without a MODULE_LICENSE") modules without said macro will generate WARNING: modpost: missing MODULE_LICENSE() in <filename> While at it, also add module description and attribution. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
2017-11-08MAINTAINERS: xen, kvm: track pvclock-abi.h changesJoao Martins1-0/+2
This file defines an ABI shared between guest and hypervisor(s) (KVM, Xen) and as such there should be an correspondent entry in MAINTAINERS file. Notice that there's already a text notice at the top of the header file, hence this commit simply enforces it more explicitly and have both peers noticed when such changes happen. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Juergen Gross <jgross@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-08x86/xen/time: setup vcpu 0 time info pageJoao Martins4-1/+137
In order to support pvclock vdso on xen we need to setup the time info page for vcpu 0 and register the page with Xen using the VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall will also forcefully update the pvti which will set some of the necessary flags for vdso. Afterwards we check if it supports the PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having vdso/vsyscall support. And if so, it will set the cpu 0 pvti that will be later on used when mapping the vdso image. The xen headers are also updated to include the new hypercall for registering the secondary vcpu_time_info struct. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-08x86/xen/time: set pvclock flags on xen_time_init()Joao Martins1-0/+9
Specifically check for PVCLOCK_TSC_STABLE_BIT and if this bit is set, then set it too on pvclock flags. This allows Xen clocksource to use it and thus speeding up xen_clocksource_read() callers (i.e. sched_clock()) Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-08x86/pvclock: add setter for pvclock_pvti_cpu0_vaJoao Martins5-17/+27
Right now there is only a pvclock_pvti_cpu0_va() which is defined on kvmclock since: commit dac16fba6fc5 ("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap") The only user of this interface so far is kvm. This commit adds a setter function for the pvti page and moves pvclock_pvti_cpu0_va to pvclock, which is a more generic place to have it; and would allow other PV clocksources to use it, such as Xen. While moving pvclock_pvti_cpu0_va into pvclock, rename also this function to pvclock_get_pvti_cpu0_va (including its call sites) to be symmetric with the setter (pvclock_set_pvti_cpu0_va). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-08ptp_kvm: probe for kvm guest availabilityJoao Martins1-0/+3
In the event of moving pvclock_pvti_cpu0_va() definition to common pvclock code, this function would return a value on non KVM guests. Later on this would fail with a GPF on ptp_kvm_init when running on a Xen guest. Therefore, ptp_kvm_init() should check whether it is running in a KVM guest. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-08xen/privcmd: remove unused variable pageidxColin Ian King1-3/+0
Variable pageidx is assigned a value but it is never read, hence it is redundant and can be removed. Cleans up clang warning: drivers/xen/privcmd.c:199:2: warning: Value stored to 'pageidx' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-06xen: select grant interface versionJuergen Gross1-2/+33
Grant v2 will be needed in cases where a frame number in the grant table can exceed 32 bits. For PV guests this is a host feature, while for HVM guests this is a guest feature. So select grant v2 in case frame numbers can be larger than 32 bits and grant v1 else. For testing purposes add a way to specify the grant interface version via a boot parameter. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-06xen: update arch/x86/include/asm/xen/cpuid.hJuergen Gross1-10/+32
Update arch/x86/include/asm/xen/cpuid.h from the Xen tree to get newest definitions. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-06xen: add grant interface version dependent constants to gnttab_opsJuergen Gross1-30/+43
Instead of having multiple variables with constants like grant_table_version or grefs_per_grant_frame add those to struct gnttab_ops and access them just via the gnttab_interface pointer. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-06xen: limit grant v2 interface to the v1 functionalityJuergen Gross2-175/+0
As there is currently no user for sub-page grants or transient grants remove that functionality. This at once makes it possible to switch from grant v2 to grant v1 without restrictions, as there is no loss of functionality other than the limited frame number width related to the switch. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-06xen: re-introduce support for grant v2 interfaceJuergen Gross4-13/+398
The grant v2 support was removed from the kernel with commit 438b33c7145ca8a5131a30c36d8f59bce119a19a ("xen/grant-table: remove support for V2 tables") as the higher memory footprint of v2 grants resulted in less grants being possible for a kernel compared to the v1 grant interface. As machines with more than 16TB of memory are expected to be more common in the near future support of grant v2 is mandatory in order to be able to run a Xen pv domain at any memory location. So re-add grant v2 support basically by reverting above commit. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-03xen: support priv-mapping in an HVM tools domainPaul Durrant2-2/+36
If the domain has XENFEAT_auto_translated_physmap then use of the PV- specific HYPERVISOR_mmu_update hypercall is clearly incorrect. This patch adds checks in xen_remap_domain_gfn_array() and xen_unmap_domain_gfn_array() which call through to the approprate xlate_mmu function if the feature is present. A check is also added to xen_remap_domain_gfn_range() to fail with -EOPNOTSUPP since this should not be used in an HVM tools domain. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-03xen/pvcalls: remove redundant check for irq >= 0Colin Ian King1-3/+1
This is a moot point, but irq is always less than zero at the label out_error, so the check for irq >= 0 is redundant and can be removed. Detected by CoverityScan, CID#1460371 ("Logically dead code") Fixes: cb1c7d9bbc87 ("xen/pvcalls: implement connect command") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-03xen/pvcalls: fix unsigned less than zero error checkColin Ian King1-4/+3
The check on bedata->ref is never true because ref is an unsigned integer. Fix this by assigning signed int ret to the return of the call to gnttab_claim_grant_reference so the -ve return can be checked. Detected by CoverityScan, CID#1460358 ("Unsigned compared against 0") Fixes: 219681909913 ("xen/pvcalls: connect to the backend") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-03xen/time: Return -ENODEV from xen_get_wallclock()Boris Ostrovsky1-1/+1
For any other error sync_cmos_clock() will reschedule itself every second or so, for no good reason. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-03xen/pvcalls-front: mark expected switch fall-throughGustavo A. R. Silva1-1/+2
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case I placed the "fall through" comment on its own line, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-03xen: xenbus_probe_frontend: mark expected switch fall-throughsGustavo A. R. Silva1-0/+2
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 146562 Addresses-Coverity-ID: 146563 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-02xen/time: do not decrease steal time after live migration on xenDongli Zhang3-7/+73
After guest live migration on xen, steal time in /proc/stat (cpustat[CPUTIME_STEAL]) might decrease because steal returned by xen_steal_lock() might be less than this_rq()->prev_steal_time which is derived from previous return value of xen_steal_clock(). For instance, steal time of each vcpu is 335 before live migration. cpu 198 0 368 200064 1962 0 0 1340 0 0 cpu0 38 0 81 50063 492 0 0 335 0 0 cpu1 65 0 97 49763 634 0 0 335 0 0 cpu2 38 0 81 50098 462 0 0 335 0 0 cpu3 56 0 107 50138 374 0 0 335 0 0 After live migration, steal time is reduced to 312. cpu 200 0 370 200330 1971 0 0 1248 0 0 cpu0 38 0 82 50123 500 0 0 312 0 0 cpu1 65 0 97 49832 634 0 0 312 0 0 cpu2 39 0 82 50167 462 0 0 312 0 0 cpu3 56 0 107 50207 374 0 0 312 0 0 Since runstate times are cumulative and cleared during xen live migration by xen hypervisor, the idea of this patch is to accumulate runstate times to global percpu variables before live migration suspend. Once guest VM is resumed, xen_get_runstate_snapshot_cpu() would always return the sum of new runstate times and previously accumulated times stored in global percpu variables. Comment above HYPERVISOR_suspend() has been removed as it is inaccurate: the call can return an error code (e.g., possibly -EPERM in the future). Similar and more severe issue would impact prior linux 4.8-4.10 as discussed by Michael Las at https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest, which would overflow steal time and lead to 100% st usage in top command for linux 4.8-4.10. A backport of this patch would fix that issue. [boris: added linux/slab.h to driver/xen/time.c, slightly reformatted commit message] References: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen: support 52 bit physical addresses in pv guestsJuergen Gross2-3/+12
Physical addresses on processors supporting 5 level paging can be up to 52 bits wide. For a Xen pv guest running on such a machine those physical addresses have to be supported in order to be able to use any memory on the machine even if the guest itself does not support 5 level paging. So when reading/writing a MFN from/to a pte don't use the kernel's PTE_PFN_MASK but a new XEN_PTE_MFN_MASK allowing full 40 bit wide MFNs. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen: introduce a Kconfig option to enable the pvcalls frontendStefano Stabellini2-0/+12
Also add pvcalls-front to the Makefile. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement release commandStefano Stabellini2-0/+99
Send PVCALLS_RELEASE to the backend and wait for a reply. Take both in_mutex and out_mutex to avoid concurrent accesses. Then, free the socket. For passive sockets, check whether we have already pre-allocated an active socket for the purpose of being accepted. If so, free that as well. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement poll commandStefano Stabellini2-9/+138
For active sockets, check the indexes and use the inflight_conn_req waitqueue to wait. For passive sockets if an accept is outstanding (PVCALLS_FLAG_ACCEPT_INFLIGHT), check if it has been answered by looking at bedata->rsp[req_id]. If so, return POLLIN. Otherwise use the inflight_accept_req waitqueue. If no accepts are inflight, send PVCALLS_POLL to the backend. If we have outstanding POLL requests awaiting for a response use the inflight_req waitqueue: inflight_req is awaken when a new response is received; on wakeup we check whether the POLL response is arrived by looking at the PVCALLS_FLAG_POLL_RET flag. We set the flag from pvcalls_front_event_handler, if the response was for a POLL command. In pvcalls_front_event_handler, get the struct sock_mapping from the poll id (we previously converted struct sock_mapping* to uintptr_t and used it as id). Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement recvmsgStefano Stabellini2-0/+115
Implement recvmsg by copying data from the "in" ring. If not enough data is available and the recvmsg call is blocking, then wait on the inflight_conn_req waitqueue. Take the active socket in_mutex so that only one function can access the ring at any given time. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement sendmsgStefano Stabellini2-0/+124
Send data to an active socket by copying data to the "out" ring. Take the active socket out_mutex so that only one function can access the ring at any given time. If not enough room is available on the ring, rather than returning immediately or sleep-waiting, spin for up to 5000 cycles. This small optimization turns out to improve performance significantly. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement accept commandStefano Stabellini2-0/+148
Introduce a waitqueue to allow only one outstanding accept command at any given time and to implement polling on the passive socket. Introduce a flags field to keep track of in-flight accept and poll commands. Send PVCALLS_ACCEPT to the backend. Allocate a new active socket. Make sure that only one accept command is executed at any given time by setting PVCALLS_FLAG_ACCEPT_INFLIGHT and waiting on the inflight_accept_req waitqueue. Convert the new struct sock_mapping pointer into an uintptr_t and use it as id for the new socket to pass to the backend. Check if the accept call is non-blocking: in that case after sending the ACCEPT command to the backend store the sock_mapping pointer of the new struct and the inflight req_id then return -EAGAIN (which will respond only when there is something to accept). Next time accept is called, we'll check if the ACCEPT command has been answered, if so we'll pick up where we left off, otherwise we return -EAGAIN again. Note that, differently from the other commands, we can use wait_event_interruptible (instead of wait_event) in the case of accept as we are able to track the req_id of the ACCEPT response that we are waiting. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement listen commandStefano Stabellini2-0/+58
Send PVCALLS_LISTEN to the backend. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement bind commandStefano Stabellini2-0/+69
Send PVCALLS_BIND to the backend. Introduce a new structure, part of struct sock_mapping, to store information specific to passive sockets. Introduce a status field to keep track of the status of the passive socket. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement connect commandStefano Stabellini2-0/+160
Send PVCALLS_CONNECT to the backend. Allocate a new ring and evtchn for the active socket. Introduce fields in struct sock_mapping to keep track of active sockets. Introduce a waitqueue to allow the frontend to wait on data coming from the backend on the active socket (recvmsg command). Two mutexes (one of reads and one for writes) will be used to protect the active socket in and out rings from concurrent accesses. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement socket command and handle eventsStefano Stabellini2-0/+139
Send a PVCALLS_SOCKET command to the backend, use the masked req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0 and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp array ready for the response, and there cannot be two outstanding responses with the same req_id. Wait for the response by waiting on the inflight_req waitqueue and check for the req_id field in rsp[req_id]. Use atomic accesses and barriers to read the field. Note that the barriers are simple smp barriers (as opposed to virt barriers) because they are for internal frontend synchronization, not frontend<->backend communication. Once a response is received, clear the corresponding rsp slot by setting req_id to PVCALLS_INVALID_ID. Note that PVCALLS_INVALID_ID is invalid only from the frontend point of view. It is not part of the PVCalls protocol. pvcalls_front_event_handler is in charge of copying responses from the ring to the appropriate rsp slot. It is done by copying the body of the response first, then by copying req_id atomically. After the copies, wake up anybody waiting on waitqueue. socket_lock protects accesses to the ring. Convert the pointer to sock_mapping into an uintptr_t and use it as id for the new socket to pass to the backend. The struct will be fully initialized later on connect or bind. sock->sk->sk_send_head is not used for ip sockets: reuse the field to store a pointer to the struct sock_mapping corresponding to the socket. This way, we can easily get the struct sock_mapping from the struct socket. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: connect to the backendStefano Stabellini1-0/+132
Implement the probe function for the pvcalls frontend. Read the supported versions, max-page-order and function-calls nodes from xenstore. Only one frontend<->backend connection is supported at any given time for a guest. Store the active frontend device to a static pointer. Introduce a stub functions for the event handler. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-10-31xen/pvcalls: implement frontend disconnectStefano Stabellini1-0/+71
Introduce a data structure named pvcalls_bedata. It contains pointers to the command ring, the event channel, a list of active sockets and a list of passive sockets. Lists accesses are protected by a spin_lock. Introduce a waitqueue to allow waiting for a response on commands sent to the backend. Introduce an array of struct xen_pvcalls_response to store commands responses. Introduce a new struct sock_mapping to keep track of sockets. In this patch the struct sock_mapping is minimal, the fields will be added by the next patches. pvcalls_refcount is used to keep count of the outstanding pvcalls users. Only remove connections once the refcount is zero. Implement pvcalls frontend removal function. Go through the list of active and passive sockets and free them all, one at a time. Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>