summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
*---. Merge branches 'stable/drivers-3.2', 'stable/drivers.bugfixes-3.2' and ↵Linus Torvalds2011-10-2523-205/+507
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/drivers-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xenbus: don't rely on xen_initial_domain to detect local xenstore xenbus: Fix loopback event channel assuming domain 0 xen/pv-on-hvm:kexec: Fix implicit declaration of function 'xen_hvm_domain' xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel xen/pv-on-hvm kexec: update xs_wire.h:xsd_sockmsg_type from xen-unstable xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when stale watch events arrive * 'stable/drivers.bugfixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pciback: Check if the device is found instead of blindly assuming so. xen/pciback: Do not dereference psdev during printk when it is NULL. xen: remove XEN_PLATFORM_PCI config option xen: XEN_PVHVM depends on PCI xen/pciback: double lock typo xen/pciback: use mutex rather than spinlock in vpci backend xen/pciback: Use mutexes when working with Xenbus state transitions. xen/pciback: miscellaneous adjustments xen/pciback: use mutex rather than spinlock in passthrough backend xen/pciback: use resource_size() * 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pci: support multi-segment systems xen-swiotlb: When doing coherent alloc/dealloc check before swizzling the MFNs. xen/pci: make bus notifier handler return sane values xen-swiotlb: fix printk and panic args xen-swiotlb: Fix wrong panic. xen-swiotlb: Retry up three times to allocate Xen-SWIOTLB xen-pcifront: Update warning comment to use 'e820_host' option.
| | | * xen/pci: support multi-segment systemsJan Beulich2011-09-223-14/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the hypercall interface changes are in -unstable, make the kernel side code not ignore the segment (aka domain) number anymore (which results in pretty odd behavior on such systems). Rather, if only the old interfaces are available, don't call them for devices on non-zero segments at all. Signed-off-by: Jan Beulich <jbeulich@suse.com> [v1: Edited git description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | | * xen-swiotlb: When doing coherent alloc/dealloc check before swizzling the MFNs.Konrad Rzeszutek Wilk2011-08-261-4/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The process to swizzle a Machine Frame Number (MFN) is not always necessary. Especially if we know that we actually do not have to do it. In this patch we check the MFN against the device's coherent DMA mask and if the requested page(s) are contingous. If it all checks out we will just return the bus addr without doing the memory swizzle. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | | * xen/pci: make bus notifier handler return sane valuesJan Beulich2011-08-261-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Notifier functions are expected to return NOTIFY_* codes, not -E... ones. In particular, since the respective hypercalls failing is not fatal to the operation of the Dom0 kernel, it must be avoided to return negative values here as those would make it appear as if NOTIFY_STOP_MASK wa set, suppressing further notification calls to other interested parties (which is also why we don't want to use notifier_from_errno() here). While at it, also notify the user of a failed hypercall. Signed-off-by: Jan Beulich <jbeulich@novell.com> [v1: Added dev_err and the disable MSI/MSI-X call] [v2: Removed the disable MSI/MSI-X call] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | | * xen-swiotlb: fix printk and panic argsRandy Dunlap2011-08-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix printk() and panic() args [swap them] to fix build warnings: drivers/xen/swiotlb-xen.c:201: warning: format '%s' expects type 'char *', but argument 2 has type 'int' drivers/xen/swiotlb-xen.c:201: warning: format '%d' expects type 'int', but argument 3 has type 'char *' drivers/xen/swiotlb-xen.c:202: warning: format '%s' expects type 'char *', but argument 2 has type 'int' drivers/xen/swiotlb-xen.c:202: warning: format '%d' expects type 'int', but argument 3 has type 'char *' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | | * xen-swiotlb: Fix wrong panic.Konrad Rzeszutek Wilk2011-08-261-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Propagate the baremetal git commit "swiotlb: fix wrong panic" (fba99fa38b023224680308a482e12a0eca87e4e1) in the Xen-SWIOTLB version. wherein swiotlb's map_page wrongly calls panic() when it can't find a buffer fit for device's dma mask. It should return an error instead. Devices with an odd dma mask (i.e. under 4G) like b44 network card hit this bug (the system crashes): http://marc.info/?l=linux-kernel&m=129648943830106&w=2 If xen-swiotlb returns an error, b44 driver can use the own bouncing mechanism. CC: stable@kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | | * xen-swiotlb: Retry up three times to allocate Xen-SWIOTLBKonrad Rzeszutek Wilk2011-08-261-10/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can fail seting up Xen-SWIOTLB if: - The host does not have enough contiguous DMA32 memory available (can happen on a machine that has fragmented memory from starting, stopping many guests). - Not enough low memory (almost never happens). We retry allocating and exchanging the swath of contiguous memory up to three times. Each time we decrease the amount we need - the minimum being of 2MB. If we compleltly fail, we will print the reason for failure on the Xen console on top of doing it to earlyprintk=xen console. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | | * xen-pcifront: Update warning comment to use 'e820_host' option.Konrad Rzeszutek Wilk2011-08-261-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With Xen changeset 23428 "libxl: Add 'e820_host' option to config file" the E820 as seen from the host can now be passed into the guest. This means that a PV guest can now: - Use the correct PCI I/O gap. Before these patches, Linux guest would boot up and would tell: [ 0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:c0000000) while in actuality the PCI I/O gap should have been: [ 0.000000] Allocating PCI resources starting at b0000000 (gap: b0000000:4c000000) - The PV domain with PCI devices was limited to 3GB. It now can be booted with 4GB, 8GB, or whatever number you want. The PCI devices will now _not_ conflict with System RAM. Meaning the drivers can load. CC: Jesse Barnes <jbarnes@virtuousgeek.org> CC: linux-pci@vger.kernel.org CC: stable@kernel.org [v2: Made the string less broken up. Suggested by Joe Perches] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen/pciback: Check if the device is found instead of blindly assuming so.Konrad Rzeszutek Wilk2011-10-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Just in case it is not found, don't try to dereference it. [v1: Added WARN_ON, suggested by Jan Beulich <JBeulich@suse.com>] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen/pciback: Do not dereference psdev during printk when it is NULL.Konrad Rzeszutek Wilk2011-10-191-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. instead use BUG_ON() as all the callers of the kill_domain_by_device check for psdev. Suggested-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen: remove XEN_PLATFORM_PCI config optionStefano Stabellini2011-09-292-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xen PVHVM needs xen-platform-pci, on the other hand xen-platform-pci is useless in any other cases. Therefore remove the XEN_PLATFORM_PCI config option and compile xen-platform-pci built-in if XEN_PVHVM is selected. Changes to v1: - remove xen-platform-pci.o and just use platform-pci.o since it is not externally visible anymore. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen: XEN_PVHVM depends on PCIStefano Stabellini2011-09-291-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xen PV on HVM guests require PCI support because they need the xen-platform-pci driver in order to initialize xenbus. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen/pciback: double lock typoDan Carpenter2011-09-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We called mutex_lock() twice instead of unlocking. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen/pciback: use mutex rather than spinlock in vpci backendKonrad Rzeszutek Wilk2011-09-221-15/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to the "xen/pciback: use mutex rather than spinlock in passthrough backend" this patch converts the vpci backend to use a mutex instead of a spinlock. Note that the code taking the lock won't ever get called from non-sleepable context Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen/pciback: Use mutexes when working with Xenbus state transitions.Konrad Rzeszutek Wilk2011-09-222-14/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The caller that orchestrates the state changes is xenwatch_thread and it takes a mutex. In our processing of Xenbus states we can take the luxery of going to sleep on a mutex, so lets do that and also fix this bug: BUG: sleeping function called from invalid context at /linux/kernel/mutex.c:271 in_atomic(): 1, irqs_disabled(): 0, pid: 32, name: xenwatch 2 locks held by xenwatch/32: #0: (xenwatch_mutex){......}, at: [<ffffffff813856ab>] xenwatch_thread+0x4b/0x180 #1: (&(&pdev->dev_lock)->rlock){......}, at: [<ffffffff8138f05b>] xen_pcibk_disconnect+0x1b/0x80 Pid: 32, comm: xenwatch Not tainted 3.1.0-rc6-00015-g3ce340d #2 Call Trace: [<ffffffff810892b2>] __might_sleep+0x102/0x130 [<ffffffff8163b90f>] mutex_lock_nested+0x2f/0x50 [<ffffffff81382c1c>] unbind_from_irq+0x2c/0x1b0 [<ffffffff8110da66>] ? free_irq+0x56/0xb0 [<ffffffff81382dbc>] unbind_from_irqhandler+0x1c/0x30 [<ffffffff8138f06b>] xen_pcibk_disconnect+0x2b/0x80 [<ffffffff81390348>] xen_pcibk_frontend_changed+0xe8/0x140 [<ffffffff81387ac2>] xenbus_otherend_changed+0xd2/0x150 [<ffffffff810895c1>] ? get_parent_ip+0x11/0x50 [<ffffffff81387de0>] frontend_changed+0x10/0x20 [<ffffffff81385712>] xenwatch_thread+0xb2/0x180 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen/pciback: miscellaneous adjustmentsJan Beulich2011-09-229-44/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a minor bugfix and a set of small cleanups; as it is not clear whether this needs splitting into pieces (and if so, at what granularity), it is a single combined patch. - add a missing return statement to an error path in kill_domain_by_device() - use pci_is_enabled() rather than raw atomic_read() - remove a bogus attempt to zero-terminate an already zero-terminated string - #define DRV_NAME once uniformly in the shared local header - make DRIVER_ATTR() variables static - eliminate a pointless use of list_for_each_entry_safe() - add MODULE_ALIAS() - a little bit of constification - adjust a few messages - remove stray semicolons from inline function definitions Signed-off-by: Jan Beulich <jbeulich@suse.com> [v1: Dropped the resource_size fix, altered the description] [v2: Fixed cleanpatch.pl comments] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen/pciback: use mutex rather than spinlock in passthrough backendJan Beulich2011-09-211-19/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To accommodate the call to the callback function from __xen_pcibk_publish_pci_roots(), which so far dropped and the re- acquired the lock without checking that the list didn't actually change, convert the code to use a mutex instead (observing that the code taking the lock won't ever get called from non-sleepable context). As a result, drop the bogus use of list_for_each_entry_safe() and remove the inappropriate dropping of the lock. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen/pciback: use resource_size()Thomas Meyer2011-09-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use resource_size function on resource object instead of explicit computation. The semantic patch that makes this output is available in scripts/coccinelle/api/resource_size.cocci. More information about semantic patching is available at http://coccinelle.lip6.fr/ Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xenbus: don't rely on xen_initial_domain to detect local xenstoreDaniel De Graaf2011-10-141-48/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xenstore daemon does not have to run in the xen initial domain; however, Linux currently uses xen_initial_domain to test if a loopback event channel should be used instead of the event channel provided in Xen's start_info structure. Instead, if the event channel passed in the start_info structure is not valid, assume that this domain will run xenstored locally and set up the event channel. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xenbus: Fix loopback event channel assuming domain 0Daniel De Graaf2011-10-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xenbus event channel established in xenbus_init is intended to be a loopback channel, but the remote domain was hardcoded to 0; this will cause the channel to be unusable when xenstore is not being run in domain 0. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/pv-on-hvm:kexec: Fix implicit declaration of function 'xen_hvm_domain'Konrad Rzeszutek Wilk2011-09-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Randy found a compile error when using make randconfig to trigger drivers/xen/xenbus/xenbus_xs.c:909:2: error: implicit declaration of function 'xen_hvm_domain' it is unclear which of the CONFIG options triggered this. This patch fixes the error. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernelOlaf Hering2011-09-222-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new xs_reset_watches function to shutdown watches from old kernel after kexec boot. The old kernel does not unregister all watches in the shutdown path. They are still active, the double registration can not be detected by the new kernel. When the watches fire, unexpected events will arrive and the xenwatch thread will crash (jumps to NULL). An orderly reboot of a hvm guest will destroy the entire guest with all its resources (including the watches) before it is rebuilt from scratch, so the missing unregister is not an issue in that case. With this change the xenstored is instructed to wipe all active watches for the guest. However, a patch for xenstored is required so that it accepts the XS_RESET_WATCHES request from a client (see changeset 23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored the registration of watches will fail and some features of a PVonHVM guest are not available. The guest is still able to boot, but repeated kexec boots will fail. [v5: use xs_single instead of passing a dummy string to xs_talkv] [v4: ignore -EEXIST in xs_reset_watches] [v3: use XS_RESET_WATCHES instead of XS_INTRODUCE] [v2: move all code which deals with XS_INTRODUCE into xs_introduce() (based on feedback from Ian Campbell); remove casts from kvec assignment] Signed-off-by: Olaf Hering <olaf@aepfle.de> [v1: Redid the git description a bit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/pv-on-hvm kexec: update xs_wire.h:xsd_sockmsg_type from xen-unstableOlaf Hering2011-09-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update include/xen/interface/io/xs_wire.h from xen-unstable. Now entries in xsd_sockmsg_type were added. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernelOlaf Hering2011-09-012-1/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After triggering a crash dump in a HVM guest, the PV backend drivers will remain in Connected state. When the kdump kernel starts the PV drivers will skip such devices. As a result, no root device is found and the vmcore cant be saved. A similar situation happens after a kexec boot, here the devices will be in the Closed state. With this change all frontend devices with state XenbusStateConnected or XenbusStateClosed will be reset by changing the state file to Closing -> Closed -> Initializing. This will trigger a disconnect in the backend drivers. Now the frontend drivers will find the backend drivers in state Initwait and can connect. Signed-off-by: Olaf Hering <olaf@aepfle.de> [v2: - add timeout when waiting for backend state change (based on feedback from Ian Campell) - extent printk message to include backend string - add comment to fall-through case in xenbus_reset_frontend] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/pv-on-hvm kexec: rebind virqs to existing eventchannel portsOlaf Hering2011-09-011-5/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During a kexec boot some virqs such as timer and debugirq were already registered by the old kernel. The hypervisor will return -EEXISTS from the new EVTCHNOP_bind_virq request and the BUG in bind_virq_to_irq() triggers. Catch the -EEXISTS error and loop through all possible ports to find what port belongs to the virq/cpu combo. Signed-off-by: Olaf Hering <olaf@aepfle.de> [v2: - use NR_EVENT_CHANNELS instead of private MAX_EVTCHNS] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when stale watch ↵Olaf Hering2011-09-011-2/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | events arrive During repeated kexec boots xenwatch_thread() can crash because xenbus_watch->callback is cleared by xenbus_watch_path() if a node/token combo for a new watch happens to match an already registered watch from an old kernel. In this case xs_watch returns -EEXISTS, then register_xenbus_watch() does not remove the to-be-registered watch from the list of active watches but returns the -EEXISTS to the caller anyway. Because the watch is still active in xenstored it will cause an event which will arrive in the new kernel. process_msg() will find the encapsulated struct xenbus_watch in its list of registered watches and puts the "empty" watch handle in the queue for xenwatch_thread(). xenwatch_thread() then calls ->callback which was cleared earlier by xenbus_watch_path(). To prevent that crash in a guest running on an old xen toolstack remove the special -EEXIST handling. v2: - remove the EEXIST handing in register_xenbus_watch() instead of checking for ->callback in process_msg() Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Olaf Hering <olaf@aepfle.de>
| | |
| \ \
*-. \ \ Merge branches 'stable/bug.fixes-3.2' and 'stable/mmu.fixes' of ↵Linus Torvalds2011-10-2515-111/+238
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/p2m/debugfs: Make type_name more obvious. xen/p2m/debugfs: Fix potential pointer exception. xen/enlighten: Fix compile warnings and set cx to known value. xen/xenbus: Remove the unnecessary check. xen/irq: If we fail during msi_capability_init return proper error code. xen/events: Don't check the info for NULL as it is already done. xen/events: BUG() when we can't allocate our event->irq array. * 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Fix selfballooning and ensure it doesn't go too far xen/gntdev: Fix sleep-inside-spinlock xen: modify kernel mappings corresponding to granted pages xen: add an "highmem" parameter to alloc_xenballooned_pages xen/p2m: Use SetPagePrivate and its friends for M2P overrides. xen/p2m: Make debug/xen/mmu/p2m visible again. Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
| | * | | xen: Fix selfballooning and ensure it doesn't go too farDan Magenheimer2011-10-141-4/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The balloon driver's "current_pages" is very different from totalram_pages. Self-ballooning needs to be driven by the latter. Also, Committed_AS doesn't account for pages used by the kernel so: 1) Add totalreserve_pages to Committed_AS for the normal target. 2) Enforce a floor for when there are little or no user-space threads using memory (e.g. single-user mode) to avoid OOMs. The floor function includes a "min_usable_mb" tuneable in case we discover later that the floor function is still too aggressive in some workloads, though likely it will not be needed. Changes since version 4: - change floor calculation so that it is not as aggressive; this version uses a piecewise linear function similar to minimum_target in the 2.6.18 balloon driver, but modified to add to totalreserve_pages instead of subtract from max_pfn, the 2.6.18 version causes OOMs on recent kernels because the kernel has expanded over time - change safety_margin to min_usable_mb and comment on its use - since committed_as does NOT include kernel space (and other reserved pages), totalreserve_pages is now added to committed_as. The result is less aggressive self-ballooning, but theoretically more appropriate. Changes since version 3: - missing include causes compile problem when CONFIG_FRONTSWAP is disabled - add comments after includes Changes since version 2: - missing include causes compile problem only on 32-bit Changes since version 1: - tuneable safety margin added [v5: avi.miller@oracle.com: still too aggressive, seeing some OOMs] [v4: konrad.wilk@oracle.com: fix compile when CONFIG_FRONTSWAP is disabled] [v3: guru.anbalagane@oracle.com: fix 32-bit compile] [v2: konrad.wilk@oracle.com: make safety margin tuneable] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> [v1: Altered description and added an extra include] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/gntdev: Fix sleep-inside-spinlockDaniel De Graaf2011-10-141-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BUG: sleeping function called from invalid context at /local/scratch/dariof/linux/kernel/mutex.c:271 in_atomic(): 1, irqs_disabled(): 0, pid: 3256, name: qemu-dm 1 lock held by qemu-dm/3256: #0: (&(&priv->lock)->rlock){......}, at: [<ffffffff813223da>] gntdev_ioctl+0x2bd/0x4d5 Pid: 3256, comm: qemu-dm Tainted: G W 3.1.0-rc8+ #5 Call Trace: [<ffffffff81054594>] __might_sleep+0x131/0x135 [<ffffffff816bd64f>] mutex_lock_nested+0x25/0x45 [<ffffffff8131c7c8>] free_xenballooned_pages+0x20/0xb1 [<ffffffff8132194d>] gntdev_put_map+0xa8/0xdb [<ffffffff816be546>] ? _raw_spin_lock+0x71/0x7a [<ffffffff813223da>] ? gntdev_ioctl+0x2bd/0x4d5 [<ffffffff8132243c>] gntdev_ioctl+0x31f/0x4d5 [<ffffffff81007d62>] ? check_events+0x12/0x20 [<ffffffff811433bc>] do_vfs_ioctl+0x488/0x4d7 [<ffffffff81007d4f>] ? xen_restore_fl_direct_reloc+0x4/0x4 [<ffffffff8109168b>] ? lock_release+0x21c/0x229 [<ffffffff81135cdd>] ? rcu_read_unlock+0x21/0x32 [<ffffffff81143452>] sys_ioctl+0x47/0x6a [<ffffffff816bfd82>] system_call_fastpath+0x16/0x1b gntdev_put_map tries to acquire a mutex when freeing pages back to the xenballoon pool, so it cannot be called with a spinlock held. In gntdev_release, the spinlock is not needed as we are freeing the structure later; in the ioctl, only the list manipulation needs to be under the lock. Reported-and-Tested-By: Dario Faggioli <dario.faggioli@citrix.com> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen: modify kernel mappings corresponding to granted pagesStefano Stabellini2011-09-296-16/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we want to use granted pages for AIO, changing the mappings of a user vma and the corresponding p2m is not enough, we also need to update the kernel mappings accordingly. Currently this is only needed for pages that are created for user usages through /dev/xen/gntdev. As in, pages that have been in use by the kernel and use the P2M will not need this special mapping. However there are no guarantees that in the future the kernel won't start accessing pages through the 1:1 even for internal usage. In order to avoid the complexity of dealing with highmem, we allocated the pages lowmem. We issue a HYPERVISOR_grant_table_op right away in m2p_add_override and we remove the mappings using another HYPERVISOR_grant_table_op in m2p_remove_override. Considering that m2p_add_override and m2p_remove_override are called once per page we use multicalls and hypercall batching. Use the kmap_op pointer directly as argument to do the mapping as it is guaranteed to be present up until the unmapping is done. Before issuing any unmapping multicalls, we need to make sure that the mapping has already being done, because we need the kmap->handle to be set correctly. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Removed GRANT_FRAME_BIT usage] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen: add an "highmem" parameter to alloc_xenballooned_pagesStefano Stabellini2011-09-293-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an highmem parameter to alloc_xenballooned_pages, to allow callers to request lowmem or highmem pages. Fix the code style of free_xenballooned_pages' prototype. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/p2m: Use SetPagePrivate and its friends for M2P overrides.Konrad Rzeszutek Wilk2011-09-241-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use the page->private field and hence should use the proper macros and set proper bits. Also WARN_ON in case somebody tries to overwrite our data. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/p2m: Make debug/xen/mmu/p2m visible again.Konrad Rzeszutek Wilk2011-09-243-20/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We dropped a lot of the MMU debugfs in favour of using tracing API - but there is one which just provides mostly static information that was made invisible by this change. Bring it back. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."Konrad Rzeszutek Wilk2011-08-092-46/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don' use it anymore and there are more false positives. This reverts commit fc25151d9ac7d809239fe68de0a1490b504bb94a. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/p2m/debugfs: Make type_name more obvious.Konrad Rzeszutek Wilk2011-10-191-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per Ian Campbell suggestion to defend against future breakage in case we expand the P2M values, incorporate the defines in the string array. Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/p2m/debugfs: Fix potential pointer exception.Konrad Rzeszutek Wilk2011-10-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We could be referencing the last + 1 element of level_name[] array which would cause a pointer exception, because of the initial setup of lvl=4. [v1: No need to do this for type_name, pointed out by Ian Campbell] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/enlighten: Fix compile warnings and set cx to known value.Konrad Rzeszutek Wilk2011-10-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We get: linux/arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’: linux/arch/x86/xen/enlighten.c:226: warning: ‘cx’ may be used uninitialized in this function linux/arch/x86/xen/enlighten.c:240: note: ‘cx’ was declared here and the cx is really not set but passed in the xen_cpuid instruction which masks the value with returned masked_ecx from cpuid. This can potentially lead to invalid data being stored in cx. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/xenbus: Remove the unnecessary check.Konrad Rzeszutek Wilk2011-10-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. we check whether 'xdev' is NULL - but there is no need for it as the 'dev' check is done before. The 'dev' is embedded in the 'xdev' so having xdev != NULL with dev being being checked is not going to happen. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/irq: If we fail during msi_capability_init return proper error code.Konrad Rzeszutek Wilk2011-10-192-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are three different modes: PV, HVM, and initial domain 0. In all the cases we would return -1 for failure instead of a proper error code. Fix this by propagating the error code from the generic IRQ code. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/events: Don't check the info for NULL as it is already done.Konrad Rzeszutek Wilk2011-10-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The list operation checks whether the 'info' structure that is retrieved from the list is NULL (otherwise it would not been able to retrieve it). This check is not neccessary. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/events: BUG() when we can't allocate our event->irq array.Konrad Rzeszutek Wilk2011-10-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case we can't allocate we are doomed. We should BUG_ON instead of trying to dereference it later on. Acked-by: Ian Campbell <ian.campbell@citrix.com> [v1: Use BUG_ON instead of BUG] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | | Merge branch 'stable/e820-3.2' of ↵Linus Torvalds2011-10-253-181/+165
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/e820-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: release all pages within 1-1 p2m mappings xen: allow extra memory to be in multiple regions xen: allow balloon driver to use more than one memory region xen/balloon: simplify test for the end of usable RAM xen/balloon: account for pages released during memory setup
| * | | | | xen: release all pages within 1-1 p2m mappingsDavid Vrabel2011-09-291-75/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In xen_memory_setup() all reserved regions and gaps are set to an identity (1-1) p2m mapping. If an available page has a PFN within one of these 1-1 mappings it will become inaccessible (as it MFN is lost) so release them before setting up the mapping. This can make an additional 256 MiB or more of RAM available (depending on the size of the reserved regions in the memory map) if the initial pages overlap with reserved regions. The 1:1 p2m mappings are also extended to cover partial pages. This fixes an issue with (for example) systems with a BIOS that puts the DMI tables in a reserved region that begins on a non-page boundary. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | xen: allow extra memory to be in multiple regionsDavid Vrabel2011-09-291-96/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the extra memory (used by the balloon driver) to be in multiple regions (typically two regions, one for low memory and one for high memory). This allows the balloon driver to increase the number of available low pages (if the initial number if pages is small). As a side effect, the algorithm for building the e820 memory map is simpler and more obviously correct as the map supplied by the hypervisor is (almost) used as is (in particular, all reserved regions and gaps are preserved). Only RAM regions are altered and RAM regions above max_pfn + extra_pages are marked as unused (the region is split in two if necessary). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | xen: allow balloon driver to use more than one memory regionDavid Vrabel2011-09-293-28/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the xen balloon driver to populate its list of extra pages from more than one region of memory. This will allow platforms to provide (for example) a region of low memory and a region of high memory. The maximum possible number of extra regions is 128 (== E820MAX) which is quite large so xen_extra_mem is placed in __initdata. This is safe as both xen_memory_setup() and balloon_init() are in __init. The balloon regions themselves are not altered (i.e., there is still only the one region). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | xen/balloon: simplify test for the end of usable RAMDavid Vrabel2011-09-291-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When initializing the balloon only max_pfn needs to be checked (max_pfn will always be <= e820_end_of_ram_pfn()) and improve the confusing comment. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | xen/balloon: account for pages released during memory setupDavid Vrabel2011-09-293-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In xen_memory_setup() pages that occur in gaps in the memory map are released back to Xen. This reduces the domain's current page count in the hypervisor. The Xen balloon driver does not correctly decrease its initial current_pages count to reflect this. If 'delta' pages are released and the target is adjusted the resulting reservation is always 'delta' less than the requested target. This affects dom0 if the initial allocation of pages overlaps the PCI memory region but won't affect most domU guests that have been setup with pseudo-physical memory maps that don't have gaps. Fix this by accouting for the released pages when starting the balloon driver. If the domain's targets are managed by xapi, the domain may eventually run out of memory and die because xapi currently gets its target calculations wrong and whenever it is restarted it always reduces the target by 'delta'. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | | | x86: Fix compilation bug in kprobes' twobyte_is_boostableJosh Stone2011-10-251-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand for test_bit in kprobes' can_boost. I discovered that this caused only the first long of twobyte_is_boostable[] to be output. Jakub filed and fixed gcc PR50571 to correct the warning and this output issue. But to solve it for less current gcc, we can make kprobes' twobyte_is_boostable[] non-const, and it won't be optimized out. Before: CC arch/x86/kernel/kprobes.o In file included from include/linux/bitops.h:22:0, from include/linux/kernel.h:17, from [...]/arch/x86/include/asm/percpu.h:44, from [...]/arch/x86/include/asm/current.h:5, from [...]/arch/x86/include/asm/processor.h:15, from [...]/arch/x86/include/asm/atomic.h:6, from include/linux/atomic.h:4, from include/linux/mutex.h:18, from include/linux/notifier.h:13, from include/linux/kprobes.h:34, from arch/x86/kernel/kprobes.c:43: [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’: [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default] $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt 551: 0f a3 05 00 00 00 00 bt %eax,0x0 554: R_386_32 .rodata.cst4 $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... Contents of section .rodata.cst4: 0000 4c030000 L... Only a single long of twobyte_is_boostable[] is in the object file. After, without the const on twobyte_is_boostable: $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt 551: 0f a3 05 20 00 00 00 bt %eax,0x20 554: R_386_32 .data $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... 0010 00000000 00000000 00000000 00000000 ................ 0020 4c030000 0f000200 ffff0000 ffcff0c0 L............... 0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77 ....;.......&..w Now all 32 bytes are output into .data instead. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Linux 3.1v3.1Linus Torvalds2011-10-241-1/+1
| | | | | |
* | | | | | Merge git://git.infradead.org/iommu-2.6Linus Torvalds2011-10-242-31/+46
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.infradead.org/iommu-2.6: intel-iommu: fix superpage support in pfn_to_dma_pte() intel-iommu: set iommu_superpage on VM domains to lowest common denominator intel-iommu: fix return value of iommu_unmap() API MAINTAINERS: Update VT-d entry for drivers/pci -> drivers/iommu move intel-iommu: Export a flag indicating that the IOMMU is used for iGFX. intel-iommu: Workaround IOTLB hang on Ironlake GPU intel-iommu: Fix AB-BA lockdep report