| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen updates from Konrad Rzeszutek Wilk:
"Features:
* Extend the APIC ops implementation and add IRQ_WORKER vector
support so that 'perf' can work properly.
* Fix self-ballooning code, and balloon logic when booting as initial
domain.
* Move array printing code to generic debugfs
* Support XenBus domains.
* Lazily free grants when a domain is dead/non-existent.
* In M2P code use batching calls
Bug-fixes:
* Fix NULL dereference in allocation failure path (hvc_xen)
* Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
* Fix HVM guest resume - we would leak an PIRQ value instead of
reusing the existing one."
Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of
apic ipi interface next to the new apic_id functions.
* tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: do not map the same GSI twice in PVHVM guests.
hvc_xen: NULL dereference on allocation failure
xen: Add selfballoning memory reservation tunable.
xenbus: Add support for xenbus backend in stub domain
xen/smp: unbind irqworkX when unplugging vCPUs.
xen: enter/exit lazy_mmu_mode around m2p_override calls
xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
xen: implement IRQ_WORK_VECTOR handler
xen: implement apic ipi interface
xen/setup: update VA mapping when releasing memory during setup
xen/setup: Combine the two hypercall functions - since they are quite similar.
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
xen/gnttab: add deferred freeing logic
debugfs: Add support to print u32 array in debugfs
xen/p2m: An early bootup variant of set_phys_to_machine
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
xen/p2m: Move code around to allow for better re-usage.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
PV on HVM guests map GSIs into event channels. At restore time the
event channels are resumed by restore_pirqs.
Device drivers might try to register the same GSI again through ACPI at
restore time, but the GSI has already been mapped and bound by
restore_pirqs. This patch detects these situations and avoids
mapping the same GSI multiple times.
Without this patch we get:
(XEN) irq.c:2235: dom4: pirq 23 or emuirq 28 already mapped
and waste a pirq.
CC: stable@kernel.org
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If kzalloc() returns a NULL here, we pass a NULL to
xencons_disconnect_backend() which will cause an Oops.
Also I removed the __GFP_ZERO while I was at it since kzalloc() implies
__GFP_ZERO.
CC: stable@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, the memory target in the Xen selfballooning driver is mainly
driven by the value of "Committed_AS". However, there are cases in
which it is desirable to assign additional memory to be available for
the kernel, e.g. for local caches (which are not covered by cleancache),
e.g. dcache and inode caches.
This adds an additional tunable in the selfballooning driver (accessible
via sysfs) which allows the user to specify an additional constant
amount of memory to be reserved by the selfballoning driver for the
local domain.
Signed-off-by: Jana Saout <jana@saout.de>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add an ioctl to the /dev/xen/xenbus_backend device allowing the xenbus
backend to be started after the kernel has booted. This allows xenstore
to run in a different domain from the dom0.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The git commit 1ff2b0c303698e486f1e0886b4d9876200ef8ca5
"xen: implement IRQ_WORK_VECTOR handler" added the functionality
to have a per-cpu "irqworkX" for the IPI APIC functionality.
However it missed the unbind when a vCPU is unplugged resulting
in an orphaned per-cpu interrupt line for unplugged vCPU:
30: 216 0 xen-dyn-event hvc_console
31: 810 4 xen-dyn-event eth0
32: 29 0 xen-dyn-event blkif
- 36: 0 0 xen-percpu-ipi irqwork2
- 37: 287 0 xen-dyn-event xenbus
+ 36: 287 0 xen-dyn-event xenbus
NMI: 0 0 Non-maskable interrupts
LOC: 0 0 Local timer interrupts
SPU: 0 0 Spurious interrupts
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* stable/autoballoon.v5.2:
xen/setup: update VA mapping when releasing memory during setup
xen/setup: Combine the two hypercall functions - since they are quite similar.
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
xen/p2m: An early bootup variant of set_phys_to_machine
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
xen/p2m: Move code around to allow for better re-usage.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In xen_memory_setup(), if a page that is being released has a VA
mapping this must also be updated. Otherwise, the page will be not
released completely -- it will still be referenced in Xen and won't be
freed util the mapping is removed and this prevents it from being
reallocated at a different PFN.
This was already being done for the ISA memory region in
xen_ident_map_ISA() but on many systems this was omitting a few pages
as many systems marked a few pages below the ISA memory region as
reserved in the e820 map.
This fixes errors such as:
(XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
They use the same set of arguments, so it is just the matter
of using the proper hypercall.
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When the Xen hypervisor boots a PV kernel it hands it two pieces
of information: nr_pages and a made up E820 entry.
The nr_pages value defines the range from zero to nr_pages of PFNs
which have a valid Machine Frame Number (MFN) underneath it. The
E820 mirrors that (with the VGA hole):
BIOS-provided physical RAM map:
Xen: 0000000000000000 - 00000000000a0000 (usable)
Xen: 00000000000a0000 - 0000000000100000 (reserved)
Xen: 0000000000100000 - 0000000080800000 (usable)
The fun comes when a PV guest that is run with a machine E820 - that
can either be the initial domain or a PCI PV guest, where the E820
looks like the normal thing:
BIOS-provided physical RAM map:
Xen: 0000000000000000 - 000000000009e000 (usable)
Xen: 000000000009ec00 - 0000000000100000 (reserved)
Xen: 0000000000100000 - 0000000020000000 (usable)
Xen: 0000000020000000 - 0000000020200000 (reserved)
Xen: 0000000020200000 - 0000000040000000 (usable)
Xen: 0000000040000000 - 0000000040200000 (reserved)
Xen: 0000000040200000 - 00000000bad80000 (usable)
Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
..
With that overlaying the nr_pages directly on the E820 does not
work as there are gaps and non-RAM regions that won't be used
by the memory allocator. The 'xen_release_chunk' helps with that
by punching holes in the P2M (PFN to MFN lookup tree) for those
regions and tells us that:
Freeing 20000-20200 pfn range: 512 pages freed
Freeing 40000-40200 pfn range: 512 pages freed
Freeing bad80-badf4 pfn range: 116 pages freed
Freeing badf6-bae7f pfn range: 137 pages freed
Freeing bb000-100000 pfn range: 282624 pages freed
Released 283999 pages of unused memory
Those 283999 pages are subtracted from the nr_pages and are returned
to the hypervisor. The end result is that the initial domain
boots with 1GB less memory as the nr_pages has been subtracted by
the amount of pages residing within the PCI hole. It can balloon up
to that if desired using 'xl mem-set 0 8092', but the balloon driver
is not always compiled in for the initial domain.
This patch, implements the populate hypercall (XENMEM_populate_physmap)
which increases the the domain with the same amount of pages that
were released.
The other solution (that did not work) was to transplant the MFN in
the P2M tree - the ones that were going to be freed were put in
the E820_RAM regions past the nr_pages. But the modifications to the
M2P array (the other side of creating PTEs) were not carried away.
As the hypervisor is the only one capable of modifying that and the
only two hypercalls that would do this are: the update_va_mapping
(which won't work, as during initial bootup only PFNs up to nr_pages
are mapped in the guest) or via the populate hypercall.
The end result is that the kernel can now boot with the
nr_pages without having to subtract the 283999 pages.
On a 8GB machine, with various dom0_mem= parameters this is what we get:
no dom0_mem
-Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
+Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
dom0_mem=3G
-Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
+Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)
dom0_mem=max:3G
-Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
+Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)
And the 'xm list' or 'xl list' now reflect what the dom0_mem=
argument is.
Acked-by: David Vrabel <david.vrabel@citrix.com>
[v2: Use populate hypercall]
[v3: Remove debug printks]
[v4: Simplify code]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Otherwise we can get these meaningless:
Freeing bad80-badf4 pfn range: 0 pages freed
We also can do this for the summary ones - no point of printing
"Set 0 page(s) to 1-1 mapping"
Acked-by: David Vrabel <david.vrabel@citrix.com>
[v1: Extended to the summary printks]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
During early bootup we can't use alloc_page, so to allocate
leaf pages in the P2M we need to use extend_brk. For that
we are utilizing the early_alloc_p2m and early_alloc_p2m_middle
functions to do the job for us. This function follows the
same logic as set_phys_to_machine.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
At the start of the function we were checking for idx != 0
and bailing out. And later calling extend_brk if idx != 0.
That is unnecessary so remove that checks.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
For identity cases we want to call reserve_brk only on the boundary
conditions of the middle P2M (so P2M[x][y][0] = extend_brk). This is
to work around identify regions (PCI spaces, gaps in E820) which are not
aligned on 2MB regions.
However for the case were we want to allocate P2M middle leafs at the
early bootup stage, irregardless of this alignment check we need some
means of doing that. For that we provide the new argument.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We are going to be using the early_alloc_p2m (and
early_alloc_p2m_middle) code in follow up patches which
are not related to setting identity pages.
Hence lets move the code out in its own function and
rename them as appropiate.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch is a significant performance improvement for the
m2p_override: about 6% using the gntdev device.
Each m2p_add/remove_override call issues a MULTI_grant_table_op and a
__flush_tlb_single if kmap_op != NULL. Batching all the calls together
is a great performance benefit because it means issuing one hypercall
total rather than two hypercall per page.
If paravirt_lazy_mode is set PARAVIRT_LAZY_MMU, all these calls are
going to be batched together, otherwise they are issued one at a time.
Adding arch_enter_lazy_mmu_mode/arch_leave_lazy_mmu_mode around the
m2p_add/remove_override calls forces paravirt_lazy_mode to
PARAVIRT_LAZY_MMU, therefore makes sure that they are always batched.
However it is not safe to call arch_enter_lazy_mmu_mode if we are in
interrupt context or if we are already in PARAVIRT_LAZY_MMU mode, so
check for both conditions before doing so.
Changes in v4:
- rebased on 3.4-rc4: all the m2p_override users call gnttab_unmap_refs
and gnttab_map_refs;
- check whether we are in interrupt context and the lazy_mode we are in
before calling arch_enter/leave_lazy_mmu_mode.
Changes in v3:
- do not call arch_enter/leave_lazy_mmu_mode in xen_blkbk_unmap, that
can be called in interrupt context.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v5: s/int lazy/bool lazy/]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Provide the registration callback to call in the Xen's
ACPI sleep functionality. This means that during S3/S5
we make a hypercall XENPF_enter_acpi_sleep with the
proper PM1A/PM1B registers.
Based of Ke Yu's <ke.yu@intel.com> initial idea.
[ From http://xenbits.xensource.com/linux-2.6.18-xen.hg
change c68699484a65 ]
[v1: Added Copyright and license]
[v2: Added check if PM1A/B the 16-bits MSB contain something. The spec
only uses 16-bits but might have more in future]
Signed-off-by: Liang Tang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Map native ipi vector to xen vector.
Implement apic ipi interface with xen_send_IPI_one.
Tested-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Ben Guthro <ben@guthro.net>
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Rather than just leaking pages that can't be freed at the point where
access permission for the backend domain gets revoked, put them on a
list and run a timer to (infrequently) retry freeing them. (This can
particularly happen when unloading a frontend driver when devices are
still present, and the backend still has them in non-closed state or
hasn't finished closing them yet.)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Move the code from Xen to debugfs to make the code common
for other users as well.
Accked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
[v1: Fixed rebase issues]
[v2: Fixed PPC compile issues]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull sparc changes from David S. Miller:
"This has the generic strncpy_from_user() implementation architectures
can now use, which we've been developing on linux-arch over the past
few days.
For good measure I ran both a 32-bit and a 64-bit glibc testsuite run,
and the latter of which pointed out an adjustment I needed to make to
sparc's user_addr_max() definition. Linus, you were right, STACK_TOP
was not the right thing to use, even on sparc itself :-)
From Sam Ravnborg, we have a conversion of sparc32 over to the common
alloc_thread_info_node(), since the aspect which originally blocked
our doing so (sun4c) has been removed."
Fix up trivial arch/sparc/Kconfig and lib/Makefile conflicts.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Fix user_addr_max() definition.
lib: Sparc's strncpy_from_user is generic enough, move under lib/
kernel: Move REPEAT_BYTE definition into linux/kernel.h
sparc: Increase portability of strncpy_from_user() implementation.
sparc: Optimize strncpy_from_user() zero byte search.
sparc: Add full proper error handling to strncpy_from_user().
sparc32: use the common implementation of alloc_thread_info_node()
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We need to use TASK_SIZE because for 64-bit tasks the value
of STACK_TOP actually sits in the middle of the address space
so we'll get false-negatives.
Adjust the TASK_SIZE definition on sparc64 to accomodate this,
in the context in which user_addr_max() is used we have the
test_thread_flag() definition available but not the one for
test_tsk_thread_flag().
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
To use this, an architecture simply needs to:
1) Provide a user_addr_max() implementation via asm/uaccess.h
2) Add "select GENERIC_STRNCPY_FROM_USER" to their arch Kcnfig
3) Remove the existing strncpy_from_user() implementation and symbol
exports their architecture had.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: David Howells <dhowells@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
And make sure that everything using it explicitly includes
that header file.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Hide details of maximum user address calculation in a new
asm/uaccess.h interface named user_addr_max().
Provide little-endian implementation in find_zero(), which should work
but can probably be improved.
Abstrace alignment check behind IS_UNALIGNED() macro.
Kill double-semicolon, noticed by David Howells.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Compute a mask that will only have 0x80 in the bytes which
had a zero in them. The formula is:
~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | x | 0x7f7f7f7f)
In the inner word iteration, we have to compute the "x | 0x7f7f7f7f"
part, so we can reuse that in the above calculation.
Once we have this mask, we perform divide and conquer to find the
highest 0x80 location.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Linus removed the end-of-address-space hackery from
fs/namei.c:do_getname() so we really have to validate these edge
conditions and cannot cheat any more (as x86 used to as well).
Move to a common C implementation like x86 did. And if both
src and dst are sufficiently aligned we'll do word at a time
copies and checks as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
With sun4c removed we can fall-back to the common implementation.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull XFS update from Ben Myers:
- Removal of xfsbufd
- Background CIL flushes have been moved to a workqueue.
- Fix to xfs_check_page_type applicable to filesystems where
blocksize < page size
- Fix for stale data exposure when extsize hints are used.
- A series of xfs_buf cache cleanups.
- Fix for XFS_IOC_ALLOCSP
- Cleanups for includes and removal of xfs_lrw.[ch].
- Moved all busy extent handling to it's own file so that it is easier
to merge with userspace.
- Fix for log mount failure.
- Fix to enable inode reclaim during quotacheck at mount time.
- Fix for delalloc quota accounting.
- Fix for memory reclaim deadlock on agi buffer.
- Fixes for failed writes and to clean up stale delalloc blocks.
- Fix to use GFP_NOFS in blkdev_issue_flush
- SEEK_DATA/SEEK_HOLE support
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (57 commits)
xfs: add trace points for log forces
xfs: fix memory reclaim deadlock on agi buffer
xfs: fix delalloc quota accounting on failure
xfs: protect xfs_sync_worker with s_umount semaphore
xfs: introduce SEEK_DATA/SEEK_HOLE support
xfs: make xfs_extent_busy_trim not static
xfs: make XBF_MAPPED the default behaviour
xfs: flush outstanding buffers on log mount failure
xfs: Properly exclude IO type flags from buffer flags
xfs: clean up xfs_bit.h includes
xfs: move xfs_do_force_shutdown() and kill xfs_rw.c
xfs: move xfs_get_extsz_hint() and kill xfs_rw.h
xfs: move xfs_fsb_to_db to xfs_bmap.h
xfs: clean up busy extent naming
xfs: move busy extent handling to it's own file
xfs: move xfsagino_t to xfs_types.h
xfs: use iolock on XFS_IOC_ALLOCSP calls
xfs: kill XBF_DONTBLOCK
xfs: kill xfs_read_buf()
xfs: kill XBF_LOCK
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
To enable easy tracing of the location of log forces and the
frequency of them via perf, add a pair of trace points to the log
force functions. This will help debug where excessive log forces
are being issued from by simple perf commands like:
# ~/perf/perf top -e xfs:xfs_log_force -G -U
Which gives this sort of output:
Events: 141 xfs:xfs_log_force
- 100.00% [kernel] [k] xfs_log_force
- xfs_log_force
87.04% xfsaild
kthread
kernel_thread_helper
- 12.87% xfs_buf_lock
_xfs_buf_find
xfs_buf_get
xfs_trans_get_buf
xfs_da_do_buf
xfs_da_get_buf
xfs_dir2_data_init
xfs_dir2_leaf_addname
xfs_dir_createname
xfs_create
xfs_vn_mknod
xfs_vn_create
vfs_create
do_last.isra.41
path_openat
do_filp_open
do_sys_open
sys_open
system_call_fastpath
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sig.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Note xfs_iget can be called while holding a locked agi buffer. If
it goes into memory reclaim then inode teardown may try to lock the
same buffer. Prevent the deadlock by calling radix_tree_preload
with GFP_NOFS.
Signed-off-by: Peter Watkins <treestem@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfstest 270 was causing quota reservations way beyond what was sane
(ten to hundreds of TB) for a 4GB filesystem. There's a sign problem
in the error handling path of xfs_bmapi_reserve_delalloc() because
xfs_trans_unreserve_quota_nblks() simple negates the value passed -
which doesn't work for an unsigned variable. This causes
reservations of close to 2^32 block instead of removing a
reservation of a handful of blocks.
Fix the same problem in the other xfs_trans_unreserve_quota_nblks()
callers where unsigned integer variables are used, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing
work during mount and unmount. This flag can be cleared by unmount
after the xfs_sync_worker checks it but before the work is completed.
The has caused crashes in the completion handler for the dummy
transaction commited by xfs_sync_worker:
PID: 27544 TASK: ffff88013544e040 CPU: 3 COMMAND: "kworker/3:0"
#0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
#1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
#2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
#3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
#4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
#5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
#6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
#7 [ffff88016fdffc60] page_fault at ffffffff813ac635
[exception RIP: xlog_get_lowest_lsn+0x30]
RIP: ffffffffa04a9910 RSP: ffff88016fdffd10 RFLAGS: 00010246
RAX: ffffc90014e48000 RBX: ffff88014d879980 RCX: ffff88014d879980
RDX: ffff8802214ee4c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88016fdffd10 R8: ffff88014d879a80 R9: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802214ee400
R13: ffff88014d879980 R14: 0000000000000000 R15: ffff88022fd96605
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
#9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]
Protect xfs_sync_worker by using the s_umount semaphore at the read
level to provide exclusion with unmount while work is progressing.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch adds lseek(2) SEEK_DATA/SEEK_HOLE functionality to xfs.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Commit e459df5, 'xfs: move busy extent handling to it's own file'
moved some code from xfs_alloc.c into xfs_extent_busy.c for
convenience in userspace code merges. One of the functions moved is
xfs_extent_busy_trim (formerly xfs_alloc_busy_trim) which is defined
STATIC. Unfortunately this function is still used in xfs_alloc.c, and
this results in an undefined symbol in xfs.ko.
Make xfs_extent_busy_trim not static and add its prototype to
xfs_extent_busy.h.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Rather than specifying XBF_MAPPED for almost all buffers, introduce
XBF_UNMAPPED for the couple of users that use unmapped buffers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When we fail to mount the log in xfs_mountfs(), we tear down all the
infrastructure we have already allocated. However, the process of
mounting the log may have progressed to the point of reading,
caching and modifying buffers in memory. Hence before we can free
all the infrastructure, we have to flush and remove all the buffers
from memory.
Problem first reported by Eric Sandeen, later a different incarnation
was reported by Ben Myers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Recent event tracing during a debugging session showed that flags
that define the IO type for a buffer are leaking into the flags on
the buffer incorrectly. Fix the flag exclusion mask in
xfs_buf_alloc() to avoid problems that may be caused by such
leakage.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
With the removal of xfs_rw.h and other changes over time, xfs_bit.h
is being included in many files that don't actually need it. Clean
up the includes as necessary.
Also move the only-used-once xfs_ialloc_find_free() static inline
function out of a header file that is widely included to reduce
the number of needless dependencies on xfs_bit.h.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfs_do_force_shutdown now is the only thing in xfs_rw.c. There is no
need to keep it in it's own file anymore, so move it to xfs_fsops.c
next to xfs_fs_goingdown() and kill xfs_rw.c.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The only thing left in xfs_rw.h is a function prototype for an inode
function. Move that to xfs_inode.h, and kill xfs_rw.h.
Also move the function implementing the prototype from xfs_rw.c to
xfs_inode.c so we only have one function left in xfs_rw.c
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is the only remaining useful function in xfs_rw.h, so move it
to a header file responsible for block mapping functions that the
callers already include. Soon we can get rid of xfs_rw.h.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Now that the busy extent tracking has been moved out of the
allocation files, clean up the namespace it uses to
"xfs_extent_busy" rather than a mix of "xfs_busy" and
"xfs_alloc_busy".
Signed-off-by: Dave Chinner<dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
To make it easier to handle userspace code merges, move all the busy
extent handling out of the allocation code and into it's own file.
The userspace code does not need the busy extent code, so this
simplifies the merging of the kernel code into the userspace
xfsprogs library.
Because the busy extent code has been almost completely rewritten
over the past couple of years, also update the copyright on this new
file to include the authors that made all those changes.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Untangle the header file includes a bit by moving the definition of
xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
xfs_ag.h.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fsstress has a particular effective way of stopping debug XFS
kernels. We keep seeing assert failures due finding delayed
allocation extents where there should be none. This shows up when
extracting extent maps and we are holding all the locks we should be
to prevent races, so this really makes no sense to see these errors.
After checking that fsstress does not use mmap, it occurred to me
that fsstress uses something that no sane application uses - the
XFS_IOC_ALLOCSP ioctl interfaces for preallocation. These interfaces
do allocation of blocks beyond EOF without using preallocation, and
then call setattr to extend and zero the allocated blocks.
THe problem here is this is a buffered write, and hence the
allocation is a delayed allocation. Unlike the buffered IO path, the
allocation and zeroing are not serialised using the IOLOCK. Hence
the ALLOCSP operation can race with operations holding the iolock to
prevent buffered IO operations from occurring.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
avoid recursion through memory reclaim back into the filesystem.
All the blocking get calls in growfs occur inside a transaction, even though
they are no part of the transaction, so all allocation will be GFP_NOFS due to
the task flag PF_TRANS being set. The blocking read calls occur during log
recovery, so they will probably be unaffected by converting to GFP_NOFS
allocations.
Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfs_read_buf() is effectively the same as xfs_trans_read_buf() when called
outside a transaction context. The error handling is slightly different in that
xfs_read_buf stales the errored buffer it gets back, but there is probably good
reason for xfs_trans_read_buf() for doing this.
Hence update xfs_trans_read_buf() to the same error handling as xfs_read_buf(),
and convert all the callers of xfs_read_buf() to use the former function. We can
then remove xfs_read_buf().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Buffers are always returned locked from the lookup routines. Hence
we don't need to tell the lookup routines to return locked buffers,
on to try and lock them. Remove XBF_LOCK from all the callers and
from internal buffer cache usage.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|