summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
*-. Merge branches 'upstream/xenfs' and 'upstream/core' of ↵Linus Torvalds2010-10-2720-137/+1332
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/xenfs' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen/privcmd: make privcmd visible in domU xen/privcmd: move remap_domain_mfn_range() to core xen code and export. privcmd: MMAPBATCH: Fix error handling/reporting xenbus: export xen_store_interface for xenfs xen/privcmd: make sure vma is ours before doing anything to it xen/privcmd: print SIGBUS faults xen/xenfs: set_page_dirty is supposed to return true if it dirties xen/privcmd: create address space to allow writable mmaps xen: add privcmd driver xen: add variable hypercall caller xen: add xen_set_domain_pte() xen: add /proc/xen/xsd_{kva,port} to xenfs * 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (29 commits) xen: include xen/xen.h for definition of xen_initial_domain() xen: use host E820 map for dom0 xen: correctly rebuild mfn list list after migration. xen: improvements to VIRQ_DEBUG output xen: set up IRQ before binding virq to evtchn xen: ensure that all event channels start off bound to VCPU 0 xen/hvc: only notify if we actually sent something xen: don't add extra_pages for RAM after mem_end xen: add support for PAT xen: make sure xen_max_p2m_pfn is up to date xen: limit extra memory to a certain ratio of base xen: add extra pages for E820 RAM regions, even if beyond mem_end xen: make sure xen_extra_mem_start is beyond all non-RAM e820 xen: implement "extra" memory to reserve space for pages not present at boot xen: Use host-provided E820 map xen: don't map missing memory xen: defer building p2m mfn structures until kernel is mapped xen: add return value to set_phys_to_machine() xen: convert p2m to a 3 level tree xen: make install_p2mtop_page() static ... Fix up trivial conflict in arch/x86/xen/mmu.c, and fix the use of 'reserve_early()' - in the new memblock world order it is now 'memblock_x86_reserve_range()' instead. Pointed out by Jeremy.
| | * xen: include xen/xen.h for definition of xen_initial_domain()Ian Campbell2010-10-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | CC arch/x86/xen/setup.o arch/x86/xen/setup.c: In function 'xen_memory_setup': arch/x86/xen/setup.c:161: error: implicit declaration of function 'xen_initial_domain' Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: use host E820 map for dom0Ian Campbell2010-10-221-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running as initial domain, get the real physical memory map from xen using the XENMEM_machine_memory_map hypercall and use it to setup the e820 regions. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen: correctly rebuild mfn list list after migration.Ian Campbell2010-10-221-13/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise the second migration attempt fails because the mfn_list_list still refers to all the old mfns. We need to update the entires in both p2m_top_mfn and the mid_mfn pages which p2m_top_mfn refers to. In order to do this we need to keep track of the virtual addresses mapping the p2m_mid_mfn pages since we cannot rely on mfn_to_virt(p2m_top_mfn[idx]) since p2m_top_mfn[idx] will still contain the old MFN after a migration, which may now belong to another domain and hence have a different mapping in the m2p. Therefore add and maintain a third top level page, p2m_top_mfn_p[], which tracks the virtual addresses of the mfns contained in p2m_top_mfn[]. We also need to update the content of the p2m_mid_missing_mfn page on resume to refer to the page's new mfn. p2m_missing does not need updating since the migration process takes care of the leaf p2m pages for us. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: improvements to VIRQ_DEBUG outputIan Campbell2010-10-221-22/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Fix bitmask formatting on 64 bit by specifying correct field widths. * Output both global and local masked and pending information. * Indicate in list of pending interrupts whether they are pending in the L2, masked globally and/or masked locally. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: set up IRQ before binding virq to evtchnJeremy Fitzhardinge2010-10-221-5/+5
| | | | | | | | | | | | | | | | | | Make sure the irq is set up before binding a virq event channel to it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: ensure that all event channels start off bound to VCPU 0Ian Campbell2010-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All event channels startbound to VCPU 0 so ensure that cpu_evtchn_mask is initialised to reflect this. Otherwise there is a race after registering an event channel but before the affinity is explicitly set where the event channel can be delivered. If this happens then the event channel remains pending in the L1 (evtchn_pending) array but is cleared in L2 (evtchn_pending_sel), this means the event channel cannot be reraised until another event channel happens to trigger the same L2 entry on that VCPU. sizeof(cpu_evtchn_mask(0))==sizeof(unsigned long*) which is not correct, and causes only the first 32 or 64 event channels (depending on architecture) to be initially bound to VCPU0. Use sizeof(struct cpu_evtchn_s) instead. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: stable@kernel.org
| | * xen/hvc: only notify if we actually sent somethingJeremy Fitzhardinge2010-10-221-1/+2
| | | | | | | | | | | | | | | | | | | | | Don't spam dom0/xenconsoled with events unless we've actually added something to the ring. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: don't add extra_pages for RAM after mem_endJeremy Fitzhardinge2010-10-221-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an E820 region is entirely beyond mem_end, don't attempt to truncate it and add the truncated pages to extra_pages, as they will be negative. Also, make sure the extra memory region starts after all BIOS provided E820 regions (and in the case of RAM regions, post-clipping). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: add support for PATJeremy Fitzhardinge2010-10-223-3/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert Linux PAT entries into Xen ones when constructing ptes. Linux doesn't use _PAGE_PAT for ptes, so the only difference in the first 4 entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and default) use it for WT. xen_pte_val does the inverse conversion. We hard-code assumptions about Linux's current PAT layout, but a warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems. If necessary we could go to a more general table-based conversion between Linux and Xen PAT entries. hugetlbfs poses a problem at the moment, the x86 architecture uses the same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending on which pagetable level we're using. At the moment this should be OK so long as nobody tries to do a pte_val on a hugetlbfs pte. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: make sure xen_max_p2m_pfn is up to dateJeremy Fitzhardinge2010-10-223-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Keep xen_max_p2m_pfn up to date with the end of the extra memory we're adding. It is possible that it will be too high since memory may be truncated by a "mem=" option on the kernel command line, but that won't matter. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: limit extra memory to a certain ratio of baseJeremy Fitzhardinge2010-10-221-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If extra memory is very much larger than the base memory size then all of the base memory can be filled with structures reserved to describe the extra memory, leaving no space for anything else. Even at the maximum ratio there will be little space for anything else, but this change is intended to at least allow the system to boot rather than crash mysteriously. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: add extra pages for E820 RAM regions, even if beyond mem_endJeremy Fitzhardinge2010-10-221-3/+4
| | | | | | | | | | | | | | | | | | | | | If an entire E820 RAM region is beyond mem_end, still add its pages to the extra area so that space can be used by the kernel. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: make sure xen_extra_mem_start is beyond all non-RAM e820Jeremy Fitzhardinge2010-10-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | If Xen gives us non-RAM E820 entries (dom0 only, typically), then make sure the extra RAM region is beyond them. It's OK for the extra space to grow into E820 regions, however. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: implement "extra" memory to reserve space for pages not present at bootJeremy Fitzhardinge2010-10-221-2/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using the e820 map to get the initial pseudo-physical address space, look for either Xen-provided memory which doesn't lie within an E820 region, or an E820 RAM region which extends beyond the Xen-provided memory range. Count these pages, and add them to a new "extra memory" range. This range has an E820 RAM range to describe it - so the kernel will allocate page structures for it - but it is also marked reserved so that the kernel will not attempt to use it. The balloon driver can then add this range as a set of currently ballooned-out pages, which can be used to extend the domain beyond its original size. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: Use host-provided E820 mapIan Campbell2010-10-222-2/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than simply using a flat memory map from Xen, use its provided E820 map. This allows the domain builder to tell the domain to reserve space for more pages than those initially provided at domain-build time. It also allows the host to specify holes in the address space (for PCI-passthrough, for example). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: don't map missing memoryJeremy Fitzhardinge2010-10-222-2/+22
| | | | | | | | | | | | | | | | | | | | | When setting up a pte for a missing pfn (no matching mfn), just create an empty pte rather than a junk mapping. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: defer building p2m mfn structures until kernel is mappedJeremy Fitzhardinge2010-10-222-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building mfn parts of p2m structure, we rely on being able to use mfn_to_virt, which in turn requires kernel to be mapped into the linear area (which is distinct from the kernel image mapping on 64-bit). Defer calling xen_build_mfn_list_list() until after xen_setup_kernel_pagetable(); Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: add return value to set_phys_to_machine()Jeremy Fitzhardinge2010-10-222-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | set_phys_to_machine() can return false on failure, which means a memory allocation failure for the p2m structure. It can only fail if setting the mfn for a pfn in previously unused address space. It is guaranteed to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating the mfn for an existing pfn. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: convert p2m to a 3 level treeJeremy Fitzhardinge2010-10-222-83/+246
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the p2m structure a 3 level tree which covers the full possible physical space. The p2m structure contains mappings from the domain's pfns to system-wide mfns. The structure has 3 levels and two roots. The first root is for the domain's own use, and is linked with virtual addresses. The second is all mfn references, and is used by Xen on save/restore to allow it to update the p2m mapping for the domain. At boot, the domain builder provides a simple flat p2m array for all the initially present pages. We construct the two levels above that using the early_brk allocator. After early boot time, set_phys_to_machine() will allocate any missing levels using the normal kernel allocator (at GFP_KERNEL, so it must be called in a normal blocking context). Because the early_brk() API requires us to pre-reserve the maximum amount of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY config option, but its only negative side-effect is to increase the kernel's apparent bss size. However, since all unused brk memory is returned to the heap, there's no real downside to making it large. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: make install_p2mtop_page() staticJeremy Fitzhardinge2010-10-222-3/+2
| | | | | | | | | | | | Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: set the actual extent of the mfn_list_listJeremy Fitzhardinge2010-10-221-1/+1
| | | | | | | | | | | | Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: set shared_info->arch.max_pfn to max_p2m_pfnJeremy Fitzhardinge2010-10-221-1/+1
| | | | | | | | | | | | Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen/events: change to using fasteoiJeremy Fitzhardinge2010-10-221-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Change event delivery to: - mask+clear event in the upcall function - use handle_fasteoi_irq as the handler - unmask in the eoi function (and handle migration) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: remove noise about registering vcpu infoJeremy Fitzhardinge2010-10-221-8/+0
| | | | | | | | | | | | Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: allocate level1_ident_pgtJeremy Fitzhardinge2010-10-221-2/+6
| | | | | | | | | | | | Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: use early_brk for level2_kernel_pgtJeremy Fitzhardinge2010-10-221-1/+3
| | | | | | | | | | | | Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: allocate p2m size based on actual max sizeJeremy Fitzhardinge2010-10-221-14/+21
| | | | | | | | | | | | | | | | | | | | | Allocate p2m tables based on the actual runtime maximum pfn rather than the static config-time limit. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * xen: dynamically allocate p2m spaceJeremy Fitzhardinge2010-10-221-8/+22
| | | | | | | | | | | | | | | | | | | | | Use early brk mechanism to allocate p2m tables, to save memory when booting non-Xen. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | * x86: add RESERVE_BRK_ARRAY() helperJeremy Fitzhardinge2010-10-221-0/+5
| | | | | | | | | | | | | | | | | | Useful when converting static arrays into boottime brk allocated objects. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen/privcmd: make privcmd visible in domUJeremy Fitzhardinge2010-10-212-4/+3
| | | | | | | | | | | | | | | | | | | | | It has its uses in a domU as well as dom0. Xen will prevent an unprivileged domain from doing anything untoward. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen/privcmd: move remap_domain_mfn_range() to core xen code and export.Ian Campbell2010-10-213-73/+79
| | | | | | | | | | | | | | | | | | | | | | | | This allows xenfs to be built as a module, previously it required flush_tlb_all and arbitrary_virt_to_machine to be exported. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | privcmd: MMAPBATCH: Fix error handling/reportingIan Campbell2010-10-211-15/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On error IOCTL_PRIVCMD_MMAPBATCH is expected to set the top nibble of the effected MFN and return 0. Currently it leaves the MFN unmodified and returns the number of failures. Therefore: - reimplement remap_domain_mfn_range() using direct HYPERVISOR_mmu_update() calls and small batches. The xen_set_domain_pte() interface does not report errors and since some failures are expected/normal using the multicall infrastructure is too noisy. - return 0 as expected - writeback the updated MFN list to mmapbatch->arr not over mmapbatch, smashing the caller's stack. - remap_domain_mfn_range can be static. With this change I am able to start an HVM domain. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xenbus: export xen_store_interface for xenfsJeremy Fitzhardinge2010-10-211-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | xen_store_interface is needed by xenfs, and xenfs may be a module. [ Impact: build fix for modular xenfs ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen/privcmd: make sure vma is ours before doing anything to itJeremy Fitzhardinge2010-10-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Test vma->vm_ops is our operations to make sure we created it. We don't want to stomp on other random vmas. [ Impact: bugfix; prevent ioctl from affecting other mappings ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen/privcmd: print SIGBUS faultsJeremy Fitzhardinge2010-10-211-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Print more detail about privcmd mapping faults for debugging. [ Impact: debug ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen/xenfs: set_page_dirty is supposed to return true if it dirtiesJeremy Fitzhardinge2010-10-211-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | I don't think it matters at all in this case (there's only one caller which checks the return value), but may as well be strictly correct. [ Impact: cleanup ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen/privcmd: create address space to allow writable mmapsJeremy Fitzhardinge2010-10-211-4/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are necessary to allow writeable mmap of the privcmd node to succeed without being marked read-only for writenotify purposes. Which in turn is necessary to allow mappings of foreign guest pages [ Impact: bugfix: allow writable mappings ] Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen: add privcmd driverJeremy Fitzhardinge2010-10-216-1/+521
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The privcmd interface in xenfs allows the tool stack in the privileged domain to get fairly direct access to the hypervisor in order to do various management things such as domain construction. [ Impact: new xenfs interface for privileged operations ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen: add variable hypercall callerJeremy Fitzhardinge2010-10-211-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | Allow non-constant hypercall to be called, for privcmd. [ Impact: make arbitrary hypercalls; needed for privcmd ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen: add xen_set_domain_pte()Jeremy Fitzhardinge2010-10-212-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add xen_set_domain_pte() to allow setting a pte mapping a page from another domain. The common case is to map from DOMID_IO, the pseudo domain which owns all IO pages, but will also be used in the privcmd interface to map other domain pages. [ Impact: new Xen-internal API for cross-domain mappings ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| * | xen: add /proc/xen/xsd_{kva,port} to xenfsIan Campbell2010-10-214-2/+125
| |/ | | | | | | | | | | | | | | | | | | These are used by the userspace xenstore daemon, which runs in dom0. Xenstored is what's behind the xenfs "xenbus" filesystem. [ Impact: provide mapping and port to usermode for xenstore ] Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2010-10-27118-689/+851
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...
| * | split invalidate_inodes()Al Viro2010-10-264-6/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | Pull removal of fsnotify marks into generic_shutdown_super(). Split umount-time work into a new function - evict_inodes(). Make sure that invalidate_inodes() will be able to cope with I_FREEING once we change locking in iput(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs: skip I_FREEING inodes in writeback_sb_inodesChristoph Hellwig2010-10-261-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Skip I_FREEING inodes just like I_WILL_FREE and I_NEW when walking the writeback lists. Currenly this can't happen, but once we move from inode_lock to more fine grained locking we can have an inode that's still on the writeback lists but has I_FREEING set, and we absolutely need to skip it here, just like we do for all other inode list walks. Based on a patch from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs: fold invalidate_list into invalidate_inodesChristoph Hellwig2010-10-261-27/+16
| | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs: do not drop inode_lock in dispose_listChristoph Hellwig2010-10-261-18/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Despite the comment above it we can not safely drop the lock here. invalidate_list is called from many other places that just umount. Also switch to proper list macros now that we never drop the lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs: inode split IO and LRU listsNick Piggin2010-10-265-41/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of the same inode list structure (inode->i_list) for two different list constructs with different lifecycles and purposes makes it impossible to separate the locking of the different operations. Therefore, to enable the separation of the locking of the writeback and reclaim lists, split the inode->i_list into two separate lists dedicated to their specific tracking functions. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs: switch bdev inode bdi's correctlyDave Chinner2010-10-261-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdev inodes can remain dirty even after their last close. Hence the BDI associated with the bdev->inode gets modified duringthe last close to point to the default BDI. However, the bdev inode still needs to be moved to the dirty lists of the new BDI, otherwise it will corrupt the writeback list is was left on. Add a new function bdev_inode_switch_bdi() to move all the bdi state from the old bdi to the new one safely. This is only a temporary measure until the bdev inode<->bdi lifecycle problems are sorted out. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs: fix buffer invalidation in invalidate_listChristoph Hellwig2010-10-261-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must not call invalidate_inode_buffers in invalidate_list unless the inode can be reclaimed. If we remove the buffer association of a busy inode fsync won't find the buffers anymore. As invalidate_inode_buffers is called from various others sources than umount this actually does matter in practice. While at it change the loop to a more natural form and remove the WARN_ON for I_NEW, wich we already tested a few lines above. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>