summaryrefslogtreecommitdiffstats
path: root/fs/ceph/ioctl.c (unfollow)
Commit message (Collapse)AuthorFilesLines
2011-11-17ocfs2: Commit transactions in error cases -v2Wengang Wang3-6/+9
There are three cases found that in error cases, journal transactions are not committed nor aborted. We should take care of these case by committing the transactions. Otherwise, there would left a journal handle which will lead to , in same process context, the comming ocfs2_start_trans() gets wrong credits. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17ocfs2: make direntry invalid when deleting itWengang Wang1-2/+1
When we deleting a direntry from a directory, if it's the first in a block we invalid it by setting inode to 0; otherwise, we merge the deleted one to the prior and contiguous direntry. And we don't truncate directories. There is a problem for the later case since inode is not set to 0. This problem happens when the caller passes a file position as parameter to ocfs2_dir_foreach_blk(). If the position happens to point to a stale(not the first, deleted in betweens of ocfs2_dir_foreach_blk()s) direntry, we are not able to recognize its staleness. So that we treat it as a live one wrongly. The fix is to set inode to 0 in both cases indicating the direntry is stale. This won't introduce additional IOs. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_freeJulia Lawall1-1/+1
Memory allocated using kmem_cache_zalloc should be freed using kmem_cache_free, not kfree. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,e,e1,e2; @@ x = kmem_cache_zalloc(e1,e2) ... when != x = e ?-kfree(x) +kmem_cache_free(e1,x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-07-28ocfs2: Avoid livelock in ocfs2_readpage()Jan Kara1-0/+8
When someone writes to an inode, readers accessing the same inode via ocfs2_readpage() just busyloop trying to get ip_alloc_sem because do_generic_file_read() looks up the page again and retries ->readpage() when previous attempt failed with AOP_TRUNCATED_PAGE. When there are enough readers, they can occupy all CPUs and in non-preempt kernel the system is deadlocked because writer holding ip_alloc_sem is never run to release the semaphore. Fix the problem by making reader block on ip_alloc_sem to break the busy loop. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-07-28ocfs2: serialize unaligned aioMark Fasheh5-2/+73
Fix a corruption that can happen when we have (two or more) outstanding aio's to an overlapping unaligned region. Ext4 (e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix similar issues. In our case what happens is that we can have an outstanding aio on a region and if a write comes in with some bytes overlapping the original aio we may decide to read that region into a page before continuing (typically because of buffered-io fallback). Since we have no ordering guarantees with the aio, we can read stale or bad data into the page and then write it back out. If the i/o is page and block aligned, then we avoid this issue as there won't be any need to read data from disk. I took the same approach as Eric in the ext4 patch and introduced some serialization of unaligned async direct i/o. I don't expect this to have an effect on the most common cases of AIO. Unaligned aio will be slower though, but that's far more acceptable than data corruption. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-07-25ocfs2: Implement llseek()Sunil Mushran3-2/+151
ocfs2 implements its own llseek() to provide the SEEK_HOLE/SEEK_DATA functionality. SEEK_HOLE sets the file pointer to the start of either a hole or an unwritten (preallocated) extent, that is greater than or equal to the supplied offset. SEEK_DATA sets the file pointer to the start of an allocated extent (not unwritten) that is greater than or equal to the supplied offset. If the supplied offset is on a desired region, then the file pointer is set to it. Offsets greater than or equal to the file size return -ENXIO. Unwritten (preallocated) extents are considered holes because the file system treats reads to such regions in the same way as it does to holes. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2: Fix ocfs2_page_mkwrite()Wengang Wang2-37/+67
This patch address two shortcomings in ocfs2_page_mkwrite(): 1. Makes the function return better VM_FAULT_* errors. 2. It handles a error that is triggered when a page is dropped from the mapping due to memory pressure. This patch locks the page to prevent that. [Patch was cleaned up by Sunil Mushran.] Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2: Add comment about orphan scanningSunil Mushran1-0/+14
Add a comment that explains the reason as to why orphan scan scans all the slots. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2: Clean up messages in the fsSunil Mushran4-14/+22
Convert useful messages from ML_NOTICE to KERN_NOTICE to improve readability. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/cluster: Cluster up now includes network connections tooSunil Mushran2-13/+61
The cluster up check only checks to see if the node is heartbeating or not. If yes it continues assuming that the node is connected to all the nodes. But if that is not the case, the cluster join aborts with a stack of errors that are not easy to comprehend. This patch adds the network connect check upfront and prints the nodes that the node is not yet connected to, before aborting. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/cluster: Add new function o2net_fill_node_map()Sunil Mushran3-33/+90
Patch adds function o2net_fill_node_map() to return the bitmap of nodes that it is connected to. This bitmap is also accessible by the user via the debugfs file, /sys/kernel/debug/o2net/connected_nodes. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/cluster: Fix output in file elapsed_time_in_msSunil Mushran1-3/+6
The o2hb debugfs file, elapsed_time_in_ms, should return values only after the timer is armed atleast once. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: dlmlock_remote() needs to account for remasterySunil Mushran1-10/+8
In dlmlock_remote(), we wait for the resource to stop being active before setting the inprogress flag. Active includes recovery, migration, etc. The problem here is that if the resource was being recovered or migrated, the new owner could very well be that node itself (and thus not a remote node). This problem was observed in Oracle bug#12583620. The error messages observed were as follows: dlm_send_remote_lock_request:337 ERROR: Error -40 (ELOOP) when sending message 503 (key 0xd6d8c7) to node 2 dlmlock_remote:271 ERROR: dlm status = DLM_BADARGS dlmlock:751 ERROR: dlm status = DLM_BADARGS Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: Take inflight reference count for remotely mastered resources tooSunil Mushran3-39/+32
The inflight reference count, in the lock resource, is taken to pin the resource in memory. We take it when a new resource is created and release it after a lock is attached to it. We do this to prevent the resource from getting purged prematurely. Earlier this reference count was being taken for locally mastered resources only. This patch extends the same functionality for remotely mastered ones. We are doing this because the same premature purging could occur for remotely mastered resources if the remote node were to die before completion of the create lock. Fix for Oracle bug#12405575. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()Sunil Mushran2-26/+24
dlm_wait_for_node_death() and dlm_wait_for_node_recovery() needed a facelift. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: Trace insert/remove of resource to/from hashSunil Mushran3-11/+15
Add mlog to trace adding and removing the resource from/to the hash table. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: Clean up refmap helpersSunil Mushran3-79/+66
Patch cleans up helpers that set/clear refmap bits and grab/drop inflight lock ref counts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: Cleanup up dlm_finish_local_lockres_recovery()Sunil Mushran1-32/+25
dlm_finish_local_lockres_recovery() needed a facelift. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2: Clean up messages in stack_o2cb.cSunil Mushran1-2/+2
o2cb messages needed a facelift. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: Clean up messages in o2dlmSunil Mushran5-70/+71
o2dlm messages needed a facelift. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/cluster: Clean up messages in o2netSunil Mushran1-66/+53
o2net messages needed a facelift. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/cluster: Abort heartbeat start on hard-ro devicesSunil Mushran1-69/+116
Currently if the heartbeat device is hard-ro, the o2hb thread keeps chugging along and dumping errors along the way. The user needs to manually stop the heartbeat. The patch addresses this shortcoming by adding a limit to the number of times the hb thread will iterate in an unsteady state. If the hb thread does not ready steady state in that many interation, the start is aborted. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24Documentation: Update augmented rbtree documentationSasha Levin1-9/+14
Current documentation referred to the old method of handling augmented trees. Update documentation to correspond with the changes done in commit b945d6b2554d ("rbtree: Undo augmented trees performance damage and regression"). Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-24XZ: Fix missing <linux/kernel.h> includeLasse Collin1-1/+1
<linux/kernel.h> is needed for min_t. The old version happened to work on x86 because <asm/unaligned.h> indirectly includes <linux/kernel.h>, but it didn't work on ARM. <linux/kernel.h> includes <asm/byteorder.h> so it's not necessary to include it explicitly anymore. Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Cc: stable <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-24modpost: Fix modpost's license checking V3Alessio Igor Bogani1-1/+28
The commit f02e8a6 sorts symbols placing each of them in its own elf section. The sorting and merging into the canonical sections are done by the linker. Unfortunately modpost to generate Module.symvers file parses vmlinux (already linked) and all modules object files (which aren't linked yet). These aren't sanitized by the linker yet. That breaks modpost that can't detect license properly for modules. This patch makes modpost aware of the new exported symbols structure. Thanks to Arnaud Lacombe <lacombar@gmail.com> and Anders Kaseorg <andersk@ksplice.com> for providing useful suggestions about code. This work was supported by a hardware donation from the CE Linux Forum. Reported-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Alessio Igor Bogani <abogani@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-24module: add /sys/module/<name>/uevent filesKay Sievers3-0/+22
Userspace wants to manage module parameters with udev rules. This currently only works for loaded modules, but not for built-in ones. To allow access to the built-in modules we need to re-trigger all module load events that happened before any userspace was running. We already do the same thing for all devices, subsystems(buses) and drivers. This adds the currently missing /sys/module/<name>/uevent files to all module entries. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (split & trivial fix)
2011-07-24module: change attr callbacks to take struct module_kobjectKay Sievers3-23/+24
This simplifies the next patch, where we have an attribute on a builtin module (ie. module == NULL). Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (split into 2)
2011-07-24modules: make arch's use default loader hooksJonas Bonn26-777/+12
This patch removes all the module loader hook implementations in the architecture specific code where the functionality is the same as that now provided by the recently added default hooks. Signed-off-by: Jonas Bonn <jonas@southpole.se> Acked-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-24modules: add default loader hook implementationsJonas Bonn2-1/+55
The module loader code allows architectures to hook into the code by providing a small number of entry points that each arch must implement. This patch provides __weakly linked generic implementations of these entry points for architectures that don't need to do anything special. Signed-off-by: Jonas Bonn <jonas@southpole.se> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-24param: fix return value handling in param_set_*Satoru Moriya1-2/+2
In STANDARD_PARAM_DEF, param_set_* handles the case in which strtolfn returns -EINVAL but it may return -ERANGE. If it returns -ERANGE, param_set_* may set uninitialized value to the paramerter. We should handle both cases. The one of the cases in which strtolfn() returns -ERANGE is following: *Type of module parameter is long *Set the parameter more than LONG_MAX Signed-off-by: Satoru Moriya <satoru.moriya@hds.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-24KVM: IOMMU: Disable device assignment without interrupt remappingAlex Williamson1-0/+18
IOMMU interrupt remapping support provides a further layer of isolation for device assignment by preventing arbitrary interrupt block DMA writes by a malicious guest from reaching the host. By default, we should require that the platform provides interrupt remapping support, with an opt-in mechanism for existing behavior. Both AMD IOMMU and Intel VT-d2 hardware support interrupt remapping, however we currently only have software support on the Intel side. Users wishing to re-enable device assignment when interrupt remapping is not supported on the platform can use the "allow_unsafe_assigned_interrupts=1" module option. [avi: break long lines] Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: trace mmio page faultXiao Guangrong4-1/+80
Add tracepoints to trace mmio page fault Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: mmio page fault supportXiao Guangrong6-14/+255
The idea is from Avi: | We could cache the result of a miss in an spte by using a reserved bit, and | checking the page fault error code (or seeing if we get an ept violation or | ept misconfiguration), so if we get repeated mmio on a page, we don't need to | search the slot list/tree. | (https://lkml.org/lkml/2011/2/22/221) When the page fault is caused by mmio, we cache the info in the shadow page table, and also set the reserved bits in the shadow page table, so if the mmio is caused again, we can quickly identify it and emulate it directly Searching mmio gfn in memslots is heavy since we need to walk all memeslots, it can be reduced by this feature, and also avoid walking guest page table for soft mmu. [jan: fix operator precedence issue] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: reorganize struct kvm_shadow_walk_iteratorXiao Guangrong1-1/+1
Reorganize it for good using the cache Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: lockless walking shadow page tableXiao Guangrong2-8/+132
Use rcu to protect shadow pages table to be freed, so we can safely walk it, it should run fastly and is needed by mmio page fault Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: do not need atomicly to set/clear spteXiao Guangrong1-15/+71
Now, the spte is just from nonprsent to present or present to nonprsent, so we can use some trick to set/clear spte non-atomicly as linux kernel does Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: introduce the rules to modify shadow page tableXiao Guangrong1-34/+69
Introduce some interfaces to modify spte as linux kernel does: - mmu_spte_clear_track_bits, it set the spte from present to nonpresent, and track the stat bits(accessed/dirty) of spte - mmu_spte_clear_no_track, the same as mmu_spte_clear_track_bits except tracking the stat bits - mmu_spte_set, set spte from nonpresent to present - mmu_spte_update, only update the stat bits Now, it does not allowed to set spte from present to present, later, we can drop the atomicly opration for X86_32 host, and it is the preparing work to get spte on X86_32 host out of the mmu lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: abstract some functions to handle fault pfnXiao Guangrong2-18/+41
Introduce handle_abnormal_pfn to handle fault pfn on page fault path, introduce mmu_invalid_pfn to handle fault pfn on prefetch path It is the preparing work for mmio page fault support Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: filter out the mmio pfn from the fault pfnXiao Guangrong3-4/+21
If the page fault is caused by mmio, the gfn can not be found in memslots, and 'bad_pfn' is returned on gfn_to_hva path, so we can use 'bad_pfn' to identify the mmio page fault. And, to clarify the meaning of mmio pfn, we return fault page instead of bad page when the gfn is not allowd to prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: remove bypass_guest_pfXiao Guangrong7-132/+33
The idea is from Avi: | Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as | it used to be, since unsync pages always use shadow_trap_nonpresent_pte, | and since we convert between the two nonpresent_ptes during sync and unsync. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: split kvm_mmu_free_pageXiao Guangrong1-3/+18
Split kvm_mmu_free_page to kvm_mmu_isolate_page and kvm_mmu_free_page One is used to remove the page from cache under mmu lock and the other is used to free page table out of mmu lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: count used shadow pages on prepareing pathXiao Guangrong1-5/+5
Move counting used shadow pages from commiting path to preparing path to reduce tlb flush on some paths Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: rename 'pt_write' to 'emulate'Xiao Guangrong2-13/+13
If 'pt_write' is true, we need to emulate the fault. And in later patch, we need to emulate the fault even though it is not a pt_write event, so rename it to better fit the meaning Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: cleanup for FNAME(fetch)Xiao Guangrong1-2/+2
gw->pte_access is the final access permission, since it is unified with gw->pt_access when we walked guest page table: FNAME(walk_addr_generic): pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true); Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: optimize to handle dirty bitXiao Guangrong2-34/+25
If dirty bit is not set, we can make the pte access read-only to avoid handing dirty bit everywhere Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: cache mmio info on page fault pathXiao Guangrong6-21/+96
If the page fault is caused by mmio, we can cache the mmio info, later, we do not need to walk guest page table and quickly know it is a mmio fault while we emulate the mmio instruction Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the codeXiao Guangrong1-11/+31
Introduce vcpu_mmio_gva_to_gpa to translate the gva to gpa, we can use it to cleanup the code between read emulation and write emulation Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: do not update slot bitmap if spte is nonpresentXiao Guangrong1-8/+7
Set slot bitmap only if the spte is present Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: fix walking shadow page tableXiao Guangrong1-4/+5
Properly check the last mapping, and do not walk to the next level if last spte is met Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM guest: KVM Steal time registrationGlauber Costa4-0/+84
This patch implements the kvm bits of the steal time infrastructure. The most important part of it, is the steal time clock. It is an continuous clock that shows the accumulated amount of steal time since vcpu creation. It is supposed to survive cpu offlining/onlining. [marcelo: fix build with CONFIG_KVM_GUEST=n] Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Avi Kivity <avi@redhat.com> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>