summaryrefslogtreecommitdiffstats
path: root/fs/ceph/dir.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* ceph: initialize root dentrySage Weil2011-11-111-1/+1
| | | | | | | | | | | Set up d_fsdata on the root dentry. This fixes a NULL pointer dereference in ceph_d_prune on umount. It also means we can eventually strip out all of the conditional checks on d_fsdata because it is now set unconditionally (prior to setting up the d_ops). Fix the ceph_d_prune debug print while we're here. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use new D_COMPLETE dentry flagSage Weil2011-11-061-9/+50
| | | | | | | | We used to use a flag on the directory inode to track whether the dcache contents for a directory were a complete cached copy. Switch to a dentry flag CEPH_D_COMPLETE that is safely updated by ->d_prune(). Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: clear parent D_COMPLETE flag when on dentry pruneSage Weil2011-11-031-0/+28
| | | | | | | | When the VFS prunes a dentry from the cache, clear the D_COMPLETE flag on the parent dentry. Do this for the live and snapshotted namespaces. Do not bother for the .snap dir contents, since we do not cache that. Signed-off-by: Sage Weil <sage@newdream.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-07-261-43/+73
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) ceph: document unlocked d_parent accesses ceph: explicitly reference rename old_dentry parent dir in request ceph: document locking for ceph_set_dentry_offset ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug ceph: protect d_parent access in ceph_d_revalidate ceph: protect access to d_parent ceph: handle racing calls to ceph_init_dentry ceph: set dir complete frag after adding capability rbd: set blk_queue request sizes to object size ceph: set up readahead size when rsize is not passed rbd: cancel watch request when releasing the device ceph: ignore lease mask ceph: fix ceph_lookup_open intent usage ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC ceph: fix bad parent_inode calc in ceph_lookup_open ceph: avoid carrying Fw cap during write into page cache libceph: don't time out osd requests that haven't been received ceph: report f_bfree based on kb_avail rather than diffing. ceph: only queue capsnap if caps are dirty ceph: fix snap writeback when racing with writes ...
| * ceph: document unlocked d_parent accessesSage Weil2011-07-261-1/+1
| | | | | | | | | | | | | | | | | | For the most part we don't care about racing with rename when directing MDS requests; either the old or new parent is fine. Document that, and do some minor cleanup. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: explicitly reference rename old_dentry parent dir in requestSage Weil2011-07-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | We carry a pin on the parent directory for the rename source and dest dentries. For the source it's r_locked_dir; we need to explicitly reference the old_dentry parent as well, since the dentry's d_parent may change between when the request was created and pinned and when it is freed. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bugSage Weil2011-07-261-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Have caller pass in a safely-obtained reference to the parent directory for calculating a dentry's hash valud. While we're here, simpify the flow through ceph_encode_fh() so that there is a single exit point and cleanup. Also fix a bug with the dentry hash calculation: calculate the hash for the dentry we were given, not its parent. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: protect d_parent access in ceph_d_revalidateSage Weil2011-07-261-15/+17
| | | | | | | | | | | | | | | | Protect d_parent with d_lock. Carry a reference. Simplify the flow so that there is a single exit point and cleanup. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: protect access to d_parentSage Weil2011-07-261-0/+15
| | | | | | | | | | | | | | | | | | | | d_parent is protected by d_lock: use it when looking up a dentry's parent directory inode. Also take a reference and drop it in the caller to avoid a use-after-free. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: handle racing calls to ceph_init_dentrySage Weil2011-07-261-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | The ->lookup() and prepopulate_readdir() callers are working with unhashed dentries, so we don't have to worry. The export.c callers, though, need to initialize something they got back from d_obtain_alias() and are potentially racing with other callers. Make sure we don't return unless the dentry is properly initialized (by us or someone else). Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix ceph_lookup_open intent usageSage Weil2011-07-261-11/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We weren't properly calling lookup_instantiate_filp when setting up the lookup intent, which could lead to file leakage on errors. So: - use separate helper for the hidden snapdir translation, immediately following the mds request - use ceph_finish_lookup for the final dentry/return value dance in the exit path - lookup_instantiate_filp on success Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: use flag bit for at_end readdir flagSage Weil2011-07-261-5/+5
| | | | | | | | | | | | | | This saves us a word of memory per file. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* | fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlersJosef Bacik2011-07-211-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseekJosef Bacik2011-07-211-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | This converts everybody to handle SEEK_HOLE/SEEK_DATA properly. In some cases we just return -EINVAL, in others we do the normal generic thing, and in others we're simply making sure that the properly due-dilligence is done. For example in NFS/CIFS we need to make sure the file size is update properly for the SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself that is all we have to do. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | don't open-code parent_ino() in assorted ->readdir()Al Viro2011-07-211-1/+1
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ceph: LOOKUP_OPEN is set only when it's the last componentAl Viro2011-07-201-1/+0
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ceph: use ihold when we already have an inode refSage Weil2011-06-081-4/+7
| | | | | | | | We should use ihold whenever we already have a stable inode ref, even when we aren't holding i_lock. This avoids adding new and unnecessary locking dependencies. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix broken comparison in readdir loopSage Weil2011-05-191-1/+1
| | | | | | | Both off and fi->offset are unsigned, so the difference is always >= 0. Compare them directly instead of the sign of the difference. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use snprintf for dirstat contentSage Weil2011-05-191-2/+3
| | | | | | | We allocate a buffer for rstats if the dirstat option is enabled. Use snprintf. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: rename dentry_release -> d_release, fix commentSage Weil2011-03-211-7/+6
| | | | | | Just for consistency's sake. Fix obsolete comment too. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add ino32 mount optionYehuda Sadeh2011-03-211-4/+7
| | | | | | | The ino32 mount option forces the ceph fs to report 32 bit ino values. This is useful for 64 bit kernels with 32 bit userspace. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: fix d_revalidate oopsen on NFS exportsAl Viro2011-03-101-1/+1
| | | | | | can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ceph: no .snap inside of snapped namespaceSage Weil2011-03-041-0/+1
| | | | | | | | | | | Otherwise you can do things like # mkdir .snap/foo # cd .snap/foo/.snap # ls <badness> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do not clear I_COMPLETE from d_releaseSage Weil2011-03-031-21/+1
| | | | | | | | | First, this was racy anyway: d_release isn't called until well after the dentry is unhashed. Second, this runs afoul of the recent dcache change that clears d_parent prior to calling d_release (949854d0), causing a NULL pointer dereference. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do not set I_COMPLETESage Weil2011-03-031-1/+1
| | | | | | | Do not set the I_COMPLETE flag on directories until we resolve races with dcache pruning. Signed-off-by: Sage Weil <sage@newdream.net>
* Revert "ceph: keep reference to parent inode on ceph_dentry"Sage Weil2011-03-031-4/+1
| | | | | | | | | This reverts commit 97d79b403ef03f729883246208ef5d8a2ebc4d68. This fails to account for d_parent changes due to rename or disconnected dentries due to submounts or NFS reexports. Signed-off-by: Sage Weil <sage@newdream.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-02-221-1/+4
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: keep reference to parent inode on ceph_dentry ceph: queue cap_snaps once per realm libceph: fix socket write error handling libceph: fix socket read error handling
| * ceph: keep reference to parent inode on ceph_dentryYehuda Sadeh2011-02-201-1/+4
| | | | | | | | | | | | | | | | | | | | | | When creating a new dentry we now hold a reference to the parent inode in the ceph_dentry. This is required due to the new RCU changes from 949854d0, which set dentry->d_parent to NULL in d_kill before calling the ->release() callback. If/when that behavior is changed, we can revert this hack. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-01-131-0/+20
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup when trying to mount inexistent image net/ceph: make ceph_msgr_wq non-reentrant ceph: fsc->*_wq's aren't used in memory reclaim path ceph: Always free allocated memory in osdmap_decode() ceph: Makefile: Remove unnessary code ceph: associate requests with opening sessions ceph: drop redundant r_mds field ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS ceph: add dir_layout to inode
| * ceph: add dir_layout to inodeSage Weil2011-01-131-0/+20
| | | | | | | | | | | | | | | | | | Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely week and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>
* | fs: rcu-walk aware d_revalidate methodNick Piggin2011-01-071-1/+6
| | | | | | | | | | | | | | | | Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: dcache reduce branches in lookup pathNick Piggin2011-01-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: dcache remove dcache_lockNick Piggin2011-01-071-5/+1
| | | | | | | | | | | | dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: dcache scale subdirsNick Piggin2011-01-071-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: dcache scale d_unhashedNick Piggin2011-01-071-2/+3
| | | | | | | | | | | | | | Protect d_unhashed(dentry) condition with d_lock. This means keeping DCACHE_UNHASHED bit in synch with hash manipulations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: dcache scale dentry refcountNick Piggin2011-01-071-1/+3
|/ | | | | | | | Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* ceph: fix null pointer dereference in ceph_init_dentry for nfs reexportSage Weil2010-12-171-1/+2
| | | | | | | The fh_to_dentry etc. methods use ceph_init_dentry(), which assumes that d_parent is defined. It isn't for those callers, so check! Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: avoid possible null deref in readdir after dir llseekSage Weil2010-12-011-2/+2
| | | | | | | | | last may be NULL, but we dereference it in the else branch without checking. Normally it doesn't trigger because last == NULL when fpos == 2, but it could happen on a newly opened dir if the user seeks forward. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix readdir EOVERFLOW on 32-bit archsSage Weil2010-11-181-3/+7
| | | | | | | | | | | One of the readdir filldir_t callers was passing the raw ceph 64-bit ino instead of the hashed 32-bit one, producing an EOVERFLOW in the filler callback. Fix this by calling the ceph_vino_to_ino() helper to do the conversion. Reported-by: Jan Smets <jan.smets@alcatel-lucent.com> Tested-by: Jan Smets <jan.smets@alcatel-lucent.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix frag offset for non-leftmost fragsSage Weil2010-11-121-1/+4
| | | | | | | | We start at offset 2 for the leftmost frag, and 0 for subsequent frags. When we reach the end (rightmost), we go back to 2. This fixes readdir on fragmented (large) directories. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix dangling pointerSage Weil2010-11-121-0/+1
| | | | | | | Clear fi->last_name when it's freed. The only caller is rewinddir() (or equivalent lseek). Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do not carry i_lock for readdir from dcacheSage Weil2010-10-211-25/+16
| | | | | | | | | | | | We were taking dcache_lock inside of i_lock, which introduces a dependency not found elsewhere in the kernel, complicationg the vfs locking scalability work. Since we don't actually need it here anyway, remove it. We only need i_lock to test for the I_COMPLETE flag, so be careful to do so without dcache_lock held. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do not hide .snap in root directorySage Weil2010-10-211-1/+0
| | | | | | | Snaps in the root directory are now supported by the MDS, and harmless on older versions. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: factor out libceph from Ceph file systemYehuda Sadeh2010-10-211-27/+28
| | | | | | | | | | | | | | | | | | | | | | This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix null pointer deref on anon root dentry releaseSage Weil2010-09-111-3/+7
| | | | | | | | | | | | | | | | | | When we release a root dentry, particularly after a splice, the parent (actually our) inode was evaluating to NULL and was getting dereferenced by ceph_snap(). This is reproduced by something as simple as mount -t ceph monhost:/a/b mnt mount -t ceph monhost:/a mnt2 ls mnt2 A splice_dentry() would kill the old 'b' inode's root dentry, and we'd crash while releasing it. Fix by checking for both the ROOT and NULL cases explicitly. We only need to invalidate the parent dir when we have a correct parent to invalidate. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: initialize fields on new dentry_infosSage Weil2010-08-251-1/+1
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: constify dentry_operationsSage Weil2010-08-031-4/+4
| | | | | | This makes checkpatch happy. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: code cleanupYehuda Sadeh2010-08-021-0/+2
| | | | | | | Mainly fixing minor issues reported by sparse. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix d_release dop for snapdir, snapped dentriesSage Weil2010-07-231-3/+9
| | | | | | | | | | We need to set the d_release dop for snapdir and snapped dentries so that the ceph_dentry_info struct gets released. We also use the dcache to cache readdir results when possible, which only works if we know when dentries are dropped from the cache. Since we don't use the dcache for readdir in the hidden snapdir, avoid that case in ceph_dentry_release. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: avoid dcache readdir for snapdirSage Weil2010-07-221-0/+1
| | | | | | | | We should always go to the MDS for readdir on the hidden snapdir. The set of snapshots can change at any time; the client can't trust its cache for that. Signed-off-by: Sage Weil <sage@newdream.net>