Commit message (Author, Age, Files, Lines)
* lookup_open(): lock the parent shared unless O_CREAT is given (Al Viro, 2016-05-03, 2 files, -3/+12)
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lookup_open(): put the dentry fed to ->lookup() or ->atomic_open() into in-lookup hash (Al Viro, 2016-05-03, 1 file, -11/+26)
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lookup_open(): expand the call of real_lookup() (Al Viro, 2016-05-03, 1 file, -3/+10)
|     ... and lose the duplicate IS_DEADDIR() - we'd already checked that.
|
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* atomic_open(): reorder and clean up a bit (Al Viro, 2016-05-03, 1 file, -34/+27)
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lookup_open(): lift the "fallback to !O_CREAT" logics from atomic_open() (Al Viro, 2016-05-03, 1 file, -89/+55)
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* atomic_open(): be paranoid about may_open() return value (Al Viro, 2016-05-03, 1 file, -0/+2)
|     It should never return positives; however, with Linux S&M crowd
|     involved, no bogosity is impossible. Results would be unpleasant...
|
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* atomic_open(): delay open_to_namei_flags() until the method call (Al Viro, 2016-05-03, 1 file, -3/+4)
|     nobody else needs that transformation.
|
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* do_last(): take fput() on error after opening to out: (Al Viro, 2016-05-03, 1 file, -17/+5)
|     make it conditional on *opened & FILE_OPENED; in addition to getting
|     rid of exit_fput: thing, it simplifies atomic_open() cleanup on
|     may_open() failure.
|
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* do_last(): get rid of duplicate ELOOP check (Al Viro, 2016-05-03, 1 file, -4/+0)
|     may_open() will catch it
|
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* atomic_open(): massage the create_error logics a bit (Al Viro, 2016-05-03, 1 file, -23/+20)
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* atomic_open(): consolidate "overridden ENOENT" in open-yourself cases (Al Viro, 2016-05-03, 1 file, -8/+1)
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* atomic_open(): don't bother with EEXIST check - it's done in do_last() (Al Viro, 2016-05-03, 1 file, -5/+0)
|     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' into work.lookups (Al Viro, 2016-05-03, 5 files, -55/+35)
|\
| * atomic_open(): fix the handling of create_error (Al Viro, 2016-04-30, 1 file, -16/+4)
| |     * if we have a hashed negative dentry and either CREAT|EXCL on
| |       r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
| |       with failing may_o_create(), we should fail with EROFS or the
| |       error may_o_create() has returned, but not ENOENT. Which is what
| |       the current code ends up returning.
| |     * if we have CREAT|TRUNC hitting a regular file on a read-only
| |       filesystem, we can't fail with EROFS here. At the very least,
| |       not until we'd done follow_managed() - we might have a writable
| |       file (or a device, for that matter) bound on top of that one.
| |       Moreover, the code downstream will see that O_TRUNC and attempt
| |       to grab the write access (*after* following possible mount), so
| |       if we really should fail with EROFS, it will happen. No need
| |       to do that inside atomic_open().
| |
| |     The real logics is much simpler than what the current code is
| |     trying to do - if we decided to go for simple lookup, ended
| |     up with a negative dentry *and* had create_error set, fail with
| |     create_error. No matter whether we'd got that negative dentry
| |     from lookup_real() or had found it in dcache.
| |
| |     Cc: stable@vger.kernel.org # v3.6+
| |     Acked-by: Miklos Szeredi <mszeredi@redhat.com>
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
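The "much simpler" rule in the last paragraph of that message can be written down as a single predicate. What follows is only a user-space sketch of the decision, not the patch itself: `simple_lookup_result()` and its two parameters are hypothetical names, and the real code works on dentries inside atomic_open().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel function).
 * dentry_is_negative: the simple lookup ended with a negative dentry,
 *                     whether it came from the lookup or from dcache.
 * create_error:       0, or the negative errno recorded earlier
 *                     (e.g. -EROFS, or whatever may_o_create() returned).
 * Returns 0 when the caller should carry on, or the error to fail with.
 */
static int simple_lookup_result(bool dentry_is_negative, int create_error)
{
    assert(create_error <= 0);

    /* Negative dentry *and* a recorded creation error: fail with it. */
    if (dentry_is_negative && create_error)
        return create_error;

    /* Positive dentry, or nothing recorded: leave it to later code. */
    return 0;
}
```

With a positive dentry the recorded error is deliberately ignored here, which matches the second bullet: any EROFS for CREAT|TRUNC on an existing file is left to the code that runs after mounts have been followed.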
| * fix the copy vs. map logics in blk_rq_map_user_iov() (Al Viro, 2016-04-09, 3 files, -39/+28)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * do_splice_to(): cap the size before passing to ->splice_read() (Al Viro, 2016-04-04, 1 file, -0/+3)
| |     pipe capacity won't exceed 2G anyway.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | lookup_open(): expand the call of vfs_create() (Al Viro, 2016-05-03, 1 file, -9/+12)
| |     Lift IS_DEADDIR handling up into the part common with atomic_open(),
| |     remove it from the latter. Collapse permission checks into the
| |     call of may_o_create(), getting it closer to atomic_open() case.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | path_openat(): take O_PATH handling out of do_last() (Al Viro, 2016-05-03, 1 file, -7/+24)
| |     do_last() and lookup_open() simpler that way and so does O_PATH
| |     itself. As it bloody well should: we find what the pathname
| |     resolves to, same way as in stat() et.al. and associate it with
| |     FMODE_PATH struct file.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | simple local filesystems: switch to ->iterate_shared() (Al Viro, 2016-05-03, 6 files, -6/+6)
| |     no changes needed (XFS isn't simple, but it has the same parallelism
| |     in the interesting parts exercised from CXFS).
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | dcache_{readdir,dir_lseek}() users: switch to ->iterate_shared (Al Viro, 2016-05-03, 3 files, -7/+4)
| |     no need to lock directory in dcache_dir_lseek(), while we are
| |     at it - per-struct file exclusion is enough.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | cifs: switch to ->iterate_shared() (Al Viro, 2016-05-03, 2 files, -27/+30)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fuse: switch to ->iterate_shared() (Al Viro, 2016-05-03, 1 file, -49/+45)
| |     Switch dcache pre-seeding on readdir to d_alloc_parallel(); nothing
| |     else is needed.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | switch all procfs directories ->iterate_shared() (Al Viro, 2016-05-03, 7 files, -20/+21)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | proc_sys_fill_cache(): switch to d_alloc_parallel() (Al Viro, 2016-05-03, 1 file, -7/+8)
| |     make it usable with directory locked shared
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | proc_fill_cache(): switch to d_alloc_parallel() (Al Viro, 2016-05-03, 1 file, -5/+10)
| |     ... making it usable with directory locked shared
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | introduce a parallel variant of ->iterate() (Al Viro, 2016-05-03, 5 files, -11/+48)
| |     New method: ->iterate_shared(). Same arguments as in ->iterate(),
| |     called with the directory locked only shared. Once all filesystems
| |     switch, the old one will be gone.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | give readdir(2)/getdents(2)/etc. uniform exclusion with lseek() (Al Viro, 2016-05-03, 7 files, -27/+33)
| |     same as read() on regular files has, and for the same reason.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | parallel lookups: actual switch to rwsem (Al Viro, 2016-05-03, 11 files, -32/+73)
| |     ta-da!
| |
| |     The main issue is the lack of down_write_killable(), so the places
| |     like readdir.c switched to plain inode_lock(); once killable
| |     variants of rwsem primitives appear, that'll be dealt with.
| |
| |     lockdep side also might need more work
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | parallel lookups machinery, part 4 (and last) (Al Viro, 2016-05-03, 3 files, -23/+82)
| |     If we *do* run into an in-lookup match, we need to wait for it
| |     to cease being in-lookup. Fortunately, we do have unused space in
| |     in-lookup dentries - d_lru is never looked at until it stops being
| |     in-lookup.
| |
| |     So we can stash a pointer to wait_queue_head from stack frame of
| |     the caller of ->lookup(). Some precautions are needed while
| |     waiting, but it's not that hard - we do hold a reference to dentry
| |     we are waiting for, so it can't go away. If it's found to be
| |     in-lookup the wait_queue_head is still alive and will remain so
| |     at least while ->d_lock is held. Moreover, the condition we
| |     are waiting for becomes true at the same point where everything
| |     on that wq gets woken up, so we can just add ourselves to the
| |     queue once.
| |
| |     d_alloc_parallel() gets a pointer to wait_queue_head_t from its
| |     caller; lookup_slow() adjusted, d_add_ci() taught to use
| |     d_alloc_parallel() if the dentry passed to it happens to be
| |     in-lookup one (i.e. if it's been called from the parallel lookup).
| |
| |     That's pretty much it - all that remains is to switch ->i_mutex
| |     to rwsem and have lookup_slow() take it shared.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | parallel lookups machinery, part 3 (Al Viro, 2016-05-03, 3 files, -25/+125)
| |     We will need to be able to check if there is an in-lookup
| |     dentry with matching parent/name. Right now it's impossible,
| |     but as soon as start locking directories shared such beasts
| |     will appear.
| |
| |     Add a secondary hash for locating those. Hash chains go through
| |     the same space where d_alias will be once it's not in-lookup
| |     anymore. Search is done under the same bitlock we use for
| |     modifications - with the primary hash we can rely on d_rehash()
| |     into the wrong chain being the worst that could happen, but here
| |     the pointers are buggered once it's removed from the chain.
| |     On the other hand, the chains are not going to be long and
| |     normally we'll end up adding to the chain anyway. That allows us
| |     to avoid bothering with ->d_lock when doing the comparisons -
| |     everything is stable until removed from chain.
| |
| |     New helper: d_alloc_parallel(). Right now it allocates, verifies
| |     that no hashed and in-lookup matches exist and adds to in-lookup
| |     hash.
| |
| |     Returns ERR_PTR() for error, hashed match (in the unlikely case
| |     it's been found) or new dentry. In-lookup matches trigger BUG()
| |     for now; that will change in the next commit when we introduce
| |     waiting for ongoing lookup to finish. Note that in-lookup matches
| |     won't be possible until we actually go for shared locking.
| |
| |     lookup_slow() switched to use of d_alloc_parallel().
| |
| |     Again, these commits are separated only for making it easier to
| |     review. All this machinery will start doing something useful only
| |     when we go for shared locking; it's just that the combination is
| |     too large for my taste.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | parallel lookups machinery, part 2 (Al Viro, 2016-05-03, 5 files, -3/+44)
| |     We'll need to verify that there's neither a hashed nor in-lookup
| |     dentry with desired parent/name before adding to in-lookup set.
| |
| |     One possible solution would be to hold the parent's ->d_lock
| |     through both checks, but while the in-lookup set is relatively
| |     small at any time, dcache is not. And holding the parent's
| |     ->d_lock through something like __d_lookup_rcu() would suck too
| |     badly.
| |
| |     So we leave the parent's ->d_lock alone, which means that we watch
| |     out for the following scenario:
| |       * we verify that there's no hashed match
| |       * existing in-lookup match gets hashed by another process
| |       * we verify that there's no in-lookup matches and decide
| |         that everything's fine.
| |
| |     Solution: per-directory kinda-sorta seqlock, bumped around the
| |     times we hash something that used to be in-lookup or move (and
| |     hash) something in place of in-lookup. Then the above would turn
| |     into
| |       * read the counter
| |       * do dcache lookup
| |       * if no matches found, check for in-lookup matches
| |       * if there had been none of those either, check if the
| |         counter has changed; repeat if it has.
| |
| |     The "kinda-sorta" part is due to the fact that we don't have much
| |     spare space in inode. There is a spare word (shared with
| |     i_bdev/i_cdev/i_pipe), so the counter part is not a problem, but
| |     spinlock is a different story.
| |
| |     We could use the parent's ->d_lock, and it would be less painful
| |     in terms of contention, for __d_add() it would be rather
| |     inconvenient to grab; we could do that (using lock_parent()),
| |     but...
| |
| |     Fortunately, we can get serialization on the counter itself, and
| |     it might be a good idea in general; we can use cmpxchg() in a
| |     loop to get from even to odd and smp_store_release() from odd to
| |     even.
| |
| |     This commit adds the counter and updating logics; the readers
| |     will be added in the next commit.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
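The even/odd counter scheme described in that message can be tried out in user space. The following is a sketch using C11 atomics in place of the kernel's cmpxchg() and smp_store_release(); `struct dir_seq` and the four `dir_seq_*()` functions are hypothetical names chosen for illustration, not the identifiers used by the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the spare per-directory word used as a counter. */
struct dir_seq {
    atomic_uint seq;    /* even: idle, odd: an update is in progress */
};

/* Writer entry: loop on compare-and-exchange until even -> odd succeeds.
 * Returns the even value that was replaced. */
static unsigned dir_seq_begin(struct dir_seq *d)
{
    for (;;) {
        unsigned n = atomic_load_explicit(&d->seq, memory_order_relaxed);

        if (!(n & 1) &&
            atomic_compare_exchange_weak_explicit(&d->seq, &n, n + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return n;
    }
}

/* Writer exit: release-store the next even value (odd -> even). */
static void dir_seq_end(struct dir_seq *d, unsigned n)
{
    assert(!(n & 1));   /* n must be what dir_seq_begin() returned */
    atomic_store_explicit(&d->seq, n + 2, memory_order_release);
}

/* Reader, step 1: sample the counter, rounded down to even. A sample
 * taken in the middle of an update therefore never compares equal later. */
static unsigned dir_seq_read(struct dir_seq *d)
{
    return atomic_load_explicit(&d->seq, memory_order_acquire) & ~1u;
}

/* Reader, last step: true if both lookups have to be repeated. */
static bool dir_seq_changed(struct dir_seq *d, unsigned seen)
{
    return atomic_load_explicit(&d->seq, memory_order_acquire) != seen;
}
```

A reader would call dir_seq_read(), do the hashed lookup and then the in-lookup check, and start over whenever dir_seq_changed() reports true. Only the counter is modelled here; the two lookups themselves are left out.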
* | beginning of transition to parallel lookups - marking in-lookup dentries (Al Viro, 2016-05-03, 3 files, -0/+35)
| |     marked as such when (would be) parallel lookup is about to pass
| |     them to actual ->lookup(); unmarked when
| |       * __d_add() is about to make it hashed, positive or not.
| |       * __d_move() (from d_splice_alias(), directly or via
| |         __d_unalias()) puts a preexisting dentry in its place
| |       * in caller of ->lookup() if it has escaped all of the above.
| |     Bug (WARN_ON, actually) if it reaches the final dput() or
| |     d_instantiate() while still marked such.
| |
| |     As the result, we are guaranteed that for as long as the flag is
| |     set, dentry will
| |       * remain negative unhashed with positive refcount
| |       * never have its ->d_alias looked at
| |       * never have its ->d_lru looked at
| |       * never have its ->d_parent and ->d_name changed
| |
| |     Right now we have at most one such for any given parent directory.
| |     With parallel lookups that restriction will weaken to
| |       * only exist when parent is locked shared
| |       * at most one with given (parent,name) pair (comparison of
| |         names is according to ->d_compare())
| |       * only exist when there's no hashed dentry with the same
| |         (parent,name)
| |
| |     Transition will take the next several commits; unfortunately,
| |     we'll only be able to switch to rwsem at the end of this series.
| |     The reason for not making it a single patch is to simplify review.
| |
| |     New primitives: d_in_lookup() (a predicate checking if dentry is
| |     in the in-lookup state) and d_lookup_done() (tells the system that
| |     we are done with lookup and if it's still marked as in-lookup, it
| |     should cease to be such).
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | __d_add(): don't drop/regain ->d_lock (Al Viro, 2016-05-03, 1 file, -3/+11)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | lookup_slow(): bugger off on IS_DEADDIR() from the very beginning (Al Viro, 2016-05-03, 1 file, -6/+17)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | nfs: missing wakeup in nfs_unblock_sillyrename() (Al Viro, 2016-05-03, 1 file, -0/+1)
| |     will be needed as soon as lookups are not serialized by ->i_mutex
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | make ext2_get_page() and friends work without external serialization (Al Viro, 2016-05-03, 5 files, -35/+35)
| |     Right now ext2_get_page() (and its analogues in a bunch of other
| |     filesystems) relies upon the directory being locked - the way it
| |     sets and tests Checked and Error bits would be racy without that.
| |
| |     Switch to a slightly different scheme, _not_ setting Checked in
| |     case of failure. That way the logics becomes
| |       if Checked => OK
| |       else if Error => fail
| |       else if !validate => fail
| |       else => OK
| |     with validation setting Checked or Error on success and failure
| |     resp. and returning which one had happened. Equivalent to the
| |     current logics, but unlike the current logics not sensitive to
| |     the order of set_bit, test_bit getting reordered by CPU, etc.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
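The four-way decision spelled out in that message translates almost directly into code. This is a sketch of the decision order only, under loud assumptions: `struct page_state`, `validate_page()` and `page_usable()` are hypothetical names, the two flags are plain booleans rather than atomic page-flag bits, and the `contents_ok` argument stands in for the real on-disk checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two per-page flags named in the message. */
struct page_state {
    bool checked;   /* page validated successfully */
    bool error;     /* page failed validation */
};

/* Hypothetical validator: sets Checked on success or Error on failure,
 * never both, and reports which one happened. */
static bool validate_page(struct page_state *p, bool contents_ok)
{
    assert(!p->checked && !p->error);   /* only for unclassified pages */

    if (contents_ok) {
        p->checked = true;
        return true;
    }
    p->error = true;    /* Checked is deliberately left clear */
    return false;
}

/* The decision order from the message:
 *   Checked => OK, else Error => fail, else validate and report. */
static bool page_usable(struct page_state *p, bool contents_ok)
{
    if (p->checked)
        return true;
    if (p->error)
        return false;
    return validate_page(p, contents_ok);
}
```

Because a failed page never gets Checked, a caller that sees Checked set can trust the page without having to look at Error as well, which is why the message can say the result does not depend on how the two bit operations are ordered.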
* | ovl_lookup_real(): use lookup_one_len_unlocked() (Al Viro, 2016-05-03, 1 file, -3/+1)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | reconnect_one(): use lookup_one_len_unlocked() (Al Viro, 2016-05-03, 1 file, -3/+7)
| |     ... and explain the non-obvious logics in case when lookup yields
| |     a different dentry.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | reiserfs: open-code reiserfs_mutex_lock_safe() in reiserfs_unpack() (Al Viro, 2016-05-03, 1 file, -1/+5)
| |     ... and have it use inode_lock()
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | orangefs: don't open-code inode_lock/inode_unlock (Al Viro, 2016-05-03, 2 files, -4/+4)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ocfs2: don't open-code inode_lock/inode_unlock (Al Viro, 2016-05-03, 1 file, -2/+2)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | configfs_detach_prep(): make sure that wait_mutex won't go away (Al Viro, 2016-05-03, 1 file, -8/+9)
| |     grab a reference to dentry we'd got the sucker from, and return
| |     that dentry via *wait, rather than just returning the address of
| |     ->i_mutex.
| |
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | kernfs: use lookup_one_len_unlocked() (Al Viro, 2016-05-03, 1 file, -3/+2)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | security_d_instantiate(): move to the point prior to attaching dentry to inode (Al Viro, 2016-05-03, 1 file, -8/+7)
| |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge getxattr prototype change into work.lookups (Al Viro, 2016-05-03, 107 files, -458/+414)
|\ \
| | |     The rest of work.xattr stuff isn't needed for this branch
| * | ->getxattr(): pass dentry and inode as separate arguments (Al Viro, 2016-04-11, 34 files, -85/+94)
| | |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | xattr_handler: pass dentry and inode as separate arguments of ->get() (Al Viro, 2016-04-11, 31 files, -114/+113)
| | |     ... and do not assume they are already attached to each other
| | |
| | |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | reiserfs: switch to generic_{get,set,remove}xattr() (Al Viro, 2016-04-11, 7 files, -98/+31)
| | |     reiserfs_xattr_[sg]et() will fail with -EOPNOTSUPP for V1 inodes
| | |     anyway, and all reiserfs instances of ->[sg]et() call it and so
| | |     does ->set_acl().
| | |
| | |     Checks for name length in the instances had been bogus; they
| | |     should've been "bugger off if it's _exactly_ the prefix" (as
| | |     generic would do on its own) and not "bugger off if it's shorter
| | |     than the prefix" - that can't happen.
| | |
| | |     xattr_full_name() is needed to adjust for the fact that generic
| | |     instances will skip the prefix in the name passed to ->[gs]et();
| | |     reiserfs homegrown analogues didn't.
| | |
| | |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | cifs: kill more bogus checks in ->...xattr() methods (Al Viro, 2016-04-10, 1 file, -36/+6)
| | |     none of that stuff can ever be called for NULL or negative dentry.
| | |
| | |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | don't bother with ->d_inode->i_sb - it's always equal to ->d_sb (Al Viro, 2016-04-10, 30 files, -50/+41)
| | |     ... and neither can ever be NULL
| | |
| | |     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>