From fe6e9c1f25ac01f848bd084ee0ee62a5a0966ff3 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Mon, 23 Jun 2008 08:30:55 -0400 Subject: [PATCH] fix cgroup-inflicted breakage in block_dev.c devcgroup_inode_permission() expects MAY_FOO, not FMODE_FOO; kindly keep your misdesign consistent if you positively have to inflict it on the kernel. Signed-off-by: Al Viro --- fs/block_dev.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/block_dev.c b/fs/block_dev.c index 470c10ceb0fb..10d8a0aa871a 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -931,8 +931,16 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part) struct gendisk *disk; int ret; int part; + int perm = 0; - ret = devcgroup_inode_permission(bdev->bd_inode, file->f_mode); + if (file->f_mode & FMODE_READ) + perm |= MAY_READ; + if (file->f_mode & FMODE_WRITE) + perm |= MAY_WRITE; + /* + * hooks: /n/, see "layering violations". + */ + ret = devcgroup_inode_permission(bdev->bd_inode, perm); if (ret != 0) return ret; -- cgit v1.2.3 From 12fd0d3088d27867be68655bcab2b074f2835f60 Mon Sep 17 00:00:00 2001 From: Michael Kerrisk Date: Mon, 9 Jun 2008 21:16:07 -0700 Subject: [patch for 2.6.26 2/4] vfs: utimensat(): be consistent with utime() for immutable and append-only files This patch fixes utimensat() to make its behavior consistent with that of utime()/utimes() when dealing with files marked immutable and append-only. The current utimensat() implementation also returns EPERM if 'times' is non-NULL and the tv_nsec fields are both UTIME_NOW. For consistency, the (times != NULL && times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW) case should be treated like the traditional utimes() case where 'times' is NULL. That is, the call should succeed for a file marked append-only and should give the error EACCES if the file is marked as immutable. The simple way to do this is to set 'times' to NULL if (times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW). This is also the natural approach, since POSIX.1 semantics consider the times == {{x, UTIME_NOW}, {y, UTIME_NOW}} to be exactly equivalent to the case for times == NULL. (Thanks to Miklos for pointing this out.) Patch 3 in this series relies on the simplification provided by this patch. Acked-by: Miklos Szeredi Cc: Al Viro Cc: Ulrich Drepper Signed-off-by: Michael Kerrisk Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/utimes.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'fs') diff --git a/fs/utimes.c b/fs/utimes.c index af059d5cb485..14d3edbb3d7c 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -102,6 +102,10 @@ long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags if (error) goto dput_and_out; + if (times && times[0].tv_nsec == UTIME_NOW && + times[1].tv_nsec == UTIME_NOW) + times = NULL; + /* Don't worry, the checks are done in inode_change_ok() */ newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME; if (times) { -- cgit v1.2.3 From 94c70b9ba7e9c1036284e779e2fef5be89021533 Mon Sep 17 00:00:00 2001 From: Michael Kerrisk Date: Mon, 9 Jun 2008 21:16:05 -0700 Subject: [patch for 2.6.26 1/4] vfs: utimensat(): ignore tv_sec if tv_nsec == UTIME_OMIT or UTIME_NOW The POSIX.1 draft spec for utimensat() says that if a times[n].tv_nsec field is UTIME_OMIT or UTIME_NOW, then the value in the corresponding tv_sec field is ignored. See the last sentence of this para, from the spec: If the tv_nsec field of a timespec structure has the special value UTIME_NOW, the file's relevant timestamp shall be set to the greatest value supported by the file system that is not greater than the current time. If the tv_nsec field has the special value UTIME_OMIT, the file's relevant timestamp shall not be changed. In either case, the tv_sec field shall be ignored. However the current Linux implementation requires the tv_sec value to be zero (or the EINVAL error results). This requirement should be removed. Acked-by: Miklos Szeredi Cc: Al Viro Cc: Ulrich Drepper Signed-off-by: Michael Kerrisk Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/utimes.c | 8 -------- 1 file changed, 8 deletions(-) (limited to 'fs') diff --git a/fs/utimes.c b/fs/utimes.c index 14d3edbb3d7c..d466bc587e6e 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -173,14 +173,6 @@ asmlinkage long sys_utimensat(int dfd, char __user *filename, struct timespec __ if (utimes) { if (copy_from_user(&tstimes, utimes, sizeof(tstimes))) return -EFAULT; - if ((tstimes[0].tv_nsec == UTIME_OMIT || - tstimes[0].tv_nsec == UTIME_NOW) && - tstimes[0].tv_sec != 0) - return -EINVAL; - if ((tstimes[1].tv_nsec == UTIME_OMIT || - tstimes[1].tv_nsec == UTIME_NOW) && - tstimes[1].tv_sec != 0) - return -EINVAL; /* Nothing to do, we must not even check the path. */ if (tstimes[0].tv_nsec == UTIME_OMIT && -- cgit v1.2.3 From 4cca92264e61a90b43fc4e076cd25b7f4e16dc61 Mon Sep 17 00:00:00 2001 From: Michael Kerrisk Date: Mon, 9 Jun 2008 21:16:08 -0700 Subject: [patch for 2.6.26 3/4] vfs: utimensat(): fix error checking for {UTIME_NOW,UTIME_OMIT} case The POSIX.1 draft spec for utimensat() says: Only a process with the effective user ID equal to the user ID of the file or with appropriate privileges may use futimens() or utimensat() with a non-null times argument that does not have both tv_nsec fields set to UTIME_NOW and does not have both tv_nsec fields set to UTIME_OMIT. If this condition is violated, then the error EPERM should result. However, the current implementation does not generate EPERM if one tv_nsec field is UTIME_NOW while the other is UTIME_OMIT. It should give this error for that case. This patch: a) Repairs that problem. b) Removes the now unneeded nsec_special() helper function. c) Adds some comments to explain the checks that are being performed. Thanks to Miklos, who provided comments on the previous iteration of this patch. As a result, this version is a little simpler and and its logic is better structured. Miklos suggested an alternative idea, migrating the is_owner_or_cap() checks into fs/attr.c:inode_change_ok() via the use of an ATTR_OWNER_CHECK flag. Maybe we could do that later, but for now I've gone with this version, which is IMO simpler, and can be more easily read as being correct. Acked-by: Miklos Szeredi Cc: Al Viro Cc: Ulrich Drepper Signed-off-by: Michael Kerrisk Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/utimes.c | 36 +++++++++++++++++++++--------------- 1 file changed, 21 insertions(+), 15 deletions(-) (limited to 'fs') diff --git a/fs/utimes.c b/fs/utimes.c index d466bc587e6e..118d1c3241be 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -40,14 +40,9 @@ asmlinkage long sys_utime(char __user *filename, struct utimbuf __user *times) #endif -static bool nsec_special(long nsec) -{ - return nsec == UTIME_OMIT || nsec == UTIME_NOW; -} - static bool nsec_valid(long nsec) { - if (nsec_special(nsec)) + if (nsec == UTIME_OMIT || nsec == UTIME_NOW) return true; return nsec >= 0 && nsec <= 999999999; @@ -106,7 +101,7 @@ long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags times[1].tv_nsec == UTIME_NOW) times = NULL; - /* Don't worry, the checks are done in inode_change_ok() */ + /* In most cases, the checks are done in inode_change_ok() */ newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME; if (times) { error = -EPERM; @@ -128,15 +123,26 @@ long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags newattrs.ia_mtime.tv_nsec = times[1].tv_nsec; newattrs.ia_valid |= ATTR_MTIME_SET; } - } - /* - * If times is NULL or both times are either UTIME_OMIT or - * UTIME_NOW, then need to check permissions, because - * inode_change_ok() won't do it. - */ - if (!times || (nsec_special(times[0].tv_nsec) && - nsec_special(times[1].tv_nsec))) { + /* + * For the UTIME_OMIT/UTIME_NOW and UTIME_NOW/UTIME_OMIT + * cases, we need to make an extra check that is not done by + * inode_change_ok(). + */ + if (((times[0].tv_nsec == UTIME_NOW && + times[1].tv_nsec == UTIME_OMIT) + || + (times[0].tv_nsec == UTIME_OMIT && + times[1].tv_nsec == UTIME_NOW)) + && !is_owner_or_cap(inode)) + goto mnt_drop_write_and_out; + } else { + + /* + * If times is NULL (or both times are UTIME_NOW), + * then we need to check permissions, because + * inode_change_ok() won't do it. + */ error = -EACCES; if (IS_IMMUTABLE(inode)) goto mnt_drop_write_and_out; -- cgit v1.2.3 From c70f84417429f41519be0197a1092a53c2201f47 Mon Sep 17 00:00:00 2001 From: Michael Kerrisk Date: Mon, 9 Jun 2008 21:16:09 -0700 Subject: [patch for 2.6.26 4/4] vfs: utimensat(): fix write access check for futimens() The POSIX.1 draft spec for futimens()/utimensat() says: Only a process with the effective user ID equal to the user ID of the file, *or with write access to the file*, or with appropriate privileges may use futimens() or utimensat() with a null pointer as the times argument or with both tv_nsec fields set to the special value UTIME_NOW. The important piece here is "with write access to the file", and this matters for futimens(), which deals with an argument that is a file descriptor referring to the file whose timestamps are being updated, The standard is saying that the "writability" check is based on the file permissions, not the access mode with which the file is opened. (This behavior is consistent with the semantics of FreeBSD's futimes().) However, Linux is currently doing the latter -- futimens(fd, times) is a library function implemented as utimensat(fd, NULL, times, 0) and within the utimensat() implementation we have the code: f = fget(dfd); // dfd is 'fd' ... if (f) { if (!(f->f_mode & FMODE_WRITE)) goto mnt_drop_write_and_out; The check should instead be based on the file permissions. Thanks to Miklos for pointing out how to do this check. Miklos also pointed out a simplification that could be made to my first version of this patch, since the checks for the pathname and file descriptor cases can now be conflated. Acked-by: Miklos Szeredi Cc: Al Viro Cc: Ulrich Drepper Signed-off-by: Michael Kerrisk Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/utimes.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) (limited to 'fs') diff --git a/fs/utimes.c b/fs/utimes.c index 118d1c3241be..b6b664e7145e 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -148,14 +148,9 @@ long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags goto mnt_drop_write_and_out; if (!is_owner_or_cap(inode)) { - if (f) { - if (!(f->f_mode & FMODE_WRITE)) - goto mnt_drop_write_and_out; - } else { - error = vfs_permission(&nd, MAY_WRITE); - if (error) - goto mnt_drop_write_and_out; - } + error = permission(inode, MAY_WRITE, NULL); + if (error) + goto mnt_drop_write_and_out; } } mutex_lock(&inode->i_mutex); -- cgit v1.2.3 From c8e7f449b225ee6c87454ac069f0a041035c5140 Mon Sep 17 00:00:00 2001 From: Jan Blunck Date: Mon, 9 Jun 2008 16:40:35 -0700 Subject: [patch 1/4] vfs: path_{get,put}() cleanups Here are some more places where path_{get,put}() can be used instead of dput()/mntput() pair. Signed-off-by: Jan Blunck Cc: Al Viro Cc: Jens Axboe Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/namei.c | 11 +++++------ fs/pipe.c | 10 ++++------ 2 files changed, 9 insertions(+), 12 deletions(-) (limited to 'fs') diff --git a/fs/namei.c b/fs/namei.c index c7e43536c49a..ee1544696e83 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -581,15 +581,13 @@ static __always_inline int link_path_walk(const char *name, struct nameidata *nd int result; /* make sure the stuff we saved doesn't go away */ - dget(save.dentry); - mntget(save.mnt); + path_get(&save); result = __link_path_walk(name, nd); if (result == -ESTALE) { /* nd->path had been dropped */ nd->path = save; - dget(nd->path.dentry); - mntget(nd->path.mnt); + path_get(&nd->path); nd->flags |= LOOKUP_REVAL; result = __link_path_walk(name, nd); } @@ -1216,8 +1214,9 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt, nd->flags = flags; nd->depth = 0; - nd->path.mnt = mntget(mnt); - nd->path.dentry = dget(dentry); + nd->path.dentry = dentry; + nd->path.mnt = mnt; + path_get(&nd->path); retval = path_walk(name, nd); if (unlikely(!retval && !audit_dummy_context() && nd->path.dentry && diff --git a/fs/pipe.c b/fs/pipe.c index ec228bc9f882..700f4e0d9572 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -1003,8 +1003,7 @@ struct file *create_write_pipe(void) void free_write_pipe(struct file *f) { free_pipe_info(f->f_dentry->d_inode); - dput(f->f_path.dentry); - mntput(f->f_path.mnt); + path_put(&f->f_path); put_filp(f); } @@ -1015,8 +1014,8 @@ struct file *create_read_pipe(struct file *wrf) return ERR_PTR(-ENFILE); /* Grab pipe from the writer */ - f->f_path.mnt = mntget(wrf->f_path.mnt); - f->f_path.dentry = dget(wrf->f_path.dentry); + f->f_path = wrf->f_path; + path_get(&wrf->f_path); f->f_mapping = wrf->f_path.dentry->d_inode->i_mapping; f->f_pos = 0; @@ -1068,8 +1067,7 @@ int do_pipe(int *fd) err_fdr: put_unused_fd(fdr); err_read_pipe: - dput(fr->f_dentry); - mntput(fr->f_vfsmnt); + path_put(&fr->f_path); put_filp(fr); err_write_pipe: free_write_pipe(fw); -- cgit v1.2.3 From 20d4fdc1a788e4ca0aaf2422772ba668e7e10839 Mon Sep 17 00:00:00 2001 From: Jan Engelhardt Date: Mon, 9 Jun 2008 16:40:36 -0700 Subject: [patch 2/4] fs: make struct file arg to d_path const Signed-off-by: Jan Engelhardt Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/dcache.c | 2 +- include/linux/dcache.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/dcache.c b/fs/dcache.c index 3ee588d5f585..c4c9072d810c 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1847,7 +1847,7 @@ Elong: * * "buflen" should be positive. Caller holds the dcache_lock. */ -char *d_path(struct path *path, char *buf, int buflen) +char *d_path(const struct path *path, char *buf, int buflen) { char *res; struct path root; diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 2a6639407c80..d982eb89c77d 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -300,7 +300,7 @@ extern int d_validate(struct dentry *, struct dentry *); extern char *dynamic_dname(struct dentry *, char *, int, const char *, ...); extern char *__d_path(const struct path *path, struct path *root, char *, int); -extern char *d_path(struct path *, char *, int); +extern char *d_path(const struct path *, char *, int); extern char *dentry_path(struct dentry *, char *, int); /* Allocation counts.. */ -- cgit v1.2.3 From 694a1764d657e0f7a9b139bc7269c8d5f5a2534b Mon Sep 17 00:00:00 2001 From: Marcin Slusarz Date: Mon, 9 Jun 2008 16:40:37 -0700 Subject: [patch 3/4] vfs: fix ERR_PTR abuse in generic_readlink generic_readlink calls ERR_PTR for negative and positive values (vfs_readlink returns length of "link"), but it should not (not an errno) and does not need to. Signed-off-by: Marcin Slusarz Cc: Al Viro Cc: Christoph Hellwig Acked-by: Miklos Szeredi Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/namei.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) (limited to 'fs') diff --git a/fs/namei.c b/fs/namei.c index ee1544696e83..01e67dddcc3d 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2856,16 +2856,17 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen) { struct nameidata nd; void *cookie; + int res; nd.depth = 0; cookie = dentry->d_inode->i_op->follow_link(dentry, &nd); - if (!IS_ERR(cookie)) { - int res = vfs_readlink(dentry, buffer, buflen, nd_get_link(&nd)); - if (dentry->d_inode->i_op->put_link) - dentry->d_inode->i_op->put_link(dentry, &nd, cookie); - cookie = ERR_PTR(res); - } - return PTR_ERR(cookie); + if (IS_ERR(cookie)) + return PTR_ERR(cookie); + + res = vfs_readlink(dentry, buffer, buflen, nd_get_link(&nd)); + if (dentry->d_inode->i_op->put_link) + dentry->d_inode->i_op->put_link(dentry, &nd, cookie); + return res; } int vfs_follow_link(struct nameidata *nd, const char *link) -- cgit v1.2.3 From f9f48ec72bfc9489a30bc6ddbfcf27d86a8bc651 Mon Sep 17 00:00:00 2001 From: "Denis V. Lunev" Date: Mon, 9 Jun 2008 16:40:38 -0700 Subject: [patch 4/4] flock: remove unused fields from file_lock_operations fl_insert and fl_remove are not used right now in the kernel. Remove them. Signed-off-by: Denis V. Lunev Cc: Matthew Wilcox Cc: Alexander Viro Cc: "J. Bruce Fields" Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/locks.c | 6 ------ include/linux/fs.h | 2 -- 2 files changed, 8 deletions(-) (limited to 'fs') diff --git a/fs/locks.c b/fs/locks.c index 11dbf08651b7..dce8c747371c 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -561,9 +561,6 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl) /* insert into file's list */ fl->fl_next = *pos; *pos = fl; - - if (fl->fl_ops && fl->fl_ops->fl_insert) - fl->fl_ops->fl_insert(fl); } /* @@ -586,9 +583,6 @@ static void locks_delete_lock(struct file_lock **thisfl_p) fl->fl_fasync = NULL; } - if (fl->fl_ops && fl->fl_ops->fl_remove) - fl->fl_ops->fl_remove(fl); - if (fl->fl_nspid) { put_pid(fl->fl_nspid); fl->fl_nspid = NULL; diff --git a/include/linux/fs.h b/include/linux/fs.h index d490779f18d9..7c1080826832 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -894,8 +894,6 @@ static inline int file_check_writeable(struct file *filp) typedef struct files_struct *fl_owner_t; struct file_lock_operations { - void (*fl_insert)(struct file_lock *); /* lock insertion callback */ - void (*fl_remove)(struct file_lock *); /* lock removal callback */ void (*fl_copy_lock)(struct file_lock *, struct file_lock *); void (*fl_release_private)(struct file_lock *); }; -- cgit v1.2.3 From be285c712bbbe5db43e503782fbef2bfeaa345f9 Mon Sep 17 00:00:00 2001 From: Andreas Gruenbacher Date: Mon, 16 Jun 2008 13:28:07 +0200 Subject: [patch 3/3] vfs: make d_path() consistent across mount operations The path that __d_path() computes can become slightly inconsistent when it races with mount operations: it grabs the vfsmount_lock when traversing mount points but immediately drops it again, only to re-grab it when it reaches the next mount point. The result is that the filename computed is not always consisent, and the file may never have had that name. (This is unlikely, but still possible.) Fix this by grabbing the vfsmount_lock for the whole duration of __d_path(). Signed-off-by: Andreas Gruenbacher Signed-off-by: John Johansen Signed-off-by: Miklos Szeredi Acked-by: Christoph Hellwig Signed-off-by: Al Viro --- fs/dcache.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'fs') diff --git a/fs/dcache.c b/fs/dcache.c index c4c9072d810c..2b479de10a0a 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1782,6 +1782,7 @@ char *__d_path(const struct path *path, struct path *root, char * end = buffer+buflen; char * retval; + spin_lock(&vfsmount_lock); prepend(&end, &buflen, "\0", 1); if (!IS_ROOT(dentry) && d_unhashed(dentry) && (prepend(&end, &buflen, " (deleted)", 10) != 0)) @@ -1800,14 +1801,11 @@ char *__d_path(const struct path *path, struct path *root, break; if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) { /* Global root? */ - spin_lock(&vfsmount_lock); if (vfsmnt->mnt_parent == vfsmnt) { - spin_unlock(&vfsmount_lock); goto global_root; } dentry = vfsmnt->mnt_mountpoint; vfsmnt = vfsmnt->mnt_parent; - spin_unlock(&vfsmount_lock); continue; } parent = dentry->d_parent; @@ -1820,6 +1818,8 @@ char *__d_path(const struct path *path, struct path *root, dentry = parent; } +out: + spin_unlock(&vfsmount_lock); return retval; global_root: @@ -1829,9 +1829,11 @@ global_root: goto Elong; root->mnt = vfsmnt; root->dentry = dentry; - return retval; + goto out; + Elong: - return ERR_PTR(-ENAMETOOLONG); + retval = ERR_PTR(-ENAMETOOLONG); + goto out; } /** -- cgit v1.2.3 From 31f3e0b3a18c6d48196c40a82a3b8c01f4ff6b23 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 23 Jun 2008 18:11:52 +0200 Subject: [patch 1/3] vfs: dcache sparse fixes Fix the following sparse warnings: fs/dcache.c:2183:19: warning: symbol 'filp_cachep' was not declared. Should it be static? fs/dcache.c:115:3: warning: context imbalance in 'dentry_iput' - unexpected unlock fs/dcache.c:188:2: warning: context imbalance in 'dput' - different lock contexts for basic block fs/dcache.c:400:2: warning: context imbalance in 'prune_one_dentry' - different lock contexts for basic block fs/dcache.c:431:22: warning: context imbalance in 'prune_dcache' - different lock contexts for basic block fs/dcache.c:563:2: warning: context imbalance in 'shrink_dcache_sb' - different lock contexts for basic block fs/dcache.c:1385:6: warning: context imbalance in 'd_delete' - wrong count at exit fs/dcache.c:1636:2: warning: context imbalance in '__d_unalias' - unexpected unlock fs/dcache.c:1735:2: warning: context imbalance in 'd_materialise_unique' - different lock contexts for basic block Signed-off-by: Miklos Szeredi Reviewed-by: Matthew Wilcox Acked-by: Christoph Hellwig Signed-off-by: Al Viro --- fs/dcache.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) (limited to 'fs') diff --git a/fs/dcache.c b/fs/dcache.c index 2b479de10a0a..e4b2b9436b32 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -106,9 +107,10 @@ static void dentry_lru_remove(struct dentry *dentry) /* * Release the dentry's inode, using the filesystem * d_iput() operation if defined. - * Called with dcache_lock and per dentry lock held, drops both. */ static void dentry_iput(struct dentry * dentry) + __releases(dentry->d_lock) + __releases(dcache_lock) { struct inode *inode = dentry->d_inode; if (inode) { @@ -132,12 +134,13 @@ static void dentry_iput(struct dentry * dentry) * d_kill - kill dentry and return parent * @dentry: dentry to kill * - * Called with dcache_lock and d_lock, releases both. The dentry must - * already be unhashed and removed from the LRU. + * The dentry must already be unhashed and removed from the LRU. * * If this is the root of the dentry tree, return NULL. */ static struct dentry *d_kill(struct dentry *dentry) + __releases(dentry->d_lock) + __releases(dcache_lock) { struct dentry *parent; @@ -383,11 +386,11 @@ restart: * Try to prune ancestors as well. This is necessary to prevent * quadratic behavior of shrink_dcache_parent(), but is also expected * to be beneficial in reducing dentry cache fragmentation. - * - * Called with dcache_lock, drops it and then regains. - * Called with dentry->d_lock held, drops it. */ static void prune_one_dentry(struct dentry * dentry) + __releases(dentry->d_lock) + __releases(dcache_lock) + __acquires(dcache_lock) { __d_drop(dentry); dentry = d_kill(dentry); @@ -1604,10 +1607,9 @@ static int d_isparent(struct dentry *p1, struct dentry *p2) * * Note: If ever the locking in lock_rename() changes, then please * remember to update this too... - * - * On return, dcache_lock will have been unlocked. */ static struct dentry *__d_unalias(struct dentry *dentry, struct dentry *alias) + __releases(dcache_lock) { struct mutex *m1 = NULL, *m2 = NULL; struct dentry *ret; @@ -1743,7 +1745,6 @@ out_nolock: shouldnt_be_hashed: spin_unlock(&dcache_lock); BUG(); - goto shouldnt_be_hashed; } static int prepend(char **buffer, int *buflen, const char *str, @@ -1758,7 +1759,7 @@ static int prepend(char **buffer, int *buflen, const char *str, } /** - * d_path - return the path of a dentry + * __d_path - return the path of a dentry * @path: the dentry/vfsmount to report * @root: root vfsmnt/dentry (may be modified by this function) * @buffer: buffer to return value in @@ -1847,7 +1848,7 @@ Elong: * * Returns the buffer or an error code if the path was too long. * - * "buflen" should be positive. Caller holds the dcache_lock. + * "buflen" should be positive. */ char *d_path(const struct path *path, char *buf, int buflen) { -- cgit v1.2.3 From cdd16d0265c9234228fd37fbbad844d7e894b278 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 23 Jun 2008 18:11:53 +0200 Subject: [patch 2/3] vfs: dcache cleanups Comment from Al Viro: add prepend_name() wrapper. Signed-off-by: Miklos Szeredi Signed-off-by: Al Viro --- fs/dcache.c | 31 ++++++++++++++----------------- 1 file changed, 14 insertions(+), 17 deletions(-) (limited to 'fs') diff --git a/fs/dcache.c b/fs/dcache.c index e4b2b9436b32..6068c25b393c 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1747,8 +1747,7 @@ shouldnt_be_hashed: BUG(); } -static int prepend(char **buffer, int *buflen, const char *str, - int namelen) +static int prepend(char **buffer, int *buflen, const char *str, int namelen) { *buflen -= namelen; if (*buflen < 0) @@ -1758,6 +1757,11 @@ static int prepend(char **buffer, int *buflen, const char *str, return 0; } +static int prepend_name(char **buffer, int *buflen, struct qstr *name) +{ + return prepend(buffer, buflen, name->name, name->len); +} + /** * __d_path - return the path of a dentry * @path: the dentry/vfsmount to report @@ -1780,8 +1784,8 @@ char *__d_path(const struct path *path, struct path *root, { struct dentry *dentry = path->dentry; struct vfsmount *vfsmnt = path->mnt; - char * end = buffer+buflen; - char * retval; + char *end = buffer + buflen; + char *retval; spin_lock(&vfsmount_lock); prepend(&end, &buflen, "\0", 1); @@ -1811,8 +1815,7 @@ char *__d_path(const struct path *path, struct path *root, } parent = dentry->d_parent; prefetch(parent); - if ((prepend(&end, &buflen, dentry->d_name.name, - dentry->d_name.len) != 0) || + if ((prepend_name(&end, &buflen, &dentry->d_name) != 0) || (prepend(&end, &buflen, "/", 1) != 0)) goto Elong; retval = end; @@ -1825,8 +1828,7 @@ out: global_root: retval += 1; /* hit the slash */ - if (prepend(&retval, &buflen, dentry->d_name.name, - dentry->d_name.len) != 0) + if (prepend_name(&retval, &buflen, &dentry->d_name) != 0) goto Elong; root->mnt = vfsmnt; root->dentry = dentry; @@ -1918,16 +1920,11 @@ char *dentry_path(struct dentry *dentry, char *buf, int buflen) retval = end-1; *retval = '/'; - for (;;) { - struct dentry *parent; - if (IS_ROOT(dentry)) - break; + while (!IS_ROOT(dentry)) { + struct dentry *parent = dentry->d_parent; - parent = dentry->d_parent; prefetch(parent); - - if ((prepend(&end, &buflen, dentry->d_name.name, - dentry->d_name.len) != 0) || + if ((prepend_name(&end, &buflen, &dentry->d_name) != 0) || (prepend(&end, &buflen, "/", 1) != 0)) goto Elong; @@ -1978,7 +1975,7 @@ asmlinkage long sys_getcwd(char __user *buf, unsigned long size) error = -ENOENT; /* Has the current directory has been unlinked? */ spin_lock(&dcache_lock); - if (pwd.dentry->d_parent == pwd.dentry || !d_unhashed(pwd.dentry)) { + if (IS_ROOT(pwd.dentry) || !d_unhashed(pwd.dentry)) { unsigned long len; struct path tmp = root; char * cwd; -- cgit v1.2.3 From e8183c2452041326c95258ecc7865b6fcd91c730 Mon Sep 17 00:00:00 2001 From: Tomas Janousek Date: Mon, 23 Jun 2008 15:12:35 +0200 Subject: udf: Fix regression in UDF anchor block detection In some cases it could happen that some block passed test in udf_check_anchor_block() even though udf_read_tagged() refused to read it later (e.g. because checksum was not correct). This patch makes udf_check_anchor_block() use udf_read_tagged() so that the checking is stricter. This fixes the regression (certain disks unmountable) caused by commit 423cf6dc04eb79d441bfda2b127bc4b57134b41d. Signed-off-by: Tomas Janousek Signed-off-by: Jan Kara --- fs/udf/super.c | 57 +++++++++++++++++++++++---------------------------------- 1 file changed, 23 insertions(+), 34 deletions(-) (limited to 'fs') diff --git a/fs/udf/super.c b/fs/udf/super.c index 7a5f69be6ac2..44cc702f96cc 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -682,38 +682,26 @@ static int udf_vrs(struct super_block *sb, int silent) /* * Check whether there is an anchor block in the given block */ -static int udf_check_anchor_block(struct super_block *sb, sector_t block, - bool varconv) +static int udf_check_anchor_block(struct super_block *sb, sector_t block) { - struct buffer_head *bh = NULL; - tag *t; + struct buffer_head *bh; uint16_t ident; - uint32_t location; - if (varconv) { - if (udf_fixed_to_variable(block) >= - sb->s_bdev->bd_inode->i_size >> sb->s_blocksize_bits) - return 0; - bh = sb_bread(sb, udf_fixed_to_variable(block)); - } - else - bh = sb_bread(sb, block); + if (UDF_QUERY_FLAG(sb, UDF_FLAG_VARCONV) && + udf_fixed_to_variable(block) >= + sb->s_bdev->bd_inode->i_size >> sb->s_blocksize_bits) + return 0; + bh = udf_read_tagged(sb, block, block, &ident); if (!bh) return 0; - - t = (tag *)bh->b_data; - ident = le16_to_cpu(t->tagIdent); - location = le32_to_cpu(t->tagLocation); brelse(bh); - if (ident != TAG_IDENT_AVDP) - return 0; - return location == block; + + return ident == TAG_IDENT_AVDP; } /* Search for an anchor volume descriptor pointer */ -static sector_t udf_scan_anchors(struct super_block *sb, bool varconv, - sector_t lastblock) +static sector_t udf_scan_anchors(struct super_block *sb, sector_t lastblock) { sector_t last[6]; int i; @@ -739,7 +727,7 @@ static sector_t udf_scan_anchors(struct super_block *sb, bool varconv, sb->s_blocksize_bits) continue; - if (udf_check_anchor_block(sb, last[i], varconv)) { + if (udf_check_anchor_block(sb, last[i])) { sbi->s_anchor[0] = last[i]; sbi->s_anchor[1] = last[i] - 256; return last[i]; @@ -748,17 +736,17 @@ static sector_t udf_scan_anchors(struct super_block *sb, bool varconv, if (last[i] < 256) continue; - if (udf_check_anchor_block(sb, last[i] - 256, varconv)) { + if (udf_check_anchor_block(sb, last[i] - 256)) { sbi->s_anchor[1] = last[i] - 256; return last[i]; } } - if (udf_check_anchor_block(sb, sbi->s_session + 256, varconv)) { + if (udf_check_anchor_block(sb, sbi->s_session + 256)) { sbi->s_anchor[0] = sbi->s_session + 256; return last[0]; } - if (udf_check_anchor_block(sb, sbi->s_session + 512, varconv)) { + if (udf_check_anchor_block(sb, sbi->s_session + 512)) { sbi->s_anchor[0] = sbi->s_session + 512; return last[0]; } @@ -780,23 +768,24 @@ static void udf_find_anchor(struct super_block *sb) int i; struct udf_sb_info *sbi = UDF_SB(sb); - lastblock = udf_scan_anchors(sb, 0, sbi->s_last_block); + lastblock = udf_scan_anchors(sb, sbi->s_last_block); if (lastblock) goto check_anchor; /* No anchor found? Try VARCONV conversion of block numbers */ + UDF_SET_FLAG(sb, UDF_FLAG_VARCONV); /* Firstly, we try to not convert number of the last block */ - lastblock = udf_scan_anchors(sb, 1, + lastblock = udf_scan_anchors(sb, udf_variable_to_fixed(sbi->s_last_block)); - if (lastblock) { - UDF_SET_FLAG(sb, UDF_FLAG_VARCONV); + if (lastblock) goto check_anchor; - } /* Secondly, we try with converted number of the last block */ - lastblock = udf_scan_anchors(sb, 1, sbi->s_last_block); - if (lastblock) - UDF_SET_FLAG(sb, UDF_FLAG_VARCONV); + lastblock = udf_scan_anchors(sb, sbi->s_last_block); + if (!lastblock) { + /* VARCONV didn't help. Clear it. */ + UDF_CLEAR_FLAG(sb, UDF_FLAG_VARCONV); + } check_anchor: /* -- cgit v1.2.3 From 18ce3751ccd488c78d3827e9f6bf54e6322676fb Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Tue, 1 Jul 2008 09:07:34 +0200 Subject: Properly notify block layer of sync writes fsync_buffers_list() and sync_dirty_buffer() both issue async writes and then immediately wait on them. Conceptually, that makes them sync writes and we should treat them as such so that the IO schedulers can handle them appropriately. This patch fixes a write starvation issue that Lin Ming reported, where xx is stuck for more than 2 minutes because of a large number of synchronous IO in the system: INFO: task kjournald:20558 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kjournald D ffff810010820978 6712 20558 2 ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2 ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb 0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537 Call Trace: [] kobject_get+0x12/0x17 [] getnstimeofday+0x2f/0x83 [] sync_buffer+0x0/0x3f [] io_schedule+0x5d/0x9f [] sync_buffer+0x3b/0x3f [] __wait_on_bit+0x40/0x6f [] sync_buffer+0x0/0x3f [] out_of_line_wait_on_bit+0x6c/0x78 [] wake_bit_function+0x0/0x23 [] sync_dirty_buffer+0x98/0xcb [] journal_commit_transaction+0x97d/0xcb6 [] lock_timer_base+0x26/0x4b [] kjournald+0xc1/0x1fb [] autoremove_wake_function+0x0/0x2e [] kjournald+0x0/0x1fb [] kthread+0x47/0x74 [] schedule_tail+0x28/0x5d [] child_rip+0xa/0x12 [] kthread+0x0/0x74 [] child_rip+0x0/0x12 Lin Ming confirms that this patch fixes the issue. I've run tests with it for the past week and no ill effects have been observed, so I'm proposing it for inclusion into 2.6.26. Signed-off-by: Jens Axboe --- fs/buffer.c | 13 ++++++++----- include/linux/fs.h | 1 + 2 files changed, 9 insertions(+), 5 deletions(-) (limited to 'fs') diff --git a/fs/buffer.c b/fs/buffer.c index a073f3f4f013..0f51c0f7c266 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -821,7 +821,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list) * contents - it is a noop if I/O is still in * flight on potentially older contents. */ - ll_rw_block(SWRITE, 1, &bh); + ll_rw_block(SWRITE_SYNC, 1, &bh); brelse(bh); spin_lock(lock); } @@ -2940,16 +2940,19 @@ void ll_rw_block(int rw, int nr, struct buffer_head *bhs[]) for (i = 0; i < nr; i++) { struct buffer_head *bh = bhs[i]; - if (rw == SWRITE) + if (rw == SWRITE || rw == SWRITE_SYNC) lock_buffer(bh); else if (test_set_buffer_locked(bh)) continue; - if (rw == WRITE || rw == SWRITE) { + if (rw == WRITE || rw == SWRITE || rw == SWRITE_SYNC) { if (test_clear_buffer_dirty(bh)) { bh->b_end_io = end_buffer_write_sync; get_bh(bh); - submit_bh(WRITE, bh); + if (rw == SWRITE_SYNC) + submit_bh(WRITE_SYNC, bh); + else + submit_bh(WRITE, bh); continue; } } else { @@ -2978,7 +2981,7 @@ int sync_dirty_buffer(struct buffer_head *bh) if (test_clear_buffer_dirty(bh)) { get_bh(bh); bh->b_end_io = end_buffer_write_sync; - ret = submit_bh(WRITE, bh); + ret = submit_bh(WRITE_SYNC, bh); wait_on_buffer(bh); if (buffer_eopnotsupp(bh)) { clear_buffer_eopnotsupp(bh); diff --git a/include/linux/fs.h b/include/linux/fs.h index 7c1080826832..d8e2762ed14d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -83,6 +83,7 @@ extern int dir_notify_enable; #define READ_SYNC (READ | (1 << BIO_RW_SYNC)) #define READ_META (READ | (1 << BIO_RW_META)) #define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNC)) +#define SWRITE_SYNC (SWRITE | (1 << BIO_RW_SYNC)) #define WRITE_BARRIER ((1 << BIO_RW) | (1 << BIO_RW_BARRIER)) #define SEL_IN 1 -- cgit v1.2.3 From 2e4bef41a0f7df31be140ef354b9c12f2299016a Mon Sep 17 00:00:00 2001 From: Eric Van Hensbergen Date: Tue, 24 Jun 2008 17:39:39 -0500 Subject: 9p: fix O_APPEND in legacy mode The legacy protocol's open operation doesn't handle an append operation (it is expected that the client take care of it). We were incorrectly passing the extended protocol's flag through even in legacy mode. This was reported in bugzilla report #10689. This patch fixes the problem by disallowing extended protocol open modes from being passed in legacy mode and implemented append functionality on the client side by adding a seek after the open. Signed-off-by: Eric Van Hensbergen --- fs/9p/v9fs_vfs.h | 2 +- fs/9p/vfs_file.c | 4 +++- fs/9p/vfs_inode.c | 18 ++++++++++-------- 3 files changed, 14 insertions(+), 10 deletions(-) (limited to 'fs') diff --git a/fs/9p/v9fs_vfs.h b/fs/9p/v9fs_vfs.h index fd01d90cada5..57997fa14e69 100644 --- a/fs/9p/v9fs_vfs.h +++ b/fs/9p/v9fs_vfs.h @@ -51,4 +51,4 @@ int v9fs_dir_release(struct inode *inode, struct file *filp); int v9fs_file_open(struct inode *inode, struct file *file); void v9fs_inode2stat(struct inode *inode, struct p9_stat *stat); void v9fs_dentry_release(struct dentry *); -int v9fs_uflags2omode(int uflags); +int v9fs_uflags2omode(int uflags, int extended); diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c index 0d55affe37d4..52944d2249a4 100644 --- a/fs/9p/vfs_file.c +++ b/fs/9p/vfs_file.c @@ -59,7 +59,7 @@ int v9fs_file_open(struct inode *inode, struct file *file) P9_DPRINTK(P9_DEBUG_VFS, "inode: %p file: %p \n", inode, file); v9ses = v9fs_inode2v9ses(inode); - omode = v9fs_uflags2omode(file->f_flags); + omode = v9fs_uflags2omode(file->f_flags, v9fs_extended(v9ses)); fid = file->private_data; if (!fid) { fid = v9fs_fid_clone(file->f_path.dentry); @@ -75,6 +75,8 @@ int v9fs_file_open(struct inode *inode, struct file *file) inode->i_size = 0; inode->i_blocks = 0; } + if ((file->f_flags & O_APPEND) && (!v9fs_extended(v9ses))) + generic_file_llseek(file, 0, SEEK_END); } file->private_data = fid; diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c index 40fa807bd929..c95295c65045 100644 --- a/fs/9p/vfs_inode.c +++ b/fs/9p/vfs_inode.c @@ -132,10 +132,10 @@ static int p9mode2unixmode(struct v9fs_session_info *v9ses, int mode) /** * v9fs_uflags2omode- convert posix open flags to plan 9 mode bits * @uflags: flags to convert - * + * @extended: if .u extensions are active */ -int v9fs_uflags2omode(int uflags) +int v9fs_uflags2omode(int uflags, int extended) { int ret; @@ -155,14 +155,16 @@ int v9fs_uflags2omode(int uflags) break; } - if (uflags & O_EXCL) - ret |= P9_OEXCL; - if (uflags & O_TRUNC) ret |= P9_OTRUNC; - if (uflags & O_APPEND) - ret |= P9_OAPPEND; + if (extended) { + if (uflags & O_EXCL) + ret |= P9_OEXCL; + + if (uflags & O_APPEND) + ret |= P9_OAPPEND; + } return ret; } @@ -506,7 +508,7 @@ v9fs_vfs_create(struct inode *dir, struct dentry *dentry, int mode, flags = O_RDWR; fid = v9fs_create(v9ses, dir, dentry, NULL, perm, - v9fs_uflags2omode(flags)); + v9fs_uflags2omode(flags, v9fs_extended(v9ses))); if (IS_ERR(fid)) { err = PTR_ERR(fid); fid = NULL; -- cgit v1.2.3 From f5c8f7dae75e1e6bb3200fc61302e4d5e2df3dc2 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 4 Jul 2008 09:59:33 -0700 Subject: ext3: add missing unlock to error path in ext3_quota_write() When write in ext3_quota_write() fails, we have to properly release i_mutex. One error path has been missing the unlock... Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ext3/super.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/ext3/super.c b/fs/ext3/super.c index fe3119a71ada..2845425077e8 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -2875,8 +2875,10 @@ static ssize_t ext3_quota_write(struct super_block *sb, int type, blk++; } out: - if (len == towrite) + if (len == towrite) { + mutex_unlock(&inode->i_mutex); return err; + } if (inode->i_size < off+len-towrite) { i_size_write(inode, off+len-towrite); EXT3_I(inode)->i_disksize = inode->i_size; -- cgit v1.2.3 From 4d04e4fbf8fc9f5136a64d45e2c20de095c08efb Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 4 Jul 2008 09:59:34 -0700 Subject: ext4: add missing unlock to an error path in ext4_quota_write() When write in ext4_quota_write() fails, we have to properly release i_mutex. One error path has been missing the unlock... Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ext4/super.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/ext4/super.c b/fs/ext4/super.c index cb96f127c366..02bf24343979 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3337,8 +3337,10 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type, blk++; } out: - if (len == towrite) + if (len == towrite) { + mutex_unlock(&inode->i_mutex); return err; + } if (inode->i_size < off+len-towrite) { i_size_write(inode, off+len-towrite); EXT4_I(inode)->i_disksize = inode->i_size; -- cgit v1.2.3 From 10dd08dc04c881dcc9f7f19e2a3ad8e0778e4db5 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 4 Jul 2008 09:59:34 -0700 Subject: reiserfs: add missing unlock to an error path in reiserfs_quota_write() When write in reiserfs_quota_write() fails, we have to properly release i_mutex. One error path has been missing the unlock... Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/reiserfs/super.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index ed424d708e69..1d40f2bd1970 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -2165,8 +2165,10 @@ static ssize_t reiserfs_quota_write(struct super_block *sb, int type, blk++; } out: - if (len == towrite) + if (len == towrite) { + mutex_unlock(&inode->i_mutex); return err; + } if (inode->i_size < off + len - towrite) i_size_write(inode, off + len - towrite); inode->i_version++; -- cgit v1.2.3 From c4a2d7fbec3029c8891a3ad5fceec2992096a3b7 Mon Sep 17 00:00:00 2001 From: Michael Halcrow Date: Fri, 4 Jul 2008 09:59:35 -0700 Subject: ecryptfs: remove unnecessary mux from ecryptfs_init_ecryptfs_miscdev() The misc_mtx should provide all the protection required to keep the daemon hash table sane during miscdev registration. Since this mutex is causing gratuitous lockdep warnings, this patch removes it. Signed-off-by: Michael Halcrow Reported-by: Cyrill Gorcunov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ecryptfs/miscdev.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'fs') diff --git a/fs/ecryptfs/miscdev.c b/fs/ecryptfs/miscdev.c index 50c994a249a5..09a4522f65e6 100644 --- a/fs/ecryptfs/miscdev.c +++ b/fs/ecryptfs/miscdev.c @@ -575,13 +575,11 @@ int ecryptfs_init_ecryptfs_miscdev(void) int rc; atomic_set(&ecryptfs_num_miscdev_opens, 0); - mutex_lock(&ecryptfs_daemon_hash_mux); rc = misc_register(&ecryptfs_miscdev); if (rc) printk(KERN_ERR "%s: Failed to register miscellaneous device " "for communications with userspace daemons; rc = [%d]\n", __func__, rc); - mutex_unlock(&ecryptfs_daemon_hash_mux); return rc; } -- cgit v1.2.3 From 337e2ab5d1efca56c6fdd57bffaea7e7899e7283 Mon Sep 17 00:00:00 2001 From: Jess Guerrero Date: Fri, 4 Jul 2008 09:59:50 -0700 Subject: ntfs: update help text The url in the help text for ntfs should be updated. Acked-by: Anton Altaparmakov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/Kconfig b/fs/Kconfig index cf12c403b8c7..2694648cbd1b 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -830,7 +830,7 @@ config NTFS_FS from the project web site. For more information see - and . + and . To compile this file system support as a module, choose M here: the module will be called ntfs. -- cgit v1.2.3 From 6d1029b56329b1cc9b7233e5333c1a48ddbbfad8 Mon Sep 17 00:00:00 2001 From: Akinobu Mita Date: Fri, 4 Jul 2008 09:59:51 -0700 Subject: add kernel-doc for simple_read_from_buffer and memory_read_from_buffer Add kernel-doc comments describing simple_read_from_buffer and memory_read_from_buffer. Signed-off-by: Akinobu Mita Cc: Christoph Hellwig Cc: "Randy.Dunlap" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/libfs.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'fs') diff --git a/fs/libfs.c b/fs/libfs.c index 892d41cb3382..baeb71ee1cde 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -512,6 +512,20 @@ void simple_release_fs(struct vfsmount **mount, int *count) mntput(mnt); } +/** + * simple_read_from_buffer - copy data from the buffer to user space + * @to: the user space buffer to read to + * @count: the maximum number of bytes to read + * @ppos: the current position in the buffer + * @from: the buffer to read from + * @available: the size of the buffer + * + * The simple_read_from_buffer() function reads up to @count bytes from the + * buffer @from at offset @ppos into the user space address starting at @to. + * + * On success, the number of bytes read is returned and the offset @ppos is + * advanced by this number, or negative value is returned on error. + **/ ssize_t simple_read_from_buffer(void __user *to, size_t count, loff_t *ppos, const void *from, size_t available) { @@ -528,6 +542,20 @@ ssize_t simple_read_from_buffer(void __user *to, size_t count, loff_t *ppos, return count; } +/** + * memory_read_from_buffer - copy data from the buffer + * @to: the kernel space buffer to read to + * @count: the maximum number of bytes to read + * @ppos: the current position in the buffer + * @from: the buffer to read from + * @available: the size of the buffer + * + * The memory_read_from_buffer() function reads up to @count bytes from the + * buffer @from at offset @ppos into the kernel space address starting at @to. + * + * On success, the number of bytes read is returned and the offset @ppos is + * advanced by this number, or negative value is returned on error. + **/ ssize_t memory_read_from_buffer(void *to, size_t count, loff_t *ppos, const void *from, size_t available) { -- cgit v1.2.3 From 086f7316f0d400806d76323beefae996bb3849b1 Mon Sep 17 00:00:00 2001 From: "Andrew G. Morgan" Date: Fri, 4 Jul 2008 09:59:58 -0700 Subject: security: filesystem capabilities: fix fragile setuid fixup code This commit includes a bugfix for the fragile setuid fixup code in the case that filesystem capabilities are supported (in access()). The effect of this fix is gated on filesystem capability support because changing securebits is only supported when filesystem capabilities support is configured.) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andrew G. Morgan Acked-by: Serge Hallyn Acked-by: David Howells Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/open.c | 37 ++++++++++++++++++++++--------------- include/linux/capability.h | 2 ++ include/linux/securebits.h | 15 ++++++++------- kernel/capability.c | 21 +++++++++++++++++++++ 4 files changed, 53 insertions(+), 22 deletions(-) (limited to 'fs') diff --git a/fs/open.c b/fs/open.c index a1450086e92f..a99ad09c3197 100644 --- a/fs/open.c +++ b/fs/open.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -425,7 +426,7 @@ asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode) { struct nameidata nd; int old_fsuid, old_fsgid; - kernel_cap_t old_cap; + kernel_cap_t uninitialized_var(old_cap); /* !SECURE_NO_SETUID_FIXUP */ int res; if (mode & ~S_IRWXO) /* where's F_OK, X_OK, W_OK, R_OK? */ @@ -433,23 +434,27 @@ asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode) old_fsuid = current->fsuid; old_fsgid = current->fsgid; - old_cap = current->cap_effective; current->fsuid = current->uid; current->fsgid = current->gid; - /* - * Clear the capabilities if we switch to a non-root user - * - * FIXME: There is a race here against sys_capset. The - * capabilities can change yet we will restore the old - * value below. We should hold task_capabilities_lock, - * but we cannot because user_path_walk can sleep. - */ - if (current->uid) - cap_clear(current->cap_effective); - else - current->cap_effective = current->cap_permitted; + if (!issecure(SECURE_NO_SETUID_FIXUP)) { + /* + * Clear the capabilities if we switch to a non-root user + */ +#ifndef CONFIG_SECURITY_FILE_CAPABILITIES + /* + * FIXME: There is a race here against sys_capset. The + * capabilities can change yet we will restore the old + * value below. We should hold task_capabilities_lock, + * but we cannot because user_path_walk can sleep. + */ +#endif /* ndef CONFIG_SECURITY_FILE_CAPABILITIES */ + if (current->uid) + old_cap = cap_set_effective(__cap_empty_set); + else + old_cap = cap_set_effective(current->cap_permitted); + } res = __user_walk_fd(dfd, filename, LOOKUP_FOLLOW|LOOKUP_ACCESS, &nd); if (res) @@ -478,7 +483,9 @@ out_path_release: out: current->fsuid = old_fsuid; current->fsgid = old_fsgid; - current->cap_effective = old_cap; + + if (!issecure(SECURE_NO_SETUID_FIXUP)) + cap_set_effective(old_cap); return res; } diff --git a/include/linux/capability.h b/include/linux/capability.h index fa830f8de032..02673846d205 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -501,6 +501,8 @@ extern const kernel_cap_t __cap_empty_set; extern const kernel_cap_t __cap_full_set; extern const kernel_cap_t __cap_init_eff_set; +kernel_cap_t cap_set_effective(const kernel_cap_t pE_new); + int capable(int cap); int __capable(struct task_struct *t, int cap); diff --git a/include/linux/securebits.h b/include/linux/securebits.h index c1f19dbceb05..92f09bdf1175 100644 --- a/include/linux/securebits.h +++ b/include/linux/securebits.h @@ -7,14 +7,15 @@ inheritance of root-permissions and suid-root executable under compatibility mode. We raise the effective and inheritable bitmasks *of the executable file* if the effective uid of the new process is - 0. If the real uid is 0, we raise the inheritable bitmask of the + 0. If the real uid is 0, we raise the effective (legacy) bit of the executable file. */ #define SECURE_NOROOT 0 #define SECURE_NOROOT_LOCKED 1 /* make bit-0 immutable */ -/* When set, setuid to/from uid 0 does not trigger capability-"fixes" - to be compatible with old programs relying on set*uid to loose - privileges. When unset, setuid doesn't change privileges. */ +/* When set, setuid to/from uid 0 does not trigger capability-"fixup". + When unset, to provide compatiblility with old programs relying on + set*uid to gain/lose privilege, transitions to/from uid 0 cause + capabilities to be gained/lost. */ #define SECURE_NO_SETUID_FIXUP 2 #define SECURE_NO_SETUID_FIXUP_LOCKED 3 /* make bit-2 immutable */ @@ -26,10 +27,10 @@ #define SECURE_KEEP_CAPS 4 #define SECURE_KEEP_CAPS_LOCKED 5 /* make bit-4 immutable */ -/* Each securesetting is implemented using two bits. One bit specify +/* Each securesetting is implemented using two bits. One bit specifies whether the setting is on or off. The other bit specify whether the - setting is fixed or not. A setting which is fixed cannot be changed - from user-level. */ + setting is locked or not. A setting which is locked cannot be + changed from user-level. */ #define issecure_mask(X) (1 << (X)) #define issecure(X) (issecure_mask(X) & current->securebits) diff --git a/kernel/capability.c b/kernel/capability.c index cfbe44299488..901e0fdc3fff 100644 --- a/kernel/capability.c +++ b/kernel/capability.c @@ -121,6 +121,27 @@ static int cap_validate_magic(cap_user_header_t header, unsigned *tocopy) * uninteresting and/or not to be changed. */ +/* + * Atomically modify the effective capabilities returning the original + * value. No permission check is performed here - it is assumed that the + * caller is permitted to set the desired effective capabilities. + */ +kernel_cap_t cap_set_effective(const kernel_cap_t pE_new) +{ + kernel_cap_t pE_old; + + spin_lock(&task_capability_lock); + + pE_old = current->cap_effective; + current->cap_effective = pE_new; + + spin_unlock(&task_capability_lock); + + return pE_old; +} + +EXPORT_SYMBOL(cap_set_effective); + /** * sys_capget - get the capabilities of a given process. * @header: pointer to struct that contains capability version and -- cgit v1.2.3 From 20cbc972617069c1ed434f62151e4de57d26ea46 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Sat, 5 Jul 2008 12:29:05 -0700 Subject: Fix clear_refs_write() use of struct mm_walk Don't use a static entry, so as to prevent races during concurrent use of this function. Reported-by: Alexey Dobriyan Cc: Matt Mackall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/task_mmu.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index ab8ccc9d14ff..05053d701ac5 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -476,10 +476,10 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, return -ESRCH; mm = get_task_mm(task); if (mm) { - static struct mm_walk clear_refs_walk; - memset(&clear_refs_walk, 0, sizeof(clear_refs_walk)); - clear_refs_walk.pmd_entry = clear_refs_pte_range; - clear_refs_walk.mm = mm; + struct mm_walk clear_refs_walk = { + .pmd_entry = clear_refs_pte_range, + .mm = mm, + }; down_read(&mm->mmap_sem); for (vma = mm->mmap; vma; vma = vma->vm_next) { clear_refs_walk.private = vma; -- cgit v1.2.3 From 5d7e0d2bd98ef4f5a16ac9da1987ae655368dd6a Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Sat, 5 Jul 2008 01:02:01 -0700 Subject: Fix pagemap_read() use of struct mm_walk Fix some issues in pagemap_read noted by Alexey: - initialize pagemap_walk.mm to "mm" , so the code starts working as advertised - initialize ->private to "&pm" so it wouldn't immediately oops in pagemap_pte_hole() - unstatic struct pagemap_walk, so two threads won't fsckup each other (including those started by root, including flipping ->mm when you don't have permissions) - pagemap_read() contains two calls to ptrace_may_attach(), second one looks unneeded. - avoid possible kmalloc(0) and integer wraparound. Cc: Alexey Dobriyan Cc: Matt Mackall Signed-off-by: Andrew Morton [ Personally, I'd just remove the functionality entirely - Linus ] Signed-off-by: Linus Torvalds --- fs/proc/task_mmu.c | 72 ++++++++++++++++++++++++++++-------------------------- 1 file changed, 38 insertions(+), 34 deletions(-) (limited to 'fs') diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 05053d701ac5..c492449f3b45 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -602,11 +602,6 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, return err; } -static struct mm_walk pagemap_walk = { - .pmd_entry = pagemap_pte_range, - .pte_hole = pagemap_pte_hole -}; - /* * /proc/pid/pagemap - an array mapping virtual pages to pfns * @@ -641,6 +636,11 @@ static ssize_t pagemap_read(struct file *file, char __user *buf, struct pagemapread pm; int pagecount; int ret = -ESRCH; + struct mm_walk pagemap_walk; + unsigned long src; + unsigned long svpfn; + unsigned long start_vaddr; + unsigned long end_vaddr; if (!task) goto out; @@ -659,11 +659,15 @@ static ssize_t pagemap_read(struct file *file, char __user *buf, if (!mm) goto out_task; - ret = -ENOMEM; + uaddr = (unsigned long)buf & PAGE_MASK; uend = (unsigned long)(buf + count); pagecount = (PAGE_ALIGN(uend) - uaddr) / PAGE_SIZE; - pages = kmalloc(pagecount * sizeof(struct page *), GFP_KERNEL); + ret = 0; + if (pagecount == 0) + goto out_mm; + pages = kcalloc(pagecount, sizeof(struct page *), GFP_KERNEL); + ret = -ENOMEM; if (!pages) goto out_mm; @@ -684,33 +688,33 @@ static ssize_t pagemap_read(struct file *file, char __user *buf, pm.out = (u64 *)buf; pm.end = (u64 *)(buf + count); - if (!ptrace_may_attach(task)) { - ret = -EIO; - } else { - unsigned long src = *ppos; - unsigned long svpfn = src / PM_ENTRY_BYTES; - unsigned long start_vaddr = svpfn << PAGE_SHIFT; - unsigned long end_vaddr = TASK_SIZE_OF(task); - - /* watch out for wraparound */ - if (svpfn > TASK_SIZE_OF(task) >> PAGE_SHIFT) - start_vaddr = end_vaddr; - - /* - * The odds are that this will stop walking way - * before end_vaddr, because the length of the - * user buffer is tracked in "pm", and the walk - * will stop when we hit the end of the buffer. - */ - ret = walk_page_range(start_vaddr, end_vaddr, - &pagemap_walk); - if (ret == PM_END_OF_BUFFER) - ret = 0; - /* don't need mmap_sem for these, but this looks cleaner */ - *ppos += (char *)pm.out - buf; - if (!ret) - ret = (char *)pm.out - buf; - } + pagemap_walk.pmd_entry = pagemap_pte_range; + pagemap_walk.pte_hole = pagemap_pte_hole; + pagemap_walk.mm = mm; + pagemap_walk.private = ± + + src = *ppos; + svpfn = src / PM_ENTRY_BYTES; + start_vaddr = svpfn << PAGE_SHIFT; + end_vaddr = TASK_SIZE_OF(task); + + /* watch out for wraparound */ + if (svpfn > TASK_SIZE_OF(task) >> PAGE_SHIFT) + start_vaddr = end_vaddr; + + /* + * The odds are that this will stop walking way + * before end_vaddr, because the length of the + * user buffer is tracked in "pm", and the walk + * will stop when we hit the end of the buffer. + */ + ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk); + if (ret == PM_END_OF_BUFFER) + ret = 0; + /* don't need mmap_sem for these, but this looks cleaner */ + *ppos += (char *)pm.out - buf; + if (!ret) + ret = (char *)pm.out - buf; out_pages: for (; pagecount; pagecount--) { -- cgit v1.2.3 From 18c6ac383f3e46cfce08d0bf972705852a4e1268 Mon Sep 17 00:00:00 2001 From: Sunil Mushran Date: Mon, 7 Jul 2008 10:06:29 -0700 Subject: [PATCH] ocfs2/dlm: Fixes oops in dlm_new_lockres() Patch fixes a race that can result in an oops while adding a lockres to the dlm lockres tracking list. Bug introduced by mainline commit 29576f8bb54045be944ba809d4fca1ad77c94165. Signed-off-by: Sunil Mushran Signed-off-by: Mark Fasheh --- fs/ocfs2/dlm/dlmmaster.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c index efc015c6128a..44f87caf3683 100644 --- a/fs/ocfs2/dlm/dlmmaster.c +++ b/fs/ocfs2/dlm/dlmmaster.c @@ -606,7 +606,9 @@ static void dlm_init_lockres(struct dlm_ctxt *dlm, res->last_used = 0; + spin_lock(&dlm->spinlock); list_add_tail(&res->tracking, &dlm->tracking_list); + spin_unlock(&dlm->spinlock); memset(res->lvb, 0, DLM_LVB_LEN); memset(res->refmap, 0, sizeof(res->refmap)); -- cgit v1.2.3 From 2aac05a91971fbd1bf6cbed78b8731eb7454b9b7 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 7 Jul 2008 13:26:10 -0400 Subject: NFS: Fix readdir cache invalidation invalidate_inode_pages2_range() takes page offset arguments, not byte ranges. Another thought is that individual pages might perhaps get evicted by VM pressure, in which case we might perhaps want to re-read not only the evicted page, but all subsequent pages too (in case the server returns more/less data per page so that the alignment of the next entry changes). We should therefore remove the condition that we only do this on page->index==0. Signed-off-by: Trond Myklebust --- fs/nfs/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 58d43daec084..982a2064fe4c 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -204,7 +204,7 @@ int nfs_readdir_filler(nfs_readdir_descriptor_t *desc, struct page *page) * Note: assumes we have exclusive access to this mapping either * through inode->i_mutex or some other mechanism. */ - if (page->index == 0 && invalidate_inode_pages2_range(inode->i_mapping, PAGE_CACHE_SIZE, -1) < 0) { + if (invalidate_inode_pages2_range(inode->i_mapping, page->index + 1, -1) < 0) { /* Should never happen */ nfs_zap_mapping(inode, inode->i_mapping); } -- cgit v1.2.3 From eb35c218d83ec0780d9db869310f2e333f628702 Mon Sep 17 00:00:00 2001 From: Jeff Mahoney Date: Tue, 8 Jul 2008 14:37:06 -0400 Subject: reiserfs: discard prealloc in reiserfs_delete_inode With the removal of struct file from the xattr code, reiserfs_file_release() isn't used anymore, so the prealloc isn't discarded. This causes hangs later down the line. This patch adds it to reiserfs_delete_inode. In most cases it will be a no-op due to it already having been called, but will avoid hangs with xattrs. Signed-off-by: Jeff Mahoney Signed-off-by: Linus Torvalds --- fs/reiserfs/inode.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index 57917932212e..192269698a8a 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -45,6 +45,8 @@ void reiserfs_delete_inode(struct inode *inode) goto out; reiserfs_update_inode_transaction(inode); + reiserfs_discard_prealloc(&th, inode); + err = reiserfs_delete_object(&th, inode); /* Do quota update inside a transaction for journaled quotas. We must do that -- cgit v1.2.3