summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Revert "mm: correctly synchronize rss-counters at exit/exec"Linus Torvalds2012-06-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 40af1bbdca47e5c8a2044039bb78ca8fd8b20f94. It's horribly and utterly broken for at least the following reasons: - calling sync_mm_rss() from mmput() is fundamentally wrong, because there's absolutely no reason to believe that the task that does the mmput() always does it on its own VM. Example: fork, ptrace, /proc - you name it. - calling it *after* having done mmdrop() on it is doubly insane, since the mm struct may well be gone now. - testing mm against NULL before you call it is insane too, since a NULL mm there would have caused oopses long before. .. and those are just the three bugs I found before I decided to give up looking for me and revert it asap. I should have caught it before I even took it, but I trusted Andrew too much. Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: correctly synchronize rss-counters at exit/execKonstantin Khlebnikov2012-06-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mm->rss_stat counters have per-task delta: task->rss_stat. Before changing task->mm pointer the kernel must flush this delta with sync_mm_rss(). do_exit() already calls sync_mm_rss() to flush the rss-counters before committing the rss statistics into task->signal->maxrss, taskstats, audit and other stuff. Unfortunately the kernel does this before calling mm_release(), which can call put_user() for processing task->clear_child_tid. So at this point we can trigger page-faults and task->rss_stat becomes non-zero again. As a result mm->rss_stat becomes inconsistent and check_mm() will print something like this: | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1 | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1 This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of do_exit() and calls it earlier. After mm_release() there should be no pagefaults. [akpm@linux-foundation.org: tweak comment] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> [3.4.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2012-06-055-10/+74
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix blksize calculation fuse: fix stat call on 32 bit platforms fuse: optimize fallocate on permanent failure fuse: add FALLOCATE operation fuse: Convert to kstrtoul_from_user
| * fuse: fix blksize calculationMiklos Szeredi2012-05-141-1/+9
| | | | | | | | | | | | | | Don't use inode->i_blkbits which might be stale, instead calculate the blksize information from the freshly obtained attributes. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * fuse: fix stat call on 32 bit platformsPavel Shilovsky2012-05-143-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we store attr->ino at inode->i_ino, return attr->ino at the first time and then return inode->i_ino if the attribute timeout isn't expired. That's wrong on 32 bit platforms because attr->ino is 64 bit and inode->i_ino is 32 bit in this case. Fix this by saving 64 bit ino in fuse_inode structure and returning it every time we call getattr. Also squash attr->ino into inode->i_ino explicitly. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * fuse: optimize fallocate on permanent failureMiklos Szeredi2012-04-262-0/+10
| | | | | | | | | | | | | | If userspace filesystem doesn't support fallocate, remember this and don't send request next time. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * fuse: add FALLOCATE operationAnatol Pomozov2012-04-251-0/+33
| | | | | | | | | | | | | | | | | | fallocate filesystem operation preallocates media space for the given file. If fallocate returns success then any subsequent write to the given range never fails with 'not enough space' error. Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * fuse: Convert to kstrtoul_from_userPeter Huewe2012-04-251-8/+2
| | | | | | | | | | | | | | | | | | This patch replaces the code for getting an number from a userspace buffer by a simple call to kstroul_from_user. This makes it easier to read and less error prone. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* | Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2012-06-058-143/+167
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French. * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Move get_next_mid to ops struct CIFS: Make accessing is_valid_oplock/dump_detail ops struct field safe CIFS: Improve identation in cifs_unlock_range CIFS: Fix possible wrong memory allocation
| * | CIFS: Move get_next_mid to ops structPavel Shilovsky2012-06-017-95/+103
| | | | | | | | | | | | | | | | | | Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | CIFS: Make accessing is_valid_oplock/dump_detail ops struct field safePavel Shilovsky2012-06-011-2/+4
| | | | | | | | | | | | | | | Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | CIFS: Improve identation in cifs_unlock_rangePavel Shilovsky2012-06-011-40/+35
| | | | | | | | | | | | | | | Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | CIFS: Fix possible wrong memory allocationPavel Shilovsky2012-06-011-6/+25
| | | | | | | | | | | | | | | | | | | | | | | | when cifs_reconnect sets maxBuf to 0 and we try to calculate a size of memory we need to store locks. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | vfs: Fix /proc/<tid>/fdinfo/<fd> file handlingLinus Torvalds2012-06-041-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cyrill Gorcunov reports that I broke the fdinfo files with commit 30a08bf2d31d ("proc: move fd symlink i_mode calculations into tid_fd_revalidate()"), and he's quite right. The tid_fd_revalidate() function is not just used for the <tid>/fd symlinks, it's also used for the <tid>/fdinfo/<fd> files, and the permission model for those are different. So do the dynamic symlink permission handling just for symlinks, making the fdinfo files once more appear as the proper regular files they are. Of course, Al Viro argued (probably correctly) that we shouldn't do the symlink permission games at all, and make the symlinks always just be the normal 'lrwxrwxrwx'. That would have avoided this issue too, but since somebody noticed that the permissions had changed (which was the reason for that original commit 30a08bf2d31d in the first place), people do apparently use this feature. [ Basically, you can use the symlink permission data as a cheap "fdinfo" replacement, since you see whether the file is open for reading and/or writing by just looking at st_mode of the symlink. So the feature does make sense, even if the pain it has caused means we probably shouldn't have done it to begin with. ] Reported-and-tested-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'akpm' (Fixups for Andrew's patchbomb)Linus Torvalds2012-06-0213-45/+33
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge fixups for the mac NLS tables from Andrew. * emailed from Andrew Morton, and one cleanup by me: nls: fix (and rename) mac NLS table files and config options fs/nls/Makefile: remove bogus CONFIG_ assignments
| * | | nls: fix (and rename) mac NLS table files and config optionsLinus Torvalds2012-06-0213-33/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The config options in the Kconfig file (with _CODEPAGE_ in the name) didn't match the config option name in the Makefile (no _CODEPAGE_). And both of them were of the hard-to-read MACXYZZY variety, which made them hard to parse for normal humans: MACROMAN easily reads as "macro man", not as "Mac Roman". So rename the options to be consistent, and be NLS_MAC_xyzzy. Rename the files to be mac-xyzzy.c too, and drop the "nls" part entirely (it's already in the directory name). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | fs/nls/Makefile: remove bogus CONFIG_ assignmentsAndrew Morton2012-06-021-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These were debug things which snuck through. Reported-by: Yinghai Lu <yinghai@kernel.org> Cc: Vladimir Serbinenko <phcoder@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtdLinus Torvalds2012-06-026-13/+97
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull mtd update from David Woodhouse: - More robust parsing especially of xattr data in JFFS2 - Updates to mxc_nand and gpmi drivers to support new boards and device tree - Improve consistency of information about ECC strength in NAND devices - Clean up partition handling of plat_nand - Support NAND drivers without dedicated access to OOB area - BCH hardware ECC support for OMAP - Other fixes and cleanups, and a few new device IDs Fixed trivial conflict in drivers/mtd/nand/gpmi-nand/gpmi-nand.c due to added include files next to each other. * tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd: (75 commits) mtd: mxc_nand: move ecc strengh setup before nand_scan_tail mtd: block2mtd: fix recursive call of mtd_writev mtd: gpmi-nand: define ecc.strength mtd: of_parts: fix breakage in Kconfig mtd: nand: fix scan_read_raw_oob mtd: docg3 fix in-middle of blocks reads mtd: cfi_cmdset_0002: Slight cleanup of fixup messages mtd: add fixup for S29NS512P NOR flash. jffs2: allow to complete xattr integrity check on first GC scan jffs2: allow to discriminate between recoverable and non-recoverable errors mtd: nand: omap: add support for hardware BCH ecc ARM: OMAP3: gpmc: add BCH ecc api and modes mtd: nand: check the return code of 'read_oob/read_oob_raw' mtd: nand: remove 'sndcmd' parameter of 'read_oob/read_oob_raw' mtd: m25p80: Add support for Winbond W25Q80BW jffs2: get rid of jffs2_sync_super jffs2: remove unnecessary GC pass on sync jffs2: remove unnecessary GC pass on umount jffs2: remove lock_super mtd: gpmi: add gpmi support for mx6q ...
| * | | | jffs2: allow to complete xattr integrity check on first GC scanJean-Christophe DUBOIS2012-05-143-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike file data integrity the xattr data integrity was not checked before some explicit access to the attribute was made. This could leave in the system a number of corrupted extended attributes which will be detected only at access time and possibly at a very late time compared to the time the corruption actually happened. This patch adds the ability to check for extended attribute integrity on first GC scan pass (similar to file data integrity check). This allows for all present attributes to be completly verified before any use of them. In order to work correctly this patch also needs the patch allowing JFFS2 to discriminate between recoverable and non recoverable errors on extended attributes. Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | jffs2: allow to discriminate between recoverable and non-recoverable errorsJean-Christophe DUBOIS2012-05-141-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is basically a revert of commit f326966b3df47f4fa7e90425f60efdd30c31fe19. It allows JFFS2 to make the distinction between a potential transient error (reading or writing the media) and a non recoverable error like a bad CRC on the extended attribute data or some insconsitent parameters. In order to make clear that the error is indeed intended to report a corrupted attribute, a new local error code (JFFS2_XATTR_IS_CORRUPTED) is introduced rather than returning a confusing positive EIO, which is what led to the inappropriate "fix" last time. This error code is never reported to user space and only checked locally in this file. Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | jffs2: get rid of jffs2_sync_superArtem Bityutskiy2012-05-144-21/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently JFFS2 file-system maps the VFS "superblock" abstraction to the write-buffer. Namely, it uses VFS services to synchronize the write-buffer periodically. The whole "superblock write-out" VFS infrastructure is served by the 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and writes out all dirty superblock using the '->write_super()' call-back. But the problem with this thread is that it wastes power by waking up the system every 5 seconds no matter what. So we want to kill it completely and thus, we need to make file-systems to stop using the '->write_super' VFS service, and then remove it together with the kernel thread. This patch switches the JFFS2 write-buffer management from '->write_super()'/'->s_dirt' to a delayed work. Instead of setting the 's_dirt' flag we just schedule a delayed work for synchronizing the write-buffer. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | jffs2: remove unnecessary GC pass on syncArtem Bityutskiy2012-05-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not need to call 'jffs2_write_super()' on sync. This function causes a GC pass to make sure the current contents is pushed out with the data which we already have on the media. But this is not needed on unmount and only slows sync down unnecessarily. It is enough to just sync the write-buffer. This call was added by one of the generic VFS rework patch-sets, see d579ed00aa96a7f7486978540a0d7cecaff742ae. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | jffs2: remove unnecessary GC pass on umountArtem Bityutskiy2012-05-141-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not need to call 'jffs2_write_super()' on unmount. This function causes a GC pass to make sure the current contents is pushed out with the data which we already have on the media. But this is not needed on unmount and only slows unmount down unnecessarily. It is enough to just sync the write-buffer. This call was added by one of the generic VFS rework patch-sets, see 8c85e125124a473d6f3e9bb187b0b84207f81d91. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | jffs2: remove lock_superArtem Bityutskiy2012-05-141-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not need 'lock_super()'/'unlock_super()' in JFFS2 - kill them. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | jffs2: refactor csize usage in jffs2_do_read_inode_internal()Xi Wang2012-05-141-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the verbose `je32_to_cpu(latest_node->csize)' with a shorter `csize'. Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | jffs2: validate symlink size in jffs2_do_read_inode_internal()Xi Wang2012-05-141-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `csize' is read from disk and thus needs validation. Otherwise a bogus value 0xffffffff would turn the subsequent kmalloc(csize + 1, ...) into kmalloc(0, ...), leading to out-of-bounds write. This patch limits `csize' to JFFS2_MAX_NAME_LEN, which is also used in jffs2_symlink(). Artem: we actually validate csize by checking CRC, so this 0xFFs cannot come from empty flash region. But I guess an attacker could feed JFFS2 an image with random csize value, including 0xFFs. Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | JFFS2: Add parameter to reserve disk space for rootDaniel Drake2012-05-143-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new rp_size= parameter which creates a "reserved pool" of disk space which can only be used by root. Other users are not permitted to write to disk when the available space is less than the pool size. Based on original code by Artem Bityutskiy in https://dev.laptop.org/ticket/5317 [dwmw2: use capable(CAP_SYS_RESOURCE) not uid/gid check, fix debug prints] Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-06-013-12/+0
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull third pile of signal handling patches from Al Viro: "This time it's mostly helpers and conversions to them; there's a lot of stuff remaining in the tree, but that'll either go in -rc2 (isolated bug fixes, ideally via arch maintainers' trees) or will sit there until the next cycle." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: x86: get rid of calling do_notify_resume() when returning to kernel mode blackfin: check __get_user() return value whack-a-mole with TIF_FREEZE FRV: Optimise the system call exit path in entry.S [ver #2] FRV: Shrink TIF_WORK_MASK [ver #2] FRV: Prevent syscall exit tracing and notify_resume at end of kernel exceptions new helper: signal_delivered() powerpc: get rid of restore_sigmask() most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set set_restore_sigmask() is never called without SIGPENDING (and never should be) TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set don't call try_to_freeze() from do_signal() pull clearing RESTORE_SIGMASK into block_sigmask() sh64: failure to build sigframe != signal without handler openrisc: tracehook_signal_handler() is supposed to be called on success new helper: sigmask_to_save() new helper: restore_saved_sigmask() new helpers: {clear,test,test_and_clear}_restore_sigmask() HAVE_RESTORE_SIGMASK is defined on all architectures now
| * | | | | HAVE_RESTORE_SIGMASK is defined on all architectures nowAl Viro2012-06-013-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Everyone either defines it in arch thread_info.h or has TIF_RESTORE_SIGMASK and picks default set_restore_sigmask() in linux/thread_info.h. Kill the ifdefs, slap #error in linux/thread_info.h to catch breakage when new ones get merged. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-06-0189-1081/+1175
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs changes from Al Viro. "A lot of misc stuff. The obvious groups: * Miklos' atomic_open series; kills the damn abuse of ->d_revalidate() by NFS, which was the major stumbling block for all work in that area. * ripping security_file_mmap() and dealing with deadlocks in the area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in general. * ->encode_fh() switched to saner API; insane fake dentry in mm/cleancache.c gone. * assorted annotations in fs (endianness, __user) * parts of Artem's ->s_dirty work (jff2 and reiserfs parts) * ->update_time() work from Josef. * other bits and pieces all over the place. Normally it would've been in two or three pull requests, but signal.git stuff had eaten a lot of time during this cycle ;-/" Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the 'truncate_range' inode method was removed by the VM changes, the VFS update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due to sparse fix added twice, with other changes nearby). * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits) nfs: don't open in ->d_revalidate vfs: retry last component if opening stale dentry vfs: nameidata_to_filp(): don't throw away file on error vfs: nameidata_to_filp(): inline __dentry_open() vfs: do_dentry_open(): don't put filp vfs: split __dentry_open() vfs: do_last() common post lookup vfs: do_last(): add audit_inode before open vfs: do_last(): only return EISDIR for O_CREAT vfs: do_last(): check LOOKUP_DIRECTORY vfs: do_last(): make ENOENT exit RCU safe vfs: make follow_link check RCU safe vfs: do_last(): use inode variable vfs: do_last(): inline walk_component() vfs: do_last(): make exit RCU safe vfs: split do_lookup() Btrfs: move over to use ->update_time fs: introduce inode operation ->update_time reiserfs: get rid of resierfs_sync_super reiserfs: mark the superblock as dirty a bit later ...
| * | | | | | nfs: don't open in ->d_revalidateMiklos Szeredi2012-06-012-55/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFSv4 can't do reliable opens in d_revalidate, since it cannot know whether a mount needs to be followed or not. It does check d_mountpoint() on the dentry, which can result in a weird error if the VFS found that the mount does not in fact need to be followed, e.g.: # mount --bind /mnt/nfs /mnt/nfs-clone # echo something > /mnt/nfs/tmp/bar # echo x > /tmp/file # mount --bind /tmp/file /mnt/nfs-clone/tmp/bar # cat /mnt/nfs/tmp/bar cat: /mnt/nfs/tmp/bar: Not a directory Which should, by any sane filesystem, result in "something" being printed. So instead do the open in f_op->open() and in the unlikely case that the cached dentry turned out to be invalid, drop the dentry and return EOPENSTALE to let the VFS retry. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: retry last component if opening stale dentryMiklos Szeredi2012-06-011-2/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS optimizes away d_revalidates for last component of open. This means that open itself can find the dentry stale. This patch allows the filesystem to return EOPENSTALE and the VFS will retry the lookup on just the last component if possible. If the lookup was done using RCU mode, including the last component, then this is not possible since the parent dentry is lost. In this case fall back to non-RCU lookup. Currently this is not used since NFS will always leave RCU mode. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: nameidata_to_filp(): don't throw away file on errorMiklos Szeredi2012-06-011-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If open fails, don't put the file. This allows it to be reused if open needs to be retried. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: nameidata_to_filp(): inline __dentry_open()Miklos Szeredi2012-06-011-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Copy __dentry_open() into nameidata_to_filp(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: do_dentry_open(): don't put filpMiklos Szeredi2012-06-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move put_filp() out to __dentry_open(), the only caller now. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: split __dentry_open()Miklos Szeredi2012-06-012-14/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split __dentry_open() into two functions: do_dentry_open() - does most of the actual work, doesn't put file on failure open_check_o_direct() - after a successful open, checks direct_IO method This will allow i_op->atomic_open to do just the file initialization and leave the direct_IO checking to the VFS. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: do_last() common post lookupMiklos Szeredi2012-06-011-31/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now the post lookup code can be shared between O_CREAT and plain opens since they are essentially the same. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: do_last(): add audit_inode before openMiklos Szeredi2012-06-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows this code to be shared between O_CREAT and plain opens. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: do_last(): only return EISDIR for O_CREATMiklos Szeredi2012-06-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows this code to be shared between O_CREAT and plain opens. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: do_last(): check LOOKUP_DIRECTORYMiklos Szeredi2012-06-011-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check for ENOTDIR before finishing open. This allows this code to be shared between O_CREAT and plain opens. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: do_last(): make ENOENT exit RCU safeMiklos Szeredi2012-06-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will allow this code to be used in RCU mode. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: make follow_link check RCU safeMiklos Szeredi2012-06-011-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will allow this code to be used in RCU mode. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: do_last(): use inode variableMiklos Szeredi2012-06-011-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use helper variable instead of path->dentry->d_inode before complete_walk(). This will allow this code to be used in RCU mode. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: do_last(): inline walk_component()Miklos Szeredi2012-06-011-5/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Copy walk_component() into do_lookup(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: do_last(): make exit RCU safeMiklos Szeredi2012-06-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow returning from do_last() with LOOKUP_RCU still set on the "out:" and "exit:" labels. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: split do_lookup()Miklos Szeredi2012-06-011-14/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split do_lookup() into two functions: lookup_fast() - does cached lookup without i_mutex lookup_slow() - does lookup with i_mutex Both follow managed dentries. The new functions are needed by atomic_open. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | Btrfs: move over to use ->update_timeJosef Bacik2012-06-013-41/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Btrfs had been doing it's own file_update_time so we could catch ENOSPC properly, so just update our btrfs_update_time to work with the new stuff and then we'll be fancy later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| * | | | | | fs: introduce inode operation ->update_timeJosef Bacik2012-06-017-26/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Btrfs has to make sure we have space to allocate new blocks in order to modify the inode, so updating time can fail. We've gotten around this by having our own file_update_time but this is kind of a pain, and Christoph has indicated he would like to make xfs do something different with atime updates. So introduce ->update_time, where we will deal with i_version an a/m/c time updates and indicate which changes need to be made. The normal version just does what it has always done, updates the time and marks the inode dirty, and then filesystems can choose to do something different. I've gone through all of the users of file_update_time and made them check for errors with the exception of the fault code since it's complicated and I wasn't quite sure what to do there, also Jan is going to be pushing the file time updates into page_mkwrite for those who have it so that should satisfy btrfs and make it not a big deal to check the file_update_time() return code in the generic fault path. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| * | | | | | reiserfs: get rid of resierfs_sync_superArtem Bityutskiy2012-06-013-11/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch stops reiserfs using the VFS 'write_super()' method along with the s_dirt flag, because they are on their way out. The whole "superblock write-out" VFS infrastructure is served by the 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and writes out all dirty superblock using the '->write_super()' call-back. But the problem with this thread is that it wastes power by waking up the system every 5 seconds, even if there are no diry superblocks, or there are no client file-systems which would need this (e.g., btrfs does not use '->write_super()'). So we want to kill it completely and thus, we need to make file-systems to stop using the '->write_super()' VFS service, and then remove it together with the kernel thread. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | reiserfs: mark the superblock as dirty a bit laterArtem Bityutskiy2012-06-011-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'journal_mark_dirty()' function currently first marks the superblock as dirty by setting 's_dirt' to 1, then does various sanity checks and returns, then actuall does all the magic with the journal. This is not an ideal order, though. It makes more sense to first do all the checks, then do all the internal stuff, and at the end notify the VFS that the superblock is now dirty. This patch moves the 's_dirt = 1' assignment from the very beginning of this function to the very end. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>