summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* cifs: make error on lack of a unc= option more explicitJeff Layton2012-12-051-14/+5
| | | | | | | | | | | | | | Error out with a clear error message if there is no unc= option. The existing code doesn't handle this in a clear fashion, and the check for a UNCip option with no UNC string is just plain wrong. Later, we'll fix the code to not require a unc= option, but for now we need this to at least clarify why people are getting errors about DFS parsing. With this change we can also get rid of some later NULL pointer checks since we know the UNC and UNCip will never be NULL there. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: don't override the uid/gid in getattr when cifsacl is enabledJeff Layton2012-12-051-3/+4
| | | | | | | | | | If we're using cifsacl, then we don't want to override the uid/gid with the current uid/gid, since that would prevent you from being able to upcall for this info. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: remove uneeded __KERNEL__ block from cifsacl.hJeff Layton2012-12-052-7/+2
| | | | | | | | | ...and make those symbols static in cifsacl.c. Nothing outside of that file refers to them. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: fix the format specifiers in sid_to_strJeff Layton2012-12-051-7/+4
| | | | | | | | | | | | | | | The format specifiers are for signed values, but these are unsigned. Given that '-' is a delimiter between fields, I don't think you'd get what you'd expect if you got a value here that would overflow the sign bit. The version and authority fields are 8 bit values so use a "hh" length modifier there. The subauths are 32 bit values, so there's no need to use a "l" length modifier there. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: redefine NUM_SUBAUTH constant from 5 to 15Jeff Layton2012-12-052-6/+19
| | | | | | | | | | | | | | | | | According to several places on the Internet and the samba winbind code, this is hard limited to 15 in windows, not 5. This does balloon out the allocation of each by 40 bytes, but I don't see any alternative. Also, rename it to SID_MAX_SUB_AUTHORITIES to match the alleged name of this constant in the windows header files Finally, rename SIDLEN to SID_STRING_MAX, fix the value to reflect the change to SID_MAX_SUB_AUTHORITIES and document how it was determined. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: make cifs_copy_sid handle a source sid with variable size subauth arraysJeff Layton2012-12-052-2/+11
| | | | | | | | | ...and lift the restriction in id_to_sid upcall that the size must be at least as big as a full cifs_sid. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: make compare_sids staticJeff Layton2012-12-052-50/+50
| | | | | | | | | ..nothing outside of cifsacl.c calls it. Also fix the incorrect comment on the function. It returns 0 when they match. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: use the NUM_AUTHS and NUM_SUBAUTHS constants in cifsacl codeJeff Layton2012-12-052-5/+5
| | | | | | | | ...instead of hardcoding in '5' and '6' all over the place. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: move num_subauth check inside of CONFIG_CIFS_DEBUG2 check in parse_sid()Jeff Layton2012-12-051-2/+2
| | | | | | Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: clean up id_mode_to_cifs_aclJeff Layton2012-12-051-28/+25
| | | | | | | | | Add a label we can goto on error, and get rid of some excess indentation. Also move to kernel-style comments. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* cifs: fix types on module parametersJeff Layton2012-12-051-6/+5
| | | | | | | | | Most of these are unsigned ints, so we should be passing "uint" to module_param. Also, get rid of the extra "(bool)" in the description of enable_oplocks. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* default authentication needs to be at least ntlmv2 security for cifs mountsSteve French2012-12-052-11/+1
| | | | | | | | | | | | | | | | We had planned to upgrade to ntlmv2 security a few releases ago, and have been warning users in dmesg on mount about the impending upgrade, but had to make a change (to use nltmssp with ntlmv2) due to testing issues with some non-Windows, non-Samba servers. The approach in this patch is simpler than earlier patches, and changes the default authentication mechanism to ntlmv2 password hashes (encapsulated in ntlmssp) from ntlm (ntlm is too weak for current use and ntlmv2 has been broadly supported for many, many years). Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com>
* vfs: clear to the end of the buffer on partial buffer readsDan Carpenter2012-12-051-1/+1
| | | | | | | | READ is zero so the "rw & READ" test is always false. The intended test was "((rw & RW_MASK) == READ)". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: avoid "attempt to access beyond end of device" warningsLinus Torvalds2012-12-041-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The block device access simplification that avoided accessing the (racy) block size information (commit bbec0270bdd8: "blkdev_max_block: make private to fs/buffer.c") no longer checks the maximum block size in the block mapping path. That was _almost_ as simple as just removing the code entirely, because the readers and writers all check the size of the device anyway, so under normal circumstances it "just worked". However, the block size may be such that the end of the device may straddle one single buffer_head. At which point we may still want to access the end of the device, but the buffer we use to access it partially extends past the end. The 'bd_set_size()' function intentionally sets the block size to avoid this, but mounting the device - or setting the block size by hand to some other value - can modify that block size. So instead, teach 'submit_bh()' about the special case of the buffer head straddling the end of the device, and turning such an access into a smaller IO access, avoiding the problem. This, btw, also means that unlike before, we can now access the whole device regardless of device block size setting. So now, even if the device size is only 512-byte aligned, we can read and write even the last sector even when having a much bigger block size for accessing the rest of the device. So with this, we could now get rid of the 'bd_set_size()' block size code entirely - resulting in faster IO for the common case - but that would be a separate patch. Reported-and-tested-by: Romain Francoise <romain@orebokech.com> Reporeted-and-tested-by: Meelis Roos <mroos@linux.ee> Reported-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'block-dev'Linus Torvalds2012-12-033-190/+71
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge 'block-dev' branch. I was going to just mark everything here for stable and leave it to the 3.8 merge window, but having decided on doing another -rc, I migth as well merge it now. This removes the bd_block_size_semaphore semaphore that was added in this release to fix a race condition between block size changes and block IO, and replaces it with atomicity guaratees in fs/buffer.c instead, along with simplifying fs/block-dev.c. This removes more lines than it adds, makes the code generally simpler, and avoids the latency/rt issues that the block size semaphore introduced for mount. I'm not happy with the timing, but it wouldn't be much better doing this during the merge window and then having some delayed back-port of it into stable. * block-dev: blkdev_max_block: make private to fs/buffer.c direct-io: don't read inode->i_blkbits multiple times blockdev: remove bd_block_size_semaphore again fs/buffer.c: make block-size be per-page and protected by the page lock
| * blkdev_max_block: make private to fs/buffer.cLinus Torvalds2012-11-302-55/+14
| | | | | | | | | | | | | | | | | | | | | | | | We really don't want to look at the block size for the raw block device accesses in fs/block-dev.c, because it may be changing from under us. So get rid of the max_block logic entirely, since the caller should already have done it anyway. That leaves the only user of this function in fs/buffer.c, so move the whole function there and make it static. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * direct-io: don't read inode->i_blkbits multiple timesLinus Torvalds2012-11-291-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Since directio can work on a raw block device, and the block size of the device can change under it, we need to do the same thing that fs/buffer.c now does: read the block size a single time, using ACCESS_ONCE(). Reading it multiple times can get different results, which will then confuse the code because it actually encodes the i_blksize in relationship to the underlying logical blocksize. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * blockdev: remove bd_block_size_semaphore againLinus Torvalds2012-11-291-101/+4
| | | | | | | | | | | | | | | | | | | | This reverts the block-device direct access code to the previous unlocked code, now that fs/buffer.c no longer needs external locking. With this, fs/block_dev.c is back to the original version, apart from a whitespace cleanup that I didn't want to revert. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * fs/buffer.c: make block-size be per-page and protected by the page lockLinus Torvalds2012-11-291-31/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the buffer size handling be a per-page thing, which allows us to not have to worry about locking too much when changing the buffer size. If a page doesn't have buffers, we still need to read the block size from the inode, but we can do that with ACCESS_ONCE(), so that even if the size is changing, we get a consistent value. This doesn't convert all functions - many of the buffer functions are used purely by filesystems, which in turn results in the buffer size being fixed at mount-time. So they don't have the same consistency issues that the raw device access can have. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2012-12-014-10/+21
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A bunch of fixes; the last one is this cycle regression, the rest are -stable fodder." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix off-by-one in argument passed by iterate_fd() to callbacks lookup_one_len: don't accept . and .. cifs: get rid of blind d_drop() in readdir nfs_lookup_revalidate(): fix a leak don't do blind d_drop() in nfs_prime_dcache()
| * | fix off-by-one in argument passed by iterate_fd() to callbacksAl Viro2012-11-301-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | Noticed by Pavel Roskin; the thing in his patch I disagree with was compensating for that shite in callbacks instead of fixing it once in the iterator itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | lookup_one_len: don't accept . and ..Al Viro2012-11-301-0/+5
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | cifs: get rid of blind d_drop() in readdirAl Viro2012-11-301-1/+4
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | nfs_lookup_revalidate(): fix a leakAl Viro2012-11-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | We are leaking fattr and fhandle if we decide that dentry is not to be invalidated, after all (e.g. happens to be a mountpoint). Just free both before that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | don't do blind d_drop() in nfs_prime_dcache()Al Viro2012-11-301-1/+2
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2012-12-012-5/+4
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | Pull CIFS fixes from Steve French: "Two low risk, small fixes, that fix cifs regressions introduced in 3.7." * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix wrong buffer pointer usage in smb_set_file_info cifs: fix writeback race with file that is growing
| * | CIFS: Fix wrong buffer pointer usage in smb_set_file_infoPavel Shilovsky2012-11-281-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6bdf6dbd662176c0da5c3ac8ed10ac94e7776c85 caused a regression in setattr codepath that leads to files with wrong attributes. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * | cifs: fix writeback race with file that is growingJeff Layton2012-11-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit eddb079deb4 created a regression in the writepages codepath. Previously, whenever it needed to check the size of the file, it did so by consulting the inode->i_size field directly. With that patch, the i_size was fetched once on entry into the writepages code and that value was used henceforth. If the file is changing size though (for instance, if someone is writing to it or has truncated it), then that value is likely to be wrong. This can lead to data corruption. Pages past the EOF at the time that the writepages call was issued may be silently dropped and ignored because cifs_writepages wrongly assumes that the file must have been truncated in the interim. Fix cifs_writepages to properly fetch the size from the inode->i_size field instead to properly account for this possibility. Original bug report is here: https://bugzilla.kernel.org/show_bug.cgi?id=50991 Reported-and-Tested-by: Maxim Britov <ungifted01@gmail.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* | | Merge branch 'akpm' (Fixes from Andrew)Linus Torvalds2012-11-274-4/+20
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "8 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (8 patches) futex: avoid wake_futex() for a PI futex_q watchdog: using u64 in get_sample_period() writeback: put unused inodes to LRU after writeback completion mm: vmscan: check for fatal signals iff the process was throttled Revert "mm: remove __GFP_NO_KSWAPD" proc: check vma->vm_file before dereferencing UAPI: strip the _UAPI prefix from header guards during header installation include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID
| * | | writeback: put unused inodes to LRU after writeback completionJan Kara2012-11-273-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 169ebd90131b ("writeback: Avoid iput() from flusher thread") removed iget-iput pair from inode writeback. As a side effect, inodes that are dirty during iput_final() call won't be ever added to inode LRU (iput_final() doesn't add dirty inodes to LRU and later when the inode is cleaned there's noone to add the inode there). Thus inodes are effectively unreclaimable until someone looks them up again. The practical effect of this bug is limited by the fact that inodes are pinned by a dentry for long enough that the inode gets cleaned. But still the bug can have nasty consequences leading up to OOM conditions under certain circumstances. Following can easily reproduce the problem: for (( i = 0; i < 1000; i++ )); do mkdir $i for (( j = 0; j < 1000; j++ )); do touch $i/$j echo 2 > /proc/sys/vm/drop_caches done done then one needs to run 'sync; ls -lR' to make inodes reclaimable again. We fix the issue by inserting unused clean inodes into the LRU after writeback finishes in inode_sync_complete(). Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | proc: check vma->vm_file before dereferencingStanislav Kinsbursky2012-11-271-2/+3
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7b540d0646ce ("proc_map_files_readdir(): don't bother with grabbing files") switched proc_map_files_readdir() to use @f_mode directly instead of grabbing @file reference, but same time the test for @vm_file presence was lost leading to nil dereference. The patch brings the test back. The all proc_map_files feature is CONFIG_CHECKPOINT_RESTORE wrapped (which is set to 'n' by default) so the bug doesn't affect regular kernels. The regression is 3.7-rc1 only as far as I can tell. [gorcunov@openvz.org: provided changelog] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for_linus' of ↵Linus Torvalds2012-11-271-0/+2
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3 regression fix from Jan Kara: "Fix an ext3 regression introduced during 3.7 merge window. It leads to deadlock if you stress the filesystem in the right way (luckily only if blocksize < pagesize)." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: jbd: Fix lock ordering bug in journal_unmap_buffer()
| * | jbd: Fix lock ordering bug in journal_unmap_buffer()Jan Kara2012-11-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 09e05d48 introduced a wait for transaction commit into journal_unmap_buffer() in the case we are truncating a buffer undergoing commit in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly we forgot to drop buffer lock before waiting for transaction commit and thus deadlock is possible when kjournald wants to lock the buffer. Fix the problem by dropping the buffer lock before waiting for transaction commit. Since we are still holding page lock (and that is OK), buffer cannot disappear under us. CC: stable@vger.kernel.org # Wherever commit 09e05d48 was taken Signed-off-by: Jan Kara <jack@suse.cz>
* | | Merge tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6Linus Torvalds2012-11-241-18/+21
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MTD fixes from David Woodhouse: "The most important part of this is that it fixes a regression in Samsung NAND chip detection, introduced by some rework which went into 3.7. The initial fix wasn't quite complete, so it's in two parts. In fact the first part is committed twice (Artem committed his own copy of the same patch) and I've merged Artem's tree into mine which already had that fix. I'd have recommitted that to make it somewhat cleaner, but figured by this point in the release cycle it was better to merge *exactly* the commits which have been in linux-next. If I'd recommitted, I'd also omit the sparse warning fix. But it's there, and it's harmless — just marking one function as 'static' in onenand code. This also includes a couple more fixes for stable: an AB-BA deadlock in JFFS2, and an invalid range check in slram." * tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6: mtd: nand: fix Samsung SLC detection regression mtd: nand: fix Samsung SLC NAND identification regression jffs2: Fix lock acquisition order bug in jffs2_write_begin mtd: onenand: Make flexonenand_set_boundary static mtd: slram: invalid checking of absolute end address mtd: ofpart: Fix incorrect NULL check in parse_ofoldpart_partitions() mtd: nand: fix Samsung SLC NAND identification regression
| * | jffs2: Fix lock acquisition order bug in jffs2_write_beginThomas Betker2012-11-091-18/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | jffs2_write_begin() first acquires the page lock, then f->sem. This causes an AB-BA deadlock with jffs2_garbage_collect_live(), which first acquires f->sem, then the page lock: jffs2_garbage_collect_live mutex_lock(&f->sem) (A) jffs2_garbage_collect_dnode jffs2_gc_fetch_page read_cache_page_async do_read_cache_page lock_page(page) (B) jffs2_write_begin grab_cache_page_write_begin find_lock_page lock_page(page) (B) mutex_lock(&f->sem) (A) We fix this by restructuring jffs2_write_begin() to take f->sem before the page lock. However, we make sure that f->sem is not held when calling jffs2_reserve_space(), as this is not permitted by the locking rules. The deadlock above was observed multiple times on an SoC with a dual ARMv7 (Cortex-A9), running the long-term 3.4.11 kernel; it occurred when using scp to copy files from a host system to the ARM target system. The fix was heavily tested on the same target system. Cc: stable@vger.kernel.org Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com> Acked-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
* | | Merge branch 'for_linus' of ↵Linus Torvalds2012-11-214-19/+60
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull reiserfs and ext3 fixes from Jan Kara: "Fixes of reiserfs deadlocks when quotas are enabled (locking there was completely busted by BKL conversion) and also one small ext3 fix in the trim interface." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext3: Avoid underflow of in ext3_trim_fs() reiserfs: Move quota calls out of write lock reiserfs: Protect reiserfs_quota_write() with write lock reiserfs: Protect reiserfs_quota_on() with write lock reiserfs: Fix lock ordering during remount
| * | ext3: Avoid underflow of in ext3_trim_fs()Lukas Czerner2012-11-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if len argument in ext3_trim_fs() is smaller than one block, the 'end' variable underflow. Avoid that by returning EINVAL if len is smaller than file system block. Also remove useless unlikely(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | reiserfs: Move quota calls out of write lockJan Kara2012-11-193-7/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | Calls into highlevel quota code cannot happen under the write lock. These calls take dqio_mutex which ranks above write lock. So drop write lock before calling back into quota code. CC: stable@vger.kernel.org # >= 3.0 Signed-off-by: Jan Kara <jack@suse.cz>
| * | reiserfs: Protect reiserfs_quota_write() with write lockJan Kara2012-11-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calls into reiserfs journalling code and reiserfs_get_block() need to be protected with write lock. We remove write lock around calls to high level quota code in the next patch so these paths would suddently become unprotected. CC: stable@vger.kernel.org # >= 3.0 Signed-off-by: Jan Kara <jack@suse.cz>
| * | reiserfs: Protect reiserfs_quota_on() with write lockJan Kara2012-11-191-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | In reiserfs_quota_on() we do quite some work - for example unpacking tail of a quota file. Thus we have to hold write lock until a moment we call back into the quota code. CC: stable@vger.kernel.org # >= 3.0 Signed-off-by: Jan Kara <jack@suse.cz>
| * | reiserfs: Fix lock ordering during remountJan Kara2012-11-191-7/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | When remounting reiserfs dquot_suspend() or dquot_resume() can be called. These functions take dqonoff_mutex which ranks above write lock so we have to drop it before calling into quota code. CC: stable@vger.kernel.org # >= 3.0 Signed-off-by: Jan Kara <jack@suse.cz>
* | | fanotify: fix FAN_Q_OVERFLOW case of fanotify_read()Al Viro2012-11-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the FAN_Q_OVERFLOW bit set in event->mask, the fanotify event metadata will not contain a valid file descriptor, but copy_event_to_user() didn't check for that, and unconditionally does a fd_install() on the file descriptor. Which in turn will cause a BUG_ON() in __fd_install(). Introduced by commit 352e3b249284 ("fanotify: sanitize failure exits in copy_event_to_user()") Mea culpa - missed that path ;-/ Reported-by: Alex Shi <lkml.alex@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2012-11-181-1/+0
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc VFS fixes from Al Viro: "Remove a bogus BUG_ON() that can trigger spuriously + alpha bits of do_mount() constification I'd missed during the merge window." This pull request came in a week ago, I missed it for some reason. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kill bogus BUG_ON() in do_close_on_exec() missing const in alpha callers of do_mount()
| * | | kill bogus BUG_ON() in do_close_on_exec()Al Viro2012-11-121-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It can be legitimately triggered via procfs access. Now, at least 2 of 3 of get_files_struct() callers in procfs are useless, but when and if we get rid of those we can always add WARN_ON() here. BUG_ON() at that spot is simply wrong. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge tag 'for-linus-v3.7-rc7' of git://oss.sgi.com/xfs/xfsLinus Torvalds2012-11-183-19/+69
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs bugfixes from Ben Myers: - fix attr tree double split corruption - fix broken error handling in xfs_vm_writepage - drop buffer io reference when a bad bio is built * tag 'for-linus-v3.7-rc7' of git://oss.sgi.com/xfs/xfs: xfs: drop buffer io reference when a bad bio is built xfs: fix broken error handling in xfs_vm_writepage xfs: fix attr tree double split corruption
| * | | | xfs: drop buffer io reference when a bad bio is builtDave Chinner2012-11-171-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Error handling in xfs_buf_ioapply_map() does not handle IO reference counts correctly. We increment the b_io_remaining count before building the bio, but then fail to decrement it in the failure case. This leads to the buffer never running IO completion and releasing the reference that the IO holds, so at unmount we can leak the buffer. This leak is captured by this assert failure during unmount: XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273 This is not a new bug - the b_io_remaining accounting has had this problem for a long, long time - it's just very hard to get a zero length bio being built by this code... Further, the buffer IO error can be overwritten on a multi-segment buffer by subsequent bio completions for partial sections of the buffer. Hence we should only set the buffer error status if the buffer is not already carrying an error status. This ensures that a partial IO error on a multi-segment buffer will not be lost. This part of the problem is a regression, however. cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | xfs: fix broken error handling in xfs_vm_writepageDave Chinner2012-11-171-15/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we shut down the filesystem, it might first be detected in writeback when we are allocating a inode size transaction. This happens after we have moved all the pages into the writeback state and unlocked them. Unfortunately, if we fail to set up the transaction we then abort writeback and try to invalidate the current page. This then triggers are BUG() in block_invalidatepage() because we are trying to invalidate an unlocked page. Fixing this is a bit of a chicken and egg problem - we can't allocate the transaction until we've clustered all the pages into the IO and we know the size of it (i.e. whether the last block of the IO is beyond the current EOF or not). However, we don't want to hold pages locked for long periods of time, especially while we lock other pages to cluster them into the write. To fix this, we need to make a clear delineation in writeback where errors can only be handled by IO completion processing. That is, once we have marked a page for writeback and unlocked it, we have to report errors via IO completion because we've already started the IO. We may not have submitted any IO, but we've changed the page state to indicate that it is under IO so we must now use the IO completion path to report errors. To do this, add an error field to xfs_submit_ioend() to pass it the error that occurred during the building on the ioend chain. When this is non-zero, mark each ioend with the error and call xfs_finish_ioend() directly rather than building bios. This will immediately push the ioends through completion processing with the error that has occurred. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | xfs: fix attr tree double split corruptionDave Chinner2012-11-171-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In certain circumstances, a double split of an attribute tree is needed to insert or replace an attribute. In rare situations, this can go wrong, leaving the attribute tree corrupted. In this case, the attr being replaced is the last attr in a leaf node, and the replacement is larger so doesn't fit in the same leaf node. When we have the initial condition of a node format attribute btree with two leaves at index 1 and 2. Call them L1 and L2. The leaf L1 is completely full, there is not a single byte of free space in it. L2 is mostly empty. The attribute being replaced - call it X - is the last attribute in L1. The way an attribute replace is executed is that the replacement attribute - call it Y - is first inserted into the tree, but has an INCOMPLETE flag set on it so that list traversals ignore it. Once this transaction is committed, a second transaction it run to atomically mark Y as COMPLETE and X as INCOMPLETE, so that a traversal will now find Y and skip X. Once that transaction is committed, attribute X is then removed. So, the initial condition is: +--------+ +--------+ | L1 | | L2 | | fwd: 2 |---->| fwd: 0 | | bwd: 0 |<----| bwd: 1 | | fsp: 0 | | fsp: N | |--------| |--------| | attr A | | attr 1 | |--------| |--------| | attr B | | attr 2 | |--------| |--------| .......... .......... |--------| |--------| | attr X | | attr n | +--------+ +--------+ So now we go to replace X, and see that L1:fsp = 0 - it is full so we can't insert Y in the same leaf. So we record the the location of attribute X so we can track it for later use, then we split L1 into L1 and L3 and reblance across the two leafs. We end with: +--------+ +--------+ +--------+ | L1 | | L3 | | L2 | | fwd: 3 |---->| fwd: 2 |---->| fwd: 0 | | bwd: 0 |<----| bwd: 1 |<----| bwd: 3 | | fsp: M | | fsp: J | | fsp: N | |--------| |--------| |--------| | attr A | | attr X | | attr 1 | |--------| +--------+ |--------| | attr B | | attr 2 | |--------| |--------| .......... .......... |--------| |--------| | attr W | | attr n | +--------+ +--------+ And we track that the original attribute is now at L3:0. We then try to insert Y into L1 again, and find that there isn't enough room because the new attribute is larger than the old one. Hence we have to split again to make room for Y. We end up with this: +--------+ +--------+ +--------+ +--------+ | L1 | | L4 | | L3 | | L2 | | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 | | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 | | fsp: M | | fsp: J | | fsp: J | | fsp: N | |--------| |--------| |--------| |--------| | attr A | | attr Y | | attr X | | attr 1 | |--------| + INCOMP + +--------+ |--------| | attr B | +--------+ | attr 2 | |--------| |--------| .......... .......... |--------| |--------| | attr W | | attr n | +--------+ +--------+ And now we have the new (incomplete) attribute @ L4:0, and the original attribute at L3:0. At this point, the first transaction is committed, and we move to the flipping of the flags. This is where we are supposed to end up with this: +--------+ +--------+ +--------+ +--------+ | L1 | | L4 | | L3 | | L2 | | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 | | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 | | fsp: M | | fsp: J | | fsp: J | | fsp: N | |--------| |--------| |--------| |--------| | attr A | | attr Y | | attr X | | attr 1 | |--------| +--------+ + INCOMP + |--------| | attr B | +--------+ | attr 2 | |--------| |--------| .......... .......... |--------| |--------| | attr W | | attr n | +--------+ +--------+ But that doesn't happen properly - the attribute tracking indexes are not pointing to the right locations. What we end up with is both the old attribute to be removed pointing at L4:0 and the new attribute at L4:1. On a debug kernel, this assert fails like so: XFS: Assertion failed: args->index2 < be16_to_cpu(leaf2->hdr.count), file: fs/xfs/xfs_attr_leaf.c, line: 2725 because the new attribute location does not exist. On a production kernel, this goes unnoticed and the code proceeds ahead merrily and removes L4 because it thinks that is the block that is no longer needed. This leaves the hash index node pointing to entries L1, L4 and L2, but only blocks L1, L3 and L2 to exist. Further, the leaf level sibling list is L1 <-> L4 <-> L2, but L4 is now free space, and so everything is busted. This corruption is caused by the removal of the old attribute triggering a join - it joins everything correctly but then frees the wrong block. xfs_repair will report something like: bad sibling back pointer for block 4 in attribute fork for inode 131 problem with attribute contents in inode 131 would clear attr fork bad nblocks 8 for inode 131, would reset to 3 bad anextents 4 for inode 131, would reset to 0 The problem lies in the assignment of the old/new blocks for tracking purposes when the double leaf split occurs. The first split tries to place the new attribute inside the current leaf (i.e. "inleaf == true") and moves the old attribute (X) to the new block. This sets up the old block/index to L1:X, and newly allocated block to L3:0. It then moves attr X to the new block and tries to insert attr Y at the old index. That fails, so it splits again. With the second split, the rebalance ends up placing the new attr in the second new block - L4:0 - and this is where the code goes wrong. What is does is it sets both the new and old block index to the second new block. Hence it inserts attr Y at the right place (L4:0) but overwrites the current location of the attr to replace that is held in the new block index (currently L3:0). It over writes it with L4:1 - the index we later assert fail on. Hopefully this table will show this in a foramt that is a bit easier to understand: Split old attr index new attr index vanilla patched vanilla patched before 1st L1:26 L1:26 N/A N/A after 1st L3:0 L3:0 L1:26 L1:26 after 2nd L4:0 L3:0 L4:1 L4:0 ^^^^ ^^^^ wrong wrong The fix is surprisingly simple, for all this analysis - just stop the rebalance on the out-of leaf case from overwriting the new attr index - it's already correct for the double split case. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* | | | | mm, oom: reintroduce /proc/pid/oom_adjDavid Rientjes2012-11-161-0/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is mostly a revert of 01dc52ebdf47 ("oom: remove deprecated oom_adj") from Davidlohr Bueso. It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier kernels. It simply scales the value linearly when /proc/pid/oom_score_adj is written. The major difference is that its scheduled removal is no longer included in Documentation/feature-removal-schedule.txt. We do warn users with a single printk, though, to suggest the more powerful and supported /proc/pid/oom_score_adj interface. Reported-by: Artem S. Tashkinov <t.artem@lycos.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'upstream-3.7-rc6' of git://git.infradead.org/linux-ubifsLinus Torvalds2012-11-153-2/+19
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull UBIFS fixes from Artem Bityutskiy: "Two patches which fix a problem reported by several people in the past, but only fixed now because no one gave enough material for debugging. Anyway, these fix the problem that sometimes after a power cut the file-system is not mountable with the following symptom: grab_empty_leb: could not find an empty LEB The fixes make the file-system mountable again." * tag 'upstream-3.7-rc6' of git://git.infradead.org/linux-ubifs: UBIFS: fix mounting problems after power cuts UBIFS: introduce categorized lprops counter