summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'ext4_for_linus' of ↵Linus Torvalds2017-02-2123-206/+456
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "For this cycle we add support for the shutdown ioctl, which is primarily used for testing, but which can be useful on production systems when a scratch volume is being destroyed and the data on it doesn't need to be saved. This found (and we fixed) a number of bugs with ext4's recovery to corrupted file system --- the bugs increased the amount of data that could be potentially lost, and in the case of the inline data feature, could cause the kernel to BUG. Also included are a number of other bug fixes, including in ext4's fscrypt, DAX, inline data support" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits) ext4: rename EXT4_IOC_GOINGDOWN to EXT4_IOC_SHUTDOWN ext4: fix fencepost in s_first_meta_bg validation ext4: don't BUG when truncating encrypted inodes on the orphan list ext4: do not use stripe_width if it is not set ext4: fix stripe-unaligned allocations dax: assert that i_rwsem is held exclusive for writes ext4: fix DAX write locking ext4: add EXT4_IOC_GOINGDOWN ioctl ext4: add shutdown bit and check for it ext4: rename s_resize_flags to s_ext4_flags ext4: return EROFS if device is r/o and journal replay is needed ext4: preserve the needs_recovery flag when the journal is aborted jbd2: don't leak modified metadata buffers on an aborted journal ext4: fix inline data error paths ext4: move halfmd4 into hash.c directly ext4: fix use-after-iput when fscrypt contexts are inconsistent jbd2: fix use after free in kjournald2() ext4: fix data corruption in data=journal mode ext4: trim allocation requests to group size ext4: replace BUG_ON with WARN_ON in mb_find_extent() ...
| * ext4: rename EXT4_IOC_GOINGDOWN to EXT4_IOC_SHUTDOWNTheodore Ts'o2017-02-202-5/+5
| | | | | | | | | | | | | | It's very likely the file system independent ioctl name will be FS_IOC_SHUTDOWN, so let's use the same name for the ext4 ioctl name. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix fencepost in s_first_meta_bg validationTheodore Ts'o2017-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | It is OK for s_first_meta_bg to be equal to the number of block group descriptor blocks. (It rarely happens, but it shouldn't cause any problems.) https://bugzilla.kernel.org/show_bug.cgi?id=194567 Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: don't BUG when truncating encrypted inodes on the orphan listTheodore Ts'o2017-02-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a BUG when the kernel tries to mount a file system constructed as follows: echo foo > foo.txt mke2fs -Fq -t ext4 -O encrypt foo.img 100 debugfs -w foo.img << EOF write foo.txt a set_inode_field a i_flags 0x80800 set_super_value s_last_orphan 12 quit EOF root@kvm-xfstests:~# mount -o loop foo.img /mnt [ 160.238770] ------------[ cut here ]------------ [ 160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874! [ 160.240106] invalid opcode: 0000 [#1] SMP [ 160.240106] Modules linked in: [ 160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G W 4.10.0-rc3-00034-gcdd33b941b67 #227 [ 160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014 [ 160.240106] task: f4518000 task.stack: f47b6000 [ 160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 [ 160.240106] EFLAGS: 00010246 CPU: 0 [ 160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007 [ 160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac [ 160.240106] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0 [ 160.240106] Call Trace: [ 160.240106] ext4_truncate+0x1e9/0x3e5 [ 160.240106] ext4_fill_super+0x286f/0x2b1e [ 160.240106] ? set_blocksize+0x2e/0x7e [ 160.240106] mount_bdev+0x114/0x15f [ 160.240106] ext4_mount+0x15/0x17 [ 160.240106] ? ext4_calculate_overhead+0x39d/0x39d [ 160.240106] mount_fs+0x58/0x115 [ 160.240106] vfs_kern_mount+0x4b/0xae [ 160.240106] do_mount+0x671/0x8c3 [ 160.240106] ? _copy_from_user+0x70/0x83 [ 160.240106] ? strndup_user+0x31/0x46 [ 160.240106] SyS_mount+0x57/0x7b [ 160.240106] do_int80_syscall_32+0x4f/0x61 [ 160.240106] entry_INT80_32+0x2f/0x2f [ 160.240106] EIP: 0xb76b919e [ 160.240106] EFLAGS: 00000246 CPU: 0 [ 160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8 [ 160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660 [ 160.240106] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b [ 160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9 [ 160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac [ 160.317241] ---[ end trace d6a773a375c810a5 ]--- The problem is that when the kernel tries to truncate an inode in ext4_truncate(), it tries to clear any on-disk data beyond i_size. Without the encryption key, it can't do that, and so it triggers a BUG. E2fsck does *not* provide this service, and in practice most file systems have their orphan list processed by e2fsck, so to avoid crashing, this patch skips this step if we don't have access to the encryption key (which is the case when processing the orphan list; in all other cases, we will have the encryption key, or the kernel wouldn't have allowed the file to be opened). An open question is whether the fact that e2fsck isn't clearing the bytes beyond i_size causing problems --- and if we've lived with it not doing it for so long, can we drop this from the kernel replay of the orphan list in all cases (not just when we don't have the key for encrypted inodes). Addresses-Google-Bug: #35209576 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: do not use stripe_width if it is not setJan Kara2017-02-101-2/+2
| | | | | | | | | | | | | | | | Avoid using stripe_width for sbi->s_stripe value if it is not actually set. It prevents using the stride for sbi->s_stripe. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix stripe-unaligned allocationsJan Kara2017-02-101-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a filesystem is created using: mkfs.ext4 -b 4096 -E stride=512 <dev> and we try to allocate 64MB extent, we will end up directly in ext4_mb_complex_scan_group(). This is because the request is detected as power-of-two allocation (so we start in ext4_mb_regular_allocator() with ac_criteria == 0) however the check before ext4_mb_simple_scan_group() refuses the direct buddy scan because the allocation request is too large. Since cr == 0, the check whether we should use ext4_mb_scan_aligned() fails as well and we fall back to ext4_mb_complex_scan_group(). Fix the problem by checking for upper limit on power-of-two requests directly when detecting them. Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * dax: assert that i_rwsem is held exclusive for writesChristoph Hellwig2017-02-081-1/+5
| | | | | | | | | | | | | | | | | | Make sure all callers follow the same locking protocol, given that DAX transparantly replaced the normal buffered I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * ext4: fix DAX write lockingChristoph Hellwig2017-02-081-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Unlike O_DIRECT DAX is not an optional opt-in feature selected by the application, so we'll have to provide the traditional synchronіzation of overlapping writes as we do for buffered writes. This was broken historically for DAX, but got fixed for ext2 and XFS as part of the iomap conversion. Fix up ext4 as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * ext4: add EXT4_IOC_GOINGDOWN ioctlTheodore Ts'o2017-02-063-1/+61
| | | | | | | | | | | | | | This ioctl is modeled after the xfs's XFS_IOC_GOINGDOWN ioctl. (In fact, it uses the same code points.) Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: add shutdown bit and check for itTheodore Ts'o2017-02-0511-3/+103
| | | | | | | | | | | | | | Add a shutdown bit that will cause ext4 processing to fail immediately with EIO. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: rename s_resize_flags to s_ext4_flagsTheodore Ts'o2017-02-052-5/+10
| | | | | | | | | | | | | | We are currently using one bit in s_resize_flags; rename it in order to allow more of the bits in that unsigned long for other purposes. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: return EROFS if device is r/o and journal replay is neededTheodore Ts'o2017-02-051-1/+2
| | | | | | | | | | | | | | | | | | If the file system requires journal recovery, and the device is read-ony, return EROFS to the mount system call. This allows xfstests generic/050 to pass. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: preserve the needs_recovery flag when the journal is abortedTheodore Ts'o2017-02-051-2/+4
| | | | | | | | | | | | | | | | | | If the journal is aborted, the needs_recovery feature flag should not be removed. Otherwise, it's the journal might not get replayed and this could lead to more data getting lost. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * jbd2: don't leak modified metadata buffers on an aborted journalTheodore Ts'o2017-02-051-1/+3
| | | | | | | | | | | | | | | | | | | | If the journal has been aborted, we shouldn't mark the underlying buffer head as dirty, since that will cause the metadata block to get modified. And if the journal has been aborted, we shouldn't allow this since it will almost certainly lead to a corrupted file system. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: fix inline data error pathsTheodore Ts'o2017-02-052-6/+23
| | | | | | | | | | | | | | | | The write_end() function must always unlock the page and drop its ref count, even on an error. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: move halfmd4 into hash.c directlyJason A. Donenfeld2017-02-024-71/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "half md4" transform should not be used by any new code. And fortunately, it's only used now by ext4. Since ext4 supports several hashing methods, at some point it might be desirable to move to something like SipHash. As an intermediate step, remove half md4 from cryptohash.h and lib, and make it just a local function in ext4's hash.c. There's precedent for doing this; the other function ext can use for its hashes -- TEA -- is also implemented in the same place. Also, by being a local function, this might allow gcc to perform some additional optimizations. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix use-after-iput when fscrypt contexts are inconsistentEric Biggers2017-02-021-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case where the child's encryption context was inconsistent with its parent directory, we were using inode->i_sb and inode->i_ino after the inode had already been iput(). Fix this by doing the iput() in the correct places. Note: only ext4 had this bug, not f2fs and ubifs. Fixes: d9cdc9033181 ("ext4 crypto: enforce context consistency") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * jbd2: fix use after free in kjournald2()Sahitya Tummala2017-02-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Below is the synchronization issue between unmount and kjournald2 contexts, which results into use after free issue in kjournald2(). Fix this issue by using journal->j_state_lock to synchronize the wait_event() done in journal_kill_thread() and the wake_up() done in kjournald2(). TASK 1: umount cmd: |--jbd2_journal_destroy() { |--journal_kill_thread() { write_lock(&journal->j_state_lock); journal->j_flags |= JBD2_UNMOUNT; ... write_unlock(&journal->j_state_lock); wake_up(&journal->j_wait_commit); TASK 2 wakes up here: kjournald2() { ... checks JBD2_UNMOUNT flag and calls goto end-loop; ... end_loop: write_unlock(&journal->j_state_lock); journal->j_task = NULL; --> If this thread gets pre-empted here, then TASK 1 wait_event will exit even before this thread is completely done. wait_event(journal->j_wait_done_commit, journal->j_task == NULL); ... write_lock(&journal->j_state_lock); write_unlock(&journal->j_state_lock); } |--kfree(journal); } } wake_up(&journal->j_wait_done_commit); --> this step now results into use after free issue. } Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix data corruption in data=journal modeJan Kara2017-01-271-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_journalled_write_end() did not propely handle all the cases when generic_perform_write() did not copy all the data into the target page and could mark buffers with uninitialized contents as uptodate and dirty leading to possible data corruption (which would be quickly fixed by generic_perform_write() retrying the write but still). Fix the problem by carefully handling the case when the page that is written to is not uptodate. CC: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: trim allocation requests to group sizeJan Kara2017-01-271-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | If filesystem groups are artifically small (using parameter -g to mkfs.ext4), ext4_mb_normalize_request() can result in a request that is larger than a block group. Trim the request size to not confuse allocation code. Reported-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: replace BUG_ON with WARN_ON in mb_find_extent()Theodore Ts'o2017-01-231-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | The last BUG_ON in mb_find_extent() is apparently triggering in some rare cases. Most of the time it indicates a bug in the buddy bitmap algorithms, but there are some weird cases where it can trigger when buddy bitmap is still in memory, but the block bitmap has to be read from disk, and there is disk or memory corruption such that the block bitmap and the buddy bitmap are out of sync. Google-Bug-Id: #33702157 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: propagate error values from ext4_inline_data_truncate()Theodore Ts'o2017-01-233-19/+27
| | | | | | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: avoid calling ext4_mark_inode_dirty() under unneeded semaphoresTheodore Ts'o2017-01-122-7/+4
| | | | | | | | | | | | | | | | There is no need to call ext4_mark_inode_dirty while holding xattr_sem or i_data_sem, so where it's easy to avoid it, move it out from the critical region. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()Theodore Ts'o2017-01-123-54/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xattr_sem deadlock problems fixed in commit 2e81a4eeedca: "ext4: avoid deadlock when expanding inode size" didn't include the use of xattr_sem in fs/ext4/inline.c. With the addition of project quota which added a new extra inode field, this exposed deadlocks in the inline_data code similar to the ones fixed by 2e81a4eeedca. The deadlock can be reproduced via: dmesg -n 7 mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768 mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc mkdir /vdc/a umount /vdc mount -t ext4 /dev/vdc /vdc echo foo > /vdc/a/foo and looks like this: [ 11.158815] [ 11.160276] ============================================= [ 11.161960] [ INFO: possible recursive locking detected ] [ 11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G W [ 11.161960] --------------------------------------------- [ 11.161960] bash/2519 is trying to acquire lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] [ 11.161960] but task is already holding lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] other info that might help us debug this: [ 11.161960] Possible unsafe locking scenario: [ 11.161960] [ 11.161960] CPU0 [ 11.161960] ---- [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] [ 11.161960] *** DEADLOCK *** [ 11.161960] [ 11.161960] May be due to missing lock nesting notation [ 11.161960] [ 11.161960] 4 locks held by bash/2519: [ 11.161960] #0: (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e [ 11.161960] #1: (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a [ 11.161960] #2: (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622 [ 11.161960] #3: (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] stack backtrace: [ 11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G W 4.10.0-rc3-00015-g011b30a8a3cf #160 [ 11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014 [ 11.161960] Call Trace: [ 11.161960] dump_stack+0x72/0xa3 [ 11.161960] __lock_acquire+0xb7c/0xcb9 [ 11.161960] ? kvm_clock_read+0x1f/0x29 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] lock_acquire+0x106/0x18a [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] down_write+0x39/0x72 [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ? _raw_read_unlock+0x22/0x2c [ 11.161960] ? jbd2_journal_extend+0x1e2/0x262 [ 11.161960] ? __ext4_journal_get_write_access+0x3d/0x60 [ 11.161960] ext4_mark_inode_dirty+0x17d/0x26d [ 11.161960] ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_try_add_inline_entry+0x69/0x152 [ 11.161960] ext4_add_entry+0xa3/0x848 [ 11.161960] ? __brelse+0x14/0x2f [ 11.161960] ? _raw_spin_unlock_irqrestore+0x44/0x4f [ 11.161960] ext4_add_nondir+0x17/0x5b [ 11.161960] ext4_create+0xcf/0x133 [ 11.161960] ? ext4_mknod+0x12f/0x12f [ 11.161960] lookup_open+0x39e/0x3fb [ 11.161960] ? __wake_up+0x1a/0x40 [ 11.161960] ? lock_acquire+0x11e/0x18a [ 11.161960] path_openat+0x35c/0x67a [ 11.161960] ? sched_clock_cpu+0xd7/0xf2 [ 11.161960] do_filp_open+0x36/0x7c [ 11.161960] ? _raw_spin_unlock+0x22/0x2c [ 11.161960] ? __alloc_fd+0x169/0x173 [ 11.161960] do_sys_open+0x59/0xcc [ 11.161960] SyS_open+0x1d/0x1f [ 11.161960] do_int80_syscall_32+0x4f/0x61 [ 11.161960] entry_INT80_32+0x2f/0x2f [ 11.161960] EIP: 0xb76ad469 [ 11.161960] EFLAGS: 00000286 CPU: 0 [ 11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6 [ 11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0 [ 11.161960] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4eeedca as a prereq) Reported-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: add debug_want_extra_isize mount optionTheodore Ts'o2017-01-111-2/+7
| | | | | | | | | | | | | | | | In order to test the inode extra isize expansion code, it is useful to be able to easily create file systems that have inodes with extra isize values smaller than the current desired value. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: do not polute the extents cache while shifting extentsRoman Pen2017-01-091-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inside ext4_ext_shift_extents() function ext4_find_extent() is called without EXT4_EX_NOCACHE flag, which should prevent cache population. This leads to oudated offsets in the extents tree and wrong blocks afterwards. Patch fixes the problem providing EXT4_EX_NOCACHE flag for each ext4_find_extents() call inside ext4_ext_shift_extents function. Fixes: 331573febb6a2 Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org
| * ext4: Include forgotten start block on fallocate insert rangeRoman Pen2017-01-091-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While doing 'insert range' start block should be also shifted right. The bug can be easily reproduced by the following test: ptr = malloc(4096); assert(ptr); fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600); assert(fd >= 0); rc = fallocate(fd, 0, 0, 8192); assert(rc == 0); for (i = 0; i < 2048; i++) *((unsigned short *)ptr + i) = 0xbeef; rc = pwrite(fd, ptr, 4096, 0); assert(rc == 4096); rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); for (block = 2; block < 1000; block++) { rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096); assert(rc == 0); for (i = 0; i < 2048; i++) *((unsigned short *)ptr + i) = block; rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); } Because start block is not included in the range the hole appears at the wrong offset (just after the desired offset) and the following pwrite() overwrites already existent block, keeping hole untouched. Simple way to verify wrong behaviour is to check zeroed blocks after the test: $ hexdump ./ext4.file | grep '0000 0000' The root cause of the bug is a wrong range (start, stop], where start should be inclusive, i.e. [start, stop]. This patch fixes the problem by including start into the range. But not to break left shift (range collapse) stop points to the beginning of the a block, not to the end. The other not obvious change is an iterator check on validness in a main loop. Because iterator is unsigned the following corner case should be considered with care: insert a block at 0 offset, when stop variables overflows and never becomes less than start, which is 0. To handle this special case iterator is set to NULL to indicate that end of the loop is reached. Fixes: 331573febb6a2 Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org
| * Merge branch 'fscrypt' into dTheodore Ts'o2017-01-0918-301/+263
| |\
* | \ Merge tag 'fscrypt-for-linus' of ↵Linus Torvalds2017-02-2125-722/+659
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Various cleanups for the file system encryption feature" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: constify struct fscrypt_operations fscrypt: properly declare on-stack completion fscrypt: split supp and notsupp declarations into their own headers fscrypt: remove redundant assignment of res fscrypt: make fscrypt_operations.key_prefix a string fscrypt: remove unused 'mode' member of fscrypt_ctx ext4: don't allow encrypted operations without keys fscrypt: make test_dummy_encryption require a keyring key fscrypt: factor out bio specific functions fscrypt: pass up error codes from ->get_context() fscrypt: remove user-triggerable warning messages fscrypt: use EEXIST when file already uses different policy fscrypt: use ENOTDIR when setting encryption policy on nondirectory fscrypt: use ENOKEY when file cannot be created w/o key
| * | | fscrypt: constify struct fscrypt_operationsEric Biggers2017-02-085-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Richard Weinberger <richard@nod.at>
| * | | fscrypt: properly declare on-stack completionRichard Weinberger2017-02-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a completion is declared on-stack we have to use COMPLETION_INITIALIZER_ONSTACK(). Fixes: 0b81d07790726 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | fscrypt: split supp and notsupp declarations into their own headersEric Biggers2017-02-0710-421/+397
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, each filesystem configured without encryption support would define all the public fscrypt functions to their notsupp_* stubs. This list of #defines had to be updated in every filesystem whenever a change was made to the public fscrypt functions. To make things more maintainable now that we have three filesystems using fscrypt, split the old header fscrypto.h into several new headers. fscrypt_supp.h contains the real declarations and is included by filesystems when configured with encryption support, whereas fscrypt_notsupp.h contains the inline stubs and is included by filesystems when configured without encryption support. fscrypt_common.h contains common declarations needed by both. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | fscrypt: remove redundant assignment of resColin Ian King2017-02-071-1/+0
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | res is assigned to sizeof(ctx), however, this is unused and res is updated later on without that assigned value to res ever being used. Remove this redundant assignment. Fixes CoverityScan CID#1395546 "Unused value" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | fscrypt: make fscrypt_operations.key_prefix a stringEric Biggers2017-01-087-76/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was an unnecessary amount of complexity around requesting the filesystem-specific key prefix. It was unclear why; perhaps it was envisioned that different instances of the same filesystem type could use different key prefixes, or that key prefixes could be binary. However, neither of those things were implemented or really make sense at all. So simplify the code by making key_prefix a const char *. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | fscrypt: remove unused 'mode' member of fscrypt_ctxEric Biggers2017-01-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Nothing reads or writes fscrypt_ctx.mode, and it doesn't belong there because a fscrypt_ctx is not tied to a specific encryption mode. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | ext4: don't allow encrypted operations without keysTheodore Ts'o2017-01-081-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While we allow deletes without the key, the following should not be permitted: # cd /vdc/encrypted-dir-without-key # ls -l total 4 -rw-r--r-- 1 root root 0 Dec 27 22:35 6,LKNRJsp209FbXoSvJWzB -rw-r--r-- 1 root root 286 Dec 27 22:35 uRJ5vJh9gE7vcomYMqTAyD # mv uRJ5vJh9gE7vcomYMqTAyD 6,LKNRJsp209FbXoSvJWzB Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | fscrypt: make test_dummy_encryption require a keyring keyTheodore Ts'o2017-01-022-24/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the test_dummy_encryption ext4 mount option, which exists only to test encrypted I/O paths with xfstests, overrides all per-inode encryption keys with a fixed key. This change minimizes test_dummy_encryption-specific code path changes by supplying a fake context for directories which are not encrypted for use when creating new directories, files, or symlinks. This allows us to properly exercise the keyring lookup, derivation, and context inheritance code paths. Before mounting a file system using test_dummy_encryption, userspace must execute the following shell commands: mode='\x00\x00\x00\x00' raw="$(printf ""\\\\x%02x"" $(seq 0 63))" if lscpu | grep "Byte Order" | grep -q Little ; then size='\x40\x00\x00\x00' else size='\x00\x00\x00\x40' fi key="${mode}${raw}${size}" keyctl new_session echo -n -e "${key}" | keyctl padd logon fscrypt:4242424242424242 @s Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | fscrypt: factor out bio specific functionsRichard Weinberger2017-01-016-147/+184
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | That way we can get rid of the direct dependency on CONFIG_BLOCK. Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto") Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | fscrypt: pass up error codes from ->get_context()Eric Biggers2016-12-311-31/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was possible for the ->get_context() operation to fail with a specific error code, which was then not returned to the caller of FS_IOC_SET_ENCRYPTION_POLICY or FS_IOC_GET_ENCRYPTION_POLICY. Make sure to pass through these error codes. Also reorganize the code so that ->get_context() only needs to be called one time when setting an encryption policy, and handle contexts of unrecognized sizes more appropriately. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | fscrypt: remove user-triggerable warning messagesEric Biggers2016-12-311-13/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several warning messages were not rate limited and were user-triggerable from FS_IOC_SET_ENCRYPTION_POLICY. These shouldn't really have been there in the first place, but either way they aren't as useful now that the error codes have been improved. So just remove them. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | fscrypt: use EEXIST when file already uses different policyEric Biggers2016-12-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of an effort to clean up fscrypt-related error codes, make FS_IOC_SET_ENCRYPTION_POLICY fail with EEXIST when the file already uses a different encryption policy. This is more descriptive than EINVAL, which was ambiguous with some of the other error cases. I am not aware of any users who might be relying on the previous error code of EINVAL, which was never documented anywhere. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | fscrypt: use ENOTDIR when setting encryption policy on nondirectoryEric Biggers2016-12-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of an effort to clean up fscrypt-related error codes, make FS_IOC_SET_ENCRYPTION_POLICY fail with ENOTDIR when the file descriptor does not refer to a directory. This is more descriptive than EINVAL, which was ambiguous with some of the other error cases. I am not aware of any users who might be relying on the previous error code of EINVAL, which was never documented anywhere, and in some buggy kernels did not exist at all as the S_ISDIR() check was missing. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | fscrypt: use ENOKEY when file cannot be created w/o keyEric Biggers2016-12-315-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of an effort to clean up fscrypt-related error codes, make attempting to create a file in an encrypted directory that hasn't been "unlocked" fail with ENOKEY. Previously, several error codes were used for this case, including ENOENT, EACCES, and EPERM, and they were not consistent between and within filesystems. ENOKEY is a better choice because it expresses that the failure is due to lacking the encryption key. It also matches the error code returned when trying to open an encrypted regular file without the key. I am not aware of any users who might be relying on the previous inconsistent error codes, which were never documented anywhere. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | | Merge tag 'device-properties-4.11-rc1' of ↵Linus Torvalds2017-02-214-90/+177
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device property updates from Rafael J. Wysocki: "Generic device properties framework updates for v4.11-rc1 Allow built-in (static) device properties to be declared as constant, make it possible to save memory by discarding alternative (but unused) built-in (static) property sets and add support for automatic handling of built-in properties to the I2C code" * tag 'device-properties-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: i2c: allow specify device properties in i2c_board_info device property: export code duplicating array of property entries device property: constify property arrays values device property: allow to constify properties
| * | | i2c: allow specify device properties in i2c_board_infoDmitry Torokhov2017-02-082-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With many drivers converting to using generic device properties, it is useful to provide array of device properties when instantiating new i2c client via i2c_board_info and have them automatically added to the device in question. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | device property: export code duplicating array of property entriesDmitry Torokhov2017-02-072-65/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When augmenting ACPI-enumerated devices with additional property data based on DMI info, a module has often several potential property sets, with only one being active on a given box. In order to save memory it should be possible to mark everything and __initdata or __initconst, execute DMI match early, and duplicate relevant properties. Then kernel will discard the rest of them. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | device property: constify property arrays valuesDmitry Torokhov2017-02-072-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Data that is fed into property arrays should not be modified, so let's mark relevant pointers as const. This will allow us making source arrays as const/__initconst. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | device property: allow to constify propertiesDmitry Torokhov2017-02-072-21/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no reason why statically defined properties should be modifiable, so let's make device_add_properties() and the rest of pset_*() functions to take const pointers to properties. This will allow us to mark properties as const/__initconst at definition sites. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | Merge tag 'acpi-4.11-rc1' of ↵Linus Torvalds2017-02-21240-506/+838
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to upstream revision 20170119, which among other things updates copyright notices in all of the ACPICA files, fix a couple of issues in the ACPI EC and button drivers, fix modalias handling for non-discoverable devices with DT-compatible identification strings, add a suspend quirk for one platform and fix a message in the APEI code. Specifics: - Update of the ACPICA code in the kernel to upstream revision 20170119 including: + Fixes related to the handling of the bit width and bit offset fields in Generic Address Structure (Lv Zheng) + ACPI resources handling fix related to invalid resource descriptors (Bob Moore) + Fix to enable implicit result conversion for several ASL library functions (Bob Moore) + Support for method invocations as target operands in AML (Bob Moore) + Fix to use a correct operand type for DeRefOf() in some situations (Bob Moore) + Utilities updates (Bob Moore, Lv Zheng) + Disassembler/debugger updates (David Box, Lv Zheng) + Build fixes (Colin Ian King, Lv Zheng) + Update of copyright notices in all files (Bob Moore) - Fix for modalias handling for SPI and I2C devices with DT-compatible identification strings (Dan O'Donovan) - Fixes for the ACPI EC and button drivers (Lv Zheng) - ACPI processor handling fix related to CPU hotplug (online/offline) on x86 (Vitaly Kuznetsov) - Suspend quirk to save/restore NVS memory over S3 transitions for Lenovo G50-45 (Zhang Rui) - Message formatting fix for the ACPI APEI code (Colin Ian King)" * tag 'acpi-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) ACPICA: Update version to 20170119 ACPICA: Tools: Update common signon, remove compilation bit width ACPICA: Source tree: Update copyright notices to 2017 ACPICA: Linuxize: Restore and fix Intel compiler build x86/ACPI: keep x86_cpu_to_acpiid mapping valid on CPU hotplug spi: acpi: Initialize modalias from of_compatible i2c: acpi: Initialize info.type from of_compatible ACPI / bus: Introduce acpi_of_modalias() equiv of of_modalias_node() ACPI: save NVS memory for Lenovo G50-45 ACPI, APEI, EINJ: fix malformed newline escape ACPI / button: Remove lid_init_state=method mode ACPI / button: Change default behavior to lid_init_state=open ACPI / EC: Use busy polling mode when GPE is not enabled ACPI / EC: Remove old CLEAR_ON_RESUME quirk ACPICA: Update version to 20161222 ACPICA: Parser: Update parse info table for some operators ACPICA: Fix a problem with recent extra support for control method invocations ACPICA: Parser: Allow method invocations as target operands ACPICA: Fix for implicit result conversion for the ToXXX functions ACPICA: Resources: Not a valid resource if buffer length too long ..
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| *---. \ \ \ Merge branches 'acpi-ec', 'acpi-button' and 'acpi-apei'Rafael J. Wysocki2017-02-205-110/+38
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpi-ec: ACPI / EC: Use busy polling mode when GPE is not enabled ACPI / EC: Remove old CLEAR_ON_RESUME quirk * acpi-button: ACPI / button: Remove lid_init_state=method mode ACPI / button: Change default behavior to lid_init_state=open * acpi-apei: ACPI, APEI, EINJ: fix malformed newline escape