summaryrefslogtreecommitdiffstats
path: root/fs/ext4 (follow)
Commit message (Collapse)AuthorAgeFilesLines
* ext4: handle errors on ext4_commit_superJaegeuk Kim2018-05-141-14/+21
| | | | | | | | | | | | | | | | | | | | | | | | | When remounting ext4 from ro to rw, currently it allows its transition, even if ext4_commit_super() returns EIO. Even worse thing is, after that, fs/buffer complains buffer dirty bits like: Call trace: [<ffffff9750c259dc>] mark_buffer_dirty+0x184/0x1a4 [<ffffff9750cb398c>] __ext4_handle_dirty_super+0x4c/0xfc [<ffffff9750c7a9fc>] ext4_file_open+0x154/0x1c0 [<ffffff9750bea51c>] do_dentry_open+0x114/0x2d0 [<ffffff9750bea75c>] vfs_open+0x5c/0x94 [<ffffff9750bf879c>] path_openat+0x668/0xfe8 [<ffffff9750bf8088>] do_filp_open+0x74/0x120 [<ffffff9750beac98>] do_sys_open+0x148/0x254 [<ffffff9750beade0>] SyS_openat+0x10/0x18 [<ffffff9750a83ab0>] el0_svc_naked+0x24/0x28 EXT4-fs (dm-1): previous I/O error to superblock detected Buffer I/O error on dev dm-1, logical block 0, lost sync page write EXT4-fs (dm-1): re-mounted. Opts: (null) Buffer I/O error on dev dm-1, logical block 80, lost async page write Signed-off-by: Jaegeuk Kim <jaegeuk@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: do not update s_last_mounted of a frozen fsAmir Goldstein2018-05-141-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If fs is frozen after mount and before the first file open, the update of s_last_mounted bypasses freeze protection and prints out a WARNING splat: $ mount /vdf $ fsfreeze -f /vdf $ cat /vdf/foo [ 31.578555] WARNING: CPU: 1 PID: 1415 at fs/ext4/ext4_jbd2.c:53 ext4_journal_check_start+0x48/0x82 [ 31.614016] Call Trace: [ 31.614997] __ext4_journal_start_sb+0xe4/0x1a4 [ 31.616771] ? ext4_file_open+0xb6/0x189 [ 31.618094] ext4_file_open+0xb6/0x189 If fs is frozen, skip s_last_mounted update. [backport hint: to apply to stable tree, need to apply also patches vfs: add the sb_start_intwrite_trylock() helper ext4: factor out helper ext4_sample_last_mounted()] Cc: stable@vger.kernel.org Fixes: bc0b0d6d69ee ("ext4: update the s_last_mounted field in the superblock") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* ext4: factor out helper ext4_sample_last_mounted()Amir Goldstein2018-05-141-36/+46
| | | | | | Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* ext4: update mtime in ext4_punch_hole even if no blocks are releasedLukas Czerner2018-05-141-18/+18
| | | | | | | | | | | | | | Currently in ext4_punch_hole we're going to skip the mtime update if there are no actual blocks to release. However we've actually modified the file by zeroing the partial block so the mtime should be updated. Moreover the sync and datasync handling is skipped as well, which is also wrong. Fix it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Joe Habermann <joe.habermann@quantum.com> Cc: <stable@vger.kernel.org>
* ext4: add verifier check for symlink with append/immutable flagsLuis R. Rodriguez2018-05-131-0/+7
| | | | | | | | | | The Linux VFS does not allow a way to set append/immuttable attributes to symlinks, this is just not possible. If this is detected inform the user as the filesystem must be corrupted. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* fs: ext4: add new return type vm_fault_tSouptick Joarder2018-05-131-3/+4
| | | | | | | | | | | | | Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit 1c8f422059ae ("mm: change return type to vm_fault_t") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
* ext4: fix hole length detection in ext4_ind_map_blocks()Jan Kara2018-05-131-4/+10
| | | | | | | | | | | | | | | | | When ext4_ind_map_blocks() computes a length of a hole, it doesn't count with the fact that mapped offset may be somewhere in the middle of the completely empty subtree. In such case it will return too large length of the hole which then results in lseek(SEEK_DATA) to end up returning an incorrect offset beyond the end of the hole. Fix the problem by correctly taking offset within a subtree into account when computing a length of a hole. Fixes: facab4d9711e7aa3532cb82643803e8f1b9518e8 CC: stable@vger.kernel.org Reported-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: mark block bitmap corrupted when foundWang Shilong2018-05-122-0/+10
| | | | | | | | | | | | | | | | There are still some cases that we missed to set block bitmaps corrupted bit properly: 1) block bitmap number is wrong. 2) failed to read block bitmap due to disk errors. 3) double free block bitmaps.. 4) some mismatch check with bitmaps vs buddy information. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Wang Shilong <wshilong@ddn.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
* ext4: mark inode bitmap corrupted when foundWang Shilong2018-05-121-4/+9
| | | | | | | | | | | | | | | | | There are still some cases that we missed to set block bitmaps corrupted bit properly: 1)inode bitmap number is wrong. 2)failed to read block bitmap due to disk errors. 3)double allocations from bitmap Also remove a duplicated call ext4_error() afer ext4_read_inode_bitmap(), as ext4_error() have been called inside ext4_read_inode_bitmap() properly. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
* ext4: add new ext4_mark_group_bitmap_corrupted() helperWang Shilong2018-05-125-48/+52
| | | | | | | | | | Since there are many places to set inode/block bitmap corrupt bit, add a new helper for it, which will make codes more clear. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
* ext4: fix wrong return value in ext4_read_inode_bitmap()Wang Shilong2018-05-121-1/+1
| | | | | | | | | The only reason that sb_getblk() could fail is out of memory, ext4 codes have returned -ENOMME for all other places except this one, let's fix it here too. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use raw i_version value for ea_inodeEryu Guan2018-05-101-2/+22
| | | | | | | | | | | | | | | | | | | | | | Currently, creating large xattr (e.g. 2k) in ea_inode would cause ea_inode refcount corruption, e.g. Pass 4: Checking reference counts Extended attribute inode 13 ref count is 0, should be 1. Fix? no This is because that we save the lower 32bit of refcount in inode->i_version and store it in raw_inode->i_disk_version on disk. But since commit ee73f9a52a34 ("ext4: convert to new i_version API"), we load/store modified i_disk_version from/to disk instead of raw value, which causes on-disk ea_inode refcount corruption. Fix it by loading/storing raw i_version/i_disk_version, because it's a self-managed value in this case. Fixes: ee73f9a52a34 ("ext4: convert to new i_version API") Cc: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use XATTR_CREATE in ext4_initxattrs()Eryu Guan2018-05-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I hit ENOSPC error when creating new file in a newly created ext4 with ea_inode feature enabled, if selinux is enabled and ext4 is mounted without any selinux context. e.g. mkfs -t ext4 -O ea_inode -F /dev/sda5 mount /dev/sda5 /mnt/ext4 touch /mnt/ext4/testfile # got ENOSPC here It turns out that we run out of journal credits in ext4_xattr_set_handle() when creating new selinux label for the newly created inode. This is because that in __ext4_new_inode() we use __ext4_xattr_set_credits() to calculate the reserved credits for new xattr, with the 'is_create' argument being true, which implies less credits in the ea_inode case. But we calculate the required credits in ext4_xattr_set_handle() with 'is_create' being false, which means we need more credits if ea_inode feature is enabled. So we don't have enough credits and error out with ENOSPC. Fix it by simply calling ext4_xattr_set_handle() with XATTR_CREATE flag in ext4_initxattrs(), so we end up with requiring less credits than reserved. The semantic of XATTR_CREATE is "Perform a pure create, which fails if the named attribute exists already." (from setxattr(2)), which is fine in this case, because we only call ext4_initxattrs() on newly created inode. Fixes: af65207c76ce ("ext4: fix __ext4_new_inode() journal credits calculation") Cc: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: make function ‘ext4_getfsmap_find_fixed_metadata’ staticMathieu Malaterre2018-05-101-2/+2
| | | | | | | | | | Since function ‘ext4_getfsmap_find_fixed_metadata’ can be made static, make it so. Remove the following gcc warning (W=1): fs/ext4/fsmap.c:405:5: warning: no previous prototype for ‘ext4_getfsmap_find_fixed_metadata’ [-Wmissing-prototypes] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* Merge tag 'for_linus_stable' of ↵Linus Torvalds2018-04-293-9/+17
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix misc bugs and a regression for ext4" * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs ext4: fix bitmap position validation ext4: set h_journal if there is a failure starting a reserved handle ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS
| * ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfsTheodore Ts'o2018-04-261-0/+1
| | | | | | | | | | | | | | Fixes: a45403b51582 ("ext4: always initialize the crc32c checksum driver") Reported-by: François Valenduc <francoisvalenduc@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: fix bitmap position validationLukas Czerner2018-04-241-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently in ext4_valid_block_bitmap() we expect the bitmap to be positioned anywhere between 0 and s_blocksize clusters, but that's wrong because the bitmap can be placed anywhere in the block group. This causes false positives when validating bitmaps on perfectly valid file system layouts. Fix it by checking whether the bitmap is within the group boundary. The problem can be reproduced using the following mkfs -t ext3 -E stride=256 /dev/vdb1 mount /dev/vdb1 /mnt/test cd /mnt/test wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz tar xf linux-4.16.3.tar.xz This will result in the warnings in the logs EXT4-fs error (device vdb1): ext4_validate_block_bitmap:399: comm tar: bg 84: block 2774529: invalid block bitmap [ Changed slightly for clarity and to not drop a overflow test -- TYT ] Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Ilya Dryomov <idryomov@gmail.com> Fixes: 7dac4a1726a9 ("ext4: add validity checks for bitmap block numbers") Cc: stable@vger.kernel.org
| * ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKSEric Biggers2018-04-121-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During the "insert range" fallocate operation, extents starting at the range offset are shifted "right" (to a higher file offset) by the range length. But, as shown by syzbot, it's not validated that this doesn't cause extents to be shifted beyond EXT_MAX_BLOCKS. In that case ->ee_block can wrap around, corrupting the extent tree. Fix it by returning an error if the space between the end of the last extent and EXT4_MAX_BLOCKS is smaller than the range being inserted. This bug can be reproduced by running the following commands when the current directory is on an ext4 filesystem with a 4k block size: fallocate -l 8192 file fallocate --keep-size -o 0xfffffffe000 -l 4096 -n file fallocate --insert-range -l 8192 file Then after unmounting the filesystem, e2fsck reports corruption. Reported-by: syzbot+06c885be0edcdaeab40c@syzkaller.appspotmail.com Fixes: 331573febb6a ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds2018-04-101-11/+31
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
| * | ext4, dax: introduce ext4_dax_aopsDan Williams2018-03-301-11/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for the dax implementation to start associating dax pages to inodes via page->mapping, we need to provide a 'struct address_space_operations' instance for dax. Otherwise, direct-I/O triggers incorrect page cache assumptions and warnings. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | Merge branch 'work.misc' of ↵Linus Torvalds2018-04-061-2/+2
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff, including Christoph's I_DIRTY patches" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: move I_DIRTY_INODE to fs.h ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls fs: fold open_check_o_direct into do_dentry_open vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents vfs: make sure struct filename->iname is word-aligned get rid of pointless includes of fs_struct.h [poll] annotate SAA6588_CMD_POLL users
| * | fs: move I_DIRTY_INODE to fs.hChristoph Hellwig2018-03-281-2/+2
| |/ | | | | | | | | | | | | And use it in a few more places rather than opencoding the values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ext4: force revalidation of directory pointer after seekdir(2)Theodore Ts'o2018-04-021-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | A malicious user could force the directory pointer to be in an invalid spot by using seekdir(2). Use the mechanism we already have to notice if the directory has changed since the last time we called ext4_readdir() to force a revalidation of the pointer. Reported-by: syzbot+1236ce66f79263e8a862@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: add extra checks to ext4_xattr_block_get()Theodore Ts'o2018-03-312-7/+30
| | | | | | | | | | | | | | | | | | | | Add explicit checks in ext4_xattr_block_get() just in case the e_value_offs and e_value_size fields in the the xattr block are corrupted in memory after the buffer_verified bit is set on the xattr block. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* | ext4: add bounds checking to ext4_xattr_find_entry()Theodore Ts'o2018-03-311-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | Add some paranoia checks to make sure we don't stray beyond the end of the valid memory region containing ext4 xattr entries while we are scanning for a match. Also rename the function to xattr_find_entry() since it is static and thus only used in fs/ext4/xattr.c Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* | ext4: move call to ext4_error() into ext4_xattr_check_block()Theodore Ts'o2018-03-301-33/+27
| | | | | | | | | | | | | | | | | | | | Refactor the call to EXT4_ERROR_INODE() into ext4_xattr_check_block(). This simplifies the code, and fixes a problem where not all callers of ext4_xattr_check_block() were not resulting in ext4_error() getting called when the xattr block is corrupted. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: don't show data=<mode> option if defaultedTyson Nottingham2018-03-301-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, mount -l would show data=<mode> even if the ext4 default journaling mode was being used. Change this to be consistent with the rest of the options. Ext4 already did the right thing when the journaling mode being used matched the one specified in the superblock's default mount options. The reason it failed to do the right thing for the ext4 defaults is that, when set, they were never included in sbi->s_def_mount_opt (unlike the superblock's defaults, which were). Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: omit init_itable=n in procfs when disabledTyson Nottingham2018-03-301-1/+1
| | | | | | | | | | | | | | | | Don't show init_itable=n in /proc/fs/ext4/<dev>/options when filesystem is mounted with noinit_itable. Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: show more binary mount options in procfsTyson Nottingham2018-03-301-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, /proc/fs/ext4/<dev>/options would only show binary options if they were set (1 in the options bit mask). E.g. it would show "grpid" if it was set, but it would not show "nogrpid" if grpid was not set. This seems sensible, but when an option is absent from the file, it can be hard for the unfamiliar to know what is being used. E.g. if there isn't a (no)grpid entry, nogrpid is in effect. But if there isn't a (no)auto_da_alloc entry, auto_da_alloc is in effect. If there isn't a (minixdf|bsddf) entry, it turns out bsddf is in effect. It all depends on how the option is implemented. It's clearer to be explicit, so print the corresponding option regardless of whether it means a 1 or a 0 in the bit mask. Note that options which do not have an explicit disable option aren't indicated as being disabled even with this change (e.g. dax). Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: simplify kobject usageTyson Nottingham2018-03-301-33/+12
| | | | | | | | | | | | | | | | Replace kset with generic kobject provided by kobject_create_and_add(), since the latter is sufficient. Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: remove unused parameters in sysfs codeTyson Nottingham2018-03-301-15/+10
| | | | | | | | | | Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: null out kobject* during sysfs cleanupTyson Nottingham2018-03-301-0/+2
| | | | | | | | | | | | | | Make cleanup of ext4_feat kobject consistent with similar objects. Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: don't allow r/w mounts if metadata blocks overlap the superblockTheodore Ts'o2018-03-301-0/+6
| | | | | | | | | | | | | | | | | | | | If some metadata block, such as an allocation bitmap, overlaps the superblock, it's very likely that if the file system is mounted read/write, the results will not be pretty. So disallow r/w mounts for file systems corrupted in this particular way. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: always initialize the crc32c checksum driverTheodore Ts'o2018-03-301-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The extended attribute code now uses the crc32c checksum for hashing purposes, so we should just always always initialize it. We also want to prevent NULL pointer dereferences if one of the metadata checksum features is enabled after the file sytsem is originally mounted. This issue has been assigned CVE-2018-1094. https://bugzilla.kernel.org/show_bug.cgi?id=199183 https://bugzilla.redhat.com/show_bug.cgi?id=1560788 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: fail ext4_iget for root directory if unallocatedTheodore Ts'o2018-03-301-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the root directory has an i_links_count of zero, then when the file system is mounted, then when ext4_fill_super() notices the problem and tries to call iput() the root directory in the error return path, ext4_evict_inode() will try to free the inode on disk, before all of the file system structures are set up, and this will result in an OOPS caused by a NULL pointer dereference. This issue has been assigned CVE-2018-1092. https://bugzilla.kernel.org/show_bug.cgi?id=199179 https://bugzilla.redhat.com/show_bug.cgi?id=1560777 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: limit xattr size to INT_MAXEric Biggers2018-03-291-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4 isn't validating the sizes of xattrs where the value of the xattr is stored in an external inode. This is problematic because ->e_value_size is a u32, but ext4_xattr_get() returns an int. A very large size is misinterpreted as an error code, which ext4_get_acl() translates into a bogus ERR_PTR() for which IS_ERR() returns false, causing a crash. Fix this by validating that all xattrs are <= INT_MAX bytes. This issue has been assigned CVE-2018-1095. https://bugzilla.kernel.org/show_bug.cgi?id=199185 https://bugzilla.redhat.com/show_bug.cgi?id=1560793 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
* | ext4: add validity checks for bitmap block numbersTheodore Ts'o2018-03-272-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | An privileged attacker can cause a crash by mounting a crafted ext4 image which triggers a out-of-bounds read in the function ext4_valid_block_bitmap() in fs/ext4/balloc.c. This issue has been assigned CVE-2018-1093. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199181 BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1560782 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: fix comments in ext4_swap_extents()zhenwei.pi2018-03-261-2/+2
| | | | | | | | | | | | | | | | "mark_unwritten" in comment and "unwritten" in the function arguments is mismatched. Signed-off-by: zhenwei.pi <zhenwei.pi@youruncloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: use generic_writepages instead of __writepage/write_cache_pagesGoldwyn Rodrigues2018-03-261-14/+1
| | | | | | | | | | | | | | | | Code cleanup. Instead of writing an internal static function, use the available generic_writepages(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: don't complain about incorrect features when probingEric Sandeen2018-03-221-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If mount is auto-probing for filesystem type, it will try various filesystems in order, with the MS_SILENT flag set. We get that flag as the silent arg to ext4_fill_super. If we're probing (silent==1) then don't complain about feature incompatibilities that are found if it looks like it's actually a different valid extN type - failed probes should be silent in this case. If the on-disk features are unknown even to ext4, then complain. Reported-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com> Tested-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* | ext4: remove EXT4_STATE_DIOREAD_LOCK flagNikolay Borisov2018-03-226-54/+10
| | | | | | | | | | | | | | | | | | | | | | | | Commit 16c54688592c ("ext4: Allow parallel DIO reads") reworked the way locking happens around parallel dio reads. This resulted in obviating the need for EXT4_STATE_DIOREAD_LOCK flag and accompanying logic. Currently this amounts to dead code so let's remove it. No functional changes Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* | ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin()Jiri Slaby2018-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_iomap_begin() has a bug where offset returned in the iomap structure will be truncated to unsigned long size. On 64-bit architectures this is fine but on 32-bit architectures obviously not. Not many places actually use the offset stored in the iomap structure but one of visible failures is in SEEK_HOLE / SEEK_DATA implementation. If we create a file like: dd if=/dev/urandom of=file bs=1k seek=8m count=1 then lseek64("file", 0x100000000ULL, SEEK_DATA) wrongly returns 0x100000000 on unfixed kernel while it should return 0x200000000. Avoid the overflow by proper type cast. Fixes: 545052e9e35a ("ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org # v4.15
* | ext4: update i_disksize if direct write past ondisk sizeEryu Guan2018-03-221-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently in ext4 direct write path, we update i_disksize only when new eof is greater than i_size, and don't update it even when new eof is greater than i_disksize but less than i_size. This doesn't work well with delalloc buffer write, which updates i_size and i_disksize only when delalloc blocks are resolved (at writeback time), the i_disksize from direct write can be lost if a previous buffer write succeeded at write time but failed at writeback time, then results in corrupted ondisk inode size. Consider this case, first buffer write 4k data to a new file at offset 16k with delayed allocation, then direct write 4k data to the same file at offset 4k before delalloc blocks are resolved, which doesn't update i_disksize because it writes within i_size(20k), but the extent tree metadata has been committed in journal. Then writeback of the delalloc blocks fails (due to device error etc.), and i_size/i_disksize from buffer write can't be written to disk (still zero). A subsequent umount/mount cycle recovers journal and writes extent tree metadata from direct write to disk, but with i_disksize being zero. Fix it by updating i_disksize too in direct write path when new eof is greater than i_disksize but less than i_size, so i_disksize is always consistent with direct write. This fixes occasional i_size corruption in fstests generic/475. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: protect i_disksize update by i_data_sem in direct write pathEryu Guan2018-03-221-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | i_disksize update should be protected by i_data_sem, by either taking the lock explicitly or by using ext4_update_i_disksize() helper. But the i_disksize updates in ext4_direct_IO_write() are not protected at all, which may be racing with i_disksize updates in writeback path in delalloc buffer write path. This is found by code inspection, and I didn't hit any i_disksize corruption due to this bug. Thanks to Jan Kara for catching this bug and suggesting the fix! Reported-by: Jan Kara <jack@suse.cz> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: don't update checksum of new initialized bitmapsTheodore Ts'o2018-02-192-46/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reading the inode or block allocation bitmap, if the bitmap needs to be initialized, do not update the checksum in the block group descriptor. That's because we're not set up to journal those changes. Instead, just set the verified bit on the bitmap block, so that it's not necessary to validate the checksum. When a block or inode allocation actually happens, at that point the checksum will be calculated, and update of the bg descriptor block will be properly journalled. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: pass -ESHUTDOWN code to jbd2 layerTheodore Ts'o2018-02-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Previously the jbd2 layer assumed that a file system check would be required after a journal abort. In the case of the deliberate file system shutdown, this should not be necessary. Allow the jbd2 layer to distinguish between these two cases by using the ESHUTDOWN errno. Also add proper locking to __journal_abort_soft(). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: eliminate sleep from shutdown ioctlTheodore Ts'o2018-02-191-3/+1
| | | | | | | | | | | | | | | | | | The msleep() when processing EXT4_GOING_FLAGS_NOLOGFLUSH was a hack to avoid some races (that are now fixed), but in fact it introduced its own race. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: shutdown should not prevent get_write_accessTheodore Ts'o2018-02-191-7/+0
| | | | | | | | | | | | | | | | | | | | The ext4 forced shutdown flag needs to prevent new handles from being started, but it needs to allow existing handles to complete. So the forced shutdown flag should not force ext4_journal_get_write_access to fail. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: add tracepoints for shutdown and file system errorsTheodore Ts'o2018-02-192-0/+5
|/ | | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* Merge tag 'iversion-v4.16-2' of ↵Linus Torvalds2018-02-072-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull inode->i_version cleanup from Jeff Layton: "Goffredo went ahead and sent a patch to rename this function, and reverse its sense, as we discussed last week. The patch is very straightforward and I figure it's probably best to go ahead and merge this to get the API as settled as possible" * tag 'iversion-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw}