summaryrefslogtreecommitdiffstats
path: root/fs/ext4/fast_commit.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* ext4: use handle to mark fc as ineligible in __track_dentry_update()Luis Henriques (SUSE)2024-10-041-8/+11
| | | | | | | | | | | | | | | Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result in a fast-commit being done before the filesystem is effectively marked as ineligible. This patch fixes the calls to this function in __track_dentry_update() by adding an extra parameter to the callback used in ext4_fc_track_template(). Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240923104909.18342-2-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: make some fast commit functions reuse extents pathBaokun Li2024-09-041-4/+7
| | | | | | | | | | | | | | | | | | | | | The ext4_find_extent() can update the extent path so that it does not have to allocate and free the path repeatedly, thus reducing the consumption of memory allocation and freeing in the following functions: ext4_ext_clear_bb ext4_ext_replay_set_iblocks ext4_fc_replay_add_range ext4_fc_set_bitmaps_and_counters No functional changes. Note that ext4_find_extent() does not support error pointers, so in this case set path to NULL first. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-25-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: get rid of ppath in ext4_ext_insert_extent()Baokun Li2024-09-041-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_ext_insert_extent(), the following is done here: * Free the extents path when an error is encountered. * Its caller needs to update ppath if it uses ppath. * Free path when npath is used, free npath when it is not used. * The got_allocated_blocks label in ext4_ext_map_blocks() does not update err now, so err is updated to 0 if the err returned by ext4_ext_search_right() is greater than 0 and is about to enter got_allocated_blocks. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-15-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix incorrect tid assumption in ext4_fc_mark_ineligible()Luis Henriques (SUSE)2024-08-271-4/+11
| | | | | | | | | | | | | | | | Function jbd2_journal_shrink_checkpoint_list() assumes that '0' is not a valid value for transaction IDs, which is incorrect. Furthermore, the sbi->s_fc_ineligible_tid handling also makes the same assumption by being initialised to '0'. Fortunately, the sb flag EXT4_MF_FC_INELIGIBLE can be used to check whether sbi->s_fc_ineligible_tid has been previously set instead of comparing it with '0'. Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240724161119.13448-5-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: fix fast commit inode enqueueing during a full journal commitLuis Henriques (SUSE)2024-08-271-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | When a full journal commit is on-going, any fast commit has to be enqueued into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing is done only once, i.e. if an inode is already queued in a previous fast commit entry it won't be enqueued again. However, if a full commit starts _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to be done into FC_Q_STAGING. And this is not being done in function ext4_fc_track_template(). This patch fixes the issue by re-enqueuing an inode into the STAGING queue during the fast commit clean-up callback when doing a full commit. However, to prevent a race with a fast-commit, the clean-up callback has to be called with the journal locked. This bug was found using fstest generic/047. This test creates several 32k bytes files, sync'ing each of them after it's creation, and then shutting down the filesystem. Some data may be loss in this operation; for example a file may have it's size truncated to zero. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240717172220.14201-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: don't track ranges in fast_commit if inode has inlined dataLuis Henriques (SUSE)2024-07-091-0/+6
| | | | | | | | | | | | | | | | | | When fast-commit needs to track ranges, it has to handle inodes that have inlined data in a different way because ext4_fc_write_inode_data(), in the actual commit path, will attempt to map the required blocks for the range. However, inodes that have inlined data will have it's data stored in inode->i_block and, eventually, in the extended attribute space. Unfortunately, because fast commit doesn't currently support extended attributes, the solution is to mark this commit as ineligible. Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039883 Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Tested-by: Ben Hutchings <benh@debian.org> Fixes: 9725958bb75c ("ext4: fast commit may miss tracking unwritten range during ftruncate") Link: https://patch.msgid.link/20240618144312.17786-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix possible tid_t sequence overflowsLuis Henriques (SUSE)2024-07-091-4/+4
| | | | | | | | | | | | | In the fast commit code there are a few places where tid_t variables are being compared without taking into account the fact that these sequence numbers may wrap. Fix this issue by using the helper functions tid_gt() and tid_geq(). Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://patch.msgid.link/20240529092030.9557-3-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: make state in ext4_mb_mark_bb to be boolKemeng Shi2023-10-061-4/+4
| | | | | | | | | As state could only be either 0 or 1, just make it bool. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230928160407.142069-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use ext4_fc_tl_mem in fast-commit replay pathEric Biggers2023-02-091-18/+26
| | | | | | | | | | | | | To avoid 'sparse' warnings about missing endianness conversions, don't store native endianness values into struct ext4_fc_tl. Instead, use a separate struct type, ext4_fc_tl_mem. Fixes: dcc5827484d6 ("ext4: factor out ext4_fc_get_tl()") Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221217050212.150665-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: switch jbd2_submit_inode_data() to use fs-provided hook for data writeoutJan Kara2022-12-091-1/+1
| | | | | | | | | | | | jbd2_submit_inode_data() hardcoded use of jbd2_journal_submit_inode_data_buffers() for submission of data pages. Make it use j_submit_inode_data_buffers hook instead. This effectively switches ext4 fastcommits to use ext4_writepages() for data writeout instead of generic_writepages(). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221207112722.22220-9-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: simplify fast-commit CRC calculationEric Biggers2022-12-091-38/+16
| | | | | | | | | | Instead of checksumming each field as it is added to the block, just checksum each block before it is written. This is simpler, and also much more efficient. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221106224841.279231-8-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix off-by-one errors in fast-commit block fillingEric Biggers2022-12-091-33/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to several different off-by-one errors, or perhaps due to a late change in design that wasn't fully reflected in the code that was actually merged, there are several very strange constraints on how fast-commit blocks are filled with tlv entries: - tlvs must start at least 10 bytes before the end of the block, even though the minimum tlv length is 8. Otherwise, the replay code will ignore them. (BUG: ext4_fc_reserve_space() could violate this requirement if called with a len of blocksize - 9 or blocksize - 8. Fortunately, this doesn't seem to happen currently.) - tlvs must end at least 1 byte before the end of the block. Otherwise the replay code will consider them to be invalid. This quirk contributed to a bug (fixed by an earlier commit) where uninitialized memory was being leaked to disk in the last byte of blocks. Also, strangely these constraints don't apply to the replay code in e2fsprogs, which will accept any tlvs in the blocks (with no bounds checks at all, but that is a separate issue...). Given that this all seems to be a bug, let's fix it by just filling blocks with tlv entries in the natural way. Note that old kernels will be unable to replay fast-commit journals created by kernels that have this commit. Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221106224841.279231-7-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix unaligned memory access in ext4_fc_reserve_space()Eric Biggers2022-12-091-18/+21
| | | | | | | | | | | | As is done elsewhere in the file, build the struct ext4_fc_tl on the stack and memcpy() it into the buffer, rather than directly writing it to a potentially-unaligned location in the buffer. Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221106224841.279231-6-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add missing validation of fast-commit record lengthsEric Biggers2022-12-091-19/+19
| | | | | | | | | | | | Validate the inode and filename lengths in fast-commit journal records so that a malicious fast-commit journal cannot cause a crash by having invalid values for these. Also validate EXT4_FC_TAG_DEL_RANGE. Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221106224841.279231-5-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix leaking uninitialized memory in fast-commit journalEric Biggers2022-12-091-0/+5
| | | | | | | | | | | When space at the end of fast-commit journal blocks is unused, make sure to zero it out so that uninitialized memory is not leaked to disk. Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221106224841.279231-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: don't set up encryption key during jbd2 transactionEric Biggers2022-12-091-1/+1
| | | | | | | | | | | | | | | | | Commit a80f7fcf1867 ("ext4: fixup ext4_fc_track_* functions' signature") extended the scope of the transaction in ext4_unlink() too far, making it include the call to ext4_find_entry(). However, ext4_find_entry() can deadlock when called from within a transaction because it may need to set up the directory's encryption key. Fix this by restoring the transaction to its original scope. Reported-by: syzbot+1a748d0007eeac3ab079@syzkaller.appspotmail.com Fixes: a80f7fcf1867 ("ext4: fixup ext4_fc_track_* functions' signature") Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221106224841.279231-3-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: disable fast-commit of encrypted dir operationsEric Biggers2022-12-091-16/+25
| | | | | | | | | | | | | | | | | | | | fast-commit of create, link, and unlink operations in encrypted directories is completely broken because the unencrypted filenames are being written to the fast-commit journal instead of the encrypted filenames. These operations can't be replayed, as encryption keys aren't present at journal replay time. It is also an information leak. Until if/when we can get this working properly, make encrypted directory operations ineligible for fast-commit. Note that fast-commit operations on encrypted regular files continue to be allowed, as they seem to work. Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221106224841.279231-2-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix fortify warning in fs/ext4/fast_commit.c:1551Theodore Ts'o2022-11-061-2/+3
| | | | | | | | | | | With the new fortify string system, rework the memcpy to avoid this warning: memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4) Cc: stable@kernel.org Fixes: 54d9469bc515 ("fortify: Add run-time WARN for cross-field memcpy()") Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix potential out of bound read in ext4_fc_replay_scan()Ye Bin2022-10-011-2/+36
| | | | | | | | | | | | | | For scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN space. If remain space less than EXT4_FC_TAG_BASE_LEN which will lead to out of bound read when mounting corrupt file system image. ADD_RANGE/HEAD/TAIL is needed to add extra check when do journal scan, as this three tags will read data during scan, tag length couldn't less than data length which will read. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: factor out ext4_fc_get_tl()Ye Bin2022-10-011-21/+25
| | | | | | | | Factor out ext4_fc_get_tl() to fill 'tl' with host byte order. Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20220924075233.2315259-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: introduce EXT4_FC_TAG_BASE_LEN helperYe Bin2022-10-011-26/+28
| | | | | | | | | Introduce EXT4_FC_TAG_BASE_LEN helper for calculate length of struct ext4_fc_tl. Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20220924075233.2315259-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: factor out ext4_free_ext_path()Ye Bin2022-10-011-4/+2
| | | | | | | | | | Factor out ext4_free_ext_path() to free extent path. As after previous patch 'ext4_ext_drop_refs()' is only used in 'extents.c', so make it static. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220924021211.3831551-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: update 'state->fc_regions_size' after successful memory allocationYe Bin2022-10-011-4/+5
| | | | | | | | | | | | To avoid to 'state->fc_regions_size' mismatch with 'state->fc_regions' when fail to reallocate 'fc_reqions',only update 'state->fc_regions_size' after 'state->fc_regions' is allocated successfully. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220921064040.3693255-4-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix potential memory leak in ext4_fc_record_regions()Ye Bin2022-10-011-6/+8
| | | | | | | | | | | | As krealloc may return NULL, in this case 'state->fc_regions' may not be freed by krealloc, but 'state->fc_regions' already set NULL. Then will lead to 'state->fc_regions' memory leak. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220921064040.3693255-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix potential memory leak in ext4_fc_record_modified_inode()Ye Bin2022-10-011-3/+5
| | | | | | | | | | | | As krealloc may return NULL, in this case 'state->fc_modified_inodes' may not be freed by krealloc, but 'state->fc_modified_inodes' already set NULL. Then will lead to 'state->fc_modified_inodes' memory leak. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220921064040.3693255-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: adjust fast commit disable judgement order in ext4_fc_track_inodeYe Bin2022-10-011-3/+3
| | | | | | | | | | | If fastcommit is already disabled, there isn't need to mark inode ineligible. So move 'ext4_fc_disabled()' judgement bofore 'ext4_should_journal_data(inode)' judgement which can avoid to do meaningless judgement. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220916083836.388347-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: factor out ext4_fc_disabled()Ye Bin2022-10-011-23/+15
| | | | | | | | | Factor out ext4_fc_disabled(). No functional change. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220916083836.388347-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix miss release buffer head in ext4_fc_write_inodeYe Bin2022-10-011-6/+9
| | | | | | | | | | | | In 'ext4_fc_write_inode' function first call 'ext4_get_inode_loc' get 'iloc', after use it miss release 'iloc.bh'. So just release 'iloc.bh' before 'ext4_fc_write_inode' return. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220914100859.1415196-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* Merge tag 'ext4_for_linus' of ↵Linus Torvalds2022-08-051-22/+22
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Add new ioctls to set and get the file system UUID in the ext4 superblock and improved the performance of the online resizing of file systems with bigalloc enabled. Fixed a lot of bugs, in particular for the inline data feature, potential races when creating and deleting inodes with shared extended attribute blocks, and the handling of directory blocks which are corrupted" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits) ext4: add ioctls to get/set the ext4 superblock uuid ext4: avoid resizing to a partial cluster size ext4: reduce computation of overhead during resize jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted ext4: block range must be validated before use in ext4_mb_clear_bb() mbcache: automatically delete entries from cache on freeing mbcache: Remove mb_cache_entry_delete() ext2: avoid deleting xattr block that is being reused ext2: unindent codeblock in ext2_xattr_set() ext2: factor our freeing of xattr block reference ext4: fix race when reusing xattr blocks ext4: unindent codeblock in ext4_xattr_block_set() ext4: remove EA inode entry from mbcache on inode eviction mbcache: add functions to delete entry if unused mbcache: don't reclaim used entries ext4: make sure ext4_append() always allocates new block ext4: check if directory block is within i_size ext4: reflect mb_optimize_scan value in options file ext4: avoid remove directory when directory is corrupted ext4: aligned '*' in comments ...
| * ext4: use ext4_debug() instead of jbd_debug()Jan Kara2022-08-031-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We use jbd_debug() in some places in ext4. It seems a bit strange to use jbd2 debugging output function for ext4 code. Also these days ext4_debug() uses dynamic printk so each debug message can be enabled / disabled on its own so the time when it made some sense to have these combined (to allow easier common selecting of messages to report) has passed. Just convert all jbd_debug() uses in ext4 to ext4_debug(). Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | fs/ext4: Use the new blk_opf_t typeBart Van Assche2022-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Improve static type checking by using the new blk_opf_t type for variables that represent request flags. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Baokun Li <libaokun1@huawei.com> Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-52-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | fs/buffer: Combine two submit_bh() and ll_rw_block() argumentsBart Van Assche2022-07-141-1/+1
|/ | | | | | | | | | | | | | Both submit_bh() and ll_rw_block() accept a request operation type and request flags as their first two arguments. Micro-optimize these two functions by combining these first two arguments into a single argument. This patch does not change the behavior of any of the modified code. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Acked-by: Song Liu <song@kernel.org> (for the md changes) Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* ext4: remove unnecessary conditionalsLv Ruyi2022-05-131-2/+1
| | | | | | | | | | iput() has already handled null and non-null parameter, so it is no need to use if(). Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Link: https://lore.kernel.org/r/20220411032337.2517465-1-lv.ruyi@zte.com.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: remove unnecessary type castingsYu Zhe2022-05-111-5/+5
| | | | | | | | remove unnecessary void* type castings. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Link: https://lore.kernel.org/r/20220401081321.73735-1-yuzhe@nfschina.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add commit tid info in ext4_fc_commit_start/stop trace eventsRitesh Harjani2022-03-151-2/+2
| | | | | | | | | | | | | | | | | | This adds commit_tid info in ext4_fc_commit_start/stop which is helpful in debugging fast_commit issues. For e.g. issues where due to jbd2 journal full commit, FC miss to commit updates to a file. Also improves TP_prink format string i.e. all ext4 and jbd2 trace events starts with "dev MAjOR,MINOR". Let's follow the same convention while we are still at it. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/ebcd6b9ab5b718db30f90854497886801ce38c63.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add commit_tid info in jbd debug logRitesh Harjani2022-03-151-6/+9
| | | | | | | | | | | | | This adds commit_tid argument in ext4_fc_update_stats() so that we can add this information too in jbd_debug logs. This is also required in a later patch to pass the commit_tid info in ext4_fc_commit_start/stop() trace events. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/dabda3f2919a60e01887e798bf5915216b451733.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add transaction tid info in fc_track eventsRitesh Harjani2022-03-151-5/+5
| | | | | | | | | | | | This patch adds the transaction & inode tid info in trace events for callers of ext4_fc_track_template(). This is helpful in debugging race conditions where an inode could belong to two different transaction tids. It also fixes the checkpatch warnings which says use tabs instead of spaces. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/c203c09dc11bb372803c430f621f25a4b8c2c8b4.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add new trace event in ext4_fc_cleanupRitesh Harjani2022-03-151-0/+1
| | | | | | | | | This adds a new trace event in ext4_fc_cleanup() which is helpful in debugging some fast_commit issues. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/794cdb1d5d3622f3f80d30c222ff6652ea68c375.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: return early for non-eligible fast_commit track eventsRitesh Harjani2022-03-151-10/+49
| | | | | | | | | | | | | | Currently ext4_fc_track_template() checks, whether the trace event path belongs to replay or does sb has ineligible set, if yes it simply returns. This patch pulls those checks before calling ext4_fc_track_template() in the callers of ext4_fc_track_template(). [ Add checks to ext4_rename() which calls the __ext4_fc_track_*() functions directly. -- TYT ] Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/3cd025d9c490218a92e6d8fb30b6123e693373e3.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: do not call FC trace event in ext4_fc_commit() if FS does not support FCRitesh Harjani2022-03-131-3/+3
| | | | | | | | | | | | This just puts trace_ext4_fc_commit_start(sb) & ktime_get() for measuring FC commit time, after the check of whether sb supports JOURNAL_FAST_COMMIT or not. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/d53cf3e535924ec0a1eb41a560e96561b0727e7a.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: improve fast_commit performance and scalabilityRitesh Harjani2022-03-031-18/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently ext4_fc_commit_dentry_updates() is of quadratic time complexity, which is causing performance bottlenecks with high threads/file/dir count with fs_mark. This patch makes commit dentry updates (and hence ext4_fc_commit()) path to linear time complexity. Hence improves the performance of workloads which does fsync on multiple threads/open files one-by-one. Absolute numbers in avg file creates per sec (from fs_mark in 1K order) ======================================================================= no. Order without-patch(K) with-patch(K) Diff(%) 1 1 16.90 17.51 +3.60 2 2,2 32.08 31.80 -0.87 3 3,3 53.97 55.01 +1.92 4 4,4 78.94 76.90 -2.58 5 5,5 95.82 95.37 -0.46 6 6,6 87.92 103.38 +17.58 7 6,10 0.73 126.13 +17178.08 8 6,14 2.33 143.19 +6045.49 workload type ============== For e.g. 7th row order of 6,10 (2^6 == 64 && 2^10 == 1024) echo /run/riteshh/mnt/{1..64} |sed -E 's/[[:space:]]+/ -d /g' \ | xargs -I {} bash -c "sudo fs_mark -L 100 -D 1024 -n 1024 -s0 -S5 -d {}" Perf profile (w/o patches) ============================= 87.15% [kernel] [k] ext4_fc_commit --> Heavy contention/bottleneck 1.98% [kernel] [k] perf_event_interrupt 0.96% [kernel] [k] power_pmu_enable 0.91% [kernel] [k] update_sd_lb_stats.constprop.0 0.67% [kernel] [k] ktime_get Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/930f35d4fd5f83e2673c868781d9ebf15e91bf4e.1645426817.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use in_range() for range checking in ext4_fc_replay_check_excludedRitesh Harjani2022-02-261-2/+2
| | | | | | | | | Instead of open coding it, use in_range() function instead. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/8e5526ef14150778871ac7c937c8993c6a09cd3e.1644992610.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix incorrect type issue during replay_del_rangeXin Yin2022-02-031-2/+3
| | | | | | | | | | | | should not use fast commit log data directly, add le32_to_cpu(). Reported-by: kernel test robot <lkp@intel.com> Fixes: 0b5b5a62b945 ("ext4: use ext4_ext_remove_space() for fast commit replay delete range") Cc: stable@kernel.org Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20220126063146.2302-1-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix error handling in ext4_fc_record_modified_inode()Ritesh Harjani2022-02-031-35/+29
| | | | | | | | | | | | | | | | | Current code does not fully takes care of krealloc() error case, which could lead to silent memory corruption or a kernel bug. This patch fixes that. Also it cleans up some duplicated error handling logic from various functions in fast_commit.c file. Reported-by: luo penghao <luo.penghao@zte.com.cn> Suggested-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/62e8b6a1cce9359682051deb736a3c0953c9d1e9.1642416995.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: fast commit may miss file actionsXin Yin2022-02-031-5/+6
| | | | | | | | | | | | | | | | | | | | | | | in the follow scenario: 1. jbd start transaction n 2. task A get new handle for transaction n+1 3. task A do some actions and add inode to FC_Q_MAIN fc_q 4. jbd complete transaction n and clear FC_Q_MAIN fc_q 5. task A call fsync Fast commit will lost the file actions during a full commit. we should also add updates to staging queue during a full commit. and in ext4_fc_cleanup(), when reset a inode's fc track range, check it's i_sync_tid, if it bigger than current transaction tid, do not rest it, or we will lost the track range. And EXT4_MF_FC_COMMITTING is not needed anymore, so drop it. Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Link: https://lore.kernel.org/r/20220117093655.35160-3-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: fast commit may not fallback for ineligible commitXin Yin2022-02-031-8/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | For the follow scenario: 1. jbd start commit transaction n 2. task A get new handle for transaction n+1 3. task A do some ineligible actions and mark FC_INELIGIBLE 4. jbd complete transaction n and clean FC_INELIGIBLE 5. task A call fsync In this case fast commit will not fallback to full commit and transaction n+1 also not handled by jbd. Make ext4_fc_mark_ineligible() also record transaction tid for latest ineligible case, when call ext4_fc_cleanup() check current transaction tid, if small than latest ineligible tid do not clear the EXT4_MF_FC_INELIGIBLE. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Ritesh Harjani <riteshh@linux.ibm.com> Suggested-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Link: https://lore.kernel.org/r/20220117093655.35160-2-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: prevent used blocks from being allocated during fast commit replayXin Yin2022-02-031-5/+15
| | | | | | | | | | | | | | | During fast commit replay procedure, we clear inode blocks bitmap in ext4_ext_clear_bb(), this may cause ext4_mb_new_blocks_simple() allocate blocks still in use. Make ext4_fc_record_regions() also record physical disk regions used by inodes during replay procedure. Then ext4_mb_new_blocks_simple() can excludes these blocks in use. Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Link: https://lore.kernel.org/r/20220110035141.1980-2-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: remove redundant statementluo penghao2022-01-101-1/+0
| | | | | | | | | | | | | | | The local variable assignment at the end of the function is meaningless. The clang_analyzer complains as follows: fs/ext4/fast_commit.c:779:2 warning: Value stored to 'dst' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Link: https://lore.kernel.org/r/20211104063406.2747-1-luo.penghao@zte.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: destroy ext4_fc_dentry_cachep kmemcache on module removalSebastian Andrzej Siewior2022-01-101-0/+5
| | | | | | | | | | | | | | | | | The kmemcache for ext4_fc_dentry_cachep remains registered after module removal. Destroy ext4_fc_dentry_cachep kmemcache on module removal. Fixes: aa75f4d3daaeb ("ext4: main fast-commit commit path") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211110134640.lyku5vklvdndw6uk@linutronix.de Link: https://lore.kernel.org/r/YbiK3JetFFl08bd7@linutronix.de Link: https://lore.kernel.org/r/20211223164436.2628390-1-bigeasy@linutronix.de Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: use ext4_ext_remove_space() for fast commit replay delete rangeXin Yin2022-01-101-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | For now ,we use ext4_punch_hole() during fast commit replay delete range procedure. But it will be affected by inode->i_size, which may not correct during fast commit replay procedure. The following test will failed. -create & write foo (len 1000K) -falloc FALLOC_FL_ZERO_RANGE foo (range 400K - 600K) -create & fsync bar -falloc FALLOC_FL_PUNCH_HOLE foo (range 300K-500K) -fsync foo -crash before a full commit After the fast_commit reply procedure, the range 400K-500K will not be removed. Because in this case, when calling ext4_punch_hole() the inode->i_size is 0, and it just retruns with doing nothing. Change to use ext4_ext_remove_space() instead of ext4_punch_hole() to remove blocks of inode directly. Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223032337.5198-2-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org