summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* .gitignore: prefix local generated files with a slashMasahiro Yamada2021-05-0127-54/+56
| | | | | | | | | | | | The pattern prefixed with '/' matches files in the same directory, but not ones in sub-directories. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Andra Paraschiv <andraprs@amazon.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Gabriel Krisman Bertazi <krisman@collabora.com>
* kbuild: replace LANG=C with LC_ALL=CMasahiro Yamada2021-05-017-7/+7
| | | | | | | | | | | | | | | | | | | | | LANG gives a weak default to each LC_* in case it is not explicitly defined. LC_ALL, if set, overrides all other LC_* variables. LANG < LC_CTYPE, LC_COLLATE, LC_MONETARY, LC_NUMERIC, ... < LC_ALL This is why documentation such as [1] suggests to set LC_ALL in build scripts to get the deterministic result. LANG=C is not strong enough to override LC_* that may be set by end users. [1]: https://reproducible-builds.org/docs/locales/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Matthias Maennich <maennich@google.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> (mptcp) Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Makefile: Move -Wno-unused-but-set-variable out of GCC only blockNathan Chancellor2021-05-011-4/+4
| | | | | | | | | | | | | | | Currently, -Wunused-but-set-variable is only supported by GCC so it is disabled unconditionally in a GCC only block (it is enabled with W=1). clang currently has its implementation for this warning in review so preemptively move this statement out of the GCC only block and wrap it with cc-disable-warning so that both compilers function the same. Cc: stable@vger.kernel.org Link: https://reviews.llvm.org/D100581 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* kbuild: add a script to remove stale generated filesMasahiro Yamada2021-05-013-8/+36
| | | | | | | | | | | | | | | | | | | | | | | | We maintain .gitignore and Makefiles so build artifacts are properly ignored by Git, and cleaned up by 'make clean'. However, the code is always changing; generated files are often moved to another directory, or removed when they become unnecessary. Such garbage files tend to be left over in the source tree because people usually git-pull without cleaning the tree. This is not only the noise for 'git status', but also a build issue in some cases. One solution is to remove a stale file like commit 223c24a7dba9 ("kbuild: Automatically remove stale <linux/version.h> file") did. Such workaround should be removed after a while, but we forget about that if we scatter the workaround code in random places. So, this commit adds a new script to collect cleanings of stale files. As a start point, move the code in arch/arm/boot/compressed/Makefile into this script. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* kbuild: update config_data.gz only when the content of .config is changedMasahiro Yamada2021-05-012-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the timestamp of the .config file is updated, config_data.gz is regenerated, then vmlinux is re-linked. This occurs even if the content of the .config has not changed at all. This issue was mitigated by commit 67424f61f813 ("kconfig: do not write .config if the content is the same"); Kconfig does not update the .config when it ends up with the identical configuration. The issue is remaining when the .config is created by *_defconfig with some config fragment(s) applied on top. This is typical for powerpc and mips, where several *_defconfig targets are constructed by using merge_config.sh. One workaround is to have the copy of the .config. The filechk rule updates the copy, kernel/config_data, by checking the content instead of the timestamp. With this commit, the second run with the same configuration avoids the needless rebuilds. $ make ARCH=mips defconfig all [ snip ] $ make ARCH=mips defconfig all *** Default configuration is based on target '32r2el_defconfig' Using ./arch/mips/configs/generic_defconfig as base Merging arch/mips/configs/generic/32r2.config Merging arch/mips/configs/generic/el.config Merging ./arch/mips/configs/generic/board-boston.config Merging ./arch/mips/configs/generic/board-ni169445.config Merging ./arch/mips/configs/generic/board-ocelot.config Merging ./arch/mips/configs/generic/board-ranchu.config Merging ./arch/mips/configs/generic/board-sead-3.config Merging ./arch/mips/configs/generic/board-xilfpga.config # # configuration written to .config # SYNC include/config/auto.conf CALL scripts/checksyscalls.sh CALL scripts/atomic/check-atomics.sh CHK include/generated/compile.h CHK include/generated/autoksyms.h Reported-by: Elliot Berman <eberman@codeaurora.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* .gitignore: ignore only top-level modules.builtinMasahiro Yamada2021-05-011-1/+1
| | | | | | | | | | | | | | | | modules.builtin used to be created in every directory. Since commit 8b41fc4454e3 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), modules.builtin is created only in the top directory. Add the '/' prefix so that it matches to only the modules.builtin located in the top directory. It has been more than one year since that change. I hope this will not flood 'Untracked files' of 'git status'. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* .gitignore: move tags and TAGS close to other tag filesMasahiro Yamada2021-05-011-2/+4
| | | | | | | | | | For consistency, move tags and TAGS close to the cscope and GNU Global patterns. I removed the '/' prefix in case somebody wants to manually create tag files in sub-directories. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* kernel/.gitgnore: remove stale timeconst.h and hz.bcMasahiro Yamada2021-05-011-2/+0
| | | | | | | | | | | | timeconst.h and hz.bc used to exist in kernel/. Commit 5cee96459726 ("time/timers: Move all time(r) related files into kernel/time") moved them to kernel/time/. Commit 0a227985d4a9 ("time: Move timeconst.h into include/generated") moved timeconst.h to include/generated/ and removed hz.bc . Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* usr/include: refactor .gitignoreMasahiro Yamada2021-05-011-3/+1
| | | | | | | | | | | The current .gitignore intends to ignore everything under usr/include/ except .gitignore and Makefile. A cleaner solution is to use a pattern suffixed with '/', which matches only directories. It works well here because all the exported headers are located in sub-directories, like <linux/*.h>, <asm/*.h>. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* genksyms: fix stale commentMasahiro Yamada2021-05-011-1/+1
| | | | | | | | | | (shipped source) is a stale comment. Since commit 833e62245943 ("genksyms: generate lexer and parser during build instead of shipping"), there is no source file to be shipped in this directory. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* kbuild: add comment about why cmd_shipped uses 'cat'Masahiro Yamada2021-05-011-0/+3
| | | | | | | | | | | | | | | | | | cmd_shipped uses 'cat' instead of 'cp' for copying a file. The reason is explained in the commit [1], but it was in the pre-git era. $ touch a $ chmod -w a $ cp a b $ cp a b cp: cannot create regular file 'b': Permission denied Add comments so that you can see the reason without looking into the history. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=a70dba8086160449cc94c5bdaff78419b6b8e3c8 Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* sparc: syscalls: switch to generic syscallshdr.shMasahiro Yamada2021-05-012-43/+4
| | | | | | | | Many architectures duplicate similar shell scripts. This commit converts sparc to use scripts/syscallhdr.sh. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* sparc: syscalls: switch to generic syscalltbl.shMasahiro Yamada2021-05-015-56/+12
| | | | | | | | | Many architectures duplicate similar shell scripts. This commit converts sparc to use scripts/syscalltbl.sh. This also unifies syscall_table_64.h and syscall_table_c32.h. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* sh: syscalls: switch to generic syscallhdr.shMasahiro Yamada2021-05-012-41/+2
| | | | | | | | Many architectures duplicate similar shell scripts. This commit converts sh to use scripts/syscallhdr.sh. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* sh: syscalls: switch to generic syscalltbl.shMasahiro Yamada2021-05-012-37/+2
| | | | | | | | Many architectures duplicate similar shell scripts. This commit converts sh to use scripts/syscalltbl.sh. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* Merge tag 'ext4_for_linus' of ↵Linus Torvalds2021-05-0126-427/+1144
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New features for ext4 this cycle include support for encrypted casefold, ensure that deleted file names are cleared in directory blocks by zeroing directory entries when they are unlinked or moved as part of a hash tree node split. We also improve the block allocator's performance on a freshly mounted file system by prefetching block bitmaps. There are also the usual cleanups and bug fixes, including fixing a page cache invalidation race when there is mixed buffered and direct I/O and the block size is less than page size, and allow the dax flag to be set and cleared on inline directories" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits) ext4: wipe ext4_dir_entry2 upon file deletion ext4: Fix occasional generic/418 failure fs: fix reporting supported extra file attributes for statx() ext4: allow the dax flag to be set and cleared on inline directories ext4: fix debug format string warning ext4: fix trailing whitespace ext4: fix various seppling typos ext4: fix error return code in ext4_fc_perform_commit() ext4: annotate data race in jbd2_journal_dirty_metadata() ext4: annotate data race in start_this_handle() ext4: fix ext4_error_err save negative errno into superblock ext4: fix error code in ext4_commit_super ext4: always panic when errors=panic is specified ext4: delete redundant uptodate check for buffer ext4: do not set SB_ACTIVE in ext4_orphan_cleanup() ext4: make prefetch_block_bitmaps default ext4: add proc files to monitor new structures ext4: improve cr 0 / cr 1 group scanning ext4: add MB_NUM_ORDERS macro ext4: add mballoc stats proc file ...
| * ext4: wipe ext4_dir_entry2 upon file deletionLeah Rumancik2021-04-221-2/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upon file deletion, zero out all fields in ext4_dir_entry2 besides rec_len. In case sensitive data is stored in filenames, this ensures no potentially sensitive data is left in the directory entry upon deletion. Also, wipe these fields upon moving a directory entry during the conversion to an htree and when splitting htree nodes. The data wiped may still exist in the journal, but there are future commits planned to address this. Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Link: https://lore.kernel.org/r/20210422180834.2242353-1-leah.rumancik@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: Fix occasional generic/418 failureJan Kara2021-04-221-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric has noticed that after pagecache read rework, generic/418 is occasionally failing for ext4 when blocksize < pagesize. In fact, the pagecache rework just made hard to hit race in ext4 more likely. The problem is that since ext4 conversion of direct IO writes to iomap framework (commit 378f32bab371), we update inode size after direct IO write only after invalidating page cache. Thus if buffered read sneaks at unfortunate moment like: CPU1 - write at offset 1k CPU2 - read from offset 0 iomap_dio_rw(..., IOMAP_DIO_FORCE_WAIT); ext4_readpage(); ext4_handle_inode_extension() the read will zero out tail of the page as it still sees smaller inode size and thus page cache becomes inconsistent with on-disk contents with all the consequences. Fix the problem by moving inode size update into end_io handler which gets called before the page cache is invalidated. Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com> Fixes: 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20210415155417.4734-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * fs: fix reporting supported extra file attributes for statx()Theodore Ts'o2021-04-181-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | statx(2) notes that any attribute that is not indicated as supported by stx_attributes_mask has no usable value. Commits 801e523796004 ("fs: move generic stat response attr handling to vfs_getattr_nosec") and 712b2698e4c02 ("fs/stat: Define DAX statx attribute") sets STATX_ATTR_AUTOMOUNT and STATX_ATTR_DAX, respectively, without setting stx_attributes_mask, which can cause xfstests generic/532 to fail. Fix this in the same way as commit 1b9598c8fb99 ("xfs: fix reporting supported extra file attributes for statx()") Fixes: 801e523796004 ("fs: move generic stat response attr handling to vfs_getattr_nosec") Fixes: 712b2698e4c02 ("fs/stat: Define DAX statx attribute") Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: allow the dax flag to be set and cleared on inline directoriesTheodore Ts'o2021-04-132-1/+8
| | | | | | | | | | | | | | | | | | | | This is needed to allow generic/607 to pass for file systems with the inline data_feature enabled, and it allows the use of file systems where the directories use inline_data, while the files are accessed via DAX. Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix debug format string warningArnd Bergmann2021-04-102-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using no_printk() for jbd_debug() revealed two warnings: fs/jbd2/recovery.c: In function 'fc_do_one_pass': fs/jbd2/recovery.c:256:30: error: format '%d' expects a matching 'int' argument [-Werror=format=] 256 | jbd_debug(3, "Processing fast commit blk with seq %d"); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/ext4/fast_commit.c: In function 'ext4_fc_replay_add_range': fs/ext4/fast_commit.c:1732:30: error: format '%d' expects argument of type 'int', but argument 2 has type 'long unsigned int' [-Werror=format=] 1732 | jbd_debug(1, "Converting from %d to %d %lld", The first one was added incorrectly, and was also missing a few newlines in debug output, and the second one happened when the type of an argument changed. Reported-by: kernel test robot <lkp@intel.com> Fixes: d556435156b7 ("jbd2: avoid -Wempty-body warnings") Fixes: 6db074618969 ("ext4: use BIT() macro for BH_** state bits") Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210409201211.1866633-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix trailing whitespaceJack Qiu2021-04-104-6/+6
| | | | | | | | | | | | | | | | | | | | Made suggested modifications from checkpatch in reference to ERROR: trailing whitespace Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20210409042035.15516-1-jack.qiu@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix various seppling typosBhaskar Chowdhury2021-04-108-10/+10
| | | | | | | | | | | | Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Link: https://lore.kernel.org/r/cover.1616840203.git.unixbhaskar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix error return code in ext4_fc_perform_commit()Xu Yihang2021-04-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | In case of if not ext4_fc_add_tlv branch, an error return code is missing. Cc: stable@kernel.org Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xu Yihang <xuyihang@huawei.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210408070033.123047-1-xuyihang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: annotate data race in jbd2_journal_dirty_metadata()Jan Kara2021-04-101-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Assertion checks in jbd2_journal_dirty_metadata() are known to be racy but we don't want to be grabbing locks just for them. We thus recheck them under b_state_lock only if it looks like they would fail. Annotate the checks with data_race(). Cc: stable@kernel.org Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: annotate data race in start_this_handle()Jan Kara2021-04-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | Access to journal->j_running_transaction is not protected by appropriate lock and thus is racy. We are well aware of that and the code handles the race properly. Just add a comment and data_race() annotation. Cc: stable@kernel.org Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix ext4_error_err save negative errno into superblockYe Bin2021-04-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Fix As write_mmp_block() so that it returns -EIO instead of 1, so that the correct error gets saved into the superblock. Cc: stable@kernel.org Fixes: 54d3adbc29f0 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()") Reported-by: Liu Zhi Qiang <liuzhiqiang26@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210406025331.148343-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix error code in ext4_commit_superFengnan Chang2021-04-101-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | We should set the error code when ext4_commit_super check argument failed. Found in code review. Fixes: c4be0c1dc4cdc ("filesystem freeze: add error handling of write_super_lockfs/unlockfs"). Cc: stable@kernel.org Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210402101631.561-1-changfengnan@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: always panic when errors=panic is specifiedYe Bin2021-04-101-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before commit 014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()"), the following series of commands would trigger a panic: 1. mount /dev/sda -o ro,errors=panic test 2. mount /dev/sda -o remount,abort test After commit 014c9caa29d3, remounting a file system using the test mount option "abort" will no longer trigger a panic. This commit will restore the behaviour immediately before commit 014c9caa29d3. (However, note that the Linux kernel's behavior has not been consistent; some previous kernel versions, including 5.4 and 4.19 similarly did not panic after using the mount option "abort".) This also makes a change to long-standing behaviour; namely, the following series commands will now cause a panic, when previously it did not: 1. mount /dev/sda -o ro,errors=panic test 2. echo test > /sys/fs/ext4/sda/trigger_fs_error However, this makes ext4's behaviour much more consistent, so this is a good thing. Cc: stable@kernel.org Fixes: 014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()") Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20210401081903.3421208-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: delete redundant uptodate check for bufferYang Guo2021-04-091-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The buffer uptodate state has been checked in function set_buffer_uptodate, there is no need use buffer_uptodate before calling set_buffer_uptodate and delete it. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Yang Guo <guoyang2@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/1617260610-29770-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()Zhang Yi2021-04-091-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_QUOTA is enabled, if we failed to mount the filesystem due to some error happens behind ext4_orphan_cleanup(), it will end up triggering a after free issue of super_block. The problem is that ext4_orphan_cleanup() will set SB_ACTIVE flag if CONFIG_QUOTA is enabled, after we cleanup the truncated inodes, the last iput() will put them into the lru list, and these inodes' pages may probably dirty and will be write back by the writeback thread, so it could be raced by freeing super_block in the error path of mount_bdev(). After check the setting of SB_ACTIVE flag in ext4_orphan_cleanup(), it was used to ensure updating the quota file properly, but evict inode and trash data immediately in the last iput does not affect the quotafile, so setting the SB_ACTIVE flag seems not required[1]. Fix this issue by just remove the SB_ACTIVE setting. [1] https://lore.kernel.org/linux-ext4/99cce8ca-e4a0-7301-840f-2ace67c551f3@huawei.com/T/#m04990cfbc4f44592421736b504afcc346b2a7c00 Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Tested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331033138.918975-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: make prefetch_block_bitmaps defaultHarshad Shirwadkar2021-04-092-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | Block bitmap prefetching is needed for these allocator optimization data structures to get populated and provide better group scanning order. So, turn it on bu default. prefetch_block_bitmaps mount option is now marked as removed and a new option no_prefetch_block_bitmaps is added to disable block bitmap prefetching. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-8-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: add proc files to monitor new structuresHarshad Shirwadkar2021-04-093-0/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new file "mb_structs_summary" which allows us to see the summary of the new allocator structures added in this series. Here's the sample output of file: optimize_scan: 1 max_free_order_lists: list_order_0_groups: 0 list_order_1_groups: 0 list_order_2_groups: 0 list_order_3_groups: 0 list_order_4_groups: 0 list_order_5_groups: 0 list_order_6_groups: 0 list_order_7_groups: 0 list_order_8_groups: 0 list_order_9_groups: 0 list_order_10_groups: 0 list_order_11_groups: 0 list_order_12_groups: 0 list_order_13_groups: 40 fragment_size_tree: tree_min: 16384 tree_max: 32768 tree_nodes: 40 Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-7-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: improve cr 0 / cr 1 group scanningHarshad Shirwadkar2021-04-095-15/+452
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of traversing through groups linearly, scan groups in specific orders at cr 0 and cr 1. At cr 0, we want to find groups that have the largest free order >= the order of the request. So, with this patch, we maintain lists for each possible order and insert each group into a list based on the largest free order in its buddy bitmap. During cr 0 allocation, we traverse these lists in the increasing order of largest free orders. This allows us to find a group with the best available cr 0 match in constant time. If nothing can be found, we fallback to cr 1 immediately. At CR1, the story is slightly different. We want to traverse in the order of increasing average fragment size. For CR1, we maintain a rb tree of groupinfos which is sorted by average fragment size. Instead of traversing linearly, at CR1, we traverse in the order of increasing average fragment size, starting at the most optimal group. This brings down cr 1 search complexity to log(num groups). For cr >= 2, we just perform the linear search as before. Also, in case of lock contention, we intermittently fallback to linear search even in CR 0 and CR 1 cases. This allows us to proceed during the allocation path even in case of high contention. There is an opportunity to do optimization at CR2 too. That's because at CR2 we only consider groups where bb_free counter (number of free blocks) is greater than the request extent size. That's left as future work. All the changes introduced in this patch are protected under a new mount option "mb_optimize_scan". With this patchset, following experiment was performed: Created a highly fragmented disk of size 65TB. The disk had no contiguous 2M regions. Following command was run consecutively for 3 times: time dd if=/dev/urandom of=file bs=2M count=10 Here are the results with and without cr 0/1 optimizations introduced in this patch: |---------+------------------------------+---------------------------| | | Without CR 0/1 Optimizations | With CR 0/1 Optimizations | |---------+------------------------------+---------------------------| | 1st run | 5m1.871s | 2m47.642s | | 2nd run | 2m28.390s | 0m0.611s | | 3rd run | 2m26.530s | 0m1.255s | |---------+------------------------------+---------------------------| Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210401172129.189766-6-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: add MB_NUM_ORDERS macroHarshad Shirwadkar2021-04-092-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | A few arrays in mballoc.c use the total number of valid orders as their size. Currently, this value is set as "sb->s_blocksize_bits + 2". This makes code harder to read. So, instead add a new macro MB_NUM_ORDERS(sb) to make the code more readable. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-5-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: add mballoc stats proc fileHarshad Shirwadkar2021-04-093-2/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new stats for measuring the performance of mballoc. This patch is forked from Artem Blagodarenko's work that can be found here: https://github.com/lustre/lustre-release/blob/master/ldiskfs/kernel_patches/patches/rhel8/ext4-simple-blockalloc.patch This patch reorganizes the stats by cr level. This is how the output looks like: mballoc: reqs: 0 success: 0 groups_scanned: 0 cr0_stats: hits: 0 groups_considered: 0 useless_loops: 0 bad_suggestions: 0 cr1_stats: hits: 0 groups_considered: 0 useless_loops: 0 bad_suggestions: 0 cr2_stats: hits: 0 groups_considered: 0 useless_loops: 0 cr3_stats: hits: 0 groups_considered: 0 useless_loops: 0 extents_scanned: 0 goal_hits: 0 2^n_hits: 0 breaks: 0 lost: 0 buddies_generated: 0/40 buddies_time_used: 0 preallocated: 0 discarded: 0 Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-4-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: add ability to return parsed options from parse_optionsHarshad Shirwadkar2021-04-091-21/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, the function parse_options() was returning journal_devnum and journal_ioprio variables to the caller. This patch generalizes that interface to allow parse_options to return any parsed options to return back to the caller. In this patch series, it gets used to capture the value of "mb_optimize_scan=%u" mount option. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-3-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: drop s_mb_bal_lock and convert protected fields to atomicHarshad Shirwadkar2021-04-092-11/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s_mb_buddies_generated gets used later in this patch series to determine if the cr 0 and cr 1 optimziations should be performed or not. Currently, s_mb_buddies_generated is protected under a spin_lock. In the allocation path, it is better if we don't depend on the lock and instead read the value atomically. In order to do that, we drop s_bal_lock altogether and we convert the only two protected fields by it s_mb_buddies_generated and s_mb_generation_time to atomic type. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix check to prevent false positive report of incorrect used inodesZhang Yi2021-04-091-16/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit <50122847007> ("ext4: fix check to prevent initializing reserved inodes") check the block group zero and prevent initializing reserved inodes. But in some special cases, the reserved inode may not all belong to the group zero, it may exist into the second group if we format filesystem below. mkfs.ext4 -b 4096 -g 8192 -N 1024 -I 4096 /dev/sda So, it will end up triggering a false positive report of a corrupted file system. This patch fix it by avoid check reserved inodes if no free inode blocks will be zeroed. Cc: stable@kernel.org Fixes: 50122847007 ("ext4: fix check to prevent initializing reserved inodes") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331121516.2243099-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * jbd2: avoid -Wempty-body warningsArnd Bergmann2021-04-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Building with 'make W=1' shows a harmless -Wempty-body warning: fs/jbd2/recovery.c: In function 'fc_do_one_pass': fs/jbd2/recovery.c:267:75: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 267 | jbd_debug(3, "Fast commit replay failed, err = %d\n", err); | ^ Change the empty dprintk() macros to no_printk(), which avoids this warning and adds format string checking. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210322102152.95684-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: optimize match for casefolded encrypted dirsDaniel Rosenberg2021-04-062-34/+38
| | | | | | | | | | | | | | | | | | | | Matching names with casefolded encrypting directories requires decrypting entries to confirm case since we are case preserving. We can avoid needing to decrypt if our hash values don't match. Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20210319073414.1381041-3-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: handle casefolding with encryptionDaniel Rosenberg2021-04-068-89/+287
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for encryption with casefolding. Since the name on disk is case preserving, and also encrypted, we can no longer just recompute the hash on the fly. Additionally, to avoid leaking extra information from the hash of the unencrypted name, we use siphash via an fscrypt v2 policy. The hash is stored at the end of the directory entry for all entries inside of an encrypted and casefolded directory apart from those that deal with '.' and '..'. This way, the change is backwards compatible with existing ext4 filesystems. [ Changed to advertise this feature via the file: /sys/fs/ext4/features/encrypted_casefold -- TYT ] Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210319073414.1381041-2-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: remove unnecessary braces in fs/ext4/dir.cMilan Djurovic2021-04-021-2/+2
| | | | | | | | | | | | | | | | Removes braces to follow the coding style. Signed-off-by: Milan Djurovic <mdjurovic@zohomail.com> Link: https://lore.kernel.org/r/20210316052953.67616-1-mdjurovic@zohomail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: delete some unused tracepoint definitionsEric Whitney2021-04-021-176/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A number of tracepoint instances have been removed from ext4 by past patches but the definitions of those tracepoints have not. All instances of ext4_ext_in_cache and ext4_ext_put_in_cache were removed by commit 69eb33dc24dc ("ext4: remove single extent cache"). ext4_get_reserved_cluster_alloc was removed by commit b6bf9171ef5c ("ext4: reduce reserved cluster count by number of allocated clusters"). ext4_find_delalloc_range was removed by commit 7d1b1fbc95eb ("ext4: reimplement ext4_find_delay_alloc_range on extent status tree"). All instances of ext4_direct_IO_enter and ext4_direct_IO_exit were removed by commit 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure"). Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20210216191634.20957-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * Updated locking documentation for transaction_tAlexander Lochmann2021-04-021-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | Some members of transaction_t are allowed to be read without any lock being held if accessed from the correct context. We used LockDoc's findings to determine those members. Each member of them is marked with a short comment: "no lock needed for jbd2 thread". Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de> Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210211171410.17984-1-alexander.lochmann@tu-dortmund.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: updated locking documentation for journal_tAlexander Lochmann2021-04-021-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some members of transaction_t are allowed to be read without any lock being held if consistency doesn't matter. Based on LockDoc's findings, we extended the locking documentation of those members. Each one of them is marked with a short comment: "no lock for quick racy checks". Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de> Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/ad82c7a9-a624-4ed5-5ada-a6410c44c0b3@tu-dortmund.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: use memcpy_to_page() in pagecache_write()Chaitanya Kulkarni2021-03-251-4/+1
| | | | | | | | | | | | Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210207190425.38107-7-chaitanya.kulkarni@wdc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: use memcpy_from_page() in pagecache_read()Chaitanya Kulkarni2021-03-251-4/+1
| | | | | | | | | | | | Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210207190425.38107-6-chaitanya.kulkarni@wdc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | Merge tag 'dlm-5.13' of ↵Linus Torvalds2021-05-019-142/+202
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This includes more dlm networking cleanups and improvements for making dlm shutdowns more robust" * tag 'dlm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: fix missing unlock on error in accept_from_sock() fs: dlm: add shutdown hook fs: dlm: flush swork on shutdown fs: dlm: remove unaligned memory access handling fs: dlm: check on minimum msglen size fs: dlm: simplify writequeue handling fs: dlm: use GFP_ZERO for page buffer fs: dlm: change allocation limits fs: dlm: add check if dlm is currently running fs: dlm: add errno handling to check callback fs: dlm: set subclass for othercon sock_mutex fs: dlm: set connected bit after accept fs: dlm: fix mark setting deadlock fs: dlm: fix debugfs dump
| * | fs: dlm: fix missing unlock on error in accept_from_sock()Yang Yingliang2021-03-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the missing unlock before return from accept_from_sock() in the error handling case. Fixes: 6cde210a9758 ("fs: dlm: add helper for init connection") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David Teigland <teigland@redhat.com>