summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds2022-10-135-9/+54
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "A quiet round this time: several assorted filesystem fixes, the most noteworthy one being some additional wakeups in cap handling code, and a messenger cleanup" * tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-client: ceph: remove Sage's git tree from documentation ceph: fix incorrectly showing the .snap size for stat ceph: fail the open_by_handle_at() if the dentry is being unlinked ceph: increment i_version when doing a setattr with caps ceph: Use kcalloc for allocating multiple elements ceph: no need to wait for transition RDCACHE|RD -> RD ceph: fail the request if the peer MDS doesn't support getvxattr op ceph: wake up the waiters if any new caps comes libceph: drop last_piece flag from ceph_msg_data_cursor
| * ceph: fix incorrectly showing the .snap size for statXiubo Li2022-10-041-4/+23
| | | | | | | | | | | | | | | | | | We should set the 'stat->size' to the real number of snapshots for snapdirs. Link: https://tracker.ceph.com/issues/57342 Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: fail the open_by_handle_at() if the dentry is being unlinkedXiubo Li2022-10-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When unlinking a file the kclient will send a unlink request to MDS by holding the dentry reference, and then the MDS will return 2 replies, which are unsafe reply and a deferred safe reply. After the unsafe reply received the kernel will return and succeed the unlink request to user space apps. Only when the safe reply received the dentry's reference will be released. Or the dentry will only be unhashed from dcache. But when the open_by_handle_at() begins to open the unlinked files it will succeed. The inode->i_count couldn't be used to check whether the inode is opened or not. Link: https://tracker.ceph.com/issues/56524 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luís Henriques <lhenriques@suse.de> Tested-by: Luís Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: increment i_version when doing a setattr with capsJeff Layton2022-10-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | When the client has enough caps to satisfy a setattr locally without having to talk to the server, we currently do the setattr without incrementing the change attribute. Ensure that if the ctime changes locally, then the change attribute does too. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: Use kcalloc for allocating multiple elementsKenneth Lee2022-10-041-1/+1
| | | | | | | | | | | | | | | | | | Prefer using kcalloc(a, b) over kzalloc(a * b) as this improves semantics since kcalloc is intended for allocating an array of memory. Signed-off-by: Kenneth Lee <klee33@uw.edu> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: no need to wait for transition RDCACHE|RD -> RDXiubo Li2022-10-041-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | For write when trying to get the Fwb caps we need to keep waiting on transition from WRBUFFER|WR -> WR to avoid a new WR sync write from going before a prior buffered writeback happens. While for read there is no need to wait on transition from RDCACHE|RD -> RD, and we can just exclude the revoking caps and force to sync read. Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: fail the request if the peer MDS doesn't support getvxattr opXiubo Li2022-10-043-1/+17
| | | | | | | | | | | | | | | | | | Just fail the request instead sending the request out, or the peer MDS will crash. Link: https://tracker.ceph.com/issues/56529 Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: wake up the waiters if any new caps comesXiubo Li2022-10-041-0/+4
| | | | | | | | | | | | | | | | | | When new caps comes we need to wake up the waiters and also when revoking the caps, there also could be new caps comes. Link: https://tracker.ceph.com/issues/54044 Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | Merge tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2022-10-1317-36/+194
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "New Features: - Add NFSv4.2 xattr tracepoints - Replace xprtiod WQ in rpcrdma - Flexfiles cancels I/O on layout recall or revoke Bugfixes and Cleanups: - Directly use ida_alloc() / ida_free() - Don't open-code max_t() - Prefer using strscpy over strlcpy - Remove unused forward declarations - Always return layout states on flexfiles layout return - Have LISTXATTR treat NFS4ERR_NOXATTR as an empty reply instead of error - Allow more xprtrdma memory allocations to fail without triggering a reclaim - Various other xprtrdma clean ups - Fix rpc_killall_tasks() races" * tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits) NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked SUNRPC: Add API to force the client to disconnect SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls SUNRPC: Fix races with rpc_killall_tasks() xprtrdma: Fix uninitialized variable xprtrdma: Prevent memory allocations from driving a reclaim xprtrdma: Memory allocation should be allowed to fail during connect xprtrdma: MR-related memory allocation should be allowed to fail xprtrdma: Clean up synopsis of rpcrdma_regbuf_alloc() xprtrdma: Clean up synopsis of rpcrdma_req_create() svcrdma: Clean up RPCRDMA_DEF_GFP SUNRPC: Replace the use of the xprtiod WQ in rpcrdma NFSv4.2: Add a tracepoint for listxattr NFSv4.2: Add tracepoints for getxattr, setxattr, and removexattr NFSv4.2: Move TRACE_DEFINE_ENUM(NFS4_CONTENT_*) under CONFIG_NFS_V4_2 NFSv4.2: Add special handling for LISTXATTR receiving NFS4ERR_NOXATTR nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration NFSv4: remove nfs4_renewd_prepare_shutdown() declaration fs/nfs/pnfs_nfs.c: fix spelling typo and syntax error in comment NFSv4/pNFS: Always return layout stats on layout return for flexfiles ...
| * | NFSv4/flexfiles: Cancel I/O if the layout is recalled or revokedTrond Myklebust2022-10-063-5/+97
| | | | | | | | | | | | | | | | | | | | | | | | If the layout is recalled or revoked, we want to cancel I/O as quickly as possible so that we can return the layout. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4.2: Add a tracepoint for listxattrAnna Schumaker2022-10-052-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | This can be defined as simply an NFS4_INODE_EVENT() since we don't have the name of a specific xattr to list. This roughly matches readdir, which also uses an NFS4_INODE_EVENT() tracepoint. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4.2: Add tracepoints for getxattr, setxattr, and removexattrAnna Schumaker2022-10-052-0/+49
| | | | | | | | | | | | | | | | | | | | | These functions take similar arguments, and can share a tracepoint class for common formatting. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4.2: Move TRACE_DEFINE_ENUM(NFS4_CONTENT_*) under CONFIG_NFS_V4_2Anna Schumaker2022-10-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | NFS4_CONTENT_DATA and NFS4_CONTENT_HOLE both only exist under NFS v4.2. Move their corresponding TRACE_DEFINE_ENUM calls under this Kconfig option. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4.2: Add special handling for LISTXATTR receiving NFS4ERR_NOXATTRAnna Schumaker2022-10-051-0/+8
| | | | | | | | | | | | | | | | | | | | | We can translate this into an empty response list instead of passing an error up to userspace. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declarationGaosheng Cui2022-10-051-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfs_write_prepare() has been removed since commit a4cdda59111f ("NFS: Create a common pgio_rpc_prepare function"), so remove it. nfs_wait_atomic_killable() has been removed since commit 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API"), so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4: remove nfs4_renewd_prepare_shutdown() declarationGaosheng Cui2022-10-051-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | nfs4_renewd_prepare_shutdown() has been removed since commit 3050141bae57 ("NFSv4: Kill nfs4_renewd_prepare_shutdown()"), so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | fs/nfs/pnfs_nfs.c: fix spelling typo and syntax error in commentJiangshan Yi2022-10-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix spelling typo and syntax error in comment. Suggested-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4/pNFS: Always return layout stats on layout return for flexfilesTrond Myklebust2022-10-031-7/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to ensure that the server never misses the layout stats when we're closing the file, so that it knows whether or not to update its internal state. Otherwise, if we were racing with a layout stat, we might cause the server to invalidate its layout before the layout stat got processed. Fixes: 06946c6a3d8b ("pNFS/flexfiles: Only send layoutstats updates for mirrors that were updated") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: move from strlcpy with unused retval to strscpyWolfram Sang2022-10-032-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: clean up a needless assignment in nfs_file_write()Lukas Bulwahn2022-10-031-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 064109db53ec ("NFS: remove redundant code in nfs_file_write()") identifies that filemap_fdatawait_range() will always return 0 and removes a dead error-handling case in nfs_file_write(). With this change however, assigning the return of filemap_fdatawait_range() to the result variable is a dead store. Remove this needless assignment. No functional change. No change in object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | nfs: remove unnecessary (void*) conversions.yuzhe2022-10-034-7/+7
| | | | | | | | | | | | | | | | | | | | | remove unnecessary void* type castings. Signed-off-by: yuzhe <yuzhe@nfschina.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4: Directly use ida_alloc()/free()Bo Liu2022-10-031-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use ida_alloc()/ida_free() instead of ida_simple_get()/ida_simple_remove(). The latter is deprecated and more verbose. Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | Merge tag 'for-linus-6.1-ofs1' of ↵Linus Torvalds2022-10-131-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs update from Mike Marshall: "Change iterate to iterate_shared" * tag 'for-linus-6.1-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: Orangefs: change iterate to iterate_shared
| * | | Orangefs: change iterate to iterate_sharedMike Marshall2022-10-031-1/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | Changed .iterate to .iterate_shared in orangefs_dir_operations. I didn't change anything else, there were no xfstests regressions and no problem with any of my other tests... Signed-off-by: Mike Marshall <hubcap@omnibond.com>
* | | Merge tag 'mm-hotfixes-stable-2022-10-11' of ↵Linus Torvalds2022-10-122-5/+21
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "Five hotfixes - three for nilfs2, two for MM. For are cc:stable, one is not" * tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: fix leak of nilfs_root in case of writer thread creation failure nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level() nilfs2: fix use-after-free bug of struct nilfs_root mm/damon/core: initialize damon_target->list in damon_new_target() mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
| * | | nilfs2: fix leak of nilfs_root in case of writer thread creation failureRyusuke Konishi2022-10-121-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If nilfs_attach_log_writer() failed to create a log writer thread, it frees a data structure of the log writer without any cleanup. After commit e912a5b66837 ("nilfs2: use root object to get ifile"), this causes a leak of struct nilfs_root, which started to leak an ifile metadata inode and a kobject on that struct. In addition, if the kernel is booted with panic_on_warn, the above ifile metadata inode leak will cause the following panic when the nilfs2 kernel module is removed: kmem_cache_destroy nilfs2_inode_cache: Slab cache still has objects when called from nilfs_destroy_cachep+0x16/0x3a [nilfs2] WARNING: CPU: 8 PID: 1464 at mm/slab_common.c:494 kmem_cache_destroy+0x138/0x140 ... RIP: 0010:kmem_cache_destroy+0x138/0x140 Code: 00 20 00 00 e8 a9 55 d8 ff e9 76 ff ff ff 48 8b 53 60 48 c7 c6 20 70 65 86 48 c7 c7 d8 69 9c 86 48 8b 4c 24 28 e8 ef 71 c7 00 <0f> 0b e9 53 ff ff ff c3 48 81 ff ff 0f 00 00 77 03 31 c0 c3 53 48 ... Call Trace: <TASK> ? nilfs_palloc_freev.cold.24+0x58/0x58 [nilfs2] nilfs_destroy_cachep+0x16/0x3a [nilfs2] exit_nilfs_fs+0xa/0x1b [nilfs2] __x64_sys_delete_module+0x1d9/0x3a0 ? __sanitizer_cov_trace_pc+0x1a/0x50 ? syscall_trace_enter.isra.19+0x119/0x190 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd ... </TASK> Kernel panic - not syncing: panic_on_warn set ... This patch fixes these issues by calling nilfs_detach_log_writer() cleanup function if spawning the log writer thread fails. Link: https://lkml.kernel.org/r/20221007085226.57667-1-konishi.ryusuke@gmail.com Fixes: e912a5b66837 ("nilfs2: use root object to get ifile") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+7381dc4ad60658ca4c05@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()Ryusuke Konishi2022-10-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the i_mode field in inode of metadata files is corrupted on disk, it can cause the initialization of bmap structure, which should have been called from nilfs_read_inode_common(), not to be called. This causes a lockdep warning followed by a NULL pointer dereference at nilfs_bmap_lookup_at_level(). This patch fixes these issues by adding a missing sanitiy check for the i_mode field of metadata file's inode. Link: https://lkml.kernel.org/r/20221002030804.29978-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+2b32eb36c1a825b7a74c@syzkaller.appspotmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | nilfs2: fix use-after-free bug of struct nilfs_rootRyusuke Konishi2022-10-121-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the beginning of the inode bitmap area is corrupted on disk, an inode with the same inode number as the root inode can be allocated and fail soon after. In this case, the subsequent call to nilfs_clear_inode() on that bogus root inode will wrongly decrement the reference counter of struct nilfs_root, and this will erroneously free struct nilfs_root, causing kernel oopses. This fixes the problem by changing nilfs_new_inode() to skip reserved inode numbers while repairing the inode bitmap. Link: https://lkml.kernel.org/r/20221003150519.39789-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+b8c672b0e22615c80fe0@syzkaller.appspotmail.com Reported-by: Khalid Masum <khalid.masum.92@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | | | Merge tag 'mm-nonmm-stable-2022-10-11' of ↵Linus Torvalds2022-10-1230-175/+260
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - hfs and hfsplus kmap API modernization (Fabio Francesco) - make crash-kexec work properly when invoked from an NMI-time panic (Valentin Schneider) - ntfs bugfixes (Hawkins Jiawei) - improve IPC msg scalability by replacing atomic_t's with percpu counters (Jiebin Sun) - nilfs2 cleanups (Minghao Chi) - lots of other single patches all over the tree! * tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype proc: test how it holds up with mapping'less process mailmap: update Frank Rowand email address ia64: mca: use strscpy() is more robust and safer init/Kconfig: fix unmet direct dependencies ia64: update config files nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure fork: remove duplicate included header files init/main.c: remove unnecessary (void*) conversions proc: mark more files as permanent nilfs2: remove the unneeded result variable nilfs2: delete unnecessary checks before brelse() checkpatch: warn for non-standard fixes tag style usr/gen_init_cpio.c: remove unnecessary -1 values from int file ipc/msg: mitigate the lock contention with percpu counter percpu: add percpu_counter_add_local and percpu_counter_sub_local fs/ocfs2: fix repeated words in comments relay: use kvcalloc to alloc page array in relay_alloc_page_array proc: make config PROC_CHILDREN depend on PROC_FS fs: uninline inode_maybe_inc_iversion() ...
| * | | | nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failureRyusuke Konishi2022-10-121-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If creation or finalization of a checkpoint fails due to anomalies in the checkpoint metadata on disk, a kernel warning is generated. This patch replaces the WARN_ONs by nilfs_error, so that a kernel, booted with panic_on_warn, does not panic. A nilfs_error is appropriate here to handle the abnormal filesystem condition. This also replaces the detected error codes with an I/O error so that neither of the internal error codes is returned to callers. Link: https://lkml.kernel.org/r/20220929123330.19658-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+fbb3e0b24e8dae5a16ee@syzkaller.appspotmail.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | proc: mark more files as permanentAlexey Dobriyan2022-10-038-6/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark /proc/devices /proc/kpagecount /proc/kpageflags /proc/kpagecgroup /proc/loadavg /proc/meminfo /proc/softirqs /proc/uptime /proc/version as permanent /proc entries, saving alloc/free and some list/spinlock ops per use. These files are never removed by the kernel so it is OK to mark them. Link: https://lkml.kernel.org/r/Yyn527DzDMa+r0Yj@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | nilfs2: remove the unneeded result variableye xingchen2022-10-031-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return the value nilfs_segctor_sync() directly instead of storing it in another redundant variable. Link: https://lkml.kernel.org/r/20220831033403.302184-1-ye.xingchen@zte.com.cn Link: https://lkml.kernel.org/r/20220921034803.2476-3-konishi.ryusuke@gmail.com Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | nilfs2: delete unnecessary checks before brelse()Minghao Chi2022-10-031-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "nilfs2 minor amendments". This patch (of 2): The brelse() inline function tests whether its argument is NULL and then returns immediately. Thus remove the tests which are not needed around the shown calls. Link: https://lkml.kernel.org/r/20220921034803.2476-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20220819081700.96279-1-chi.minghao@zte.com.cn Link: https://lkml.kernel.org/r/20220921034803.2476-2-konishi.ryusuke@gmail.com Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | fs/ocfs2: fix repeated words in commentswangjianli2022-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Delete the redundant word 'to'. Link: https://lkml.kernel.org/r/20220908130036.31149-1-wangjianli@cdjrlc.com Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | proc: make config PROC_CHILDREN depend on PROC_FSLukas Bulwahn2022-10-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2e13ba54a268 ("fs, proc: introduce CONFIG_PROC_CHILDREN") introduces the config PROC_CHILDREN to configure kernels to provide the /proc/<pid>/task/<tid>/children file. When one deselects PROC_FS for kernel builds without /proc/, the config PROC_CHILDREN has no effect anymore, but is still visible in menuconfig. Add the dependency on PROC_FS to make the PROC_CHILDREN option disappear for kernel builds without /proc/. Link: https://lkml.kernel.org/r/20220909122529.1941-1-lukas.bulwahn@gmail.com Fixes: 2e13ba54a268 ("fs, proc: introduce CONFIG_PROC_CHILDREN") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Iago López Galeiras <iago@endocode.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | fs: uninline inode_maybe_inc_iversion()Andrew Morton2022-10-031-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It has many callsites and is large. text data bss dec hex filename 91796 15984 512 108292 1a704 mm/shmem.o-before 91180 15984 512 107676 1a49c mm/shmem.o-after Acked-by: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | fs/ocfs2/suballoc.h: fix spelling typo in commentJiangshan Yi2022-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix spelling typo in comment. Link: https://lkml.kernel.org/r/20220905061656.1829179-1-13667453960@163.com Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn> Reported-by: k2ci <kernel-bot@kylinos.cn> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | ocfs2: replace zero-length arrays with DECLARE_FLEX_ARRAY() helperGustavo A. R. Silva2022-10-031-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Zero-length arrays are deprecated and we are moving towards adopting C99 flexible-array members, instead. So, replace zero-length array declarations in a couple of structures and unions with the new DECLARE_FLEX_ARRAY() helper macro. This helper allows for a flexible-array member in a union and as only member in a structure. Also, this addresses multiple warnings reported when building with Clang-15 and -Wzero-length-array. Lastly, this will also help memcpy (in a coming hardening update) execute proper bounds-checking on variable length object i_symlink at fs/ocfs2/namei.c:1973: fs/ocfs2/namei.c: 1973 memcpy((char *) fe->id2.i_symlink, symname, l); Link: https://github.com/KSPP/linux/issues/21 Link: https://github.com/KSPP/linux/issues/193 Link: https://github.com/KSPP/linux/issues/197 Link: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html Link: https://lkml.kernel.org/r/YxKY6O2hmdwNh8r8@work Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | ntfs: check overflow when iterating ATTR_RECORDsHawkins Jiawei2022-09-121-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find(). Because the ATTR_RECORDs are next to each other, kernel can get the next ATTR_RECORD from end address of current ATTR_RECORD, through current ATTR_RECORD length field. The problem is that during iteration, when kernel calculates the end address of current ATTR_RECORD, kernel may trigger an integer overflow bug in executing `a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))`. This may wrap, leading to a forever iteration on 32bit systems. This patch solves it by adding some checks on calculating end address of current ATTR_RECORD during iteration. Link: https://lkml.kernel.org/r/20220831160935.3409-4-yin31149@gmail.com Link: https://lore.kernel.org/all/20220827105842.GM2030@kadam/ Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: chenxiaosong (A) <chenxiaosong2@huawei.com> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | ntfs: fix out-of-bounds read in ntfs_attr_find()Hawkins Jiawei2022-09-121-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find(). To ensure access on these ATTR_RECORDs are within bounds, kernel will do some checking during iteration. The problem is that during checking whether ATTR_RECORD's name is within bounds, kernel will dereferences the ATTR_RECORD name_offset field, before checking this ATTR_RECORD strcture is within bounds. This problem may result out-of-bounds read in ntfs_attr_find(), reported by Syzkaller: ================================================================== BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607 [...] Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 ntfs_attr_lookup+0x1056/0x2070 fs/ntfs/attrib.c:1193 ntfs_read_inode_mount+0x89a/0x2580 fs/ntfs/inode.c:1845 ntfs_fill_super+0x1799/0x9320 fs/ntfs/super.c:2854 mount_bdev+0x34d/0x410 fs/super.c:1400 legacy_get_tree+0x105/0x220 fs/fs_context.c:610 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK> The buggy address belongs to the physical page: page:ffffea0001f8d400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7e350 head:ffffea0001f8d400 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888011842140 raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88807e351f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88807e351f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88807e352000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88807e352080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88807e352100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== This patch solves it by moving the ATTR_RECORD strcture's bounds checking earlier, then checking whether ATTR_RECORD's name is within bounds. What's more, this patch also add some comments to improve its maintainability. Link: https://lkml.kernel.org/r/20220831160935.3409-3-yin31149@gmail.com Link: https://lore.kernel.org/all/1636796c-c85e-7f47-e96f-e074fee3c7d3@huawei.com/ Link: https://groups.google.com/g/syzkaller-bugs/c/t_XdeKPGTR4/m/LECAuIGcBgAJ Signed-off-by: chenxiaosong (A) <chenxiaosong2@huawei.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Reported-by: syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com Tested-by: syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com Cc: Anton Altaparmakov <anton@tuxera.com> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | ntfs: fix use-after-free in ntfs_attr_find()Hawkins Jiawei2022-09-121-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "ntfs: fix bugs about Attribute", v2. This patchset fixes three bugs relative to Attribute in record: Patch 1 adds a sanity check to ensure that, attrs_offset field in first mft record loading from disk is within bounds. Patch 2 moves the ATTR_RECORD's bounds checking earlier, to avoid dereferencing ATTR_RECORD before checking this ATTR_RECORD is within bounds. Patch 3 adds an overflow checking to avoid possible forever loop in ntfs_attr_find(). Without patch 1 and patch 2, the kernel triggersa KASAN use-after-free detection as reported by Syzkaller. Although one of patch 1 or patch 2 can fix this, we still need both of them. Because patch 1 fixes the root cause, and patch 2 not only fixes the direct cause, but also fixes the potential out-of-bounds bug. This patch (of 3): Syzkaller reported use-after-free read as follows: ================================================================== BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607 [...] Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 ntfs_attr_lookup+0x1056/0x2070 fs/ntfs/attrib.c:1193 ntfs_read_inode_mount+0x89a/0x2580 fs/ntfs/inode.c:1845 ntfs_fill_super+0x1799/0x9320 fs/ntfs/super.c:2854 mount_bdev+0x34d/0x410 fs/super.c:1400 legacy_get_tree+0x105/0x220 fs/fs_context.c:610 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK> The buggy address belongs to the physical page: page:ffffea0001f8d400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7e350 head:ffffea0001f8d400 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888011842140 raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88807e351f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88807e351f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88807e352000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88807e352080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88807e352100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Kernel will loads $MFT/$DATA's first mft record in ntfs_read_inode_mount(). Yet the problem is that after loading, kernel doesn't check whether attrs_offset field is a valid value. To be more specific, if attrs_offset field is larger than bytes_allocated field, then it may trigger the out-of-bounds read bug(reported as use-after-free bug) in ntfs_attr_find(), when kernel tries to access the corresponding mft record's attribute. This patch solves it by adding the sanity check between attrs_offset field and bytes_allocated field, after loading the first mft record. Link: https://lkml.kernel.org/r/20220831160935.3409-1-yin31149@gmail.com Link: https://lkml.kernel.org/r/20220831160935.3409-2-yin31149@gmail.com Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: ChenXiaoSong <chenxiaosong2@huawei.com> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | reiserfs: move from strlcpy with unused retval to strscpyWolfram Sang2022-09-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Link: https://lkml.kernel.org/r/20220818210153.8095-1-wsa+renesas@sang-engineering.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | ocfs2: move from strlcpy with unused retval to strscpyWolfram Sang2022-09-122-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Link: https://lkml.kernel.org/r/20220818210123.7637-4-wsa+renesas@sang-engineering.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | hfs: replace kmap() with kmap_local_page() in btree.cFabio M. De Francesco2022-09-121-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kmap() is being deprecated in favor of kmap_local_page(). Two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap's pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. Since its use in btree.c is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in btree.c. Where possible, use the suited standard helpers (memzero_page(), memcpy_page()) instead of open coding kmap_local_page() plus memset() or memcpy(). Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. Link: https://lkml.kernel.org/r/20220821180400.8198-4-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kees Cook <keescook@chromium.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | hfs: replace kmap() with kmap_local_page() in bnode.cFabio M. De Francesco2022-09-121-20/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kmap() is being deprecated in favor of kmap_local_page(). Two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap's pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. Since its use in bnode.c is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in bnode.c. Where possible, use the suited standard helpers (memzero_page(), memcpy_page()) instead of open coding kmap_local_page() plus memset() or memcpy(). Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. Link: https://lkml.kernel.org/r/20220821180400.8198-3-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kees Cook <keescook@chromium.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | hfs: unmap the page in the "fail_page" labelFabio M. De Francesco2022-09-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "hfs: Replace kmap() with kmap_local_page()". kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmaps pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. Since its use in fs/hfs is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in fs/hfs. Where possible, use the suited standard helpers (memzero_page(), memcpy_page()) instead of open coding kmap_local_page() plus memset() or memcpy(). Fix a bug due to a page being not unmapped if the code jumps to the "fail_page" label (1/3). Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. This patch (of 3): Several paths within hfs_btree_open() jump to the "fail_page" label where put_page() is called while the page is still mapped. Call kunmap() to unmap the page soon before put_page(). Link: https://lkml.kernel.org/r/20220821180400.8198-1-fmdefrancesco@gmail.com Link: https://lkml.kernel.org/r/20220821180400.8198-2-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Matthew Wilcox <willy@infradead.org>] Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kees Cook <keescook@chromium.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | aio: use atomic_try_cmpxchg in __get_reqs_availableUros Bizjak2022-09-121-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old in __get_reqs_available. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, atomic_try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails, enabling further code simplifications. No functional change intended. Link: https://lkml.kernel.org/r/20220714164851.3055-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | buffer: use try_cmpxchg in discard_bufferUros Bizjak2022-09-121-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in discard_buffer. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails, enabling further code simplifications. Note that the value from *ptr should be read using READ_ONCE to prevent the compiler from merging, refetching or reordering the read. No functional change intended. Link: https://lkml.kernel.org/r/20220714171653.12128-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | epoll: use try_cmpxchg in list_add_tail_locklessUros Bizjak2022-09-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in list_add_tail_lockless. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). No functional change intended. Link: https://lkml.kernel.org/r/20220714173255.12987-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | fs/qnx6: delete unnecessary checks before brelse()Minghao Chi2022-09-121-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | brelse() tests whether its argument is NULL and then returns immediately. Thus remove the tests which are not needed around the shown calls. Link: https://lkml.kernel.org/r/20220819081819.96347-1-chi.minghao@zte.com.cn Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: CGEL ZTE <cgel.zte@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minghao Chi <chi.minghao@zte.com.cn> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>