linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge tag 'f2fs-for-4.21' of ↵	Linus Torvalds	2018-12-31	20	-366/+533
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've focused on bug fixes since Pixel devices have been shipping with f2fs. Some of them were related to hardware encryption support which are actually not an issue in mainline, but would be better to merge them in order to avoid potential bugs. Enhancements: - do GC sub-sections when the section is large - add a flag in ioctl(SHUTDOWN) to trigger fsck for QA - use kvmalloc() in order to give another chance to avoid ENOMEM Bug fixes: - fix accessing memory boundaries in a malformed iamge - GC gives stale unencrypted block - GC counts in large sections - detect idle time more precisely - block allocation of DIO writes - race conditions between write_begin and write_checkpoint - allow GCs for node segments via ioctl() There are various clean-ups and minor bug fixes as well" * tag 'f2fs-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (43 commits) f2fs: sanity check of xattr entry size f2fs: fix use-after-free issue when accessing sbi->stat_info f2fs: check PageWriteback flag for ordered case f2fs: fix validation of the block count in sanity_check_raw_super f2fs: fix missing unlock(sbi->gc_mutex) f2fs: fix to dirty inode synchronously f2fs: clean up structure extent_node f2fs: fix block address for __check_sit_bitmap f2fs: fix sbi->extent_list corruption issue f2fs: clean up checkpoint flow f2fs: flush stale issued discard candidates f2fs: correct wrong spelling, issing_* f2fs: use kvmalloc, if kmalloc is failed f2fs: remove redundant comment of unused wio_mutex f2fs: fix to reorder set_page_dirty and wait_on_page_writeback f2fs: clear PG_writeback if IPU failed f2fs: add an ioctl() to explicitly trigger fsck later f2fs: avoid frequent costly fsck triggers f2fs: fix m_may_create to make OPU DIO write correctly f2fs: fix to update new block address correctly for OPU ...
\| *	f2fs: sanity check of xattr entry size	Jaegeuk Kim	2018-12-27	1	-5/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a security report where f2fs_getxattr() has a hole to expose wrong memory region when the image is malformed like this. f2fs_getxattr: entry->e_name_len: 4, size: 12288, buffer_size: 16384, len: 4 Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix use-after-free issue when accessing sbi->stat_info	Sahitya Tummala	2018-12-27	1	-12/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	iput() on sbi->node_inode can update sbi->stat_info in the below context, if the f2fs_write_checkpoint() has failed with error. f2fs_balance_fs_bg+0x1ac/0x1ec f2fs_write_node_pages+0x4c/0x260 do_writepages+0x80/0xbc __writeback_single_inode+0xdc/0x4ac writeback_single_inode+0x9c/0x144 write_inode_now+0xc4/0xec iput+0x194/0x22c f2fs_put_super+0x11c/0x1e8 generic_shutdown_super+0x70/0xf4 kill_block_super+0x2c/0x5c kill_f2fs_super+0x44/0x50 deactivate_locked_super+0x60/0x8c deactivate_super+0x68/0x74 cleanup_mnt+0x40/0x78 Fix this by moving f2fs_destroy_stats() further below iput() in both f2fs_put_super() and f2fs_fill_super() paths. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: check PageWriteback flag for ordered case	Chao Yu	2018-12-27	13	-54/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For all ordered cases in f2fs_wait_on_page_writeback(), we need to check PageWriteback status, so let's clean up to relocate the check into f2fs_wait_on_page_writeback(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix validation of the block count in sanity_check_raw_super	Martin Blumenstingl	2018-12-27	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Treat "block_count" from struct f2fs_super_block as 64-bit little endian value in sanity_check_raw_super() because struct f2fs_super_block declares "block_count" as "__le64". This fixes a bug where the superblock validation fails on big endian devices with the following error: F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0) F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0) F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock As result of this the partition cannot be mounted. With this patch applied the superblock validation works fine and the partition can be mounted again: F2FS-fs (sda1): Mounted with checkpoint version = 7c84 My little endian x86-64 hardware was able to mount the partition without this fix. To confirm that mounting f2fs filesystems works on big endian machines again I tested this on a 32-bit MIPS big endian (lantiq) device. Fixes: 0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows") Cc: stable@vger.kernel.org Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix missing unlock(sbi->gc_mutex)	Jaegeuk Kim	2018-12-27	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes missing unlock call. Cc: <stable@vger.kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix to dirty inode synchronously	Chao Yu	2018-12-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If user change inode's i_flags via ioctl, let's add it into global dirty list, so that checkpoint can guarantee its persistence before fsync, it can make checkpoint keeping strong consistency. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: clean up structure extent_node	Chao Yu	2018-12-27	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The union in struct extent_node wass only to indicate below fields struct rb_node rb_node; union { struct { unsigned int fofs; unsigned int len; ... ... can be parsed as fields in struct rb_entry, but they were never be used explicitly before, so let's remove them for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix block address for __check_sit_bitmap	Qiuyang Sun	2018-12-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Should use lstart (logical start address) instead of start (in dev) here. This fixes a bug in multi-device scenarios. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix sbi->extent_list corruption issue	Sahitya Tummala	2018-12-27	2	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there is a failure in f2fs_fill_super() after/during the recovery of fsync'd nodes, it frees the current sbi and retries again. This time the mount is successful, but the files that got recovered before retry, still holds the extent tree, whose extent nodes list is corrupted since sbi and sbi->extent_list is freed up. The list_del corruption issue is observed when the file system is getting unmounted and when those recoverd files extent node is being freed up in the below context. list_del corruption. prev->next should be fffffff1e1ef5480, but was (null) <...> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53! lr : __list_del_entry_valid+0x94/0xb4 pc : __list_del_entry_valid+0x94/0xb4 <...> Call trace: __list_del_entry_valid+0x94/0xb4 __release_extent_node+0xb0/0x114 __free_extent_tree+0x58/0x7c f2fs_shrink_extent_tree+0xdc/0x3b0 f2fs_leave_shrinker+0x28/0x7c f2fs_put_super+0xfc/0x1e0 generic_shutdown_super+0x70/0xf4 kill_block_super+0x2c/0x5c kill_f2fs_super+0x44/0x50 deactivate_locked_super+0x60/0x8c deactivate_super+0x68/0x74 cleanup_mnt+0x40/0x78 __cleanup_mnt+0x1c/0x28 task_work_run+0x48/0xd0 do_notify_resume+0x678/0xe98 work_pending+0x8/0x14 Fix this by not creating extents for those recovered files if shrinker is not registered yet. Once mount is successful and shrinker is registered, those files can have extents again. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: clean up checkpoint flow	Chao Yu	2018-12-27	1	-13/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch cleans up checkpoint flow a bit: - remove unneeded circulation of flushing meta pages. - don't flush nat_bits pages in prior to other checkpoint pages. - add bug_on to check remained meta pages after flushing. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: flush stale issued discard candidates	Jaegeuk Kim	2018-12-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sometimes, I could observe # of issuing_discard to be 1 which blocks background jobs due to is_idle()=false. The only way to get out of it was to trigger gc_urgent. This patch avoids that by checking any candidates as done in the list. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: correct wrong spelling, issing_*	Jaegeuk Kim	2018-12-27	3	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Let's use "queued" instead of "issuing". Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: use kvmalloc, if kmalloc is failed	Jaegeuk Kim	2018-12-27	11	-70/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	One report says memalloc failure during mount. (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14) (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0) (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160) (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0) (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120) (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688) (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc) (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188) (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20) (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c) (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134) (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4) (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc) (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48) Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: remove redundant comment of unused wio_mutex	Yunlong Song	2018-12-27	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 089842de ("f2fs: remove codes of unused wio_mutex") removes codes of unused wio_mutex, but missing the comment, so delete it. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix to reorder set_page_dirty and wait_on_page_writeback	Chao Yu	2018-12-14	4	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch reorders flow from - update page - set_page_dirty - wait_on_page_writeback to - wait_on_page_writeback - update page - set_page_dirty The reason is: - set_page_dirty will increase reference of dirty page, the reference should be cleared before wait_on_page_writeback to keep its consistency. - some devices need stable page during page writebacking, so we should not change page's data. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: clear PG_writeback if IPU failed	Sheng Yong	2018-12-14	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If IPU failed, nothing is commited, we should end page writeback. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: add an ioctl() to explicitly trigger fsck later	Jaegeuk Kim	2018-12-14	2	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds an option in ioctl(F2FS_IOC_SHUTDOWN) in order to trigger fsck by setting a NEED_FSCK flag. Generally, shutdown is used for the test to validate filesystem consistency, and setting NEED_FSCK flag can be used for Android to trigger fsck.f2fs at boot time explicitly so that we could measure the elapsed time as well as force filesystem check. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: avoid frequent costly fsck triggers	Jaegeuk Kim	2018-11-28	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we want to re-enable nat_bits, we rely on fsck which requires full scan of directory tree. Let's do that by regular fsck or unclean shutdown. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix m_may_create to make OPU DIO write correctly	Jia Zhu	2018-11-27	2	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we added a parameter @map.m_may_create to trigger OPU allocation and call f2fs_balance_fs() correctly. But in get_more_blocks(), @create has been overwritten by below code. So the function f2fs_map_blocks() will not allocate new block address but directly go out. Meanwile,there are several functions calling f2fs_map_blocks() directly and @map.m_may_create not initialized. CODE: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } This patch fixes it. Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix to update new block address correctly for OPU	Jia Zhu	2018-11-27	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we allocated a new block address for OPU mode in direct_IO. But the new address couldn't be assigned to @map->m_pblk correctly. This patch fix it. Cc: <stable@vger.kernel.org> Fixes: 511f52d02f05 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: adjust trace print in f2fs_get_victim() to cover all paths	Sahitya Tummala	2018-11-27	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adjust the trace print in f2fs_get_victim() to cover GC done by F2FS_IOC_GARBAGE_COLLECT_RANGE. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix to allow node segment for GC by ioctl path	Sahitya Tummala	2018-11-27	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow node type segments also to be GC'd via f2fs ioctl F2FS_IOC_GARBAGE_COLLECT_RANGE. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: make "f2fs_fault_name[]" const char *	Alexey Dobriyan	2018-11-27	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Those strings are immutable. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: read page index before freeing	Pan Bian	2018-11-27	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The function truncate_node frees the page with f2fs_put_page. However, the page index is read after that. So, the patch reads the index before freeing the page. Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated") Cc: <stable@vger.kernel.org> Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix wrong return value of f2fs_acl_create	Tiezhu Yang	2018-11-27	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When call f2fs_acl_create_masq() failed, the caller f2fs_acl_create() should return -EIO instead of -ENOMEM, this patch makes it consistent with posix_acl_create() which has been fixed in commit beaf226b863a ("posix_acl: don't ignore return value of posix_acl_create_masq()"). Fixes: 83dfe53c185e ("f2fs: fix reference leaks in f2fs_acl_create") Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: avoid build warn of fall_through	Jaegeuk Kim	2018-11-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After merging the f2fs tree, today's linux-next build (x86_64_allmodconfig) produced this warning: In file included from fs/f2fs/dir.c:11: fs/f2fs/f2fs.h: In function '__mark_inode_dirty_flag': fs/f2fs/f2fs.h:2388:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (set) ^ fs/f2fs/f2fs.h:2390:2: note: here case FI_DATA_EXIST: ^~~~ Exposed by my use of -Wimplicit-fallthrough Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix race between write_checkpoint and write_begin	Sheng Yong	2018-11-27	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following race could lead to inconsistent SIT bitmap: Task A Task B ====== ====== f2fs_write_checkpoint block_operations f2fs_lock_all down_write(node_change) down_write(node_write) ... sync ... up_write(node_change) f2fs_file_write_iter set_inode_flag(FI_NO_PREALLOC) ...... f2fs_write_begin(index=0, has inline data) prepare_write_begin __do_map_lock(AIO) => down_read(node_change) f2fs_convert_inline_page => update SIT __do_map_lock(AIO) => up_read(node_change) f2fs_flush_sit_entries <= inconsistent SIT finish write checkpoint sudden-power-off If SPO occurs after checkpoint is finished, SIT bitmap will be set incorrectly. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: check memory boundary by insane namelen	Jaegeuk Kim	2018-11-27	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If namelen is corrupted to have very long value, fill_dentries can copy wrong memory area. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: only flush the single temp bio cache which owns the target page	Yunlong Song	2018-11-27	1	-27/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, when f2fs finds which temp bio cache owns the target page, it will flush all the three temp bio caches, but we only need to flush one single bio cache indeed, which can help to keep bio merged. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix out-place-update DIO write	Chao Yu	2018-11-27	3	-14/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In get_more_blocks(), we may override @create as below code: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } But in f2fs_map_blocks(), we only trigger f2fs_balance_fs() if @create is 1, so in LFS mode, dio overwrite under LFS mode can easily run out of free segments, result in below panic. Call Trace: allocate_segment_by_default+0xa8/0x270 [f2fs] f2fs_allocate_data_block+0x1ea/0x5c0 [f2fs] __allocate_data_block+0x306/0x480 [f2fs] f2fs_map_blocks+0x6f6/0x920 [f2fs] __get_data_block+0x4f/0xb0 [f2fs] get_data_block_dio_write+0x50/0x60 [f2fs] do_blockdev_direct_IO+0xcd5/0x21e0 __blockdev_direct_IO+0x3a/0x3c f2fs_direct_IO+0x1ff/0x4a0 [f2fs] generic_file_direct_write+0xd9/0x160 __generic_file_write_iter+0xbb/0x1e0 f2fs_file_write_iter+0xaf/0x220 [f2fs] __vfs_write+0xd0/0x130 vfs_write+0xb2/0x1b0 SyS_pwrite64+0x69/0xa0 ? vtime_user_exit+0x29/0x70 do_syscall_64+0x6e/0x160 entry_SYSCALL64_slow_path+0x25/0x25 RIP: new_curseg+0x36f/0x380 [f2fs] RSP: ffffac570393f7a8 So this patch introduces a parameter map.m_may_create to indicate that f2fs_map_blocks() is called from write or read path, which can give the right hint to let f2fs_map_blocks() trigger OPU allocation and call f2fs_balanc_fs() correctly. BTW, it disables physical address preallocation for direct IO in f2fs_preallocate_blocks, which is redundant to OPU allocation of f2fs_map_blocks. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix to be aware discard/preflush/dio command in is_idle()	Chao Yu	2018-11-27	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds missing in-flight discard/preflush/dio command count check in is_idle(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: add to account direct IO	Chao Yu	2018-11-27	3	-1/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds f2fs_dio_submit_bio() to hook submit_io/end_io functions in direct IO path, in order to account DIO. Later, we will add this count into is_idle() to let background GC/Discard thread be aware of DIO. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: move dir data flush to write checkpoint process	Yunlei He	2018-11-27	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch move dir data flush to write checkpoint process, by doing this, it may reduce some time for dir fsync. pre: -f2fs_do_sync_file enter -file_write_and_wait_range <- flush & wait -write_checkpoint -do_checkpoint <- wait all -f2fs_do_sync_file exit now: -f2fs_do_sync_file enter -write_checkpoint -block_operations <- flush dir & no wait -do_checkpoint <- wait all -f2fs_do_sync_file exit Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: Change to use DEFINE_SHOW_ATTRIBUTE macro	Yangtao Li	2018-11-27	1	-12/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: change segment to section in f2fs_ioc_gc_range	Yunlong Song	2018-11-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	f2fs_ioc_gc_range skips blocks_per_seg each time, however, f2fs_gc moves blocks of section each time, so fix it from segment to section. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: export migration_granularity sysfs entry	Chao Yu	2018-11-27	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add one sysfs entry to control migration granularity of GC in large section f2fs, it can be tuned to mitigate heavy overhead of migrating huge number of blocks in large section. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: support subsectional garbage collection	Chao Yu	2018-11-27	3	-6/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Section is minimal garbage collection unit of f2fs, in zoned block device, or ancient block mapping flash device, in order to improve GC efficiency, we can align GC unit to lower device erase unit, normally, it consists of multiple of segments. Once background or foreground GC triggers, it brings a large number of IOs, which will impact user IO, and also occupy cpu/memory resource intensively. So, to reduce impact of GC on large size section, this patch supports subsectional GC, in one cycle of GC, it only migrate partial segment{s} in victim section. Currently, by default, we use sbi->segs_per_sec as migration granularity. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: introduce __is_large_section() for cleanup	Chao Yu	2018-11-27	6	-13/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce a wrapper __is_large_section() to clean up codes. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: clean up f2fs_sb_has_##feature_name	Chao Yu	2018-11-27	10	-73/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to access .raw_super field, to avoid unneeded pointer conversion, this patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly. Just do cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: remove codes of unused wio_mutex	Yunlong Song	2018-11-27	2	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix count of seg_freed to make sec_freed correct	Yunlong Song	2018-11-27	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When sbi->segs_per_sec > 1, and if some segno has 0 valid blocks before gc starts, do_garbage_collect will skip counting seg_freed++, and this will cause seg_freed < sbi->segs_per_sec and finally skip sec_freed++. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: fix to account preflush command for noflush_merge mode	Chao Yu	2018-11-27	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we only account preflush command for flush_merge mode, so for noflush_merge mode, we can not know in-flight preflush command count, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
\| *	f2fs: avoid GC causing encrypted file corrupted	Yunlong Song	2018-11-27	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The encrypted file may be corrupted by GC in following case: Time 1: \| segment 1 blkaddr = A \| GC -> \| segment 2 blkaddr = B \| Encrypted block 1 is moved from blkaddr A of segment 1 to blkaddr B of segment 2, Time 2: \| segment 1 blkaddr = B \| GC -> \| segment 3 blkaddr = C \| Before page 1 is written back and if segment 2 become a victim, then page 1 is moved from blkaddr B of segment 2 to blkaddr Cof segment 3, during the GC process of Time 2, f2fs should wait for page 1 written back before reading it, or move_data_block will read a garbage block from blkaddr B since page is not written back to blkaddr B yet. Commit 6aa58d8a ("f2fs: readahead encrypted block during GC") introduce ra_data_block to read encrypted block, but it forgets to add f2fs_wait_on_page_writeback to avoid racing between GC and flush. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* \|	Merge tag 'char-misc-4.21-rc1' of ↵	Linus Torvalds	2018-12-29	1	-0/+29
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big set of char and misc driver patches for 4.21-rc1. Lots of different types of driver things in here, as this tree seems to be the "collection of various driver subsystems not big enough to have their own git tree" lately. Anyway, some highlights of the changes in here: - binderfs: is it a rule that all driver subsystems will eventually grow to have their own filesystem? Binder now has one to handle the use of it in containerized systems. This was discussed at the Plumbers conference a few months ago and knocked into mergable shape very fast by Christian Brauner. Who also has signed up to be another binder maintainer, showing a distinct lack of good judgement :) - binder updates and fixes - mei driver updates - fpga driver updates and additions - thunderbolt driver updates - soundwire driver updates - extcon driver updates - nvmem driver updates - hyper-v driver updates - coresight driver updates - pvpanic driver additions and reworking for more device support - lp driver updates. Yes really, it's _finally_ moved to the proper parallal port driver model, something I never thought I would see happen. Good stuff. - other tiny driver updates and fixes. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (116 commits) MAINTAINERS: add another Android binder maintainer intel_th: msu: Fix an off-by-one in attribute store stm class: Add a reference to the SyS-T document stm class: Fix a module refcount leak in policy creation error path char: lp: use new parport device model char: lp: properly count the lp devices char: lp: use first unused lp number while registering char: lp: detach the device when parallel port is removed char: lp: introduce list to save port number bus: qcom: remove duplicated include from qcom-ebi2.c VMCI: Use memdup_user() rather than duplicating its implementation char/rtc: Use of_node_name_eq for node name comparisons misc: mic: fix a DMA pool free failure ptp: fix an IS_ERR() vs NULL check genwqe: Fix size check binder: implement binderfs binder: fix use-after-free due to ksys_close() during fdget() bus: fsl-mc: remove duplicated include files bus: fsl-mc: explicitly define the fsl_mc_command endianness misc: ti-st: make array read_ver_cmd static, shrinks object size ...
\| * \|	binder: fix use-after-free due to ksys_close() during fdget()	Todd Kjos	2018-12-19	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	44d8047f1d8 ("binder: use standard functions to allocate fds") exposed a pre-existing issue in the binder driver. fdget() is used in ksys_ioctl() as a performance optimization. One of the rules associated with fdget() is that ksys_close() must not be called between the fdget() and the fdput(). There is a case where this requirement is not met in the binder driver which results in the reference count dropping to 0 when the device is still in use. This can result in use-after-free or other issues. If userpace has passed a file-descriptor for the binder driver using a BINDER_TYPE_FDA object, then kys_close() is called on it when handling a binder_ioctl(BC_FREE_BUFFER) command. This violates the assumptions for using fdget(). The problem is fixed by deferring the close using task_work_add(). A new variant of __close_fd() was created that returns a struct file with a reference. The fput() is deferred instead of using ksys_close(). Fixes: 44d8047f1d87a ("binder: use standard functions to allocate fds") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Todd Kjos <tkjos@google.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* \| \|	Merge tag 'driver-core-4.21-rc1' of ↵	Linus Torvalds	2018-12-29	1	-12/+11
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core patches for 4.21-rc1. It's not really big, just a number of small changes for some reported issues, some documentation updates to hopefully make it harder for people to abuse the driver model, and some other minor cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: mm, memory_hotplug: update a comment in unregister_memory() component: convert to DEFINE_SHOW_ATTRIBUTE sysfs: Disable lockdep for driver bind/unbind files driver core: Add missing dev->bus->need_parent_lock checks kobject: return error code if writing /sys/.../uevent fails driver core: Move async_synchronize_full call driver core: platform: Respect return code of platform_device_register_full() kref/kobject: Improve documentation drivers/base/memory.c: Use DEVICE_ATTR_RO and friends driver core: Replace simple_strto{l,ul} by kstrtou{l,ul} kernfs: Improve kernfs_notify() poll notification latency kobject: Fix warnings in lib/kobject_uevent.c kobject: drop unnecessary cast "%llu" for u64 driver core: fix comments for device_block_probing() driver core: Replace simple_strtol by kstrtoint
\| * \ \	Merge 4.20-rc5 into driver-core-next	Greg Kroah-Hartman	2018-12-03	82	-468/+802
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need the fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
\| * \| \| \|	kernfs: Improve kernfs_notify() poll notification latency	Radu Rendec	2018-11-27	1	-12/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kernfs_notify() does two notifications: poll and fsnotify. Originally, both notifications were done from scheduled work context and all that kernfs_notify() did was schedule the work. This patch simply moves the poll notification from the scheduled work handler to kernfs_notify(). The fsnotify notification still needs to be done from scheduled work context because it can sleep (it needs to lock a mutex). If the poll notification is time critical (the notified thread needs to wake as quickly as possible), it's better to do it from kernfs_notify() directly. One example is calling sysfs_notify_dirent() from a hardware interrupt handler to wake up a thread and handle the interrupt in user space. Signed-off-by: Radu Rendec <radu.rendec@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* \| \| \| \|	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	2018-12-29	26	-79/+106
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge misc updates from Andrew Morton: - large KASAN update to use arm's "software tag-based mode" - a few misc things - sh updates - ocfs2 updates - just about all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits) kernel/fork.c: mark 'stack_vm_area' with __maybe_unused memcg, oom: notify on oom killer invocation from the charge path mm, swap: fix swapoff with KSM pages include/linux/gfp.h: fix typo mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization memory_hotplug: add missing newlines to debugging output mm: remove __hugepage_set_anon_rmap() include/linux/vmstat.h: remove unused page state adjustment macro mm/page_alloc.c: allow error injection mm: migrate: drop unused argument of migrate_page_move_mapping() blkdev: avoid migration stalls for blkdev pages mm: migrate: provide buffer_migrate_page_norefs() mm: migrate: move migrate_page_lock_buffers() mm: migrate: lock buffers before migrate_page_move_mapping() mm: migration: factor out code to compute expected number of page references mm, page_alloc: enable pcpu_drain with zone capability kmemleak: add config to select auto scan mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init ...