| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
In only call path of __cluster_may_compress(), __f2fs_write_data_pages()
has checked SBI_POR_DOING condition, and also cluster_may_compress()
has checked CP_ERROR_FLAG condition, so remove redundant check condition
in __cluster_may_compress() for cleanup.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
| |
Commit d5f7bc0064e0 ("f2fs: deprecate f2fs_trace_io") left some
dead codes, delete them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
| |
I've added new sysfs nodes to show runtime compression stat since mount.
compr_written_block - show the block count written after compression
compr_saved_block - show the saved block count with compression
compr_new_inode - show the count of inode newly enabled for compression
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generic/269 reports a hangtask issue, the root cause is ABBA deadlock
described as below:
Thread A Thread B
- down_write(&sbi->gc_lock) -- A
- f2fs_write_data_pages
- lock all pages in cluster -- B
- f2fs_write_multi_pages
- f2fs_write_raw_pages
- f2fs_write_single_data_page
- f2fs_balance_fs
- down_write(&sbi->gc_lock) -- A
- f2fs_gc
- do_garbage_collect
- ra_data_block
- pagecache_get_page -- B
To fix this, it needs to avoid calling f2fs_balance_fs() if there is
still cluster pages been locked in context of cluster writeback, so
instead, let's call f2fs_balance_fs() in the end of
f2fs_write_raw_pages() when all cluster pages were unlocked.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rework the post-read processing logic to be much easier to understand.
At least one bug is fixed by this: if an I/O error occurred when reading
from disk, decryption and verity would be performed on the uninitialized
data, causing misleading messages in the kernel log.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Expand 'compress_algorithm' mount option to accept parameter as format of
<algorithm>:<level>, by this way, it gives a way to allow user to do more
specified config on lz4 and zstd compression level, then f2fs compression
can provide higher compress ratio.
In order to set compress level for lz4 algorithm, it needs to set
CONFIG_LZ4HC_COMPRESS and CONFIG_F2FS_FS_LZ4HC config to enable lz4hc
compress algorithm.
CR and performance number on lz4/lz4hc algorithm:
dd if=enwik9 of=compressed_file conv=fsync
Original blocks: 244382
lz4 lz4hc-9
compressed blocks 170647 163270
compress ratio 69.8% 66.8%
speed 16.4207 s, 60.9 MB/s 26.7299 s, 37.4 MB/s
compress ratio = after / before
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
| |
This patch addresses minor issues in compression chksum.
Fixes: b28f047b28c5 ("f2fs: compress: support chksum")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I found out f2fs_free_dic() is invoked in a wrong timing, but
f2fs_verify_bio() still needed the dic info and it triggered the
below kernel panic. It has been caused by the race condition of
pending_pages value between decompression and verity logic, when
the same compression cluster had been split in different bios.
By split bios, f2fs_verify_bio() ended up with decreasing
pending_pages value before it is reset to nr_cpages by
f2fs_decompress_pages() and caused the kernel panic.
[ 4416.564763] Unable to handle kernel NULL pointer dereference
at virtual address 0000000000000000
...
[ 4416.896016] Workqueue: fsverity_read_queue f2fs_verity_work
[ 4416.908515] pc : fsverity_verify_page+0x20/0x78
[ 4416.913721] lr : f2fs_verify_bio+0x11c/0x29c
[ 4416.913722] sp : ffffffc019533cd0
[ 4416.913723] x29: ffffffc019533cd0 x28: 0000000000000402
[ 4416.913724] x27: 0000000000000001 x26: 0000000000000100
[ 4416.913726] x25: 0000000000000001 x24: 0000000000000004
[ 4416.913727] x23: 0000000000001000 x22: 0000000000000000
[ 4416.913728] x21: 0000000000000000 x20: ffffffff2076f9c0
[ 4416.913729] x19: ffffffff2076f9c0 x18: ffffff8a32380c30
[ 4416.913731] x17: ffffffc01f966d97 x16: 0000000000000298
[ 4416.913732] x15: 0000000000000000 x14: 0000000000000000
[ 4416.913733] x13: f074faec89ffffff x12: 0000000000000000
[ 4416.913734] x11: 0000000000001000 x10: 0000000000001000
[ 4416.929176] x9 : ffffffff20d1f5c7 x8 : 0000000000000000
[ 4416.929178] x7 : 626d7464ff286b6b x6 : ffffffc019533ade
[ 4416.929179] x5 : 000000008049000e x4 : ffffffff2793e9e0
[ 4416.929180] x3 : 000000008049000e x2 : ffffff89ecfa74d0
[ 4416.929181] x1 : 0000000000000c40 x0 : ffffffff2076f9c0
[ 4416.929184] Call trace:
[ 4416.929187] fsverity_verify_page+0x20/0x78
[ 4416.929189] f2fs_verify_bio+0x11c/0x29c
[ 4416.929192] f2fs_verity_work+0x58/0x84
[ 4417.050667] process_one_work+0x270/0x47c
[ 4417.055354] worker_thread+0x27c/0x4d8
[ 4417.059784] kthread+0x13c/0x320
[ 4417.063693] ret_from_fork+0x10/0x18
Chao pointed this can happen by the below race condition.
Thread A f2fs_post_read_wq fsverity_wq
- f2fs_read_multi_pages()
- f2fs_alloc_dic
- dic->pending_pages = 2
- submit_bio()
- submit_bio()
- f2fs_post_read_work() handle first bio
- f2fs_decompress_work()
- __read_end_io()
- f2fs_decompress_pages()
- dic->pending_pages--
- enqueue f2fs_verity_work()
- f2fs_verity_work() handle first bio
- f2fs_verify_bio()
- dic->pending_pages--
- f2fs_post_read_work() handle second bio
- f2fs_decompress_work()
- enqueue f2fs_verity_work()
- f2fs_verify_pages()
- f2fs_free_dic()
- f2fs_verity_work() handle second bio
- f2fs_verfy_bio()
- use-after-free on dic
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will add a new "compress_mode" mount option to control file
compression mode. This supports "fs" and "user". In "fs" mode (default),
f2fs does automatic compression on the compression enabled files.
In "user" mode, f2fs disables the automaic compression and gives the
user discretion of choosing the target file and the timing. It means
the user can do manual compression/decompression on the compression
enabled files using ioctls.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch supports to store chksum value with compressed
data, and verify the integrality of compressed data while
reading the data.
The feature can be enabled through specifying mount option
'compress_chksum'.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the below mem leak.
[ 130.157600] =============================================================================
[ 130.159662] BUG f2fs_page_array_entry-252:16 (Tainted: G W O ): Objects remaining in f2fs_page_array_entry-252:16 on __kmem_cache_shutdown()
[ 130.162742] -----------------------------------------------------------------------------
[ 130.162742]
[ 130.164979] Disabling lock debugging due to kernel taint
[ 130.166188] INFO: Slab 0x000000009f5a52d2 objects=22 used=4 fp=0x00000000ba72c3e9 flags=0xfffffc0010200
[ 130.168269] CPU: 7 PID: 3560 Comm: umount Tainted: G B W O 5.9.0-rc4+ #35
[ 130.170019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[ 130.171941] Call Trace:
[ 130.172528] dump_stack+0x74/0x9a
[ 130.173298] slab_err+0xb7/0xdc
[ 130.174044] ? kernel_poison_pages+0xc0/0xc0
[ 130.175065] ? on_each_cpu_cond_mask+0x48/0x90
[ 130.176096] __kmem_cache_shutdown.cold+0x34/0x141
[ 130.177190] kmem_cache_destroy+0x59/0x100
[ 130.178223] f2fs_destroy_page_array_cache+0x15/0x20 [f2fs]
[ 130.179527] f2fs_put_super+0x1bc/0x380 [f2fs]
[ 130.180538] generic_shutdown_super+0x72/0x110
[ 130.181547] kill_block_super+0x27/0x50
[ 130.182438] kill_f2fs_super+0x76/0xe0 [f2fs]
[ 130.183448] deactivate_locked_super+0x3b/0x80
[ 130.184456] deactivate_super+0x3e/0x50
[ 130.185363] cleanup_mnt+0x109/0x160
[ 130.186179] __cleanup_mnt+0x12/0x20
[ 130.187003] task_work_run+0x70/0xb0
[ 130.187841] exit_to_user_mode_prepare+0x18f/0x1b0
[ 130.188917] syscall_exit_to_user_mode+0x31/0x170
[ 130.189989] do_syscall_64+0x45/0x90
[ 130.190828] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 130.191986] RIP: 0033:0x7faf868ea2eb
[ 130.192815] Code: 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 7b 0c 00 f7 d8 64 89 01
[ 130.196872] RSP: 002b:00007fffb7edb478 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 130.198494] RAX: 0000000000000000 RBX: 00007faf86a18204 RCX: 00007faf868ea2eb
[ 130.201021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055971df71c50
[ 130.203415] RBP: 000055971df71a40 R08: 0000000000000000 R09: 00007fffb7eda1f0
[ 130.205772] R10: 00007faf86a04339 R11: 0000000000000246 R12: 000055971df71c50
[ 130.208150] R13: 0000000000000000 R14: 000055971df71b38 R15: 0000000000000000
[ 130.210515] INFO: Object 0x00000000a980843a @offset=744
[ 130.212476] INFO: Allocated in page_array_alloc+0x3d/0xe0 [f2fs] age=1572 cpu=0 pid=3297
[ 130.215030] __slab_alloc+0x20/0x40
[ 130.216566] kmem_cache_alloc+0x2a0/0x2e0
[ 130.218217] page_array_alloc+0x3d/0xe0 [f2fs]
[ 130.219940] f2fs_init_compress_ctx+0x1f/0x40 [f2fs]
[ 130.221736] f2fs_write_cache_pages+0x3db/0x860 [f2fs]
[ 130.223591] f2fs_write_data_pages+0x2c9/0x300 [f2fs]
[ 130.225414] do_writepages+0x43/0xd0
[ 130.226907] __filemap_fdatawrite_range+0xd5/0x110
[ 130.228632] filemap_write_and_wait_range+0x48/0xb0
[ 130.230336] __generic_file_write_iter+0x18a/0x1d0
[ 130.232035] f2fs_file_write_iter+0x226/0x550 [f2fs]
[ 130.233737] new_sync_write+0x113/0x1a0
[ 130.235204] vfs_write+0x1a6/0x200
[ 130.236579] ksys_write+0x67/0xe0
[ 130.237898] __x64_sys_write+0x1a/0x20
[ 130.239309] do_syscall_64+0x38/0x90
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
| |
Add two slab caches: "f2fs_cic_entry" and "f2fs_dic_entry" for memory
allocation of compress_io_ctx and decompress_io_ctx structure.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
Add a per-sbi slab cache "f2fs_page_array_entry-%u:%u" for memory
allocation of page pointer array in compress context.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: Fix wrong memory allocation]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By profiling f2fs compression works, I've found vmap() callings have
unexpected hikes in the execution time in our test environment and
those are bottlenecks of f2fs decompression path. Changing these with
vm_map_ram(), we can enhance f2fs decompression speed pretty much.
[Verification]
Android Pixel 3(ARM64, 6GB RAM, 128GB UFS)
Turned on only 0-3 little cores(at 1.785GHz)
dd if=/dev/zero of=dummy bs=1m count=1000
echo 3 > /proc/sys/vm/drop_caches
dd if=dummy of=/dev/zero bs=512k
- w/o compression -
1048576000 bytes (0.9 G) copied, 2.082554 s, 480 M/s
1048576000 bytes (0.9 G) copied, 2.081634 s, 480 M/s
1048576000 bytes (0.9 G) copied, 2.090861 s, 478 M/s
- before patch -
1048576000 bytes (0.9 G) copied, 7.407527 s, 135 M/s
1048576000 bytes (0.9 G) copied, 7.283734 s, 137 M/s
1048576000 bytes (0.9 G) copied, 7.291508 s, 137 M/s
- after patch -
1048576000 bytes (0.9 G) copied, 1.998959 s, 500 M/s
1048576000 bytes (0.9 G) copied, 1.987554 s, 503 M/s
1048576000 bytes (0.9 G) copied, 1.986380 s, 503 M/s
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As 5kft <5kft@5kft.org> reported:
kworker/u9:3: page allocation failure: order:9, mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 3 PID: 8168 Comm: kworker/u9:3 Tainted: G C 5.8.3-sunxi #trunk
Hardware name: Allwinner sun8i Family
Workqueue: f2fs_post_read_wq f2fs_post_read_work
[<c010d6d5>] (unwind_backtrace) from [<c0109a55>] (show_stack+0x11/0x14)
[<c0109a55>] (show_stack) from [<c056d489>] (dump_stack+0x75/0x84)
[<c056d489>] (dump_stack) from [<c0243b53>] (warn_alloc+0xa3/0x104)
[<c0243b53>] (warn_alloc) from [<c024473b>] (__alloc_pages_nodemask+0xb87/0xc40)
[<c024473b>] (__alloc_pages_nodemask) from [<c02267c5>] (kmalloc_order+0x19/0x38)
[<c02267c5>] (kmalloc_order) from [<c02267fd>] (kmalloc_order_trace+0x19/0x90)
[<c02267fd>] (kmalloc_order_trace) from [<c047c665>] (zstd_init_decompress_ctx+0x21/0x88)
[<c047c665>] (zstd_init_decompress_ctx) from [<c047e9cf>] (f2fs_decompress_pages+0x97/0x228)
[<c047e9cf>] (f2fs_decompress_pages) from [<c045d0ab>] (__read_end_io+0xfb/0x130)
[<c045d0ab>] (__read_end_io) from [<c045d141>] (f2fs_post_read_work+0x61/0x84)
[<c045d141>] (f2fs_post_read_work) from [<c0130b2f>] (process_one_work+0x15f/0x3b0)
[<c0130b2f>] (process_one_work) from [<c0130e7b>] (worker_thread+0xfb/0x3e0)
[<c0130e7b>] (worker_thread) from [<c0135c3b>] (kthread+0xeb/0x10c)
[<c0135c3b>] (kthread) from [<c0100159>]
zstd may allocate large size memory for {,de}compression, it may cause
file copy failure on low-end device which has very few memory.
For decompression, let's just allocate proper size memory based on current
file's cluster size instead of max cluster size.
Reported-by: 5kft <5kft@5kft.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
| |
refcount_t type variable should never be less than one, so it's a
little bit hard to understand when we use it to indicate pending
compressed page count, let's change to use atomic_t for better
readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- f2fs_write_multi_pages
- f2fs_compress_pages
- init_compress_ctx
- compress_pages
- destroy_compress_ctx --- 1
- f2fs_write_compressed_pages
- destroy_compress_ctx --- 2
destroy_compress_ctx() in f2fs_write_multi_pages() is redundant, remove
it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added two small interfaces: (a) GC_URGENT_LOW
mode for performance and (b) F2FS_IOC_SEC_TRIM_FILE ioctl for
security.
The new GC mode allows Android to run some lower priority GCs in
background, while new ioctl discards user information without race
condition when the account is removed.
In addition, some patches were merged to address latency-related
issues. We've fixed some compression-related bug fixes as well as edge
race conditions.
Enhancements:
- add GC_URGENT_LOW mode in gc_urgent
- introduce F2FS_IOC_SEC_TRIM_FILE ioctl
- bypass racy readahead to improve read latencies
- shrink node_write lock coverage to avoid long latency
Bug fixes:
- fix missing compression flag control, i_size, and mount option
- fix deadlock between quota writes and checkpoint
- remove inode eviction path in synchronous path to avoid deadlock
- fix to wait GCed compressed page writeback
- fix a kernel panic in f2fs_is_compressed_page
- check page dirty status before writeback
- wait page writeback before update in node page write flow
- fix a race condition between f2fs_write_end_io and f2fs_del_fsync_node_entry
We've added some minor sanity checks and refactored trivial code
blocks for better readability and debugging information"
* tag 'f2fs-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
f2fs: prepare a waiter before entering io_schedule
f2fs: update_sit_entry: Make the judgment condition of f2fs_bug_on more intuitive
f2fs: replace test_and_set/clear_bit() with set/clear_bit()
f2fs: make file immutable even if releasing zero compression block
f2fs: compress: disable compression mount option if compression is off
f2fs: compress: add sanity check during compressed cluster read
f2fs: use macro instead of f2fs verity version
f2fs: fix deadlock between quota writes and checkpoint
f2fs: correct comment of f2fs_exist_written_data
f2fs: compress: delay temp page allocation
f2fs: compress: fix to update isize when overwriting compressed file
f2fs: space related cleanup
f2fs: fix use-after-free issue
f2fs: Change the type of f2fs_flush_inline_data() to void
f2fs: add F2FS_IOC_SEC_TRIM_FILE ioctl
f2fs: should avoid inode eviction in synchronous path
f2fs: segment.h: delete a duplicated word
f2fs: compress: fix to avoid memory leak on cc->cpages
f2fs: use generic names for generic ioctls
f2fs: don't keep meta inode pages used for compressed block migration
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, we allocate temp pages which is used to pad hole in
cluster during read IO submission, it may take long time before
releasing them in f2fs_decompress_pages(), since they are only
used as temp output buffer in decompression context, so let's
just do the allocation in that context to reduce time of memory
pool resource occupation.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Memory allocated for storing compressed pages' poitner should be
released after f2fs_write_compressed_pages(), otherwise it will
cause memory leak issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
like we did for encrypted page.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch is to fix a crash:
#3 [ffffb6580689f898] oops_end at ffffffffa2835bc2
#4 [ffffb6580689f8b8] no_context at ffffffffa28766e7
#5 [ffffb6580689f920] async_page_fault at ffffffffa320135e
[exception RIP: f2fs_is_compressed_page+34]
RIP: ffffffffa2ba83a2 RSP: ffffb6580689f9d8 RFLAGS: 00010213
RAX: 0000000000000001 RBX: fffffc0f50b34bc0 RCX: 0000000000002122
RDX: 0000000000002123 RSI: 0000000000000c00 RDI: fffffc0f50b34bc0
RBP: ffff97e815a40178 R8: 0000000000000000 R9: ffff97e83ffc9000
R10: 0000000000032300 R11: 0000000000032380 R12: ffffb6580689fa38
R13: fffffc0f50b34bc0 R14: ffff97e825cbd000 R15: 0000000000000c00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffffb6580689f9d8] __is_cp_guaranteed at ffffffffa2b7ea98
#7 [ffffb6580689f9f0] f2fs_submit_page_write at ffffffffa2b81a69
#8 [ffffb6580689fa30] f2fs_do_write_meta_page at ffffffffa2b99777
#9 [ffffb6580689fae0] __f2fs_write_meta_page at ffffffffa2b75f1a
#10 [ffffb6580689fb18] f2fs_sync_meta_pages at ffffffffa2b77466
#11 [ffffb6580689fc98] do_checkpoint at ffffffffa2b78e46
#12 [ffffb6580689fd88] f2fs_write_checkpoint at ffffffffa2b79c29
#13 [ffffb6580689fdd0] f2fs_sync_fs at ffffffffa2b69d95
#14 [ffffb6580689fe20] sync_filesystem at ffffffffa2ad2574
#15 [ffffb6580689fe30] generic_shutdown_super at ffffffffa2a9b582
#16 [ffffb6580689fe48] kill_block_super at ffffffffa2a9b6d1
#17 [ffffb6580689fe60] kill_f2fs_super at ffffffffa2b6abe1
#18 [ffffb6580689fea0] deactivate_locked_super at ffffffffa2a9afb6
#19 [ffffb6580689feb8] cleanup_mnt at ffffffffa2abcad4
#20 [ffffb6580689fee0] task_work_run at ffffffffa28bca28
#21 [ffffb6580689ff00] exit_to_usermode_loop at ffffffffa28050b7
#22 [ffffb6580689ff38] do_syscall_64 at ffffffffa280560e
#23 [ffffb6580689ff50] entry_SYSCALL_64_after_hwframe at ffffffffa320008c
This occurred when umount f2fs if enable F2FS_FS_COMPRESSION
with F2FS_IO_TRACE. Fixes it by adding IS_IO_TRACED_PAGE to check
validity of pid for page_private.
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In f2fs_write_raw_pages(), we need to check page dirty status before
writeback, because there could be a racer (e.g. reclaimer) helps
writebacking the dirty page.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The parameter compr is unused in the f2fs_cluster_blocks function
so we no longer need to pass it as a parameter.
Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- to avoid race between checkpoint and quota file writeback, it
just needs to hold read lock of node_write in writeback path.
- node_write lock has covered all LFS data write paths, it's not
necessary, we only need to hold node_write lock at write path of
quota file.
This refactors commit ca7f76e68074 ("f2fs: fix wrong discard space").
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
to avoid polluting global symbol namespace.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
ERROR:INITIALISED_STATIC: do not initialise statics to NULL
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Wire up f2fs to support inline encryption via the helper functions which
fs/crypto/ now provides. This includes:
- Adding a mount option 'inlinecrypt' which enables inline encryption
on encrypted files where it can be used.
- Setting the bio_crypt_ctx on bios that will be submitted to an
inline-encrypted file.
- Not adding logically discontiguous data to bios that will be submitted
to an inline-encrypted file.
- Not doing filesystem-layer crypto on inline-encrypted files.
This patch includes a fix for a race during IPU by
Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Satya Tangirala <satyat@google.com>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200702015607.1215430-4-satyat@google.com
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
|
|
|
|
|
| |
Just cleanup, no logic change.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
While compressed data writeback, we need to drop dirty pages like we did
for non-compressed pages if cp stops, however it's not needed to compress
any data in such case, so let's detect cp stop condition in
cluster_may_compress() to avoid redundant compressing and let following
f2fs_write_raw_pages() drops dirty pages correctly.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During zstd compression, ZSTD_endStream() may return non-zero value
because distination buffer is full, but there is still compressed data
remained in intermediate buffer, it means that zstd algorithm can not
save at last one block space, let's just writeback raw data instead of
compressed one, this can fix data corruption when decompressing
incomplete stored compression data.
Fixes: 50cfa66f0de0 ("f2fs: compress: support zstd compress algorithm")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commonly, in order to handle lz4 worst compress case, caller should
allocate buffer with size of LZ4_compressBound(inputsize) for target
compressed data storing, however in this case, if caller didn't
allocate enough space, lz4 compressor still can handle output buffer
budget properly, and end up compressing when left space in output
buffer is not enough.
So we don't have to allocate buffer with size for worst case, then
we can avoid 2 * 4KB size intermediate buffer allocation when
log_cluster_size is 2, and avoid unnecessary compressing work of
compressor if we can not save at least 4KB space.
Suggested-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
LZO-RLE extension (run length encoding) was introduced to improve
performance of LZO algorithm in scenario of data contains many zeros,
zram has changed to use this extended algorithm by default, this
patch adds to support this algorithm extension, to enable this
extension, it needs to enable F2FS_FS_LZO and F2FS_FS_LZORLE config,
and specifies "compress_algorithm=lzo-rle" mountoption.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If compression feature is on, in scenario of no enough free memory,
page refault ratio is higher than before, the root cause is:
- {,de}compression flow needs to allocate intermediate pages to store
compressed data in cluster, so during their allocation, vm may reclaim
mmaped pages.
- if above reclaimed pages belong to compressed cluster, during its
refault, it may cause more intermediate pages allocation, result in
reclaiming more mmaped pages.
So this patch introduces a mempool for intermediate page allocation,
in order to avoid high refault ratio, by default, number of
preallocated page in pool is 512, user can change the number by
assigning 'num_compress_pages' parameter during module initialization.
Ma Feng found warnings in the original patch and fixed like below.
Fix the following sparse warning:
fs/f2fs/compress.c:501:5: warning: symbol 'num_compress_pages' was not declared.
Should it be static?
fs/f2fs/compress.c:530:6: warning: symbol 'f2fs_compress_free_page' was not
declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
| |
Supports to truncate compressed/normal cluster partially on compressed
inode.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
| |
f2fs_quota_sync() uses f2fs_lock_op() before flushing dirty pages, but
f2fs_write_data_page() returns EAGAIN.
Likewise dentry blocks, we can just bypass getting the lock, since quota
blocks are also maintained by checkpoint.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In below error path, tpages[i] could be NULL, fix to check it before
releasing it.
- f2fs_read_multi_pages
- f2fs_alloc_dic
- f2fs_free_dic
Fixes: 61fbae2b2b12 ("f2fs: fix to avoid NULL pointer dereference")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
| |
Just cleanup, no logic change.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
| |
Add zstd compress algorithm support, use "compress_algorithm=zstd"
mountoption to enable it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add below two callback interfaces in struct f2fs_compress_ops:
int (*init_decompress_ctx)(struct decompress_io_ctx *dic);
void (*destroy_decompress_ctx)(struct decompress_io_ctx *dic);
Which will be used by zstd compress algorithm later.
In addition, this patch adds callback function pointer check, so that
specified algorithm can avoid defining unneeded functions.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
| |
Otherwise, it will cause memory leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
{cic,dic}.ref should be initialized to number of compressed pages,
let's initialize it directly rather than doing w/
f2fs_set_compressed_page().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If both compression and fsverity feature is on, generic/572 will
report below NULL pointer dereference bug.
BUG: kernel NULL pointer dereference, address: 0000000000000018
RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs]
#PF: supervisor read access in kernel mode
Workqueue: fsverity_read_queue f2fs_verity_work [f2fs]
RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs]
Call Trace:
process_one_work+0x16c/0x3f0
worker_thread+0x4c/0x440
? rescuer_thread+0x350/0x350
kthread+0xf8/0x130
? kthread_unpark+0x70/0x70
ret_from_fork+0x35/0x40
There are two issue in f2fs_verity_work():
- it needs to traverse and verify all pages in bio.
- if pages in bio belong to non-compressed cluster, accessing
decompress IO context stored in page private will cause NULL
pointer dereference.
Fix them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
In f2fs_decompress_end_io(), we should clear PG_error flag before page
unlock, otherwise reread will fail due to the flag as described in
commit fb7d70db305a ("f2fs: clear PageError on the read path").
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generic/232 reports below deadlock:
fsstress D 0 96980 96969 0x00084000
Call Trace:
schedule+0x4a/0xb0
io_schedule+0x12/0x40
__lock_page+0x127/0x1d0
pagecache_get_page+0x1d8/0x250
prepare_compress_overwrite+0xe0/0x490 [f2fs]
f2fs_prepare_compress_overwrite+0x5d/0x80 [f2fs]
f2fs_write_begin+0x833/0xb90 [f2fs]
f2fs_quota_write+0x145/0x1e0 [f2fs]
write_blk+0x36/0x80 [quota_tree]
do_insert_tree+0x2ac/0x4a0 [quota_tree]
do_insert_tree+0x26e/0x4a0 [quota_tree]
qtree_write_dquot+0x70/0x190 [quota_tree]
v2_write_dquot+0x43/0x90 [quota_v2]
dquot_acquire+0x77/0x100
f2fs_dquot_acquire+0x2f/0x60 [f2fs]
dqget+0x310/0x450
dquot_transfer+0xb2/0x120
f2fs_setattr+0x11a/0x4a0 [f2fs]
notify_change+0x349/0x480
chown_common+0x168/0x1c0
do_fchownat+0xbc/0xf0
__x64_sys_lchown+0x21/0x30
do_syscall_64+0x5f/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9
task PC stack pid father
kworker/u256:0 D 0 103444 2 0x80084000
Workqueue: writeback wb_workfn (flush-251:1)
Call Trace:
schedule+0x4a/0xb0
schedule_timeout+0x15e/0x2f0
io_schedule_timeout+0x19/0x40
congestion_wait+0x7e/0x120
f2fs_write_multi_pages+0x12a/0x840 [f2fs]
f2fs_write_cache_pages+0x48f/0x790 [f2fs]
f2fs_write_data_pages+0x2db/0x330 [f2fs]
do_writepages+0x1a/0x60
__writeback_single_inode+0x3d/0x340
writeback_sb_inodes+0x225/0x4a0
wb_writeback+0xf7/0x320
wb_workfn+0xba/0x470
process_one_work+0x16c/0x3f0
worker_thread+0x4c/0x440
kthread+0xf8/0x130
ret_from_fork+0x35/0x40
fsstress D 0 5277 5266 0x00084000
Call Trace:
schedule+0x4a/0xb0
rwsem_down_write_slowpath+0x29d/0x540
block_operations+0x105/0x360 [f2fs]
f2fs_write_checkpoint+0x101/0x1010 [f2fs]
f2fs_sync_fs+0xa8/0x130 [f2fs]
f2fs_do_sync_file+0x1ad/0x890 [f2fs]
do_fsync+0x38/0x60
__x64_sys_fdatasync+0x13/0x20
do_syscall_64+0x5f/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The root cause is there is potential deadlock between quota data
update and writeback.
Kworker Thread B Thread C
- f2fs_write_cache_pages
- lock whole cluster --- A
- f2fs_write_multi_pages
- f2fs_write_raw_pages
- f2fs_write_single_data_page
- f2fs_do_write_data_page
- f2fs_setattr
- f2fs_lock_op --- B
- f2fs_write_checkpoint
- block_operations
- f2fs_lock_all --- B
- dquot_transfer
- f2fs_quota_write
- f2fs_prepare_compress_overwrite
- pagecache_get_page --- A
- f2fs_trylock_op failed --- B
- congestion_wait
- goto rewrite
To fix this issue, during quota file writeback, just redirty all pages
left in cluster rather holding pages' lock in cluster and looping retrying
lock cp_rwsem.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
por_fsstress reports inconsistent status in orphan inode, the root cause
of this is in f2fs_write_raw_pages() we decrease i_compr_blocks incorrectly
due to wrong calculation in f2fs_compressed_blocks().
So this patch exposes below two functions based on __f2fs_cluster_blocks:
- f2fs_compressed_blocks: get count of compressed blocks in compressed cluster
- f2fs_cluster_blocks: get count of valid blocks (including reserved blocks)
in compressed cluster.
Then use f2fs_compress_blocks() to get correct compressed blocks count in
f2fs_write_raw_pages().
sanity_check_inode: inode (ino=ad80) hash inconsistent i_compr_blocks:2, i_blocks:1, run fsck to fix
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
| |
If we are in write IO path, we need to avoid using GFP_KERNEL.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As Geert Uytterhoeven reported:
for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);
On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().
This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.
Quoted from Geert Uytterhoeven:
"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.
If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
- rename datablock_addr() to data_blkaddr().
- wrap data_blkaddr() with f2fs_data_blkaddr() to clean up
parameters.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In compress cluster, if physical block number is less than logic
page number, race condition will cause use-after-free issue as
described below:
- f2fs_write_compressed_pages
- fio.page = cic->rpages[0];
- f2fs_outplace_write_data
- f2fs_compress_write_end_io
- kfree(cic->rpages);
- kfree(cic);
- fio.page = cic->rpages[1];
f2fs_write_multi_pages+0xfd0/0x1a98
f2fs_write_data_pages+0x74c/0xb5c
do_writepages+0x64/0x108
__writeback_single_inode+0xdc/0x4b8
writeback_sb_inodes+0x4d0/0xa68
__writeback_inodes_wb+0x88/0x178
wb_writeback+0x1f0/0x424
wb_workfn+0x2f4/0x574
process_one_work+0x210/0x48c
worker_thread+0x2e8/0x44c
kthread+0x110/0x120
ret_from_fork+0x10/0x18
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|