| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:
- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.
The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This series adds two ioctls to control cached data and fragmented
files. Most of the rest fixes missing error cases and bugs that we
have not covered so far. Summary:
Enhancements:
- support an ioctl to execute online file defragmentation
- support an ioctl to flush cached data
- speed up shrinking of extent_cache entries
- handle broken superblock
- refector dirty inode management infra
- revisit f2fs_map_blocks to handle more cases
- reduce global lock coverage
- add detecting user's idle time
Major bug fixes:
- fix data race condition on cached nat entries
- fix error cases of volatile and atomic writes"
* tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (87 commits)
f2fs: should unset atomic flag after successful commit
f2fs: fix wrong memory condition check
f2fs: monitor the number of background checkpoint
f2fs: detect idle time depending on user behavior
f2fs: introduce time and interval facility
f2fs: skip releasing nodes in chindless extent tree
f2fs: use atomic type for node count in extent tree
f2fs: recognize encrypted data in f2fs_fiemap
f2fs: clean up f2fs_balance_fs
f2fs: remove redundant calls
f2fs: avoid unnecessary f2fs_balance_fs calls
f2fs: check the page status filled from disk
f2fs: introduce __get_node_page to reuse common code
f2fs: check node id earily when readaheading node page
f2fs: read isize while holding i_mutex in fiemap
Revert "f2fs: check the node block address of newly allocated nid"
f2fs: cover more area with nat_tree_lock
f2fs: introduce max_file_blocks in sbi
f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink
f2fs: introduce zombie list for fast shrinking extent trees
...
|
| |
| |
| |
| |
| |
| |
| | |
If there is an error during commit, we should keep the flag in order to
abort it.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This patch fixes wrong decision for avaliable_free_memory.
The return valus is already set as false, so we should consider true condition
below only.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| | |
This patch adds to show the number of background checkpoint.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
This patch adds last time that user requested filesystem operations.
This information is used to detect whether system is idle or not later.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| | |
This patch adds time and interval arrays to store some timing variables.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
If there are no nodes in extent tree, let's skip releasing step to avoid
any overhead of grabbing/releasing extent tree lock.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
1. rename field in struct extent_tree from count to node_cnt for
readability.
2. alter to use atomic type for node_cnt.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
This patch fixes to teach f2fs_fiemap to recognize encrypted data.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| | |
This patch adds one parameter to clean up all the callers of f2fs_balance_fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| | |
This patch removes redundant calls.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
Only when node page is newly dirtied, it needs to check whether we need to do
f2fs_gc.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| | |
After reading a page, we need to check whether there is any error.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are duplicated code in between get_node_page and get_node_page_ra,
introduce __get_node_page to includes common parts of these two, and
export get_node_page and get_node_page_ra by reusing __get_node_page.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
Add node id check in ra_node_page and get_node_page_ra like get_node_page.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
make sure the isize we read doesn't change during the process.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Original issue is fixed by:
f2fs: cover more area with nat_tree_lock
This reverts commit 24928634f81b1592e83b37dcd89ed45c28f12feb.
|
| |
| |
| |
| |
| |
| |
| |
| | |
There was a subtle bug on nat cache management which incurs wrong nid allocation
or wrong block addresses when try_to_free_nats is triggered heavily.
This patch enlarges the previous coverage of nat_tree_lock to avoid data race.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Introduce max_file_blocks in sbi to store max block index of file in f2fs,
it could be used to avoid unneeded calculation of max block index in
runtime.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix overflow of sbi->max_file_blocks]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Add missed CONFIG_F2FS_FS_XATTR for encrypted symlink inode in order
to avoid unneeded registry of ->{get,set,remove}xattr.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
This patch removes refcount, and instead, adds zombie_list to shrink directly
without radix tree traverse.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| | |
This patch adds an entry to show the number of zombie extent_tree.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
This patch fixes missing IPU condition when fdatasync is called.
With this patch, fdatasync is able to avoid additional node writes for recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When testing ioc_shutdown, put_super is able to be hanged by waiting for
writebacking pages as follows.
INFO: task umount:2723 blocked for more than 120 seconds.
Tainted: G O 4.4.0-rc3+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount D ffff88000859f9d8 0 2723 2110 0x00000000
ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540
ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff
ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c
Call Trace:
[<ffffffff818239f0>] ? bit_wait+0x50/0x50
[<ffffffff8182310c>] schedule+0x3c/0x90
[<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430
[<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0
[<ffffffff8111614d>] ? ktime_get+0x7d/0x140
[<ffffffff818239f0>] ? bit_wait+0x50/0x50
[<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30
[<ffffffff8111617c>] ? ktime_get+0xac/0x140
[<ffffffff818239f0>] ? bit_wait+0x50/0x50
[<ffffffff81822564>] io_schedule_timeout+0xa4/0x110
[<ffffffff81823a25>] bit_wait_io+0x35/0x50
[<ffffffff818235bd>] __wait_on_bit+0x5d/0x90
[<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0
[<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40
[<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840
[<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60
[<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs]
[<ffffffff812639bc>] evict+0xbc/0x190
[<ffffffff81263d19>] iput+0x229/0x2c0
[<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs]
[<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0
[<ffffffff812478f7>] kill_block_super+0x27/0x70
[<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs]
[<ffffffff81247b03>] deactivate_locked_super+0x43/0x70
[<ffffffff81247f4c>] deactivate_super+0x5c/0x60
[<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90
[<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20
[<ffffffff810ac463>] task_work_run+0x73/0xa0
[<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0
[<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0
[<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
There is no report on this bug_on case, but if malicious attacker changed this
field intentionally, we can just reset it as a MAX value.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are two rules to handle aborting volatile or atomic writes.
1. drop atomic writes
- we don't need to keep any stale db data.
2. write journal data
- we should keep the journal data with fsync for db recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
If filesystem is readonly, leave user message info instead of recovering
inline dot inode.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Otherwise, we can get mismatched largest extent information.
One example is:
1. mount f2fs w/ extent_cache
2. make a small extent
3. umount
4. mount f2fs w/o extent_cache
5. update the largest extent
6. umount
7. mount f2fs w/ extent_cache
8. get the old extent made by #2
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| | |
We need to use i_size_read() to get inode->i_size.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| | |
If link is broken, its len is zero, and we don't need to move forward.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Use f2fs_sync_fs to clean up codes in f2fs_ioc_write_checkpoint.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove unused err variable]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch adds a max block check for get_data_block_bmap.
Trinity test program will send a block number as parameter into
ioctl_fibmap, which will be used in get_node_path(), when the block
number large than f2fs max blocks, it will trigger kernel bug.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
[Jaegeuk Kim: fix missing condition, pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
fix bugs:
1. len could be updated incorrectly when start+len is beyond isize.
2. If there is a hole consisting of more than two blocks, it could
fail to add FIEMAP_EXTENT_LAST flag for the last extent.
3. If there is an extent beyond isize, when we search extents in a range
that ends at isize, it will also return the extent beyond isize,
which is outside the range.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Sometimes we keep dumb when IO error occur in lower layer device, so user
will not receive any error return value for some operation, but actually,
the operation did not succeed.
This sould be avoided, so this patch reports such kind of error to user.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
__recover_do_dentries will try to grab free space in storage, so fix to
add missing f2fs_balance_fs here.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| | |
The __f2fs_commit_super is static.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in
order to avoid the checkpoint latency in the write syscall.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
If get_node_page() gets zero nid, we can return early without getting a wrong
page. For example, get_dnode_of_data() can try to do that.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This patch adds prepare_write_begin to clean f2fs_write_begin.
The major role of this function is to convert any inline_data and allocate
or find block address.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If inline_data option is disable, when truncating an inline inode with
size which is not exceed maxinum inline size, we should not convert
inline inode to regular one to avoid the overhead of synchronizing
conversion.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.
So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
If user tries to update or read data, we don't need to call f2fs_balance_fs
which triggers f2fs_gc, which increases unnecessary long latency.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Only cover sbi->cp_rwsem on one dnode page's allocation and modification
instead of multiple's in f2fs_map_blocks, it can reduce the covered region
of cp_rwsem, then we can avoid potential long time delay for concurrent
checkpointer.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch introduces recording node block allocation in dnode_of_data.
This information helps to figure out whether any node block is allocated during
specific file operations.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The f2fs_balance_fs doesn't need to cover f2fs_new_inode or f2fs_find_entry
works.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
We can check inode's inline_data flag when calling to convert it.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If there is no candidates for shrinking slab entries, we don't need to traverse
any trees at all.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix missing initialization reported by Yunlei He]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
It would be better to use atomic variable for total_extent_tree.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|