summaryrefslogtreecommitdiffstats
path: root/fs/ext4/extents.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"Theodore Ts'o2015-01-021-2/+2
| | | | | | | | | This reverts commit 14516bb7bb6ffbd49f35389f9ece3b2045ba5815. This was causing regression test failures with generic/285 with an ext3 filesystem using CONFIG_EXT4_USE_FOR_EXT23. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix suboptimal seek_{data,hole} extents traversialDmitry Monakhov2014-12-031-2/+2
| | | | | | | | | | | | | | | | | | | It is ridiculous practice to scan inode block by block, this technique applicable only for old indirect files. This takes significant amount of time for really large files. Let's reuse ext4_fiemap which already traverse inode-tree in most optimal meaner. TESTCASE: ftruncate64(fd, 0); ftruncate64(fd, 1ULL << 40); /* lseek will spin very long time */ lseek64(fd, 0, SEEK_DATA); lseek64(fd, 0, SEEK_HOLE); Original report: https://lkml.org/lkml/2014/10/16/620 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: ext4_inline_data_fiemap should respect callers argumentDmitry Monakhov2014-12-021-1/+2
| | | | | | | | | Currently ext4_inline_data_fiemap ignores requested arguments (start and len) which may lead endless loop if start != 0. Also fix incorrect extent length determination. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: remove never taken branch from ext4_ext_shift_path_extents()Jan Kara2014-11-251-2/+0
| | | | | | | | | | path[depth].p_hdr can never be NULL for a path passed to us (and even if it could, EXT_LAST_EXTENT() would make something != NULL from it). So just remove the branch. Coverity-id: 1196498 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: move handling of list of shrinkable inodes into extent status codeJan Kara2014-11-251-2/+0
| | | | | | | | | | | | | Currently callers adding extents to extent status tree were responsible for adding the inode to the list of inodes with freeable extents. This is error prone and puts list handling in unnecessarily many places. Just add inode to the list automatically when the first non-delay extent is added to the tree and remove inode from the list when the last non-delay extent is removed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: change LRU to round-robin in extent status tree shrinkerZheng Liu2014-11-251-2/+2
| | | | | | | | | | | | | | | | In this commit we discard the lru algorithm for inodes with extent status tree because it takes significant effort to maintain a lru list in extent status tree shrinker and the shrinker can take a long time to scan this lru list in order to reclaim some objects. We replace the lru ordering with a simple round-robin. After that we never need to keep a lru list. That means that the list needn't be sorted if the shrinker can not reclaim any objects in the first round. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: cache extent hole in extent status tree for ext4_da_map_blocks()Zheng Liu2014-11-251-15/+16
| | | | | | | | | | | | | | | | | | Currently extent status tree doesn't cache extent hole when a write looks up in extent tree to make sure whether a block has been allocated or not. In this case, we don't put extent hole in extent cache because later this extent might be removed and a new delayed extent might be added back. But it will cause a defect when we do a lot of writes. If we don't put extent hole in extent cache, the following writes also need to access extent tree to look at whether or not a block has been allocated. It brings a cache miss. This commit fixes this defect. Also if the inode doesn't have any extent, this extent hole will be cached as well. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix block reservation for bigalloc filesystemsJan Kara2014-11-251-8/+4
| | | | | | | | | | | | | | | | | | For bigalloc filesystems we have to check whether newly requested inode block isn't already part of a cluster for which we already have delayed allocation reservation. This check happens in ext4_ext_map_blocks() and that function sets EXT4_MAP_FROM_CLUSTER if that's the case. However if ext4_da_map_blocks() finds in extent cache information about the block, we don't call into ext4_ext_map_blocks() and thus we always end up getting new reservation even if the space for cluster is already reserved. This results in overreservation and premature ENOSPC reports. Fix the problem by checking for existing cluster reservation already in ext4_da_map_blocks(). That simplifies the logic and actually allows us to get rid of the EXT4_MAP_FROM_CLUSTER flag completely. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix end of region partial cluster handlingEric Whitney2014-11-231-7/+9
| | | | | | | | | | | | | | ext4_ext_remove_space() can incorrectly free a partial_cluster if EAGAIN is encountered while truncating or punching. Extent removal should be retried in this case. It also fails to free a partial cluster when the punched region begins at the start of a file on that unaligned cluster and where the entire file has not been punched. Remove the requirement that all blocks in the file must have been freed in order to free the partial cluster. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: miscellaneous partial cluster cleanupsEric Whitney2014-11-231-18/+21
| | | | | | | | | | Add some casts and rearrange a few statements for improved readability. Some code can also be simplified and made more readable if we set partial_cluster to 0 rather than to a negative value when we can tell we've hit the left edge of the punched region. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix end of leaf partial cluster handlingEric Whitney2014-11-231-19/+17
| | | | | | | | | | | | | | | | | | | | | The fix in commit ad6599ab3ac9 ("ext4: fix premature freeing of partial clusters split across leaf blocks"), intended to avoid dereferencing an invalid extent pointer when determining whether a partial cluster should be freed, wasn't quite good enough. Assure that at least one extent remains at the start of the leaf once the hole has been punched. Otherwise, the pointer to the extent to the right of the hole will be invalid and a partial cluster will be incorrectly freed. Set partial_cluster to 0 when we can tell we've hit the left edge of the punched region within the leaf. This prevents incorrect freeing of a partial cluster when ext4_ext_rm_leaf is called one last time during extent tree traversal after the punched region has been removed. Adjust comments to reflect code changes and a correction. Remove a bit of dead code. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix partial cluster initializationEric Whitney2014-11-231-34/+46
| | | | | | | | | | | | | | | | | | | | | | | | The partial_cluster variable is not always initialized correctly when hole punching on bigalloc file systems. Although commit c06344939422 ("ext4: fix partial cluster handling for bigalloc file systems") addressed the case where the right edge of the punched region and the next extent to its right were within the same leaf, it didn't handle the case where the next extent to its right is in the next leaf. This causes xfstest generic/300 to fail. Fix this by replacing the code in c0634493922 with a more general solution that can continue the search for the first cluster to the right of the punched region into the next leaf if present. If found, partial_cluster is initialized to this cluster's negative value. There's no need to determine if that cluster is actually shared; we simply record it so its blocks won't be freed in the event it does happen to be shared. Also, minimize the burden on non-bigalloc file systems with some minor code simplification. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: make ext4_ext_convert_to_initialized() return proper number of blocksJan Kara2014-10-301-5/+4
| | | | | | | | | | | | | ext4_ext_convert_to_initialized() can return more blocks than are actually allocated from map->m_lblk in case where initial part of the on-disk extent is zeroed out. Luckily this doesn't have serious consequences because the caller currently uses the return value only to unmap metadata buffers. Anyway this is a data corruption/exposure problem waiting to happen so fix it. Coverity-id: 1226848 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: Replace open coded mdata csum feature to helper functionDmitry Monakhov2014-10-131-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Besides the fact that this replacement improves code readability it also protects from errors caused direct EXT4_S(sb)->s_es manipulation which may result attempt to use uninitialized csum machinery. #Testcase_BEGIN IMG=/dev/ram0 MNT=/mnt mkfs.ext4 $IMG mount $IMG $MNT #Enable feature directly on disk, on mounted fs tune2fs -O metadata_csum $IMG # Provoke metadata update, likey result in OOPS touch $MNT/test umount $MNT #Testcase_END # Replacement script @@ expression E; @@ - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) + ext4_has_metadata_csum(E) https://bugzilla.kernel.org/show_bug.cgi?id=82201 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* ext4: optimize block allocation on grow indepthDmitry Monakhov2014-10-021-6/+14
| | | | | | | | | It is reasonable to prepend newly created index to older one. [ Dropped no longer used function parameter newext. -tytso ] Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: prepare to drop EXT4_STATE_DELALLOC_RESERVEDTheodore Ts'o2014-09-051-1/+5
| | | | | | | | | | | | | | | | | | | The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented because it was too hard to make sure the mballoc and get_block flags could be reliably passed down through all of the codepaths that end up calling ext4_mb_new_blocks(). Since then, we have mb_flags passed down through most of the code paths, so getting rid of EXT4_STATE_DELALLOC_RESERVED isn't as tricky as it used to. This commit plumbs in the last of what is required, and then adds a WARN_ON check to make sure we haven't missed anything. If this passes a full regression test run, we can then drop EXT4_STATE_DELALLOC_RESERVED. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* ext4: rename ext4_ext_find_extent() to ext4_find_extent()Theodore Ts'o2014-09-011-19/+19
| | | | | | Make the function name less redundant. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: reuse path object in ext4_ext_shift_extents()Theodore Ts'o2014-09-011-17/+8
| | | | | | | | Now that the semantics of ext4_ext_find_extent() are much cleaner, it's safe and more efficient to reuse the path object across the multiple calls to ext4_ext_find_extent() in ext4_ext_shift_extents(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: teach ext4_ext_find_extent() to realloc path if necessaryTheodore Ts'o2014-09-011-10/+10
| | | | | | | This adds additional safety in case for some reason we end reusing a path structure which isn't big enough for current depth of the inode. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: allow a NULL argument to ext4_ext_drop_refs()Theodore Ts'o2014-09-011-30/+18
| | | | | | | | Teach ext4_ext_drop_refs() to accept a NULL argument, much like kfree(). This allows us to drop a lot of checks to make sure path is non-NULL before calling ext4_ext_drop_refs(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: call ext4_ext_drop_refs() from ext4_ext_find_extent()Theodore Ts'o2014-09-011-7/+4
| | | | | | | | | | | In nearly all of the calls to ext4_ext_find_extent() where the caller is trying to recycle the path object, ext4_ext_drop_refs() gets called to release the buffer heads before the path object gets overwritten. To simplify things for the callers, and to avoid the possibility of a memory leak, make ext4_ext_find_extent() responsible for dropping the buffers. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling codeTheodore Ts'o2014-09-011-55/+57
| | | | | | | | | | | | | | | | | Drop EXT4_EX_NOFREE_ON_ERR from ext4_ext_create_new_leaf(), ext4_split_extent(), ext4_convert_unwritten_extents_endio(). This requires fixing all of their callers to potentially ext4_ext_find_extent() to free the struct ext4_ext_path object in case of an error, and there are interlocking dependencies all the way up to ext4_ext_map_blocks(), ext4_swap_extents(), and ext4_ext_remove_space(). Once this is done, we can drop the EXT4_EX_NOFREE_ON_ERR flag since it is no longer necessary. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: drop EXT4_EX_NOFREE_ON_ERR in convert_initialized_extent()Theodore Ts'o2014-09-011-5/+5
| | | | | | | Transfer responsibility of freeing struct ext4_ext_path on error to ext4_ext_find_extent(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: collapse ext4_convert_initialized_extents()Theodore Ts'o2014-09-011-77/+59
| | | | | | | | | | | | The function ext4_convert_initialized_extents() is only called by a single function --- ext4_ext_convert_initalized_extents(). Inline the code and get rid of the unnecessary bits in order to simplify the code. Rename ext4_ext_convert_initalized_extents() to convert_initalized_extents() since it's a static function that is actually only used in a single caller, ext4_ext_map_blocks(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: teach ext4_ext_find_extent() to free path on errorTheodore Ts'o2014-09-011-10/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, there are a places where it is all to easy to leak memory on an error path, via a usage like this: struct ext4_ext_path *path = NULL while (...) { ... path = ext4_ext_find_extent(inode, block, path, 0); if (IS_ERR(path)) { /* oops, if path was non-NULL before the call to ext4_ext_find_extent, we've leaked it! :-( */ ... return PTR_ERR(path); } ... } Unfortunately, there some code paths where we are doing the following instead: path = ext4_ext_find_extent(inode, block, orig_path, 0); and where it's important that we _not_ free orig_path in the case where ext4_ext_find_extent() returns an error. So change the function signature of ext4_ext_find_extent() so that it takes a struct ext4_ext_path ** for its third argument, and by default, on an error, it will free the struct ext4_ext_path, and then zero out the struct ext4_ext_path * pointer. In order to avoid causing problems, we add a flag EXT4_EX_NOFREE_ON_ERR which causes ext4_ext_find_extent() to use the original behavior of forcing the caller to deal with freeing the original path pointer on the error case. The goal is to get rid of EXT4_EX_NOFREE_ON_ERR entirely, but this allows for a gentle transition and makes the patches easier to verify. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix ZERO_RANGE bug hidden by flag aliasingTheodore Ts'o2014-09-011-7/+14
| | | | | | | | | | | | | | | | | | | | We accidently aliased EXT4_EX_NOCACHE and EXT4_GET_CONVERT_UNWRITTEN falgs, which apparently was hiding a bug that was unmasked when this flag aliasing issue was addressed (see the subsequent commit). The reproduction case was: fsx -N 10000 -l 500000 -r 4096 -t 4096 -w 4096 -Z -R -W /vdb/junk ... which would cause fsx to report corruption in the data file. The fix we have is a bit of an overkill, but I'd much rather be conservative for now, and we can optimize ZERO_RANGE_FL handling later. The fact that we need to zap the extent_status cache for the inode is unfortunate, but correctness is far more important than performance. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com>
* ext4: fix ext4_swap_extents() error handlingTheodore Ts'o2014-08-311-33/+29
| | | | | | | | | | | If ext4_ext_find_extent() returns an error, we have to clear path1 or path2 or else we would end up trying to free an ERR_PTR, which would be bad. Also eliminate some redundant code and mark the error paths as unlikely() Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: refactor ext4_move_extents code baseDmitry Monakhov2014-08-311-13/+221
| | | | | | | | | | | | | | ext4_move_extents is too complex for review. It has duplicate almost each function available in the rest of other codebase. It has useless artificial restriction orig_offset == donor_offset. But in fact logic of ext4_move_extents is very simple: Iterate extents one by one (similar to ext4_fill_fiemap_extents) ->Iterate each page covered extent (similar to generic_perform_write) ->swap extents for covered by page (can be shared with IOC_MOVE_DATA) Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use ext4_ext_next_allocated_block instead of mext_next_extentDmitry Monakhov2014-08-311-9/+7
| | | | | | | | This allows us to make mext_next_extent static and potentially get rid of it. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix transaction issues for ext4_fallocate and ext_zero_rangeDmitry Monakhov2014-08-281-33/+35
| | | | | | | | | | | | | | | After commit f282ac19d86f we use different transactions for preallocation and i_disksize update which result in complain from fsck after power-failure. spotted by generic/019. IMHO this is regression because fs becomes inconsistent, even more 'e2fsck -p' will no longer works (which drives admins go crazy) Same transaction requirement applies ctime,mtime updates testcase: xfstest generic/019 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* ext4: fix incorect journal credits reservation in ext4_zero_rangeDmitry Monakhov2014-08-281-2/+9
| | | | | | | | | | Currently we reserve only 4 blocks but in worst case scenario ext4_zero_partial_blocks() may want to zeroout and convert two non adjacent blocks. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* ext4: move i_size,i_disksize update routines to helper functionDmitry Monakhov2014-08-231-13/+4
| | | | | | Cc: stable@vger.kernel.org # needed for bug fix patches Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix COLLAPSE RANGE test for bigalloc file systemsNamjae Jeon2014-07-291-5/+2
| | | | | | | | | | | | | Blocks in collapse range should be collapsed per cluster unit when bigalloc is enable. If bigalloc is not enable, EXT4_CLUSTER_SIZE will be same with EXT4_BLOCK_SIZE. With this bug fixed, patch enables COLLAPSE_RANGE for bigalloc, which fixes a large number of xfstest failures which use fsx. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use correct depth valueDmitry Monakhov2014-07-281-1/+1
| | | | | | | | | | Inode's depth can be changed from here: ext4_ext_try_to_merge() ->ext4_ext_try_to_merge_up() We must use correct value. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* ext4: add i_data_sem sanity checkDmitry Monakhov2014-07-281-0/+2
| | | | | | | | | Each caller of ext4_ext_dirty must hold i_data_sem, The only exception is migration code, let's make it convenient. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* ext4: remove metadata reservation checksTheodore Ts'o2014-07-151-2/+1
| | | | | | | | | Commit 27dd43854227b ("ext4: introduce reserved space") reserves 2% of the file system space to make sure metadata allocations will always succeed. Given that, tracking the reservation of metadata blocks is no longer necessary. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix ZERO_RANGE test failure in data journallingNamjae Jeon2014-05-271-0/+7
| | | | | | | | | | | | xfstests generic/091 is failing when mounting ext4 with data=journal. I think that this regression is same problem that occurred prior to collapse range issue. So ZERO RANGE also need to call ext4_force_commit as collapse range. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add missing BUFFER_TRACE before ext4_journal_get_write_accessliang xie2014-05-131-0/+1
| | | | | | | Make them more consistently Signed-off-by: xieliang <xieliang@xiaomi.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: remove unnecessary double parenthesesLukas Czerner2014-05-121-3/+3
| | | | | Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: rename uninitialized extents to unwrittenLukas Czerner2014-04-211-109/+109
| | | | | | | | | | | | | | | | | | | | | | | Currently in ext4 there is quite a mess when it comes to naming unwritten extents. Sometimes we call it uninitialized and sometimes we refer to it as unwritten. The right name for the extent which has been allocated but does not contain any written data is _unwritten_. Other file systems are using this name consistently, even the buffer head state refers to it as unwritten. We need to fix this confusion in ext4. This commit changes every reference to an uninitialized extent (meaning allocated but unwritten) to unwritten extent. This includes comments, function names and variable names. It even covers abbreviation of the word uninitialized (such as uninit) and some misspellings. This commit does not change any of the code paths at all. This has been confirmed by comparing md5sums of the assembly code of each object file after all the function names were stripped from it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: get rid of EXT4_MAP_UNINIT flagLukas Czerner2014-04-211-4/+0
| | | | | | | | | | | | | | | | Currently EXT4_MAP_UNINIT is used in dioread_nolock case to mark the cases where we're using dioread_nolock and we're writing into either unallocated, or unwritten extent, because we need to make sure that any DIO write into that inode will wait for the extent conversion. However EXT4_MAP_UNINIT is not only entirely misleading name but also unnecessary because we can check for EXT4_MAP_UNWRITTEN in the dioread_nolock case instead. This commit removes EXT4_MAP_UNINIT flag. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: disable COLLAPSE_RANGE for bigallocNamjae Jeon2014-04-191-0/+3
| | | | | | | | | | | Once COLLAPSE RANGE is be disable for ext4 with bigalloc feature till finding root-cause of problem. It will be enable with fixing that regression of xfstest(generic 075 and 091) again. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: fix COLLAPSE_RANGE failure with 1KB block sizeNamjae Jeon2014-04-191-3/+10
| | | | | | | | | | | | | | | | When formatting with 1KB or 2KB(not aligned with PAGE SIZE) block size, xfstests generic/075 and 091 are failing. The offset supplied to function truncate_pagecache_range is block size aligned. In this function start offset is re-aligned to PAGE_SIZE by rounding_up to the next page boundary. Due to this rounding up, old data remains in the page cache when blocksize is less than page size and start offset is not aligned with page size. In case of collapse range, we need to align start offset to page size boundary by doing a round down operation instead of round up. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: use EINVAL if not a regular file in ext4_collapse_range()Theodore Ts'o2014-04-181-1/+1
| | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: enforce we are operating on a regular file in ext4_zero_range()jon ernst2014-04-181-0/+3
| | | | | Signed-off-by: Jon Ernst <jonernst07@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: fix extent merging in ext4_ext_shift_path_extents()Lukas Czerner2014-04-181-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a bug in ext4_ext_shift_path_extents() where if we actually manage to merge a extent we would skip shifting the next extent. This will result in in one extent in the extent tree not being properly shifted. This is causing failure in various xfstests tests using fsx or fsstress with collapse range support. It will also cause file system corruption which looks something like: e2fsck 1.42.9 (4-Feb-2014) Pass 1: Checking inodes, blocks, and sizes Inode 20 has out of order extents (invalid logical block 3, physical block 492938, len 2) Clear? yes ... when running e2fsck. It's also very easily reproducible just by running fsx without any parameters. I can usually hit the problem within a minute. Fix it by increasing ex_start only if we're not merging the extent. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
* ext4: discard preallocations after removing spaceLukas Czerner2014-04-181-1/+1
| | | | | | | | | | | | | | | | | | Currently in ext4_collapse_range() and ext4_punch_hole() we're discarding preallocation twice. Once before we attempt to do any changes and second time after we're done with the changes. While the second call to ext4_discard_preallocations() in ext4_punch_hole() case is not needed, we need to discard preallocation right after ext4_ext_remove_space() in collapse range case because in the case we had to restart a transaction in the middle of removing space we might have new preallocations created. Remove unneeded ext4_discard_preallocations() ext4_punch_hole() and move it to the better place in ext4_collapse_range() Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: no need to truncate pagecache twice in collapse rangeLukas Czerner2014-04-181-1/+1
| | | | | | | | | | | | We're already calling truncate_pagecache() before we attempt to do any actual job so there is not need to truncate pagecache once more using truncate_setsize() after we're finished. Remove truncate_setsize() and replace it just with i_size_write() note that we're holding appropriate locks. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: fix removing status extents in ext4_collapse_range()Lukas Czerner2014-04-181-1/+1
| | | | | | | | | | | | | | Currently in ext4_collapse_range() when calling ext4_es_remove_extent() to remove status extents we're passing (EXT_MAX_BLOCKS - punch_start - 1) in order to remove all extents from start of the collapse range to the end of the file. However this is wrong because we might miss the possible extent covering the last block of the file. Fix it by removing the -1. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
* ext4: use filemap_write_and_wait_range() correctly in collapse rangeLukas Czerner2014-04-181-1/+1
| | | | | | | | | | | | Currently we're passing -1 as lend argumnet for filemap_write_and_wait_range() which is wrong since lend is signed type so it would cause some confusion and we might not write_and_wait for the entire range we're expecting to write. Fix it by using LLONG_MAX instead. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>