path: root/fs
Commit message | Author | Age | Files | Lines
* xfs: use ->t_dfops for all xfs_bmapi_write() callers | Brian Foster | 2018-07-12 | 5 | -20/+26
| Attach ->t_dfops for all remaining callers of xfs_bmapi_write(). This prepares the latter to no longer require a separate dfops parameter. Note that xfs_symlink() already uses ->t_dfops. Fix up the local references for consistency. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: use ->t_dfops in dqalloc transaction | Brian Foster | 2018-07-12 | 1 | -14/+20
| xfs_dquot_disk_alloc() receives a transaction from the caller and passes a local dfops along to xfs_bmapi_write(). If we attach this dfops to the transaction, we have to make sure to clear it before returning to avoid invalid access of stack memory. Since xfs_qm_dqread_alloc() is the only caller, pull dfops into the caller and attach it to the transaction to eliminate this pattern entirely. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: replace xfs_da_args->dfops accesses with ->t_dfops and remove | Brian Foster | 2018-07-12 | 7 | -119/+117
| Now that xfs_da_args->dfops is always assigned from a ->t_dfops pointer (or one that is immediately attached), replace all downstream accesses of the former with the latter and remove the field from struct xfs_da_args. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: use ->t_dfops in extent split tx and remove param | Brian Foster | 2018-07-12 | 1 | -7/+6
| Attach the local dfops to ->t_dfops of the extent split transaction. Since this is the only caller of xfs_bmap_split_extent_at(), remove the dfops parameter as well. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove dfops param in attr fork add path | Brian Foster | 2018-07-12 | 1 | -11/+8
| Now that the attribute fork add tx carries dfops along with the transaction, it is unnecessary to pass it down the stack. Remove the dfops parameter and access ->t_dfops directly where necessary. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: use ->t_dfops for attr set/remove operations | Brian Foster | 2018-07-12 | 2 | -5/+9
| Attach the local dfops to the transaction allocated for xattr add and remove operations. Add an earlier initialization in xfs_attr_remove() to ensure the structure is valid if it remains unused at transaction commit time. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: use ->t_dfops for recovery of [b|c]ui log items | Brian Foster | 2018-07-12 | 2 | -0/+16
| Log recovery passes down a central dfops structure to recovery handlers for bui and cui log items. Each of these handlers allocates and commits a transaction and defers any remaining operations to be completed by the main recovery sequence. Since dfops outlives the transaction in this context, set and clear ->t_dfops appropriately such that the *_finish_item() paths and below (i.e., xfs_bmapi*()) can expect to find the dfops in the transaction without it being committed with the dfops attached. This is required because transaction commit expects that an associated dfops is finished and in this context the dfops may be populated at commit time. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
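The set-and-clear pattern from the commit message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_dfops`, `model_trans`, `model_commit()`, `model_finish_item()` and `model_recover_item()` are hypothetical stand-ins, not the kernel's types or functions, and the "commit rejects an unfinished attached dfops" rule is modelled here as a simple error return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's dfops or transaction types. */
struct model_dfops {
    int pending;                    /* deferred operations not yet finished */
};

struct model_trans {
    struct model_dfops *t_dfops;    /* dfops attached to this transaction */
    int committed;
};

/* Assumption: commit refuses a transaction whose attached dfops still has
 * pending work, mirroring "commit expects an associated dfops is finished". */
static int model_commit(struct model_trans *tp)
{
    if (tp->t_dfops != NULL && tp->t_dfops->pending > 0)
        return -1;
    tp->committed = 1;
    return 0;
}

/* Lower layer: finds the dfops through the transaction and defers more work. */
static void model_finish_item(struct model_trans *tp)
{
    assert(tp->t_dfops != NULL);
    tp->t_dfops->pending++;
}

/* Recovery handler: the dfops is owned by the caller and outlives the
 * transaction, so attach it for the lower layers and detach before commit. */
static int model_recover_item(struct model_dfops *dfops)
{
    struct model_trans tp = { .t_dfops = NULL, .committed = 0 };

    tp.t_dfops = dfops;             /* set: lower layers can now find it */
    model_finish_item(&tp);
    tp.t_dfops = NULL;              /* clear: commit must not see pending work */
    return model_commit(&tp);
}
```

In this model the deferred work stays queued on the caller's dfops after the handler returns, which matches the description that the main recovery sequence completes it later.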
* xfs: remove dfops param from high level dirname calls | Brian Foster | 2018-07-12 | 4 | -42/+36
| All callers of the directory create, rename and remove interfaces already associate the dfops with the transaction. Drop the dfops parameters in these calls in preparation for further cleanups in the layers below. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove dfops parameter from ifree call stack | Brian Foster | 2018-07-12 | 4 | -12/+7
| The inode free callchain starting in xfs_inactive_ifree() already associates its dfops with the transaction. It still passes the dfops on the stack down through xfs_difree_inobt(), however. Clean up the call stack and reference dfops directly from the transaction. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: rename xfs_trans ->t_agfl_dfops to ->t_dfops | Brian Foster | 2018-07-12 | 6 | -16/+16
| The ->t_agfl_dfops field is currently used to defer agfl block frees from associated transaction contexts. While all known problematic contexts have already been updated to use ->t_agfl_dfops, the broader goal is to defer agfl frees from all callers that already use a deferred operations structure. Further, the transaction field facilitates a good amount of code cleanup where the transaction and dfops have historically been passed down through the stack separately. Rename the field to something more generic to prepare to use it as such throughout XFS. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: cow unwritten conversion uses uninitialized dfops | Brian Foster | 2018-07-12 | 1 | -7/+4
| A couple COW fork unwritten extent conversion helpers pass an uninitialized dfops pointer to xfs_bmapi_write(). This does not cause problems because conversion does not use a transaction or the dfops structure for the COW fork. Drop the uninitialized usage of dfops in these codepaths and pass NULL along to xfs_bmapi_write() instead. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: update my copyrights for the writeback and iomap code | Christoph Hellwig | 2018-07-12 | 2 | -1/+2
| Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: add support for sub-pagesize writeback without buffer_heads | Christoph Hellwig | 2018-07-12 | 5 | -455/+61
| Switch to using the iomap_page structure for checking sub-page uptodate status and track sub-page I/O completion status, and remove large quantities of boilerplate code working around buffer heads. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* iomap: add support for sub-pagesize buffered I/O without buffer heads | Christoph Hellwig | 2018-07-12 | 1 | -21/+259
| After already supporting a simple implementation of buffered writes for the blocksize == PAGE_SIZE case in the last commit, this adds full support even for smaller block sizes. There are three bits of per-block information in the buffer_head structure that really matter for the iomap read and write path:
|  - uptodate status (BH_uptodate)
|  - marked as currently under read I/O (BH_Async_Read)
|  - marked as currently under write I/O (BH_Async_Write)
| Instead of having new per-block structures this now adds a per-page structure called struct iomap_page to track this information in a slightly different form:
|  - a bitmap for the per-block uptodate status. For worst case of a 64k page size system this bitmap needs to contain 128 bits. For the typical 4k page size case it only needs 8 bits, although we still need a full unsigned long due to the way the atomic bitmap API works.
|  - two atomic_t counters are used to track the outstanding read and write counts
| There is quite a bit of boilerplate code as the buffered I/O path uses various helper methods, but the actual code is very straightforward. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
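The per-page tracking described above (one uptodate bit per block plus two atomic counters) can be sketched in userspace C. This is an illustrative model, not the kernel's struct iomap_page: the names `page_state`, `MODEL_*`, and the helper functions are hypothetical, a 512-byte minimum block size is assumed (it is what makes the commit's 128-bit and 8-bit figures work out), and the bitmap updates here are plain, non-atomic operations.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE   65536u    /* worst-case page size from the message */
#define MODEL_MIN_BLOCK   512u      /* assumed smallest block size */
#define MODEL_MAX_BLOCKS  (MODEL_PAGE_SIZE / MODEL_MIN_BLOCK)   /* 128 bits */
#define BITS_PER_WORD     (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_WORDS       ((MODEL_MAX_BLOCKS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical per-page state: two counters and a per-block uptodate bitmap. */
struct page_state {
    atomic_int read_count;          /* outstanding read I/Os on this page */
    atomic_int write_count;         /* outstanding write I/Os on this page */
    unsigned long uptodate[MODEL_WORDS];
};

static void page_state_init(struct page_state *ps)
{
    atomic_init(&ps->read_count, 0);
    atomic_init(&ps->write_count, 0);
    memset(ps->uptodate, 0, sizeof(ps->uptodate));
}

/* Number of uptodate bits a page needs for a given block size. */
static unsigned blocks_per_page(unsigned page_size, unsigned block_size)
{
    assert(block_size != 0 && page_size % block_size == 0);
    return page_size / block_size;
}

static void set_block_uptodate(struct page_state *ps, unsigned block)
{
    assert(block < MODEL_MAX_BLOCKS);
    ps->uptodate[block / BITS_PER_WORD] |= 1UL << (block % BITS_PER_WORD);
}

static bool block_is_uptodate(const struct page_state *ps, unsigned block)
{
    assert(block < MODEL_MAX_BLOCKS);
    return (ps->uptodate[block / BITS_PER_WORD] >> (block % BITS_PER_WORD)) & 1UL;
}

/* The page as a whole is uptodate only when every block in it is. */
static bool page_is_uptodate(const struct page_state *ps, unsigned nblocks)
{
    for (unsigned i = 0; i < nblocks; i++)
        if (!block_is_uptodate(ps, i))
            return false;
    return true;
}

static void read_start(struct page_state *ps)
{
    atomic_fetch_add(&ps->read_count, 1);
}

/* Returns true when the last outstanding read on the page has completed. */
static bool read_finish(struct page_state *ps)
{
    return atomic_fetch_sub(&ps->read_count, 1) == 1;
}
```

With these assumptions, `blocks_per_page(65536, 512)` gives the 128 bits mentioned for 64k pages and `blocks_per_page(4096, 512)` gives the 8 bits for 4k pages.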
* xfs: allow writeback on pages without buffer heads | Christoph Hellwig | 2018-07-12 | 2 | -15/+39
| Disable the IOMAP_F_BUFFER_HEAD flag on file systems with a block size equal to the page size, and deal with pages without buffer heads in writeback. Thanks to the previous refactoring this is basically trivial now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: refactor the tail of xfs_writepage_map | Christoph Hellwig | 2018-07-12 | 1 | -33/+32
| Rejuggle how we deal with the different error vs non-error and have ioends vs not have ioend cases to keep the fast path streamlined, and the duplicate code at a minimum. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove xfs_start_page_writeback | Christoph Hellwig | 2018-07-12 | 1 | -26/+20
| This helper only has two callers, one of them with a constant error argument. Remove it to make pending changes to the code a little easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: move all writeback buffer_head manipulation into xfs_map_at_offset | Christoph Hellwig | 2018-07-12 | 1 | -17/+5
| This keeps it in a single place so it can be made optional more easily. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: don't look at buffer heads in xfs_add_to_ioend | Christoph Hellwig | 2018-07-12 | 1 | -36/+32
| Calculate all information for the bio based on the passed in information without requiring a buffer_head structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove the imap_valid flag | Christoph Hellwig | 2018-07-12 | 1 | -51/+38
| Simplify the way we check for a valid imap - we know we have a valid mapping after xfs_map_blocks returned successfully, and we know we can call xfs_imap_valid on any imap, as it will always fail on a zero-initialized map. We can also remove the xfs_imap_valid function and fold it into xfs_map_blocks now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly | Christoph Hellwig | 2018-07-12 | 1 | -14/+5
| xfs_bmapi_read adds zero value in xfs_map_blocks. Replace it with a direct call to the low-level extent lookup function. Note that we now always pass a 0 length to the trace points as we ask for an unspecified len. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove xfs_reflink_find_cow_mapping | Christoph Hellwig | 2018-07-12 | 4 | -39/+13
| We only have one caller left, and open coding the simple extent list lookup in it allows us to make the code both more understandable and reuse calculations and variables already present. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove the now unused XFS_BMAPI_IGSTATE flag | Christoph Hellwig | 2018-07-12 | 2 | -7/+2
| Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: make xfs_writepage_map extent map centric | Dave Chinner | 2018-07-12 | 1 | -52/+36
| xfs_writepage_map() iterates over the bufferheads on a page to decide what sort of IO to do and what actions to take. However, when it comes to reflink and deciding when it needs to execute a COW operation, we no longer look at the bufferhead state but instead we ignore that and look up internal state held in the COW fork extent list.
| This means xfs_writepage_map() is somewhat confused. It does stuff, then ignores it, then tries to handle the impedance mismatch by shovelling the results inside the existing mapping code. It works, but it's a bit of a mess and it makes it hard to fix the cached map bug that the writepage code currently has.
| To unify the two different mechanisms, we first have to choose a direction. That's already been set - we're de-emphasising bufferheads so they are no longer a control structure as we need to do that to allow for eventual removal. Hence we need to move away from looking at bufferhead state to determine what operations we need to perform.
| We can't completely get rid of bufferheads yet - they do contain some state that is absolutely necessary, such as whether that part of the page contains valid data or not (buffer_uptodate()). Other state in the bufferhead is redundant:
|  BH_dirty - the page is dirty, so we can ignore this and just write it
|  BH_delay - we have delalloc extent info in the DATA fork extent tree
|  BH_unwritten - same as BH_delay
|  BH_mapped - indicates we've already used it once for IO and it is mapped to a disk address. Needs to be ignored for COW blocks.
| The BH_mapped flag is an interesting case - it's supposed to indicate that it's already mapped to disk and so we can just use it "as is". In theory, we don't even have to do an extent lookup to find where to write it to, but we have to do that anyway to determine we are actually writing over a valid extent. Hence it's not even serving the purpose of avoiding an extent lookup during writeback, and so we can pretty much ignore it. Especially as we have to ignore it for COW operations...
| Therefore, use the extent map as the source of information to tell us what actions we need to take and what sort of IO we should perform. The first step is to have xfs_map_blocks() set the io type according to what it looks up. This means it can easily handle both normal overwrite and COW cases. The only thing we also need to add is the ability to return hole mappings.
| We need to return and cache hole mappings now for the case of multiple blocks per page. We no longer use the BH_mapped to indicate a block over a hole, so we have to get that info from xfs_map_blocks(). We cache it so that holes that span two pages don't need separate lookups. This allows us to avoid ever doing write IO over a hole, too.
| Now that we have xfs_map_blocks() returning both a cached map and the type of IO we need to perform, we can rewrite xfs_writepage_map() to drop all the bufferhead control. It's also much simplified because it doesn't need to explicitly handle COW operations. Instead of iterating bufferheads, it iterates blocks within the page and then looks up what per-block state is required from the appropriate bufferhead. It then validates the cached map, and if it's not valid, we get a new map. If we don't get a valid map or it's over a hole, we skip the block.
| At this point, we have to remap the bufferhead via xfs_map_at_offset(). As previously noted, we had to do this even if the buffer was already mapped as the mapping would be stale for XFS_IO_DELALLOC, XFS_IO_UNWRITTEN and XFS_IO_COW IO types. With xfs_map_blocks() now controlling the type, even XFS_IO_OVERWRITE types need remapping, as converted-but-not-yet-written delalloc extents beyond EOF can be reported as XFS_IO_OVERWRITE. Bufferheads that span such regions still need their BH_Delay flags cleared and their block numbers calculated, so we now unconditionally map each bufferhead before submission.
| But wait! There's more - remember the old "treat unwritten extents as holes on read" hack? Yeah, that means we can have a dirty page with unmapped, unwritten bufferheads that contain data! What makes these so special is that the unwritten "hole" bufferheads do not have a valid block device pointer, so if we attempt to write them xfs_add_to_ioend() blows up. So we make xfs_map_at_offset() do the "realtime or data device" lookup from the inode and ignore what was or wasn't put into the bufferhead when the buffer was instantiated.
| The astute reader will have realised by now that this code treats unwritten extents in multiple-blocks-per-page situations differently. If we get any combination of unwritten blocks on a dirty page that contain valid data in the page, we're going to convert them to real extents. This can actually be a win, because it means that pages with interleaving unwritten and written blocks will get converted to a single written extent with zeros replacing the interspersed unwritten blocks. This is actually good for reducing extent list and conversion overhead, and it means we issue a contiguous IO instead of lots of little ones. The downside is that we use up a little extra IO bandwidth. Neither of these seems like a bad thing given that spinning disks are seek sensitive, and SSDs/pmem have bandwidth to burn and the lower IO latency/CPU overhead of fewer, larger IOs will result in better performance on them...
| As a result of all this, the only state we actually care about from the bufferhead is a single flag - BH_Uptodate. We still use the bufferhead to pass some information to the bio via xfs_add_to_ioend(), but that is trivial to separate and pass explicitly. This means we really only need 1 bit of state per block per page from the buffered write path in the writeback path. Everything else we do with the bufferhead is purely to make the buffered IO front end continue to work correctly. i.e. we've pretty much marginalised bufferheads in the writeback path completely.
| Signed-off-by: Dave Chinner <dchinner@redhat.com> [hch: forward port, refactor and split off bits into other commits] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
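The extent-map-centric loop described above (iterate blocks in the page, consult only the uptodate bit, revalidate the cached map, skip holes) can be modelled roughly as follows. This is a userspace sketch, not the kernel code: `model_map`, `map_covers()`, `lookup_map()` and `writepage_model()` are hypothetical names, extents are kept in a flat array, and "submitting" a block is reduced to counting it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

enum io_type { IO_HOLE, IO_OVERWRITE, IO_DELALLOC, IO_UNWRITTEN, IO_COW };

/* Hypothetical mapping: blocks [start, start + len) with one IO type. */
struct model_map {
    unsigned start;
    unsigned len;
    enum io_type type;
};

/* A zero-initialized map has len == 0 and therefore never covers a block. */
static bool map_covers(const struct model_map *m, unsigned block)
{
    return block >= m->start && block - m->start < m->len;
}

/* Stand-in for the extent lookup: returns the covering extent, or a hole
 * mapping that extends up to the next extent so it can be cached as well. */
static struct model_map lookup_map(const struct model_map *extents, size_t n,
                                   unsigned block, unsigned *lookups)
{
    unsigned hole_end = UINT_MAX;

    assert(lookups != NULL);
    (*lookups)++;
    for (size_t i = 0; i < n; i++) {
        if (map_covers(&extents[i], block))
            return extents[i];
        if (extents[i].start > block && extents[i].start < hole_end)
            hole_end = extents[i].start;
    }
    struct model_map hole = { block, hole_end - block, IO_HOLE };
    return hole;
}

/* Walk the blocks of one page; returns how many blocks would be written. */
static unsigned writepage_model(const struct model_map *extents, size_t n,
                                unsigned first_block, unsigned nblocks,
                                const bool *uptodate,
                                struct model_map *cached, unsigned *lookups)
{
    unsigned submitted = 0;

    for (unsigned i = 0; i < nblocks; i++) {
        unsigned block = first_block + i;

        if (!uptodate[i])               /* the only per-block state consulted */
            continue;
        if (!map_covers(cached, block)) /* cached map stale: look up again */
            *cached = lookup_map(extents, n, block, lookups);
        if (cached->type == IO_HOLE)    /* never write over a hole */
            continue;
        submitted++;                    /* real code would add to an ioend */
    }
    return submitted;
}
```

Because hole mappings are cached like any other mapping, a run of blocks over the same hole costs a single lookup in this model, which is the effect the commit message describes for holes spanning multiple blocks or pages.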
* xfs: rename the offset variable in xfs_writepage_map | Christoph Hellwig | 2018-07-12 | 1 | -10/+10
| Calling it file_offset makes the usage more clear, especially with a new poffset variable that will be added soon for the offset inside the page. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove xfs_map_cow | Christoph Hellwig | 2018-07-12 | 2 | -99/+100
| We can handle the existing cow mapping case as a special case directly in xfs_writepage_map, and share code for allocating delalloc blocks with regular I/O in xfs_map_blocks. This means we need to always call xfs_map_blocks for reflink inodes, but we can still skip most of the work if it turns out that there is no COW mapping overlapping the current block. As a subtle detail we need to start caching holes in the wpc to deal with the case of COW reservations between EOF. But we'll need that infrastructure later anyway, so this is no big deal. Based on a patch from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove xfs_reflink_trim_irec_to_next_cow | Christoph Hellwig | 2018-07-12 | 4 | -43/+0
| We already have to check for overlapping COW extents every time we come back to a page in xfs_writepage_map / xfs_map_cow, so this additional trim is not required. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks | Christoph Hellwig | 2018-07-12 | 1 | -4/+1
| We want to be able to use the extent state as a reliable indicator for the type of I/O, and stop using the buffer head state. For this we need to stop using the XFS_BMAPI_IGSTATE flag so that we don't see merged extents of different types. Based on a patch from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: don't clear imap_valid for a non-uptodate buffers | Christoph Hellwig | 2018-07-12 | 1 | -7/+2
| Finding a buffer that isn't uptodate doesn't invalidate the mapping for any given block. The last_sector check will already take care of starting another ioend as soon as we find any non-uptodate buffer, and if the current mapping doesn't include the next uptodate buffer the xfs_imap_valid check will take care of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: do not set the page uptodate in xfs_writepage_map | Christoph Hellwig | 2018-07-12 | 1 | -6/+0
| We already track the page uptodate status based on the buffer uptodate status, which is updated whenever reading or zeroing blocks. This code has been there since a ptool commit in 2002, which claims to: "merge" the 2.4 fsx fix for block size < page size to 2.5. This needed major changes to actually fit. and isn't present in other writepage implementations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: move locking into xfs_bmap_punch_delalloc_range | Christoph Hellwig | 2018-07-12 | 3 | -9/+5
| Both callers want the same locking, so do it only once. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: simplify xfs_aops_discard_page | Christoph Hellwig | 2018-07-12 | 1 | -76/+9
| Instead of looking at the buffer heads to see if a block is delalloc just call xfs_bmap_punch_delalloc_range on the whole page - this will leave any non-delalloc block intact and handle the iteration for us. As a side effect one more place stops caring about buffer heads and we can remove the xfs_check_page_type function entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages | Christoph Hellwig | 2018-07-12 | 1 | -0/+4
| For file systems with a block size that equals the page size we never do partial reads, so we can use the buffer_head-less iomap versions of readpage and readpages without conflicting with the buffer_head structures created later in write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* Merge branch 'iomap-4.19-merge' into xfs-4.19-merge | Darrick J. Wong | 2018-07-12 | 4 | -96/+520
|\
| * iomap: add inline data support to iomap_readpage_actor | Andreas Gruenbacher | 2018-07-03 | 1 | -0/+6
| | Just copy the inline data into the page using the existing helper. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * iomap: support direct I/O to inline data | Andreas Gruenbacher | 2018-07-03 | 1 | -0/+29
| | Add support for reading from and writing to inline data to iomap_dio_rw. This saves filesystems from having to implement fallback code for this case. The inline data is actually cached in the inode, so the I/O is only direct in the sense that it doesn't go through the page cache. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * iomap: refactor iomap_dio_actor | Christoph Hellwig | 2018-07-03 | 1 | -36/+52
| | Split the function up into two helpers for the bio based I/O and hole case, and a small helper to call the two. This separates the code a little better in preparation for supporting I/O to inline data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * iomap: add initial support for writes without buffer heads | Christoph Hellwig | 2018-06-20 | 2 | -9/+112
| | For now just limited to blocksize == PAGE_SIZE, where we can simply read in the full page in write begin, and just set the whole page dirty after copying data into it. This code is enabled by default and XFS will now be fed pages without buffer heads in ->writepage and ->writepages. If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap the old path will still be used; this both helps the transition in XFS and prepares for the gfs2 migration to the iomap infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * iomap: add an iomap-based readpage and readpages implementation | Christoph Hellwig | 2018-06-20 | 1 | -1/+213
| | Simply use iomap_apply to iterate over the file and submit a bio for each non-uptodate but mapped region and zero everything else. Note that as-is this can not be used for file systems with a blocksize smaller than the page size, but that support will be added later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * iomap: add a page_done callback | Christoph Hellwig | 2018-06-20 | 1 | -0/+3
| | This will be used by gfs2 to attach data to transactions for the journaled data mode. But the concept is generic enough that we might be able to use it for other purposes like encryption/integrity post-processing in the future. Based on a patch from Andreas Gruenbacher. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * iomap: generic inline data handling | Andreas Gruenbacher | 2018-06-20 | 1 | -7/+55
| | Add generic inline data handling by adding a pointer to the inline data region to struct iomap. When handling a buffered IOMAP_INLINE write, iomap_write_begin will copy the current inline data from the inline data region into the page cache, and iomap_write_end will copy the changes in the page cache back to the inline data region. This doesn't cover inline data reads and direct I/O yet because so far, we have no users. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> [hch: small cleanups to better fit in with other iomap work] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
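The copy-in / copy-back scheme for buffered inline writes described above can be sketched in a few lines of userspace C. This is a simplified model under stated assumptions: `model_inode`, `model_write_begin()` and `model_write_end()` are hypothetical, the inline region lives directly in the model inode rather than behind a pointer in struct iomap, and the page is just a byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE   64    /* toy page size for the sketch */
#define MODEL_INLINE_MAX  32    /* toy capacity of the inline data region */

/* Hypothetical inode carrying its file data inline. */
struct model_inode {
    char inline_data[MODEL_INLINE_MAX];
    size_t size;                /* number of valid inline bytes */
};

/* write_begin step: copy the current inline data into the (zeroed) page. */
static void model_write_begin(const struct model_inode *ip, char *page)
{
    assert(ip->size <= MODEL_INLINE_MAX);
    memset(page, 0, MODEL_PAGE_SIZE);
    memcpy(page, ip->inline_data, ip->size);
}

/* write_end step: copy the modified byte range from the page back into the
 * inline region, growing the file size if the write extended it. */
static int model_write_end(struct model_inode *ip, const char *page,
                           size_t pos, size_t len)
{
    if (pos > MODEL_INLINE_MAX || len > MODEL_INLINE_MAX - pos)
        return -1;              /* does not fit in the inline region */
    memcpy(ip->inline_data + pos, page + pos, len);
    if (pos + len > ip->size)
        ip->size = pos + len;
    return 0;
}
```

A caller in this model would run `model_write_begin()`, modify the page buffer, then call `model_write_end()` with the position and length it changed. A write that no longer fits inline is simply rejected here; how a real filesystem handles that case is outside what the commit message states.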
| * iomap: complete partial direct I/O writes synchronously | Andreas Gruenbacher | 2018-06-20 | 1 | -10/+11
| | According to xfstest generic/240, applications seem to expect direct I/O writes to either complete as a whole or to fail; short direct I/O writes are apparently not appreciated. This means that when only part of an asynchronous direct I/O write succeeds, we can either fail the entire write, or we can wait for the partial write to complete and retry the remaining write as buffered I/O. The old __blockdev_direct_IO helper has code for waiting for partial writes to complete; the new iomap_dio_rw iomap helper does not. The above mentioned fallback mode is needed for gfs2, which doesn't allow block allocations under direct I/O to avoid taking cluster-wide exclusive locks. As a consequence, an asynchronous direct I/O write to a file range that contains a hole will result in a short write. In that case, wait for the short write to complete to allow gfs2 to recover. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
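The fallback described above (a direct write stops short at a hole, then the remainder is retried as buffered I/O) can be modelled as below. This is a toy sketch, not the iomap or gfs2 code: the file is an array of "allocated" flags, all function names are hypothetical, and waiting for the partial write to complete is implicit because everything here is synchronous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Direct path in this model: may not allocate, so it stops at the first
 * hole and reports a short write. */
static size_t model_direct_write(const bool *allocated, size_t first, size_t count)
{
    size_t done = 0;

    while (done < count && allocated[first + done])
        done++;
    return done;
}

/* Buffered path in this model: allowed to allocate blocks as it goes. */
static size_t model_buffered_write(bool *allocated, size_t first, size_t count)
{
    for (size_t i = 0; i < count; i++)
        allocated[first + i] = true;
    return count;
}

struct write_result {
    size_t direct;      /* blocks written through the direct path */
    size_t buffered;    /* blocks written through the buffered fallback */
};

/* Write `count` blocks: let the (short) direct write finish first, then
 * retry whatever remains as buffered I/O so the caller sees a full write. */
static struct write_result model_write(bool *allocated, size_t first, size_t count)
{
    struct write_result r = { 0, 0 };

    assert(allocated != NULL);
    r.direct = model_direct_write(allocated, first, count);
    if (r.direct < count)
        r.buffered = model_buffered_write(allocated, first + r.direct,
                                          count - r.direct);
    return r;
}
```

The point of the model is the ordering: the buffered retry only starts after the direct portion has fully completed, so the two paths never touch the same range concurrently.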
| * iomap: mark newly allocated buffer heads as new | Andreas Gruenbacher | 2018-06-20 | 1 | -4/+5
| | In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW flag set. This will be used by filesystems like gfs2, which allocate blocks in iomap->begin. Minor corrections to the comment for IOMAP_UNWRITTEN maps. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * fs: factor out a __generic_write_end helper (Christoph Hellwig, 2018-06-20, 2 files, -32/+37)
| | Bits of the buffer.c based write_end implementations that don't know about buffer_heads and can be reused by other implementations.
| |
| | Signed-off-by: Christoph Hellwig <hch@lst.de>
| | Reviewed-by: Brian Foster <bfoster@redhat.com>
| | Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
| | Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* | Merge tag 'ext4_for_linus_stable' of ↵ (Linus Torvalds, 2018-07-08, 11 files, -96/+155)
|\ \
| | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
| | |
| | | Pull ext4 bugfixes from Ted Ts'o:
| | |  "Bug fixes for ext4, most of which relate to vulnerabilities where a maliciously crafted file system image can result in a kernel OOPS or hang. At least one fix addresses an inline data bug that could be triggered by userspace without the need of a crafted file system (although it does require that the inline data feature be enabled)"
| | |
| | | * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
| | |   ext4: check superblock mapped prior to committing
| | |   ext4: add more mount time checks of the superblock
| | |   ext4: add more inode number paranoia checks
| | |   ext4: avoid running out of journal credits when appending to an inline file
| | |   jbd2: don't mark block as modified if the handle is out of credits
| | |   ext4: never move the system.data xattr out of the inode body
| | |   ext4: clear i_data in ext4_inode_info when removing inline data
| | |   ext4: include the illegal physical block in the bad map ext4_error msg
| | |   ext4: verify the depth of extent tree in ext4_find_extent()
| | |   ext4: only look at the bg_flags field if it is valid
| | |   ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
| | |   ext4: always check block group bounds in ext4_init_block_bitmap()
| | |   ext4: always verify the magic number in xattr blocks
| | |   ext4: add corruption check in ext4_xattr_set_entry()
| | |   ext4: add warn_on_error mount option
| * | ext4: check superblock mapped prior to committing (Jon Derrick, 2018-07-03, 1 file, -0/+8)
| | | This patch attempts to close a hole leading to a BUG seen with hot removals during writes [1].
| | |
| | | A block device (NVME namespace in this test case) is formatted to EXT4 without partitions. It's mounted and write I/O is run to a file, then the device is hot removed from the slot. The superblock attempts to be written to the drive which is no longer present.
| | |
| | | The typical chain of events leading to the BUG:
| | |   ext4_commit_super()
| | |     __sync_dirty_buffer()
| | |       submit_bh()
| | |         submit_bh_wbc()
| | |           BUG_ON(!buffer_mapped(bh));
| | |
| | | This fix checks for the superblock's buffer head being mapped prior to syncing.
| | |
| | | [1] https://www.spinics.net/lists/linux-ext4/msg56527.html
| | |
| | | Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
| | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| | | Cc: stable@kernel.org
| * | ext4: add more mount time checks of the superblock (Theodore Ts'o, 2018-06-18, 1 file, -11/+26)
| | | The kernel's ext4 mount-time checks were more permissive than e2fsprogs's libext2fs checks when opening a file system. If the superblock is considered too insane for debugfs or e2fsck to operate on it, the kernel has no business trying to mount it.
| | |
| | | This will make file system fuzzing tools work harder, but the failure cases that they find will be more useful and be easier to evaluate.
| | |
| | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| | | Cc: stable@kernel.org
| * | ext4: add more inode number paranoia checks (Theodore Ts'o, 2018-06-17, 3 files, -6/+7)
| | | If there is a directory entry pointing to a system inode (such as a journal inode), complain and declare the file system to be corrupted.
| | |
| | | Also, if the superblock's first inode number field is too small, refuse to mount the file system.
| | |
| | | This addresses CVE-2018-10882.
| | |
| | | https://bugzilla.kernel.org/show_bug.cgi?id=200069
| | |
| | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| | | Cc: stable@kernel.org
| * | ext4: avoid running out of journal credits when appending to an inline file (Theodore Ts'o, 2018-06-17, 3 files, -57/+3)
| | | Use a separate journal transaction if it turns out that we need to convert an inline file to use a data block. Otherwise we could end up failing due to not having journal credits.
| | |
| | | This addresses CVE-2018-10883.
| | |
| | | https://bugzilla.kernel.org/show_bug.cgi?id=200071
| | |
| | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| | | Cc: stable@kernel.org
| * | jbd2: don't mark block as modified if the handle is out of credits (Theodore Ts'o, 2018-06-17, 1 file, -1/+8)
| | | Do not set the b_modified flag in the block's journal head until after we're sure that jbd2_journal_dirty_metadata() will not abort with an error due to there not being enough space reserved in the jbd2 handle.
| | |
| | | Otherwise, future attempts to modify the buffer may lead to a large number of spurious errors and warnings.
| | |
| | | This addresses CVE-2018-10883.
| | |
| | | https://bugzilla.kernel.org/show_bug.cgi?id=200071
| | |
| | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| | | Cc: stable@kernel.org
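The ordering this commit message asks for (check the handle's remaining credits first, and only then mark the block as modified) can be shown with a small userspace model. This is a sketch under assumptions, not the jbd2 code: `struct toy_handle`, `struct toy_journal_head`, and `toy_dirty_metadata` are hypothetical names, and the choice of `-ENOSPC` as the error is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a journal handle with a credit budget. */
struct toy_handle {
    int credits;
};

/* Hypothetical stand-in for a journal head with a "modified" flag. */
struct toy_journal_head {
    bool modified;
};

/* Charge one credit the first time a block is dirtied under this handle.
 * The credit check comes before the flag is set, so a failure leaves the
 * journal head untouched and later attempts see a consistent state. */
int toy_dirty_metadata(struct toy_handle *h, struct toy_journal_head *jh)
{
    assert(h != NULL && jh != NULL);

    if (jh->modified)
        return 0;           /* already accounted for, nothing to charge */

    if (h->credits <= 0)
        return -ENOSPC;     /* fail first; do not mark the block modified */

    jh->modified = true;    /* safe to mark only once the charge will succeed */
    h->credits--;
    return 0;
}
```

If the flag were set before the credit check, a failed call would leave `modified` set without a credit having been charged, which is the kind of inconsistent state the commit message says leads to spurious errors later.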