summaryrefslogtreecommitdiffstats
path: root/fs/xfs/xfs_inode.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* xfs: get rid of unnecessary xfs_perag_{get,put} pairsGao Xiang2020-07-141-27/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In the course of some operations, we look up the perag from the mount multiple times to get or change perag information. These are often very short pieces of code, so while the lookup cost is generally low, the cost of the lookup is far higher than the cost of the operation we are doing on the perag. Since we changed buffers to hold references to the perag they are cached in, many modification contexts already hold active references to the perag that are held across these operations. This is especially true for any operation that is serialised by an allocation group header buffer. In these cases, we can just use the buffer's reference to the perag to avoid needing to do lookups to access the perag. This means that many operations don't need to do perag lookups at all to access the perag because they've already looked up objects that own persistent references and hence can use that reference instead. Cc: Dave Chinner <dchinner@redhat.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove xfs_inobp_check()Dave Chinner2020-07-071-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This debug code is called on every xfs_iflush() call, which then checks every inode in the buffer for non-zero unlinked list field. Hence it checks every inode in the cluster buffer every time a single inode on that cluster it flushed. This is resulting in: - 38.91% 5.33% [kernel] [k] xfs_iflush - 17.70% xfs_iflush - 9.93% xfs_inobp_check 4.36% xfs_buf_offset 10% of the CPU time spent flushing inodes is repeatedly checking unlinked fields in the buffer. We don't need to do this. The other place we call xfs_inobp_check() is xfs_iunlink_update_dinode(), and this is after we've done this assert for the agino we are about to write into that inode: ASSERT(xfs_verify_agino_or_null(mp, agno, next_agino)); which means we've already checked that the agino we are about to write is not 0 on debug kernels. The inode buffer verifiers do everything else we need, so let's just remove this debug code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: rework xfs_iflush_cluster() dirty inode iterationDave Chinner2020-07-071-96/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have all the dirty inodes attached to the cluster buffer, we don't actually have to do radix tree lookups to find them. Sure, the radix tree is efficient, but walking a linked list of just the dirty inodes attached to the buffer is much better. We are also no longer dependent on having a locked inode passed into the function to determine where to start the lookup. This means we can drop it from the function call and treat all inodes the same. We also make xfs_iflush_cluster skip inodes marked with XFS_IRECLAIM. This we avoid races with inodes that reclaim is actively referencing or are being re-initialised by inode lookup. If they are actually dirty, they'll get written by a future cluster flush.... We also add a shutdown check after obtaining the flush lock so that we catch inodes that are dirty in memory and may have inconsistent state due to the shutdown in progress. We abort these inodes directly and so they remove themselves directly from the buffer list and the AIL rather than having to wait for the buffer to be failed and callbacks run to be processed correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: rename xfs_iflush_int()Dave Chinner2020-07-071-147/+146
| | | | | | | | | | | | with xfs_iflush() gone, we can rename xfs_iflush_int() back to xfs_iflush(). Also move it up above xfs_iflush_cluster() so we don't need the forward definition any more. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: xfs_iflush() is no longer necessaryDave Chinner2020-07-071-93/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | Now we have a cached buffer on inode log items, we don't need to do buffer lookups when flushing inodes anymore - all we need to do is lock the buffer and we are ready to go. This largely gets rid of the need for xfs_iflush(), which is essentially just a mechanism to look up the buffer and flush the inode to it. Instead, we can just call xfs_iflush_cluster() with a few modifications to ensure it also flushes the inode we already hold locked. This allows the AIL inode item pushing to be almost entirely non-blocking in XFS - we won't block unless memory allocation for the cluster inode lookup blocks or the block device queues are full. Writeback during inode reclaim becomes a little more complex because we now have to lock the buffer ourselves, but otherwise this change is largely a functional no-op that removes a whole lot of code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: attach inodes to the cluster buffer when dirtiedDave Chinner2020-07-071-19/+5
| | | | | | | | | | | | | Rather than attach inodes to the cluster buffer just when we are doing IO, attach the inodes to the cluster buffer when they are dirtied. The means the buffer always carries a list of dirty inodes that reference it, and we can use that list to make more fundamental changes to inode writeback that aren't otherwise possible. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: rework stale inodes in xfs_ifree_clusterDave Chinner2020-07-071-99/+62
| | | | | | | | | | | | | | | | | | | | | | | | Once we have inodes pinning the cluster buffer and attached whenever they are dirty, we no longer have a guarantee that the items are flush locked when we lock the cluster buffer. Hence we cannot just walk the buffer log item list and modify the attached inodes. If the inode is not flush locked, we have to ILOCK it first and then flush lock it to do all the prerequisite checks needed to avoid races with other code. This is already handled by xfs_ifree_get_one_inode(), so rework the inode iteration loop and function to update all inodes in cache whether they are attached to the buffer or not. Note: we also remove the copying of the log item lsn to the ili_flush_lsn as xfs_iflush_done() now uses the XFS_ISTALE flag to trigger aborts and so flush lsn matching is not needed in IO completion for processing freed inodes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: get rid of log item callbacksDave Chinner2020-07-071-2/+3
| | | | | | | | | | | They are not used anymore, so remove them from the log item and the buffer iodone attachment interfaces. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: make inode IO completion buffer centricDave Chinner2020-07-061-4/+2
| | | | | | | | | | | | | | | | | | | Having different io completion callbacks for different inode states makes things complex. We can detect if the inode is stale via the XFS_ISTALE flag in IO completion, so we don't need a special callback just for this. This means inodes only have a single iodone callback, and inode IO completion is entirely buffer centric at this point. Hence we no longer need to use a log item callback at all as we can just call xfs_iflush_done() directly from the buffer completions and walk the buffer log item list to complete the all inodes under IO. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: mark inode buffers in cacheDave Chinner2020-07-061-1/+1
| | | | | | | | | | | | | | | | | | | | | Inode buffers always have write IO callbacks, so by marking them directly we can avoid needing to attach ->b_iodone functions to them. This avoids an indirect call, and makes future modifications much simpler. While this is largely a refactor of existing functionality, we broaden the scope of the flag to beyond where inodes are explicitly attached because future changes need to know what type of log items are attached to the buffer. Adding this buffer flag may invoke the inode iodone callback in cases where it wouldn't have been previously, but this is not a functional change because the callback is identical to the normal buffer write iodone callback when inodes are not attached. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: add an inode item lockDave Chinner2020-07-061-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode log item is kind of special in that it can be aggregating new changes in memory at the same time time existing changes are being written back to disk. This means there are fields in the log item that are accessed concurrently from contexts that don't share any locking at all. e.g. updating ili_last_fields occurs at flush time under the ILOCK_EXCL and flush lock at flush time, under the flush lock at IO completion time, and is read under the ILOCK_EXCL when the inode is logged. Hence there is no actual serialisation between reading the field during logging of the inode in transactions vs clearing the field in IO completion. We currently get away with this by the fact that we are only clearing fields in IO completion, and nothing bad happens if we accidentally log more of the inode than we actually modify. Worst case is we consume a tiny bit more memory and log bandwidth. However, if we want to do more complex state manipulations on the log item that requires updates at all three of these potential locations, we need to have some mechanism of serialising those operations. To do this, introduce a spinlock into the log item to serialise internal state. This could be done via the xfs_inode i_flags_lock, but this then leads to potential lock inversion issues where inode flag updates need to occur inside locks that best nest inside the inode log item locks (e.g. marking inodes stale during inode cluster freeing). Using a separate spinlock avoids these sorts of problems and simplifies future code. This does not touch the use of ili_fields in the item formatting code - that is entirely protected by the ILOCK_EXCL at this point in time, so it remains untouched. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove logged flag from inode log itemDave Chinner2020-07-061-9/+4
| | | | | | | | | | | | This was used to track if the item had logged fields being flushed to disk. We log everything in the inode these days, so this logic is no longer needed. Remove it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: Don't allow logging of XFS_ISTALE inodesDave Chinner2020-07-061-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In tracking down a problem in this patchset, I discovered we are reclaiming dirty stale inodes. This wasn't discovered until inodes were always attached to the cluster buffer and then the rcu callback that freed inodes was assert failing because the inode still had an active pointer to the cluster buffer after it had been reclaimed. Debugging the issue indicated that this was a pre-existing issue resulting from the way the inodes are handled in xfs_inactive_ifree. When we free a cluster buffer from xfs_ifree_cluster, all the inodes in cache are marked XFS_ISTALE. Those that are clean have nothing else done to them and so eventually get cleaned up by background reclaim. i.e. it is assumed we'll never dirty/relog an inode marked XFS_ISTALE. On journal commit dirty stale inodes as are handled by both buffer and inode log items to run though xfs_istale_done() and removed from the AIL (buffer log item commit) or the log item will simply unpin it because the buffer log item will clean it. What happens to any specific inode is entirely dependent on which log item wins the commit race, but the result is the same - stale inodes are clean, not attached to the cluster buffer, and not in the AIL. Hence inode reclaim can just free these inodes without further care. However, if the stale inode is relogged, it gets dirtied again and relogged into the CIL. Most of the time this isn't an issue, because relogging simply changes the inode's location in the current checkpoint. Problems arise, however, when the CIL checkpoints between two transactions in the xfs_inactive_ifree() deferops processing. This results in the XFS_ISTALE inode being redirtied and inserted into the CIL without any of the other stale cluster buffer infrastructure being in place. Hence on journal commit, it simply gets unpinned, so it remains dirty in memory. Everything in inode writeback avoids XFS_ISTALE inodes so it can't be written back, and it is not tracked in the AIL so there's not even a trigger to attempt to clean the inode. Hence the inode just sits dirty in memory until inode reclaim comes along, sees that it is XFS_ISTALE, and goes to reclaim it. This reclaiming of a dirty inode caused use after free, list corruptions and other nasty issues later in this patchset. Hence this patch addresses a violation of the "never log XFS_ISTALE inodes" caused by the deferops processing rolling a transaction and relogging a stale inode in xfs_inactive_free. It also adds a bunch of asserts to catch this problem in debug kernels so that we don't reintroduce this problem in future. Reproducer for this issue was generic/558 on a v4 filesystem. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: move helpers that lock and unlock two inodes against userspace IODarrick J. Wong2020-07-061-0/+93
| | | | | | | | Move the double-inode locking helpers to xfs_inode.c since they're not specific to reflink. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* Merge tag 'xfs-5.8-merge-9' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2020-06-131-1/+3
|\ | | | | | | | | | | | | | | | | Pull xfs fix from Darrick Wong: "We've settled down into the bugfix phase; this one fixes a resource leak on an error bailout path" * tag 'xfs-5.8-merge-9' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Add the missed xfs_perag_put() for xfs_ifree_cluster()
| * xfs: Add the missed xfs_perag_put() for xfs_ifree_cluster()Chuhong Yuan2020-06-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | xfs_ifree_cluster() calls xfs_perag_get() at the beginning, but forgets to call xfs_perag_put() in one failed path. Add the missed function call to fix it. Fixes: ce92464c180b ("xfs: make xfs_trans_get_buf return an error code") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* | mmap locking API: convert mmap_sem commentsMichel Lespinasse2020-06-091-7/+7
|/ | | | | | | | | | | | | | | | | | | | | | | | | | Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* xfs: move the fork format fields into struct xfs_iforkChristoph Hellwig2020-05-191-20/+16
| | | | | | | | | | | | Both the data and attr fork have a format that is stored in the legacy idinode. Move it into the xfs_ifork structure instead, where it uses up padding. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: move the per-fork nextents fields into struct xfs_iforkChristoph Hellwig2020-05-191-11/+8
| | | | | | | | | | | | | | | There are there are three extents counters per inode, one for each of the forks. Two are in the legacy icdinode and one is directly in struct xfs_inode. Switch to a single counter in the xfs_ifork structure where it uses up padding at the end of the structure. This simplifies various bits of code that just wants the number of extents counter and can now directly dereference it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove xfs_ifree_local_dataChristoph Hellwig2020-05-191-20/+10
| | | | | | | | | | | | xfs_ifree only need to free inline data in the data fork, as we've already taken care of the attr fork before (and in fact freed the fork structure). Just open code the freeing of the inline data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: improve local fork verificationChristoph Hellwig2020-05-191-19/+9
| | | | | | | | | | | | Call the data/attr local fork verifiers as soon as we are ready for them. This keeps them close to the code setting up the forks, and avoids a few branches later on. Also open code xfs_inode_verify_forks in the only remaining caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: refactor xfs_inode_verify_forksChristoph Hellwig2020-05-191-17/+4
| | | | | | | | | | | | | | The split between xfs_inode_verify_forks and the two helpers implementing the actual functionality is a little strange. Reshuffle it so that xfs_inode_verify_forks verifies if the data and attr forks are actually in local format and only call the low-level helpers if that is the case. Handle the actual error reporting in the low-level handlers to streamline the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove xfs_ifork_opsChristoph Hellwig2020-05-191-2/+2
| | | | | | | | | | | | xfs_ifork_ops add up to two indirect calls per inode read and flush, despite just having a single instance in the kernel. In xfsprogs phase6 in xfs_repair overrides the verify_dir method to deal with inodes that do not have a valid parent, but that can be fixed pretty easily by ensuring they always have a valid looking parent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove unused iget_flags param from xfs_imap_to_bp()Brian Foster2020-05-071-4/+3
| | | | | | | | | | | iget_flags is unused in xfs_imap_to_bp(). Remove the parameter and fix up the callers. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove unused iflush stale parameterBrian Foster2020-05-071-1/+1
| | | | | | | | | | | | The stale parameter was used to control the now unused shutdown parameter of xfs_trans_ail_remove(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove unnecessary shutdown check from xfs_iflush()Brian Foster2020-05-071-13/+0
| | | | | | | | | | | | | | | The shutdown check in xfs_iflush() duplicates checks down in the buffer code. If the fs is shut down, xfs_trans_read_buf_map() always returns an error and falls into the same error path. Remove the unnecessary check along with the warning in xfs_imap_to_bp() that generates excessive noise in the log if the fs is shut down. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: simplify inode flush error handlingBrian Foster2020-05-071-72/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | The inode flush code has several layers of error handling between the inode and cluster flushing code. If the inode flush fails before acquiring the backing buffer, the inode flush is aborted. If the cluster flush fails, the current inode flush is aborted and the cluster buffer is failed to handle the initial inode and any others that might have been attached before the error. Since xfs_iflush() is the only caller of xfs_iflush_cluster(), the error handling between the two can be condensed in the top-level function. If we update xfs_iflush_int() to always fall through to the log item update and attach the item completion handler to the buffer, any errors that occur after the first call to xfs_iflush_int() can be handled with a buffer I/O failure. Lift the error handling from xfs_iflush_cluster() into xfs_iflush() and consolidate with the existing error handling. This also replaces the need to release the buffer because failing the buffer with XBF_ASYNC drops the current reference. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: factor out buffer I/O failure codeBrian Foster2020-05-071-5/+1
| | | | | | | | | | | | | We use the same buffer I/O failure code in a few different places. It's not much code, but it's not necessarily self-explanatory. Factor it into a helper and document it in one place. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove the xfs_inode_log_item_t typedefChristoph Hellwig2020-05-041-2/+2
| | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: factor out a new xfs_log_force_inode helperChristoph Hellwig2020-04-061-0/+19
| | | | | | | | | | Create a new helper to force the log up to the last LSN touching an inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: fix inode number overflow in ifree cluster helperBrian Foster2020-04-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Qian Cai reports seemingly random buffer read verifier errors during filesystem writeback. This was isolated to a recent patch that factored out some inode cluster freeing code and happened to cast an unsigned inode number type to a signed value. If the inode number value overflows, we can skip marking in-core inodes associated with the underlying buffer stale at the time the physical inodes are freed. If such an inode happens to be dirty, xfsaild will eventually attempt to write it back over non-inode blocks. The invalidation of the underlying inode buffer causes writeback to read the buffer from disk. This fails the read verifier (preventing eventual corruption) if the buffer no longer looks like an inode cluster. Analysis by Dave Chinner. Fix up the helper to use the proper type for inode number values. Fixes: 5806165a6663 ("xfs: factor inode lookup from xfs_ifree_cluster") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove unnecessary ternary from xfs_createKaixu Xia2020-03-281-2/+1
| | | | | | | | | | | | Since the "no-allocation" reservations for file creations has been removed, the resblks value should be larger than zero, so remove unnecessary ternary conditional. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: s/judgment/ternary/] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: factor inode lookup from xfs_ifree_clusterDave Chinner2020-03-271-68/+84
| | | | | | | | | | | | | | There's lots of indent in this code which makes it a bit hard to follow. We are also going to completely rework the inode lookup code as part of the inode reclaim rework, so factor out the inode lookup code from the inode cluster freeing code. Based on prototype code from Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove the di_version field from struct icdinodeChristoph Hellwig2020-03-191-14/+2
| | | | | | | | | | | | | | We know the version is 3 if on a v5 file system. For earlier file systems formats we always upgrade the remaining v1 inodes to v2 and thus only use v2 inodes. Use the xfs_sb_version_has_large_dinode helper to check if we deal with small or large dinodes, and thus remove the need for the di_version field in struct icdinode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: simplify di_flags2 inheritance in xfs_iallocChristoph Hellwig2020-03-191-10/+3
| | | | | | | | | | | | | | di_flags2 is initialized to zero for v4 and earlier file systems. This means di_flags2 can only be non-zero for a v5 file systems, in which case both the parent and child inodes can store the field. Remove the extra di_version check, and also remove the rather pointless local di_flags2 variable while at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: add a function to deal with corrupt buffers post-verifiersDarrick J. Wong2020-03-121-2/+2
| | | | | | | | | | | | Add a helper function to get rid of buffers that we have decided are corrupt after the verifiers have run. This function is intended to handle metadata checks that can't happen in the verifiers, such as inter-block relationship checking. Note that we now mark the buffer stale so that it will not end up on any LRU and will be purged on release. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: remove XFS_BUF_TO_AGIChristoph Hellwig2020-03-111-3/+3
| | | | | | | | | | | Just dereference bp->b_addr directly and make the code a little simpler and more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove the icdinode di_uid/di_gid membersChristoph Hellwig2020-03-031-10/+4
| | | | | | | | | Use the Linux inode i_uid/i_gid members everywhere and just convert from/to the scalar value when reading or writing the on-disk inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: ensure that the inode uid/gid match values match the icdinode onesChristoph Hellwig2020-03-031-2/+6
| | | | | | | | | | Instead of only synchronizing the uid/gid values in xfs_setup_inode, ensure that they always match to prepare for removing the icdinode fields. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: make xfs_trans_get_buf return an error codeDarrick J. Wong2020-01-261-6/+6
| | | | | | | | | Convert xfs_trans_get_buf() to return numeric error codes like most everywhere else in xfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: remove unused variable 'done'YueHaibing2020-01-241-1/+0
| | | | | | | | | | | | | | | | fs/xfs/xfs_inode.c: In function 'xfs_itruncate_extents_flags': fs/xfs/xfs_inode.c:1523:8: warning: unused variable 'done' [-Wunused-variable] commit 4bbb04abb4ee ("xfs: truncate should remove all blocks, not just to the end of the page cache") left behind this, so remove it. Fixes: 4bbb04abb4ee ("xfs: truncate should remove all blocks, not just to the end of the page cache") Reported-by: Hulk Robot <hulkci@huawei.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: truncate should remove all blocks, not just to the end of the page cacheDarrick J. Wong2020-01-141-12/+12
| | | | | | | | | | | | | | | xfs_itruncate_extents_flags() is supposed to unmap every block in a file from EOF onwards. Oddly, it uses s_maxbytes as the upper limit to the bunmapi range, even though s_maxbytes reflects the highest offset the pagecache can support, not the highest offset that XFS supports. The result of this confusion is that if you create a 20T file on a 64-bit machine, mount the filesystem on a 32-bit machine, and remove the file, we leak everything above 16T. Fix this by capping the bunmapi request at the maximum possible block offset, not s_maxbytes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()kaixuxia2019-11-131-0/+17
| | | | | | | | | | | | | | | | | | | | | When target_ip exists in xfs_rename(), the xfs_dir_replace() call may need to hold the AGF lock to allocate more blocks, and then invoking the xfs_droplink() call to hold AGI lock to drop target_ip onto the unlinked list, so we get the lock order AGF->AGI. This would break the ordering constraint on AGI and AGF locking - inode allocation locks the AGI, then can allocate a new extent for new inodes, locking the AGF after the AGI. In this patch we check whether the replace operation need more blocks firstly. If so, acquire the agi lock firstly to preserve locking order(AGI/AGF). Actually, the locking order problem only occurs when we are locking the AGI/AGF of the same AG. For multiple AGs the AGI lock will be released after the transaction committed. Signed-off-by: kaixuxia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: reword the comment] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: merge the projid fields in struct xfs_icdinodeChristoph Hellwig2019-11-131-3/+3
| | | | | | | | | There is no point in splitting the fields like this in an purely in-memory structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: use a struct timespec64 for the in-core crtimeChristoph Hellwig2019-11-131-2/+1
| | | | | | | | | | | struct xfs_icdinode is purely an in-memory data structure, so don't use a log on-disk structure for it. This simplifies the code a bit, and also reduces our include hell slightly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix a minor indenting problem in xfs_trans_ichgtime] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: always log corruption errorsDarrick J. Wong2019-11-041-3/+12
| | | | | | | | | Make sure we log something to dmesg whenever we return -EFSCORRUPTED up the call stack. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: remove the duplicated inode log fieldmask setkaixuxia2019-10-211-1/+0
| | | | | | | | | The xfs_bumplink() call has set the inode log fieldmask XFS_ILOG_CORE, so the next xfs_trans_log_inode() call is not necessary. Signed-off-by: kaixuxia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: ignore extent size hints for always COW inodesChristoph Hellwig2019-10-211-0/+6
| | | | | | | | | | There is no point in applying extent size hints for always COW inodes, as we would just have to COW any extra allocation beyond the data actually written. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: Fix deadlock between AGI and AGF with RENAME_WHITEOUTkaixuxia2019-09-041-41/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When performing rename operation with RENAME_WHITEOUT flag, we will hold AGF lock to allocate or free extents in manipulating the dirents firstly, and then doing the xfs_iunlink_remove() call last to hold AGI lock to modify the tmpfile info, so we the lock order AGI->AGF. The big problem here is that we have an ordering constraint on AGF and AGI locking - inode allocation locks the AGI, then can allocate a new extent for new inodes, locking the AGF after the AGI. Hence the ordering that is imposed by other parts of the code is AGI before AGF. So we get an ABBA deadlock between the AGI and AGF here. Process A: Call trace: ? __schedule+0x2bd/0x620 schedule+0x33/0x90 schedule_timeout+0x17d/0x290 __down_common+0xef/0x125 ? xfs_buf_find+0x215/0x6c0 [xfs] down+0x3b/0x50 xfs_buf_lock+0x34/0xf0 [xfs] xfs_buf_find+0x215/0x6c0 [xfs] xfs_buf_get_map+0x37/0x230 [xfs] xfs_buf_read_map+0x29/0x190 [xfs] xfs_trans_read_buf_map+0x13d/0x520 [xfs] xfs_read_agf+0xa6/0x180 [xfs] ? schedule_timeout+0x17d/0x290 xfs_alloc_read_agf+0x52/0x1f0 [xfs] xfs_alloc_fix_freelist+0x432/0x590 [xfs] ? down+0x3b/0x50 ? xfs_buf_lock+0x34/0xf0 [xfs] ? xfs_buf_find+0x215/0x6c0 [xfs] xfs_alloc_vextent+0x301/0x6c0 [xfs] xfs_ialloc_ag_alloc+0x182/0x700 [xfs] ? _xfs_trans_bjoin+0x72/0xf0 [xfs] xfs_dialloc+0x116/0x290 [xfs] xfs_ialloc+0x6d/0x5e0 [xfs] ? xfs_log_reserve+0x165/0x280 [xfs] xfs_dir_ialloc+0x8c/0x240 [xfs] xfs_create+0x35a/0x610 [xfs] xfs_generic_create+0x1f1/0x2f0 [xfs] ... Process B: Call trace: ? __schedule+0x2bd/0x620 ? xfs_bmapi_allocate+0x245/0x380 [xfs] schedule+0x33/0x90 schedule_timeout+0x17d/0x290 ? xfs_buf_find+0x1fd/0x6c0 [xfs] __down_common+0xef/0x125 ? xfs_buf_get_map+0x37/0x230 [xfs] ? xfs_buf_find+0x215/0x6c0 [xfs] down+0x3b/0x50 xfs_buf_lock+0x34/0xf0 [xfs] xfs_buf_find+0x215/0x6c0 [xfs] xfs_buf_get_map+0x37/0x230 [xfs] xfs_buf_read_map+0x29/0x190 [xfs] xfs_trans_read_buf_map+0x13d/0x520 [xfs] xfs_read_agi+0xa8/0x160 [xfs] xfs_iunlink_remove+0x6f/0x2a0 [xfs] ? current_time+0x46/0x80 ? xfs_trans_ichgtime+0x39/0xb0 [xfs] xfs_rename+0x57a/0xae0 [xfs] xfs_vn_rename+0xe4/0x150 [xfs] ... In this patch we move the xfs_iunlink_remove() call to before acquiring the AGF lock to preserve correct AGI/AGF locking order. Signed-off-by: kaixuxia <kaixuxia@tencent.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* fs: xfs: Remove KM_NOSLEEP and KM_SLEEP.Tetsuo Handa2019-08-261-1/+1
| | | | | | | | | Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP, we can remove KM_NOSLEEP and replace KM_SLEEP with 0. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>