summaryrefslogtreecommitdiffstats
path: root/fs/xfs/libxfs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Revert "xfs: grab dquots without taking the ilock"Christoph Hellwig2017-07-131-2/+0
| | | | | | | | | | | | This reverts commit 50e0bdbe9f48f98bb02eac7030d682f4716884ae. The new XFS_QMOPT_NOLOCK isn't used at all, and conditional locking based on a flag is always the wrong thing to do - we should be having helpers that can be called without the lock instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: fixup xfs_attr_get_ilockedChristoph Hellwig2017-07-131-1/+3
| | | | | | | | | The comment mentioned the wrong lock. Also add an ASSERT to assert this locking precondition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: don't crash on unexpected holes in dir/attr btreesDarrick J. Wong2017-07-084-5/+5
| | | | | | | | | | | | | | | | In quite a few places we call xfs_da_read_buf with a mappedbno that we don't control, then assume that the function passes back either an error code or a buffer pointer. Unfortunately, if mappedbno == -2 and bno maps to a hole, we get a return code of zero and a NULL buffer, which means that we crash if we actually try to use that buffer pointer. This happens immediately when we set the buffer type for transaction context. Therefore, check that we have no error code and a non-NULL bp before trying to use bp. This patch is a follow-up to an incomplete fix in 96a3aefb8ffde231 ("xfs: don't crash if reading a directory results in an unexpected hole"). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLENDarrick J. Wong2017-07-073-3/+4
| | | | | | | | | | | | | | | | | XFS has a maximum symlink target length of 1024 bytes; this is a holdover from the Irix days. Unfortunately, the constant establishing this is 'MAXPATHLEN' and is /not/ the same as the Linux MAXPATHLEN, which is 4096. The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means that xfs_repair doesn't complain about oversized symlinks. Since this is an on-disk format constraint, put the define in the XFS namespace and move everything over to use the new name. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: grab dquots without taking the ilockDarrick J. Wong2017-06-281-0/+2
| | | | | | | | | Add a new dqget flag that grabs the dquot without taking the ilock. This will be used by the scrubber (which will have already grabbed the ilock) to perform basic sanity checking of the quota data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: remove unneeded parameter from XFS_TEST_ERRORDarrick J. Wong2017-06-289-29/+17
| | | | | | | | | Since we moved the injected error frequency controls to the mountpoint, we can get rid of the last argument to XFS_TEST_ERROR. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
* xfs: pass along transaction context when reading xattr block buffersDarrick J. Wong2017-06-202-10/+21
| | | | | | | | | | | Teach the extended attribute reading functions to pass along a transaction context if one was supplied. The extended attribute scrub code will use transactions to lock buffers and avoid deadlocking with itself in the case of loops; since it will already have the inode locked, also create xattr get/list helpers that don't take locks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: pass along transaction context when reading directory block buffersDarrick J. Wong2017-06-201-2/+2
| | | | | | | | | Teach the directory reading functions to pass along a transaction context if one was supplied. The directory scrub code will use transactions to lock buffers and avoid deadlocking with itself in the case of loops. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: return the hash value of a leaf1 directory blockDarrick J. Wong2017-06-203-10/+12
| | | | | | | | | | Modify the existing dir leafn lasthash function to enable us to calculate the highest hash value of a leaf1 block. This will be used by the directory scrubbing code to check the sanity of hashes in leaf1 directory blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: export _inobt_btrec_to_irec and _ialloc_cluster_alignment for scrubDarrick J. Wong2017-06-192-17/+32
| | | | | | | | | Create a function to extract an in-core inobt record from a generic btree_rec union so that scrub will be able to check inobt records and check inode block alignment. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: plumb in needed functions for range querying of various btreesDarrick J. Wong2017-06-192-0/+48
| | | | | | | | | | Plumb in the pieces (init_high_key, diff_two_keys) necessary to call query_range on the inode space and block mapping btrees and to extract raw btree records. This will eventually be used by the inobt and bmbt scrubbers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: export various function for the online scrubberDarrick J. Wong2017-06-1911-11/+35
| | | | | | | | Export various internal functions so that the online scrubber can use them to check the state of metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: always compile the btree inorder check functionsDarrick J. Wong2017-06-196-26/+0
| | | | | | | | The btree record and key inorder check functions will be used by the btree scrubber code, so make sure they're always built. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: remove double-underscore integer typesDarrick J. Wong2017-06-1932-392/+392
| | | | | | | | | | | | | | | | | | | | This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: optimize _btree_query_allDarrick J. Wong2017-06-191-5/+7
| | | | | | | | | Don't bother wandering our way through the leaf nodes when the caller issues a query_all; just zoom down the left side of the tree and walk rightwards along level zero. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: remove XFS_HSIZEChristoph Hellwig2017-06-191-4/+0
| | | | | | | | | | | | | | XFS_HSIZE is an extremly confusing way to calculate the size of handle_t. Given that handle_t always only had two sizes, and one of them isn't even covered by XFS_HSIZE to start with just remove the macro and use a constant sizeof expression. Note that XFS_HSIZE isn't used in xfsprogs, xfsdump or xfstests either. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: try to avoid blowing out the transaction reservation when bunmaping a ↵Darrick J. Wong2017-06-194-18/+48
| | | | | | | | | | | | | | | | | | | | | | | shared extent In a pathological scenario where we are trying to bunmapi a single extent in which every other block is shared, it's possible that trying to unmap the entire large extent in a single transaction can generate so many EFIs that we overflow the transaction reservation. Therefore, use a heuristic to guess at the number of blocks we can safely unmap from a reflink file's data fork in an single transaction. This should prevent problems such as the log head slamming into the tail and ASSERTs that trigger because we've exceeded the transaction reservation. Note that since bunmapi can fail to unmap the entire range, we must also teach the deferred unmap code to roll into a new transaction whenever we get low on reservation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch: random edits, all bugs are my fault] Signed-off-by: Christoph Hellwig <hch@lst.de>
* xfs: fix unaligned access in xfs_btree_visit_blocksEric Sandeen2017-05-251-1/+1
| | | | | | | | | | | | This structure copy was throwing unaligned access warnings on sparc64: Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs] xfs_btree_copy_ptrs does a memcpy, which avoids it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: avoid mount-time deadlock in CoW extent recoveryDarrick J. Wong2017-05-191-12/+31
| | | | | | | | | | | If a malicious user corrupts the refcount btree to cause a cycle between different levels of the tree, the next mount attempt will deadlock in the CoW recovery routine while grabbing buffer locks. We can use the ability to re-grab a buffer that was previous locked to a transaction to avoid deadlocks, so do that here. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: fix warnings about unused stack variablesDarrick J. Wong2017-05-161-2/+0
| | | | | | | | Reduce stack usage and get rid of compiler warnings by eliminating unused variables. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
* xfs: fix indlen accounting error on partial delalloc conversionBrian Foster2017-05-161-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The delalloc -> real block conversion path uses an incorrect calculation in the case where the middle part of a delalloc extent is being converted. This is documented as a rare situation because XFS generally attempts to maximize contiguity by converting as much of a delalloc extent as possible. If this situation does occur, the indlen reservation for the two new delalloc extents left behind by the conversion of the middle range is calculated and compared with the original reservation. If more blocks are required, the delta is allocated from the global block pool. This delta value can be characterized as the difference between the new total requirement (temp + temp2) and the currently available reservation minus those blocks that have already been allocated (startblockval(PREV.br_startblock) - allocated). The problem is that the current code does not account for previously allocated blocks correctly. It subtracts the current allocation count from the (new - old) delta rather than the old indlen reservation. This means that more indlen blocks than have been allocated end up stashed in the remaining extents and free space accounting is broken as a result. Fix up the calculation to subtract the allocated block count from the original extent indlen and thus correctly allocate the reservation delta based on the difference between the new total requirement and the unused blocks from the original reservation. Also remove a bogus assert that contradicts the fact that the new indlen reservation can be larger than the original indlen reservation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* Merge tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2017-05-0618-383/+584
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs updates from Darrick Wong: "Here are the XFS changes for 4.12. The big new feature for this release is the new space mapping ioctl that we've been discussing since LSF2016, but other than that most of the patches are larger bug fixes, memory corruption prevention, and other cleanups. Summary: - various code cleanups - introduce GETFSMAP ioctl - various refactoring - avoid dio reads past eof - fix memory corruption and other errors with fragmented directory blocks - fix accidental userspace memory corruptions - publish fs uuid in superblock - make fstrim terminatable - fix race between quotaoff and in-core inode creation - avoid use-after-free when finishing up w/ buffer heads - reserve enough space to handle bmap tree resizing during cow remap" * tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits) xfs: fix use-after-free in xfs_finish_page_writeback xfs: reserve enough blocks to handle btree splits when remapping xfs: wait on new inodes during quotaoff dquot release xfs: update ag iterator to support wait on new inodes xfs: support ability to wait on new inodes xfs: publish UUID in struct super_block xfs: Allow user to kill fstrim process xfs: better log intent item refcount checking xfs: fix up quotacheck buffer list error handling xfs: remove xfs_trans_ail_delete_bulk xfs: don't use bool values in trace buffers xfs: fix getfsmap userspace memory corruption while setting OF_LAST xfs: fix __user annotations for xfs_ioc_getfsmap xfs: corruption needs to respect endianess too! xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap xfs: simplify validation of the unwritten extent bit xfs: remove unused values from xfs_exntst_t xfs: remove the unused XFS_MAXLINK_1 define xfs: more do_div cleanups ...
| * xfs: reserve enough blocks to handle btree splits when remappingDarrick J. Wong2017-05-031-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In xfs_reflink_end_cow, we erroneously reserve only enough blocks to handle adding 1 extent. This is problematic if we fragment free space, have to do CoW, and then have to perform multiple bmap btree expansions. Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to handle btree splits, so log recovery fails after our first error causes the filesystem to go down. Therefore, refactor the transaction block reservation macros until we have a macro that works for our deferred (re)mapping activities, and fix both problems by using that macro. With 1k blocks we can hit this fairly often in g/187 if the scratch fs is big enough. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * xfs: simplify validation of the unwritten extent bitChristoph Hellwig2017-04-256-122/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XFS only supports the unwritten extent bit in the data fork, and only if the file system has a version 5 superblock or the unwritten extent feature bit. We currently have two routines that validate the invariant: xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met, and xfs_validate_extent that triggers and assert in debug build. Both of them iterate over all extents of an inode fork when called, which isn't very efficient. This patch instead adds a new helper that verifies the invariant one extent at a time, and calls it from the places where we iterate over all extents to converted them from or two the in-memory format. The callers then return -EFSCORRUPTED when reading invalid extents from disk, or trigger an assert when writing them to disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: remove unused values from xfs_exntst_tChristoph Hellwig2017-04-251-1/+0
| | | | | | | | | | | | | | | | | | | | We only ever use the normal and unwritten states. And the actual ondisk format (this enum isn't despite being in xfs_format.h) only has space for the unwritten bit anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: remove the unused XFS_MAXLINK_1 defineChristoph Hellwig2017-04-251-2/+0
| | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: more do_div cleanupsEric Sandeen2017-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some architectures do_div does the pointer compare trick to make sure that we've sent it an unsigned 64-bit number. (Why unsigned? I don't know.) Fix up the few places that squawk about this; in xfs_bmap_wants_extents() we just used a bare int64_t so change that to unsigned. In xfs_adjust_extent_unmap_boundaries() all we wanted was the mod, and we have an xfs-specific function to handle that w/o side effects, which includes proper casting for do_div. In xfs_daddr_to_ag[b]no, we were using the wrong type anyway; XFS_BB_TO_FSBT returns a block in the filesystem, so use xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness from that type as a bonus. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: remove bmap block allocation retriesChristoph Hellwig2017-04-252-46/+0
| | | | | | | | | | | | | | | | | | | | Now that reflink operations don't set the firstblock value we don't need the workarounds for non-NULL firstblock values without a prior allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: remove xfs_bmap_remap_allocChristoph Hellwig2017-04-251-58/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar to what we do in the space allocator. But the reflink code doesn't touch the allocation btree unlike the normal space allocator, so we couldn't care less about the state of the AGFL. So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: introduce xfs_bmapi_remapChristoph Hellwig2017-04-251-45/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | Add a new helper to be used for reflink extent list additions instead of funneling them through xfs_bmapi_write and overloading the firstblock member in struct xfs_bmalloca and struct xfs_alloc_args. With some small changes to xfs_bmap_remap_alloc this also means we do not need a xfs_bmalloca structure for this case at all. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: pass individual arguments to xfs_bmap_add_extent_hole_realChristoph Hellwig2017-04-251-59/+64
| | | | | | | | | | | | | | | | | | For the reflink case we'd much rather pass the required arguments than faking up a struct xfs_bmalloca. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: remove attr fork handling in xfs_bmap_finish_oneChristoph Hellwig2017-04-251-6/+3
| | | | | | | | | | | | | | | | | | We never do COW operations for the attr fork, so don't pretend we handle them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: fix integer truncation in xfs_bmap_remap_allocChristoph Hellwig2017-04-251-1/+1
| | | | | | | | | | | | | | | | | | bno should be a xfs_fsblock_t, which is 64-bit wides instead of a xfs_aglock_t, which truncates the value to 32 bits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: simplify xfs_calc_dquots_per_chunkEric Sandeen2017-04-121-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ndquots is a 32-bit value, and we don't care about the remainder; there is no reason to use do_div here, it seems to be the result of a decade+ historical accident. Worse, the do_div implementation in userspace breaks when fed a 32-bit dividend, so we commented it out there in any case. Change to simple division, and then we can change userspace to match, and mandate a 64-bit dividend in the do_div() in userspace as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com>
| * xfs: implement the GETFSMAP ioctlDarrick J. Wong2017-04-043-0/+43
| | | | | | | | | | | | | | | | Introduce a new ioctl that uses the reverse mapping btree to return information about the physical layout of the filesystem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
| * xfs: add a couple of queries to iterate free extents in the rtbitmapDarrick J. Wong2017-04-041-0/+70
| | | | | | | | | | | | | | | | | | Add _query_range and _query_all functions to the realtime bitmap allocator. These two functions are similar in usage to the btree functions with the same name and will be used for getfsmap and scrub. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
| * xfs: create a function to query all records in a btreeDarrick J. Wong2017-04-046-7/+57
| | | | | | | | | | | | | | | | | | Create a helper function that will query all records in a btree. This will be used by the online repair functions to examine every record in a btree to rebuild a second btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
| * xfs: provide a query_range function for freespace btreesDarrick J. Wong2017-04-042-0/+52
| | | | | | | | | | | | | | | | | | Implement a query_range function for the bnobt and cntbt. This will be used for getfsmap fallback if there is no rmapbt and by the online scrub and repair code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
| * xfs: plumb in needed functions for range querying of the freespace btreesDarrick J. Wong2017-04-041-43/+129
| | | | | | | | | | | | | | | | | | | | | | Plumb in the pieces (init_high_key, diff_two_keys) necessary to call query_range on the free space btrees. Remove the debugging asserts so that we can make queries starting from block 0. While we're at it, merge the redundant "if (btnum ==" hunks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
| * xfs: fix up inode validation failure messageEric Sandeen2017-04-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | "xfs_iread: validation failed for inode 96 failed" One "failed" seems like enough. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: remove the ISUNWRITTEN macroChristoph Hellwig2017-04-041-1/+0
| | | | | | | | | | | | | | | | Opencoding the trivial checks makes it much easier to read (and grep..). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: factor out a xfs_bmap_is_real_extent helperChristoph Hellwig2017-04-041-0/+12
| | | | | | | | | | | | | | | | | | This checks for all the non-normal extent types, including handling both encodings of delayed allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: rework the inline directory verifiersDarrick J. Wong2017-03-284-47/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inline directory verifiers should be called on the inode fork data, which means after iformat_local on the read side, and prior to ifork_flush on the write side. This makes the fork verifier more consistent with the way buffer verifiers work -- i.e. they will operate on the memory buffer that the code will be reading and writing directly. Furthermore, revise the verifier function to return -EFSCORRUPTED so that we don't flood the logs with corruption messages and assert notices. This has been a particular problem with xfs/348, which triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the kernel when CONFIG_XFS_DEBUG=y. Disk corruption isn't supposed to do that, at least not in a verifier. Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- v2: get the inode d_ops the proper way v3: describe the bug that this patch fixes; no code changes
* | xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFSMichal Hocko2017-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs has defined PF_FSTRANS to declare a scope GFP_NOFS semantic quite some time ago. We would like to make this concept more generic and use it for other filesystems as well. Let's start by giving the flag a more generic name PF_MEMALLOC_NOFS which is in line with an exiting PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO contexts. Replace all PF_FSTRANS usage from the xfs code in the first step before we introduce a full API for it as xfs uses the flag directly anyway. This patch doesn't introduce any functional change. Link: http://lkml.kernel.org/r/20170306131408.9828-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | xfs: rework the inline directory verifiersDarrick J. Wong2017-04-034-47/+56
|/ | | | | | | | | | | | | | | | | | | The inline directory verifiers should be called on the inode fork data, which means after iformat_local on the read side, and prior to ifork_flush on the write side. This makes the fork verifier more consistent with the way buffer verifiers work -- i.e. they will operate on the memory buffer that the code will be reading and writing directly. Furthermore, revise the verifier function to return -EFSCORRUPTED so that we don't flood the logs with corruption messages and assert notices. This has been a particular problem with xfs/348, which triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the kernel when CONFIG_XFS_DEBUG=y. Disk corruption isn't supposed to do that, at least not in a verifier. Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: verify inline directory data forksDarrick J. Wong2017-03-154-4/+113
| | | | | | | | | | | | | When we're reading or writing the data fork of an inline directory, check the contents to make sure we're not overflowing buffers or eating garbage data. xfs/348 corrupts an inline symlink into an inline directory, triggering a buffer overflow bug. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> --- v2: add more checks consistent with _dir2_sf_check and make the verifier usable from anywhere.
* xfs: try any AG when allocating the first btree block when reflinkingChristoph Hellwig2017-03-082-6/+10
| | | | | | | | | | | | | | | | | | | | | When a reflink operation causes the bmap code to allocate a btree block we're currently doing single-AG allocations due to having ->firstblock set and then try any higher AG due a little reflink quirk we've put in when adding the reflink code. But given that we do not have a minleft reservation of any kind in this AG we can still not have any space in the same or higher AG even if the file system has enough free space. To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back path instead. [And yes, we need to redo this properly instead of piling hacks over hacks. I'm working on that, but it's not going to be a small series. In the meantime this fixes the customer reported issue] Also add a warning for failing allocations to make it easier to debug. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: use iomap new flag for newly allocated delalloc blocksBrian Foster2017-03-081-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit fa7f138 ("xfs: clear delalloc and cache on buffered write failure") fixed one regression in the iomap error handling code and exposed another. The fundamental problem is that if a buffered write is a rewrite of preexisting delalloc blocks and the write fails, the failure handling code can punch out preexisting blocks with valid file data. This was reproduced directly by sub-block writes in the LTP kernel/syscalls/write/write03 test. A first 100 byte write allocates a single block in a file. A subsequent 100 byte write fails and punches out the block, including the data successfully written by the previous write. To address this problem, update the ->iomap_begin() handler to distinguish newly allocated delalloc blocks from preexisting delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the ->iomap_end() handler to decide when a failed or short write should punch out delalloc blocks. This introduces the subtle requirement that ->iomap_begin() should never combine newly allocated delalloc blocks with existing blocks in the resulting iomap descriptor. This can occur when a new delalloc reservation merges with a neighboring extent that is part of the current write, for example. Therefore, drop the post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and just return the record inserted into the fork. This ensures only new blocks are returned and thus that preexisting delalloc blocks are always handled as "found" blocks and not punched out on a failed rewrite. Reported-by: Xiong Zhou <xzhou@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AGChristoph Hellwig2017-02-182-18/+2
| | | | | | | | | XFS_ALLOCTYPE_ANY_AG was only used for the RT allocator and is unused now, and XFS_ALLOCTYPE_START_AG has been unused for a while. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: tune down agno asserts in the bmap codeChristoph Hellwig2017-02-171-16/+6
| | | | | | | | | | | | | | | | | | | | | In various places we currently assert that xfs_bmap_btalloc allocates from the same as the firstblock value passed in, unless it's either NULLAGNO or the dop_low flag is set. But the reflink code does not fully follow this convention as it passes in firstblock purely as a hint for the allocator without actually having previous allocations in the transaction, and without having a minleft check on the current AG, leading to the assert firing on a very full and heavily used file system. As even the reflink code only allocates from equal or higher AGs for now we can simply the check to always allow for equal or higher AGs. Note that we need to eventually split the two meanings of the firstblock value. At that point we can also allow the reflink code to allocate from any AG instead of limiting it in any way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>