summaryrefslogtreecommitdiffstats
path: root/fs/xfs/xfs_dquot.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* xfs: remove the ip argument to xfs_defer_finishChristoph Hellwig2017-09-011-1/+1
| | | | | | | | | | And instead require callers to explicitly join the inode using xfs_defer_ijoin. Also consolidate the defer error handling in a few places using a goto label. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* Revert "xfs: grab dquots without taking the ilock"Christoph Hellwig2017-07-131-10/+4
| | | | | | | | | | | | This reverts commit 50e0bdbe9f48f98bb02eac7030d682f4716884ae. The new XFS_QMOPT_NOLOCK isn't used at all, and conditional locking based on a flag is always the wrong thing to do - we should be having helpers that can be called without the lock instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: fix contiguous dquot chunk iteration livelockBrian Foster2017-07-051-2/+7
| | | | | | | | | | | | | | | | | | | | | | | The patch below updated xfs_dq_get_next_id() to use the XFS iext lookup helpers to locate the next quota id rather than to seek for data in the quota file. The updated code fails to correctly handle the case where the quota inode might have contiguous chunks part of the same extent. In this case, the start block offset is calculated based on the next expected id but the extent lookup returns the same start offset as for the previous chunk. This causes the returned id to go backwards and livelocks the quota iteration. This problem is reproduced intermittently by generic/232. To handle this case, check whether the startoff from the extent lookup is behind the startoff calculated from the next quota id. If so, bump up got.br_startoff to the specific file offset that is expected to hold the next dquot chunk. Fixes: bda250dbaf39 ("xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extentChristoph Hellwig2017-07-021-44/+22
| | | | | | | | | | | | | This goes straight to a single lookup in the extent list and avoids a roundtrip through two layers that don't add any value for the simple quoata file that just has data or holes and no page cache, delayed allocation, unwritten extent or COW fork (which btw, doesn't seem to be handled by the existing SEEK HOLE/DATA code). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: grab dquots without taking the ilockDarrick J. Wong2017-06-281-4/+10
| | | | | | | | | Add a new dqget flag that grabs the dquot without taking the ilock. This will be used by the scrubber (which will have already grabbed the ilock) to perform basic sanity checking of the quota data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xfs: remove double-underscore integer typesDarrick J. Wong2017-06-191-1/+1
| | | | | | | | | | | | | | | | | | | | This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: don't wrap ID in xfs_dq_get_next_idEric Sandeen2017-01-171-0/+4
| | | | | | | | | | | | | | | | | | | | The GETNEXTQOTA ioctl takes whatever ID is sent in, and looks for the next active quota for an user equal or higher to that ID. But if we are at the maximum ID and then ask for the "next" one, we may wrap back to zero. In this case, userspace may loop forever, because it will start querying again at zero. We'll fix this in userspace as well, but for the kernel, return -ENOENT if we ask for the next quota ID past UINT_MAX so the caller knows to stop. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: rename flist/free_list to dfopsDarrick J. Wong2016-08-031-5/+5
| | | | | | | | | | Mechanical change of flist/free_list to dfops, since they're now deferred ops, not just a freeing list. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*Darrick J. Wong2016-08-031-5/+5
| | | | | | | | | | Drop the compatibility shims that we were using to integrate the new deferred operation mechanism into the existing code. No new code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: rework xfs_bmap_free callers to use xfs_defer_opsDarrick J. Wong2016-08-031-0/+1
| | | | | | | | | | | | Restructure everything that used xfs_bmap_free to use xfs_defer_ops instead. For now we'll just remove the old symbols and play some cpp magic to make it work; in the next patch we'll actually rename everything. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: allocate log vector buffers outside CIL context lockDave Chinner2016-07-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the problems we currently have with delayed logging is that under serious memory pressure we can deadlock memory reclaim. THis occurs when memory reclaim (such as run by kswapd) is reclaiming XFS inodes and issues a log force to unpin inodes that are dirty in the CIL. The CIL is pushed, but this will only occur once it gets the CIL context lock to ensure that all committing transactions are complete and no new transactions start being committed to the CIL while the push switches to a new context. The deadlock occurs when the CIL context lock is held by a committing process that is doing memory allocation for log vector buffers, and that allocation is then blocked on memory reclaim making progress. Memory reclaim, however, is blocked waiting for a log force to make progress, and so we effectively deadlock at this point. To solve this problem, we have to move the CIL log vector buffer allocation outside of the context lock so that memory reclaim can always make progress when it needs to force the log. The problem with doing this is that a CIL push can take place while we are determining if we need to allocate a new log vector buffer for an item and hence the current log vector may go away without warning. That means we canot rely on the existing log vector being present when we finally grab the context lock and so we must have a replacement buffer ready to go at all times. To ensure this, introduce a "shadow log vector" buffer that is always guaranteed to be present when we gain the CIL context lock and format the item. This shadow buffer may or may not be used during the formatting, but if the log item does not have an existing log vector buffer or that buffer is too small for the new modifications, we swap it for the new shadow buffer and format the modifications into that new log vector buffer. The result of this is that for any object we modify more than once in a given CIL checkpoint, we double the memory required to track dirty regions in the log. For single modifications then we consume the shadow log vectorwe allocate on commit, and that gets consumed by the checkpoint. However, if we make multiple modifications, then the second transaction commit will allocate a shadow log vector and hence we will end up with double the memory usage as only one of the log vectors is consumed by the CIL checkpoint. The remaining shadow vector will be freed when th elog item is freed. This can probably be optimised in future - access to the shadow log vector is serialised by the object lock (as opposited to the active log vector, which is controlled by the CIL context lock) and so we can probably free shadow log vector from some objects when the log item is marked clean on removal from the AIL. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* Merge branch 'xfs-4.7-misc-fixes' into for-nextDave Chinner2016-05-201-1/+1
|\
| * xfs: mute some sparse warningsEryu Guan2016-04-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | These three warnings are fixed: fs/xfs/xfs_inode.c:1033:44: warning: Using plain integer as NULL pointer fs/xfs/xfs_inode_item.c:525:20: warning: context imbalance in 'xfs_inode_item_push' - unexpected unlock fs/xfs/xfs_dquot.c:696:1: warning: symbol 'xfs_dq_get_next_id' was not declared. Should it be static? Signed-off-by: Eryu Guan <guaneryu@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | xfs: better xfs_trans_alloc interfaceChristoph Hellwig2016-04-061-4/+3
|/ | | | | | | | | | | | | | | | | | Merge xfs_trans_reserve and xfs_trans_alloc into a single function call that returns a transaction with all the required log and block reservations, and which allows passing transaction flags directly to avoid the cumbersome _xfs_trans_alloc interface. While we're at it we also get rid of the transaction type argument that has been superflous since we stopped supporting the non-CIL logging mode. The guts of it will be removed in another patch. [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: Split default quota limits by quota typeCarlos Maiolino2016-02-081-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Default quotas are globally set due historical reasons. IRIX only supported user and project quotas, and default quota was only applied to user quotas. In Linux, when a default quota is set, all different quota types inherits the same default value. An user with a quota limit larger than the default quota value, will still be limited to the default value because the group quotas also inherits the default quotas. Unless the group which the user belongs to have a custom quota limit set. This patch aims to split the default quota value by quota type. Allowing each quota type having different default values. Default time limits are still set globally. XFS does not set a per-user/group timer, but a single global timer. For changing this behavior, some changes should be made in user-space tools another bugs being fixed. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: wire up Q_XGETNEXTQUOTA / get_nextdqblkEric Sandeen2016-02-081-0/+96
| | | | | | | | | | | | | | | Add code to allow the Q_XGETNEXTQUOTA quotactl to quickly find all active quotas by examining the quota inode, and skipping over unallocated or uninitialized regions. Userspace can then use this interface rather than i.e. a getpwent() loop when asked to report all active quotas. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: get quota inode from mp & flags rather than dqpEric Sandeen2016-02-081-1/+2
| | | | | | | | | | | | | Allow us to get the appropriate quota inode from any mp & quota flags, not necessarily associated with a particular dqp. Needed for when we are searching for the next active ID with quotas and we want to examine the quota inode. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: don't overflow quota ID when initializing dqblkEric Sandeen2016-02-081-2/+2
| | | | | | | | | | | | | | Quota IDs are unsigned, and so we can pass in values up to 2^32-1. But if we try to initialize a block containing values over MAX_INT, curid will overflow and assert. curid holds a quota ID, so give it the proper xfs_dqid_t type (and remove the now-impossible ASSERT). Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: eliminate committed arg from xfs_bmap_finishEric Sandeen2016-01-111-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the associated comments were replicated several times across the attribute code, all dealing with what to do if the transaction was or wasn't committed. And in that replicated code, an ASSERT() test of an uninitialized variable occurs in several locations: error = xfs_attr_thing(&args); if (!error) { error = xfs_bmap_finish(&args.trans, args.flist, &committed); } if (error) { ASSERT(committed); If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish, never set "committed", and then test it in the ASSERT. Fix this up by moving the committed state internal to xfs_bmap_finish, and add a new inode argument. If an inode is passed in, it is passed through to __xfs_trans_roll() and joined to the transaction there if the transaction was committed. xfs_qm_dqalloc() was a little unique in that it called bjoin rather than ijoin, but as Dave points out we can detect the committed state but checking whether (*tpp != tp). Addresses-Coverity-Id: 102360 Addresses-Coverity-Id: 102361 Addresses-Coverity-Id: 102363 Addresses-Coverity-Id: 102364 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: per-filesystem stats counter implementationBill O'Donnell2015-10-121-7/+7
| | | | | | | | | | | | | | | | | | | | | | This patch modifies the stats counting macros and the callers to those macros to properly increment, decrement, and add-to the xfs stats counts. The counts for global and per-fs stats are correctly advanced, and cleared by writing a "1" to the corresponding clear file. global counts: /sys/fs/xfs/stats/stats per-fs counts: /sys/fs/xfs/sda*/stats/stats global clear: /sys/fs/xfs/stats/stats_clear per-fs clear: /sys/fs/xfs/sda*/stats/stats_clear [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around stats structures and macros. ] Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* Merge branch 'xfs-misc-fixes-for-4.3-2' into for-nextDave Chinner2015-08-201-1/+1
|\
| * xfs: dquots should be stamped with sb_meta_uuidDave Chinner2015-08-191-1/+1
| | | | | | | | | | | | | | | | | | | | Once the sb_uuid is changed, the wrong uuid is stamped into new dquots on disk. Found by inspection, verified by generic/219. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | xfs: add helper to conditionally remove items from the AILBrian Foster2015-08-191-6/+2
|/ | | | | | | | | Several areas of code duplicate a pattern where we take the AIL lock, check whether an item is in the AIL and remove it if so. Create a new helper for this pattern and use it where appropriate. Signed-off-by: Brian Foster <bfoster@redhat.com>
* xfs: saner xfs_trans_commit interfaceChristoph Hellwig2015-06-041-1/+1
| | | | | | | | | | | | | | | The flags argument to xfs_trans_commit is not useful for most callers, as a commit of a transaction without a permanent log reservation must pass 0 here, and all callers for a transaction with a permanent log reservation except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES. So remove the flags argument from the public xfs_trans_commit interfaces, and introduce low-level __xfs_trans_commit variant just for xfs_trans_roll that regrants a log reservation instead of releasing it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: remove the flags argument to xfs_trans_cancelChristoph Hellwig2015-06-041-5/+1
| | | | | | | | | | | | | | | | | | | | xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and XFS_TRANS_ABORT. Both of them are a direct product of the transaction state, and can be deducted: - any dirty transaction needs XFS_TRANS_ABORT to be properly canceled, and XFS_TRANS_ABORT is a noop for a transaction that is not dirty. - any transaction with a permanent log reservation needs XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent log reservation is invalid. So just remove the flags argument and do the right thing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: move most of xfs_sb.h to xfs_format.hChristoph Hellwig2014-11-281-1/+0
| | | | | | | | | More on-disk format consolidation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: merge xfs_ag.h into xfs_format.hChristoph Hellwig2014-11-281-1/+0
| | | | | | | | | | More on-disk format consolidation. A few declarations that weren't on-disk format related move into better suitable spots. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: quotacheck leaves dquot buffers without verifiersDave Chinner2014-08-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running xfs/305, I noticed that quotacheck was flushing dquot buffers that did not have the xfs_dquot_buf_ops verifiers attached: XFS (vdb): _xfs_buf_ioapply: no ops on block 0x1dc8/0x1dc8 ffff880052489000: 44 51 01 04 00 00 65 b8 00 00 00 00 00 00 00 00 DQ....e......... ffff880052489010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ffff880052489020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ffff880052489030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ CPU: 1 PID: 2376 Comm: mount Not tainted 3.16.0-rc2-dgc+ #306 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88006fe38000 ffff88004a0ffae8 ffffffff81cf1cca 0000000000000001 ffff88004a0ffb88 ffffffff814d50ca 000010004a0ffc70 0000000000000000 ffff88006be56dc4 0000000000000021 0000000000001dc8 ffff88007c773d80 Call Trace: [<ffffffff81cf1cca>] dump_stack+0x45/0x56 [<ffffffff814d50ca>] _xfs_buf_ioapply+0x3ca/0x3d0 [<ffffffff810db520>] ? wake_up_state+0x20/0x20 [<ffffffff814d51f5>] ? xfs_bdstrat_cb+0x55/0xb0 [<ffffffff814d513b>] xfs_buf_iorequest+0x6b/0xd0 [<ffffffff814d51f5>] xfs_bdstrat_cb+0x55/0xb0 [<ffffffff814d53ab>] __xfs_buf_delwri_submit+0x15b/0x220 [<ffffffff814d6040>] ? xfs_buf_delwri_submit+0x30/0x90 [<ffffffff814d6040>] xfs_buf_delwri_submit+0x30/0x90 [<ffffffff8150f89d>] xfs_qm_quotacheck+0x17d/0x3c0 [<ffffffff81510591>] xfs_qm_mount_quotas+0x151/0x1e0 [<ffffffff814ed01c>] xfs_mountfs+0x56c/0x7d0 [<ffffffff814f0f12>] xfs_fs_fill_super+0x2c2/0x340 [<ffffffff811c9fe4>] mount_bdev+0x194/0x1d0 [<ffffffff814f0c50>] ? xfs_finish_flags+0x170/0x170 [<ffffffff814ef0f5>] xfs_fs_mount+0x15/0x20 [<ffffffff811ca8c9>] mount_fs+0x39/0x1b0 [<ffffffff811e4d67>] vfs_kern_mount+0x67/0x120 [<ffffffff811e757e>] do_mount+0x23e/0xad0 [<ffffffff8117abde>] ? __get_free_pages+0xe/0x50 [<ffffffff811e71e6>] ? copy_mount_options+0x36/0x150 [<ffffffff811e8103>] SyS_mount+0x83/0xc0 [<ffffffff81cfd40b>] tracesys+0xdd/0xe2 This was caused by dquot buffer readahead not attaching a verifier structure to the buffer when readahead was issued, resulting in the followup read of the buffer finding a valid buffer and so not attaching new verifiers to the buffer as part of the read. Also, when a verifier failure occurs, we then read the buffer without verifiers. Attach the verifiers manually after this read so that if the buffer is then written it will be verified that the corruption has been repaired. Further, when flushing a dquot we don't ask for a verifier when reading in the dquot buffer the dquot belongs to. Most of the time this isn't an issue because the buffer is still cached, but when it is not cached it will result in writing the dquot buffer without having the verfier attached. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: global error sign conversionDave Chinner2014-06-251-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: Nuke XFS_ERROR macroEric Sandeen2014-06-221-7/+7
| | | | | | | | | | | | | | XFS_ERROR was designed long ago to trap return values, but it's not runtime configurable, it's not consistently used, and we can do similar error trapping with ftrace scripts and triggers from userspace. Just nuke XFS_ERROR and associated bits. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: return is not a functionEric Sandeen2014-06-221-6/+6
| | | | | | | | | | | | | | | return is not a function. "return(EIO);" is silly; "return (EIO);" moreso. return is not a function. Nuke the pointless parens. [dchinner: catch a couple of extra cases in xfs_attr_list.c, xfs_acl.c and xfs_linux.h.] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* Merge branch 'xfs-misc-fixes-3-for-3.16' into for-nextDave Chinner2014-06-091-3/+3
|\
| * xfs: kill xfs_buf_geterror()Dave Chinner2014-06-061-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the callers are just calling ASSERT(!xfs_buf_geterror()) which means they are checking for bp->b_error == 0. If bp is null in this case, we will assert fail, and hence it's no different in result to oopsing because of a null bp. In some cases, errors have already been checked for or the function returning the buffer can't return a buffer with an error, so it's just a redundant assert. Either way, the assert can either be removed. The other two non-assert callers can just test for a buffer and error properly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | xfs: remove dquot hintsDave Chinner2014-05-051-45/+8
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | group and project quota hints are currently stored on the user dquot. If we are attaching quotas to the inode, then the group and project dquots are stored as hints on the user dquot to save having to look them up again later. The thing is, the hints are not used for that inode for the rest of the life of the inode - the dquots are attached directly to the inode itself - so the only time the hints are used is when an inode first has dquots attached. When the hints on the user dquot don't match the dquots being attache dto the inode, they are then removed and replaced with the new hints. If a user is concurrently modifying files in different group and/or project contexts, then this leads to thrashing of the hints attached to user dquot. If user quotas are not enabled, then hints are never even used. So, if the hints are used to avoid the cost of the lookup, is the cost of the lookup significant enough to justify the hint infrstructure? Maybe it was once, when there was a global quota manager shared between all XFS filesystems and was hash table based. However, lookups are now much simpler, requiring only a single lock and radix tree lookup local to the filesystem and no hash or LRU manipulations to be made. Hence the cost of lookup is much lower than when hints were implemented. Turns out that benchmarks show that, too, with thir being no differnce in performance when doing file creation workloads as a single user with user, group and project quotas enabled - the hints do not make the code go any faster. In fact, removing the hints shows a 2-3% reduction in the time it takes to create 50 million inodes.... So, let's just get rid of the hints and the complexity around them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: use tr_qm_dqalloc log reservation for dquot allocBrian Foster2014-02-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dquot allocation path in xfs_qm_dqread() currently uses the attribute set log reservation, which appears to be incorrect. We have reports of transaction reservation overruns with the current code. E.g., a repeated run of xfstests test generic/270 on a 512b block size fs occassionally produces the following in dmesg: XFS (sdN): xlog_write: reservation summary: trans type = QM_DQALLOC (30) unit res = 7080 bytes current res = -632 bytes total reg = 0 bytes (o/flow = 0 bytes) ophdrs = 0 (ophdr space = 0 bytes) ophdr + reg = 0 bytes num regions = 0 XFS (sdN): xlog_write: reservation ran out. Need to up reservation The dquot allocation case should consist of a write reservation (i.e., we are allocating a range of the internal quota file) plus the size of the actual dquots. We already have a log reservation definition for this operation (tr_qm_dqalloc). Use it in xfs_qm_dqread() and update the log reservation calculation function to use the write res. calculation function rather than reading the assumed to be pre-calculated value directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: use xfs_ilock_data_map_shared in xfs_qm_dqtobpChristoph Hellwig2013-12-181-3/+4
| | | | | | | | | | We might not have read in the extent list at this point, so make sure we take the ilock exclusively if we have to do so. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: decouple inode and bmap btree header filesDave Chinner2013-10-231-4/+1
| | | | | | | | | | | | | | | | | | | | Currently the xfs_inode.h header has a dependency on the definition of the BMAP btree records as the inode fork includes an array of xfs_bmbt_rec_host_t objects in it's definition. Move all the btree format definitions from xfs_btree.h, xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to xfs_format.h to continue the process of centralising the on-disk format definitions. With this done, the xfs inode definitions are no longer dependent on btree header files. The enables a massive culling of unnecessary includes, with close to 200 #include directives removed from the XFS kernel code base. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: decouple log and transaction headersDave Chinner2013-10-231-4/+6
| | | | | | | | | | | | | | | | | | | | | xfs_trans.h has a dependency on xfs_log.h for a couple of structures. Most code that does transactions doesn't need to know anything about the log, but this dependency means that they have to include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header files and clean up the includes to be in dependency order. In doing this, remove the direct include of xfs_trans_reserve.h from xfs_trans.h so that we remove the dependency between xfs_trans.h and xfs_mount.h. Hence the xfs_trans.h include can be moved to the indicate the actual dependencies other header files have on it. Note that these are kernel only header files, so this does not translate to any userspace changes at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: split dquot buffer operations outDave Chinner2013-10-231-114/+3
| | | | | | | | | | | Parts of userspace want to be able to read and modify dquot buffers (e.g. xfs_db) so we need to split out the reading and writing of these buffers so it is easy to shared code with libxfs in userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: create a shared header file for format-related informationDave Chinner2013-10-231-0/+1
| | | | | | | | | | | | | | | | | | | All of the buffer operations structures are needed to be exported for xfs_db, so move them all to a common location rather than spreading them all over the place. They are verifying the on-disk format, so while xfs_format.h might be a good place, it is not part of the on disk format. Hence we need to create a new header file that we centralise these related definitions. Start by moving the bffer operations structures, and then also move all the other definitions that have crept into xfs_log_format.h and xfs_format.h as there was no other shared header file to put them in. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: lockdep needs to know about 3 dquot-deep nestingDave Chinner2013-10-011-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael Semon reported that xfs/299 generated this lockdep warning: ============================================= [ INFO: possible recursive locking detected ] 3.12.0-rc2+ #2 Not tainted --------------------------------------------- touch/21072 is trying to acquire lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 but task is already holding lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_dquot_other_class); lock(&xfs_dquot_other_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by touch/21072: #0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e #1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40 #2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35 #3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1 #4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f #5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 #6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 The lockdep annotation for dquot lock nesting only understands locking for user and "other" dquots, not user, group and quota dquots. Fix the annotations to match the locking heirarchy we now have. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: convert dquot cache lru to list_lruDave Chinner2013-09-111-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the XFS dquot lru to use the list_lru construct and convert the shrinker to being node aware. [glommer@openvz.org: edited for conflicts + warning fixes] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* xfs: refactor xfs_trans_reserve() interfaceJie Liu2013-08-131-4/+2
| | | | | | | | | | | | | | | | With the new xfs_trans_res structure has been introduced, the log reservation size, log count as well as log flags are pre-initialized at mount time. So it's time to refine xfs_trans_reserve() interface to be more neat. Also, introduce a new helper M_RES() to return a pointer to the mp->m_resv structure to simplify the input. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: create xfs_bmap_util.[ch]Dave Chinner2013-08-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | There is a bunch of code in xfs_bmap.c that is kernel specific and not shared with userspace. To minimise the difference between the kernel and userspace code, shift this unshared code to xfs_bmap_util.c, and the declarations to xfs_bmap_util.h. The biggest issue here is xfs_bmap_finish() - userspace has it's own definition of this function, and so we need to move it out of xfs_bmap.[ch]. This means several other files need to include xfs_bmap_util.h as well. It also introduces and interesting dance for the stack switching code in xfs_bmapi_allocate(). The stack switching/workqueue code is actually moved to xfs_bmap_util.c, so that userspace can simply use a #define in a header file to connect the dots without needing to know about the stack switch code at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: separate dquot on disk format definitions out of xfs_quota.hDave Chinner2013-08-121-0/+1
| | | | | | | | | | | | | The on disk format definitions of the on-disk dquot, log formats and quota off log formats are all intertwined with other definitions for quotas. Separate them out into their own header file so they can easily be shared with userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Add pquota fields where gquota is used.Chandra Seetharaman2013-07-111-3/+12
| | | | | | | | | | | | | | | | | | | Add project quota changes to all the places where group quota field is used: * add separate project quota members into various structures * split project quota and group quotas so that instead of overriding the group quota members incore, the new project quota members are used instead * get rid of usage of the OQUOTA flag incore, in favor of separate group and project quota flags. * add a project dquot argument to various functions. Not using the pquotino field from superblock yet. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Code cleanup and removal of some typedef usageChandra Seetharaman2013-06-281-6/+6
| | | | | | | | | | | In preparation for combined pquota/gquota support, for the sake of readability, do some code cleanup surrounding the affected code. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Replace macro XFS_DQ_TO_QIP with a functionChandra Seetharaman2013-06-281-1/+1
| | | | | | | | | | In preparation for combined pquota/gquota support, for the sake of readability, change the macro to an inline function. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Replace macro XFS_DQUOT_TREE with a functionChandra Seetharaman2013-06-281-1/+1
| | | | | | | | | | In preparation for combined pquota/gquota support, for the sake of readability, change the macro to an inline function. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: rework dquot CRCsDave Chinner2013-06-051-21/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calculating dquot CRCs when the backing buffer is written back just doesn't work reliably. There are several places which manipulate dquots directly in the buffers, and they don't calculate CRCs appropriately, nor do they always set the buffer up to calculate CRCs appropriately. Firstly, if we log a dquot buffer (e.g. during allocation) it gets logged without valid CRC, and so on recovery we end up with a dquot that is not valid. Secondly, if we recover/repair a dquot, we don't have a verifier attached to the buffer and hence CRCs are not calculated on the way down to disk. Thirdly, calculating the CRC after we've changed the contents means that if we re-read the dquot from the buffer, we cannot verify the contents of the dquot are valid, as the CRC is invalid. So, to avoid all the dquot CRC errors that are being detected by the read verifier, change to using the same model as for inodes. That is, dquot CRCs are calculated and written to the backing buffer at the time the dquot is flushed to the backing buffer. If we modify the dquot directly in the backing buffer, calculate the CRC immediately after the modification is complete. Hence the dquot in the on-disk buffer should always have a valid CRC. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>