summaryrefslogtreecommitdiffstats
path: root/fs/xfs/xfs_dquot.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* xfs: run an eofblocks scan on ENOSPC/EDQUOTBrian Foster2014-07-241-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From: Brian Foster <bfoster@redhat.com> Speculative preallocation and and the associated throttling metrics assume we're working with large files on large filesystems. Users have reported inefficiencies in these mechanisms when we happen to be dealing with large files on smaller filesystems. This can occur because while prealloc throttling is aggressive under low free space conditions, it is not active until we reach 5% free space or less. For example, a 40GB filesystem has enough space for several files large enough to have multi-GB preallocations at any given time. If those files are slow growing, they might reserve preallocation for long periods of time as well as avoid the background scanner due to frequent modification. If a new file is written under these conditions, said file has no access to this already reserved space and premature ENOSPC is imminent. To handle this scenario, modify the buffered write ENOSPC handling and retry sequence to invoke an eofblocks scan. In the smaller filesystem scenario, the eofblocks scan resets the usage of preallocation such that when the 5% free space threshold is met, throttling effectively takes over to provide fair and efficient preallocation until legitimate ENOSPC. The eofblocks scan is selective based on the nature of the failure. For example, an EDQUOT failure in a particular quota will use a filtered scan for that quota. Because we don't know which quota might have caused an allocation failure at any given time, we include each applicable quota determined to be under low free space conditions in the scan. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: remove dquot hintsDave Chinner2014-05-051-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | group and project quota hints are currently stored on the user dquot. If we are attaching quotas to the inode, then the group and project dquots are stored as hints on the user dquot to save having to look them up again later. The thing is, the hints are not used for that inode for the rest of the life of the inode - the dquots are attached directly to the inode itself - so the only time the hints are used is when an inode first has dquots attached. When the hints on the user dquot don't match the dquots being attache dto the inode, they are then removed and replaced with the new hints. If a user is concurrently modifying files in different group and/or project contexts, then this leads to thrashing of the hints attached to user dquot. If user quotas are not enabled, then hints are never even used. So, if the hints are used to avoid the cost of the lookup, is the cost of the lookup significant enough to justify the hint infrstructure? Maybe it was once, when there was a global quota manager shared between all XFS filesystems and was hash table based. However, lookups are now much simpler, requiring only a single lock and radix tree lookup local to the filesystem and no hash or LRU manipulations to be made. Hence the cost of lookup is much lower than when hints were implemented. Turns out that benchmarks show that, too, with thir being no differnce in performance when doing file creation workloads as a single user with user, group and project quotas enabled - the hints do not make the code go any faster. In fact, removing the hints shows a 2-3% reduction in the time it takes to create 50 million inodes.... So, let's just get rid of the hints and the complexity around them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: create a shared header file for format-related informationDave Chinner2013-10-231-2/+0
| | | | | | | | | | | | | | | | | | | All of the buffer operations structures are needed to be exported for xfs_db, so move them all to a common location rather than spreading them all over the place. They are verifying the on-disk format, so while xfs_format.h might be a good place, it is not part of the on disk format. Hence we need to create a new header file that we centralise these related definitions. Start by moving the bffer operations structures, and then also move all the other definitions that have crept into xfs_log_format.h and xfs_format.h as there was no other shared header file to put them in. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Add pquota fields where gquota is used.Chandra Seetharaman2013-07-111-2/+5
| | | | | | | | | | | | | | | | | | | Add project quota changes to all the places where group quota field is used: * add separate project quota members into various structures * split project quota and group quotas so that instead of overriding the group quota members incore, the new project quota members are used instead * get rid of usage of the OQUOTA flag incore, in favor of separate group and project quota flags. * add a project dquot argument to various functions. Not using the pquotino field from superblock yet. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Replace macro XFS_DQ_TO_QIP with a functionChandra Seetharaman2013-06-281-4/+0
| | | | | | | | | | In preparation for combined pquota/gquota support, for the sake of readability, change the macro to an inline function. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: xfs_dquot prealloc throttling watermarks and low free spaceBrian Foster2013-03-221-0/+12
| | | | | | | | | | | | | | | | | | | | | | Enable tracking of high and low watermarks for preallocation throttling of files under quota restrictions. These values are calculated when the quota limit is read from disk or modified and cached for later use by the throttling algorithm. The high watermark specifies when preallocation is disabled, the low watermark specifies when throttling is enabled and the low free space data structure contains precalculated low free space limits to serve as input to determine the level of throttling required. Note that the low free space data structure is based on the existing global low free space data structure with the exception of using three stages (5%, 3% and 1%) rather than five to reduce the impact of xfs_dquot memory overhead. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: pass xfs_dquot to xfs_qm_adjust_dqlimits() instead of xfs_disk_dquot_tBrian Foster2013-03-221-2/+2
| | | | | | | | | | | Modify xfs_qm_adjust_dqlimits() to take the xfs_dquot as a parameter instead of just the xfs_disk_dquot_t so we can update in-memory fields if necessary. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: convert buffer verifiers to an ops structure.Dave Chinner2012-11-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | To separate the verifiers from iodone functions and associate read and write verifiers at the same time, introduce a buffer verifier operations structure to the xfs_buf. This avoids the need for assigning the write verifier, clearing the iodone function and re-running ioend processing in the read verifier, and gets rid of the nasty "b_pre_io" name for the write verifier function pointer. If we ever need to, it will also be easier to add further content specific callbacks to a buffer with an ops structure in place. We also avoid needing to export verifier functions, instead we can simply export the ops structures for those that are needed outside the function they are defined in. This patch also fixes a directory block readahead verifier issue it exposed. This patch also adds ops callbacks to the inode/alloc btree blocks initialised by growfs. These will need more work before they will work with CRCs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Phil White <pwhite@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: add pre-write metadata buffer verifier callbacksDave Chinner2012-11-161-1/+1
| | | | | | | | | | | | | | | | | | | | | These verifiers are essentially the same code as the read verifiers, but do not require ioend processing. Hence factor the read verifier functions and add a new write verifier wrapper that is used as the callback. This is done as one large patch for all verifiers rather than one patch per verifier as the change is largely mechanical. This includes hooking up the write verifier via the read verifier function. Hooking up the write verifier for buffers obtained via xfs_trans_get_buf() will be done in a separate patch as that touches code in many different places rather than just the verifier functions. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: verify dquot blocks as they are read from diskDave Chinner2012-11-161-0/+1
| | | | | | | | | | | | | | Add a dquot buffer verify callback function and pass it into the buffer read functions. This checks all the dquots in a buffer, but cannot completely verify the dquot ids are correct. Also, errors cannot be repaired, so an additional function is added to repair bad dquots in the buffer if such an error is detected in a context where repair is allowed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Phil White <pwhite@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: on-stack delayed write buffer listsChristoph Hellwig2012-05-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Queue delwri buffers on a local on-stack list instead of a per-buftarg one, and write back the buffers per-process instead of by waking up xfsbufd. This is now easily doable given that we have very few places left that write delwri buffers: - log recovery: Only done at mount time, and already forcing out the buffers synchronously using xfs_flush_buftarg - quotacheck: Same story. - dquot reclaim: Writes out dirty dquots on the LRU under memory pressure. We might want to look into doing more of this via xfsaild, but it's already more optimal than the synchronous inode reclaim that writes each buffer synchronously. - xfsaild: This is the main beneficiary of the change. By keeping a local list of buffers to write we reduce latency of writing out buffers, and more importably we can remove all the delwri list promotions which were hitting the buffer cache hard under sustained metadata loads. The implementation is very straight forward - xfs_buf_delwri_queue now gets a new list_head pointer that it adds the delwri buffers to, and all callers need to eventually submit the list using xfs_buf_delwi_submit or xfs_buf_delwi_submit_nowait. Buffers that already are on a delwri list are skipped in xfs_buf_delwri_queue, assuming they already are on another delwri list. The biggest change to pass down the buffer list was done to the AIL pushing. Now that we operate on buffers the trylock, push and pushbuf log item methods are merged into a single push routine, which tries to lock the item, and if possible add the buffer that needs writeback to the buffer list. This leads to much simpler code than the previous split but requires the individual IOP_PUSH instances to unlock and reacquire the AIL around calls to blocking routines. Given that xfsailds now also handle writing out buffers, the conditions for log forcing and the sleep times needed some small changes. The most important one is that we consider an AIL busy as long we still have buffers to push, and the other one is that we do increment the pushed LSN for buffers that are under flushing at this moment, but still count them towards the stuck items for restart purposes. Without this we could hammer on stuck items without ever forcing the log and not make progress under heavy random delete workloads on fast flash storage devices. [ Dave Chinner: - rebase on previous patches. - improved comments for XBF_DELWRI_Q handling - fix XBF_ASYNC handling in queue submission (test 106 failure) - rename delwri submit function buffer list parameters for clarity - xfs_efd_item_push() should return XFS_ITEM_PINNED ] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: do not write the buffer from xfs_qm_dqflushChristoph Hellwig2012-05-141-1/+1
| | | | | | | | | | | | | | | Instead of writing the buffer directly from inside xfs_qm_dqflush return it to the caller and let the caller decide what to do with the buffer. Also remove the pincount check in xfs_qm_dqflush that all non-blocking callers already implement and the now unused flags parameter and the XFS_DQ_IS_DIRTY check that all callers already perform. [ Dave Chinner: fixed build error cause by missing '{'. ] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: remove the per-filesystem list of dquotsChristoph Hellwig2012-03-141-2/+0
| | | | | | | | | | | Instead of keeping a separate per-filesystem list of dquots we can walk the radix tree for the two places where we need to iterate all quota structures. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: use per-filesystem radix trees for dquot lookupChristoph Hellwig2012-03-141-12/+0
| | | | | | | | | | | Replace the global hash tables for looking up in-memory dquot structures with per-filesystem radix trees to allow scaling to a large number of in-memory dquot structures. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: per-filesystem dquot LRU listsChristoph Hellwig2012-03-141-1/+1
| | | | | | | | | | | | | | | | Replace the global dquot lru lists with a per-filesystem one. Note that the shrinker isn't wire up to the per-superblock VFS shrinker infrastructure as would have problems summing up and splitting the counts for inodes and dquots. I don't think this is a major problem as the quota cache isn't as interwinded with the inode cache as the dentry cache is, because an inode that is dropped from the cache will generally release a dquot reference, but most of the time it won't be the last one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: remove xfs_trans_unlocked_itemChristoph Hellwig2012-02-231-2/+1
| | | | | | | | | | | | | | | There is no reason to wake up log space waiters when unlocking inodes or dquots, and the commit log has no explanation for this function either. Given that we now have exact log space wakeups everywhere we can assume the reason for this function was to paper over log space races in earlier XFS versions. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* Define a new function xfs_inode_dquot()Chandra Seetharaman2012-02-031-0/+13
| | | | | | | | | | | | | Define a new function xfs_inode_dquot() that takes a inode pointer and a disk quota type and returns the quota pointer for the specified quota type. This simplifies the xfs_qm_dqget() error path significantly. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* Define a new function xfs_this_quota_on()Chandra Seetharaman2012-02-031-4/+13
| | | | | | | | | | | Create a new function xfs_this_quota_on() that takes a xfs_mount data structure and a disk quota type and returns true if the specified type of quota is ON in the xfs_mount data structure. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: remove XFS_QMOPT_DQSUSERChristoph Hellwig2011-12-151-0/+2
| | | | | | | | | | | Just read the id 0 dquot from disk directly in xfs_qm_init_quotainfo instead of going through dqget and requiring a special flag to not add the dquot to any lists. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: add a xfs_dqhold helperChristoph Hellwig2011-12-151-2/+8
| | | | | | | | | | | | | | | | Factor the common pattern of: xfs_dqlock(dqp); XFS_DQHOLD(dqp); xfs_dqunlock(dqp); into a new helper, and remove XFS_DQHOLD now that only one other caller is left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: flatten the dquot lock orderingChristoph Hellwig2011-12-141-1/+1
| | | | | | | | | | | | | Introduce a new XFS_DQ_FREEING flag that tells lookup and mplist walks to skip a dquot that is beeing freed, and use this avoid the trylock on the hash and mplist locks in xfs_qm_dqreclaim_one. Also simplify xfs_dqpurge by moving the inodes to a dispose list after marking them XFS_DQ_FREEING and avoid the locker ordering constraints. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: cleanup dquot locking helpersChristoph Hellwig2011-12-131-6/+19
| | | | | | | | | | Mark the trivial lock wrappers as inline, and make the naming consistent for all of them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: remove subdirectoriesChristoph Hellwig2011-08-121-0/+137
Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the annoying subdirectories in the XFS source code. Besides the large amount of file rename the only changes are to the Makefile, a few files including headers with the subdirectory prefix, and the binary sysctl compat code that includes a header under fs/xfs/ from kernel/. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>