summaryrefslogtreecommitdiffstats
path: root/fs/xfs/xfs_mount.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* xfs: rename log structure to xlogMark Tinguely2012-06-211-2/+2
| | | | | | | | | Rename the XFS log structure to xlog to help crash distinquish it from the other logs in Linux. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Do background CIL flushes via a workqueueDave Chinner2012-05-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Doing background CIL flushes adds significant latency to whatever async transaction that triggers it. To avoid blocking async transactions on things like waiting for log buffer IO to complete, move the CIL push off into a workqueue. By moving the push work into a workqueue, we remove all the latency that the commit adds from the foreground transaction commit path. This also means that single threaded workloads won't do the CIL push procssing, leaving them more CPU to do more async transactions. To do this, we need to keep track of the sequence number we have pushed work for. This avoids having many transaction commits attempting to schedule work for the same sequence, and ensures that we only ever have one push (background or forced) in progress at a time. It also means that we don't need to take the CIL lock in write mode to check for potential background push races, which reduces lock contention. To avoid potential issues with "smart" IO schedulers, don't use the workqueue for log force triggered flushes. Instead, do them directly so that the log IO is done directly by the process issuing the log force and so doesn't get stuck on IO elevator queue idling incorrectly delaying the log IO from the workqueue. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: implement freezing by emptying the AILChristoph Hellwig2012-05-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we write back all metadata either synchronously or through the AIL we can simply implement metadata freezing in terms of emptying the AIL. The implementation for this is fairly simply and straight-forward: A new routine is added that asks the xfsaild to push the AIL to the end and waits for it to complete and send a wakeup. The routine will then loop if the AIL is not actually empty, and continue to do so until the AIL is compeltely empty. We keep an inode reclaim pass in the freeze process to avoid having memory pressure have to reclaim inodes that require dirtying the filesystem to be reclaimed after the freeze has completed. This means we can also treat unmount in the exact same way as freeze. As an upside we can now remove the radix tree based inode writeback and xfs_unmountfs_writesb. [ Dave Chinner: - Cleaned up commit message. - Added inode reclaim passes back into freeze. - Cleaned up wakeup mechanism to avoid the use of a new sleep counter variable. ] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: use per-filesystem I/O completion workqueuesChristoph Hellwig2012-03-051-0/+3
| | | | | | | | | | | | | | | | The new concurrency managed workqueues are cheap enough that we can create per-filesystem instead of global workqueues. This allows us to remove the trylock or defer scheme on the ilock, which is not helpful once we have outstanding log reservations until finishing a size update. Also allow the default concurrency on this workqueues so that I/O completions blocking on the ilock for one inode do not block process for another inode. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* Change xfs_sb_from_disk() interface to take a mount pointerChandra Seetharaman2012-02-031-1/+1
| | | | | | | | | | | | | Change xfs_sb_from_disk() interface to take a mount pointer instead of a superblock pointer. This is to print mount point specific error messages in future fixes. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: remove the deprecated nodelaylog optionChristoph Hellwig2011-12-081-1/+0
| | | | | | | | | | | The delaylog mode has been the default for a long time, and the nodelaylog option has been scheduled for removal in Linux 3.3. Remove it and code only used by it now that we have opened the 3.3 window. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Remove the second parameter to xfs_sb_count()Chandra Seetharaman2011-07-211-1/+1
| | | | | | | | | | Remove the second parameter to xfs_sb_count() since all callers of the function set them. Also, fix the header comment regarding it being called periodically. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: add online discard supportChristoph Hellwig2011-05-241-0/+1
| | | | | | | | | | | | | | Now that we have reliably tracking of deleted extents in a transaction we can easily implement "online" discard support which calls blkdev_issue_discard once a transaction commits. The actual discard is a two stage operation as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: introduce background inode reclaim workDave Chinner2011-04-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background inode reclaim needs to run more frequently that the XFS syncd work is run as 30s is too long between optimal reclaim runs. Add a new periodic work item to the xfs syncd workqueue to run a fast, non-blocking inode reclaim scan. Background inode reclaim is kicked by the act of marking inodes for reclaim. When an AG is first marked as having reclaimable inodes, the background reclaim work is kicked. It will continue to run periodically untill it detects that there are no more reclaimable inodes. It will be kicked again when the first inode is queued for reclaim. To ensure shrinker based inode reclaim throttles to the inode cleaning and reclaim rate but still reclaim inodes efficiently, make it kick the background inode reclaim so that when we are low on memory we are trying to reclaim inodes as efficiently as possible. This kick shoul d not be necessary, but it will protect against failures to kick the background reclaim when inodes are first dirtied. To provide the rate throttling, make the shrinker pass do synchronous inode reclaim so that it blocks on inodes under IO. This means that the shrinker will reclaim inodes rather than just skipping over them, but it does not adversely affect the rate of reclaim because most dirty inodes are already under IO due to the background reclaim work the shrinker kicked. These two modifications solve one of the two OOM killer invocations Chris Mason reported recently when running a stress testing script. The particular workload trigger for the OOM killer invocation is where there are more threads than CPUs all unlinking files in an extremely memory constrained environment. Unlike other solutions, this one does not have a performance impact on performance when memory is not constrained or the number of concurrent threads operating is <= to the number of CPUs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: convert ENOSPC inode flushing to use new syncd workqueueDave Chinner2011-04-081-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | On of the problems with the current inode flush at ENOSPC is that we queue a flush per ENOSPC event, regardless of how many are already queued. Thi can result in hundreds of queued flushes, most of which simply burn CPU scanned and do no real work. This simply slows down allocation at ENOSPC. We really only need one active flush at a time, and we can easily implement that via the new xfs_syncd_wq. All we need to do is queue a flush if one is not already active, then block waiting for the currently active flush to complete. The result is that we only ever have a single ENOSPC inode flush active at a time and this greatly reduces the overhead of ENOSPC processing. On my 2p test machine, this results in tests exercising ENOSPC conditions running significantly faster - 042 halves execution time, 083 drops from 60s to 5s, etc - while not introducing test regressions. This allows us to remove the old xfssyncd threads and infrastructure as they are no longer used. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: introduce a xfssyncd workqueueDave Chinner2011-04-081-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | All of the work xfssyncd does is background functionality. There is no need for a thread per filesystem to do this work - it can al be managed by a global workqueue now they manage concurrency effectively. Introduce a new gglobal xfssyncd workqueue, and convert the periodic work to use this new functionality. To do this, use a delayed work construct to schedule the next running of the periodic sync work for the filesystem. When the sync work is complete, queue a new delayed work for the next running of the sync work. For laptop mode, we wait on completion for the sync works, so ensure that the sync work queuing interface can flush and wait for work to complete to enable the work queue infrastructure to replace the current sequence number and wakeup that is used. Because the sync work does non-trivial amounts of work, mark the new work queue as CPU intensive. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: dynamic speculative EOF preallocationDave Chinner2011-01-041-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the size of the speculative preallocation during delayed allocation is fixed by either the allocsize mount option of a default size. We are seeing a lot of cases where we need to recommend using the allocsize mount option to prevent fragmentation when buffered writes land in the same AG. Rather than using a fixed preallocation size by default (up to 64k), make it dynamic by basing it on the current inode size. That way the EOF preallocation will increase as the file size increases. Hence for streaming writes we are much more likely to get large preallocations exactly when we need it to reduce fragementation. For default settings, the size of the initial extents is determined by the number of parallel writers and the amount of memory in the machine. For 4GB RAM and 4 concurrent 32GB file writes: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..1048575]: 1048672..2097247 0 (1048672..2097247) 1048576 1: [1048576..2097151]: 5242976..6291551 0 (5242976..6291551) 1048576 2: [2097152..4194303]: 12583008..14680159 0 (12583008..14680159) 2097152 3: [4194304..8388607]: 25165920..29360223 0 (25165920..29360223) 4194304 4: [8388608..16777215]: 58720352..67108959 0 (58720352..67108959) 8388608 5: [16777216..33554423]: 117440584..134217791 0 (117440584..134217791) 16777208 6: [33554424..50331511]: 184549056..201326143 0 (184549056..201326143) 16777088 7: [50331512..67108599]: 251657408..268434495 0 (251657408..268434495) 16777088 and for 16 concurrent 16GB file writes: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..262143]: 2490472..2752615 0 (2490472..2752615) 262144 1: [262144..524287]: 6291560..6553703 0 (6291560..6553703) 262144 2: [524288..1048575]: 13631592..14155879 0 (13631592..14155879) 524288 3: [1048576..2097151]: 30408808..31457383 0 (30408808..31457383) 1048576 4: [2097152..4194303]: 52428904..54526055 0 (52428904..54526055) 2097152 5: [4194304..8388607]: 104857704..109052007 0 (104857704..109052007) 4194304 6: [8388608..16777215]: 209715304..218103911 0 (209715304..218103911) 8388608 7: [16777216..33554423]: 452984848..469762055 0 (452984848..469762055) 16777208 Because it is hard to take back specualtive preallocation, cases where there are large slow growing log files on a nearly full filesystem may cause premature ENOSPC. Hence as the filesystem nears full, the maximum dynamic prealloc size іs reduced according to this table (based on 4k block size): freespace max prealloc size >5% full extent (8GB) 4-5% 2GB (8GB >> 2) 3-4% 1GB (8GB >> 3) 2-3% 512MB (8GB >> 4) 1-2% 256MB (8GB >> 5) <1% 128MB (8GB >> 6) This should reduce the amount of space held in speculative preallocation for such cases. The allocsize mount option turns off the dynamic behaviour and fixes the prealloc size to whatever the mount option specifies. i.e. the behaviour is unchanged. Signed-off-by: Dave Chinner <dchinner@redhat.com>
* xfs: remove xfs_cred.hChristoph Hellwig2010-10-181-1/+0
| | | | | | | | | We're not actually passing around credentials inside XFS for a while now, so remove all xfs_cred.h with it's cred_t typedef and all instances of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: do not use xfs_mod_incore_sb for per-cpu countersChristoph Hellwig2010-10-181-0/+4
| | | | | | | | | Export xfs_icsb_modify_counters and always use it for modifying the per-cpu counters. Remove support for per-cpu counters from xfs_mod_incore_sb to simplify it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: remove XFS_MOUNT_NO_PERCPU_SBChristoph Hellwig2010-10-181-2/+0
| | | | | | | | | Fail the mount if we can't allocate memory for the per-CPU counters. This is consistent with how we handle everything else in the mount path and makes the superblock counter modification a lot simpler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: split inode AG walking into separate code for reclaimDave Chinner2010-10-181-0/+2
| | | | | | | | | | The reclaim walk requires different locking and has a slightly different walk algorithm, so separate it out so that it can be optimised separately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: remove obsolete osyncisosync mount optionChristoph Hellwig2010-07-261-2/+0
| | | | | | | | | | | Since Linux 2.6.33 the kernel has support for real O_SYNC, which made the osyncisosync option a no-op. Warn the users about this and remove the mount flag for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: drop dmapi hooksChristoph Hellwig2010-07-261-67/+0
| | | | | | | | | | | | | | Dmapi support was never merged upstream, but we still have a lot of hooks bloating XFS for it, all over the fast pathes of the filesystem. This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM support in mainline at least the namespace events can be done much saner in the VFS instead of the individual filesystem, so it's not like this is much help for future work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: convert inode shrinker to per-filesystem contextsDave Chinner2010-07-201-1/+1
| | | | | | | | | | | | Now the shrinker passes us a context, wire up a shrinker context per filesystem. This allows us to remove the global mount list and the locking problems that introduced. It also means that a shrinker call does not need to traverse clean filesystems before finding a filesystem with reclaimable inodes. This significantly reduces scanning overhead when lots of filesystems are present. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: Introduce delayed logging core codeDave Chinner2010-05-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The delayed logging code only changes in-memory structures and as such can be enabled and disabled with a mount option. Add the mount option and emit a warning that this is an experimental feature that should not be used in production yet. We also need infrastructure to track committed items that have not yet been written to the log. This is what the Committed Item List (CIL) is for. The log item also needs to be extended to track the current log vector, the associated memory buffer and it's location in the Commit Item List. Extend the log item and log vector structures to enable this tracking. To maintain the current log format for transactions with delayed logging, we need to introduce a checkpoint transaction and a context for tracking each checkpoint from initiation to transaction completion. This includes adding a log ticket for tracking space log required/used by the context checkpoint. To track all the changes we need an io vector array per log item, rather than a single array for the entire transaction. Using the new log vector structure for this requires two passes - the first to allocate the log vector structures and chain them together, and the second to fill them out. This log vector chain can then be passed to the CIL for formatting, pinning and insertion into the CIL. Formatting of the log vector chain is relatively simple - it's just a loop over the iovecs on each log vector, but it is made slightly more complex because we re-write the iovec after the copy to point back at the memory buffer we just copied into. This code also needs to pin log items. If the log item is not already tracked in this checkpoint context, then it needs to be pinned. Otherwise it is already pinned and we don't need to pin it again. The only other complexity is calculating the amount of new log space the formatting has consumed. This needs to be accounted to the transaction in progress, and the accounting is made more complex becase we need also to steal space from it for log metadata in the checkpoint transaction. Calculate all this at insert time and update all the tickets, counters, etc correctly. Once we've formatted all the log items in the transaction, attach the busy extents to the checkpoint context so the busy extents live until checkpoint completion and can be processed at that point in time. Transactions can then be freed at this point in time. Now we need to issue checkpoints - we are tracking the amount of log space used by the items in the CIL, so we can trigger background checkpoints when the space usage gets to a certain threshold. Otherwise, checkpoints need ot be triggered when a log synchronisation point is reached - a log force event. Because the log write code already handles chained log vectors, writing the transaction is trivial, too. Construct a transaction header, add it to the head of the chain and write it into the log, then issue a commit record write. Then we can release the checkpoint log ticket and attach the context to the log buffer so it can be called during Io completion to complete the checkpoint. We also need to allow for synchronising multiple in-flight checkpoints. This is needed for two things - the first is to ensure that checkpoint commit records appear in the log in the correct sequence order (so they are replayed in the correct order). The second is so that xfs_log_force_lsn() operates correctly and only flushes and/or waits for the specific sequence it was provided with. To do this we need a wait variable and a list tracking the checkpoint commits in progress. We can walk this list and wait for the checkpoints to change state or complete easily, an this provides the necessary synchronisation for correct operation in both cases. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: add a shrinker to background inode reclaimDave Chinner2010-04-291-0/+1
| | | | | | | | | | | | | | On low memory boxes or those with highmem, kernel can OOM before the background reclaims inodes via xfssyncd. Add a shrinker to run inode reclaim so that it inode reclaim is expedited when memory is low. This is more complex than it needs to be because the VM folk don't want a context added to the shrinker infrastructure. Hence we need to add a global list of XFS mount structures so the shrinker can traverse them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* Merge branch 'for-2.6.34-rc1-batch2' into for-linusAlex Elder2010-03-051-0/+2
|\
| * xfs: kill xfs_lrw.hChristoph Hellwig2010-03-011-0/+2
| | | | | | | | | | | | | | | | | | Move the two declarations to better fitting headers now that xfs_lrw.c is gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2010-03-031-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: add __percpu sparse annotations to what's left percpu: add __percpu sparse annotations to fs percpu: add __percpu sparse annotations to core kernel subsystems local_t: Remove leftover local.h this_cpu: Remove pageset_notifier this_cpu: Page allocator conversion percpu, x86: Generic inc / dec percpu instructions local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c module: Use this_cpu_xx to dynamically allocate counters local_t: Remove cpu_local_xx macros percpu: refactor the code in pcpu_[de]populate_chunk() percpu: remove compile warnings caused by __verify_pcpu_ptr() percpu: make accessors check for percpu pointer in sparse percpu: add __percpu for sparse. percpu: make access macros universal percpu: remove per_cpu__ prefix.
| * percpu: add __percpu sparse annotations to fsTejun Heo2010-02-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add __percpu sparse annotations to fs. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
* | xfs: more reserved blocks fixupsEric Sandeen2010-02-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This mangles the reserved blocks counts a little more. 1) add a helper function for the default reserved count 2) add helper functions to save/restore counts on ro/rw 3) save/restore reserved blocks on freeze/thaw 4) disallow changing reserved count while readonly V2: changed field name to match Dave's changes Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>
* | xfs: don't hold onto reserved blocks on remount,roDave Chinner2010-01-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we hold onto reserved blocks when doing a remount,ro we end up writing the blocks used count to disk that includes the reserved blocks. Reserved blocks are not actually used, so this results in the values in the superblock being incorrect. Hence if we run xfs_check or xfs_repair -n while the filesystem is mounted remount,ro we end up with an inconsistent filesystem being reported. Also, running xfs_copy on the remount,ro filesystem will result in an inconsistent image being generated. To fix this, unreserve the blocks when doing the remount,ro, and reserved them again on remount,rw. This way a remount,ro filesystem will appear consistent on disk to all utilities. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | xfs: convert DM ops to use unsigned char namesDave Chinner2010-01-201-1/+2
| | | | | | | | | | | | | | | | | | dmops uses a signed char for it's namespace event. To be consistent with the rest of the code, convert them to unsigned char for the namespace string. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | xfs: Add trace points for per-ag refcount debugging.Dave Chinner2010-01-151-23/+2
| | | | | | | | | | | | | | | | | | Uninline xfs_perag_{get,put} so that tracepoints can be inserted into them to speed debugging of reference count problems. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* | xfs: Reference count per-ag structuresDave Chinner2010-01-151-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | Reference count the per-ag structures to ensure that we keep get/put pairs balanced. Assert that the reference counts are zero at unmount time to catch leaks. In future, reference counts will enable us to safely remove perag structures by allowing us to detect when they are no longer in use. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* | xfs: Replace per-ag array with a radix treeDave Chinner2010-01-151-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of an array for the per-ag structures requires reallocation of the array when growing the filesystem. This requires locking access to the array to avoid use after free situations, and the locking is difficult to get right. To avoid needing to reallocate an array, change the per-ag structures to an allocated object per ag and index them using a tree structure. The AGs are always densely indexed (hence the use of an array), but the number supported is 2^32 and lookups tend to be random and hence indexing needs to scale. A simple choice is a radix tree - it works well with this sort of index. This change also removes another large contiguous allocation from the mount/growfs path in XFS. The growing process now needs to change to only initialise the new AGs required for the extra space, and as such only needs to exclusively lock the tree for inserts. The rest of the code only needs to lock the tree while doing lookups, and hence this will remove all the deadlocks that currently occur on the m_perag_lock as it is now an innermost lock. The lock is also changed to a spinlock from a read/write lock as the hold time is now extremely short. To complete the picture, the per-ag structures will need to be reference counted to ensure that we don't free/modify them while they are still in use. This will be done in subsequent patch. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* | xfs: rename xfs_get_peragDave Chinner2010-01-151-4/+4
|/ | | | | | | | | | | | xfs_get_perag is really getting the perag that an inode belongs to based on it's inode number. Convert the use of this function to just get the perag from a provided ag number. Use this new function to obtain the per-ag structure when traversing the per AG inode trees for sync and reclaim. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: cleanup dmapi macros in the umount pathChristoph Hellwig2009-12-111-4/+19
| | | | | | | | | | Stop the flag saving as we never mangle those in the unmount path, and hide all the weird arguents to the dmapi code inside the XFS_SEND_PREUNMOUNT / XFS_SEND_UNMOUNT macros. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: kill the STATIC_INLINE macroChristoph Hellwig2009-12-111-2/+2
| | | | | | | | | | | Remove our own STATIC_INLINE macro. For small function inside implementation files just use STATIC and let gcc inline it, and for those in headers do the normal static inline - they are all small enough to be inlined for debug builds, too. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: add more statics & drop some unused functionsEric Sandeen2009-08-311-3/+0
| | | | | | | | | | A lot more functions could be made static, but they need forward declarations; this does some easy ones, and also found a few unused functions in the process. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>
* xfs: kill xfs_qmopsChristoph Hellwig2009-06-081-82/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kill the quota ops function vector and replace it with direct calls or stubs in the CONFIG_XFS_QUOTA=n case. Make sure we check XFS_IS_QUOTA_RUNNING in the right spots. We can remove the number of those checks because the XFS_TRANS_DQ_DIRTY flag can't be set otherwise. This brings us back closer to the way this code worked in IRIX and earlier Linux versions, but we keep a lot of the more useful factoring of common code. Eventually we should also kill xfs_qm_bhv.c, but that's left for a later patch. Reduces the size of the source code by about 250 lines and the size of XFS module by about 1.5 kilobytes with quotas enabled: text data bss dec hex filename 615957 2960 3848 622765 980ad fs/xfs/xfs.o 617231 3152 3848 624231 98667 fs/xfs/xfs.o.old Fallout: - xfs_qm_dqattach is split into xfs_qm_dqattach_locked which expects the inode locked and xfs_qm_dqattach which does the locking around it, thus removing XFS_QMOPT_ILOCKED. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
* xfs: use xfs_sync_inodes() for device flushingDave Chinner2009-04-061-1/+1
| | | | | | | | | | | | | | | | Currently xfs_device_flush calls sync_blockdev() which is a no-op for XFS as all it's metadata is held in a different address to the one sync_blockdev() works on. Call xfs_sync_inodes() instead to flush all the delayed allocation blocks out. To do this as efficiently as possible, do it via two passes - one to do an async flush of all the dirty blocks and a second to wait for all the IO to complete. This requires some modification to the xfs-sync_inodes_ag() flush code to do efficiently. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: remove m_attroffsetChristoph Hellwig2009-03-291-1/+0
| | | | | | | | | | | | | With the upcoming v3 inodes the default attroffset needs to be calculated for each specific inode, so we can't cache it in the superblock anymore. Also replace the assert for wrong inode sizes with a proper error check also included in non-debug builds. Note that the ENOSYS return for that might seem odd, but that error is returned by xfs_mount_validate_sb for all theoretically valid but not supported filesystem geometries. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
* xfs: fix various typosMalcolm Parsons2009-03-291-2/+2
| | | | | Signed-off-by: Malcolm Parsons <malcolm.parsons@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: remove m_litinoChristoph Hellwig2009-03-291-1/+0
| | | | | | | | | | With the upcoming v3 inodes the inode data/attr area size needs to be calculated for each specific inode, so we can't cache it in the superblock anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com>
* xfs: kill ino64 mount optionChristoph Hellwig2009-03-291-4/+0
| | | | | | | | | | | The ino64 mount option adds a fixed offset to 32bit inode numbers to bring them into the 64bit range. There's no need for this kind of debug tool given that it's easy to produce real 64bit inode numbers for testing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com>
* xfs: get rid of indirections in the quotaops implementationChristoph Hellwig2009-02-091-4/+0
| | | | | | | | | Currently we call from the nicely abstracted linux quotaops into a ugly multiplexer just to split the calls out at the same boundary again. Rewrite the quota ops handling to remove that obfucation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
* xfs: remove superflous inobt macrosChristoph Hellwig2009-02-091-1/+1
| | | | | | | | | xfs_ialloc_btree.h has a a cuple of macros that only obsfucate the code but don't provide any abstraction benefits. This patches removes those and cleans up the reamaining defintions up a little. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
* xfs: remove unused XFS_MOUNT_ILOCK/XFS_MOUNT_IUNLOCKChristoph Hellwig2009-02-041-3/+0
| | | | | | | These aren't only unused but also reference a lock that doesn't exist anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>
* xfs: fix bad_features2 fixups for the root filesystemChristoph Hellwig2009-01-191-0/+3
| | | | | | | | | | | | | Currently the bad_features2 fixup and the alignment updates in the superblock are skipped if we mount a filesystem read-only. But for the root filesystem the typical case is to mount read-only first and only later remount writeable so we'll never perform this update at all. It's not a big problem but means the logs of people needing the fixup get spammed at every boot because they never happen on disk. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
* [XFS] Remove the rest of the macro-to-function indirections.Eric Sandeen2009-01-161-4/+2
| | | | | | | | Remove the last of the macros-defined-to-static-functions. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
* [XFS] resync headers with libxfsChristoph Hellwig2008-12-111-2/+2
| | | | | | | | | | | | | - xfs_sb.h add the XFS_SB_VERSION2_PARENTBIT features2 that has been around in userspace for some time - xfs_inode.h: move a few things out of __KERNEL__ that are needed by userspace - xfs_mount.h: only include xfs_sync.h under __KERNEL__ - xfs_inode.c: minor whitespace fixup. I accidentaly changes this when importing this file for use by userspace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
* kill xfs_unmount_flushChristoph Hellwig2008-12-041-1/+0
| | | | | | | | | | | | There's almost nothing left in this function, instead remove the IRELE on the real times inodes and the call to XFS_QM_UNMOUNT into xfs_unmountfs. For the regular unmount case that means it now also happenes after dmapi notification, but otherwise there is no difference in behaviour. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>
* no explicit xfs_iflush for special inodes during unmountChristoph Hellwig2008-12-041-1/+1
| | | | | | | | | | | Currently we explicitly call xfs_iflush on the quota, real-time and root inodes from xfs_unmount_flush. But we just called xfs_sync_inodes with SYNC_ATTR and do an XFS_bflush aka xfs_flush_buftarg to make sure all inodes are on disk already, so there is no need for these special cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>
* remove leftovers of shared read-only supportChristoph Hellwig2008-12-041-2/+0
| | | | | | | | | We never supported shared read-only filesystems, so remove the dead code left over from IRIX for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>