Commit message | Author | Age | Files | Lines
* Btrfs: do not resize a seeding deviceLiu Bo2012-06-151-0/+7
| | | | | | | Seeding devices are not supposed to change any more. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: fix missing inherited flag in renameLiu Bo2012-06-151-3/+6
| | | | | | | | | | | | | When we move a file into a directory with the compression flag set, we need to inherit BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well. But if we move a file into a directory without the compression flag, we need to clear both of them. This is how our setflags handles the compression flag, so keep the same behaviour here. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
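The rule described in this message can be sketched in userspace C as follows. This is only an illustration of the two cases, not the patch itself: the `SKETCH_` flag values and the helper name are assumptions made for the example, and the real flag definitions live in the btrfs headers.

```c
#include <stdint.h>

/* Illustrative stand-ins (assumed values), not the kernel definitions. */
#define SKETCH_INODE_NOCOMPRESS (1u << 3)
#define SKETCH_INODE_COMPRESS   (1u << 11)

/*
 * Hypothetical helper: return the flags a moved inode should carry,
 * given the flags of the directory it is moved into.
 */
static uint32_t sketch_inherit_compress_flags(uint32_t inode_flags,
                                              uint32_t dir_flags)
{
    if (dir_flags & SKETCH_INODE_COMPRESS) {
        /* Target directory compresses: inherit COMPRESS, drop NOCOMPRESS. */
        inode_flags |= SKETCH_INODE_COMPRESS;
        inode_flags &= ~SKETCH_INODE_NOCOMPRESS;
    } else {
        /* Target directory does not compress: clear both flags. */
        inode_flags &= ~(SKETCH_INODE_COMPRESS | SKETCH_INODE_NOCOMPRESS);
    }
    return inode_flags;
}
```

Unrelated flag bits pass through unchanged in both branches.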
* Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linusChris Mason2012-06-153-35/+70
|\
| * Btrfs: fix race in tree mod log additionJan Schmidt2012-06-141-4/+19
| | | | | | | | | | | | | | | | | | When adding to the tree modification log, we grab two locks at different stages. We must not drop the outer lock until we're done with the section protected by the inner lock. This moves the unlock call for the outer lock to the appropriate position. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: add btrfs_next_old_leafJan Schmidt2012-06-143-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | To make sense of the tree mod log, the backref walker not only needs btrfs_search_old_slot, but it also called btrfs_next_leaf, which in turn was calling btrfs_search_slot. This obviously didn't give the correct result. This commit adds btrfs_next_old_leaf, a drop-in replacement for btrfs_next_leaf with a time_seq parameter. If it is zero, it behaves exactly like btrfs_next_leaf. If it is non-zero, it will use btrfs_search_old_slot with this time_seq parameter. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
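A minimal sketch of the "drop-in replacement with a time_seq parameter" idea follows. All `sketch_` names are hypothetical stand-ins for the functions named in the message, and the stubs only record which search would have been used. Whether the original function is kept as a thin wrapper is an assumption of this sketch.

```c
#include <stdint.h>

enum sketch_search { SKETCH_CURRENT_TREE, SKETCH_OLD_TREE };

/* Hypothetical stand-in for a path; it only remembers the last search. */
struct sketch_path { int last_search; };

/* Stub for a search on the current tree. */
static int sketch_search_slot(struct sketch_path *p)
{
    p->last_search = SKETCH_CURRENT_TREE;
    return 0;
}

/* Stub for a search on the old tree version identified by time_seq. */
static int sketch_search_old_slot(struct sketch_path *p, uint64_t time_seq)
{
    (void)time_seq;
    p->last_search = SKETCH_OLD_TREE;
    return 0;
}

/* time_seq == 0 means "behave exactly like the plain variant". */
static int sketch_next_old_leaf(struct sketch_path *p, uint64_t time_seq)
{
    if (time_seq)
        return sketch_search_old_slot(p, time_seq);
    return sketch_search_slot(p);
}

/* The plain variant can then be expressed through the new one. */
static int sketch_next_leaf(struct sketch_path *p)
{
    return sketch_next_old_leaf(p, 0);
}
```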
| * Btrfs: fix return value for __tree_mod_log_oldest_rootJan Schmidt2012-06-141-13/+20
| | | | | | | | | | | | | | | | | | | | In __tree_mod_log_oldest_root() we must return the found operation even if it's not a ROOT_REPLACE operation. Otherwise, the caller assumes that there are no operations to be rewound and returns immediately. The code in the caller is modified to improve readability. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: use btrfs_read_lock_root_node in get_old_rootJan Schmidt2012-06-141-4/+16
| | | | | | | | | | | | | | | | | | get_old_root could race with root node updates because we weren't locking the node early enough. Use btrfs_read_lock_root_node to grab the root locked in the very beginning and release the lock as soon as possible (just like btrfs_search_slot does). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: remove obsolete btrfs_next_leaf call from __resolve_indirect_refJan Schmidt2012-06-141-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | When resolving indirect refs, we used to call btrfs_next_leaf in case we didn't find an exact match. While we should find exact matches most of the time, in case we don't, we must continue searching. Treating those matches differently depending on the level we're searching doesn't make sense. Even worse, we might end up searching for a key larger than the largest, in which case there is no next_leaf and subsequent jobs would fail. This commit drops the bogus lines. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: remove call to btrfs_header_nritems with no effectJan Schmidt2012-06-041-3/+0
| | | | | | | | | | | | | | | | This is a leftover from cleanup patch 559af821. Before the cleanup, btrfs_header_nritems was called inside an if condition. As it has no side effects we need to preserve here, it should simply be dropped. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* | Btrfs: fix incompat flags settingLi Zefan2012-06-151-1/+1
| | | | | | | | | | | | | | It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which has only one bit set. Signed-off-by: Li Zefan <lizefan@huawei.com>
* | Btrfs: fix defrag regressionLi Zefan2012-06-151-48/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a file has 3 small extents: | ext1 | ext2 | ext3 | Running "btrfs fi defrag" will only defrag the last two extents if those extent mappings haven't been read into memory from disk. This bug was introduced by commit 17ce6ef8d731af5edac8c39e806db4c7e1f6956f ("Btrfs: add a check to decide if we should defrag the range"). The cause is that that commit looked into the previous and next extents using lookup_extent_mapping() only. While at it, remove the code that checks the previous extent, since it's sufficient to check the next extent. Signed-off-by: Li Zefan <lizefan@huawei.com>
* | Btrfs: call filemap_fdatawrite twice for compressionJosef Bacik2012-06-153-7/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I removed this in an earlier commit and I was wrong. Because compression can return from filemap_fdatawrite() without having actually set any of its pages as writeback(), it can make filemap_fdatawait() do essentially nothing, and then we won't find any ordered extents because they may not have been created yet. So not only does this make fsync() completely useless, but it will also screw up if you truncate on a non-page aligned offset since we zero out the end and then wait on ordered extents and then call drop caches. We can drop the cache before the io completes, and then when we try to unpin the extent we just wrote we won't find it and everything goes sideways. So fix this by putting it back and put a giant comment there to keep me from trying to remove it in the future. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: keep inode pinned when compressing writesJosef Bacik2012-06-151-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user reported lots of problems using compression on the new code and it turns out part of the problem was that igrab() was failing when we added a new ordered extent. This is because when writing out an inode under compression we immediately return without actually doing anything to the pages, and then in another thread at some point down the line actually do the ordered dance. The problem is between the point that we start writeback and we actually add the ordered extent we could be trying to reclaim the inode, which makes igrab() return NULL. So we need to do an igrab() when we create the async extent and then drop it when we are done with it. This makes sure we stay pinned in memory until the ordered extent can get a reference on it and we are good to go. With this patch we no longer panic in btrfs_finish_ordered_io(). Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: implement ->show_devnameJosef Bacik2012-06-151-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because btrfs can remove the device that was mounted we need to have a ->show_devname so that in this case we can print out some other device in the file system to /proc/mounts. So if there are multiple devices in a btrfs file system we will just print the device with the lowest devid that we can find. This will make everything consistent and deal with device removal properly. The drawback is if you mount with a device that is higher than the lowest devid it won't show up as the mounted device in /proc/mounts, but this is a small price to pay. This was inspired by Miao Xie's patch. Thanks, Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <josef@redhat.com>
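The selection rule from this message (print the device with the lowest devid) can be sketched as a plain scan over a device array. The struct and function names are made up for the example. Locking and the handling of missing devices are not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device record for illustration only. */
struct sketch_device {
    uint64_t devid;
    const char *name;
};

/* Return the device with the lowest devid; the array must be non-empty. */
static const struct sketch_device *
sketch_lowest_devid(const struct sketch_device *devs, size_t n)
{
    assert(n > 0);
    const struct sketch_device *best = &devs[0];

    for (size_t i = 1; i < n; i++) {
        if (devs[i].devid < best->devid)
            best = &devs[i];
    }
    return best;
}
```

With devices of devid 3, 1 and 2, the sketch picks the devid 1 entry regardless of which device was named at mount time, which is the drawback the message mentions.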
* | Btrfs: use rcu to protect device->nameJosef Bacik2012-06-158-64/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Al pointed out that we can just toss out the old name on a device and add a new one arbitrarily, so anybody who uses device->name in printk could possibly use free'd memory. Instead of adding locking around all of this he suggested doing it with RCU, so I've introduced a struct rcu_string that does just that and have gone through and protected all accesses to device->name that aren't under the uuid_mutex with rcu_read_lock(). This protects us and I will use it for dealing with removing the device that we used to mount the file system in a later patch. Thanks, Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: unlock everything properly in the error case for nocowJosef Bacik2012-06-151-2/+35
| | | | | | | | | | | | | | | | | | I was getting hung on umount when a transaction was aborted because a range of one of the free space inodes was still locked. This is because the nocow stuff doesn't unlock anything on error. This fixed the problem and I verified that is what was happening. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: fix btrfs_destroy_marked_extentsJosef Bacik2012-06-151-4/+2
| | | | | | | | | | | | | | | | | | | | | | So we're forcing the eb's to have their ref count set to 1 so invalidatepage works but this breaks lots of things, for example root nodes, and is just plain wrong, we don't need to just evict all of this stuff. Also drop the invalidatepage altogether and add a page_cache_release(). With this patch we no longer hang when trying to access the root nodes after an aborted transaction and we no longer leak memory. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: abort the transaction if the commit failsJosef Bacik2012-06-151-2/+8
| | | | | | | | | | | | | | | | | | If a transaction commit fails we don't abort it so we don't set an error on the file system. This patch fixes that by actually calling the abort stuff and then adding a check for a fs error in the transaction start stuff to make sure it is caught properly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: wake up transaction waiters when aborting a transactionJosef Bacik2012-06-152-6/+7
| | | | | | | | | | | | | | | | | | | | | | I was getting lots of hung tasks and a NULL pointer dereference because we are not cleaning up the transaction properly when it aborts. First we need to reset the running_transaction to NULL so we don't get a bad dereference for any start_transaction callers after this. Also we cannot rely on waitqueue_active() since it's just a list_empty(), so just call wake_up() directly since that will do the barrier for us and such. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: fix locking in btrfs_destroy_delayed_refsJosef Bacik2012-06-151-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | The transaction abort stuff was throwing warnings from the list debugging code because we do a list_del_init outside of the delayed_refs spin lock. The delayed refs locking makes baby Jesus cry so it's not hard to get wrong, but we need to take the ref head mutex to make sure it's not being processed currently, and so if it is we need to drop the spin lock and then take and drop the mutex and do the search again. If we can take the mutex then we can safely remove the head from the list and carry on. Now when the transaction aborts I don't get the list debugging warnings. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an errorJosef Bacik2012-06-151-2/+2
|/ | | | | | | | | | While doing my enospc work I got a transaction abort that resulted in a panic when we tried to unlock_page() an already unlocked page. This is because we aren't calling extent_clear_unlock_delalloc with the locked page so it was unlocking all the pages in the range. This is wrong since __extent_writepage expects to have the page locked still unless we return *page_started as 1. This should keep us from panicking. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linusChris Mason2012-05-3114-252/+1368
|\ | | | | | | | | | | | | | | | | Conflicts: fs/btrfs/ulist.h Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix tree mod log rewinded level and rewinding of moved keysJan Schmidt2012-05-311-2/+4
| | | | | | | | | | | | | | | | | | | | | | When we rewind REMOVE_WHILE_FREEING operations, there's code that allocates a fresh buffer instead of cloning the old one. Setting that buffer's level correctly was missing in this case. When rewinding a MOVE_KEYS operation, btrfs_node_key_ptr_offset(slot) was missing for memmove_extent_buffer()'s arguments. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: fix tree mod log del_ptrJan Schmidt2012-05-311-6/+7
| | | | | | | | | | | | | | Logging for del_ptr when we're not deleting the last pointer was wrong. This fixes both, duplicate log entries and log sequence. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: add tree_mod_dont_log helperJan Schmidt2012-05-311-9/+15
| | | | | | | | | | | | Replace duplicate code by small inline helper function. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: add missing spin_lock for insertion into tree mod logJan Schmidt2012-05-311-5/+18
| | | | | | | | | | | | | | tree_mod_alloc calls __get_tree_mod_seq and must acquire a spinlock before doing so. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: add inodes before dropping the extent lock in find_all_leafsJan Schmidt2012-05-313-6/+43
| | | | | | | | | | | | | | | | | | | | We must build up the inode list with the extent lock held after following indirect refs. This also requires an extension to ulists, which allows modifying the stored aux value in case a key already exists in the list. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: use delayed ref sequence numbers for all fs-tree updatesJan Schmidt2012-05-303-23/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The sequence number for delayed refs is needed to postpone certain delayed refs for a very short period while walking backrefs. Before the tree modification log, we thought we'd only have to hold back those references that don't have a counter operation. Now that we have the tree mod log, we're rewinding fs tree blocks to a defined consistent state. We cannot know in advance for which tree block we'll be doing rewind operations later. Therefore, we must postpone all the delayed refs for fs-tree blocks, even those having a counter operation. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: tree mod log sanity checks in join_transactionJan Schmidt2012-05-301-0/+18
| | | | | | | | | | | | | | | | | | | | | | When a fresh transaction begins, the tree mod log must be clean. Users of the tree modification log must ensure they never span across transaction boundaries. We reset the sequence to 0 in this safe situation to make absolutely sure overflow can't happen. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: fs_info variable for join_transactionJan Schmidt2012-05-301-18/+19
| | | | | | | | Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: use the tree modification log for backref resolvingJan Schmidt2012-05-302-17/+29
| | | | | | | | | | | | | | | | This enables backref resolving on live trees while they are changing. This is a prerequisite for quota groups and just nice to have for everything else. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: add btrfs_search_old_slotJan Schmidt2012-05-302-4/+317
| | | | | | | | | | | | | | | | | | The tree modification log together with the current state of the tree gives a consistent, old version of the tree. btrfs_search_old_slot is used to search through this old version and return old (dummy!) extent buffers. Naturally, this function cannot do any tree modifications. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: add del_ptr and insert_ptr modifications to the tree mod logJan Schmidt2012-05-301-10/+32
| | | | | | | | | | | | | | Record all relevant modifications to block pointers in the tree mod log so that we can rewind them later on for backref walking. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: put all block modifications into the tree mod logJan Schmidt2012-05-301-0/+36
| | | | | | | | | | | | | | | | | | When running functions that can make changes to the internal trees (e.g. btrfs_search_slot), we check if somebody may be interested in the block we're currently modifying. If so, we record our modification to be able to rewind it later on. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: add tree modification log functionsJan Schmidt2012-05-302-1/+412
| | | | | | | | | | | | | | | | | | | | | | | | The tree mod log will log modifications made to fs-tree nodes. Most modifications are done by autobalance of the tree. Such changes are recorded as long as a block entry exists. When released, the log is cleaned. With the tree modification log, it's possible to reconstruct a consistent old state of the tree. This is required to do backref walking on a busy file system. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: add tree mod log to fs_infoJan Schmidt2012-05-262-0/+14
| | | | | | | | Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: dummy extent buffers for tree mod logJan Schmidt2012-05-262-7/+76
| | | | | | | | | | | | | | | | The tree modification log needs two ways to create dummy extent buffers: once by allocating a fresh one (to rebuild an old root) and once by cloning an existing one (to make private rewind modifications to it). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: move struct seq_list to ctree.hJan Schmidt2012-05-262-5/+7
| | | | | | | | Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: don't set for_cow parameter for tree block functionsJan Schmidt2012-05-265-20/+20
| | | | | | | | | | | | | | | | | | | | | | | | Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed parameter for_cow = 1. In fact, these two functions should never mark their tree modification operations as for_cow, because they can change the number of blocks referenced by a tree. Hence, we remove the extra for_cow parameter from these functions and make them pass a zero down. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: look into the extent during find_all_leafsJan Schmidt2012-05-262-84/+158
| | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch we called find_all_leafs for a data extent, then called find_all_roots and then looked into the extent to grab the information we were seeking. This was done without holding the leaves locked to avoid deadlocks. However, this can obviously race with concurrent tree modifications. Instead, we now look into the extent while we're holding the lock during find_all_leafs and store this information together with the leaf list. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: bugfix: ignore the wrong key for indirect tree block backrefsJan Schmidt2012-05-261-50/+135
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The key we store with a tree block backref is only a hint. It is set when the ref is created and can remain correct for a long time. As the tree is rebalanced, however, eventually the key no longer points to the correct destination. With this patch, we change find_parent_nodes to no longer add keys unless it knows for sure they're correct (e.g. because they're for an extent data backref). Then when we later encounter a backref with no parent and no key set, we grab the block and take the first key from the block itself. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: bugfix in btrfs_find_parent_nodesJan Schmidt2012-05-261-2/+3
| | | | | | | | | | | | | | | | | | That one has been around since the addition of backref.c. Due to the way we calculate our slot numbers, after adding inline refs we're missing one keyed ref unless it's located at the beginning of a new leaf. Reported-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: ulist realloc bugfixJan Schmidt2012-05-263-21/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ulist_next gets the pointer to the previously returned element to find the next element from there. However, when we call ulist_add while iteration with ulist_next is in progress (ulist explicitly supports this), we can realloc the ulist internal memory, which makes the pointer to the previous element useless. Instead, we now use an iterator parameter that's independent from the internal pointers. Reported-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
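The hazard and the fix described here can be shown with a small growable array in userspace C. This is a simplified sketch, not the ulist code: the `sketch_` names are hypothetical, and the sketch does not deduplicate values. The point is that the iterator stores an index rather than a pointer into the storage, so a realloc triggered by an add during iteration cannot leave it dangling.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable list; storage may move on every growth. */
struct sketch_list {
    uint64_t *vals;
    size_t len;
    size_t cap;
};

/* Iterator state is an index, independent of the storage address. */
struct sketch_iter {
    size_t i;
};

/* Append a value, growing the storage with realloc when it is full. */
static int sketch_add(struct sketch_list *l, uint64_t v)
{
    if (l->len == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 2;
        uint64_t *n = realloc(l->vals, ncap * sizeof(*n));

        if (!n)
            return -1;
        l->vals = n;
        l->cap = ncap;
    }
    l->vals[l->len++] = v;
    return 0;
}

/* Return the next element, or NULL when the iteration is finished. */
static uint64_t *sketch_next(struct sketch_list *l, struct sketch_iter *it)
{
    if (it->i >= l->len)
        return NULL;
    return &l->vals[it->i++];
}
```

The returned element pointer is only valid until the next add. The iterator itself stays valid across adds because the element address is recomputed from the index on each call.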
* | Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into HEADChris Mason2012-05-3029-616/+1483
|\ \
| * | Btrfs: fix false positive in check-integrity on unmountStefan Behrens2012-05-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | During unmount, it could happen that the integrity checker printed a warning message "attempt to free ... on umount which is not yet iodone" which turned out to be a false positive. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
| * | Btrfs: fix runtime warning in check-integrity check data modeStefan Behrens2012-05-301-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | If a file_extent_item was located at the very end of a leaf and there was not enough space to hold a full item, but there was enough space to hold one of type BTRFS_FILE_EXTENT_INLINE or PREALLOC, and it was only such a short item, a warning was printed anyway. This check is now fixed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
| * | Btrfs: set ioprio of scrub readahead to idleStefan Behrens2012-05-302-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | Reduce ioprio class of scrub readahead threads to idle priority. This setting is fixed. This priority has shown the best performance during all measurements. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
| * | Btrfs: fix return code in drop_objectid_itemsJosef Bacik2012-05-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So dpkg fsync()'s the file and the directory containing the file whenever it writes to a file, which is really slow in btrfs. This is partly because fsync()'ing a directory _always_ committed the transaction instead of just going to the tree log. This is because drop_objectid_items() would return 1 since it does a btrfs_search_slot() which returns 1. In tree-log jargon this means that we have to commit the transaction to be safe. So just check if ret is greater than 0 and set it to 0 if it is. With this patch we now use the tree-log instead of committing the entire transaction, which is twice as fast on my box. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
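The return-code fix amounts to not letting a positive "no exact match" result from the search leak out to a caller that reads any non-zero value specially. A hedged sketch, with a hypothetical function name and the search result passed in directly instead of calling a real search:

```c
/*
 * Hypothetical sketch: search_ret stands in for the value returned by
 * the slot search (negative = error, 0 = exact match, positive = no
 * exact match). Only errors should propagate to the caller.
 */
static int sketch_drop_items(int search_ret)
{
    int ret = search_ret;

    /* ... the items would be dropped here ... */

    if (ret > 0)
        ret = 0; /* "not found" is not a reason to force a commit */
    return ret;
}
```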
| * | Btrfs: check to see if the inode is in the log before fsyncingJosef Bacik2012-05-303-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have this check down in the actual logging code, but this is after we start a transaction and all that good stuff. So move the helper inode_in_log() out so we can call it in fsync() and avoid starting a transaction altogether and just exit if we've already fsync()'ed this file recently. You would notice this issue if you fsync()'ed a file over and over again until the transaction committed. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| * | Btrfs: return value of btrfs_read_buffer is checked correctlyTsutomu Itoh2012-05-302-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | btrfs_read_buffer() can return an error. Therefore, I add code that checks the return value of btrfs_read_buffer(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>