summaryrefslogtreecommitdiffstats
path: root/fs/btrfs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRsDan Carpenter2010-06-111-0/+4
| | | | | | | | | btrfs_read_fs_root_no_name() returns ERR_PTRs on error so I added a check for that. It's not clear to me if it can also return NULL pointers or not so I left the original NULL pointer check as is. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: unwind after btrfs_start_transaction() errorsDan Carpenter2010-06-111-1/+1
| | | | | | | | | This was added by a22285a6a3: "Btrfs: Integrate metadata reservation with start_transaction". If we goto out here then we skip all the unwinding and there are locks still held etc. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: btrfs_iget() returns ERR_PTRDan Carpenter2010-06-111-2/+2
| | | | | | | btrfs_iget() returns an ERR_PTR() on failure and not null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: handle kzalloc() failure in open_ctree()Dan Carpenter2010-06-111-2/+5
| | | | | | | Unwind and return -ENOMEM if the allocation fails here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: handle error returns from btrfs_lookup_dir_item()Dan Carpenter2010-06-111-0/+2
| | | | | | | | If btrfs_lookup_dir_item() fails, we should can just let the mount fail with an error. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix BUG_ON for fs converted from extNYan, Zheng2010-06-111-1/+2
| | | | | | | | Tree blocks can live in data block groups in FS converted from extN. So it's easy to trigger the BUG_ON. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix null dereference in relocation.cYan, Zheng2010-06-111-3/+4
| | | | | | | | Fix a potential null dereference in relocation.c Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Acked-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix remap_file_pages errorMiao Xie2010-06-111-1/+8
| | | | | | | | when we use remap_file_pages() to remap a file, remap_file_pages always return error. It is because btrfs didn't set VM_CAN_NONLINEAR for vma. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: uninitialized data is check_path_shared()Dan Carpenter2010-06-111-1/+1
| | | | | | | | | | refs can be used with uninitialized data if btrfs_lookup_extent_info() fails on the first pass through the loop. In the original code if that happens then check_path_shared() probably returns 1, this patch changes it to return 1 for safety. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix fallocate regressionJosef Bacik2010-06-111-1/+1
| | | | | | | | | Seems that when btrfs_fallocate was converted to use the new ENOSPC stuff we dropped passing the mode to the function that actually does the preallocation. This breaks anybody who wants to use FALLOC_FL_KEEP_SIZE. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix loop device on top of btrfsMiao Xie2010-06-111-0/+1
| | | | | | | | | | | | | | | | | We cannot use the loop device which has been connected to a file in the btrf The reproduce steps is following: # dd if=/dev/zero of=vdev0 bs=1M count=1024 # losetup /dev/loop0 vdev0 # mkfs.btrfs /dev/loop0 ... failed to zero device start -5 The reason is that the btrfs don't implement either ->write_begin or ->write the VFS API, so we fix it by setting ->write to do_sync_write(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: add more error checking to btrfs_dirty_inodeChris Mason2010-05-271-2/+13
| | | | | | | The ENOSPC code will now return ENOSPC to btrfs_start_transaction. btrfs_dirty_inode needs to check for this and error out appropriately. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: allow unaligned DIOChris Mason2010-05-271-3/+35
| | | | | | | In order to support DIO that isn't aligned to the filesystem blocksize, we fall back to buffered for any unaligned DIOs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: drop verbose enospc printkChris Mason2010-05-271-0/+2
| | | | | | Less printk is good printk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix block generation verification raceYan, Zheng2010-05-271-1/+1
| | | | | | | | | | After the path is released, the generation number got from block pointer is no long valid. The race may cause disk corruption, because verify_parent_transid() calls clear_extent_buffer_uptodate() when generation numbers mismatch. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix preallocation and nodatacow checks in O_DIRECTChris Mason2010-05-271-16/+140
| | | | | | | | | | | | The O_DIRECT code wasn't checking for multiple references on preallocated or nodatacow extents. This means it wasn't honoring snapshots properly. The fix here is to add an explicit check for multiple references This also fixes the math for selecting the correct disk block, making sure not to go past the end of the extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: avoid ENOSPC errors in btrfs_dirty_inodeChris Mason2010-05-261-3/+11
| | | | | | | | | | | | | btrfs_dirty_inode tries to sneak in without much waiting or space reservation, mostly for performance reasons. This usually works well but can cause problems when there are many many writers. When btrfs_update_inode fails with ENOSPC, we fallback to a slower btrfs_start_transaction call that will reserve some space. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: move O_DIRECT space reservation to btrfs_direct_IOChris Mason2010-05-262-6/+8
| | | | | | | | | This moves the delalloc space reservation done for O_DIRECT into btrfs_direct_IO. This way we don't leak reserved space if the generic O_DIRECT write code errors out before it calls into btrfs_direct_IO. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: rework O_DIRECT enospc handlingChris Mason2010-05-264-30/+49
| | | | | | | | | | | | | | | | | | | | | | | | | This changes O_DIRECT write code to mark extents as delalloc while it is processing them. Yan Zheng has reworked the enospc accounting based on tracking delalloc extents and this makes it much easier to track enospc in the O_DIRECT code. There are a few space cases with the O_DIRECT code though, it only sets the EXTENT_DELALLOC bits, instead of doing EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because we don't want to mess with clearing the dirty and uptodate bits when things go wrong. This is important because there are no pages in the page cache, so any extent state structs that we put in the tree won't get freed by releasepage. We have to clear them ourselves as the DIO ends. With this commit, we reserve space at in btrfs_file_aio_write, and then as each btrfs_direct_IO call progresses it sets EXTENT_DELALLOC on the range. btrfs_get_blocks_direct is responsible for clearing the delalloc at the same time it drops the extent lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: use async helpers for DIO write checksummingChris Mason2010-05-255-17/+52
| | | | | | | | | | | The async helper threads offload crc work onto all the CPUs, and make streaming writes much faster. This changes the O_DIRECT write code to use them. The only small complication was that we need to pass in the logical offset in the file for each bio, because we can't find it in the bio's pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: don't walk around with task->state != TASK_RUNNINGChris Mason2010-05-252-3/+4
| | | | | | | | Yan Zheng noticed two places we were doing a lot of work without task->state set to TASK_RUNNING. This sets the state properly after we get ready to sleep but decide not to. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: do aio_write instead of writeJosef Bacik2010-05-252-83/+104
| | | | | | | | | In order for AIO to work, we need to implement aio_write. This patch converts our btrfs_file_write to btrfs_aio_write. I've tested this with xfstests and nothing broke, and the AIO stuff magically started working. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: add basic DIO read/write supportJosef Bacik2010-05-256-36/+631
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This provides basic DIO support for reading and writing. It does not do the work to recover from mismatching checksums, that will come later. A few design changes have been made from Jim's code (sorry Jim!) 1) Use the generic direct-io code. Jim originally re-wrote all the generic DIO code in order to account for all of BTRFS's oddities, but thanks to that work it seems like the best bet is to just ignore compression and such and just opt to fallback on buffered IO. 2) Fallback on buffered IO for compressed or inline extents. Jim's code did it's own buffering to make dio with compressed extents work. Now we just fallback onto normal buffered IO. 3) Use ordered extents for the writes so that all of the lock_extent() lookup_ordered() type checks continue to work. 4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with DIO writes. I've tested this with fsx and everything works great. This patch depends on my dio and filemap.c patches to work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Metadata ENOSPC handling for balanceYan, Zheng2010-05-255-747/+1163
| | | | | | | | | | | | | | This patch adds metadata ENOSPC handling for the balance code. It is consisted by following major changes: 1. Avoid COW tree leave in the phrase of merging tree. 2. Handle interaction with snapshot creation. 3. make the backref cache can live across transactions. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Pre-allocate space for data relocationYan, Zheng2010-05-253-45/+92
| | | | | | | | Pre-allocate space for data relocation. This can detect ENOPSC condition caused by fragmentation of free space. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Metadata ENOSPC handling for tree logYan, Zheng2010-05-255-128/+156
| | | | | | | | | Previous patches make the allocater return -ENOSPC if there is no unreserved free metadata space. This patch updates tree log code and various other places to propagate/handle the ENOSPC error. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Metadata reservation for orphan inodesYan, Zheng2010-05-259-66/+365
| | | | | | | reserve metadata space for handling orphan inodes Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Introduce global metadata reservationYan, Zheng2010-05-258-76/+241
| | | | | | | Reserve metadata space for extent tree, checksum tree and root tree Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Update metadata reservation for delayed allocationYan, Zheng2010-05-259-415/+232
| | | | | | | | | | | | | | Introduce metadata reservation context for delayed allocation and update various related functions. This patch also introduces EXTENT_FIRST_DELALLOC control bit for set/clear_extent_bit. It tells set/clear_bit_hook whether they are processing the first extent_state with EXTENT_DELALLOC bit set. This change is important if set/clear_extent_bit involves multiple extent_state. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Integrate metadata reservation with start_transactionYan, Zheng2010-05-2515-528/+678
| | | | | | | | | | | | | | | Besides simplify the code, this change makes sure all metadata reservation for normal metadata operations are released after committing transaction. Changes since V1: Add code that check if unlink and rmdir will free space. Add ENOSPC handling for clone ioctl. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Introduce contexts for metadata reservationYan, Zheng2010-05-257-385/+853
| | | | | | | | | | | | | | | | | | Introducing metadata reseravtion contexts has two major advantages. First, it makes metadata reseravtion more traceable. Second, it can reclaim freed space and re-add them to the itself after transaction committed. Besides add btrfs_block_rsv structure and related helper functions, This patch contains following changes: Move code that decides if freed tree block should be pinned into btrfs_free_tree_block(). Make space accounting more accurate, mainly for handling read only block groups. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Kill init_btrfs_i()Yan, Zheng2010-05-251-36/+28
| | | | | | | All code in init_btrfs_i can be moved into btrfs_alloc_inode() Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Shrink delay allocated space in a synchronizedYan, Zheng2010-05-254-121/+88
| | | | | | | | | Shrink delayed allocation space in a synchronized manner is more controllable than flushing all delay allocated space in an async thread. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Kill allocate_wait in space_infoYan, Zheng2010-05-252-76/+58
| | | | | | | | We already have fs_info->chunk_mutex to avoid concurrent chunk creation. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Link block groups of different raid typesYan, Zheng2010-05-253-55/+121
| | | | | | | | | The size of reserved space is stored in space_info. If block groups of different raid types are linked to separate space_info, changing allocation profile will corrupt reserved space accounting. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds2010-05-151-0/+5
|\ | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check for read permission on src file in the clone ioctl
| * Btrfs: check for read permission on src file in the clone ioctlDan Rosenberg2010-05-151-0/+5
| | | | | | | | | | | | | | The existing code would have allowed you to clone a file that was only open for writing Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | btrfs: convert to using bdi_setup_and_register()Jens Axboe2010-04-261-11/+1
| | | | | | | | | | | | | | It's now a provided helper, so get rid of the internal setup and btrfs atomic_t bdi enumerator. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | Merge branch 'master' of ↵Linus Torvalds2010-04-132-5/+21
|\| | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: make sure the chunk allocator doesn't create zero length chunks Btrfs: fix data enospc check overflow
| * Btrfs: make sure the chunk allocator doesn't create zero length chunksChris Mason2010-04-061-0/+6
| | | | | | | | | | | | | | | | A recent commit allowed for smaller chunks to be created, but didn't make sure they were always bigger than a stripe. After some divides, this led to zero length stripes. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix data enospc check overflowJosef Bacik2010-04-051-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because we account for reserved space we get from the allocator before we actually account for allocating delalloc space, we can have a small window where the amount of "used" space in a space_info is more than the total amount of space in the space_info. This will cause a overflow in our check, so it will seem like we have _tons_ of free space, and we'll allow reservations to occur that will end up larger than the amount of space we have. I've seen users report ENOSPC panic's in cow_file_range a few times recently, so I tried to reproduce this problem and found I could reproduce it if I ran one of my tests in a loop for like 20 minutes. With this patch my test ran all night without issues. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds2010-04-0512-204/+90
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: add check for changed leaves in setup_leaf_for_split Btrfs: create snapshot references in same commit as snapshot Btrfs: fix small race with delalloc flushing waitqueue's Btrfs: use add_to_page_cache_lru, use __page_cache_alloc Btrfs: fix chunk allocate size calculation Btrfs: kill max_extent mount option Btrfs: fail to mount if we have problems reading the block groups Btrfs: check btrfs_get_extent return for IS_ERR() Btrfs: handle kmalloc() failure in inode lookup ioctl Btrfs: dereferencing freed memory Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk() Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree() Btrfs: Remove unnecessary finish_wait() in wait_current_trans() Btrfs: add NULL check for do_walk_down() Btrfs: remove duplicate include in ioctl.c Fix trivial conflict in fs/btrfs/compression.c due to slab.h include cleanups.
| * Btrfs: add check for changed leaves in setup_leaf_for_splitChris Mason2010-04-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setup_leaf_for_split needs to drop the path and search again, and has checks to see if the item we want to split changed size. But, it misses the case where the leaf changed and now has enough room for the item we want to insert. This adds an extra check to make sure the leaf really needs splitting before we call btrfs_split_leaf(), which keeps us from trying to split a leaf with a single item. btrfs_split_leaf() will blindly split the single item leaf, leaving us with one good leaf and one empty leaf and then a crash. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: create snapshot references in same commit as snapshotSage Weil2010-04-051-66/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This creates the reference to a new snapshot in the same commit as the snapshot itself. This avoids the need for a second commit in order for a snapshot to be persistent, and also avoids the problem of "leaking" a new snapshot tree root if the host crashes before the second commit takes place. It is not at all clear to me why it wasn't always done this way. If there is still a reason for the two-stage {create,finish}_pending_snapshots() approach I'm missing something! :) I've been running this for a couple weeks under pretty heavy usage (a few snapshots per minute) without obvious problems. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix small race with delalloc flushing waitqueue'sJosef Bacik2010-04-051-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Everytime we start a new flushing thread, we init the waitqueue if there isn't a flushing thread running. The problem with this is we check space_info->flushing, which we clear right before doing a wake_up on the flushing waitqueue, which causes problems if we init the waitqueue in the middle of clearing the flushing flagh and calling wake_up. This is hard to hit, but the code is wrong anyway, so init the flushing/allocating waitqueue when creating the space info and let it be. I haven't seen the panic since I've been using this patch. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: use add_to_page_cache_lru, use __page_cache_allocNick Piggin2010-04-052-32/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Pagecache pages should be allocated with __page_cache_alloc, so they obey pagecache memory policies. add_to_page_cache_lru is exported, so it should be used. Benefits over using a private pagevec: neater code, 128 bytes fewer stack used, percpu lru ordering is preserved, and finally don't need to flush pagevec before returning so batching may be shared with other LRU insertions. Signed-off-by: Nick Piggin <npiggin@suse.de>: Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix chunk allocate size calculationJosef Bacik2010-03-311-1/+3
| | | | | | | | | | | | | | | | | | | | | | If the amount of free space left in a device is less than what we think should be the minimum size, just ignore the minimum size and use the amount we have. I ran into this running tests on a 600mb volume, the chunk allocator wouldn't let me allocate the last 52mb of the disk for data because we want to have at least 64mb chunks for data. This patch fixes that problem. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: kill max_extent mount optionJosef Bacik2010-03-316-81/+13
| | | | | | | | | | | | | | | | | | | | | | As Yan pointed out, theres not much reason for all this complicated math to account for file extents being split up into max_extent chunks, since they are likely to all end up in the same leaf anyway. Since there isn't much reason to use max_extent, just remove the option altogether so we have one less thing we need to test. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fail to mount if we have problems reading the block groupsJosef Bacik2010-03-312-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't actually check the return value of btrfs_read_block_groups, so we can possibly succeed to mount, but then fail to say read the superblock xattr for selinux which will cause the vfs code to deactivate the super. This is a problem because in find_free_extent we just assume that we will find the right space_info for the allocation we want. But if we failed to read the block groups, we won't have setup any space_info's, and we'll hit a NULL pointer deref in find_free_extent. This patch fixes that problem by checking the return value of btrfs_read_block_groups, and failing out properly. I've also added a check in find_free_extent so if for some reason we don't find an appropriate space_info, we just return -ENOSPC. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: check btrfs_get_extent return for IS_ERR()Dan Carpenter2010-03-311-1/+1
| | | | | | | | | | | | | | btrfs_get_extent() never returns NULL, only a valid pointer or ERR_PTR() Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>