summaryrefslogtreecommitdiffstats
path: root/drivers/md/bcache/btree.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* sched/headers: Prepare to use <linux/rcuupdate.h> instead of ↵Ingo Molnar2017-03-021-0/+2
| | | | | | | | | | | | | | | | <linux/rculist.h> in <linux/sched.h> We don't actually need the full rculist.h header in sched.h anymore, we will be able to include the smaller rcupdate.h header instead. But first update code that relied on the implicit header inclusion. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | | | | | | | | <linux/sched/clock.h> We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* bcache: Make gc wakeup sane, remove set_task_state()Kent Overstreet2016-12-171-19/+20
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* block,fs: use REQ_* flags directlyChristoph Hellwig2016-11-011-2/+2
| | | | | | | | | Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* block: export bio_free_pages to other modulesGuoqing Jiang2016-09-221-5/+1
| | | | | | | | | | | | | | | bio_free_pages is introduced in commit 1dfa0f68c040 ("block: add a helper to free bio bounce buffer pages"), we can reuse the func in other modules after it was imported. Cc: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@fb.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* bcache: use bio op accessorsMike Christie2016-06-071-2/+2
| | | | | | | | | Separate the op from the rq_flag_bits and have bcache set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* bcache: bch_gc_thread() is not freezableJiri Kosina2016-05-241-2/+0
| | | | | | | | bch_gc_thread() doesn't mark itself freezable, so calling try_to_freeze() in its context is just an expensive no-op. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
* bcache: Add a cond_resched() call to gcKent Overstreet2015-12-311-0/+1
| | | | | | | | Signed-off-by: Takashi Iwai <tiwai@suse.de> Tested-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: Kent Overstreet <kmo@daterainc.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* bcache: fix a livelock when we cause a huge number of cache missesZheng Liu2015-12-311-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Subject : [PATCH v2] bcache: fix a livelock in btree lock Date : Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM) This commit tries to fix a livelock in bcache. This livelock might happen when we causes a huge number of cache misses simultaneously. When we get a cache miss, bcache will execute the following path. ->cached_dev_make_request() ->cached_dev_read() ->cached_lookup() ->bch->btree_map_keys() ->btree_root() <------------------------ ->bch_btree_map_keys_recurse() | ->cache_lookup_fn() | ->cached_dev_cache_miss() | ->bch_btree_insert_check_key() -| [If btree->seq is not equal to seq + 1, we should return EINTR and traverse btree again.] In bch_btree_insert_check_key() function we first need to check upgrade flag (op->lock == -1), and when this flag is true we need to release read btree->lock and try to take write btree->lock. During taking and releasing this write lock, btree->seq will be monotone increased in order to prevent other threads modify this in cache miss (see btree.h:74). But if there are some cache misses caused by some requested, we could meet a livelock because btree->seq is always changed by others. Thus no one can make progress. This commit will try to take write btree->lock if it encounters a race when we traverse btree. Although it sacrifice the scalability but we can ensure that only one can modify the btree. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Tested-by: Joshua Schmid <jschmid@suse.com> Tested-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: Joshua Schmid <jschmid@suse.com> Cc: Zhu Yanhai <zhu.yanhai@gmail.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* block: add a bi_error field to struct bioChristoph Hellwig2015-07-291-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* bcache: try to set b->parent properlySlava Pestov2014-08-051-16/+21
| | | | | | | bcache_flash_dev.ktest would reliably crash with 8k and 16k bucket size before; now it passes. Change-Id: Ib542232235e39298c3a7548fe52b645cabb823d1
* bcache: fix use-after-free in btree_gc_coalesce()Slava Pestov2014-08-051-0/+1
| | | | | | | | | | | If we goto out_nocoalesce after we free new_nodes[0], we end up freeing new_nodes[0] again. This was generating a lockdep warning. The fix is to set new_nodes[0] to NULL, since the out_nocoalesce path safely ignores NULL entries in the new_nodes array. This regression was introduced in 2d7f9531. Change-Id: I76564d7257800583214376b4bacf236cda90c89c
* bcache: fix crash in bcache_btree_node_alloc_fail tracepointSlava Pestov2014-08-051-1/+1
| | | | | | 'b' was NULL. Change-Id: Icac0fd04afa2d23f213d96d51afd53374e6dd0c0
* bcache: Allocate bounce buffers with GFP_NOWAITKent Overstreet2014-08-051-1/+1
| | | | | | | There's no point in blocking on these allocations, since our fallback paths will probably go faster than blocking. Change-Id: I733ca202c25cb36bde02607a0a60552229a4241c
* bcache: Make sure to pass GFP_WAIT to mempool_alloc()Kent Overstreet2014-08-051-1/+1
| | | | | | | | this was very wrong - mempool_alloc() only guarantees success with GFP_WAIT. bcache uses GFP_NOWAIT in various other places where we have a fallback, circuits must've gotten crossed when writing this code or something. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: wait for buckets when allocating new btree rootSlava Pestov2014-08-051-3/+9
| | | | | | | | Tested: - sometimes bcache_tier test would hang on startup with a failure to allocate the btree root -- no longer seeing this Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Kill bucket->gc_genKent Overstreet2014-03-181-4/+3
| | | | | | | | gc_gen was a temporary used to recalculate last_gc, but since we only need bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update last_gc directly. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Kill unused freelistKent Overstreet2014-03-181-4/+37
| | | | | | | | This was originally added as at optimization that for various reasons isn't needed anymore, but it does add a lot of nasty corner cases (and it was responsible for some recently fixed bugs). Just get rid of it now. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Rework btree cache reserve handlingKent Overstreet2014-03-181-92/+133
| | | | | | | | | | | | | | | | | | | This changes the bucket allocation reserves to use _real_ reserves - separate freelists - instead of watermarks, which if nothing else makes the current code saner to reason about and is going to be important in the future when we add support for multiple btrees. It also adds btree_check_reserve(), which checks (and locks) the reserves for both bucket allocation and memory allocation for btree nodes; the old code just kinda sorta assumed that since (e.g. for btree node splits) it had the root locked and that meant no other threads could try to make use of the same reserve; this technically should have been ok for memory allocation (we should always have a reserve for memory allocation (the btree node cache is used as a reserve and we preallocate it)), but multiple btrees will mean that locking the root won't be sufficient anymore, and for the bucket allocation reserve it was technically possible for the old code to deadlock. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Kill btree_io_wqKent Overstreet2014-03-181-20/+2
| | | | | | | | With the locking rework in the last patch, this shouldn't be needed anymore - btree_node_write_work() only takes b->write_lock which is never held for very long. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: btree locking reworkKent Overstreet2014-03-181-45/+119
| | | | | | | | | | | | | | | | | | Add a new lock, b->write_lock, which is required to actually modify - or write - a btree node; this lock is only held for short durations. This means we can write out a btree node without taking b->lock, which _is_ held for long durations - solving a deadlock when btree_flush_write() (from the journalling code) is called with a btree node locked. Right now just occurs in bch_btree_set_root(), but with an upcoming journalling rework is going to happen a lot more. This also turns b->lock is now more of a read/intent lock instead of a read/write lock - but not completely, since it still blocks readers. May turn it into a real intent lock at some point in the future. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix a race when freeing btree nodesKent Overstreet2014-03-181-33/+20
| | | | | | | | | | | | This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the bucket on the unused freelist, where it can be reused right away without any ordering requirements. It would be better to wait on at least a journal write to go down before reusing the bucket. bch_btree_set_root() does this, and inserting into non leaf nodes is completely synchronous so we should be ok, but future patches are just going to get rid of the unused freelist - it was needed in the past for various reasons but shouldn't be anymore. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add a real GC_MARK_RECLAIMABLEKent Overstreet2014-03-181-6/+12
| | | | | | | This means the garbage collection code can better check for data and metadata pointers to the same buckets. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Kill dead cgroup codeKent Overstreet2014-03-181-4/+0
| | | | | | This hasn't been used or even enabled in ages. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix another bug recovering from unclean shutdownKent Overstreet2014-03-181-51/+31
| | | | | | | | | | | The on disk bucket gens are allowed to be out of date, when we reuse buckets that didn't have any live data in them. To deal with this, the initial gc has to update the bucket gen when we find a pointer gen newer than the bucket's gen. Unfortunately we weren't doing this for pointers in the journal that we're about to replay. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix a bug recovering from unclean shutdownKent Overstreet2014-03-181-2/+2
| | | | | | | The code to fixup incorrect bucket prios incorrectly did not skip btree node freeing keys Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Minor fixes from kbuild robotKent Overstreet2014-01-291-1/+1
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: fix BUG_ON due to integer overflow with GC_SECTORS_USEDDarrick J. Wong2014-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The BUG_ON at the end of __bch_btree_mark_key can be triggered due to an integer overflow error: BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13); ... SET_GC_SECTORS_USED(g, min_t(unsigned, GC_SECTORS_USED(g) + KEY_SIZE(k), (1 << 14) - 1)); BUG_ON(!GC_SECTORS_USED(g)); In bcache.h, the SECTORS_USED bitfield is defined to be 13 bits wide. While the SET_ code tries to ensure that the field doesn't overflow by clamping it to (1<<14)-1 == 16383, this is incorrect because 16383 requires 14 bits. Therefore, if GC_SECTORS_USED() + KEY_SIZE() = 8192, the SET_ statement tries to store 8192 into a 13-bit field. In a 13-bit field, 8192 becomes zero, thus triggering the BUG_ON. Therefore, create a field width constant and a max value constant, and use those to create the bitfield and check the inputs to SET_GC_SECTORS_USED. Arguably the BITMASK() template ought to have BUG_ON checks for too-large values, but that's a separate patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* bcache: Don't return -EINTR when insert finishedKent Overstreet2014-01-081-2/+4
| | | | | | | | We need to return -EINTR after a split because we invalidated iterators (and freed the btree node) - but if we were finished inserting, we don't want to redo the traversal. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Move insert_fixup() to btree_keys_opsKent Overstreet2014-01-081-229/+17
| | | | | | | Now handling overlapping extents/keys is a method that's specific to what the btree node contains. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Convert sorting to btree_keysKent Overstreet2014-01-081-3/+3
| | | | | | More work to disentangle various code from struct btree Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Convert debug code to btree_keysKent Overstreet2014-01-081-4/+4
| | | | | | More work to disentangle various code from struct btree Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Convert btree_iter to struct btree_keysKent Overstreet2014-01-081-11/+11
| | | | | | More work to disentangle bset.c from struct btree Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add bch_btree_keys_u64s_remaining()Kent Overstreet2014-01-081-13/+15
| | | | | | Helper function to explicitly check how much space is free in a btree node Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add struct btree_keysKent Overstreet2014-01-081-79/+74
| | | | | | Soon, bset.c won't need to depend on struct btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Abstract out stuff needed for sortingKent Overstreet2014-01-081-0/+6
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Rename/shuffle various code aroundKent Overstreet2014-01-081-104/+63
| | | | | | More work to disentangle bset.c from the rest of the code: Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add struct bset_sort_stateKent Overstreet2014-01-081-4/+10
| | | | | | | More disentangling bset.c from the rest of the bcache code - soon, the sorting routines won't have any dependencies on any outside structs. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Split out sort_extent_cmp()Kent Overstreet2014-01-081-2/+8
| | | | | | | Only use extent comparison for comparing extents, so we're not using START_KEY() on other key types (i.e. btree pointers) Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Bkey indexing renamingKent Overstreet2014-01-081-15/+18
| | | | | | | | | More refactoring: node() -> bset_bkey_idx() end() -> bset_bkey_last() Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Make bch_keylist_realloc() take u64s, not nptrsKent Overstreet2014-01-081-3/+3
| | | | | | | | | | Getting away from KEY_PTRS and moving toward KEY_U64s - and getting rid of magic 2s Also - split out the part that checks against journal entry size so as to avoid a dependancy on struct cache_set in bset.c Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Btree verify code improvementsKent Overstreet2014-01-081-2/+14
| | | | | | | Used this fixed code to find and fix the bug fixed by a4d885097b0ac0cd1337f171f2d4b83e946094d4. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: kill index()Kent Overstreet2014-01-081-3/+3
| | | | | | That was a terrible name for a macro, add some better helpers to replace it. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Do bkey_put() in btree_split() error pathKent Overstreet2014-01-081-1/+4
| | | | | | | | This error path shouldn't have been hit in practice.. and we've got reworked reserve code coming soon so that it shouldn't _ever_ be bit... but if we've got code for this error path it should be correct. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Rework allocator reservesKent Overstreet2014-01-081-2/+32
| | | | | | | | | | | | We need a reserve for allocating buckets for new btree nodes - and now that we've got multiple btrees, it really needs to be per btree. This reworks the reserves so we've got separate freelists for each reserve instead of watermarks, which seems to make things a bit cleaner, and it adds some code so that btree_split() can make sure the reserve is available before it starts. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: kill closure locking usageKent Overstreet2014-01-081-18/+34
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Minor btree cache fixKent Overstreet2014-01-081-7/+3
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* Merge tag 'v3.13-rc6' into for-3.14/coreJens Axboe2013-12-311-2/+25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c
| * bcache: fix for gc and writeback raceNicholas Swenson2013-12-161-0/+22
| | | | | | | | | | | | | | | | | | Garbage collector needs to check keys in the writeback keybuf to make sure it's not invalidating buckets to which the writeback keys point to. Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix dirty_data accountingKent Overstreet2013-12-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dirty data accounting wasn't quite right - firstly, we were adding the key we're inserting after it could have merged with another dirty key already in the btree, and secondly we could sometimes pass the wrong offset to bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is important when tracking dirty data by stripe. NOTE FOR BACKPORTERS: For 3.10 (and 3.11?) there's other accounting fixes necessary that got squashed in with other patches; the full patch against 3.10 is 408cc2f47eeac93a, available at: git://evilpiepirate.org/~kent/linux-bcache.git bcache-3.10-writeback-fixes Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 2a46036..4a12b2f 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1817,7 +1817,8 @@ static bool fix_overlapping_extents(struct btree *b, struct bkey *insert, if (KEY_START(k) > KEY_START(insert) + sectors_found) goto check_failed; - if (KEY_PTRS(replace_key) != KEY_PTRS(k)) + if (KEY_PTRS(k) != KEY_PTRS(replace_key) || + KEY_DIRTY(k) != KEY_DIRTY(replace_key)) goto check_failed; /* skip past gen */