summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/buckets.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: BCH_WATERMARK_interior_updatesKent Overstreet2024-04-021-0/+1
| | | | | | | | | | | | This adds a new watermark, higher priority than BCH_WATERMARK_reclaim, for interior btree updates. We've seen a deadlock where journal replay triggers a ton of btree node merges, and these use up all available open buckets and then interior updates get stuck. One cause of this is that we're currently lacking btree node merging on write buffer btrees - that needs to be fixed as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trans_account_disk_usage_change()Kent Overstreet2024-01-211-0/+2
| | | | | | | | | | | The disk space accounting rewrite is splitting out accounting for each replicas set - those are moving to btree keys, instead of percpu counters. This breaks bch2_trans_fs_usage_apply() up, splitting out the part we will still need. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: helpers for printing data typesKent Overstreet2024-01-211-0/+15
| | | | | | We need bounds checking since new versions may introduce new data types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: unify extent triggerKent Overstreet2024-01-061-3/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: move stripe triggers to ec.cKent Overstreet2024-01-061-3/+9
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: move bch2_mark_alloc() to alloc_background.cKent Overstreet2024-01-061-2/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: unify reservation triggerKent Overstreet2024-01-061-2/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill mem_trigger_run_overwrite_then_insert()Kent Overstreet2024-01-061-4/+1
| | | | | | now that type signatures are unified, redundant Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: mark now takes bkey_sKent Overstreet2024-01-061-6/+6
| | | | | | | | Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans_mark now takes bkey_sKent Overstreet2024-01-061-4/+4
| | | | | | | | Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move reflink_p triggers into reflink.cKent Overstreet2024-01-011-4/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_dev_usage_to_text()Kent Overstreet2024-01-011-0/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't use update_cached_sectors() in bch2_mark_alloc()Kent Overstreet2024-01-011-0/+3
| | | | | | | bch2_update_cached_sectors_list() is closer to how the new disk space accounting works, called from trans_mark(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: All triggers are BTREE_TRIGGER_WANTS_OLD_AND_NEWKent Overstreet2023-10-311-0/+14
| | | | | | | | Upcoming rebalance_work btree will require extent triggers to be BTREE_TRIGGER_WANTS_OLD_AND_NEW - so to reduce potential confusion, let's just make all triggers BTREE_TRIGGER_WANTS_OLD_AND_NEW. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Ensure devices are always correctly initializedKent Overstreet2023-10-311-0/+1
| | | | | | | | | | | We can't mark device superblocks or allocate journal on a device that isn't online. That means we may need to do this on every mount, because we may have formatted a new filesystem and then done the first mount (bch2_fs_initialize()) in degraded mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bucket_lock() is now a sleepable lockKent Overstreet2023-10-221-2/+5
| | | | | | | fsck_err() may sleep - it takes a mutex and may allocate memory, so bucket_lock() needs to be a sleepable lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Change bucket_lock() to use bit_spin_lock()Kent Overstreet2023-10-221-3/+30
| | | | | | | | | | | | | | | bucket_lock() previously open coded a spinlock, because we need to cram a spinlock into a single byte. But it turns out not all archs support xchg() on a single byte; since we need struct bucket to be small, this means we have to play fun games with casts and ifdefs for endianness. This fixes building on 32 bit arm, and likely other architectures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: linux-bcachefs@vger.kernel.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Remove undefined behavior in bch2_dev_buckets_reserved()Josh Poimboeuf2023-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | In general it's a good idea to avoid using bare unreachable() because it introduces undefined behavior in compiled code. In this case it even confuses GCC into emitting an empty unused bch2_dev_buckets_reserved.part.0() function. Use BUG() instead, which is nice and defined. While in theory it should never trigger, if something were to go awry and the BCH_WATERMARK_NR case were to actually hit, the failure mode is much more robust. Fixes the following warnings: vmlinux.o: warning: objtool: bch2_bucket_alloc_trans() falls through to next function bch2_reset_alloc_cursors() vmlinux.o: warning: objtool: bch2_dev_buckets_reserved.part.0() is missing an ELF size annotation Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: sb-members.cKent Overstreet2023-10-221-1/+46
| | | | | | | Split out a new file for bch_sb_field_members - we'll likely want to move more code here in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: move inode triggers to inode.cKent Overstreet2023-10-221-3/+14
| | | | | | bit of reorg Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BCH_WATERMARK_reclaimKent Overstreet2023-10-221-0/+1
| | | | | | | Add another watermark for journal reclaim - this is needed for the next patches, that unify BCH_WATERMARK with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename enum alloc_reserve -> bch_watermarkKent Overstreet2023-10-221-14/+14
| | | | | | This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a buffer overrun in bch2_fs_usage_read()Kent Overstreet2023-10-221-2/+16
| | | | | | | | | | We were copying the size of a struct bch_fs_usage_online to a struct bch_fs_usage, which is 8 bytes smaller. This adds some new helpers so we can do this correctly, and get rid of some magic +1s too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: RESERVE_stripeKent Overstreet2023-10-221-0/+3
| | | | | | | | | | | | | Rework stripe creation path - new algorithm for deciding when to create new stripes or reuse existing stripes. We add a new allocation watermark, RESERVE_stripe, above RESERVE_none. Then we always try to create a new stripe by doing RESERVE_stripe allocations; if this fails, we reuse an existing stripe and allocate buckets for it with the reserve watermark for the given write (RESERVE_none or RESERVE_movinggc). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve dev_alloc_debug_to_text()Kent Overstreet2023-10-221-0/+2
| | | | | | Now we also print the number of buckets reserved for each watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_mark_key() now takes btree_id & levelKent Overstreet2023-10-221-6/+12
| | | | | | | btree & level are passed to trans_mark - for backpointers - bch2_mark_key() should take them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New on disk format: BackpointersKent Overstreet2023-10-221-0/+19
| | | | | | | | | | | | | | | | | | This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Btree write bufferKent Overstreet2023-10-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global, and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fixes for building in userspaceKent Overstreet2023-10-221-0/+4
| | | | | | | | | | | | | | - Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move bkey bkey_unpack_key() to bkey.hKent Overstreet2023-10-221-2/+0
| | | | | | | | | | Long ago, bkey_unpack_key() was added to bset.h instead of bkey.h because bkey.h didn't include btree_types.h, which it needs for the compiled unpack function. This patch finally moves it to the proper location. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Optimize bch2_dev_usage_read()Kent Overstreet2023-10-221-1/+9
| | | | | | | | | - add bch2_dev_usage_read_fast(), which doesn't return by value - bch_dev_usage is big enough that we don't want the silent memcpy - tweak the allocation path to only call bch2_dev_usage_read() once per bucket allocated Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix __dev_available().Daniel Hill2023-10-221-6/+6
| | | | | | | | __dev_available() now calculates available buckets correctly. Previously it would almost always return 0 when we have cached data. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Split out dev_buckets_free()Kent Overstreet2023-10-221-0/+13
| | | | | | | | | | | | | Previously, dev_buckets_available() only counted buckets that are eligible to be allocated right now - i.e. buckets that don't have cached data, or need discard, or need gc gens, etc. But most users of this function want to know how many buckets are eligible to be allocated from without moving data around - copygc, allocator striping, which means we should be including cached data buckets etc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Plumb btree_id & level to trans_markKent Overstreet2023-10-221-32/+5
| | | | | | | For backpointers, we'll need the full key location - that means btree_id and btree level. This patch plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Fold bucket_state in to BCH_DATA_TYPES()Kent Overstreet2023-10-221-13/+10
| | | | | | | | | | | | | | | | | Previously, we were missing accounting for buckets in need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be conuted when checking current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Convert .key_invalid methods to printbufsKent Overstreet2023-10-221-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: gc mark fn fixes, cleanupsKent Overstreet2023-10-221-3/+3
| | | | | | | | | mark_stripe_bucket() was busted; it was using @new unitialized. Also, clean up all the gc mark functions, and convert them to the same style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Kill struct bucket_markKent Overstreet2023-10-221-14/+10
| | | | | | | | This switches struct bucket to using a lock, instead of cmpxchg. And now that the protected members no longer need to fit into a u64, we can expand the sector counts to 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill main in-memory bucket arrayKent Overstreet2023-10-221-16/+4
| | | | | | | All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_dev_usage_update() no longer depends on bucket_markKent Overstreet2023-10-221-7/+0
| | | | | | | | | | | | This is one of the last steps in getting rid of the main in-memory bucket array. This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead of bucket_mark, and for the places where we are in fact working with bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that takes bucket_mark and converts to bkey_alloc_unpacked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill allocator threads & freelistsKent Overstreet2023-10-221-34/+28
| | | | | | | | | | | | | | | Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Freespace, need_discard btreesKent Overstreet2023-10-221-10/+0
| | | | | | | | | | | This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: KEY_TYPE_alloc_v4Kent Overstreet2023-10-221-0/+8
| | | | | | | | | | | | This introduces a new alloc key which doesn't use varints. Soon we'll be adding backpointers and storing them in alloc keys, which means our pack/unpack workflow for alloc keys won't really work - we'll need to be mutating alloc keys in place. Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that converts older types of alloc keys to v4 if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move trigger fns to bkey_opsKent Overstreet2023-10-221-0/+13
| | | | | | | | This replaces the switch statements in bch2_mark_key(), bch2_trans_mark_key() with new bkey methods - prep work for the next patch, which fixes BTREE_TRIGGER_WANTS_OLD_AND_NEW. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Consolidate trigger code a bitKent Overstreet2023-10-221-3/+0
| | | | | | | Upcoming patches are doing more work on the triggers code, this patch just moves code around. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: bch2_trans_mark_key() now takes a bkey_i *Kent Overstreet2023-10-221-1/+26
| | | | | | | We're now coming up with triggers that modify the update being done. A bkey_s_c is const - bkey_i is the correct type to be using here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: bch2_gc_gens() no longer uses bucket arrayKent Overstreet2023-10-221-6/+0
| | | | | | | | Like the previous patches, this converts bch2_gc_gens() to use the alloc btree directly, and private arrays of generation numbers for its own recalculation of oldest_gen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Ignore cached data when calculating fragmentationKent Overstreet2023-10-221-5/+0
| | | | | | | | | | | | | | | | Previously, bucket fragmentation was considered to be bucket size - total amount of live data, both dirty and cached. This meant that if a bucket was full but only a small amount of data in it was dirty - the rest cached, we'd get stuck: copygc wouldn't move the dirty data out of the bucket and the allocator wouldn't be able to invalidate and drop the cached data. This changes fragmentation to exclude cached data, so that copygc will evacuate these buckets and copygc/the allocator will always be able to make forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: New data structure for buckets waiting on journal commitKent Overstreet2023-10-221-8/+0
| | | | | | | | | | | | | Implement a hash table, using cuckoo hashing, for empty buckets that are waiting on a journal commit before they can be reused. This replaces the journal_seq field of bucket_mark, and is part of eventually getting rid of the in memory bucket array. We may need to make bch2_bucket_needs_journal_commit() lockless, pending profiling and testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: New in-memory array for bucket gensKent Overstreet2023-10-221-1/+19
| | | | | | | | The main in-memory bucket array is going away, but we'll still need to keep bucket generations in memory, at least for now - ptr_stale() needs to be an efficient operation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>