summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/btree_cache.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: btree_cache.freeable list fixesKent Overstreet2024-11-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | When allocating new btree nodes, we were leaving them on the freeable list - unlocked - allowing them to be reclaimed: ouch. Additionally, bch2_btree_node_free_never_used() -> bch2_btree_node_hash_remove was putting it on the freelist, while bch2_btree_node_free_never_used() was putting it back on the btree update reserve list - ouch. Originally, the code was written to always keep btree nodes on a list - live or freeable - and this worked when new nodes were kept locked. But now with the cycle detector, we can't keep nodes locked that aren't tracked by the cycle detector; and this is fine as long as they're not reachable. We also have better and more robust leak detection now, with memory allocation profiling, so the original justification no longer applies. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rework btree node pinningKent Overstreet2024-09-211-0/+3
| | | | | | | | | | | | | | | | | | | In backpointers fsck, we do a seqential scan of one btree, and check references to another: extents <-> backpointers Checking references generates random lookups, so we want to pin that btree in memory (or only a range, if it doesn't fit in ram). Previously, this was done with a simple check in the shrinker - "if btree node is in range being pinned, don't free it" - but this generated OOMs, as our shrinker wasn't well behaved if there was less memory available than expected. Instead, we now have two different shrinkers and lru lists; the second shrinker being for pinned nodes, with seeks set much higher than normal - so they can still be freed if necessary, but we'll prefer not to. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix failure to relock in bch2_btree_node_mem_alloc()Kent Overstreet2024-08-221-0/+2
| | | | | | | We weren't always so strict about trans->locked state - but now we are, and new assertions are shaking some bugs out. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_id_to_text()Kent Overstreet2024-07-151-0/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: add counters for failed shrinker reclaimDaniel Hill2024-05-091-1/+1
| | | | | | | | | This adds distinct counters for every reason the btree node shrinker can fail to free an object - if our shrinker isn't making progress, this will tell us why. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Run bch2_check_fix_ptrs() via triggersKent Overstreet2024-05-081-0/+3
| | | | | | | | | | | | | | | | | Currently, the reflink_p gc trigger does repair as well - turning a reflink_p key into an error key if the reflink_v it points to doesn't exist. This won't work with online check/repair, because the repair path once online will be subject to transaction restarts, but BTREE_TRIGGER_gc is not idempotant - we can't run it multiple times if we get a transaction restart. So we need to split these paths; to do so this patch calls check_fix_ptrs() by a new general path - a new trigger type, BTREE_TRIGGER_check_repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Prep work for variable size btree node buffersKent Overstreet2024-01-211-7/+12
| | | | | | | | | | | | | | | | | bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Include btree_trans in more tracepointsKent Overstreet2024-01-011-2/+2
| | | | | | | This gives us more context information - e.g. which codepath is invoking btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_id_str()Kent Overstreet2023-10-311-2/+3
| | | | | | | Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted sparse fixesKent Overstreet2023-10-221-1/+5
| | | | | | | | | - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Allow for unknown btree IDsKent Overstreet2023-10-221-1/+21
| | | | | | | | | | | | | | | | | We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_node_to_text() const correctnessKent Overstreet2023-10-221-2/+2
| | | | | | | This is for the Rust interface - Rust cares more about const than C does. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Plumb btree_trans through btree cache codeKent Overstreet2023-10-221-2/+2
| | | | | | | | Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned device support is going to require a new allocation for every btree node write. This is a bit of prep work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move bkey bkey_unpack_key() to bkey.hKent Overstreet2023-10-221-0/+1
| | | | | | | | | | Long ago, bkey_unpack_key() was added to bset.h instead of bkey.h because bkey.h didn't include btree_types.h, which it needs for the compiled unpack function. This patch finally moves it to the proper location. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New locking functionsKent Overstreet2023-10-221-2/+2
| | | | | | | | | | | | | | | | In the future, with the new deadlock cycle detector, we won't be using bare six_lock_* anymore: lock wait entries will all be embedded in btree_trans, and we will need a btree_trans context whenever locking a btree node. This patch plumbs a btree_trans to the few places that need it, and adds two new locking functions - btree_node_lock_nopath, which may fail returning a transaction restart, and - btree_node_lock_nopath_nofail, to be used in places where we know we cannot deadlock (i.e. because we're holding no other locks). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Fix usage of six lock's percpu modeKent Overstreet2023-10-221-1/+1
| | | | | | | | | | | | | | | Six locks have a percpu mode, which we use for interior btree nodes, as well as btree key cache keys for the subvolumes btree. We've been switching locks back and forth between percpu and non percpu mode as needed, but it turns out this is racy - when we're reusing an existing node, other threads could be attempting to lock it while we're switching it between modes. This patch fixes this by never switching 'struct btree' between the two modes, and instead segragating them between two different freed lists. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use x-macros for btree node flagsKent Overstreet2023-10-221-0/+2
| | | | | | This is for adding an array of strings for btree node flag names. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Option improvementsKent Overstreet2023-10-221-2/+2
| | | | | | | | | | | | This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: btree_pathKent Overstreet2023-10-221-4/+3
| | | | | | | | | | | | | | | This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Further reduce iter->trans usageKent Overstreet2023-10-221-2/+3
| | | | | | | This is prep work for splitting btree_path out from btree_iter - btree_path will not have a pointer to btree_trans. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Always check for transaction restartsKent Overstreet2023-10-221-2/+2
| | | | | | | On transaction restart iterators won't be locked anymore - make sure we're always checking for errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: bch2_btree_iter_relock_intent()Kent Overstreet2023-10-221-1/+1
| | | | | | | | This adds a new helper for btree_cache.c that does what we want where the iterator is still being traverse - and also eliminates some unnecessary transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Evict btree nodes we're deletingKent Overstreet2023-10-221-0/+2
| | | | | | | | | | | There was a bug that led to duplicate btree node pointers being inserted at the wrong level. The new topology repair code can fix that, except that the btree cache code gets confused when we read in a btree node from the pointer that was at the wrong level. This patch evicts nodes that we're deleting to, which nicely solves the problem. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Update bch2_btree_verify()Kent Overstreet2023-10-221-0/+1
| | | | | | | | | | | bch2_btree_verify() verifies that the btree node on disk matches what we have in memory. This patch changes it to verify every replica, and also fixes it for interior btree nodes - there's a mem_ptr field which is used as a scratch space and needs to be zeroed out for comparing with what's on disk. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix BTREE_FOREGROUND_MERGE_HYSTERESISKent Overstreet2023-10-221-1/+1
| | | | | | | We were multiplying instead of dividing - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill bch2_btree_node_get_sibling()Kent Overstreet2023-10-221-3/+0
| | | | | | | | | | | | | This patch reworks the btree node merge path to use a second btree iterator to get the sibling node - which means bch2_btree_iter_get_sibling() can be deleted. Also, it uses bch2_btree_iter_traverse_all() if necessary - which means it should be more reliable. We don't currently even try to make it work when trans->nounlock is set - after a BTREE_INSERT_NOUNLOCK transaction commit, hopefully this will be a worthwhile tradeoff. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename BTREE_ID enums for consistency with other enumsKent Overstreet2023-10-221-2/+0
| | | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add (partial) support for fixing btree topologyKent Overstreet2023-10-221-1/+1
| | | | | | | | | | | | | When we walk the btrees during recovery, part of that is checking that btree topology is correct: for every interior btree node, its child nodes should exactly span the range the parent node covers. Previously, we had checks for this, but not repair code. Now that we have the ability to do btree updates during initial GC, this patch adds that repair code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add btree node prefetching to bch2_btree_and_journal_walk()Kent Overstreet2023-10-221-1/+1
| | | | | | | | | | | | | | | bch2_btree_and_journal_walk() walks the btree overlaying keys from the journal; it was introduced so that we could read in the alloc btree prior to journal replay being done, when journalling of updates to interior btree nodes was introduced. But it didn't have btree node prefetching, which introduced a severe regression with mount times, particularly on spinning rust. This patch implements btree node prefetching for the btree + journal walk, hopefully fixing that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add btree cache stats to sysfsKent Overstreet2023-10-221-0/+1
| | | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve tracing for transaction restartsKent Overstreet2023-10-221-1/+1
| | | | | | | | We have a bug where we can get stuck with a process spinning in transaction restarts - need more information. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Remove some uses of PAGE_SIZE in the btree codeKent Overstreet2023-10-221-6/+1
| | | | | | | For portability to userspace, we should try to avoid working in kernel pages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Slightly reduce btree split thresholdKent Overstreet2023-10-221-1/+1
| | | | | | | | 2/3rds performs a lot better than 3/4ths on the tested workloda, leading to significanly fewer btree node compactions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans_commit() path can now insert to interior nodesKent Overstreet2023-10-221-0/+3
| | | | | | | This will be needed for the upcoming patches to journal updates to interior btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use btree_ptr_v2.mem_ptr to avoid hash table lookupKent Overstreet2023-10-221-0/+7
| | | | | | | Nice performance optimization Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_ptr_v2Kent Overstreet2023-10-221-0/+2
| | | | | | | | | Add a new btree ptr type which contains the sequence number (random 64 bit cookie, actually) for that btree node - this lets us verify that when we read in a btree node it really is the btree node we wanted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: introduce b->hash_valKent Overstreet2023-10-221-3/+10
| | | | | | | | | This is partly prep work for introducing bch_btree_ptr_v2, but it'll also be a bit of a performance boost by moving the full key out of the hot part of struct btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Change btree split threshold to be in u64sKent Overstreet2023-10-221-1/+1
| | | | | | | | This fixes a bug with very small btree nodes where splitting would end up with one of the new nodes empty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't pass around may_drop_locksKent Overstreet2023-10-221-3/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_bkey_cached_commonKent Overstreet2023-10-221-1/+1
| | | | | | | This is prep work for the btree key cache: btree iterators will point to either struct btree, or a new struct bkey_cached. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make bkey types globally uniqueKent Overstreet2023-10-221-3/+3
| | | | | | | | | | this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: revamp to_text methodsKent Overstreet2023-10-221-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Initial commitKent Overstreet2023-10-221-0/+91
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>