summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/backpointers.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: reconstruct_alloc cleanupKent Overstreet2024-03-141-2/+1
| | | | | | | | | Now that we've got the errors_silent mechanism, we don't have to check if the reconstruct_alloc option is set all over the place. Also - users no longer have to explicitly select fsck and fix_errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Pin btree cache in ram for random access in fsckKent Overstreet2024-03-141-90/+47
| | | | | | | | | | | | | | | | | | | | | | | Various phases of fsck involve checking references from one btree to another: this means doing a sequential scan of one btree, and then mostly random access into the second. This is particularly painful for checking extents <-> backpointers; we can prefetch btree node access on the sequential scan, but not on the random access portion, and this is particularly painful on spinning rust, where we'd like to keep the pipeline fairly full of btree node reads so that the elevator can reduce seeking. This patch implements prefetching and pinning of the portion of the btree that we'll be doing random access to. We already calculate how much of the random access btree will fit in memory so it's a fairly straightforward change. This will put more pressure on system memory usage, so we introduce a new option, fsck_memory_usage_percent, which is the percentage of total system ram that fsck is allowed to pin. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill more -EIO error codesKent Overstreet2024-03-141-2/+1
| | | | | | | | This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix backpointer_to_text() when dev does not existKent Overstreet2024-02-251-3/+5
| | | | | Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Prep work for variable size btree node buffersKent Overstreet2024-01-211-1/+1
| | | | | | | | | | | | | | | | | bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: extents_to_bp_stateKent Overstreet2024-01-211-48/+41
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bkey_and_val_eq()Kent Overstreet2024-01-211-3/+8
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fsck_err()s don't need to manually check c->sb.version anymoreKent Overstreet2024-01-061-2/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: for_each_btree_key() now declares loop iterKent Overstreet2024-01-011-5/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_err_(fn|msg) check if should printKent Overstreet2024-01-011-8/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make backpointer fsck wb flush check more rigorousKent Overstreet2024-01-011-13/+19
| | | | | | | | | | | | | | | backpointers fsck now always runs in rw mode - the btree is being modified while it runs, by e.g. copygc, rebalance, the discard worker, the invalidate worker. We could find a missing backpointer, flush the btree write buffer, and then on the next iteration find a new key at the exact same position - which will most likely need another write buffer flush. Hence, we have to check for an exact match on last_flushed, not just the pos. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: On missing backpointer to interior node, flush interior updatesKent Overstreet2024-01-011-0/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Explicity go RW for fsckKent Overstreet2024-01-011-4/+2
| | | | | | | This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less error prone. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: No need to allocate keys for write bufferKent Overstreet2024-01-011-1/+16
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: backpointers fsck no longer uses BTREE_ITER_ALL_LEVELSKent Overstreet2024-01-011-64/+63
| | | | | | | | | | | | | | | It appears that BTREE_ITER_ALL_LEVELS is racy with concurrent interior node btree updates; unfortunate but not terribly surprising it's a difficult problem - that was the original reason for gc_lock. BTREE_ITER_ALL_LEVELS will probably be deleted in a subsequent patch, this changes backpointers fsck to instead walk keys at one level of the btree at a time. This fixes the tiering_drop_alloc test, which stopped working with the patch to not flush the journal after journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename BTREE_INSERT flagsKent Overstreet2024-01-011-6/+6
| | | | | | | BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix null ptr deref in bch2_backpointer_get_node()Kent Overstreet2023-11-141-5/+5
| | | | | | | | | | bch2_btree_iter_peek_node() can return a NULL ptr (when the tree is shorter than the search depth); handle this with an early return. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: https://lore.kernel.org/linux-bcachefs/5fc3c28b-c232-4ec7-b0ac-4ef220ddf976@moroto.mountain/T/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Simplify, fix bch2_backpointer_get_key()Kent Overstreet2023-11-051-44/+33
| | | | | | | | | | | - backpointer_not_found() checks backpointers_no_use_write_buffer, no need to do it inbackpointer_get_key(). - always use backpointer_get_node() for pointers to nodes: backpointer_get_key() was sometimes returning the key from the root node unlocked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill thing_it_points_to arg to backpointer_not_found()Kent Overstreet2023-11-051-7/+6
| | | | | | This can be calculated locally. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: use swab40 for bch_backpointer.bucket_offset bitfieldBrian Foster2023-11-051-1/+1
| | | | | | | | | | | | | The bucket_offset field of bch_backpointer is a 40-bit bitfield, but the bch2_backpointer_swab() helper uses swab32. This leads to inconsistency when an on-disk fs is accessed from an opposite endian machine. As it turns out, we already have an internal swab40() helper that is used from the bch_alloc_v4 swab callback. Lift it into the backpointers header file and use it consistently in both places. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Enumerate fsck errorsKent Overstreet2023-11-021-8/+12
| | | | | | | | | | | | | This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_id_str()Kent Overstreet2023-10-311-2/+2
| | | | | | | Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a null ptr deref in bch2_get_alloc_in_memory_pos()Kent Overstreet2023-10-221-1/+1
| | | | | Reported-by: smatch Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Heap allocate btree_transKent Overstreet2023-10-221-12/+10
| | | | | | | | | | We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix W=12 build errorsKent Overstreet2023-10-221-4/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a handful of spelling mistakes in various messagesColin Ian King2023-10-221-1/+1
| | | | | | | There are several spelling mistakes in error messages. Fix these. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: remove duplicate code between backpointer update pathsBrian Foster2023-10-221-17/+1
| | | | | Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Upgrade path fixesKent Overstreet2023-10-221-1/+1
| | | | | | | Some minor fixes to not print errors that are actually due to a verson upgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Enumerate recovery passesKent Overstreet2023-10-221-3/+3
| | | | | | | | | | | | | | | | | | | | | Recovery and fsck have many different passes/jobs to do, which always run in the same order - but not all of them run all the time. Some are for fsck, some for unclean shutdown, some for version upgrades. This adds some new structure: a defined list of recovery passes that we can run in a loop, as well as consolidating the log messages. The main benefit is consolidating the "should run this recovery pass" logic, as well as cleaning up the "this recovery pass has finished" state; instead of having a bunch of ad-hoc state bits in c->flags, we've now got c->curr_recovery_pass. By consolidating the "should run this recovery pass" logic, in the future on disk format upgrades will be able to say "upgrading to this version requires x passes to run", instead of forcing all of fsck to run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Change check for invalid key typesKent Overstreet2023-10-221-1/+2
| | | | | | | | | | | As part of the forward compatibility patch series, we need to allow for new key types without complaining loudly when running an old version. This patch changes the flags parameter of bkey_invalid to an enum, and adds a new flag to indicate we're being called from the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted sparse fixesKent Overstreet2023-10-221-6/+6
| | | | | | | | | - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Allow for unknown btree IDsKent Overstreet2023-10-221-6/+8
| | | | | | | | | | | | | | | | | We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix leak in backpointers fsckKent Overstreet2023-10-221-2/+4
| | | | | | We were forgetting to exit a printbuf - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New error message helpersKent Overstreet2023-10-221-1/+9
| | | | | | | | | | | | | Add two new helpers for printing error messages with __func__ and bch2_err_str(): - bch_err_fn - bch_err_msg Also kill the old error strings in the recovery path, which were causing us to incorrectly report memory allocation failures - they're not needed anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve backpointers error messageKent Overstreet2023-10-221-1/+1
| | | | | | | the error message here dated from when backpointers could be stored in alloc keys; now, we should always print the full key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_bkey_get_iter() helpersKent Overstreet2023-10-221-17/+12
| | | | | | | | | | | | | | | | Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bkey_ops.min_val_sizeKent Overstreet2023-10-221-5/+0
| | | | | | | | | | | | | This adds a new field to bkey_ops for the minimum size of the value, which standardizes that check and also enforces the new rule (previously done somewhat ad-hoc) that we can extend value types by adding new fields on to the end. To make that work we do _not_ initialize min_val_size with sizeof, instead we initialize it to the size of the first version of those values. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rip out code for storing backpointers in alloc keysKent Overstreet2023-10-221-262/+67
| | | | | | | | | | We don't store backpointers in alloc keys anymore, since we gained the btree write buffer. This patch drops support for backpointers in alloc keys, and revs the on disk format version so that we know a fsck is required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use BTREE_ITER_INTENT in ec_stripe_update_extent()Kent Overstreet2023-10-221-3/+4
| | | | | | | | This adds a flags param to bch2_backpointer_get_key() so that we can pass BTREE_ITER_INTENT, since ec_stripe_update_extent() is updating the extent immediately. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve the backpointer to missing extent messageKent Overstreet2023-10-221-15/+24
| | | | | | | | We now print the pos where the backpointer was found in the btree, as well as the exact bucket:bucket_offset of the data, to aid in grepping through logs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix bch2_check_extents_to_backpointers()Kent Overstreet2023-10-221-11/+19
| | | | | | | | | | | | In rare cases, bch2_check_extents_to_backpointers() would incorrectly flag an extent has having a missing backpointer when we just needed to flush the btree write buffer - we weren't tracking the last flushed position correctly. This adds a level field to the last_flushed pos, fixing a bug where we'd sometimes fail on a new root node. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Flush write buffer as needed in backpointers repairKent Overstreet2023-10-221-6/+25
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a 64 bit divideKent Overstreet2023-10-221-1/+1
| | | | | | This fixes a build failure on 32 bit Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Change bkey_invalid() rw param to flagsKent Overstreet2023-10-221-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* fixup bcachefs: New on disk format: BackpointersKent Overstreet2023-10-221-8/+27
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't use key cache during fsckKent Overstreet2023-10-221-2/+3
| | | | | | | | | | The btree key cache mainly helps with lock contention, at the cost of additional memory overhead. During some fsck passes the memory overhead really matters, but fsck is single threaded so lock contention is an issue - so skipping the key cache during fsck will help with performance. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Run check_extents_to_backpointers() in multiple passesKent Overstreet2023-10-221-18/+124
| | | | | | | | | | | | | Similer to the previous patch for check_backpointers_to_extents(), if the alloc + backpointers btrees do not fit in ram we need to run into multiple passes. The counting of btree nodes that fit in memory is different here, because we have to walk the alloc and backpointers btrees at the same time, since a backpointer could reside in either of them and we don't know which without checking both. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Run bch2_check_backpointers_to_extents() in multiple passes if ↵Kent Overstreet2023-10-221-13/+132
| | | | | | | | | | | | necessary When the extents + reflink btrees don't fit into memory this fsck pass becomes _much_ slower, since we're doing random lookups. This patch changes this pass to check how much of the relevant btrees will fit into memory, and run in multiple passes if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New on disk format: BackpointersKent Overstreet2023-10-221-0/+799
This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>