path: root/fs
Commit message | Author | Age | Files | Lines
* bcachefs: kill inode_walker_entry.seen_this_pos | Kent Overstreet | 2024-09-28 | 1 | -6/+0
  dead code
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix incorrect IS_ERR_OR_NULL usage | Kent Overstreet | 2024-09-28 | 1 | -1/+1
  Returning a positive integer instead of an error code causes error paths to become very confused.
  Closes: syzbot+c0360e8367d6d8d04a66@syzkaller.appspotmail.com
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
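The failure mode named in this message can be illustrated in userspace. The following is a minimal sketch, not the actual diff: the `ERR_PTR` family is re-implemented here to mirror the kernel helpers, and `lookup_status_buggy()` / `lookup_status_fixed()` are hypothetical names. It assumes the positive value came from returning the boolean result of `IS_ERR_OR_NULL()` where an errno was expected.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical buggy form: the boolean (1) escapes as a "return code". */
static int lookup_status_buggy(const void *p)
{
	return IS_ERR_OR_NULL(p);
}

/* Hypothetical fixed form: every failure maps to a negative errno. */
static int lookup_status_fixed(const void *p)
{
	if (!p)
		return -ENOENT;
	return IS_ERR(p) ? (int)PTR_ERR(p) : 0;
}
```

A caller that tests `ret < 0` treats the buggy result as success, while a caller that tests `ret != 0` sees a failure with no meaningful code; either way the error path is confused.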
* bcachefs: fix the memory leak in exception case | Hongbo Li | 2024-09-28 | 1 | -0/+1
  The pointer clean points to memory allocated by kmemdup. When the return value of bch2_sb_clean_validate_late is not zero, the memory pointed to by clean is leaked, so we should free it in this case.
  Fixes: a37ad1a3aba9 ("bcachefs: sb-clean.c")
  Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
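The pattern behind this fix (duplicate, validate, free on validation failure) can be sketched in userspace. Everything below is illustrative: `dup_tracked()`, `free_tracked()`, `struct clean_section`, `validate_late()` and `read_clean()` are hypothetical stand-ins, and the allocation counter exists only so the leak is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs;	/* number of tracked allocations not yet freed */

/* Stand-in for kmemdup(): copy len bytes into a fresh allocation. */
static void *dup_tracked(const void *src, size_t len)
{
	void *p = malloc(len);

	if (!p)
		return NULL;
	memcpy(p, src, len);
	live_allocs++;
	return p;
}

static void free_tracked(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical on-disk section with a magic value to validate. */
struct clean_section {
	unsigned magic;
	unsigned journal_seq;
};

#define CLEAN_MAGIC 0xc1ea4u

static int validate_late(const struct clean_section *c)
{
	return c->magic == CLEAN_MAGIC ? 0 : -EINVAL;
}

static int read_clean(const struct clean_section *on_disk,
		      struct clean_section **out)
{
	struct clean_section *clean = dup_tracked(on_disk, sizeof(*on_disk));
	int ret;

	*out = NULL;
	if (!clean)
		return -ENOMEM;

	ret = validate_late(clean);
	if (ret) {
		/* Without this free, the copy is leaked on the error path. */
		free_tracked(clean);
		return ret;
	}

	*out = clean;
	return 0;
}
```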
* bcachefs: fast exit when darray_make_room failed | Hongbo Li | 2024-09-28 | 1 | -1/+3
  In downgrade_table_extra, the return value of darray_make_room must be checked. When it fails, we should exit immediately.
  Fixes: 7773df19c35f ("bcachefs: metadata version bucket_stripe_sectors")
  Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix iterator leak in check_subvol() | Kent Overstreet | 2024-09-28 | 1 | -28/+26
  A couple small error handling fixes
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add snapshot to bch_inode_unpacked | Kent Overstreet | 2024-09-28 | 2 | -4/+7
  this allows for various cleanups in fsck
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: assign return error when iterating through layout | Diogo Jahchan Koike | 2024-09-28 | 1 | -1/+3
  syzbot reported a null ptr deref in __copy_user [0]
  In __bch2_read_super, when a corrupt backup superblock matches the default opts offset, no error is assigned to ret and the freed superblock gets through, possibly being assigned as the best sb in bch2_fs_open and being later dereferenced, causing a fault. Assign EINVALID to ret when iterating through layout.
  [0]: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876
  Reported-by: syzbot+18a5c5e8a9c856944876@syzkaller.appspotmail.com
  Closes: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876
  Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix srcu warning in check_topology | Kent Overstreet | 2024-09-28 | 1 | -0/+2
  check_topology doesn't need the srcu lock and doesn't use normal btree transactions - we can just drop the srcu lock.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix error path in check_dirent_inode_dirent() | Kent Overstreet | 2024-09-28 | 1 | -3/+2
  fsck_err() jumps to the fsck_err label when bailing out; need to make sure bp_iter was initialized...
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlapping | Piotr Zalewski | 2024-09-28 | 1 | -0/+4
  Zero-initialize part of allocated bounce buffer which wasn't touched by subsequent bch2_key_sort_fix_overlapping to mitigate later uninit-value use KMSAN bug[1].
  After applying the patch reproducer still triggers stack overflow[2] but it seems unrelated to the uninit-value use warning. After further investigation it was found that stack overflow occurs because KMSAN adds too many function calls[3]. Backtrace of where the stack magic number gets smashed was added as a reply to syzkaller thread[3].
  It was confirmed that task's stack magic number gets smashed after the code path where KMSAN detects uninit-value use is executed, so it can be assumed that it doesn't contribute in any way to uninit-value use detection.
  [1] https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
  [2] https://lore.kernel.org/lkml/66e57e46.050a0220.115905.0002.GAE@google.com
  [3] https://lore.kernel.org/all/rVaWgPULej8K7HqMPNIu8kVNyXNjjCiTB-QBtItLFBmk0alH6fV2tk4joVPk97Evnuv4ZRDd8HB5uDCkiFG6u81xKdzDj-KrtIMJSlF6Kt8=@proton.me
  Reported-by: syzbot+6f655a60d3244d0c6718@syzkaller.appspotmail.com
  Closes: https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
  Fixes: ec4edd7b9d20 ("bcachefs: Prep work for variable size btree node buffers")
  Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
  Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve bch2_is_inode_open() warning message | Kent Overstreet | 2024-09-28 | 1 | -3/+3
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add extra padding in bkey_make_mut_noupdate() | Kent Overstreet | 2024-09-28 | 1 | -1/+2
  This fixes a kasan splat in propagate_key_to_snapshot_leaves() - varint_decode_fast() does reads (that it never uses) up to 7 bytes past the end of the integer.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
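Why a "fast" decoder needs tail padding can be shown with a toy format. The sketch below is not the bcachefs varint encoding: it assumes a hypothetical layout of one length byte followed by 1 to 8 little-endian payload bytes, and `toy_encode_padded()`, `toy_decode_fast()` and `DECODE_PAD` are made-up names. The decoder always loads 8 bytes and masks off what it does not need, so the buffer must extend up to 7 bytes past the encoded integer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extra bytes the fast decoder may read past the encoded integer. */
#define DECODE_PAD 7

/* Encode v as [n][n payload bytes, little-endian], with tail padding. */
static uint8_t *toy_encode_padded(uint64_t v, unsigned n, size_t *len)
{
	uint8_t *buf = malloc(1 + n + DECODE_PAD);

	if (!buf)
		return NULL;
	buf[0] = (uint8_t)n;
	for (unsigned i = 0; i < n; i++)
		buf[1 + i] = (uint8_t)(v >> (8 * i));
	/* Fill the padding with junk to show the decoder ignores it. */
	memset(buf + 1 + n, 0xff, DECODE_PAD);
	*len = 1 + n;
	return buf;
}

/* Always read 8 payload bytes, then mask down to the real length. */
static uint64_t toy_decode_fast(const uint8_t *in, unsigned *consumed)
{
	unsigned n = in[0];
	uint8_t raw[8];
	uint64_t v = 0;

	memcpy(raw, in + 1, sizeof(raw));	/* may read 8 - n bytes too far */
	for (unsigned i = 0; i < 8; i++)
		v |= (uint64_t)raw[i] << (8 * i);
	if (n < 8)
		v &= (UINT64_C(1) << (8 * n)) - 1;
	*consumed = 1 + n;
	return v;
}
```

Without the `DECODE_PAD` bytes in the allocation, the `memcpy()` in the decoder would read out of bounds for any value shorter than 8 bytes, which is the kind of access a sanitizer reports even though the extra bytes never affect the result.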
* bcachefs: Mark inode errors as autofix | Kent Overstreet | 2024-09-28 | 1 | -16/+16
  Most or all errors will be autofix in the future, we're currently just doing the ones that we know are well tested.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix infinite loop in propagate_key_to_snapshot_leaves() | Kent Overstreet | 2024-09-24 | 1 | -0/+1
  As we iterate we need to mark that we no longer need iterators - otherwise we'll infinite loop via the "too many iters" check when there's many snapshots.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Ensure BCH_FS_accounting_replay_done is always set | Kent Overstreet | 2024-09-24 | 1 | -0/+3
  if it doesn't get set we'll never be able to flush the btree write buffer; this only happens in fake rw mode, but prevents us from shutting down.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Hold read lock in bch2_snapshot_tree_oldest_subvol() | Ahmed Ehab | 2024-09-21 | 1 | -0/+2
  Syzbot reports a problem that a warning is triggered due to suspicious use of rcu_dereference_check(). That is triggered by a call of bch2_snapshot_tree_oldest_subvol().
  The cause of the warning is that inside bch2_snapshot_tree_oldest_subvol(), snapshot_t() is called which calls rcu_dereference() that requires a read lock to be held. Also, the call of bch2_snapshot_tree_next() eventually calls snapshot_t().
  To fix this, call rcu_read_lock() before calling snapshot_t(). Then, release the lock after the termination of the while loop.
  Reported-by: <syzbot+f7c41a878676b72c16a6@syzkaller.appspotmail.com>
  Signed-off-by: Ahmed Ehab <bottaawesome633@gmail.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
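The shape of this fix (take the read lock before the loop, drop it after the loop ends) can be modeled in userspace. This is only a toy: real RCU is not implemented here, a depth counter stands in for the read-side critical section, and a warning counter stands in for the lockdep splat. All `toy_` names and the snapshot table are hypothetical.

```c
#include <assert.h>

static int read_lock_depth;	/* models being inside a read-side section */
static int warnings;		/* models the "suspicious usage" report */

static void toy_rcu_read_lock(void)   { read_lock_depth++; }
static void toy_rcu_read_unlock(void) { read_lock_depth--; }

struct toy_snapshot {
	unsigned parent;	/* 0 means no parent */
	unsigned subvol;	/* 0 means no subvolume */
};

/* Index 0 is unused; node 3 -> node 2 -> node 1 (root). */
static const struct toy_snapshot toy_table[] = {
	{ 0, 0 }, { 0, 5 }, { 1, 3 }, { 2, 9 },
};

/* Stand-in for a lookup that dereferences RCU-protected data. */
static const struct toy_snapshot *toy_snapshot_t(unsigned id)
{
	if (!read_lock_depth)
		warnings++;
	return &toy_table[id];
}

/* Walk towards the root, returning the smallest nonzero subvol seen. */
static unsigned toy_oldest_subvol_unlocked(unsigned id)
{
	unsigned best = 0;

	while (id) {
		const struct toy_snapshot *s = toy_snapshot_t(id);

		if (s->subvol && (!best || s->subvol < best))
			best = s->subvol;
		id = s->parent;
	}
	return best;
}

/* Fixed form: the whole loop runs under the read lock. */
static unsigned toy_oldest_subvol(unsigned id)
{
	unsigned ret;

	toy_rcu_read_lock();
	ret = toy_oldest_subvol_unlocked(id);
	toy_rcu_read_unlock();
	return ret;
}
```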
* bcachefs: return err ptr instead of null in read sb clean | Diogo Jahchan Koike | 2024-09-21 | 1 | -1/+1
  syzbot reported a null-ptr-deref in bch2_fs_start. [0]
  When a sb is marked clean but doesn't have a clean section, bch2_read_superblock_clean returns NULL, which PTR_ERR_OR_ZERO lets through, eventually leading to a null ptr dereference down the line. Adjust read sb clean to return an ERR_PTR indicating the invalid clean section.
  [0] https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
  Reported-by: syzbot+1cecc37d87c4286e5543@syzkaller.appspotmail.com
  Closes: https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
  Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
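The reason NULL slips through can be seen with userspace copies of the kernel's error-pointer helpers. In this sketch `read_clean_old()` and `read_clean_new()` are hypothetical stand-ins for the function before and after the change, and `-EINVAL` is an assumed placeholder for whatever error code the real fix returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* NULL is not an error pointer, so it yields 0 ("success"). */
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

static int dummy_clean_section;

/* Old behaviour: a missing section is reported as NULL. */
static void *read_clean_old(bool has_section)
{
	return has_section ? (void *)&dummy_clean_section : NULL;
}

/* New behaviour: a missing section is reported as an error pointer. */
static void *read_clean_new(bool has_section)
{
	return has_section ? (void *)&dummy_clean_section : ERR_PTR(-EINVAL);
}
```

With the old form, a caller that checks only `PTR_ERR_OR_ZERO()` sees 0 for the missing section and goes on to dereference NULL; with the new form the same check returns a negative error.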
* bcachefs: Remove duplicated include in backpointers.c | Yang Li | 2024-09-21 | 1 | -1/+0
  The header file bbpos.h is included twice in backpointers.c, so one of the inclusions can be removed.
  Reported-by: Abaci Robot <abaci@linux.alibaba.com>
  Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10783
  Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't drop devices with stripe pointers | Kent Overstreet | 2024-09-21 | 4 | -9/+32
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices | Kent Overstreet | 2024-09-21 | 2 | -27/+60
  This factors out ec_stripe_head_devs_update(), which initializes the bitmap of devices we're allocating from, and runs it every time c->rw_devs_change_count changes.
  We also cancel pending, not allocated stripes, since they may refer to devices that are no longer available.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_fs.rw_devs_change_count | Kent Overstreet | 2024-09-21 | 2 | -4/+9
  Add a counter that's incremented whenever rw devices change; this will be used for erasure coding so that it can keep ec_stripe_head in sync and not deadlock on a new stripe when a device it wants goes away.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
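A change counter of this kind is a common way to keep a cached view in sync without holding a lock on the source. The sketch below shows the general idea only and is not the bcachefs code: `struct toy_fs`, `struct toy_stripe_head`, `toy_set_dev_rw()` and `toy_head_get()` are hypothetical, and the device set is reduced to a 64-bit bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_fs {
	unsigned rw_devs_change_count;	/* bumped on every rw-set change */
	uint64_t rw_devs;		/* bitmap of read-write devices */
};

struct toy_stripe_head {
	unsigned seen_change_count;	/* counter value at last refresh */
	uint64_t devs;			/* cached bitmap to allocate from */
	unsigned nr_refreshes;		/* for illustration only */
};

/* Change a device's rw state; bump the counter only on a real change. */
static void toy_set_dev_rw(struct toy_fs *c, unsigned dev, bool rw)
{
	uint64_t bit = UINT64_C(1) << dev;
	uint64_t updated = rw ? (c->rw_devs | bit) : (c->rw_devs & ~bit);

	if (updated != c->rw_devs) {
		c->rw_devs = updated;
		c->rw_devs_change_count++;
	}
}

/* On each "get", refresh the cached bitmap if the counter has moved. */
static void toy_head_get(const struct toy_fs *c, struct toy_stripe_head *h)
{
	if (h->seen_change_count != c->rw_devs_change_count) {
		h->devs = c->rw_devs;
		h->seen_change_count = c->rw_devs_change_count;
		h->nr_refreshes++;
	}
}
```

Comparing one integer is cheap enough to do on every call, and a head that notices the change can stop waiting on a device that has gone away.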
* bcachefs: bch2_dev_remove_stripes() | Kent Overstreet | 2024-09-21 | 4 | -3/+74
  We can now correctly force-remove a device that has stripes on it; this uses the new BCH_SB_MEMBER_INVALID sentinel value.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trigger_ptr() calculates sectors even when no device | Kent Overstreet | 2024-09-21 | 2 | -10/+21
  This is necessary for erasure coded pointers to devices that have been removed.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: improve error messages in bch2_ec_read_extent() | Kent Overstreet | 2024-09-21 | 3 | -19/+23
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: improve error message on too few devices for ec | Kent Overstreet | 2024-09-21 | 1 | -3/+16
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: improve bch2_new_stripe_to_text() | Kent Overstreet | 2024-09-21 | 1 | -0/+2
  also print out the new stripe key
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: ec_stripe_head.nr_created | Kent Overstreet | 2024-09-21 | 2 | -2/+6
  additional debug stat
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_stripe.disk_label | Kent Overstreet | 2024-09-21 | 4 | -16/+43
  When reshaping existing stripes, we should keep them on the same target that they were allocated on; to do this, we need to add a field to the btree stripe type.
  This is a tad awkward, because we only have 8 bits left, and targets are 16 bits - but we only need to store a label, not a full target.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: stripe_to_mem() | Kent Overstreet | 2024-09-21 | 1 | -18/+15
  factor out a common helper
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: EIO errcode cleanup | Kent Overstreet | 2024-09-21 | 5 | -27/+33
  We want to be using private errcodes whenever possible, for better error messages.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rework btree node pinning | Kent Overstreet | 2024-09-21 | 7 | -75/+150
  In backpointers fsck, we do a sequential scan of one btree, and check references to another: extents <-> backpointers
  Checking references generates random lookups, so we want to pin that btree in memory (or only a range, if it doesn't fit in ram).
  Previously, this was done with a simple check in the shrinker - "if btree node is in range being pinned, don't free it" - but this generated OOMs, as our shrinker wasn't well behaved if there was less memory available than expected.
  Instead, we now have two different shrinkers and lru lists; the second shrinker being for pinned nodes, with seeks set much higher than normal - so they can still be freed if necessary, but we'll prefer not to.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: split up btree cache counters for live, freeable | Kent Overstreet | 2024-09-21 | 6 | -32/+47
  this is prep for introducing a second live list and shrinker for pinned nodes
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree cache counters should be size_t | Kent Overstreet | 2024-09-21 | 6 | -36/+37
  32 bits won't overflow any time soon, but size_t is the correct type for counting objects in memory.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't count "skipped access bit" as touched in btree cache scan | Kent Overstreet | 2024-09-21 | 1 | -0/+1
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Failed devices no longer require mounting in degraded mode | Kent Overstreet | 2024-09-21 | 1 | -1/+1
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_dev_rcu_noerror() | Kent Overstreet | 2024-09-21 | 6 | -13/+22
  bch2_dev_rcu() now properly errors if the device is invalid
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Progress indicator for extents_to_backpointers | Kent Overstreet | 2024-09-21 | 1 | -6/+82
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_opts_to_text() | Kent Overstreet | 2024-09-21 | 3 | -21/+35
  Factor out bch2_show_options() into a generic helper, for debugging option passing issues.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: improve "no device to read from" message | Kent Overstreet | 2024-09-21 | 1 | -1/+7
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix compilation error for bch2_sb_member_alloc | Hongbo Li | 2024-09-21 | 1 | -6/+10
  Fix the following compilation error:
  ```
  fs/bcachefs/sb-members.c: In function ‘bch2_sb_member_alloc’:
  fs/bcachefs/sb-members.c:508:2: error: a label can only be part of a statement and a declaration is not a statement
    508 | unsigned nr_devices = max_t(unsigned, dev_idx + 1, c->sb.nr_devices);
  ```
  Fixes: a7d364a133c7 ("bcachefs: bch2_sb_member_alloc()")
  Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
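Before C23, a label must be followed by a statement, and a declaration is not a statement, which is what the compiler is complaining about here. The sketch below shows one common workaround, an empty statement after the label; it is not necessarily what the actual fix does (moving the declaration to the top of the function or wrapping it in braces works as well). `toy_alloc_slot()` and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Pick a device slot: reuse 'preferred' if it is non-negative, otherwise
 * append a new slot. Returns the resulting number of devices.
 */
static unsigned toy_alloc_slot(int preferred, unsigned nr_devices)
{
	unsigned dev_idx;

	if (preferred >= 0) {
		dev_idx = (unsigned)preferred;
		goto have_slot;
	}

	dev_idx = nr_devices;
have_slot:
	;	/* empty statement: the label now precedes a statement */
	unsigned new_nr = dev_idx + 1 > nr_devices ? dev_idx + 1 : nr_devices;

	return new_nr;
}
```

Removing the lone `;` reproduces the error quoted in the message on compilers that do not accept the C23 relaxation.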
* bcachefs: bch2_sb_member_alloc() | Kent Overstreet | 2024-09-21 | 3 | -46/+53
  refactoring
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_dev_remove_alloc() -> alloc_background.c | Kent Overstreet | 2024-09-21 | 3 | -27/+30
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move tabstop setup to bch2_dev_usage_to_text() | Kent Overstreet | 2024-09-21 | 2 | -7/+9
  No reason for it not to be where it's needed.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Options for recovery_passes, recovery_passes_exclude | Kent Overstreet | 2024-09-21 | 8 | -20/+33
  This adds mount options for specifying recovery passes to run, or exclude; the immediate need for this is that backpointers fsck is having trouble completing, so we need a way to skip it.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use mm_account_reclaimed_pages() when freeing btree nodes | Kent Overstreet | 2024-09-21 | 1 | -0/+11
  When freeing in a shrinker callback, we need to notify memory reclaim, so it knows forward progress has been made.
  Normally this is done in e.g. slab code, but we're not freeing through slab - or rather we are, but these allocations are big, and use the kmalloc_large() path.
  This is really a bug in the slub code, but we're working around it here for now.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use __GFP_ACCOUNT for reclaimable memory | Kent Overstreet | 2024-09-21 | 2 | -0/+4
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Hook up RENAME_WHITEOUT in rename. | Sasha Finkelstein | 2024-09-21 | 4 | -14/+52
  This is needed for overlayfs, which is used by container managers.
  Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: rebalance writes use BCH_WRITE_ONLY_SPECIFIED_DEVS | Kent Overstreet | 2024-09-21 | 2 | -2/+3
  this was an oversight: rebalance is moving data to a specific device, so we don't want it falling back to the full filesystem
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BCH_WRITE_ALLOC_NOWAIT no longer applies to open bucket allocation | Kent Overstreet | 2024-09-21 | 3 | -12/+16
  rebalance writes must be BCH_WRITE_ALLOC_NOWAIT because they don't allocate from the full filesystem - but we don't want spurious allocation failures due to open buckets.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix prototype to bch2_alloc_sectors_start_trans() | Kent Overstreet | 2024-09-21 | 4 | -17/+18
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>