summaryrefslogtreecommitdiffstats
path: root/mm/sparse.c (unfollow)
Commit message (Collapse)AuthorFilesLines
7 daysPCI: rockchip-ep: Fix address translation unit programmingDamien Le Moal2-3/+17
The Rockchip PCIe endpoint controller handles PCIe transfers addresses by masking the lower bits of the programmed PCI address and using the same number of lower bits masked from the CPU address space used for the mapping. For a PCI mapping of <size> bytes starting from <pci_addr>, the number of bits masked is the number of address bits changing in the address range [pci_addr..pci_addr + size - 1]. However, rockchip_pcie_prog_ep_ob_atu() calculates num_pass_bits only using the size of the mapping, resulting in an incorrect number of mask bits depending on the value of the PCI address to map. Fix this by introducing the helper function rockchip_pcie_ep_ob_atu_num_bits() to correctly calculate the number of mask bits to use to program the address translation unit. The number of mask bits is calculated depending on both the PCI address and size of the mapping, and clamped between 8 and 20 using the macros ROCKCHIP_PCIE_AT_MIN_NUM_BITS and ROCKCHIP_PCIE_AT_MAX_NUM_BITS. As defined in the Rockchip RK3399 TRM V1.3 Part2, Sections 17.5.5.1.1 and 17.6.8.2.1, this clamping is necessary because: 1) The lower 8 bits of the PCI address to be mapped by the outbound region are ignored. So a minimum of 8 address bits are needed and imply that the PCI address must be aligned to 256. 2) The outbound memory regions are 1MB in size. So while we can specify up to 63-bits for the PCI address (num_bits filed uses bits 0 to 5 of the outbound address region 0 register), we must limit the number of valid address bits to 20 to match the memory window maximum size (1 << 20 = 1MB). Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller") Link: https://lore.kernel.org/r/20241017015849.190271-2-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org
14 daysPCI: endpoint: Fix pci_epc_map map_size kerneldoc stringRick Wertenbroek1-1/+1
Because some endpoint controllers have requirements on the alignment of the controller physical memory address that must be used to map a RC PCI address region, the map PCI start address is not necessarily the desired PCI base address to be mapped. This can result in map_pci_addr being lower than pci_addr as documented. This results in map_size covering the range map_pci_addr..pci_addr+pci_size. The old text had the pci_addr twice instead of map_pci_addr..pci_addr, so replace the erroneous kerneldoc string to reflect the actual range. Link: https://lore.kernel.org/r/20241114161032.3046202-1-rick.wertenbroek@gmail.com Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
14 daysPCI: endpoint: Clear secondary (not primary) EPC in pci_epc_remove_epf()Zijun Hu1-3/+3
In addition to a primary endpoint controller, an endpoint function may be associated with a secondary endpoint controller, epf->sec_epc, to provide NTB (non-transparent bridge) functionality. Previously, pci_epc_remove_epf() incorrectly cleared epf->epc instead of epf->sec_epc when removing from the secondary endpoint controller. Extend the epc->list_lock coverage and clear either epf->epc or epf->sec_epc as indicated. Link: https://lore.kernel.org/r/20241107-epc_rfc-v2-2-da5b6a99a66f@quicinc.com Fixes: 63840ff53223 ("PCI: endpoint: Add support to associate secondary EPC with EPF") Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [mani: reworded subject and description] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org
14 daysPCI: endpoint: Fix PCI domain ID release in pci_epc_destroy()Zijun Hu1-3/+2
pci_epc_destroy() invokes pci_bus_release_domain_nr() to release the PCI domain ID, but there are two issues: - 'epc->dev' is passed to pci_bus_release_domain_nr() which was already freed by device_unregister(), leading to a use-after-free issue. - Domain ID corresponds to the EPC device parent, so passing 'epc->dev' is also wrong. Fix these issues by passing 'epc->dev.parent' to pci_bus_release_domain_nr() and also do it before device_unregister(). Fixes: 0328947c5032 ("PCI: endpoint: Assign PCI domain number for endpoint controllers") Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20241107-epc_rfc-v2-1-da5b6a99a66f@quicinc.com [mani: reworded subject and description] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org
2024-11-16PCI: endpoint: epf-mhi: Avoid NULL dereference if DT lacks 'mmio'Zhongqiu Han1-0/+6
If platform_get_resource_byname() fails and returns NULL because DT lacks an 'mmio' property for the MHI endpoint, dereferencing res->start will cause a NULL pointer access. Add a check to prevent it. Fixes: 1bf5f25324f7 ("PCI: endpoint: Add PCI Endpoint function driver for MHI bus") Link: https://lore.kernel.org/r/20241105120735.1240728-1-quic_zhonhan@quicinc.com Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> [kwilczynski: error message update per the review feedback] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org>
2024-11-16PCI: endpoint: Remove surplus return statement from ↵Wang Jiang1-2/+0
pci_epf_test_clean_dma_chan() Remove a surplus return statement from the void function that has been added in the commit commit 8353813c88ef ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities"). Especially, as an empty return statements at the end of a void functions serve little purpose. This fixes the following checkpatch.pl script warning: WARNING: void function return statements are not generally useful #296: FILE: drivers/pci/endpoint/functions/pci-epf-test.c:296: + return; +} Link: https://lore.kernel.org/r/tencent_F250BEE2A65745A524E2EFE70CF615CA8F06@qq.com Signed-off-by: Wang Jiang <jiangwang@kylinos.cn> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2024-11-16PCI: dwc: ep: Use align addr function for dw_pcie_ep_raise_{msi,msix}_irq()Niklas Cassel1-10/+10
Use the dw_pcie_ep_align_addr() function to calculate the alignment in dw_pcie_ep_raise_{msi,msix}_irq() instead of open coding the same. Link: https://lore.kernel.org/r/20241017132052.4014605-6-cassel@kernel.org Link: https://lore.kernel.org/r/20241104205144.409236-2-cassel@kernel.org Tested-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> [kwilczynski: squashed patch that fixes memory map sizes] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-11-04PCI: endpoint: test: Synchronously cancel command handler workDamien Le Moal1-2/+2
Use cancel_delayed_work_sync() in pci_epf_test_epc_deinit() to ensure that the command handler is really stopped before proceeding with DMA and BAR cleanup. The same change is also done in pci_epf_test_link_down() to ensure that the link down handling completes with the command handler fully stopped. Link: https://lore.kernel.org/r/20241017010648.189889-1-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com>
2024-11-04PCI: dwc: endpoint: Implement the pci_epc_ops::align_addr() operationDamien Le Moal1-0/+15
The function dw_pcie_prog_outbound_atu() used to program outbound ATU entries for mapping RC PCI addresses to local CPU addresses does not allow PCI addresses that are not aligned to the value of region_align of struct dw_pcie. This value is determined from the iATU hardware registers during probing of the iATU (done by dw_pcie_iatu_detect()). This value is thus valid for all DWC PCIe controllers, and valid regardless of the hardware configuration used when synthesizing the DWC PCIe controller. Implement the ->align_addr() endpoint controller operation to allow this mapping alignment to be transparently handled by endpoint function drivers through the function pci_epc_mem_map(). Link: https://lore.kernel.org/linux-pci/20241012113246.95634-7-dlemoal@kernel.org Link: https://lore.kernel.org/linux-pci/20241015090712.112674-1-dlemoal@kernel.org Link: https://lore.kernel.org/linux-pci/20241017132052.4014605-5-cassel@kernel.org Co-developed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> [mani: squashed the patch that changed phy_addr_t to u64] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [kwilczynski: squashed patch that updated the pci_size variable] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-10-16PCI: endpoint: test: Use pci_epc_mem_map/unmap()Damien Le Moal1-179/+193
Modify the endpoint test driver to use the functions pci_epc_mem_map() and pci_epc_mem_unmap() for the read, write and copy tests. For each test case, the transfer (dma or mmio) are executed in a loop to ensure that potentially partial mappings returned by pci_epc_mem_map() are correctly handled. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20241012113246.95634-6-dlemoal@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-10-16PCI: endpoint: Update documentationDamien Le Moal1-0/+29
Document the new functions pci_epc_mem_map() and pci_epc_mem_unmap(). Also add the documentation for the functions pci_epc_map_addr() and pci_epc_unmap_addr() that were missing. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20241012113246.95634-5-dlemoal@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-10-16PCI: endpoint: Introduce pci_epc_mem_map()/unmap()Damien Le Moal2-0/+141
Some endpoint controllers have requirements on the alignment of the controller physical memory address that must be used to map a RC PCI address region. For instance, the endpoint controller of the RK3399 SoC uses at most the lower 20 bits of a physical memory address region as the lower bits of a RC PCI address region. For mapping a PCI address region of size bytes starting from pci_addr, the exact number of address bits used is the number of address bits changing in the address range [pci_addr..pci_addr + size - 1]. For this example, this creates the following constraints: 1) The offset into the controller physical memory allocated for a mapping depends on the mapping size *and* the starting PCI address for the mapping. 2) A mapping size cannot exceed the controller windows size (1MB) minus the offset needed into the allocated physical memory, which can end up being a smaller size than the desired mapping size. Handling these constraints independently of the controller being used in an endpoint function driver is not possible with the current EPC API as only the ->align field in struct pci_epc_features is provided but used for BAR (inbound ATU mappings) mapping only. A new API is needed for function drivers to discover mapping constraints and handle non-static requirements based on the RC PCI address range to access. Introduce the endpoint controller operation ->align_addr() to allow the EPC core functions to obtain the size and the offset into a controller address region that must be allocated and mapped to access a RC PCI address region. The size of the mapping provided by the align_addr() operation can then be used as the size argument for the function pci_epc_mem_alloc_addr() and the offset into the allocated controller memory provided can be used to correctly handle data transfers. For endpoint controllers that have PCI address alignment constraints, the align_addr() operation may indicate upon return an effective PCI address mapping size that is smaller (but not 0) than the requested PCI address region size. The controller ->align_addr() operation is optional: controllers that do not have any alignment constraints for mapping RC PCI address regions do not need to implement this operation. For such controllers, it is always assumed that the mapping size is equal to the requested size of the PCI region and that the mapping offset is 0. The function pci_epc_mem_map() is introduced to use this new controller operation (if it is defined) to handle controller memory allocation and mapping to a RC PCI address region in endpoint function drivers. This function first uses the ->align_addr() controller operation to determine the controller memory address size (and offset into) needed for mapping an RC PCI address region. The result of this operation is used to allocate a controller physical memory region using pci_epc_mem_alloc_addr() and then to map that memory to the RC PCI address space with pci_epc_map_addr(). Since ->align_addr() () may indicate that not all of a RC PCI address region can be mapped, pci_epc_mem_map() may only partially map the RC PCI address region specified. It is the responsibility of the caller (an endpoint function driver) to handle such smaller mapping by repeatedly using pci_epc_mem_map() over the desried PCI address range. The counterpart of pci_epc_mem_map() to unmap and free a mapped controller memory address region is pci_epc_mem_unmap(). Both functions operate using the new struct pci_epc_map data structure. This new structure represents a mapping PCI address, mapping effective size, the size of the controller memory needed for the mapping as well as the physical and virtual CPU addresses of the mapping (phys_base and virt_base fields). For convenience, the physical and virtual CPU addresses within that mapping to use to access the target RC PCI address region are also provided (phys_addr and virt_addr fields). Endpoint function drivers can use struct pci_epc_map to access the mapped RC PCI address region using the ->virt_addr and ->pci_size fields. Co-developed-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20241012113246.95634-4-dlemoal@kernel.org [mani: squashed the patch that changed phy_addr_t to u64] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-10-12PCI: endpoint: Improve pci_epc_mem_alloc_addr()Damien Le Moal1-3/+6
There is no point in attempting to allocate memory from an endpoint controller memory window if the requested size is larger than the memory window size. Add a check to skip bitmap_find_free_region() calls for such case. This check can be done without the mem->lock mutex held as memory window sizes are constant and never modified at runtime. Also change the final return to return NULL to simplify the code. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20241012113246.95634-3-dlemoal@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-10-12PCI: endpoint: Introduce pci_epc_function_is_valid()Damien Le Moal1-48/+31
Introduce the epc core helper function pci_epc_function_is_valid() to verify that an epc pointer, a physical function number and a virtual function number are all valid. This avoids repeating the code pattern: if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions) return err; if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no])) return err; in many functions of the endpoint controller core code. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://lore.kernel.org/r/20241012113246.95634-2-dlemoal@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-09-30Linux 6.12-rc1v6.12-rc1Linus Torvalds1-2/+2
2024-09-29x86: kvm: fix build errorLinus Torvalds1-0/+2
The cpu_emergency_register_virt_callback() function is used unconditionally by the x86 kvm code, but it is declared (and defined) conditionally: #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD) void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback); ... leading to a build error when neither KVM_INTEL nor KVM_AMD support is enabled: arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’: arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration] 12517 | cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’: arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration] 12522 | cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix the build by defining empty helper functions the same way the old cpu_emergency_disable_virtualization() function was dealt with for the same situation. Maybe we could instead have made the call sites conditional, since the callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak fallback. I'll leave that to the kvm people to argue about, this at least gets the build going for that particular config. Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Kai Huang <kai.huang@intel.com> Cc: Chao Gao <chao.gao@intel.com> Cc: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-28Reduce Coccinelle choices in string_choices.cocciJulia Lawall1-50/+41
The isomorphism neg_if_exp negates the test of a ?: conditional, making it unnecessary to have an explicit case for a negated test with the branches inverted. At the same time, we can disable neg_if_exp in cases where a different API function may be more suitable for a negated test. Finally, in the non-patch cases, E matches an expression with parentheses around it, so there is no need to mention () explicitly in the pattern. The () are still needed in the patch cases, because we want to drop them, if they are present. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Remove unnecessary parentheses for only one possible change.Hongbo Li1-8/+0
The parentheses are only needed if there is a disjunction, ie a set of possible changes. If there is only one pattern, we can remove these parentheses. Just like the format: - x + y not: ( - x + y ) Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Add rules to find str_yes_no() replacementsHongbo Li1-0/+19
As other rules done, we add rules for str_yes_no() to check the relative opportunities. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Add rules to find str_on_off() replacementsHongbo Li1-0/+19
As other rules done, we add rules for str_on_off() to check the relative opportunities. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Add rules to find str_write_read() replacementsHongbo Li1-0/+19
As other rules done, we add rules for str_write_read() to check the relative opportunities. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Add rules to find str_read_write() replacementsHongbo Li1-0/+19
As other rules done, we add rules for str_read_write() to check the relative opportunities. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Add rules to find str_enable{d}_disable{d}() replacementsHongbo Li1-0/+38
As other rules done, we add rules for str_enable{d}_ disable{d}() to check the relative opportunities. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Add rules to find str_lo{w}_hi{gh}() replacementsHongbo Li1-0/+38
As other rules done, we add rules for str_lo{w}_hi{gh}() to check the relative opportunities. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Add rules to find str_hi{gh}_lo{w}() replacementsHongbo Li1-0/+42
As other rules done, we add rules for str_hi{gh}_lo{w}() to check the relative opportunities. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Add rules to find str_false_true() replacementsHongbo Li1-0/+19
As done with str_true_false(), add checks for str_false_true() opportunities. A simple test can find over 9 cases currently exist in the tree. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28coccinelle: Add rules to find str_true_false() replacementsHongbo Li1-0/+19
After str_true_false() has been introduced in the tree, we can add rules for finding places where str_true_false() can be used. A simple test can find over 10 locations. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-09-28bcachefs: check_subvol_path() now prints subvol root inodeKent Overstreet2-20/+14
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: remove_backpointer() now checks if dirent points to inodeKent Overstreet1-6/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: dirent_points_to_inode() now warns on mismatchKent Overstreet1-28/+56
if an inode backpointer points to a dirent that doesn't point back, that's an error we should warn about. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Fix lost wake upAlan Huang1-3/+9
If the reader acquires the read lock and then the writer enters the slow path, while the reader proceeds to the unlock path, the following scenario can occur without the change: writer: pcpu_read_count(lock) return 1 (so __do_six_trylock will return 0) reader: this_cpu_dec(*lock->readers) reader: smp_mb() reader: state = atomic_read(&lock->state) (there is no waiting flag set) writer: six_set_bitmask() then the writer will sleep forever. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Check for logged ops when cleanKent Overstreet2-3/+13
If we shut down successfully, there shouldn't be any logged ops to resume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: BCH_FS_clean_recoveryKent Overstreet4-3/+8
Add a filesystem flag to indicate whether we did a clean recovery - using c->sb.clean after we've got rw is incorrect, since c->sb is updated whenever we write the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Convert disk accounting BUG_ON() to WARN_ON()Kent Overstreet1-1/+1
We had a bug where disk accounting keys didn't always have their version field set in journal replay; change the BUG_ON() to a WARN(), and exclude this case since it's now checked for elsewhere (in the bkey validate function). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_applyKent Overstreet1-16/+20
This was added to avoid double-counting accounting keys in journal replay. But applied incorrectly (easily done since it applies to the transaction commit, not a particular update), it leads to skipping in-mem accounting for real accounting updates, and failure to give them a version number - which leads to journal replay becoming very confused the next time around. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Check for accounting keys with bversion=0Kent Overstreet3-3/+8
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: rename version -> bversionKent Overstreet17-30/+30
give bversions a more distinct name, to aid in grepping Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Don't delete unlinked inodes before logged op resumeKent Overstreet4-21/+36
Previously, check_inode() would delete unlinked inodes if they weren't on the deleted list - this code dating from before there was a deleted list. But, if we crash during a logged op (truncate or finsert/fcollapse) of an unlinked file, logged op resume will get confused if the inode has already been deleted - instead, just add it to the deleted list if it needs to be there; delete_dead_inodes runs after logged op resume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Fix BCH_SB_ERRS() so we can reorderKent Overstreet5-8/+9
BCH_SB_ERRS() has a field for the actual enum val so that we can reorder to reorganize, but the way BCH_SB_ERR_MAX was defined didn't allow for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Fix fsck warnings from bkey validationKent Overstreet2-2/+14
__bch2_fsck_err() warns if the current task has a btree_trans object and it wasn't passed in, because if it has to prompt for user input it has to be able to unlock it. But plumbing the btree_trans through bkey_validate(), as well as transaction restarts, is problematic - so instead make bkey fsck errors FSCK_AUTOFIX, which doesn't need to warn. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Move transaction commit path validation to as late as possibleKent Overstreet1-34/+34
In order to check for accounting keys with version=0, we need to run validation after they've been assigned version numbers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Fix disk accounting attempting to mark invalid replicas entryKent Overstreet1-3/+18
This fixes the following bug, where a disk accounting key has an invalid replicas entry, and we attempt to add it to the superblock: bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): starting version 1.12: rebalance_work_acct_fix opts=metadata_replicas=2,data_replicas=2,foreground_target=ssd,background_target=hdd,nopromote_whole_extents,verbose,fsck,fix_errors=yes bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): recovering from clean shutdown, journal seq 15211644 bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): accounting_read... accounting not marked in superblock replicas replicas cached: 1/1 [0], fixing bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0] replicas_v0 (size 88): user: 2 [3 5] user: 2 [1 4] cached: 1 [2] btree: 2 [1 2] user: 2 [2 5] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 2 [1 2] user: 2 [2 3] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 3] user: 2 [1 5] user: 2 [2 4] bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): inconsistency detected - emergency read only at journal seq 15211644 accounting not marked in superblock replicas replicas user: 1/1 [3], fixing bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0] replicas_v0 (size 96): user: 2 [3 5] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5] accounting not marked in superblock replicas replicas user: 1/2 [3 7], fixing bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 7 in entry user: 1/2 [3 7] replicas_v0 (size 96): user: 2 [3 7] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5] user: 2 [3 5] done bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): alloc_read... done bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): stripes_read... done bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): snapshots_read... done Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Fix unlocked access to c->disk_sb.sb in bch2_replicas_entry_validate()Kent Overstreet3-6/+16
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Fix accounting read + device removalKent Overstreet1-18/+29
accounting read was checking if accounting replicas entries were marked in the superblock prior to applying accounting from the journal, which meant that a recently removed device could spuriously trigger a "not marked in superblocked" error (when journal entries zero out the offending counter). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: bch_accounting_modeKent Overstreet3-13/+26
Minor refactoring - replace multiple bool arguments with an enum; prep work for fixing a bug in accounting read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: fix transaction restart handling in check_extents(), check_dirents()Kent Overstreet1-39/+55
Dealing with outside state within a btree transaction is always tricky. check_extents() and check_dirents() have to accumulate counters for i_sectors and i_nlink (for subdirectories). There were two bugs: - transaction commit may return a restart; therefore we have to commit before accumulating to those counters - get_inode_all_snapshots() may return a transaction restart, before updating w->last_pos; then, on the restart, check_i_sectors()/check_subdir_count() would see inodes that were not for w->last_pos Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: kill inode_walker_entry.seen_this_posKent Overstreet1-6/+0
dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: Fix incorrect IS_ERR_OR_NULL usageKent Overstreet1-1/+1
Returning a positive integer instead of an error code causes error paths to become very confused. Closes: syzbot+c0360e8367d6d8d04a66@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: fix the memory leak in exception caseHongbo Li1-0/+1
The pointer clean points the memory allocated by kmemdup, when the return value of bch2_sb_clean_validate_late is not zero. The memory pointed by clean is leaked. So we should free it in this case. Fixes: a37ad1a3aba9 ("bcachefs: sb-clean.c") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28bcachefs: fast exit when darray_make_room failedHongbo Li1-1/+3
In downgrade_table_extra, the return value is needed. When it return failed, we should exit immediately. Fixes: 7773df19c35f ("bcachefs: metadata version bucket_stripe_sectors") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>