summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* spi: axi-spi-engine: move msg finalization out of irq handlerDavid Lechner2024-02-071-25/+15
| | | | | | | | | | | | | | | | | | | | | | As a general principal, it is best to do as little as possible in an interrupt handler. This patch reworks the AXI SPI Engine driver to move timer_delete_sync() and spi_finalize_current_message() out of the interrupt handler. Instead, spi_finalize_current_message() is moved to the transfer_one_message function (similar to nearly all other SPI controllers). A completion is now used to wait for the sync interrupt that indicates that the message is complete. The watchdog timer is no longer needed since we can use the wait_for_completion_timeout() function to wait for the message to complete with the same effect. As a bonus, these changes also improve throughput of the SPI controller. For example, this was tested on a ZynqMP with a 80MHz SCLK reading 4 byte samples from an ADC. The max measured throughput increased from 26k to 28k samples per second. Reviewed-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20240207-axi-spi-engine-round-2-1-v2-2-40c0b4e85352@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: axi-spi-engine: remove use of ida for sync idDavid Lechner2024-02-071-22/+6
| | | | | | | | | | | | | | | | | | | | | Profiling has shown that ida_alloc_range() accounts for about 10% of the time spent in spi_sync() when using the AXI SPI Engine controller. This call is used to create a unique id for each SPI message to match to an IRQ when the message is complete. Since the core SPI code serializes messages in a message queue, we can only have one message in flight at a time, namely host->cur_msg. This means that we can use a fixed value instead of a unique id for each message since there can never be more than one message pending at a time. This patch removes the use of ida for the sync id and replaces it with a constant value. This simplifies the driver and improves performance. Reviewed-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20240207-axi-spi-engine-round-2-1-v2-1-40c0b4e85352@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: drop gpf arg from __spi_split_transfer_maxsize()David Lechner2024-02-073-21/+11
| | | | | | | | | | | | | | | | The __spi_split_transfer_maxsize() function has a gpf argument to allow callers to specify the type of memory allocation that needs to be used. However, this function only allocates struct spi_transfer and is not intended to be used from atomic contexts so this type should always be GFP_KERNEL, so we can just drop the argument. Some callers of these functions also passed GFP_DMA, but since only struct spi_transfer is allocated and not any tx/rx buffers, this is not actually necessary and is removed in this commit. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20240206200648.1782234-1-dlechner@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: fsl-dspi: A couple of error handlingMark Brown2024-02-061-10/+5
|\ | | | | | | | | | | | | | | Merge series from andy.shevchenko@gmail.com: A couple of error handling improvements here: - unshadowing error code from dmaengine_slave_config() - making error messages uniform
| * spi: fsl-dspi: Unify error messaging in dspi_request_dma()andy.shevchenko@gmail.com2024-02-051-8/+5
| | | | | | | | | | | | | | | | | | | | | | Use dev_err_probe() for all messages in dspi_request_dma() for the sake of making them uniform. While at it, fix indentation issue reported by Vladimir Oltean. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20240204203127.1186621-3-andy.shevchenko@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
| * spi: fsl-dspi: Preserve error code returned by dmaengine_slave_config()andy.shevchenko@gmail.com2024-02-051-2/+0
| | | | | | | | | | | | | | | | | | | | | | dmaengine_slave_config() may return different error codes based on the circumstances. Preserve it instead of shadowing to -EINVAL. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20240204203127.1186621-2-andy.shevchenko@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
* | spi: spi-rockchip: Fix max_native_csLuis de Arquer2024-02-051-3/+2
| | | | | | | | | | | | Signed-off-by: Luis de Arquer <luis.dearquer@inertim.com> Link: https://lore.kernel.org/r/4d18808e85b85077761c5655083f20ebfd7d3770.camel@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
* | spi: spi-rockchip: Fix out of bounds array accessLuis de Arquer2024-02-051-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since spi-rockchip enables use_gpio_descriptors and the SPI_CONTROLLER_GPIO_SS flag, the spi subsytem may call set_cs() for spi devices with indexes above ROCKCHIP_SPI_MAX_CS_NUM Remove array cs_asserted[] which held a shadow copy of the state of the chip select lines with the only purpose of optimizing out rewriting a chip select line to the current state (no-op) This case is already handled by spi.c Signed-off-by: Luis de Arquer <luis.dearquer@inertim.com> Link: https://lore.kernel.org/r/d0a0c4b94f933f7f43973c34765214303ee82b77.camel@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
* | spi: intel: Keep the BIOS partition inside the first chipMika Westerberg2024-02-051-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there are two flash chips connected flash regions can refer to the second chip too. In this case we may see the following warning: mtd: partition "BIOS" extends beyond the end of device "0000:00:1f.5" -- size truncated to 0x400000 For this reason, check the BIOS partition size against the chip size and make sure it stays within the that. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/r/20240201121638.207632-2-mika.westerberg@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
* | spi: intel: Add default partition and name to the second chipMika Westerberg2024-02-051-1/+18
| | | | | | | | | | | | | | | | | | This should make it easier to identify the second chip and also allows using "mtdparts=" and the like with this chip too. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/r/20240201121638.207632-1-mika.westerberg@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
* | spi: axi-spi-engine: use common AXI macrosDavid Lechner2024-02-051-11/+6
| | | | | | | | | | | | | | | | | | | | This avoid duplicating the same macros in multiple drivers by reusing the common AXI macros for the version register. Signed-off-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240202213132.3863124-2-dlechner@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
* | spi: bcm2835: implement ctlr->max_transfer_sizeDavid Lechner2024-02-051-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The core SPI code will handle splitting transfers if needed as long as ctlr->max_transfer_size is implemented. It does this in __spi_pump_transfer_message() immediately before calling ctlr->prepare_message. So effectively, this change does not alter the behavior of the driver. Also, several peripheral drivers make use of spi_max_transfer_size(), so this should improve compatibility with those drivers. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20240126220024.3926403-2-dlechner@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
* | spi: move split xfers for CS_WORD emulationDavid Lechner2024-02-051-32/+31
|/ | | | | | | | | | | | | | | | | | | | | | | | This moves splitting transfers for CS_WORD software emulation to the same place where we split transfers for controller-specific reasons. This fixes a few subtle bugs. The calculation for maxsize was wrong for bit sizes between 17 and 24. This is fixed by making use of spi_split_transfers_maxwords() which already has the correct calculation. Also, since this indirectly calls spi_res_alloc(), to avoid leaking resources, spi_finalize_current_message() would need to be called on all error paths in __spi_validate() and callers of __spi_validate() would need to do the same. This is fixed by moving the call to __spi_pump_transfer_message() where it is already splitting transfers for other reasons and correctly releases resources in the subsequent error paths. Fixes: cbaa62e0094a ("spi: add software implementation for SPI_CS_WORD") Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20240126212358.3916280-2-dlechner@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: cs42l43: Clean up of firmware nodeCharles Keepax2024-02-021-6/+16
| | | | | | | | | | | As we get a child node in the OF case, we should also clean up the reference, add code to do so. Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://msgid.link/r/20240202103430.951598-1-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: reorder spi_message struct member doc commentsDavid Lechner2024-01-311-2/+2
| | | | | | | | | | | | The members of `struct spi_message` were reordered in commit ae2ade4ba581 ("spi: Reorder fields in 'struct spi_message'") but the documentation comments were not updated to match. This commit updates the comments to match the new order. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://msgid.link/r/20240131170732.1665105-1-dlechner@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: Kconfig: cap[c]abilityAhelenia Ziemiańska2024-01-291-1/+1
| | | | | | Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Link: https://msgid.link/r/lq6gstev3sd7i4iw2btiq3gg7lhsraj5w74fkbp6lpbl6nkyff@tarta.nabijaczleweli.xyz Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: avoid double validation in __spi_sync()David Lechner2024-01-261-52/+6
| | | | | | | | | | | | | | | | The __spi_sync() function calls __spi_validate() early in the function. Later, it can call spi_async_locked() which calls __spi_validate() again. __spi_validate() is an expensive function, so we can improve performance measurably by avoiding calling it twice. Instead of calling spi_async_locked(), we can call __spi_async() with the spin lock held. spi_async_locked() is removed since there are no more callers. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://msgid.link/r/20240125234732.3530278-2-dlechner@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: cs42l43: Handle error from devm_pm_runtime_enable()Charles Keepax2024-01-251-1/+4
| | | | | | | | | | | As it devm_pm_runtime_enable() can fail due to memory allocations, it is best to handle the error. Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://msgid.link/r/20240125103426.2622549-1-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
* arm64: exynos: Enable SPI for Exynos850Mark Brown2024-01-242-0/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge series from Sam Protsenko <semen.protsenko@linaro.org>: This series enables SPI for Exynos850 SoC, there are also some dependencies that will be needed at runtime which were sent as part of the same series but will be separately applied: 1. Enable PDMA, it's needed for SPI (dts, clk) 2. Propagate SPI src clock rate change up to DIV clocks, to make it possible to change SPI frequency (clk driver) 4. Add SPI nodes to Exynos850 SoC dtsi All SPI instances were tested using `spidev_test' tool in all 3 possible modes: - Polling mode: xfer_size <= 32 - IRQ mode: 64 >= xfer_size >= 32 - DMA mode: xfer_size > 64 with 200 kHz ... 49.9 MHz SPI frequencies. The next 3 approaches were used: 1. Software loopback ('-l' option for `spidev_test' tool) 2. Hardware loopback (by connecting MISO line to MOSI) 3. By communicating with ATMega found on Sensors Mezzanine board [1], programmed to act as an SPI device and all the transactions were additionally checked on my Logic Analyzer to make sure the SCK frequencies were actually correct. [1] https://www.96boards.org/product/sensors-mezzanine/
| * spi: dt-bindings: samsung: Add Exynos850 SPISam Protsenko2024-01-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Document samsung,exynos850-spi compatible which will be used on Exynos850 SoC. Exynos850 doesn't have ioclk, so only two clocks are needed (bus clock and functional SPI clock). Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://msgid.link/r/20240120012948.8836-3-semen.protsenko@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
| * spi: s3c64xx: Add Exynos850 supportSam Protsenko2024-01-241-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add SPI port configuration for Exynos850 SoC. It has 3 USI blocks which can be configured in SPI mode: * spi_0: BLK_PERI_SPI_0 (0x13940000) * spi_1: BLK_ALIVE_USI_CMGP00 (0x11d00000) * spi_2: BLK_ALIVE_USI_CMGP01 (0x11d20000) SPI FIFO depth is 64 bytes for all those SPI blocks, so the .fifo_lvl_mask value is set to 0x7f. All blocks have DIV_4 as the default internal clock divider, and an internal loopback mode to run a loopback test. Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://msgid.link/r/20240120012948.8836-6-semen.protsenko@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
* | spi: consolidate setting message->spiDavid Lechner2024-01-241-7/+2
|/ | | | | | | | | | | | | | | | Previously, __spi_sync() and __spi_async() set message->spi to the spi device independently after calling __spi_validate(). __spi_validate() also would conditionally set this if it needed to split the message since it wasn't set yet. Since both __spi_sync() and __spi_async() call __spi_validate(), we can consolidate this into only setting message->spi once (unconditionally) in __spi_validate(). This will also save any future callers of __spi_validate() from also needing to set message->spi. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://msgid.link/r/20240123214946.2616786-1-dlechner@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
* dt-bindings: spi: nxp-fspi: support i.MX93 and i.MX95Peng Fan2024-01-221-6/+12
| | | | | | | | | | | Add i.MX93/95 flexspi compatible strings, which are compatible with i.MX8MM Signed-off-by: Peng Fan <peng.fan@nxp.com> Acked-by: Han Xu <han.xu@nxp.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://msgid.link/r/20240122091510.2077498-2-peng.fan@oss.nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
* dt-bindings: spi: fsl-lpspi: support i.MX95 LPSPIPeng Fan2024-01-221-0/+1
| | | | | | | | | | Add i.MX95 LPSPI compatible string, same as i.MX93 compatible with i.MX7ULP Signed-off-by: Peng Fan <peng.fan@nxp.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://msgid.link/r/20240122091510.2077498-1-peng.fan@oss.nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: s3c64xx: Extract FIFO depth calculation to a dedicated macroSam Protsenko2024-01-221-7/+6
| | | | | | | | | | Simplify the code by extracting all cases of FIFO depth calculation into a dedicated macro. No functional change. Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Link: https://msgid.link/r/20240120170001.3356-1-semen.protsenko@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: nxp-fspi: Adjust LUT debug output alignmentAlexander Stein2024-01-221-1/+1
| | | | | | | | | Ensure the command value and LUT entry values have a fixed width. This way consecutive output lines can be read much easier. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Link: https://msgid.link/r/20240118121016.3734770-1-alexander.stein@ew.tq-group.com Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: make spi_bus_type constGreg Kroah-Hartman2024-01-222-2/+2
| | | | | | | | | | | | Now that the driver core can properly handle constant struct bus_type, move the spi_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Mark Brown <broonie@kernel.org> Cc: <linux-spi@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://msgid.link/r/2024010549-erasure-swoop-1cc6@gregkh Signed-off-by: Mark Brown <broonie@kernel.org>
* spi: spi-mt65xx: Support sleep pin controlRuihai Zhou2024-01-221-0/+5
| | | | | | | | | Supports configuring sleep pin control during system suspend to prevent potential power leakage and additional power consumption. Signed-off-by: Ruihai Zhou <zhouruihai@huaqin.corp-partner.google.com> Link: https://msgid.link/r/20240108120802.7601-1-zhouruihai@huaqin.corp-partner.google.com Signed-off-by: Mark Brown <broonie@kernel.org>
* Linux 6.8-rc1v6.8-rc1Linus Torvalds2024-01-211-2/+2
|
* Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefsLinus Torvalds2024-01-2178-1426/+1629
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more bcachefs updates from Kent Overstreet: "Some fixes, Some refactoring, some minor features: - Assorted prep work for disk space accounting rewrite - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this makes our trigger context more explicit - A few fixes to avoid excessive transaction restarts on multithreaded workloads: fstests (in addition to ktest tests) are now checking slowpath counters, and that's shaking out a few bugs - Assorted tracepoint improvements - Starting to break up bcachefs_format.h and move on disk types so they're with the code they belong to; this will make room to start documenting the on disk format better. - A few minor fixes" * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits) bcachefs: Improve inode_to_text() bcachefs: logged_ops_format.h bcachefs: reflink_format.h bcachefs; extents_format.h bcachefs: ec_format.h bcachefs: subvolume_format.h bcachefs: snapshot_format.h bcachefs: alloc_background_format.h bcachefs: xattr_format.h bcachefs: dirent_format.h bcachefs: inode_format.h bcachefs; quota_format.h bcachefs: sb-counters_format.h bcachefs: counters.c -> sb-counters.c bcachefs: comment bch_subvolume bcachefs: bch_snapshot::btime bcachefs: add missing __GFP_NOWARN bcachefs: opts->compression can now also be applied in the background bcachefs: Prep work for variable size btree node buffers bcachefs: grab s_umount only if snapshotting ...
| * bcachefs: Improve inode_to_text()Kent Overstreet2024-01-211-7/+18
| | | | | | | | | | | | Add line breaks - inode_to_text() is now much easier to read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: logged_ops_format.hKent Overstreet2024-01-212-27/+31
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: reflink_format.hKent Overstreet2024-01-213-47/+48
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs; extents_format.hKent Overstreet2024-01-212-279/+284
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: ec_format.hKent Overstreet2024-01-212-16/+20
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: subvolume_format.hKent Overstreet2024-01-212-32/+36
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: snapshot_format.hKent Overstreet2024-01-212-33/+37
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: alloc_background_format.hKent Overstreet2024-01-212-93/+94
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: xattr_format.hKent Overstreet2024-01-212-15/+20
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: dirent_format.hKent Overstreet2024-01-212-39/+43
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: inode_format.hKent Overstreet2024-01-212-164/+167
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs; quota_format.hKent Overstreet2024-01-212-42/+48
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: sb-counters_format.hKent Overstreet2024-01-212-95/+100
| | | | | | | | | | | | bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: counters.c -> sb-counters.cKent Overstreet2024-01-215-8/+7
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: comment bch_subvolumeKent Overstreet2024-01-211-0/+3
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch_snapshot::btimeKent Overstreet2024-01-212-0/+3
| | | | | | | | | | | | | | Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: add missing __GFP_NOWARNKent Overstreet2024-01-211-1/+1
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: opts->compression can now also be applied in the backgroundKent Overstreet2024-01-2111-23/+24
| | | | | | | | | | | | | | | | | | The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Prep work for variable size btree node buffersKent Overstreet2024-01-2118-97/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: grab s_umount only if snapshottingSu Yue2024-01-211-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>