| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As a general principal, it is best to do as little as possible in an
interrupt handler. This patch reworks the AXI SPI Engine driver to move
timer_delete_sync() and spi_finalize_current_message() out of the
interrupt handler. Instead, spi_finalize_current_message() is moved to
the transfer_one_message function (similar to nearly all other SPI
controllers). A completion is now used to wait for the sync interrupt
that indicates that the message is complete. The watchdog timer is no
longer needed since we can use the wait_for_completion_timeout()
function to wait for the message to complete with the same effect.
As a bonus, these changes also improve throughput of the SPI controller.
For example, this was tested on a ZynqMP with a 80MHz SCLK reading 4
byte samples from an ADC. The max measured throughput increased from
26k to 28k samples per second.
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://lore.kernel.org/r/20240207-axi-spi-engine-round-2-1-v2-2-40c0b4e85352@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Profiling has shown that ida_alloc_range() accounts for about 10% of the
time spent in spi_sync() when using the AXI SPI Engine controller. This
call is used to create a unique id for each SPI message to match to an
IRQ when the message is complete.
Since the core SPI code serializes messages in a message queue, we can
only have one message in flight at a time, namely host->cur_msg. This
means that we can use a fixed value instead of a unique id for each
message since there can never be more than one message pending at a
time.
This patch removes the use of ida for the sync id and replaces it with a
constant value. This simplifies the driver and improves performance.
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://lore.kernel.org/r/20240207-axi-spi-engine-round-2-1-v2-1-40c0b4e85352@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The __spi_split_transfer_maxsize() function has a gpf argument to allow
callers to specify the type of memory allocation that needs to be used.
However, this function only allocates struct spi_transfer and is not
intended to be used from atomic contexts so this type should always be
GFP_KERNEL, so we can just drop the argument.
Some callers of these functions also passed GFP_DMA, but since only
struct spi_transfer is allocated and not any tx/rx buffers, this is
not actually necessary and is removed in this commit.
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://lore.kernel.org/r/20240206200648.1782234-1-dlechner@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| | |
Merge series from andy.shevchenko@gmail.com:
A couple of error handling improvements here:
- unshadowing error code from dmaengine_slave_config()
- making error messages uniform
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use dev_err_probe() for all messages in dspi_request_dma() for the sake of
making them uniform. While at it, fix indentation issue reported by Vladimir
Oltean.
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240204203127.1186621-3-andy.shevchenko@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
dmaengine_slave_config() may return different error codes based on
the circumstances. Preserve it instead of shadowing to -EINVAL.
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240204203127.1186621-2-andy.shevchenko@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
| |
| |
| |
| |
| |
| | |
Signed-off-by: Luis de Arquer <luis.dearquer@inertim.com>
Link: https://lore.kernel.org/r/4d18808e85b85077761c5655083f20ebfd7d3770.camel@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since spi-rockchip enables use_gpio_descriptors and the
SPI_CONTROLLER_GPIO_SS flag, the spi subsytem may call set_cs()
for spi devices with indexes above ROCKCHIP_SPI_MAX_CS_NUM
Remove array cs_asserted[] which held a shadow copy of the state
of the chip select lines with the only purpose of optimizing out
rewriting a chip select line to the current state (no-op)
This case is already handled by spi.c
Signed-off-by: Luis de Arquer <luis.dearquer@inertim.com>
Link: https://lore.kernel.org/r/d0a0c4b94f933f7f43973c34765214303ee82b77.camel@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If there are two flash chips connected flash regions can refer to the
second chip too. In this case we may see the following warning:
mtd: partition "BIOS" extends beyond the end of device "0000:00:1f.5" --
size truncated to 0x400000
For this reason, check the BIOS partition size against the chip size and
make sure it stays within the that.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://lore.kernel.org/r/20240201121638.207632-2-mika.westerberg@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This should make it easier to identify the second chip and also allows
using "mtdparts=" and the like with this chip too.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://lore.kernel.org/r/20240201121638.207632-1-mika.westerberg@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This avoid duplicating the same macros in multiple drivers by reusing
the common AXI macros for the version register.
Signed-off-by: David Lechner <dlechner@baylibre.com>
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20240202213132.3863124-2-dlechner@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The core SPI code will handle splitting transfers if needed as long
as ctlr->max_transfer_size is implemented. It does this in
__spi_pump_transfer_message() immediately before calling
ctlr->prepare_message. So effectively, this change does not
alter the behavior of the driver.
Also, several peripheral drivers make use of spi_max_transfer_size(),
so this should improve compatibility with those drivers.
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://lore.kernel.org/r/20240126220024.3926403-2-dlechner@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves splitting transfers for CS_WORD software emulation to the
same place where we split transfers for controller-specific reasons.
This fixes a few subtle bugs.
The calculation for maxsize was wrong for bit sizes between 17 and 24.
This is fixed by making use of spi_split_transfers_maxwords() which
already has the correct calculation.
Also, since this indirectly calls spi_res_alloc(), to avoid leaking
resources, spi_finalize_current_message() would need to be called
on all error paths in __spi_validate() and callers of __spi_validate()
would need to do the same. This is fixed by moving the call to
__spi_pump_transfer_message() where it is already splitting transfers
for other reasons and correctly releases resources in the subsequent
error paths.
Fixes: cbaa62e0094a ("spi: add software implementation for SPI_CS_WORD")
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://lore.kernel.org/r/20240126212358.3916280-2-dlechner@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
As we get a child node in the OF case, we should also clean up the
reference, add code to do so.
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://msgid.link/r/20240202103430.951598-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The members of `struct spi_message` were reordered in commit
ae2ade4ba581 ("spi: Reorder fields in 'struct spi_message'")
but the documentation comments were not updated to match.
This commit updates the comments to match the new order.
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://msgid.link/r/20240131170732.1665105-1-dlechner@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
| |
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Link: https://msgid.link/r/lq6gstev3sd7i4iw2btiq3gg7lhsraj5w74fkbp6lpbl6nkyff@tarta.nabijaczleweli.xyz
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The __spi_sync() function calls __spi_validate() early in the function.
Later, it can call spi_async_locked() which calls __spi_validate()
again. __spi_validate() is an expensive function, so we can improve
performance measurably by avoiding calling it twice.
Instead of calling spi_async_locked(), we can call __spi_async() with
the spin lock held.
spi_async_locked() is removed since there are no more callers.
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://msgid.link/r/20240125234732.3530278-2-dlechner@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
As it devm_pm_runtime_enable() can fail due to memory allocations, it
is best to handle the error.
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://msgid.link/r/20240125103426.2622549-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Merge series from Sam Protsenko <semen.protsenko@linaro.org>:
This series enables SPI for Exynos850 SoC, there are also some
dependencies that will be needed at runtime which were sent as part of
the same series but will be separately applied:
1. Enable PDMA, it's needed for SPI (dts, clk)
2. Propagate SPI src clock rate change up to DIV clocks, to make it
possible to change SPI frequency (clk driver)
4. Add SPI nodes to Exynos850 SoC dtsi
All SPI instances were tested using `spidev_test' tool in all 3 possible
modes:
- Polling mode: xfer_size <= 32
- IRQ mode: 64 >= xfer_size >= 32
- DMA mode: xfer_size > 64
with 200 kHz ... 49.9 MHz SPI frequencies. The next 3 approaches were
used:
1. Software loopback ('-l' option for `spidev_test' tool)
2. Hardware loopback (by connecting MISO line to MOSI)
3. By communicating with ATMega found on Sensors Mezzanine board [1],
programmed to act as an SPI device
and all the transactions were additionally checked on my Logic Analyzer
to make sure the SCK frequencies were actually correct.
[1] https://www.96boards.org/product/sensors-mezzanine/
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Document samsung,exynos850-spi compatible which will be used on
Exynos850 SoC. Exynos850 doesn't have ioclk, so only two clocks are
needed (bus clock and functional SPI clock).
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://msgid.link/r/20240120012948.8836-3-semen.protsenko@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add SPI port configuration for Exynos850 SoC. It has 3 USI blocks which
can be configured in SPI mode:
* spi_0: BLK_PERI_SPI_0 (0x13940000)
* spi_1: BLK_ALIVE_USI_CMGP00 (0x11d00000)
* spi_2: BLK_ALIVE_USI_CMGP01 (0x11d20000)
SPI FIFO depth is 64 bytes for all those SPI blocks, so the
.fifo_lvl_mask value is set to 0x7f. All blocks have DIV_4 as the
default internal clock divider, and an internal loopback mode to run
a loopback test.
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Link: https://msgid.link/r/20240120012948.8836-6-semen.protsenko@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, __spi_sync() and __spi_async() set message->spi to the spi
device independently after calling __spi_validate(). __spi_validate()
also would conditionally set this if it needed to split the message
since it wasn't set yet.
Since both __spi_sync() and __spi_async() call __spi_validate(), we can
consolidate this into only setting message->spi once (unconditionally)
in __spi_validate(). This will also save any future callers of
__spi_validate() from also needing to set message->spi.
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://msgid.link/r/20240123214946.2616786-1-dlechner@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Add i.MX93/95 flexspi compatible strings, which are compatible with
i.MX8MM
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Acked-by: Han Xu <han.xu@nxp.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://msgid.link/r/20240122091510.2077498-2-peng.fan@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
| |
Add i.MX95 LPSPI compatible string, same as i.MX93 compatible with
i.MX7ULP
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://msgid.link/r/20240122091510.2077498-1-peng.fan@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
| |
Simplify the code by extracting all cases of FIFO depth calculation into
a dedicated macro. No functional change.
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://msgid.link/r/20240120170001.3356-1-semen.protsenko@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
| |
Ensure the command value and LUT entry values have a fixed width. This
way consecutive output lines can be read much easier.
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://msgid.link/r/20240118121016.3734770-1-alexander.stein@ew.tq-group.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the driver core can properly handle constant struct bus_type,
move the spi_bus_type variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Cc: Mark Brown <broonie@kernel.org>
Cc: <linux-spi@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://msgid.link/r/2024010549-erasure-swoop-1cc6@gregkh
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
|
|
|
|
|
| |
Supports configuring sleep pin control during system suspend to prevent
potential power leakage and additional power consumption.
Signed-off-by: Ruihai Zhou <zhouruihai@huaqin.corp-partner.google.com>
Link: https://msgid.link/r/20240108120802.7601-1-zhouruihai@huaqin.corp-partner.google.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull more bcachefs updates from Kent Overstreet:
"Some fixes, Some refactoring, some minor features:
- Assorted prep work for disk space accounting rewrite
- BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
makes our trigger context more explicit
- A few fixes to avoid excessive transaction restarts on
multithreaded workloads: fstests (in addition to ktest tests) are
now checking slowpath counters, and that's shaking out a few bugs
- Assorted tracepoint improvements
- Starting to break up bcachefs_format.h and move on disk types so
they're with the code they belong to; this will make room to start
documenting the on disk format better.
- A few minor fixes"
* tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits)
bcachefs: Improve inode_to_text()
bcachefs: logged_ops_format.h
bcachefs: reflink_format.h
bcachefs; extents_format.h
bcachefs: ec_format.h
bcachefs: subvolume_format.h
bcachefs: snapshot_format.h
bcachefs: alloc_background_format.h
bcachefs: xattr_format.h
bcachefs: dirent_format.h
bcachefs: inode_format.h
bcachefs; quota_format.h
bcachefs: sb-counters_format.h
bcachefs: counters.c -> sb-counters.c
bcachefs: comment bch_subvolume
bcachefs: bch_snapshot::btime
bcachefs: add missing __GFP_NOWARN
bcachefs: opts->compression can now also be applied in the background
bcachefs: Prep work for variable size btree node buffers
bcachefs: grab s_umount only if snapshotting
...
|
| |
| |
| |
| |
| |
| | |
Add line breaks - inode_to_text() is now much easier to read.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| |
| |
| | |
bcachefs_format.h has gotten too big; let's do some organizing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| |
| |
| |
| | |
Add a field to bch_snapshot for creation time; this will be important
when we start exposing the snapshot tree to userspace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.
And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analagous to XFS's
AGIs.
Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When I was testing mongodb over bcachefs with compression,
there is a lockdep warning when snapshotting mongodb data volume.
$ cat test.sh
prog=bcachefs
$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots
while true;do
$prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
sleep 1s
done
$ cat /etc/mongodb.conf
systemLog:
destination: file
logAppend: true
path: /mnt/data/mongod.log
storage:
dbPath: /mnt/data/
lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
which lock already depends on the new lock.
[ 3437.456553]
the existing dependency chain (in reverse order) is:
[ 3437.457054]
-> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507] down_read+0x3e/0x170
[ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206] __x64_sys_ioctl+0x93/0xd0
[ 3437.458498] do_syscall_64+0x42/0xf0
[ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
-> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615] down_read+0x3e/0x170
[ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686] notify_change+0x1f1/0x4a0
[ 3437.461283] do_truncate+0x7f/0xd0
[ 3437.461555] path_openat+0xa57/0xce0
[ 3437.461836] do_filp_open+0xb4/0x160
[ 3437.462116] do_sys_openat2+0x91/0xc0
[ 3437.462402] __x64_sys_openat+0x53/0xa0
[ 3437.462701] do_syscall_64+0x42/0xf0
[ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
-> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843] down_write+0x3b/0xc0
[ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493] vfs_write+0x21b/0x4c0
[ 3437.464653] ksys_write+0x69/0xf0
[ 3437.464839] do_syscall_64+0x42/0xf0
[ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
-> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471] __lock_acquire+0x1455/0x21b0
[ 3437.465656] lock_acquire+0xc6/0x2b0
[ 3437.465822] mnt_want_write+0x46/0x1a0
[ 3437.465996] filename_create+0x62/0x190
[ 3437.466175] user_path_create+0x2d/0x50
[ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617] __x64_sys_ioctl+0x93/0xd0
[ 3437.466791] do_syscall_64+0x42/0xf0
[ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
other info that might help us debug this:
[ 3437.469670] 2 locks held by bcachefs/35533:
other info that might help us debug this:
[ 3437.467507] Chain exists of:
sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48
[ 3437.467979] Possible unsafe locking scenario:
[ 3437.468223] CPU0 CPU1
[ 3437.468405] ---- ----
[ 3437.468585] rlock(&type->s_umount_key#48);
[ 3437.468758] lock(&c->snapshot_create_lock);
[ 3437.469030] lock(&type->s_umount_key#48);
[ 3437.469291] rlock(sb_writers#10);
[ 3437.469434]
*** DEADLOCK ***
[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795] <TASK>
[ 3437.471884] dump_stack_lvl+0x57/0x90
[ 3437.472035] check_noncircular+0x132/0x150
[ 3437.472202] __lock_acquire+0x1455/0x21b0
[ 3437.472369] lock_acquire+0xc6/0x2b0
[ 3437.472518] ? filename_create+0x62/0x190
[ 3437.472683] ? lock_is_held_type+0x97/0x110
[ 3437.472856] mnt_want_write+0x46/0x1a0
[ 3437.473025] ? filename_create+0x62/0x190
[ 3437.473204] filename_create+0x62/0x190
[ 3437.473380] user_path_create+0x2d/0x50
[ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819] ? lock_acquire+0xc6/0x2b0
[ 3437.474002] ? __fget_files+0x2a/0x190
[ 3437.474195] ? __fget_files+0xbc/0x190
[ 3437.474380] ? lock_release+0xc5/0x270
[ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090] __x64_sys_ioctl+0x93/0xd0
[ 3437.475277] do_syscall_64+0x42/0xf0
[ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================
In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d237320e98 ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.
Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().
Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|