summaryrefslogtreecommitdiffstats
path: root/fs/netfs
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2024-09-16 12:13:31 +0200
committerLinus Torvalds <torvalds@linux-foundation.org>2024-09-16 12:13:31 +0200
commit35219bc5c71f4197c8bd10297597de797c1eece5 (patch)
tree2448156135b78f54cd341a8457ccd84a371ddac7 /fs/netfs
parentMerge tag 'vfs-6.12.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/v... (diff)
parentdocs: filesystems: corrected grammar of netfs page (diff)
downloadlinux-35219bc5c71f4197c8bd10297597de797c1eece5.tar.xz
linux-35219bc5c71f4197c8bd10297597de797c1eece5.zip
Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull netfs updates from Christian Brauner: "This contains the work to improve read/write performance for the new netfs library. The main performance enhancing changes are: - Define a structure, struct folio_queue, and a new iterator type, ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See that patch for questions about naming and form. ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The problem with an xarray is that accessing it requires the use of a lock (typically the RCU read lock) - and this means that we can't supply iterate_and_advance() with a step function that might sleep (crypto for example) without having to drop the lock between pages. ITER_FOLIOQ is the iterator for a chain of folio_queue structs, where each folio_queue holds a small list of folios. A folio_queue struct is a simpler structure than xarray and is not subject to concurrent manipulation by the VM. folio_queue is used rather than a bvec[] as it can form lists of indefinite size, adding to one end and removing from the other on the fly. - Provide a copy_folio_from_iter() wrapper. - Make cifs RDMA support ITER_FOLIOQ. - Use folio queues in the write-side helpers instead of xarrays. - Add a function to reset the iterator in a subrequest. - Simplify the write-side helpers to use sheaves to skip gaps rather than trying to work out where gaps are. - In afs, make the read subrequests asynchronous, putting them into work items to allow the next patch to do progressive unlocking/reading. - Overhaul the read-side helpers to improve performance. - Fix the caching of a partial block at the end of a file. - Allow a store to be cancelled. Then some changes for cifs to make it use folio queues instead of xarrays for crypto bufferage: - Use raw iteration functions rather than manually coding iteration when hashing data. - Switch to using folio_queue for crypto buffers. - Remove the xarray bits. Make some adjustments to the /proc/fs/netfs/stats file such that: - All the netfs stats lines begin 'Netfs:' but change this to something a bit more useful. - Add a couple of stats counters to track the numbers of skips and waits on the per-inode writeback serialisation lock to make it easier to check for this as a source of performance loss. Miscellaneous work: - Ensure that the sb_writers lock is taken around vfs_{set,remove}xattr() in the cachefiles code. - Reduce the number of conditional branches in netfs_perform_write(). - Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and remove cifs_post_modify(). - Move the max_len/max_nr_segs members from netfs_io_subrequest to netfs_io_request as they're only needed for one subreq at a time. - Add an 'unknown' source value for tracing purposes. - Remove NETFS_COPY_TO_CACHE as it's no longer used. - Set the request work function up front at allocation time. - Use bh-disabling spinlocks for rreq->lock as cachefiles completion may be run from block-filesystem DIO completion in softirq context. - Remove fs/netfs/io.c" * tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) docs: filesystems: corrected grammar of netfs page cifs: Don't support ITER_XARRAY cifs: Switch crypto buffer to use a folio_queue rather than an xarray cifs: Use iterate_and_advance*() routines directly for hashing netfs: Cancel dirty folios that have no storage destination cachefiles, netfs: Fix write to partial block at EOF netfs: Remove fs/netfs/io.c netfs: Speed up buffered reading afs: Make read subreqs async netfs: Simplify the writeback code netfs: Provide an iterator-reset function netfs: Use new folio_queue data type and iterator instead of xarray iter cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs iov_iter: Provide copy_folio_from_iter() mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios netfs: Use bh-disabling spinlocks for rreq->lock netfs: Set the request work function upon allocation netfs: Remove NETFS_COPY_TO_CACHE netfs: Reserve netfs_sreq_source 0 as unset/unknown netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream ...
Diffstat (limited to 'fs/netfs')
-rw-r--r--fs/netfs/Makefile4
-rw-r--r--fs/netfs/buffered_read.c766
-rw-r--r--fs/netfs/buffered_write.c309
-rw-r--r--fs/netfs/direct_read.c147
-rw-r--r--fs/netfs/internal.h43
-rw-r--r--fs/netfs/io.c804
-rw-r--r--fs/netfs/iterator.c50
-rw-r--r--fs/netfs/main.c7
-rw-r--r--fs/netfs/misc.c94
-rw-r--r--fs/netfs/objects.c16
-rw-r--r--fs/netfs/read_collect.c544
-rw-r--r--fs/netfs/read_pgpriv2.c264
-rw-r--r--fs/netfs/read_retry.c256
-rw-r--r--fs/netfs/stats.c27
-rw-r--r--fs/netfs/write_collect.c246
-rw-r--r--fs/netfs/write_issue.c93
16 files changed, 2163 insertions, 1507 deletions
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 8e6781e0b10b..d08b0bfb6756 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -5,12 +5,14 @@ netfs-y := \
buffered_write.o \
direct_read.o \
direct_write.o \
- io.o \
iterator.o \
locking.o \
main.o \
misc.o \
objects.o \
+ read_collect.o \
+ read_pgpriv2.o \
+ read_retry.o \
write_collect.o \
write_issue.o
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 27c750d39476..c40e226053cc 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -9,266 +9,388 @@
#include <linux/task_io_accounting_ops.h>
#include "internal.h"
-/*
- * [DEPRECATED] Unlock the folios in a read operation for when the filesystem
- * is using PG_private_2 and direct writing to the cache from here rather than
- * marking the page for writeback.
- *
- * Note that we don't touch folio->private in this code.
- */
-static void netfs_rreq_unlock_folios_pgpriv2(struct netfs_io_request *rreq,
- size_t *account)
+static void netfs_cache_expand_readahead(struct netfs_io_request *rreq,
+ unsigned long long *_start,
+ unsigned long long *_len,
+ unsigned long long i_size)
{
- struct netfs_io_subrequest *subreq;
- struct folio *folio;
- pgoff_t start_page = rreq->start / PAGE_SIZE;
- pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
- bool subreq_failed = false;
+ struct netfs_cache_resources *cres = &rreq->cache_resources;
- XA_STATE(xas, &rreq->mapping->i_pages, start_page);
+ if (cres->ops && cres->ops->expand_readahead)
+ cres->ops->expand_readahead(cres, _start, _len, i_size);
+}
- /* Walk through the pagecache and the I/O request lists simultaneously.
- * We may have a mixture of cached and uncached sections and we only
- * really want to write out the uncached sections. This is slightly
- * complicated by the possibility that we might have huge pages with a
- * mixture inside.
+static void netfs_rreq_expand(struct netfs_io_request *rreq,
+ struct readahead_control *ractl)
+{
+ /* Give the cache a chance to change the request parameters. The
+ * resultant request must contain the original region.
*/
- subreq = list_first_entry(&rreq->subrequests,
- struct netfs_io_subrequest, rreq_link);
- subreq_failed = (subreq->error < 0);
+ netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size);
- trace_netfs_rreq(rreq, netfs_rreq_trace_unlock_pgpriv2);
+ /* Give the netfs a chance to change the request parameters. The
+ * resultant request must contain the original region.
+ */
+ if (rreq->netfs_ops->expand_readahead)
+ rreq->netfs_ops->expand_readahead(rreq);
- rcu_read_lock();
- xas_for_each(&xas, folio, last_page) {
- loff_t pg_end;
- bool pg_failed = false;
- bool folio_started = false;
+ /* Expand the request if the cache wants it to start earlier. Note
+ * that the expansion may get further extended if the VM wishes to
+ * insert THPs and the preferred start and/or end wind up in the middle
+ * of THPs.
+ *
+ * If this is the case, however, the THP size should be an integer
+ * multiple of the cache granule size, so we get a whole number of
+ * granules to deal with.
+ */
+ if (rreq->start != readahead_pos(ractl) ||
+ rreq->len != readahead_length(ractl)) {
+ readahead_expand(ractl, rreq->start, rreq->len);
+ rreq->start = readahead_pos(ractl);
+ rreq->len = readahead_length(ractl);
- if (xas_retry(&xas, folio))
- continue;
+ trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
+ netfs_read_trace_expanded);
+ }
+}
- pg_end = folio_pos(folio) + folio_size(folio) - 1;
+/*
+ * Begin an operation, and fetch the stored zero point value from the cookie if
+ * available.
+ */
+static int netfs_begin_cache_read(struct netfs_io_request *rreq, struct netfs_inode *ctx)
+{
+ return fscache_begin_read_operation(&rreq->cache_resources, netfs_i_cookie(ctx));
+}
- for (;;) {
- loff_t sreq_end;
+/*
+ * Decant the list of folios to read into a rolling buffer.
+ */
+static size_t netfs_load_buffer_from_ra(struct netfs_io_request *rreq,
+ struct folio_queue *folioq)
+{
+ unsigned int order, nr;
+ size_t size = 0;
+
+ nr = __readahead_batch(rreq->ractl, (struct page **)folioq->vec.folios,
+ ARRAY_SIZE(folioq->vec.folios));
+ folioq->vec.nr = nr;
+ for (int i = 0; i < nr; i++) {
+ struct folio *folio = folioq_folio(folioq, i);
+
+ trace_netfs_folio(folio, netfs_folio_trace_read);
+ order = folio_order(folio);
+ folioq->orders[i] = order;
+ size += PAGE_SIZE << order;
+ }
- if (!subreq) {
- pg_failed = true;
- break;
- }
+ for (int i = nr; i < folioq_nr_slots(folioq); i++)
+ folioq_clear(folioq, i);
- if (!folio_started &&
- test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags) &&
- fscache_operation_valid(&rreq->cache_resources)) {
- trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
- folio_start_private_2(folio);
- folio_started = true;
- }
+ return size;
+}
- pg_failed |= subreq_failed;
- sreq_end = subreq->start + subreq->len - 1;
- if (pg_end < sreq_end)
- break;
+/*
+ * netfs_prepare_read_iterator - Prepare the subreq iterator for I/O
+ * @subreq: The subrequest to be set up
+ *
+ * Prepare the I/O iterator representing the read buffer on a subrequest for
+ * the filesystem to use for I/O (it can be passed directly to a socket). This
+ * is intended to be called from the ->issue_read() method once the filesystem
+ * has trimmed the request to the size it wants.
+ *
+ * Returns the limited size if successful and -ENOMEM if insufficient memory
+ * available.
+ *
+ * [!] NOTE: This must be run in the same thread as ->issue_read() was called
+ * in as we access the readahead_control struct.
+ */
+static ssize_t netfs_prepare_read_iterator(struct netfs_io_subrequest *subreq)
+{
+ struct netfs_io_request *rreq = subreq->rreq;
+ size_t rsize = subreq->len;
+
+ if (subreq->source == NETFS_DOWNLOAD_FROM_SERVER)
+ rsize = umin(rsize, rreq->io_streams[0].sreq_max_len);
+
+ if (rreq->ractl) {
+ /* If we don't have sufficient folios in the rolling buffer,
+ * extract a folioq's worth from the readahead region at a time
+ * into the buffer. Note that this acquires a ref on each page
+ * that we will need to release later - but we don't want to do
+ * that until after we've started the I/O.
+ */
+ while (rreq->submitted < subreq->start + rsize) {
+ struct folio_queue *tail = rreq->buffer_tail, *new;
+ size_t added;
+
+ new = kmalloc(sizeof(*new), GFP_NOFS);
+ if (!new)
+ return -ENOMEM;
+ netfs_stat(&netfs_n_folioq);
+ folioq_init(new);
+ new->prev = tail;
+ tail->next = new;
+ rreq->buffer_tail = new;
+ added = netfs_load_buffer_from_ra(rreq, new);
+ rreq->iter.count += added;
+ rreq->submitted += added;
+ }
+ }
- *account += subreq->transferred;
- if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
- subreq = list_next_entry(subreq, rreq_link);
- subreq_failed = (subreq->error < 0);
- } else {
- subreq = NULL;
- subreq_failed = false;
- }
+ subreq->len = rsize;
+ if (unlikely(rreq->io_streams[0].sreq_max_segs)) {
+ size_t limit = netfs_limit_iter(&rreq->iter, 0, rsize,
+ rreq->io_streams[0].sreq_max_segs);
- if (pg_end == sreq_end)
- break;
+ if (limit < rsize) {
+ subreq->len = limit;
+ trace_netfs_sreq(subreq, netfs_sreq_trace_limited);
}
+ }
- if (!pg_failed) {
- flush_dcache_folio(folio);
- folio_mark_uptodate(folio);
- }
+ subreq->io_iter = rreq->iter;
- if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
- if (folio->index == rreq->no_unlock_folio &&
- test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags))
- _debug("no unlock");
- else
- folio_unlock(folio);
+ if (iov_iter_is_folioq(&subreq->io_iter)) {
+ if (subreq->io_iter.folioq_slot >= folioq_nr_slots(subreq->io_iter.folioq)) {
+ subreq->io_iter.folioq = subreq->io_iter.folioq->next;
+ subreq->io_iter.folioq_slot = 0;
}
+ subreq->curr_folioq = (struct folio_queue *)subreq->io_iter.folioq;
+ subreq->curr_folioq_slot = subreq->io_iter.folioq_slot;
+ subreq->curr_folio_order = subreq->curr_folioq->orders[subreq->curr_folioq_slot];
}
- rcu_read_unlock();
+
+ iov_iter_truncate(&subreq->io_iter, subreq->len);
+ iov_iter_advance(&rreq->iter, subreq->len);
+ return subreq->len;
}
-/*
- * Unlock the folios in a read operation. We need to set PG_writeback on any
- * folios we're going to write back before we unlock them.
- *
- * Note that if the deprecated NETFS_RREQ_USE_PGPRIV2 is set then we use
- * PG_private_2 and do a direct write to the cache from here instead.
- */
-void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
+static enum netfs_io_source netfs_cache_prepare_read(struct netfs_io_request *rreq,
+ struct netfs_io_subrequest *subreq,
+ loff_t i_size)
{
- struct netfs_io_subrequest *subreq;
- struct netfs_folio *finfo;
- struct folio *folio;
- pgoff_t start_page = rreq->start / PAGE_SIZE;
- pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
- size_t account = 0;
- bool subreq_failed = false;
+ struct netfs_cache_resources *cres = &rreq->cache_resources;
- XA_STATE(xas, &rreq->mapping->i_pages, start_page);
+ if (!cres->ops)
+ return NETFS_DOWNLOAD_FROM_SERVER;
+ return cres->ops->prepare_read(subreq, i_size);
+}
- if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) {
- __clear_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags);
- list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
- __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
- }
- }
+static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error,
+ bool was_async)
+{
+ struct netfs_io_subrequest *subreq = priv;
- /* Handle deprecated PG_private_2 case. */
- if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
- netfs_rreq_unlock_folios_pgpriv2(rreq, &account);
- goto out;
+ if (transferred_or_error < 0) {
+ netfs_read_subreq_terminated(subreq, transferred_or_error, was_async);
+ return;
}
- /* Walk through the pagecache and the I/O request lists simultaneously.
- * We may have a mixture of cached and uncached sections and we only
- * really want to write out the uncached sections. This is slightly
- * complicated by the possibility that we might have huge pages with a
- * mixture inside.
- */
- subreq = list_first_entry(&rreq->subrequests,
- struct netfs_io_subrequest, rreq_link);
- subreq_failed = (subreq->error < 0);
-
- trace_netfs_rreq(rreq, netfs_rreq_trace_unlock);
+ if (transferred_or_error > 0)
+ subreq->transferred += transferred_or_error;
+ netfs_read_subreq_terminated(subreq, 0, was_async);
+}
- rcu_read_lock();
- xas_for_each(&xas, folio, last_page) {
- loff_t pg_end;
- bool pg_failed = false;
- bool wback_to_cache = false;
+/*
+ * Issue a read against the cache.
+ * - Eats the caller's ref on subreq.
+ */
+static void netfs_read_cache_to_pagecache(struct netfs_io_request *rreq,
+ struct netfs_io_subrequest *subreq)
+{
+ struct netfs_cache_resources *cres = &rreq->cache_resources;
- if (xas_retry(&xas, folio))
- continue;
+ netfs_stat(&netfs_n_rh_read);
+ cres->ops->read(cres, subreq->start, &subreq->io_iter, NETFS_READ_HOLE_IGNORE,
+ netfs_cache_read_terminated, subreq);
+}
- pg_end = folio_pos(folio) + folio_size(folio) - 1;
+/*
+ * Perform a read to the pagecache from a series of sources of different types,
+ * slicing up the region to be read according to available cache blocks and
+ * network rsize.
+ */
+static void netfs_read_to_pagecache(struct netfs_io_request *rreq)
+{
+ struct netfs_inode *ictx = netfs_inode(rreq->inode);
+ unsigned long long start = rreq->start;
+ ssize_t size = rreq->len;
+ int ret = 0;
+
+ atomic_inc(&rreq->nr_outstanding);
+
+ do {
+ struct netfs_io_subrequest *subreq;
+ enum netfs_io_source source = NETFS_DOWNLOAD_FROM_SERVER;
+ ssize_t slice;
+
+ subreq = netfs_alloc_subrequest(rreq);
+ if (!subreq) {
+ ret = -ENOMEM;
+ break;
+ }
- for (;;) {
- loff_t sreq_end;
+ subreq->start = start;
+ subreq->len = size;
+
+ atomic_inc(&rreq->nr_outstanding);
+ spin_lock_bh(&rreq->lock);
+ list_add_tail(&subreq->rreq_link, &rreq->subrequests);
+ subreq->prev_donated = rreq->prev_donated;
+ rreq->prev_donated = 0;
+ trace_netfs_sreq(subreq, netfs_sreq_trace_added);
+ spin_unlock_bh(&rreq->lock);
+
+ source = netfs_cache_prepare_read(rreq, subreq, rreq->i_size);
+ subreq->source = source;
+ if (source == NETFS_DOWNLOAD_FROM_SERVER) {
+ unsigned long long zp = umin(ictx->zero_point, rreq->i_size);
+ size_t len = subreq->len;
+
+ if (subreq->start >= zp) {
+ subreq->source = source = NETFS_FILL_WITH_ZEROES;
+ goto fill_with_zeroes;
+ }
- if (!subreq) {
- pg_failed = true;
+ if (len > zp - subreq->start)
+ len = zp - subreq->start;
+ if (len == 0) {
+ pr_err("ZERO-LEN READ: R=%08x[%x] l=%zx/%zx s=%llx z=%llx i=%llx",
+ rreq->debug_id, subreq->debug_index,
+ subreq->len, size,
+ subreq->start, ictx->zero_point, rreq->i_size);
break;
}
+ subreq->len = len;
+
+ netfs_stat(&netfs_n_rh_download);
+ if (rreq->netfs_ops->prepare_read) {
+ ret = rreq->netfs_ops->prepare_read(subreq);
+ if (ret < 0) {
+ atomic_dec(&rreq->nr_outstanding);
+ netfs_put_subrequest(subreq, false,
+ netfs_sreq_trace_put_cancel);
+ break;
+ }
+ trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
+ }
- wback_to_cache |= test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
- pg_failed |= subreq_failed;
- sreq_end = subreq->start + subreq->len - 1;
- if (pg_end < sreq_end)
+ slice = netfs_prepare_read_iterator(subreq);
+ if (slice < 0) {
+ atomic_dec(&rreq->nr_outstanding);
+ netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_cancel);
+ ret = slice;
break;
-
- account += subreq->transferred;
- if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
- subreq = list_next_entry(subreq, rreq_link);
- subreq_failed = (subreq->error < 0);
- } else {
- subreq = NULL;
- subreq_failed = false;
}
- if (pg_end == sreq_end)
- break;
+ rreq->netfs_ops->issue_read(subreq);
+ goto done;
}
- if (!pg_failed) {
- flush_dcache_folio(folio);
- finfo = netfs_folio_info(folio);
- if (finfo) {
- trace_netfs_folio(folio, netfs_folio_trace_filled_gaps);
- if (finfo->netfs_group)
- folio_change_private(folio, finfo->netfs_group);
- else
- folio_detach_private(folio);
- kfree(finfo);
- }
- folio_mark_uptodate(folio);
- if (wback_to_cache && !WARN_ON_ONCE(folio_get_private(folio) != NULL)) {
- trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
- folio_attach_private(folio, NETFS_FOLIO_COPY_TO_CACHE);
- filemap_dirty_folio(folio->mapping, folio);
- }
+ fill_with_zeroes:
+ if (source == NETFS_FILL_WITH_ZEROES) {
+ subreq->source = NETFS_FILL_WITH_ZEROES;
+ trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
+ netfs_stat(&netfs_n_rh_zero);
+ slice = netfs_prepare_read_iterator(subreq);
+ __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+ netfs_read_subreq_terminated(subreq, 0, false);
+ goto done;
}
- if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
- if (folio->index == rreq->no_unlock_folio &&
- test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags))
- _debug("no unlock");
- else
- folio_unlock(folio);
+ if (source == NETFS_READ_FROM_CACHE) {
+ trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
+ slice = netfs_prepare_read_iterator(subreq);
+ netfs_read_cache_to_pagecache(rreq, subreq);
+ goto done;
}
- }
- rcu_read_unlock();
-out:
- task_io_account_read(account);
- if (rreq->netfs_ops->done)
- rreq->netfs_ops->done(rreq);
-}
+ pr_err("Unexpected read source %u\n", source);
+ WARN_ON_ONCE(1);
+ break;
-static void netfs_cache_expand_readahead(struct netfs_io_request *rreq,
- unsigned long long *_start,
- unsigned long long *_len,
- unsigned long long i_size)
-{
- struct netfs_cache_resources *cres = &rreq->cache_resources;
+ done:
+ size -= slice;
+ start += slice;
+ cond_resched();
+ } while (size > 0);
- if (cres->ops && cres->ops->expand_readahead)
- cres->ops->expand_readahead(cres, _start, _len, i_size);
+ if (atomic_dec_and_test(&rreq->nr_outstanding))
+ netfs_rreq_terminated(rreq, false);
+
+ /* Defer error return as we may need to wait for outstanding I/O. */
+ cmpxchg(&rreq->error, 0, ret);
}
-static void netfs_rreq_expand(struct netfs_io_request *rreq,
- struct readahead_control *ractl)
+/*
+ * Wait for the read operation to complete, successfully or otherwise.
+ */
+static int netfs_wait_for_read(struct netfs_io_request *rreq)
{
- /* Give the cache a chance to change the request parameters. The
- * resultant request must contain the original region.
- */
- netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size);
+ int ret;
- /* Give the netfs a chance to change the request parameters. The
- * resultant request must contain the original region.
- */
- if (rreq->netfs_ops->expand_readahead)
- rreq->netfs_ops->expand_readahead(rreq);
+ trace_netfs_rreq(rreq, netfs_rreq_trace_wait_ip);
+ wait_on_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS, TASK_UNINTERRUPTIBLE);
+ ret = rreq->error;
+ if (ret == 0 && rreq->submitted < rreq->len) {
+ trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_read);
+ ret = -EIO;
+ }
- /* Expand the request if the cache wants it to start earlier. Note
- * that the expansion may get further extended if the VM wishes to
- * insert THPs and the preferred start and/or end wind up in the middle
- * of THPs.
- *
- * If this is the case, however, the THP size should be an integer
- * multiple of the cache granule size, so we get a whole number of
- * granules to deal with.
- */
- if (rreq->start != readahead_pos(ractl) ||
- rreq->len != readahead_length(ractl)) {
- readahead_expand(ractl, rreq->start, rreq->len);
- rreq->start = readahead_pos(ractl);
- rreq->len = readahead_length(ractl);
+ return ret;
+}
- trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
- netfs_read_trace_expanded);
- }
+/*
+ * Set up the initial folioq of buffer folios in the rolling buffer and set the
+ * iterator to refer to it.
+ */
+static int netfs_prime_buffer(struct netfs_io_request *rreq)
+{
+ struct folio_queue *folioq;
+ size_t added;
+
+ folioq = kmalloc(sizeof(*folioq), GFP_KERNEL);
+ if (!folioq)
+ return -ENOMEM;
+ netfs_stat(&netfs_n_folioq);
+ folioq_init(folioq);
+ rreq->buffer = folioq;
+ rreq->buffer_tail = folioq;
+ rreq->submitted = rreq->start;
+ iov_iter_folio_queue(&rreq->iter, ITER_DEST, folioq, 0, 0, 0);
+
+ added = netfs_load_buffer_from_ra(rreq, folioq);
+ rreq->iter.count += added;
+ rreq->submitted += added;
+ return 0;
}
/*
- * Begin an operation, and fetch the stored zero point value from the cookie if
- * available.
+ * Drop the ref on each folio that we inherited from the VM readahead code. We
+ * still have the folio locks to pin the page until we complete the I/O.
+ *
+ * Note that we can't just release the batch in each queue struct as we use the
+ * occupancy count in other places.
*/
-static int netfs_begin_cache_read(struct netfs_io_request *rreq, struct netfs_inode *ctx)
+static void netfs_put_ra_refs(struct folio_queue *folioq)
{
- return fscache_begin_read_operation(&rreq->cache_resources, netfs_i_cookie(ctx));
+ struct folio_batch fbatch;
+
+ folio_batch_init(&fbatch);
+ while (folioq) {
+ for (unsigned int slot = 0; slot < folioq_count(folioq); slot++) {
+ struct folio *folio = folioq_folio(folioq, slot);
+ if (!folio)
+ continue;
+ trace_netfs_folio(folio, netfs_folio_trace_read_put);
+ if (!folio_batch_add(&fbatch, folio))
+ folio_batch_release(&fbatch);
+ }
+ folioq = folioq->next;
+ }
+
+ folio_batch_release(&fbatch);
}
/**
@@ -289,22 +411,17 @@ static int netfs_begin_cache_read(struct netfs_io_request *rreq, struct netfs_in
void netfs_readahead(struct readahead_control *ractl)
{
struct netfs_io_request *rreq;
- struct netfs_inode *ctx = netfs_inode(ractl->mapping->host);
+ struct netfs_inode *ictx = netfs_inode(ractl->mapping->host);
+ unsigned long long start = readahead_pos(ractl);
+ size_t size = readahead_length(ractl);
int ret;
- _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl));
-
- if (readahead_count(ractl) == 0)
- return;
-
- rreq = netfs_alloc_request(ractl->mapping, ractl->file,
- readahead_pos(ractl),
- readahead_length(ractl),
+ rreq = netfs_alloc_request(ractl->mapping, ractl->file, start, size,
NETFS_READAHEAD);
if (IS_ERR(rreq))
return;
- ret = netfs_begin_cache_read(rreq, ctx);
+ ret = netfs_begin_cache_read(rreq, ictx);
if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
goto cleanup_free;
@@ -314,18 +431,15 @@ void netfs_readahead(struct readahead_control *ractl)
netfs_rreq_expand(rreq, ractl);
- /* Set up the output buffer */
- iov_iter_xarray(&rreq->iter, ITER_DEST, &ractl->mapping->i_pages,
- rreq->start, rreq->len);
+ rreq->ractl = ractl;
+ if (netfs_prime_buffer(rreq) < 0)
+ goto cleanup_free;
+ netfs_read_to_pagecache(rreq);
- /* Drop the refs on the folios here rather than in the cache or
- * filesystem. The locks will be dropped in netfs_rreq_unlock().
- */
- while (readahead_folio(ractl))
- ;
+ /* Release the folio refs whilst we're waiting for the I/O. */
+ netfs_put_ra_refs(rreq->buffer);
- netfs_begin_read(rreq, false);
- netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
+ netfs_put_request(rreq, true, netfs_rreq_trace_put_return);
return;
cleanup_free:
@@ -334,6 +448,117 @@ cleanup_free:
}
EXPORT_SYMBOL(netfs_readahead);
+/*
+ * Create a rolling buffer with a single occupying folio.
+ */
+static int netfs_create_singular_buffer(struct netfs_io_request *rreq, struct folio *folio)
+{
+ struct folio_queue *folioq;
+
+ folioq = kmalloc(sizeof(*folioq), GFP_KERNEL);
+ if (!folioq)
+ return -ENOMEM;
+
+ netfs_stat(&netfs_n_folioq);
+ folioq_init(folioq);
+ folioq_append(folioq, folio);
+ BUG_ON(folioq_folio(folioq, 0) != folio);
+ BUG_ON(folioq_folio_order(folioq, 0) != folio_order(folio));
+ rreq->buffer = folioq;
+ rreq->buffer_tail = folioq;
+ rreq->submitted = rreq->start + rreq->len;
+ iov_iter_folio_queue(&rreq->iter, ITER_DEST, folioq, 0, 0, rreq->len);
+ rreq->ractl = (struct readahead_control *)1UL;
+ return 0;
+}
+
+/*
+ * Read into gaps in a folio partially filled by a streaming write.
+ */
+static int netfs_read_gaps(struct file *file, struct folio *folio)
+{
+ struct netfs_io_request *rreq;
+ struct address_space *mapping = folio->mapping;
+ struct netfs_folio *finfo = netfs_folio_info(folio);
+ struct netfs_inode *ctx = netfs_inode(mapping->host);
+ struct folio *sink = NULL;
+ struct bio_vec *bvec;
+ unsigned int from = finfo->dirty_offset;
+ unsigned int to = from + finfo->dirty_len;
+ unsigned int off = 0, i = 0;
+ size_t flen = folio_size(folio);
+ size_t nr_bvec = flen / PAGE_SIZE + 2;
+ size_t part;
+ int ret;
+
+ _enter("%lx", folio->index);
+
+ rreq = netfs_alloc_request(mapping, file, folio_pos(folio), flen, NETFS_READ_GAPS);
+ if (IS_ERR(rreq)) {
+ ret = PTR_ERR(rreq);
+ goto alloc_error;
+ }
+
+ ret = netfs_begin_cache_read(rreq, ctx);
+ if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
+ goto discard;
+
+ netfs_stat(&netfs_n_rh_read_folio);
+ trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_read_gaps);
+
+ /* Fiddle the buffer so that a gap at the beginning and/or a gap at the
+ * end get copied to, but the middle is discarded.
+ */
+ ret = -ENOMEM;
+ bvec = kmalloc_array(nr_bvec, sizeof(*bvec), GFP_KERNEL);
+ if (!bvec)
+ goto discard;
+
+ sink = folio_alloc(GFP_KERNEL, 0);
+ if (!sink) {
+ kfree(bvec);
+ goto discard;
+ }
+
+ trace_netfs_folio(folio, netfs_folio_trace_read_gaps);
+
+ rreq->direct_bv = bvec;
+ rreq->direct_bv_count = nr_bvec;
+ if (from > 0) {
+ bvec_set_folio(&bvec[i++], folio, from, 0);
+ off = from;
+ }
+ while (off < to) {
+ part = min_t(size_t, to - off, PAGE_SIZE);
+ bvec_set_folio(&bvec[i++], sink, part, 0);
+ off += part;
+ }
+ if (to < flen)
+ bvec_set_folio(&bvec[i++], folio, flen - to, to);
+ iov_iter_bvec(&rreq->iter, ITER_DEST, bvec, i, rreq->len);
+ rreq->submitted = rreq->start + flen;
+
+ netfs_read_to_pagecache(rreq);
+
+ if (sink)
+ folio_put(sink);
+
+ ret = netfs_wait_for_read(rreq);
+ if (ret == 0) {
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+ }
+ folio_unlock(folio);
+ netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
+ return ret < 0 ? ret : 0;
+
+discard:
+ netfs_put_request(rreq, false, netfs_rreq_trace_put_discard);
+alloc_error:
+ folio_unlock(folio);
+ return ret;
+}
+
/**
* netfs_read_folio - Helper to manage a read_folio request
* @file: The file to read from
@@ -353,9 +578,13 @@ int netfs_read_folio(struct file *file, struct folio *folio)
struct address_space *mapping = folio->mapping;
struct netfs_io_request *rreq;
struct netfs_inode *ctx = netfs_inode(mapping->host);
- struct folio *sink = NULL;
int ret;
+ if (folio_test_dirty(folio)) {
+ trace_netfs_folio(folio, netfs_folio_trace_read_gaps);
+ return netfs_read_gaps(file, folio);
+ }
+
_enter("%lx", folio->index);
rreq = netfs_alloc_request(mapping, file,
@@ -374,54 +603,12 @@ int netfs_read_folio(struct file *file, struct folio *folio)
trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);
/* Set up the output buffer */
- if (folio_test_dirty(folio)) {
- /* Handle someone trying to read from an unflushed streaming
- * write. We fiddle the buffer so that a gap at the beginning
- * and/or a gap at the end get copied to, but the middle is
- * discarded.
- */
- struct netfs_folio *finfo = netfs_folio_info(folio);
- struct bio_vec *bvec;
- unsigned int from = finfo->dirty_offset;
- unsigned int to = from + finfo->dirty_len;
- unsigned int off = 0, i = 0;
- size_t flen = folio_size(folio);
- size_t nr_bvec = flen / PAGE_SIZE + 2;
- size_t part;
-
- ret = -ENOMEM;
- bvec = kmalloc_array(nr_bvec, sizeof(*bvec), GFP_KERNEL);
- if (!bvec)
- goto discard;
-
- sink = folio_alloc(GFP_KERNEL, 0);
- if (!sink)
- goto discard;
-
- trace_netfs_folio(folio, netfs_folio_trace_read_gaps);
-
- rreq->direct_bv = bvec;
- rreq->direct_bv_count = nr_bvec;
- if (from > 0) {
- bvec_set_folio(&bvec[i++], folio, from, 0);
- off = from;
- }
- while (off < to) {
- part = min_t(size_t, to - off, PAGE_SIZE);
- bvec_set_folio(&bvec[i++], sink, part, 0);
- off += part;
- }
- if (to < flen)
- bvec_set_folio(&bvec[i++], folio, flen - to, to);
- iov_iter_bvec(&rreq->iter, ITER_DEST, bvec, i, rreq->len);
- } else {
- iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
- rreq->start, rreq->len);
- }
+ ret = netfs_create_singular_buffer(rreq, folio);
+ if (ret < 0)
+ goto discard;
- ret = netfs_begin_read(rreq, true);
- if (sink)
- folio_put(sink);
+ netfs_read_to_pagecache(rreq);
+ ret = netfs_wait_for_read(rreq);
netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
return ret < 0 ? ret : 0;
@@ -494,13 +681,10 @@ zero_out:
*
* Pre-read data for a write-begin request by drawing data from the cache if
* possible, or the netfs if not. Space beyond the EOF is zero-filled.
- * Multiple I/O requests from different sources will get munged together. If
- * necessary, the readahead window can be expanded in either direction to a
- * more convenient alighment for RPC efficiency or to make storage in the cache
- * feasible.
+ * Multiple I/O requests from different sources will get munged together.
*
* The calling netfs must provide a table of operations, only one of which,
- * issue_op, is mandatory.
+ * issue_read, is mandatory.
*
* The check_write_begin() operation can be provided to check for and flush
* conflicting writes once the folio is grabbed and locked. It is passed a
@@ -528,8 +712,6 @@ int netfs_write_begin(struct netfs_inode *ctx,
pgoff_t index = pos >> PAGE_SHIFT;
int ret;
- DEFINE_READAHEAD(ractl, file, NULL, mapping, index);
-
retry:
folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
mapping_gfp_mask(mapping));
@@ -577,22 +759,13 @@ retry:
netfs_stat(&netfs_n_rh_write_begin);
trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin);
- /* Expand the request to meet caching requirements and download
- * preferences.
- */
- ractl._nr_pages = folio_nr_pages(folio);
- netfs_rreq_expand(rreq, &ractl);
-
/* Set up the output buffer */
- iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
- rreq->start, rreq->len);
-
- /* We hold the folio locks, so we can drop the references */
- folio_get(folio);
- while (readahead_folio(&ractl))
- ;
+ ret = netfs_create_singular_buffer(rreq, folio);
+ if (ret < 0)
+ goto error_put;
- ret = netfs_begin_read(rreq, true);
+ netfs_read_to_pagecache(rreq);
+ ret = netfs_wait_for_read(rreq);
if (ret < 0)
goto error;
netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
@@ -652,10 +825,13 @@ int netfs_prefetch_for_write(struct file *file, struct folio *folio,
trace_netfs_read(rreq, start, flen, netfs_read_trace_prefetch_for_write);
/* Set up the output buffer */
- iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
- rreq->start, rreq->len);
+ ret = netfs_create_singular_buffer(rreq, folio);
+ if (ret < 0)
+ goto error_put;
- ret = netfs_begin_read(rreq, true);
+ folioq_mark2(rreq->buffer, 0);
+ netfs_read_to_pagecache(rreq);
+ ret = netfs_wait_for_read(rreq);
netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
return ret;
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index ca53c5d1622e..d7eae597e54d 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -13,91 +13,22 @@
#include <linux/pagevec.h>
#include "internal.h"
-/*
- * Determined write method. Adjust netfs_folio_traces if this is changed.
- */
-enum netfs_how_to_modify {
- NETFS_FOLIO_IS_UPTODATE, /* Folio is uptodate already */
- NETFS_JUST_PREFETCH, /* We have to read the folio anyway */
- NETFS_WHOLE_FOLIO_MODIFY, /* We're going to overwrite the whole folio */
- NETFS_MODIFY_AND_CLEAR, /* We can assume there is no data to be downloaded. */
- NETFS_STREAMING_WRITE, /* Store incomplete data in non-uptodate page. */
- NETFS_STREAMING_WRITE_CONT, /* Continue streaming write. */
- NETFS_FLUSH_CONTENT, /* Flush incompatible content. */
-};
-
-static void netfs_set_group(struct folio *folio, struct netfs_group *netfs_group)
+static void __netfs_set_group(struct folio *folio, struct netfs_group *netfs_group)
{
- void *priv = folio_get_private(folio);
-
- if (netfs_group && (!priv || priv == NETFS_FOLIO_COPY_TO_CACHE))
+ if (netfs_group)
folio_attach_private(folio, netfs_get_group(netfs_group));
- else if (!netfs_group && priv == NETFS_FOLIO_COPY_TO_CACHE)
- folio_detach_private(folio);
}
-/*
- * Decide how we should modify a folio. We might be attempting to do
- * write-streaming, in which case we don't want to a local RMW cycle if we can
- * avoid it. If we're doing local caching or content crypto, we award that
- * priority over avoiding RMW. If the file is open readably, then we also
- * assume that we may want to read what we wrote.
- */
-static enum netfs_how_to_modify netfs_how_to_modify(struct netfs_inode *ctx,
- struct file *file,
- struct folio *folio,
- void *netfs_group,
- size_t flen,
- size_t offset,
- size_t len,
- bool maybe_trouble)
+static void netfs_set_group(struct folio *folio, struct netfs_group *netfs_group)
{
- struct netfs_folio *finfo = netfs_folio_info(folio);
- struct netfs_group *group = netfs_folio_group(folio);
- loff_t pos = folio_pos(folio);
-
- _enter("");
-
- if (group != netfs_group && group != NETFS_FOLIO_COPY_TO_CACHE)
- return NETFS_FLUSH_CONTENT;
-
- if (folio_test_uptodate(folio))
- return NETFS_FOLIO_IS_UPTODATE;
-
- if (pos >= ctx->zero_point)
- return NETFS_MODIFY_AND_CLEAR;
-
- if (!maybe_trouble && offset == 0 && len >= flen)
- return NETFS_WHOLE_FOLIO_MODIFY;
-
- if (file->f_mode & FMODE_READ)
- goto no_write_streaming;
-
- if (netfs_is_cache_enabled(ctx)) {
- /* We don't want to get a streaming write on a file that loses
- * caching service temporarily because the backing store got
- * culled.
- */
- goto no_write_streaming;
- }
+ void *priv = folio_get_private(folio);
- if (!finfo)
- return NETFS_STREAMING_WRITE;
-
- /* We can continue a streaming write only if it continues on from the
- * previous. If it overlaps, we must flush lest we suffer a partial
- * copy and disjoint dirty regions.
- */
- if (offset == finfo->dirty_offset + finfo->dirty_len)
- return NETFS_STREAMING_WRITE_CONT;
- return NETFS_FLUSH_CONTENT;
-
-no_write_streaming:
- if (finfo) {
- netfs_stat(&netfs_n_wh_wstream_conflict);
- return NETFS_FLUSH_CONTENT;
+ if (unlikely(priv != netfs_group)) {
+ if (netfs_group && (!priv || priv == NETFS_FOLIO_COPY_TO_CACHE))
+ folio_attach_private(folio, netfs_get_group(netfs_group));
+ else if (!netfs_group && priv == NETFS_FOLIO_COPY_TO_CACHE)
+ folio_detach_private(folio);
}
- return NETFS_JUST_PREFETCH;
}
/*
@@ -177,13 +108,10 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
.range_end = iocb->ki_pos + iter->count,
};
struct netfs_io_request *wreq = NULL;
- struct netfs_folio *finfo;
- struct folio *folio, *writethrough = NULL;
- enum netfs_how_to_modify howto;
- enum netfs_folio_trace trace;
+ struct folio *folio = NULL, *writethrough = NULL;
unsigned int bdp_flags = (iocb->ki_flags & IOCB_NOWAIT) ? BDP_ASYNC : 0;
ssize_t written = 0, ret, ret2;
- loff_t i_size, pos = iocb->ki_pos, from, to;
+ loff_t i_size, pos = iocb->ki_pos;
size_t max_chunk = mapping_max_folio_size(mapping);
bool maybe_trouble = false;
@@ -213,15 +141,14 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
}
do {
+ struct netfs_folio *finfo;
+ struct netfs_group *group;
+ unsigned long long fpos;
size_t flen;
size_t offset; /* Offset into pagecache folio */
size_t part; /* Bytes to write to folio */
size_t copied; /* Bytes copied from user */
- ret = balance_dirty_pages_ratelimited_flags(mapping, bdp_flags);
- if (unlikely(ret < 0))
- break;
-
offset = pos & (max_chunk - 1);
part = min(max_chunk - offset, iov_iter_count(iter));
@@ -247,7 +174,8 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
}
flen = folio_size(folio);
- offset = pos & (flen - 1);
+ fpos = folio_pos(folio);
+ offset = pos - fpos;
part = min_t(size_t, flen - offset, part);
/* Wait for writeback to complete. The writeback engine owns
@@ -265,71 +193,52 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
goto error_folio_unlock;
}
- /* See if we need to prefetch the area we're going to modify.
- * We need to do this before we get a lock on the folio in case
- * there's more than one writer competing for the same cache
- * block.
+ /* Decide how we should modify a folio. We might be attempting
+ * to do write-streaming, in which case we don't want to a
+ * local RMW cycle if we can avoid it. If we're doing local
+ * caching or content crypto, we award that priority over
+ * avoiding RMW. If the file is open readably, then we also
+ * assume that we may want to read what we wrote.
*/
- howto = netfs_how_to_modify(ctx, file, folio, netfs_group,
- flen, offset, part, maybe_trouble);
- _debug("howto %u", howto);
- switch (howto) {
- case NETFS_JUST_PREFETCH:
- ret = netfs_prefetch_for_write(file, folio, offset, part);
- if (ret < 0) {
- _debug("prefetch = %zd", ret);
- goto error_folio_unlock;
- }
- break;
- case NETFS_FOLIO_IS_UPTODATE:
- case NETFS_WHOLE_FOLIO_MODIFY:
- case NETFS_STREAMING_WRITE_CONT:
- break;
- case NETFS_MODIFY_AND_CLEAR:
- zero_user_segment(&folio->page, 0, offset);
- break;
- case NETFS_STREAMING_WRITE:
- ret = -EIO;
- if (WARN_ON(folio_get_private(folio)))
- goto error_folio_unlock;
- break;
- case NETFS_FLUSH_CONTENT:
- trace_netfs_folio(folio, netfs_flush_content);
- from = folio_pos(folio);
- to = from + folio_size(folio) - 1;
- folio_unlock(folio);
- folio_put(folio);
- ret = filemap_write_and_wait_range(mapping, from, to);
- if (ret < 0)
- goto error_folio_unlock;
- continue;
- }
-
- if (mapping_writably_mapped(mapping))
- flush_dcache_folio(folio);
-
- copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
-
- flush_dcache_folio(folio);
-
- /* Deal with a (partially) failed copy */
- if (copied == 0) {
- ret = -EFAULT;
- goto error_folio_unlock;
+ finfo = netfs_folio_info(folio);
+ group = netfs_folio_group(folio);
+
+ if (unlikely(group != netfs_group) &&
+ group != NETFS_FOLIO_COPY_TO_CACHE)
+ goto flush_content;
+
+ if (folio_test_uptodate(folio)) {
+ if (mapping_writably_mapped(mapping))
+ flush_dcache_folio(folio);
+ copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
+ if (unlikely(copied == 0))
+ goto copy_failed;
+ netfs_set_group(folio, netfs_group);
+ trace_netfs_folio(folio, netfs_folio_is_uptodate);
+ goto copied;
}
- trace = (enum netfs_folio_trace)howto;
- switch (howto) {
- case NETFS_FOLIO_IS_UPTODATE:
- case NETFS_JUST_PREFETCH:
- netfs_set_group(folio, netfs_group);
- break;
- case NETFS_MODIFY_AND_CLEAR:
+ /* If the page is above the zero-point then we assume that the
+ * server would just return a block of zeros or a short read if
+ * we try to read it.
+ */
+ if (fpos >= ctx->zero_point) {
+ zero_user_segment(&folio->page, 0, offset);
+ copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
+ if (unlikely(copied == 0))
+ goto copy_failed;
zero_user_segment(&folio->page, offset + copied, flen);
- netfs_set_group(folio, netfs_group);
+ __netfs_set_group(folio, netfs_group);
folio_mark_uptodate(folio);
- break;
- case NETFS_WHOLE_FOLIO_MODIFY:
+ trace_netfs_folio(folio, netfs_modify_and_clear);
+ goto copied;
+ }
+
+ /* See if we can write a whole folio in one go. */
+ if (!maybe_trouble && offset == 0 && part >= flen) {
+ copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
+ if (unlikely(copied == 0))
+ goto copy_failed;
if (unlikely(copied < part)) {
maybe_trouble = true;
iov_iter_revert(iter, copied);
@@ -337,16 +246,53 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
folio_unlock(folio);
goto retry;
}
- netfs_set_group(folio, netfs_group);
+ __netfs_set_group(folio, netfs_group);
folio_mark_uptodate(folio);
- break;
- case NETFS_STREAMING_WRITE:
+ trace_netfs_folio(folio, netfs_whole_folio_modify);
+ goto copied;
+ }
+
+ /* We don't want to do a streaming write on a file that loses
+ * caching service temporarily because the backing store got
+ * culled and we don't really want to get a streaming write on
+ * a file that's open for reading as ->read_folio() then has to
+ * be able to flush it.
+ */
+ if ((file->f_mode & FMODE_READ) ||
+ netfs_is_cache_enabled(ctx)) {
+ if (finfo) {
+ netfs_stat(&netfs_n_wh_wstream_conflict);
+ goto flush_content;
+ }
+ ret = netfs_prefetch_for_write(file, folio, offset, part);
+ if (ret < 0) {
+ _debug("prefetch = %zd", ret);
+ goto error_folio_unlock;
+ }
+ /* Note that copy-to-cache may have been set. */
+
+ copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
+ if (unlikely(copied == 0))
+ goto copy_failed;
+ netfs_set_group(folio, netfs_group);
+ trace_netfs_folio(folio, netfs_just_prefetch);
+ goto copied;
+ }
+
+ if (!finfo) {
+ ret = -EIO;
+ if (WARN_ON(folio_get_private(folio)))
+ goto error_folio_unlock;
+ copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
+ if (unlikely(copied == 0))
+ goto copy_failed;
if (offset == 0 && copied == flen) {
- netfs_set_group(folio, netfs_group);
+ __netfs_set_group(folio, netfs_group);
folio_mark_uptodate(folio);
- trace = netfs_streaming_filled_page;
- break;
+ trace_netfs_folio(folio, netfs_streaming_filled_page);
+ goto copied;
}
+
finfo = kzalloc(sizeof(*finfo), GFP_KERNEL);
if (!finfo) {
iov_iter_revert(iter, copied);
@@ -358,9 +304,18 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
finfo->dirty_len = copied;
folio_attach_private(folio, (void *)((unsigned long)finfo |
NETFS_FOLIO_INFO));
- break;
- case NETFS_STREAMING_WRITE_CONT:
- finfo = netfs_folio_info(folio);
+ trace_netfs_folio(folio, netfs_streaming_write);
+ goto copied;
+ }
+
+ /* We can continue a streaming write only if it continues on
+ * from the previous. If it overlaps, we must flush lest we
+ * suffer a partial copy and disjoint dirty regions.
+ */
+ if (offset == finfo->dirty_offset + finfo->dirty_len) {
+ copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
+ if (unlikely(copied == 0))
+ goto copy_failed;
finfo->dirty_len += copied;
if (finfo->dirty_offset == 0 && finfo->dirty_len == flen) {
if (finfo->netfs_group)
@@ -369,17 +324,25 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
folio_detach_private(folio);
folio_mark_uptodate(folio);
kfree(finfo);
- trace = netfs_streaming_cont_filled_page;
+ trace_netfs_folio(folio, netfs_streaming_cont_filled_page);
+ } else {
+ trace_netfs_folio(folio, netfs_streaming_write_cont);
}
- break;
- default:
- WARN(true, "Unexpected modify type %u ix=%lx\n",
- howto, folio->index);
- ret = -EIO;
- goto error_folio_unlock;
+ goto copied;
}
- trace_netfs_folio(folio, trace);
+ /* Incompatible write; flush the folio and try again. */
+ flush_content:
+ trace_netfs_folio(folio, netfs_flush_content);
+ folio_unlock(folio);
+ folio_put(folio);
+ ret = filemap_write_and_wait_range(mapping, fpos, fpos + flen - 1);
+ if (ret < 0)
+ goto error_folio_unlock;
+ continue;
+
+ copied:
+ flush_dcache_folio(folio);
/* Update the inode size if we moved the EOF marker */
pos += copied;
@@ -401,12 +364,22 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
folio_put(folio);
folio = NULL;
+ ret = balance_dirty_pages_ratelimited_flags(mapping, bdp_flags);
+ if (unlikely(ret < 0))
+ break;
+
cond_resched();
} while (iov_iter_count(iter));
out:
- if (likely(written) && ctx->ops->post_modify)
- ctx->ops->post_modify(inode);
+ if (likely(written)) {
+ /* Set indication that ctime and mtime got updated in case
+ * close is deferred.
+ */
+ set_bit(NETFS_ICTX_MODIFIED_ATTR, &ctx->flags);
+ if (unlikely(ctx->ops->post_modify))
+ ctx->ops->post_modify(inode);
+ }
if (unlikely(wreq)) {
ret2 = netfs_end_writethrough(wreq, &wbc, writethrough);
@@ -421,6 +394,8 @@ out:
_leave(" = %zd [%zd]", written, ret);
return written ? written : ret;
+copy_failed:
+ ret = -EFAULT;
error_folio_unlock:
folio_unlock(folio);
folio_put(folio);
diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index 10a1e4da6bda..b1a66a6e6bc2 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -16,6 +16,143 @@
#include <linux/netfs.h>
#include "internal.h"
+static void netfs_prepare_dio_read_iterator(struct netfs_io_subrequest *subreq)
+{
+ struct netfs_io_request *rreq = subreq->rreq;
+ size_t rsize;
+
+ rsize = umin(subreq->len, rreq->io_streams[0].sreq_max_len);
+ subreq->len = rsize;
+
+ if (unlikely(rreq->io_streams[0].sreq_max_segs)) {
+ size_t limit = netfs_limit_iter(&rreq->iter, 0, rsize,
+ rreq->io_streams[0].sreq_max_segs);
+
+ if (limit < rsize) {
+ subreq->len = limit;
+ trace_netfs_sreq(subreq, netfs_sreq_trace_limited);
+ }
+ }
+
+ trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
+
+ subreq->io_iter = rreq->iter;
+ iov_iter_truncate(&subreq->io_iter, subreq->len);
+ iov_iter_advance(&rreq->iter, subreq->len);
+}
+
+/*
+ * Perform a read to a buffer from the server, slicing up the region to be read
+ * according to the network rsize.
+ */
+static int netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq)
+{
+ unsigned long long start = rreq->start;
+ ssize_t size = rreq->len;
+ int ret = 0;
+
+ atomic_set(&rreq->nr_outstanding, 1);
+
+ do {
+ struct netfs_io_subrequest *subreq;
+ ssize_t slice;
+
+ subreq = netfs_alloc_subrequest(rreq);
+ if (!subreq) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
+ subreq->start = start;
+ subreq->len = size;
+
+ atomic_inc(&rreq->nr_outstanding);
+ spin_lock_bh(&rreq->lock);
+ list_add_tail(&subreq->rreq_link, &rreq->subrequests);
+ subreq->prev_donated = rreq->prev_donated;
+ rreq->prev_donated = 0;
+ trace_netfs_sreq(subreq, netfs_sreq_trace_added);
+ spin_unlock_bh(&rreq->lock);
+
+ netfs_stat(&netfs_n_rh_download);
+ if (rreq->netfs_ops->prepare_read) {
+ ret = rreq->netfs_ops->prepare_read(subreq);
+ if (ret < 0) {
+ atomic_dec(&rreq->nr_outstanding);
+ netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_cancel);
+ break;
+ }
+ }
+
+ netfs_prepare_dio_read_iterator(subreq);
+ slice = subreq->len;
+ rreq->netfs_ops->issue_read(subreq);
+
+ size -= slice;
+ start += slice;
+ rreq->submitted += slice;
+
+ if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
+ test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
+ break;
+ cond_resched();
+ } while (size > 0);
+
+ if (atomic_dec_and_test(&rreq->nr_outstanding))
+ netfs_rreq_terminated(rreq, false);
+ return ret;
+}
+
+/*
+ * Perform a read to an application buffer, bypassing the pagecache and the
+ * local disk cache.
+ */
+static int netfs_unbuffered_read(struct netfs_io_request *rreq, bool sync)
+{
+ int ret;
+
+ _enter("R=%x %llx-%llx",
+ rreq->debug_id, rreq->start, rreq->start + rreq->len - 1);
+
+ if (rreq->len == 0) {
+ pr_err("Zero-sized read [R=%x]\n", rreq->debug_id);
+ return -EIO;
+ }
+
+ // TODO: Use bounce buffer if requested
+
+ inode_dio_begin(rreq->inode);
+
+ ret = netfs_dispatch_unbuffered_reads(rreq);
+
+ if (!rreq->submitted) {
+ netfs_put_request(rreq, false, netfs_rreq_trace_put_no_submit);
+ inode_dio_end(rreq->inode);
+ ret = 0;
+ goto out;
+ }
+
+ if (sync) {
+ trace_netfs_rreq(rreq, netfs_rreq_trace_wait_ip);
+ wait_on_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS,
+ TASK_UNINTERRUPTIBLE);
+
+ ret = rreq->error;
+ if (ret == 0 && rreq->submitted < rreq->len &&
+ rreq->origin != NETFS_DIO_READ) {
+ trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_read);
+ ret = -EIO;
+ }
+ } else {
+ ret = -EIOCBQUEUED;
+ }
+
+out:
+ _leave(" = %d", ret);
+ return ret;
+}
+
/**
* netfs_unbuffered_read_iter_locked - Perform an unbuffered or direct I/O read
* @iocb: The I/O control descriptor describing the read
@@ -31,7 +168,7 @@ ssize_t netfs_unbuffered_read_iter_locked(struct kiocb *iocb, struct iov_iter *i
struct netfs_io_request *rreq;
ssize_t ret;
size_t orig_count = iov_iter_count(iter);
- bool async = !is_sync_kiocb(iocb);
+ bool sync = is_sync_kiocb(iocb);
_enter("");
@@ -78,13 +215,13 @@ ssize_t netfs_unbuffered_read_iter_locked(struct kiocb *iocb, struct iov_iter *i
// TODO: Set up bounce buffer if needed
- if (async)
+ if (!sync)
rreq->iocb = iocb;
- ret = netfs_begin_read(rreq, is_sync_kiocb(iocb));
+ ret = netfs_unbuffered_read(rreq, sync);
if (ret < 0)
goto out; /* May be -EIOCBQUEUED */
- if (!async) {
+ if (sync) {
// TODO: Copy from bounce buffer
iocb->ki_pos += rreq->transferred;
ret = rreq->transferred;
@@ -94,8 +231,6 @@ out:
netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
if (ret > 0)
orig_count -= ret;
- if (ret != -EIOCBQUEUED)
- iov_iter_revert(iter, orig_count - iov_iter_count(iter));
return ret;
}
EXPORT_SYMBOL(netfs_unbuffered_read_iter_locked);
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 7773f3d855a9..c9f0ed24cb7b 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -7,6 +7,7 @@
#include <linux/slab.h>
#include <linux/seq_file.h>
+#include <linux/folio_queue.h>
#include <linux/netfs.h>
#include <linux/fscache.h>
#include <linux/fscache-cache.h>
@@ -22,16 +23,10 @@
/*
* buffered_read.c
*/
-void netfs_rreq_unlock_folios(struct netfs_io_request *rreq);
int netfs_prefetch_for_write(struct file *file, struct folio *folio,
size_t offset, size_t len);
/*
- * io.c
- */
-int netfs_begin_read(struct netfs_io_request *rreq, bool sync);
-
-/*
* main.c
*/
extern unsigned int netfs_debug;
@@ -63,6 +58,11 @@ static inline void netfs_proc_del_rreq(struct netfs_io_request *rreq) {}
/*
* misc.c
*/
+int netfs_buffer_append_folio(struct netfs_io_request *rreq, struct folio *folio,
+ bool needs_put);
+struct folio_queue *netfs_delete_buffer_head(struct netfs_io_request *wreq);
+void netfs_clear_buffer(struct netfs_io_request *rreq);
+void netfs_reset_iter(struct netfs_io_subrequest *subreq);
/*
* objects.c
@@ -84,6 +84,28 @@ static inline void netfs_see_request(struct netfs_io_request *rreq,
}
/*
+ * read_collect.c
+ */
+void netfs_read_termination_worker(struct work_struct *work);
+void netfs_rreq_terminated(struct netfs_io_request *rreq, bool was_async);
+
+/*
+ * read_pgpriv2.c
+ */
+void netfs_pgpriv2_mark_copy_to_cache(struct netfs_io_subrequest *subreq,
+ struct netfs_io_request *rreq,
+ struct folio_queue *folioq,
+ int slot);
+void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq);
+bool netfs_pgpriv2_unlock_copied_folios(struct netfs_io_request *wreq);
+
+/*
+ * read_retry.c
+ */
+void netfs_retry_reads(struct netfs_io_request *rreq);
+void netfs_unlock_abandoned_read_pages(struct netfs_io_request *rreq);
+
+/*
* stats.c
*/
#ifdef CONFIG_NETFS_STATS
@@ -110,6 +132,7 @@ extern atomic_t netfs_n_wh_buffered_write;
extern atomic_t netfs_n_wh_writethrough;
extern atomic_t netfs_n_wh_dio_write;
extern atomic_t netfs_n_wh_writepages;
+extern atomic_t netfs_n_wh_copy_to_cache;
extern atomic_t netfs_n_wh_wstream_conflict;
extern atomic_t netfs_n_wh_upload;
extern atomic_t netfs_n_wh_upload_done;
@@ -117,6 +140,9 @@ extern atomic_t netfs_n_wh_upload_failed;
extern atomic_t netfs_n_wh_write;
extern atomic_t netfs_n_wh_write_done;
extern atomic_t netfs_n_wh_write_failed;
+extern atomic_t netfs_n_wb_lock_skip;
+extern atomic_t netfs_n_wb_lock_wait;
+extern atomic_t netfs_n_folioq;
int netfs_stats_show(struct seq_file *m, void *v);
@@ -150,7 +176,10 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
loff_t start,
enum netfs_io_origin origin);
void netfs_reissue_write(struct netfs_io_stream *stream,
- struct netfs_io_subrequest *subreq);
+ struct netfs_io_subrequest *subreq,
+ struct iov_iter *source);
+void netfs_issue_write(struct netfs_io_request *wreq,
+ struct netfs_io_stream *stream);
int netfs_advance_write(struct netfs_io_request *wreq,
struct netfs_io_stream *stream,
loff_t start, size_t len, bool to_eof);
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
deleted file mode 100644
index d6ada4eba744..000000000000
--- a/fs/netfs/io.c
+++ /dev/null
@@ -1,804 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/* Network filesystem high-level read support.
- *
- * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- */
-
-#include <linux/module.h>
-#include <linux/export.h>
-#include <linux/fs.h>
-#include <linux/mm.h>
-#include <linux/pagemap.h>
-#include <linux/slab.h>
-#include <linux/uio.h>
-#include <linux/sched/mm.h>
-#include <linux/task_io_accounting_ops.h>
-#include "internal.h"
-
-/*
- * Clear the unread part of an I/O request.
- */
-static void netfs_clear_unread(struct netfs_io_subrequest *subreq)
-{
- iov_iter_zero(iov_iter_count(&subreq->io_iter), &subreq->io_iter);
-}
-
-static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error,
- bool was_async)
-{
- struct netfs_io_subrequest *subreq = priv;
-
- netfs_subreq_terminated(subreq, transferred_or_error, was_async);
-}
-
-/*
- * Issue a read against the cache.
- * - Eats the caller's ref on subreq.
- */
-static void netfs_read_from_cache(struct netfs_io_request *rreq,
- struct netfs_io_subrequest *subreq,
- enum netfs_read_from_hole read_hole)
-{
- struct netfs_cache_resources *cres = &rreq->cache_resources;
-
- netfs_stat(&netfs_n_rh_read);
- cres->ops->read(cres, subreq->start, &subreq->io_iter, read_hole,
- netfs_cache_read_terminated, subreq);
-}
-
-/*
- * Fill a subrequest region with zeroes.
- */
-static void netfs_fill_with_zeroes(struct netfs_io_request *rreq,
- struct netfs_io_subrequest *subreq)
-{
- netfs_stat(&netfs_n_rh_zero);
- __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
- netfs_subreq_terminated(subreq, 0, false);
-}
-
-/*
- * Ask the netfs to issue a read request to the server for us.
- *
- * The netfs is expected to read from subreq->pos + subreq->transferred to
- * subreq->pos + subreq->len - 1. It may not backtrack and write data into the
- * buffer prior to the transferred point as it might clobber dirty data
- * obtained from the cache.
- *
- * Alternatively, the netfs is allowed to indicate one of two things:
- *
- * - NETFS_SREQ_SHORT_READ: A short read - it will get called again to try and
- * make progress.
- *
- * - NETFS_SREQ_CLEAR_TAIL: A short read - the rest of the buffer will be
- * cleared.
- */
-static void netfs_read_from_server(struct netfs_io_request *rreq,
- struct netfs_io_subrequest *subreq)
-{
- netfs_stat(&netfs_n_rh_download);
-
- if (rreq->origin != NETFS_DIO_READ &&
- iov_iter_count(&subreq->io_iter) != subreq->len - subreq->transferred)
- pr_warn("R=%08x[%u] ITER PRE-MISMATCH %zx != %zx-%zx %lx\n",
- rreq->debug_id, subreq->debug_index,
- iov_iter_count(&subreq->io_iter), subreq->len,
- subreq->transferred, subreq->flags);
- rreq->netfs_ops->issue_read(subreq);
-}
-
-/*
- * Release those waiting.
- */
-static void netfs_rreq_completed(struct netfs_io_request *rreq, bool was_async)
-{
- trace_netfs_rreq(rreq, netfs_rreq_trace_done);
- netfs_clear_subrequests(rreq, was_async);
- netfs_put_request(rreq, was_async, netfs_rreq_trace_put_complete);
-}
-
-/*
- * [DEPRECATED] Deal with the completion of writing the data to the cache. We
- * have to clear the PG_fscache bits on the folios involved and release the
- * caller's ref.
- *
- * May be called in softirq mode and we inherit a ref from the caller.
- */
-static void netfs_rreq_unmark_after_write(struct netfs_io_request *rreq,
- bool was_async)
-{
- struct netfs_io_subrequest *subreq;
- struct folio *folio;
- pgoff_t unlocked = 0;
- bool have_unlocked = false;
-
- rcu_read_lock();
-
- list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
- XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE);
-
- xas_for_each(&xas, folio, (subreq->start + subreq->len - 1) / PAGE_SIZE) {
- if (xas_retry(&xas, folio))
- continue;
-
- /* We might have multiple writes from the same huge
- * folio, but we mustn't unlock a folio more than once.
- */
- if (have_unlocked && folio->index <= unlocked)
- continue;
- unlocked = folio_next_index(folio) - 1;
- trace_netfs_folio(folio, netfs_folio_trace_end_copy);
- folio_end_private_2(folio);
- have_unlocked = true;
- }
- }
-
- rcu_read_unlock();
- netfs_rreq_completed(rreq, was_async);
-}
-
-static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error,
- bool was_async) /* [DEPRECATED] */
-{
- struct netfs_io_subrequest *subreq = priv;
- struct netfs_io_request *rreq = subreq->rreq;
-
- if (IS_ERR_VALUE(transferred_or_error)) {
- netfs_stat(&netfs_n_rh_write_failed);
- trace_netfs_failure(rreq, subreq, transferred_or_error,
- netfs_fail_copy_to_cache);
- } else {
- netfs_stat(&netfs_n_rh_write_done);
- }
-
- trace_netfs_sreq(subreq, netfs_sreq_trace_write_term);
-
- /* If we decrement nr_copy_ops to 0, the ref belongs to us. */
- if (atomic_dec_and_test(&rreq->nr_copy_ops))
- netfs_rreq_unmark_after_write(rreq, was_async);
-
- netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
-}
-
-/*
- * [DEPRECATED] Perform any outstanding writes to the cache. We inherit a ref
- * from the caller.
- */
-static void netfs_rreq_do_write_to_cache(struct netfs_io_request *rreq)
-{
- struct netfs_cache_resources *cres = &rreq->cache_resources;
- struct netfs_io_subrequest *subreq, *next, *p;
- struct iov_iter iter;
- int ret;
-
- trace_netfs_rreq(rreq, netfs_rreq_trace_copy);
-
- /* We don't want terminating writes trying to wake us up whilst we're
- * still going through the list.
- */
- atomic_inc(&rreq->nr_copy_ops);
-
- list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) {
- if (!test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) {
- list_del_init(&subreq->rreq_link);
- netfs_put_subrequest(subreq, false,
- netfs_sreq_trace_put_no_copy);
- }
- }
-
- list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
- /* Amalgamate adjacent writes */
- while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
- next = list_next_entry(subreq, rreq_link);
- if (next->start != subreq->start + subreq->len)
- break;
- subreq->len += next->len;
- list_del_init(&next->rreq_link);
- netfs_put_subrequest(next, false,
- netfs_sreq_trace_put_merged);
- }
-
- ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len,
- subreq->len, rreq->i_size, true);
- if (ret < 0) {
- trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write);
- trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip);
- continue;
- }
-
- iov_iter_xarray(&iter, ITER_SOURCE, &rreq->mapping->i_pages,
- subreq->start, subreq->len);
-
- atomic_inc(&rreq->nr_copy_ops);
- netfs_stat(&netfs_n_rh_write);
- netfs_get_subrequest(subreq, netfs_sreq_trace_get_copy_to_cache);
- trace_netfs_sreq(subreq, netfs_sreq_trace_write);
- cres->ops->write(cres, subreq->start, &iter,
- netfs_rreq_copy_terminated, subreq);
- }
-
- /* If we decrement nr_copy_ops to 0, the usage ref belongs to us. */
- if (atomic_dec_and_test(&rreq->nr_copy_ops))
- netfs_rreq_unmark_after_write(rreq, false);
-}
-
-static void netfs_rreq_write_to_cache_work(struct work_struct *work) /* [DEPRECATED] */
-{
- struct netfs_io_request *rreq =
- container_of(work, struct netfs_io_request, work);
-
- netfs_rreq_do_write_to_cache(rreq);
-}
-
-static void netfs_rreq_write_to_cache(struct netfs_io_request *rreq) /* [DEPRECATED] */
-{
- rreq->work.func = netfs_rreq_write_to_cache_work;
- if (!queue_work(system_unbound_wq, &rreq->work))
- BUG();
-}
-
-/*
- * Handle a short read.
- */
-static void netfs_rreq_short_read(struct netfs_io_request *rreq,
- struct netfs_io_subrequest *subreq)
-{
- __clear_bit(NETFS_SREQ_SHORT_IO, &subreq->flags);
- __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags);
-
- netfs_stat(&netfs_n_rh_short_read);
- trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short);
-
- netfs_get_subrequest(subreq, netfs_sreq_trace_get_short_read);
- atomic_inc(&rreq->nr_outstanding);
- if (subreq->source == NETFS_READ_FROM_CACHE)
- netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_CLEAR);
- else
- netfs_read_from_server(rreq, subreq);
-}
-
-/*
- * Reset the subrequest iterator prior to resubmission.
- */
-static void netfs_reset_subreq_iter(struct netfs_io_request *rreq,
- struct netfs_io_subrequest *subreq)
-{
- size_t remaining = subreq->len - subreq->transferred;
- size_t count = iov_iter_count(&subreq->io_iter);
-
- if (count == remaining)
- return;
-
- _debug("R=%08x[%u] ITER RESUB-MISMATCH %zx != %zx-%zx-%llx %x",
- rreq->debug_id, subreq->debug_index,
- iov_iter_count(&subreq->io_iter), subreq->transferred,
- subreq->len, rreq->i_size,
- subreq->io_iter.iter_type);
-
- if (count < remaining)
- iov_iter_revert(&subreq->io_iter, remaining - count);
- else
- iov_iter_advance(&subreq->io_iter, count - remaining);
-}
-
-/*
- * Resubmit any short or failed operations. Returns true if we got the rreq
- * ref back.
- */
-static bool netfs_rreq_perform_resubmissions(struct netfs_io_request *rreq)
-{
- struct netfs_io_subrequest *subreq;
-
- WARN_ON(in_interrupt());
-
- trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit);
-
- /* We don't want terminating submissions trying to wake us up whilst
- * we're still going through the list.
- */
- atomic_inc(&rreq->nr_outstanding);
-
- __clear_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
- list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
- if (subreq->error) {
- if (subreq->source != NETFS_READ_FROM_CACHE)
- break;
- subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
- subreq->error = 0;
- __set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
- netfs_stat(&netfs_n_rh_download_instead);
- trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead);
- netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
- atomic_inc(&rreq->nr_outstanding);
- netfs_reset_subreq_iter(rreq, subreq);
- netfs_read_from_server(rreq, subreq);
- } else if (test_bit(NETFS_SREQ_SHORT_IO, &subreq->flags)) {
- __set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
- netfs_reset_subreq_iter(rreq, subreq);
- netfs_rreq_short_read(rreq, subreq);
- }
- }
-
- /* If we decrement nr_outstanding to 0, the usage ref belongs to us. */
- if (atomic_dec_and_test(&rreq->nr_outstanding))
- return true;
-
- wake_up_var(&rreq->nr_outstanding);
- return false;
-}
-
-/*
- * Check to see if the data read is still valid.
- */
-static void netfs_rreq_is_still_valid(struct netfs_io_request *rreq)
-{
- struct netfs_io_subrequest *subreq;
-
- if (!rreq->netfs_ops->is_still_valid ||
- rreq->netfs_ops->is_still_valid(rreq))
- return;
-
- list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
- if (subreq->source == NETFS_READ_FROM_CACHE) {
- subreq->error = -ESTALE;
- __set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
- }
- }
-}
-
-/*
- * Determine how much we can admit to having read from a DIO read.
- */
-static void netfs_rreq_assess_dio(struct netfs_io_request *rreq)
-{
- struct netfs_io_subrequest *subreq;
- unsigned int i;
- size_t transferred = 0;
-
- for (i = 0; i < rreq->direct_bv_count; i++) {
- flush_dcache_page(rreq->direct_bv[i].bv_page);
- // TODO: cifs marks pages in the destination buffer
- // dirty under some circumstances after a read. Do we
- // need to do that too?
- set_page_dirty(rreq->direct_bv[i].bv_page);
- }
-
- list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
- if (subreq->error || subreq->transferred == 0)
- break;
- transferred += subreq->transferred;
- if (subreq->transferred < subreq->len ||
- test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags))
- break;
- }
-
- for (i = 0; i < rreq->direct_bv_count; i++)
- flush_dcache_page(rreq->direct_bv[i].bv_page);
-
- rreq->transferred = transferred;
- task_io_account_read(transferred);
-
- if (rreq->iocb) {
- rreq->iocb->ki_pos += transferred;
- if (rreq->iocb->ki_complete)
- rreq->iocb->ki_complete(
- rreq->iocb, rreq->error ? rreq->error : transferred);
- }
- if (rreq->netfs_ops->done)
- rreq->netfs_ops->done(rreq);
- inode_dio_end(rreq->inode);
-}
-
-/*
- * Assess the state of a read request and decide what to do next.
- *
- * Note that we could be in an ordinary kernel thread, on a workqueue or in
- * softirq context at this point. We inherit a ref from the caller.
- */
-static void netfs_rreq_assess(struct netfs_io_request *rreq, bool was_async)
-{
- trace_netfs_rreq(rreq, netfs_rreq_trace_assess);
-
-again:
- netfs_rreq_is_still_valid(rreq);
-
- if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) &&
- test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) {
- if (netfs_rreq_perform_resubmissions(rreq))
- goto again;
- return;
- }
-
- if (rreq->origin != NETFS_DIO_READ)
- netfs_rreq_unlock_folios(rreq);
- else
- netfs_rreq_assess_dio(rreq);
-
- trace_netfs_rreq(rreq, netfs_rreq_trace_wake_ip);
- clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
- wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);
-
- if (test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags) &&
- test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags))
- return netfs_rreq_write_to_cache(rreq);
-
- netfs_rreq_completed(rreq, was_async);
-}
-
-static void netfs_rreq_work(struct work_struct *work)
-{
- struct netfs_io_request *rreq =
- container_of(work, struct netfs_io_request, work);
- netfs_rreq_assess(rreq, false);
-}
-
-/*
- * Handle the completion of all outstanding I/O operations on a read request.
- * We inherit a ref from the caller.
- */
-static void netfs_rreq_terminated(struct netfs_io_request *rreq,
- bool was_async)
-{
- if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) &&
- was_async) {
- if (!queue_work(system_unbound_wq, &rreq->work))
- BUG();
- } else {
- netfs_rreq_assess(rreq, was_async);
- }
-}
-
-/**
- * netfs_subreq_terminated - Note the termination of an I/O operation.
- * @subreq: The I/O request that has terminated.
- * @transferred_or_error: The amount of data transferred or an error code.
- * @was_async: The termination was asynchronous
- *
- * This tells the read helper that a contributory I/O operation has terminated,
- * one way or another, and that it should integrate the results.
- *
- * The caller indicates in @transferred_or_error the outcome of the operation,
- * supplying a positive value to indicate the number of bytes transferred, 0 to
- * indicate a failure to transfer anything that should be retried or a negative
- * error code. The helper will look after reissuing I/O operations as
- * appropriate and writing downloaded data to the cache.
- *
- * If @was_async is true, the caller might be running in softirq or interrupt
- * context and we can't sleep.
- */
-void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
- ssize_t transferred_or_error,
- bool was_async)
-{
- struct netfs_io_request *rreq = subreq->rreq;
- int u;
-
- _enter("R=%x[%x]{%llx,%lx},%zd",
- rreq->debug_id, subreq->debug_index,
- subreq->start, subreq->flags, transferred_or_error);
-
- switch (subreq->source) {
- case NETFS_READ_FROM_CACHE:
- netfs_stat(&netfs_n_rh_read_done);
- break;
- case NETFS_DOWNLOAD_FROM_SERVER:
- netfs_stat(&netfs_n_rh_download_done);
- break;
- default:
- break;
- }
-
- if (IS_ERR_VALUE(transferred_or_error)) {
- subreq->error = transferred_or_error;
- trace_netfs_failure(rreq, subreq, transferred_or_error,
- netfs_fail_read);
- goto failed;
- }
-
- if (WARN(transferred_or_error > subreq->len - subreq->transferred,
- "Subreq overread: R%x[%x] %zd > %zu - %zu",
- rreq->debug_id, subreq->debug_index,
- transferred_or_error, subreq->len, subreq->transferred))
- transferred_or_error = subreq->len - subreq->transferred;
-
- subreq->error = 0;
- subreq->transferred += transferred_or_error;
- if (subreq->transferred < subreq->len &&
- !test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags))
- goto incomplete;
-
-complete:
- __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
- if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags))
- set_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags);
-
-out:
- trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
-
- /* If we decrement nr_outstanding to 0, the ref belongs to us. */
- u = atomic_dec_return(&rreq->nr_outstanding);
- if (u == 0)
- netfs_rreq_terminated(rreq, was_async);
- else if (u == 1)
- wake_up_var(&rreq->nr_outstanding);
-
- netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
- return;
-
-incomplete:
- if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) {
- netfs_clear_unread(subreq);
- subreq->transferred = subreq->len;
- goto complete;
- }
-
- if (transferred_or_error == 0) {
- if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
- if (rreq->origin != NETFS_DIO_READ)
- subreq->error = -ENODATA;
- goto failed;
- }
- } else {
- __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
- }
-
- __set_bit(NETFS_SREQ_SHORT_IO, &subreq->flags);
- set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
- goto out;
-
-failed:
- if (subreq->source == NETFS_READ_FROM_CACHE) {
- netfs_stat(&netfs_n_rh_read_failed);
- set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
- } else {
- netfs_stat(&netfs_n_rh_download_failed);
- set_bit(NETFS_RREQ_FAILED, &rreq->flags);
- rreq->error = subreq->error;
- }
- goto out;
-}
-EXPORT_SYMBOL(netfs_subreq_terminated);
-
-static enum netfs_io_source netfs_cache_prepare_read(struct netfs_io_subrequest *subreq,
- loff_t i_size)
-{
- struct netfs_io_request *rreq = subreq->rreq;
- struct netfs_cache_resources *cres = &rreq->cache_resources;
-
- if (cres->ops)
- return cres->ops->prepare_read(subreq, i_size);
- if (subreq->start >= rreq->i_size)
- return NETFS_FILL_WITH_ZEROES;
- return NETFS_DOWNLOAD_FROM_SERVER;
-}
-
-/*
- * Work out what sort of subrequest the next one will be.
- */
-static enum netfs_io_source
-netfs_rreq_prepare_read(struct netfs_io_request *rreq,
- struct netfs_io_subrequest *subreq,
- struct iov_iter *io_iter)
-{
- enum netfs_io_source source = NETFS_DOWNLOAD_FROM_SERVER;
- struct netfs_inode *ictx = netfs_inode(rreq->inode);
- size_t lsize;
-
- _enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size);
-
- if (rreq->origin != NETFS_DIO_READ) {
- source = netfs_cache_prepare_read(subreq, rreq->i_size);
- if (source == NETFS_INVALID_READ)
- goto out;
- }
-
- if (source == NETFS_DOWNLOAD_FROM_SERVER) {
- /* Call out to the netfs to let it shrink the request to fit
- * its own I/O sizes and boundaries. If it shinks it here, it
- * will be called again to make simultaneous calls; if it wants
- * to make serial calls, it can indicate a short read and then
- * we will call it again.
- */
- if (rreq->origin != NETFS_DIO_READ) {
- if (subreq->start >= ictx->zero_point) {
- source = NETFS_FILL_WITH_ZEROES;
- goto set;
- }
- if (subreq->len > ictx->zero_point - subreq->start)
- subreq->len = ictx->zero_point - subreq->start;
-
- /* We limit buffered reads to the EOF, but let the
- * server deal with larger-than-EOF DIO/unbuffered
- * reads.
- */
- if (subreq->len > rreq->i_size - subreq->start)
- subreq->len = rreq->i_size - subreq->start;
- }
- if (rreq->rsize && subreq->len > rreq->rsize)
- subreq->len = rreq->rsize;
-
- if (rreq->netfs_ops->clamp_length &&
- !rreq->netfs_ops->clamp_length(subreq)) {
- source = NETFS_INVALID_READ;
- goto out;
- }
-
- if (subreq->max_nr_segs) {
- lsize = netfs_limit_iter(io_iter, 0, subreq->len,
- subreq->max_nr_segs);
- if (subreq->len > lsize) {
- subreq->len = lsize;
- trace_netfs_sreq(subreq, netfs_sreq_trace_limited);
- }
- }
- }
-
-set:
- if (subreq->len > rreq->len)
- pr_warn("R=%08x[%u] SREQ>RREQ %zx > %llx\n",
- rreq->debug_id, subreq->debug_index,
- subreq->len, rreq->len);
-
- if (WARN_ON(subreq->len == 0)) {
- source = NETFS_INVALID_READ;
- goto out;
- }
-
- subreq->source = source;
- trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
-
- subreq->io_iter = *io_iter;
- iov_iter_truncate(&subreq->io_iter, subreq->len);
- iov_iter_advance(io_iter, subreq->len);
-out:
- subreq->source = source;
- trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
- return source;
-}
-
-/*
- * Slice off a piece of a read request and submit an I/O request for it.
- */
-static bool netfs_rreq_submit_slice(struct netfs_io_request *rreq,
- struct iov_iter *io_iter)
-{
- struct netfs_io_subrequest *subreq;
- enum netfs_io_source source;
-
- subreq = netfs_alloc_subrequest(rreq);
- if (!subreq)
- return false;
-
- subreq->start = rreq->start + rreq->submitted;
- subreq->len = io_iter->count;
-
- _debug("slice %llx,%zx,%llx", subreq->start, subreq->len, rreq->submitted);
- list_add_tail(&subreq->rreq_link, &rreq->subrequests);
-
- /* Call out to the cache to find out what it can do with the remaining
- * subset. It tells us in subreq->flags what it decided should be done
- * and adjusts subreq->len down if the subset crosses a cache boundary.
- *
- * Then when we hand the subset, it can choose to take a subset of that
- * (the starts must coincide), in which case, we go around the loop
- * again and ask it to download the next piece.
- */
- source = netfs_rreq_prepare_read(rreq, subreq, io_iter);
- if (source == NETFS_INVALID_READ)
- goto subreq_failed;
-
- atomic_inc(&rreq->nr_outstanding);
-
- rreq->submitted += subreq->len;
-
- trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
- switch (source) {
- case NETFS_FILL_WITH_ZEROES:
- netfs_fill_with_zeroes(rreq, subreq);
- break;
- case NETFS_DOWNLOAD_FROM_SERVER:
- netfs_read_from_server(rreq, subreq);
- break;
- case NETFS_READ_FROM_CACHE:
- netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_IGNORE);
- break;
- default:
- BUG();
- }
-
- return true;
-
-subreq_failed:
- rreq->error = subreq->error;
- netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_failed);
- return false;
-}
-
-/*
- * Begin the process of reading in a chunk of data, where that data may be
- * stitched together from multiple sources, including multiple servers and the
- * local cache.
- */
-int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
-{
- struct iov_iter io_iter;
- int ret;
-
- _enter("R=%x %llx-%llx",
- rreq->debug_id, rreq->start, rreq->start + rreq->len - 1);
-
- if (rreq->len == 0) {
- pr_err("Zero-sized read [R=%x]\n", rreq->debug_id);
- return -EIO;
- }
-
- if (rreq->origin == NETFS_DIO_READ)
- inode_dio_begin(rreq->inode);
-
- // TODO: Use bounce buffer if requested
- rreq->io_iter = rreq->iter;
-
- INIT_WORK(&rreq->work, netfs_rreq_work);
-
- /* Chop the read into slices according to what the cache and the netfs
- * want and submit each one.
- */
- netfs_get_request(rreq, netfs_rreq_trace_get_for_outstanding);
- atomic_set(&rreq->nr_outstanding, 1);
- io_iter = rreq->io_iter;
- do {
- _debug("submit %llx + %llx >= %llx",
- rreq->start, rreq->submitted, rreq->i_size);
- if (!netfs_rreq_submit_slice(rreq, &io_iter))
- break;
- if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags))
- break;
- if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
- test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
- break;
-
- } while (rreq->submitted < rreq->len);
-
- if (!rreq->submitted) {
- netfs_put_request(rreq, false, netfs_rreq_trace_put_no_submit);
- if (rreq->origin == NETFS_DIO_READ)
- inode_dio_end(rreq->inode);
- ret = 0;
- goto out;
- }
-
- if (sync) {
- /* Keep nr_outstanding incremented so that the ref always
- * belongs to us, and the service code isn't punted off to a
- * random thread pool to process. Note that this might start
- * further work, such as writing to the cache.
- */
- wait_var_event(&rreq->nr_outstanding,
- atomic_read(&rreq->nr_outstanding) == 1);
- if (atomic_dec_and_test(&rreq->nr_outstanding))
- netfs_rreq_assess(rreq, false);
-
- trace_netfs_rreq(rreq, netfs_rreq_trace_wait_ip);
- wait_on_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS,
- TASK_UNINTERRUPTIBLE);
-
- ret = rreq->error;
- if (ret == 0) {
- if (rreq->origin == NETFS_DIO_READ) {
- ret = rreq->transferred;
- } else if (rreq->submitted < rreq->len) {
- trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_read);
- ret = -EIO;
- }
- }
- } else {
- /* If we decrement nr_outstanding to 0, the ref belongs to us. */
- if (atomic_dec_and_test(&rreq->nr_outstanding))
- netfs_rreq_assess(rreq, false);
- ret = -EIOCBQUEUED;
- }
-
-out:
- return ret;
-}
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index b781bbbf1d8d..72a435e5fc6d 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -188,9 +188,59 @@ static size_t netfs_limit_xarray(const struct iov_iter *iter, size_t start_offse
return min(span, max_size);
}
+/*
+ * Select the span of a folio queue iterator we're going to use. Limit it by
+ * both maximum size and maximum number of segments. Returns the size of the
+ * span in bytes.
+ */
+static size_t netfs_limit_folioq(const struct iov_iter *iter, size_t start_offset,
+ size_t max_size, size_t max_segs)
+{
+ const struct folio_queue *folioq = iter->folioq;
+ unsigned int nsegs = 0;
+ unsigned int slot = iter->folioq_slot;
+ size_t span = 0, n = iter->count;
+
+ if (WARN_ON(!iov_iter_is_folioq(iter)) ||
+ WARN_ON(start_offset > n) ||
+ n == 0)
+ return 0;
+ max_size = umin(max_size, n - start_offset);
+
+ if (slot >= folioq_nr_slots(folioq)) {
+ folioq = folioq->next;
+ slot = 0;
+ }
+
+ start_offset += iter->iov_offset;
+ do {
+ size_t flen = folioq_folio_size(folioq, slot);
+
+ if (start_offset < flen) {
+ span += flen - start_offset;
+ nsegs++;
+ start_offset = 0;
+ } else {
+ start_offset -= flen;
+ }
+ if (span >= max_size || nsegs >= max_segs)
+ break;
+
+ slot++;
+ if (slot >= folioq_nr_slots(folioq)) {
+ folioq = folioq->next;
+ slot = 0;
+ }
+ } while (folioq);
+
+ return umin(span, max_size);
+}
+
size_t netfs_limit_iter(const struct iov_iter *iter, size_t start_offset,
size_t max_size, size_t max_segs)
{
+ if (iov_iter_is_folioq(iter))
+ return netfs_limit_folioq(iter, start_offset, max_size, max_segs);
if (iov_iter_is_bvec(iter))
return netfs_limit_bvec(iter, start_offset, max_size, max_segs);
if (iov_iter_is_xarray(iter))
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 9d6b49dc6694..6c7be1377ee0 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -36,13 +36,14 @@ DEFINE_SPINLOCK(netfs_proc_lock);
static const char *netfs_origins[nr__netfs_io_origin] = {
[NETFS_READAHEAD] = "RA",
[NETFS_READPAGE] = "RP",
+ [NETFS_READ_GAPS] = "RG",
[NETFS_READ_FOR_WRITE] = "RW",
- [NETFS_COPY_TO_CACHE] = "CC",
+ [NETFS_DIO_READ] = "DR",
[NETFS_WRITEBACK] = "WB",
[NETFS_WRITETHROUGH] = "WT",
[NETFS_UNBUFFERED_WRITE] = "UW",
- [NETFS_DIO_READ] = "DR",
[NETFS_DIO_WRITE] = "DW",
+ [NETFS_PGPRIV2_COPY_TO_CACHE] = "2C",
};
/*
@@ -62,7 +63,7 @@ static int netfs_requests_seq_show(struct seq_file *m, void *v)
rreq = list_entry(v, struct netfs_io_request, proc_link);
seq_printf(m,
- "%08x %s %3d %2lx %4d %3d @%04llx %llx/%llx",
+ "%08x %s %3d %2lx %4ld %3d @%04llx %llx/%llx",
rreq->debug_id,
netfs_origins[rreq->origin],
refcount_read(&rreq->ref),
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index c1f321cf5999..0ad0982ce0e2 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -8,6 +8,100 @@
#include <linux/swap.h>
#include "internal.h"
+/*
+ * Append a folio to the rolling queue.
+ */
+int netfs_buffer_append_folio(struct netfs_io_request *rreq, struct folio *folio,
+ bool needs_put)
+{
+ struct folio_queue *tail = rreq->buffer_tail;
+ unsigned int slot, order = folio_order(folio);
+
+ if (WARN_ON_ONCE(!rreq->buffer && tail) ||
+ WARN_ON_ONCE(rreq->buffer && !tail))
+ return -EIO;
+
+ if (!tail || folioq_full(tail)) {
+ tail = kmalloc(sizeof(*tail), GFP_NOFS);
+ if (!tail)
+ return -ENOMEM;
+ netfs_stat(&netfs_n_folioq);
+ folioq_init(tail);
+ tail->prev = rreq->buffer_tail;
+ if (tail->prev)
+ tail->prev->next = tail;
+ rreq->buffer_tail = tail;
+ if (!rreq->buffer) {
+ rreq->buffer = tail;
+ iov_iter_folio_queue(&rreq->io_iter, ITER_SOURCE, tail, 0, 0, 0);
+ }
+ rreq->buffer_tail_slot = 0;
+ }
+
+ rreq->io_iter.count += PAGE_SIZE << order;
+
+ slot = folioq_append(tail, folio);
+ /* Store the counter after setting the slot. */
+ smp_store_release(&rreq->buffer_tail_slot, slot);
+ return 0;
+}
+
+/*
+ * Delete the head of a rolling queue.
+ */
+struct folio_queue *netfs_delete_buffer_head(struct netfs_io_request *wreq)
+{
+ struct folio_queue *head = wreq->buffer, *next = head->next;
+
+ if (next)
+ next->prev = NULL;
+ netfs_stat_d(&netfs_n_folioq);
+ kfree(head);
+ wreq->buffer = next;
+ return next;
+}
+
+/*
+ * Clear out a rolling queue.
+ */
+void netfs_clear_buffer(struct netfs_io_request *rreq)
+{
+ struct folio_queue *p;
+
+ while ((p = rreq->buffer)) {
+ rreq->buffer = p->next;
+ for (int slot = 0; slot < folioq_nr_slots(p); slot++) {
+ struct folio *folio = folioq_folio(p, slot);
+ if (!folio)
+ continue;
+ if (folioq_is_marked(p, slot)) {
+ trace_netfs_folio(folio, netfs_folio_trace_put);
+ folio_put(folio);
+ }
+ }
+ netfs_stat_d(&netfs_n_folioq);
+ kfree(p);
+ }
+}
+
+/*
+ * Reset the subrequest iterator to refer just to the region remaining to be
+ * read. The iterator may or may not have been advanced by socket ops or
+ * extraction ops to an extent that may or may not match the amount actually
+ * read.
+ */
+void netfs_reset_iter(struct netfs_io_subrequest *subreq)
+{
+ struct iov_iter *io_iter = &subreq->io_iter;
+ size_t remain = subreq->len - subreq->transferred;
+
+ if (io_iter->count > remain)
+ iov_iter_advance(io_iter, io_iter->count - remain);
+ else if (io_iter->count < remain)
+ iov_iter_revert(io_iter, remain - io_iter->count);
+ iov_iter_truncate(&subreq->io_iter, remain);
+}
+
/**
* netfs_dirty_folio - Mark folio dirty and pin a cache object for writeback
* @mapping: The mapping the folio belongs to.
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 0294df70c3ff..31e388ec6e48 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -36,7 +36,6 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
memset(rreq, 0, kmem_cache_size(cache));
rreq->start = start;
rreq->len = len;
- rreq->upper_len = len;
rreq->origin = origin;
rreq->netfs_ops = ctx->ops;
rreq->mapping = mapping;
@@ -44,13 +43,23 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
rreq->i_size = i_size_read(inode);
rreq->debug_id = atomic_inc_return(&debug_ids);
rreq->wsize = INT_MAX;
+ rreq->io_streams[0].sreq_max_len = ULONG_MAX;
+ rreq->io_streams[0].sreq_max_segs = 0;
spin_lock_init(&rreq->lock);
INIT_LIST_HEAD(&rreq->io_streams[0].subrequests);
INIT_LIST_HEAD(&rreq->io_streams[1].subrequests);
INIT_LIST_HEAD(&rreq->subrequests);
- INIT_WORK(&rreq->work, NULL);
refcount_set(&rreq->ref, 1);
+ if (origin == NETFS_READAHEAD ||
+ origin == NETFS_READPAGE ||
+ origin == NETFS_READ_GAPS ||
+ origin == NETFS_READ_FOR_WRITE ||
+ origin == NETFS_DIO_READ)
+ INIT_WORK(&rreq->work, netfs_read_termination_worker);
+ else
+ INIT_WORK(&rreq->work, netfs_write_collection_worker);
+
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
if (file && file->f_flags & O_NONBLOCK)
__set_bit(NETFS_RREQ_NONBLOCK, &rreq->flags);
@@ -134,6 +143,7 @@ static void netfs_free_request(struct work_struct *work)
}
kvfree(rreq->direct_bv);
}
+ netfs_clear_buffer(rreq);
if (atomic_dec_and_test(&ictx->io_count))
wake_up_var(&ictx->io_count);
@@ -155,7 +165,7 @@ void netfs_put_request(struct netfs_io_request *rreq, bool was_async,
if (was_async) {
rreq->work.func = netfs_free_request;
if (!queue_work(system_unbound_wq, &rreq->work))
- BUG();
+ WARN_ON(1);
} else {
netfs_free_request(&rreq->work);
}
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
new file mode 100644
index 000000000000..b18c65ba5580
--- /dev/null
+++ b/fs/netfs/read_collect.c
@@ -0,0 +1,544 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Network filesystem read subrequest result collection, assessment and
+ * retrying.
+ *
+ * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/task_io_accounting_ops.h>
+#include "internal.h"
+
+/*
+ * Clear the unread part of an I/O request.
+ */
+static void netfs_clear_unread(struct netfs_io_subrequest *subreq)
+{
+ netfs_reset_iter(subreq);
+ WARN_ON_ONCE(subreq->len - subreq->transferred != iov_iter_count(&subreq->io_iter));
+ iov_iter_zero(iov_iter_count(&subreq->io_iter), &subreq->io_iter);
+ if (subreq->start + subreq->transferred >= subreq->rreq->i_size)
+ __set_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);
+}
+
+/*
+ * Flush, mark and unlock a folio that's now completely read. If we want to
+ * cache the folio, we set the group to NETFS_FOLIO_COPY_TO_CACHE, mark it
+ * dirty and let writeback handle it.
+ */
+static void netfs_unlock_read_folio(struct netfs_io_subrequest *subreq,
+ struct netfs_io_request *rreq,
+ struct folio_queue *folioq,
+ int slot)
+{
+ struct netfs_folio *finfo;
+ struct folio *folio = folioq_folio(folioq, slot);
+
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+
+ if (!test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
+ finfo = netfs_folio_info(folio);
+ if (finfo) {
+ trace_netfs_folio(folio, netfs_folio_trace_filled_gaps);
+ if (finfo->netfs_group)
+ folio_change_private(folio, finfo->netfs_group);
+ else
+ folio_detach_private(folio);
+ kfree(finfo);
+ }
+
+ if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) {
+ if (!WARN_ON_ONCE(folio_get_private(folio) != NULL)) {
+ trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
+ folio_attach_private(folio, NETFS_FOLIO_COPY_TO_CACHE);
+ folio_mark_dirty(folio);
+ }
+ } else {
+ trace_netfs_folio(folio, netfs_folio_trace_read_done);
+ }
+ } else {
+ // TODO: Use of PG_private_2 is deprecated.
+ if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags))
+ netfs_pgpriv2_mark_copy_to_cache(subreq, rreq, folioq, slot);
+ }
+
+ if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
+ if (folio->index == rreq->no_unlock_folio &&
+ test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags)) {
+ _debug("no unlock");
+ } else {
+ trace_netfs_folio(folio, netfs_folio_trace_read_unlock);
+ folio_unlock(folio);
+ }
+ }
+}
+
+/*
+ * Unlock any folios that are now completely read. Returns true if the
+ * subrequest is removed from the list.
+ */
+static bool netfs_consume_read_data(struct netfs_io_subrequest *subreq, bool was_async)
+{
+ struct netfs_io_subrequest *prev, *next;
+ struct netfs_io_request *rreq = subreq->rreq;
+ struct folio_queue *folioq = subreq->curr_folioq;
+ size_t avail, prev_donated, next_donated, fsize, part, excess;
+ loff_t fpos, start;
+ loff_t fend;
+ int slot = subreq->curr_folioq_slot;
+
+ if (WARN(subreq->transferred > subreq->len,
+ "Subreq overread: R%x[%x] %zu > %zu",
+ rreq->debug_id, subreq->debug_index,
+ subreq->transferred, subreq->len))
+ subreq->transferred = subreq->len;
+
+next_folio:
+ fsize = PAGE_SIZE << subreq->curr_folio_order;
+ fpos = round_down(subreq->start + subreq->consumed, fsize);
+ fend = fpos + fsize;
+
+ if (WARN_ON_ONCE(!folioq) ||
+ WARN_ON_ONCE(!folioq_folio(folioq, slot)) ||
+ WARN_ON_ONCE(folioq_folio(folioq, slot)->index != fpos / PAGE_SIZE)) {
+ pr_err("R=%08x[%x] s=%llx-%llx ctl=%zx/%zx/%zx sl=%u\n",
+ rreq->debug_id, subreq->debug_index,
+ subreq->start, subreq->start + subreq->transferred - 1,
+ subreq->consumed, subreq->transferred, subreq->len,
+ slot);
+ if (folioq) {
+ struct folio *folio = folioq_folio(folioq, slot);
+
+ pr_err("folioq: orders=%02x%02x%02x%02x\n",
+ folioq->orders[0], folioq->orders[1],
+ folioq->orders[2], folioq->orders[3]);
+ if (folio)
+ pr_err("folio: %llx-%llx ix=%llx o=%u qo=%u\n",
+ fpos, fend - 1, folio_pos(folio), folio_order(folio),
+ folioq_folio_order(folioq, slot));
+ }
+ }
+
+donation_changed:
+ /* Try to consume the current folio if we've hit or passed the end of
+ * it. There's a possibility that this subreq doesn't start at the
+ * beginning of the folio, in which case we need to donate to/from the
+ * preceding subreq.
+ *
+ * We also need to include any potential donation back from the
+ * following subreq.
+ */
+ prev_donated = READ_ONCE(subreq->prev_donated);
+ next_donated = READ_ONCE(subreq->next_donated);
+ if (prev_donated || next_donated) {
+ spin_lock_bh(&rreq->lock);
+ prev_donated = subreq->prev_donated;
+ next_donated = subreq->next_donated;
+ subreq->start -= prev_donated;
+ subreq->len += prev_donated;
+ subreq->transferred += prev_donated;
+ prev_donated = subreq->prev_donated = 0;
+ if (subreq->transferred == subreq->len) {
+ subreq->len += next_donated;
+ subreq->transferred += next_donated;
+ next_donated = subreq->next_donated = 0;
+ }
+ trace_netfs_sreq(subreq, netfs_sreq_trace_add_donations);
+ spin_unlock_bh(&rreq->lock);
+ }
+
+ avail = subreq->transferred;
+ if (avail == subreq->len)
+ avail += next_donated;
+ start = subreq->start;
+ if (subreq->consumed == 0) {
+ start -= prev_donated;
+ avail += prev_donated;
+ } else {
+ start += subreq->consumed;
+ avail -= subreq->consumed;
+ }
+ part = umin(avail, fsize);
+
+ trace_netfs_progress(subreq, start, avail, part);
+
+ if (start + avail >= fend) {
+ if (fpos == start) {
+ /* Flush, unlock and mark for caching any folio we've just read. */
+ subreq->consumed = fend - subreq->start;
+ netfs_unlock_read_folio(subreq, rreq, folioq, slot);
+ folioq_mark2(folioq, slot);
+ if (subreq->consumed >= subreq->len)
+ goto remove_subreq;
+ } else if (fpos < start) {
+ excess = fend - subreq->start;
+
+ spin_lock_bh(&rreq->lock);
+ /* If we complete first on a folio split with the
+ * preceding subreq, donate to that subreq - otherwise
+ * we get the responsibility.
+ */
+ if (subreq->prev_donated != prev_donated) {
+ spin_unlock_bh(&rreq->lock);
+ goto donation_changed;
+ }
+
+ if (list_is_first(&subreq->rreq_link, &rreq->subrequests)) {
+ spin_unlock_bh(&rreq->lock);
+ pr_err("Can't donate prior to front\n");
+ goto bad;
+ }
+
+ prev = list_prev_entry(subreq, rreq_link);
+ WRITE_ONCE(prev->next_donated, prev->next_donated + excess);
+ subreq->start += excess;
+ subreq->len -= excess;
+ subreq->transferred -= excess;
+ trace_netfs_donate(rreq, subreq, prev, excess,
+ netfs_trace_donate_tail_to_prev);
+ trace_netfs_sreq(subreq, netfs_sreq_trace_donate_to_prev);
+
+ if (subreq->consumed >= subreq->len)
+ goto remove_subreq_locked;
+ spin_unlock_bh(&rreq->lock);
+ } else {
+ pr_err("fpos > start\n");
+ goto bad;
+ }
+
+ /* Advance the rolling buffer to the next folio. */
+ slot++;
+ if (slot >= folioq_nr_slots(folioq)) {
+ slot = 0;
+ folioq = folioq->next;
+ subreq->curr_folioq = folioq;
+ }
+ subreq->curr_folioq_slot = slot;
+ if (folioq && folioq_folio(folioq, slot))
+ subreq->curr_folio_order = folioq->orders[slot];
+ if (!was_async)
+ cond_resched();
+ goto next_folio;
+ }
+
+ /* Deal with partial progress. */
+ if (subreq->transferred < subreq->len)
+ return false;
+
+ /* Donate the remaining downloaded data to one of the neighbouring
+ * subrequests. Note that we may race with them doing the same thing.
+ */
+ spin_lock_bh(&rreq->lock);
+
+ if (subreq->prev_donated != prev_donated ||
+ subreq->next_donated != next_donated) {
+ spin_unlock_bh(&rreq->lock);
+ cond_resched();
+ goto donation_changed;
+ }
+
+ /* Deal with the trickiest case: that this subreq is in the middle of a
+ * folio, not touching either edge, but finishes first. In such a
+ * case, we donate to the previous subreq, if there is one, so that the
+ * donation is only handled when that completes - and remove this
+ * subreq from the list.
+ *
+ * If the previous subreq finished first, we will have acquired their
+ * donation and should be able to unlock folios and/or donate nextwards.
+ */
+ if (!subreq->consumed &&
+ !prev_donated &&
+ !list_is_first(&subreq->rreq_link, &rreq->subrequests)) {
+ prev = list_prev_entry(subreq, rreq_link);
+ WRITE_ONCE(prev->next_donated, prev->next_donated + subreq->len);
+ subreq->start += subreq->len;
+ subreq->len = 0;
+ subreq->transferred = 0;
+ trace_netfs_donate(rreq, subreq, prev, subreq->len,
+ netfs_trace_donate_to_prev);
+ trace_netfs_sreq(subreq, netfs_sreq_trace_donate_to_prev);
+ goto remove_subreq_locked;
+ }
+
+ /* If we can't donate down the chain, donate up the chain instead. */
+ excess = subreq->len - subreq->consumed + next_donated;
+
+ if (!subreq->consumed)
+ excess += prev_donated;
+
+ if (list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
+ rreq->prev_donated = excess;
+ trace_netfs_donate(rreq, subreq, NULL, excess,
+ netfs_trace_donate_to_deferred_next);
+ } else {
+ next = list_next_entry(subreq, rreq_link);
+ WRITE_ONCE(next->prev_donated, excess);
+ trace_netfs_donate(rreq, subreq, next, excess,
+ netfs_trace_donate_to_next);
+ }
+ trace_netfs_sreq(subreq, netfs_sreq_trace_donate_to_next);
+ subreq->len = subreq->consumed;
+ subreq->transferred = subreq->consumed;
+ goto remove_subreq_locked;
+
+remove_subreq:
+ spin_lock_bh(&rreq->lock);
+remove_subreq_locked:
+ subreq->consumed = subreq->len;
+ list_del(&subreq->rreq_link);
+ spin_unlock_bh(&rreq->lock);
+ netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_consumed);
+ return true;
+
+bad:
+ /* Errr... prev and next both donated to us, but insufficient to finish
+ * the folio.
+ */
+ printk("R=%08x[%x] s=%llx-%llx %zx/%zx/%zx\n",
+ rreq->debug_id, subreq->debug_index,
+ subreq->start, subreq->start + subreq->transferred - 1,
+ subreq->consumed, subreq->transferred, subreq->len);
+ printk("folio: %llx-%llx\n", fpos, fend - 1);
+ printk("donated: prev=%zx next=%zx\n", prev_donated, next_donated);
+ printk("s=%llx av=%zx part=%zx\n", start, avail, part);
+ BUG();
+}
+
+/*
+ * Do page flushing and suchlike after DIO.
+ */
+static void netfs_rreq_assess_dio(struct netfs_io_request *rreq)
+{
+ struct netfs_io_subrequest *subreq;
+ unsigned int i;
+
+ /* Collect unbuffered reads and direct reads, adding up the transfer
+ * sizes until we find the first short or failed subrequest.
+ */
+ list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+ rreq->transferred += subreq->transferred;
+
+ if (subreq->transferred < subreq->len ||
+ test_bit(NETFS_SREQ_FAILED, &subreq->flags)) {
+ rreq->error = subreq->error;
+ break;
+ }
+ }
+
+ if (rreq->origin == NETFS_DIO_READ) {
+ for (i = 0; i < rreq->direct_bv_count; i++) {
+ flush_dcache_page(rreq->direct_bv[i].bv_page);
+ // TODO: cifs marks pages in the destination buffer
+ // dirty under some circumstances after a read. Do we
+ // need to do that too?
+ set_page_dirty(rreq->direct_bv[i].bv_page);
+ }
+ }
+
+ if (rreq->iocb) {
+ rreq->iocb->ki_pos += rreq->transferred;
+ if (rreq->iocb->ki_complete)
+ rreq->iocb->ki_complete(
+ rreq->iocb, rreq->error ? rreq->error : rreq->transferred);
+ }
+ if (rreq->netfs_ops->done)
+ rreq->netfs_ops->done(rreq);
+ if (rreq->origin == NETFS_DIO_READ)
+ inode_dio_end(rreq->inode);
+}
+
+/*
+ * Assess the state of a read request and decide what to do next.
+ *
+ * Note that we're in normal kernel thread context at this point, possibly
+ * running on a workqueue.
+ */
+static void netfs_rreq_assess(struct netfs_io_request *rreq)
+{
+ trace_netfs_rreq(rreq, netfs_rreq_trace_assess);
+
+ //netfs_rreq_is_still_valid(rreq);
+
+ if (test_and_clear_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags)) {
+ netfs_retry_reads(rreq);
+ return;
+ }
+
+ if (rreq->origin == NETFS_DIO_READ ||
+ rreq->origin == NETFS_READ_GAPS)
+ netfs_rreq_assess_dio(rreq);
+ task_io_account_read(rreq->transferred);
+
+ trace_netfs_rreq(rreq, netfs_rreq_trace_wake_ip);
+ clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
+ wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);
+
+ trace_netfs_rreq(rreq, netfs_rreq_trace_done);
+ netfs_clear_subrequests(rreq, false);
+ netfs_unlock_abandoned_read_pages(rreq);
+ if (unlikely(test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)))
+ netfs_pgpriv2_write_to_the_cache(rreq);
+}
+
+void netfs_read_termination_worker(struct work_struct *work)
+{
+ struct netfs_io_request *rreq =
+ container_of(work, struct netfs_io_request, work);
+ netfs_see_request(rreq, netfs_rreq_trace_see_work);
+ netfs_rreq_assess(rreq);
+ netfs_put_request(rreq, false, netfs_rreq_trace_put_work_complete);
+}
+
+/*
+ * Handle the completion of all outstanding I/O operations on a read request.
+ * We inherit a ref from the caller.
+ */
+void netfs_rreq_terminated(struct netfs_io_request *rreq, bool was_async)
+{
+ if (!was_async)
+ return netfs_rreq_assess(rreq);
+ if (!work_pending(&rreq->work)) {
+ netfs_get_request(rreq, netfs_rreq_trace_get_work);
+ if (!queue_work(system_unbound_wq, &rreq->work))
+ netfs_put_request(rreq, was_async, netfs_rreq_trace_put_work_nq);
+ }
+}
+
+/**
+ * netfs_read_subreq_progress - Note progress of a read operation.
+ * @subreq: The read request that has terminated.
+ * @was_async: True if we're in an asynchronous context.
+ *
+ * This tells the read side of netfs lib that a contributory I/O operation has
+ * made some progress and that it may be possible to unlock some folios.
+ *
+ * Before calling, the filesystem should update subreq->transferred to track
+ * the amount of data copied into the output buffer.
+ *
+ * If @was_async is true, the caller might be running in softirq or interrupt
+ * context and we can't sleep.
+ */
+void netfs_read_subreq_progress(struct netfs_io_subrequest *subreq,
+ bool was_async)
+{
+ struct netfs_io_request *rreq = subreq->rreq;
+
+ trace_netfs_sreq(subreq, netfs_sreq_trace_progress);
+
+ if (subreq->transferred > subreq->consumed &&
+ (rreq->origin == NETFS_READAHEAD ||
+ rreq->origin == NETFS_READPAGE ||
+ rreq->origin == NETFS_READ_FOR_WRITE)) {
+ netfs_consume_read_data(subreq, was_async);
+ __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
+ }
+}
+EXPORT_SYMBOL(netfs_read_subreq_progress);
+
+/**
+ * netfs_read_subreq_terminated - Note the termination of an I/O operation.
+ * @subreq: The I/O request that has terminated.
+ * @error: Error code indicating type of completion.
+ * @was_async: The termination was asynchronous
+ *
+ * This tells the read helper that a contributory I/O operation has terminated,
+ * one way or another, and that it should integrate the results.
+ *
+ * The caller indicates the outcome of the operation through @error, supplying
+ * 0 to indicate a successful or retryable transfer (if NETFS_SREQ_NEED_RETRY
+ * is set) or a negative error code. The helper will look after reissuing I/O
+ * operations as appropriate and writing downloaded data to the cache.
+ *
+ * Before calling, the filesystem should update subreq->transferred to track
+ * the amount of data copied into the output buffer.
+ *
+ * If @was_async is true, the caller might be running in softirq or interrupt
+ * context and we can't sleep.
+ */
+void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
+ int error, bool was_async)
+{
+ struct netfs_io_request *rreq = subreq->rreq;
+
+ switch (subreq->source) {
+ case NETFS_READ_FROM_CACHE:
+ netfs_stat(&netfs_n_rh_read_done);
+ break;
+ case NETFS_DOWNLOAD_FROM_SERVER:
+ netfs_stat(&netfs_n_rh_download_done);
+ break;
+ default:
+ break;
+ }
+
+ if (rreq->origin != NETFS_DIO_READ) {
+ /* Collect buffered reads.
+ *
+ * If the read completed validly short, then we can clear the
+ * tail before going on to unlock the folios.
+ */
+ if (error == 0 && subreq->transferred < subreq->len &&
+ (test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags) ||
+ test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags))) {
+ netfs_clear_unread(subreq);
+ subreq->transferred = subreq->len;
+ trace_netfs_sreq(subreq, netfs_sreq_trace_clear);
+ }
+ if (subreq->transferred > subreq->consumed &&
+ (rreq->origin == NETFS_READAHEAD ||
+ rreq->origin == NETFS_READPAGE ||
+ rreq->origin == NETFS_READ_FOR_WRITE)) {
+ netfs_consume_read_data(subreq, was_async);
+ __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
+ }
+ rreq->transferred += subreq->transferred;
+ }
+
+ /* Deal with retry requests, short reads and errors. If we retry
+ * but don't make progress, we abandon the attempt.
+ */
+ if (!error && subreq->transferred < subreq->len) {
+ if (test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags)) {
+ trace_netfs_sreq(subreq, netfs_sreq_trace_hit_eof);
+ } else {
+ trace_netfs_sreq(subreq, netfs_sreq_trace_short);
+ if (subreq->transferred > subreq->consumed) {
+ __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+ __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
+ set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ } else if (!__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
+ __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+ set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ } else {
+ __set_bit(NETFS_SREQ_FAILED, &subreq->flags);
+ error = -ENODATA;
+ }
+ }
+ }
+
+ subreq->error = error;
+ trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
+
+ if (unlikely(error < 0)) {
+ trace_netfs_failure(rreq, subreq, error, netfs_fail_read);
+ if (subreq->source == NETFS_READ_FROM_CACHE) {
+ netfs_stat(&netfs_n_rh_read_failed);
+ } else {
+ netfs_stat(&netfs_n_rh_download_failed);
+ set_bit(NETFS_RREQ_FAILED, &rreq->flags);
+ rreq->error = subreq->error;
+ }
+ }
+
+ if (atomic_dec_and_test(&rreq->nr_outstanding))
+ netfs_rreq_terminated(rreq, was_async);
+
+ netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
+}
+EXPORT_SYMBOL(netfs_read_subreq_terminated);
diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
new file mode 100644
index 000000000000..ba5af89d37fa
--- /dev/null
+++ b/fs/netfs/read_pgpriv2.c
@@ -0,0 +1,264 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Read with PG_private_2 [DEPRECATED].
+ *
+ * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/task_io_accounting_ops.h>
+#include "internal.h"
+
+/*
+ * [DEPRECATED] Mark page as requiring copy-to-cache using PG_private_2. The
+ * third mark in the folio queue is used to indicate that this folio needs
+ * writing.
+ */
+void netfs_pgpriv2_mark_copy_to_cache(struct netfs_io_subrequest *subreq,
+ struct netfs_io_request *rreq,
+ struct folio_queue *folioq,
+ int slot)
+{
+ struct folio *folio = folioq_folio(folioq, slot);
+
+ trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
+ folio_start_private_2(folio);
+ folioq_mark3(folioq, slot);
+}
+
+/*
+ * [DEPRECATED] Cancel PG_private_2 on all marked folios in the event of an
+ * unrecoverable error.
+ */
+static void netfs_pgpriv2_cancel(struct folio_queue *folioq)
+{
+ struct folio *folio;
+ int slot;
+
+ while (folioq) {
+ if (!folioq->marks3) {
+ folioq = folioq->next;
+ continue;
+ }
+
+ slot = __ffs(folioq->marks3);
+ folio = folioq_folio(folioq, slot);
+
+ trace_netfs_folio(folio, netfs_folio_trace_cancel_copy);
+ folio_end_private_2(folio);
+ folioq_unmark3(folioq, slot);
+ }
+}
+
+/*
+ * [DEPRECATED] Copy a folio to the cache with PG_private_2 set.
+ */
+static int netfs_pgpriv2_copy_folio(struct netfs_io_request *wreq, struct folio *folio)
+{
+ struct netfs_io_stream *cache = &wreq->io_streams[1];
+ size_t fsize = folio_size(folio), flen = fsize;
+ loff_t fpos = folio_pos(folio), i_size;
+ bool to_eof = false;
+
+ _enter("");
+
+ /* netfs_perform_write() may shift i_size around the page or from out
+ * of the page to beyond it, but cannot move i_size into or through the
+ * page since we have it locked.
+ */
+ i_size = i_size_read(wreq->inode);
+
+ if (fpos >= i_size) {
+ /* mmap beyond eof. */
+ _debug("beyond eof");
+ folio_end_private_2(folio);
+ return 0;
+ }
+
+ if (fpos + fsize > wreq->i_size)
+ wreq->i_size = i_size;
+
+ if (flen > i_size - fpos) {
+ flen = i_size - fpos;
+ to_eof = true;
+ } else if (flen == i_size - fpos) {
+ to_eof = true;
+ }
+
+ _debug("folio %zx %zx", flen, fsize);
+
+ trace_netfs_folio(folio, netfs_folio_trace_store_copy);
+
+ /* Attach the folio to the rolling buffer. */
+ if (netfs_buffer_append_folio(wreq, folio, false) < 0)
+ return -ENOMEM;
+
+ cache->submit_extendable_to = fsize;
+ cache->submit_off = 0;
+ cache->submit_len = flen;
+
+ /* Attach the folio to one or more subrequests. For a big folio, we
+ * could end up with thousands of subrequests if the wsize is small -
+ * but we might need to wait during the creation of subrequests for
+ * network resources (eg. SMB credits).
+ */
+ do {
+ ssize_t part;
+
+ wreq->io_iter.iov_offset = cache->submit_off;
+
+ atomic64_set(&wreq->issued_to, fpos + cache->submit_off);
+ cache->submit_extendable_to = fsize - cache->submit_off;
+ part = netfs_advance_write(wreq, cache, fpos + cache->submit_off,
+ cache->submit_len, to_eof);
+ cache->submit_off += part;
+ if (part > cache->submit_len)
+ cache->submit_len = 0;
+ else
+ cache->submit_len -= part;
+ } while (cache->submit_len > 0);
+
+ wreq->io_iter.iov_offset = 0;
+ iov_iter_advance(&wreq->io_iter, fsize);
+ atomic64_set(&wreq->issued_to, fpos + fsize);
+
+ if (flen < fsize)
+ netfs_issue_write(wreq, cache);
+
+ _leave(" = 0");
+ return 0;
+}
+
+/*
+ * [DEPRECATED] Go through the buffer and write any folios that are marked with
+ * the third mark to the cache.
+ */
+void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq)
+{
+ struct netfs_io_request *wreq;
+ struct folio_queue *folioq;
+ struct folio *folio;
+ int error = 0;
+ int slot = 0;
+
+ _enter("");
+
+ if (!fscache_resources_valid(&rreq->cache_resources))
+ goto couldnt_start;
+
+ /* Need the first folio to be able to set up the op. */
+ for (folioq = rreq->buffer; folioq; folioq = folioq->next) {
+ if (folioq->marks3) {
+ slot = __ffs(folioq->marks3);
+ break;
+ }
+ }
+ if (!folioq)
+ return;
+ folio = folioq_folio(folioq, slot);
+
+ wreq = netfs_create_write_req(rreq->mapping, NULL, folio_pos(folio),
+ NETFS_PGPRIV2_COPY_TO_CACHE);
+ if (IS_ERR(wreq)) {
+ kleave(" [create %ld]", PTR_ERR(wreq));
+ goto couldnt_start;
+ }
+
+ trace_netfs_write(wreq, netfs_write_trace_copy_to_cache);
+ netfs_stat(&netfs_n_wh_copy_to_cache);
+
+ for (;;) {
+ error = netfs_pgpriv2_copy_folio(wreq, folio);
+ if (error < 0)
+ break;
+
+ folioq_unmark3(folioq, slot);
+ if (!folioq->marks3) {
+ folioq = folioq->next;
+ if (!folioq)
+ break;
+ }
+
+ slot = __ffs(folioq->marks3);
+ folio = folioq_folio(folioq, slot);
+ }
+
+ netfs_issue_write(wreq, &wreq->io_streams[1]);
+ smp_wmb(); /* Write lists before ALL_QUEUED. */
+ set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
+
+ netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+ _leave(" = %d", error);
+couldnt_start:
+ netfs_pgpriv2_cancel(rreq->buffer);
+}
+
+/*
+ * [DEPRECATED] Remove the PG_private_2 mark from any folios we've finished
+ * copying.
+ */
+bool netfs_pgpriv2_unlock_copied_folios(struct netfs_io_request *wreq)
+{
+ struct folio_queue *folioq = wreq->buffer;
+ unsigned long long collected_to = wreq->collected_to;
+ unsigned int slot = wreq->buffer_head_slot;
+ bool made_progress = false;
+
+ if (slot >= folioq_nr_slots(folioq)) {
+ folioq = netfs_delete_buffer_head(wreq);
+ slot = 0;
+ }
+
+ for (;;) {
+ struct folio *folio;
+ unsigned long long fpos, fend;
+ size_t fsize, flen;
+
+ folio = folioq_folio(folioq, slot);
+ if (WARN_ONCE(!folio_test_private_2(folio),
+ "R=%08x: folio %lx is not marked private_2\n",
+ wreq->debug_id, folio->index))
+ trace_netfs_folio(folio, netfs_folio_trace_not_under_wback);
+
+ fpos = folio_pos(folio);
+ fsize = folio_size(folio);
+ flen = fsize;
+
+ fend = min_t(unsigned long long, fpos + flen, wreq->i_size);
+
+ trace_netfs_collect_folio(wreq, folio, fend, collected_to);
+
+ /* Unlock any folio we've transferred all of. */
+ if (collected_to < fend)
+ break;
+
+ trace_netfs_folio(folio, netfs_folio_trace_end_copy);
+ folio_end_private_2(folio);
+ wreq->cleaned_to = fpos + fsize;
+ made_progress = true;
+
+ /* Clean up the head folioq. If we clear an entire folioq, then
+ * we can get rid of it provided it's not also the tail folioq
+ * being filled by the issuer.
+ */
+ folioq_clear(folioq, slot);
+ slot++;
+ if (slot >= folioq_nr_slots(folioq)) {
+ if (READ_ONCE(wreq->buffer_tail) == folioq)
+ break;
+ folioq = netfs_delete_buffer_head(wreq);
+ slot = 0;
+ }
+
+ if (fpos + fsize >= collected_to)
+ break;
+ }
+
+ wreq->buffer = folioq;
+ wreq->buffer_head_slot = slot;
+ return made_progress;
+}
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
new file mode 100644
index 000000000000..0350592ea804
--- /dev/null
+++ b/fs/netfs/read_retry.c
@@ -0,0 +1,256 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Network filesystem read subrequest retrying.
+ *
+ * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+static void netfs_reissue_read(struct netfs_io_request *rreq,
+ struct netfs_io_subrequest *subreq)
+{
+ struct iov_iter *io_iter = &subreq->io_iter;
+
+ if (iov_iter_is_folioq(io_iter)) {
+ subreq->curr_folioq = (struct folio_queue *)io_iter->folioq;
+ subreq->curr_folioq_slot = io_iter->folioq_slot;
+ subreq->curr_folio_order = subreq->curr_folioq->orders[subreq->curr_folioq_slot];
+ }
+
+ atomic_inc(&rreq->nr_outstanding);
+ __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+ netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+ subreq->rreq->netfs_ops->issue_read(subreq);
+}
+
+/*
+ * Go through the list of failed/short reads, retrying all retryable ones. We
+ * need to switch failed cache reads to network downloads.
+ */
+static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
+{
+ struct netfs_io_subrequest *subreq;
+ struct netfs_io_stream *stream0 = &rreq->io_streams[0];
+ LIST_HEAD(sublist);
+ LIST_HEAD(queue);
+
+ _enter("R=%x", rreq->debug_id);
+
+ if (list_empty(&rreq->subrequests))
+ return;
+
+ if (rreq->netfs_ops->retry_request)
+ rreq->netfs_ops->retry_request(rreq, NULL);
+
+ /* If there's no renegotiation to do, just resend each retryable subreq
+ * up to the first permanently failed one.
+ */
+ if (!rreq->netfs_ops->prepare_read &&
+ !test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags)) {
+ struct netfs_io_subrequest *subreq;
+
+ list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+ if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
+ break;
+ if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
+ netfs_reset_iter(subreq);
+ netfs_reissue_read(rreq, subreq);
+ }
+ }
+ return;
+ }
+
+ /* Okay, we need to renegotiate all the download requests and flip any
+ * failed cache reads over to being download requests and negotiate
+ * those also. All fully successful subreqs have been removed from the
+ * list and any spare data from those has been donated.
+ *
+ * What we do is decant the list and rebuild it one subreq at a time so
+ * that we don't end up with donations jumping over a gap we're busy
+ * populating with smaller subrequests. In the event that the subreq
+ * we just launched finishes before we insert the next subreq, it'll
+ * fill in rreq->prev_donated instead.
+
+ * Note: Alternatively, we could split the tail subrequest right before
+ * we reissue it and fix up the donations under lock.
+ */
+ list_splice_init(&rreq->subrequests, &queue);
+
+ do {
+ struct netfs_io_subrequest *from;
+ struct iov_iter source;
+ unsigned long long start, len;
+ size_t part, deferred_next_donated = 0;
+ bool boundary = false;
+
+ /* Go through the subreqs and find the next span of contiguous
+ * buffer that we then rejig (cifs, for example, needs the
+ * rsize renegotiating) and reissue.
+ */
+ from = list_first_entry(&queue, struct netfs_io_subrequest, rreq_link);
+ list_move_tail(&from->rreq_link, &sublist);
+ start = from->start + from->transferred;
+ len = from->len - from->transferred;
+
+ _debug("from R=%08x[%x] s=%llx ctl=%zx/%zx/%zx",
+ rreq->debug_id, from->debug_index,
+ from->start, from->consumed, from->transferred, from->len);
+
+ if (test_bit(NETFS_SREQ_FAILED, &from->flags) ||
+ !test_bit(NETFS_SREQ_NEED_RETRY, &from->flags))
+ goto abandon;
+
+ deferred_next_donated = from->next_donated;
+ while ((subreq = list_first_entry_or_null(
+ &queue, struct netfs_io_subrequest, rreq_link))) {
+ if (subreq->start != start + len ||
+ subreq->transferred > 0 ||
+ !test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags))
+ break;
+ list_move_tail(&subreq->rreq_link, &sublist);
+ len += subreq->len;
+ deferred_next_donated = subreq->next_donated;
+ if (test_bit(NETFS_SREQ_BOUNDARY, &subreq->flags))
+ break;
+ }
+
+ _debug(" - range: %llx-%llx %llx", start, start + len - 1, len);
+
+ /* Determine the set of buffers we're going to use. Each
+ * subreq gets a subset of a single overall contiguous buffer.
+ */
+ netfs_reset_iter(from);
+ source = from->io_iter;
+ source.count = len;
+
+ /* Work through the sublist. */
+ while ((subreq = list_first_entry_or_null(
+ &sublist, struct netfs_io_subrequest, rreq_link))) {
+ list_del(&subreq->rreq_link);
+
+ subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
+ subreq->start = start - subreq->transferred;
+ subreq->len = len + subreq->transferred;
+ stream0->sreq_max_len = subreq->len;
+
+ __clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+ __set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+
+ spin_lock_bh(&rreq->lock);
+ list_add_tail(&subreq->rreq_link, &rreq->subrequests);
+ subreq->prev_donated += rreq->prev_donated;
+ rreq->prev_donated = 0;
+ trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
+ spin_unlock_bh(&rreq->lock);
+
+ BUG_ON(!len);
+
+ /* Renegotiate max_len (rsize) */
+ if (rreq->netfs_ops->prepare_read(subreq) < 0) {
+ trace_netfs_sreq(subreq, netfs_sreq_trace_reprep_failed);
+ __set_bit(NETFS_SREQ_FAILED, &subreq->flags);
+ }
+
+ part = umin(len, stream0->sreq_max_len);
+ if (unlikely(rreq->io_streams[0].sreq_max_segs))
+ part = netfs_limit_iter(&source, 0, part, stream0->sreq_max_segs);
+ subreq->len = subreq->transferred + part;
+ subreq->io_iter = source;
+ iov_iter_truncate(&subreq->io_iter, part);
+ iov_iter_advance(&source, part);
+ len -= part;
+ start += part;
+ if (!len) {
+ if (boundary)
+ __set_bit(NETFS_SREQ_BOUNDARY, &subreq->flags);
+ subreq->next_donated = deferred_next_donated;
+ } else {
+ __clear_bit(NETFS_SREQ_BOUNDARY, &subreq->flags);
+ subreq->next_donated = 0;
+ }
+
+ netfs_reissue_read(rreq, subreq);
+ if (!len)
+ break;
+
+ /* If we ran out of subrequests, allocate another. */
+ if (list_empty(&sublist)) {
+ subreq = netfs_alloc_subrequest(rreq);
+ if (!subreq)
+ goto abandon;
+ subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
+ subreq->start = start;
+
+ /* We get two refs, but need just one. */
+ netfs_put_subrequest(subreq, false, netfs_sreq_trace_new);
+ trace_netfs_sreq(subreq, netfs_sreq_trace_split);
+ list_add_tail(&subreq->rreq_link, &sublist);
+ }
+ }
+
+ /* If we managed to use fewer subreqs, we can discard the
+ * excess.
+ */
+ while ((subreq = list_first_entry_or_null(
+ &sublist, struct netfs_io_subrequest, rreq_link))) {
+ trace_netfs_sreq(subreq, netfs_sreq_trace_discard);
+ list_del(&subreq->rreq_link);
+ netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done);
+ }
+
+ } while (!list_empty(&queue));
+
+ return;
+
+ /* If we hit ENOMEM, fail all remaining subrequests */
+abandon:
+ list_splice_init(&sublist, &queue);
+ list_for_each_entry(subreq, &queue, rreq_link) {
+ if (!subreq->error)
+ subreq->error = -ENOMEM;
+ __clear_bit(NETFS_SREQ_FAILED, &subreq->flags);
+ __clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+ __clear_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+ }
+ spin_lock_bh(&rreq->lock);
+ list_splice_tail_init(&queue, &rreq->subrequests);
+ spin_unlock_bh(&rreq->lock);
+}
+
+/*
+ * Retry reads.
+ */
+void netfs_retry_reads(struct netfs_io_request *rreq)
+{
+ trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit);
+
+ atomic_inc(&rreq->nr_outstanding);
+
+ netfs_retry_read_subrequests(rreq);
+
+ if (atomic_dec_and_test(&rreq->nr_outstanding))
+ netfs_rreq_terminated(rreq, false);
+}
+
+/*
+ * Unlock any the pages that haven't been unlocked yet due to abandoned
+ * subrequests.
+ */
+void netfs_unlock_abandoned_read_pages(struct netfs_io_request *rreq)
+{
+ struct folio_queue *p;
+
+ for (p = rreq->buffer; p; p = p->next) {
+ for (int slot = 0; slot < folioq_count(p); slot++) {
+ struct folio *folio = folioq_folio(p, slot);
+
+ if (folio && !folioq_is_marked2(p, slot)) {
+ trace_netfs_folio(folio, netfs_folio_trace_abandon);
+ folio_unlock(folio);
+ }
+ }
+ }
+}
diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c
index 0892768eea32..8e63516b40f6 100644
--- a/fs/netfs/stats.c
+++ b/fs/netfs/stats.c
@@ -32,6 +32,7 @@ atomic_t netfs_n_wh_buffered_write;
atomic_t netfs_n_wh_writethrough;
atomic_t netfs_n_wh_dio_write;
atomic_t netfs_n_wh_writepages;
+atomic_t netfs_n_wh_copy_to_cache;
atomic_t netfs_n_wh_wstream_conflict;
atomic_t netfs_n_wh_upload;
atomic_t netfs_n_wh_upload_done;
@@ -39,45 +40,53 @@ atomic_t netfs_n_wh_upload_failed;
atomic_t netfs_n_wh_write;
atomic_t netfs_n_wh_write_done;
atomic_t netfs_n_wh_write_failed;
+atomic_t netfs_n_wb_lock_skip;
+atomic_t netfs_n_wb_lock_wait;
+atomic_t netfs_n_folioq;
int netfs_stats_show(struct seq_file *m, void *v)
{
- seq_printf(m, "Netfs : DR=%u RA=%u RF=%u WB=%u WBZ=%u\n",
+ seq_printf(m, "Reads : DR=%u RA=%u RF=%u WB=%u WBZ=%u\n",
atomic_read(&netfs_n_rh_dio_read),
atomic_read(&netfs_n_rh_readahead),
atomic_read(&netfs_n_rh_read_folio),
atomic_read(&netfs_n_rh_write_begin),
atomic_read(&netfs_n_rh_write_zskip));
- seq_printf(m, "Netfs : BW=%u WT=%u DW=%u WP=%u\n",
+ seq_printf(m, "Writes : BW=%u WT=%u DW=%u WP=%u 2C=%u\n",
atomic_read(&netfs_n_wh_buffered_write),
atomic_read(&netfs_n_wh_writethrough),
atomic_read(&netfs_n_wh_dio_write),
- atomic_read(&netfs_n_wh_writepages));
- seq_printf(m, "Netfs : ZR=%u sh=%u sk=%u\n",
+ atomic_read(&netfs_n_wh_writepages),
+ atomic_read(&netfs_n_wh_copy_to_cache));
+ seq_printf(m, "ZeroOps: ZR=%u sh=%u sk=%u\n",
atomic_read(&netfs_n_rh_zero),
atomic_read(&netfs_n_rh_short_read),
atomic_read(&netfs_n_rh_write_zskip));
- seq_printf(m, "Netfs : DL=%u ds=%u df=%u di=%u\n",
+ seq_printf(m, "DownOps: DL=%u ds=%u df=%u di=%u\n",
atomic_read(&netfs_n_rh_download),
atomic_read(&netfs_n_rh_download_done),
atomic_read(&netfs_n_rh_download_failed),
atomic_read(&netfs_n_rh_download_instead));
- seq_printf(m, "Netfs : RD=%u rs=%u rf=%u\n",
+ seq_printf(m, "CaRdOps: RD=%u rs=%u rf=%u\n",
atomic_read(&netfs_n_rh_read),
atomic_read(&netfs_n_rh_read_done),
atomic_read(&netfs_n_rh_read_failed));
- seq_printf(m, "Netfs : UL=%u us=%u uf=%u\n",
+ seq_printf(m, "UpldOps: UL=%u us=%u uf=%u\n",
atomic_read(&netfs_n_wh_upload),
atomic_read(&netfs_n_wh_upload_done),
atomic_read(&netfs_n_wh_upload_failed));
- seq_printf(m, "Netfs : WR=%u ws=%u wf=%u\n",
+ seq_printf(m, "CaWrOps: WR=%u ws=%u wf=%u\n",
atomic_read(&netfs_n_wh_write),
atomic_read(&netfs_n_wh_write_done),
atomic_read(&netfs_n_wh_write_failed));
- seq_printf(m, "Netfs : rr=%u sr=%u wsc=%u\n",
+ seq_printf(m, "Objs : rr=%u sr=%u foq=%u wsc=%u\n",
atomic_read(&netfs_n_rh_rreq),
atomic_read(&netfs_n_rh_sreq),
+ atomic_read(&netfs_n_folioq),
atomic_read(&netfs_n_wh_wstream_conflict));
+ seq_printf(m, "WbLock : skip=%u wait=%u\n",
+ atomic_read(&netfs_n_wb_lock_skip),
+ atomic_read(&netfs_n_wb_lock_wait));
return fscache_stats_show(m);
}
EXPORT_SYMBOL(netfs_stats_show);
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index ae7a2043f670..1d438be2e1b4 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -15,15 +15,11 @@
/* Notes made in the collector */
#define HIT_PENDING 0x01 /* A front op was still pending */
-#define SOME_EMPTY 0x02 /* One of more streams are empty */
-#define ALL_EMPTY 0x04 /* All streams are empty */
-#define MAYBE_DISCONTIG 0x08 /* A front op may be discontiguous (rounded to PAGE_SIZE) */
-#define NEED_REASSESS 0x10 /* Need to loop round and reassess */
-#define REASSESS_DISCONTIG 0x20 /* Reassess discontiguity if contiguity advances */
-#define MADE_PROGRESS 0x40 /* Made progress cleaning up a stream or the folio set */
-#define BUFFERED 0x80 /* The pagecache needs cleaning up */
-#define NEED_RETRY 0x100 /* A front op requests retrying */
-#define SAW_FAILURE 0x200 /* One stream or hit a permanent failure */
+#define NEED_REASSESS 0x02 /* Need to loop round and reassess */
+#define MADE_PROGRESS 0x04 /* Made progress cleaning up a stream or the folio set */
+#define BUFFERED 0x08 /* The pagecache needs cleaning up */
+#define NEED_RETRY 0x10 /* A front op requests retrying */
+#define SAW_FAILURE 0x20 /* One stream or hit a permanent failure */
/*
* Successful completion of write of a folio to the server and/or cache. Note
@@ -82,55 +78,37 @@ end_wb:
}
/*
- * Get hold of a folio we have under writeback. We don't want to get the
- * refcount on it.
+ * Unlock any folios we've finished with.
*/
-static struct folio *netfs_writeback_lookup_folio(struct netfs_io_request *wreq, loff_t pos)
+static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
+ unsigned int *notes)
{
- XA_STATE(xas, &wreq->mapping->i_pages, pos / PAGE_SIZE);
- struct folio *folio;
-
- rcu_read_lock();
-
- for (;;) {
- xas_reset(&xas);
- folio = xas_load(&xas);
- if (xas_retry(&xas, folio))
- continue;
+ struct folio_queue *folioq = wreq->buffer;
+ unsigned long long collected_to = wreq->collected_to;
+ unsigned int slot = wreq->buffer_head_slot;
- if (!folio || xa_is_value(folio))
- kdebug("R=%08x: folio %lx (%llx) not present",
- wreq->debug_id, xas.xa_index, pos / PAGE_SIZE);
- BUG_ON(!folio || xa_is_value(folio));
-
- if (folio == xas_reload(&xas))
- break;
+ if (wreq->origin == NETFS_PGPRIV2_COPY_TO_CACHE) {
+ if (netfs_pgpriv2_unlock_copied_folios(wreq))
+ *notes |= MADE_PROGRESS;
+ return;
}
- rcu_read_unlock();
-
- if (WARN_ONCE(!folio_test_writeback(folio),
- "R=%08x: folio %lx is not under writeback\n",
- wreq->debug_id, folio->index)) {
- trace_netfs_folio(folio, netfs_folio_trace_not_under_wback);
+ if (slot >= folioq_nr_slots(folioq)) {
+ folioq = netfs_delete_buffer_head(wreq);
+ slot = 0;
}
- return folio;
-}
-/*
- * Unlock any folios we've finished with.
- */
-static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
- unsigned long long collected_to,
- unsigned int *notes)
-{
for (;;) {
struct folio *folio;
struct netfs_folio *finfo;
unsigned long long fpos, fend;
size_t fsize, flen;
- folio = netfs_writeback_lookup_folio(wreq, wreq->cleaned_to);
+ folio = folioq_folio(folioq, slot);
+ if (WARN_ONCE(!folio_test_writeback(folio),
+ "R=%08x: folio %lx is not under writeback\n",
+ wreq->debug_id, folio->index))
+ trace_netfs_folio(folio, netfs_folio_trace_not_under_wback);
fpos = folio_pos(folio);
fsize = folio_size(folio);
@@ -141,12 +119,6 @@ static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
trace_netfs_collect_folio(wreq, folio, fend, collected_to);
- if (fpos + fsize > wreq->contiguity) {
- trace_netfs_collect_contig(wreq, fpos + fsize,
- netfs_contig_trace_unlock);
- wreq->contiguity = fpos + fsize;
- }
-
/* Unlock any folio we've transferred all of. */
if (collected_to < fend)
break;
@@ -155,9 +127,25 @@ static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
wreq->cleaned_to = fpos + fsize;
*notes |= MADE_PROGRESS;
+ /* Clean up the head folioq. If we clear an entire folioq, then
+ * we can get rid of it provided it's not also the tail folioq
+ * being filled by the issuer.
+ */
+ folioq_clear(folioq, slot);
+ slot++;
+ if (slot >= folioq_nr_slots(folioq)) {
+ if (READ_ONCE(wreq->buffer_tail) == folioq)
+ break;
+ folioq = netfs_delete_buffer_head(wreq);
+ slot = 0;
+ }
+
if (fpos + fsize >= collected_to)
break;
}
+
+ wreq->buffer = folioq;
+ wreq->buffer_head_slot = slot;
}
/*
@@ -188,9 +176,12 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
break;
if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
+ struct iov_iter source = subreq->io_iter;
+
+ iov_iter_revert(&source, subreq->len - source.count);
__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
- netfs_reissue_write(stream, subreq);
+ netfs_reissue_write(stream, subreq, &source);
}
}
return;
@@ -200,6 +191,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
do {
struct netfs_io_subrequest *subreq = NULL, *from, *to, *tmp;
+ struct iov_iter source;
unsigned long long start, len;
size_t part;
bool boundary = false;
@@ -227,6 +219,13 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
len += to->len;
}
+ /* Determine the set of buffers we're going to use. Each
+ * subreq gets a subset of a single overall contiguous buffer.
+ */
+ netfs_reset_iter(from);
+ source = from->io_iter;
+ source.count = len;
+
/* Work through the sublist. */
subreq = from;
list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
@@ -238,7 +237,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
stream->prepare_write(subreq);
- part = min(len, subreq->max_len);
+ part = min(len, stream->sreq_max_len);
subreq->len = part;
subreq->start = start;
subreq->transferred = 0;
@@ -249,7 +248,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
boundary = true;
netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
- netfs_reissue_write(stream, subreq);
+ netfs_reissue_write(stream, subreq, &source);
if (subreq == to)
break;
}
@@ -278,8 +277,6 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
subreq = netfs_alloc_subrequest(wreq);
subreq->source = to->source;
subreq->start = start;
- subreq->max_len = len;
- subreq->max_nr_segs = INT_MAX;
subreq->debug_index = atomic_inc_return(&wreq->subreq_counter);
subreq->stream_nr = to->stream_nr;
__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
@@ -293,10 +290,12 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
to = list_next_entry(to, rreq_link);
trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
+ stream->sreq_max_len = len;
+ stream->sreq_max_segs = INT_MAX;
switch (stream->source) {
case NETFS_UPLOAD_TO_SERVER:
netfs_stat(&netfs_n_wh_upload);
- subreq->max_len = min(len, wreq->wsize);
+ stream->sreq_max_len = umin(len, wreq->wsize);
break;
case NETFS_WRITE_TO_CACHE:
netfs_stat(&netfs_n_wh_write);
@@ -307,7 +306,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
stream->prepare_write(subreq);
- part = min(len, subreq->max_len);
+ part = umin(len, stream->sreq_max_len);
subreq->len = subreq->transferred + part;
len -= part;
start += part;
@@ -316,7 +315,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
boundary = false;
}
- netfs_reissue_write(stream, subreq);
+ netfs_reissue_write(stream, subreq, &source);
if (!len)
break;
@@ -377,7 +376,7 @@ static void netfs_collect_write_results(struct netfs_io_request *wreq)
{
struct netfs_io_subrequest *front, *remove;
struct netfs_io_stream *stream;
- unsigned long long collected_to;
+ unsigned long long collected_to, issued_to;
unsigned int notes;
int s;
@@ -386,28 +385,22 @@ static void netfs_collect_write_results(struct netfs_io_request *wreq)
trace_netfs_rreq(wreq, netfs_rreq_trace_collect);
reassess_streams:
+ issued_to = atomic64_read(&wreq->issued_to);
smp_rmb();
collected_to = ULLONG_MAX;
- if (wreq->origin == NETFS_WRITEBACK)
- notes = ALL_EMPTY | BUFFERED | MAYBE_DISCONTIG;
- else if (wreq->origin == NETFS_WRITETHROUGH)
- notes = ALL_EMPTY | BUFFERED;
+ if (wreq->origin == NETFS_WRITEBACK ||
+ wreq->origin == NETFS_WRITETHROUGH ||
+ wreq->origin == NETFS_PGPRIV2_COPY_TO_CACHE)
+ notes = BUFFERED;
else
- notes = ALL_EMPTY;
+ notes = 0;
/* Remove completed subrequests from the front of the streams and
* advance the completion point on each stream. We stop when we hit
* something that's in progress. The issuer thread may be adding stuff
* to the tail whilst we're doing this.
- *
- * We must not, however, merge in discontiguities that span whole
- * folios that aren't under writeback. This is made more complicated
- * by the folios in the gap being of unpredictable sizes - if they even
- * exist - but we don't want to look them up.
*/
for (s = 0; s < NR_IO_STREAMS; s++) {
- loff_t rstart, rend;
-
stream = &wreq->io_streams[s];
/* Read active flag before list pointers */
if (!smp_load_acquire(&stream->active))
@@ -419,26 +412,10 @@ reassess_streams:
//_debug("sreq [%x] %llx %zx/%zx",
// front->debug_index, front->start, front->transferred, front->len);
- /* Stall if there may be a discontinuity. */
- rstart = round_down(front->start, PAGE_SIZE);
- if (rstart > wreq->contiguity) {
- if (wreq->contiguity > stream->collected_to) {
- trace_netfs_collect_gap(wreq, stream,
- wreq->contiguity, 'D');
- stream->collected_to = wreq->contiguity;
- }
- notes |= REASSESS_DISCONTIG;
- break;
- }
- rend = round_up(front->start + front->len, PAGE_SIZE);
- if (rend > wreq->contiguity) {
- trace_netfs_collect_contig(wreq, rend,
- netfs_contig_trace_collect);
- wreq->contiguity = rend;
- if (notes & REASSESS_DISCONTIG)
- notes |= NEED_REASSESS;
+ if (stream->collected_to < front->start) {
+ trace_netfs_collect_gap(wreq, stream, issued_to, 'F');
+ stream->collected_to = front->start;
}
- notes &= ~MAYBE_DISCONTIG;
/* Stall if the front is still undergoing I/O. */
if (test_bit(NETFS_SREQ_IN_PROGRESS, &front->flags)) {
@@ -473,33 +450,27 @@ reassess_streams:
cancel:
/* Remove if completely consumed. */
- spin_lock(&wreq->lock);
+ spin_lock_bh(&wreq->lock);
remove = front;
list_del_init(&front->rreq_link);
front = list_first_entry_or_null(&stream->subrequests,
struct netfs_io_subrequest, rreq_link);
stream->front = front;
- if (!front) {
- unsigned long long jump_to = atomic64_read(&wreq->issued_to);
-
- if (stream->collected_to < jump_to) {
- trace_netfs_collect_gap(wreq, stream, jump_to, 'A');
- stream->collected_to = jump_to;
- }
- }
-
- spin_unlock(&wreq->lock);
+ spin_unlock_bh(&wreq->lock);
netfs_put_subrequest(remove, false,
notes & SAW_FAILURE ?
netfs_sreq_trace_put_cancel :
netfs_sreq_trace_put_done);
}
- if (front)
- notes &= ~ALL_EMPTY;
- else
- notes |= SOME_EMPTY;
+ /* If we have an empty stream, we need to jump it forward
+ * otherwise the collection point will never advance.
+ */
+ if (!front && issued_to > stream->collected_to) {
+ trace_netfs_collect_gap(wreq, stream, issued_to, 'E');
+ stream->collected_to = issued_to;
+ }
if (stream->collected_to < collected_to)
collected_to = stream->collected_to;
@@ -508,36 +479,6 @@ reassess_streams:
if (collected_to != ULLONG_MAX && collected_to > wreq->collected_to)
wreq->collected_to = collected_to;
- /* If we have an empty stream, we need to jump it forward over any gap
- * otherwise the collection point will never advance.
- *
- * Note that the issuer always adds to the stream with the lowest
- * so-far submitted start, so if we see two consecutive subreqs in one
- * stream with nothing between then in another stream, then the second
- * stream has a gap that can be jumped.
- */
- if (notes & SOME_EMPTY) {
- unsigned long long jump_to = wreq->start + READ_ONCE(wreq->submitted);
-
- for (s = 0; s < NR_IO_STREAMS; s++) {
- stream = &wreq->io_streams[s];
- if (stream->active &&
- stream->front &&
- stream->front->start < jump_to)
- jump_to = stream->front->start;
- }
-
- for (s = 0; s < NR_IO_STREAMS; s++) {
- stream = &wreq->io_streams[s];
- if (stream->active &&
- !stream->front &&
- stream->collected_to < jump_to) {
- trace_netfs_collect_gap(wreq, stream, jump_to, 'B');
- stream->collected_to = jump_to;
- }
- }
- }
-
for (s = 0; s < NR_IO_STREAMS; s++) {
stream = &wreq->io_streams[s];
if (stream->active)
@@ -548,43 +489,14 @@ reassess_streams:
/* Unlock any folios that we have now finished with. */
if (notes & BUFFERED) {
- unsigned long long clean_to = min(wreq->collected_to, wreq->contiguity);
-
- if (wreq->cleaned_to < clean_to)
- netfs_writeback_unlock_folios(wreq, clean_to, &notes);
+ if (wreq->cleaned_to < wreq->collected_to)
+ netfs_writeback_unlock_folios(wreq, &notes);
} else {
wreq->cleaned_to = wreq->collected_to;
}
// TODO: Discard encryption buffers
- /* If all streams are discontiguous with the last folio we cleared, we
- * may need to skip a set of folios.
- */
- if ((notes & (MAYBE_DISCONTIG | ALL_EMPTY)) == MAYBE_DISCONTIG) {
- unsigned long long jump_to = ULLONG_MAX;
-
- for (s = 0; s < NR_IO_STREAMS; s++) {
- stream = &wreq->io_streams[s];
- if (stream->active && stream->front &&
- stream->front->start < jump_to)
- jump_to = stream->front->start;
- }
-
- trace_netfs_collect_contig(wreq, jump_to, netfs_contig_trace_jump);
- wreq->contiguity = jump_to;
- wreq->cleaned_to = jump_to;
- wreq->collected_to = jump_to;
- for (s = 0; s < NR_IO_STREAMS; s++) {
- stream = &wreq->io_streams[s];
- if (stream->collected_to < jump_to)
- stream->collected_to = jump_to;
- }
- //cond_resched();
- notes |= MADE_PROGRESS;
- goto reassess_streams;
- }
-
if (notes & NEED_RETRY)
goto need_retry;
if ((notes & MADE_PROGRESS) && test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 3f7e37e50c7d..04e66d587f77 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -95,7 +95,8 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
struct netfs_io_request *wreq;
struct netfs_inode *ictx;
bool is_buffered = (origin == NETFS_WRITEBACK ||
- origin == NETFS_WRITETHROUGH);
+ origin == NETFS_WRITETHROUGH ||
+ origin == NETFS_PGPRIV2_COPY_TO_CACHE);
wreq = netfs_alloc_request(mapping, file, start, 0, origin);
if (IS_ERR(wreq))
@@ -107,9 +108,7 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
if (is_buffered && netfs_is_cache_enabled(ictx))
fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
- wreq->contiguity = wreq->start;
wreq->cleaned_to = wreq->start;
- INIT_WORK(&wreq->work, netfs_write_collection_worker);
wreq->io_streams[0].stream_nr = 0;
wreq->io_streams[0].source = NETFS_UPLOAD_TO_SERVER;
@@ -158,22 +157,19 @@ static void netfs_prepare_write(struct netfs_io_request *wreq,
subreq = netfs_alloc_subrequest(wreq);
subreq->source = stream->source;
subreq->start = start;
- subreq->max_len = ULONG_MAX;
- subreq->max_nr_segs = INT_MAX;
subreq->stream_nr = stream->stream_nr;
+ subreq->io_iter = wreq->io_iter;
_enter("R=%x[%x]", wreq->debug_id, subreq->debug_index);
- trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
- refcount_read(&subreq->ref),
- netfs_sreq_trace_new);
-
trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
+ stream->sreq_max_len = UINT_MAX;
+ stream->sreq_max_segs = INT_MAX;
switch (stream->source) {
case NETFS_UPLOAD_TO_SERVER:
netfs_stat(&netfs_n_wh_upload);
- subreq->max_len = wreq->wsize;
+ stream->sreq_max_len = wreq->wsize;
break;
case NETFS_WRITE_TO_CACHE:
netfs_stat(&netfs_n_wh_write);
@@ -192,7 +188,7 @@ static void netfs_prepare_write(struct netfs_io_request *wreq,
* the list. The collector only goes nextwards and uses the lock to
* remove entries off of the front.
*/
- spin_lock(&wreq->lock);
+ spin_lock_bh(&wreq->lock);
list_add_tail(&subreq->rreq_link, &stream->subrequests);
if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
stream->front = subreq;
@@ -203,7 +199,7 @@ static void netfs_prepare_write(struct netfs_io_request *wreq,
}
}
- spin_unlock(&wreq->lock);
+ spin_unlock_bh(&wreq->lock);
stream->construct = subreq;
}
@@ -223,41 +219,34 @@ static void netfs_do_issue_write(struct netfs_io_stream *stream,
if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
return netfs_write_subrequest_terminated(subreq, subreq->error, false);
- // TODO: Use encrypted buffer
- if (test_bit(NETFS_RREQ_USE_IO_ITER, &wreq->flags)) {
- subreq->io_iter = wreq->io_iter;
- iov_iter_advance(&subreq->io_iter,
- subreq->start + subreq->transferred - wreq->start);
- iov_iter_truncate(&subreq->io_iter,
- subreq->len - subreq->transferred);
- } else {
- iov_iter_xarray(&subreq->io_iter, ITER_SOURCE, &wreq->mapping->i_pages,
- subreq->start + subreq->transferred,
- subreq->len - subreq->transferred);
- }
-
trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
stream->issue_write(subreq);
}
void netfs_reissue_write(struct netfs_io_stream *stream,
- struct netfs_io_subrequest *subreq)
+ struct netfs_io_subrequest *subreq,
+ struct iov_iter *source)
{
+ size_t size = subreq->len - subreq->transferred;
+
+ // TODO: Use encrypted buffer
+ subreq->io_iter = *source;
+ iov_iter_advance(source, size);
+ iov_iter_truncate(&subreq->io_iter, size);
+
__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
netfs_do_issue_write(stream, subreq);
}
-static void netfs_issue_write(struct netfs_io_request *wreq,
- struct netfs_io_stream *stream)
+void netfs_issue_write(struct netfs_io_request *wreq,
+ struct netfs_io_stream *stream)
{
struct netfs_io_subrequest *subreq = stream->construct;
if (!subreq)
return;
stream->construct = NULL;
-
- if (subreq->start + subreq->len > wreq->start + wreq->submitted)
- WRITE_ONCE(wreq->submitted, subreq->start + subreq->len - wreq->start);
+ subreq->io_iter.count = subreq->len;
netfs_do_issue_write(stream, subreq);
}
@@ -290,13 +279,14 @@ int netfs_advance_write(struct netfs_io_request *wreq,
netfs_prepare_write(wreq, stream, start);
subreq = stream->construct;
- part = min(subreq->max_len - subreq->len, len);
- _debug("part %zx/%zx %zx/%zx", subreq->len, subreq->max_len, part, len);
+ part = umin(stream->sreq_max_len - subreq->len, len);
+ _debug("part %zx/%zx %zx/%zx", subreq->len, stream->sreq_max_len, part, len);
subreq->len += part;
subreq->nr_segs++;
+ stream->submit_extendable_to -= part;
- if (subreq->len >= subreq->max_len ||
- subreq->nr_segs >= subreq->max_nr_segs ||
+ if (subreq->len >= stream->sreq_max_len ||
+ subreq->nr_segs >= stream->sreq_max_segs ||
to_eof) {
netfs_issue_write(wreq, stream);
subreq = NULL;
@@ -410,19 +400,26 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
folio_unlock(folio);
if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
- if (!fscache_resources_valid(&wreq->cache_resources)) {
+ if (!cache->avail) {
trace_netfs_folio(folio, netfs_folio_trace_cancel_copy);
netfs_issue_write(wreq, upload);
netfs_folio_written_back(folio);
return 0;
}
trace_netfs_folio(folio, netfs_folio_trace_store_copy);
+ } else if (!upload->avail && !cache->avail) {
+ trace_netfs_folio(folio, netfs_folio_trace_cancel_store);
+ netfs_folio_written_back(folio);
+ return 0;
} else if (!upload->construct) {
trace_netfs_folio(folio, netfs_folio_trace_store);
} else {
trace_netfs_folio(folio, netfs_folio_trace_store_plus);
}
+ /* Attach the folio to the rolling buffer. */
+ netfs_buffer_append_folio(wreq, folio, false);
+
/* Move the submission point forward to allow for write-streaming data
* not starting at the front of the page. We don't do write-streaming
* with the cache as the cache requires DIO alignment.
@@ -432,7 +429,6 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
*/
for (int s = 0; s < NR_IO_STREAMS; s++) {
stream = &wreq->io_streams[s];
- stream->submit_max_len = fsize;
stream->submit_off = foff;
stream->submit_len = flen;
if ((stream->source == NETFS_WRITE_TO_CACHE && streamw) ||
@@ -440,7 +436,6 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
fgroup == NETFS_FOLIO_COPY_TO_CACHE)) {
stream->submit_off = UINT_MAX;
stream->submit_len = 0;
- stream->submit_max_len = 0;
}
}
@@ -467,12 +462,13 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
if (choose_s < 0)
break;
stream = &wreq->io_streams[choose_s];
+ wreq->io_iter.iov_offset = stream->submit_off;
+ atomic64_set(&wreq->issued_to, fpos + stream->submit_off);
+ stream->submit_extendable_to = fsize - stream->submit_off;
part = netfs_advance_write(wreq, stream, fpos + stream->submit_off,
stream->submit_len, to_eof);
- atomic64_set(&wreq->issued_to, fpos + stream->submit_off);
stream->submit_off += part;
- stream->submit_max_len -= part;
if (part > stream->submit_len)
stream->submit_len = 0;
else
@@ -481,6 +477,8 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
debug = true;
}
+ wreq->io_iter.iov_offset = 0;
+ iov_iter_advance(&wreq->io_iter, fsize);
atomic64_set(&wreq->issued_to, fpos + fsize);
if (!debug)
@@ -505,10 +503,14 @@ int netfs_writepages(struct address_space *mapping,
struct folio *folio;
int error = 0;
- if (wbc->sync_mode == WB_SYNC_ALL)
+ if (!mutex_trylock(&ictx->wb_lock)) {
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ netfs_stat(&netfs_n_wb_lock_skip);
+ return 0;
+ }
+ netfs_stat(&netfs_n_wb_lock_wait);
mutex_lock(&ictx->wb_lock);
- else if (!mutex_trylock(&ictx->wb_lock))
- return 0;
+ }
/* Need the first folio to be able to set up the op. */
folio = writeback_iter(mapping, wbc, NULL, &error);
@@ -525,10 +527,10 @@ int netfs_writepages(struct address_space *mapping,
netfs_stat(&netfs_n_wh_writepages);
do {
- _debug("wbiter %lx %llx", folio->index, wreq->start + wreq->submitted);
+ _debug("wbiter %lx %llx", folio->index, atomic64_read(&wreq->issued_to));
/* It appears we don't have to handle cyclic writeback wrapping. */
- WARN_ON_ONCE(wreq && folio_pos(folio) < wreq->start + wreq->submitted);
+ WARN_ON_ONCE(wreq && folio_pos(folio) < atomic64_read(&wreq->issued_to));
if (netfs_folio_group(folio) != NETFS_FOLIO_COPY_TO_CACHE &&
unlikely(!test_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags))) {
@@ -672,6 +674,7 @@ int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t
part = netfs_advance_write(wreq, upload, start, len, false);
start += part;
len -= part;
+ iov_iter_advance(&wreq->io_iter, part);
if (test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
trace_netfs_rreq(wreq, netfs_rreq_trace_wait_pause);
wait_on_bit(&wreq->flags, NETFS_RREQ_PAUSE, TASK_UNINTERRUPTIBLE);