diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-09-16 12:13:31 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-09-16 12:13:31 +0200 |
commit | 35219bc5c71f4197c8bd10297597de797c1eece5 (patch) | |
tree | 2448156135b78f54cd341a8457ccd84a371ddac7 /fs/netfs | |
parent | Merge tag 'vfs-6.12.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/v... (diff) | |
parent | docs: filesystems: corrected grammar of netfs page (diff) | |
download | linux-35219bc5c71f4197c8bd10297597de797c1eece5.tar.xz linux-35219bc5c71f4197c8bd10297597de797c1eece5.zip |
Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull netfs updates from Christian Brauner:
"This contains the work to improve read/write performance for the new
netfs library.
The main performance enhancing changes are:
- Define a structure, struct folio_queue, and a new iterator type,
ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See
that patch for questions about naming and form.
ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The
problem with an xarray is that accessing it requires the use of a
lock (typically the RCU read lock) - and this means that we can't
supply iterate_and_advance() with a step function that might sleep
(crypto for example) without having to drop the lock between pages.
ITER_FOLIOQ is the iterator for a chain of folio_queue structs,
where each folio_queue holds a small list of folios. A folio_queue
struct is a simpler structure than xarray and is not subject to
concurrent manipulation by the VM. folio_queue is used rather than
a bvec[] as it can form lists of indefinite size, adding to one end
and removing from the other on the fly.
- Provide a copy_folio_from_iter() wrapper.
- Make cifs RDMA support ITER_FOLIOQ.
- Use folio queues in the write-side helpers instead of xarrays.
- Add a function to reset the iterator in a subrequest.
- Simplify the write-side helpers to use sheaves to skip gaps rather
than trying to work out where gaps are.
- In afs, make the read subrequests asynchronous, putting them into
work items to allow the next patch to do progressive
unlocking/reading.
- Overhaul the read-side helpers to improve performance.
- Fix the caching of a partial block at the end of a file.
- Allow a store to be cancelled.
Then some changes for cifs to make it use folio queues instead of
xarrays for crypto bufferage:
- Use raw iteration functions rather than manually coding iteration
when hashing data.
- Switch to using folio_queue for crypto buffers.
- Remove the xarray bits.
Make some adjustments to the /proc/fs/netfs/stats file such that:
- All the netfs stats lines begin 'Netfs:' but change this to
something a bit more useful.
- Add a couple of stats counters to track the numbers of skips and
waits on the per-inode writeback serialisation lock to make it
easier to check for this as a source of performance loss.
Miscellaneous work:
- Ensure that the sb_writers lock is taken around
vfs_{set,remove}xattr() in the cachefiles code.
- Reduce the number of conditional branches in netfs_perform_write().
- Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and
remove cifs_post_modify().
- Move the max_len/max_nr_segs members from netfs_io_subrequest to
netfs_io_request as they're only needed for one subreq at a time.
- Add an 'unknown' source value for tracing purposes.
- Remove NETFS_COPY_TO_CACHE as it's no longer used.
- Set the request work function up front at allocation time.
- Use bh-disabling spinlocks for rreq->lock as cachefiles completion
may be run from block-filesystem DIO completion in softirq context.
- Remove fs/netfs/io.c"
* tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
docs: filesystems: corrected grammar of netfs page
cifs: Don't support ITER_XARRAY
cifs: Switch crypto buffer to use a folio_queue rather than an xarray
cifs: Use iterate_and_advance*() routines directly for hashing
netfs: Cancel dirty folios that have no storage destination
cachefiles, netfs: Fix write to partial block at EOF
netfs: Remove fs/netfs/io.c
netfs: Speed up buffered reading
afs: Make read subreqs async
netfs: Simplify the writeback code
netfs: Provide an iterator-reset function
netfs: Use new folio_queue data type and iterator instead of xarray iter
cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
iov_iter: Provide copy_folio_from_iter()
mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
netfs: Use bh-disabling spinlocks for rreq->lock
netfs: Set the request work function upon allocation
netfs: Remove NETFS_COPY_TO_CACHE
netfs: Reserve netfs_sreq_source 0 as unset/unknown
netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
...
Diffstat (limited to 'fs/netfs')
-rw-r--r-- | fs/netfs/Makefile | 4 | ||||
-rw-r--r-- | fs/netfs/buffered_read.c | 766 | ||||
-rw-r--r-- | fs/netfs/buffered_write.c | 309 | ||||
-rw-r--r-- | fs/netfs/direct_read.c | 147 | ||||
-rw-r--r-- | fs/netfs/internal.h | 43 | ||||
-rw-r--r-- | fs/netfs/io.c | 804 | ||||
-rw-r--r-- | fs/netfs/iterator.c | 50 | ||||
-rw-r--r-- | fs/netfs/main.c | 7 | ||||
-rw-r--r-- | fs/netfs/misc.c | 94 | ||||
-rw-r--r-- | fs/netfs/objects.c | 16 | ||||
-rw-r--r-- | fs/netfs/read_collect.c | 544 | ||||
-rw-r--r-- | fs/netfs/read_pgpriv2.c | 264 | ||||
-rw-r--r-- | fs/netfs/read_retry.c | 256 | ||||
-rw-r--r-- | fs/netfs/stats.c | 27 | ||||
-rw-r--r-- | fs/netfs/write_collect.c | 246 | ||||
-rw-r--r-- | fs/netfs/write_issue.c | 93 |
16 files changed, 2163 insertions, 1507 deletions
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index 8e6781e0b10b..d08b0bfb6756 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -5,12 +5,14 @@ netfs-y := \ buffered_write.o \ direct_read.o \ direct_write.o \ - io.o \ iterator.o \ locking.o \ main.o \ misc.o \ objects.o \ + read_collect.o \ + read_pgpriv2.o \ + read_retry.o \ write_collect.o \ write_issue.o diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c index 27c750d39476..c40e226053cc 100644 --- a/fs/netfs/buffered_read.c +++ b/fs/netfs/buffered_read.c @@ -9,266 +9,388 @@ #include <linux/task_io_accounting_ops.h> #include "internal.h" -/* - * [DEPRECATED] Unlock the folios in a read operation for when the filesystem - * is using PG_private_2 and direct writing to the cache from here rather than - * marking the page for writeback. - * - * Note that we don't touch folio->private in this code. - */ -static void netfs_rreq_unlock_folios_pgpriv2(struct netfs_io_request *rreq, - size_t *account) +static void netfs_cache_expand_readahead(struct netfs_io_request *rreq, + unsigned long long *_start, + unsigned long long *_len, + unsigned long long i_size) { - struct netfs_io_subrequest *subreq; - struct folio *folio; - pgoff_t start_page = rreq->start / PAGE_SIZE; - pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1; - bool subreq_failed = false; + struct netfs_cache_resources *cres = &rreq->cache_resources; - XA_STATE(xas, &rreq->mapping->i_pages, start_page); + if (cres->ops && cres->ops->expand_readahead) + cres->ops->expand_readahead(cres, _start, _len, i_size); +} - /* Walk through the pagecache and the I/O request lists simultaneously. - * We may have a mixture of cached and uncached sections and we only - * really want to write out the uncached sections. This is slightly - * complicated by the possibility that we might have huge pages with a - * mixture inside. +static void netfs_rreq_expand(struct netfs_io_request *rreq, + struct readahead_control *ractl) +{ + /* Give the cache a chance to change the request parameters. The + * resultant request must contain the original region. */ - subreq = list_first_entry(&rreq->subrequests, - struct netfs_io_subrequest, rreq_link); - subreq_failed = (subreq->error < 0); + netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size); - trace_netfs_rreq(rreq, netfs_rreq_trace_unlock_pgpriv2); + /* Give the netfs a chance to change the request parameters. The + * resultant request must contain the original region. + */ + if (rreq->netfs_ops->expand_readahead) + rreq->netfs_ops->expand_readahead(rreq); - rcu_read_lock(); - xas_for_each(&xas, folio, last_page) { - loff_t pg_end; - bool pg_failed = false; - bool folio_started = false; + /* Expand the request if the cache wants it to start earlier. Note + * that the expansion may get further extended if the VM wishes to + * insert THPs and the preferred start and/or end wind up in the middle + * of THPs. + * + * If this is the case, however, the THP size should be an integer + * multiple of the cache granule size, so we get a whole number of + * granules to deal with. + */ + if (rreq->start != readahead_pos(ractl) || + rreq->len != readahead_length(ractl)) { + readahead_expand(ractl, rreq->start, rreq->len); + rreq->start = readahead_pos(ractl); + rreq->len = readahead_length(ractl); - if (xas_retry(&xas, folio)) - continue; + trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), + netfs_read_trace_expanded); + } +} - pg_end = folio_pos(folio) + folio_size(folio) - 1; +/* + * Begin an operation, and fetch the stored zero point value from the cookie if + * available. + */ +static int netfs_begin_cache_read(struct netfs_io_request *rreq, struct netfs_inode *ctx) +{ + return fscache_begin_read_operation(&rreq->cache_resources, netfs_i_cookie(ctx)); +} - for (;;) { - loff_t sreq_end; +/* + * Decant the list of folios to read into a rolling buffer. + */ +static size_t netfs_load_buffer_from_ra(struct netfs_io_request *rreq, + struct folio_queue *folioq) +{ + unsigned int order, nr; + size_t size = 0; + + nr = __readahead_batch(rreq->ractl, (struct page **)folioq->vec.folios, + ARRAY_SIZE(folioq->vec.folios)); + folioq->vec.nr = nr; + for (int i = 0; i < nr; i++) { + struct folio *folio = folioq_folio(folioq, i); + + trace_netfs_folio(folio, netfs_folio_trace_read); + order = folio_order(folio); + folioq->orders[i] = order; + size += PAGE_SIZE << order; + } - if (!subreq) { - pg_failed = true; - break; - } + for (int i = nr; i < folioq_nr_slots(folioq); i++) + folioq_clear(folioq, i); - if (!folio_started && - test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags) && - fscache_operation_valid(&rreq->cache_resources)) { - trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache); - folio_start_private_2(folio); - folio_started = true; - } + return size; +} - pg_failed |= subreq_failed; - sreq_end = subreq->start + subreq->len - 1; - if (pg_end < sreq_end) - break; +/* + * netfs_prepare_read_iterator - Prepare the subreq iterator for I/O + * @subreq: The subrequest to be set up + * + * Prepare the I/O iterator representing the read buffer on a subrequest for + * the filesystem to use for I/O (it can be passed directly to a socket). This + * is intended to be called from the ->issue_read() method once the filesystem + * has trimmed the request to the size it wants. + * + * Returns the limited size if successful and -ENOMEM if insufficient memory + * available. + * + * [!] NOTE: This must be run in the same thread as ->issue_read() was called + * in as we access the readahead_control struct. + */ +static ssize_t netfs_prepare_read_iterator(struct netfs_io_subrequest *subreq) +{ + struct netfs_io_request *rreq = subreq->rreq; + size_t rsize = subreq->len; + + if (subreq->source == NETFS_DOWNLOAD_FROM_SERVER) + rsize = umin(rsize, rreq->io_streams[0].sreq_max_len); + + if (rreq->ractl) { + /* If we don't have sufficient folios in the rolling buffer, + * extract a folioq's worth from the readahead region at a time + * into the buffer. Note that this acquires a ref on each page + * that we will need to release later - but we don't want to do + * that until after we've started the I/O. + */ + while (rreq->submitted < subreq->start + rsize) { + struct folio_queue *tail = rreq->buffer_tail, *new; + size_t added; + + new = kmalloc(sizeof(*new), GFP_NOFS); + if (!new) + return -ENOMEM; + netfs_stat(&netfs_n_folioq); + folioq_init(new); + new->prev = tail; + tail->next = new; + rreq->buffer_tail = new; + added = netfs_load_buffer_from_ra(rreq, new); + rreq->iter.count += added; + rreq->submitted += added; + } + } - *account += subreq->transferred; - if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { - subreq = list_next_entry(subreq, rreq_link); - subreq_failed = (subreq->error < 0); - } else { - subreq = NULL; - subreq_failed = false; - } + subreq->len = rsize; + if (unlikely(rreq->io_streams[0].sreq_max_segs)) { + size_t limit = netfs_limit_iter(&rreq->iter, 0, rsize, + rreq->io_streams[0].sreq_max_segs); - if (pg_end == sreq_end) - break; + if (limit < rsize) { + subreq->len = limit; + trace_netfs_sreq(subreq, netfs_sreq_trace_limited); } + } - if (!pg_failed) { - flush_dcache_folio(folio); - folio_mark_uptodate(folio); - } + subreq->io_iter = rreq->iter; - if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) { - if (folio->index == rreq->no_unlock_folio && - test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags)) - _debug("no unlock"); - else - folio_unlock(folio); + if (iov_iter_is_folioq(&subreq->io_iter)) { + if (subreq->io_iter.folioq_slot >= folioq_nr_slots(subreq->io_iter.folioq)) { + subreq->io_iter.folioq = subreq->io_iter.folioq->next; + subreq->io_iter.folioq_slot = 0; } + subreq->curr_folioq = (struct folio_queue *)subreq->io_iter.folioq; + subreq->curr_folioq_slot = subreq->io_iter.folioq_slot; + subreq->curr_folio_order = subreq->curr_folioq->orders[subreq->curr_folioq_slot]; } - rcu_read_unlock(); + + iov_iter_truncate(&subreq->io_iter, subreq->len); + iov_iter_advance(&rreq->iter, subreq->len); + return subreq->len; } -/* - * Unlock the folios in a read operation. We need to set PG_writeback on any - * folios we're going to write back before we unlock them. - * - * Note that if the deprecated NETFS_RREQ_USE_PGPRIV2 is set then we use - * PG_private_2 and do a direct write to the cache from here instead. - */ -void netfs_rreq_unlock_folios(struct netfs_io_request *rreq) +static enum netfs_io_source netfs_cache_prepare_read(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq, + loff_t i_size) { - struct netfs_io_subrequest *subreq; - struct netfs_folio *finfo; - struct folio *folio; - pgoff_t start_page = rreq->start / PAGE_SIZE; - pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1; - size_t account = 0; - bool subreq_failed = false; + struct netfs_cache_resources *cres = &rreq->cache_resources; - XA_STATE(xas, &rreq->mapping->i_pages, start_page); + if (!cres->ops) + return NETFS_DOWNLOAD_FROM_SERVER; + return cres->ops->prepare_read(subreq, i_size); +} - if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) { - __clear_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags); - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); - } - } +static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_io_subrequest *subreq = priv; - /* Handle deprecated PG_private_2 case. */ - if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) { - netfs_rreq_unlock_folios_pgpriv2(rreq, &account); - goto out; + if (transferred_or_error < 0) { + netfs_read_subreq_terminated(subreq, transferred_or_error, was_async); + return; } - /* Walk through the pagecache and the I/O request lists simultaneously. - * We may have a mixture of cached and uncached sections and we only - * really want to write out the uncached sections. This is slightly - * complicated by the possibility that we might have huge pages with a - * mixture inside. - */ - subreq = list_first_entry(&rreq->subrequests, - struct netfs_io_subrequest, rreq_link); - subreq_failed = (subreq->error < 0); - - trace_netfs_rreq(rreq, netfs_rreq_trace_unlock); + if (transferred_or_error > 0) + subreq->transferred += transferred_or_error; + netfs_read_subreq_terminated(subreq, 0, was_async); +} - rcu_read_lock(); - xas_for_each(&xas, folio, last_page) { - loff_t pg_end; - bool pg_failed = false; - bool wback_to_cache = false; +/* + * Issue a read against the cache. + * - Eats the caller's ref on subreq. + */ +static void netfs_read_cache_to_pagecache(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; - if (xas_retry(&xas, folio)) - continue; + netfs_stat(&netfs_n_rh_read); + cres->ops->read(cres, subreq->start, &subreq->io_iter, NETFS_READ_HOLE_IGNORE, + netfs_cache_read_terminated, subreq); +} - pg_end = folio_pos(folio) + folio_size(folio) - 1; +/* + * Perform a read to the pagecache from a series of sources of different types, + * slicing up the region to be read according to available cache blocks and + * network rsize. + */ +static void netfs_read_to_pagecache(struct netfs_io_request *rreq) +{ + struct netfs_inode *ictx = netfs_inode(rreq->inode); + unsigned long long start = rreq->start; + ssize_t size = rreq->len; + int ret = 0; + + atomic_inc(&rreq->nr_outstanding); + + do { + struct netfs_io_subrequest *subreq; + enum netfs_io_source source = NETFS_DOWNLOAD_FROM_SERVER; + ssize_t slice; + + subreq = netfs_alloc_subrequest(rreq); + if (!subreq) { + ret = -ENOMEM; + break; + } - for (;;) { - loff_t sreq_end; + subreq->start = start; + subreq->len = size; + + atomic_inc(&rreq->nr_outstanding); + spin_lock_bh(&rreq->lock); + list_add_tail(&subreq->rreq_link, &rreq->subrequests); + subreq->prev_donated = rreq->prev_donated; + rreq->prev_donated = 0; + trace_netfs_sreq(subreq, netfs_sreq_trace_added); + spin_unlock_bh(&rreq->lock); + + source = netfs_cache_prepare_read(rreq, subreq, rreq->i_size); + subreq->source = source; + if (source == NETFS_DOWNLOAD_FROM_SERVER) { + unsigned long long zp = umin(ictx->zero_point, rreq->i_size); + size_t len = subreq->len; + + if (subreq->start >= zp) { + subreq->source = source = NETFS_FILL_WITH_ZEROES; + goto fill_with_zeroes; + } - if (!subreq) { - pg_failed = true; + if (len > zp - subreq->start) + len = zp - subreq->start; + if (len == 0) { + pr_err("ZERO-LEN READ: R=%08x[%x] l=%zx/%zx s=%llx z=%llx i=%llx", + rreq->debug_id, subreq->debug_index, + subreq->len, size, + subreq->start, ictx->zero_point, rreq->i_size); break; } + subreq->len = len; + + netfs_stat(&netfs_n_rh_download); + if (rreq->netfs_ops->prepare_read) { + ret = rreq->netfs_ops->prepare_read(subreq); + if (ret < 0) { + atomic_dec(&rreq->nr_outstanding); + netfs_put_subrequest(subreq, false, + netfs_sreq_trace_put_cancel); + break; + } + trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); + } - wback_to_cache |= test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); - pg_failed |= subreq_failed; - sreq_end = subreq->start + subreq->len - 1; - if (pg_end < sreq_end) + slice = netfs_prepare_read_iterator(subreq); + if (slice < 0) { + atomic_dec(&rreq->nr_outstanding); + netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_cancel); + ret = slice; break; - - account += subreq->transferred; - if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { - subreq = list_next_entry(subreq, rreq_link); - subreq_failed = (subreq->error < 0); - } else { - subreq = NULL; - subreq_failed = false; } - if (pg_end == sreq_end) - break; + rreq->netfs_ops->issue_read(subreq); + goto done; } - if (!pg_failed) { - flush_dcache_folio(folio); - finfo = netfs_folio_info(folio); - if (finfo) { - trace_netfs_folio(folio, netfs_folio_trace_filled_gaps); - if (finfo->netfs_group) - folio_change_private(folio, finfo->netfs_group); - else - folio_detach_private(folio); - kfree(finfo); - } - folio_mark_uptodate(folio); - if (wback_to_cache && !WARN_ON_ONCE(folio_get_private(folio) != NULL)) { - trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache); - folio_attach_private(folio, NETFS_FOLIO_COPY_TO_CACHE); - filemap_dirty_folio(folio->mapping, folio); - } + fill_with_zeroes: + if (source == NETFS_FILL_WITH_ZEROES) { + subreq->source = NETFS_FILL_WITH_ZEROES; + trace_netfs_sreq(subreq, netfs_sreq_trace_submit); + netfs_stat(&netfs_n_rh_zero); + slice = netfs_prepare_read_iterator(subreq); + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + netfs_read_subreq_terminated(subreq, 0, false); + goto done; } - if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) { - if (folio->index == rreq->no_unlock_folio && - test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags)) - _debug("no unlock"); - else - folio_unlock(folio); + if (source == NETFS_READ_FROM_CACHE) { + trace_netfs_sreq(subreq, netfs_sreq_trace_submit); + slice = netfs_prepare_read_iterator(subreq); + netfs_read_cache_to_pagecache(rreq, subreq); + goto done; } - } - rcu_read_unlock(); -out: - task_io_account_read(account); - if (rreq->netfs_ops->done) - rreq->netfs_ops->done(rreq); -} + pr_err("Unexpected read source %u\n", source); + WARN_ON_ONCE(1); + break; -static void netfs_cache_expand_readahead(struct netfs_io_request *rreq, - unsigned long long *_start, - unsigned long long *_len, - unsigned long long i_size) -{ - struct netfs_cache_resources *cres = &rreq->cache_resources; + done: + size -= slice; + start += slice; + cond_resched(); + } while (size > 0); - if (cres->ops && cres->ops->expand_readahead) - cres->ops->expand_readahead(cres, _start, _len, i_size); + if (atomic_dec_and_test(&rreq->nr_outstanding)) + netfs_rreq_terminated(rreq, false); + + /* Defer error return as we may need to wait for outstanding I/O. */ + cmpxchg(&rreq->error, 0, ret); } -static void netfs_rreq_expand(struct netfs_io_request *rreq, - struct readahead_control *ractl) +/* + * Wait for the read operation to complete, successfully or otherwise. + */ +static int netfs_wait_for_read(struct netfs_io_request *rreq) { - /* Give the cache a chance to change the request parameters. The - * resultant request must contain the original region. - */ - netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size); + int ret; - /* Give the netfs a chance to change the request parameters. The - * resultant request must contain the original region. - */ - if (rreq->netfs_ops->expand_readahead) - rreq->netfs_ops->expand_readahead(rreq); + trace_netfs_rreq(rreq, netfs_rreq_trace_wait_ip); + wait_on_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS, TASK_UNINTERRUPTIBLE); + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) { + trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_read); + ret = -EIO; + } - /* Expand the request if the cache wants it to start earlier. Note - * that the expansion may get further extended if the VM wishes to - * insert THPs and the preferred start and/or end wind up in the middle - * of THPs. - * - * If this is the case, however, the THP size should be an integer - * multiple of the cache granule size, so we get a whole number of - * granules to deal with. - */ - if (rreq->start != readahead_pos(ractl) || - rreq->len != readahead_length(ractl)) { - readahead_expand(ractl, rreq->start, rreq->len); - rreq->start = readahead_pos(ractl); - rreq->len = readahead_length(ractl); + return ret; +} - trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), - netfs_read_trace_expanded); - } +/* + * Set up the initial folioq of buffer folios in the rolling buffer and set the + * iterator to refer to it. + */ +static int netfs_prime_buffer(struct netfs_io_request *rreq) +{ + struct folio_queue *folioq; + size_t added; + + folioq = kmalloc(sizeof(*folioq), GFP_KERNEL); + if (!folioq) + return -ENOMEM; + netfs_stat(&netfs_n_folioq); + folioq_init(folioq); + rreq->buffer = folioq; + rreq->buffer_tail = folioq; + rreq->submitted = rreq->start; + iov_iter_folio_queue(&rreq->iter, ITER_DEST, folioq, 0, 0, 0); + + added = netfs_load_buffer_from_ra(rreq, folioq); + rreq->iter.count += added; + rreq->submitted += added; + return 0; } /* - * Begin an operation, and fetch the stored zero point value from the cookie if - * available. + * Drop the ref on each folio that we inherited from the VM readahead code. We + * still have the folio locks to pin the page until we complete the I/O. + * + * Note that we can't just release the batch in each queue struct as we use the + * occupancy count in other places. */ -static int netfs_begin_cache_read(struct netfs_io_request *rreq, struct netfs_inode *ctx) +static void netfs_put_ra_refs(struct folio_queue *folioq) { - return fscache_begin_read_operation(&rreq->cache_resources, netfs_i_cookie(ctx)); + struct folio_batch fbatch; + + folio_batch_init(&fbatch); + while (folioq) { + for (unsigned int slot = 0; slot < folioq_count(folioq); slot++) { + struct folio *folio = folioq_folio(folioq, slot); + if (!folio) + continue; + trace_netfs_folio(folio, netfs_folio_trace_read_put); + if (!folio_batch_add(&fbatch, folio)) + folio_batch_release(&fbatch); + } + folioq = folioq->next; + } + + folio_batch_release(&fbatch); } /** @@ -289,22 +411,17 @@ static int netfs_begin_cache_read(struct netfs_io_request *rreq, struct netfs_in void netfs_readahead(struct readahead_control *ractl) { struct netfs_io_request *rreq; - struct netfs_inode *ctx = netfs_inode(ractl->mapping->host); + struct netfs_inode *ictx = netfs_inode(ractl->mapping->host); + unsigned long long start = readahead_pos(ractl); + size_t size = readahead_length(ractl); int ret; - _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); - - if (readahead_count(ractl) == 0) - return; - - rreq = netfs_alloc_request(ractl->mapping, ractl->file, - readahead_pos(ractl), - readahead_length(ractl), + rreq = netfs_alloc_request(ractl->mapping, ractl->file, start, size, NETFS_READAHEAD); if (IS_ERR(rreq)) return; - ret = netfs_begin_cache_read(rreq, ctx); + ret = netfs_begin_cache_read(rreq, ictx); if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) goto cleanup_free; @@ -314,18 +431,15 @@ void netfs_readahead(struct readahead_control *ractl) netfs_rreq_expand(rreq, ractl); - /* Set up the output buffer */ - iov_iter_xarray(&rreq->iter, ITER_DEST, &ractl->mapping->i_pages, - rreq->start, rreq->len); + rreq->ractl = ractl; + if (netfs_prime_buffer(rreq) < 0) + goto cleanup_free; + netfs_read_to_pagecache(rreq); - /* Drop the refs on the folios here rather than in the cache or - * filesystem. The locks will be dropped in netfs_rreq_unlock(). - */ - while (readahead_folio(ractl)) - ; + /* Release the folio refs whilst we're waiting for the I/O. */ + netfs_put_ra_refs(rreq->buffer); - netfs_begin_read(rreq, false); - netfs_put_request(rreq, false, netfs_rreq_trace_put_return); + netfs_put_request(rreq, true, netfs_rreq_trace_put_return); return; cleanup_free: @@ -334,6 +448,117 @@ cleanup_free: } EXPORT_SYMBOL(netfs_readahead); +/* + * Create a rolling buffer with a single occupying folio. + */ +static int netfs_create_singular_buffer(struct netfs_io_request *rreq, struct folio *folio) +{ + struct folio_queue *folioq; + + folioq = kmalloc(sizeof(*folioq), GFP_KERNEL); + if (!folioq) + return -ENOMEM; + + netfs_stat(&netfs_n_folioq); + folioq_init(folioq); + folioq_append(folioq, folio); + BUG_ON(folioq_folio(folioq, 0) != folio); + BUG_ON(folioq_folio_order(folioq, 0) != folio_order(folio)); + rreq->buffer = folioq; + rreq->buffer_tail = folioq; + rreq->submitted = rreq->start + rreq->len; + iov_iter_folio_queue(&rreq->iter, ITER_DEST, folioq, 0, 0, rreq->len); + rreq->ractl = (struct readahead_control *)1UL; + return 0; +} + +/* + * Read into gaps in a folio partially filled by a streaming write. + */ +static int netfs_read_gaps(struct file *file, struct folio *folio) +{ + struct netfs_io_request *rreq; + struct address_space *mapping = folio->mapping; + struct netfs_folio *finfo = netfs_folio_info(folio); + struct netfs_inode *ctx = netfs_inode(mapping->host); + struct folio *sink = NULL; + struct bio_vec *bvec; + unsigned int from = finfo->dirty_offset; + unsigned int to = from + finfo->dirty_len; + unsigned int off = 0, i = 0; + size_t flen = folio_size(folio); + size_t nr_bvec = flen / PAGE_SIZE + 2; + size_t part; + int ret; + + _enter("%lx", folio->index); + + rreq = netfs_alloc_request(mapping, file, folio_pos(folio), flen, NETFS_READ_GAPS); + if (IS_ERR(rreq)) { + ret = PTR_ERR(rreq); + goto alloc_error; + } + + ret = netfs_begin_cache_read(rreq, ctx); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto discard; + + netfs_stat(&netfs_n_rh_read_folio); + trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_read_gaps); + + /* Fiddle the buffer so that a gap at the beginning and/or a gap at the + * end get copied to, but the middle is discarded. + */ + ret = -ENOMEM; + bvec = kmalloc_array(nr_bvec, sizeof(*bvec), GFP_KERNEL); + if (!bvec) + goto discard; + + sink = folio_alloc(GFP_KERNEL, 0); + if (!sink) { + kfree(bvec); + goto discard; + } + + trace_netfs_folio(folio, netfs_folio_trace_read_gaps); + + rreq->direct_bv = bvec; + rreq->direct_bv_count = nr_bvec; + if (from > 0) { + bvec_set_folio(&bvec[i++], folio, from, 0); + off = from; + } + while (off < to) { + part = min_t(size_t, to - off, PAGE_SIZE); + bvec_set_folio(&bvec[i++], sink, part, 0); + off += part; + } + if (to < flen) + bvec_set_folio(&bvec[i++], folio, flen - to, to); + iov_iter_bvec(&rreq->iter, ITER_DEST, bvec, i, rreq->len); + rreq->submitted = rreq->start + flen; + + netfs_read_to_pagecache(rreq); + + if (sink) + folio_put(sink); + + ret = netfs_wait_for_read(rreq); + if (ret == 0) { + flush_dcache_folio(folio); + folio_mark_uptodate(folio); + } + folio_unlock(folio); + netfs_put_request(rreq, false, netfs_rreq_trace_put_return); + return ret < 0 ? ret : 0; + +discard: + netfs_put_request(rreq, false, netfs_rreq_trace_put_discard); +alloc_error: + folio_unlock(folio); + return ret; +} + /** * netfs_read_folio - Helper to manage a read_folio request * @file: The file to read from @@ -353,9 +578,13 @@ int netfs_read_folio(struct file *file, struct folio *folio) struct address_space *mapping = folio->mapping; struct netfs_io_request *rreq; struct netfs_inode *ctx = netfs_inode(mapping->host); - struct folio *sink = NULL; int ret; + if (folio_test_dirty(folio)) { + trace_netfs_folio(folio, netfs_folio_trace_read_gaps); + return netfs_read_gaps(file, folio); + } + _enter("%lx", folio->index); rreq = netfs_alloc_request(mapping, file, @@ -374,54 +603,12 @@ int netfs_read_folio(struct file *file, struct folio *folio) trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); /* Set up the output buffer */ - if (folio_test_dirty(folio)) { - /* Handle someone trying to read from an unflushed streaming - * write. We fiddle the buffer so that a gap at the beginning - * and/or a gap at the end get copied to, but the middle is - * discarded. - */ - struct netfs_folio *finfo = netfs_folio_info(folio); - struct bio_vec *bvec; - unsigned int from = finfo->dirty_offset; - unsigned int to = from + finfo->dirty_len; - unsigned int off = 0, i = 0; - size_t flen = folio_size(folio); - size_t nr_bvec = flen / PAGE_SIZE + 2; - size_t part; - - ret = -ENOMEM; - bvec = kmalloc_array(nr_bvec, sizeof(*bvec), GFP_KERNEL); - if (!bvec) - goto discard; - - sink = folio_alloc(GFP_KERNEL, 0); - if (!sink) - goto discard; - - trace_netfs_folio(folio, netfs_folio_trace_read_gaps); - - rreq->direct_bv = bvec; - rreq->direct_bv_count = nr_bvec; - if (from > 0) { - bvec_set_folio(&bvec[i++], folio, from, 0); - off = from; - } - while (off < to) { - part = min_t(size_t, to - off, PAGE_SIZE); - bvec_set_folio(&bvec[i++], sink, part, 0); - off += part; - } - if (to < flen) - bvec_set_folio(&bvec[i++], folio, flen - to, to); - iov_iter_bvec(&rreq->iter, ITER_DEST, bvec, i, rreq->len); - } else { - iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages, - rreq->start, rreq->len); - } + ret = netfs_create_singular_buffer(rreq, folio); + if (ret < 0) + goto discard; - ret = netfs_begin_read(rreq, true); - if (sink) - folio_put(sink); + netfs_read_to_pagecache(rreq); + ret = netfs_wait_for_read(rreq); netfs_put_request(rreq, false, netfs_rreq_trace_put_return); return ret < 0 ? ret : 0; @@ -494,13 +681,10 @@ zero_out: * * Pre-read data for a write-begin request by drawing data from the cache if * possible, or the netfs if not. Space beyond the EOF is zero-filled. - * Multiple I/O requests from different sources will get munged together. If - * necessary, the readahead window can be expanded in either direction to a - * more convenient alighment for RPC efficiency or to make storage in the cache - * feasible. + * Multiple I/O requests from different sources will get munged together. * * The calling netfs must provide a table of operations, only one of which, - * issue_op, is mandatory. + * issue_read, is mandatory. * * The check_write_begin() operation can be provided to check for and flush * conflicting writes once the folio is grabbed and locked. It is passed a @@ -528,8 +712,6 @@ int netfs_write_begin(struct netfs_inode *ctx, pgoff_t index = pos >> PAGE_SHIFT; int ret; - DEFINE_READAHEAD(ractl, file, NULL, mapping, index); - retry: folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN, mapping_gfp_mask(mapping)); @@ -577,22 +759,13 @@ retry: netfs_stat(&netfs_n_rh_write_begin); trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin); - /* Expand the request to meet caching requirements and download - * preferences. - */ - ractl._nr_pages = folio_nr_pages(folio); - netfs_rreq_expand(rreq, &ractl); - /* Set up the output buffer */ - iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages, - rreq->start, rreq->len); - - /* We hold the folio locks, so we can drop the references */ - folio_get(folio); - while (readahead_folio(&ractl)) - ; + ret = netfs_create_singular_buffer(rreq, folio); + if (ret < 0) + goto error_put; - ret = netfs_begin_read(rreq, true); + netfs_read_to_pagecache(rreq); + ret = netfs_wait_for_read(rreq); if (ret < 0) goto error; netfs_put_request(rreq, false, netfs_rreq_trace_put_return); @@ -652,10 +825,13 @@ int netfs_prefetch_for_write(struct file *file, struct folio *folio, trace_netfs_read(rreq, start, flen, netfs_read_trace_prefetch_for_write); /* Set up the output buffer */ - iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages, - rreq->start, rreq->len); + ret = netfs_create_singular_buffer(rreq, folio); + if (ret < 0) + goto error_put; - ret = netfs_begin_read(rreq, true); + folioq_mark2(rreq->buffer, 0); + netfs_read_to_pagecache(rreq); + ret = netfs_wait_for_read(rreq); netfs_put_request(rreq, false, netfs_rreq_trace_put_return); return ret; diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c index ca53c5d1622e..d7eae597e54d 100644 --- a/fs/netfs/buffered_write.c +++ b/fs/netfs/buffered_write.c @@ -13,91 +13,22 @@ #include <linux/pagevec.h> #include "internal.h" -/* - * Determined write method. Adjust netfs_folio_traces if this is changed. - */ -enum netfs_how_to_modify { - NETFS_FOLIO_IS_UPTODATE, /* Folio is uptodate already */ - NETFS_JUST_PREFETCH, /* We have to read the folio anyway */ - NETFS_WHOLE_FOLIO_MODIFY, /* We're going to overwrite the whole folio */ - NETFS_MODIFY_AND_CLEAR, /* We can assume there is no data to be downloaded. */ - NETFS_STREAMING_WRITE, /* Store incomplete data in non-uptodate page. */ - NETFS_STREAMING_WRITE_CONT, /* Continue streaming write. */ - NETFS_FLUSH_CONTENT, /* Flush incompatible content. */ -}; - -static void netfs_set_group(struct folio *folio, struct netfs_group *netfs_group) +static void __netfs_set_group(struct folio *folio, struct netfs_group *netfs_group) { - void *priv = folio_get_private(folio); - - if (netfs_group && (!priv || priv == NETFS_FOLIO_COPY_TO_CACHE)) + if (netfs_group) folio_attach_private(folio, netfs_get_group(netfs_group)); - else if (!netfs_group && priv == NETFS_FOLIO_COPY_TO_CACHE) - folio_detach_private(folio); } -/* - * Decide how we should modify a folio. We might be attempting to do - * write-streaming, in which case we don't want to a local RMW cycle if we can - * avoid it. If we're doing local caching or content crypto, we award that - * priority over avoiding RMW. If the file is open readably, then we also - * assume that we may want to read what we wrote. - */ -static enum netfs_how_to_modify netfs_how_to_modify(struct netfs_inode *ctx, - struct file *file, - struct folio *folio, - void *netfs_group, - size_t flen, - size_t offset, - size_t len, - bool maybe_trouble) +static void netfs_set_group(struct folio *folio, struct netfs_group *netfs_group) { - struct netfs_folio *finfo = netfs_folio_info(folio); - struct netfs_group *group = netfs_folio_group(folio); - loff_t pos = folio_pos(folio); - - _enter(""); - - if (group != netfs_group && group != NETFS_FOLIO_COPY_TO_CACHE) - return NETFS_FLUSH_CONTENT; - - if (folio_test_uptodate(folio)) - return NETFS_FOLIO_IS_UPTODATE; - - if (pos >= ctx->zero_point) - return NETFS_MODIFY_AND_CLEAR; - - if (!maybe_trouble && offset == 0 && len >= flen) - return NETFS_WHOLE_FOLIO_MODIFY; - - if (file->f_mode & FMODE_READ) - goto no_write_streaming; - - if (netfs_is_cache_enabled(ctx)) { - /* We don't want to get a streaming write on a file that loses - * caching service temporarily because the backing store got - * culled. - */ - goto no_write_streaming; - } + void *priv = folio_get_private(folio); - if (!finfo) - return NETFS_STREAMING_WRITE; - - /* We can continue a streaming write only if it continues on from the - * previous. If it overlaps, we must flush lest we suffer a partial - * copy and disjoint dirty regions. - */ - if (offset == finfo->dirty_offset + finfo->dirty_len) - return NETFS_STREAMING_WRITE_CONT; - return NETFS_FLUSH_CONTENT; - -no_write_streaming: - if (finfo) { - netfs_stat(&netfs_n_wh_wstream_conflict); - return NETFS_FLUSH_CONTENT; + if (unlikely(priv != netfs_group)) { + if (netfs_group && (!priv || priv == NETFS_FOLIO_COPY_TO_CACHE)) + folio_attach_private(folio, netfs_get_group(netfs_group)); + else if (!netfs_group && priv == NETFS_FOLIO_COPY_TO_CACHE) + folio_detach_private(folio); } - return NETFS_JUST_PREFETCH; } /* @@ -177,13 +108,10 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, .range_end = iocb->ki_pos + iter->count, }; struct netfs_io_request *wreq = NULL; - struct netfs_folio *finfo; - struct folio *folio, *writethrough = NULL; - enum netfs_how_to_modify howto; - enum netfs_folio_trace trace; + struct folio *folio = NULL, *writethrough = NULL; unsigned int bdp_flags = (iocb->ki_flags & IOCB_NOWAIT) ? BDP_ASYNC : 0; ssize_t written = 0, ret, ret2; - loff_t i_size, pos = iocb->ki_pos, from, to; + loff_t i_size, pos = iocb->ki_pos; size_t max_chunk = mapping_max_folio_size(mapping); bool maybe_trouble = false; @@ -213,15 +141,14 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, } do { + struct netfs_folio *finfo; + struct netfs_group *group; + unsigned long long fpos; size_t flen; size_t offset; /* Offset into pagecache folio */ size_t part; /* Bytes to write to folio */ size_t copied; /* Bytes copied from user */ - ret = balance_dirty_pages_ratelimited_flags(mapping, bdp_flags); - if (unlikely(ret < 0)) - break; - offset = pos & (max_chunk - 1); part = min(max_chunk - offset, iov_iter_count(iter)); @@ -247,7 +174,8 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, } flen = folio_size(folio); - offset = pos & (flen - 1); + fpos = folio_pos(folio); + offset = pos - fpos; part = min_t(size_t, flen - offset, part); /* Wait for writeback to complete. The writeback engine owns @@ -265,71 +193,52 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, goto error_folio_unlock; } - /* See if we need to prefetch the area we're going to modify. - * We need to do this before we get a lock on the folio in case - * there's more than one writer competing for the same cache - * block. + /* Decide how we should modify a folio. We might be attempting + * to do write-streaming, in which case we don't want to a + * local RMW cycle if we can avoid it. If we're doing local + * caching or content crypto, we award that priority over + * avoiding RMW. If the file is open readably, then we also + * assume that we may want to read what we wrote. */ - howto = netfs_how_to_modify(ctx, file, folio, netfs_group, - flen, offset, part, maybe_trouble); - _debug("howto %u", howto); - switch (howto) { - case NETFS_JUST_PREFETCH: - ret = netfs_prefetch_for_write(file, folio, offset, part); - if (ret < 0) { - _debug("prefetch = %zd", ret); - goto error_folio_unlock; - } - break; - case NETFS_FOLIO_IS_UPTODATE: - case NETFS_WHOLE_FOLIO_MODIFY: - case NETFS_STREAMING_WRITE_CONT: - break; - case NETFS_MODIFY_AND_CLEAR: - zero_user_segment(&folio->page, 0, offset); - break; - case NETFS_STREAMING_WRITE: - ret = -EIO; - if (WARN_ON(folio_get_private(folio))) - goto error_folio_unlock; - break; - case NETFS_FLUSH_CONTENT: - trace_netfs_folio(folio, netfs_flush_content); - from = folio_pos(folio); - to = from + folio_size(folio) - 1; - folio_unlock(folio); - folio_put(folio); - ret = filemap_write_and_wait_range(mapping, from, to); - if (ret < 0) - goto error_folio_unlock; - continue; - } - - if (mapping_writably_mapped(mapping)) - flush_dcache_folio(folio); - - copied = copy_folio_from_iter_atomic(folio, offset, part, iter); - - flush_dcache_folio(folio); - - /* Deal with a (partially) failed copy */ - if (copied == 0) { - ret = -EFAULT; - goto error_folio_unlock; + finfo = netfs_folio_info(folio); + group = netfs_folio_group(folio); + + if (unlikely(group != netfs_group) && + group != NETFS_FOLIO_COPY_TO_CACHE) + goto flush_content; + + if (folio_test_uptodate(folio)) { + if (mapping_writably_mapped(mapping)) + flush_dcache_folio(folio); + copied = copy_folio_from_iter_atomic(folio, offset, part, iter); + if (unlikely(copied == 0)) + goto copy_failed; + netfs_set_group(folio, netfs_group); + trace_netfs_folio(folio, netfs_folio_is_uptodate); + goto copied; } - trace = (enum netfs_folio_trace)howto; - switch (howto) { - case NETFS_FOLIO_IS_UPTODATE: - case NETFS_JUST_PREFETCH: - netfs_set_group(folio, netfs_group); - break; - case NETFS_MODIFY_AND_CLEAR: + /* If the page is above the zero-point then we assume that the + * server would just return a block of zeros or a short read if + * we try to read it. + */ + if (fpos >= ctx->zero_point) { + zero_user_segment(&folio->page, 0, offset); + copied = copy_folio_from_iter_atomic(folio, offset, part, iter); + if (unlikely(copied == 0)) + goto copy_failed; zero_user_segment(&folio->page, offset + copied, flen); - netfs_set_group(folio, netfs_group); + __netfs_set_group(folio, netfs_group); folio_mark_uptodate(folio); - break; - case NETFS_WHOLE_FOLIO_MODIFY: + trace_netfs_folio(folio, netfs_modify_and_clear); + goto copied; + } + + /* See if we can write a whole folio in one go. */ + if (!maybe_trouble && offset == 0 && part >= flen) { + copied = copy_folio_from_iter_atomic(folio, offset, part, iter); + if (unlikely(copied == 0)) + goto copy_failed; if (unlikely(copied < part)) { maybe_trouble = true; iov_iter_revert(iter, copied); @@ -337,16 +246,53 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, folio_unlock(folio); goto retry; } - netfs_set_group(folio, netfs_group); + __netfs_set_group(folio, netfs_group); folio_mark_uptodate(folio); - break; - case NETFS_STREAMING_WRITE: + trace_netfs_folio(folio, netfs_whole_folio_modify); + goto copied; + } + + /* We don't want to do a streaming write on a file that loses + * caching service temporarily because the backing store got + * culled and we don't really want to get a streaming write on + * a file that's open for reading as ->read_folio() then has to + * be able to flush it. + */ + if ((file->f_mode & FMODE_READ) || + netfs_is_cache_enabled(ctx)) { + if (finfo) { + netfs_stat(&netfs_n_wh_wstream_conflict); + goto flush_content; + } + ret = netfs_prefetch_for_write(file, folio, offset, part); + if (ret < 0) { + _debug("prefetch = %zd", ret); + goto error_folio_unlock; + } + /* Note that copy-to-cache may have been set. */ + + copied = copy_folio_from_iter_atomic(folio, offset, part, iter); + if (unlikely(copied == 0)) + goto copy_failed; + netfs_set_group(folio, netfs_group); + trace_netfs_folio(folio, netfs_just_prefetch); + goto copied; + } + + if (!finfo) { + ret = -EIO; + if (WARN_ON(folio_get_private(folio))) + goto error_folio_unlock; + copied = copy_folio_from_iter_atomic(folio, offset, part, iter); + if (unlikely(copied == 0)) + goto copy_failed; if (offset == 0 && copied == flen) { - netfs_set_group(folio, netfs_group); + __netfs_set_group(folio, netfs_group); folio_mark_uptodate(folio); - trace = netfs_streaming_filled_page; - break; + trace_netfs_folio(folio, netfs_streaming_filled_page); + goto copied; } + finfo = kzalloc(sizeof(*finfo), GFP_KERNEL); if (!finfo) { iov_iter_revert(iter, copied); @@ -358,9 +304,18 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, finfo->dirty_len = copied; folio_attach_private(folio, (void *)((unsigned long)finfo | NETFS_FOLIO_INFO)); - break; - case NETFS_STREAMING_WRITE_CONT: - finfo = netfs_folio_info(folio); + trace_netfs_folio(folio, netfs_streaming_write); + goto copied; + } + + /* We can continue a streaming write only if it continues on + * from the previous. If it overlaps, we must flush lest we + * suffer a partial copy and disjoint dirty regions. + */ + if (offset == finfo->dirty_offset + finfo->dirty_len) { + copied = copy_folio_from_iter_atomic(folio, offset, part, iter); + if (unlikely(copied == 0)) + goto copy_failed; finfo->dirty_len += copied; if (finfo->dirty_offset == 0 && finfo->dirty_len == flen) { if (finfo->netfs_group) @@ -369,17 +324,25 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, folio_detach_private(folio); folio_mark_uptodate(folio); kfree(finfo); - trace = netfs_streaming_cont_filled_page; + trace_netfs_folio(folio, netfs_streaming_cont_filled_page); + } else { + trace_netfs_folio(folio, netfs_streaming_write_cont); } - break; - default: - WARN(true, "Unexpected modify type %u ix=%lx\n", - howto, folio->index); - ret = -EIO; - goto error_folio_unlock; + goto copied; } - trace_netfs_folio(folio, trace); + /* Incompatible write; flush the folio and try again. */ + flush_content: + trace_netfs_folio(folio, netfs_flush_content); + folio_unlock(folio); + folio_put(folio); + ret = filemap_write_and_wait_range(mapping, fpos, fpos + flen - 1); + if (ret < 0) + goto error_folio_unlock; + continue; + + copied: + flush_dcache_folio(folio); /* Update the inode size if we moved the EOF marker */ pos += copied; @@ -401,12 +364,22 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, folio_put(folio); folio = NULL; + ret = balance_dirty_pages_ratelimited_flags(mapping, bdp_flags); + if (unlikely(ret < 0)) + break; + cond_resched(); } while (iov_iter_count(iter)); out: - if (likely(written) && ctx->ops->post_modify) - ctx->ops->post_modify(inode); + if (likely(written)) { + /* Set indication that ctime and mtime got updated in case + * close is deferred. + */ + set_bit(NETFS_ICTX_MODIFIED_ATTR, &ctx->flags); + if (unlikely(ctx->ops->post_modify)) + ctx->ops->post_modify(inode); + } if (unlikely(wreq)) { ret2 = netfs_end_writethrough(wreq, &wbc, writethrough); @@ -421,6 +394,8 @@ out: _leave(" = %zd [%zd]", written, ret); return written ? written : ret; +copy_failed: + ret = -EFAULT; error_folio_unlock: folio_unlock(folio); folio_put(folio); diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c index 10a1e4da6bda..b1a66a6e6bc2 100644 --- a/fs/netfs/direct_read.c +++ b/fs/netfs/direct_read.c @@ -16,6 +16,143 @@ #include <linux/netfs.h> #include "internal.h" +static void netfs_prepare_dio_read_iterator(struct netfs_io_subrequest *subreq) +{ + struct netfs_io_request *rreq = subreq->rreq; + size_t rsize; + + rsize = umin(subreq->len, rreq->io_streams[0].sreq_max_len); + subreq->len = rsize; + + if (unlikely(rreq->io_streams[0].sreq_max_segs)) { + size_t limit = netfs_limit_iter(&rreq->iter, 0, rsize, + rreq->io_streams[0].sreq_max_segs); + + if (limit < rsize) { + subreq->len = limit; + trace_netfs_sreq(subreq, netfs_sreq_trace_limited); + } + } + + trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); + + subreq->io_iter = rreq->iter; + iov_iter_truncate(&subreq->io_iter, subreq->len); + iov_iter_advance(&rreq->iter, subreq->len); +} + +/* + * Perform a read to a buffer from the server, slicing up the region to be read + * according to the network rsize. + */ +static int netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq) +{ + unsigned long long start = rreq->start; + ssize_t size = rreq->len; + int ret = 0; + + atomic_set(&rreq->nr_outstanding, 1); + + do { + struct netfs_io_subrequest *subreq; + ssize_t slice; + + subreq = netfs_alloc_subrequest(rreq); + if (!subreq) { + ret = -ENOMEM; + break; + } + + subreq->source = NETFS_DOWNLOAD_FROM_SERVER; + subreq->start = start; + subreq->len = size; + + atomic_inc(&rreq->nr_outstanding); + spin_lock_bh(&rreq->lock); + list_add_tail(&subreq->rreq_link, &rreq->subrequests); + subreq->prev_donated = rreq->prev_donated; + rreq->prev_donated = 0; + trace_netfs_sreq(subreq, netfs_sreq_trace_added); + spin_unlock_bh(&rreq->lock); + + netfs_stat(&netfs_n_rh_download); + if (rreq->netfs_ops->prepare_read) { + ret = rreq->netfs_ops->prepare_read(subreq); + if (ret < 0) { + atomic_dec(&rreq->nr_outstanding); + netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_cancel); + break; + } + } + + netfs_prepare_dio_read_iterator(subreq); + slice = subreq->len; + rreq->netfs_ops->issue_read(subreq); + + size -= slice; + start += slice; + rreq->submitted += slice; + + if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) && + test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags)) + break; + cond_resched(); + } while (size > 0); + + if (atomic_dec_and_test(&rreq->nr_outstanding)) + netfs_rreq_terminated(rreq, false); + return ret; +} + +/* + * Perform a read to an application buffer, bypassing the pagecache and the + * local disk cache. + */ +static int netfs_unbuffered_read(struct netfs_io_request *rreq, bool sync) +{ + int ret; + + _enter("R=%x %llx-%llx", + rreq->debug_id, rreq->start, rreq->start + rreq->len - 1); + + if (rreq->len == 0) { + pr_err("Zero-sized read [R=%x]\n", rreq->debug_id); + return -EIO; + } + + // TODO: Use bounce buffer if requested + + inode_dio_begin(rreq->inode); + + ret = netfs_dispatch_unbuffered_reads(rreq); + + if (!rreq->submitted) { + netfs_put_request(rreq, false, netfs_rreq_trace_put_no_submit); + inode_dio_end(rreq->inode); + ret = 0; + goto out; + } + + if (sync) { + trace_netfs_rreq(rreq, netfs_rreq_trace_wait_ip); + wait_on_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS, + TASK_UNINTERRUPTIBLE); + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len && + rreq->origin != NETFS_DIO_READ) { + trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_read); + ret = -EIO; + } + } else { + ret = -EIOCBQUEUED; + } + +out: + _leave(" = %d", ret); + return ret; +} + /** * netfs_unbuffered_read_iter_locked - Perform an unbuffered or direct I/O read * @iocb: The I/O control descriptor describing the read @@ -31,7 +168,7 @@ ssize_t netfs_unbuffered_read_iter_locked(struct kiocb *iocb, struct iov_iter *i struct netfs_io_request *rreq; ssize_t ret; size_t orig_count = iov_iter_count(iter); - bool async = !is_sync_kiocb(iocb); + bool sync = is_sync_kiocb(iocb); _enter(""); @@ -78,13 +215,13 @@ ssize_t netfs_unbuffered_read_iter_locked(struct kiocb *iocb, struct iov_iter *i // TODO: Set up bounce buffer if needed - if (async) + if (!sync) rreq->iocb = iocb; - ret = netfs_begin_read(rreq, is_sync_kiocb(iocb)); + ret = netfs_unbuffered_read(rreq, sync); if (ret < 0) goto out; /* May be -EIOCBQUEUED */ - if (!async) { + if (sync) { // TODO: Copy from bounce buffer iocb->ki_pos += rreq->transferred; ret = rreq->transferred; @@ -94,8 +231,6 @@ out: netfs_put_request(rreq, false, netfs_rreq_trace_put_return); if (ret > 0) orig_count -= ret; - if (ret != -EIOCBQUEUED) - iov_iter_revert(iter, orig_count - iov_iter_count(iter)); return ret; } EXPORT_SYMBOL(netfs_unbuffered_read_iter_locked); diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 7773f3d855a9..c9f0ed24cb7b 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -7,6 +7,7 @@ #include <linux/slab.h> #include <linux/seq_file.h> +#include <linux/folio_queue.h> #include <linux/netfs.h> #include <linux/fscache.h> #include <linux/fscache-cache.h> @@ -22,16 +23,10 @@ /* * buffered_read.c */ -void netfs_rreq_unlock_folios(struct netfs_io_request *rreq); int netfs_prefetch_for_write(struct file *file, struct folio *folio, size_t offset, size_t len); /* - * io.c - */ -int netfs_begin_read(struct netfs_io_request *rreq, bool sync); - -/* * main.c */ extern unsigned int netfs_debug; @@ -63,6 +58,11 @@ static inline void netfs_proc_del_rreq(struct netfs_io_request *rreq) {} /* * misc.c */ +int netfs_buffer_append_folio(struct netfs_io_request *rreq, struct folio *folio, + bool needs_put); +struct folio_queue *netfs_delete_buffer_head(struct netfs_io_request *wreq); +void netfs_clear_buffer(struct netfs_io_request *rreq); +void netfs_reset_iter(struct netfs_io_subrequest *subreq); /* * objects.c @@ -84,6 +84,28 @@ static inline void netfs_see_request(struct netfs_io_request *rreq, } /* + * read_collect.c + */ +void netfs_read_termination_worker(struct work_struct *work); +void netfs_rreq_terminated(struct netfs_io_request *rreq, bool was_async); + +/* + * read_pgpriv2.c + */ +void netfs_pgpriv2_mark_copy_to_cache(struct netfs_io_subrequest *subreq, + struct netfs_io_request *rreq, + struct folio_queue *folioq, + int slot); +void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq); +bool netfs_pgpriv2_unlock_copied_folios(struct netfs_io_request *wreq); + +/* + * read_retry.c + */ +void netfs_retry_reads(struct netfs_io_request *rreq); +void netfs_unlock_abandoned_read_pages(struct netfs_io_request *rreq); + +/* * stats.c */ #ifdef CONFIG_NETFS_STATS @@ -110,6 +132,7 @@ extern atomic_t netfs_n_wh_buffered_write; extern atomic_t netfs_n_wh_writethrough; extern atomic_t netfs_n_wh_dio_write; extern atomic_t netfs_n_wh_writepages; +extern atomic_t netfs_n_wh_copy_to_cache; extern atomic_t netfs_n_wh_wstream_conflict; extern atomic_t netfs_n_wh_upload; extern atomic_t netfs_n_wh_upload_done; @@ -117,6 +140,9 @@ extern atomic_t netfs_n_wh_upload_failed; extern atomic_t netfs_n_wh_write; extern atomic_t netfs_n_wh_write_done; extern atomic_t netfs_n_wh_write_failed; +extern atomic_t netfs_n_wb_lock_skip; +extern atomic_t netfs_n_wb_lock_wait; +extern atomic_t netfs_n_folioq; int netfs_stats_show(struct seq_file *m, void *v); @@ -150,7 +176,10 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping, loff_t start, enum netfs_io_origin origin); void netfs_reissue_write(struct netfs_io_stream *stream, - struct netfs_io_subrequest *subreq); + struct netfs_io_subrequest *subreq, + struct iov_iter *source); +void netfs_issue_write(struct netfs_io_request *wreq, + struct netfs_io_stream *stream); int netfs_advance_write(struct netfs_io_request *wreq, struct netfs_io_stream *stream, loff_t start, size_t len, bool to_eof); diff --git a/fs/netfs/io.c b/fs/netfs/io.c deleted file mode 100644 index d6ada4eba744..000000000000 --- a/fs/netfs/io.c +++ /dev/null @@ -1,804 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* Network filesystem high-level read support. - * - * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. - * Written by David Howells (dhowells@redhat.com) - */ - -#include <linux/module.h> -#include <linux/export.h> -#include <linux/fs.h> -#include <linux/mm.h> -#include <linux/pagemap.h> -#include <linux/slab.h> -#include <linux/uio.h> -#include <linux/sched/mm.h> -#include <linux/task_io_accounting_ops.h> -#include "internal.h" - -/* - * Clear the unread part of an I/O request. - */ -static void netfs_clear_unread(struct netfs_io_subrequest *subreq) -{ - iov_iter_zero(iov_iter_count(&subreq->io_iter), &subreq->io_iter); -} - -static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, - bool was_async) -{ - struct netfs_io_subrequest *subreq = priv; - - netfs_subreq_terminated(subreq, transferred_or_error, was_async); -} - -/* - * Issue a read against the cache. - * - Eats the caller's ref on subreq. - */ -static void netfs_read_from_cache(struct netfs_io_request *rreq, - struct netfs_io_subrequest *subreq, - enum netfs_read_from_hole read_hole) -{ - struct netfs_cache_resources *cres = &rreq->cache_resources; - - netfs_stat(&netfs_n_rh_read); - cres->ops->read(cres, subreq->start, &subreq->io_iter, read_hole, - netfs_cache_read_terminated, subreq); -} - -/* - * Fill a subrequest region with zeroes. - */ -static void netfs_fill_with_zeroes(struct netfs_io_request *rreq, - struct netfs_io_subrequest *subreq) -{ - netfs_stat(&netfs_n_rh_zero); - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); - netfs_subreq_terminated(subreq, 0, false); -} - -/* - * Ask the netfs to issue a read request to the server for us. - * - * The netfs is expected to read from subreq->pos + subreq->transferred to - * subreq->pos + subreq->len - 1. It may not backtrack and write data into the - * buffer prior to the transferred point as it might clobber dirty data - * obtained from the cache. - * - * Alternatively, the netfs is allowed to indicate one of two things: - * - * - NETFS_SREQ_SHORT_READ: A short read - it will get called again to try and - * make progress. - * - * - NETFS_SREQ_CLEAR_TAIL: A short read - the rest of the buffer will be - * cleared. - */ -static void netfs_read_from_server(struct netfs_io_request *rreq, - struct netfs_io_subrequest *subreq) -{ - netfs_stat(&netfs_n_rh_download); - - if (rreq->origin != NETFS_DIO_READ && - iov_iter_count(&subreq->io_iter) != subreq->len - subreq->transferred) - pr_warn("R=%08x[%u] ITER PRE-MISMATCH %zx != %zx-%zx %lx\n", - rreq->debug_id, subreq->debug_index, - iov_iter_count(&subreq->io_iter), subreq->len, - subreq->transferred, subreq->flags); - rreq->netfs_ops->issue_read(subreq); -} - -/* - * Release those waiting. - */ -static void netfs_rreq_completed(struct netfs_io_request *rreq, bool was_async) -{ - trace_netfs_rreq(rreq, netfs_rreq_trace_done); - netfs_clear_subrequests(rreq, was_async); - netfs_put_request(rreq, was_async, netfs_rreq_trace_put_complete); -} - -/* - * [DEPRECATED] Deal with the completion of writing the data to the cache. We - * have to clear the PG_fscache bits on the folios involved and release the - * caller's ref. - * - * May be called in softirq mode and we inherit a ref from the caller. - */ -static void netfs_rreq_unmark_after_write(struct netfs_io_request *rreq, - bool was_async) -{ - struct netfs_io_subrequest *subreq; - struct folio *folio; - pgoff_t unlocked = 0; - bool have_unlocked = false; - - rcu_read_lock(); - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE); - - xas_for_each(&xas, folio, (subreq->start + subreq->len - 1) / PAGE_SIZE) { - if (xas_retry(&xas, folio)) - continue; - - /* We might have multiple writes from the same huge - * folio, but we mustn't unlock a folio more than once. - */ - if (have_unlocked && folio->index <= unlocked) - continue; - unlocked = folio_next_index(folio) - 1; - trace_netfs_folio(folio, netfs_folio_trace_end_copy); - folio_end_private_2(folio); - have_unlocked = true; - } - } - - rcu_read_unlock(); - netfs_rreq_completed(rreq, was_async); -} - -static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error, - bool was_async) /* [DEPRECATED] */ -{ - struct netfs_io_subrequest *subreq = priv; - struct netfs_io_request *rreq = subreq->rreq; - - if (IS_ERR_VALUE(transferred_or_error)) { - netfs_stat(&netfs_n_rh_write_failed); - trace_netfs_failure(rreq, subreq, transferred_or_error, - netfs_fail_copy_to_cache); - } else { - netfs_stat(&netfs_n_rh_write_done); - } - - trace_netfs_sreq(subreq, netfs_sreq_trace_write_term); - - /* If we decrement nr_copy_ops to 0, the ref belongs to us. */ - if (atomic_dec_and_test(&rreq->nr_copy_ops)) - netfs_rreq_unmark_after_write(rreq, was_async); - - netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated); -} - -/* - * [DEPRECATED] Perform any outstanding writes to the cache. We inherit a ref - * from the caller. - */ -static void netfs_rreq_do_write_to_cache(struct netfs_io_request *rreq) -{ - struct netfs_cache_resources *cres = &rreq->cache_resources; - struct netfs_io_subrequest *subreq, *next, *p; - struct iov_iter iter; - int ret; - - trace_netfs_rreq(rreq, netfs_rreq_trace_copy); - - /* We don't want terminating writes trying to wake us up whilst we're - * still going through the list. - */ - atomic_inc(&rreq->nr_copy_ops); - - list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { - if (!test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) { - list_del_init(&subreq->rreq_link); - netfs_put_subrequest(subreq, false, - netfs_sreq_trace_put_no_copy); - } - } - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - /* Amalgamate adjacent writes */ - while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { - next = list_next_entry(subreq, rreq_link); - if (next->start != subreq->start + subreq->len) - break; - subreq->len += next->len; - list_del_init(&next->rreq_link); - netfs_put_subrequest(next, false, - netfs_sreq_trace_put_merged); - } - - ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len, - subreq->len, rreq->i_size, true); - if (ret < 0) { - trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write); - trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip); - continue; - } - - iov_iter_xarray(&iter, ITER_SOURCE, &rreq->mapping->i_pages, - subreq->start, subreq->len); - - atomic_inc(&rreq->nr_copy_ops); - netfs_stat(&netfs_n_rh_write); - netfs_get_subrequest(subreq, netfs_sreq_trace_get_copy_to_cache); - trace_netfs_sreq(subreq, netfs_sreq_trace_write); - cres->ops->write(cres, subreq->start, &iter, - netfs_rreq_copy_terminated, subreq); - } - - /* If we decrement nr_copy_ops to 0, the usage ref belongs to us. */ - if (atomic_dec_and_test(&rreq->nr_copy_ops)) - netfs_rreq_unmark_after_write(rreq, false); -} - -static void netfs_rreq_write_to_cache_work(struct work_struct *work) /* [DEPRECATED] */ -{ - struct netfs_io_request *rreq = - container_of(work, struct netfs_io_request, work); - - netfs_rreq_do_write_to_cache(rreq); -} - -static void netfs_rreq_write_to_cache(struct netfs_io_request *rreq) /* [DEPRECATED] */ -{ - rreq->work.func = netfs_rreq_write_to_cache_work; - if (!queue_work(system_unbound_wq, &rreq->work)) - BUG(); -} - -/* - * Handle a short read. - */ -static void netfs_rreq_short_read(struct netfs_io_request *rreq, - struct netfs_io_subrequest *subreq) -{ - __clear_bit(NETFS_SREQ_SHORT_IO, &subreq->flags); - __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); - - netfs_stat(&netfs_n_rh_short_read); - trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short); - - netfs_get_subrequest(subreq, netfs_sreq_trace_get_short_read); - atomic_inc(&rreq->nr_outstanding); - if (subreq->source == NETFS_READ_FROM_CACHE) - netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_CLEAR); - else - netfs_read_from_server(rreq, subreq); -} - -/* - * Reset the subrequest iterator prior to resubmission. - */ -static void netfs_reset_subreq_iter(struct netfs_io_request *rreq, - struct netfs_io_subrequest *subreq) -{ - size_t remaining = subreq->len - subreq->transferred; - size_t count = iov_iter_count(&subreq->io_iter); - - if (count == remaining) - return; - - _debug("R=%08x[%u] ITER RESUB-MISMATCH %zx != %zx-%zx-%llx %x", - rreq->debug_id, subreq->debug_index, - iov_iter_count(&subreq->io_iter), subreq->transferred, - subreq->len, rreq->i_size, - subreq->io_iter.iter_type); - - if (count < remaining) - iov_iter_revert(&subreq->io_iter, remaining - count); - else - iov_iter_advance(&subreq->io_iter, count - remaining); -} - -/* - * Resubmit any short or failed operations. Returns true if we got the rreq - * ref back. - */ -static bool netfs_rreq_perform_resubmissions(struct netfs_io_request *rreq) -{ - struct netfs_io_subrequest *subreq; - - WARN_ON(in_interrupt()); - - trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); - - /* We don't want terminating submissions trying to wake us up whilst - * we're still going through the list. - */ - atomic_inc(&rreq->nr_outstanding); - - __clear_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - if (subreq->error) { - if (subreq->source != NETFS_READ_FROM_CACHE) - break; - subreq->source = NETFS_DOWNLOAD_FROM_SERVER; - subreq->error = 0; - __set_bit(NETFS_SREQ_RETRYING, &subreq->flags); - netfs_stat(&netfs_n_rh_download_instead); - trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead); - netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); - atomic_inc(&rreq->nr_outstanding); - netfs_reset_subreq_iter(rreq, subreq); - netfs_read_from_server(rreq, subreq); - } else if (test_bit(NETFS_SREQ_SHORT_IO, &subreq->flags)) { - __set_bit(NETFS_SREQ_RETRYING, &subreq->flags); - netfs_reset_subreq_iter(rreq, subreq); - netfs_rreq_short_read(rreq, subreq); - } - } - - /* If we decrement nr_outstanding to 0, the usage ref belongs to us. */ - if (atomic_dec_and_test(&rreq->nr_outstanding)) - return true; - - wake_up_var(&rreq->nr_outstanding); - return false; -} - -/* - * Check to see if the data read is still valid. - */ -static void netfs_rreq_is_still_valid(struct netfs_io_request *rreq) -{ - struct netfs_io_subrequest *subreq; - - if (!rreq->netfs_ops->is_still_valid || - rreq->netfs_ops->is_still_valid(rreq)) - return; - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - if (subreq->source == NETFS_READ_FROM_CACHE) { - subreq->error = -ESTALE; - __set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); - } - } -} - -/* - * Determine how much we can admit to having read from a DIO read. - */ -static void netfs_rreq_assess_dio(struct netfs_io_request *rreq) -{ - struct netfs_io_subrequest *subreq; - unsigned int i; - size_t transferred = 0; - - for (i = 0; i < rreq->direct_bv_count; i++) { - flush_dcache_page(rreq->direct_bv[i].bv_page); - // TODO: cifs marks pages in the destination buffer - // dirty under some circumstances after a read. Do we - // need to do that too? - set_page_dirty(rreq->direct_bv[i].bv_page); - } - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - if (subreq->error || subreq->transferred == 0) - break; - transferred += subreq->transferred; - if (subreq->transferred < subreq->len || - test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags)) - break; - } - - for (i = 0; i < rreq->direct_bv_count; i++) - flush_dcache_page(rreq->direct_bv[i].bv_page); - - rreq->transferred = transferred; - task_io_account_read(transferred); - - if (rreq->iocb) { - rreq->iocb->ki_pos += transferred; - if (rreq->iocb->ki_complete) - rreq->iocb->ki_complete( - rreq->iocb, rreq->error ? rreq->error : transferred); - } - if (rreq->netfs_ops->done) - rreq->netfs_ops->done(rreq); - inode_dio_end(rreq->inode); -} - -/* - * Assess the state of a read request and decide what to do next. - * - * Note that we could be in an ordinary kernel thread, on a workqueue or in - * softirq context at this point. We inherit a ref from the caller. - */ -static void netfs_rreq_assess(struct netfs_io_request *rreq, bool was_async) -{ - trace_netfs_rreq(rreq, netfs_rreq_trace_assess); - -again: - netfs_rreq_is_still_valid(rreq); - - if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && - test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { - if (netfs_rreq_perform_resubmissions(rreq)) - goto again; - return; - } - - if (rreq->origin != NETFS_DIO_READ) - netfs_rreq_unlock_folios(rreq); - else - netfs_rreq_assess_dio(rreq); - - trace_netfs_rreq(rreq, netfs_rreq_trace_wake_ip); - clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); - wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); - - if (test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags) && - test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) - return netfs_rreq_write_to_cache(rreq); - - netfs_rreq_completed(rreq, was_async); -} - -static void netfs_rreq_work(struct work_struct *work) -{ - struct netfs_io_request *rreq = - container_of(work, struct netfs_io_request, work); - netfs_rreq_assess(rreq, false); -} - -/* - * Handle the completion of all outstanding I/O operations on a read request. - * We inherit a ref from the caller. - */ -static void netfs_rreq_terminated(struct netfs_io_request *rreq, - bool was_async) -{ - if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) && - was_async) { - if (!queue_work(system_unbound_wq, &rreq->work)) - BUG(); - } else { - netfs_rreq_assess(rreq, was_async); - } -} - -/** - * netfs_subreq_terminated - Note the termination of an I/O operation. - * @subreq: The I/O request that has terminated. - * @transferred_or_error: The amount of data transferred or an error code. - * @was_async: The termination was asynchronous - * - * This tells the read helper that a contributory I/O operation has terminated, - * one way or another, and that it should integrate the results. - * - * The caller indicates in @transferred_or_error the outcome of the operation, - * supplying a positive value to indicate the number of bytes transferred, 0 to - * indicate a failure to transfer anything that should be retried or a negative - * error code. The helper will look after reissuing I/O operations as - * appropriate and writing downloaded data to the cache. - * - * If @was_async is true, the caller might be running in softirq or interrupt - * context and we can't sleep. - */ -void netfs_subreq_terminated(struct netfs_io_subrequest *subreq, - ssize_t transferred_or_error, - bool was_async) -{ - struct netfs_io_request *rreq = subreq->rreq; - int u; - - _enter("R=%x[%x]{%llx,%lx},%zd", - rreq->debug_id, subreq->debug_index, - subreq->start, subreq->flags, transferred_or_error); - - switch (subreq->source) { - case NETFS_READ_FROM_CACHE: - netfs_stat(&netfs_n_rh_read_done); - break; - case NETFS_DOWNLOAD_FROM_SERVER: - netfs_stat(&netfs_n_rh_download_done); - break; - default: - break; - } - - if (IS_ERR_VALUE(transferred_or_error)) { - subreq->error = transferred_or_error; - trace_netfs_failure(rreq, subreq, transferred_or_error, - netfs_fail_read); - goto failed; - } - - if (WARN(transferred_or_error > subreq->len - subreq->transferred, - "Subreq overread: R%x[%x] %zd > %zu - %zu", - rreq->debug_id, subreq->debug_index, - transferred_or_error, subreq->len, subreq->transferred)) - transferred_or_error = subreq->len - subreq->transferred; - - subreq->error = 0; - subreq->transferred += transferred_or_error; - if (subreq->transferred < subreq->len && - !test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags)) - goto incomplete; - -complete: - __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); - if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) - set_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags); - -out: - trace_netfs_sreq(subreq, netfs_sreq_trace_terminated); - - /* If we decrement nr_outstanding to 0, the ref belongs to us. */ - u = atomic_dec_return(&rreq->nr_outstanding); - if (u == 0) - netfs_rreq_terminated(rreq, was_async); - else if (u == 1) - wake_up_var(&rreq->nr_outstanding); - - netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated); - return; - -incomplete: - if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) { - netfs_clear_unread(subreq); - subreq->transferred = subreq->len; - goto complete; - } - - if (transferred_or_error == 0) { - if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { - if (rreq->origin != NETFS_DIO_READ) - subreq->error = -ENODATA; - goto failed; - } - } else { - __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); - } - - __set_bit(NETFS_SREQ_SHORT_IO, &subreq->flags); - set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); - goto out; - -failed: - if (subreq->source == NETFS_READ_FROM_CACHE) { - netfs_stat(&netfs_n_rh_read_failed); - set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); - } else { - netfs_stat(&netfs_n_rh_download_failed); - set_bit(NETFS_RREQ_FAILED, &rreq->flags); - rreq->error = subreq->error; - } - goto out; -} -EXPORT_SYMBOL(netfs_subreq_terminated); - -static enum netfs_io_source netfs_cache_prepare_read(struct netfs_io_subrequest *subreq, - loff_t i_size) -{ - struct netfs_io_request *rreq = subreq->rreq; - struct netfs_cache_resources *cres = &rreq->cache_resources; - - if (cres->ops) - return cres->ops->prepare_read(subreq, i_size); - if (subreq->start >= rreq->i_size) - return NETFS_FILL_WITH_ZEROES; - return NETFS_DOWNLOAD_FROM_SERVER; -} - -/* - * Work out what sort of subrequest the next one will be. - */ -static enum netfs_io_source -netfs_rreq_prepare_read(struct netfs_io_request *rreq, - struct netfs_io_subrequest *subreq, - struct iov_iter *io_iter) -{ - enum netfs_io_source source = NETFS_DOWNLOAD_FROM_SERVER; - struct netfs_inode *ictx = netfs_inode(rreq->inode); - size_t lsize; - - _enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size); - - if (rreq->origin != NETFS_DIO_READ) { - source = netfs_cache_prepare_read(subreq, rreq->i_size); - if (source == NETFS_INVALID_READ) - goto out; - } - - if (source == NETFS_DOWNLOAD_FROM_SERVER) { - /* Call out to the netfs to let it shrink the request to fit - * its own I/O sizes and boundaries. If it shinks it here, it - * will be called again to make simultaneous calls; if it wants - * to make serial calls, it can indicate a short read and then - * we will call it again. - */ - if (rreq->origin != NETFS_DIO_READ) { - if (subreq->start >= ictx->zero_point) { - source = NETFS_FILL_WITH_ZEROES; - goto set; - } - if (subreq->len > ictx->zero_point - subreq->start) - subreq->len = ictx->zero_point - subreq->start; - - /* We limit buffered reads to the EOF, but let the - * server deal with larger-than-EOF DIO/unbuffered - * reads. - */ - if (subreq->len > rreq->i_size - subreq->start) - subreq->len = rreq->i_size - subreq->start; - } - if (rreq->rsize && subreq->len > rreq->rsize) - subreq->len = rreq->rsize; - - if (rreq->netfs_ops->clamp_length && - !rreq->netfs_ops->clamp_length(subreq)) { - source = NETFS_INVALID_READ; - goto out; - } - - if (subreq->max_nr_segs) { - lsize = netfs_limit_iter(io_iter, 0, subreq->len, - subreq->max_nr_segs); - if (subreq->len > lsize) { - subreq->len = lsize; - trace_netfs_sreq(subreq, netfs_sreq_trace_limited); - } - } - } - -set: - if (subreq->len > rreq->len) - pr_warn("R=%08x[%u] SREQ>RREQ %zx > %llx\n", - rreq->debug_id, subreq->debug_index, - subreq->len, rreq->len); - - if (WARN_ON(subreq->len == 0)) { - source = NETFS_INVALID_READ; - goto out; - } - - subreq->source = source; - trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); - - subreq->io_iter = *io_iter; - iov_iter_truncate(&subreq->io_iter, subreq->len); - iov_iter_advance(io_iter, subreq->len); -out: - subreq->source = source; - trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); - return source; -} - -/* - * Slice off a piece of a read request and submit an I/O request for it. - */ -static bool netfs_rreq_submit_slice(struct netfs_io_request *rreq, - struct iov_iter *io_iter) -{ - struct netfs_io_subrequest *subreq; - enum netfs_io_source source; - - subreq = netfs_alloc_subrequest(rreq); - if (!subreq) - return false; - - subreq->start = rreq->start + rreq->submitted; - subreq->len = io_iter->count; - - _debug("slice %llx,%zx,%llx", subreq->start, subreq->len, rreq->submitted); - list_add_tail(&subreq->rreq_link, &rreq->subrequests); - - /* Call out to the cache to find out what it can do with the remaining - * subset. It tells us in subreq->flags what it decided should be done - * and adjusts subreq->len down if the subset crosses a cache boundary. - * - * Then when we hand the subset, it can choose to take a subset of that - * (the starts must coincide), in which case, we go around the loop - * again and ask it to download the next piece. - */ - source = netfs_rreq_prepare_read(rreq, subreq, io_iter); - if (source == NETFS_INVALID_READ) - goto subreq_failed; - - atomic_inc(&rreq->nr_outstanding); - - rreq->submitted += subreq->len; - - trace_netfs_sreq(subreq, netfs_sreq_trace_submit); - switch (source) { - case NETFS_FILL_WITH_ZEROES: - netfs_fill_with_zeroes(rreq, subreq); - break; - case NETFS_DOWNLOAD_FROM_SERVER: - netfs_read_from_server(rreq, subreq); - break; - case NETFS_READ_FROM_CACHE: - netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_IGNORE); - break; - default: - BUG(); - } - - return true; - -subreq_failed: - rreq->error = subreq->error; - netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_failed); - return false; -} - -/* - * Begin the process of reading in a chunk of data, where that data may be - * stitched together from multiple sources, including multiple servers and the - * local cache. - */ -int netfs_begin_read(struct netfs_io_request *rreq, bool sync) -{ - struct iov_iter io_iter; - int ret; - - _enter("R=%x %llx-%llx", - rreq->debug_id, rreq->start, rreq->start + rreq->len - 1); - - if (rreq->len == 0) { - pr_err("Zero-sized read [R=%x]\n", rreq->debug_id); - return -EIO; - } - - if (rreq->origin == NETFS_DIO_READ) - inode_dio_begin(rreq->inode); - - // TODO: Use bounce buffer if requested - rreq->io_iter = rreq->iter; - - INIT_WORK(&rreq->work, netfs_rreq_work); - - /* Chop the read into slices according to what the cache and the netfs - * want and submit each one. - */ - netfs_get_request(rreq, netfs_rreq_trace_get_for_outstanding); - atomic_set(&rreq->nr_outstanding, 1); - io_iter = rreq->io_iter; - do { - _debug("submit %llx + %llx >= %llx", - rreq->start, rreq->submitted, rreq->i_size); - if (!netfs_rreq_submit_slice(rreq, &io_iter)) - break; - if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags)) - break; - if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) && - test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags)) - break; - - } while (rreq->submitted < rreq->len); - - if (!rreq->submitted) { - netfs_put_request(rreq, false, netfs_rreq_trace_put_no_submit); - if (rreq->origin == NETFS_DIO_READ) - inode_dio_end(rreq->inode); - ret = 0; - goto out; - } - - if (sync) { - /* Keep nr_outstanding incremented so that the ref always - * belongs to us, and the service code isn't punted off to a - * random thread pool to process. Note that this might start - * further work, such as writing to the cache. - */ - wait_var_event(&rreq->nr_outstanding, - atomic_read(&rreq->nr_outstanding) == 1); - if (atomic_dec_and_test(&rreq->nr_outstanding)) - netfs_rreq_assess(rreq, false); - - trace_netfs_rreq(rreq, netfs_rreq_trace_wait_ip); - wait_on_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS, - TASK_UNINTERRUPTIBLE); - - ret = rreq->error; - if (ret == 0) { - if (rreq->origin == NETFS_DIO_READ) { - ret = rreq->transferred; - } else if (rreq->submitted < rreq->len) { - trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_read); - ret = -EIO; - } - } - } else { - /* If we decrement nr_outstanding to 0, the ref belongs to us. */ - if (atomic_dec_and_test(&rreq->nr_outstanding)) - netfs_rreq_assess(rreq, false); - ret = -EIOCBQUEUED; - } - -out: - return ret; -} diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c index b781bbbf1d8d..72a435e5fc6d 100644 --- a/fs/netfs/iterator.c +++ b/fs/netfs/iterator.c @@ -188,9 +188,59 @@ static size_t netfs_limit_xarray(const struct iov_iter *iter, size_t start_offse return min(span, max_size); } +/* + * Select the span of a folio queue iterator we're going to use. Limit it by + * both maximum size and maximum number of segments. Returns the size of the + * span in bytes. + */ +static size_t netfs_limit_folioq(const struct iov_iter *iter, size_t start_offset, + size_t max_size, size_t max_segs) +{ + const struct folio_queue *folioq = iter->folioq; + unsigned int nsegs = 0; + unsigned int slot = iter->folioq_slot; + size_t span = 0, n = iter->count; + + if (WARN_ON(!iov_iter_is_folioq(iter)) || + WARN_ON(start_offset > n) || + n == 0) + return 0; + max_size = umin(max_size, n - start_offset); + + if (slot >= folioq_nr_slots(folioq)) { + folioq = folioq->next; + slot = 0; + } + + start_offset += iter->iov_offset; + do { + size_t flen = folioq_folio_size(folioq, slot); + + if (start_offset < flen) { + span += flen - start_offset; + nsegs++; + start_offset = 0; + } else { + start_offset -= flen; + } + if (span >= max_size || nsegs >= max_segs) + break; + + slot++; + if (slot >= folioq_nr_slots(folioq)) { + folioq = folioq->next; + slot = 0; + } + } while (folioq); + + return umin(span, max_size); +} + size_t netfs_limit_iter(const struct iov_iter *iter, size_t start_offset, size_t max_size, size_t max_segs) { + if (iov_iter_is_folioq(iter)) + return netfs_limit_folioq(iter, start_offset, max_size, max_segs); if (iov_iter_is_bvec(iter)) return netfs_limit_bvec(iter, start_offset, max_size, max_segs); if (iov_iter_is_xarray(iter)) diff --git a/fs/netfs/main.c b/fs/netfs/main.c index 9d6b49dc6694..6c7be1377ee0 100644 --- a/fs/netfs/main.c +++ b/fs/netfs/main.c @@ -36,13 +36,14 @@ DEFINE_SPINLOCK(netfs_proc_lock); static const char *netfs_origins[nr__netfs_io_origin] = { [NETFS_READAHEAD] = "RA", [NETFS_READPAGE] = "RP", + [NETFS_READ_GAPS] = "RG", [NETFS_READ_FOR_WRITE] = "RW", - [NETFS_COPY_TO_CACHE] = "CC", + [NETFS_DIO_READ] = "DR", [NETFS_WRITEBACK] = "WB", [NETFS_WRITETHROUGH] = "WT", [NETFS_UNBUFFERED_WRITE] = "UW", - [NETFS_DIO_READ] = "DR", [NETFS_DIO_WRITE] = "DW", + [NETFS_PGPRIV2_COPY_TO_CACHE] = "2C", }; /* @@ -62,7 +63,7 @@ static int netfs_requests_seq_show(struct seq_file *m, void *v) rreq = list_entry(v, struct netfs_io_request, proc_link); seq_printf(m, - "%08x %s %3d %2lx %4d %3d @%04llx %llx/%llx", + "%08x %s %3d %2lx %4ld %3d @%04llx %llx/%llx", rreq->debug_id, netfs_origins[rreq->origin], refcount_read(&rreq->ref), diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c index c1f321cf5999..0ad0982ce0e2 100644 --- a/fs/netfs/misc.c +++ b/fs/netfs/misc.c @@ -8,6 +8,100 @@ #include <linux/swap.h> #include "internal.h" +/* + * Append a folio to the rolling queue. + */ +int netfs_buffer_append_folio(struct netfs_io_request *rreq, struct folio *folio, + bool needs_put) +{ + struct folio_queue *tail = rreq->buffer_tail; + unsigned int slot, order = folio_order(folio); + + if (WARN_ON_ONCE(!rreq->buffer && tail) || + WARN_ON_ONCE(rreq->buffer && !tail)) + return -EIO; + + if (!tail || folioq_full(tail)) { + tail = kmalloc(sizeof(*tail), GFP_NOFS); + if (!tail) + return -ENOMEM; + netfs_stat(&netfs_n_folioq); + folioq_init(tail); + tail->prev = rreq->buffer_tail; + if (tail->prev) + tail->prev->next = tail; + rreq->buffer_tail = tail; + if (!rreq->buffer) { + rreq->buffer = tail; + iov_iter_folio_queue(&rreq->io_iter, ITER_SOURCE, tail, 0, 0, 0); + } + rreq->buffer_tail_slot = 0; + } + + rreq->io_iter.count += PAGE_SIZE << order; + + slot = folioq_append(tail, folio); + /* Store the counter after setting the slot. */ + smp_store_release(&rreq->buffer_tail_slot, slot); + return 0; +} + +/* + * Delete the head of a rolling queue. + */ +struct folio_queue *netfs_delete_buffer_head(struct netfs_io_request *wreq) +{ + struct folio_queue *head = wreq->buffer, *next = head->next; + + if (next) + next->prev = NULL; + netfs_stat_d(&netfs_n_folioq); + kfree(head); + wreq->buffer = next; + return next; +} + +/* + * Clear out a rolling queue. + */ +void netfs_clear_buffer(struct netfs_io_request *rreq) +{ + struct folio_queue *p; + + while ((p = rreq->buffer)) { + rreq->buffer = p->next; + for (int slot = 0; slot < folioq_nr_slots(p); slot++) { + struct folio *folio = folioq_folio(p, slot); + if (!folio) + continue; + if (folioq_is_marked(p, slot)) { + trace_netfs_folio(folio, netfs_folio_trace_put); + folio_put(folio); + } + } + netfs_stat_d(&netfs_n_folioq); + kfree(p); + } +} + +/* + * Reset the subrequest iterator to refer just to the region remaining to be + * read. The iterator may or may not have been advanced by socket ops or + * extraction ops to an extent that may or may not match the amount actually + * read. + */ +void netfs_reset_iter(struct netfs_io_subrequest *subreq) +{ + struct iov_iter *io_iter = &subreq->io_iter; + size_t remain = subreq->len - subreq->transferred; + + if (io_iter->count > remain) + iov_iter_advance(io_iter, io_iter->count - remain); + else if (io_iter->count < remain) + iov_iter_revert(io_iter, remain - io_iter->count); + iov_iter_truncate(&subreq->io_iter, remain); +} + /** * netfs_dirty_folio - Mark folio dirty and pin a cache object for writeback * @mapping: The mapping the folio belongs to. diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c index 0294df70c3ff..31e388ec6e48 100644 --- a/fs/netfs/objects.c +++ b/fs/netfs/objects.c @@ -36,7 +36,6 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping, memset(rreq, 0, kmem_cache_size(cache)); rreq->start = start; rreq->len = len; - rreq->upper_len = len; rreq->origin = origin; rreq->netfs_ops = ctx->ops; rreq->mapping = mapping; @@ -44,13 +43,23 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping, rreq->i_size = i_size_read(inode); rreq->debug_id = atomic_inc_return(&debug_ids); rreq->wsize = INT_MAX; + rreq->io_streams[0].sreq_max_len = ULONG_MAX; + rreq->io_streams[0].sreq_max_segs = 0; spin_lock_init(&rreq->lock); INIT_LIST_HEAD(&rreq->io_streams[0].subrequests); INIT_LIST_HEAD(&rreq->io_streams[1].subrequests); INIT_LIST_HEAD(&rreq->subrequests); - INIT_WORK(&rreq->work, NULL); refcount_set(&rreq->ref, 1); + if (origin == NETFS_READAHEAD || + origin == NETFS_READPAGE || + origin == NETFS_READ_GAPS || + origin == NETFS_READ_FOR_WRITE || + origin == NETFS_DIO_READ) + INIT_WORK(&rreq->work, netfs_read_termination_worker); + else + INIT_WORK(&rreq->work, netfs_write_collection_worker); + __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); if (file && file->f_flags & O_NONBLOCK) __set_bit(NETFS_RREQ_NONBLOCK, &rreq->flags); @@ -134,6 +143,7 @@ static void netfs_free_request(struct work_struct *work) } kvfree(rreq->direct_bv); } + netfs_clear_buffer(rreq); if (atomic_dec_and_test(&ictx->io_count)) wake_up_var(&ictx->io_count); @@ -155,7 +165,7 @@ void netfs_put_request(struct netfs_io_request *rreq, bool was_async, if (was_async) { rreq->work.func = netfs_free_request; if (!queue_work(system_unbound_wq, &rreq->work)) - BUG(); + WARN_ON(1); } else { netfs_free_request(&rreq->work); } diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c new file mode 100644 index 000000000000..b18c65ba5580 --- /dev/null +++ b/fs/netfs/read_collect.c @@ -0,0 +1,544 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Network filesystem read subrequest result collection, assessment and + * retrying. + * + * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/export.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/slab.h> +#include <linux/task_io_accounting_ops.h> +#include "internal.h" + +/* + * Clear the unread part of an I/O request. + */ +static void netfs_clear_unread(struct netfs_io_subrequest *subreq) +{ + netfs_reset_iter(subreq); + WARN_ON_ONCE(subreq->len - subreq->transferred != iov_iter_count(&subreq->io_iter)); + iov_iter_zero(iov_iter_count(&subreq->io_iter), &subreq->io_iter); + if (subreq->start + subreq->transferred >= subreq->rreq->i_size) + __set_bit(NETFS_SREQ_HIT_EOF, &subreq->flags); +} + +/* + * Flush, mark and unlock a folio that's now completely read. If we want to + * cache the folio, we set the group to NETFS_FOLIO_COPY_TO_CACHE, mark it + * dirty and let writeback handle it. + */ +static void netfs_unlock_read_folio(struct netfs_io_subrequest *subreq, + struct netfs_io_request *rreq, + struct folio_queue *folioq, + int slot) +{ + struct netfs_folio *finfo; + struct folio *folio = folioq_folio(folioq, slot); + + flush_dcache_folio(folio); + folio_mark_uptodate(folio); + + if (!test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) { + finfo = netfs_folio_info(folio); + if (finfo) { + trace_netfs_folio(folio, netfs_folio_trace_filled_gaps); + if (finfo->netfs_group) + folio_change_private(folio, finfo->netfs_group); + else + folio_detach_private(folio); + kfree(finfo); + } + + if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) { + if (!WARN_ON_ONCE(folio_get_private(folio) != NULL)) { + trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache); + folio_attach_private(folio, NETFS_FOLIO_COPY_TO_CACHE); + folio_mark_dirty(folio); + } + } else { + trace_netfs_folio(folio, netfs_folio_trace_read_done); + } + } else { + // TODO: Use of PG_private_2 is deprecated. + if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) + netfs_pgpriv2_mark_copy_to_cache(subreq, rreq, folioq, slot); + } + + if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) { + if (folio->index == rreq->no_unlock_folio && + test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags)) { + _debug("no unlock"); + } else { + trace_netfs_folio(folio, netfs_folio_trace_read_unlock); + folio_unlock(folio); + } + } +} + +/* + * Unlock any folios that are now completely read. Returns true if the + * subrequest is removed from the list. + */ +static bool netfs_consume_read_data(struct netfs_io_subrequest *subreq, bool was_async) +{ + struct netfs_io_subrequest *prev, *next; + struct netfs_io_request *rreq = subreq->rreq; + struct folio_queue *folioq = subreq->curr_folioq; + size_t avail, prev_donated, next_donated, fsize, part, excess; + loff_t fpos, start; + loff_t fend; + int slot = subreq->curr_folioq_slot; + + if (WARN(subreq->transferred > subreq->len, + "Subreq overread: R%x[%x] %zu > %zu", + rreq->debug_id, subreq->debug_index, + subreq->transferred, subreq->len)) + subreq->transferred = subreq->len; + +next_folio: + fsize = PAGE_SIZE << subreq->curr_folio_order; + fpos = round_down(subreq->start + subreq->consumed, fsize); + fend = fpos + fsize; + + if (WARN_ON_ONCE(!folioq) || + WARN_ON_ONCE(!folioq_folio(folioq, slot)) || + WARN_ON_ONCE(folioq_folio(folioq, slot)->index != fpos / PAGE_SIZE)) { + pr_err("R=%08x[%x] s=%llx-%llx ctl=%zx/%zx/%zx sl=%u\n", + rreq->debug_id, subreq->debug_index, + subreq->start, subreq->start + subreq->transferred - 1, + subreq->consumed, subreq->transferred, subreq->len, + slot); + if (folioq) { + struct folio *folio = folioq_folio(folioq, slot); + + pr_err("folioq: orders=%02x%02x%02x%02x\n", + folioq->orders[0], folioq->orders[1], + folioq->orders[2], folioq->orders[3]); + if (folio) + pr_err("folio: %llx-%llx ix=%llx o=%u qo=%u\n", + fpos, fend - 1, folio_pos(folio), folio_order(folio), + folioq_folio_order(folioq, slot)); + } + } + +donation_changed: + /* Try to consume the current folio if we've hit or passed the end of + * it. There's a possibility that this subreq doesn't start at the + * beginning of the folio, in which case we need to donate to/from the + * preceding subreq. + * + * We also need to include any potential donation back from the + * following subreq. + */ + prev_donated = READ_ONCE(subreq->prev_donated); + next_donated = READ_ONCE(subreq->next_donated); + if (prev_donated || next_donated) { + spin_lock_bh(&rreq->lock); + prev_donated = subreq->prev_donated; + next_donated = subreq->next_donated; + subreq->start -= prev_donated; + subreq->len += prev_donated; + subreq->transferred += prev_donated; + prev_donated = subreq->prev_donated = 0; + if (subreq->transferred == subreq->len) { + subreq->len += next_donated; + subreq->transferred += next_donated; + next_donated = subreq->next_donated = 0; + } + trace_netfs_sreq(subreq, netfs_sreq_trace_add_donations); + spin_unlock_bh(&rreq->lock); + } + + avail = subreq->transferred; + if (avail == subreq->len) + avail += next_donated; + start = subreq->start; + if (subreq->consumed == 0) { + start -= prev_donated; + avail += prev_donated; + } else { + start += subreq->consumed; + avail -= subreq->consumed; + } + part = umin(avail, fsize); + + trace_netfs_progress(subreq, start, avail, part); + + if (start + avail >= fend) { + if (fpos == start) { + /* Flush, unlock and mark for caching any folio we've just read. */ + subreq->consumed = fend - subreq->start; + netfs_unlock_read_folio(subreq, rreq, folioq, slot); + folioq_mark2(folioq, slot); + if (subreq->consumed >= subreq->len) + goto remove_subreq; + } else if (fpos < start) { + excess = fend - subreq->start; + + spin_lock_bh(&rreq->lock); + /* If we complete first on a folio split with the + * preceding subreq, donate to that subreq - otherwise + * we get the responsibility. + */ + if (subreq->prev_donated != prev_donated) { + spin_unlock_bh(&rreq->lock); + goto donation_changed; + } + + if (list_is_first(&subreq->rreq_link, &rreq->subrequests)) { + spin_unlock_bh(&rreq->lock); + pr_err("Can't donate prior to front\n"); + goto bad; + } + + prev = list_prev_entry(subreq, rreq_link); + WRITE_ONCE(prev->next_donated, prev->next_donated + excess); + subreq->start += excess; + subreq->len -= excess; + subreq->transferred -= excess; + trace_netfs_donate(rreq, subreq, prev, excess, + netfs_trace_donate_tail_to_prev); + trace_netfs_sreq(subreq, netfs_sreq_trace_donate_to_prev); + + if (subreq->consumed >= subreq->len) + goto remove_subreq_locked; + spin_unlock_bh(&rreq->lock); + } else { + pr_err("fpos > start\n"); + goto bad; + } + + /* Advance the rolling buffer to the next folio. */ + slot++; + if (slot >= folioq_nr_slots(folioq)) { + slot = 0; + folioq = folioq->next; + subreq->curr_folioq = folioq; + } + subreq->curr_folioq_slot = slot; + if (folioq && folioq_folio(folioq, slot)) + subreq->curr_folio_order = folioq->orders[slot]; + if (!was_async) + cond_resched(); + goto next_folio; + } + + /* Deal with partial progress. */ + if (subreq->transferred < subreq->len) + return false; + + /* Donate the remaining downloaded data to one of the neighbouring + * subrequests. Note that we may race with them doing the same thing. + */ + spin_lock_bh(&rreq->lock); + + if (subreq->prev_donated != prev_donated || + subreq->next_donated != next_donated) { + spin_unlock_bh(&rreq->lock); + cond_resched(); + goto donation_changed; + } + + /* Deal with the trickiest case: that this subreq is in the middle of a + * folio, not touching either edge, but finishes first. In such a + * case, we donate to the previous subreq, if there is one, so that the + * donation is only handled when that completes - and remove this + * subreq from the list. + * + * If the previous subreq finished first, we will have acquired their + * donation and should be able to unlock folios and/or donate nextwards. + */ + if (!subreq->consumed && + !prev_donated && + !list_is_first(&subreq->rreq_link, &rreq->subrequests)) { + prev = list_prev_entry(subreq, rreq_link); + WRITE_ONCE(prev->next_donated, prev->next_donated + subreq->len); + subreq->start += subreq->len; + subreq->len = 0; + subreq->transferred = 0; + trace_netfs_donate(rreq, subreq, prev, subreq->len, + netfs_trace_donate_to_prev); + trace_netfs_sreq(subreq, netfs_sreq_trace_donate_to_prev); + goto remove_subreq_locked; + } + + /* If we can't donate down the chain, donate up the chain instead. */ + excess = subreq->len - subreq->consumed + next_donated; + + if (!subreq->consumed) + excess += prev_donated; + + if (list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + rreq->prev_donated = excess; + trace_netfs_donate(rreq, subreq, NULL, excess, + netfs_trace_donate_to_deferred_next); + } else { + next = list_next_entry(subreq, rreq_link); + WRITE_ONCE(next->prev_donated, excess); + trace_netfs_donate(rreq, subreq, next, excess, + netfs_trace_donate_to_next); + } + trace_netfs_sreq(subreq, netfs_sreq_trace_donate_to_next); + subreq->len = subreq->consumed; + subreq->transferred = subreq->consumed; + goto remove_subreq_locked; + +remove_subreq: + spin_lock_bh(&rreq->lock); +remove_subreq_locked: + subreq->consumed = subreq->len; + list_del(&subreq->rreq_link); + spin_unlock_bh(&rreq->lock); + netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_consumed); + return true; + +bad: + /* Errr... prev and next both donated to us, but insufficient to finish + * the folio. + */ + printk("R=%08x[%x] s=%llx-%llx %zx/%zx/%zx\n", + rreq->debug_id, subreq->debug_index, + subreq->start, subreq->start + subreq->transferred - 1, + subreq->consumed, subreq->transferred, subreq->len); + printk("folio: %llx-%llx\n", fpos, fend - 1); + printk("donated: prev=%zx next=%zx\n", prev_donated, next_donated); + printk("s=%llx av=%zx part=%zx\n", start, avail, part); + BUG(); +} + +/* + * Do page flushing and suchlike after DIO. + */ +static void netfs_rreq_assess_dio(struct netfs_io_request *rreq) +{ + struct netfs_io_subrequest *subreq; + unsigned int i; + + /* Collect unbuffered reads and direct reads, adding up the transfer + * sizes until we find the first short or failed subrequest. + */ + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + rreq->transferred += subreq->transferred; + + if (subreq->transferred < subreq->len || + test_bit(NETFS_SREQ_FAILED, &subreq->flags)) { + rreq->error = subreq->error; + break; + } + } + + if (rreq->origin == NETFS_DIO_READ) { + for (i = 0; i < rreq->direct_bv_count; i++) { + flush_dcache_page(rreq->direct_bv[i].bv_page); + // TODO: cifs marks pages in the destination buffer + // dirty under some circumstances after a read. Do we + // need to do that too? + set_page_dirty(rreq->direct_bv[i].bv_page); + } + } + + if (rreq->iocb) { + rreq->iocb->ki_pos += rreq->transferred; + if (rreq->iocb->ki_complete) + rreq->iocb->ki_complete( + rreq->iocb, rreq->error ? rreq->error : rreq->transferred); + } + if (rreq->netfs_ops->done) + rreq->netfs_ops->done(rreq); + if (rreq->origin == NETFS_DIO_READ) + inode_dio_end(rreq->inode); +} + +/* + * Assess the state of a read request and decide what to do next. + * + * Note that we're in normal kernel thread context at this point, possibly + * running on a workqueue. + */ +static void netfs_rreq_assess(struct netfs_io_request *rreq) +{ + trace_netfs_rreq(rreq, netfs_rreq_trace_assess); + + //netfs_rreq_is_still_valid(rreq); + + if (test_and_clear_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags)) { + netfs_retry_reads(rreq); + return; + } + + if (rreq->origin == NETFS_DIO_READ || + rreq->origin == NETFS_READ_GAPS) + netfs_rreq_assess_dio(rreq); + task_io_account_read(rreq->transferred); + + trace_netfs_rreq(rreq, netfs_rreq_trace_wake_ip); + clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); + wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); + + trace_netfs_rreq(rreq, netfs_rreq_trace_done); + netfs_clear_subrequests(rreq, false); + netfs_unlock_abandoned_read_pages(rreq); + if (unlikely(test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags))) + netfs_pgpriv2_write_to_the_cache(rreq); +} + +void netfs_read_termination_worker(struct work_struct *work) +{ + struct netfs_io_request *rreq = + container_of(work, struct netfs_io_request, work); + netfs_see_request(rreq, netfs_rreq_trace_see_work); + netfs_rreq_assess(rreq); + netfs_put_request(rreq, false, netfs_rreq_trace_put_work_complete); +} + +/* + * Handle the completion of all outstanding I/O operations on a read request. + * We inherit a ref from the caller. + */ +void netfs_rreq_terminated(struct netfs_io_request *rreq, bool was_async) +{ + if (!was_async) + return netfs_rreq_assess(rreq); + if (!work_pending(&rreq->work)) { + netfs_get_request(rreq, netfs_rreq_trace_get_work); + if (!queue_work(system_unbound_wq, &rreq->work)) + netfs_put_request(rreq, was_async, netfs_rreq_trace_put_work_nq); + } +} + +/** + * netfs_read_subreq_progress - Note progress of a read operation. + * @subreq: The read request that has terminated. + * @was_async: True if we're in an asynchronous context. + * + * This tells the read side of netfs lib that a contributory I/O operation has + * made some progress and that it may be possible to unlock some folios. + * + * Before calling, the filesystem should update subreq->transferred to track + * the amount of data copied into the output buffer. + * + * If @was_async is true, the caller might be running in softirq or interrupt + * context and we can't sleep. + */ +void netfs_read_subreq_progress(struct netfs_io_subrequest *subreq, + bool was_async) +{ + struct netfs_io_request *rreq = subreq->rreq; + + trace_netfs_sreq(subreq, netfs_sreq_trace_progress); + + if (subreq->transferred > subreq->consumed && + (rreq->origin == NETFS_READAHEAD || + rreq->origin == NETFS_READPAGE || + rreq->origin == NETFS_READ_FOR_WRITE)) { + netfs_consume_read_data(subreq, was_async); + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + } +} +EXPORT_SYMBOL(netfs_read_subreq_progress); + +/** + * netfs_read_subreq_terminated - Note the termination of an I/O operation. + * @subreq: The I/O request that has terminated. + * @error: Error code indicating type of completion. + * @was_async: The termination was asynchronous + * + * This tells the read helper that a contributory I/O operation has terminated, + * one way or another, and that it should integrate the results. + * + * The caller indicates the outcome of the operation through @error, supplying + * 0 to indicate a successful or retryable transfer (if NETFS_SREQ_NEED_RETRY + * is set) or a negative error code. The helper will look after reissuing I/O + * operations as appropriate and writing downloaded data to the cache. + * + * Before calling, the filesystem should update subreq->transferred to track + * the amount of data copied into the output buffer. + * + * If @was_async is true, the caller might be running in softirq or interrupt + * context and we can't sleep. + */ +void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq, + int error, bool was_async) +{ + struct netfs_io_request *rreq = subreq->rreq; + + switch (subreq->source) { + case NETFS_READ_FROM_CACHE: + netfs_stat(&netfs_n_rh_read_done); + break; + case NETFS_DOWNLOAD_FROM_SERVER: + netfs_stat(&netfs_n_rh_download_done); + break; + default: + break; + } + + if (rreq->origin != NETFS_DIO_READ) { + /* Collect buffered reads. + * + * If the read completed validly short, then we can clear the + * tail before going on to unlock the folios. + */ + if (error == 0 && subreq->transferred < subreq->len && + (test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags) || + test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags))) { + netfs_clear_unread(subreq); + subreq->transferred = subreq->len; + trace_netfs_sreq(subreq, netfs_sreq_trace_clear); + } + if (subreq->transferred > subreq->consumed && + (rreq->origin == NETFS_READAHEAD || + rreq->origin == NETFS_READPAGE || + rreq->origin == NETFS_READ_FOR_WRITE)) { + netfs_consume_read_data(subreq, was_async); + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + } + rreq->transferred += subreq->transferred; + } + + /* Deal with retry requests, short reads and errors. If we retry + * but don't make progress, we abandon the attempt. + */ + if (!error && subreq->transferred < subreq->len) { + if (test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags)) { + trace_netfs_sreq(subreq, netfs_sreq_trace_hit_eof); + } else { + trace_netfs_sreq(subreq, netfs_sreq_trace_short); + if (subreq->transferred > subreq->consumed) { + __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags); + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags); + } else if (!__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { + __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags); + set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags); + } else { + __set_bit(NETFS_SREQ_FAILED, &subreq->flags); + error = -ENODATA; + } + } + } + + subreq->error = error; + trace_netfs_sreq(subreq, netfs_sreq_trace_terminated); + + if (unlikely(error < 0)) { + trace_netfs_failure(rreq, subreq, error, netfs_fail_read); + if (subreq->source == NETFS_READ_FROM_CACHE) { + netfs_stat(&netfs_n_rh_read_failed); + } else { + netfs_stat(&netfs_n_rh_download_failed); + set_bit(NETFS_RREQ_FAILED, &rreq->flags); + rreq->error = subreq->error; + } + } + + if (atomic_dec_and_test(&rreq->nr_outstanding)) + netfs_rreq_terminated(rreq, was_async); + + netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated); +} +EXPORT_SYMBOL(netfs_read_subreq_terminated); diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c new file mode 100644 index 000000000000..ba5af89d37fa --- /dev/null +++ b/fs/netfs/read_pgpriv2.c @@ -0,0 +1,264 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Read with PG_private_2 [DEPRECATED]. + * + * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/export.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/slab.h> +#include <linux/task_io_accounting_ops.h> +#include "internal.h" + +/* + * [DEPRECATED] Mark page as requiring copy-to-cache using PG_private_2. The + * third mark in the folio queue is used to indicate that this folio needs + * writing. + */ +void netfs_pgpriv2_mark_copy_to_cache(struct netfs_io_subrequest *subreq, + struct netfs_io_request *rreq, + struct folio_queue *folioq, + int slot) +{ + struct folio *folio = folioq_folio(folioq, slot); + + trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache); + folio_start_private_2(folio); + folioq_mark3(folioq, slot); +} + +/* + * [DEPRECATED] Cancel PG_private_2 on all marked folios in the event of an + * unrecoverable error. + */ +static void netfs_pgpriv2_cancel(struct folio_queue *folioq) +{ + struct folio *folio; + int slot; + + while (folioq) { + if (!folioq->marks3) { + folioq = folioq->next; + continue; + } + + slot = __ffs(folioq->marks3); + folio = folioq_folio(folioq, slot); + + trace_netfs_folio(folio, netfs_folio_trace_cancel_copy); + folio_end_private_2(folio); + folioq_unmark3(folioq, slot); + } +} + +/* + * [DEPRECATED] Copy a folio to the cache with PG_private_2 set. + */ +static int netfs_pgpriv2_copy_folio(struct netfs_io_request *wreq, struct folio *folio) +{ + struct netfs_io_stream *cache = &wreq->io_streams[1]; + size_t fsize = folio_size(folio), flen = fsize; + loff_t fpos = folio_pos(folio), i_size; + bool to_eof = false; + + _enter(""); + + /* netfs_perform_write() may shift i_size around the page or from out + * of the page to beyond it, but cannot move i_size into or through the + * page since we have it locked. + */ + i_size = i_size_read(wreq->inode); + + if (fpos >= i_size) { + /* mmap beyond eof. */ + _debug("beyond eof"); + folio_end_private_2(folio); + return 0; + } + + if (fpos + fsize > wreq->i_size) + wreq->i_size = i_size; + + if (flen > i_size - fpos) { + flen = i_size - fpos; + to_eof = true; + } else if (flen == i_size - fpos) { + to_eof = true; + } + + _debug("folio %zx %zx", flen, fsize); + + trace_netfs_folio(folio, netfs_folio_trace_store_copy); + + /* Attach the folio to the rolling buffer. */ + if (netfs_buffer_append_folio(wreq, folio, false) < 0) + return -ENOMEM; + + cache->submit_extendable_to = fsize; + cache->submit_off = 0; + cache->submit_len = flen; + + /* Attach the folio to one or more subrequests. For a big folio, we + * could end up with thousands of subrequests if the wsize is small - + * but we might need to wait during the creation of subrequests for + * network resources (eg. SMB credits). + */ + do { + ssize_t part; + + wreq->io_iter.iov_offset = cache->submit_off; + + atomic64_set(&wreq->issued_to, fpos + cache->submit_off); + cache->submit_extendable_to = fsize - cache->submit_off; + part = netfs_advance_write(wreq, cache, fpos + cache->submit_off, + cache->submit_len, to_eof); + cache->submit_off += part; + if (part > cache->submit_len) + cache->submit_len = 0; + else + cache->submit_len -= part; + } while (cache->submit_len > 0); + + wreq->io_iter.iov_offset = 0; + iov_iter_advance(&wreq->io_iter, fsize); + atomic64_set(&wreq->issued_to, fpos + fsize); + + if (flen < fsize) + netfs_issue_write(wreq, cache); + + _leave(" = 0"); + return 0; +} + +/* + * [DEPRECATED] Go through the buffer and write any folios that are marked with + * the third mark to the cache. + */ +void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq) +{ + struct netfs_io_request *wreq; + struct folio_queue *folioq; + struct folio *folio; + int error = 0; + int slot = 0; + + _enter(""); + + if (!fscache_resources_valid(&rreq->cache_resources)) + goto couldnt_start; + + /* Need the first folio to be able to set up the op. */ + for (folioq = rreq->buffer; folioq; folioq = folioq->next) { + if (folioq->marks3) { + slot = __ffs(folioq->marks3); + break; + } + } + if (!folioq) + return; + folio = folioq_folio(folioq, slot); + + wreq = netfs_create_write_req(rreq->mapping, NULL, folio_pos(folio), + NETFS_PGPRIV2_COPY_TO_CACHE); + if (IS_ERR(wreq)) { + kleave(" [create %ld]", PTR_ERR(wreq)); + goto couldnt_start; + } + + trace_netfs_write(wreq, netfs_write_trace_copy_to_cache); + netfs_stat(&netfs_n_wh_copy_to_cache); + + for (;;) { + error = netfs_pgpriv2_copy_folio(wreq, folio); + if (error < 0) + break; + + folioq_unmark3(folioq, slot); + if (!folioq->marks3) { + folioq = folioq->next; + if (!folioq) + break; + } + + slot = __ffs(folioq->marks3); + folio = folioq_folio(folioq, slot); + } + + netfs_issue_write(wreq, &wreq->io_streams[1]); + smp_wmb(); /* Write lists before ALL_QUEUED. */ + set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags); + + netfs_put_request(wreq, false, netfs_rreq_trace_put_return); + _leave(" = %d", error); +couldnt_start: + netfs_pgpriv2_cancel(rreq->buffer); +} + +/* + * [DEPRECATED] Remove the PG_private_2 mark from any folios we've finished + * copying. + */ +bool netfs_pgpriv2_unlock_copied_folios(struct netfs_io_request *wreq) +{ + struct folio_queue *folioq = wreq->buffer; + unsigned long long collected_to = wreq->collected_to; + unsigned int slot = wreq->buffer_head_slot; + bool made_progress = false; + + if (slot >= folioq_nr_slots(folioq)) { + folioq = netfs_delete_buffer_head(wreq); + slot = 0; + } + + for (;;) { + struct folio *folio; + unsigned long long fpos, fend; + size_t fsize, flen; + + folio = folioq_folio(folioq, slot); + if (WARN_ONCE(!folio_test_private_2(folio), + "R=%08x: folio %lx is not marked private_2\n", + wreq->debug_id, folio->index)) + trace_netfs_folio(folio, netfs_folio_trace_not_under_wback); + + fpos = folio_pos(folio); + fsize = folio_size(folio); + flen = fsize; + + fend = min_t(unsigned long long, fpos + flen, wreq->i_size); + + trace_netfs_collect_folio(wreq, folio, fend, collected_to); + + /* Unlock any folio we've transferred all of. */ + if (collected_to < fend) + break; + + trace_netfs_folio(folio, netfs_folio_trace_end_copy); + folio_end_private_2(folio); + wreq->cleaned_to = fpos + fsize; + made_progress = true; + + /* Clean up the head folioq. If we clear an entire folioq, then + * we can get rid of it provided it's not also the tail folioq + * being filled by the issuer. + */ + folioq_clear(folioq, slot); + slot++; + if (slot >= folioq_nr_slots(folioq)) { + if (READ_ONCE(wreq->buffer_tail) == folioq) + break; + folioq = netfs_delete_buffer_head(wreq); + slot = 0; + } + + if (fpos + fsize >= collected_to) + break; + } + + wreq->buffer = folioq; + wreq->buffer_head_slot = slot; + return made_progress; +} diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c new file mode 100644 index 000000000000..0350592ea804 --- /dev/null +++ b/fs/netfs/read_retry.c @@ -0,0 +1,256 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Network filesystem read subrequest retrying. + * + * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/fs.h> +#include <linux/slab.h> +#include "internal.h" + +static void netfs_reissue_read(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq) +{ + struct iov_iter *io_iter = &subreq->io_iter; + + if (iov_iter_is_folioq(io_iter)) { + subreq->curr_folioq = (struct folio_queue *)io_iter->folioq; + subreq->curr_folioq_slot = io_iter->folioq_slot; + subreq->curr_folio_order = subreq->curr_folioq->orders[subreq->curr_folioq_slot]; + } + + atomic_inc(&rreq->nr_outstanding); + __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags); + netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); + subreq->rreq->netfs_ops->issue_read(subreq); +} + +/* + * Go through the list of failed/short reads, retrying all retryable ones. We + * need to switch failed cache reads to network downloads. + */ +static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) +{ + struct netfs_io_subrequest *subreq; + struct netfs_io_stream *stream0 = &rreq->io_streams[0]; + LIST_HEAD(sublist); + LIST_HEAD(queue); + + _enter("R=%x", rreq->debug_id); + + if (list_empty(&rreq->subrequests)) + return; + + if (rreq->netfs_ops->retry_request) + rreq->netfs_ops->retry_request(rreq, NULL); + + /* If there's no renegotiation to do, just resend each retryable subreq + * up to the first permanently failed one. + */ + if (!rreq->netfs_ops->prepare_read && + !test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags)) { + struct netfs_io_subrequest *subreq; + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + if (test_bit(NETFS_SREQ_FAILED, &subreq->flags)) + break; + if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) { + netfs_reset_iter(subreq); + netfs_reissue_read(rreq, subreq); + } + } + return; + } + + /* Okay, we need to renegotiate all the download requests and flip any + * failed cache reads over to being download requests and negotiate + * those also. All fully successful subreqs have been removed from the + * list and any spare data from those has been donated. + * + * What we do is decant the list and rebuild it one subreq at a time so + * that we don't end up with donations jumping over a gap we're busy + * populating with smaller subrequests. In the event that the subreq + * we just launched finishes before we insert the next subreq, it'll + * fill in rreq->prev_donated instead. + + * Note: Alternatively, we could split the tail subrequest right before + * we reissue it and fix up the donations under lock. + */ + list_splice_init(&rreq->subrequests, &queue); + + do { + struct netfs_io_subrequest *from; + struct iov_iter source; + unsigned long long start, len; + size_t part, deferred_next_donated = 0; + bool boundary = false; + + /* Go through the subreqs and find the next span of contiguous + * buffer that we then rejig (cifs, for example, needs the + * rsize renegotiating) and reissue. + */ + from = list_first_entry(&queue, struct netfs_io_subrequest, rreq_link); + list_move_tail(&from->rreq_link, &sublist); + start = from->start + from->transferred; + len = from->len - from->transferred; + + _debug("from R=%08x[%x] s=%llx ctl=%zx/%zx/%zx", + rreq->debug_id, from->debug_index, + from->start, from->consumed, from->transferred, from->len); + + if (test_bit(NETFS_SREQ_FAILED, &from->flags) || + !test_bit(NETFS_SREQ_NEED_RETRY, &from->flags)) + goto abandon; + + deferred_next_donated = from->next_donated; + while ((subreq = list_first_entry_or_null( + &queue, struct netfs_io_subrequest, rreq_link))) { + if (subreq->start != start + len || + subreq->transferred > 0 || + !test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) + break; + list_move_tail(&subreq->rreq_link, &sublist); + len += subreq->len; + deferred_next_donated = subreq->next_donated; + if (test_bit(NETFS_SREQ_BOUNDARY, &subreq->flags)) + break; + } + + _debug(" - range: %llx-%llx %llx", start, start + len - 1, len); + + /* Determine the set of buffers we're going to use. Each + * subreq gets a subset of a single overall contiguous buffer. + */ + netfs_reset_iter(from); + source = from->io_iter; + source.count = len; + + /* Work through the sublist. */ + while ((subreq = list_first_entry_or_null( + &sublist, struct netfs_io_subrequest, rreq_link))) { + list_del(&subreq->rreq_link); + + subreq->source = NETFS_DOWNLOAD_FROM_SERVER; + subreq->start = start - subreq->transferred; + subreq->len = len + subreq->transferred; + stream0->sreq_max_len = subreq->len; + + __clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags); + __set_bit(NETFS_SREQ_RETRYING, &subreq->flags); + + spin_lock_bh(&rreq->lock); + list_add_tail(&subreq->rreq_link, &rreq->subrequests); + subreq->prev_donated += rreq->prev_donated; + rreq->prev_donated = 0; + trace_netfs_sreq(subreq, netfs_sreq_trace_retry); + spin_unlock_bh(&rreq->lock); + + BUG_ON(!len); + + /* Renegotiate max_len (rsize) */ + if (rreq->netfs_ops->prepare_read(subreq) < 0) { + trace_netfs_sreq(subreq, netfs_sreq_trace_reprep_failed); + __set_bit(NETFS_SREQ_FAILED, &subreq->flags); + } + + part = umin(len, stream0->sreq_max_len); + if (unlikely(rreq->io_streams[0].sreq_max_segs)) + part = netfs_limit_iter(&source, 0, part, stream0->sreq_max_segs); + subreq->len = subreq->transferred + part; + subreq->io_iter = source; + iov_iter_truncate(&subreq->io_iter, part); + iov_iter_advance(&source, part); + len -= part; + start += part; + if (!len) { + if (boundary) + __set_bit(NETFS_SREQ_BOUNDARY, &subreq->flags); + subreq->next_donated = deferred_next_donated; + } else { + __clear_bit(NETFS_SREQ_BOUNDARY, &subreq->flags); + subreq->next_donated = 0; + } + + netfs_reissue_read(rreq, subreq); + if (!len) + break; + + /* If we ran out of subrequests, allocate another. */ + if (list_empty(&sublist)) { + subreq = netfs_alloc_subrequest(rreq); + if (!subreq) + goto abandon; + subreq->source = NETFS_DOWNLOAD_FROM_SERVER; + subreq->start = start; + + /* We get two refs, but need just one. */ + netfs_put_subrequest(subreq, false, netfs_sreq_trace_new); + trace_netfs_sreq(subreq, netfs_sreq_trace_split); + list_add_tail(&subreq->rreq_link, &sublist); + } + } + + /* If we managed to use fewer subreqs, we can discard the + * excess. + */ + while ((subreq = list_first_entry_or_null( + &sublist, struct netfs_io_subrequest, rreq_link))) { + trace_netfs_sreq(subreq, netfs_sreq_trace_discard); + list_del(&subreq->rreq_link); + netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done); + } + + } while (!list_empty(&queue)); + + return; + + /* If we hit ENOMEM, fail all remaining subrequests */ +abandon: + list_splice_init(&sublist, &queue); + list_for_each_entry(subreq, &queue, rreq_link) { + if (!subreq->error) + subreq->error = -ENOMEM; + __clear_bit(NETFS_SREQ_FAILED, &subreq->flags); + __clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags); + __clear_bit(NETFS_SREQ_RETRYING, &subreq->flags); + } + spin_lock_bh(&rreq->lock); + list_splice_tail_init(&queue, &rreq->subrequests); + spin_unlock_bh(&rreq->lock); +} + +/* + * Retry reads. + */ +void netfs_retry_reads(struct netfs_io_request *rreq) +{ + trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); + + atomic_inc(&rreq->nr_outstanding); + + netfs_retry_read_subrequests(rreq); + + if (atomic_dec_and_test(&rreq->nr_outstanding)) + netfs_rreq_terminated(rreq, false); +} + +/* + * Unlock any the pages that haven't been unlocked yet due to abandoned + * subrequests. + */ +void netfs_unlock_abandoned_read_pages(struct netfs_io_request *rreq) +{ + struct folio_queue *p; + + for (p = rreq->buffer; p; p = p->next) { + for (int slot = 0; slot < folioq_count(p); slot++) { + struct folio *folio = folioq_folio(p, slot); + + if (folio && !folioq_is_marked2(p, slot)) { + trace_netfs_folio(folio, netfs_folio_trace_abandon); + folio_unlock(folio); + } + } + } +} diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index 0892768eea32..8e63516b40f6 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -32,6 +32,7 @@ atomic_t netfs_n_wh_buffered_write; atomic_t netfs_n_wh_writethrough; atomic_t netfs_n_wh_dio_write; atomic_t netfs_n_wh_writepages; +atomic_t netfs_n_wh_copy_to_cache; atomic_t netfs_n_wh_wstream_conflict; atomic_t netfs_n_wh_upload; atomic_t netfs_n_wh_upload_done; @@ -39,45 +40,53 @@ atomic_t netfs_n_wh_upload_failed; atomic_t netfs_n_wh_write; atomic_t netfs_n_wh_write_done; atomic_t netfs_n_wh_write_failed; +atomic_t netfs_n_wb_lock_skip; +atomic_t netfs_n_wb_lock_wait; +atomic_t netfs_n_folioq; int netfs_stats_show(struct seq_file *m, void *v) { - seq_printf(m, "Netfs : DR=%u RA=%u RF=%u WB=%u WBZ=%u\n", + seq_printf(m, "Reads : DR=%u RA=%u RF=%u WB=%u WBZ=%u\n", atomic_read(&netfs_n_rh_dio_read), atomic_read(&netfs_n_rh_readahead), atomic_read(&netfs_n_rh_read_folio), atomic_read(&netfs_n_rh_write_begin), atomic_read(&netfs_n_rh_write_zskip)); - seq_printf(m, "Netfs : BW=%u WT=%u DW=%u WP=%u\n", + seq_printf(m, "Writes : BW=%u WT=%u DW=%u WP=%u 2C=%u\n", atomic_read(&netfs_n_wh_buffered_write), atomic_read(&netfs_n_wh_writethrough), atomic_read(&netfs_n_wh_dio_write), - atomic_read(&netfs_n_wh_writepages)); - seq_printf(m, "Netfs : ZR=%u sh=%u sk=%u\n", + atomic_read(&netfs_n_wh_writepages), + atomic_read(&netfs_n_wh_copy_to_cache)); + seq_printf(m, "ZeroOps: ZR=%u sh=%u sk=%u\n", atomic_read(&netfs_n_rh_zero), atomic_read(&netfs_n_rh_short_read), atomic_read(&netfs_n_rh_write_zskip)); - seq_printf(m, "Netfs : DL=%u ds=%u df=%u di=%u\n", + seq_printf(m, "DownOps: DL=%u ds=%u df=%u di=%u\n", atomic_read(&netfs_n_rh_download), atomic_read(&netfs_n_rh_download_done), atomic_read(&netfs_n_rh_download_failed), atomic_read(&netfs_n_rh_download_instead)); - seq_printf(m, "Netfs : RD=%u rs=%u rf=%u\n", + seq_printf(m, "CaRdOps: RD=%u rs=%u rf=%u\n", atomic_read(&netfs_n_rh_read), atomic_read(&netfs_n_rh_read_done), atomic_read(&netfs_n_rh_read_failed)); - seq_printf(m, "Netfs : UL=%u us=%u uf=%u\n", + seq_printf(m, "UpldOps: UL=%u us=%u uf=%u\n", atomic_read(&netfs_n_wh_upload), atomic_read(&netfs_n_wh_upload_done), atomic_read(&netfs_n_wh_upload_failed)); - seq_printf(m, "Netfs : WR=%u ws=%u wf=%u\n", + seq_printf(m, "CaWrOps: WR=%u ws=%u wf=%u\n", atomic_read(&netfs_n_wh_write), atomic_read(&netfs_n_wh_write_done), atomic_read(&netfs_n_wh_write_failed)); - seq_printf(m, "Netfs : rr=%u sr=%u wsc=%u\n", + seq_printf(m, "Objs : rr=%u sr=%u foq=%u wsc=%u\n", atomic_read(&netfs_n_rh_rreq), atomic_read(&netfs_n_rh_sreq), + atomic_read(&netfs_n_folioq), atomic_read(&netfs_n_wh_wstream_conflict)); + seq_printf(m, "WbLock : skip=%u wait=%u\n", + atomic_read(&netfs_n_wb_lock_skip), + atomic_read(&netfs_n_wb_lock_wait)); return fscache_stats_show(m); } EXPORT_SYMBOL(netfs_stats_show); diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c index ae7a2043f670..1d438be2e1b4 100644 --- a/fs/netfs/write_collect.c +++ b/fs/netfs/write_collect.c @@ -15,15 +15,11 @@ /* Notes made in the collector */ #define HIT_PENDING 0x01 /* A front op was still pending */ -#define SOME_EMPTY 0x02 /* One of more streams are empty */ -#define ALL_EMPTY 0x04 /* All streams are empty */ -#define MAYBE_DISCONTIG 0x08 /* A front op may be discontiguous (rounded to PAGE_SIZE) */ -#define NEED_REASSESS 0x10 /* Need to loop round and reassess */ -#define REASSESS_DISCONTIG 0x20 /* Reassess discontiguity if contiguity advances */ -#define MADE_PROGRESS 0x40 /* Made progress cleaning up a stream or the folio set */ -#define BUFFERED 0x80 /* The pagecache needs cleaning up */ -#define NEED_RETRY 0x100 /* A front op requests retrying */ -#define SAW_FAILURE 0x200 /* One stream or hit a permanent failure */ +#define NEED_REASSESS 0x02 /* Need to loop round and reassess */ +#define MADE_PROGRESS 0x04 /* Made progress cleaning up a stream or the folio set */ +#define BUFFERED 0x08 /* The pagecache needs cleaning up */ +#define NEED_RETRY 0x10 /* A front op requests retrying */ +#define SAW_FAILURE 0x20 /* One stream or hit a permanent failure */ /* * Successful completion of write of a folio to the server and/or cache. Note @@ -82,55 +78,37 @@ end_wb: } /* - * Get hold of a folio we have under writeback. We don't want to get the - * refcount on it. + * Unlock any folios we've finished with. */ -static struct folio *netfs_writeback_lookup_folio(struct netfs_io_request *wreq, loff_t pos) +static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq, + unsigned int *notes) { - XA_STATE(xas, &wreq->mapping->i_pages, pos / PAGE_SIZE); - struct folio *folio; - - rcu_read_lock(); - - for (;;) { - xas_reset(&xas); - folio = xas_load(&xas); - if (xas_retry(&xas, folio)) - continue; + struct folio_queue *folioq = wreq->buffer; + unsigned long long collected_to = wreq->collected_to; + unsigned int slot = wreq->buffer_head_slot; - if (!folio || xa_is_value(folio)) - kdebug("R=%08x: folio %lx (%llx) not present", - wreq->debug_id, xas.xa_index, pos / PAGE_SIZE); - BUG_ON(!folio || xa_is_value(folio)); - - if (folio == xas_reload(&xas)) - break; + if (wreq->origin == NETFS_PGPRIV2_COPY_TO_CACHE) { + if (netfs_pgpriv2_unlock_copied_folios(wreq)) + *notes |= MADE_PROGRESS; + return; } - rcu_read_unlock(); - - if (WARN_ONCE(!folio_test_writeback(folio), - "R=%08x: folio %lx is not under writeback\n", - wreq->debug_id, folio->index)) { - trace_netfs_folio(folio, netfs_folio_trace_not_under_wback); + if (slot >= folioq_nr_slots(folioq)) { + folioq = netfs_delete_buffer_head(wreq); + slot = 0; } - return folio; -} -/* - * Unlock any folios we've finished with. - */ -static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq, - unsigned long long collected_to, - unsigned int *notes) -{ for (;;) { struct folio *folio; struct netfs_folio *finfo; unsigned long long fpos, fend; size_t fsize, flen; - folio = netfs_writeback_lookup_folio(wreq, wreq->cleaned_to); + folio = folioq_folio(folioq, slot); + if (WARN_ONCE(!folio_test_writeback(folio), + "R=%08x: folio %lx is not under writeback\n", + wreq->debug_id, folio->index)) + trace_netfs_folio(folio, netfs_folio_trace_not_under_wback); fpos = folio_pos(folio); fsize = folio_size(folio); @@ -141,12 +119,6 @@ static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq, trace_netfs_collect_folio(wreq, folio, fend, collected_to); - if (fpos + fsize > wreq->contiguity) { - trace_netfs_collect_contig(wreq, fpos + fsize, - netfs_contig_trace_unlock); - wreq->contiguity = fpos + fsize; - } - /* Unlock any folio we've transferred all of. */ if (collected_to < fend) break; @@ -155,9 +127,25 @@ static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq, wreq->cleaned_to = fpos + fsize; *notes |= MADE_PROGRESS; + /* Clean up the head folioq. If we clear an entire folioq, then + * we can get rid of it provided it's not also the tail folioq + * being filled by the issuer. + */ + folioq_clear(folioq, slot); + slot++; + if (slot >= folioq_nr_slots(folioq)) { + if (READ_ONCE(wreq->buffer_tail) == folioq) + break; + folioq = netfs_delete_buffer_head(wreq); + slot = 0; + } + if (fpos + fsize >= collected_to) break; } + + wreq->buffer = folioq; + wreq->buffer_head_slot = slot; } /* @@ -188,9 +176,12 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, if (test_bit(NETFS_SREQ_FAILED, &subreq->flags)) break; if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) { + struct iov_iter source = subreq->io_iter; + + iov_iter_revert(&source, subreq->len - source.count); __set_bit(NETFS_SREQ_RETRYING, &subreq->flags); netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); - netfs_reissue_write(stream, subreq); + netfs_reissue_write(stream, subreq, &source); } } return; @@ -200,6 +191,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, do { struct netfs_io_subrequest *subreq = NULL, *from, *to, *tmp; + struct iov_iter source; unsigned long long start, len; size_t part; bool boundary = false; @@ -227,6 +219,13 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, len += to->len; } + /* Determine the set of buffers we're going to use. Each + * subreq gets a subset of a single overall contiguous buffer. + */ + netfs_reset_iter(from); + source = from->io_iter; + source.count = len; + /* Work through the sublist. */ subreq = from; list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) { @@ -238,7 +237,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, __set_bit(NETFS_SREQ_RETRYING, &subreq->flags); stream->prepare_write(subreq); - part = min(len, subreq->max_len); + part = min(len, stream->sreq_max_len); subreq->len = part; subreq->start = start; subreq->transferred = 0; @@ -249,7 +248,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, boundary = true; netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); - netfs_reissue_write(stream, subreq); + netfs_reissue_write(stream, subreq, &source); if (subreq == to) break; } @@ -278,8 +277,6 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, subreq = netfs_alloc_subrequest(wreq); subreq->source = to->source; subreq->start = start; - subreq->max_len = len; - subreq->max_nr_segs = INT_MAX; subreq->debug_index = atomic_inc_return(&wreq->subreq_counter); subreq->stream_nr = to->stream_nr; __set_bit(NETFS_SREQ_RETRYING, &subreq->flags); @@ -293,10 +290,12 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, to = list_next_entry(to, rreq_link); trace_netfs_sreq(subreq, netfs_sreq_trace_retry); + stream->sreq_max_len = len; + stream->sreq_max_segs = INT_MAX; switch (stream->source) { case NETFS_UPLOAD_TO_SERVER: netfs_stat(&netfs_n_wh_upload); - subreq->max_len = min(len, wreq->wsize); + stream->sreq_max_len = umin(len, wreq->wsize); break; case NETFS_WRITE_TO_CACHE: netfs_stat(&netfs_n_wh_write); @@ -307,7 +306,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, stream->prepare_write(subreq); - part = min(len, subreq->max_len); + part = umin(len, stream->sreq_max_len); subreq->len = subreq->transferred + part; len -= part; start += part; @@ -316,7 +315,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, boundary = false; } - netfs_reissue_write(stream, subreq); + netfs_reissue_write(stream, subreq, &source); if (!len) break; @@ -377,7 +376,7 @@ static void netfs_collect_write_results(struct netfs_io_request *wreq) { struct netfs_io_subrequest *front, *remove; struct netfs_io_stream *stream; - unsigned long long collected_to; + unsigned long long collected_to, issued_to; unsigned int notes; int s; @@ -386,28 +385,22 @@ static void netfs_collect_write_results(struct netfs_io_request *wreq) trace_netfs_rreq(wreq, netfs_rreq_trace_collect); reassess_streams: + issued_to = atomic64_read(&wreq->issued_to); smp_rmb(); collected_to = ULLONG_MAX; - if (wreq->origin == NETFS_WRITEBACK) - notes = ALL_EMPTY | BUFFERED | MAYBE_DISCONTIG; - else if (wreq->origin == NETFS_WRITETHROUGH) - notes = ALL_EMPTY | BUFFERED; + if (wreq->origin == NETFS_WRITEBACK || + wreq->origin == NETFS_WRITETHROUGH || + wreq->origin == NETFS_PGPRIV2_COPY_TO_CACHE) + notes = BUFFERED; else - notes = ALL_EMPTY; + notes = 0; /* Remove completed subrequests from the front of the streams and * advance the completion point on each stream. We stop when we hit * something that's in progress. The issuer thread may be adding stuff * to the tail whilst we're doing this. - * - * We must not, however, merge in discontiguities that span whole - * folios that aren't under writeback. This is made more complicated - * by the folios in the gap being of unpredictable sizes - if they even - * exist - but we don't want to look them up. */ for (s = 0; s < NR_IO_STREAMS; s++) { - loff_t rstart, rend; - stream = &wreq->io_streams[s]; /* Read active flag before list pointers */ if (!smp_load_acquire(&stream->active)) @@ -419,26 +412,10 @@ reassess_streams: //_debug("sreq [%x] %llx %zx/%zx", // front->debug_index, front->start, front->transferred, front->len); - /* Stall if there may be a discontinuity. */ - rstart = round_down(front->start, PAGE_SIZE); - if (rstart > wreq->contiguity) { - if (wreq->contiguity > stream->collected_to) { - trace_netfs_collect_gap(wreq, stream, - wreq->contiguity, 'D'); - stream->collected_to = wreq->contiguity; - } - notes |= REASSESS_DISCONTIG; - break; - } - rend = round_up(front->start + front->len, PAGE_SIZE); - if (rend > wreq->contiguity) { - trace_netfs_collect_contig(wreq, rend, - netfs_contig_trace_collect); - wreq->contiguity = rend; - if (notes & REASSESS_DISCONTIG) - notes |= NEED_REASSESS; + if (stream->collected_to < front->start) { + trace_netfs_collect_gap(wreq, stream, issued_to, 'F'); + stream->collected_to = front->start; } - notes &= ~MAYBE_DISCONTIG; /* Stall if the front is still undergoing I/O. */ if (test_bit(NETFS_SREQ_IN_PROGRESS, &front->flags)) { @@ -473,33 +450,27 @@ reassess_streams: cancel: /* Remove if completely consumed. */ - spin_lock(&wreq->lock); + spin_lock_bh(&wreq->lock); remove = front; list_del_init(&front->rreq_link); front = list_first_entry_or_null(&stream->subrequests, struct netfs_io_subrequest, rreq_link); stream->front = front; - if (!front) { - unsigned long long jump_to = atomic64_read(&wreq->issued_to); - - if (stream->collected_to < jump_to) { - trace_netfs_collect_gap(wreq, stream, jump_to, 'A'); - stream->collected_to = jump_to; - } - } - - spin_unlock(&wreq->lock); + spin_unlock_bh(&wreq->lock); netfs_put_subrequest(remove, false, notes & SAW_FAILURE ? netfs_sreq_trace_put_cancel : netfs_sreq_trace_put_done); } - if (front) - notes &= ~ALL_EMPTY; - else - notes |= SOME_EMPTY; + /* If we have an empty stream, we need to jump it forward + * otherwise the collection point will never advance. + */ + if (!front && issued_to > stream->collected_to) { + trace_netfs_collect_gap(wreq, stream, issued_to, 'E'); + stream->collected_to = issued_to; + } if (stream->collected_to < collected_to) collected_to = stream->collected_to; @@ -508,36 +479,6 @@ reassess_streams: if (collected_to != ULLONG_MAX && collected_to > wreq->collected_to) wreq->collected_to = collected_to; - /* If we have an empty stream, we need to jump it forward over any gap - * otherwise the collection point will never advance. - * - * Note that the issuer always adds to the stream with the lowest - * so-far submitted start, so if we see two consecutive subreqs in one - * stream with nothing between then in another stream, then the second - * stream has a gap that can be jumped. - */ - if (notes & SOME_EMPTY) { - unsigned long long jump_to = wreq->start + READ_ONCE(wreq->submitted); - - for (s = 0; s < NR_IO_STREAMS; s++) { - stream = &wreq->io_streams[s]; - if (stream->active && - stream->front && - stream->front->start < jump_to) - jump_to = stream->front->start; - } - - for (s = 0; s < NR_IO_STREAMS; s++) { - stream = &wreq->io_streams[s]; - if (stream->active && - !stream->front && - stream->collected_to < jump_to) { - trace_netfs_collect_gap(wreq, stream, jump_to, 'B'); - stream->collected_to = jump_to; - } - } - } - for (s = 0; s < NR_IO_STREAMS; s++) { stream = &wreq->io_streams[s]; if (stream->active) @@ -548,43 +489,14 @@ reassess_streams: /* Unlock any folios that we have now finished with. */ if (notes & BUFFERED) { - unsigned long long clean_to = min(wreq->collected_to, wreq->contiguity); - - if (wreq->cleaned_to < clean_to) - netfs_writeback_unlock_folios(wreq, clean_to, ¬es); + if (wreq->cleaned_to < wreq->collected_to) + netfs_writeback_unlock_folios(wreq, ¬es); } else { wreq->cleaned_to = wreq->collected_to; } // TODO: Discard encryption buffers - /* If all streams are discontiguous with the last folio we cleared, we - * may need to skip a set of folios. - */ - if ((notes & (MAYBE_DISCONTIG | ALL_EMPTY)) == MAYBE_DISCONTIG) { - unsigned long long jump_to = ULLONG_MAX; - - for (s = 0; s < NR_IO_STREAMS; s++) { - stream = &wreq->io_streams[s]; - if (stream->active && stream->front && - stream->front->start < jump_to) - jump_to = stream->front->start; - } - - trace_netfs_collect_contig(wreq, jump_to, netfs_contig_trace_jump); - wreq->contiguity = jump_to; - wreq->cleaned_to = jump_to; - wreq->collected_to = jump_to; - for (s = 0; s < NR_IO_STREAMS; s++) { - stream = &wreq->io_streams[s]; - if (stream->collected_to < jump_to) - stream->collected_to = jump_to; - } - //cond_resched(); - notes |= MADE_PROGRESS; - goto reassess_streams; - } - if (notes & NEED_RETRY) goto need_retry; if ((notes & MADE_PROGRESS) && test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) { diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c index 3f7e37e50c7d..04e66d587f77 100644 --- a/fs/netfs/write_issue.c +++ b/fs/netfs/write_issue.c @@ -95,7 +95,8 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping, struct netfs_io_request *wreq; struct netfs_inode *ictx; bool is_buffered = (origin == NETFS_WRITEBACK || - origin == NETFS_WRITETHROUGH); + origin == NETFS_WRITETHROUGH || + origin == NETFS_PGPRIV2_COPY_TO_CACHE); wreq = netfs_alloc_request(mapping, file, start, 0, origin); if (IS_ERR(wreq)) @@ -107,9 +108,7 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping, if (is_buffered && netfs_is_cache_enabled(ictx)) fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx)); - wreq->contiguity = wreq->start; wreq->cleaned_to = wreq->start; - INIT_WORK(&wreq->work, netfs_write_collection_worker); wreq->io_streams[0].stream_nr = 0; wreq->io_streams[0].source = NETFS_UPLOAD_TO_SERVER; @@ -158,22 +157,19 @@ static void netfs_prepare_write(struct netfs_io_request *wreq, subreq = netfs_alloc_subrequest(wreq); subreq->source = stream->source; subreq->start = start; - subreq->max_len = ULONG_MAX; - subreq->max_nr_segs = INT_MAX; subreq->stream_nr = stream->stream_nr; + subreq->io_iter = wreq->io_iter; _enter("R=%x[%x]", wreq->debug_id, subreq->debug_index); - trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index, - refcount_read(&subreq->ref), - netfs_sreq_trace_new); - trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); + stream->sreq_max_len = UINT_MAX; + stream->sreq_max_segs = INT_MAX; switch (stream->source) { case NETFS_UPLOAD_TO_SERVER: netfs_stat(&netfs_n_wh_upload); - subreq->max_len = wreq->wsize; + stream->sreq_max_len = wreq->wsize; break; case NETFS_WRITE_TO_CACHE: netfs_stat(&netfs_n_wh_write); @@ -192,7 +188,7 @@ static void netfs_prepare_write(struct netfs_io_request *wreq, * the list. The collector only goes nextwards and uses the lock to * remove entries off of the front. */ - spin_lock(&wreq->lock); + spin_lock_bh(&wreq->lock); list_add_tail(&subreq->rreq_link, &stream->subrequests); if (list_is_first(&subreq->rreq_link, &stream->subrequests)) { stream->front = subreq; @@ -203,7 +199,7 @@ static void netfs_prepare_write(struct netfs_io_request *wreq, } } - spin_unlock(&wreq->lock); + spin_unlock_bh(&wreq->lock); stream->construct = subreq; } @@ -223,41 +219,34 @@ static void netfs_do_issue_write(struct netfs_io_stream *stream, if (test_bit(NETFS_SREQ_FAILED, &subreq->flags)) return netfs_write_subrequest_terminated(subreq, subreq->error, false); - // TODO: Use encrypted buffer - if (test_bit(NETFS_RREQ_USE_IO_ITER, &wreq->flags)) { - subreq->io_iter = wreq->io_iter; - iov_iter_advance(&subreq->io_iter, - subreq->start + subreq->transferred - wreq->start); - iov_iter_truncate(&subreq->io_iter, - subreq->len - subreq->transferred); - } else { - iov_iter_xarray(&subreq->io_iter, ITER_SOURCE, &wreq->mapping->i_pages, - subreq->start + subreq->transferred, - subreq->len - subreq->transferred); - } - trace_netfs_sreq(subreq, netfs_sreq_trace_submit); stream->issue_write(subreq); } void netfs_reissue_write(struct netfs_io_stream *stream, - struct netfs_io_subrequest *subreq) + struct netfs_io_subrequest *subreq, + struct iov_iter *source) { + size_t size = subreq->len - subreq->transferred; + + // TODO: Use encrypted buffer + subreq->io_iter = *source; + iov_iter_advance(source, size); + iov_iter_truncate(&subreq->io_iter, size); + __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags); netfs_do_issue_write(stream, subreq); } -static void netfs_issue_write(struct netfs_io_request *wreq, - struct netfs_io_stream *stream) +void netfs_issue_write(struct netfs_io_request *wreq, + struct netfs_io_stream *stream) { struct netfs_io_subrequest *subreq = stream->construct; if (!subreq) return; stream->construct = NULL; - - if (subreq->start + subreq->len > wreq->start + wreq->submitted) - WRITE_ONCE(wreq->submitted, subreq->start + subreq->len - wreq->start); + subreq->io_iter.count = subreq->len; netfs_do_issue_write(stream, subreq); } @@ -290,13 +279,14 @@ int netfs_advance_write(struct netfs_io_request *wreq, netfs_prepare_write(wreq, stream, start); subreq = stream->construct; - part = min(subreq->max_len - subreq->len, len); - _debug("part %zx/%zx %zx/%zx", subreq->len, subreq->max_len, part, len); + part = umin(stream->sreq_max_len - subreq->len, len); + _debug("part %zx/%zx %zx/%zx", subreq->len, stream->sreq_max_len, part, len); subreq->len += part; subreq->nr_segs++; + stream->submit_extendable_to -= part; - if (subreq->len >= subreq->max_len || - subreq->nr_segs >= subreq->max_nr_segs || + if (subreq->len >= stream->sreq_max_len || + subreq->nr_segs >= stream->sreq_max_segs || to_eof) { netfs_issue_write(wreq, stream); subreq = NULL; @@ -410,19 +400,26 @@ static int netfs_write_folio(struct netfs_io_request *wreq, folio_unlock(folio); if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) { - if (!fscache_resources_valid(&wreq->cache_resources)) { + if (!cache->avail) { trace_netfs_folio(folio, netfs_folio_trace_cancel_copy); netfs_issue_write(wreq, upload); netfs_folio_written_back(folio); return 0; } trace_netfs_folio(folio, netfs_folio_trace_store_copy); + } else if (!upload->avail && !cache->avail) { + trace_netfs_folio(folio, netfs_folio_trace_cancel_store); + netfs_folio_written_back(folio); + return 0; } else if (!upload->construct) { trace_netfs_folio(folio, netfs_folio_trace_store); } else { trace_netfs_folio(folio, netfs_folio_trace_store_plus); } + /* Attach the folio to the rolling buffer. */ + netfs_buffer_append_folio(wreq, folio, false); + /* Move the submission point forward to allow for write-streaming data * not starting at the front of the page. We don't do write-streaming * with the cache as the cache requires DIO alignment. @@ -432,7 +429,6 @@ static int netfs_write_folio(struct netfs_io_request *wreq, */ for (int s = 0; s < NR_IO_STREAMS; s++) { stream = &wreq->io_streams[s]; - stream->submit_max_len = fsize; stream->submit_off = foff; stream->submit_len = flen; if ((stream->source == NETFS_WRITE_TO_CACHE && streamw) || @@ -440,7 +436,6 @@ static int netfs_write_folio(struct netfs_io_request *wreq, fgroup == NETFS_FOLIO_COPY_TO_CACHE)) { stream->submit_off = UINT_MAX; stream->submit_len = 0; - stream->submit_max_len = 0; } } @@ -467,12 +462,13 @@ static int netfs_write_folio(struct netfs_io_request *wreq, if (choose_s < 0) break; stream = &wreq->io_streams[choose_s]; + wreq->io_iter.iov_offset = stream->submit_off; + atomic64_set(&wreq->issued_to, fpos + stream->submit_off); + stream->submit_extendable_to = fsize - stream->submit_off; part = netfs_advance_write(wreq, stream, fpos + stream->submit_off, stream->submit_len, to_eof); - atomic64_set(&wreq->issued_to, fpos + stream->submit_off); stream->submit_off += part; - stream->submit_max_len -= part; if (part > stream->submit_len) stream->submit_len = 0; else @@ -481,6 +477,8 @@ static int netfs_write_folio(struct netfs_io_request *wreq, debug = true; } + wreq->io_iter.iov_offset = 0; + iov_iter_advance(&wreq->io_iter, fsize); atomic64_set(&wreq->issued_to, fpos + fsize); if (!debug) @@ -505,10 +503,14 @@ int netfs_writepages(struct address_space *mapping, struct folio *folio; int error = 0; - if (wbc->sync_mode == WB_SYNC_ALL) + if (!mutex_trylock(&ictx->wb_lock)) { + if (wbc->sync_mode == WB_SYNC_NONE) { + netfs_stat(&netfs_n_wb_lock_skip); + return 0; + } + netfs_stat(&netfs_n_wb_lock_wait); mutex_lock(&ictx->wb_lock); - else if (!mutex_trylock(&ictx->wb_lock)) - return 0; + } /* Need the first folio to be able to set up the op. */ folio = writeback_iter(mapping, wbc, NULL, &error); @@ -525,10 +527,10 @@ int netfs_writepages(struct address_space *mapping, netfs_stat(&netfs_n_wh_writepages); do { - _debug("wbiter %lx %llx", folio->index, wreq->start + wreq->submitted); + _debug("wbiter %lx %llx", folio->index, atomic64_read(&wreq->issued_to)); /* It appears we don't have to handle cyclic writeback wrapping. */ - WARN_ON_ONCE(wreq && folio_pos(folio) < wreq->start + wreq->submitted); + WARN_ON_ONCE(wreq && folio_pos(folio) < atomic64_read(&wreq->issued_to)); if (netfs_folio_group(folio) != NETFS_FOLIO_COPY_TO_CACHE && unlikely(!test_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags))) { @@ -672,6 +674,7 @@ int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t part = netfs_advance_write(wreq, upload, start, len, false); start += part; len -= part; + iov_iter_advance(&wreq->io_iter, part); if (test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) { trace_netfs_rreq(wreq, netfs_rreq_trace_wait_pause); wait_on_bit(&wreq->flags, NETFS_RREQ_PAUSE, TASK_UNINTERRUPTIBLE); |