Commit message | Author | Age | Files | Lines
* SUNRPC: remove scheduling boost for "SWAPPER" tasks. | NeilBrown | 2022-03-13 | 2 | -18/+0
    Currently, tasks marked as "swapper" tasks get put to the front of non-priority rpc_queues, and are sorted earlier than non-swapper tasks on the transport's ->xmit_queue.

    This is pointless as currently *all* tasks for a mount that has swap enabled on *any* file are marked as "swapper" tasks. So the net result is that the non-priority rpc_queues are reverse-ordered (LIFO).

    This scheduling boost is not necessary to avoid deadlocks, and hurts fairness, so remove it. If there were a need to expedite some requests, the tk_priority mechanism is a more appropriate tool.

    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
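
The LIFO effect described in this commit message can be reproduced with a small userspace model. This is only a sketch: `enqueue` and `drain_order` are hypothetical names, and the deque stands in for a non-priority rpc_queue rather than reproducing the kernel data structure.

```python
from collections import deque


def enqueue(queue, task, swapper):
    """Model of the removed boost: swapper tasks jump to the head of the queue."""
    if swapper:
        queue.appendleft(task)
    else:
        queue.append(task)


def drain_order(tasks, all_swapper):
    """Return the order in which queued tasks would be serviced."""
    queue = deque()
    for task in tasks:
        enqueue(queue, task, swapper=all_swapper)
    return list(queue)
```

When every task carries the swapper mark, `drain_order(["a", "b", "c"], all_swapper=True)` services the newest task first, which is the reverse ordering the message objects to.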
* SUNRPC/xprt: async tasks mustn't block waiting for memory | NeilBrown | 2022-03-13 | 2 | -2/+5
    When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory.

    xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY.

    The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry.

    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
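
The distinction between the two error codes can be summarised in a few lines. The sketch below is a userspace model, not the kernel's call_reserveresult(): the function name and the returned labels are hypothetical, and only the -ENOMEM and -EAGAIN branches come from the commit message.

```python
import errno


def reserve_result_action(status):
    """Model how a slot-reservation status is handled, per the commit message."""
    if status == 0:
        return "proceed"            # assumption: success moves the task on
    if status == -errno.ENOMEM:
        return "sleep-then-retry"   # only -ENOMEM causes a sleep
    if status == -errno.EAGAIN:
        return "retry-immediately"  # -EAGAIN is retried without sleeping
    return "fail"                   # assumption: other errors end the task
```

This is why returning -EAGAIN from a non-blocking allocator does not give memory pressure any time to ease.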
* SUNRPC/auth: async tasks mustn't block waiting for memory | NeilBrown | 2022-03-13 | 5 | -4/+22
    When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory.

    mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything.

    lookup_cred() can block on a mempool or kmalloc - and this can cause deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't block on memory. If the -ENOMEM gets back to call_refreshresult(), wait a short while and try again. HZ>>4 is chosen as it is used elsewhere for -ENOMEM retries.

    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
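
For a sense of scale, HZ>>4 is one sixteenth of a second expressed in jiffies, rounded down. The helper below is a hypothetical illustration of that arithmetic; the HZ values passed to it are common kernel configurations chosen as examples, not values taken from the commit.

```python
def enomem_retry_delay(hz):
    """Return the HZ>>4 delay as (jiffies, milliseconds) for a given tick rate."""
    jiffies = hz >> 4                  # integer shift, i.e. hz // 16
    milliseconds = jiffies * 1000 / hz
    return jiffies, milliseconds
```

With HZ=1000 this gives 62 jiffies (62 ms); with HZ=250 it gives 15 jiffies (60 ms), so the retry delay stays close to 60 ms regardless of the tick rate.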
* SUNRPC/call_alloc: async tasks mustn't block waiting for memory | NeilBrown | 2022-03-13 | 2 | -2/+6
    When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory.

    mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything.

    rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable.

    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
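
The rule "async callers must not block on the mempool" can be sketched with a toy pool. Everything here is a userspace model under stated assumptions: `Mempool` and `alloc_rpc_buffer` are hypothetical names, and a blocking wait is represented by an exception so the example cannot hang.

```python
class Mempool:
    """Toy mempool: a fixed reserve of preallocated elements."""

    def __init__(self, reserve):
        self.free = reserve

    def alloc(self, may_block):
        if self.free > 0:
            self.free -= 1
            return object()
        if may_block:
            # A real blocking caller would sleep until another task frees an
            # element; the model raises instead of hanging.
            raise RuntimeError("would block waiting for the mempool")
        return None  # non-blocking caller sees a failure and can retry later

    def release(self):
        self.free += 1


def alloc_rpc_buffer(pool, is_async):
    """Only synchronous callers are allowed to wait for memory."""
    return pool.alloc(may_block=not is_async)
```

An async task that receives `None` gives its worker thread back, which is what lets some other task run and eventually call `release()`.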
* NFS: remove IS_SWAPFILE hack | NeilBrown | 2022-03-13 | 1 | -5/+0
    This code is pointless as IS_SWAPFILE is always defined. So remove it.

    Suggested-by: Mark Hemment <markhemm@googlemail.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Remove remaining dfprintks related to fscache and remove NFSDBG_FSCACHE | Dave Wysochanski | 2022-03-13 | 2 | -11/+1
    The fscache cookie APIs including fscache_acquire_cookie() and fscache_relinquish_cookie() now have very good tracing. Thus, there is no real need for dfprintks in the NFS fscache interface. The NFS fscache interface has removed all dfprintks so remove the NFSDBG_FSCACHE defines.

    Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Replace dfprintks with tracepoints in fscache read and write page functions | Dave Wysochanski | 2022-03-13 | 2 | -18/+102
    Most of fscache and other NFS IO paths are now using tracepoints. Remove the dfprintks in the NFS fscache read/write page functions and replace them with tracepoints at the beginning and end of the functions.

    Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Rename fscache read and write pages functions | Dave Wysochanski | 2022-03-13 | 3 | -22/+15
    Rename NFS fscache functions in a more consistent fashion to better reflect when we read from and write to fscache.

    Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Cleanup usage of nfs_inode in fscache interface | Dave Wysochanski | 2022-03-13 | 2 | -15/+13
    A number of places in the fscache interface used nfs_inode when inode could be used, simplifying the code.

    Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1 restrict GETATTR fs_location query to the main transport | Olga Kornievskaia | 2022-03-13 | 1 | -2/+13
    In the presence of trunking transports, it's helpful to make sure that during the migration event, the GETATTR for fs_location attribute happens on the main transport.

    Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: remove unneeded check in decode_devicenotify_args() | Alexey Khoroshilov | 2022-03-13 | 1 | -4/+0
    Overflow check is not needed anymore after we switch to kmalloc_array().

    Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
    Fixes: a4f743a6bb20 ("NFSv4.1: Convert open-coded array allocation calls to kmalloc_array()")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
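
The open-coded check became redundant because an array allocator of this kind refuses a request whose total size would overflow. The sketch below models that guard in userspace; `kmalloc_array_model` is a hypothetical name and the 64-bit `SIZE_MAX` is an assumption.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def kmalloc_array_model(n, size):
    """Return the byte count to allocate, or None if n * size would overflow."""
    if size != 0 and n > SIZE_MAX // size:
        return None  # the real allocator would return NULL here
    return n * size
```

A caller that already handles a failed allocation therefore needs no separate multiplication check of its own.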
* NFS: Cache all entries in the readdirplus reply | Trond Myklebust | 2022-03-02 | 1 | -14/+26
    Even if we're not able to cache all the entries in the readdir buffer, let's ensure that we do prime the dcache.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Optimise away the previous cookie field | Trond Myklebust | 2022-03-02 | 5 | -17/+15
    Replace the 'previous cookie' field in struct nfs_entry with the array->last_cookie.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Fix up forced readdirplus | Trond Myklebust | 2022-03-02 | 3 | -17/+41
    Avoid clearing the entire readdir page cache if we're just doing forced readdirplus for the 'ls -l' heuristic.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Convert readdir page cache to use a cookie based index | Trond Myklebust | 2022-03-02 | 3 | -86/+69
    Instead of using a linear index to address the pages, use the cookie of the first entry, since that is what we use to match the page anyway.

    This allows us to avoid re-reading the entire cache on a seekdir() type of operation. The latter is very common when re-exporting NFS, and is a major performance drain.

    The change does affect our duplicate cookie detection, since we can no longer rely on the page index as a linear offset for detecting whether we looped backwards. However since we no longer do a linear search through all the pages on each call to nfs_readdir(), this is less of a concern than it was previously.

    The other downside is that invalidate_mapping_pages() no longer can use the page index to avoid clearing pages that have been read. A subsequent patch will restore the functionality this provides to the 'ls -l' heuristic.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
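
The benefit for seekdir()-style access can be seen by comparing the two addressing schemes in a toy model. This is a sketch only: both classes are hypothetical, a Python dict stands in for however the kernel derives a page index from the cookie, and each `find` also reports how many pages it had to examine.

```python
class LinearReaddirCache:
    """Pages addressed by position; finding a cookie means walking from page 0."""

    def __init__(self, pages):
        self.pages = pages  # list of (first_cookie, entries)

    def find(self, cookie):
        for examined, (first_cookie, entries) in enumerate(self.pages, start=1):
            if first_cookie == cookie:
                return entries, examined
        return None, len(self.pages)


class CookieReaddirCache:
    """Pages addressed by the cookie of their first entry."""

    def __init__(self, pages):
        self.by_cookie = {first_cookie: entries for first_cookie, entries in pages}

    def find(self, cookie):
        return self.by_cookie.get(cookie), 1
```

For a cache of three pages whose last page starts at cookie 42, the linear variant examines all three pages to reach it, while the cookie-keyed variant examines one.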
* NFS: Clean up page array initialisation/free | Trond Myklebust | 2022-03-02 | 1 | -10/+6
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Trace effects of the readdirplus heuristic | Trond Myklebust | 2022-03-02 | 2 | -1/+60
    Enable tracking of when the readdirplus heuristic causes a page cache invalidation.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Trace effects of readdirplus on the dcache | Trond Myklebust | 2022-03-02 | 2 | -0/+8
    Trace the effects of readdirplus on attribute and dentry revalidation.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Add basic readdir tracing | Trond Myklebust | 2022-03-02 | 2 | -1/+80
    Add tracing to track how often the client goes to the server for updated readdir information.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Don't request readdirplus when revalidation was forced | Trond Myklebust | 2022-03-02 | 1 | -10/+16
    If the revalidation was forced, due to the presence of a LOOKUP_EXCL or a LOOKUP_REVAL flag, then readdirplus won't help. It also can't help when we're doing a path component lookup.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Readdirplus can't help lookup for case insensitive filesystems | Trond Myklebust | 2022-03-02 | 1 | -0/+2
    If the filesystem is case insensitive, then readdirplus can't help with cache misses, since it won't return case folded variants of the filename.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: Ask for a full XDR buffer of readdir goodness | Trond Myklebust | 2022-03-02 | 2 | -6/+7
    Instead of pretending that we know the ratio of directory info vs readdirplus attribute info, just set the 'dircount' field to the same value as the 'maxcount' field.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Don't ask for readdirplus unless it can help nfs_getattr() | Trond Myklebust | 2022-03-02 | 1 | -20/+25
    If attribute caching is turned off, then use of readdirplus is not going to help stat() performance. Readdirplus also doesn't help if a file is being written to, since we will have to flush those writes in order to sync the mtime/ctime.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Improve heuristic for readdirplus | Trond Myklebust | 2022-03-02 | 5 | -36/+58
    The heuristic for readdirplus is designed to try to detect 'ls -l' and similar patterns. It does so by looking for cache hit/miss patterns in both the attribute cache and in the dcache of the files in a given directory, and then sets a flag for the readdirplus code to interpret.

    The problem with this approach is that a single attribute or dcache miss can cause the NFS code to force a refresh of the attributes for the entire set of files contained in the directory.

    To be able to make a more nuanced decision, let's sample the number of hits and misses in the set of open directory descriptors. That allows us to set thresholds at which we start preferring READDIRPLUS over regular READDIR, or at which we start to force a re-read of the remaining readdir cache using READDIRPLUS.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
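
A threshold-based decision of the kind described could look roughly like the sketch below. It is a userspace model, not the patch itself: the `DirStats` type, the function name, the comparison rule, and both threshold values are hypothetical and chosen only to show how sampled counters replace a single-miss trigger.

```python
from dataclasses import dataclass

PREFER_THRESHOLD = 12   # hypothetical: misses before READDIRPLUS is preferred
FORCE_THRESHOLD = 128   # hypothetical: misses before a forced re-read


@dataclass
class DirStats:
    """Hit/miss counters sampled across a directory's open descriptors."""
    cache_hits: int = 0
    cache_misses: int = 0


def readdir_mode(stats):
    """Choose a readdir strategy from sampled counters rather than one miss."""
    if stats.cache_misses >= FORCE_THRESHOLD:
        return "force-readdirplus"
    if (stats.cache_misses >= PREFER_THRESHOLD
            and stats.cache_misses > stats.cache_hits):
        return "prefer-readdirplus"
    return "readdir"
```

Under these assumed thresholds a couple of stray misses leave the directory on plain READDIR, which is the behaviour the old flag-based heuristic could not provide.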
* NFS: Reduce use of uncached readdir | Trond Myklebust | 2022-03-02 | 1 | -20/+3
    When reading a very large directory, we want to try to keep the page cache up to date if doing so is inexpensive. With the change to allow readdir to continue reading even when the cache is incomplete, we no longer need to fall back to uncached readdir in order to scale to large directories.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Simplify nfs_readdir_xdr_to_array() | Trond Myklebust | 2022-03-02 | 1 | -18/+11
    Recent changes to readdir mean that we can cope with partially filled page cache entries, so we no longer need to rely on looping in nfs_readdir_xdr_to_array().

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: If the cookie verifier changes, we must invalidate the page cache | Trond Myklebust | 2022-03-02 | 1 | -1/+6
    Ensure that if the cookie verifier changes when we use the zero-valued cookie, then we invalidate any cached pages.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Adjust the amount of readahead performed by NFS readdir | Trond Myklebust | 2022-03-02 | 2 | -1/+53
    The current NFS readdir code will always try to maximise the amount of readahead it performs on the assumption that we can cache anything that isn't immediately read by the process. There are several cases where this assumption breaks down, including when the 'ls -l' heuristic kicks in to try to force use of readdirplus as a batch replacement for lookup/getattr.

    This patch therefore tries to tone down the amount of readahead we perform, and adjust it to try to match the amount of data being requested by user space.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Don't advance the page pointer unless the page is full | Trond Myklebust | 2022-03-02 | 1 | -10/+22
    When we hit the end of the data in the readdir page, we don't want to start filling a new page, unless this one is full.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Don't re-read the entire page cache to find the next cookie | Trond Myklebust | 2022-03-02 | 2 | -3/+8
    If the page cache entry that was last read gets invalidated for some reason, then make sure we can re-create it on the next call to readdir. This, combined with the cache page validation, allows us to reuse the cached value of page-index on successive calls to nfs_readdir.

    Credit is due to Benjamin Coddington for showing that the concept works, and that it allows for improved cache sharing between processes even in the case where pages are lost due to LRU or active invalidation.

    Suggested-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Store the change attribute in the directory page cache | Trond Myklebust | 2022-03-02 | 1 | -31/+37
    Use the change attribute and the first cookie in a directory page cache entry to validate that the page is up to date.

    Suggested-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
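
The validation rule stated here amounts to a two-field comparison. The sketch below is a userspace model with hypothetical names (`CachedDirPage`, `page_is_valid`); it only illustrates that a cached page is trusted when both its first cookie and its recorded change attribute match what the reader expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedDirPage:
    first_cookie: int   # cookie of the first entry stored in the page
    change_attr: int    # directory change attribute when the page was filled
    entries: tuple


def page_is_valid(page, wanted_cookie, dir_change_attr):
    """A page is usable only if it starts where we want and the directory is unchanged."""
    return (page.first_cookie == wanted_cookie
            and page.change_attr == dir_change_attr)
```

A page filled before the directory changed fails the second comparison and would be refilled rather than served stale.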
* NFS: Calculate page offsets algorithmically | Trond Myklebust | 2022-02-28 | 1 | -5/+13
    Instead of relying on counting the page offsets as we walk through the page cache, switch to calculating them algorithmically.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context | Trond Myklebust | 2022-02-28 | 1 | -7/+4
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Initialise the readdir verifier as best we can in nfs_opendir() | Trond Myklebust | 2022-02-28 | 1 | -0/+1
    For the purpose of ensuring that opendir() followed by seekdir() work as correctly as possible, try to initialise the readdir verifier in nfs_opendir().

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Trace lookup revalidation failure | Trond Myklebust | 2022-02-28 | 1 | -12/+5
    Enable tracing of lookup revalidation failures.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: constify nfs_server_capable() and nfs_have_writebacks() | Trond Myklebust | 2022-02-28 | 1 | -4/+3
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Return valid errors from nfs2/3_decode_dirent() | Trond Myklebust | 2022-02-28 | 2 | -16/+7
    Valid return values for decode_dirent() callback functions are:
      0: Success
      -EBADCOOKIE: End of directory
      -EAGAIN: End of xdr_stream

    All errors need to map into one of those three values.

    Fixes: 573c4e1ef53a ("NFS: Simplify ->decode_dirent() calling sequence")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
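
The contract above can be expressed as a small normalising step. This is a sketch under assumptions: `normalise_decode_status` is a hypothetical helper, folding every other error into -EAGAIN is an illustrative choice rather than something the commit message specifies, and `EBADCOOKIE` is defined locally with the kernel-internal value 523 because Python's errno module does not provide it.

```python
import errno

EBADCOOKIE = 523  # kernel-internal errno value, defined here for the model

VALID_STATUSES = (0, -EBADCOOKIE, -errno.EAGAIN)


def normalise_decode_status(status):
    """Map any decode result onto the three values the caller understands."""
    if status in VALID_STATUSES:
        return status
    # Assumption: treat any other failure like running out of stream data.
    return -errno.EAGAIN
```

Callers can then switch on exactly three outcomes without a default branch for unexpected error codes.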
* Revert "NFSv4: use unique client identifiers in network namespaces" | Trond Myklebust | 2022-02-28 | 1 | -14/+0
    This reverts commit 50c790a0b69bdc420f00f30bdf348d6c90194c78.

    The functionality is believed to be capable of causing regressions in existing setups, so the author has requested that it be reverted.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Use of mapping_set_error() results in spurious errors | Trond Myklebust | 2022-02-26 | 1 | -1/+4
    The use of mapping_set_error() in conjunction with calls to filemap_check_errors() is problematic because every error gets reported as either an EIO or an ENOSPC by filemap_check_errors() in functions such as filemap_write_and_wait() or filemap_write_and_wait_range(). In almost all cases, we prefer to use the more nuanced wb errors.

    Fixes: b8946d7bfb94 ("NFS: Revalidate the file mapping on all fatal writeback errors")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
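
The loss of detail can be illustrated with a two-flag model of the legacy error bits. This is a userspace sketch that borrows the kernel function names for readability; the `Mapping` class is hypothetical, and the model keeps only the two flags and ignores the separate writeback error tracking that the message recommends instead.

```python
import errno


class Mapping:
    """Toy address space holding only the two legacy error flags."""

    def __init__(self):
        self.as_eio = False
        self.as_enospc = False


def mapping_set_error(mapping, error):
    """Record an error, keeping only 'no space' versus 'everything else'."""
    if error == 0:
        return
    if error == -errno.ENOSPC:
        mapping.as_enospc = True
    else:
        mapping.as_eio = True


def filemap_check_errors(mapping):
    """Report and clear the flags; only -ENOSPC or -EIO can come back."""
    ret = 0
    if mapping.as_enospc:
        mapping.as_enospc = False
        ret = -errno.ENOSPC
    if mapping.as_eio:
        mapping.as_eio = False
        ret = -errno.EIO
    return ret
```

Recording, for example, -EFBIG and then checking the mapping yields -EIO, so the caller never learns the original cause.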
* NFS: Clean up NFSv4.2 xattrs | Trond Myklebust | 2022-02-26 | 3 | -12/+18
    Add a helper for the xattr mask so that we can get rid of the inlined ifdefs.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Remove unnecessary XATTR cache invalidation in nfs_fhget() | Trond Myklebust | 2022-02-26 | 1 | -2/+0
    We should never expect the 'xattr_cache' to be non-null in that case, hence nfs_set_cache_invalid() is just going to optimise it away.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: NFSv2/v3 clients should never be setting NFS_CAP_XATTR | Trond Myklebust | 2022-02-26 | 2 | -0/+2
    Ensure that we always initialise the 'xattr_support' field in struct nfs_fsinfo, so that nfs_server_set_fsinfo() doesn't declare our NFSv2/v3 client to be capable of supporting the NFSv4.2 xattr protocol by setting the NFS_CAP_XATTR capability.

    This configuration can cause nfs_do_access() to set access mode bits that are unsupported by the NFSv3 ACCESS call, which may confuse spec-compliant servers.

    Reported-by: Olga Kornievskaia <kolga@netapp.com>
    Fixes: b78ef845c35d ("NFSv4.2: query the server for extended attribute support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Remove unused flag NFS_INO_REVAL_PAGECACHE | Trond Myklebust | 2022-02-26 | 3 | -5/+2
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Replace last uses of NFS_INO_REVAL_PAGECACHE | Trond Myklebust | 2022-02-26 | 3 | -19/+15
    Now that we have more fine grained attribute revalidation, let's just get rid of NFS_INO_REVAL_PAGECACHE.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: use unique client identifiers in network namespaces | Benjamin Coddington | 2022-02-26 | 1 | -0/+14
    In order to differentiate client state, assign a random uuid to the uniquifying portion of the client identifier when a network namespace is created. Containers may still override this value if they wish to maintain stable client identifiers by writing to /sys/fs/nfs/net/client/identifier, either by udev rules or other means.

    Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1 support for NFS4_RESULT_PRESERVE_UNLINKED | Olga Kornievskaia | 2022-02-26 | 4 | -2/+12
    In 4.1+, the server is allowed to set a flag NFS4_RESULT_PRESERVE_UNLINKED in reply to the OPEN, that tells the client that it does not need to do a silly rename of an opened file when it's being removed.

    Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC/xprtrdma: Convert GFP_NOFS to GFP_KERNEL | Trond Myklebust | 2022-02-26 | 2 | -3/+3
    Assume that the upper layers have set memalloc_nofs_save/restore as appropriate.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC/auth_gss: Convert GFP_NOFS to GFP_KERNEL | Trond Myklebust | 2022-02-26 | 4 | -19/+19
    Assume that the upper layers have set memalloc_nofs_save/restore as appropriate.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Convert GFP_NOFS to GFP_KERNEL | Trond Myklebust | 2022-02-26 | 5 | -7/+7
    The sections which should not re-enter the filesystem are already protected with memalloc_nofs_save/restore calls, so it is better to use GFP_KERNEL in these calls to allow better performance for synchronous RPC calls.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
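
The reasoning behind these GFP_NOFS conversions is that a scoped "no filesystem re-entry" marker already masks the filesystem bit out of whatever flags an allocation passes. The sketch below models that in userspace: the flag bit values are illustrative, the depth counter is a simplification of the per-task state, and all function names are hypothetical.

```python
from contextlib import contextmanager

# Illustrative bit values only; the real flag layout differs.
GFP_IO = 0x1
GFP_FS = 0x2
GFP_RECLAIM = 0x4
GFP_KERNEL = GFP_RECLAIM | GFP_IO | GFP_FS
GFP_NOFS = GFP_RECLAIM | GFP_IO

_nofs_depth = 0  # stands in for the per-task "NOFS scope active" state


@contextmanager
def memalloc_nofs():
    """Model of a memalloc_nofs_save()/memalloc_nofs_restore() pair."""
    global _nofs_depth
    _nofs_depth += 1
    try:
        yield
    finally:
        _nofs_depth -= 1


def effective_gfp(flags):
    """Inside a NOFS scope the filesystem bit is stripped from any request."""
    return flags & ~GFP_FS if _nofs_depth else flags


def gfp_inside_nofs_scope(flags):
    """Convenience helper: evaluate the effective flags within a NOFS scope."""
    with memalloc_nofs():
        return effective_gfp(flags)
```

In this model a GFP_KERNEL request made inside the scope behaves like GFP_NOFS, while the same request outside the scope keeps the filesystem bit, which is the extra freedom synchronous RPC calls gain.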
* NFSv4.2/copyoffload: Convert GFP_NOFS to GFP_KERNEL | Trond Myklebust | 2022-02-26 | 3 | -8/+8
    There doesn't seem to be any reason why the copy offload code can't use GFP_KERNEL. It can't get called by direct reclaim.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>