summaryrefslogtreecommitdiffstats
path: root/fs/nfs (follow)
Commit message (Collapse)AuthorAgeFilesLines
...
| * NFS: Reduce readdir stack usageTrond Myklebust2020-12-021-25/+33
| | | | | | | | | | | | | | | | | | | | The descriptor and the struct nfs_entry are both large structures, so don't allocate them from the stack. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: nfs_do_filldir() does not return a valueTrond Myklebust2020-12-021-14/+7
| | | | | | | | | | | | | | | | | | Clean up nfs_do_filldir(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: More readdir cleanupsTrond Myklebust2020-12-021-14/+11
| | | | | | | | | | | | | | | | | | | | | | Remove the redundant caching of the credential in struct nfs_open_dir_context. Pass the buffer size as an argument to nfs_readdir_xdr_filler(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Support larger readdir buffersTrond Myklebust2020-12-023-22/+21
| | | | | | | | | | | | | | | | | | | | Support readdir buffers of up to 1MB in size so that we can read large directories using few RPC calls. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Simplify struct nfs_cache_array_entryTrond Myklebust2020-12-021-21/+25
| | | | | | | | | | | | | | | | | | | | We don't need to store a hash, so replace struct qstr with a simple const char pointer and length. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Replace kmap() with kmap_atomic() in nfs_readdir_search_array()Trond Myklebust2020-12-021-2/+2
| | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Remove unnecessary kmap in nfs_readdir_xdr_to_array()Trond Myklebust2020-12-021-7/+3
| | | | | | | | | | | | | | | | | | | | The kmapped pointer is only used once per loop to check if we need to exit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Don't discard readdir resultsTrond Myklebust2020-12-021-4/+42
| | | | | | | | | | | | | | | | | | | | | | If a readdir call returns more data than we can fit into one page cache page, then allocate a new one for that data rather than discarding the data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Clean up directory array handlingTrond Myklebust2020-12-021-61/+77
| | | | | | | | | | | | | | | | | | | | Refactor to use pagecache_get_page() so that we can fill the page in multiple stages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Clean up nfs_readdir_page_filler()Trond Myklebust2020-12-021-21/+18
| | | | | | | | | | | | | | | | | | | | Clean up handling of the case where there are no entries in the readdir reply. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Clean up readdir struct nfs_cache_arrayTrond Myklebust2020-12-021-17/+49
| | | | | | | | | | | | | | | | | | | | Since the 'eof_index' is only ever used as a flag, make it so. Also add a flag to detect if the page has been completely filled. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Ensure contents of struct nfs_open_dir_context are consistentTrond Myklebust2020-12-021-29/+43
| | | | | | | | | | | | | | | | | | | | | | Ensure that the contents of struct nfs_open_dir_context are consistent by setting them under the file->f_lock from a private copy (that is known to be consistent). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFSv4.2: condition READDIR's mask for security label based on LSM stateOlga Kornievskaia2020-12-021-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the client will always ask for security_labels if the server returns that it supports that feature regardless of any LSM modules (such as Selinux) enforcing security policy. This adds performance penalty to the READDIR operation. Client adjusts superblock's support of the security_label based on the server's support but also current client's configuration of the LSM modules. Thus, prior to using the default bitmask in READDIR, this patch checks the server's capabilities and then instructs READDIR to remove FATTR4_WORD2_SECURITY_LABEL from the bitmask. v5: fixing silly mistakes of the rushed v4 v4: simplifying logic v3: changing label's initialization per Ondrej's comment v2: dropping selinux hook and using the sb cap. Suggested-by: Ondrej Mosnacek <omosnace@redhat.com> Suggested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Fixes: 2b0143b5c986 ("VFS: normal filesystems (and lustre): d_inode() annotations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookuppTrond Myklebust2020-12-021-1/+5
| | | | | | | | | | | | | | We need to respect the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp, by timing out if the server is unavailable. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv3: Add emulation of the lookupp() operationTrond Myklebust2020-12-021-0/+15
| | | | | | | | | | | | | | | | In order to use the open_by_filehandle() operations on NFSv3, we need to be able to emulate lookupp() so that nfs_get_parent() can be used to convert disconnected dentries into connected ones. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv3: Refactor nfs3_proc_lookup() to split out the dentryTrond Myklebust2020-12-021-11/+22
| | | | | | | | | | | | | | We want to reuse the lookup code in NFSv3 in order to emulate the NFSv4 lookupp operation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: Fix 5 seconds delay when doing inter server copyDai Ngo2020-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit b4868b44c5628 ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5 seconds delay regardless of the size of the copy. The delay is from nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential fails because the seqid in both nfs4_state and nfs4_stateid are 0. Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state, until after the call to update_open_stateid, to indicate this is the 1st open. This fix is part of a 2 patches, the other patch is the fix in the source server to return the stateid for COPY_NOTIFY request with seqid 1 instead of 0. Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operationChuck Lever2020-12-022-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By switching to an XFS-backed export, I am able to reproduce the ibcomp worker crash on my client with xfstests generic/013. For the failing LISTXATTRS operation, xdr_inline_pages() is called with page_len=12 and buflen=128. - When ->send_request() is called, rpcrdma_marshal_req() does not set up a Reply chunk because buflen is smaller than the inline threshold. Thus rpcrdma_convert_iovs() does not get invoked at all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked on the receive buffer. - During reply processing, rpcrdma_inline_fixup() tries to copy received data into rq_rcv_buf->pages because page_len is positive. But there are no receive pages because rpcrdma_marshal_req() never allocated them. The result is that the ibcomp worker faults and dies. Sometimes that causes a visible crash, and sometimes it results in a transport hang without other symptoms. RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and should eventually be fixed or replaced. However, my preference is that upper-layer operations should explicitly allocate their receive buffers (using GFP_KERNEL) when possible, rather than relying on XDRBUF_SPARSE_PAGES. Reported-by: Olga kornievskaia <kolga@netapp.com> Suggested-by: Olga kornievskaia <kolga@netapp.com> Fixes: c10a75145feb ("NFSv4.2: add the extended attribute proc functions.") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga kornievskaia <kolga@netapp.com> Reviewed-by: Frank van der Linden <fllinden@amazon.com> Tested-by: Olga kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2020-12-161-0/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge yet more updates from Andrew Morton: - lots of little subsystems - a few post-linux-next MM material. Most of the rest awaits more merging of other trees. Subsystems affected by this series: alpha, procfs, misc, core-kernel, bitmap, lib, lz4, checkpatch, nilfs, kdump, rapidio, gcov, bfs, relay, resource, ubsan, reboot, fault-injection, lzo, apparmor, and mm (swap, memory-hotplug, pagemap, cleanups, and gup). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (86 commits) mm: fix some spelling mistakes in comments mm: simplify follow_pte{,pmd} mm: unexport follow_pte_pmd apparmor: remove duplicate macro list_entry_is_head() lib/lzo/lzo1x_compress.c: make lzogeneric1x_1_compress() static fault-injection: handle EI_ETYPE_TRUE reboot: hide from sysfs not applicable settings reboot: allow to override reboot type if quirks are found reboot: remove cf9_safe from allowed types and rename cf9_force reboot: allow to specify reboot mode via sysfs reboot: refactor and comment the cpu selection code lib/ubsan.c: mark type_check_kinds with static keyword kcov: don't instrument with UBSAN ubsan: expand tests and reporting ubsan: remove UBSAN_MISC in favor of individual options ubsan: enable for all*config builds ubsan: disable UBSAN_TRAP for all*config ubsan: disable object-size sanitizer under GCC ubsan: move cc-option tests into Kconfig ubsan: remove redundant -Wno-maybe-uninitialized ...
| * | kernel.h: split out mathematical helpersAndy Shevchenko2020-12-161-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out mathematical helpers. At the same time convert users in header and lib folder to use new header. Though for time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [sfr@canb.auug.org.au: fix powerpc build] Link: https://lkml.kernel.org/r/20201029150809.13059608@canb.auug.org.au Link: https://lkml.kernel.org/r/20201028173212.41768-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds2020-12-1610-12/+13
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Chuck Lever: "Several substantial changes this time around: - Previously, exporting an NFS mount via NFSD was considered to be an unsupported feature. With v5.11, the community has attempted to make re-exporting a first-class feature of NFSD. This would enable the Linux in-kernel NFS server to be used as an intermediate cache for a remotely-located primary NFS server, for example, even with other NFS server implementations, like a NetApp filer, as the primary. - A short series of patches brings support for multiple RPC/RDMA data chunks per RPC transaction to the Linux NFS server's RPC/RDMA transport implementation. This is a part of the RPC/RDMA spec that the other premiere NFS/RDMA implementation (Solaris) has had for a very long time, and completes the implementation of RPC/RDMA version 1 in the Linux kernel's NFS server. - Long ago, NFSv4 support was introduced to NFSD using a series of C macros that hid dprintk's and goto's. Over time, the kernel's XDR implementation has been greatly improved, but these C macros have remained and become fallow. A series of patches in this pull request completely replaces those macros with the use of current kernel XDR infrastructure. Benefits include: - More robust input sanitization in NFSD's NFSv4 XDR decoders. - Make it easier to use common kernel library functions that use XDR stream APIs (for example, GSS-API). - Align the structure of the source code with the RFCs so it is easier to learn, verify, and maintain our XDR implementation. - Removal of more than a hundred hidden dprintk() call sites. - Removal of some explicit manipulation of pages to help make the eventual transition to xdr->bvec smoother. - On top of several related fixes in 5.10-rc, there are a few more fixes to get the Linux NFSD implementation of NFSv4.2 inter-server copy up to speed. And as usual, there is a pinch of seasoning in the form of a collection of unrelated minor bug fixes and clean-ups. Many thanks to all who contributed this time around!" * tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6: (131 commits) nfsd: Record NFSv4 pre/post-op attributes as non-atomic nfsd: Set PF_LOCAL_THROTTLE on local filesystems only nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE exportfs: Add a function to return the raw output from fh_to_dentry() nfsd: close cached files prior to a REMOVE or RENAME that would replace target nfsd: allow filesystems to opt out of subtree checking nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations Revert "nfsd4: support change_attr_type attribute" nfsd4: don't query change attribute in v2/v3 case nfsd: minor nfsd4_change_attribute cleanup nfsd: simplify nfsd4_change_info nfsd: only call inode_query_iversion in the I_VERSION case nfs_common: need lock during iterate through the list NFSD: Fix 5 seconds delay when doing inter server copy NFSD: Fix sparse warning in nfs4proc.c SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall sunrpc: clean-up cache downcall nfsd: Fix message level for normal termination NFSD: Remove macros that are no longer used NFSD: Replace READ* macros in nfsd4_decode_compound() ...
| * | nfsd: Record NFSv4 pre/post-op attributes as non-atomicTrond Myklebust2020-12-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | For the case of NFSv4, specify to the client that the pre/post-op attributes were not recorded atomically with the main operation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | nfsd: Set PF_LOCAL_THROTTLE on local filesystems onlyTrond Myklebust2020-12-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Don't set PF_LOCAL_THROTTLE on remote filesystems like NFS, since they aren't expected to ever be subject to double buffering. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | nfsd: close cached files prior to a REMOVE or RENAME that would replace targetJeff Layton2020-12-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's not uncommon for some workloads to do a bunch of I/O to a file and delete it just afterward. If knfsd has a cached open file however, then the file may still be open when the dentry is unlinked. If the underlying filesystem is nfs, then that could trigger it to do a sillyrename. On a REMOVE or RENAME scan the nfsd_file cache for open files that correspond to the inode, and proactively unhash and put their references. This should prevent any delete-on-last-close activity from occurring, solely due to knfsd's open file cache. This must be done synchronously though so we use the variants that call flush_delayed_fput. There are deadlock possibilities if you call flush_delayed_fput while holding locks, however. In the case of nfsd_rename, we don't even do the lookups of the dentries to be renamed until we've locked for rename. Once we've figured out what the target dentry is for a rename, check to see whether there are cached open files associated with it. If there are, then unwind all of the locking, close them all, and then reattempt the rename. None of this is really necessary for "typical" filesystems though. It's mostly of use for NFS, so declare a new export op flag and use that to determine whether to close the files beforehand. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | nfsd: allow filesystems to opt out of subtree checkingJeff Layton2020-12-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we start allowing NFS to be reexported, then we have some problems when it comes to subtree checking. In principle, we could allow it, but it would mean encoding parent info in the filehandles and there may not be enough space for that in a NFSv3 filehandle. To enforce this at export upcall time, we add a new export_ops flag that declares the filesystem ineligible for subtree checking. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operationsJeff Layton2020-12-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With NFSv3 nfsd will always attempt to send along WCC data to the client. This generally involves saving off the in-core inode information prior to doing the operation on the given filehandle, and then issuing a vfs_getattr to it after the op. Some filesystems (particularly clustered or networked ones) have an expensive ->getattr inode operation. Atomicity is also often difficult or impossible to guarantee on such filesystems. For those, we're best off not trying to provide WCC information to the client at all, and to simply allow it to poll for that information as needed with a GETATTR RPC. This patch adds a new flags field to struct export_operations, and defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate that nfsd should not attempt to provide WCC info in NFSv3 replies. It also adds a blurb about the new flags field and flag to the exporting documentation. The server will also now skip collecting this information for NFSv2 as well, since that info is never used there anyway. Note that this patch does not add this flag to any filesystem export_operations structures. This was originally developed to allow reexporting nfs via nfsd. Other filesystems may want to consider enabling this flag too. It's hard to tell however which ones have export operations to enable export via knfsd and which ones mostly rely on them for open-by-filehandle support, so I'm leaving that up to the individual maintainers to decide. I am cc'ing the relevant lists for those filesystems that I think may want to consider adding this though. Cc: HPDD-discuss@lists.01.org Cc: ceph-devel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: fuse-devel@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer()Chuck Lever2020-11-309-12/+10
| | | | | | | | | | | | | | | | | | Clean up: De-duplicate some frequently-used code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | NFS: Disable READ_PLUS by defaultAnna Schumaker2020-12-102-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | We've been seeing failures with xfstests generic/091 and generic/263 when using READ_PLUS. I've made some progress on these issues, and the tests fail later on but still don't pass. Let's disable READ_PLUS by default until we can work out what is going on. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | NFSv4.2: Fix 5 seconds delay when doing inter server copyDai Ngo2020-12-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit b4868b44c5628 ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5 seconds delay regardless of the size of the copy. The delay is from nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential fails because the seqid in both nfs4_state and nfs4_stateid are 0. Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state, until after the call to update_open_stateid, to indicate this is the 1st open. This fix is part of a 2 patches, the other patch is the fix in the source server to return the stateid for COPY_NOTIFY request with seqid 1 instead of 0. Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operationChuck Lever2020-12-102-9/+13
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By switching to an XFS-backed export, I am able to reproduce the ibcomp worker crash on my client with xfstests generic/013. For the failing LISTXATTRS operation, xdr_inline_pages() is called with page_len=12 and buflen=128. - When ->send_request() is called, rpcrdma_marshal_req() does not set up a Reply chunk because buflen is smaller than the inline threshold. Thus rpcrdma_convert_iovs() does not get invoked at all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked on the receive buffer. - During reply processing, rpcrdma_inline_fixup() tries to copy received data into rq_rcv_buf->pages because page_len is positive. But there are no receive pages because rpcrdma_marshal_req() never allocated them. The result is that the ibcomp worker faults and dies. Sometimes that causes a visible crash, and sometimes it results in a transport hang without other symptoms. RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and should eventually be fixed or replaced. However, my preference is that upper-layer operations should explicitly allocate their receive buffers (using GFP_KERNEL) when possible, rather than relying on XDRBUF_SPARSE_PAGES. Reported-by: Olga kornievskaia <kolga@netapp.com> Suggested-by: Olga kornievskaia <kolga@netapp.com> Fixes: c10a75145feb ("NFSv4.2: add the extended attribute proc functions.") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga kornievskaia <kolga@netapp.com> Reviewed-by: Frank van der Linden <fllinden@amazon.com> Tested-by: Olga kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabledTrond Myklebust2020-11-302-15/+48
|/ | | | | | | | | | If the flexfiles mirroring is enabled, then the read code expects to be able to set pgio->pg_mirror_idx to point to the data server that is being used for this particular read. However it does not change the pg_mirror_count because we only need to send a single read. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Remove unnecessary inode lock in nfs_fsync_dir()Trond Myklebust2020-11-121-5/+1
| | | | | | | | nfs_inc_stats() is already thread-safe, and there are no other reasons to hold the inode lock here. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Remove unnecessary inode locking in nfs_llseek_dir()Trond Myklebust2020-11-121-5/+4
| | | | | | | | Remove the contentious inode lock, and instead provide thread safety using the file->f_lock spinlock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Fix listxattr receive buffer sizeChuck Lever2020-11-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | Certain NFSv4.2/RDMA tests fail with v5.9-rc1. rpcrdma_convert_kvec() runs off the end of the rl_segments array because rq_rcv_buf.tail[0].iov_len holds a very large positive value. The resultant kernel memory corruption is enough to crash the client system. Callers of rpc_prepare_reply_pages() must reserve an extra XDR_UNIT in the maximum decode size for a possible XDR pad of the contents of the xdr_buf's pages. That guarantees the allocated receive buffer will be large enough to accommodate the usual contents plus that XDR pad word. encode_op_hdr() cannot add that extra word. If it does, xdr_inline_pages() underruns the length of the tail iovec. Fixes: 3e1f02123fba ("NFSv4.2: add client side XDR handling for extended attributes") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFSv4.2: fix failure to unregister shrinkerJ. Bruce Fields2020-11-121-0/+2
| | | | | | | | | | | | | | | | | | We forgot to unregister the nfs4_xattr_large_entry_shrinker. That leaves the global list of shrinkers corrupted after unload of the nfs module, after which possibly unrelated code that calls register_shrinker() or unregister_shrinker() gets a BUG() with "supervisor write access in kernel mode". And similarly for the nfs4_xattr_large_entry_lru. Reported-by: Kris Karas <bugs-a17@moonlit-rail.com> Tested-By: Kris Karas <bugs-a17@moonlit-rail.com> Fixes: 95ad37f90c33 "NFSv4.2: add client side xattr caching." Signed-off-by: J. Bruce Fields <bfields@redhat.com> CC: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* nfsroot: Default mount option should ask for built-in NFS versionHelge Deller2020-11-021-0/+6
| | | | | | | | | | Change the nfsroot default mount option to ask for NFSv2 only *if* the kernel was built with NFSv2 support. If not, default to NFSv3 or as last choice to NFSv4, depending on actual kernel config. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* Merge tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2020-10-223-6/+54
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "The one new feature this time, from Anna Schumaker, is READ_PLUS, which has the same arguments as READ but allows the server to return an array of data and hole extents. Otherwise it's a lot of cleanup and bugfixes" * tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits) NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy SUNRPC: fix copying of multiple pages in gss_read_proxy_verf() sunrpc: raise kernel RPC channel buffer size svcrdma: fix bounce buffers for unaligned offsets and multiple pages nfsd: remove unneeded break net/sunrpc: Fix return value for sysctl sunrpc.transports NFSD: Encode a full READ_PLUS reply NFSD: Return both a hole and a data segment NFSD: Add READ_PLUS hole segment encoding NFSD: Add READ_PLUS data support NFSD: Hoist status code encoding into XDR encoder functions NFSD: Map nfserr_wrongsec outside of nfsd_dispatch NFSD: Remove the RETURN_STATUS() macro NFSD: Call NFSv2 encoders on error returns NFSD: Fix .pc_release method for NFSv2 NFSD: Remove vestigial typedefs NFSD: Refactor nfsd_dispatch() error paths NFSD: Clean up nfsd_dispatch() variables NFSD: Clean up stale comments in nfsd_dispatch() NFSD: Clean up switch statement in nfsd_dispatch() ...
| * NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copyDai Ngo2020-10-213-6/+54
| | | | | | | | | | | | | | | | | | | | | | | | NFS_FS=y as dependency of CONFIG_NFSD_V4_2_INTER_SSC still have build errors and some configs with NFSD=m to get NFS4ERR_STALE error when doing inter server copy. Added ops table in nfs_common for knfsd to access NFS client modules. Fixes: 3ac3711adb88 ("NFSD: Fix NFS server build errors") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | Merge tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2020-10-2015-114/+396
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "Stable Fixes: - Wait for stateid updates after CLOSE/OPEN_DOWNGRADE # v5.4+ - Fix nfs_path in case of a rename retry - Support EXCHID4_FLAG_SUPP_FENCE_OPS v4.2 EXCHANGE_ID flag New features and improvements: - Replace dprintk() calls with tracepoints - Make cache consistency bitmap dynamic - Added support for the NFS v4.2 READ_PLUS operation - Improvements to net namespace uniquifier Other bugfixes and cleanups: - Remove redundant clnt pointer - Don't update timeout values on connection resets - Remove redundant tracepoints - Various cleanups to comments - Fix oops when trying to use copy_file_range with v4.0 source server - Improvements to flexfiles mirrors - Add missing 'local_lock=posix' mount option" * tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (55 commits) NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag NFSv4: Fix up RCU annotations for struct nfs_netns_client NFS: Only reference user namespace from nfs4idmap struct instead of cred nfs: add missing "posix" local_lock constant table definition NFSv4: Use the net namespace uniquifier if it is set NFSv4: Clean up initialisation of uniquified client id strings NFS: Decode a full READ_PLUS reply SUNRPC: Add an xdr_align_data() function NFS: Add READ_PLUS hole segment decoding SUNRPC: Add the ability to expand holes in data pages SUNRPC: Split out _shift_data_right_tail() SUNRPC: Split out xdr_realign_pages() from xdr_align_pages() NFS: Add READ_PLUS data segment support NFS: Use xdr_page_pos() in NFSv4 decode_getacl() SUNRPC: Implement a xdr_page_pos() function SUNRPC: Split out a function for setting current page NFS: fix nfs_path in case of a rename retry fs: nfs: return per memcg count for xattr shrinkers NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE nfs: remove incorrect fallthrough label ...
| * | NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flagOlga Kornievskaia2020-10-161-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC 7862 introduced a new flag that either client or server is allowed to set: EXCHGID4_FLAG_SUPP_FENCE_OPS. Client needs to update its bitmask to allow for this flag value. v2: changed minor version argument to unsigned int Signed-off-by: Olga Kornievskaia <kolga@netapp.com> CC: <stable@vger.kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4: Fix up RCU annotations for struct nfs_netns_clientTrond Myklebust2020-10-152-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The identifier is read as an RCU protected string. Its value may be changed during the lifetime of the network namespace by writing a new string into the sysfs pseudofile (at which point, we free the old string only after a call to synchronize_rcu()). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Only reference user namespace from nfs4idmap struct instead of credSargun Dhillon2020-10-131-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nfs4idmapper only needs access to the user namespace, and not the entire cred struct. This replaces the struct cred* member with struct user_namespace*. This is mostly hygiene, so we don't have to hold onto the cred object, which has extraneous references to things like user_struct. This also makes switching away from init_user_ns more straightforward in the future. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | nfs: add missing "posix" local_lock constant table definitionScott Mayhew2020-10-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | "mount -o local_lock=posix..." was broken by the mount API conversion due to the missing constant. Fixes: e38bb238ed8c ("NFS: Convert mount option parsing to use functionality from fs_parser.h") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4: Use the net namespace uniquifier if it is setTrond Myklebust2020-10-091-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | If a container sets a net namespace specific uniquifier, then use that in the setclientid/exchangeid process. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4: Clean up initialisation of uniquified client id stringsTrond Myklebust2020-10-091-41/+34
| | | | | | | | | | | | | | | | | | | | | | | | When the user sets a uniquifier, then ensure we copy the string so that calls to strlen() etc are atomic with calls to snprintf(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Decode a full READ_PLUS replyAnna Schumaker2020-10-071-17/+19
| | | | | | | | | | | | | | | | | | | | | Decode multiple hole and data segments sent by the server, placing everything directly where they need to go in the xdr pages. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Add READ_PLUS hole segment decodingAnna Schumaker2020-10-071-1/+25
| | | | | | | | | | | | | | | | | | | | | We keep things simple for now by only decoding a single hole or data segment returned by the server, even if they returned more to us. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Add READ_PLUS data segment supportAnna Schumaker2020-10-074-3/+184
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds client support for decoding a single NFS4_CONTENT_DATA segment returned by the server. This is the simplest implementation possible, since it does not account for any hole segments in the reply. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Use xdr_page_pos() in NFSv4 decode_getacl()Anna Schumaker2020-10-071-5/+1
| | | | | | | | | | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: fix nfs_path in case of a rename retryAshish Sangwan2020-10-061-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are generating incorrect path in case of rename retry because we are restarting from wrong dentry. We should restart from the dentry which was received in the call to nfs_path. CC: stable@vger.kernel.org Signed-off-by: Ashish Sangwan <ashishsangwan2@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>