summaryrefslogtreecommitdiffstats
path: root/net/sunrpc (follow)
Commit message (Collapse)AuthorAgeFilesLines
* SUNRPC: Fix NFS READs that start at non-page-aligned offsetsChuck Lever2021-02-011-3/+4
| | | | | | | | | | | | | | | | | | | Anj Duvnjak reports that the Kodi.tv NFS client is not able to read video files from a v5.10.11 Linux NFS server. The new sendpage-based TCP sendto logic was not attentive to non- zero page_base values. nfsd_splice_read() sets that field when a READ payload starts in the middle of a page. The Linux NFS client rarely emits an NFS READ that is not page- aligned. All of my testing so far has been with Linux clients, so I missed this one. Reported-by: A. Duvnjak <avian@extremenerds.net> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211471 Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: A. Duvnjak <avian@extremenerds.net>
* SUNRPC: Handle 0 length opaque XDR object data properlyDave Wysochanski2021-01-251-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When handling an auth_gss downcall, it's possible to get 0-length opaque object for the acceptor. In the case of a 0-length XDR object, make sure simple_get_netobj() fills in dest->data = NULL, and does not continue to kmemdup() which will set dest->data = ZERO_SIZE_PTR for the acceptor. The trace event code can handle NULL but not ZERO_SIZE_PTR for a string, and so without this patch the rpcgss_context trace event will crash the kernel as follows: [ 162.887992] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 162.898693] #PF: supervisor read access in kernel mode [ 162.900830] #PF: error_code(0x0000) - not-present page [ 162.902940] PGD 0 P4D 0 [ 162.904027] Oops: 0000 [#1] SMP PTI [ 162.905493] CPU: 4 PID: 4321 Comm: rpc.gssd Kdump: loaded Not tainted 5.10.0 #133 [ 162.908548] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 162.910978] RIP: 0010:strlen+0x0/0x20 [ 162.912505] Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 [ 162.920101] RSP: 0018:ffffaec900c77d90 EFLAGS: 00010202 [ 162.922263] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fffde697 [ 162.925158] RDX: 000000000000002f RSI: 0000000000000080 RDI: 0000000000000010 [ 162.928073] RBP: 0000000000000010 R08: 0000000000000e10 R09: 0000000000000000 [ 162.930976] R10: ffff8e698a590cb8 R11: 0000000000000001 R12: 0000000000000e10 [ 162.933883] R13: 00000000fffde697 R14: 000000010034d517 R15: 0000000000070028 [ 162.936777] FS: 00007f1e1eb93700(0000) GS:ffff8e6ab7d00000(0000) knlGS:0000000000000000 [ 162.940067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 162.942417] CR2: 0000000000000010 CR3: 0000000104eba000 CR4: 00000000000406e0 [ 162.945300] Call Trace: [ 162.946428] trace_event_raw_event_rpcgss_context+0x84/0x140 [auth_rpcgss] [ 162.949308] ? __kmalloc_track_caller+0x35/0x5a0 [ 162.951224] ? gss_pipe_downcall+0x3a3/0x6a0 [auth_rpcgss] [ 162.953484] gss_pipe_downcall+0x585/0x6a0 [auth_rpcgss] [ 162.955953] rpc_pipe_write+0x58/0x70 [sunrpc] [ 162.957849] vfs_write+0xcb/0x2c0 [ 162.959264] ksys_write+0x68/0xe0 [ 162.960706] do_syscall_64+0x33/0x40 [ 162.962238] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 162.964346] RIP: 0033:0x7f1e1f1e57df Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Move simple_get_bytes and simple_get_netobj into private headerDave Wysochanski2021-01-253-58/+45
| | | | | | | | | | | | Remove duplicated helper functions to parse opaque XDR objects and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h. In the new file carry the license and copyright from the source file net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside include/linux/sunrpc/xdr.h since lockd is not the only user of struct xdr_netobj. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* Merge tag 'nfsd-5.11-2' of ↵Linus Torvalds2021-01-191-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Avoid exposing parent of root directory in NFSv3 READDIRPLUS results - Fix a tracepoint change that went in the initial 5.11 merge * tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Move the svc_xdr_recvfrom tracepoint again nfsd4: readdirplus shouldn't return parent of export
| * SUNRPC: Move the svc_xdr_recvfrom tracepoint againChuck Lever2021-01-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 156708adf2d9 ("SUNRPC: Move the svc_xdr_recvfrom() tracepoint") tried to capture the correct XID in the trace record, but this line in svc_recv: rqstp->rq_xid = svc_getu32(&rqstp->rq_arg.head[0]); alters the size of rq_arg.head[0].iov_len. The tracepoint records the correct XID but an incorrect value for the length of the xdr_buf's head. To keep the trace callsites simple, I've created two trace classes. One assumes the xdr_buf contains a full RPC message, and the XID can be extracted from it. The other assumes the contents of the xdr_buf are arbitrary, and the xid will be provided by the caller. Currently there is only one user of each class, but I expect we will need a few more tracepoints using each class as time goes on. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | Merge tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2021-01-121-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client fixes from Trond Myklebust: "Highlights include: - Fix parsing of link-local IPv6 addresses - Fix confusing logging of mount errors that was introduced by the fsopen() patchset. - Fix a tracing use after free in _nfs4_do_setlk() - Layout return-on-close fixes when called from nfs4_evict_inode() - Layout segments were being leaked in pnfs_generic_clear_request_commit() - Don't leak DS commits in pnfs_generic_retry_commit() - Fix an Oopsable use-after-free when nfs_delegation_find_inode_server() calls iput() on an inode after the super block has gone away" * tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: nfs_igrab_and_active must first reference the superblock NFS: nfs_delegation_find_inode_server must first reference the superblock NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit() NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the request pNFS: Stricter ordering of layoutget and layoutreturn pNFS: Clean up pnfs_layoutreturn_free_lsegs() pNFS: We want return-on-close to complete when evicting the inode pNFS: Mark layout for return if return-on-close was not sent net: sunrpc: interpret the return value of kstrtou32 correctly NFS: Adjust fs_context error logging NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
| * net: sunrpc: interpret the return value of kstrtou32 correctlyj.nixdorf@avm.de2021-01-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A return value of 0 means success. This is documented in lib/kstrtox.c. This was found by trying to mount an NFS share from a link-local IPv6 address with the interface specified by its index: mount("[fe80::1%1]:/srv/nfs", "/mnt", "nfs", 0, "nolock,addr=fe80::1%1") Before this commit this failed with EINVAL and also caused the following message in dmesg: [...] NFS: bad IP address specified: addr=fe80::1%1 The syscall using the same address based on the interface name instead of its index succeeds. Credits for this patch go to my colleague Christian Speich, who traced the origin of this bug to this line of code. Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de> Fixes: 00cfaa943ec3 ("replace strict_strto calls") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | Merge tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds2021-01-111-1/+85
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd fixes from Chuck Lever: - Fix major TCP performance regression - Get NFSv4.2 READ_PLUS regression tests to pass - Improve NFSv4 COMPOUND memory allocation - Fix sparse warning * tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6: NFSD: Restore NFSv4 decoding's SAVEMEM functionality SUNRPC: Handle TCP socket sends with kernel_sendpage() again NFSD: Fix sparse warning in nfssvc.c nfsd: Don't set eof on a truncated READ_PLUS nfsd: Fixes for nfsd4_encode_read_plus_data()
| * SUNRPC: Handle TCP socket sends with kernel_sendpage() againChuck Lever2020-12-181-1/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daire Byrne reports a ~50% aggregrate throughput regression on his Linux NFS server after commit da1661b93bf4 ("SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends"), which replaced kernel_send_page() calls in NFSD's socket send path with calls to sock_sendmsg() using iov_iter. Investigation showed that tcp_sendmsg() was not using zero-copy to send the xdr_buf's bvec pages, but instead was relying on memcpy. This means copying every byte of a large NFS READ payload. It looks like TLS sockets do indeed support a ->sendpage method, so it's really not necessary to use xprt_sock_sendmsg() to support TLS fully on the server. A mechanical reversion of da1661b93bf4 is not possible at this point, but we can re-implement the server's TCP socket sendmsg path using kernel_sendpage(). Reported-by: Daire Byrne <daire@dneg.com> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | Merge tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2020-12-1713-453/+769
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - NFSv3: Add emulation of lookupp() to improve open_by_filehandle() support - A series of patches to improve readdir performance, particularly with large directories - Basic support for using NFS/RDMA with the pNFS files and flexfiles drivers - Micro-optimisations for RDMA - RDMA tracing improvements Bugfixes: - Fix a long standing bug with xs_read_xdr_buf() when receiving partial pages (Dan Aloni) - Various fixes for getxattr and listxattr, when used over non-TCP transports - Fixes for containerised NFS from Sargun Dhillon - switch nfsiod to be an UNBOUND workqueue (Neil Brown) - READDIR should not ask for security label information if there is no LSM policy (Olga Kornievskaia) - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay) - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code - A couple of fixes for pnfs/flexfiles read failover Cleanups: - Various cleanups for the SUNRPC xdr code in conjunction with the READ_PLUS fixes" * tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits) NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read() pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read NFSv4/pnfs: Add tracing for the deviceid cache fs/lockd: convert comma to semicolon NFSv4.2: fix error return on memory allocation failure NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer NFSv4.2: decode_read_plus_hole() needs to check the extent offset NFSv4.2: decode_read_plus_data() must skip padding after data segment NFSv4.2: Ensure we always reset the result->count in decode_read_plus() SUNRPC: When expanding the buffer, we may need grow the sparse pages SUNRPC: Cleanup - constify a number of xdr_buf helpers SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field SUNRPC: _copy_to/from_pages() now check for zero length SUNRPC: Cleanup xdr_shrink_bufhead() SUNRPC: Fix xdr_expand_hole() SUNRPC: Fixes for xdr_align_data() SUNRPC: _shift_data_left/right_pages should check the shift length ...
| * Merge tag 'nfs-rdma-for-5.11-1' of ↵Trond Myklebust2020-12-166-75/+90
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linux-nfs.org/projects/anna/linux-nfs into linux-next NFSoRDmA Client updates for Linux 5.11 Cleanups and improvements: - Remove use of raw kernel memory addresses in tracepoints - Replace dprintk() call sites in ERR_CHUNK path - Trace unmap sync calls - Optimize MR DMA-unmapping Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| | * xprtrdma: Micro-optimize MR DMA-unmappingChuck Lever2020-11-112-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that rpcrdma_ep is no longer part of rpcrdma_xprt, there are four or five serial address dereferences needed to get to the IB device needed for DMA unmapping. Instead, let's use the same pattern that regbufs use: cache a pointer to the device in the MR, and use that as the indication that unmapping is necessary. This also guarantees that the exact same device is used for DMA mapping and unmapping, even if the r_xprt's ep has been replaced. I don't think this can happen today, but future changes might break this assumption. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Move rpcrdma_mr_put()Chuck Lever2020-11-113-33/+28
| | | | | | | | | | | | | | | | | | | | | | | | Clean up: This function is now invoked only in frwr_ops.c. The move enables deduplication of the trace_xprtrdma_mr_unmap() call site. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Trace unmap_sync callsChuck Lever2020-11-111-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->buf_free is called nearly once per RPC. Only rarely does xprt_rdma_free() have to do anything, thus tracing every one of these calls seems unnecessary. Instead, just throw a trace event when that one occasional RPC still has MRs that need to be released. xprt_rdma_free() is further micro-optimized to reduce the amount of work done in the common case. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Display the task ID when reporting MR eventsChuck Lever2020-11-112-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tie each MR event to the requesting rpc_task to make it easier to follow MR ownership and control flow. MR unmapping and recycling can happen in the background, after an MR's mr_req field is stale, so set up a separate tracepoint class for those events. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Clean up trace_xprtrdma_nomrs()Chuck Lever2020-11-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Rename it following the "_err" suffix convention - Replace display of kernel memory addresses - Tie MR exhaustion to a peer IP address, similar to the createmrs tracepoint Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Clean up xprtrdma callback tracepointsChuck Lever2020-11-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | - Replace displayed kernel memory addresses - Tie the XID and event with the peer's IP address Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Clean up tracepoints in the reply pathChuck Lever2020-11-111-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace unnecessary display of kernel memory addresses. Also, there are no longer any trace_xprtrdma_defer_cmp() call sites. And remove the trace_xprtrdma_leaked_rep() tracepoint because there doesn't seem to be an overwhelming need to have a tracepoint for catching a software bug that has long since been fixed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Clean up reply parsing error tracepointsChuck Lever2020-11-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Rename the tracepoints with the "_err" suffix to indicate these are rare error events - Replace display of kernel memory addresses - Tie the XID and error to a connection IP address instead Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Clean up trace_xprtrdma_post_linvChuck Lever2020-11-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | - Replace the display of kernel memory addresses - Add "_err" to the end of its name to indicate that it's a tracepoint that fires only when there's an error Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Introduce FRWR completion IDsChuck Lever2020-11-112-7/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | Set up a completion ID in each rpcrdma_frwr. The ID is used to match an incoming completion to a transport (CQ) and other MR-related activity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Introduce Send completion IDsChuck Lever2020-11-112-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Set up a completion ID in each rpcrdma_req. The ID is used to match an incoming Send completion to a transport and to a previous ib_post_send(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Introduce Receive completion IDsChuck Lever2020-11-112-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Set up a completion ID in each rpcrdma_rep. The ID is used to match an incoming Receive completion to a transport and to a previous ib_post_recv(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * xprtrdma: Replace dprintk call sites in ERR_CHUNK pathChuck Lever2020-11-111-10/+3
| | | | | | | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | SUNRPC: When expanding the buffer, we may need grow the sparse pagesTrond Myklebust2020-12-141-2/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | If we're shifting the page data to the right, and this happens to be a sparse page array, then we may need to allocate new pages in order to receive the data. Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Cleanup - constify a number of xdr_buf helpersTrond Myklebust2020-12-141-29/+24
| | | | | | | | | | | | | | | | | | | | | | | | There are a number of xdr helpers for struct xdr_buf that do not change the structure itself. Mark those as taking const pointers for documentation purposes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' fieldTrond Myklebust2020-12-141-13/+16
| | | | | | | | | | | | | | | | | | | | | Move the setting of the xdr_stream 'nwords' field into the helpers that reset the xdr_stream cursor. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: _copy_to/from_pages() now check for zero lengthTrond Myklebust2020-12-141-4/+2
| | | | | | | | | | | | | | | | | | | | | Clean up callers of _copy_to/from_pages() that still check for a zero length. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Cleanup xdr_shrink_bufhead()Trond Myklebust2020-12-141-77/+87
| | | | | | | | | | | | | | | | | | | | | Clean up xdr_shrink_bufhead() to use the new helpers instead of doing its own thing. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Fix xdr_expand_hole()Trond Myklebust2020-12-141-95/+179
| | | | | | | | | | | | | | | | | | | | | We do want to try to grow the buffer if possible, but if that attempt fails, we still want to move the data and truncate the XDR message. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Fixes for xdr_align_data()Trond Myklebust2020-12-141-42/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | The main use case right now for xdr_align_data() is to shift the page data to the left, and in practice shrink the total XDR data buffer. This patch ensures that we fix up the accounting for the buffer length as we shift that data around. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: _shift_data_left/right_pages should check the shift lengthTrond Myklebust2020-12-141-0/+12
| | | | | | | | | | | | | | | | | | Exit early if the shift is zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | xprtrdma: Fix XDRBUF_SPARSE_PAGES supportChuck Lever2020-12-141-9/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Olga K. observed that rpcrdma_marsh_req() allocates sparse pages only when it has determined that a Reply chunk is necessary. There are plenty of cases where no Reply chunk is needed, but the XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in rpcrdma_inline_fixup() when it tries to copy parts of the received Reply into a missing page. To avoid crashing, handle sparse page allocation up front. Until XATTR support was added, this issue did not appear often because the only SPARSE_PAGES consumer always expected a reply large enough to always require a Reply chunk. Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | sunrpc: fix xs_read_xdr_buf for partial pages receiveDan Aloni2020-12-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When receiving pages data, return value 'ret' when positive includes `buf->page_base`, so we should subtract that before it is used for changing `offset` and comparing against `want`. This was discovered on the very rare cases where the server returned a chunk of bytes that when added to the already received amount of bytes for the pages happened to match the current `recv.len`, for example on this case: buf->page_base : 258356 actually received from socket: 1740 ret : 260096 want : 260096 In this case neither of the two 'if ... goto out' trigger, and we continue to tail parsing. Worth to mention that the ensuing EMSGSIZE from the continued execution of `xs_read_xdr_buf` may be observed by an application due to 4 superfluous bytes being added to the pages data. Fixes: 277e4ab7d530 ("SUNRPC: Simplify TCP receive code by switching to using iterators") Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | net: sunrpc: Fix 'snprintf' return value check in 'do_xprt_debugfs'Fedor Tokarev2020-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'snprintf' returns the number of characters which would have been written if enough space had been available, excluding the terminating null byte. Thus, the return value of 'sizeof(buf)' means that the last character has been dropped. Signed-off-by: Fedor Tokarev <ftokarev@gmail.com> Fixes: 2f34b8bfae19 ("SUNRPC: add links for all client xprts to debugfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Fix open coded xdr_stream_remaining()Trond Myklebust2020-12-021-2/+2
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Fix up xdr_set_page()Trond Myklebust2020-12-021-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | While we always want to align to the next page and/or the beginning of the tail buffer when we call xdr_set_next_page(), the functions xdr_align_data() and xdr_expand_hole() really want to align to the next object in that next page or tail. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages()Trond Myklebust2020-12-022-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rpc_prepare_reply_pages() currently expects the 'hdrsize' argument to contain the length of the data that we expect to want placed in the head kvec plus a count of 1 word of padding that is placed after the page data. This is very confusing when trying to read the code, and sometimes leads to callers adding an arbitrary value of '1' just in order to satisfy the requirement (whether or not the page data actually needs such padding). This patch aims to clarify the code by changing the 'hdrsize' argument to remove that 1 word of padding. This means we need to subtract the padding from all the existing callers. Fixes: 02ef04e432ba ("NFS: Account for XDR pad of buf->pages") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Fix up xdr_read_pages() to take arbitrary object lengthsTrond Myklebust2020-12-021-26/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix up xdr_read_pages() so that it can handle object lengths that are larger than the page length, by simply aligning to the next object in the buffer tail. The function will continue to return the length of the truncate object data that actually fit into the pages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Clean up helpers xdr_set_iov() and xdr_set_page_base()Trond Myklebust2020-12-021-17/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow xdr_set_iov() to set a base so that we can use it to set the cursor to a specific position in the kvec buffer. If the new base overflows the kvec/pages buffer in either xdr_set_iov() or xdr_set_page_base(), then truncate it so that we point to the end of the buffer. Finally, change both function to return the number of bytes remaining to read in their buffers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Fix up typo in xdr_init_decode()Trond Myklebust2020-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | We already know that the head buffer and page are empty, so if there is any data, it is in the tail. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Fix up open coded kmemdup_nul()Trond Myklebust2020-12-021-3/+1
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Remove unused function xprt_load_transport()Trond Myklebust2020-12-021-15/+0
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Add a helper to return the transport identifier given a netidTrond Myklebust2020-12-021-4/+21
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Close a race with transport setup and module putTrond Myklebust2020-12-021-11/+33
| | | | | | | | | | | | | | | | | | | | | After we've looked up the transport module, we need to ensure it can't go away until we've finished running the transport setup code. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: xprt_load_transport() needs to support the netid "rdma6"Trond Myklebust2020-12-024-16/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to RFC5666, the correct netid for an IPv6 addressed RDMA transport is "rdma6", which we've supported as a mount option since Linux-4.7. The problem is when we try to load the module "xprtrdma6", that will fail, since there is no modulealias of that name. Fixes: 181342c5ebe8 ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: rpc_wake_up() should wake up tasks in the correct orderTrond Myklebust2020-12-021-30/+35
| | | | | | | | | | | | | | | | | | | | | | | | Currently, we wake up the tasks by priority queue ordering, which means that we ignore the batching that is supposed to help with QoS issues. Fixes: c049f8ea9a0d ("SUNRPC: Remove the bh-safe lock requirement on the rpc_wait_queue->lock") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcallChuck Lever2020-12-022-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no need to defer allocation of pages for the receive buffer. - This upcall is quite infrequent - gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL, unlike the transport - gssp_alloc_receive_pages() knows exactly how many pages are needed Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | Merge tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds2020-12-1614-650/+1315
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Chuck Lever: "Several substantial changes this time around: - Previously, exporting an NFS mount via NFSD was considered to be an unsupported feature. With v5.11, the community has attempted to make re-exporting a first-class feature of NFSD. This would enable the Linux in-kernel NFS server to be used as an intermediate cache for a remotely-located primary NFS server, for example, even with other NFS server implementations, like a NetApp filer, as the primary. - A short series of patches brings support for multiple RPC/RDMA data chunks per RPC transaction to the Linux NFS server's RPC/RDMA transport implementation. This is a part of the RPC/RDMA spec that the other premiere NFS/RDMA implementation (Solaris) has had for a very long time, and completes the implementation of RPC/RDMA version 1 in the Linux kernel's NFS server. - Long ago, NFSv4 support was introduced to NFSD using a series of C macros that hid dprintk's and goto's. Over time, the kernel's XDR implementation has been greatly improved, but these C macros have remained and become fallow. A series of patches in this pull request completely replaces those macros with the use of current kernel XDR infrastructure. Benefits include: - More robust input sanitization in NFSD's NFSv4 XDR decoders. - Make it easier to use common kernel library functions that use XDR stream APIs (for example, GSS-API). - Align the structure of the source code with the RFCs so it is easier to learn, verify, and maintain our XDR implementation. - Removal of more than a hundred hidden dprintk() call sites. - Removal of some explicit manipulation of pages to help make the eventual transition to xdr->bvec smoother. - On top of several related fixes in 5.10-rc, there are a few more fixes to get the Linux NFSD implementation of NFSv4.2 inter-server copy up to speed. And as usual, there is a pinch of seasoning in the form of a collection of unrelated minor bug fixes and clean-ups. Many thanks to all who contributed this time around!" * tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6: (131 commits) nfsd: Record NFSv4 pre/post-op attributes as non-atomic nfsd: Set PF_LOCAL_THROTTLE on local filesystems only nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE exportfs: Add a function to return the raw output from fh_to_dentry() nfsd: close cached files prior to a REMOVE or RENAME that would replace target nfsd: allow filesystems to opt out of subtree checking nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations Revert "nfsd4: support change_attr_type attribute" nfsd4: don't query change attribute in v2/v3 case nfsd: minor nfsd4_change_attribute cleanup nfsd: simplify nfsd4_change_info nfsd: only call inode_query_iversion in the I_VERSION case nfs_common: need lock during iterate through the list NFSD: Fix 5 seconds delay when doing inter server copy NFSD: Fix sparse warning in nfs4proc.c SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall sunrpc: clean-up cache downcall nfsd: Fix message level for normal termination NFSD: Remove macros that are no longer used NFSD: Replace READ* macros in nfsd4_decode_compound() ...
| * | | SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcallChuck Lever2020-12-092-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no need to defer allocation of pages for the receive buffer. - This upcall is quite infrequent - gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL, unlike the transport - gssp_alloc_receive_pages() knows exactly how many pages are needed Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga Kornievskaia <kolga@netapp.com>