summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* SUNRPC: Clean upTrond Myklebust2019-03-091-33/+14
| | | | | | Replace remaining callers of call_timeout() with rpc_check_timeout(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Respect RPC call timeouts when retrying transmissionTrond Myklebust2019-03-071-18/+24
| | | | | | | | Fix a regression where soft and softconn requests are not timing out as expected. Fixes: 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Fix up RPC back channel transmissionTrond Myklebust2019-03-071-28/+33
| | | | | | | | | | | | | Now that transmissions happen through a queue, we require the RPC tasks to handle error conditions that may have been set while they were sleeping. The back channel does not currently do this, but assumes that any error condition happens during its own call to xprt_transmit(). The solution is to ensure that the back channel splits out the error handling just like the forward channel does. Fixes: 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Prevent thundering herd when the socket is not connectedTrond Myklebust2019-03-071-4/+17
| | | | | | | | | | | If the socket is not connected, then we want to initiate a reconnect rather that trying to transmit requests. If there is a large number of requests queued and waiting for the lock in call_transmit(), then it can take a while for one of the to loop back and retake the lock in call_connect. Fixes: 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Allow dynamic allocation of back channel slotsTrond Myklebust2019-03-021-16/+25
| | | | | | | | Now that the reads happen in a process context rather than a softirq, it is safe to allocate back channel slots using a reclaiming allocation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1: Bump the default callback session slot count to 16Trond Myklebust2019-03-021-1/+1
| | | | | | | | Users can still control this value explicitly using the max_session_cb_slots module parameter, but let's bump the default up to 16 for now. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Convert remaining GFP_NOIO, and GFP_NOWAIT sites in sunrpcTrond Myklebust2019-03-023-8/+5
| | | | | | | Convert the remaining gfp_flags arguments in sunrpc to standard reclaiming allocations, now that we set memalloc_nofs_save() as appropriate. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Clean up mirror DS initialisationTrond Myklebust2019-03-021-35/+31
| | | | | | | Get rid of the redundant parameter and rename the function ff_layout_mirror_valid() to ff_layout_init_mirror_ds() for clarity. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Remove dead code in ff_layout_mirror_valid()Trond Myklebust2019-03-021-15/+0
| | | | | | | nfs4_ff_alloc_deviceid_node() guarantees that if mirror->mirror_ds is a valid pointer, then so is mirror->mirror_ds->ds. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid()Trond Myklebust2019-03-023-26/+10
| | | | | | | Pass in a pointer to the mirror rather than forcing another array access. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfile: Simplify nfs4_ff_layout_ds_version()Trond Myklebust2019-03-022-5/+5
| | | | | | | Pass in a pointer to the mirror rather than forcing another array access. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Simplify ff_layout_get_ds_cred()Trond Myklebust2019-03-023-8/+9
| | | | | | | Pass in a pointer to the mirror rather than forcing another array access. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client()Trond Myklebust2019-03-023-10/+6
| | | | | | | Pass in a pointer to the mirror rather than forcing another array access. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh()Trond Myklebust2019-03-023-16/+5
| | | | | | | Pass in a pointer to the mirror rather than having to retrieve it from the array and then verify the resulting pointer. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Speed up read failover when DSes are downTrond Myklebust2019-03-023-12/+73
| | | | | | | If we notice that a DS may be down, we should attempt to read from the other mirrors first before we go back to retry the dead DS. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Don't invalidate DS deviceids for being unresponsiveTrond Myklebust2019-03-022-21/+3
| | | | | | | | If the DS is unresponsive, we want to just mark it as such, while reporting the errors. If the server later returns the same deviceid in a new layout, then we don't want to have to look it up again. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Remove bogus checks for invalid deviceidsTrond Myklebust2019-03-021-20/+0
| | | | | | We already check the deviceids before we start the RPC call. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Avoid unnecessary layout invalidationsTrond Myklebust2019-03-021-3/+3
| | | | | | | | | | | In ff_layout_mirror_valid() we may not want to invalidate the layout segment despite the call to GETDEVICEINFO failing. The reason is that a read may still be able to make progress on another mirror. So instead we let the caller (in this case nfs4_ff_layout_prepare_ds()) decide whether or not it needs to invalidate. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: refactor calls to fs4_ff_layout_prepare_ds()Trond Myklebust2019-03-023-22/+28
| | | | | | | | | While we may want to skip attempting to connect to a downed mirror when we're deciding which mirror to select for a read, we do not want to do so once we've committed to attempting the I/O in ff_layout_read/write_pagelist(), or ff_layout_initiate_commit() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: Handle early exit in layoutget by returning an errorTrond Myklebust2019-03-021-2/+4
| | | | | | | If the LAYOUTGET rpc call exits early without an error, convert it to EAGAIN. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Send LAYOUTERROR when failing over mirrored readsTrond Myklebust2019-03-023-6/+57
| | | | | | | | | | | | | When a read to the preferred mirror returns an error, the flexfiles driver records the error in the inode list and currently marks the layout for return before failing over the attempted read to the next mirror. What we actually want to do is fire off a LAYOUTERROR to notify the MDS that there is an issue with the preferred mirror, then we fail over. Only once we've failed to read from all mirrors should we return the layout. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.2: Add client support for the generic 'layouterror' RPC callTrond Myklebust2019-03-018-1/+306
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/flexfiles: Abort I/O early if the layout segment was invalidatedTrond Myklebust2019-03-013-0/+25
| | | | | | | | | | | | If a layout segment gets invalidated while a pNFS I/O operation is queued for transmission, then we ideally want to abort immediately. This is particularly the case when there is a large number of I/O related RPCs queued in the RPC layer, and the layout segment gets invalidated due to an ENOSPC error, or an EACCES (because the client was fenced). We may end up forced to spam the MDS with a lot of otherwise unnecessary LAYOUTERRORs after that I/O fails. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/pnfs: Fix barriers in nfs4_mark_deviceid_unavailable()Trond Myklebust2019-03-011-0/+3
| | | | | | | Fix the memory barriers in nfs4_mark_deviceid_unavailable() and nfs4_test_deviceid_unavailable(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Fix up sparse RCU annotationsTrond Myklebust2019-03-011-2/+2
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE()Trond Myklebust2019-03-011-13/+19
| | | | | | | | | If the attempt to instantiate the mirror's layout DS pointer failed, then that pointer may hold a value of type ERR_PTR(), so we need to check that before we dereference it. Fixes: 65990d1afbd2d ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Add missing encode / decode sequence_maxsz to v4.2 operationsAnna Schumaker2019-03-011-0/+10
| | | | | | | | | | | | | | | | These really should have been there from the beginning, but we never noticed because there was enough slack in the RPC request for the extra bytes. Chuck's recent patch to use au_cslack and au_rslack to compute buffer size shrunk the buffer enough that this was now a problem for SEEK operations on my test client. Fixes: f4ac1674f5da4 ("nfs: Add ALLOCATE support") Fixes: 2e72448b07dc3 ("NFS: Add COPY nfs operation") Fixes: cb95deea0b4aa ("NFS OFFLOAD_CANCEL xdr") Fixes: 624bd5b7b683c ("nfs: Add DEALLOCATE support") Fixes: 1c6dcbe5ceff8 ("NFS: Implement SEEK") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1: Don't process the sequence op more than once.Trond Myklebust2019-03-011-1/+1
| | | | | | | Ensure that if we call nfs41_sequence_process() a second time for the same rpc_task, then we only process the results once. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1: Reinitialise sequence results before retransmitting a requestTrond Myklebust2019-03-011-4/+8
| | | | | | | | | If we have to retransmit a request, we should ensure that we reinitialise the sequence results structure, since in the event of a signal we need to treat the request as if it had not been sent. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org
* SUNRPC: Fix an Oops in udp_poll()Trond Myklebust2019-02-262-2/+20
| | | | | | | | udp_poll() checks the struct file for the O_NONBLOCK flag, so we must not call it with a NULL file pointer. Fixes: 0ffe86f48026 ("SUNRPC: Use poll() to fix up the socket requeue races") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* Merge tag 'nfs-rdma-for-5.1-1' of ↵Trond Myklebust2019-02-2547-1581/+2078
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linux-nfs.org/projects/anna/linux-nfs NFSoRDMA client updates for 5.1 New features: - Convert rpc auth layer to use xdr_streams - Config option to disable insecure enctypes - Reduce size of RPC receive buffers Bugfixes and cleanups: - Fix sparse warnings - Check inline size before providing a write chunk - Reduce the receive doorbell rate - Various tracepoint improvements [Trond: Fix up merge conflicts] Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * SUNRPC: Use au_rslack when computing reply buffer sizeChuck Lever2019-02-141-3/+4
| | | | | | | | | | | | | | | | | | au_rslack is significantly smaller than (au_cslack << 2). Using that value results in smaller receive buffers. In some cases this eliminates an extra segment in Reply chunks (RPC/RDMA). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Add rpc_auth::au_ralign fieldChuck Lever2019-02-145-11/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently rpc_inline_rcv_pages() uses au_rslack to estimate the size of the upper layer reply header. This is fine for auth flavors where au_verfsize == au_rslack. However, some auth flavors have more going on. krb5i for example has two more words after the verifier, and another blob following the RPC message. The calculation involving au_rslack pushes the upper layer reply header too far into the rcv_buf. au_rslack is still valuable: it's the amount of buffer space needed for the reply, and is used when allocating the reply buffer. We'll keep that. But, add a new field that can be used to properly estimate the location of the upper layer header in each RPC reply, based on the auth flavor in use. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Make AUTH_SYS and AUTH_NULL set au_verfsizeChuck Lever2019-02-144-3/+7
| | | | | | | | | | | | | | | | au_verfsize will be needed for a non-flavor-specific computation in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: Account for XDR pad of buf->pagesChuck Lever2019-02-145-16/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Certain NFS results (eg. READLINK) might expect a data payload that is not an exact multiple of 4 bytes. In this case, XDR encoding is required to pad that payload so its length on the wire is a multiple of 4 bytes. The constants that define the maximum size of each NFS result do not appear to account for this extra word. In each case where the data payload is to be received into pages: - 1 word is added to the size of the receive buffer allocated by call_allocate - rpc_inline_rcv_pages subtracts 1 word from @hdrsize so that the extra buffer space falls into the rcv_buf's tail iovec - If buf->pagelen is word-aligned, an XDR pad is not needed and is thus removed from the tail Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Introduce rpc_prepare_reply_pages()Chuck Lever2019-02-147-73/+102
| | | | | | | | | | | | | | | | | | | | | | | | prepare_reply_buffer() and its NFSv4 equivalents expose the details of the RPC header and the auth slack values to upper layer consumers, creating a layering violation, and duplicating code. Remedy these issues by adding a new RPC client API that hides those details from upper layers in a common helper function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Add SPDX IDs to some net/sunrpc/auth_gss/ filesChuck Lever2019-02-148-136/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Files under net/sunrpc/auth_gss/ do not yet have SPDX ID tags. This directory is somewhat complicated because most of these files have license boilerplate that is not strictly GPL 2.0. In this patch I add ID tags where there is an obvious match. The less recognizable licenses are still under research. For reference, SPDX IDs added in this patch correspond to the following license text: GPL-2.0 https://spdx.org/licenses/GPL-2.0.html GPL-2.0+ https://spdx.org/licenses/GPL-2.0+.html BSD-3-Clause https://spdx.org/licenses/BSD-3-Clause.html Cc: Simo Sorce <simo@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Remove xdr_buf_trim()Chuck Lever2019-02-144-46/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | The key action of xdr_buf_trim() is that it shortens buf->len, the length of the xdr_buf's content. The other actions -- shortening the head, pages, and tail components -- are actually not necessary. In particular, changing the size of those components can corrupt the RPC message contained in the buffer. This is an accident waiting to happen rather than a current bug, as far as we know. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Introduce trace points in rpc_auth_gss.koChuck Lever2019-02-147-92/+530
| | | | | | | | | | | | | | | | | | | | Add infrastructure for trace points in the RPC_AUTH_GSS kernel module, and add a few sample trace points. These report exceptional or unexpected events, and observe the assignment of GSS sequence numbers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Use struct xdr_stream when decoding RPC Reply headerChuck Lever2019-02-147-201/+243
| | | | | | | | | | | | | | | | Modernize and harden the code path that parses an RPC Reply message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Clean up rpc_verify_header()Chuck Lever2019-02-133-128/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | - Recover some instruction count because I'm about to introduce a few xdr_inline_decode call sites - Replace dprintk() call sites with trace points - Reduce the hot path so it fits in fewer cachelines I've also renamed it rpc_decode_header() to match everything else in the RPC client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Use struct xdr_stream when constructing RPC Call headerChuck Lever2019-02-138-181/+266
| | | | | | | | | | | | | | | | Modernize and harden the code path that constructs each RPC Call message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Add build option to disable support for insecure enctypesChuck Lever2019-02-133-1/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable distributions to enforce the rejection of ancient and insecure Kerberos enctypes in the kernel's RPCSEC_GSS implementation. These are the single-DES encryption types that were deprecated in 2012 by RFC 6649. Enctypes that were deprecated more recently (by RFC 8429) remain fully supported for now because they are still likely to be widely used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Simo Sorce <simo@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Remove rpc_xprt::tsh_sizeChuck Lever2019-02-137-58/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tsh_size was added to accommodate transports that send a pre-amble before each RPC message. However, this assumes the pre-amble is fixed in size, which isn't true for some transports. That makes tsh_size not very generic. Also I'd like to make the estimation of RPC send and receive buffer sizes more precise. tsh_size doesn't currently appear to be accounted for at all by call_allocate. Therefore let's just remove the tsh_size concept, and make the only transports that have a non-zero tsh_size employ a direct approach. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Remove some dprintk() call sites from auth functionsChuck Lever2019-02-132-37/+1
| | | | | | | | | | | | | | | | | | Clean up: Reduce dprintk noise by removing dprintk() call sites from hot path that do not report exceptions. These are usually replaceable with function graph tracing. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: Add trace events to report non-zero NFS status codesChuck Lever2019-02-136-4/+133
| | | | | | | | | | | | | | | | These can help field troubleshooting without needing the overhead of a full network capture (ie, tcpdump). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: Remove print_overflow_msg()Chuck Lever2019-02-138-600/+219
| | | | | | | | | | | | | | This issue is now captured by a trace point in the RPC client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Add trace event that reports reply page vector alignmentChuck Lever2019-02-132-6/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't want READ payloads that are partially in the head iovec and in the page buffer because this requires pull-up, which can be expensive. The NFS/RPC client tries hard to predict the size of the head iovec so that the incoming READ data payload lands only in the page vector, but it doesn't always get it right. To help diagnose such problems, add a trace point in the logic that decodes READ-like operations that reports whether pull-up is being done. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Add XDR overflow trace eventChuck Lever2019-02-132-7/+84
| | | | | | | | | | | | | | | | This can help field troubleshooting without needing the overhead of a full network capture (ie, tcpdump). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Add xdr_stream::rqst fieldChuck Lever2019-02-138-15/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Having access to the controlling rpc_rqst means a trace point in the XDR code can report: - the XID - the task ID and client ID - the p_name of RPC being processed Subsequent patches will introduce such trace points. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>