Commit message (author, date, files changed, lines removed/added)
* sunrpc: Prevent duplicate XID allocation (Chuck Lever, 2018-06-19, 1 file, -3/+7)
    Krzysztof Kozlowski <krzk@kernel.org> reports that a heavy NFSv4 WRITE workload against a slow NFS server causes his Raspberry Pi clients to stall. Krzysztof bisected it to commit 37ac86c3a76c ("SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock").

    I was able to reproduce similar behavior, and it appears that the RPC client layer occasionally re-allocates an XID for an RPC that it has already partially sent. The client then ignores the subsequent reply, which carries the original XID.

    For various reasons, checking !req->rq_xmit_bytes_sent in xprt_prepare_transmit is not a 100% reliable mechanism for determining when a fresh XID is needed. Trond's preference is to allocate the XID at the time each rpc_rqst slot is initialized.

    This patch should also address a gcc 4.1.2 complaint reported by Geert Uytterhoeven <geert@linux-m68k.org>.

    Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
    Fixes: 37ac86c3a76c ("SUNRPC: Initialize rpc_rqst outside of ... ")
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
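Allocating the XID when the slot is initialized, rather than deciding at transmit time, can be sketched as a small userspace C model. This is a hedged illustration only: `struct demo_rqst`, `demo_slot_init()`, `demo_transmit()` and `demo_reply_matches()` are hypothetical names, not the sunrpc API, and this single-threaded sketch omits the locking that real XID allocation needs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an RPC slot (not the kernel's struct rpc_rqst). */
struct demo_rqst {
	uint32_t xid;        /* transaction ID carried by the call and echoed in the reply */
	size_t bytes_sent;   /* how much of the call has been transmitted so far */
};

/* Hypothetical per-transport XID counter (no locking in this sketch). */
static uint32_t demo_next_xid = 0x1000;

/* Assign the XID exactly once, when the slot is initialized. */
static void demo_slot_init(struct demo_rqst *req)
{
	req->xid = demo_next_xid++;
	req->bytes_sent = 0;
}

/* (Re)transmit part of the call. The XID is deliberately left alone, so a
 * partially sent request keeps the XID that the server's reply will carry. */
static void demo_transmit(struct demo_rqst *req, size_t nbytes)
{
	req->bytes_sent += nbytes;
}

/* Replies are matched to requests purely by XID. */
static bool demo_reply_matches(const struct demo_rqst *req, uint32_t reply_xid)
{
	return req->xid == reply_xid;
}
```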
* pNFS: Don't send layoutreturn if the layout is already invalid (Trond Myklebust, 2018-06-19, 2 files, -0/+21)
    If the layout was invalidated due to a reboot, then don't try to send a layoutreturn for it.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception (Trond Myklebust, 2018-06-19, 1 file, -7/+10)
    Right now, we can call nfs_commit_inode() while holding the session slot, which could lead to NFSv4 deadlocks. Ensure we only keep the slot if the server returned a layout that we have to process.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Fix an rcu deadlock in nfs_delegation_find_inode() (Anna Schumaker, 2018-06-14, 1 file, -1/+3)
    I was able to reproduce this fairly regularly using xfstests generic/013 on NFS v4.0.

    Reported-by: Ross Zwisler <Ross.Zwisler@linux.intel.com>
    Fixes: 6c342655022d ("NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab()")
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* skip LAYOUTRETURN if layout is invalid (Olga Kornievskaia, 2018-06-12, 1 file, -2/+4)
    Currently, when I/O to a DS fails, the client returns the layout and retries against the MDS. However, on unmount (inode eviction) it then returns the layout again. This is because commit d78471d32bb6 ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR") changed pnfs_return_layout() to always set NFS_LAYOUT_RETURN_REQUESTED, so even a layout that we have already returned will be returned again.

    Instead, let's also check whether we have already marked the layout invalid.

    Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY (Trond Myklebust, 2018-06-10, 1 file, -4/+7)
    If the server returns NFS4ERR_SEQ_FALSE_RETRY or NFS4ERR_RETRY_UNCACHED_REP, then it thinks we're trying to replay an existing request. If so, then let's just bump the sequence ID and retry the operation.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
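A minimal sketch of the retry rule above, assuming a stand-in status enum (the `DEMO_*` names are hypothetical; RFC 5661 defines the real NFS4ERR_* values) and a slot that only tracks the sequence ID it will send next.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in status codes; the real numeric values are defined in RFC 5661. */
enum demo_status {
	DEMO_NFS4_OK,
	DEMO_NFS4ERR_SEQ_FALSE_RETRY,
	DEMO_NFS4ERR_RETRY_UNCACHED_REP,
};

/* Hypothetical session slot: only the next sequence ID is modelled. */
struct demo_slot {
	uint32_t seq_nr;
};

/* Returns true if the caller should resend the request on this slot. */
static bool demo_handle_sequence_status(struct demo_slot *slot, enum demo_status status)
{
	switch (status) {
	case DEMO_NFS4ERR_SEQ_FALSE_RETRY:
	case DEMO_NFS4ERR_RETRY_UNCACHED_REP:
		/* The server believes this is a replay: move to a fresh
		 * sequence ID (uint32_t arithmetic wraps to 0) and retry. */
		slot->seq_nr++;
		return true;
	default:
		return false;
	}
}
```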
* NFSv4: Fix a typo in nfs41_sequence_process (Trond Myklebust, 2018-06-09, 1 file, -1/+1)
    We want to compare the slot_id to the highest slot number advertised by the server.

    Fixes: 3be0f80b5fe9c ("NFSv4.1: Fix up replays of interrupted requests")
    Cc: stable@vger.kernel.org # 4.15+
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: Revert commit 5f83d86cf531d ("NFSv4.x: Fix wraparound issues..") (Trond Myklebust, 2018-06-09, 1 file, -5/+2)
    The correct behaviour for NFSv4 sequence IDs is to wrap around to the value 0 after 0xffffffff. See https://tools.ietf.org/html/rfc5661#section-2.10.6.1

    Fixes: 5f83d86cf531d ("NFSv4.x: Fix wraparound issues when validing...")
    Cc: stable@vger.kernel.org # 4.6+
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
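Because the sequence ID is an unsigned 32-bit quantity, C's modulo-2^32 arithmetic on `uint32_t` gives exactly the wraparound RFC 5661 asks for. The helper names below are hypothetical; the sketch only shows why an equality check against the incremented value is wrap-safe where an ordered comparison such as `received > last` is not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Advance a slot sequence ID. The result is converted back to uint32_t,
 * so 0xffffffff + 1 becomes 0, as RFC 5661 section 2.10.6.1 requires. */
static uint32_t demo_next_seqid(uint32_t seqid)
{
	return seqid + 1;
}

/* Hypothetical check: is 'received' the direct successor of 'last'?
 * Comparing for equality with the incremented value handles the wrap. */
static bool demo_is_next_seqid(uint32_t last, uint32_t received)
{
	return received == demo_next_seqid(last);
}
```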
* NFSv4: Return NFS4ERR_DELAY when a layout recall fails due to igrab() (Trond Myklebust, 2018-06-08, 1 file, -12/+14)
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab() (Trond Myklebust, 2018-06-08, 2 files, -9/+15)
    If the attempt to recall the delegation fails because the inode is in the process of being evicted from cache, then use NFS4ERR_DELAY to ask the server to retry later.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
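The control flow described above can be modelled in a few lines. `demo_igrab()` imitates the behaviour of the kernel's `igrab()` (refusing a reference to an inode that is being freed); the inode structure, the status enum and `demo_recall()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_cb_status { DEMO_CB_OK, DEMO_CB_ERR_DELAY };

/* Hypothetical inode: just enough state to model eviction. */
struct demo_inode {
	bool freeing;   /* eviction from the inode cache has started */
	int refcount;
};

/* Like igrab(): no new reference once the inode is being freed. */
static struct demo_inode *demo_igrab(struct demo_inode *inode)
{
	if (inode->freeing)
		return NULL;
	inode->refcount++;
	return inode;
}

/* Hypothetical recall handler: if the inode cannot be pinned because it
 * is on its way out, ask the server to retry later instead of failing. */
static enum demo_cb_status demo_recall(struct demo_inode *inode)
{
	if (demo_igrab(inode) == NULL)
		return DEMO_CB_ERR_DELAY;
	/* ... the delegation return would be started here ... */
	inode->refcount--;   /* drop the reference taken by demo_igrab() */
	return DEMO_CB_OK;
}
```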
* NFSv4.0: Remove transport protocol name from non-UCS client ID (Chuck Lever, 2018-06-06, 1 file, -10/+4)
    Commit 69dd716c5ffd ("NFSv4: Add socket proto argument to setclientid") (2007) added the transport protocol name to the client ID string, but the patch description doesn't explain why this was necessary.

    At that time, the only transport protocol name that would have been used is "tcp" (for both IPv4 and IPv6), so it added nothing to the distinctiveness of the client ID string.

    Since there is one client instance, the server should recognize its state whether the client is connecting via TCP or RDMA. Same client, same lease.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.0: Remove cl_ipaddr from non-UCS client ID (Chuck Lever, 2018-06-06, 1 file, -6/+20)
    It is possible for two distinct clients to have the same cl_ipaddr:

    - if the client admin disables callback with clientaddr=0.0.0.0 on more than one client
    - if two clients behind separate NATs use the same private subnet number
    - if the client admin specifies the same address via the clientaddr= mount option (pointing the server at the same NAT box, for example)

    Because of the way the Linux NFSv4.0 client constructs its client ID string by default, such clients could interfere with each other's lease state when mounting the same server:

        scnprintf(str, len, "Linux NFSv4.0 %s/%s %s",
                  clp->cl_ipaddr,
                  rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
                  rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_PROTO));

    cl_ipaddr is set to the value of the clientaddr= mount option. Two clients that both have the address 192.168.3.77 and mount the same server (whose public IP address is, say, 3.4.5.6) would both generate the same client ID string when sending a SETCLIENTID:

        Linux NFSv4.0 192.168.3.77/3.4.5.6 tcp

    and thus the server would not be able to distinguish the clients' leases. If both clients are using AUTH_SYS when sending SETCLIENTID, then the server could possibly permit the two clients to interfere with or purge each other's leases.

    To better ensure that Linux's NFSv4.0 client ID strings are distinct in these cases, remove cl_ipaddr from the client ID string and replace it with something more likely to be unique. Note that the replacement looks a lot like the uniform client ID string.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
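The collision is easy to reproduce in userspace, with `snprintf()` standing in for the kernel's `scnprintf()`. `demo_old_clientid()` uses the format string quoted above; `demo_unique_clientid()` is a hypothetical alternative keyed on a per-client string and is not the exact format the patch adopts.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Pre-patch format, as quoted in the commit message. */
static void demo_old_clientid(char *buf, size_t len, const char *cl_ipaddr,
			      const char *server_addr, const char *proto)
{
	snprintf(buf, len, "Linux NFSv4.0 %s/%s %s", cl_ipaddr, server_addr, proto);
}

/* Hypothetical alternative keyed on a per-client unique string (for
 * example a node name) instead of cl_ipaddr. Illustrative only. */
static void demo_unique_clientid(char *buf, size_t len, const char *unique,
				 const char *server_addr)
{
	snprintf(buf, len, "Linux NFSv4.0 %s/%s", unique, server_addr);
}

/* The server tells clients apart only by this string. */
static bool demo_ids_collide(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```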
* NFSv4: Fix a compiler warning when CONFIG_NFS_V4_1 is undefined (Trond Myklebust, 2018-06-05, 1 file, -5/+0)
    Fix a compiler warning:

        fs/nfs/nfs4proc.c:910:13: warning: 'nfs4_layoutget_release' defined but not used [-Wunused-function]
         static void nfs4_layoutget_release(void *calldata)
                     ^~~~~~~~~~~~~~~~~~~~~~

    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* Merge tag 'nfs-rdma-for-4.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs (Trond Myklebust, 2018-06-05, 16 files, -355/+359)
    NFS-over-RDMA client updates for Linux 4.18

    Stable patches:
    - xprtrdma: Return -ENOBUFS when no pages are available

    New features:
    - Add ->alloc_slot() and ->free_slot() functions

    Bugfixes and cleanups:
    - Add missing SPDX tags to some files
    - Try to fail mount quickly if client has no RDMA devices
    - Create transport IDs in the correct network namespace
    - Fix max_send_wr computation
    - Clean up receive tracepoints
    - Refactor receive handling
    - Remove unused functions
|\
| * xprtrdma: Remove transfertypes array (Chuck Lever, 2018-06-01, 1 file, -8/+0)
    Clean up: This array was used in a dprintk that was replaced by a trace point in commit ab03eff58eb5 ("xprtrdma: Add trace points in RPC Call transmit paths").

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Add trace_xprtrdma_dma_map(mr) (Chuck Lever, 2018-06-01, 3 files, -0/+3)
    Matches trace_xprtrdma_dma_unmap(mr).

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Wait on empty sendctx queue (Chuck Lever, 2018-06-01, 3 files, -2/+14)
    Currently, when the sendctx queue is exhausted during marshaling, the RPC/RDMA transport places the RPC task on the delayq, which forces a wait of HZ >> 2 jiffies (about a quarter of a second) before the marshal and send is retried.

    With this change, the transport now places such an RPC task on the pending queue, and wakes it as soon as more sendctxs become available. This typically takes less than a millisecond, and the write_space waking mechanism is less deadlock-prone.

    Moreover, the waiting RPC task is holding the transport's write lock, which blocks the transport from sending RPCs. Therefore faster recovery from sendctx queue exhaustion is desirable.

    Cf. commit 5804891455d5 ("xprtrdma: ->send_request returns -EAGAIN when there are no free MRs").

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Move common wait_for_buffer_space call to parent function (Chuck Lever, 2018-06-01, 1 file, -19/+12)
    Clean up: The logic to wait for write space is common to a bunch of the encoding helper functions. Lift it out and put it in the tail of rpcrdma_marshal_req().

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Return -ENOBUFS when no pages are available (Chuck Lever, 2018-06-01, 1 file, -1/+1)
    The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the transport never calls xprt_write_space() when more pages become available. -ENOBUFS will trigger the correct "delay briefly and call again" logic.

    Fixes: 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract")
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Cc: stable@vger.kernel.org # 4.8+
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
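The `->send_request` return-code contract this fix relies on can be summarized as a small dispatcher. This is a sketch under the assumption that the caller distinguishes only 0, -EAGAIN and -ENOBUFS; the enum and function names are hypothetical.

```c
#include <errno.h>

enum demo_action {
	DEMO_DONE,                  /* request sent */
	DEMO_WAIT_FOR_WRITE_SPACE,  /* sleep until the transport signals write space */
	DEMO_DELAY_AND_RETRY,       /* sleep briefly, then call again */
	DEMO_FAIL,
};

/* Hypothetical caller-side reaction to a send_request result. Parking on
 * -EAGAIN is only safe if something later signals write space; for page
 * exhaustion nothing does, which is why -ENOBUFS is the right code. */
static enum demo_action demo_send_result(int rc)
{
	switch (rc) {
	case 0:
		return DEMO_DONE;
	case -EAGAIN:
		return DEMO_WAIT_FOR_WRITE_SPACE;
	case -ENOBUFS:
		return DEMO_DELAY_AND_RETRY;
	default:
		return DEMO_FAIL;
	}
}
```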
| * xprtrdma: Make rpcrdma_sendctx_put_locked() a static function (Chuck Lever, 2018-05-07, 2 files, -2/+3)
    Clean up: The only call site is in the same file as the function's definition.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Remove rpcrdma_buffer_get_rep_locked() (Chuck Lever, 2018-05-07, 1 file, -12/+3)
    Clean up: There is only one remaining call site for this helper.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Remove rpcrdma_buffer_get_req_locked() (Chuck Lever, 2018-05-07, 1 file, -18/+4)
    Clean up: There is only one call site for this helper, and it can be simplified by using list_first_entry_or_null().

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
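For readers outside the kernel, the clean-up can be illustrated with minimal userspace re-creations of `container_of()` and `list_first_entry_or_null()`. These are simplified stand-ins with the same contract as the kernel helpers (first entry, or NULL for an empty list); `struct demo_req` and `demo_buffer_get_req()` are hypothetical.

```c
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's intrusive list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same contract as the kernel macro: first entry, or NULL if empty. */
#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? container_of((head)->next, type, member) : NULL)

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Hypothetical request type and "get" helper modelled on the clean-up. */
struct demo_req {
	int id;
	struct list_head rl_list;
};

static struct demo_req *demo_buffer_get_req(struct list_head *free_list)
{
	struct demo_req *req =
		list_first_entry_or_null(free_list, struct demo_req, rl_list);

	if (req)
		list_del(&req->rl_list);
	return req;
}
```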
| * xprtrdma: Remove rpcrdma_ep_{post_recv, post_extra_recv} (Chuck Lever, 2018-05-07, 3 files, -64/+0)
    Clean up: These functions are no longer used.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Move Receive posting to Receive handler (Chuck Lever, 2018-05-07, 6 files, -129/+150)
    Receive completion and Reply handling are done by a BOUND workqueue, meaning they run on only one CPU.

    Posting receives is currently done in the send_request path, which on large systems is typically done on a different CPU than the one handling Receive completions. This results in movement of Receive-related cachelines between the sending and receiving CPUs.

    More importantly, it means that currently Receives are posted while the transport's write lock is held, which is unnecessary and costly.

    Finally, allocation of Receive buffers is performed on-demand in the Receive completion handler. This helps guarantee that they are allocated on the same NUMA node as the CPU that handles Receive completions.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Clean up Receive trace points (Chuck Lever, 2018-05-07, 2 files, -21/+22)
    For clarity, report the posting and completion of Receive CQEs.

    Also, the wc->byte_len field contains garbage if wc->status is non-zero, and the vendor error field contains garbage if wc->status is zero. For readability, don't save those fields in those cases.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Make rpc_rqst part of rpcrdma_req (Chuck Lever, 2018-05-07, 4 files, -76/+46)
    This simplifies allocation of the generic RPC slot and xprtrdma specific per-RPC resources.

    It also makes xprtrdma more like the socket-based transports: ->buf_alloc and ->buf_free are now responsible only for send and receive buffers.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Introduce ->alloc_slot call-out for xprtrdma (Chuck Lever, 2018-05-07, 1 file, -2/+50)
    rpcrdma_buffer_get acquires an rpcrdma_req and rep for each RPC. Currently this is done in the call_allocate action, and sometimes it can fail if there are many outstanding RPCs.

    When call_allocate fails, the RPC task is put on the delayq. It is awoken a few milliseconds later, but there's no guarantee it will get a buffer at that time. The RPC task can be repeatedly put back to sleep or even starved.

    The call_allocate action should rarely fail. The delayq mechanism is not meant to deal with transport congestion.

    In the current sunrpc stack, there is a friendlier way to deal with this situation. These objects are actually tantamount to an RPC slot (rpc_rqst) and there is a separate FSM action, distinct from call_allocate, for allocating slot resources. This is the call_reserve action.

    When allocation fails during this action, the RPC is placed on the transport's backlog queue. The backlog mechanism provides a stronger guarantee that when the RPC is awoken, a buffer will be available for it; and backlogged RPCs are awoken one-at-a-time.

    To make slot resource allocation occur in the call_reserve action, create special ->alloc_slot and ->free_slot call-outs for xprtrdma.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
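The backlog guarantee can be sketched with a fixed slot pool and a FIFO of waiting tasks. Everything here is hypothetical (`struct demo_xprt`, `demo_alloc_slot()`, `demo_free_slot()`); the point is that freeing a slot hands it directly to the oldest waiter, so a woken task always has a slot and only one task is woken per release.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NSLOTS  2
#define DEMO_BACKLOG 8

/* Hypothetical transport with a slot pool and a FIFO backlog queue. */
struct demo_xprt {
	int free_slots;              /* slots not owned by any task */
	int backlog[DEMO_BACKLOG];   /* IDs of tasks waiting for a slot */
	size_t head, count;          /* ring-buffer position and length */
	int woken_task;              /* last task handed a slot on release, or -1 */
};

static void demo_xprt_init(struct demo_xprt *x)
{
	x->free_slots = DEMO_NSLOTS;
	x->head = x->count = 0;
	x->woken_task = -1;
}

/* call_reserve: take a slot, or join the backlog and return false.
 * This sketch simply reports failure if the backlog itself is full. */
static bool demo_alloc_slot(struct demo_xprt *x, int task)
{
	if (x->free_slots > 0) {
		x->free_slots--;
		return true;
	}
	if (x->count < DEMO_BACKLOG) {
		x->backlog[(x->head + x->count) % DEMO_BACKLOG] = task;
		x->count++;
	}
	return false;
}

/* Release a slot: hand it to the oldest backlogged task if there is one. */
static void demo_free_slot(struct demo_xprt *x)
{
	if (x->count > 0) {
		x->woken_task = x->backlog[x->head];
		x->head = (x->head + 1) % DEMO_BACKLOG;
		x->count--;
		return;   /* slot transferred; free_slots is unchanged */
	}
	x->free_slots++;
}
```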
| * SUNRPC: Add a ->free_slot transport callout (Chuck Lever, 2018-05-07, 5 files, -2/+13)
    Refactor: xprtrdma needs to have better control over when RPCs are awoken from the backlog queue, so replace xprt_free_slot with a transport op callout.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock (Chuck Lever, 2018-05-07, 3 files, -5/+9)
    alloc_slot is a transport-specific op, but initializing an rpc_rqst is common to all transports. In addition, the only part of initializing an rpc_rqst that needs serialization is getting a fresh XID.

    Move rpc_rqst initialization to common code in preparation for adding a transport-specific alloc_slot to xprtrdma.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Fix max_send_wr computation (Chuck Lever, 2018-05-07, 3 files, -24/+52)
    For FRWR, the computation of max_send_wr is split between frwr_op_open and rpcrdma_ep_create, which makes it difficult to tell that the max_send_wr result is currently incorrect if frwr_op_open has to reduce the credit limit to accommodate a small max_qp_wr. This is a problem now that extra WRs are needed for backchannel operations and a drain CQE.

    So, refactor the computation so that it is all done in ->ro_open, and fix the FRWR version of this computation so that it accommodates HCAs with small max_qp_wr correctly.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Create transport's CM ID in the correct network namespace (Chuck Lever, 2018-05-07, 1 file, -2/+2)
    Set up RPC/RDMA transport in mount.nfs's network namespace. This passes the correct namespace information to the RDMA core, similar to how RPC sockets are created (see xs_create_sock).

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Try to fail quickly if proto=rdma (Chuck Lever, 2018-05-07, 1 file, -2/+2)
    rdma_resolve_addr(3) says:

    > This call is used to map a given destination IP address to a
    > usable RDMA address. The IP to RDMA address mapping is done
    > using the local routing tables, or via ARP.

    If this can't be done, there's no local device that can be used to establish an RDMA-capable network path to the remote. In this case, the RDMA CM very quickly posts an RDMA_CM_EVENT_ADDR_ERROR upcall.

    Currently rpcrdma_conn_upcall() converts RDMA_CM_EVENT_ADDR_ERROR to EHOSTUNREACH. mount.nfs seems to want to retry EHOSTUNREACH forever, thinking that this is a temporary situation. This makes mount.nfs appear to hang if I try to mount with proto=rdma through, say, a conventional Ethernet device.

    If the admin has specified proto=rdma along with a server IP address that requires a network path that does not support RDMA, let's instead fail with a permanent error. -EPROTONOSUPPORT is already returned when NFSv4 or one of its minor versions is not supported, whereas -EPROTO is not (currently) retried by mount.nfs.

    There are potentially other similar cases where -EPROTO is an appropriate return code.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Tested-by: Olga Kornievskaia <kolga@netapp.com>
    Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
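The errno change and its effect on mount.nfs, as described above, reduce to two small functions. Both are hypothetical models: the first records which errno the ADDR_ERROR upcall reports before and after the patch, and the second mimics mount.nfs treating EHOSTUNREACH as temporary while not retrying EPROTO.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: errno reported for RDMA_CM_EVENT_ADDR_ERROR. */
static int demo_addr_error_errno(bool patched)
{
	return patched ? -EPROTO : -EHOSTUNREACH;
}

/* Hypothetical model of the mount.nfs retry decision described above. */
static bool demo_mount_should_retry(int err)
{
	return err == -EHOSTUNREACH;
}
```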
| * xprtrdma: Add proper SPDX tags for NetApp-contributed source (Chuck Lever, 2018-05-07, 7 files, -0/+7)
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | NFS: Filter cache invalidation when holding a delegation (Trond Myklebust, 2018-06-04, 1 file, -3/+9)
    If the client holds a delegation, then ensure we filter out attempts to invalidate the size, owner, group owner, or mode unless we made the change ourselves, in which case check that the caller has set NFS_INO_REVAL_FORCED.

    Always filter out attempts to invalidate the change attribute and size, since we are authoritative for those.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
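One reading of the filtering rule above, sketched with hypothetical flag bits (the kernel's `NFS_INO_INVALID_*` flags and `NFS_INO_REVAL_FORCED` differ in name and layout). `DEMO_INVALID_ATIME` is an extra bit included only to show that unrelated flags pass through untouched.

```c
#include <stdbool.h>

/* Hypothetical invalidation bits (not the kernel's values). */
#define DEMO_INVALID_CHANGE (1u << 0)
#define DEMO_INVALID_SIZE   (1u << 1)
#define DEMO_INVALID_OWNER  (1u << 2)
#define DEMO_INVALID_GROUP  (1u << 3)
#define DEMO_INVALID_MODE   (1u << 4)
#define DEMO_INVALID_ATIME  (1u << 5)
#define DEMO_REVAL_FORCED   (1u << 6)   /* caller made the change itself */

static unsigned int demo_filter_invalid(unsigned int flags, bool have_delegation)
{
	if (!have_delegation)
		return flags;
	/* Authoritative while delegated: never invalidate change or size. */
	flags &= ~(DEMO_INVALID_CHANGE | DEMO_INVALID_SIZE);
	/* Owner, group and mode only when the caller forces revalidation. */
	if (!(flags & DEMO_REVAL_FORCED))
		flags &= ~(DEMO_INVALID_OWNER | DEMO_INVALID_GROUP | DEMO_INVALID_MODE);
	return flags;
}
```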
* | NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes() (Trond Myklebust, 2018-06-04, 1 file, -2/+3)
    If we hold a delegation, we should not need to call nfs_check_inode_attributes() since we already know which attributes are valid, and which ones may still need revalidation. The state of the NFS_INO_REVAL_FORCED flag is therefore irrelevant.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | NFS: Improve caching while holding a delegation (Trond Myklebust, 2018-06-04, 1 file, -7/+10)
    Make sure that the client completely ignores change attribute and size changes on the server when it holds a delegation.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | NFS: Fix attribute revalidation (Trond Myklebust, 2018-06-04, 1 file, -19/+15)
    Don't mark attributes as invalid just because they have changed. Instead, for the purposes of adjusting the attribute cache timeout, keep a separate variable that tracks whether or not a change occurred.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
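A simplified sketch of keeping the "changed" signal separate from the invalid mask, assuming the usual NFS client policy of resetting the attribute cache timeout to its minimum when attributes change and backing off toward a maximum otherwise. Unlike the real client, this sketch doubles on every unchanged revalidation instead of waiting for the current timeout to expire; the structure and names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-inode attribute cache timeout state. */
struct demo_attrcache {
	unsigned long timeo;       /* current attribute cache timeout */
	unsigned long timeo_min;
	unsigned long timeo_max;
};

/* 'attr_changed' only drives the timeout; it does not mark anything invalid. */
static void demo_adjust_timeout(struct demo_attrcache *c, bool attr_changed)
{
	if (attr_changed) {
		c->timeo = c->timeo_min;      /* file is changing: revalidate sooner */
	} else {
		c->timeo *= 2;                /* file looks stable: back off */
		if (c->timeo > c->timeo_max)
			c->timeo = c->timeo_max;
	}
}
```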
* | NFS: fix up nfs_setattr_update_inode (Trond Myklebust, 2018-06-04, 1 file, -6/+42)
    Always try to set the attributes, even if we don't have a valid struct nfs_fattr.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | NFSv4: Ensure the inode is clean when we set a delegation (Trond Myklebust, 2018-06-04, 1 file, -0/+4)
    If there are attributes that are still invalid when we set a delegation, then we need to set the NFS_INO_REVAL_FORCED flag.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access (Trond Myklebust, 2018-06-04, 1 file, -1/+1)
    If we hold a delegation, we don't need to care about whether or not the inode attributes are up to date. We know we can cache the results of this call regardless.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | NFSv4: Don't ask for delegated attributes when adding a hard link (Trond Myklebust, 2018-06-04, 1 file, -2/+3)
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | NFSv4: Don't ask for delegated attributes when revalidating the inode (Trond Myklebust, 2018-06-04, 1 file, -2/+3)
    Again, when revalidating the inode, we don't need to ask for attributes for which we are authoritative.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | NFS: Pass the inode down to the getattr() callback (Trond Myklebust, 2018-06-04, 8 files, -14/+23)
    Allow the getattr() callback to check things like whether or not we hold a delegation so that it can adjust the attributes that it is asking for.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | NFSv4: Don't request size+change attribute if they are delegated to us (Trond Myklebust, 2018-06-04, 1 file, -5/+35)
    When we hold a delegation, we should not need to request attributes such as the file size or the change attribute. For some servers, avoiding asking for these unneeded attributes can improve the overall system performance.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
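In NFSv4 the attributes to fetch are requested with a bitmap, in which `change` is attribute 3 and `size` is attribute 4 (RFC 7530), both in word 0. A hedged sketch of masking them out while a delegation is held; the macro and function names are local stand-ins, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Word-0 bits for attributes 3 (change) and 4 (size), per RFC 7530. */
#define DEMO_WORD0_CHANGE (UINT32_C(1) << 3)
#define DEMO_WORD0_SIZE   (UINT32_C(1) << 4)

/* Copy a 3-word GETATTR bitmap, dropping attributes we are
 * authoritative for while we hold a delegation. */
static void demo_getattr_bitmask(uint32_t dst[3], const uint32_t src[3],
				 bool have_delegation)
{
	dst[0] = src[0];
	dst[1] = src[1];
	dst[2] = src[2];
	if (have_delegation)
		dst[0] &= ~(DEMO_WORD0_CHANGE | DEMO_WORD0_SIZE);
}
```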
* | pnfs: Don't release the sequence slot until we've processed layoutget on open (Trond Myklebust, 2018-05-31, 1 file, -1/+2)
    If the server recalls the layout that was just handed out, we risk hitting a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we release the sequence slot after processing the LAYOUTGET operation that was sent as part of the OPEN compound.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | pnfs: Don't call commit on failed layoutget-on-open (Trond Myklebust, 2018-05-31, 1 file, -6/+1)
    If the layoutget on open call failed, we can't really commit the inode, so don't bother calling it.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | pNFS: Don't send LAYOUTGET on OPEN for read, if we already have cached data (Trond Myklebust, 2018-05-31, 1 file, -0/+5)
    If we're only opening the file for reading, and the file is empty and/or we already have cached data, then heuristically optimise away the LAYOUTGET.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
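The heuristic reduces to a predicate over the open mode and the cached state. The structure and field names below are hypothetical, chosen only to illustrate the decision.

```c
#include <stdbool.h>

/* Hypothetical view of an open request and the client's cached state. */
struct demo_open {
	bool for_write;         /* opened with write access */
	long long size;         /* cached file size */
	bool have_cached_data;  /* file data already in the page cache */
};

/* Skip LAYOUTGET on a read-only open of an empty file, or of a file
 * whose data is already cached; otherwise ask for a layout. */
static bool demo_want_layoutget_on_open(const struct demo_open *o)
{
	if (o->for_write)
		return true;
	return !(o->size == 0 || o->have_cached_data);
}
```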
* | NFSv4/pnfs: Don't switch off layoutget-on-open for transient errors (Trond Myklebust, 2018-05-31, 1 file, -7/+15)
    Ensure that we only switch off the LAYOUTGET operation in the OPEN compound when the server is truly broken, and/or it is complaining that the compound is too large. Currently, we end up turning off the functionality permanently, even for transient errors such as EACCES or ENOSPC.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
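A sketch of the error classification, assuming stand-in error classes: which concrete NFS4ERR_* values the client treats as "compound too large" or "server broken" is not spelled out in the commit message and is left as an assumption here.

```c
#include <stdbool.h>

/* Stand-in error classes; the mapping from real NFS4ERR_* values is assumed. */
enum demo_lgopen_err {
	DEMO_LG_OK,
	DEMO_LG_TOO_LARGE,      /* server says the compound is too large */
	DEMO_LG_SERVER_BROKEN,  /* server cannot handle LAYOUTGET in OPEN */
	DEMO_LG_EACCES,         /* transient */
	DEMO_LG_ENOSPC,         /* transient */
};

/* Only persistent failures turn layoutget-on-open off for the server. */
static bool demo_disable_layoutget_on_open(enum demo_lgopen_err err)
{
	switch (err) {
	case DEMO_LG_TOO_LARGE:
	case DEMO_LG_SERVER_BROKEN:
		return true;
	default:
		return false;
	}
}
```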
* | NFSv4/pnfs: Ensure pnfs_parse_lgopen() won't try to parse uninitialised data (Trond Myklebust, 2018-05-31, 1 file, -1/+2)
    We need to ensure that pnfs_parse_lgopen() doesn't try to parse a struct nfs4_layoutget_res that was not filled by a successful call to decode_layoutget(). This can happen if we performed a cached open, or if either the OP_ACCESS or OP_GETATTR operations preceding the OP_LAYOUTGET in the compound returned an error.

    By initialising the 'status' field to NFS4ERR_DELAY, we ensure that pnfs_parse_lgopen() won't try to interpret the structure.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
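The sentinel-initialization pattern described above can be sketched as follows. The result structure, status enum and `demo_*` functions are hypothetical stand-ins for `struct nfs4_layoutget_res`, `decode_layoutget()` and `pnfs_parse_lgopen()`.

```c
#include <stdbool.h>

enum demo_nfsstat { DEMO_NFS4_OK, DEMO_NFS4ERR_DELAY };

/* Hypothetical LAYOUTGET result; the payload is reduced to a count. */
struct demo_layoutget_res {
	enum demo_nfsstat status;   /* meaningful only once the decoder ran */
	unsigned int nr_segments;
};

/* Seed status with a non-OK value so an undecoded result is ignored. */
static void demo_lgres_init(struct demo_layoutget_res *res)
{
	res->status = DEMO_NFS4ERR_DELAY;
	res->nr_segments = 0;
}

/* Models the decoder: only runs if OP_LAYOUTGET was actually decoded. */
static void demo_decode_layoutget(struct demo_layoutget_res *res, unsigned int nsegs)
{
	res->status = DEMO_NFS4_OK;
	res->nr_segments = nsegs;
}

/* Models the parser's guard: returns segments to process, 0 if none. */
static unsigned int demo_parse_lgopen(const struct demo_layoutget_res *res)
{
	if (res->status != DEMO_NFS4_OK)
		return 0;
	return res->nr_segments;
}
```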
* | pnfs: Fix manipulation of NFS_LAYOUT_FIRST_LAYOUTGET (Fred Isaman, 2018-05-31, 3 files, -8/+21)
    The flag was not always being cleared after LAYOUTGET on OPEN.

    Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>