summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'client-4.2' into linux-nextTrond Myklebust2014-09-3035-332/+1089
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge NFSv4.2 client SEEK implementation from Anna * client-4.2: (55 commits) NFS: Implement SEEK NFSD: Implement SEEK NFSD: Add generic v4.2 infrastructure svcrdma: advertise the correct max payload nfsd: introduce nfsd4_callback_ops nfsd: split nfsd4_callback initialization and use nfsd: introduce a generic nfsd4_cb nfsd: remove nfsd4_callback.cb_op nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence nfsd: fix nfsd4_cb_recall_done error handling nfsd4: clarify how grace period ends nfsd4: stop grace_time update at end of grace period nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls nfsd: serialize nfsdcltrack upcalls for a particular client nfsd: pass extra info in env vars to upcalls to allow for early grace period end nfsd: add a v4_end_grace file to /proc/fs/nfsd lockd: add a /proc/fs/lockd/nlm_end_grace file nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE nfsd: remove redundant boot_time parm from grace_done client tracking op ...
| * NFS: Implement SEEKAnna Schumaker2014-09-309-2/+221
| | | | | | | | | | | | | | | | | | The SEEK operation is used when an application makes an lseek call with either the SEEK_HOLE or SEEK_DATA flags set. I fall back on nfs_file_llseek() if the server does not have SEEK support. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * Merge commit '24bab491220f' into client-4.2Trond Myklebust2014-09-3026-330/+868
| |\ | | | | | | | | | | | | - Pull in patch 'NFSD: Implement SEEK' from Bruce's nfsd-next tree for dependencies.
| | * NFSD: Implement SEEKAnna Schumaker2014-09-293-2/+97
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds server support for the NFS v4.2 operation SEEK, which returns the position of the next hole or data segment in a file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: Add generic v4.2 infrastructureAnna Schumaker2014-09-291-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | It's cleaner to introduce everything at once and have the server reply with "not supported" than it would be to introduce extra operations when implementing a specific one in the middle of the list. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: introduce nfsd4_callback_opsChristoph Hellwig2014-09-263-71/+83
| | | | | | | | | | | | | | | | | | | | | | | | Add a higher level abstraction than the rpc_ops for callback operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: split nfsd4_callback initialization and useChristoph Hellwig2014-09-263-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split out initializing the nfs4_callback structure from using it. For the NULL callback this gets rid of tons of pointless re-initializations. Note that I don't quite understand what protects us from running multiple NULL callbacks at the same time, but at least this chance doesn't make it worse.. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: introduce a generic nfsd4_cbChristoph Hellwig2014-09-263-35/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to queue up a callback. CB_NULL has a bit of special casing because it is special in the specification, but all other new callback operations will be able to share code with this and a few more changes to refactor the callback code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: remove nfsd4_callback.cb_opChristoph Hellwig2014-09-262-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can always get at the private data by using container_of, no need for a void pointer. Also introduce a little to_delegation helper to avoid opencoding the container_of everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: do not clear rpc_resp in nfsd4_cb_done_sequenceBenny Halevy2014-09-261-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is incorrect when a callback is has to be restarted, in which case the XDR decoding of the second iteration will see a NULL cb argument. [hch: updated description] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: fix nfsd4_cb_recall_done error handlingChristoph Hellwig2014-09-261-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For any error that is not EBADHANDLE or NFS4ERR_BAD_STATEID, nfsd4_cb_recall_done first marks the connection down, then retries until dl_retries hits zero, then marks the connection down again and sets cb_done. This changes the code to only retry for EBADHANDLE or NFS4ERR_BAD_STATEID, and factors setting cb_done into a single point in the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd4: clarify how grace period endsJ. Bruce Fields2014-09-171-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The grace period is ended in two steps--first userland is notified that the grace period is now long enough that any clients who have not yet reclaimed can be safely forgotten, then we flip the switch that forbids reclaims and allows new opens. I had to think a bit to convince myself that the ordering was right here. Document it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd4: stop grace_time update at end of grace periodJ. Bruce Fields2014-09-171-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The attempt to automatically set a new grace period time at the end of the grace period isn't really helpful. We'll probably shut down and reboot before we actually make use of the new grace period time anyway. So may as well leave it up to the init system to get this right. This just confuses people when they see /proc/fs/nfsd/nfsv4gracetime change from what they set it to. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: skip subsequent UMH "create" operations after the first one for v4.0 ↵Jeff Layton2014-09-171-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clients In the case of v4.0 clients, we may call into the "create" client tracking operation multiple times (once for each openowner). Upcalling for each one of those is wasteful and slow however. We can skip doing further "create" operations after the first one if we know that one has already been done. v4.1+ clients generally only call into this function once (on RECLAIM_COMPLETE), and we can't skip upcalling on the create even if the STABLE bit is set. Doing so would make it impossible for nfsdcltrack to lift the grace period early since the timestamp has a different meaning in the case where the client is expected to issue a RECLAIM_COMPLETE. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| | * nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcallsJeff Layton2014-09-171-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nfsdcltrack upcall doesn't utilize the NFSD4_CLIENT_STABLE flag, which basically results in an upcall every time we call into the client tracking ops. Change it to set this bit on a successful "check" or "create" request, and clear it on a "remove" request. Also, check to see if that bit is set before upcalling on a "check" or "remove" request, and skip upcalling appropriately, depending on its state. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| | * nfsd: serialize nfsdcltrack upcalls for a particular clientJeff Layton2014-09-172-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a later patch, we want to add a flag that will allow us to reduce the need for upcalls. In order to handle that correctly, we'll need to ensure that racing upcalls for the same client can't occur. In practice it should be rare for this to occur with a well-behaved client, but it is possible. Convert one of the bits in the cl_flags field to be an upcall bitlock, and use it to ensure that upcalls for the same client are serialized. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| | * nfsd: pass extra info in env vars to upcalls to allow for early grace period endJeff Layton2014-09-172-15/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to support lifting the grace period early, we must tell nfsdcltrack what sort of client the "create" upcall is for. We can't reliably tell if a v4.0 client has completed reclaiming, so we can only lift the grace period once all the v4.1+ clients have issued a RECLAIM_COMPLETE and if there are no v4.0 clients. Also, in order to lift the grace period, we have to tell userland when the grace period started so that it can tell whether a RECLAIM_COMPLETE has been issued for each client since then. Since this is all optional info, we pass it along in environment variables to the "init" and "create" upcalls. By doing this, we don't need to revise the upcall format. The UMH upcall can simply make use of this info if it happens to be present. If it's not then it can just avoid lifting the grace period early. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| | * nfsd: add a v4_end_grace file to /proc/fs/nfsdJeff Layton2014-09-173-1/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow a privileged userland process to end the v4 grace period early. Writing "Y", "y", or "1" to the file will cause the v4 grace period to be lifted. The basic idea with this will be to allow the userland client tracking program to lift the grace period once it knows that no more clients will be reclaiming state. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| | * lockd: add a /proc/fs/lockd/nlm_end_grace fileJeff Layton2014-09-174-0/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new procfile that will allow a (privileged) userland process to end the NLM grace period early. The basic idea here will be to have sm-notify write to this file, if it sent out no NOTIFY requests when it runs. In that situation, we can generally expect that there will be no reclaim requests so the grace period can be lifted early. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| | * nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETEJeff Layton2014-09-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As stated in RFC 5661, section 18.51.3: Once a RECLAIM_COMPLETE is done, there can be no further reclaim operations for locks whose scope is defined as having completed recovery. Once the client sends RECLAIM_COMPLETE, the server will not allow the client to do subsequent reclaims of locking state for that scope and, if these are attempted, will return NFS4ERR_NO_GRACE. Ensure that we enforce that requirement. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| | * nfsd: remove redundant boot_time parm from grace_done client tracking opJeff Layton2014-09-173-11/+10
| | | | | | | | | | | | | | | | | | Since it's stored in nfsd_net, we don't need to pass it in separately. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| | * lockd: move lockd's grace period handling into its own moduleJeff Layton2014-09-177-15/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, all of the grace period handling is part of lockd. Eventually though we'd like to be able to build v4-only servers, at which point we'll need to put all of this elsewhere. Move the code itself into fs/nfs_common and have it build a grace.ko module. Then, rejigger the Kconfig options so that both nfsd and lockd enable it automatically. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
| | * nfsd: update mtime on truncateChristoph Hellwig2014-09-111-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a failure in xfstests generic/313 because nfs doesn't update mtime on a truncate. The protocol requires this to be done implicity for a size changing setattr. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: Put export if prepare_creds() failKinglong Mee2014-09-031-2/+4
| | | | | | | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: Full checking of authentication nameKinglong Mee2014-09-031-9/+5
| | | | | | | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: Fix bad using of return value from qword_getKinglong Mee2014-09-031-3/+3
| | | | | | | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: Fix a memory leak if nfsd4_recdir_load failKinglong Mee2014-09-031-13/+17
| | | | | | | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: Reset creds after mnt_want_write_file() failKinglong Mee2014-09-031-1/+2
| | | | | | | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: Put file after ima_file_check fail in nfsd_open()Kinglong Mee2014-09-031-10/+17
| | | | | | | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfs: do not start the callback thread until we set rqstp->rq_taskTrond Myklebust2014-09-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | This fixes an Oopsable race when starting up the callback server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * lockd: Do not start the lockd thread before we've set nlmsvc_rqst->rq_taskTrond Myklebust2014-09-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | This fixes an Oopsable race when starting lockd. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd4: remove labeled NFS warning from config helpJ. Bruce Fields2014-08-281-3/+0
| | | | | | | | | | | | | | | | | | | | | The working group appears committed to keeping the protocol stable, the code has gotten some use and seems to work OK. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: Update some as-yet unused 4.2 error codesAnna Schumaker2014-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent NFS v4.2 drafts have removed NFS4ERR_METADATA_NOTSUPP and reassigned the error code to NFS4ERR_UNION_NOTSUPP. I also add in the NFS4ERR_OFFLOAD_NO_REQS error code. We're not using any of these yet, so there's no harm done. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: Remove duplicate initialization of file_lockKinglong Mee2014-08-281-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | locks_alloc_lock() has initialized struct file_lock, no need to re-initialize it here. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: allow turning off nfsv3 readdir_plusRajesh Ghanekar2014-08-182-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of our customer's application only needs file names, not file attributes. With directories having 10K+ inodes (assuming buffer cache has directory blocks cached having file names, but inode cache is limited and hence need eviction of older cached inodes), older inodes are evicted periodically. So if they keep on doing readdir(2) from NSF client on multiple directories, some directory's files are periodically removed from inode cache and hence new readdir(2) on same directory requires disk access to bring back inodes again to inode cache. As READDIRPLUS request fetches attributes also, doing getattr on each file on server, it causes unnecessary disk accesses. If READDIRPLUS on NFS client is returned with -ENOTSUPP, NFS client uses READDIR request which just gets the names of the files in a directory, not attributes, hence avoiding disk accesses on server. There's already a corresponding client-side mount option, but an export option reduces the need for configuration across multiple clients. This flag affects NFSv3 only. If it turns out it's needed for NFSv4 as well then we may have to figure out how to extend the behavior to NFSv4, but it's not currently obvious how to do that. Signed-off-by: Rajesh Ghanekar <rajesh_ghanekar@symantec.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd4: reserve adequate space for LOCK opJ. Bruce Fields2014-08-171-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space", we permit the server to process a LOCK operation even if there might not be space to return the conflicting lockowner, because we've made returning the conflicting lockowner optional. However, the rpc server still wants to know the most we might possibly return, so we need to take into account the possible conflicting lockowner in the svc_reserve_space() call here. Symptoms were log messages like "RPC request reserved 88 but used 108". Fixes: 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space" Reported-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd4: remove obsolete commentJ. Bruce Fields2014-08-171-7/+0
| | | | | | | | | | | | | | | | | | We do what Neil suggests now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd3: Check write permission after checking existenceRoss Lagerwall2014-08-171-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating a file that already exists in a read-only directory with O_EXCL, the NFSv3 server returns EACCES rather than EEXIST (which local files and the NFSv4 server return). Fix this by checking the MAY_CREATE permission only if the file does not exist. Since this already happens in do_nfsd_create, the check in nfsd3_proc_create can simply be removed. Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: call nfs4_put_deleg_lease outside of state_lockJeff Layton2014-08-171-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we hold the state_lock when releasing the lease. That's potentially problematic in the future if we allow for setlease methods that can sleep. Move the nfs4_put_deleg_lease call out of the delegation unhashing routine (which was always a bit goofy anyway), and into the unlocked sections of the callers of unhash_delegation_locked. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: protect lease-related nfs4_file fields with fi_lockJeff Layton2014-08-171-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently these fields are protected with the state_lock, but that doesn't really make a lot of sense. These fields are "private" to the nfs4_file, and can be protected with the more granular fi_lock. The fi_lock is already held when setting these fields. Make the code hold the fp->fi_lock when clearing the lease-related fields in the nfs4_file, and no longer require that the state_lock be held when calling into this function. To prevent lock inversion with the i_lock, we also move the vfs_setlease and fput calls outside of the fi_lock. This also sets us up for allowing vfs_setlease calls to block in the future. Finally, remove a redundant NULL pointer check. unhash_delegation_locked locks the fp->fi_lock prior to that check, so fp in that function must never be NULL. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: Reorder nfsd_cache_match to check more powerful discriminators firstTrond Myklebust2014-08-171-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | We would normally expect the xid and the checksum to be the best discriminators. Check them before looking at the procedure number, etc. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: split DRC global spinlock into per-bucket locksTrond Myklebust2014-08-171-23/+20
| | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: convert num_drc_entries to an atomic_tTrond Myklebust2014-08-171-16/+12
| | | | | | | | | | | | | | | | | | | | | ...so we can remove the spinlocking around it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: Remove the cache_hash listTrond Myklebust2014-08-172-18/+2
| | | | | | | | | | | | | | | | | | | | | | | | Now that the lru list is per-bucket, we don't need a second list for searches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: convert the lru list into a per-bucket thingTrond Myklebust2014-08-171-23/+50
| | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: Clean up drc cache in preparation for global spinlock eliminationTrond Myklebust2014-08-171-21/+24
| | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfs: Ensure that nfs_callback_start_svc sets the server rq_task...Trond Myklebust2014-08-171-0/+1
| | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * lockd: Ensure that lockd_start_svc sets the server rq_task...Trond Myklebust2014-08-171-0/+2
| | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | Merge branch 'bugfixes' into linux-nextTrond Myklebust2014-09-306-53/+61
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * bugfixes: NFSv4.1: Fix an NFSv4.1 state renewal regression NFSv4: fix open/lock state recovery error handling NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails NFS: Fabricate fscache server index key correctly SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT nfs: fix duplicate proc entries
| * | | NFSv4.1: Fix an NFSv4.1 state renewal regressionAndy Adamson2014-09-302-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2f60ea6b8ced ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat call, on the wire to renew the NFSv4.1 state if the flag was not set. The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal (cl_last_renewal) plus the lease time divided by 3. This is arbitrary and sometimes does the following: In normal operation, the only way a future state renewal call is put on the wire is via a call to nfs4_schedule_state_renewal, which schedules a nfs4_renew_state workqueue task. nfs4_renew_state determines if the NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence, which only gets sent if the NFS4_RENEW_TIMEOUT flag is set. Then the nfs41_proc_async_sequence rpc_release function schedules another state remewal via nfs4_schedule_state_renewal. Without this change we can get into a state where an application stops accessing the NFSv4.1 share, state renewal calls stop due to the NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover from this situation is with a clientid re-establishment, once the application resumes and the server has timed out the lease and so returns NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation. An example application: open, lock, write a file. sleep for 6 * lease (could be less) ulock, close. In the above example with NFSv4.1 delegations enabled, without this change, there are no OP_SEQUENCE state renewal calls during the sleep, and the clientid is recovered due to lease expiration on the close. This issue does not occur with NFSv4.1 delegations disabled, nor with NFSv4.0, with or without delegations enabled. Signed-off-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com Fixes: 2f60ea6b8ced (NFSv4: The NFSv4.0 client must send RENEW calls...) Cc: stable@vger.kernel.org # 3.2.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>