path: root/fs
Commit message (Author, Age, Files, Lines)
* NFS: Don't hold the group lock when calling nfs_release_request() (Trond Myklebust, 2017-09-09, 1 file, -1/+1)
|   That can deadlock if this is the last reference since nfs_page_group_destroy() calls nfs_page_group_sync_on_bit(). Note that even if the page was removed from the subpage list, the req->wb_head could still be pointing to the old head.
|
|   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Remove pnfs_generic_transfer_commit_list() (Trond Myklebust, 2017-09-09, 2 files, -41/+4)
|   It's pretty much a duplicate of nfs_scan_commit_list() that also clears the PG_COMMIT_TO_DS flag.
|
|   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlock (Trond Myklebust, 2017-09-09, 2 files, -9/+22)
|   Since the commit list is not ordered, it is possible for nfs_scan_commit_list to hold a request that nfs_lock_and_join_requests() is waiting for, while at the same time trying to grab a request that nfs_lock_and_join_requests already holds.
|
|   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Fix 2 use after free issues in the I/O code (Trond Myklebust, 2017-09-09, 3 files, -17/+12)
|   The writeback code wants to send a commit after processing the pages, which is why we want to delay releasing the struct path until after that's done.
|
|   Also, the layout code expects that we do not free the inode before we've put the layout segments in pnfs_writehdr_free() and pnfs_readhdr_free()
|
|   Fixes: 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")
|   Fixes: 4714fb51fd03 ("nfs: remove pgio_header refcount, related cleanup")
|   Cc: stable@vger.kernel.org
|   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Sync the correct byte range during synchronous writes (tarangg@amazon.com, 2017-09-07, 1 file, -3/+3)
|   Since commit 18290650b1c8 ("NFS: Move buffered I/O locking into nfs_file_write()") nfs_file_write() has not flushed the correct byte range during synchronous writes. generic_write_sync() expects that iocb->ki_pos points to the right edge of the range rather than the left edge.
|
|   To replicate the problem, open a file with O_DSYNC, have the client write at increasing offsets, and then print the successful offsets. Block port 2049 partway through that sequence, and observe that the client application indicates successful writes in advance of what the server received.
|
|   Fixes: 18290650b1c8 ("NFS: Move buffered I/O locking into nfs_file_write()")
|   Signed-off-by: Jacob Strauss <jsstraus@amazon.com>
|   Signed-off-by: Tarang Gupta <tarangg@amazon.com>
|   Tested-by: Tarang Gupta <tarangg@amazon.com>
|   Cc: stable@vger.kernel.org # v4.8+
|   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
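The bug in the entry above comes down to which edge of the written range the file position denotes. The following is a minimal userspace sketch in plain C, not kernel code; `struct byte_range` and `sync_range_after_write()` are hypothetical names chosen for illustration. It assumes the position has already been advanced past the written data, so it is the right edge, and derives the inclusive range that a synchronous write would have to flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive byte range [start, end] that a synchronous write must flush. */
struct byte_range {
    int64_t start;
    int64_t end;
};

/*
 * pos_after is the file position AFTER the write (the right edge of the
 * range); written is the number of bytes the write produced.
 * Returns false when there is nothing to flush or the inputs are
 * inconsistent.
 */
bool sync_range_after_write(int64_t pos_after, int64_t written,
                            struct byte_range *out)
{
    assert(out != NULL);
    if (written <= 0 || pos_after < written)
        return false;
    out->start = pos_after - written; /* left edge: where the write began */
    out->end = pos_after - 1;         /* last byte actually written */
    return true;
}
```

Treating the position as the left edge instead would shift the flushed range one write ahead of the data, which matches the symptom the commit message describes.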
* lockd: Delete an error message for a failed memory allocation in reclaimer() (Markus Elfring, 2017-09-06, 1 file, -5/+1)
|   Omit an extra message for a memory allocation failure in this function.
|
|   This issue was detected by using the Coccinelle software.
|
|   Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
|   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: remove jiffies field from access cache (NeilBrown, 2017-09-06, 2 files, -5/+0)
|   This field hasn't been used since commit 57b691819ee2 ("NFS: Cache access checks more aggressively").
|
|   Signed-off-by: NeilBrown <neilb@suse.com>
|   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: flush data when locking a file to ensure cache coherence for mmap. (NeilBrown, 2017-09-06, 1 file, -4/+7)
|   When a byte range lock (or flock) is taken out on an NFS file, the validity of the cached data is checked and the inode is marked NFS_INODE_INVALID_DATA. However the cached data isn't flushed from the page cache.
|
|   This is sufficient for future read() requests or mmap() requests as they call nfs_revalidate_mapping() which performs the flush if necessary.
|
|   However an existing mapping is not affected. Accessing data through that mapping will continue to return old data even though the inode is marked NFS_INODE_INVALID_DATA.
|
|   This can easily be confirmed using the 'nfs' tool in git://github.com/okirch/twopence-nfs.git and running
|      nfs coherence FILENAME
|   on one client, and
|      nfs coherence -r FILENAME
|   on another client.
|
|   It appears that prior to Linux 2.6.0 this worked correctly.
|
|   However commit: http://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ca9268fe3ddd075714005adecd4afbd7f9ab87d0 removed the call to inode_invalidate_pages() from nfs_zap_caches(). I haven't tested this code, but inspection suggests that prior to this commit, file locking would invalidate all inode pages.
|
|   This patch adds a call to nfs_revalidate_mapping() after a successful SETLK so that invalid data is flushed. With this patch the above test passes. To minimize impact (and possibly avoid a GETATTR call) this only happens if the mapping might be mapped into userspace.
|
|   Cc: Olaf Kirch <okir@suse.com>
|   Signed-off-by: NeilBrown <neilb@suse.com>
|   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: don't expect errors from mempool_alloc(). (NeilBrown, 2017-09-06, 1 file, -4/+2)
|   Commit fbe77c30e9ab ("NFS: move rw_mode to nfs_pageio_header") reintroduced some pointless code that commit 518662e0fcb9 ("NFS: fix usage of mempools.") had recently removed. Remove it again.
|
|   Cc: Benjamin Coddington <bcodding@redhat.com>
|   Signed-off-by: NeilBrown <neilb@suse.com>
|   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge branch 'bugfixes' (Trond Myklebust, 2017-08-20, 5 files, -43/+61)
|\
| * NFS: Fix NFSv2 security settings (Chuck Lever, 2017-08-20, 1 file, -4/+8)
| |   For a while now any NFSv2 mount where sec= is specified uses AUTH_NULL. If sec= is not specified, the mount uses AUTH_UNIX.
| |
| |   Commit e68fd7c8071d ("mount: use sec= that was specified on the command line") attempted to address a very similar problem with NFSv3, and should have fixed this too, but it has a bug.
| |
| |   The MNTv1 MNT procedure does not return a list of security flavors, so our client makes up a list containing just AUTH_NULL. This should enable nfs_verify_authflavors() to assign the sec= specified flavor, but instead, it incorrectly sets it to AUTH_NULL.
| |
| |   I expect this would also be a problem for any NFSv3 server whose MNTv3 MNT procedure returned a security flavor list containing only AUTH_NULL.
| |
| |   Fixes: e68fd7c8071d ("mount: use sec= that was specified on ... ")
| |   BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=310
| |   Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
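The intended flavor selection described in the entry above can be sketched as follows. This is a simplified userspace illustration, not the body of nfs_verify_authflavors(); `pick_flavor()` and the `FLAVOR_*` names are hypothetical, and the rule encoded here (a server list that offers AUTH_NULL lets the client keep its sec= flavor) is taken only from the commit message, so treat it as an assumption about the desired behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flavor numbers; 0 and 1 follow the ONC RPC numbering. */
enum {
    FLAVOR_NULL = 0,      /* AUTH_NULL */
    FLAVOR_UNIX = 1,      /* AUTH_UNIX, i.e. sec=sys */
    FLAVOR_KRB5 = 390003  /* krb5 pseudoflavor */
};

/*
 * Return the flavor to mount with, or -1 if the requested flavor cannot
 * be used. An exact match in the server's list wins. Otherwise, if the
 * list offers AUTH_NULL, the requested flavor is kept rather than being
 * replaced by AUTH_NULL.
 */
int pick_flavor(const int *server_list, size_t n, int requested)
{
    bool saw_null = false;

    for (size_t i = 0; i < n; i++) {
        if (server_list[i] == requested)
            return requested;
        if (server_list[i] == FLAVOR_NULL)
            saw_null = true;
    }
    return saw_null ? requested : -1;
}
```

With the made-up NFSv2 list containing only AUTH_NULL, a mount requesting sec=sys keeps AUTH_UNIX under this rule, which is the outcome the commit message says the fix restores.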
| * NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys' (NeilBrown, 2017-08-20, 1 file, -0/+11)
| |   An NFSv4.1 client might close a file after the user who opened it has logged off. In this case the user's credentials may no longer be valid, if they are e.g. kerberos credentials that have expired.
| |
| |   NFSv4.1 has a mechanism to allow the client to use machine credentials to close a file. However due to a shortcoming in the RFC, a CLOSE with those credentials may not be possible if the file in question isn't exported to the same security flavor - the required PUTFH must be rejected when this is the case.
| |
| |   Specifically if a server and client support kerberos in general and have used it to form a machine credential, but the file is only exported to "sec=sys", a PUTFH with the machine credentials will fail, so CLOSE is not possible.
| |
| |   As RPC_AUTH_UNIX (used by sec=sys) credentials can never expire, there is no value in using the machine credential in place of them. So in that case, just use the user's credentials for CLOSE etc, as you would in NFSv4.0.
| |
| |   Signed-off-by: Neil Brown <neilb@suse.com>
| |   Signed-off-by: NeilBrown <neilb@suse.com>
| |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS: Remove unused parameter gfp_flags from nfs_pageio_init() (Trond Myklebust, 2017-08-20, 3 files, -5/+3)
| |   Now that the mirror allocation has been moved, the parameter can go. Also remove the redundant symbol export.
| |
| |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4: Fix up mirror allocation (Trond Myklebust, 2017-08-20, 1 file, -34/+39)
| |   There are a number of callers of nfs_pageio_complete() that want to continue using the nfs_pageio_descriptor without needing to call nfs_pageio_init() again. Examples include nfs_pageio_resend() and nfs_pageio_cond_complete().
| |
| |   The problem is that nfs_pageio_complete() also calls nfs_pageio_cleanup_mirroring(), which frees up the array of mirrors. This can lead to writeback errors, in the next call to nfs_pageio_setup_mirroring().
| |
| |   Fix by simply moving the allocation of the mirrors to nfs_pageio_setup_mirroring().
| |
| |   Link: https://bugzilla.kernel.org/show_bug.cgi?id=196709
| |   Reported-by: JianhongYin <yin-jianhong@163.com>
| |   Cc: stable@vger.kernel.org # 4.0+
| |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | Merge branch 'writeback' (Trond Myklebust, 2017-08-18, 9 files, -348/+257)
|\ \
| * | NFS: Wait for requests that are locked on the commit list (Trond Myklebust, 2017-08-15, 3 files, -8/+29)
| | |   If a request is on the commit list, but is locked, we will currently skip it, which can lead to livelocking when the commit count doesn't reduce to zero.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg() (Trond Myklebust, 2017-08-15, 3 files, -45/+2)
| | |   Now that we no longer hold the inode->i_lock when manipulating the commit lists, it is safe to call pnfs_put_lseg() again.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Switch to using mapping->private_lock for page writeback lookups. (Trond Myklebust, 2017-08-15, 1 file, -11/+16)
| | |   Switch from using the inode->i_lock for this to avoid contention with other metadata manipulation.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Use an atomic_long_t to count the number of commits (Trond Myklebust, 2017-08-15, 2 files, -6/+8)
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Use an atomic_long_t to count the number of requests (Trond Myklebust, 2017-08-15, 5 files, -22/+11)
| | |   Rather than forcing us to take the inode->i_lock just in order to bump the number.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
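The idea in the entry above, counting with an atomic instead of taking a lock for every increment, can be illustrated in userspace with C11 atomics. This is a sketch only: the kernel uses its own atomic_long_t API, and `struct inode_counters` and the helper names below are hypothetical.

```c
#include <stdatomic.h>

/* Per-inode bookkeeping; the counter needs no lock to be updated. */
struct inode_counters {
    atomic_long nrequests;
};

/* Account for a newly added request. */
void request_added(struct inode_counters *c)
{
    atomic_fetch_add(&c->nrequests, 1);
}

/* Account for a removed request. */
void request_removed(struct inode_counters *c)
{
    atomic_fetch_sub(&c->nrequests, 1);
}

/* Read the current count; the value may be stale by the time it is used. */
long requests_outstanding(struct inode_counters *c)
{
    return atomic_load(&c->nrequests);
}
```

The trade-off is that readers see a snapshot rather than a value stabilised by a lock, which is usually acceptable for a counter that only gates "is there anything left to do" decisions.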
| * | NFSv4: Use a mutex to protect the per-inode commit lists (Trond Myklebust, 2017-08-15, 4 files, -22/+22)
| | |   The commit lists can get very large, so using the inode->i_lock can end up affecting general metadata performance.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Refactor nfs_page_find_head_request() (Trond Myklebust, 2017-08-15, 1 file, -12/+30)
| | |   Split out the 2 cases so that we can treat the locking differently. The issue is that the locking in the swapcache case is highly linked to the commit list locking.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request() (Trond Myklebust, 2017-08-15, 1 file, -15/+20)
| | |   Hide the locking from nfs_lock_and_join_requests() so that we can separate out the requirements for swapcache pages.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Fix up nfs_page_group_covers_page() (Trond Myklebust, 2017-08-15, 1 file, -12/+6)
| | |   Fix up the test in nfs_page_group_covers_page(). The simplest implementation is to check that we have a set of intersecting or contiguous subrequests that connect page offset 0 to nfs_page_length(req->wb_page).
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
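The coverage test described in the entry above can be sketched in userspace as a walk from offset 0: repeatedly find a subrequest that contains the current position and jump to its end. This is an illustration under assumptions, not the kernel function; `struct subreq`, `find_covering()` and `group_covers_page()` are hypothetical names, and the subrequests are held in a plain array rather than a linked page group.

```c
#include <stdbool.h>
#include <stddef.h>

/* One subrequest: it covers bytes [pgbase, pgbase + bytes) of the page. */
struct subreq {
    unsigned int pgbase;
    unsigned int bytes;
};

/* Return a subrequest whose range contains byte offset pos, or NULL. */
const struct subreq *find_covering(const struct subreq *reqs, size_t n,
                                   unsigned int pos)
{
    for (size_t i = 0; i < n; i++) {
        if (pos >= reqs[i].pgbase && pos < reqs[i].pgbase + reqs[i].bytes)
            return &reqs[i];
    }
    return NULL;
}

/*
 * True if intersecting or contiguous subrequests connect offset 0 to len.
 * Each step moves pos strictly forward, so the loop terminates.
 */
bool group_covers_page(const struct subreq *reqs, size_t n, unsigned int len)
{
    unsigned int pos = 0;

    while (pos < len) {
        const struct subreq *r = find_covering(reqs, n, pos);

        if (r == NULL)
            return false; /* gap at pos: nothing covers this byte */
        pos = r->pgbase + r->bytes;
    }
    return true;
}
```

Because the walk only asks "who covers this byte", the subrequests may overlap and appear in any order.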
| * | NFS: Remove unused parameter from nfs_page_group_lock() (Trond Myklebust, 2017-08-15, 2 files, -23/+14)
| | |   nfs_page_group_lock() is now always called with the 'nonblock' parameter set to 'false'.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Remove unused function nfs_page_group_lock_wait() (Trond Myklebust, 2017-08-15, 1 file, -21/+0)
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Remove nfs_page_group_clear_bits() (Trond Myklebust, 2017-08-15, 1 file, -26/+3)
| | |   At this point, we only expect ever to potentially see PG_REMOVE and PG_TEARDOWN being set on the subrequests.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases (Trond Myklebust, 2017-08-15, 1 file, -29/+29)
| | |   Since nfs_page_group_destroy() does not take any locks on the requests to be freed, we need to ensure that we don't inadvertently free the request in nfs_destroy_unlinked_subrequests() while the last reference is being released elsewhere.
| | |
| | |   Do this by:
| | |
| | |   1) Taking a reference to the request unless it is already being freed
| | |   2) Checking (under the page group lock) if PG_TEARDOWN is already set before freeing an unreferenced request in nfs_destroy_unlinked_subrequests()
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
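Step 1 in the entry above, taking a reference only if the object is not already on its way to being freed, is a common pattern (similar in spirit to the kernel's kref_get_unless_zero()). A minimal userspace sketch with C11 atomics follows; `struct request` and `request_get_unless_zero()` are hypothetical names, and the sketch assumes a count of zero means the final release is in progress.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct request {
    atomic_int refcount; /* 0 means the last reference is being dropped */
};

/*
 * Take a reference only if the count is still non-zero. Returns false
 * when the request is already being freed and must not be touched.
 */
bool request_get_unless_zero(struct request *req)
{
    int old = atomic_load(&req->refcount);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&req->refcount, &old, old + 1))
            return true;
    }
    return false;
}
```

A plain increment would not be enough here, because it could resurrect a count that another thread has already dropped to zero.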
| * | NFS: Further optimise nfs_lock_and_join_requests() (Trond Myklebust, 2017-08-15, 1 file, -27/+18)
| | |   When locking the entire group in order to remove subrequests, the locks are always taken in order, and with the page group lock being taken after the page head is locked. The intention is that:
| | |
| | |   1) The lock on the group head guarantees that requests may not be removed from the group (although new entries could be appended if we're not holding the group lock).
| | |   2) It is safe to drop and retake the page group lock while iterating through the list, in particular when waiting for a subrequest lock.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Reduce inode->i_lock contention in nfs_lock_and_join_requests() (Trond Myklebust, 2017-08-15, 1 file, -18/+22)
| | |   We should no longer need the inode->i_lock, now that we've straightened out the request locking. The locking schema is now:
| | |
| | |   1) Lock page head request
| | |   2) Lock the page group
| | |   3) Lock the subrequests one by one
| | |
| | |   Note that there is a subtle race with nfs_inode_remove_request() due to the fact that the latter does not lock the page head, when removing it from the struct page. Only the last subrequest is locked, hence we need to re-check that the PagePrivate(page) is still set after we've locked all the subrequests.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Remove page group limit in nfs_flush_incompatible() (Trond Myklebust, 2017-08-15, 1 file, -2/+0)
| | |   nfs_try_to_update_request() should be able to cope now.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Teach nfs_try_to_update_request() to deal with request page_groups (Trond Myklebust, 2017-08-15, 1 file, -40/+20)
| | |   Simplify the code, and avoid some flushes to disk.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Fix the inode request accounting when pages have subrequests (Trond Myklebust, 2017-08-15, 1 file, -12/+15)
| | |   Both nfs_destroy_unlinked_subrequests() and nfs_lock_and_join_requests() manipulate the inode flags adjusting the NFS_I(inode)->nrequests.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Don't unlock writebacks before declaring PG_WB_END (Trond Myklebust, 2017-08-15, 1 file, -4/+4)
| | |   We don't want nfs_lock_and_join_requests() to start fiddling with the request before the call to nfs_page_group_sync_on_bit().
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Don't check request offset and size without holding a lock (Trond Myklebust, 2017-08-15, 1 file, -12/+12)
| | |   Request offsets and sizes are not guaranteed to be stable unless you are holding the request locked.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Fix an ABBA issue in nfs_lock_and_join_requests() (Trond Myklebust, 2017-08-15, 1 file, -12/+17)
| | |   All other callers of nfs_page_group_lock() appear to already hold the page lock on the head page, so doing it in the opposite order here is inefficient, although not deadlock prone since we roll back all locks on contention.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
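The phrase "roll back all locks on contention" in the entry above names a general way to keep an out-of-order lock acquisition from deadlocking: never wait while holding a lock, and release everything if the next lock is busy. A minimal userspace sketch with C11 `atomic_flag` follows; `struct request` and the helper names are hypothetical, and the real code uses the kernel's own request and page group locks.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct request {
    atomic_flag locked; /* set while some context owns the request */
};

/* Try to take the lock without waiting; true on success. */
bool request_trylock(struct request *r)
{
    return !atomic_flag_test_and_set(&r->locked);
}

void request_unlock(struct request *r)
{
    atomic_flag_clear(&r->locked);
}

/*
 * Take both locks or neither. If the second lock is contended, the
 * first is released again so that no lock is held while the caller
 * decides to retry. Two callers using opposite orders (A then B,
 * B then A) therefore cannot block each other forever.
 */
bool lock_pair_or_back_off(struct request *a, struct request *b)
{
    if (!request_trylock(a))
        return false;
    if (!request_trylock(b)) {
        request_unlock(a); /* roll back on contention */
        return false;
    }
    return true;
}
```

The cost of this scheme is the retry work on contention, which is why the commit message calls the opposite ordering inefficient rather than unsafe.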
| * | NFS: Fix a reference and lock leak in nfs_lock_and_join_requests() (Trond Myklebust, 2017-08-15, 1 file, -2/+1)
| | |   Yes, this is a situation that should never happen (hence the WARN_ON) but we should still ensure that we free up the locks and references to the faulty pages.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Ensure we always dereference the page head last (Trond Myklebust, 2017-08-15, 1 file, -5/+6)
| | |   This fixes a race with nfs_page_group_sync_on_bit() whereby the call to wake_up_bit() in nfs_page_group_unlock() could occur after the page header had been freed.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Reduce lock contention in nfs_try_to_update_request() (Trond Myklebust, 2017-08-15, 1 file, -5/+3)
| | |   Micro-optimisation to move the lockless check into the for(;;) loop.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Reduce lock contention in nfs_page_find_head_request() (Trond Myklebust, 2017-08-15, 1 file, -3/+5)
| | |   Add a lockless check for whether or not the page might be carrying an existing writeback before we grab the inode->i_lock.
| | |
| | |   Reported-by: Chuck Lever <chuck.lever@oracle.com>
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Simplify page writeback (Trond Myklebust, 2017-08-15, 1 file, -20/+10)
| |/
| |   We don't expect the page header lock to ever be held across I/O, so it should always be safe to wait for it, even if we're doing nonblocking writebacks.
| |
| |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | Merge branch 'open_state' (Trond Myklebust, 2017-08-15, 10 files, -39/+75)
|\ \
| * | NFSv4: Use the nfs4_state being recovered in _nfs4_opendata_to_nfs4_state() (Trond Myklebust, 2017-08-14, 1 file, -16/+25)
| | |   If we're recovering a nfs4_state, then we should try to use that instead of looking up a new stateid. Only do that if the inodes match, though.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state() (Trond Myklebust, 2017-08-14, 1 file, -5/+23)
| |/
| |   When doing open by filehandle we don't really want to lookup a new inode, but rather update the one we've got. Add a helper which does this for us.
| |
| |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * Merge tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs (Linus Torvalds, 2017-08-11, 3 files, -2/+3)
| |\
| | |   Pull NFS client fixes from Anna Schumaker:
| | |   "A few more NFS client bugfixes from me for rc5.
| | |
| | |   Dros has a stable fix for flexfiles to prevent leaking the nfs4_ff_ds_version arrays when freeing a layout, Trond fixed a potential recovery loop situation with the TEST_STATEID operation, and Christoph fixed up the pNFS blocklayout Kconfig options to prevent unsafe use with kernels that don't have large block device support.
| | |
| | |   Summary:
| | |
| | |   Stable fix:
| | |    - fix leaking nfs4_ff_ds_version array
| | |
| | |   Other fixes:
| | |    - improve TEST_STATEID OLD_STATEID handling to prevent recovery loop
| | |    - require 64-bit sector_t for pNFS blocklayout to prevent 32-bit compile errors"
| | |
| | |   * tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs:
| | |     pnfs/blocklayout: require 64-bit sector_t
| | |     NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid()
| | |     nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays
| | * pnfs/blocklayout: require 64-bit sector_t (Christoph Hellwig, 2017-08-11, 1 file, -0/+1)
| | |   The blocklayout code does not compile cleanly for a 32-bit sector_t, and also has no reliable checks for device sizes, which makes it unsafe to use with a kernel that doesn't support large block devices.
| | |
| | |   Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |   Reported-by: Arnd Bergmann <arnd@arndb.de>
| | |   Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
| | |   Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid() (Trond Myklebust, 2017-08-09, 1 file, -2/+1)
| | |   If the call to TEST_STATEID returns NFS4ERR_OLD_STATEID, then it just means we raced with other calls to OPEN.
| | |
| | |   Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| | |   Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays (Weston Andros Adamson, 2017-08-08, 1 file, -0/+1)
| | |   The client was freeing the nfs4_ff_layout_ds, but not the contained nfs4_ff_ds_version array.
| | |
| | |   Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
| | |   Cc: stable@vger.kernel.org # v4.0+
| | |   Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse (Linus Torvalds, 2017-08-11, 2 files, -4/+6)
| |\ \
| | | |   Pull fuse fixes from Miklos Szeredi:
| | | |   "Fix a few bugs in fuse"
| | | |
| | | |   * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
| | | |     fuse: set mapping error in writepage_locked when it fails
| | | |     fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio
| | | |     fuse: initialize the flock flag in fuse_file on allocation
| | * | fuse: set mapping error in writepage_locked when it fails (Jeff Layton, 2017-08-11, 1 file, -0/+1)
| | | |   This ensures that we see errors on fsync when writeback fails.
| | | |
| | | |   Signed-off-by: Jeff Layton <jlayton@redhat.com>
| | | |   Reviewed-by: Christoph Hellwig <hch@lst.de>
| | | |   Reviewed-by: Jan Kara <jack@suse.cz>
| | | |   Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>