linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	pnfs: pnfs_update_layout needs to consider if strict iomode checking is on	Tom Haynes	2016-05-26	1	-12/+22
\| \| \| \| \| \| \| \|	As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically support enforcing that a IOMODE_RW segment will not allow READ I/O. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	pnfs: make pnfs_layout_process more robust	Jeff Layton	2016-05-17	1	-16/+11
\| \| \| \| \| \| \| \| \| \| \|	It can return NULL if layoutgets are blocked currently. Fix it to return -EAGAIN in that case, so we can properly handle it in pnfs_update_layout. Also, clean up and simplify the error handling -- eliminate "status" and just use "lseg". Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	pnfs: rework LAYOUTGET retry handling	Jeff Layton	2016-05-17	1	-68/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several problems in the way a stateid is selected for a LAYOUTGET operation: We pick a stateid to use in the RPC prepare op, but that makes it difficult to serialize LAYOUTGETs that use the open stateid. That serialization is done in pnfs_update_layout, which occurs well before the rpc_prepare operation. Between those two events, the i_lock is dropped and reacquired. pnfs_update_layout can find that the list has lsegs in it and not do any serialization, but then later pnfs_choose_layoutget_stateid ends up choosing the open stateid. This patch changes the client to select the stateid to use in the LAYOUTGET earlier, when we're searching for a usable layout segment. This way we can do it all while holding the i_lock the first time, and ensure that we serialize any LAYOUTGET call that uses a non-layout stateid. This also means a rework of how LAYOUTGET replies are handled, as we must now get the latest stateid if we want to retransmit in response to a retryable error. Most of those errors boil down to the fact that the layout state has changed in some fashion. Thus, what we really want to do is to re-search for a layout when it fails with a retryable error, so that we can avoid reissuing the RPC at all if possible. While the LAYOUTGET RPC is async, the initiating thread always waits for it to complete, so it's effectively synchronous anyway. Currently, when we need to retry a LAYOUTGET because of an error, we drive that retry via the rpc state machine. This means that once the call has been submitted, it runs until it completes. So, we must move the error handling for this RPC out of the rpc_call_done operation and into the caller. In order to handle errors like NFS4ERR_DELAY properly, we must also pass a pointer to the sliding timeout, which is now moved to the stack in pnfs_update_layout. The complicating errors are -NFS4ERR_RECALLCONFLICT and -NFS4ERR_LAYOUTTRYLATER, as those involve a timeout after which we give up and return NULL back to the caller. So, there is some special handling for those errors to ensure that the layers driving the retries can handle that appropriately. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	pnfs: lift retry logic from send_layoutget to pnfs_update_layout	Jeff Layton	2016-05-17	1	-36/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we get back something like NFS4ERR_OLD_STATEID, that will be translated into -EAGAIN, and the do/while loop in send_layoutget will drive the call again. This is not quite what we want, I think. An error like that is a sign that something has changed. That something could have been a concurrent LAYOUTGET that would give us a usable lseg. Lift the retry logic into pnfs_update_layout instead. That allows us to redo the layout search, and may spare us from having to issue an RPC. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	pnfs: fix bad error handling in send_layoutget	Jeff Layton	2016-05-17	1	-3/+8
\| \| \| \| \| \| \| \| \| \|	Currently, the code will clear the fail bit if we get back a fatal error. I don't think that's correct -- we want to clear that bit if we do not get a fatal error. Fixes: 0bcbf039f6 (nfs: handle request add failure properly) Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args	Jeff Layton	2016-05-17	1	-22/+42
\| \| \| \| \| \| \| \| \| \| \|	LAYOUTRETURN is "special" in that servers and clients are expected to work with old stateids. When the client sends a LAYOUTRETURN with an old stateid in it then the server is expected to only tear down layout segments that were present when that seqid was current. Ensure that the client handles its accounting accordingly. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	pnfs: keep track of the return sequence number in pnfs_layout_hdr	Jeff Layton	2016-05-17	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we want to selectively do a LAYOUTRETURN, we need to specify a stateid that represents most recent layout acquisition that is to be returned. When we mark a layout stateid to be returned, we update the return sequence number in the layout header with that value, if it's newer than the existing one. Then, when we go to do a LAYOUTRETURN on layout header put, we overwrite the seqid in the stateid with the saved one, and then zero it out. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	pnfs: record sequence in pnfs_layout_segment when it's created	Jeff Layton	2016-05-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	In later patches, we're going to teach the client to be more selective about how it returns layouts. This means keeping a record of what the stateid's seqid was at the time that the server handed out a layout segment. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	pNFS: Fix a leaked layoutstats flag	Trond Myklebust	2016-05-17	1	-1/+2
\| \| \| \| \|	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	pnfs: set NFS_IOHDR_REDO in pnfs_read_resend_pnfs	Weston Andros Adamson	2016-05-09	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Like other resend paths, mark the (old) hdr as NFS_IOHDR_REDO. This ensures the hdr completion function will not count the (old) hdr as good bytes. Also, vector the error back through the hdr->task.tk_status like other retry calls. This fixes a bug with the FlexFiles layout where libaio was reporting more bytes read than requested. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
*	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros	Kirill A. Shutemov	2016-04-04	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls	Trond Myklebust	2016-02-22	1	-11/+6
\| \| \| \| \| \| \|	Replace another case where the layout 'plh_block_lgets' can trigger infinite loops in send_layoutget(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout	Trond Myklebust	2016-02-22	1	-2/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the server reboots while there is a layoutget outstanding, then the call to pnfs_choose_layoutget_stateid() will fail with an EAGAIN error, which causes an infinite loop in send_layoutget(). The reason why we never break out of the loop is that the layout 'plh_block_lgets' field is never cleared. Fix is to replace plh_block_lgets with NFS_LAYOUT_INVALID_STID, which can be reset after a new layoutget. Fixes: ab7d763e477c5 ("pNFS: Ensure nfs4_layoutget_prepare returns...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode	Trond Myklebust	2016-02-15	1	-2/+1
\| \| \| \| \| \| \| \| \|	When setting the layout return mode, we must always also set the NFS_LAYOUT_RETURN_REQUESTED flag to ensure that we send a layoutreturn. Otherwise pnfs_error_mark_layout_for_return() could set the mode, but fail to send the layoutreturn because another is already in flight. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	pNFS: Fix pnfs_mark_matching_lsegs_return()	Trond Myklebust	2016-02-15	1	-2/+13
\| \| \| \| \| \| \|	We don't need to schedule a layoutreturn if the layout segment can be freed immediately. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFS: Cleanup - rename NFS_LAYOUT_RETURN_BEFORE_CLOSE	Trond Myklebust	2016-01-28	1	-5/+5
\| \| \| \| \| \| \| \| \|	NFS_LAYOUT_RETURN_BEFORE_CLOSE is being used to signal that a layoutreturn is needed, either due to a layout recall or to a layout error. Rename it to NFS_LAYOUT_RETURN_REQUESTED in order to clarify its purpose. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	pNFS: Fix missing layoutreturn calls	Trond Myklebust	2016-01-27	1	-62/+56
\| \| \| \| \| \| \| \| \| \| \| \|	The layoutreturn code currently relies on pnfs_put_lseg() to initiate the RPC call when conditions are right. A problem arises when we want to free the layout segment from inside an inode->i_lock section (e.g. in pnfs_clear_request_commit()), since we cannot sleep. The workaround is to move the actual call to pnfs_send_layoutreturn() to pnfs_put_layout_hdr(), which doesn't have this restriction. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	Merge branch 'pnfs_generic'	Trond Myklebust	2016-01-04	1	-27/+55
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* pnfs_generic: NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() NFSv4.1/pNFS: Fix a race in initiate_file_draining() NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() NFS: Relax requirements in nfs_flush_incompatible NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid NFS: Allow multiple commit requests in flight per file NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs NFSv4: List stateid information in the callback tracepoints NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1 pNFS: If we have to delay the layout callback, mark the layout for return NFSv4.1/pNFS: Add a helper to mark the layout as returned pNFS: Ensure nfs4_layoutget_prepare returns the correct error
\| *	NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments	Trond Myklebust	2016-01-04	1	-3/+3
\| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
\| *	NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures	Trond Myklebust	2016-01-04	1	-2/+2
\| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
\| *	NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()	Trond Myklebust	2016-01-04	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Make it more obvious what we're returning... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
\| *	NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout	Trond Myklebust	2016-01-04	1	-6/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a bug whereby if all the layout segments could be immediately freed, the call to pnfs_error_mark_layout_for_return() would never result in a layoutreturn. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
\| *	NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode	Trond Myklebust	2016-01-04	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If pnfs_mark_matching_lsegs_return() needs to mark a layout segment for return, then it must also set the return iomode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
\| *	NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids	Trond Myklebust	2016-01-04	1	-3/+3
\| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
\| *	NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()	Trond Myklebust	2016-01-04	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	A stateid is a structure, pass it as a pointer. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
\| *	NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid	Trond Myklebust	2015-12-31	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the layout segment is invalid, then we should not be adding more write requests to the commit list. Instead, those writes should be replayed after requesting a new layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
\| *	pNFS: If we have to delay the layout callback, mark the layout for return	Trond Myklebust	2015-12-28	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the client needs to delay the layout callback, then speed up the recall process by marking the remaining layout segments to be actively returned by the client. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
\| *	NFSv4.1/pNFS: Add a helper to mark the layout as returned	Trond Myklebust	2015-12-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This ensures that we don't reuse the stateid if a layout return or implied layout return means that we've returned all layout segments Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* \|	pNFS/flexfiles: Don't mark the entire layout as failed, when returning it	Trond Myklebust	2015-12-28	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In pNFS/flexfiles, we want to return the layout without necessarily marking it as having completely failed. We therefore move the call to pnfs_layout_io_set_failed() out of pnfs_error_mark_layout_for_return(), and then ensura that pNFS/files layout calls it separately. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* \|	pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGET	Trond Myklebust	2015-12-28	1	-16/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a bug in which flexfiles clients are falling back to I/O through the MDS even when the FF_FLAGS_NO_IO_THRU_MDS flag is set. The flexfiles client will always report errors through the LAYOUTRETURN and/or LAYOUTERROR mechanisms, so it should normally be safe for it to retry the LAYOUTGET until it fails or succeeds. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* \|	nfs: handle request add failure properly	Peng Tao	2015-12-28	1	-12/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we fail to queue a read page to IO descriptor, we need to clean it up otherwise it is hanging around preventing nfs module from being removed. When we fail to queue a write page to IO descriptor, we need to clean it up and also save the failure status to open context. Then at file close, we can try to write pages back again and drop the page if it fails to writeback in .launder_page, which will be done in the next patch. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* \|	nfs: centralize pgio error cleanup	Peng Tao	2015-12-28	1	-8/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case we fail during setting things up for read/write IO, set pg_error in IO descriptor and do the cleanup in nfs_pageio_add_request, where we clean up all pages that are still hanging around on the IO descriptor. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* \|	NFS41: pop some layoutget errors to application	Peng Tao	2015-12-28	1	-6/+18
\|/ \| \| \| \| \| \| \| \| \| \|	For ERESTARTSYS/EIO/EROFS/ENOSPC/E2BIG in layoutget, we should just bail out instead of hiding the error and retrying inband IO. Change all the call sites to pop the error all the way up. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	pNFS: Modify pnfs_update_layout tracepoints to use layout stateid	Trond Myklebust	2015-12-28	1	-10/+10
\| \| \| \| \| \| \| \|	Instead of displaying a layout segment pointer in these tracepoints, let's use the layout stateid, now that Olga gave us a set of tools for displaying them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	nfs: add new tracepoint for pnfs_update_layout	Jeff Layton	2015-12-28	1	-6/+32
\| \| \| \| \| \| \| \| \| \|	pnfs_update_layout is really the "nexus" of layout handling. If it returns NULL then we end up going through the MDS. This patch adds some tracepoints to that function that allow us to determine the cause when we end up going through the MDS unexpectedly. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	sched/wait: Fix the signal handling fix	Peter Zijlstra	2015-12-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jan Stancek reported that I wrecked things for him by fixing things for Vladimir :/ His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which should not be possible, however my previous patch made this possible by unconditionally checking signal_pending(). We cannot use current->state as was done previously, because the instruction after the store to that variable it can be changed. We must instead pass the initial state along and use that. Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers") Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Chris Mason <clm@fb.com> Tested-by: Jan Stancek <jstancek@redhat.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Chris Mason <clm@fb.com> Reviewed-by: Paul Turner <pjt@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: tglx@linutronix.de Cc: Oleg Nesterov <oleg@redhat.com> Cc: hpa@zytor.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nfs4: resend LAYOUTGET when there is a race that changes the seqid	Jeff Layton	2015-11-25	1	-25/+31
\| \| \| \| \| \| \| \| \| \| \|	pnfs_layout_process will check the returned layout stateid against what the kernel has in-core. If it turns out that the stateid we received is older, then we should resend the LAYOUTGET instead of falling back to MDS I/O. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.1/pnfs: Retry through MDS when getting bad length of data	Kinglong Mee	2015-10-21	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If non rpc-based layout driver return bad length of data, nfs retries by calling rpc_restart_call_prepare() that cause an NULL reference panic. This patch lets nfs retry through MDS for non rpc-based layout driver return bad length of data. [13034.883329] BUG: unable to handle kernel NULL pointer dereference at (null) [13034.884902] IP: [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc] [13034.886558] PGD 0 [13034.888126] Oops: 0000 [#1] KASAN [13034.889710] Modules linked in: blocklayoutdriver(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c coretemp btrfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon auth_rpcgss shpchp nfs_acl lockd vmw_vmci parport_pc xor raid6_pq grace parport sunrpc i2c_piix4 vmwgfx drm_kms_helper ttm drm mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache] [13034.898260] CPU: 0 PID: 10112 Comm: kworker/0:1 Tainted: G OE 4.3.0-rc5+ #279 [13034.899932] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [13034.903342] Workqueue: events bl_read_cleanup [blocklayoutdriver] [13034.905059] task: ffff88006a9148c0 ti: ffff880035e90000 task.ti: ffff880035e90000 [13034.906827] RIP: 0010:[<ffffffffa00db372>] [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc] [13034.910522] RSP: 0018:ffff880035e97b58 EFLAGS: 00010282 [13034.912378] RAX: fffffbfff04a5a94 RBX: ffff880068fe4858 RCX: 0000000000000003 [13034.914339] RDX: dffffc0000000000 RSI: 0000000000000003 RDI: 0000000000000282 [13034.916236] RBP: ffff880035e97b68 R08: 0000000000000001 R09: 0000000000000001 [13034.918229] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [13034.920007] R13: ffff880068fe4858 R14: ffff880068fe4a60 R15: 0000000000001000 [13034.921845] FS: 0000000000000000(0000) GS:ffffffff82247000(0000) knlGS:0000000000000000 [13034.923645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [13034.925525] CR2: 0000000000000000 CR3: 00000000063dd000 CR4: 00000000001406f0 [13034.932808] Stack: [13034.934813] ffff880068fe4780 0000000000001000 ffff880035e97ba8 ffffffffa08800d2 [13034.936675] ffffffffa088029d ffff880068fe4780 ffff880068fe4858 ffffffffa089c0a0 [13034.938593] ffff880068fe47e0 ffff88005d59faf0 ffff880035e97be0 ffffffffa087e08f [13034.940454] Call Trace: [13034.942388] [<ffffffffa08800d2>] nfs_readpage_result+0x112/0x200 [nfs] [13034.944317] [<ffffffffa088029d>] ? nfs_readpage_done+0xdd/0x160 [nfs] [13034.946267] [<ffffffffa087e08f>] nfs_pgio_result+0x9f/0x120 [nfs] [13034.948166] [<ffffffffa09266cc>] pnfs_ld_read_done+0x7c/0x1e0 [nfsv4] [13034.950247] [<ffffffffa03b07ee>] bl_read_cleanup+0x2e/0x60 [blocklayoutdriver] [13034.952156] [<ffffffff810ebf62>] process_one_work+0x412/0x870 [13034.954102] [<ffffffff810ebe84>] ? process_one_work+0x334/0x870 [13034.955949] [<ffffffff810ebb50>] ? queue_delayed_work_on+0x40/0x40 [13034.957985] [<ffffffff810ec441>] worker_thread+0x81/0x6a0 [13034.959817] [<ffffffff810ec3c0>] ? process_one_work+0x870/0x870 [13034.961785] [<ffffffff810f43bd>] kthread+0x17d/0x1a0 [13034.963544] [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330 [13034.965479] [<ffffffff81100428>] ? finish_task_switch+0x88/0x220 [13034.967223] [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330 [13034.968929] [<ffffffff81b6ae5f>] ret_from_fork+0x3f/0x70 [13034.970534] [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330 [13034.972176] Code: c7 43 50 40 84 0d a0 e8 3d fe 1c e1 48 8d 7b 58 c7 83 e4 00 00 00 00 00 00 00 e8 ca fe 1c e1 4c 8b 63 58 4c 89 e7 e8 be fe 1c e1 <49> 83 3c 24 00 74 12 48 c7 43 50 f0 a2 0e a0 b8 01 00 00 00 5b [13034.977148] RIP [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc] [13034.978780] RSP <ffff880035e97b58> [13034.980399] CR2: 0000000000000000 Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFS41: make close wait for layoutreturn	Peng Tao	2015-09-23	1	-10/+25
\| \| \| \| \| \| \| \| \| \| \|	If we send a layoutreturn asynchronously before close, the close might reach server first and layoutreturn would fail with BADSTATEID because there is nothing keeping the layout stateid alive. Also do not pretend sending layoutreturn if we are not. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file	Trond Myklebust	2015-08-31	1	-0/+9
\| \| \| \| \| \| \|	If we have a read layout, then sanity check the minimal layout length so that it does not extend beyond the end of file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.1/pnfs: Don't ask for a read layout for an empty file.	Trond Myklebust	2015-08-31	1	-0/+3
\| \| \| \|	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return must notify of layout return	Trond Myklebust	2015-08-28	1	-0/+2
\| \| \| \| \| \| \|	It's not sufficient to just mark the layout segment for layout return. We also need to set the NFS_LAYOUT_RETURN_BEFORE_CLOSE flag in the layout header. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.1/pnfs: Allow pNFS device drivers to customise layout segment insertion	Trond Myklebust	2015-08-26	1	-9/+50
\| \| \| \| \| \| \| \|	This is needed in order to allow merging of contiguous layout segments, and also to correct the ordering of layouts for those device drivers that don't necessarily want to place the read-write layouts first. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.1/pnfs: Add sanity check for the layout range returned by the server	Trond Myklebust	2015-08-25	1	-1/+24
\| \| \| \|	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.2/pnfs: Make the layoutstats timer configurable	Trond Myklebust	2015-08-25	1	-0/+4
\| \| \| \| \| \| \|	Allow advanced users to set the layoutstats timer in order to lengthen or shorten the period between layoutstat transmissions to the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFS41: remove NFS_LAYOUT_ROC flag	Peng Tao	2015-08-25	1	-5/+2
\| \| \| \| \| \| \| \| \| \|	If we return delegation before closing, we fail to do roc check during close because NFS_LAYOUT_ROC is cleared by delegreturn and it causes layouts to be still hanging around after delegreturn + close, which is a voilation against protocol. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.1/pnfs: Add a tracepoint for return-on-close events	Trond Myklebust	2015-08-25	1	-0/+1
\| \| \| \| \| \|	Allow tracing of return-on-close. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	pNFS: Fix an unused variable warning in pnfs_roc_get_barrier	Trond Myklebust	2015-08-20	1	-2/+0
\| \| \| \|	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFS41: make sure sending LAYOUTRETURN before close if marked so	Peng Tao	2015-08-19	1	-23/+28
\| \| \| \| \| \| \| \| \|	If layout is marked by NFS_LAYOUT_RETURN_BEFORE_CLOSE, we should always send LAYOUTRETURN before close, and we don't need to do ROC drain if we do send LAYOUTRETURN. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
*	NFSv4.1/pnfs: Fix a close/delegreturn hang when return-on-close is set	Trond Myklebust	2015-08-19	1	-23/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The helper pnfs_roc() has already verified that we have no delegations, and no further open files, hence no outstanding I/O and it has marked all the return-on-close lsegs as being invalid. Furthermore, it sets the NFS_LAYOUT_RETURN bit, thus serialising the close/delegreturn with all future layoutget calls on this inode. The checks in pnfs_roc_drain() for valid layout segments are therefore redundant: those cannot exist until another layoutget completes. The other check for whether or not NFS_LAYOUT_RETURN is set, actually causes a hang, since we already know that we hold that flag. To fix, we therefore strip out all the functionality in pnfs_roc_drain() except the retrieval of the barrier state, and then rename the function accordingly. Reported-by: Christoph Hellwig <hch@infradead.org> Fixes: 5c4a79fb2b1c ("Don't prevent layoutgets when doing return-on-close") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>