summaryrefslogtreecommitdiffstats
path: root/net/sunrpc (follow)
Commit message (Collapse)AuthorAgeFilesLines
* xprtrdma: Add trace points to instrument memory registrationChuck Lever2018-01-232-22/+7
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points in the RPC Reply handler pathsChuck Lever2018-01-232-19/+9
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points in RPC Call transmit pathsChuck Lever2018-01-232-15/+5
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* rpcrdma: infrastructure for static trace points in rpcrdma.koChuck Lever2018-01-232-5/+9
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Introduce rpcrdma_mw_unmap_and_putChuck Lever2018-01-164-23/+33
| | | | | | | | | Clean up: Code review suggested that a common bit of code can be placed into a helper function, and this gives us fewer places to stick an "I DMA unmapped something" trace point. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Remove usage of "mw"Chuck Lever2018-01-165-278/+292
| | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: struct rpcrdma_mw was named after Memory Windows, but xprtrdma no longer supports a Memory Window registration mode. Rename rpcrdma_mw and its fields to reduce confusion and make the code more sensible to read. Renaming "mw" was suggested by Tom Talpey, the author of the original xprtrdma implementation. It's a good idea, but I haven't done this until now because it's a huge diffstat for no benefit other than code readability. However, I'm about to introduce static trace points that expose a few of xprtrdma's internal data structures. They should make sense in the trace report, and it's reasonable to treat trace points as a kernel API contract which might be difficult to change later. While I'm churning things up, two additional changes: - rename variables unhelpfully called "r" to "mr", to improve code clarity, and - rename the MR-related helper functions using the form "rpcrdma_mr_<verb>", to be consistent with other areas of the code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Replace all usage of "frmr" with "frwr"Chuck Lever2018-01-164-99/+99
| | | | | | | | | | | | | | | Clean up: Over time, the industry has adopted the term "frwr" instead of "frmr". The term "frwr" is now more widely recognized. For the past couple of years I've attempted to add new code using "frwr" , but there still remains plenty of older code that still uses "frmr". Replace all usage of "frmr" to avoid confusion. While we're churning code, rename variables unhelpfully called "f" to "frwr", to improve code clarity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Don't clear RPC_BC_PA_IN_USE on pre-allocated rpc_rqst'sChuck Lever2018-01-161-6/+1
| | | | | | | | | | No need for the overhead of atomically setting and clearing this bit flag for every use of a pre-allocated backchannel rpc_rqst. These are a distinct pool of rpc_rqsts that are used only for callback operations, so it is safe to simply leave the bit set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Split xprt_rdma_send_requestChuck Lever2018-01-164-32/+50
| | | | | | | | | | | | Clean up. @rqst is set up differently for backchannel Replies. For example, rqst->rq_task and task->tk_client are both NULL. So it is easier to understand and maintain this code path if it is separated. Also, we can get rid of the confusing rl_connect_cookie hack in rpcrdma_bc_receive_call. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: buf_free not called for CB repliesChuck Lever2018-01-163-6/+1
| | | | | | | | | | | | | | Since commit 5a6d1db45569 ("SUNRPC: Add a transport-specific private field in rpc_rqst"), the rpc_rqst's for RPC-over-RDMA backchannel operations leave rq_buffer set to NULL. xprt_release does not invoke ->op->buf_free when rq_buffer is NULL. The RPCRDMA_REQ_F_BACKCHANNEL check in xprt_rdma_free is therefore redundant because xprt_rdma_free is not invoked for backchannel requests. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Move unmap-safe logic to rpcrdma_marshal_reqChuck Lever2018-01-162-5/+11
| | | | | | | | | Clean up. This logic is related to marshaling the request, and I'd like to keep everything that touches req->rl_registered close together, for CPU cache efficiency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Support IPv6 in xprt_rdma_set_portChuck Lever2018-01-161-4/+24
| | | | | | | | | | | | Clean up a harmless oversight. xprtrdma's ->set_port method has never properly supported IPv6. This issue has never been a problem because NFS/RDMA mounts have always required "port=20049", thus so far, rpcbind is not invoked for these mounts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Remove another sockaddr_storage field (cdata::addr)Chuck Lever2018-01-163-19/+12
| | | | | | | | | | Save more space in struct rpcrdma_xprt by removing the redundant "addr" field from struct rpcrdma_create_data_internal. Wherever we have rpcrdma_xprt, we also have the rpc_xprt, which has a sockaddr_storage field with the same content. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Initialize the xprt address string array earlierChuck Lever2018-01-163-13/+21
| | | | | | | | | | | This makes the address strings available for debugging messages in earlier stages of transport set up. The first benefit is to get rid of the single-use rep_remote_addr field, saving 128+ bytes in struct rpcrdma_ep. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Remove unused padding variablesChuck Lever2018-01-162-7/+3
| | | | | | | | | Clean up. Remove fields that should have been removed by commit b3221d6a53c4 ("xprtrdma: Remove logic that constructs RDMA_MSGP type calls"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Remove ri_reminv_expectedChuck Lever2018-01-162-3/+0
| | | | | | | | | | Clean up. Commit b5f0afbea4f2 ("xprtrdma: Per-connection pad optimization") should have removed this. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Per-mode handling for Remote InvalidationChuck Lever2018-01-164-30/+27
| | | | | | | | | | | | Refactoring change: Remote Invalidation is particular to the memory registration mode that is use. Use a callout instead of a generic function to handle Remote Invalidation. This gets rid of the 8-byte flags field in struct rpcrdma_mw, of which only a single bit flag has been allocated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Eliminate unnecessary lock cycle in xprt_rdma_send_requestChuck Lever2018-01-161-1/+1
| | | | | | | | | | The rpcrdma_req is not shared yet, and its associated Send hasn't been posted, thus RMW should be safe. There's no need for the expense of a lock cycle here. Fixes: 0ba6f37012db ("xprtrdma: Refactor rpcrdma_deferred_completion") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Fix backchannel allocation of extra rpcrdma_repsChuck Lever2018-01-163-24/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | The backchannel code uses rpcrdma_recv_buffer_put to add new reps to the free rep list. This also decrements rb_recv_count, which spoofs the receive overrun logic in rpcrdma_buffer_get_rep. Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)") replaced the original open-coded list_add with a call to rpcrdma_recv_buffer_put(), but then a year later, commit 05c974669ece ("xprtrdma: Fix receive buffer accounting") added rep accounting to rpcrdma_recv_buffer_put. It was an oversight to let the backchannel continue to use this function. The fix this, let's combine the "add to free list" logic with rpcrdma_create_rep. Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in rpcrdma_buffer_create and then allocate additional rpcrdma_reps in rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel set-up is sufficient. Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Fix buffer leak after transport set up failureChuck Lever2018-01-161-11/+2
| | | | | | | | | | | | | | | This leak has been around forever, and is exceptionally rare. EINVAL causes mount to fail with "an incorrect mount option was specified" although it's not likely that one of the mount options is incorrect. Instead, return ENODEV in this case, as this appears to be an issue with system or device configuration rather than a specific mount option. Some obsolete comments are also removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* Merge tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2017-12-165-15/+24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client fixes from Anna Schumaker: "This has two stable bugfixes, one to fix a BUG_ON() when nfs_commit_inode() is called with no outstanding commit requests and another to fix a race in the SUNRPC receive codepath. Additionally, there are also fixes for an NFS client deadlock and an xprtrdma performance regression. Summary: Stable bugfixes: - NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a commit in the case that there were no commit requests. - SUNRPC: Fix a race in the receive code path Other fixes: - NFS: Fix a deadlock in nfs client initialization - xprtrdma: Fix a performance regression for small IOs" * tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Fix a race in the receive code path nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests xprtrdma: Spread reply processing over more CPUs nfs: fix a deadlock in nfs client initialization
| * SUNRPC: Fix a race in the receive code pathTrond Myklebust2017-12-151-9/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot race with the call to xprt_complete_rqst(). Reported-by: Chuck Lever <chuck.lever@oracle.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317 Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..") Cc: stable@vger.kernel.org # 4.14+ Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Spread reply processing over more CPUsChuck Lever2017-12-154-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler directly from RECV completion") introduced a performance regression for NFS I/O small enough to not need memory registration. In multi- threaded benchmarks that generate primarily small I/O requests, IOPS throughput is reduced by nearly a third. This patch restores the previous level of throughput. Because workqueues are typically BOUND (in particular ib_comp_wq, nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend to aggregate on the CPU that is handling Receive completions. The usual approach to addressing this problem is to create a QP and CQ for each CPU, and then schedule transactions on the QP for the CPU where you want the transaction to complete. The transaction then does not require an extra context switch during completion to end up on the same CPU where the transaction was started. This approach doesn't work for the Linux NFS/RDMA client because currently the Linux NFS client does not support multiple connections per client-server pair, and the RDMA core API does not make it straightforward for ULPs to determine which CPU is responsible for handling Receive completions for a CQ. So for the moment, record the CPU number in the rpcrdma_req before the transport sends each RPC Call. Then during Receive completion, queue the RPC completion on that same CPU. Additionally, move all RPC completion processing to the deferred handler so that even RPCs with simple small replies complete on the CPU that sent the corresponding RPC Call. Fixes: d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | kernel: make groups_sort calling a responsibility group_info allocatorsThiago Rafael Becker2017-12-153-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client. This patch: - Make groups_sort globally visible. - Move the call to groups_sort to the modifiers of group_info - Remove the call to groups_sort from set_groups Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NeilBrown <neilb@suse.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2017-12-022-0/+7
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client fixes from Anna Schumaker: "These patches fix a problem with compiling using an old version of gcc, and also fix up error handling in the SUNRPC layer. - NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid" - SUNRPC: Allow connect to return EHOSTUNREACH - SUNRPC: Handle ENETDOWN errors" * tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Handle ENETDOWN errors SUNRPC: Allow connect to return EHOSTUNREACH NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
| * SUNRPC: Handle ENETDOWN errorsTrond Myklebust2017-11-302-0/+6
| | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Allow connect to return EHOSTUNREACHTrond Myklebust2017-11-291-0/+1
| | | | | | | | | | | | | | Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | SUNRPC: make cache_detail structures constBhumika Goyal2017-11-272-4/+4
| | | | | | | | | | | | | | | | | | Make these const as they are only getting passed to the function cache_create_net having the argument as const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | sunrpc: make the function arg as constBhumika Goyal2017-11-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | Make the struct cache_detail *tmpl argument of the function cache_create_net as const as it is only getting passed to kmemup having the argument as const void *. Add const to the prototype too. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE castsKees Cook2017-11-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With all callbacks converted, and the timer callback prototype switched over, the TIMER_FUNC_TYPE cast is no longer needed, so remove it. Conversion was done with the following scripts: perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \ $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u) perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \ $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u) The now unused macros are also dropped from include/linux/timer.h. Signed-off-by: Kees Cook <keescook@chromium.org>
* | Merge tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2017-11-184-82/+55
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Lots of good bugfixes, including: - fix a number of races in the NFSv4+ state code - fix some shutdown crashes in multiple-network-namespace cases - relax our 4.1 session limits; if you've an artificially low limit to the number of 4.1 clients that can mount simultaneously, try upgrading" * tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux: (22 commits) SUNRPC: Improve ordering of transport processing nfsd: deal with revoked delegations appropriately svcrdma: Enqueue after setting XPT_CLOSE in completion handlers nfsd: use nfs->ns.inum as net ID rpc: remove some BUG()s svcrdma: Preserve CB send buffer across retransmits nfds: avoid gettimeofday for nfssvc_boot time fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t lockd: double unregister of inetaddr notifiers nfsd4: catch some false session retries nfsd4: fix cached replies to solo SEQUENCE compounds sunrcp: make function _svc_create_xprt static SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status nfsd: use ARRAY_SIZE nfsd: give out fewer session slots as limit approaches nfsd: increase DRC cache limit nfsd: remove unnecessary nofilehandle checks nfs_common: convert int to bool ...
| * | SUNRPC: Improve ordering of transport processingTrond Myklebust2017-11-071-70/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since it can take a while before a specific thread gets scheduled, it is better to just implement a first come first served queue mechanism. That way, if a thread is already scheduled and is idle, it can pick up the work to do from the queue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: Enqueue after setting XPT_CLOSE in completion handlersChuck Lever2017-11-071-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed the server was sometimes not closing the connection after a flushed Send. For example, if the client responds with an RNR NAK to a Reply from the server, that client might be deadlocked, and thus wouldn't send any more traffic. Thus the server wouldn't have any opportunity to notice the XPT_CLOSE bit has been set. Enqueue the transport so that svcxprt notices the bit even if there is no more transport activity after a flushed completion, QP access error, or device removal event. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | rpc: remove some BUG()sJ. Bruce Fields2017-11-071-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It would be kinder to WARN() and recover in several spots here instead of BUG()ing. Also, it looks like the read_u32_from_xdr_buf() call could actually fail, though it might require a broken (or malicious) client, so convert that to just an error return. Reported-by: Weston Andros Adamson <dros@monkey.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: Preserve CB send buffer across retransmitsChuck Lever2017-11-071-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During each NFSv4 callback Call, an RDMA Send completion frees the page that contains the RPC Call message. If the upper layer determines that a retransmit is necessary, this is too soon. One possible symptom: after a GARBAGE_ARGS response an NFSv4.1 callback request, the following BUG fires on the NFS server: kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2 kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0 kernel: flags: 0x2fffff80000000() kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 kernel: page dumped because: nonzero _refcount kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801 mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811 kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] kernel: Call Trace: kernel: dump_stack+0x62/0x80 kernel: bad_page+0xfe/0x11a kernel: free_pages_check_bad+0x76/0x78 kernel: free_pcppages_bulk+0x364/0x441 kernel: ? ttwu_do_activate.isra.61+0x71/0x78 kernel: free_hot_cold_page+0x1c5/0x202 kernel: __put_page+0x2c/0x36 kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma] kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma] This issue exists all the way back to v4.5, but refactoring and code re-organization prevents this simple patch from applying to kernels older than v4.12. The fix is the same, however, if someone needs to backport it. Reported-by: Ben Coddington <bcodding@redhat.com> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314 Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ') Cc: stable@vger.kernel.org # v4.12 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | sunrcp: make function _svc_create_xprt staticColin Ian King2017-11-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function _svc_create_xprt is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol '_svc_create_xprt' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | Merge tag 'nfs-for-4.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2017-11-1714-297/+531
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - Revalidate "." and ".." correctly on open - Avoid RCU usage in tracepoints - Fix ugly referral attributes - Fix a typo in nomigration mount option - Revert "NFS: Move the flock open mode check into nfs_flock()" Features: - Implement a stronger send queue accounting system for NFS over RDMA - Switch some atomics to the new refcount_t type Other bugfixes and cleanups: - Clean up access mode bits - Remove special-case revalidations in nfs_opendir() - Improve invalidating NFS over RDMA memory for async operations that time out - Handle NFS over RDMA replies with a worqueue - Handle NFS over RDMA sends with a workqueue - Fix up replaying interrupted requests - Remove dead NFS over RDMA definitions - Update NFS over RDMA copyright information - Be more consistent with bool initialization and comparisons - Mark expected switch fall throughs - Various sunrpc tracepoint cleanups - Fix various OPEN races - Fix a typo in nfs_rename() - Use common error handling code in nfs_lock_and_join_request() - Check that some structures are properly cleaned up during net_exit() - Remove net pointer from dprintk()s" * tag 'nfs-for-4.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (62 commits) NFS: Revert "NFS: Move the flock open mode check into nfs_flock()" NFS: Fix typo in nomigration mount option nfs: Fix ugly referral attributes NFS: super: mark expected switch fall-throughs sunrpc: remove net pointer from messages nfs: remove net pointer from messages sunrpc: exit_net cleanup check added nfs client: exit_net cleanup check added nfs/write: Use common error handling code in nfs_lock_and_join_requests() NFSv4: Replace closed stateids with the "invalid special stateid" NFSv4: nfs_set_open_stateid must not trigger state recovery for closed state NFSv4: Check the open stateid when searching for expired state NFSv4: Clean up nfs4_delegreturn_done NFSv4: cleanup nfs4_close_done NFSv4: Retry NFS4ERR_OLD_STATEID errors in layoutreturn pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close NFSv4: Don't try to CLOSE if the stateid 'other' field has changed NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID. NFS: Fix a typo in nfs_rename() NFSv4: Fix open create exclusive when the server reboots ...
| * | sunrpc: remove net pointer from messagesVasily Averin2017-11-172-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Publishing of net pointer is not safe, use net->ns.inum as net ID [ 171.391947] RPC: created new rpcb local clients (rpcb_local_clnt: ..., rpcb_local_clnt4: ...) for net f00001e7 [ 171.767188] NFSD: starting 90-second grace period (net f00001e7) Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | sunrpc: exit_net cleanup check addedVasily Averin2017-11-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Be sure that all_clients list initialized in net_init hook was return to initial state. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | sunrpc: Add rpc_request static trace pointChuck Lever2017-11-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Display information about the RPC procedure being requested in the trace log. This sometimes critical information cannot always be derived from other RPC trace entries. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | sunrpc: Fix rpc_task_begin trace pointChuck Lever2017-11-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | The rpc_task_begin trace point always display a task ID of zero. Move the trace point call site so that it picks up the new task ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | net: sunrpc: mark expected switch fall-throughsGustavo A. R. Silva2017-11-173-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | xprtrdma: Update copyright noticesChuck Lever2017-11-174-0/+4
| | | | | | | | | | | | | | | | | | | | | Credit work contributed by Oracle engineers since 2014. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | xprtrdma: Remove include for linux/prefetch.hChuck Lever2017-11-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up. This include should have been removed by commit 23826c7aeac7 ("xprtrdma: Serialize credit accounting again"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | rpcrdma: Remove C structure definitions of XDR data itemsChuck Lever2017-11-172-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: C-structure style XDR encoding and decoding logic has been replaced over the past several merge windows on both the client and server. These data structures are no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE modeChuck Lever2017-11-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lift the Send and LocalInv completion handlers out of soft IRQ mode to make room for other work. Also, move the Send CQ to a different CPU than the CPU where the Receive CQ is running, for improved scalability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | xprtrdma: Remove atomic send completion countingChuck Lever2017-11-173-33/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | The sendctx circular queue now guarantees that xprtrdma cannot overflow the Send Queue, so remove the remaining bits of the original Send WQE counting mechanism. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | xprtrdma: RPC completion should wait for Send completionChuck Lever2017-11-174-4/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an RPC Call includes a file data payload, that payload can come from pages in the page cache, or a user buffer (for direct I/O). If the payload can fit inline, xprtrdma includes it in the Send using a scatter-gather technique. xprtrdma mustn't allow the RPC consumer to re-use the memory where that payload resides before the Send completes. Otherwise, the new contents of that memory would be exposed by an HCA retransmit of the Send operation. So, block RPC completion on Send completion, but only in the case where a separate file data payload is part of the Send. This prevents the reuse of that memory while it is still part of a Send operation without an undue cost to other cases. Waiting is avoided in the common case because typically the Send will have completed long before the RPC Reply arrives. These days, an RPC timeout will trigger a disconnect, which tears down the QP. The disconnect flushes all waiting Sends. This bounds the amount of time the reply handler has to wait for a Send completion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | xprtrdma: Refactor rpcrdma_deferred_completionChuck Lever2017-11-173-13/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | Invoke a common routine for releasing hardware resources (for example, invalidating MRs). This needs to be done whether an RPC Reply has arrived or the RPC was terminated early. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | xprtrdma: Add a field of bit flags to struct rpcrdma_reqChuck Lever2017-11-174-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | We have one boolean flag in rpcrdma_req today. I'd like to add more flags, so convert that boolean to a bit flag. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>