| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| | |
Lots of overlapping changes and parallel additions, stuff
like that.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
srq_desc_size should be rounded up to pow of two before used, or related
calculation may cause allocating wrong size of memory for srq buffer.
Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Link: https://lore.kernel.org/r/1572575610-52530-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Size of pointer to buf field of struct hns_roce_hem_chunk should be
considered when calculating HNS_ROCE_HEM_CHUNK_LEN, or sg table size will
be larger than expected when allocating hem.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/1572575610-52530-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Sirong Wang <wangsirong@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Normal RDMA WRITE request never returns IB_WC_RNR_RETRY_EXC_ERR to ULPs
because it does not need post receive buffer on the responder side.
Consequently, as an enhancement to normal RDMA WRITE request inside the
hfi1 driver, TID RDMA WRITE request should not return such an error status
to ULPs, although it does receive RNR NAKs from the responder when TID
resources are not available. This behavior is violated when
qp->s_rnr_retry_cnt is set in current hfi1 implementation.
This patch enforces these semantics by avoiding any reaction to the updates
of the RNR QP attributes.
Fixes: 3c6cb20a0d17 ("IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs")
Link: https://lore.kernel.org/r/20191025195842.106825.71532.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For a TID RDMA WRITE request, a QP on the responder side could be put into
a queue when a hardware flow is not available. A RNR NAK will be returned
to the requester with a RNR timeout value based on the position of the QP
in the queue. The tid_rdma_flow_wt variable is used to calculate the
timeout value and is determined by using a MTU of 4096 at the module
loading time. This could reduce the timeout value by half from the desired
value, leading to excessive RNR retries.
This patch fixes the issue by calculating the flow weight with the real
MTU assigned to the QP.
Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Link: https://lore.kernel.org/r/20191025195836.106825.77769.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The index r_tid_ack is used to indicate the next TID RDMA WRITE request to
acknowledge in the ring s_ack_queue[] on the responder side and should be
set to a valid index other than its initial value before r_tid_tail is
advanced to the next TID RDMA WRITE request and particularly before a TID
RDMA ACK is built. Otherwise, a NULL pointer dereference may result:
BUG: unable to handle kernel paging request at ffff9a32d27abff8
IP: [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
PGD 2749032067 PUD 0
Oops: 0000 1 SMP
Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib(OE) hfi1(OE) rdmavt(OE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod target_core_mod ib_ucm dm_mirror dm_region_hash dm_log mlx5_ib dm_mod zfs(POE) rpcrdma sunrpc rdma_ucm ib_uverbs opa_vnic ib_iser zunicode(POE) ib_umad zavl(POE) icp(POE) sb_edac intel_powerclamp coretemp rdma_cm intel_rapl iosf_mbi iw_cm libiscsi scsi_transport_iscsi kvm ib_cm iTCO_wdt mxm_wmi iTCO_vendor_support irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) pcspkr spl(OE) mei_me
sg mei ioatdma lpc_ich joydev i2c_i801 shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ahci ttm mlxfw ib_core libahci devlink mdio crct10dif_pclmul crct10dif_common drm ptp libata megaraid_sas crc32c_intel i2c_algo_bit pps_core i2c_core dca [last unloaded: rdmavt]
CPU: 15 PID: 68691 Comm: kworker/15:2H Kdump: loaded Tainted: P W OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1
Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
task: ffff9a01f47faf70 ti: ffff9a11776a8000 task.ti: ffff9a11776a8000
RIP: 0010:[<ffffffffc0d87ea6>] [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
RSP: 0018:ffff9a11776abd08 EFLAGS: 00010002
RAX: ffff9a32d27abfc0 RBX: ffff99f2d27aa000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000220 RDI: ffff99f2ffc05300
RBP: ffff9a11776abd88 R08: 000000000001c310 R09: ffffffffc0d87ad4
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a117a423c00
R13: ffff9a117a423c00 R14: ffff9a03500c0000 R15: ffff9a117a423cb8
FS: 0000000000000000(0000) GS:ffff9a117e9c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff9a32d27abff8 CR3: 0000002748a0e000 CR4: 00000000001607e0
Call Trace:
[<ffffffffc0d88874>] _hfi1_do_tid_send+0x194/0x320 [hfi1]
[<ffffffffaf0b2dff>] process_one_work+0x17f/0x440
[<ffffffffaf0b3ac6>] worker_thread+0x126/0x3c0
[<ffffffffaf0b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[<ffffffffaf0bae31>] kthread+0xd1/0xe0
[<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
[<ffffffffaf71f5f7>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
hfi1 0000:05:00.0: hfi1_0: reserved_op: opcode 0xf2, slot 2, rsv_used 1, rsv_ops 1
Code: 00 00 41 8b 8d d8 02 00 00 89 c8 48 89 45 b0 48 c1 65 b0 06 48 8b 83 a0 01 00 00 48 01 45 b0 48 8b 45 b0 41 80 bd 10 03 00 00 00 <48> 8b 50 38 4c 8d 7a 50 74 45 8b b2 d0 00 00 00 85 f6 0f 85 72
RIP [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
RSP <ffff9a11776abd08>
CR2: ffff9a32d27abff8
This problem can happen if a RESYNC request is received before r_tid_ack
is modified.
This patch fixes the issue by making sure that r_tid_ack is set to a valid
value before a TID RDMA ACK is built. Functions are defined to simplify
the code.
Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Fixes: 7cf0ad679de4 ("IB/hfi1: Add a function to receive TID RDMA RESYNC packet")
Link: https://lore.kernel.org/r/20191025195830.106825.44022.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If an hfi1 card is inserted in a Gen4 systems, the driver will avoid the
gen3 speed bump and the card will operate at half speed.
This is because the driver avoids the gen3 speed bump when the parent bus
speed isn't identical to gen3, 8.0GT/s. This is not compatible with gen4
and newer speeds.
Fix by relaxing the test to explicitly look for the lower capability
speeds which inherently allows for gen4 and all future speeds.
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20191101192059.106248.1699.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: James Erwin <james.erwin@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
1) New generic devlink param "enable_roce", for downstream devlink
reload support
2) Do vport ACL configuration on per vport basis when
enabling/disabling a vport. This enables to have vports enabled/disabled
outside of eswitch config for future
3) Split the code for legacy vs offloads mode and make it clear
4) Tide up vport locking and workqueue usage
5) Fix metadata enablement for ECPF
6) Make explicit use of VF property to publish IB_DEVICE_VIRTUAL_FUNCTION
7) E-Switch and flow steering core low level support and refactoring for
netfilter flowtables offload
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When RoCE is disabled load mlx5_ib in raw_eth profile.
Clean pf_profile roce capability checks as it will not be used without
roce capability.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rename uplink_rep_profile and its unique init and cleanup stages to
suit its upcoming use as the profile when RoCE is disabled.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Instead of deciding a given device is virtual function or
not based on a device is PF or not, use already defined
MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf().
This enables to clearly identify PF, VF and non virtual functions.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
eq->buf_list->buf and eq->buf_list should also be freed when eqe_hop_num
is set to 0, or there will be memory leaks.
Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08")
Link: https://lore.kernel.org/r/1572072995-11277-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
_put_ep_safe() and _put_pass_ep_safe() free the skb before it is freed by
process_work(). fix double free by freeing the skb only in process_work().
Fixes: 1dad0ebeea1c ("iw_cxgb4: Avoid touch after free error in ARP failure handlers")
Link: https://lore.kernel.org/r/1572006880-5800-1-git-send-email-bharat@chelsio.com
Signed-off-by: Dakshaja Uppalapati <dakshaja@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The mkey_table xarray is touched by the reg_mr_callback() function which
is called from a hard irq. Thus all other uses of xa_lock must use the
_irq variants.
WARNING: inconsistent lock state
5.4.0-rc1 #12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30
{IN-HARDIRQ-W} state was registered at:
lock_acquire+0xe1/0x200
_raw_spin_lock_irqsave+0x35/0x50
reg_mr_callback+0x2dd/0x450 [mlx5_ib]
mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core]
mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core]
[..]
Possible unsafe locking scenario:
CPU0
----
lock(&(&xa->xa_lock)->rlock#3);
<Interrupt>
lock(&(&xa->xa_lock)->rlock#3);
*** DEADLOCK ***
2 locks held by python3/343:
#0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs]
#1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs]
stack backtrace:
CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x86/0xca
print_usage_bug.cold.50+0x2e5/0x355
mark_lock+0x871/0xb50
? match_held_lock+0x20/0x250
? check_usage_forwards+0x240/0x240
__lock_acquire+0x7de/0x23a0
? __kasan_check_read+0x11/0x20
? mark_lock+0xae/0xb50
? mark_held_locks+0xb0/0xb0
? find_held_lock+0xca/0xf0
lock_acquire+0xe1/0x200
? xa_erase+0x12/0x30
_raw_spin_lock+0x2a/0x40
? xa_erase+0x12/0x30
xa_erase+0x12/0x30
mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib]
uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs]
uverbs_free_mw+0x1a/0x20 [ib_uverbs]
destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs]
[..]
Fixes: 0417791536ae ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases")
Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.ca
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When rdmacm module is not loaded, and when netlink message is received to
get char device info, it results into a deadlock due to recursive locking
of rdma_nl_mutex with the below call sequence.
[..]
rdma_nl_rcv()
mutex_lock()
[..]
rdma_nl_rcv_msg()
ib_get_client_nl_info()
request_module()
iw_cm_init()
rdma_nl_register()
mutex_lock(); <- Deadlock, acquiring mutex again
Due to above call sequence, following call trace and deadlock is observed.
kernel: __mutex_lock+0x35e/0x860
kernel: ? __mutex_lock+0x129/0x860
kernel: ? rdma_nl_register+0x1a/0x90 [ib_core]
kernel: rdma_nl_register+0x1a/0x90 [ib_core]
kernel: ? 0xffffffffc029b000
kernel: iw_cm_init+0x34/0x1000 [iw_cm]
kernel: do_one_initcall+0x67/0x2d4
kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0
kernel: do_init_module+0x5a/0x223
kernel: load_module+0x1998/0x1e10
kernel: ? __symbol_put+0x60/0x60
kernel: __do_sys_finit_module+0x94/0xe0
kernel: do_syscall_64+0x5a/0x270
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
process stack trace:
[<0>] __request_module+0x1c9/0x460
[<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core]
[<0>] nldev_get_chardev+0x1ac/0x320 [ib_core]
[<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core]
[<0>] rdma_nl_rcv+0xcd/0x120 [ib_core]
[<0>] netlink_unicast+0x179/0x220
[<0>] netlink_sendmsg+0x2f6/0x3f0
[<0>] sock_sendmsg+0x30/0x40
[<0>] ___sys_sendmsg+0x27a/0x290
[<0>] __sys_sendmsg+0x58/0xa0
[<0>] do_syscall_64+0x5a/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
To overcome this deadlock and to allow multiple netlink messages to
progress in parallel, following scheme is implemented.
1. Split the lock protecting the cb_table into a per-index lock, and make
it a rwlock. This lock is used to ensure no callbacks are running after
unregistration returns. Since a module will not be registered once it
is already running callbacks, this avoids the deadlock.
2. Use smp_store_release() to update the cb_table during registration so
that no lock is required. This avoids lockdep problems with thinking
all the rwsems are the same lock class.
Fixes: 0e2d00eb6fd45 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload")
Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The counter resource should return -EAGAIN if it was requested for a
different port, this is similar to how QP works if the users provides a
port filter.
Otherwise port filtering in netlink will return broken counter nests.
Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback")
Link: https://lore.kernel.org/r/20191020062800.8065-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the
UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function. We check that:
if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) {
But we don't check if "attr.comp_vector" is negative. It could
potentially lead to an array underflow. My concern would be where
cq->vector is used in the create_cq() function from the cxgb4 driver.
And really "attr.comp_vector" is appears as a u32 to user space so that's
the right type to use.
Fixes: 9ee79fce3642 ("IB/core: Add completion queue (cq) object actions")
Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Current code tries to derive VLAN ID and compares it with GID
attribute for matching entry. This raw search fails on macvlan
netdevice as its not a VLAN device, but its an upper device of a VLAN
netdevice.
Due to this limitation, incoming QP1 packets fail to match in the
GID table. Such packets are dropped.
Hence, to support it, use the existing rdma_read_gid_l2_fields()
that takes care of diffferent device types.
Fixes: dbf727de7440 ("IB/core: Use GID table in AH creation and dmac resolution")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002121750.17313-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Remove spaces from the reported firmware version string.
Actual value:
$ cat /sys/class/infiniband/qedr0/fw_ver
8. 37. 7. 0
Expected value:
$ cat /sys/class/infiniband/qedr0/fw_ver
8.37.7.0
Fixes: ec72fce401c6 ("qedr: Add support for RoCE HW init")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Link: https://lore.kernel.org/r/20191007210730.7173-1-kamalheib1@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As siw_free_qp() is the last routine to access 'siw_base_qp' structure,
freeing this structure early in siw_destroy_qp() could cause
touch-after-free issue.
Hence, moved kfree(siw_base_qp) from siw_destroy_qp() to siw_free_qp().
Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Link: https://lore.kernel.org/r/20191007104229.29412-1-krishna2@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
kref release routines usually perform memory release operations,
hence, they should not be called with spinlocks held.
one such case is: SIW kref release routine siw_free_qp(), which
can sleep via vfree() while freeing queue memory.
Hence, all iw_rem_ref() calls in IWCM are moved out of spinlocks.
Fixes: 922a8e9fb2e0 ("RDMA: iWARP Connection Manager.")
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20191007102627.12568-1-krishna2@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
pass_accept_req() is using the same skb for handling accept request and
sending accept reply to HW. Here req and rpl structures are pointing to
same skb->data which is over written by INIT_TP_WR() and leads to
accessing corrupt req fields in accept_cr() while checking for ECN flags.
Reordered code in accept_cr() to fetch correct req fields.
Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections")
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://lore.kernel.org/r/20191003104353.11590-1-bharat@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There is no reason for a different pad buffer for the two
packet types.
Expand the current buffer allocation to allow for both
packet types.
Fixes: f8195f3b14a0 ("IB/hfi1: Eliminate allocation while atomic")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A TID RDMA READ request could be retried under one of the following
conditions:
- The RC retry timer expires;
- A later TID RDMA READ RESP packet is received before the next
expected one.
For the latter, under normal conditions, the PSN in IB space is used
for comparison. More specifically, the IB PSN in the incoming TID RDMA
READ RESP packet is compared with the last IB PSN of a given TID RDMA
READ request to determine if the request should be retried. This is
similar to the retry logic for noraml RDMA READ request.
However, if a TID RDMA READ RESP packet is lost due to congestion,
header suppresion will be disabled and each incoming packet will raise
an interrupt until the hardware flow is reloaded. Under this condition,
each packet KDETH PSN will be checked by software against r_next_psn
and a retry will be requested if the packet KDETH PSN is later than
r_next_psn. Since each TID RDMA READ segment could have up to 64
packets and each TID RDMA READ request could have many segments, we
could make far more retries under such conditions, and thus leading to
RETRY_EXC_ERR status.
This patch fixes the issue by removing the retry when the incoming
packet KDETH PSN is later than r_next_psn. Instead, it resorts to
RC timer and normal IB PSN comparison for any request retry.
Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Before QP is closed it changes to ERROR state, when this happens
the QP was left with old rate limit that was already removed from
the table.
Fixes: 7d29f349a4b9 ("IB/mlx5: Properly adjust rate limit on QP state transitions")
Signed-off-by: Rafi Wiener <rafiw@mellanox.com>
Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.
Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.
This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.
Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()
Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:
CPU0 CPU1
mlx5_ib_alloc_implicit_mr()
implicit_mr_alloc()
live = 1
imr->umem = umem
num_pending_prefetch_inc()
if (live)
atomic_inc(num_pending_prefetch)
atomic_set(num_pending_prefetch,0) // Overwrites other thread's store
Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.
Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.
Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().
The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
During destroy setting live = 0 and then synchronize_srcu() prevents
num_pending_prefetch from incrementing, and also, ensures that all work
holding that count is queued on the WQ. Testing before causes races of the
form:
CPU0 CPU1
dereg_mr()
mlx5_ib_advise_mr_prefetch()
srcu_read_lock()
num_pending_prefetch_inc()
if (!live)
live = 0
atomic_read() == 0
// skip flush_workqueue()
atomic_inc()
queue_work();
srcu_read_unlock()
WARN_ON(atomic_read()) // Fails
Swap the order so that the synchronize_srcu() prevents this.
Fixes: a6bc3875f176 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This fixes a race of the form:
CPU0 CPU1
mlx5_ib_invalidate_range() mlx5_ib_invalidate_range()
// This one actually makes npages == 0
ib_umem_odp_unmap_dma_pages()
if (npages == 0 && !dying)
// This one does nothing
ib_umem_odp_unmap_dma_pages()
if (npages == 0 && !dying)
dying = 1;
dying = 1;
schedule_work(&umem_odp->work);
// Double schedule of the same work
schedule_work(&umem_odp->work); // BOOM
npages and dying must be read and written under the umem_mutex lock.
Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call
mlx5_ib_update_xlt, and both need to be done in the same locking region,
hoist the lock out of unmap.
This avoids an expensive double critical section in
mlx5_ib_invalidate_range().
Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
mlx5_ib_update_xlt() must be protected against parallel free of the MR it
is accessing, also it must be called single threaded while updating the
HW. Otherwise we can have races of the form:
CPU0 CPU1
mlx5_ib_update_xlt()
mlx5_odp_populate_klm()
odp_lookup() == NULL
pklm = ZAP
implicit_mr_get_data()
implicit_mr_alloc()
<update interval tree>
mlx5_ib_update_xlt
mlx5_odp_populate_klm()
odp_lookup() != NULL
pklm = VALID
mlx5_ib_post_send_wait()
mlx5_ib_post_send_wait() // Replaces VALID with ZAP
This can be solved by putting both the SRCU and the umem_mutex lock around
every call to mlx5_ib_update_xlt(). This ensures that the content of the
interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr)
will not change while it is running, and thus the posted WRs to update the
KLM will always reflect the correct information.
The race above will resolve by either having CPU1 wait till CPU0 completes
the ZAP or CPU0 will run after the add and instead store VALID.
The pagefault path adding children already holds the umem_mutex and SRCU,
so the only missed lock is during MR destruction.
Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This code is completely broken, the umem of a ODP MR simply cannot be
discarded without a lot more locking, nor can an ODP mkey be blithely
destroyed via destroy_mkey().
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
rdma_for_each_port is already incrementing the iterator's value it
receives therefore, after the first iteration the iterator is increased by
2 which eventually causing wrong queries and possible traces.
Fix the above by removing the old redundant incrementation that was used
before rdma_for_each_port() macro.
Cc: <stable@vger.kernel.org>
Fixes: ea1075edcbab ("RDMA: Add and use rdma_for_each_port")
Link: https://lore.kernel.org/r/20191002122127.17571-1-leon@kernel.org
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Reviewed-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Properly unwind QP counter rebinding in case of failure.
Trying to rebind the counter after unbiding it is not going to work
reliably, move the unbind to the end so it doesn't have to be unwound.
Fixes: b389327df905 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink")
Link: https://lore.kernel.org/r/20191002115627.16740-1-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Nicolas pointed out that the cxgb4 driver is doing dma off of the stack,
which is generally considered a very bad thing. On some architectures it
could be a security problem, but odds are none of them actually run this
driver, so it's just a "normal" bug.
Resolve this by allocating the memory for a message off of the heap
instead of the stack. kmalloc() always will give us a proper memory
location that DMA will work correctly from.
Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com
Reported-by: Nicolas Waisman <nico@semmle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In the process of moving the debug counters sysfs entries, the commit
mentioned below eliminated the cm_infiniband sysfs directory.
This sysfs directory was tied to the cm_port object allocated in procedure
cm_add_one().
Before the commit below, this cm_port object was freed via a call to
kobject_put(port->kobj) in procedure cm_remove_port_fs().
Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated.
This, however, meant that kfree was never called for the cm_port buffers.
Fix this by adding explicit kfree(port) calls to functions cm_add_one()
and cm_remove_one().
Note: the kfree call in the first chunk below (in the cm_add_one error
flow) fixes an old, undetected memory leak.
Fixes: c87e65cfb97c ("RDMA/cm: Move debug counters to be under relevant IB device")
Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
According to surrounding error paths, it is likely that 'goto err_get;' is
expected here. Otherwise, a call to 'rdma_restrack_put(res);' would be
missing.
Fixes: c5dfe0ea6ffa ("RDMA/nldev: Add resource tracker doit callback")
Link: https://lore.kernel.org/r/20190818091044.8845-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
i40iw IB device registration fails with ENODEV.
ib_register_device
setup_device/setup_port_data
i40iw_port_immutable
ib_query_port
iw_query_port
ib_device_get_netdev(ENODEV)
ib_device_get_netdev() does not have a netdev associated
with the ibdev and thus fails.
Use ib_device_set_netdev() to associate netdev to ibdev
in i40iw before IB device registration.
Fixes: 4929116bdf72 ("RDMA/core: Add common iWARP query port")
Link: https://lore.kernel.org/r/20190925164524.856-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Reviewed-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch fixes the lock inversion complaint:
============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]
but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/u16:6/171:
#0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
#1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
#2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
dump_stack+0x8a/0xd6
__lock_acquire.cold+0xe1/0x24d
lock_acquire+0x106/0x240
__mutex_lock+0x12e/0xcb0
mutex_lock_nested+0x1f/0x30
rdma_destroy_id+0x78/0x4a0 [rdma_cm]
iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
cm_work_handler+0xe62/0x1100 [iw_cm]
process_one_work+0x56d/0xac0
worker_thread+0x7a/0x5d0
kthread+0x1bc/0x210
ret_from_fork+0x24/0x30
This is not a bug as there are actually two lock classes here.
Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org
Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
dump_qp() is wrongly trying to dump SRQ structures as QP when SRQ is used
by the application. This patch matches the QPID before dumping them. Also
removes unwanted SRQ id addition to QP id xarray.
Fixes: 2f43129127e6 ("cxgb4: Convert qpidr to XArray")
Link: https://lore.kernel.org/r/20190930074119.20046-1-bharat@chelsio.com
Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In sdma_init if rhashtable_init fails the allocated memory for
tmp_sdma_rht should be released.
Fixes: 5a52a7acf7e2 ("IB/hfi1: NULL pointer dereference when freeing rhashtable")
Link: https://lore.kernel.org/r/20190925144543.10141-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
iwarp_query_port
If an iWARP driver is probed and removed while there are no ips set for
the device, it will lead to a reference count leak on the inet device of
the netdevice.
In addition, the netdevice was accessed after already calling netdev_put,
which could lead to using the netdev after already freed.
Fixes: 4929116bdf72 ("RDMA/core: Add common iWARP query port")
Link: https://lore.kernel.org/r/20190925123332.10746-1-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In siw_qp_llp_write_space(), 'sock' members should be accessed with
sk_callback_lock held, otherwise, it could race with
siw_sk_restore_upcalls(). And this could cause "NULL deref" panic. Below
panic is due to the NULL cep returned from sk_to_cep(sk):
Call Trace:
<IRQ> siw_qp_llp_write_space+0x11/0x40 [siw]
tcp_check_space+0x4c/0xf0
tcp_rcv_established+0x52b/0x630
tcp_v4_do_rcv+0xf4/0x1e0
tcp_v4_rcv+0x9b8/0xab0
ip_protocol_deliver_rcu+0x2c/0x1c0
ip_local_deliver_finish+0x44/0x50
ip_local_deliver+0x6b/0xf0
? ip_protocol_deliver_rcu+0x1c0/0x1c0
ip_rcv+0x52/0xd0
? ip_rcv_finish_core.isra.14+0x390/0x390
__netif_receive_skb_one_core+0x83/0xa0
netif_receive_skb_internal+0x73/0xb0
napi_gro_frags+0x1ff/0x2b0
t4_ethrx_handler+0x4a7/0x740 [cxgb4]
process_responses+0x2c9/0x590 [cxgb4]
? t4_sge_intr_msix+0x1d/0x30 [cxgb4]
? handle_irq_event_percpu+0x51/0x70
? handle_irq_event+0x41/0x60
? handle_edge_irq+0x97/0x1a0
napi_rx_handler+0x14/0xe0 [cxgb4]
net_rx_action+0x2af/0x410
__do_softirq+0xda/0x2a8
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq+0x50/0x60
__local_bh_enable_ip+0x50/0x60
ip_finish_output2+0x18f/0x520
ip_output+0x6e/0xf0
? __ip_finish_output+0x1f0/0x1f0
__ip_queue_xmit+0x14f/0x3d0
? __slab_alloc+0x4b/0x58
__tcp_transmit_skb+0x57d/0xa60
tcp_write_xmit+0x23b/0xfd0
__tcp_push_pending_frames+0x2e/0xf0
tcp_sendmsg_locked+0x939/0xd50
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x57/0x80
siw_tx_hdt+0x894/0xb20 [siw]
? find_busiest_group+0x3e/0x5b0
? common_interrupt+0xa/0xf
? common_interrupt+0xa/0xf
? common_interrupt+0xa/0xf
siw_qp_sq_process+0xf1/0xe60 [siw]
? __wake_up_common_lock+0x87/0xc0
siw_sq_resume+0x33/0xe0 [siw]
siw_run_sq+0xac/0x190 [siw]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? siw_sq_resume+0xe0/0xe0 [siw]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
Link: https://lore.kernel.org/r/20190923101112.32685-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|/
|
|
|
|
|
|
|
|
|
|
| |
An extra kfree cleanup was missed since these are now deallocated by core.
Link: https://lore.kernel.org/r/1568848066-12449-1-git-send-email-aditr@vmware.com
Cc: <stable@vger.kernel.org>
Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core")
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull networking fixes from David Miller:
1) Sanity check URB networking device parameters to avoid divide by
zero, from Oliver Neukum.
2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
don't work properly. Longer term this needs a better fix tho. From
Vijay Khemka.
3) Small fixes to selftests (use ping when ping6 is not present, etc.)
from David Ahern.
4) Bring back rt_uses_gateway member of struct rtable, it's semantics
were not well understood and trying to remove it broke things. From
David Ahern.
5) Move usbnet snaity checking, ignore endpoints with invalid
wMaxPacketSize. From Bjørn Mork.
6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
Alex Vesker, and Yevgeny Kliteynik
8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
9) Fix crash when removing sch_cbs entry while offloading is enabled,
from Vinicius Costa Gomes.
10) Signedness bug fixes, generally in looking at the result given by
of_get_phy_mode() and friends. From Dan Crapenter.
11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
13) Fix quantization code in tcp_bbr, from Kevin Yang.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
net: tap: clean up an indentation issue
nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
sk_buff: drop all skb extensions on free and skb scrubbing
tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
Documentation: Clarify trap's description
mlxsw: spectrum: Clear VLAN filters during port initialization
net: ena: clean up indentation issue
NFC: st95hf: clean up indentation issue
net: phy: micrel: add Asym Pause workaround for KSZ9021
net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
ptp: correctly disable flags on old ioctls
lib: dimlib: fix help text typos
net: dsa: microchip: Always set regmap stride to 1
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Julian noted that rt_uses_gateway has a more subtle use than 'is gateway
set':
https://lore.kernel.org/netdev/alpine.LFD.2.21.1909151104060.2546@ja.home.ssi.bg/
Revert that part of the commit referenced in the Fixes tag.
Currently, there are no u8 holes in 'struct rtable'. There is a 4-byte hole
in the second cacheline which contains the gateway declaration. So move
rt_gw_family down to the gateway declarations since they are always used
together, and then re-use that u8 for rt_uses_gateway. End result is that
rtable size is unchanged.
Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway")
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses
unlikely() internally.
Link: http://lkml.kernel.org/r/20190829165025.15750-8-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Joe Perches <joe@perches.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Merge updates from Andrew Morton:
- a few hot fixes
- ocfs2 updates
- almost all of -mm (slab-generic, slab, slub, kmemleak, kasan,
cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug,
sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy,
oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap,
zsmalloc)
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits)
mm/zsmalloc.c: fix a -Wunused-function warning
zswap: do not map same object twice
zswap: use movable memory if zpool support allocate movable memory
zpool: add malloc_support_movable to zpool_driver
shmem: fix obsolete comment in shmem_getpage_gfp()
mm/madvise: reduce code duplication in error handling paths
mm: mmap: increase sockets maximum memory size pgoff for 32bits
mm/mmap.c: refine find_vma_prev() with rb_last()
riscv: make mmap allocation top-down by default
mips: use generic mmap top-down layout and brk randomization
mips: replace arch specific way to determine 32bit task with generic version
mips: adjust brk randomization offset to fit generic version
mips: use STACK_TOP when computing mmap base address
mips: properly account for stack randomization and stack guard gap
arm: use generic mmap top-down layout and brk randomization
arm: use STACK_TOP when computing mmap base address
arm: properly account for stack randomization and stack guard gap
arm64, mm: make randomization selected by generic topdown mmap layout
arm64, mm: move generic mmap layout functions to mm
arm64: consider stack randomization for mmap base only when necessary
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
[11~From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()",
v3.
There are about 50+ patches in my tree [2], and I'll be sending out the
remaining ones in a few more groups:
* The block/bio related changes (Jerome mostly wrote those, but I've had
to move stuff around extensively, and add a little code)
* mm/ changes
* other subsystem patches
* an RFC that shows the current state of the tracking patch set. That
can only be applied after all call sites are converted, but it's good to
get an early look at it.
This is part a tree-wide conversion, as described in fc1d8e7cca2d ("mm:
introduce put_user_page*(), placeholder versions").
This patch (of 3):
Provide more capable variation of put_user_pages_dirty_lock(), and delete
put_user_pages_dirty(). This is based on the following:
1. Lots of call sites become simpler if a bool is passed into
put_user_page*(), instead of making the call site choose which
put_user_page*() variant to call.
2. Christoph Hellwig's observation that set_page_dirty_lock() is
usually correct, and set_page_dirty() is usually a bug, or at least
questionable, within a put_user_page*() calling chain.
This leads to the following API choices:
* put_user_pages_dirty_lock(page, npages, make_dirty)
* There is no put_user_pages_dirty(). You have to
hand code that, in the rare case that it's
required.
[jhubbard@nvidia.com: remove unused variable in siw_free_plist()]
Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com
Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| |/ /
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it
(Krzysztof Wilczynski)
- Fix incorrect PCIe device types and remove dev->has_secondary_link
to simplify code that deals with upstream/downstream ports (Mika
Westerberg)
- After suspend, restore Resizable BAR size bits correctly for 1MB
BARs (Sumit Saxena)
- Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra)
Virtualization:
- Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna
Labs (Ali Saidi)
- Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg)
- Remove group write permissions from sysfs sriov_numvfs,
sriov_drivers_autoprobe (Kelsey Skunberg)
Hotplug:
- Simplify pciehp indicator control (Denis Efremov)
Peer-to-peer DMA:
- Allow P2P DMA between root ports for whitelisted bridges (Logan
Gunthorpe)
- Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe)
- DMA map P2P DMA requests that traverse host bridge (Logan
Gunthorpe)
Amazon Annapurna Labs host bridge driver:
- Add DT binding and controller driver (Jonathan Chocron)
Hyper-V host bridge driver:
- Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui)
- Fix PCI domain number collisions (Haiyang Zhang)
- Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang)
- Fix build errors on non-SYSFS config (Randy Dunlap)
i.MX6 host bridge driver:
- Limit DBI register length (Stefan Agner)
Intel VMD host bridge driver:
- Fix config addressing issues (Jon Derrick)
Layerscape host bridge driver:
- Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao)
- Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately
(Xiaowei Bao)
Mediatek host bridge driver:
- Add MT7629 controller support (Jianjun Wang)
Mobiveil host bridge driver:
- Fix CPU base address setup (Hou Zhiqiang)
- Make "num-lanes" property optional (Hou Zhiqiang)
Tegra host bridge driver:
- Fix OF node reference leak (Nishka Dasgupta)
- Disable MSI for root ports to work around design problem (Vidya
Sagar)
- Add Tegra194 DT binding and controller support (Vidya Sagar)
- Add support for sideband pins and slot regulators (Vidya Sagar)
- Add PIPE2UPHY support (Vidya Sagar)
Misc:
- Remove unused pci_block_cfg_access() et al (Kelsey Skunberg)
- Unexport pci_bus_get(), etc (Kelsey Skunberg)
- Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in
the PCI core (Kelsey Skunberg)
- Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg)
- Mark expected switch fall-through (Gustavo A. R. Silva)
- Propagate errors for optional regulators and PHYs (Thierry Reding)
- Fix kernel command line resource_alignment parameter issues (Logan
Gunthorpe)"
* tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits)
PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI
arm64: tegra: Add PCIe slot supply information in p2972-0000 platform
arm64: tegra: Add configuration for PCIe C5 sideband signals
PCI: tegra: Add support to enable slot regulators
PCI: tegra: Add support to configure sideband pins
PCI: vmd: Fix shadow offsets to reflect spec changes
PCI: vmd: Fix config addressing when using bus offsets
PCI: dwc: Add validation that PCIe core is set to correct mode
PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver
dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding
PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port
PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port
PCI: Add ACS quirk for Amazon Annapurna Labs root ports
PCI: Add Amazon's Annapurna Labs vendor ID
MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer
PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers
dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries
dt-bindings: PCI: tegra: Add sideband pins configuration entries
PCI: tegra: Add Tegra194 PCIe support
PCI: Get rid of dev->has_secondary_link flag
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add pci_p2pdma_unmap_sg() to the two places that call pci_p2pdma_map_sg().
This is a prep patch to introduce correct mappings for p2pdma transactions
that go through the root complex.
Link: https://lore.kernel.org/r/20190730163545.4915-10-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-10-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|