| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Backwards Compatibility:
If userspace wants to determine whether RTM_NEWLINK supports the
IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
does not include IFLA_IF_NETNSID userspace should assume that
IFLA_IF_NETNSID is not supported on this kernel.
If the reply does contain an IFLA_IF_NETNSID property userspace
can send an RTM_NEWLINK with a IFLA_IF_NETNSID property. If they receive
EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
with RTM_NEWLINK. Userpace should then fallback to other means.
- Security:
Callers must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ipmr_vif_seq_show() prints the difference between two pointers with the
format string %2zd (z for size_t), however the correct format string is
%2td instead (t for ptrdiff_t).
The same bug in ip6mr_vif_seq_show() was already fixed long ago by
commit d430a227d272 ("bogus format in ip6mr").
Signed-off-by: James Hogan <jhogan@kernel.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Wait for a response from the VNIC server before exiting after setting
the MAC address. The resolves an issue with bonding a VNIC client in
ALB or TLB modes. The bonding driver was changing the MAC address more
rapidly than the device could respond, causing the following errors.
"bond0: the hw address of slave eth2 is in use by the bond;
couldn't find a slave with a free hw address to give it
(this should not have happened)"
If the function waits until the change is finalized, these errors are
avoided.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following soft lockup was caught. This is a deadlock caused by
recusive locking.
Process kworker/u40:1:28016 was holding spin lock "mbx->queue_lock" in
qlcnic_83xx_mailbox_worker(), while a softirq came in and ask the same spin
lock in qlcnic_83xx_enqueue_mbx_cmd(). This lock should be hold by disable
bh..
[161846.962125] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u40:1:28016]
[161846.962367] Modules linked in: tun ocfs2 xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc fcoe libfcoe libfc sunrpc 8021q mrp garp bridge stp llc bonding dm_round_robin dm_multipath iTCO_wdt iTCO_vendor_support pcspkr sb_edac edac_core i2c_i801 shpchp lpc_ich mfd_core ioatdma ipmi_devintf ipmi_si ipmi_msghandler sg ext4 jbd2 mbcache2 sr_mod cdrom sd_mod igb i2c_algo_bit i2c_core ahci libahci megaraid_sas ixgbe dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel qla2xxx scsi_transport_fc qlcnic crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_mod
[161846.962454]
[161846.962460] CPU: 1 PID: 28016 Comm: kworker/u40:1 Not tainted 4.1.12-94.5.9.el6uek.x86_64 #2
[161846.962463] Hardware name: Oracle Corporation SUN SERVER X4-2L /ASSY,MB,X4-2L , BIOS 26050100 09/19/2017
[161846.962489] Workqueue: qlcnic_mailbox qlcnic_83xx_mailbox_worker [qlcnic]
[161846.962493] task: ffff8801f2e34600 ti: ffff88004ca5c000 task.ti: ffff88004ca5c000
[161846.962496] RIP: e030:[<ffffffff810013aa>] [<ffffffff810013aa>] xen_hypercall_sched_op+0xa/0x20
[161846.962506] RSP: e02b:ffff880202e43388 EFLAGS: 00000206
[161846.962509] RAX: 0000000000000000 RBX: ffff8801f6996b70 RCX: ffffffff810013aa
[161846.962511] RDX: ffff880202e433cc RSI: ffff880202e433b0 RDI: 0000000000000003
[161846.962513] RBP: ffff880202e433d0 R08: 0000000000000000 R09: ffff8801fe893200
[161846.962516] R10: ffff8801fe400538 R11: 0000000000000206 R12: ffff880202e4b000
[161846.962518] R13: 0000000000000050 R14: 0000000000000001 R15: 000000000000020d
[161846.962528] FS: 0000000000000000(0000) GS:ffff880202e40000(0000) knlGS:ffff880202e40000
[161846.962531] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[161846.962533] CR2: 0000000002612640 CR3: 00000001bb796000 CR4: 0000000000042660
[161846.962536] Stack:
[161846.962538] ffff880202e43608 0000000000000000 ffffffff813f0442 ffff880202e433b0
[161846.962543] 0000000000000000 ffff880202e433cc ffffffff00000001 0000000000000000
[161846.962547] 00000009813f03d6 ffff880202e433e0 ffffffff813f0460 ffff880202e43440
[161846.962552] Call Trace:
[161846.962555] <IRQ>
[161846.962565] [<ffffffff813f0442>] ? xen_poll_irq_timeout+0x42/0x50
[161846.962570] [<ffffffff813f0460>] xen_poll_irq+0x10/0x20
[161846.962578] [<ffffffff81014222>] xen_lock_spinning+0xe2/0x110
[161846.962583] [<ffffffff81013f01>] __raw_callee_save_xen_lock_spinning+0x11/0x20
[161846.962592] [<ffffffff816e5c57>] ? _raw_spin_lock+0x57/0x80
[161846.962609] [<ffffffffa028acfc>] qlcnic_83xx_enqueue_mbx_cmd+0x7c/0xe0 [qlcnic]
[161846.962623] [<ffffffffa028e008>] qlcnic_83xx_issue_cmd+0x58/0x210 [qlcnic]
[161846.962636] [<ffffffffa028caf2>] qlcnic_83xx_sre_macaddr_change+0x162/0x1d0 [qlcnic]
[161846.962649] [<ffffffffa028cb8b>] qlcnic_83xx_change_l2_filter+0x2b/0x30 [qlcnic]
[161846.962657] [<ffffffff8160248b>] ? __skb_flow_dissect+0x18b/0x650
[161846.962670] [<ffffffffa02856e5>] qlcnic_send_filter+0x205/0x250 [qlcnic]
[161846.962682] [<ffffffffa0285c77>] qlcnic_xmit_frame+0x547/0x7b0 [qlcnic]
[161846.962691] [<ffffffff8160ac22>] xmit_one+0x82/0x1a0
[161846.962696] [<ffffffff8160ad90>] dev_hard_start_xmit+0x50/0xa0
[161846.962701] [<ffffffff81630112>] sch_direct_xmit+0x112/0x220
[161846.962706] [<ffffffff8160b80f>] __dev_queue_xmit+0x1df/0x5e0
[161846.962710] [<ffffffff8160bc33>] dev_queue_xmit_sk+0x13/0x20
[161846.962721] [<ffffffffa0575bd5>] bond_dev_queue_xmit+0x35/0x80 [bonding]
[161846.962729] [<ffffffffa05769fb>] __bond_start_xmit+0x1cb/0x210 [bonding]
[161846.962736] [<ffffffffa0576a71>] bond_start_xmit+0x31/0x60 [bonding]
[161846.962740] [<ffffffff8160ac22>] xmit_one+0x82/0x1a0
[161846.962745] [<ffffffff8160ad90>] dev_hard_start_xmit+0x50/0xa0
[161846.962749] [<ffffffff8160bb1e>] __dev_queue_xmit+0x4ee/0x5e0
[161846.962754] [<ffffffff8160bc33>] dev_queue_xmit_sk+0x13/0x20
[161846.962760] [<ffffffffa05cfa72>] vlan_dev_hard_start_xmit+0xb2/0x150 [8021q]
[161846.962764] [<ffffffff8160ac22>] xmit_one+0x82/0x1a0
[161846.962769] [<ffffffff8160ad90>] dev_hard_start_xmit+0x50/0xa0
[161846.962773] [<ffffffff8160bb1e>] __dev_queue_xmit+0x4ee/0x5e0
[161846.962777] [<ffffffff8160bc33>] dev_queue_xmit_sk+0x13/0x20
[161846.962789] [<ffffffffa05adf74>] br_dev_queue_push_xmit+0x54/0xa0 [bridge]
[161846.962797] [<ffffffffa05ae4ff>] br_forward_finish+0x2f/0x90 [bridge]
[161846.962807] [<ffffffff810b0dad>] ? ttwu_do_wakeup+0x1d/0x100
[161846.962811] [<ffffffff815f929b>] ? __alloc_skb+0x8b/0x1f0
[161846.962818] [<ffffffffa05ae04d>] __br_forward+0x8d/0x120 [bridge]
[161846.962822] [<ffffffff815f613b>] ? __kmalloc_reserve+0x3b/0xa0
[161846.962829] [<ffffffff810be55e>] ? update_rq_runnable_avg+0xee/0x230
[161846.962836] [<ffffffffa05ae176>] br_forward+0x96/0xb0 [bridge]
[161846.962845] [<ffffffffa05af85e>] br_handle_frame_finish+0x1ae/0x420 [bridge]
[161846.962853] [<ffffffffa05afc4f>] br_handle_frame+0x17f/0x260 [bridge]
[161846.962862] [<ffffffffa05afad0>] ? br_handle_frame_finish+0x420/0x420 [bridge]
[161846.962867] [<ffffffff8160d057>] __netif_receive_skb_core+0x1f7/0x870
[161846.962872] [<ffffffff8160d6f2>] __netif_receive_skb+0x22/0x70
[161846.962877] [<ffffffff8160d913>] netif_receive_skb_internal+0x23/0x90
[161846.962884] [<ffffffffa07512ea>] ? xenvif_idx_release+0xea/0x100 [xen_netback]
[161846.962889] [<ffffffff816e5a10>] ? _raw_spin_unlock_irqrestore+0x20/0x50
[161846.962893] [<ffffffff8160e624>] netif_receive_skb_sk+0x24/0x90
[161846.962899] [<ffffffffa075269a>] xenvif_tx_submit+0x2ca/0x3f0 [xen_netback]
[161846.962906] [<ffffffffa0753f0c>] xenvif_tx_action+0x9c/0xd0 [xen_netback]
[161846.962915] [<ffffffffa07567f5>] xenvif_poll+0x35/0x70 [xen_netback]
[161846.962920] [<ffffffff8160e01b>] napi_poll+0xcb/0x1e0
[161846.962925] [<ffffffff8160e1c0>] net_rx_action+0x90/0x1c0
[161846.962931] [<ffffffff8108aaba>] __do_softirq+0x10a/0x350
[161846.962938] [<ffffffff8108ae75>] irq_exit+0x125/0x130
[161846.962943] [<ffffffff813f03a9>] xen_evtchn_do_upcall+0x39/0x50
[161846.962950] [<ffffffff816e7ffe>] xen_do_hypervisor_callback+0x1e/0x40
[161846.962952] <EOI>
[161846.962959] [<ffffffff816e5c4a>] ? _raw_spin_lock+0x4a/0x80
[161846.962964] [<ffffffff816e5b1e>] ? _raw_spin_lock_irqsave+0x1e/0xa0
[161846.962978] [<ffffffffa028e279>] ? qlcnic_83xx_mailbox_worker+0xb9/0x2a0 [qlcnic]
[161846.962991] [<ffffffff810a14e1>] ? process_one_work+0x151/0x4b0
[161846.962995] [<ffffffff8100c3f2>] ? check_events+0x12/0x20
[161846.963001] [<ffffffff810a1960>] ? worker_thread+0x120/0x480
[161846.963005] [<ffffffff816e187b>] ? __schedule+0x30b/0x890
[161846.963010] [<ffffffff810a1840>] ? process_one_work+0x4b0/0x4b0
[161846.963015] [<ffffffff810a1840>] ? process_one_work+0x4b0/0x4b0
[161846.963021] [<ffffffff810a6b3e>] ? kthread+0xce/0xf0
[161846.963025] [<ffffffff810a6a70>] ? kthread_freezable_should_stop+0x70/0x70
[161846.963031] [<ffffffff816e6522>] ? ret_from_fork+0x42/0x70
[161846.963035] [<ffffffff810a6a70>] ? kthread_freezable_should_stop+0x70/0x70
[161846.963037] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
socket can be disconnected and gets transformed back to a listening
socket, if sk_frag.page is not released, which will be cloned into
a new socket by sk_clone_lock, but the reference count of this page
is increased, lead to a use after free or double free issue
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using ioctl to get address of interface, we can't
get it anymore. For example, the command is show as below.
# ifconfig eth0
In the patch ("03aef17bb79b3"), the devinet_ioctl does not
return a suitable value, even though we can find it in
the kernel. Then fix it now.
Fixes: 03aef17bb79b3 ("devinet_ioctl(): take copyin/copyout to caller")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
syzbot reported a lockdep splat in gen_new_estimator() /
est_fetch_counters() when attempting to lock est->stats_lock.
Since est_fetch_counters() is called from BH context from timer
interrupt, we need to block BH as well when calling it from process
context.
Most qdiscs use per cpu counters and are immune to the problem,
but net/sched/act_api.c and net/netfilter/xt_RATEEST.c are using
a spinlock to protect their data. They both call gen_new_estimator()
while object is created and not yet alive, so this bug could
not trigger a deadlock, only a lockdep splat.
Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
Handle HRESP error by doing a SW reset of RX and TX and
re-initializing the descriptors, RX and TX queue pointers.
Signed-off-by: Harini Katakam <harinik@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
On TTC table creation, the indirection TIRs should be used instead of
the inner indirection TIRs.
Fixes: 1ae1df3a1193 ("net/mlx5e: Refactor RSS related objects and code")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Shalom Lagziel <shaloml@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Heiner reported a lockdep splat [1]
This is caused by attempting GFP_KERNEL allocation while RCU lock is
held and BH blocked.
We believe that addrconf_verify_rtnl() could run for a long period,
so instead of using GFP_ATOMIC here as Ido suggested, we should break
the critical section and restart it after the allocation.
[1]
[86220.125562] =============================
[86220.125586] WARNING: suspicious RCU usage
[86220.125612] 4.15.0-rc7-next-20180110+ #7 Not tainted
[86220.125641] -----------------------------
[86220.125666] kernel/sched/core.c:6026 Illegal context switch in RCU-bh read-side critical section!
[86220.125711]
other info that might help us debug this:
[86220.125755]
rcu_scheduler_active = 2, debug_locks = 1
[86220.125792] 4 locks held by kworker/0:2/1003:
[86220.125817] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
[86220.125895] #1: ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
[86220.125959] #2: (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20
[86220.126017] #3: (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
[86220.126111]
stack backtrace:
[86220.126142] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
[86220.126185] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
[86220.126250] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
[86220.126288] Call Trace:
[86220.126312] dump_stack+0x70/0x9e
[86220.126337] lockdep_rcu_suspicious+0xce/0xf0
[86220.126365] ___might_sleep+0x1d3/0x240
[86220.126390] __might_sleep+0x45/0x80
[86220.126416] kmem_cache_alloc_trace+0x53/0x250
[86220.126458] ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
[86220.126498] ipv6_add_addr+0xfe/0x6e0 [ipv6]
[86220.126538] ipv6_create_tempaddr+0x24d/0x430 [ipv6]
[86220.126580] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
[86220.126623] addrconf_verify_rtnl+0x339/0x510 [ipv6]
[86220.126664] ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
[86220.126708] addrconf_verify_work+0xe/0x20 [ipv6]
[86220.126738] process_one_work+0x258/0x680
[86220.126765] worker_thread+0x35/0x3f0
[86220.126790] kthread+0x124/0x140
[86220.126813] ? process_one_work+0x680/0x680
[86220.126839] ? kthread_create_worker_on_cpu+0x40/0x40
[86220.126869] ? umh_complete+0x40/0x40
[86220.126893] ? call_usermodehelper_exec_async+0x12a/0x160
[86220.126926] ret_from_fork+0x4b/0x60
[86220.126999] BUG: sleeping function called from invalid context at mm/slab.h:420
[86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: kworker/0:2
[86220.127082] 4 locks held by kworker/0:2/1003:
[86220.127107] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
[86220.127179] #1: ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
[86220.127242] #2: (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20
[86220.127300] #3: (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
[86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
[86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
[86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
[86220.127568] Call Trace:
[86220.127591] dump_stack+0x70/0x9e
[86220.127616] ___might_sleep+0x14d/0x240
[86220.127644] __might_sleep+0x45/0x80
[86220.127672] kmem_cache_alloc_trace+0x53/0x250
[86220.127717] ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
[86220.127762] ipv6_add_addr+0xfe/0x6e0 [ipv6]
[86220.127807] ipv6_create_tempaddr+0x24d/0x430 [ipv6]
[86220.127854] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
[86220.127903] addrconf_verify_rtnl+0x339/0x510 [ipv6]
[86220.127950] ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
[86220.127998] addrconf_verify_work+0xe/0x20 [ipv6]
[86220.128032] process_one_work+0x258/0x680
[86220.128063] worker_thread+0x35/0x3f0
[86220.128091] kthread+0x124/0x140
[86220.128117] ? process_one_work+0x680/0x680
[86220.128146] ? kthread_create_worker_on_cpu+0x40/0x40
[86220.128180] ? umh_complete+0x40/0x40
[86220.128207] ? call_usermodehelper_exec_async+0x12a/0x160
[86220.128243] ret_from_fork+0x4b/0x60
Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In current route cache aging logic, if a route has both RTF_EXPIRE and
RTF_GATEWAY set, the route will only be removed if the neighbor cache
has no NTF_ROUTER flag. Otherwise, even if the route has expired, it
won't get deleted.
Fix this logic to always check if the route has expired first and then
do the gateway neighbor cache check if previous check decide to not
remove the exception entry.
Fixes: 1859bac04fb6 ("ipv6: remove from fib tree aged out RTF_CACHE dst")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When compared to ixgbe and other previous Intel drivers the i40e and i40evf
drivers actually reserve 2 additional descriptors in maybe_stop_tx for
cache line alignment. We need to update DESC_NEEDED to reflect this as
otherwise we are more likely to return TX_BUSY which will cause issues with
things like xmit_more.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure to cancel any pending work that might update driver coalesce
settings when taking down an interface.
Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation")
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Unsolicited IPv6 neighbor advertisements should be sent after DAD
completes. Update ndisc_send_unsol_na to skip tentative, non-optimistic
addresses and have those sent by addrconf_dad_completed after DAD.
Fixes: 4a6e3c5def13c ("net: ipv6: send unsolicited NA on admin up")
Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the frame check sequence (FCS) is split across the last two frames
of a fragmented packet, part of the FCS gets counted twice, once when
subtracting the FCS, and again when subtracting the previously received
data.
For example, if 1602 bytes are received, and the first fragment contains
the first 1600 bytes (including the first two bytes of the FCS), and the
second fragment contains the last two bytes of the FCS:
'skb->len == 1600' from the first fragment
size = lstatus & BD_LENGTH_MASK; # 1602
size -= ETH_FCS_LEN; # 1598
size -= skb->len; # -2
Since the size is unsigned, it wraps around and causes a BUG later in
the packet handling, as shown below:
kernel BUG at ./include/linux/skbuff.h:2068!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c021ec60] skb_pull+0x24/0x44
LR [c01e2fbc] gfar_clean_rx_ring+0x498/0x690
Call Trace:
[df7edeb0] [c01e2c1c] gfar_clean_rx_ring+0xf8/0x690 (unreliable)
[df7edf20] [c01e33a8] gfar_poll_rx_sq+0x3c/0x9c
[df7edf40] [c023352c] net_rx_action+0x21c/0x274
[df7edf90] [c0329000] __do_softirq+0xd8/0x240
[df7edff0] [c000c108] call_do_irq+0x24/0x3c
[c0597e90] [c00041dc] do_IRQ+0x64/0xc4
[c0597eb0] [c000d920] ret_from_except+0x0/0x18
--- interrupt: 501 at arch_cpu_idle+0x24/0x5c
Change the size to a signed integer and then trim off any part of the
FCS that was received prior to the last fragment.
Fixes: 6c389fc931bc ("gianfar: fix size of scatter-gathered frames")
Signed-off-by: Andy Spencer <aspencer@spacex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Cong Wang says:
====================
net_sched: reflect tx_queue_len change for pfifo_fast
This pathcset restores the pfifo_fast qdisc behavior of dropping
packets based on latest dev->tx_queue_len. Patch 1 introduces
a helper, patch 2 introduces a new Qdisc ops which is called when
we modify tx_queue_len, patch 3 implements this ops for pfifo_fast.
Please see each patch for details.
---
v3: use skb_array_resize_multiple()
v2: handle error case for ->change_tx_queue_len()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
pfifo_fast used to drop based on qdisc_dev(qdisc)->tx_queue_len,
so we have to resize skb array when we change tx_queue_len.
Other qdiscs which read tx_queue_len are fine because they
all save it to sch->limit or somewhere else in qdisc during init.
They don't have to implement this, it is nicer if they do so
that users don't have to re-configure qdisc after changing
tx_queue_len.
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Introduce a new qdisc ops ->change_tx_queue_len() so that
each qdisc could decide how to implement this if it wants.
Previously we simply read dev->tx_queue_len, after pfifo_fast
switches to skb array, we need this API to resize the skb array
when we change dev->tx_queue_len.
To avoid handling race conditions with TX BH, we need to
deactivate all TX queues before change the value and bring them
back after we are done, this also makes implementation easier.
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch promotes the local change_tx_queue_len() to a core
helper function, dev_change_tx_queue_len(), so that rtnetlink
and net-sysfs could share the code. This also prepares for the
following patch.
Note, the -EFAULT in the original code doesn't make sense,
we should propagate the errno from notifiers.
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't stop device before reset owner, this means we could try to
serve any virtqueue kick before reset dev->worker. This will result a
warn since the work was pending at llist during owner resetting. Fix
this by stopping device during owner reset.
Reported-by: syzbot+eb17c6162478cc50632c@syzkaller.appspotmail.com
Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Nicolas Dichtel says:
====================
net: Ease to follow an interface that moves to another netns
The goal of this series is to ease the user to follow an interface that
moves to another netns.
After this series, with a patched iproute2:
$ ip netns
bar
foo
$ ip monitor link &
$ ip link set dummy0 netns foo
Deleted 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 6e:a7:82:35:96:46 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6
=> new nsid: 0, new ifindex: 6 (was 5 in the previous netns)
$ ip link set eth1 netns bar
Deleted 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
link/ether 52:54:01:12:34:57 brd ff:ff:ff:ff:ff:ff new-nsid 1 new-ifindex 3
=> new nsid: 1, new ifindex: 3 (same ifindex)
$ ip netns
bar (id: 1)
foo (id: 0)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The goal is to let the user follow an interface that moves to another
netns.
CC: Jiri Benc <jbenc@redhat.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
| |
The user should be able to follow any interface that moves to another
netns. There is no reason to hide physical interfaces.
CC: Jiri Benc <jbenc@redhat.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to module name
It was found that ethtool provides unexisting module name while
it queries the specified network device for associated driver
information. Then user tries to unload that module by provided
module name and fails.
This happens because ethtool reads value of DRV_NAME macro,
while module name is defined at the driver's Makefile.
This patch is to correct Cavium CN88xx Thunder NIC driver names
(DRV_NAME macro) 'thunder-nicvf' to 'nicvf' and 'thunder-nic'
to 'nicpf', sync bgx and xcv driver names accordingly to their
module names.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Michael S. Tsirkin says:
====================
ptr_ring fixes
This fixes a bunch of issues around ptr_ring use in net core.
One of these: "tap: fix use-after-free" is also needed on net,
but can't be backported cleanly.
I will post a net patch separately.
Lightly tested - Jason, could you pls confirm this
addresses the security issue you saw with ptr_ring?
Testing reports would be appreciated too.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Offset 128 overlaps the last word of the redzone.
Use 132 which is always beyond that.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| | |
This is to make ptr_ring test build again.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| | |
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We don't rely on lockless guarantees, but it
seems cleaner than inverting __ptr_ring_peek.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In theory compiler could tear queue loads or stores in two. It does not
seem to be happening in practice but it seems easier to convert the
cases where this would be a problem to READ/WRITE_ONCE than worry about
it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
__skb_array_empty should use __ptr_ring_empty since that's the only
legal lockless function.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit bcecb4bbf88aa03171c30652bca761cf27755a6b.
If we try to allocate an extra entry as the above commit did, and when
the requested size is UINT_MAX, addition overflows causing zero size to
be passed to kmalloc().
kmalloc then returns ZERO_SIZE_PTR with a subsequent crash.
Reported-by: syzbot+87678bcf753b44c39b67@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Similar to bcecb4bbf88a ("net: ptr_ring: otherwise safe empty checks can
overrun array bounds") a lockless use of __ptr_ring_full might
cause an out of bounds access.
We can fix this, but it's easier to just disallow lockless
__ptr_ring_full for now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Lockless access to __ptr_ring_full is only legal if ring is
never resized, otherwise it might cause use-after free errors.
Simply drop the lockless test, we'll drop the packet
a bit later when produce fails.
Fixes: 362899b8 ("macvtap: switch to use skb array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Lockless __ptr_ring_empty requires that consumer head is read and
written at once, atomically. Annotate accordingly to make sure compiler
does it correctly. Switch locked callers to __ptr_ring_peek which does
not support the lockless operation.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The only function safe to call without locks
is __ptr_ring_empty. Move documentation about
lockless use there to make sure people do not
try to use __ptr_ring_peek outside locks.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The comment near __ptr_ring_peek says:
* If ring is never resized, and if the pointer is merely
* tested, there's no need to take the lock - see e.g. __ptr_ring_empty.
but this was in fact never possible since consumer_head would sometimes
point outside the ring. Refactor the code so that it's always
pointing within a ring.
Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
implicitly implies it is an ipv6only socket. However, in inet6_bind(),
this addr_type checking and setting sk->sk_ipv6only to 1 are only done
after sk->sk_prot->get_port(sk, snum) has been completed successfully.
This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
the 'get_port()'.
In particular, when binding SO_REUSEPORT UDP sockets,
udp_reuseport_add_sock(sk,...) is called. udp_reuseport_add_sock()
checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
sk2->sk_reuseport_cb. In this case, ipv6_only_sock(sk2) could be
1 while ipv6_only_sock(sk) is still 0 here. The end result is,
reuseport_alloc(sk) is called instead of adding sk to the existing
sk2->sk_reuseport_cb.
It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
IPv6 address (!ANY and !MAPPED). Only one of the socket will
receive packet.
The fix is to set the implicit sk_ipv6only before calling get_port().
The original sk_ipv6only has to be saved such that it can be restored
in case get_port() failed. The situation is similar to the
inet_reset_saddr(sk) after get_port() has failed.
Thanks to Calvin Owens <calvinowens@fb.com> who created an easy
reproduction which leads to a fix.
Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Christian Brauner says:
====================
rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK
Based on the previous discussion this enables passing a IFLA_IF_NETNSID
property along with RTM_SETLINK and RTM_DELLINK requests. The patch for
RTM_NEWLINK will be sent out in a separate patch since there are more
corner-cases to think about.
Changelog 2018-01-24:
* Preserve old behavior and report -ENODEV when either ifindex or ifname is
provided and IFLA_GROUP is set. Spotted by Wolfgang Bumiller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Backwards Compatibility:
If userspace wants to determine whether RTM_DELLINK supports the
IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
does not include IFLA_IF_NETNSID userspace should assume that
IFLA_IF_NETNSID is not supported on this kernel.
If the reply does contain an IFLA_IF_NETNSID property userspace
can send an RTM_DELLINK with a IFLA_IF_NETNSID property. If they receive
EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
with RTM_DELLINK. Userpace should then fallback to other means.
- Security:
Callers must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Backwards Compatibility:
If userspace wants to determine whether RTM_SETLINK supports the
IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
does not include IFLA_IF_NETNSID userspace should assume that
IFLA_IF_NETNSID is not supported on this kernel.
If the reply does contain an IFLA_IF_NETNSID property userspace
can send an RTM_SETLINK with a IFLA_IF_NETNSID property. If they receive
EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
with RTM_SETLINK. Userpace should then fallback to other means.
To retain backwards compatibility the kernel will first check whether a
IFLA_NET_NS_PID or IFLA_NET_NS_FD property has been passed. If either
one is found it will be used to identify the target network namespace.
This implies that users who do not care whether their running kernel
supports IFLA_IF_NETNSID with RTM_SETLINK can pass both
IFLA_NET_NS_{FD,PID} and IFLA_IF_NETNSID referring to the same network
namespace.
- Security:
Callers must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RTM_{NEW,SET}LINK already allow operations on other network namespaces
by identifying the target network namespace through IFLA_NET_NS_{FD,PID}
properties. This is done by looking for the corresponding properties in
do_setlink(). Extend do_setlink() to also look for the IFLA_IF_NETNSID
property. This introduces no functional changes since all callers of
do_setlink() currently block IFLA_IF_NETNSID by reporting an error before
they reach do_setlink().
This introduces the helpers:
static struct net *rtnl_link_get_net_by_nlattr(struct net *src_net, struct
nlattr *tb[])
static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb,
struct net *src_net,
struct nlattr *tb[], int cap)
to simplify permission checks and target network namespace retrieval for
RTM_* requests that already support IFLA_NET_NS_{FD,PID} but get extended
to IFLA_IF_NETNSID. To perserve backwards compatibility the helpers look
for IFLA_NET_NS_{FD,PID} properties first before checking for
IFLA_IF_NETNSID.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| | |
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull networking fixes from David Miller:
1) The per-network-namespace loopback device, and thus its namespace,
can have its teardown deferred for a long time if a kernel created
TCP socket closes and the namespace is exiting meanwhile. The kernel
keeps trying to finish the close sequence until it times out (which
takes quite some time).
Fix this by forcing the socket closed in this situation, from Dan
Streetman.
2) Fix regression where we're trying to invoke the update_pmtu method
on route types (in this case metadata tunnel routes) that don't
implement the dst_ops method. Fix from Nicolas Dichtel.
3) Fix long standing memory corruption issues in r8169 driver by
performing the chip statistics DMA programming more correctly. From
Francois Romieu.
4) Handle local broadcast sends over VRF routes properly, from David
Ahern.
5) Don't refire the DCCP CCID2 timer endlessly, otherwise the socket
can never be released. From Alexey Kodanev.
6) Set poll flags properly in VSOCK protocol layer, from Stefan
Hajnoczi.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSING
dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
net: vrf: Add support for sends to local broadcast address
r8169: fix memory corruption on retrieval of hardware statistics.
net: don't call update_pmtu unconditionally
net: tcp: close sock if net namespace is exiting
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
select(2) with wfds but no rfds must return when the socket is shut down
by the peer. This way userspace notices socket activity and gets -EPIPE
from the next write(2).
Currently select(2) does not return for virtio-vsock when a SEND+RCV
shutdown packet is received. This is because vsock_poll() only sets
POLLOUT | POLLWRNORM for TCP_CLOSE, not the TCP_CLOSING state that the
socket is in when the shutdown is received.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ccid2_hc_tx_rto_expire() timer callback always restarts the timer
again and can run indefinitely (unless it is stopped outside), and after
commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time"),
which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from
dccp_destroy_sock() to sk_destruct(), this started to happen quite often.
The timer prevents releasing the socket, as a result, sk_destruct() won't
be called.
Found with LTP/dccp_ipsec tests running on the bonding device,
which later couldn't be unloaded after the tests were completed:
unregister_netdevice: waiting for bond0 to become free. Usage count = 148
Fixes: 2a91aa396739 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Sukumar reported that sends to the local broadcast address
(255.255.255.255) are broken. Check for the address in vrf driver
and do not redirect to the VRF device - similar to multicast
packets.
With this change sockets can use SO_BINDTODEVICE to specify an
egress interface and receive responses. Note: the egress interface
can not be a VRF device but needs to be the enslaved device.
https://bugzilla.kernel.org/show_bug.cgi?id=198521
Reported-by: Sukumar Gopalakrishnan <sukumarg1973@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Hardware statistics retrieval hurts in tight invocation loops.
Avoid extraneous write and enforce strict ordering of writes targeted to
the tally counters dump area address registers.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Oliver Freyermuth <o.freyermuth@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at (null)"
Let's add a helper to check if update_pmtu is available before calling it.
Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.
For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:
unregister_netdevice: waiting for lo to become free. Usage count = 1
These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.
After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|