summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2018-01-311666-45602/+120519
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Significantly shrink the core networking routing structures. Result of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf 2) Add netdevsim driver for testing various offloads, from Jakub Kicinski. 3) Support cross-chip FDB operations in DSA, from Vivien Didelot. 4) Add a 2nd listener hash table for TCP, similar to what was done for UDP. From Martin KaFai Lau. 5) Add eBPF based queue selection to tun, from Jason Wang. 6) Lockless qdisc support, from John Fastabend. 7) SCTP stream interleave support, from Xin Long. 8) Smoother TCP receive autotuning, from Eric Dumazet. 9) Lots of erspan tunneling enhancements, from William Tu. 10) Add true function call support to BPF, from Alexei Starovoitov. 11) Add explicit support for GRO HW offloading, from Michael Chan. 12) Support extack generation in more netlink subsystems. From Alexander Aring, Quentin Monnet, and Jakub Kicinski. 13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From Russell King. 14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso. 15) Many improvements and simplifications to the NFP driver bpf JIT, from Jakub Kicinski. 16) Support for ipv6 non-equal cost multipath routing, from Ido Schimmel. 17) Add resource abstration to devlink, from Arkadi Sharshevsky. 18) Packet scheduler classifier shared filter block support, from Jiri Pirko. 19) Avoid locking in act_csum, from Davide Caratti. 20) devinet_ioctl() simplifications from Al viro. 21) More TCP bpf improvements from Lawrence Brakmo. 22) Add support for onlink ipv6 route flag, similar to ipv4, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits) tls: Add support for encryption using async offload accelerator ip6mr: fix stale iterator net/sched: kconfig: Remove blank help texts openvswitch: meter: Use 64-bit arithmetic instead of 32-bit tcp_nv: fix potential integer overflow in tcpnv_acked r8169: fix RTL8168EP take too long to complete driver initialization. qmi_wwan: Add support for Quectel EP06 rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK ipmr: Fix ptrdiff_t print formatting ibmvnic: Wait for device response when changing MAC qlcnic: fix deadlock bug tcp: release sk_frag.page in tcp_disconnect ipv4: Get the address of interface correctly. net_sched: gen_estimator: fix lockdep splat net: macb: Handle HRESP error net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring ipv6: addrconf: break critical section in addrconf_verify_rtnl() ipv6: change route cache aging logic i40e/i40evf: Update DESC_NEEDED value to reflect larger value bnxt_en: cleanup DIM work on device shutdown ...
| * tls: Add support for encryption using async offload acceleratorVakul Garg2018-01-312-1/+9
| | | | | | | | | | | | | | | | | | | | | | Async crypto accelerators (e.g. drivers/crypto/caam) support offloading GCM operation. If they are enabled, crypto_aead_encrypt() return error code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a completion till the time the response for crypto offload request is received. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ip6mr: fix stale iteratorNikolay Aleksandrov2018-01-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we dump the ip6mr mfc entries via proc, we initialize an iterator with the table to dump but we don't clear the cache pointer which might be initialized from a prior read on the same descriptor that ended. This can result in lock imbalance (an unnecessary unlock) leading to other crashes and hangs. Clear the cache pointer like ipmr does to fix the issue. Thanks for the reliable reproducer. Here's syzbot's trace: WARNING: bad unlock balance detected! 4.15.0-rc3+ #128 Not tainted syzkaller971460/3195 is trying to release lock (mrt_lock) at: [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 but there are no more locks to release! other info that might help us debug this: 1 lock held by syzkaller971460/3195: #0: (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0 fs/seq_file.c:165 stack backtrace: CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561 __lock_release kernel/locking/lockdep.c:3775 [inline] lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023 __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255 ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 traverse+0x3bc/0xa00 fs/seq_file.c:135 seq_read+0x96a/0x13d0 fs/seq_file.c:189 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 BUG: sleeping function called from invalid context at lib/usercopy.c:25 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460 INFO: lockdep is turned off. CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060 __might_sleep+0x95/0x190 kernel/sched/core.c:6013 __might_fault+0xab/0x1d0 mm/memory.c:4525 _copy_to_user+0x2c/0xc0 lib/usercopy.c:25 copy_to_user include/linux/uaccess.h:155 [inline] seq_read+0xcb4/0x13d0 fs/seq_file.c:279 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0 lib/usercopy.c:26 Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/sched: kconfig: Remove blank help textsUlf Magnusson2018-01-311-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | Blank help texts are probably either a typo, a Kconfig misunderstanding, or some kind of half-committing to adding a help text (in which case a TODO comment would be clearer, if the help text really can't be added right away). Best to remove them, IMO. Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * openvswitch: meter: Use 64-bit arithmetic instead of 32-bitGustavo A. R. Silva2018-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add suffix LL to constant 1000 in order to give the compiler complete information about the proper arithmetic to use. Notice that this constant is used in a context that expects an expression of type long long int (64 bits, signed). The expression (band->burst_size + band->rate) * 1000 is currently being evaluated using 32-bit arithmetic. Addresses-Coverity-ID: 1461563 ("Unintentional integer overflow") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp_nv: fix potential integer overflow in tcpnv_ackedGustavo A. R. Silva2018-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add suffix ULL to constant 80000 in order to avoid a potential integer overflow and give the compiler complete information about the proper arithmetic to use. Notice that this constant is used in a context that expects an expression of type u64. The current cast to u64 effectively applies to the whole expression as an argument of type u64 to be passed to div64_u64, but it does not prevent it from being evaluated using 32-bit arithmetic instead of 64-bit arithmetic. Also, once the expression is properly evaluated using 64-bit arithmentic, there is no need for the parentheses and the external cast to u64. Addresses-Coverity-ID: 1357588 ("Unintentional integer overflow") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * r8169: fix RTL8168EP take too long to complete driver initialization.Chunhao Lin2018-01-311-2/+2
| | | | | | | | | | | | | | | | | | | | Driver check the wrong register bit in rtl_ocp_tx_cond() that keep driver waiting until timeout. Fix this by waiting for the right register bit. Signed-off-by: Chunhao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qmi_wwan: Add support for Quectel EP06Kristian Evensen2018-01-311-0/+1
| | | | | | | | | | | | | | | | | | The Quectel EP06 is a Cat. 6 LTE modem. It uses the same interface as the EC20/EC25 for QMI, and requires the same "set DTR"-quirk to work. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINKChristian Brauner2018-01-311-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Backwards Compatibility: If userspace wants to determine whether RTM_NEWLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_NEWLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_NEWLINK. Userpace should then fallback to other means. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipmr: Fix ptrdiff_t print formattingJames Hogan2018-01-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ipmr_vif_seq_show() prints the difference between two pointers with the format string %2zd (z for size_t), however the correct format string is %2td instead (t for ptrdiff_t). The same bug in ip6mr_vif_seq_show() was already fixed long ago by commit d430a227d272 ("bogus format in ip6mr"). Signed-off-by: James Hogan <jhogan@kernel.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Wait for device response when changing MACThomas Falcon2018-01-301-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wait for a response from the VNIC server before exiting after setting the MAC address. The resolves an issue with bonding a VNIC client in ALB or TLB modes. The bonding driver was changing the MAC address more rapidly than the device could respond, causing the following errors. "bond0: the hw address of slave eth2 is in use by the bond; couldn't find a slave with a free hw address to give it (this should not have happened)" If the function waits until the change is finalized, these errors are avoided. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qlcnic: fix deadlock bugJunxiao Bi2018-01-291-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following soft lockup was caught. This is a deadlock caused by recusive locking. Process kworker/u40:1:28016 was holding spin lock "mbx->queue_lock" in qlcnic_83xx_mailbox_worker(), while a softirq came in and ask the same spin lock in qlcnic_83xx_enqueue_mbx_cmd(). This lock should be hold by disable bh.. [161846.962125] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u40:1:28016] [161846.962367] Modules linked in: tun ocfs2 xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc fcoe libfcoe libfc sunrpc 8021q mrp garp bridge stp llc bonding dm_round_robin dm_multipath iTCO_wdt iTCO_vendor_support pcspkr sb_edac edac_core i2c_i801 shpchp lpc_ich mfd_core ioatdma ipmi_devintf ipmi_si ipmi_msghandler sg ext4 jbd2 mbcache2 sr_mod cdrom sd_mod igb i2c_algo_bit i2c_core ahci libahci megaraid_sas ixgbe dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel qla2xxx scsi_transport_fc qlcnic crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_mod [161846.962454] [161846.962460] CPU: 1 PID: 28016 Comm: kworker/u40:1 Not tainted 4.1.12-94.5.9.el6uek.x86_64 #2 [161846.962463] Hardware name: Oracle Corporation SUN SERVER X4-2L /ASSY,MB,X4-2L , BIOS 26050100 09/19/2017 [161846.962489] Workqueue: qlcnic_mailbox qlcnic_83xx_mailbox_worker [qlcnic] [161846.962493] task: ffff8801f2e34600 ti: ffff88004ca5c000 task.ti: ffff88004ca5c000 [161846.962496] RIP: e030:[<ffffffff810013aa>] [<ffffffff810013aa>] xen_hypercall_sched_op+0xa/0x20 [161846.962506] RSP: e02b:ffff880202e43388 EFLAGS: 00000206 [161846.962509] RAX: 0000000000000000 RBX: ffff8801f6996b70 RCX: ffffffff810013aa [161846.962511] RDX: ffff880202e433cc RSI: ffff880202e433b0 RDI: 0000000000000003 [161846.962513] RBP: ffff880202e433d0 R08: 0000000000000000 R09: ffff8801fe893200 [161846.962516] R10: ffff8801fe400538 R11: 0000000000000206 R12: ffff880202e4b000 [161846.962518] R13: 0000000000000050 R14: 0000000000000001 R15: 000000000000020d [161846.962528] FS: 0000000000000000(0000) GS:ffff880202e40000(0000) knlGS:ffff880202e40000 [161846.962531] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [161846.962533] CR2: 0000000002612640 CR3: 00000001bb796000 CR4: 0000000000042660 [161846.962536] Stack: [161846.962538] ffff880202e43608 0000000000000000 ffffffff813f0442 ffff880202e433b0 [161846.962543] 0000000000000000 ffff880202e433cc ffffffff00000001 0000000000000000 [161846.962547] 00000009813f03d6 ffff880202e433e0 ffffffff813f0460 ffff880202e43440 [161846.962552] Call Trace: [161846.962555] <IRQ> [161846.962565] [<ffffffff813f0442>] ? xen_poll_irq_timeout+0x42/0x50 [161846.962570] [<ffffffff813f0460>] xen_poll_irq+0x10/0x20 [161846.962578] [<ffffffff81014222>] xen_lock_spinning+0xe2/0x110 [161846.962583] [<ffffffff81013f01>] __raw_callee_save_xen_lock_spinning+0x11/0x20 [161846.962592] [<ffffffff816e5c57>] ? _raw_spin_lock+0x57/0x80 [161846.962609] [<ffffffffa028acfc>] qlcnic_83xx_enqueue_mbx_cmd+0x7c/0xe0 [qlcnic] [161846.962623] [<ffffffffa028e008>] qlcnic_83xx_issue_cmd+0x58/0x210 [qlcnic] [161846.962636] [<ffffffffa028caf2>] qlcnic_83xx_sre_macaddr_change+0x162/0x1d0 [qlcnic] [161846.962649] [<ffffffffa028cb8b>] qlcnic_83xx_change_l2_filter+0x2b/0x30 [qlcnic] [161846.962657] [<ffffffff8160248b>] ? __skb_flow_dissect+0x18b/0x650 [161846.962670] [<ffffffffa02856e5>] qlcnic_send_filter+0x205/0x250 [qlcnic] [161846.962682] [<ffffffffa0285c77>] qlcnic_xmit_frame+0x547/0x7b0 [qlcnic] [161846.962691] [<ffffffff8160ac22>] xmit_one+0x82/0x1a0 [161846.962696] [<ffffffff8160ad90>] dev_hard_start_xmit+0x50/0xa0 [161846.962701] [<ffffffff81630112>] sch_direct_xmit+0x112/0x220 [161846.962706] [<ffffffff8160b80f>] __dev_queue_xmit+0x1df/0x5e0 [161846.962710] [<ffffffff8160bc33>] dev_queue_xmit_sk+0x13/0x20 [161846.962721] [<ffffffffa0575bd5>] bond_dev_queue_xmit+0x35/0x80 [bonding] [161846.962729] [<ffffffffa05769fb>] __bond_start_xmit+0x1cb/0x210 [bonding] [161846.962736] [<ffffffffa0576a71>] bond_start_xmit+0x31/0x60 [bonding] [161846.962740] [<ffffffff8160ac22>] xmit_one+0x82/0x1a0 [161846.962745] [<ffffffff8160ad90>] dev_hard_start_xmit+0x50/0xa0 [161846.962749] [<ffffffff8160bb1e>] __dev_queue_xmit+0x4ee/0x5e0 [161846.962754] [<ffffffff8160bc33>] dev_queue_xmit_sk+0x13/0x20 [161846.962760] [<ffffffffa05cfa72>] vlan_dev_hard_start_xmit+0xb2/0x150 [8021q] [161846.962764] [<ffffffff8160ac22>] xmit_one+0x82/0x1a0 [161846.962769] [<ffffffff8160ad90>] dev_hard_start_xmit+0x50/0xa0 [161846.962773] [<ffffffff8160bb1e>] __dev_queue_xmit+0x4ee/0x5e0 [161846.962777] [<ffffffff8160bc33>] dev_queue_xmit_sk+0x13/0x20 [161846.962789] [<ffffffffa05adf74>] br_dev_queue_push_xmit+0x54/0xa0 [bridge] [161846.962797] [<ffffffffa05ae4ff>] br_forward_finish+0x2f/0x90 [bridge] [161846.962807] [<ffffffff810b0dad>] ? ttwu_do_wakeup+0x1d/0x100 [161846.962811] [<ffffffff815f929b>] ? __alloc_skb+0x8b/0x1f0 [161846.962818] [<ffffffffa05ae04d>] __br_forward+0x8d/0x120 [bridge] [161846.962822] [<ffffffff815f613b>] ? __kmalloc_reserve+0x3b/0xa0 [161846.962829] [<ffffffff810be55e>] ? update_rq_runnable_avg+0xee/0x230 [161846.962836] [<ffffffffa05ae176>] br_forward+0x96/0xb0 [bridge] [161846.962845] [<ffffffffa05af85e>] br_handle_frame_finish+0x1ae/0x420 [bridge] [161846.962853] [<ffffffffa05afc4f>] br_handle_frame+0x17f/0x260 [bridge] [161846.962862] [<ffffffffa05afad0>] ? br_handle_frame_finish+0x420/0x420 [bridge] [161846.962867] [<ffffffff8160d057>] __netif_receive_skb_core+0x1f7/0x870 [161846.962872] [<ffffffff8160d6f2>] __netif_receive_skb+0x22/0x70 [161846.962877] [<ffffffff8160d913>] netif_receive_skb_internal+0x23/0x90 [161846.962884] [<ffffffffa07512ea>] ? xenvif_idx_release+0xea/0x100 [xen_netback] [161846.962889] [<ffffffff816e5a10>] ? _raw_spin_unlock_irqrestore+0x20/0x50 [161846.962893] [<ffffffff8160e624>] netif_receive_skb_sk+0x24/0x90 [161846.962899] [<ffffffffa075269a>] xenvif_tx_submit+0x2ca/0x3f0 [xen_netback] [161846.962906] [<ffffffffa0753f0c>] xenvif_tx_action+0x9c/0xd0 [xen_netback] [161846.962915] [<ffffffffa07567f5>] xenvif_poll+0x35/0x70 [xen_netback] [161846.962920] [<ffffffff8160e01b>] napi_poll+0xcb/0x1e0 [161846.962925] [<ffffffff8160e1c0>] net_rx_action+0x90/0x1c0 [161846.962931] [<ffffffff8108aaba>] __do_softirq+0x10a/0x350 [161846.962938] [<ffffffff8108ae75>] irq_exit+0x125/0x130 [161846.962943] [<ffffffff813f03a9>] xen_evtchn_do_upcall+0x39/0x50 [161846.962950] [<ffffffff816e7ffe>] xen_do_hypervisor_callback+0x1e/0x40 [161846.962952] <EOI> [161846.962959] [<ffffffff816e5c4a>] ? _raw_spin_lock+0x4a/0x80 [161846.962964] [<ffffffff816e5b1e>] ? _raw_spin_lock_irqsave+0x1e/0xa0 [161846.962978] [<ffffffffa028e279>] ? qlcnic_83xx_mailbox_worker+0xb9/0x2a0 [qlcnic] [161846.962991] [<ffffffff810a14e1>] ? process_one_work+0x151/0x4b0 [161846.962995] [<ffffffff8100c3f2>] ? check_events+0x12/0x20 [161846.963001] [<ffffffff810a1960>] ? worker_thread+0x120/0x480 [161846.963005] [<ffffffff816e187b>] ? __schedule+0x30b/0x890 [161846.963010] [<ffffffff810a1840>] ? process_one_work+0x4b0/0x4b0 [161846.963015] [<ffffffff810a1840>] ? process_one_work+0x4b0/0x4b0 [161846.963021] [<ffffffff810a6b3e>] ? kthread+0xce/0xf0 [161846.963025] [<ffffffff810a6a70>] ? kthread_freezable_should_stop+0x70/0x70 [161846.963031] [<ffffffff816e6522>] ? ret_from_fork+0x42/0x70 [161846.963035] [<ffffffff810a6a70>] ? kthread_freezable_should_stop+0x70/0x70 [161846.963037] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: release sk_frag.page in tcp_disconnectLi RongQing2018-01-291-0/+6
| | | | | | | | | | | | | | | | | | | | | | socket can be disconnected and gets transformed back to a listening socket, if sk_frag.page is not released, which will be cloned into a new socket by sk_clone_lock, but the reference count of this page is increased, lead to a use after free or double free issue Signed-off-by: Li RongQing <lirongqing@baidu.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: Get the address of interface correctly.Tonghao Zhang2018-01-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using ioctl to get address of interface, we can't get it anymore. For example, the command is show as below. # ifconfig eth0 In the patch ("03aef17bb79b3"), the devinet_ioctl does not return a suitable value, even though we can find it in the kernel. Then fix it now. Fixes: 03aef17bb79b3 ("devinet_ioctl(): take copyin/copyout to caller") Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net_sched: gen_estimator: fix lockdep splatEric Dumazet2018-01-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported a lockdep splat in gen_new_estimator() / est_fetch_counters() when attempting to lock est->stats_lock. Since est_fetch_counters() is called from BH context from timer interrupt, we need to block BH as well when calling it from process context. Most qdiscs use per cpu counters and are immune to the problem, but net/sched/act_api.c and net/netfilter/xt_RATEEST.c are using a spinlock to protect their data. They both call gen_new_estimator() while object is created and not yet alive, so this bug could not trigger a deadlock, only a lockdep splat. Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: macb: Handle HRESP errorHarini Katakam2018-01-292-4/+58
| | | | | | | | | | | | | | | | | | Handle HRESP error by doing a SW reset of RX and TX and re-initializing the descriptors, RX and TX queue pointers. Signed-off-by: Harini Katakam <harinik@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoringGal Pressman2018-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | On TTC table creation, the indirection TIRs should be used instead of the inner indirection TIRs. Fixes: 1ae1df3a1193 ("net/mlx5e: Refactor RSS related objects and code") Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Shalom Lagziel <shaloml@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: addrconf: break critical section in addrconf_verify_rtnl()Eric Dumazet2018-01-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Heiner reported a lockdep splat [1] This is caused by attempting GFP_KERNEL allocation while RCU lock is held and BH blocked. We believe that addrconf_verify_rtnl() could run for a long period, so instead of using GFP_ATOMIC here as Ido suggested, we should break the critical section and restart it after the allocation. [1] [86220.125562] ============================= [86220.125586] WARNING: suspicious RCU usage [86220.125612] 4.15.0-rc7-next-20180110+ #7 Not tainted [86220.125641] ----------------------------- [86220.125666] kernel/sched/core.c:6026 Illegal context switch in RCU-bh read-side critical section! [86220.125711] other info that might help us debug this: [86220.125755] rcu_scheduler_active = 2, debug_locks = 1 [86220.125792] 4 locks held by kworker/0:2/1003: [86220.125817] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680 [86220.125895] #1: ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680 [86220.125959] #2: (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20 [86220.126017] #3: (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6] [86220.126111] stack backtrace: [86220.126142] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7 [86220.126185] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015 [86220.126250] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6] [86220.126288] Call Trace: [86220.126312] dump_stack+0x70/0x9e [86220.126337] lockdep_rcu_suspicious+0xce/0xf0 [86220.126365] ___might_sleep+0x1d3/0x240 [86220.126390] __might_sleep+0x45/0x80 [86220.126416] kmem_cache_alloc_trace+0x53/0x250 [86220.126458] ? ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.126498] ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.126538] ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.126580] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.126623] addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.126664] ? addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.126708] addrconf_verify_work+0xe/0x20 [ipv6] [86220.126738] process_one_work+0x258/0x680 [86220.126765] worker_thread+0x35/0x3f0 [86220.126790] kthread+0x124/0x140 [86220.126813] ? process_one_work+0x680/0x680 [86220.126839] ? kthread_create_worker_on_cpu+0x40/0x40 [86220.126869] ? umh_complete+0x40/0x40 [86220.126893] ? call_usermodehelper_exec_async+0x12a/0x160 [86220.126926] ret_from_fork+0x4b/0x60 [86220.126999] BUG: sleeping function called from invalid context at mm/slab.h:420 [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: kworker/0:2 [86220.127082] 4 locks held by kworker/0:2/1003: [86220.127107] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680 [86220.127179] #1: ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680 [86220.127242] #2: (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20 [86220.127300] #3: (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6] [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7 [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015 [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6] [86220.127568] Call Trace: [86220.127591] dump_stack+0x70/0x9e [86220.127616] ___might_sleep+0x14d/0x240 [86220.127644] __might_sleep+0x45/0x80 [86220.127672] kmem_cache_alloc_trace+0x53/0x250 [86220.127717] ? ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.127762] ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.127807] ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.127854] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.127903] addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.127950] ? addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.127998] addrconf_verify_work+0xe/0x20 [ipv6] [86220.128032] process_one_work+0x258/0x680 [86220.128063] worker_thread+0x35/0x3f0 [86220.128091] kthread+0x124/0x140 [86220.128117] ? process_one_work+0x680/0x680 [86220.128146] ? kthread_create_worker_on_cpu+0x40/0x40 [86220.128180] ? umh_complete+0x40/0x40 [86220.128207] ? call_usermodehelper_exec_async+0x12a/0x160 [86220.128243] ret_from_fork+0x4b/0x60 Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: change route cache aging logicWei Wang2018-01-291-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In current route cache aging logic, if a route has both RTF_EXPIRE and RTF_GATEWAY set, the route will only be removed if the neighbor cache has no NTF_ROUTER flag. Otherwise, even if the route has expired, it won't get deleted. Fix this logic to always check if the route has expired first and then do the gateway neighbor cache check if previous check decide to not remove the exception entry. Fixes: 1859bac04fb6 ("ipv6: remove from fib tree aged out RTF_CACHE dst") Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * i40e/i40evf: Update DESC_NEEDED value to reflect larger valueAlexander Duyck2018-01-292-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | When compared to ixgbe and other previous Intel drivers the i40e and i40evf drivers actually reserve 2 additional descriptors in maybe_stop_tx for cache line alignment. We need to update DESC_NEEDED to reflect this as otherwise we are more likely to return TX_BUSY which will cause issues with things like xmit_more. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: cleanup DIM work on device shutdownAndy Gospodarek2018-01-291-1/+7
| | | | | | | | | | | | | | | | | | | | | | Make sure to cancel any pending work that might update driver coalesce settings when taking down an interface. Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation") Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Cc: Michael Chan <michael.chan@broadcom.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ipv6: send unsolicited NA after DADDavid Ahern2018-01-292-4/+31
| | | | | | | | | | | | | | | | | | | | | | Unsolicited IPv6 neighbor advertisements should be sent after DAD completes. Update ndisc_send_unsol_na to skip tentative, non-optimistic addresses and have those sent by addrconf_dad_completed after DAD. Fixes: 4a6e3c5def13c ("net: ipv6: send unsolicited NA on admin up") Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * gianfar: prevent integer wrapping in the rx handlerAndy Spencer2018-01-291-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the frame check sequence (FCS) is split across the last two frames of a fragmented packet, part of the FCS gets counted twice, once when subtracting the FCS, and again when subtracting the previously received data. For example, if 1602 bytes are received, and the first fragment contains the first 1600 bytes (including the first two bytes of the FCS), and the second fragment contains the last two bytes of the FCS: 'skb->len == 1600' from the first fragment size = lstatus & BD_LENGTH_MASK; # 1602 size -= ETH_FCS_LEN; # 1598 size -= skb->len; # -2 Since the size is unsigned, it wraps around and causes a BUG later in the packet handling, as shown below: kernel BUG at ./include/linux/skbuff.h:2068! Oops: Exception in kernel mode, sig: 5 [#1] ... NIP [c021ec60] skb_pull+0x24/0x44 LR [c01e2fbc] gfar_clean_rx_ring+0x498/0x690 Call Trace: [df7edeb0] [c01e2c1c] gfar_clean_rx_ring+0xf8/0x690 (unreliable) [df7edf20] [c01e33a8] gfar_poll_rx_sq+0x3c/0x9c [df7edf40] [c023352c] net_rx_action+0x21c/0x274 [df7edf90] [c0329000] __do_softirq+0xd8/0x240 [df7edff0] [c000c108] call_do_irq+0x24/0x3c [c0597e90] [c00041dc] do_IRQ+0x64/0xc4 [c0597eb0] [c000d920] ret_from_except+0x0/0x18 --- interrupt: 501 at arch_cpu_idle+0x24/0x5c Change the size to a signed integer and then trim off any part of the FCS that was received prior to the last fragment. Fixes: 6c389fc931bc ("gianfar: fix size of scatter-gathered frames") Signed-off-by: Andy Spencer <aspencer@spacex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'net_sched-reflect-tx_queue_len-change-for-pfifo_fast'David S. Miller2018-01-296-37/+89
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cong Wang says: ==================== net_sched: reflect tx_queue_len change for pfifo_fast This pathcset restores the pfifo_fast qdisc behavior of dropping packets based on latest dev->tx_queue_len. Patch 1 introduces a helper, patch 2 introduces a new Qdisc ops which is called when we modify tx_queue_len, patch 3 implements this ops for pfifo_fast. Please see each patch for details. --- v3: use skb_array_resize_multiple() v2: handle error case for ->change_tx_queue_len() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net_sched: implement ->change_tx_queue_len() for pfifo_fastCong Wang2018-01-291-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pfifo_fast used to drop based on qdisc_dev(qdisc)->tx_queue_len, so we have to resize skb array when we change tx_queue_len. Other qdiscs which read tx_queue_len are fine because they all save it to sch->limit or somewhere else in qdisc during init. They don't have to implement this, it is nicer if they do so that users don't have to re-configure qdisc after changing tx_queue_len. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net_sched: plug in qdisc ops change_tx_queue_lenCong Wang2018-01-293-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new qdisc ops ->change_tx_queue_len() so that each qdisc could decide how to implement this if it wants. Previously we simply read dev->tx_queue_len, after pfifo_fast switches to skb array, we need this API to resize the skb array when we change dev->tx_queue_len. To avoid handling race conditions with TX BH, we need to deactivate all TX queues before change the value and bring them back after we are done, this also makes implementation easier. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net: introduce helper dev_change_tx_queue_len()Cong Wang2018-01-294-37/+35
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | This patch promotes the local change_tx_queue_len() to a core helper function, dev_change_tx_queue_len(), so that rtnetlink and net-sysfs could share the code. This also prepares for the following patch. Note, the -EFAULT in the original code doesn't make sense, we should propagate the errno from notifiers. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * vhost_net: stop device during reset ownerJason Wang2018-01-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | We don't stop device before reset owner, this means we could try to serve any virtqueue kick before reset dev->worker. This will result a warn since the work was pending at llist during owner resetting. Fix this by stopping device during owner reset. Reported-by: syzbot+eb17c6162478cc50632c@syzkaller.appspotmail.com Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'net-Ease-to-follow-an-interface-that-moves-to-another-netns'David S. Miller2018-01-294-23/+36
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nicolas Dichtel says: ==================== net: Ease to follow an interface that moves to another netns The goal of this series is to ease the user to follow an interface that moves to another netns. After this series, with a patched iproute2: $ ip netns bar foo $ ip monitor link & $ ip link set dummy0 netns foo Deleted 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether 6e:a7:82:35:96:46 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6 => new nsid: 0, new ifindex: 6 (was 5 in the previous netns) $ ip link set eth1 netns bar Deleted 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default link/ether 52:54:01:12:34:57 brd ff:ff:ff:ff:ff:ff new-nsid 1 new-ifindex 3 => new nsid: 1, new ifindex: 3 (same ifindex) $ ip netns bar (id: 1) foo (id: 0) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * dev: advertise the new ifindex when the netns iface changesNicolas Dichtel2018-01-294-20/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The goal is to let the user follow an interface that moves to another netns. CC: Jiri Benc <jbenc@redhat.com> CC: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * dev: always advertise the new nsid when the netns iface changesNicolas Dichtel2018-01-291-4/+1
| |/ | | | | | | | | | | | | | | | | | | | | The user should be able to follow any interface that moves to another netns. There is no reason to hide physical interfaces. CC: Jiri Benc <jbenc@redhat.com> CC: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ethernet: cavium: Correct Cavium Thunderx NIC driver names accordingly ↵Vadim Lomovtsev2018-01-295-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to module name It was found that ethtool provides unexisting module name while it queries the specified network device for associated driver information. Then user tries to unload that module by provided module name and fails. This happens because ethtool reads value of DRV_NAME macro, while module name is defined at the driver's Makefile. This patch is to correct Cavium CN88xx Thunder NIC driver names (DRV_NAME macro) 'thunder-nicvf' to 'nicvf' and 'thunder-nic' to 'nicpf', sync bgx and xcv driver names accordingly to their module names. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'ptr_ring-fixes'David S. Miller2018-01-297-45/+110
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael S. Tsirkin says: ==================== ptr_ring fixes This fixes a bunch of issues around ptr_ring use in net core. One of these: "tap: fix use-after-free" is also needed on net, but can't be backported cleanly. I will post a net patch separately. Lightly tested - Jason, could you pls confirm this addresses the security issue you saw with ptr_ring? Testing reports would be appreciated too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
| | * tools/virtio: fix smp_mb on x86Michael S. Tsirkin2018-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Offset 128 overlaps the last word of the redzone. Use 132 which is always beyond that. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * tools/virtio: copy READ/WRITE_ONCEMichael S. Tsirkin2018-01-291-0/+57
| | | | | | | | | | | | | | | | | | | | | This is to make ptr_ring test build again. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * tools/virtio: more stubs to fix tools buildMichael S. Tsirkin2018-01-292-1/+2
| | | | | | | | | | | | | | | Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * tools/virtio: switch to __ptr_ring_emptyMichael S. Tsirkin2018-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | We don't rely on lockless guarantees, but it seems cleaner than inverting __ptr_ring_peek. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ptr_ring: prevent queue load/store tearingMichael S. Tsirkin2018-01-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In theory compiler could tear queue loads or stores in two. It does not seem to be happening in practice but it seems easier to convert the cases where this would be a problem to READ/WRITE_ONCE than worry about it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * skb_array: use __ptr_ring_emptyMichael S. Tsirkin2018-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | __skb_array_empty should use __ptr_ring_empty since that's the only legal lockless function. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * Revert "net: ptr_ring: otherwise safe empty checks can overrun array bounds"Michael S. Tsirkin2018-01-291-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bcecb4bbf88aa03171c30652bca761cf27755a6b. If we try to allocate an extra entry as the above commit did, and when the requested size is UINT_MAX, addition overflows causing zero size to be passed to kmalloc(). kmalloc then returns ZERO_SIZE_PTR with a subsequent crash. Reported-by: syzbot+87678bcf753b44c39b67@syzkaller.appspotmail.com Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ptr_ring: disallow lockless __ptr_ring_fullMichael S. Tsirkin2018-01-291-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to bcecb4bbf88a ("net: ptr_ring: otherwise safe empty checks can overrun array bounds") a lockless use of __ptr_ring_full might cause an out of bounds access. We can fix this, but it's easier to just disallow lockless __ptr_ring_full for now. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * tap: fix use-after-freeMichael S. Tsirkin2018-01-291-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockless access to __ptr_ring_full is only legal if ring is never resized, otherwise it might cause use-after free errors. Simply drop the lockless test, we'll drop the packet a bit later when produce fails. Fixes: 362899b8 ("macvtap: switch to use skb array") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ptr_ring: READ/WRITE_ONCE for __ptr_ring_emptyMichael S. Tsirkin2018-01-291-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockless __ptr_ring_empty requires that consumer head is read and written at once, atomically. Annotate accordingly to make sure compiler does it correctly. Switch locked callers to __ptr_ring_peek which does not support the lockless operation. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ptr_ring: clean up documentationMichael S. Tsirkin2018-01-291-16/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only function safe to call without locks is __ptr_ring_empty. Move documentation about lockless use there to make sure people do not try to use __ptr_ring_peek outside locks. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ptr_ring: keep consumer_head valid at all timesMichael S. Tsirkin2018-01-291-9/+16
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment near __ptr_ring_peek says: * If ring is never resized, and if the pointer is merely * tested, there's no need to take the lock - see e.g. __ptr_ring_empty. but this was in fact never possible since consumer_head would sometimes point outside the ring. Refactor the code so that it's always pointing within a ring. Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6onlyMartin KaFai Lau2018-01-291-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it implicitly implies it is an ipv6only socket. However, in inet6_bind(), this addr_type checking and setting sk->sk_ipv6only to 1 are only done after sk->sk_prot->get_port(sk, snum) has been completed successfully. This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses the 'get_port()'. In particular, when binding SO_REUSEPORT UDP sockets, udp_reuseport_add_sock(sk,...) is called. udp_reuseport_add_sock() checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to sk2->sk_reuseport_cb. In this case, ipv6_only_sock(sk2) could be 1 while ipv6_only_sock(sk) is still 0 here. The end result is, reuseport_alloc(sk) is called instead of adding sk to the existing sk2->sk_reuseport_cb. It can be reproduced by binding two SO_REUSEPORT UDP sockets on an IPv6 address (!ANY and !MAPPED). Only one of the socket will receive packet. The fix is to set the implicit sk_ipv6only before calling get_port(). The original sk_ipv6only has to be saved such that it can be restored in case get_port() failed. The situation is similar to the inet_reset_saddr(sk) after get_port() has failed. Thanks to Calvin Owens <calvinowens@fb.com> who created an easy reproduction which leads to a fix. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'rtnetlink-enable-IFLA_IF_NETNSID-for-RTM_DELLINK-RTM_SETINK'David S. Miller2018-01-291-21/+75
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christian Brauner says: ==================== rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK Based on the previous discussion this enables passing a IFLA_IF_NETNSID property along with RTM_SETLINK and RTM_DELLINK requests. The patch for RTM_NEWLINK will be sent out in a separate patch since there are more corner-cases to think about. Changelog 2018-01-24: * Preserve old behavior and report -ENODEV when either ifindex or ifname is provided and IFLA_GROUP is set. Spotted by Wolfgang Bumiller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINKChristian Brauner2018-01-291-11/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Backwards Compatibility: If userspace wants to determine whether RTM_DELLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_DELLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_DELLINK. Userpace should then fallback to other means. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINKChristian Brauner2018-01-291-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Backwards Compatibility: If userspace wants to determine whether RTM_SETLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_SETLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_SETLINK. Userpace should then fallback to other means. To retain backwards compatibility the kernel will first check whether a IFLA_NET_NS_PID or IFLA_NET_NS_FD property has been passed. If either one is found it will be used to identify the target network namespace. This implies that users who do not care whether their running kernel supports IFLA_IF_NETNSID with RTM_SETLINK can pass both IFLA_NET_NS_{FD,PID} and IFLA_IF_NETNSID referring to the same network namespace. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * rtnetlink: enable IFLA_IF_NETNSID in do_setlink()Christian Brauner2018-01-291-7/+47
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RTM_{NEW,SET}LINK already allow operations on other network namespaces by identifying the target network namespace through IFLA_NET_NS_{FD,PID} properties. This is done by looking for the corresponding properties in do_setlink(). Extend do_setlink() to also look for the IFLA_IF_NETNSID property. This introduces no functional changes since all callers of do_setlink() currently block IFLA_IF_NETNSID by reporting an error before they reach do_setlink(). This introduces the helpers: static struct net *rtnl_link_get_net_by_nlattr(struct net *src_net, struct nlattr *tb[]) static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb, struct net *src_net, struct nlattr *tb[], int cap) to simplify permission checks and target network namespace retrieval for RTM_* requests that already support IFLA_NET_NS_{FD,PID} but get extended to IFLA_IF_NETNSID. To perserve backwards compatibility the helpers look for IFLA_NET_NS_{FD,PID} properties first before checking for IFLA_IF_NETNSID. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>