summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ipv4, ipv6: ensure raw socket message is big enough to hold an IP headerAlexander Potapenko2017-05-042-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | raw_send_hdrinc() and rawv6_send_hdrinc() expect that the buffer copied from the userspace contains the IPv4/IPv6 header, so if too few bytes are copied, parts of the header may remain uninitialized. This bug has been detected with KMSAN. For the record, the KMSAN report: ================================================================== BUG: KMSAN: use of unitialized memory in nf_ct_frag6_gather+0xf5a/0x44a0 inter: 0 CPU: 0 PID: 1036 Comm: probe Not tainted 4.11.0-rc5+ #2455 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x143/0x1b0 lib/dump_stack.c:52 kmsan_report+0x16b/0x1e0 mm/kmsan/kmsan.c:1078 __kmsan_warning_32+0x5c/0xa0 mm/kmsan/kmsan_instr.c:510 nf_ct_frag6_gather+0xf5a/0x44a0 net/ipv6/netfilter/nf_conntrack_reasm.c:577 ipv6_defrag+0x1d9/0x280 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn ./include/linux/netfilter.h:102 nf_hook_slow+0x13f/0x3c0 net/netfilter/core.c:310 nf_hook ./include/linux/netfilter.h:212 NF_HOOK ./include/linux/netfilter.h:255 rawv6_send_hdrinc net/ipv6/raw.c:673 rawv6_sendmsg+0x2fcb/0x41a0 net/ipv6/raw.c:919 inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 sock_sendmsg net/socket.c:643 SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696 SyS_sendto+0xbc/0xe0 net/socket.c:1664 do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285 entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246 RIP: 0033:0x436e03 RSP: 002b:00007ffce48baf38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000436e03 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007ffce48baf90 R08: 00007ffce48baf50 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000401790 R14: 0000000000401820 R15: 0000000000000000 origin: 00000000d9400053 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:362 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:257 kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:270 slab_alloc_node mm/slub.c:2735 __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4341 __kmalloc_reserve net/core/skbuff.c:138 __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231 alloc_skb ./include/linux/skbuff.h:933 alloc_skb_with_frags+0x209/0xbc0 net/core/skbuff.c:4678 sock_alloc_send_pskb+0x9ff/0xe00 net/core/sock.c:1903 sock_alloc_send_skb+0xe4/0x100 net/core/sock.c:1920 rawv6_send_hdrinc net/ipv6/raw.c:638 rawv6_sendmsg+0x2918/0x41a0 net/ipv6/raw.c:919 inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 sock_sendmsg net/socket.c:643 SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696 SyS_sendto+0xbc/0xe0 net/socket.c:1664 do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285 return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246 ================================================================== , triggered by the following syscalls: socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = 3 sendto(3, NULL, 0, 0, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "ff00::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EPERM A similar report is triggered in net/ipv4/raw.c if we use a PF_INET socket instead of a PF_INET6 one. Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: remove redundant null check on headColin Ian King2017-05-041-2/+1
| | | | | | | | | | | head is previously null checked and so the 2nd null check on head is redundant and therefore can be removed. Detected by CoverityScan, CID#1399505 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: do not inherit fastopen_req from parentEric Dumazet2017-05-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under fuzzer stress, it is possible that a child gets a non NULL fastopen_req pointer from its parent at accept() time, when/if parent morphs from listener to active session. We need to make sure this can not happen, by clearing the field after socket cloning. BUG: Double free or freeing an invalid pointer Unexpected shadow byte: 0xFB CPU: 3 PID: 20933 Comm: syz-executor3 Not tainted 4.11.0+ #306 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x292/0x395 lib/dump_stack.c:52 kasan_object_err+0x1c/0x70 mm/kasan/report.c:164 kasan_report_double_free+0x5c/0x70 mm/kasan/report.c:185 kasan_slab_free+0x9d/0xc0 mm/kasan/kasan.c:580 slab_free_hook mm/slub.c:1357 [inline] slab_free_freelist_hook mm/slub.c:1379 [inline] slab_free mm/slub.c:2961 [inline] kfree+0xe8/0x2b0 mm/slub.c:3882 tcp_free_fastopen_req net/ipv4/tcp.c:1077 [inline] tcp_disconnect+0xc15/0x13e0 net/ipv4/tcp.c:2328 inet_child_forget+0xb8/0x600 net/ipv4/inet_connection_sock.c:898 inet_csk_reqsk_queue_add+0x1e7/0x250 net/ipv4/inet_connection_sock.c:928 tcp_get_cookie_sock+0x21a/0x510 net/ipv4/syncookies.c:217 cookie_v4_check+0x1a19/0x28b0 net/ipv4/syncookies.c:384 tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1384 [inline] tcp_v4_do_rcv+0x731/0x940 net/ipv4/tcp_ipv4.c:1421 tcp_v4_rcv+0x2dc0/0x31c0 net/ipv4/tcp_ipv4.c:1715 ip_local_deliver_finish+0x4cc/0xc20 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:257 [inline] ip_local_deliver+0x1ce/0x700 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:492 [inline] ip_rcv_finish+0xb1d/0x20b0 net/ipv4/ip_input.c:396 NF_HOOK include/linux/netfilter.h:257 [inline] ip_rcv+0xd8c/0x19c0 net/ipv4/ip_input.c:487 __netif_receive_skb_core+0x1ad1/0x3400 net/core/dev.c:4210 __netif_receive_skb+0x2a/0x1a0 net/core/dev.c:4248 process_backlog+0xe5/0x6c0 net/core/dev.c:4868 napi_poll net/core/dev.c:5270 [inline] net_rx_action+0xe70/0x18e0 net/core/dev.c:5335 __do_softirq+0x2fb/0xb99 kernel/softirq.c:284 do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:899 </IRQ> do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328 do_softirq kernel/softirq.c:176 [inline] __local_bh_enable_ip+0x1cf/0x1e0 kernel/softirq.c:181 local_bh_enable include/linux/bottom_half.h:31 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:931 [inline] ip_finish_output2+0x9ab/0x15e0 net/ipv4/ip_output.c:230 ip_finish_output+0xa35/0xdf0 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f6/0x7b0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124 ip_queue_xmit+0x9a8/0x1a10 net/ipv4/ip_output.c:503 tcp_transmit_skb+0x1ade/0x3470 net/ipv4/tcp_output.c:1057 tcp_write_xmit+0x79e/0x55b0 net/ipv4/tcp_output.c:2265 __tcp_push_pending_frames+0xfa/0x3a0 net/ipv4/tcp_output.c:2450 tcp_push+0x4ee/0x780 net/ipv4/tcp.c:683 tcp_sendmsg+0x128d/0x39b0 net/ipv4/tcp.c:1342 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x660/0x810 net/socket.c:1696 SyS_sendto+0x40/0x50 net/socket.c:1664 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x446059 RSP: 002b:00007faa6761fb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 0000000000446059 RDX: 0000000000000001 RSI: 0000000020ba3fcd RDI: 0000000000000017 RBP: 00000000006e40a0 R08: 0000000020ba4ff0 R09: 0000000000000010 R10: 0000000020000000 R11: 0000000000000282 R12: 0000000000708150 R13: 0000000000000000 R14: 00007faa676209c0 R15: 00007faa67620700 Object at ffff88003b5bbcb8, in cache kmalloc-64 size: 64 Allocated: PID = 20909 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:513 set_track mm/kasan/kasan.c:525 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616 kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745 kmalloc include/linux/slab.h:490 [inline] kzalloc include/linux/slab.h:663 [inline] tcp_sendmsg_fastopen net/ipv4/tcp.c:1094 [inline] tcp_sendmsg+0x221a/0x39b0 net/ipv4/tcp.c:1139 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x660/0x810 net/socket.c:1696 SyS_sendto+0x40/0x50 net/socket.c:1664 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed: PID = 20909 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:513 set_track mm/kasan/kasan.c:525 [inline] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589 slab_free_hook mm/slub.c:1357 [inline] slab_free_freelist_hook mm/slub.c:1379 [inline] slab_free mm/slub.c:2961 [inline] kfree+0xe8/0x2b0 mm/slub.c:3882 tcp_free_fastopen_req net/ipv4/tcp.c:1077 [inline] tcp_disconnect+0xc15/0x13e0 net/ipv4/tcp.c:2328 __inet_stream_connect+0x20c/0xf90 net/ipv4/af_inet.c:593 tcp_sendmsg_fastopen net/ipv4/tcp.c:1111 [inline] tcp_sendmsg+0x23a8/0x39b0 net/ipv4/tcp.c:1139 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x660/0x810 net/socket.c:1696 SyS_sendto+0x40/0x50 net/socket.c:1664 entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 7db92362d2fe ("tcp: fix potential double free issue for fastopen_req") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* forcedeth: remove unnecessary carrier status checkZhu Yanjun2017-05-041-4/+2
| | | | | | | | | | | Since netif_carrier_on() will do nothing if device's carrier is already on, so it's unnecessary to do carrier status check. It's the same for netif_carrier_off(). Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'ibmvnic-Updated-reset-handler-andcode-fixes'David S. Miller2017-05-032-204/+388
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nathan Fontenot says: ==================== ibmvnic: Updated reset handler and code fixes This set of patches multiple code fixes and a new rest handler for the ibmvnic driver. In order to implement the new reset handler for the ibmvnic driver resource initialization needed to be moved to its own routine, a state variable is introduced to replace the various is_* flags in the driver, and a new routine to handle the assorted reasons the driver can be reset. v4 updates: Patch 3/11: Corrected trailing whitespace Patch 7/11: Corrected trailing whitespace v3 updates: Patch 10/11: Correct patch subject line to be a description of the patch. v2 updates: Patch 11/11: Use __netif_subqueue_stopped() instead of netif_subqueue_stopped() to avoid possible use of an un-initialized skb variable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Move queue restarting in ibmvnic_tx_completeNathan Fontenot2017-05-031-12/+10
| | | | | | | | | | | | | | | | Restart of the subqueue should occur outside of the loop processing any tx buffers instead of doing this in the middle of the loop. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Record SKB RX queue during pollThomas Falcon2017-05-031-0/+1
| | | | | | | | | | | | | | | | | | Map each RX SKB to the RX queue associated with the driver's RX SCRQ. This should improve the RX CPU load balancing issues seen by the performance team. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Continue skb processing after skb completion errorNathan Fontenot2017-05-031-1/+1
| | | | | | | | | | | | | | | | There is not a need to stop processing skbs if we encounter a skb that has a receive completion error. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Check for driver reset first in ibmvnic_xmitNathan Fontenot2017-05-031-6/+6
| | | | | | | | | | | | | | | | Move the check for the driver resetting to the first thing in ibmvnic_xmit(). Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Wait for any pending scrqs entries at driver closeNathan Fontenot2017-05-031-20/+27
| | | | | | | | | | | | | | | | When closing the ibmvnic driver we need to wait for any pending sub crq entries to ensure they are handled. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Clean up tx pools when closingNathan Fontenot2017-05-031-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When closing the ibmvnic driver, most notably during the reset path, the tx pools need to be cleaned to ensure there are no hanging skbs that need to be free'ed. The need for this was found during debugging a loss of network traffic after handling a driver reset. The underlying cause was some skbs in the tx pool that were never free'ed. As a result the upper network layers never tried a re-send since it believed the driver still had the skb. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Whitespace correction in release_rx_poolsNathan Fontenot2017-05-031-1/+1
| | | | | | | | | | Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Delete napi's when releasing driver resourcesNathan Fontenot2017-05-031-0/+9
| | | | | | | | | | | | | | | | The napi structs allocated at drivier initializatio need to be free'ed when releasing the drivers resources. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Updated reset handlingNathan Fontenot2017-05-032-157/+275
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ibmvnic driver has multiple handlers for resetting the driver depending on the reason the reset is needed (failover, lpm, fatal erors,...). All of the reset handlers do essentially the same thing, this patch moves this work to a common reset handler. By doing this we also allow the driver to better handle situations where we can get a reset while handling a reset. The updated reset handling works by adding a reset work item to the list of resets and then scheduling work to perform the reset. This step is necessary because we can receive a reset in interrupt context and we want to handle the reset out of interrupt context. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Replace is_closed with state fieldNathan Fontenot2017-05-032-9/+23
| | | | | | | | | | | | | | | | | | Replace the is_closed flag in the ibmvnic adapter strcut with a more comprehensive state field that tracks the current state of the driver. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ibmvnic: Move resource initialization to its own routineNathan Fontenot2017-05-031-32/+39
|/ | | | | | | | Move all of the calls to initialize resources for the driver to a separate routine. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2017-05-0314-64/+174
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter/IPVS/OVS fixes for net The following patchset contains a rather large batch of Netfilter, IPVS and OVS fixes for your net tree. This includes fixes for ctnetlink, the userspace conntrack helper infrastructure, conntrack OVS support, ebtables DNAT target, several leaks in error path among other. More specifically, they are: 1) Fix reference count leak in the CT target error path, from Gao Feng. 2) Remove conntrack entry clashing with a matching expectation, patch from Jarno Rajahalme. 3) Fix bogus EEXIST when registering two different userspace helpers, from Liping Zhang. 4) Don't leak dummy elements in the new bitmap set type in nf_tables, from Liping Zhang. 5) Get rid of module autoload from conntrack update path in ctnetlink, we don't need autoload at this late stage and it is happening with rcu read lock held which is not good. From Liping Zhang. 6) Fix deadlock due to double-acquire of the expect_lock from conntrack update path, this fixes a bug that was introduced when the central spinlock got removed. Again from Liping Zhang. 7) Safe ct->status update from ctnetlink path, from Liping. The expect_lock protection that was selected when the central spinlock was removed was not really protecting anything at all. 8) Protect sequence adjustment under ct->lock. 9) Missing socket match with IPv6, from Peter Tirsek. 10) Adjust skb->pkt_type of DNAT'ed frames from ebtables, from Linus Luessing. 11) Don't give up on evaluating the expression on new entries added via dynset expression in nf_tables, from Liping Zhang. 12) Use skb_checksum() when mangling icmpv6 in IPv6 NAT as this deals with non-linear skbuffs. 13) Don't allow IPv6 service in IPVS if no IPv6 support is available, from Paolo Abeni. 14) Missing mutex release in error path of xt_find_table_lock(), from Dan Carpenter. 15) Update maintainers files, Netfilter section. Add Florian to the file, refer to nftables.org and change project status from Supported to Maintained. 16) Bail out on mismatching extensions in element updates in nf_tables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_tables: check if same extensions are set when adding elementsPablo Neira Ayuso2017-05-031-0/+5
| | | | | | | | | | | | | | If no NLM_F_EXCL is set and the element already exists in the set, make sure that both elements have the same extensions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: update MAINTAINERS filePablo Neira Ayuso2017-04-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several updates on the MAINTAINERS section for Netfilter: 1) Add Florian Westphal, he's been part of the coreteam since October 2012. He's been dedicating tireless efforts to improve the Netfilter codebase, fix bugs and push ongoing new developments ever since. 2) Add http://www.nftables.org/ URL, currently pointing to http://www.netfilter.org. 3) Update project status from Supported to Maintained. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
| * Merge tag 'ipvs-fixes-for-v4.11' of ↵Pablo Neira Ayuso2017-04-281-5/+17
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs Simon Horman says: ==================== IPVS Fixes for v4.11 I would also like it considered for stable. * Explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled to avoid oops caused by IPVS accesing IPv6 routing code in such circumstances. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabledPaolo Abeni2017-04-281-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating a new ipvs service, ipv6 addresses are always accepted if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family is not explicitly checked. This allows the user-space to configure ipvs services even if the system is booted with ipv6.disable=1. On specific configuration, ipvs can try to call ipv6 routing code at setup time, causing the kernel to oops due to fib6_rules_ops being NULL. This change addresses the issue adding a check for the ipv6 module being enabled while validating ipv6 service operations and adding the same validation for dest operations. According to git history, this issue is apparently present since the introduction of ipv6 support, and the oops can be triggered since commit 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local") Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| * | netfilter: x_tables: unlock on error in xt_find_table_lock()Dan Carpenter2017-04-281-1/+3
| |/ | | | | | | | | | | | | | | | | | | According to my static checker we should unlock here before the return. That seems reasonable to me as well. Fixes" b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 pathDave Johnson2017-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | When recalculating the outer ICMPv6 checksum for a reverse path NATv6 such as ICMPV6_TIME_EXCEED nf_nat_icmpv6_reply_translation() was accessing data beyond the headlen of the skb for non-linear skb. This resulted in incorrect ICMPv6 checksum as garbage data was used. Patch replaces csum_partial() with skb_checksum() which supports non-linear skbs similar to nf_nat_icmp_reply_translation() from ipv4. Signed-off-by: Dave Johnson <dave-kernel@centerclick.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nft_dynset: continue to next expr if _OP_ADD succeededLiping Zhang2017-04-251-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, after adding the following nft rules: # nft add set x target1 { type ipv4_addr \; flags timeout \;} # nft add rule x y set add ip daddr timeout 1d @target1 counter the counters will always be zero despite of the elements are added to the dynamic set "target1" or not, as we will break the nft expr traversal unconditionally: # nft list ruleset ... set target1 { ... elements = { 8.8.8.8 expires 23h59m53s} } chain output { ... set add ip daddr timeout 1d @target1 counter packets 0 bytes 0 ^ ^ ... } Since we add the elements to the set successfully, we should continue to the next expression. Additionally, if elements are added to "flow table" successfully, we will _always_ continue to the next expr, even if the operation is _OP_ADD. So it's better to keep them to be consistent. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Reported-by: Robert White <rwhite@pobox.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * bridge: ebtables: fix reception of frames DNAT-ed to bridge device/portLinus Lüssing2017-04-251-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When trying to redirect bridged frames to the bridge device itself or a bridge port (brouting) via the dnat target then this currently fails: The ethernet destination of the frame is dnat'ed to the MAC address of the bridge device or port just fine. However, the IP code drops it in the beginning of ip_input.c/ip_rcv() as the dnat target left the skb->pkt_type as PACKET_OTHERHOST. Fixing this by resetting skb->pkt_type to an appropriate type after dnat'ing. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: xt_socket: Fix broken IPv6 handlingPeter Tirsek2017-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 834184b1f3a4 ("netfilter: defrag: only register defrag functionality if needed") used the outdated XT_SOCKET_HAVE_IPV6 macro which was removed earlier in commit 8db4c5be88f6 ("netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c"). With that macro never being defined, the xt_socket match emits an "Unknown family 10" warning when used with IPv6: WARNING: CPU: 0 PID: 1377 at net/netfilter/xt_socket.c:160 socket_mt_enable_defrag+0x47/0x50 [xt_socket] Unknown family 10 Modules linked in: xt_socket nf_socket_ipv4 nf_socket_ipv6 nf_defrag_ipv4 [...] CPU: 0 PID: 1377 Comm: ip6tables-resto Not tainted 4.10.10 #1 Hardware name: [...] Call Trace: ? __warn+0xe7/0x100 ? socket_mt_enable_defrag+0x47/0x50 [xt_socket] ? socket_mt_enable_defrag+0x47/0x50 [xt_socket] ? warn_slowpath_fmt+0x39/0x40 ? socket_mt_enable_defrag+0x47/0x50 [xt_socket] ? socket_mt_v2_check+0x12/0x40 [xt_socket] ? xt_check_match+0x6b/0x1a0 [x_tables] ? xt_find_match+0x93/0xd0 [x_tables] ? xt_request_find_match+0x20/0x80 [x_tables] ? translate_table+0x48e/0x870 [ip6_tables] ? translate_table+0x577/0x870 [ip6_tables] ? walk_component+0x3a/0x200 ? kmalloc_order+0x1d/0x50 ? do_ip6t_set_ctl+0x181/0x490 [ip6_tables] ? filename_lookup+0xa5/0x120 ? nf_setsockopt+0x3a/0x60 ? ipv6_setsockopt+0xb0/0xc0 ? sock_common_setsockopt+0x23/0x30 ? SyS_socketcall+0x41d/0x630 ? vfs_read+0xfa/0x120 ? do_fast_syscall_32+0x7a/0x110 ? entry_SYSENTER_32+0x47/0x71 This patch brings the conditional back in line with how the rest of the file handles IPv6. Fixes: 834184b1f3a4 ("netfilter: defrag: only register defrag functionality if needed") Signed-off-by: Peter Tirsek <peter@tirsek.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadjLiping Zhang2017-04-241-6/+15
| | | | | | | | | | | | | | | | | | We should acquire the ct->lock before accessing or modifying the nf_ct_seqadj, as another CPU may modify the nf_ct_seqadj at the same time during its packet proccessing. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: make it safer when updating ct->statusLiping Zhang2017-04-242-13/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After converting to use rcu for conntrack hash, one CPU may update the ct->status via ctnetlink, while another CPU may process the packets and update the ct->status. So the non-atomic operation "ct->status |= status;" via ctnetlink becomes unsafe, and this may clear the IPS_DYING_BIT bit set by another CPU unexpectedly. For example: CPU0 CPU1 ctnetlink_change_status __nf_conntrack_find_get old = ct->status nf_ct_gc_expired - nf_ct_kill - test_and_set_bit(IPS_DYING_BIT new = old | status; - ct->status = new; <-- oops, _DYING_ is cleared! Now using a series of atomic bit operation to solve the above issue. Also note, user shouldn't set IPS_TEMPLATE, IPS_SEQ_ADJUST directly, so make these two bits be unchangable too. If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free, but actually it is alloced by nf_conntrack_alloc. If we set the IPS_SEQ_ADJUST_BIT, this may cause the NULL pointer deference, as the nfct_seqadj(ct) maybe NULL. Last, add some comments to describe the logic change due to the commit a963d710f367 ("netfilter: ctnetlink: Fix regression in CTA_STATUS processing"), which makes me feel a little confusing. Fixes: 76507f69c44e ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twiceLiping Zhang2017-04-241-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, ctnetlink_change_conntrack is always protected by _expect_lock, but this will cause a deadlock when deleting the helper from a conntrack, as the _expect_lock will be acquired again by nf_ct_remove_expectations: CPU0 ---- lock(nf_conntrack_expect_lock); lock(nf_conntrack_expect_lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by lt-conntrack_gr/12853: #0: (&table[i].mutex){+.+.+.}, at: [<ffffffffa05e2009>] nfnetlink_rcv_msg+0x399/0x6a9 [nfnetlink] #1: (nf_conntrack_expect_lock){+.....}, at: [<ffffffffa05f2c1f>] ctnetlink_new_conntrack+0x17f/0x408 [nf_conntrack_netlink] Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x1608/0x1680 ? ctnetlink_parse_tuple_proto+0x10f/0x1c0 [nf_conntrack_netlink] lock_acquire+0x100/0x1f0 ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack] _raw_spin_lock_bh+0x3f/0x50 ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack] nf_ct_remove_expectations+0x32/0x90 [nf_conntrack] ctnetlink_change_helper+0xc6/0x190 [nf_conntrack_netlink] ctnetlink_new_conntrack+0x1b2/0x408 [nf_conntrack_netlink] nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink] ? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink] ? nfnetlink_bind+0x1a0/0x1a0 [nfnetlink] netlink_rcv_skb+0xa4/0xc0 nfnetlink_rcv+0x87/0x770 [nfnetlink] Since the operations are unrelated to nf_ct_expect, so we can drop the _expect_lock. Also note, after removing the _expect_lock protection, another CPU may invoke nf_conntrack_helper_unregister, so we should use rcu_read_lock to protect __nf_conntrack_helper_find invoked by ctnetlink_change_helper. Fixes: ca7433df3a67 ("netfilter: conntrack: seperate expect locking from nf_conntrack_lock") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: drop the incorrect cthelper module requestLiping Zhang2017-04-241-16/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First, when creating a new ct, we will invoke request_module to try to load the related inkernel cthelper. So there's no need to call the request_module again when updating the ct helpinfo. Second, ctnetlink_change_helper may be called with rcu_read_lock held, i.e. rcu_read_lock -> nfqnl_recv_verdict -> nfqnl_ct_parse -> ctnetlink_glue_parse -> ctnetlink_glue_parse_ct -> ctnetlink_change_helper. But the request_module invocation may sleep, so we can't call it with the rcu_read_lock held. Remove it now. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nft_set_bitmap: free dummy elements when destroy the setLiping Zhang2017-04-241-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We forget to free dummy elements when deleting the set. So when I was running nft-test.py, I saw many kmemleak warnings: kmemleak: 1344 new suspected memory leaks ... # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8800631345c8 (size 32): comm "nft", pid 9075, jiffies 4295743309 (age 1354.815s) hex dump (first 32 bytes): f8 63 13 63 00 88 ff ff 88 79 13 63 00 88 ff ff .c.c.....y.c.... 04 0c 00 00 00 00 00 00 00 00 00 00 08 03 00 00 ................ backtrace: [<ffffffff819059da>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81288174>] __kmalloc+0x164/0x310 [<ffffffffa061269d>] nft_set_elem_init+0x3d/0x1b0 [nf_tables] [<ffffffffa06130da>] nft_add_set_elem+0x45a/0x8c0 [nf_tables] [<ffffffffa0613645>] nf_tables_newsetelem+0x105/0x1d0 [nf_tables] [<ffffffffa05fe6d4>] nfnetlink_rcv+0x414/0x770 [nfnetlink] [<ffffffff817f0ca6>] netlink_unicast+0x1f6/0x310 [<ffffffff817f10c6>] netlink_sendmsg+0x306/0x3b0 ... Fixes: e920dde516088 ("netfilter: nft_set_bitmap: keep a list of dummy elements") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlinkLiping Zhang2017-04-241-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cthelpers added via nfnetlink may have the same tuple, i.e. except for the l3proto and l4proto, other fields are all zero. So even with the different names, we will also fail to add them: # nfct helper add ssdp inet udp # nfct helper add tftp inet udp nfct v1.4.3: netlink error: File exists So in order to avoid unpredictable behaviour, we should: 1. cthelpers can be selected by nft ct helper obj or xt_CT target, so report error if duplicated { name, l3proto, l4proto } tuple exist. 2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when nf_ct_auto_assign_helper is enabled, so also report error if duplicated { l3proto, l4proto, src-port } tuple exist. Also note, if the cthelper is added from userspace, then the src-port will always be zero, it's invalid for nf_ct_auto_assign_helper, so there's no need to check the second point listed above. Fixes: 893e093c786c ("netfilter: nf_ct_helper: bail out on duplicated helpers") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * openvswitch: Delete conntrack entry clashing with an expectation.Jarno Rajahalme2017-04-241-1/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conntrack helpers do not check for a potentially clashing conntrack entry when creating a new expectation. Also, nf_conntrack_in() will check expectations (via init_conntrack()) only if a conntrack entry can not be found. The expectation for a packet which also matches an existing conntrack entry will not be removed by conntrack, and is currently handled inconsistently by OVS, as OVS expects the expectation to be removed when the connection tracking entry matching that expectation is confirmed. It should be noted that normally an IP stack would not allow reuse of a 5-tuple of an old (possibly lingering) connection for a new data connection, so this is somewhat unlikely corner case. However, it is possible that a misbehaving source could cause conntrack entries be created that could then interfere with new related connections. Fix this in the OVS module by deleting the clashing conntrack entry after an expectation has been matched. This causes the following nf_conntrack_in() call also find the expectation and remove it when creating the new conntrack entry, as well as the forthcoming reply direction packets to match the new related connection instead of the old clashing conntrack entry. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Yang Song <yangsong@vmware.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: xt_CT: fix refcnt leak on error pathGao Feng2017-04-241-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two cases which causes refcnt leak. 1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should free the timeout refcnt. Now goto the err_put_timeout error handler instead of going ahead. 2. When the time policy is not found, we should call module_put. Otherwise, the related cthelper module cannot be removed anymore. It is easy to reproduce by typing the following command: # iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: conntrack: Force inlining of build check to prevent build failureGeert Uytterhoeven2017-05-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the build will fail with: net/built-in.o: In function `nf_conntrack_init_start': (.text+0x9baf6): undefined reference to `__compiletime_assert_1893' or ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] undefined! Fix this by forcing inlining of total_extension_size(). Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes again") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | test_bpf: Use ULL suffix for 64-bit constantsGeert Uytterhoeven2017-05-031-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On 32-bit: lib/test_bpf.c:4772: warning: integer constant is too large for ‘unsigned long’ type lib/test_bpf.c:4772: warning: integer constant is too large for ‘unsigned long’ type lib/test_bpf.c:4773: warning: integer constant is too large for ‘unsigned long’ type lib/test_bpf.c:4773: warning: integer constant is too large for ‘unsigned long’ type lib/test_bpf.c:4787: warning: integer constant is too large for ‘unsigned long’ type lib/test_bpf.c:4787: warning: integer constant is too large for ‘unsigned long’ type lib/test_bpf.c:4801: warning: integer constant is too large for ‘unsigned long’ type lib/test_bpf.c:4801: warning: integer constant is too large for ‘unsigned long’ type lib/test_bpf.c:4802: warning: integer constant is too large for ‘unsigned long’ type lib/test_bpf.c:4802: warning: integer constant is too large for ‘unsigned long’ type On 32-bit systems, "long" is only 32-bit. Replace the "UL" suffix by "ULL" to fix this. Fixes: 85f68fe898320575 ("bpf, arm64: implement jiting of BPF_XADD") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: usb: qmi_wwan: add Telit ME910 supportDaniele Palmas2017-05-031-0/+1
| | | | | | | | | | | | | | | | This patch adds support for Telit ME910 PID 0x1100. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tg3: don't clear stats while tg3_closeYueHaibing2017-05-031-4/+0
| | | | | | | | | | | | | | | | | | | | | | Now tg3 NIC's stats will be cleared after ifdown/ifup. bond_get_stats traverse its salves to get statistics,cumulative the increment.If a tg3 NIC is added to bonding as a slave,ifdown/ifup will cause bonding's stats become tremendous value (ex.1638.3 PiB) because of negative increment. Fixes: 92feeabf3f67 ("tg3: Save stats across chip resets") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | selftests/bpf: get rid of -D__x86_64__Alexei Starovoitov2017-05-032-2/+3
| | | | | | | | | | | | | | | | | | | | | | -D__x86_64__ workaround was used to make /usr/include/features.h to follow expected path through the system include headers. This is not portable. Instead define dummy stubs.h which is used by 'clang -target bpf' Fixes: 6882804c916b ("selftests/bpf: add a test for overlapping packet range checks") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | selftests/bpf: add a test case to check verifier pointer arithmeticYonghong Song2017-05-033-1/+275
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With clang/llvm 4.0+, the test case is able to generate the following pattern: .... 440: (b7) r1 = 15 441: (05) goto pc+73 515: (79) r6 = *(u64 *)(r10 -152) 516: (bf) r7 = r10 517: (07) r7 += -112 518: (bf) r2 = r7 519: (0f) r2 += r1 520: (71) r1 = *(u8 *)(r8 +0) 521: (73) *(u8 *)(r2 +45) = r1 .... commit 332270fdc8b6 ("bpf: enhance verifier to understand stack pointer arithmetic") improved verifier to handle such a pattern. This patch adds a C test case to actually generate such a pattern. A dummy tracepoint interface is used to load the program into the kernel. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | xdp: use common helper for netlink extended ack reportingDaniel Borkmann2017-05-033-17/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Small follow-up to d74a32acd59a ("xdp: use netlink extended ACK reporting") in order to let drivers all use the same NL_SET_ERR_MSG_MOD() helper macro for reporting. This also ensures that we consistently add the driver's prefix for dumping the report in user space to indicate that the error message is driver specific and not coming from core code. Furthermore, NL_SET_ERR_MSG_MOD() now reuses NL_SET_ERR_MSG() and thus makes all macros check the pointer as suggested. References: https://www.spinics.net/lists/netdev/msg433267.html Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: ipv6: Do not duplicate DAD on link upDavid Ahern2017-05-031-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrey reported a warning triggered by the rcu code: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 5911 at lib/debugobjects.c:289 debug_print_object+0x175/0x210 ODEBUG: activate active (active state 1) object type: rcu_head hint: (null) Modules linked in: CPU: 1 PID: 5911 Comm: a.out Not tainted 4.11.0-rc8+ #271 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x192/0x22d lib/dump_stack.c:52 __warn+0x19f/0x1e0 kernel/panic.c:549 warn_slowpath_fmt+0xe0/0x120 kernel/panic.c:564 debug_print_object+0x175/0x210 lib/debugobjects.c:286 debug_object_activate+0x574/0x7e0 lib/debugobjects.c:442 debug_rcu_head_queue kernel/rcu/rcu.h:75 __call_rcu.constprop.76+0xff/0x9c0 kernel/rcu/tree.c:3229 call_rcu_sched+0x12/0x20 kernel/rcu/tree.c:3288 rt6_rcu_free net/ipv6/ip6_fib.c:158 rt6_release+0x1ea/0x290 net/ipv6/ip6_fib.c:188 fib6_del_route net/ipv6/ip6_fib.c:1461 fib6_del+0xa42/0xdc0 net/ipv6/ip6_fib.c:1500 __ip6_del_rt+0x100/0x160 net/ipv6/route.c:2174 ip6_del_rt+0x140/0x1b0 net/ipv6/route.c:2187 __ipv6_ifa_notify+0x269/0x780 net/ipv6/addrconf.c:5520 addrconf_ifdown+0xe60/0x1a20 net/ipv6/addrconf.c:3672 ... Andrey's reproducer program runs in a very tight loop, calling 'unshare -n' and then spawning 2 sets of 14 threads running random ioctl calls. The relevant networking sequence: 1. New network namespace created via unshare -n - ip6tnl0 device is created in down state 2. address added to ip6tnl0 - equivalent to ip -6 addr add dev ip6tnl0 fd00::bb/1 - DAD is started on the address and when it completes the host route is inserted into the FIB 3. ip6tnl0 is brought up - the new fixup_permanent_addr function restarts DAD on the address 4. exit namespace - teardown / cleanup sequence starts - once in a blue moon, lo teardown appears to happen BEFORE teardown of ip6tunl0 + down on 'lo' removes the host route from the FIB since the dst->dev for the route is loobback + host route added to rcu callback list * rcu callback has not run yet, so rt is NOT on the gc list so it has NOT been marked obsolete 5. in parallel to 4. worker_thread runs addrconf_dad_completed - DAD on the address on ip6tnl0 completes - calls ipv6_ifa_notify which inserts the host route All of that happens very quickly. The result is that a host route that has been deleted from the IPv6 FIB and added to the RCU list is re-inserted into the FIB. The exit namespace eventually gets to cleaning up ip6tnl0 which removes the host route from the FIB again, calls the rcu function for cleanup -- and triggers the double rcu trace. The root cause is duplicate DAD on the address -- steps 2 and 3. Arguably, DAD should not be started in step 2. The interface is in the down state, so it can not really send out requests for the address which makes starting DAD pointless. Since the second DAD was introduced by a recent change, seems appropriate to use it for the Fixes tag and have the fixup function only start DAD for addresses in the PREDAD state which occurs in addrconf_ifdown if the address is retained. Big thanks to Andrey for isolating a reliable reproducer for this problem. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsahern@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | smsc911x: Adding support for Micochip LAN9250 Ethernet controllerDavid Cai2017-05-032-19/+49
| | | | | | | | | | | | | | | | Adding support for Microchip LAN9250 Ethernet controller. Signed-off-by: David Cai <david.cai@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'sample-bpf-loader-fixes'David S. Miller2017-05-036-82/+201
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jesper Dangaard Brouer says: ==================== Improve bpf ELF-loader under samples/bpf This series improves and fixes bpf ELF loader and programs under samples/bpf. The bpf_load.c created some hard to debug issues when the struct (bpf_map_def) used in the ELF maps section format changed in commit fb30d4b71214 ("bpf: Add tests for map-in-map"). This was hotfixed in commit 409526bea3c3 ("samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong") by detecting the issue and aborting the program. In most situations the bpf-loader should be able to handle these kind of changes to the struct size. This patch series aim to do proper backward and forward compabilility handling when loading ELF files. This series also adjust the callback that was introduced in commit 9fd63d05f3e8 ("bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def") to use the new bpf_map_data structure, before more users start to use this callback. Hoping these changes can make the merge window, as above mentioned commits have not been merged yet, and it would be good to avoid users hitting these issues. ==================== Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | samples/bpf: export map_data[] for more info on mapsJesper Dangaard Brouer2017-05-031-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Giving *_user.c side tools access to map_data[] provides easier access to information on the maps being loaded. Still provide the guarantee that the order maps are being defined in inside the _kern.c file corresponds with the order in the array. Now user tools are not blind, but can inspect and verify the maps that got loaded from the ELF binary. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | samples/bpf: load_bpf.c make callback fixup more flexibleJesper Dangaard Brouer2017-05-033-18/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do this change before others start to use this callback. Change map_perf_test_user.c which seems to be the only user. This patch extends capabilities of commit 9fd63d05f3e8 ("bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def"). Give fixup callback access to struct bpf_map_data, instead of only stuct bpf_map_def. This add flexibility to allow userspace to reassign the map file descriptor. This is very useful when wanting to share maps between several bpf programs. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | samples/bpf: make bpf_load.c code compatible with ELF maps section changesJesper Dangaard Brouer2017-05-031-69/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch does proper parsing of the ELF "maps" section, in-order to be both backwards and forwards compatible with changes to the map definition struct bpf_map_def, which gets compiled into the ELF file. The assumption is that new features with value zero, means that they are not in-use. For backward compatibility where loading an ELF file with a smaller struct bpf_map_def, only copy objects ELF size, leaving rest of loaders struct zero. For forward compatibility where ELF file have a larger struct bpf_map_def, only copy loaders own struct size and verify that rest of the larger struct is zero, assuming this means the newer feature was not activated, thus it should be safe for this older loader to load this newer ELF file. Fixes: fb30d4b71214 ("bpf: Add tests for map-in-map") Fixes: 409526bea3c3 ("samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | samples/bpf: adjust rlimit RLIMIT_MEMLOCK for traceex2, tracex3 and tracex4Jesper Dangaard Brouer2017-05-033-0/+22
|/ / | | | | | | | | | | | | | | Needed to adjust max locked memory RLIMIT_MEMLOCK for testing these bpf samples as these are using more and larger maps than can fit in distro default 64Kbytes limit. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-linus' of ↵Linus Torvalds2017-05-0331-41/+39
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: tty: fix comment for __tty_alloc_driver() init/main: properly align the multi-line comment init/main: Fix double "the" in comment Fix dead URLs to ftp.kernel.org drivers: Clean up duplicated email address treewide: Fix typo in xml/driver-api/basics.xml tools/testing/selftests/powerpc: remove redundant CFLAGS in Makefile: "-Wall -O2 -Wall" -> "-O2 -Wall" selftests/timers: Spelling s/privledges/privileges/ HID: picoLCD: Spelling s/REPORT_WRTIE_MEMORY/REPORT_WRITE_MEMORY/ net: phy: dp83848: Fix Typo UBI: Fix typos Documentation: ftrace.txt: Correct nice value of 120 priority net: fec: Fix typo in error msg and comment treewide: Fix typos in printk
| * | tty: fix comment for __tty_alloc_driver()Thadeu Lima de Souza Cascardo2017-04-241-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@cascardo.eti.br> Signed-off-by: Jiri Kosina <jkosina@suse.cz>