summaryrefslogtreecommitdiffstats
path: root/net (follow)
Commit message (Collapse)AuthorAgeFilesLines
* net: Fix warning on make htmldocs caused by skbuff.cMasanari Iida2014-01-291-1/+1
| | | | | | | | | | | | This patch fixed following Warning while executing "make htmldocs". Warning(/net/core/skbuff.c:2164): No description found for parameter 'from' Warning(/net/core/skbuff.c:2164): Excess function parameter 'source' description in 'skb_zerocopy' Replace "@source" with "@from" fixed the warning. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'rxrpc-20140126' of ↵David S. Miller2014-01-292-1/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== RxRPC fixes Here are some small AF_RXRPC fixes. (1) Fix a place where a spinlock is taken conditionally but is released unconditionally. (2) Fix a double-free that happens when cleaning up on a checksum error. (3) Fix handling of CHECKSUM_PARTIAL whilst delivering messages to userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * af_rxrpc: Handle frames delivered from another VMTim Smith2014-01-261-1/+2
| | | | | | | | | | | | | | | | On input, CHECKSUM_PARTIAL should be treated the same way as CHECKSUM_UNNECESSARY. See include/linux/skbuff.h Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
| * af_rxrpc: Avoid setting up double-free on checksum errorTim Smith2014-01-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_kill_datagram() does not dequeue the skb when MSG_PEEK is unset. This leaves a free'd skb on the queue, resulting a double-free later. Without this, the following oops can occur: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff8154fcf7>] skb_dequeue+0x47/0x70 PGD 0 Oops: 0002 [#1] SMP Modules linked in: af_rxrpc ... CPU: 0 PID: 1191 Comm: listen Not tainted 3.12.0+ #4 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8801183536b0 ti: ffff880035c92000 task.ti: ffff880035c92000 RIP: 0010:[<ffffffff8154fcf7>] skb_dequeue+0x47/0x70 RSP: 0018:ffff880035c93db8 EFLAGS: 00010097 RAX: 0000000000000246 RBX: ffff8800d2754b00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8800d254c084 RBP: ffff880035c93dd0 R08: ffff880035c93cf0 R09: ffff8800d968f270 R10: 0000000000000000 R11: 0000000000000293 R12: ffff8800d254c070 R13: ffff8800d254c084 R14: ffff8800cd861240 R15: ffff880119b39720 FS: 00007f37a969d740(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000008 CR3: 00000000d4413000 CR4: 00000000000006f0 Stack: ffff8800d254c000 ffff8800d254c070 ffff8800d254c2c0 ffff880035c93df8 ffffffffa041a5b8 ffff8800cd844c80 ffffffffa04385a0 ffff8800cd844cb0 ffff880035c93e18 ffffffff81546cef ffff8800d45fea00 0000000000000008 Call Trace: [<ffffffffa041a5b8>] rxrpc_release+0x128/0x2e0 [af_rxrpc] [<ffffffff81546cef>] sock_release+0x1f/0x80 [<ffffffff81546d62>] sock_close+0x12/0x20 [<ffffffff811aaba1>] __fput+0xe1/0x230 [<ffffffff811aad3e>] ____fput+0xe/0x10 [<ffffffff810862cc>] task_work_run+0xbc/0xe0 [<ffffffff8106a3be>] do_exit+0x2be/0xa10 [<ffffffff8116dc47>] ? do_munmap+0x297/0x3b0 [<ffffffff8106ab8f>] do_group_exit+0x3f/0xa0 [<ffffffff8106ac04>] SyS_exit_group+0x14/0x20 [<ffffffff8166b069>] system_call_fastpath+0x16/0x1b Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
| * RxRPC: do not unlock unheld spinlock in rxrpc_connect_exclusive()Alexey Khoroshilov2014-01-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | If rx->conn is not NULL, rxrpc_connect_exclusive() does not acquire the transport's client lock, but it still releases it. The patch adds locking of the spinlock to this path. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David Howells <dhowells@redhat.com>
* | llc: remove noisy WARN from llc_mac_hdr_initDave Jones2014-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sending malformed llc packets triggers this spew, which seems excessive. WARNING: CPU: 1 PID: 6917 at net/llc/llc_output.c:46 llc_mac_hdr_init+0x85/0x90 [llc]() device type not supported: 0 CPU: 1 PID: 6917 Comm: trinity-c1 Not tainted 3.13.0+ #95 0000000000000009 00000000007e257d ffff88009232fbe8 ffffffffac737325 ffff88009232fc30 ffff88009232fc20 ffffffffac06d28d ffff88020e07f180 ffff88009232fec0 00000000000000c8 0000000000000000 ffff88009232fe70 Call Trace: [<ffffffffac737325>] dump_stack+0x4e/0x7a [<ffffffffac06d28d>] warn_slowpath_common+0x7d/0xa0 [<ffffffffac06d30c>] warn_slowpath_fmt+0x5c/0x80 [<ffffffffc01736d5>] llc_mac_hdr_init+0x85/0x90 [llc] [<ffffffffc0173759>] llc_build_and_send_ui_pkt+0x79/0x90 [llc] [<ffffffffc057cdba>] llc_ui_sendmsg+0x23a/0x400 [llc2] [<ffffffffac605d8c>] sock_sendmsg+0x9c/0xe0 [<ffffffffac185a37>] ? might_fault+0x47/0x50 [<ffffffffac606321>] SYSC_sendto+0x121/0x1c0 [<ffffffffac011847>] ? syscall_trace_enter+0x207/0x270 [<ffffffffac6071ce>] SyS_sendto+0xe/0x10 [<ffffffffac74aaa4>] tracesys+0xdd/0xe2 Until 2009, this was a printk, when it was changed in bf9ae5386bc: "llc: use dev_hard_header". Let userland figure out what -EINVAL means by itself. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: gre: use icmp_hdr() to get inner ip headerDuan Jiong2014-01-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When dealing with icmp messages, the skb->data points the ip header that triggered the sending of the icmp message. In gre_cisco_err(), the parse_gre_header() is called, and the iptunnel_pull_header() is called to pull the skb at the end of the parse_gre_header(), so the skb->data doesn't point the inner ip header. Unfortunately, the ipgre_err still needs those ip addresses in inner ip header to look up tunnel by ip_tunnel_lookup(). So just use icmp_hdr() to get inner ip header instead of skb->data. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: 6lowpan: fixup for code movementStephen Rothwell2014-01-281-1/+1
| | | | | | | | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Fix memory leak if TPROXY used with TCP early demuxHolger Eitzenberger2014-01-282-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I see a memory leak when using a transparent HTTP proxy using TPROXY together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable): unreferenced object 0xffff88008cba4a40 (size 1696): comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s) hex dump (first 32 bytes): 0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00 .. j@..7..2..... 02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9 [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5 [<ffffffff812702cf>] sk_clone_lock+0x14/0x283 [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b [<ffffffff8129a893>] netlink_broadcast+0x14/0x16 [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3 [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0 [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55 [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725 [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154 [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514 [<ffffffff8127aa77>] process_backlog+0xee/0x1c5 [<ffffffff8127c949>] net_rx_action+0xa7/0x200 [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157 But there are many more, resulting in the machine going OOM after some days. From looking at the TPROXY code, and with help from Florian, I see that the memory leak is introduced in tcp_v4_early_demux(): void tcp_v4_early_demux(struct sk_buff *skb) { /* ... */ iph = ip_hdr(skb); th = tcp_hdr(skb); if (th->doff < sizeof(struct tcphdr) / 4) return; sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo, iph->saddr, th->source, iph->daddr, ntohs(th->dest), skb->skb_iif); if (sk) { skb->sk = sk; where the socket is assigned unconditionally to skb->sk, also bumping the refcnt on it. This is problematic, because in our case the skb has already a socket assigned in the TPROXY target. This then results in the leak I see. The very same issue seems to be with IPv6, but haven't tested. Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: ipv4: Use PTR_ERR_OR_ZEROSachin Kamat2014-01-271-1/+2
| | | | | | | | | | | | | | | | PTR_RET is deprecated. Use PTR_ERR_OR_ZERO instead. While at it also include missing err.h header. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add and use skb_gso_transport_seglen()Florian Westphal2014-01-272-10/+28
| | | | | | | | | | | | | | | | | | This moves part of Eric Dumazets skb_gso_seglen helper from tbf sched to skbuff core so it may be reused by upcoming ip forwarding path patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'for-3.14-merge-window' of ↵Linus Torvalds2014-01-263-5/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p changes from Eric Van Hensbergen: "Included are a new cache model for support of mmap, and several cleanups across the filesystem and networking portions of the code" * tag 'for-3.14-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: update documentation 9P: introduction of a new cache=mmap model. net/9p: remove virtio default hack and set appropriate bits instead 9p: remove useless 'name' variable and assignment 9p: fix return value in case in v9fs_fid_xattr_set() 9p: remove useless variable and assignment 9p: remove useless assignment 9p: remove unused 'super_block' struct pointer 9p: remove never used return variable 9p: remove unused 'p9_fid' struct pointer 9p: remove unused 'p9_client' struct pointer
| * | net/9p: remove virtio default hack and set appropriate bits insteadEric Van Hensbergen2013-11-233-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | A few releases back a patch made virtio the default transport, however it was done in a way which side-stepped the mechanism put in place to allow for this selection. This patch cleans that up while maintaining virtio as the default transport. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2014-01-25511-10268/+18338
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) BPF debugger and asm tool by Daniel Borkmann. 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann. 3) Correct reciprocal_divide and update users, from Hannes Frederic Sowa and Daniel Borkmann. 4) Currently we only have a "set" operation for the hw timestamp socket ioctl, add a "get" operation to match. From Ben Hutchings. 5) Add better trace events for debugging driver datapath problems, also from Ben Hutchings. 6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we have a small send and a previous packet is already in the qdisc or device queue, defer until TX completion or we get more data. 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko. 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel Borkmann. 9) Share IP header compression code between Bluetooth and IEEE802154 layers, from Jukka Rissanen. 10) Fix ipv6 router reachability probing, from Jiri Benc. 11) Allow packets to be captured on macvtap devices, from Vlad Yasevich. 12) Support tunneling in GRO layer, from Jerry Chu. 13) Allow bonding to be configured fully using netlink, from Scott Feldman. 14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can already get the TCI. From Atzm Watanabe. 15) New "Heavy Hitter" qdisc, from Terry Lam. 16) Significantly improve the IPSEC support in pktgen, from Fan Du. 17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom Herbert. 18) Add Proportional Integral Enhanced packet scheduler, from Vijay Subramanian. 19) Allow openvswitch to mmap'd netlink, from Thomas Graf. 20) Key TCP metrics blobs also by source address, not just destination address. From Christoph Paasch. 21) Support 10G in generic phylib. From Andy Fleming. 22) Try to short-circuit GRO flow compares using device provided RX hash, if provided. From Tom Herbert. The wireless and netfilter folks have been busy little bees too. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits) net/cxgb4: Fix referencing freed adapter ipv6: reallocate addrconf router for ipv6 address when lo device up fib_frontend: fix possible NULL pointer dereference rtnetlink: remove IFLA_BOND_SLAVE definition rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info qlcnic: update version to 5.3.55 qlcnic: Enhance logic to calculate msix vectors. qlcnic: Refactor interrupt coalescing code for all adapters. qlcnic: Update poll controller code path qlcnic: Interrupt code cleanup qlcnic: Enhance Tx timeout debugging. qlcnic: Use bool for rx_mac_learn. bonding: fix u64 division rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC sfc: Use the correct maximum TX DMA ring size for SFC9100 Add Shradha Shah as the sfc driver maintainer. net/vxlan: Share RX skb de-marking and checksum checks with ovs tulip: cleanup by using ARRAY_SIZE() ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called net/cxgb4: Don't retrieve stats during recovery ...
| * | | ipv6: reallocate addrconf router for ipv6 address when lo device upGao feng2014-01-251-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f "net IPv6 : Fix broken IPv6 routing table after loopback down-up" allocates addrconf router for ipv6 address when lo device up. but commit a881ae1f625c599b460cc8f8a7fcb1c438f699ad "ipv6:don't call addrconf_dst_alloc again when enable lo" breaks this behavior. Since the addrconf router is moved to the garbage list when lo device down, we should release this router and rellocate a new one for ipv6 address when lo device up. This patch solves bug 67951 on bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=67951 change from v1: use ip6_rt_put to repleace ip6_del_rt, thanks Hannes! change code style, suggested by Sergei. CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | fib_frontend: fix possible NULL pointer dereferenceOliver Hartkopp2014-01-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The two commits 0115e8e30d (net: remove delay at device dismantle) and 748e2d9396a (net: reinstate rtnl in call_netdevice_notifiers()) silently removed a NULL pointer check for in_dev since Linux 3.7. This patch re-introduces this check as it causes crashing the kernel when setting small mtu values on non-ip capable netdevices. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_infoJiri Pirko2014-01-241-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This check is not needed because the same check is done before fill_slave_info is used in rtnl_link_slave_info_fill. Also, by removing this check, kernel will fillup IFLA_INFO_SLAVE_KIND even for slaves of masters which does not implement fill_slave_info. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is calledDuan Jiong2014-01-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach") clear IPCB in ip_tunnel_xmit() , or else skb->cb[] may contain garbage from GSO segmentation layer. But commit 0e6fbc5b6c621("ip_tunnels: extend iptunnel_xmit()") refactor codes, and it clear IPCB behind the dst_link_failure(). So clear IPCB in ip_tunnel_xmit() just like commti a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach"). Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Correctly sync addresses from multiple sources to single deviceVlad Yasevich2014-01-231-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we have multiple devices attempting to sync the same address to a single destination, each device should be permitted to sync it once. To accomplish this, pass the 'sync_cnt' of the source address when adding the addresss to the lower device. 'sync_cnt' tracks how many time a given address has been succefully synced. This way, we know that if the 'sync_cnt' passed in is 0, we should sync this address. Also, turn 'synced' member back into the counter as was originally done in commit 4543fbefe6e06a9e40d9f2b28d688393a299f079. net: count hw_addr syncs so that unsync works properly. It tracks how many time a given address has been added via a 'sync' operation. For every successfull 'sync' the counter is incremented, and for ever 'unsync', the counter is decremented. This makes sure that the address will be properly removed from the the lower device when all the upper devices have removed it. Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru> CC: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru> CC: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru> CC: Konstantin Ushakov <Konstantin.Ushakov@oktetlabs.ru> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net/udp_offload: Handle static checker complaintsShlomo Pongratz2014-01-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixed few issues around using __rcu prefix and rcu_assign_pointer, also fixed a warning print to use ntohs(port) and not htons(port). net/ipv4/udp_offload.c:112:9: error: incompatible types in comparison expression (different address spaces) net/ipv4/udp_offload.c:113:9: error: incompatible types in comparison expression (different address spaces) net/ipv4/udp_offload.c:176:19: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: metrics: Handle v6/v4-mapped sockets in tcp-metricsChristoph Paasch2014-01-231-24/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A socket may be v6/v4-mapped. In that case sk->sk_family is AF_INET6, but the IP being used is actually an IPv4-address. Current's tcp-metrics will thus represent it as an IPv6-address: root@server:~# ip tcp_metrics ::ffff:10.1.1.2 age 22.920sec rtt 18750us rttvar 15000us cwnd 10 10.1.1.2 age 47.970sec rtt 16250us rttvar 10000us cwnd 10 This patch modifies the tcp-metrics so that they are able to handle the v6/v4-mapped sockets correctly. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | 6lowpan: add a license to 6lowpan_iphc moduleYann Droneaud2014-01-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 8df8c56a5abc, 6lowpan_iphc is a module of its own. Unfortunately, it lacks some infrastructure to behave like a good kernel citizen: kernel: 6lowpan_iphc: module license 'unspecified' taints kernel. kernel: Disabling lock debugging due to kernel taint This patch adds the basic MODULE_LICENSE(); with GPL license: the code was copied from net/ieee802154/6lowpan.c which is GPL and the module exports symbol with EXPORT_SYMBOL_GPL();. Cc: Jukka Rissanen <jukka.rissanen@linux.intel.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bonding: convert netlink to use slave data info apiJiri Pirko2014-01-231-51/+0
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | rtnetlink: provide api for getting and setting slave infoJiri Pirko2014-01-231-20/+138
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent patch bonding: add netlink attributes to slave link dev (1d3ee88ae0d6) Introduced yet another device specific way to access slave information over rtnetlink. There is one already there for bridge. This patch introduces generic way to do this, for getting and setting info as well by extending link_ops. Later on, this new interface will be used for bridge ports as well. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | rtnetlink: put "BOND" into nl attribute names which are related to bondingJiri Pirko2014-01-231-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net/neighbour: queue work on power efficient wqviresh kumar2014-01-231-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Workqueue used in neighbour layer have no real dependency of scheduling these on the cpu which scheduled them. On a idle system, it is observed that an idle cpu wakes up many times just to service this work. It would be better if we can schedule it on a cpu which the scheduler believes to be the most appropriate one. This patch replaces normal workqueues with power efficient versions. This doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is enabled. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net/ipv4: queue work on power efficient wqviresh kumar2014-01-231-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Workqueue used in ipv4 layer have no real dependency of scheduling these on the cpu which scheduled them. On a idle system, it is observed that an idle cpu wakes up many times just to service this work. It would be better if we can schedule it on a cpu which the scheduler believes to be the most appropriate one. This patch replaces normal workqueues with power efficient versions. This doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is enabled. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipv6: enable anycast addresses as source addresses for datagramsFX Le Bail2014-01-232-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change allows to consider an anycast address valid as source address when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item. So, when sending a datagram with ancillary data, the unicast and anycast addresses are handled in the same way. - Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local on given interface or is global. - Uses it in ip6_datagram_send_ctl(). Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bridge: Remove unnecessary vlan_put_tag in br_handle_vlanToshiaki Makita2014-01-231-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | br_handle_vlan() pushes HW accelerated vlan tag into skbuff when outgoing port is the bridge device. This is unnecessary because __netif_receive_skb_core() can handle skbs with HW accelerated vlan tag. In current implementation, __netif_receive_skb_core() needs to extract the vlan tag embedded in skb data. This could cause low network performance especially when receiving frames at a high frame rate on the bridge device. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: metrics: Fix rcu-race when deleting multiple entriesChristoph Paasch2014-01-231-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In bbf852b96ebdc6d1 I introduced the tmlist, which allows to delete multiple entries from the cache that match a specified destination if no source-IP is specified. However, as the cache is an RCU-list, we should not create this tmlist, as it will change the tcpm_next pointer of the element that will be deleted and so a thread iterating over the cache's entries while holding the RCU-lock might get "redirected" to this tmlist. This patch fixes this, by reverting back to the old behavior prior to bbf852b96ebdc6d1, which means that we simply change the tcpm_next pointer of the previous element (pp) to jump over the one we are deleting. The difference is that we call kfree_rcu() directly on the cache entry, which allows us to delete multiple entries from the list. Fixes: bbf852b96ebdc6d1 (tcp: metrics: Delete all entries matching a certain destination) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sch_htb: let skb->priority refer to non-leaf classHarry Mason2014-01-231-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the class in skb->priority is not a leaf, apply filters from the selected class, not the qdisc. This lets netfilter or user space partially classify the packet. Signed-off-by: Harry Mason <harry.mason@smoothwall.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | af_packet: Add Queue mapping mode to af_packet fanout operationNeil Horman2014-01-231-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a queue mapping mode to the fanout operation of af_packet sockets. This allows user space af_packet users to better filter on flows ingressing and egressing via a specific hardware queue, and avoids the potential packet reordering that can occur when FANOUT_CPU is being used and irq affinity varies. Tested successfully by myself. applies to net-next Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | reciprocal_divide: update/correction of the algorithmHannes Frederic Sowa2014-01-221-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jakub Zawadzki noticed that some divisions by reciprocal_divide() were not correct [1][2], which he could also show with BPF code after divisions are transformed into reciprocal_value() for runtime invariance which can be passed to reciprocal_divide() later on; reverse in BPF dump ended up with a different, off-by-one K in some situations. This has been fixed by Eric Dumazet in commit aee636c4809fa5 ("bpf: do not use reciprocal divide"). This follow-up patch improves reciprocal_value() and reciprocal_divide() to work in all cases by using Granlund and Montgomery method, so that also future use is safe and without any non-obvious side-effects. Known problems with the old implementation were that division by 1 always returned 0 and some off-by-ones when the dividend and divisor where very large. This seemed to not be problematic with its current users, as far as we can tell. Eric Dumazet checked for the slab usage, we cannot surely say so in the case of flex_array. Still, in order to fix that, we propose an extension from the original implementation from commit 6a2d7a955d8d resp. [3][4], by using the algorithm proposed in "Division by Invariant Integers Using Multiplication" [5], Torbjörn Granlund and Peter L. Montgomery, that is, pseudocode for q = n/d where q, n, d is in u32 universe: 1) Initialization: int l = ceil(log_2 d) uword m' = floor((1<<32)*((1<<l)-d)/d)+1 int sh_1 = min(l,1) int sh_2 = max(l-1,0) 2) For q = n/d, all uword: uword t = (n*m')>>32 q = (t+((n-t)>>sh_1))>>sh_2 The assembler implementation from Agner Fog [6] also helped a lot while implementing. We have tested the implementation on x86_64, ppc64, i686, s390x; on x86_64/haswell we're still half the latency compared to normal divide. Joint work with Daniel Borkmann. [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c [3] https://gmplib.org/~tege/division-paper.pdf [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556 [6] http://www.agner.org/optimize/asmlib.zip Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: Jesse Gross <jesse@nicira.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: introduce reciprocal_scale helper and convert usersDaniel Borkmann2014-01-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As David Laight suggests, we shouldn't necessarily call this reciprocal_divide() when users didn't requested a reciprocal_value(); lets keep the basic idea and call it reciprocal_scale(). More background information on this topic can be found in [1]. Joint work with Hannes Frederic Sowa. [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html Suggested-by: David Laight <david.laight@aculab.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | random32: add prandom_u32_max and convert open coded usersDaniel Borkmann2014-01-222-9/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many functions have open coded a function that returns a random number in range [0,N-1]. Under the assumption that we have a PRNG such as taus113 with being well distributed in [0, ~0U] space, we can implement such a function as uword t = (n*m')>>32, where m' is a random number obtained from PRNG, n the right open interval border and t our resulting random number, with n,m',t in u32 universe. Lets go with Joe and simply call it prandom_u32_max(), although technically we have an right open interval endpoint, but that we have documented. Other users can further be migrated to the new prandom_u32_max() function later on; for now, we need to make sure to migrate reciprocal_divide() users for the reciprocal_divide() follow-up fixup since their function signatures are going to change. Joint work with Hannes Frederic Sowa. Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Missing change from the ether_addr_copy() fixups.David S. Miller2014-01-221-0/+1
| | | | | | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Fix some fallout from the etner_addr_copy() changes.David S. Miller2014-01-222-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net/appletalk/aarp.c: In function ‘__aarp_send_query’: net/appletalk/aarp.c:137:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration] ... net/atm/lec.c: In function ‘send_to_lecd’: net/atm/lec.c:524:3: warning: passing argument 1 of ‘ether_addr_copy’ from incompatible pointer type [enabled by default] In file included from net/atm/lec.c:17:0: include/linux/etherdevice.h:227:20: note: expected ‘u8 *’ but argument is of type ‘unsigned char (*)[6]’ ... net/caif/caif_usb.c: In function ‘cfusbl_create’: net/caif/caif_usb.c:108:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration] Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sctp: remove macros sctp_bh_[un]lock_sockwangweidong2014-01-224-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sctp: remove macros sctp_{lock|release}_sockwangweidong2014-01-221-31/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sctp: remove macros sctp_write_[un]_lockwangweidong2014-01-221-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redefined write_[un]lock to sctp_write_[un]lock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sctp: remove macros sctp_spin_[un]lockwangweidong2014-01-221-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sctp: remove macros sctp_local_bh_{disable|enable}wangweidong2014-01-224-26/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redefined local_bh_{disable|enable} to sctp_local_bh_{disable|enable} for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | dsa: Use ether_addr_copyJoe Perches2014-01-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | pktgen: Use ether_addr_copyJoe Perches2014-01-221-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netpoll: Use ether_addr_copyJoe Perches2014-01-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | caif_usb: Use ether_addr_copyJoe Perches2014-01-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | atm: Use ether_addr_copyJoe Perches2014-01-222-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | appletalk: Use ether_addr_copyJoe Perches2014-01-221-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Convert struct aarp_entry.hwaddr[6] to hwaddr[ETH_ALEN]. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | 8021q: Use ether_addr_copyJoe Perches2014-01-222-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Export gro_find_by_type helpersOr Gerlitz2014-01-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export the gro_find_receive/complete_by_type helpers to they can be invoked by the gro callbacks of encapsulation protocols such as vxlan. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>