summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2012-04-0638-376/+602
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Fix inaccuracies in network driver interface documentation, from Ben Hutchings. 2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert. 3) Compile warning, locking, and refcounting fixes in netfilter's xt_CT, from Pablo Neira Ayuso. 4) phonet sendmsg needs to validate user length just like any other datagram protocol, fix from Sasha Levin. 5) Ipv6 multicast code uses wrong loop index, from RongQing Li. 6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner and Yuval Mintz. 7) mlx4 erroneously allocates 4 pages at a time, regardless of page size, fix from Thadeu Lima de Souza Cascardo. 8) SCTP socket option wasn't extended in a backwards compatible way, fix from Thomas Graf. 9) Add missing address change event emissions to bonding, from Shlomo Pongratz. 10) /proc/net/dev regressed because it uses a private offset to track where we are in the hash table, but this doesn't track the offset pullback that the seq_file code does resulting in some entries being missed in large dumps. Fix from Eric Dumazet. 11) do_tcp_sendpage() unloads the send queue way too fast, because it invokes tcp_push() when it shouldn't. Let the natural sequence generated by the splice paths, and the assosciated MSG_MORE settings, guide the tcp_push() calls. Otherwise what goes out of TCP is spaghetti and doesn't batch effectively into GSO/TSO clusters. From Eric Dumazet. 12) Once we put a SKB into either the netlink receiver's queue or a socket error queue, it can be consumed and freed up, therefore we cannot touch it after queueing it like that. Fixes from Eric Dumazet. 13) PPP has this annoying behavior in that for every transmit call it immediately stops the TX queue, then calls down into the next layer to transmit the PPP frame. But if that next layer can take it immediately, it just un-stops the TX queue right before returning from the transmit method. Besides being useless work, it makes several facilities unusable, in particular things like the equalizers. Well behaved devices should only stop the TX queue when they really are full, and in PPP's case when it gets backlogged to the downstream device. David Woodhouse therefore fixed PPP to not stop the TX queue until it's downstream can't take data any more. 14) IFF_UNICAST_FLT got accidently lost in some recent stmmac driver changes, re-add. From Marc Kleine-Budde. 15) Fix link flaps in ixgbe, from Eric W. Multanen. 16) Descriptor writeback fixes in e1000e from Matthew Vick. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits) net: fix a race in sock_queue_err_skb() netlink: fix races after skb queueing doc, net: Update ndo_start_xmit return type and values doc, net: Remove instruction to set net_device::trans_start doc, net: Update netdev operation names doc, net: Update documentation of synchronisation for TX multiqueue doc, net: Remove obsolete reference to dev->poll ethtool: Remove exception to the requirement of holding RTNL lock MAINTAINERS: update for Marvell Ethernet drivers bonding: properly unset current_arp_slave on slave link up phonet: Check input from user before allocating tcp: tcp_sendpages() should call tcp_push() once ipv6: fix array index in ip6_mc_add_src() mlx4: allocate just enough pages instead of always 4 pages stmmac: re-add IFF_UNICAST_FLT for dwmac1000 bnx2x: Clear MDC/MDIO warning message bnx2x: Fix BCM57711+BCM84823 link issue bnx2x: Clear BCM84833 LED after fan failure bnx2x: Fix BCM84833 PHY FW version presentation bnx2x: Fix link issue for BCM8727 boards. ...
| * net: fix a race in sock_queue_err_skb()Eric Dumazet2012-04-061-1/+3
| | | | | | | | | | | | | | | | | | As soon as an skb is queued into socket error queue, another thread can consume it, so we are not allowed to reference skb anymore, or risk use after free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netlink: fix races after skb queueingEric Dumazet2012-04-061-11/+13
| | | | | | | | | | | | | | | | | | As soon as an skb is queued into socket receive_queue, another thread can consume it, so we are not allowed to reference skb anymore, or risk use after free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * doc, net: Update ndo_start_xmit return type and valuesBen Hutchings2012-04-061-10/+12
| | | | | | | | | | | | | | | | | | | | | | Commit dc1f8bf68b311b1537cb65893430b6796118498a ('netdev: change transmit to limited range type') changed the required return type and 9a1654ba0b50402a6bd03c7b0fe9b0200a5ea7b1 ('net: Optimize hard_start_xmit() return checking') changed the valid numerical return values. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * doc, net: Remove instruction to set net_device::trans_startBen Hutchings2012-04-061-5/+2
| | | | | | | | | | | | | | | | | | Commit 08baf561083bc27a953aa087dd8a664bb2b88e8e ('net: txq_trans_update() helper') made it unnecessary for most drivers to set net_device::trans_start (or netdev_queue::trans_start). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * doc, net: Update netdev operation namesBen Hutchings2012-04-062-14/+14
| | | | | | | | | | | | | | | | | | | | Commits d314774cf2cd5dfeb39a00d37deee65d4c627927 ('netdev: network device operations infrastructure') and 008298231abbeb91bc7be9e8b078607b816d1a4a ('netdev: add more functions to netdevice ops') moved and renamed net device operation pointers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * doc, net: Update documentation of synchronisation for TX multiqueueBen Hutchings2012-04-061-3/+3
| | | | | | | | | | | | | | | | | | | | Commits e308a5d806c852f56590ffdd3834d0df0cbed8d7 ('netdev: Add netdev->addr_list_lock protection.') and e8a0464cc950972824e2e128028ae3db666ec1ed ('netdev: Allocate multiple queues for TX.') introduced more fine-grained locks. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * doc, net: Remove obsolete reference to dev->pollBen Hutchings2012-04-061-2/+1
| | | | | | | | | | | | | | | | | | | | Commit bea3348eef27e6044b6161fd04c3152215f96411 ('[NET]: Make NAPI polling independent of struct net_device objects.') removed the automatic disabling of NAPI polling by dev_close(), and drivers must now do this themselves. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ethtool: Remove exception to the requirement of holding RTNL lockBen Hutchings2012-04-061-2/+1
| | | | | | | | | | | | | | | | | | Commit e52ac3398c3d772d372b9b62ab408fd5eec96840 ('net: Use device model to get driver name in skb_gso_segment()') removed the only in-tree caller of ethtool ops that doesn't hold the RTNL lock. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * MAINTAINERS: update for Marvell Ethernet driversstephen hemminger2012-04-061-12/+7
| | | | | | | | | | | | | | | | | | | | Marvell has agreed to do maintenance on the sky2 driver. * Add the developer to the maintainers file * Remove the old reference to the long gone (sk98lin) driver * Rearrange to fit current topic organization Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bonding: properly unset current_arp_slave on slave link upVeaceslav Falico2012-04-061-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a slave comes up, we're unsetting the current_arp_slave without removing active flags from it, which can lead to situations where we have more than one slave with active flags in active-backup mode. To avoid this situation we must remove the active flags from a slave before removing it as a current_arp_slave. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * phonet: Check input from user before allocatingSasha Levin2012-04-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A phonet packet is limited to USHRT_MAX bytes, this is never checked during tx which means that the user can specify any size he wishes, and the kernel will attempt to allocate that size. In the good case, it'll lead to the following warning, but it may also cause the kernel to kick in the OOM and kill a random task on the server. [ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730() [ 8921.749770] Pid: 5081, comm: trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46 [ 8921.756672] Call Trace: [ 8921.758185] [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0 [ 8921.762868] [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20 [ 8921.765399] [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730 [ 8921.769226] [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20 [ 8921.771686] [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660 [ 8921.773919] [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240 [ 8921.776248] [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0 [ 8921.778294] [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0 [ 8921.780847] [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260 [ 8921.783179] [<ffffffff821b3c65>] __alloc_skb+0x75/0x170 [ 8921.784971] [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260 [ 8921.787111] [<ffffffff821b002e>] ? release_sock+0x7e/0x90 [ 8921.788973] [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20 [ 8921.791052] [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380 [ 8921.792931] [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180 [ 8921.794917] [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90 [ 8921.797053] [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70 [ 8921.798992] [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0 [ 8921.801395] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0 [ 8921.803501] [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0 [ 8921.805505] [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140 [ 8921.807860] [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110 [ 8921.809986] [<ffffffff811958e7>] ? might_fault+0x97/0xa0 [ 8921.811998] [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90 [ 8921.814595] [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0 [ 8921.816702] [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200 [ 8921.818819] [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50 [ 8921.820863] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0 [ 8921.823318] [<ffffffff811e1926>] vfs_writev+0x46/0x60 [ 8921.825219] [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0 [ 8921.827127] [<ffffffff82658039>] system_call_fastpath+0x16/0x1b [ 8921.829384] ---[ end trace dffe390f30db9eb7 ]--- Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: tcp_sendpages() should call tcp_push() onceEric Dumazet2012-04-064-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2f533844242 (tcp: allow splice() to build full TSO packets) added a regression for splice() calls using SPLICE_F_MORE. We need to call tcp_flush() at the end of the last page processed in tcp_sendpages(), or else transmits can be deferred and future sends stall. Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but with different semantic. For all sendpage() providers, its a transparent change. Only sock_sendpage() and tcp_sendpages() can differentiate the two different flags provided by pipe_to_sendpage() Reported-by: Tom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: fix array index in ip6_mc_add_src()RongQing.Li2012-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert array index from the loop bound to the loop index. And remove the void type conversion to ip6_mc_del1_src() return code, seem it is unnecessary, since ip6_mc_del1_src() does not use __must_check similar attribute, no compiler will report the warning when it is removed. v2: enrich the commit header Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlx4: allocate just enough pages instead of always 4 pagesThadeu Lima de Souza Cascardo2012-04-051-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver uses a 2-order allocation, which is too much on architectures like ppc64, which has a 64KiB page. This particular allocation is used for large packet fragments that may have a size of 512, 1024, 4096 or fill the whole allocation. So, a minimum size of 16384 is good enough and will be the same size that is used in architectures of 4KiB sized pages. This will avoid allocation failures that we see when the system is under stress, but still has plenty of memory, like the one below. This will also allow us to set the interface MTU to higher values like 9000, which was not possible on ppc64 without this patch. Node 1 DMA: 737*64kB 37*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 51904kB 83137 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 10420096kB Total swap = 10420096kB 107776 pages RAM 1184 pages reserved 147343 pages shared 28152 pages non-shared netstat: page allocation failure. order:2, mode:0x4020 Call Trace: [c0000001a4fa3770] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable) [c0000001a4fa3820] [c00000000016af38] .__alloc_pages_nodemask+0x618/0x930 [c0000001a4fa39a0] [c0000000001a71a0] .alloc_pages_current+0xb0/0x170 [c0000001a4fa3a40] [d00000000dcc3e00] .mlx4_en_alloc_frag+0x200/0x240 [mlx4_en] [c0000001a4fa3b10] [d00000000dcc3f8c] .mlx4_en_complete_rx_desc+0x14c/0x250 [mlx4_en] [c0000001a4fa3be0] [d00000000dcc4eec] .mlx4_en_process_rx_cq+0x62c/0x850 [mlx4_en] [c0000001a4fa3d20] [d00000000dcc5150] .mlx4_en_poll_rx_cq+0x40/0x90 [mlx4_en] [c0000001a4fa3dc0] [c0000000004e2bb8] .net_rx_action+0x178/0x450 [c0000001a4fa3eb0] [c00000000009c9b8] .__do_softirq+0x118/0x290 [c0000001a4fa3f90] [c000000000031df8] .call_do_softirq+0x14/0x24 [c000000184c3b520] [c00000000000e700] .do_softirq+0xf0/0x110 [c000000184c3b5c0] [c00000000009c6d4] .irq_exit+0xb4/0xc0 [c000000184c3b640] [c00000000000e964] .do_IRQ+0x144/0x230 Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Tested-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * stmmac: re-add IFF_UNICAST_FLT for dwmac1000Marc Kleine-Budde2012-04-051-2/+4
| | | | | | | | | | | | | | | | | | | | In commit (bfab27a stmmac: add the experimental PCI support) the IFF_UNICAST_FLT flag has been removed from the stmmac_mac_device_setup() function. This patch re-adds the flag. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: Clear MDC/MDIO warning messageYaniv Rosner2012-04-052-2/+1
| | | | | | | | | | | | | | | | | | | | This patch clears a warning message of "MDC/MDIO access timeout" which may appear when interface is loaded due to missing clock setting before resetting the LED, and starting periodic function too early. Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: Fix BCM57711+BCM84823 link issueYaniv Rosner2012-04-051-2/+2
| | | | | | | | | | | | | | | | | | Fix a link problem on the second port of BCM57711 + BCM84823 boards due to incorrect macro usage. Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: Clear BCM84833 LED after fan failureYaniv Rosner2012-04-052-0/+12
| | | | | | | | | | | | Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: Fix BCM84833 PHY FW version presentationYaniv Rosner2012-04-051-2/+1
| | | | | | | | | | | | Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: Fix link issue for BCM8727 boards.Yaniv Rosner2012-04-051-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a link problem on BCM57712 + BCM8727 designs in which the TX laser is controller by GPIO, after 1.60.xx drivers were previously loaded. On these designs the TX_LASER is enabled by logic AND between the PHY (through MDIO), and the GPIO. When an old driver is used, it disables the MDIO part, hence the GPIO control had no affect de facto. Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: Restore 1G LED on BCM57712+BCM8727 designs.Yaniv Rosner2012-04-051-15/+18
| | | | | | | | | | | | | | | | | | Fix no-LED problem when link speed is 1G on BCM57712 + BCM8727 designs, by removing a logic error checking for a different PHY. Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: Fix BCM578x0-SFI pre-emphasis settingsYaniv Rosner2012-04-051-5/+5
| | | | | | | | | | | | | | | | | | Fix 578x0-SFI pre-emphasis settings per HW recommendations to achieve better link strength. Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: Fix BCM57810-KR AN speed transitionYaniv Rosner2012-04-051-2/+18
| | | | | | | | | | | | | | | | | | BCM57810-KR link may not come up in 1G after running loopback test, so set the relevant registers to their default values before starting KR autoneg. Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: Fix BCM57810-KR FCYaniv Rosner2012-04-052-0/+31
| | | | | | | | | | | | | | | | Fix 57810-KR flow-control handling link is achieved via CL37 AN. Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnx2x: PFC fixYaniv Rosner2012-04-052-1/+18
| | | | | | | | | | | | | | | | | | | | Fix a problem in which PFC frames are not honored, due to incorrect link attributes synchronization following PMF migration, and verify PFC XON is not stuck from previous link change. Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sky2: copy received packets on inefficient unaligned architecturestephen hemminger2012-04-051-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modified from original patch from Chris. The sky2 driver has to have 8 byte alignment of receive buffer on some chip versions. On architectures which don't support efficient unaligned access this doesn't work very well. The solution is to just copy all received packets which is what the driver already does for small packets. This allows the driver to be used on the Tilera TILEmpower-Gx, since the tile architecture doesn't currently handle kernel unaligned accesses, just userspace. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/bonding: correctly proxy slave neigh param setup ndo functionShlomo Pongratz2012-04-051-8/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implemenation was buggy for slaves who use ndo_neigh_setup, since the networking stack invokes the bonding device ndo entry (from neigh_params_alloc) before any devices are enslaved, and the bonding driver can't further delegate the call at that point in time. As a result when bonding IPoIB devices, the neigh_cleanup hasn't been called. Fix that by deferring the actual call into the slave ndo_neigh_setup from the time the bonding neigh_setup is called. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/bonding: emit address change event also in bond_releaseShlomo Pongratz2012-04-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | commit 7d26bb103c4 "bonding: emit event when bonding changes MAC" didn't take care to emit the NETDEV_CHANGEADDR event in bond_release, where bonding actually changes the mac address (to all zeroes). As a result the neighbours aren't deleted by the core networking code (which does so upon getting that event). Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Allow struct sctp_event_subscribe to grow without breaking binariesThomas Graf2012-04-051-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns an error if the user provides less bytes than the size of struct sctp_event_subscribe. Struct sctp_event_subscribe needs to be extended by an u8 for every new event or notification type that is added. This obviously makes getsockopt fail for binaries that are compiled against an older versions of <net/sctp/user.h> which do not contain all event types. This patch changes getsockopt behaviour to no longer return an error if not enough bytes are being provided by the user. Instead, it returns as much of sctp_event_subscribe as fits into the provided buffer. This leads to the new behavior that users see what they have been aware of at compile time. The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_conntrack: fix count leak in error path of __nf_conntrack_allocPablo Neira Ayuso2012-04-041-0/+1
| | | | | | | | | | | | | | | | | | We have to decrement the conntrack counter if we fail to access the zone extension. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: xt_CT: fix missing put timeout object in error pathPablo Neira Ayuso2012-04-041-5/+19
| | | | | | | | | | | | | | | | | | The error path misses putting the timeout object. This patch adds new function xt_ct_tg_timeout_put() to put the timeout object. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: xt_CT: allocation has to be GFP_ATOMIC under rcu_read_lock sectionPablo Neira Ayuso2012-04-041-1/+1
| | | | | | | | | | | | Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of git://1984.lsi.us.es/netDavid S. Miller2012-04-043-3/+5
| |\
| | * netfilter: xt_CT: remove a compile warningPablo Neira Ayuso2012-04-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CONFIG_NF_CONNTRACK_TIMEOUT=n we have following warning : CC [M] net/netfilter/xt_CT.o net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v1’: net/netfilter/xt_CT.c:284: warning: label ‘err4’ defined but not used Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: ipset: avoid use of kernel-only typesJan Engelhardt2012-03-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using the xt_set.h header in userspace, one will get these gcc reports: ipset/ip_set.h:184:1: error: unknown type name "u16" In file included from libxt_SET.c:21:0: netfilter/xt_set.h:61:2: error: unknown type name "u32" netfilter/xt_set.h:62:2: error: unknown type name "u32" Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: xt_LOG: don't use xchg() for simple assignmentJan Beulich2012-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At least on ia64 the (bogus) use of xchg() here results in the compiler warning about an unused expression result. As only an assignment is intended here, convert it to such. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | r8169: enable napi on resume.Artem Savkov2012-04-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NAPI is disabled during suspend and needs to be enabled on resume. Without this the driver locks up during resume in rtl_reset_work() trying to disable NAPI again. Signed-off-by: Artem Savkov <artem.savkov@gmail.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bnx2x: correction to firmware interfaceYuval Mintz2012-04-041-55/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 621b4d6 updated the bnx2x driver to a new FW version, but lacked a commit to a header file with changes to the firmware's interface. The missing interface change causes iscsi and fcoe to misbehave with the updated firmware. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> CC: Michael Chan <mchan@broadcom.com> CC: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | phy:icplus:fix Auto Power Saving in ip101a_config_init.Srinivas Kandagatla2012-04-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes Auto Power Saving configuration in ip101a_config_init which was broken as there is no phy register write followed after setting IP101A_APS_ON flag. This patch also fixes the return value of ip101a_config_init. Without this patch ip101a_config_init returns 2 which is not an error accroding to IS_ERR and the mac driver will continue accessing 2 as valid pointer to phy_dev resulting in memory fault. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'master' of ↵David S. Miller2012-04-043-84/+112
| |\ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net
| | * | e1000e: Guarantee descriptor writeback flush success.Matthew Vick2012-04-041-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In rare circumstances, a descriptor writeback flush may not work if it arrives on a specific clock cycle as a writeback request is going out. Signed-off-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * | e1000e: prevent oops when adapter is being closed and reset simultaneouslyBruce Allan2012-04-042-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the adapter is closed while it is simultaneously going through a reset, it can cause a null-pointer dereference when the two different code paths simultaneously cleanup up the Tx/Rx resources. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * | ixgbe: driver fix for link flapMultanen, Eric W2012-04-041-84/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix up code so that changes in DCB settings are detected only when ixgbe_dcbnl_set_all is called. Previously, a series of 'change' commands followed by a call to ixgbe_dcbnl_set_all() would always be handled as a HW change - even if the net change was zero. This patch checks for this case of no actual change and skips going through the HW set process. Without this fix, the link could reset and result in a link flap. The core change in this patch is to check for changes in the ixgbe_copy_dcb_cfg() routine - and return a bitmask of detected changes. The other places where changes were detected previously can be removed. Signed-off-by: Eric Multanen <eric.w.multanen@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | | bpf jit: Let the x86 jit handle negative offsetsJan Seiffert2012-04-042-48/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now the helper function from filter.c for negative offsets is exported, it can be used it in the jit to handle negative offsets. First modify the asm load helper functions to handle: - know positive offsets - know negative offsets - any offset then the compiler can be modified to explicitly use these helper when appropriate. This fixes the case of a negative X register and allows to lift the restriction that bpf programs with negative offsets can't be jited. Signed-of-by: Jan Seiffert <kaffeemonster@googlemail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bpf jit: Make the filter.c::__load_pointer helper non-static for the jitsJan Seiffert2012-04-041-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function is renamed to make it a little more clear what it does. It is not added to any .h because it is not for general consumption, only for bpf internal use (and so by the jits). Signed-of-by: Jan Seiffert <kaffeemonster@googlemail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | TCP: update ip_local_port_range documentationFernando Luis Vazquez Cao2012-04-031-9/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The explanation of ip_local_port_range in Documentation/networking/ip-sysctl.txt contains several factual errors: - The default value of ip_local_port_range does not depend on the amount of memory available in the system. - tcp_tw_recycle is not enabled by default. - 1024-4999 is not the default value. - Etc. Clean up the mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: allow splice() to build full TSO packetsEric Dumazet2012-04-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096) The call to tcp_push() at the end of do_tcp_sendpages() forces an immediate xmit when pipe is not already filled, and tso_fragment() try to split these skb to MSS multiples. 4096 bytes are usually split in a skb with 2 MSS, and a remaining sub-mss skb (assuming MTU=1500) This makes slow start suboptimal because many small frames are sent to qdisc/driver layers instead of big ones (constrained by cwnd and packets in flight of course) In fact, applications using sendmsg() (adding an additional memory copy) instead of vmsplice()/splice()/sendfile() are a bit faster because of this anomaly, especially if serving small files in environments with large initial [c]wnd. Call tcp_push() only if MSG_MORE is not set in the flags parameter. This bit is automatically provided by splice() internals but for the last page, or on all pages if user specified SPLICE_F_MORE splice() flag. In some workloads, this can reduce number of sent logical packets by an order of magnitude, making zero-copy TCP actually faster than one-copy :) Reported-by: Tom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ppp: Don't stop and restart queue on every TX packetDavid Woodhouse2012-04-031-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For every transmitted packet, ppp_start_xmit() will stop the netdev queue and then, if appropriate, restart it. This causes the TX softirq to run, entirely gratuitously. This is "only" a waste of CPU time in the normal case, but it's actively harmful when the PPP device is a TEQL slave — the wakeup will cause the offending device to receive the next TX packet from the TEQL queue, when it *should* have gone to the next slave in the list. We end up seeing large bursts of packets on just *one* slave device, rather than using the full available bandwidth over all slaves. This patch fixes the problem by *not* unconditionally stopping the queue in ppp_start_xmit(). It adds a return value from ppp_xmit_process() which indicates whether the queue should be stopped or not. It *doesn't* remove the call to netif_wake_queue() from ppp_xmit_process(), because other code paths (especially from ppp_output_wakeup()) need it there and it's messy to push it out to the other callers to do it based on the return value. So we leave it in place — it's a no-op in the case where the queue wasn't stopped, so it's harmless in the TX path. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: fix /proc/net/dev regressionEric Dumazet2012-04-033-48/+15
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f04565ddf52 (dev: use name hash for dev_seq_ops) added a second regression, as some devices are missing from /proc/net/dev if many devices are defined. When seq_file buffer is filled, the last ->next/show() method is canceled (pos value is reverted to value prior ->next() call) Problem is after above commit, we dont restart the lookup at right position in ->start() method. Fix this by removing the internal 'pos' pointer added in commit, since we need to use the 'loff_t *pos' provided by seq_file layer. This also reverts commit 5cac98dd0 (net: Fix corruption in /proc/*/net/dev_mcast), since its not needed anymore. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Mihai Maruseac <mmaruseac@ixiacom.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>