summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* move iov_iter.c from mm/ to lib/Al Viro2015-02-183-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helper: dup_iter()Al Viro2015-02-182-0/+17
| | | | | | | Copy iter and kmemdup the underlying array for the copy. Returns a pointer to result of kmemdup() to be kfree()'d later. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2015-02-1852-143/+483
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Missing netlink attribute validation in nft_lookup, from Patrick McHardy. 2) Restrict ipv6 partial checksum handling to UDP, since that's the only case it works for. From Vlad Yasevich. 3) Clear out silly device table sentinal macros used by SSB and BCMA drivers. From Joe Perches. 4) Make sure the remote checksum code never creates a situation where the remote checksum is applied yet the tunneling metadata describing the remote checksum transformation is still present. Otherwise an external entity might see this and apply the checksum again. From Tom Herbert. 5) Use msecs_to_jiffies() where applicable, from Nicholas Mc Guire. 6) Don't explicitly initialize timer struct fields, use setup_timer() and mod_timer() instead. From Vaishali Thakkar. 7) Don't invoke tg3_halt() without the tp->lock held, from Jun'ichi Nomura. 8) Missing __percpu annotation in ipvlan driver, from Eric Dumazet. 9) Don't potentially perform skb_get() on shared skbs, also from Eric Dumazet. 10) Fix COW'ing of metrics for non-DST_HOST routes in ipv6, from Martin KaFai Lau. 11) Fix merge resolution error between the iov_iter changes in vhost and some bug fixes that occurred at the same time. From Jason Wang. 12) If rtnl_configure_link() fails we have to perform a call to ->dellink() before unregistering the device. From WANG Cong. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits) net: dsa: Set valid phy interface type rtnetlink: call ->dellink on failure when ->newlink exists com20020-pci: add support for eae single card vhost_net: fix wrong iter offset when setting number of buffers net: spelling fixes net/core: Fix warning while make xmldocs caused by dev.c net: phy: micrel: disable NAND-tree for KSZ8021, KSZ8031, KSZ8051, KSZ8081 ipv6: fix ipv6_cow_metrics for non DST_HOST case openvswitch: Fix key serialization. r8152: restore hw settings hso: fix rx parsing logic when skb allocation fails tcp: make sure skb is not shared before using skb_get() bridge: netfilter: Move sysctl-specific error code inside #ifdef ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gc ipvlan: add a missing __percpu pcpu_stats tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one() bgmac: fix device initialization on Northstar SoCs (condition typo) qlcnic: Delete existing multicast MAC list before adding new net/mlx5_core: Fix configuration of log_uar_page_sz sunvnet: don't change gso data on clones ...
| * net: dsa: Set valid phy interface typeGuenter Roeck2015-02-171-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the phy interface mode is not found in devicetree, or if devicetree is not configured, of_get_phy_mode returns -ENODEV. The current code sets the phy interface mode to the return value from of_get_phy_mode without checking if it is valid. This invalid phy interface mode is passed as parameter to of_phy_connect or to phy_connect_direct. This sets the phy interface mode to the invalid value, which in turn causes problems for any code using phydev->interface. Fixes: b31f65fb4383 ("net: dsa: slave: Fix autoneg for phys on switch MDIO bus") Fixes: 0d8bcdd383b8 ("net: dsa: allow for more complex PHY setups") Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rtnetlink: call ->dellink on failure when ->newlink existsWANG Cong2015-02-151-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ignacy reported that when eth0 is down and add a vlan device on top of it like: ip link add link eth0 name eth0.1 up type vlan id 1 We will get a refcount leak: unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2 The problem is when rtnl_configure_link() fails in rtnl_newlink(), we simply call unregister_device(), but for stacked device like vlan, we almost do nothing when we unregister the upper device, more work is done when we unregister the lower device, so call its ->dellink(). Reported-by: Ignacy Gawedzki <ignacy.gawedzki@green-communications.fr> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * com20020-pci: add support for eae single cardMichael Grzeschik2015-02-151-3/+18
| | | | | | | | | | Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * vhost_net: fix wrong iter offset when setting number of buffersJason Wang2015-02-151-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit ba7438aed924 ("vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()"), we advance iov iter fixup sizeof(struct virtio_net_hdr) bytes and fill the number of buffers after doing the socket recvmsg(). This work well but was broken after commit 6e03f896b52c ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") which tries to advance sizeof(struct virtio_net_hdr_mrg_rxbuf). It will fill the number of buffers at the wrong place. This patch fixes this. Fixes 6e03f896b52c ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Cc: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: spelling fixesStephen Hemminger2015-02-153-3/+3
| | | | | | | | | | | | | | Spelling errors caught by codespell. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/core: Fix warning while make xmldocs caused by dev.cMasanari Iida2015-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix following warning wile make xmldocs. Warning(.//net/core/dev.c:5345): No description found for parameter 'bonding_info' Warning(.//net/core/dev.c:5345): Excess function parameter 'netdev_bonding_info' description in 'netdev_bonding_info_change' This warning starts to appear after following patch was added into Linus's tree during merger period. commit 61bd3857ff2c7daf756d49b41e6277bbdaa8f789 net/core: Add event for a change in slave state Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: phy: micrel: disable NAND-tree for KSZ8021, KSZ8031, KSZ8051, KSZ8081Sylvain Rochet2015-02-151-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NAND-tree is used to check wiring between MAC and PHY using NAND gates on the PHY side, hence the name. NAND-tree initial status is latched at reset by probing the IRQ pin. However some devices are sharing the PHY IRQ pin with other peripherals such as Atmel SAMA5D[34]x-EK boards when using the optional TM7000 display module, therefore they are switching the PHY in NAND-tree test mode depending on the current IRQ line status at reset. This patch ensure PHY is not in NAND-tree test mode for all Micrel PHYs using IRQ line as a NAND-tree toggle mode at reset. Signed-off-by: Sylvain Rochet <sylvain.rochet@finsecur.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: fix ipv6_cow_metrics for non DST_HOST caseMartin KaFai Lau2015-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ipv6_cow_metrics() currently assumes only DST_HOST routes require dynamic metrics allocation from inetpeer. The assumption breaks when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric. Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set() is called after the route is created. This patch creates the metrics array (by calling dst_cow_metrics_generic) in ipv6_cow_metrics(). Test: radvd.conf: interface qemubr0 { AdvLinkMTU 1300; AdvCurHopLimit 30; prefix fd00:face:face:face::/64 { AdvOnLink on; AdvAutonomous on; AdvRouterAddr off; }; }; Before: [root@qemu1 ~]# ip -6 r show | egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec fe80::/64 dev eth0 proto kernel metric 256 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec After: [root@qemu1 ~]# ip -6 r show | egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300 fe80::/64 dev eth0 proto kernel metric 256 mtu 1300 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30 Fixes: 8e2ec639173f325 (ipv6: don't use inetpeer to store metrics for routes.) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * openvswitch: Fix key serialization.Pravin B Shelar2015-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | Fix typo where mask is used rather than key. Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.") Reported-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * r8152: restore hw settingshayeswang2015-02-151-2/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a capability which let the hw could change the settings automatically when the power change to ON. However, the USB reset would reset the settings to the hw default, so the driver has to restore the relative settings. Otherwise, it would influence the functions of the hw, and the compatibility for the USB hub and USB host controller. The relative settings are as following. - set the power down scale to 96. - enable the power saving function of USB 2.0. - disable the ALDPS of ECM mode. - set burst mode depending on the burst size. - enable the flow control of endpoint full. - set fifo empty boundary to 32448 bytes. - enable the function of exiting LPM when Rx OK occurs. - set the connect timer to 1. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * hso: fix rx parsing logic when skb allocation failsAleksander Morgado2015-02-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If skb allocation fails once the IP header has been received, the rx state is being set to WAIT_SYNC. The logic, though, shouldn't directly return, as the buffer may contain a full packet, and therefore the WAIT_SYNC state needs to be processed (resetting state to WAIT_IP, clearing rx_buf_size and re-initializing rx_buf_missing). So, just let the while loop continue so that in the next iteration the WAIT_SYNC state cleanly stops the loop. The WAIT_SYNC processing will be done just after that, only if the end of packet is flagged. Signed-off-by: Aleksander Morgado <aleksander@aleksander.es> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: make sure skb is not shared before using skb_get()Eric Dumazet2015-02-131-8/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPv6 can keep a copy of SYN message using skb_get() in tcp_v6_conn_request() so that caller wont free the skb when calling kfree_skb() later. Therefore TCP fast open has to clone the skb it is queuing in child->sk_receive_queue, as all skbs consumed from receive_queue are freed using __kfree_skb() (ie assuming skb->users == 1) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: 5b7ed0892f2af ("tcp: move fastopen functions to tcp_fastopen.c") Signed-off-by: David S. Miller <davem@davemloft.net>
| * bridge: netfilter: Move sysctl-specific error code inside #ifdefGeert Uytterhoeven2015-02-121-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | If CONFIG_SYSCTL=n: net/bridge/br_netfilter.c: In function ‘br_netfilter_init’: net/bridge/br_netfilter.c:996: warning: label ‘err1’ defined but not used Move the label and the code after it inside the existing #ifdef to get rid of the warning. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gcJan Stancek2015-02-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use spin_lock_bh in ip6_fl_purge() to prevent following potentially deadlock scenario between ip6_fl_purge() and ip6_fl_gc() timer. ================================= [ INFO: inconsistent lock state ] 3.19.0 #1 Not tainted --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (ip6_fl_lock){+.?...}, at: [<ffffffff8171155d>] ip6_fl_gc+0x2d/0x180 {SOFTIRQ-ON-W} state was registered at: [<ffffffff810ee9a0>] __lock_acquire+0x4a0/0x10b0 [<ffffffff810efd54>] lock_acquire+0xc4/0x2b0 [<ffffffff81751d2d>] _raw_spin_lock+0x3d/0x80 [<ffffffff81711798>] ip6_flowlabel_net_exit+0x28/0x110 [<ffffffff815f9759>] ops_exit_list.isra.1+0x39/0x60 [<ffffffff815fa320>] cleanup_net+0x100/0x1e0 [<ffffffff810ad80a>] process_one_work+0x20a/0x830 [<ffffffff810adf4b>] worker_thread+0x11b/0x460 [<ffffffff810b42f4>] kthread+0x104/0x120 [<ffffffff81752bfc>] ret_from_fork+0x7c/0xb0 irq event stamp: 84640 hardirqs last enabled at (84640): [<ffffffff81752080>] _raw_spin_unlock_irq+0x30/0x50 hardirqs last disabled at (84639): [<ffffffff81751eff>] _raw_spin_lock_irq+0x1f/0x80 softirqs last enabled at (84628): [<ffffffff81091ad1>] _local_bh_enable+0x21/0x50 softirqs last disabled at (84629): [<ffffffff81093b7d>] irq_exit+0x12d/0x150 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(ip6_fl_lock); <Interrupt> lock(ip6_fl_lock); *** DEADLOCK *** Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipvlan: add a missing __percpu pcpu_statsEric Dumazet2015-02-121-1/+1
| | | | | | | | | | | | | | Cosmetic patch to add __percpu qualifier to pcpu_stats Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()Jun'ichi Nomura \(NEC\)2015-02-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tg3_init_one() calls tg3_halt() without tp->lock despite its assumption and causes deadlock. If lockdep is enabled, a warning like this shows up before the stall: [ BUG: bad unlock balance detected! ] 3.19.0test #3 Tainted: G E ------------------------------------- insmod/369 is trying to release lock (&(&tp->lock)->rlock) at: [<ffffffffa02d5a1d>] tg3_chip_reset+0x14d/0x780 [tg3] but there are no more locks to release! tg3_init_one() doesn't call tg3_halt() under normal situation but during kexec kdump I hit this problem. Fixes: 932f19de ("tg3: Release tp->lock before invoking synchronize_irq()") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge tag 'wireless-drivers-for-davem-2015-02-11' of ↵David S. Miller2015-02-121-4/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers rtlwifi: * remove superfluous warning message which is not needed anymore Signed-off-by: David S. Miller <davem@davemloft.net>
| | * rtlwifi: Remove logging statement that is no longer neededLarry Finger2015-02-101-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit e9538cf4f907 ("rtlwifi: Fix error when accessing unmapped memory in skb"), a printk was included to indicate that the condition had been reached. There is now enough evidence from other users that the fix is working. That logging statement can now be removed. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Stable <stable@vger.kernel.org> [V3.18] Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
| * | bgmac: fix device initialization on Northstar SoCs (condition typo)Rafał Miłecki2015-02-121-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | On Northstar (Broadcom's ARM architecture) we need to manually enable all cores. Code for that is already in place, but the condition for it was wrong. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: Delete existing multicast MAC list before adding newShahed Shaikh2015-02-123-13/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Driver keeps adding multicast addresses without deleting removed MACs and worrying about adapters filter limit. This results into actual count of programmed multicast addresses get accumulated over the time and overruns the adapter's filter limit without putting device in ACCEPT_ALL_MULTI mode. This causes newly added multicast traffic to fail after the sequence of addition - deletion in certain pattern. This issue is seen only when netdev's mcast list count is less than adapters mcast filter limit. e.g. If adapters multicast filter limit is 38 per function then following sequence would result in multicast traffic failure for newly added MACs. - add less than 38 multicast MACs - remove previously added multicast MACs - add new multicast MACs (less than 38) Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/mlx5_core: Fix configuration of log_uar_page_szEli Cohen2015-02-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The current code failed to configure the page size for architectures with page size different than 4K - PPC for example. Signed-off-by: Carol L Soto <clsoto@us.ibm.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sunvnet: don't change gso data on clonesDavid L Stevens2015-02-121-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch unclones an skb for the case where the sunvnet driver needs to change the segmentation size so that it doesn't interfere with TCP SACK's use of them. Signed-off-by: David L Stevens <david.stevens@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | drivers/net: Use setup_timer and mod_timerVaishali Thakkar2015-02-121-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the use of functions setup_timer and mod_timer. This is done using Coccinelle and semantic patch used for this as follows: // <smpl> @@ expression x,y,z,a,b; @@ -init_timer (&x); +setup_timer (&x, y, z); +mod_timer (&a, b); -x.function = y; -x.data = z; -x.expires = b; -add_timer(&a); // </smpl> Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | drivers: net: xgene: Make xgene_enet_of_match depend on CONFIG_OFGeert Uytterhoeven2015-02-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | If CONFIG_NET_XGENE=y but CONFIG_OF=n: drivers/net/ethernet/apm/xgene/xgene_enet_main.c:1033: warning: ‘xgene_enet_of_match’ defined but not used Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | et131x: use msecs_to_jiffies for conversionsNicholas Mc Guire2015-02-121-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is only an API consolidation and should make things more readable. Converting milliseconds to jiffies by "val * HZ / 1000" is technically OK but msecs_to_jiffies(val) is the cleaner solution and handles all corner cases correctly. This is a minor API cleanup only. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Acked-by: Mark Einon <mark.einon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'rco_correctness'David S. Miller2015-02-1211-45/+160
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tom Herbert says: ==================== net: Fixes to remote checksum offload and CHECKSUM_PARTIAL This patch set fixes a correctness problem with remote checksum offload, clarifies the meaning of CHECKSUM_PARTIAL, and allows remote checksum offload to set CHECKSUM_PARTIAL instead of calling csum_partial and modifying the checksum. Specifically: - In the GRO remote checksum path, restore the checksum after calling lower layer GRO functions. This is needed if the packet is forwarded off host with the Remote Checksum Offload option still present. - Clarify meaning of CHECKSUM PARTIAL in the receive path. Only the checksums referred to by checksum partial and any preceding checksums can be considered verified. - Fixes to UDP tunnel GRO complete. Need to set SKB_GSO_UDP_TUNNEL_*, SKB_GSO_TUNNEL_REMCSUM, and skb->encapsulation for forwarding case. - Infrastructure to allow setting of CHECKSUM_PARTIAL in remote checksum offload. This a potential performance benefit instead of calling csum_partial (potentially twice, once in GRO path and once in normal path). The downside of using CHECKSUM_PARTIAL and not actually writing the checksum is that we aren't verifying that the sender correctly wrote the pseudo checksum into the checksum field, or that the start/offset values actually point to a checksum. If the sender did not set up these fields correctly, a packet might be accepted locally, but not accepted by a peer when the packet is forwarded off host. Verifying these fields seems non-trivial, and because the fields can only be incorrect due to sender error and not corruption (outer checksum protects against that) we'll make use of CHECKSUM_PARTIAL the default. This behavior can be reverted as an netlink option on the encapsulation socket. - Change VXLAN and GUE to set CHECKSUM_PARTIAL in remote checksum offload by default, configuration hooks can revert to using csum_partial. Testing: I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200 streams for GRE/GUE and for VXLAN. This compares before the fixes, the fixes with not setting checksum partial in remote checksum offload, and with the fixes setting checksum partial. The overall effect seems be that using checksum partial is a slight performance win, perf definitely shows a significant reduction of time in csum_partial on the receive CPUs. GRE/GUE TCP_STREAM Before fixes 9.22% TX CPU utilization 13.57% RX CPU utilization 9133 Mbps Not using checksum partial 9.59% TX CPU utilization 14.95% RX CPU utilization 9132 Mbps Using checksum partial 9.37% TX CPU utilization 13.89% RX CPU utilization 9132 Mbps TCP_RR Before fixes CPU utilization 159/251/447 90/95/99% latencies 1.1462e+06 tps Not using checksum partial 92.94% CPU utilization 158/253/445 90/95/99% latencies 1.12988e+06 tps Using checksum partial 92.78% CPU utilization 158/250/450 90/95/99% latencies 1.15343e+06 tps VXLAN TCP_STREAM Before fixes 9.24% TX CPU utilization 13.74% RX CPU utilization 9093 Mbps Not using checksum partial 9.95% TX CPU utilization 14.66% RX CPU utilization 9094 Mbps Using checksum partial 10.24% TX CPU utilization 13.32% RX CPU utilization 9093 Mbps TCP_RR Before fixes 92.91% CPU utilization 151/241/437 90/95/99% latencies 1.15939e+06 tps Not using checksum partial 93.07% CPU utilization 156/246/425 90/95/99% latencies 1.1451e+06 tps Using checksum partial 95.51% CPU utilization 156/249/459 90/95/99% latencies 1.17004e+06 tps ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | gue: Use checksum partial with remote checksum offloadTom Herbert2015-02-122-6/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change remote checksum handling to set checksum partial as default behavior. Added an iflink parameter to configure not using checksum partial (calling csum_partial to update checksum). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | vxlan: Use checksum partial with remote checksum offloadTom Herbert2015-02-123-7/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change remote checksum handling to set checksum partial as default behavior. Added an iflink parameter to configure not using checksum partial (calling csum_partial to update checksum). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: Infrastructure for CHECKSUM_PARTIAL with remote checsum offloadTom Herbert2015-02-125-6/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds infrastructure so that remote checksum offload can set CHECKSUM_PARTIAL instead of calling csum_partial and writing the modfied checksum field. Add skb_remcsum_adjust_partial function to set an skb for using CHECKSUM_PARTIAL with remote checksum offload. Changed skb_remcsum_process and skb_gro_remcsum_process to take a boolean argument to indicate if checksum partial can be set or the checksum needs to be modified using the normal algorithm. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: Use more bit fields in napi_gro_cbTom Herbert2015-02-121-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the free and same_flow fields to be bit fields (2 and 1 bit sized respectively). This frees up some space for u16's. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | udp: Set SKB_GSO_UDP_TUNNEL* in UDP GRO pathTom Herbert2015-02-122-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Properly set GSO types and skb->encapsulation in the UDP tunnel GRO complete so that packets are properly represented for GSO. This sets SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM depending on whether non-zero checksums were received, and sets SKB_GSO_TUNNEL_REMCSUM if the remote checksum option was processed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: Clarify meaning of CHECKSUM_PARTIAL for receive pathTom Herbert2015-02-122-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current meaning of CHECKSUM_PARTIAL for validating checksums is that _all_ checksums in the packet are considered valid. However, in the manner that CHECKSUM_PARTIAL is set only the checksum at csum_start+csum_offset and any preceding checksums may be considered valid. If there are checksums in the packet after csum_offset it is possible they have not been verfied. This patch changes CHECKSUM_PARTIAL logic in skb_csum_unnecessary and __skb_gro_checksum_validate_needed to only considered checksums referring to csum_start and any preceding checksums (with starting offset before csum_start) to be verified. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: Fix remcsum in GRO path to not change packetTom Herbert2015-02-124-22/+47
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remote checksum offload processing is currently the same for both the GRO and non-GRO path. When the remote checksum offload option is encountered, the checksum field referred to is modified in the packet. So in the GRO case, the packet is modified in the GRO path and then the operation is skipped when the packet goes through the normal path based on skb->remcsum_offload. There is a problem in that the packet may be modified in the GRO path, but then forwarded off host still containing the remote checksum option. A remote host will again perform RCO but now the checksum verification will fail since GRO RCO already modified the checksum. To fix this, we ensure that GRO restores a packet to it's original state before returning. In this model, when GRO processes a remote checksum option it still changes the checksum per the algorithm but on return from lower layer processing the checksum is restored to its original value. In this patch we add define gro_remcsum structure which is passed to skb_gro_remcsum_process to save offset and delta for the checksum being changed. After lower layer processing, skb_gro_remcsum_cleanup is called to restore the checksum before returning from GRO. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | openvswitch: Add missing initialization in validate_and_copy_set_tun()Geert Uytterhoeven2015-02-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net/openvswitch/flow_netlink.c: In function ‘validate_and_copy_set_tun’: net/openvswitch/flow_netlink.c:1749: warning: ‘err’ may be used uninitialized in this function If ipv4_tun_from_nlattr() returns a different positive value than OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, err will be uninitialized, and validate_and_copy_set_tun() may return an undefined value instead of a zero success indicator. Initialize err to zero to fix this. Fixes: 1dd144cf5b4b47e1 ("openvswitch: Support VXLAN Group Policy extension") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | openvswitch: Reset key metadata for packet execution.Pravin B Shelar2015-02-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace packet execute command pass down flow key for given packet. But userspace can skip some parameter with zero value. Therefore kernel needs to initialize key metadata to zero. Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | treewide: Remove unnecessary SSB_DEVTABLE_END macroJoe Perches2015-02-116-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use the normal {} instead of a macro to terminate an array. Remove the macro too. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | treewide: Remove unnecessary BCMA_CORETABLE_END macroJoe Perches2015-02-116-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use the normal {} instead of a macro to terminate an array. Remove the macro too. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | rds: rds_cong_queue_updates needs to defer the congestion update transmissionSowmini Varadhan2015-02-111-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the RDS transport is TCP, we cannot inline the call to rds_send_xmit from rds_cong_queue_update because (a) we are already holding the sock_lock in the recv path, and will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock lock (b) cong_queue_update does an irqsave on the rds_cong_lock, and this will trigger warnings (for a good reason) from functions called out of sock_lock. This patch reverts the change introduced by 2fa57129d ("RDS: Bypass workqueue when queueing cong updates"). The patch has been verified for both RDS/TCP as well as RDS/RDMA to ensure that there are not regressions for either transport: - for verification of RDS/TCP a client-server unit-test was used, with the server blocked in gdb and thus unable to drain its rcvbuf, eventually triggering a RDS congestion update. - for RDS/RDMA, the standard IB regression tests were used Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: Partial checksum only UDP packetsVlad Yasevich2015-02-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip6_append_data is used by other protocols and some of them can't be partially checksummed. Only partially checksum UDP protocol. Fixes: 32dce968dd987a (ipv6: Allow for partial checksums on non-ufo packets) Reported-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2015-02-112-6/+58
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains two small Netfilter updates for your net-next tree, they are: 1) Add ebtables support to nft_compat, from Arturo Borrero. 2) Fix missing validation of the SET_ID attribute in the lookup expressions, from Patrick McHardy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | netfilter: nft_lookup: add missing attribute validation for NFTA_LOOKUP_SET_IDPatrick McHardy2015-01-301-0/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | netfilter: nft_compat: add ebtables supportArturo Borrero2015-01-301-6/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch extends nft_compat to support ebtables extensions. ebtables verdict codes are translated to the ones used by the nf_tables engine, so we can properly use ebtables target extensions from nft_compat. This patch extends previous work by Giuseppe Longo <giuseppelng@gmail.com>. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | Merge tag 'md/3.20-fixes' of git://neil.brown.name/mdLinus Torvalds2015-02-183-3/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull md bugfixes from Neil Brown: "Three bug md fixes for 3.20 yet-another-livelock in raid5, and a problem with write errors to 4K-block devices" * tag 'md/3.20-fixes' of git://neil.brown.name/md: md/raid5: Fix livelock when array is both resyncing and degraded. md/raid10: round up to bdev_logical_block_size in narrow_write_error. md/raid1: round up to bdev_logical_block_size in narrow_write_error
| * | | | md/raid5: Fix livelock when array is both resyncing and degraded.NeilBrown2015-02-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a7854487cd7128a30a7f4f5259de9f67d5efb95f: md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write. Causes an RCW cycle to be forced even when the array is degraded. A degraded array cannot support RCW as that requires reading all data blocks, and one may be missing. Forcing an RCW when it is not possible causes a live-lock and the code spins, repeatedly deciding to do something that cannot succeed. So change the condition to only force RCW on non-degraded arrays. Reported-by: Manibalan P <pmanibalan@amiindia.co.in> Bisected-by: Jes Sorensen <Jes.Sorensen@redhat.com> Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de> Fixes: a7854487cd7128a30a7f4f5259de9f67d5efb95f Cc: stable@vger.kernel.org (v3.7+)
| * | | | md/raid10: round up to bdev_logical_block_size in narrow_write_error.NeilBrown2015-02-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RAID10 version of earlier fix for RAID1. We must never initiate IO with sizes less that logical_block_size. Signed-off-by: NeilBrown <neilb@suse.de>
| * | | | md/raid1: round up to bdev_logical_block_size in narrow_write_errorNate Dailey2015-02-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This modifies raid1's narrow_write_error to round up block_sectors to the device's logical block size. This prevents sd complaining about "Bad block number requested" for non-512-byte sector disks. Signed-off-by: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | | Merge tag 'please-pull-fixmcelog' of ↵Linus Torvalds2015-02-181-4/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull mcelog regression fix from Tony Luck: "Fix regression - functions on the mce notifier chain should not be able to decide that an event should not be logged" * tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Fix regression. All error records should report via /dev/mcelog