linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next	David S. Miller	2014-10-06	32	-440/+699
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains another batch with Netfilter/IPVS updates for net-next, they are: 1) Add abstracted ICMP codes to the nf_tables reject expression. We introduce four reasons to reject using ICMP that overlap in IPv4 and IPv6 from the semantic point of view. This should simplify the maintainance of dual stack rule-sets through the inet table. 2) Move nf_send_reset() functions from header files to per-family nf_reject modules, suggested by Patrick McHardy. 3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the code now that br_netfilter can be modularized. Convert remaining spots in the network stack code. 4) Use rcu_barrier() in the nf_tables module removal path to ensure that we don't leave object that are still pending to be released via call_rcu (that may likely result in a crash). 5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad) idea was to probe the word size based on the xtables match/target info size, but this assumption is wrong when you have to dump the information back to userspace. 6) Allow to filter from prerouting and postrouting in the nf_tables bridge. In order to emulate the ebtables NAT chains (which are actually simple filter chains with no special semantics), we have support filtering from this hooks too. 7) Add explicit module dependency between xt_physdev and br_netfilter. This provides a way to detect if the user needs br_netfilter from the configuration path. This should reduce the breakage of the br_netfilter modularization. 8) Cleanup coding style in ip_vs.h, from Simon Horman. 9) Fix crash in the recently added nf_tables masq expression. We have to register/unregister the notifiers to clean up the conntrack table entries from the module init/exit path, not from the rule addition / deletion path. From Arturo Borrero. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netfilter: nft_masq: register/unregister notifiers on module init/exit	Arturo Borrero	2014-10-03	2	-46/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have to register the notifiers in the masquerade expression from the the module _init and _exit path. This fixes crashes when removing the masquerade rule with no ipt_MASQUERADE support in place (which was masking the problem). Fixes: 9ba1f72 ("netfilter: nf_tables: add new nft_masq expression") Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| *	ipvs: Clean up comment style in ip_vs.h	Simon Horman	2014-10-02	1	-139/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Consistently use the multi-line comment style for networking code: /* This * That * The other thing / Use single-line comment style for comments with only one line of text. * In general follow the leading '' of each line of a comment with a single space and then text. Add missing line break between functions, remove double line break, align comments to previous lines whenever possible. Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| *	netfilter: explicit module dependency between br_netfilter and physdev	Pablo Neira Ayuso	2014-10-02	3	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	You can use physdev to match the physical interface enslaved to the bridge device. This information is stored in skb->nf_bridge and it is set up by br_netfilter. So, this is only available when iptables is used from the bridge netfilter path. Since 34666d4 ("netfilter: bridge: move br_netfilter out of the core"), the br_netfilter code is modular. To reduce the impact of this change, we can autoload the br_netfilter if the physdev match is used since we assume that the users need br_netfilter in place. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| *	netfilter: nf_tables: allow to filter from prerouting and postrouting	Pablo Neira Ayuso	2014-10-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows us to emulate the NAT table in ebtables, which is actually a plain filter chain that hooks at prerouting, output and postrouting. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| *	netfilter: nft_compat: remove incomplete 32/64 bits arch compat code	Pablo Neira Ayuso	2014-10-02	1	-101/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This code was based on the wrong asumption that you can probe based on the match/target private size that we get from userspace. This doesn't work at all when you have to dump the info back to userspace since you don't know what word size the userspace utility is using. Currently, the extensions that require arch compat are limit match and the ebt_mark match/target. The standard targets are not used by the nft-xt compat layer, so they are not affected. We can work around this limitation with a new revision that uses arch agnostic types. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| *	netfilter: nf_tables: wait for call_rcu completion on module removal	Pablo Neira Ayuso	2014-10-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make sure the objects have been released before the nf_tables modules is removed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| *	netfilter: use IS_ENABLED(CONFIG_BRIDGE_NETFILTER)	Pablo Neira Ayuso	2014-10-02	10	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In 34666d4 ("netfilter: bridge: move br_netfilter out of the core"), the bridge netfilter code has been modularized. Use IS_ENABLED instead of ifdef to cover the module case. Fixes: 34666d4 ("netfilter: bridge: move br_netfilter out of the core") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| *	netfilter: move nf_send_resetX() code to nf_reject_ipvX modules	Pablo Neira Ayuso	2014-10-02	7	-117/+309
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and nf_reject_ipv6 respectively. This code is shared by x_tables and nf_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| *	netfilter: nft_reject: introduce icmp code abstraction for inet and bridge	Pablo Neira Ayuso	2014-10-02	7	-17/+241
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides an abstraction to the ICMP and ICMPv6 codes that you can use from the inet and bridge tables, they are: * NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable * NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable * NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable * NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratevely prohibited You can still use the specific codes when restricting the rule to match the corresponding layer 3 protocol. I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have different semantics depending on the table family and to allow the user to specify ICMP family specific codes if they restrict it to the corresponding family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* \|	Merge branch 'bridge_default_pvid'	David S. Miller	2014-10-06	5	-12/+186
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Vladislav Yasevich says: ==================== bridge: Add vlan filtering support for default pvid This series adds default pvid support to vlan filtering in the bridge. VLAN 1 (as recommended by 802.1q spec) is used as default pvid on ports. Then the user can over-ride this configuration by configuring their own vlan information. The user can additionally change the default value through the sysfs interface (netlink coming shortly). The user can turn off default pvid functionality by setting default pvid to 0. This series changes the default behavior of the bridge when vlan filtering is turned on. Currently, ports without any vlan filtering configured will not recevie any traffic at all. This patch changes the behavior of the above ports to receive only untagged traffic. Since v3: - allocated 'changed' bitmap on the heap and re-arrange code to clean it up. - remove extra blank lines. - Fix patch1 to build by itself. - Fix error recover to not add vlan 0. - Restructure nbp_vlan_init to remove uneeded variable. Since v2: - Fix handling of invalid values in sysfs interface. - Add some additional log messages. - Fix default_pvid handling when vlan filtering is compiled out. - Fix sparse issues with new code. - Fix how we located the old default pvid (added a helper function). Since v1: - Add ability to turn off default_pvid settings. - Drop the automiatic filtering support based on configured vlan devices (will be its own series) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	bridge: Add filtering support for default_pvid	Vlad Yasevich	2014-10-06	4	-7/+136
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently when vlan filtering is turned on on the bridge, the bridge will drop all traffic untill the user configures the filter. This isn't very nice for ports that don't care about vlans and just want untagged traffic. A concept of a default_pvid was recently introduced. This patch adds filtering support for default_pvid. Now, ports that don't care about vlans and don't define there own filter will belong to the VLAN of the default_pvid and continue to receive untagged traffic. This filtering can be disabled by setting default_pvid to 0. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	bridge: Simplify pvid checks.	Vlad Yasevich	2014-10-06	2	-7/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, if the pvid is not set, we return an illegal vlan value even though the pvid value is set to 0. Since pvid of 0 is currently invalid, just return 0 instead. This makes the current and future checks simpler. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	bridge: Add a default_pvid sysfs attribute	Vlad Yasevich	2014-10-06	3	-0/+48
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows the user to set and retrieve default_pvid value. A new value can only be stored when vlan filtering is disabled. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: pxa168_eth: avoid using signed char for bitops	Antoine Ténart	2014-10-06	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signedness bugs may occur when using signed char for bitops, depending on if the highest bit is ever used. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'isdn-next'	David S. Miller	2014-10-06	3	-13/+7
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tilman Schmidt says: ==================== ISDN patches for net-next Here's a series of patches for the ISDN CAPI subsystem and the Gigaset ISDN driver. Please merge via net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	isdn/gigaset: use USB API function usb_endpoint_num()	Tilman Schmidt	2014-10-06	1	-7/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use function usb_endpoint_num() for the bulk endpoint and store the endpoint number in the cardstate structure instead of the raw endpoint address value. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	isdn/gigaset: drop unused cardstate structure member	Tilman Schmidt	2014-10-06	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Field int_in_endpointAddr was set but never used. Drop it. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	isdn/gigaset: improve error handling when leaving DLE mode	Tilman Schmidt	2014-10-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid cascading warnings when leaving DLE mode fails by clearing the DLE flag before entering recovery. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	isdn/capi: drop two dead if branches	Tilman Schmidt	2014-10-06	1	-3/+0
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The last branch in command_2_index() cannot be reached since c==0xff is already caught by the first "if". The empty second branch makes no difference since no other branch will be taken for c<0x0f. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: sched: suspicious RCU usage in qdisc_watchdog	John Fastabend	2014-10-05	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Suspicious RCU usage in qdisc_watchdog call needs to be done inside rcu_read_lock/rcu_read_unlock. And then Qdisc destroy operations need to ensure timer is cancelled before removing qdisc structure. [ 3992.191339] =============================== [ 3992.191340] [ INFO: suspicious RCU usage. ] [ 3992.191343] 3.17.0-rc6net-next+ #72 Not tainted [ 3992.191345] ------------------------------- [ 3992.191347] include/net/sch_generic.h:272 suspicious rcu_dereference_check() usage! [ 3992.191348] [ 3992.191348] other info that might help us debug this: [ 3992.191348] [ 3992.191351] [ 3992.191351] rcu_scheduler_active = 1, debug_locks = 1 [ 3992.191353] no locks held by swapper/1/0. [ 3992.191355] [ 3992.191355] stack backtrace: [ 3992.191358] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-rc6net-next+ #72 [ 3992.191360] Hardware name: /DZ77RE-75K, BIOS GAZ7711H.86A.0060.2012.1115.1750 11/15/2012 [ 3992.191362] 0000000000000001 ffff880235803e48 ffffffff8178f92c 0000000000000000 [ 3992.191366] ffff8802322224a0 ffff880235803e78 ffffffff810c9966 ffff8800a5fe3000 [ 3992.191370] ffff880235803f30 ffff8802359cd768 ffff8802359cd6e0 ffff880235803e98 [ 3992.191374] Call Trace: [ 3992.191376] <IRQ> [<ffffffff8178f92c>] dump_stack+0x4e/0x68 [ 3992.191387] [<ffffffff810c9966>] lockdep_rcu_suspicious+0xe6/0x130 [ 3992.191392] [<ffffffff8167213a>] qdisc_watchdog+0x8a/0xb0 [ 3992.191396] [<ffffffff810f93f2>] __run_hrtimer+0x72/0x420 [ 3992.191399] [<ffffffff810f9bcd>] ? hrtimer_interrupt+0x7d/0x240 [ 3992.191403] [<ffffffff816720b0>] ? tc_classify+0xc0/0xc0 [ 3992.191406] [<ffffffff810f9c4f>] hrtimer_interrupt+0xff/0x240 [ 3992.191410] [<ffffffff8109e4a5>] ? __atomic_notifier_call_chain+0x5/0x140 [ 3992.191415] [<ffffffff8103577b>] local_apic_timer_interrupt+0x3b/0x60 [ 3992.191419] [<ffffffff8179c2b5>] smp_apic_timer_interrupt+0x45/0x60 [ 3992.191422] [<ffffffff8179a6bf>] apic_timer_interrupt+0x6f/0x80 [ 3992.191424] <EOI> [<ffffffff815ed233>] ? cpuidle_enter_state+0x73/0x2e0 [ 3992.191432] [<ffffffff815ed22e>] ? cpuidle_enter_state+0x6e/0x2e0 [ 3992.191437] [<ffffffff815ed567>] cpuidle_enter+0x17/0x20 [ 3992.191441] [<ffffffff810c0741>] cpu_startup_entry+0x3d1/0x4a0 [ 3992.191445] [<ffffffff81106fc6>] ? clockevents_config_and_register+0x26/0x30 [ 3992.191448] [<ffffffff81033c16>] start_secondary+0x1b6/0x260 Fixes: b26b0d1e8b1 ("net: qdisc: use rcu prefix and silence sparse warnings") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: dsa: do not call phy_start_aneg	Florian Fainelli	2014-10-05	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit f7f1de51edbd ("net: dsa: start and stop the PHY state machine") add calls to phy_start() in dsa_slave_open() respectively phy_stop() in dsa_slave_close(). We also call phy_start_aneg() in dsa_slave_create(), and this call is messing up with the PHY state machine, since we basically start the auto-negotiation, and later on restart it when calling phy_start(). phy_start() does not currently handle the PHY_FORCING or PHY_AN states properly, but such a fix would be too invasive for this window. Fixes: f7f1de51edbd ("net: dsa: start and stop the PHY state machine") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Removed unused inet6 address state	Sébastien Barré	2014-10-05	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the inet6 state INET6_IFADDR_STATE_UP only appeared in its definition. Cc: Christoph Paasch <christoph.paasch@uclouvain.be> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: Cleanup skb cloning by adding SKB_FCLONE_FREE	Vijay Subramanian	2014-10-05	2	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SKB_FCLONE_UNAVAILABLE has overloaded meaning depending on type of skb. 1: If skb is allocated from head_cache, it indicates fclone is not available. 2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates it is available to be used. To avoid confusion for case 2 above, this patch replaces SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone companion skbs, this indicates it is free for use. SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and cannot / will not have a companion fclone. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	mlx4: add a new xmit_more counter	Eric Dumazet	2014-10-05	4	-8/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ethtool -S reports a new counter, tracking number of time doorbell was not triggered, because skb->xmit_more was set. $ ethtool -S eth0 \| egrep "tx_packet\|xmit_more" tx_packets: 2413288400 xmit_more: 666121277 I merged the tso_packet false sharing avoidance in this patch as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'gudp'	David S. Miller	2014-10-04	8	-40/+235
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tom Herbert says: ==================== net: Generic UDP Encapsulation Generic UDP Encapsulation (GUE) is UDP encapsulation protocol which encapsulates packets of various IP protocols. The GUE protocol is described in http://tools.ietf.org/html/draft-herbert-gue-01. The receive path of GUE is implemented in the FOU over UDP module (FOU). This includes a UDP encap receive function for GUE as well as GUE specific GRO functions. Management and configuration of GUE ports shares most of the same code with FOU. For the transmit path, the previous FOU support for IPIP, sit, and GRE was simply extended for GUE (when GUE is enabled insert the GUE header on transmit in addition to UDP header inserted for FOU). Semantically GUE is the same as FOU in that the encapsulation (UDP and GUE headers) that are inserted on transmission and removed on reception so that IP packet is processed with the inner header. This patch set includes: - Some fixes to FOU, removal of IPv4,v6 specific GRO functions - Support to configure a GUE receive port - Implementation of GUE receive path (normal and GRO) - Additions to ip_tunnel netlink to configure GUE - GUE header inserion in ip_tunnel transmit path v2: - Include net/gue.h in patch set Testing: I ran performance numbers using netperf TCP_RR with 200 streams, comparing encapsulation without GUE, encapsulation with GUE, and encapsulation with FOU. GRE TCP_STREAM IPv4, FOU, UDP checksum enabled 14.04% TX CPU utilization 13.17% RX CPU utilization 9211 Mbps IPv4, GUE, UDP checksum enabled 14.99% TX CPU utilization 13.79% RX CPU utilization 9185 Mbps IPv4, FOU, UDP checksum disabled 13.14% TX CPU utilization 23.18% RX CPU utilization 9277 Mbps IPv4, GUE, UDP checksum disabled 13.66% TX CPU utilization 23.57% RX CPU utilization 9184 Mbps TCP_RR IPv4, FOU, UDP checksum enabled 94.2% CPU utilization 155/249/460 90/95/99% latencies 1.17018e+06 tps IPv4, GUE, UDP checksum enabled 93.9% CPU utilization 158/253/472 90/95/99% latencies 1.15045e+06 tps IPIP TCP_STREAM FOU, UDP checksum enabled 15.28% TX CPU utilization 13.92% RX CPU utilization 9342 Mbps GUE, UDP checksum enabled 13.99% TX CPU utilization 13.34% RX CPU utilization 9210 Mbps FOU, UDP checksum disabled 15.08% TX CPU utilization 24.64% RX CPU utilization 9226 Mbps GUE, UDP checksum disabled 15.90% TX CPU utilization 24.77% RX CPU utilization 9197 Mbps TCP_RR FOU, UDP checksum enabled 94.23% CPU utilization 149/237/429 90/95/99% latencies 1.19553e+06 tps GUE, UDP checksum enabled 93.75% CPU utilization 152/243/442 90/95/99% latencies 1.17027e+06 tps SIT TCP_STREAM FOU, UDP checksum enabled 14.47% TX CPU utilization 14.58% RX CPU utilization 9106 Mbps GUE, UDP checksum enabled 15.09% TX CPU utilization 14.84% RX CPU utilization 9080 Mbps FOU, UDP checksum disabled 15.70% TX CPU utilization 27.93% RX CPU utilization 9097 Mbps GUE, UDP checksum disabled 15.04% TX CPU utilization 27.54% RX CPU utilization 9073 Mbps TCP_RR FOU, UDP checksum enabled 96.9% CPU utilization 170/281/581 90/95/99% latencies 1.03372e+06 tps GUE, UDP checksum enabled 97.16% CPU utilization 172/286/576 90/95/99% latencies 1.00469e+06 tps ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ip_tunnel: Add GUE support	Tom Herbert	2014-10-04	2	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows configuring IPIP, sit, and GRE tunnels to use GUE. This is very similar to fou excpet that we need to insert the GUE header in addition to the UDP header on transmit. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	gue: Receive side for Generic UDP Encapsulation	Tom Herbert	2014-10-04	3	-9/+217
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support receiving for GUE packets in the fou module. The fou module now supports direct foo-over-udp (no encapsulation header) and GUE. To support this a type parameter is added to the fou netlink parameters. For a GUE socket we define gue_udp_recv, gue_gro_receive, and gue_gro_complete to handle the specifics of the GUE protocol. Most of the code to manage and configure sockets is common with the fou. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	fou: eliminate IPv4,v6 specific GRO functions	Tom Herbert	2014-10-04	4	-40/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes fou[46]_gro_receive and fou[46]_gro_complete functions. The v4 or v6 variants were chosen for the UDP offloads based on the address family of the socket this is not necessary or correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb. This is set in udp6_gro_receive and unset in udp4_gro_receive. In fou_gro_receive the value is used to select the correct inet_offloads for the protocol of the outer IP header. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ip_tunnel: Account for secondary encapsulation header in max_headroom	Tom Herbert	2014-10-04	1	-1/+1
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When adjusting max_header for the tunnel interface based on egress device we need to account for any extra bytes in secondary encapsulation (e.g. FOU). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: do not export skb_gro_receive()	Eric Dumazet	2014-10-04	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	skb_gro_receive() is only called from tcp_gro_receive() which is not in a module. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	drivers/net/irda/Kconfig: Let SH_IRDA depend on HAS_IOMEM	Chen Gang	2014-10-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SH_IRDA needs HAS_IOMEM, so depend on it. The related error(with allmodconfig under um): CC [M] drivers/net/irda/sh_irda.o drivers/net/irda/sh_irda.c: In function ‘sh_irda_probe’: drivers/net/irda/sh_irda.c:776:2: error: implicit declaration of function ‘ioremap_nocache’ [-Werror=implicit-function-declaration] self->membase = ioremap_nocache(res->start, resource_size(res)); ^ drivers/net/irda/sh_irda.c:776:16: warning: assignment makes pointer from integer without a cast [enabled by default] self->membase = ioremap_nocache(res->start, resource_size(res)); ^ drivers/net/irda/sh_irda.c:821:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration] iounmap(self->membase); ^ Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	drivers/net/ethernet/marvell/Kconfig: Let PXA168_ETH depend on HAS_IOMEM	Chen Gang	2014-10-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PXA168_ETH need HAS_IOMEM, so depend on it, the related error (with allmodconfig under um): CC [M] drivers/net/ethernet/marvell/pxa168_eth.o drivers/net/ethernet/marvell/pxa168_eth.c: In function ‘pxa168_eth_probe’: drivers/net/ethernet/marvell/pxa168_eth.c:1605:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration] iounmap(pep->base); ^ Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	drivers/net/dsa/Kconfig: Let NET_DSA_BCM_SF2 depend on HAS_IOMEM	Chen Gang	2014-10-04	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	NET_DSA_BCM_SF2 need HAS_IOMEM, so depend on it, the related error (with allmodconfig under um): CC [M] drivers/net/dsa/bcm_sf2.o drivers/net/dsa/bcm_sf2.c: In function ‘bcm_sf2_sw_setup’: drivers/net/dsa/bcm_sf2.c:487:3: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration] iounmap(*base); ^ Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	drivers/net/can/Kconfig: Let CAN_AT91 depend on HAS_IOMEM	Chen Gang	2014-10-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CAN_AT91 needs HAS_IOMEM, so depends on it. The related error (with allmodconfig under um): CC [M] drivers/net/can/at91_can.o drivers/net/can/at91_can.c: In function ‘at91_can_probe’: drivers/net/can/at91_can.c:1329:2: error: implicit declaration of function ‘ioremap_nocache’ [-Werror=implicit-function-declaration] addr = ioremap_nocache(res->start, resource_size(res)); ^ drivers/net/can/at91_can.c:1329:7: warning: assignment makes pointer from integer without a cast [enabled by default] addr = ioremap_nocache(res->start, resource_size(res)); ^ drivers/net/can/at91_can.c:1384:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration] iounmap(addr); ^ Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'master' of ↵	David S. Miller	2014-10-04	10	-77/+12
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-10-02 This series contains updates to fm10k, igb, ixgbe and i40e. Alex provides two updates to the fm10k driver. First reduces the buffer size to 2k for all page sizes, since most frames only have a 1500 MTU so supporting a buffer size larger than this is somewhat wasteful. Second fixes an issue where the number of transmit queues was not being updated, so added the lines necessary to update the number of transmit queues. Rick Jones provides two patches to convert ixgbe, igb and i40e to use dev_consume_skb_any(). Emil provides two patches for ixgbe, first cleans up a couple of wait loops on auto-negotiation that were not needed. Second fixes an issue reported by Fujitsu/Red Hat, which consolidates the logic behind the dynamically setting of TXDCTL.WTHRESH depending on interrupt throttle rate (ITR) setting regardless of BQL. Ethan Zhao provides a cleanup patch for ixgbe where he noticed a duplicate define. Bernhard Kaindl provides a patch for igb to remove a source of latency spikes by not calling code that uses mdelay() for feeding a PHY stat while being called with a spinlock held. Todd bumps the igb version based on the recent changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	igb: bump version to 5.2.15	Todd Fujinaka	2014-10-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bump version Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| * \|	i40e/igb: Convert to dev_consume_skb_any()	Rick Jones	2014-10-02	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert two more Intel NIC drivers to dev_consume_skb_any() to help make dropped packet profiling sane. Signed-off-by: Rick Jones <rick.jones2@hp.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| * \|	igb: remove blocking phy read from inside spinlock	Bernhard Kaindl	2014-10-02	3	-18/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove a source of latency spikes (in my case up to 10ms) by not calling code that uses mdelay() for feeding a phy statistic (rx errors for idle symbols - not data -> idle_errors) while being called with a spinlock held. As idle_errors isn't read, this patch only removes unused code and data. Later, more complicated changes may be applied to address the spinlock and allow for some PHY diagnostics by harvesting this PHY stats register fully. This patch is designed to fix the issue and be safe for longterm/stable. For the Intel e1000e driver, the same change was applied in 2008 with commit 23033fad5be0 ("e1000e: remove phy read from inside spinlock"). The mdelay is triggered by HW/SW semaphores, thus it depends on the HW. I've HW that triggers it even when idle. Others may trigger it only e.g. when Ethernet ports aquire or loose the link or on ifconfig up / down. We've noticed this first from delays in frame rx/tx due to the mdelay(). Example command for checking if the issue is triggered: cyclictest -Smp1 (Look for occasional "Max:" values > 4000 or use -b 4000 to stop if greater) It was observed with I350 ports connected to other I350 ports, but not if driver and EEPROM was modified to run the I350 in EEPROM-less mode. phy_stats.idle_errors and .receive_errors (isn't touched) occupy 64 not used bits in the adapter struct: Their allocation may be removed as well. Cc: Carolyn Wyborny <carolyn.wyborny@intel.com> Cc: Todd Fujinaka <todd.fujinaka@intel.com> Fixes: 12dcd86b75d5 ("igb: fix stats handling") (this added the spin_lock) Signed-off-by: Bernhard Kaindl <bk-linux@use.startmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| * \|	ixgbe: delete one duplicate marcro definition of IXGBE_MAX_L2A_QUEUES	Ethan Zhao	2014-10-02	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is typo in ixgbe.h, two marcro definition of IXGBE_MAX_L2A_QUEUES to 4, delete one, clear the compiler warning. Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| * \|	ixgbe: fix setting of TXDCTL.WTRHESH when ITR is set to 0 and no BQL	Emil Tantilov	2014-10-02	2	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch consolidates the logic behind dynamically setting TXDCTL.WTHRESH depending on interrupt throttle rate (ITR) setting regardless of BQL. Previously TXDCTL.WTHRESH was dynamically being set only with BQL being enabled, but we have to set it regardless of BQL when ITR is low to avoid Tx stalls/hangs. CC: John Greene <jogreene@redhat.com> Reported by: Masayuki Gouji <gouji.masayuki@jp.fujitsu.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| * \|	ixgbe: remove wait loop on autoneg for copper devices	Emil Tantilov	2014-10-02	1	-41/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes couple of wait loops on autoneg that are not needed. During validation we noticed that the loops always time out, so there should be no user impact. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| * \|	ixgbe: Convert the normal transmit complete path to dev_consume_skb_any()	Rick Jones	2014-10-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert the normal packet completion path to dev_consume_skb_any() so packet drop profiling via dropwatch or perf top -G -e skb_kfree_skb is not cluttered with false hits. Compile tested only. There is a dev_kfree_skb_any() in the routine ixgbe_ptp_tx_hwtstamp() in ixgbe_ptp.c that looks like a conversion candidate but I wasn't familiar enough with the code to pull the trigger. Signed-off-by: Rick Jones <rick.jones2@hp.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| * \|	fm10k: Correctly set the number of Tx queues	Alexander Duyck	2014-10-02	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The number of Tx queues was not being updated due to some issues when generating the patches. This change makes sure to add the lines necessary to update the number of Tx queues correctly. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| * \|	fm10k: Reduce buffer size when pages are larger than 4K	Alexander Duyck	2014-10-02	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change reduces the buffer size to 2K for all page sizes. The basic idea is that since most frames only have a 1500 MTU supporting a buffer size larger than this is somewhat wasteful. As such I have reduced the size to 2K for all page sizes which will allow for more uses per page. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* \| \|	Merge branch 'mlx5-next'	David S. Miller	2014-10-04	15	-455/+804
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Eli Cohen says: ==================== mlx5 update for 3.18 This series integrates a new mechanism for populating and extracting field values used in the driver/firmware interaction around command mailboxes. Changes from V1: - Remove unused definition of memcpy_cpu_to_be32() - Remove definitions of non_existent_*() and use BUILD_BUG_ON() instead. - Added a patch one line patch to add support for ConnectX-4 devices. Changes from V0: - trimmed the auto-generated file to a minimum, as required by the reviewers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net/mlx5_core: Add ConnectX-4 to list of supported devices	Eli Cohen	2014-10-04	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the upcoming ConnectX-4 device to the list of supported devices by then mlx5 driver. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net/mlx5_core: Identify resources by their type	Eli Cohen	2014-10-04	4	-25/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch puts a common part as the first field of mlx5_core_qp. This field is used to identify which resource generated an event. This is required since upcoming new resource types such as DC targets are allocated for the same numerical space as regular QPs and may generate the same events. By searching the resource in the same table we can then look at the common field to identify the resource. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net/mlx5_core: use set/get macros in device caps	Eli Cohen	2014-10-04	5	-168/+298
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Transform device capabilities related commands to use set/get macros to manipulate command mailboxes. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net/mlx5_core: Use hardware registers description header file	Eli Cohen	2014-10-04	5	-114/+188
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an auto generated header file that describes hardware registers along with set of macros that set/get values. The macros do static checks to avoid overflow, handle endianess, and overall provide a clean way to code commands. Currently the header file is small and we will add structs as we make use of the macros. A few commands were removed from the commands enum since they are not supported currently and will be added when support is available. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>