summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* batman-adv: iv_ogm_iface_enable, direct return valuesMarkus Pargmann2015-05-291-6/+2
| | | | | | | Directly return error values. No need to use a return variable. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
* batman-adv: Makefile, Sort alphabeticallyMarkus Pargmann2015-05-291-1/+1
| | | | | | | | The whole Makefile is sorted, just the multicast rule is not at the right position. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
* batman-adv: tvlv realloc, move error handling into if blockMarkus Pargmann2015-05-291-8/+8
| | | | | | | | Instead of hiding the normal function flow inside an if block, we should just put the error handling into the if block. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
* batman-adv: debugfs, avoid compiling for !DEBUG_FSMarkus Pargmann2015-05-293-9/+35
| | | | | | | | | | | | Normally the debugfs framework will return error pointer with -ENODEV for function calls when DEBUG_FS is not set. batman does not notice this error code and continues trying to create debugfs files and executes more code. We can avoid this code execution by disabling compiling debugfs.c when DEBUG_FS is not set. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
* batman-adv: Use only queued fragments when mergingSven Eckelmann2015-05-291-8/+5
| | | | | | | | | | | | | | | The fragment queueing code now validates the total_size of each fragment, checks when enough fragments are queued to allow to merge them into a single packet and if the fragments have the correct size. Therefore, it is not required to have any other parameter for the merging function than a list of queued fragments. This change should avoid problems like in the past when the different skb from the list and the function parameter were mixed incorrectly. Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
* batman-adv: Check total_size when queueing fragmentsSven Eckelmann2015-05-292-2/+7
| | | | | | | | | | | | | | | | | | The fragmentation code was replaced in 610bfc6bc99bc83680d190ebc69359a05fc7f605 ("batman-adv: Receive fragmented packets and merge") by an implementation which handles the queueing+merging of fragments based on their size and the total_size of the non-fragmented packet. This total_size is announced by each fragment. The new implementation doesn't check if the the total_size information of the packets inside one chain is consistent. This is consistency check is recommended to allow using any of the packets in the queue to decide whether all fragments of a packet are received or not. Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
* batman-adv: update copyright years for 2015Sven Eckelmann2015-05-2943-43/+43
| | | | | Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
* batman-adv: Start new development cycleSimon Wunderlich2015-05-291-1/+1
| | | | Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
* Merge branch 'master' of ↵David S. Miller2015-05-293-30/+28
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-05-28 1) Remove xfrm_queue_purge as this is the same as skb_queue_purge. 2) Optimize policy and state walk. 3) Use a sane return code if afinfo registration fails. 4) Only check fori a acquire state if the state is not valid. 5) Remove a unnecessary NULL check before xfrm_pol_hold as it checks the input for NULL. 6) Return directly if the xfrm hold queue is empty, avoid to take a lock as it is nothing to do in this case. 7) Optimize the inexact policy search and allow for matching of policies with priority ~0U. All from Li RongQing. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * xfrm: optimise to search the inexact policy listLi RongQing2015-05-181-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | The policies are organized into list by priority ascent of policy, so it is unnecessary to continue to loop the policy if the priority of current looped police is larger than or equal priority which is from the policy_bydst list. This allows to match policy with ~0U priority in inexact list too. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: move the checking for old xfrm_policy hold_queue to beginningLi RongQing2015-05-051-3/+3
| | | | | | | | | | | | | | | | if hold_queue of old xfrm_policy is NULL, return directly, then not need to run other codes, especially take the spin lock Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: remove the unnecessary checking before call xfrm_pol_holdLi RongQing2015-05-051-4/+3
| | | | | | | | | | | | | | xfrm_pol_hold will check its input with NULL Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: slightly optimise xfrm_inputLi RongQing2015-04-241-5/+5
| | | | | | | | | | | | | | | | Check x->km.state with XFRM_STATE_ACQ only when state is not XFRM_STAT_VALID, not everytime Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: fix the return code when xfrm_*_register_afinfo failedLi RongQing2015-04-233-3/+3
| | | | | | | | | | | | | | | | If xfrm_*_register_afinfo failed since xfrm_*_afinfo[afinfo->family] had the value, return the -EEXIST, not -ENOBUFS Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: optimise the use of walk list header in xfrm_policy/state_walkLi RongQing2015-04-232-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | The walk from input is the list header, and marked as dead, and will be skipped in loop. list_first_entry() can be used to return the true usable value from walk if walk is not empty Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: remove the xfrm_queue_purge definitionLi RongQing2015-04-231-10/+2
| | | | | | | | | | | | | | The task of xfrm_queue_purge is same as skb_queue_purge, so remove it Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | net: qlcnic: clean up sysfs error codesVladimir Zapolskiy2015-05-293-46/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace confusing QL_STATUS_INVALID_PARAM == -1 == -EPERM with -EINVAL and QLC_STATUS_UNSUPPORTED_CMD == -2 == -ENOENT with -EOPNOTSUPP, the latter error code is arguable, but it is already used in the driver, so let it be here as well. Also remove always false (!buf) check on read(), the driver should not care if userspace gets its EFAULT or not. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Acked-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | treewide: Add missing vmalloc.h inclusion.David S. Miller2015-05-287-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All of these files were only building on non-x86 because of the indirect of inclusion of vmalloc.h by, of all things, "net/inet_hashtables.h" None of this got caught during build testing, because on x86 there is an implicit vmalloc.h include via on of the arch asm/ headers. This fixes all of these Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp/dccp: warn user for preferred ip_local_port_rangeEric Dumazet2015-05-272-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 07f4c90062f8f ("tcp/dccp: try to not exhaust ip_local_port_range in connect()") it is advised to have an even number of ports described in /proc/sys/net/ipv4/ip_local_port_range This means start/end values should have a different parity. Let's warn sysadmins of this, so that they can update their settings if they want to. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: connect() from bound sockets can be fasterEric Dumazet2015-05-272-4/+13
| | | | | | | | | | | | | | | | | | | | | | __inet_hash_connect() does not use its third argument (port_offset) if socket was already bound to a source port. No need to perform useless but expensive md5 computations. Reported-by: Crestez Dan Leonard <cdleonard@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'cxgb4-next'David S. Miller2015-05-279-90/+170
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hariprasad Shenai says: ==================== cxgb4/cxgb4vf: Adds FL starvation support and cleanup This patch series adds the following. Adds debugfs entry to inject freelist starvation and some function and argument cleanup This patch series has been created against net-next tree and includes patches on cxgb4 and cxgb4vf driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. Thanks V2: Skipping patch "cxgb4: Add support for loopback between VI of same port". This needs some major code change, since module param is not recommended. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | cxgb4/cxgb4vf: function and argument name cleanupHariprasad Shenai2015-05-279-89/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes variable name 'fn' to 'pf' of structure adapter. A 'fn' usually stands for PCI function which could be a PF or a VF. However, the use of this particular variable is explicitly limited to PF only. So, be specific about it in the variable name. Also corrects arguments passed for fn t4_ofld_eq_free, t4_ctrl_eq_free, t4_eth_eq_free, t4_iq_free, t4_alloc_vi, t4_fw_hello, t4_wr_mbox and t4_cfg_pfvf function. Also renames cxgb4_t4_bar2_sge_qregs to t4_bar2_sge_qregs and renames the latter function name in cxgb4vf driver to t4vf_bar2_sge_qregs to avoid conflicts. Also fixes alignment for these function. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | cxgb4: Add debugfs facility to inject FL starvationHariprasad Shenai2015-05-274-1/+81
|/ / | | | | | | | | | | | | | | Add debugfs entry to inject Freelist starvation, used only for debugging purpose. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | qla4xxx: add a missing includeEric Dumazet2015-05-271-0/+1
| | | | | | | | | | | | | | | | | | | | vmalloc.h used to be included from include/net/inet_hashtables.h but it is no longer the case. Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumzet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'thunderx'David S. Miller2015-05-2717-0/+7382
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Aleksey Makarov says: ==================== Adding support for Cavium ThunderX network controller This patchset adds support for the Cavium ThunderX network controller. changes in v6: * unused preprocessor symbols were removed * reduce no of atomic operations in SQ maintenance * support for TCP segmentation at driver level * reset RBDR if fifo state is FAIL * fixed an issue with link state mailbox message changes in v5: * __packed were removed. now we rely on C language ABI * nic_dbg() -> netdev_dbg() * fixes for a typo, constant spelling and using BIT_ULL * use print_hex_dump() * unnecessary conditions in a long if() chain were removed changes in v4: * the patch "pci: Add Cavium PCI vendor id" was attributed correctly * a note that Cavium id is used in many drivers was added * the license comments now match MODULE_LICENSE * a comment explaining usage of writeq_relaxed()/readq_relaxed() was added changes in v3: * code cleanup * issues discovered by reviewers were addressed changes in v2: * non-generic module parameters removed * ethtool support added (nicvf_set_rxnfc()) v5: https://lkml.kernel.org/g/<1432344498-17131-1-git-send-email-aleksey.makarov@caviumnetworks.com> v4: https://lkml.kernel.org/g/<1432000757-28700-1-git-send-email-aleksey.makarov@auriga.com> v3: https://lkml.kernel.org/g/<1431747401-20847-1-git-send-email-aleksey.makarov@auriga.com> v2: https://lkml.kernel.org/g/<1415596445-10061-1-git-send-email-rric@kernel.org> v1: https://lkml.kernel.org/g/<20141030165434.GW20170@rric.localhost> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Adding support for Cavium ThunderX network controllerSunil Goutham2015-05-2716-0/+7380
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the Cavium ThunderX network controller. The driver is on the pci bus and thus requires the Thunder PCIe host controller driver to be enabled. Signed-off-by: Maciej Czekaj <mjc@semihalf.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Kamil Rytarowski <kamil@semihalf.com> Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com> Signed-off-by: Sruthi Vangala <svangala@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | pci: Add Cavium PCI vendor idSunil Goutham2015-05-271-0/+2
|/ / | | | | | | | | | | | | | | | | | | | | This vendor id will be used by network (vNIC), USB (xHCI), SATA (AHCI), GPIO, I2C, MMC and maybe other drivers for ThunderX SoC. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | test_bpf: add similarly conflicting jump test case only for classicDaniel Borkmann2015-05-271-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While 3b52960266a3 ("test_bpf: add more eBPF jump torture cases") added the int3 bug test case only for eBPF, which needs exactly 11 passes to converge, here's a version for classic BPF with 11 passes, and one that would need 70 passes on x86_64 to actually converge for being successfully JITed. Effectively, all jumps are being optimized out resulting in a JIT image of just 89 bytes (from originally max BPF insns), only returning K. Might be useful as a receipe for folks wanting to craft a test case when backporting the fix in commit 3f7352bf21f8 ("x86: bpf_jit: fix compilation of large bpf programs") while not having eBPF. The 2nd one is delegated to the interpreter as the last pass still results in shrinking, in other words, this one won't be JITed on x86_64. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'sfc-next'David S. Miller2015-05-275-11/+156
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Edward Cree says: ==================== sfc: add MCDI tracing This patchset adds support for logging MCDI (Management-Controller-to- Driver Interface) interactions between the sfc driver and a bound device, to aid in debugging. Solarflare has a tool to decode the resulting traces and will look to open-source this if there is any external interest, but the protocol is already detailed in drivers/net/ethernet/sfc/mcdi_pcol.h. The logging buffer we allocate per MCDI context is a work area for constructing each individual message before logging it with netif_info. The reason the buffer is long-lived is simply to avoid the overhead of allocating and freeing it every MCDI call, since MCDIs are already known to be serialised for other reasons. -- v4: remove patch #4, which has already been applied via sshah v3: add some explanations to cover letter and patch #4 v2: avoid long lines in cover letter; fix multiline comment style ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sfc: add module parameter to enable MCDI logging on new functionsEdward Cree2015-05-271-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As many issues are encountered at probe time, where MCDI logging can't be enabled through the sysfs node, this change adds a module parameter 'mcdi_logging_default', which defaults to false. When set to true, newly- probed functions will have MCDI logging enabled. The setting can subsequently be changed as normal through the sysfs node. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sfc: add sysfs entry to control MCDI tracingEdward Cree2015-05-274-10/+48
| | | | | | | | | | | | | | | | | | | | | | | | MCDI tracing is enabled per-function with a sysfs file /sys/class/net/<NET_DEV>/device/mcdi_logging Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sfc: add tracing of MCDI commandsEdward Cree2015-05-274-4/+102
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | MCDI tracing is conditional on CONFIG_SFC_MCDI_LOGGING, which is enabled by default. Each MCDI command will produce a console line like sfc dom:bus:dev:fn ifname: MCDI RPC REQ: xxxxxxxx [yyyyyyyy...] where xxxxxxxx etc. are the raw MCDI payload in 32-bit hex chunks. The response will then produce a similar line with "RESP" instead of "REQ", and containing the MCDI response payload (if any). Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: release lock after each bucket in vxlan_cleanupSorin Dumitru2015-05-271-2/+3
| | | | | | | | | | | | | | | | | | We're seeing some softlockups from this function when there are a lot fdb entries on a vxlan device. Taking the lock for each bucket instead of the whole table is enough to fix that. Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp/dccp: try to not exhaust ip_local_port_range in connect()Eric Dumazet2015-05-273-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A long standing problem on busy servers is the tiny available TCP port range (/proc/sys/net/ipv4/ip_local_port_range) and the default sequential allocation of source ports in connect() system call. If a host is having a lot of active TCP sessions, chances are very high that all ports are in use by at least one flow, and subsequent bind(0) attempts fail, or have to scan a big portion of space to find a slot. In this patch, I changed the starting point in __inet_hash_connect() so that we try to favor even [1] ports, leaving odd ports for bind() users. We still perform a sequential search, so there is no guarantee, but if connect() targets are very different, end result is we leave more ports available to bind(), and we spread them all over the range, lowering time for both connect() and bind() to find a slot. This strategy only works well if /proc/sys/net/ipv4/ip_local_port_range is even, ie if start/end values have different parity. Therefore, default /proc/sys/net/ipv4/ip_local_port_range was changed to 32768 - 60999 (instead of 32768 - 61000) There is no change on security aspects here, only some poor hashing schemes could be eventually impacted by this change. [1] : The odd/even property depends on ip_local_port_range values parity Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'ip_frag_next'David S. Miller2015-05-274-14/+49
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Florian Westphal says: ==================== net: force refragmentation for DF reassembed skbs output path tests: if (skb->len > mtu) ip_fragment() This breaks connectivity in one corner case: If the skb was reassembled, but has the DF bit set and .. .. its reassembled size is <= outdev mtu .. .. we will forward a DF packet larger than what the sender transmitted on wire. If a router later in the path can't forward this packet, it will send an icmp error in response to an mtu that the original sender never exceeded. This changes ipv4 defrag/output path to a) force refragmentation for DF reassembled skbs and b) set DF bit on all fragments when refragmenting if it was set on original frags. tested via: from scapy.all import * dip="10.23.42.2" payload="A"*1400 packet=IP(dst=dip,id=12345,flags='DF')/UDP(sport=42,dport=42)/payload frags=fragment(packet,fragsize=1200) for fragment in frags: send(fragment) Without this patch, we generate fragments without df bit set based on the outgoing device mtu when fragmenting after forwarding, ie. IP (ttl 64, id 12345, offset 0, flags [+, DF], proto UDP (17), length 1204) 192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400 IP (ttl 64, id 12345, offset 1184, flags [DF], proto UDP (17), length 244) 192.168.7.1 > 10.23.42.2: ip-proto-17 on ingress will either turn into IP (ttl 63, id 12345, offset 0, flags [+], proto UDP (17), length 1396) 192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400 IP (ttl 63, id 12345, offset 1376, flags [none], proto UDP (17), length 52) (mtu 1400: We strip df and send larger fragment), or IP (ttl 63, id 12345, offset 0, flags [DF], proto UDP (17), length 1428) 192.168.7.1.42 > 10.23.42.2.42: [udp sum ok] UDP, length 1400 if mtu is 1500. And in this case things break; router with a smaller mtu will send icmp error, but original sender only sent packets <= 1204 byte. With patch, we keep intent of such fragments and will emit DF-fragments that won't exceed 1204 byte in size. Joint work with Hannes Frederic Sowa. Changes since v2: - split unrelated patches from series - rework changelog of patch #2 to better illustrate breakage ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ip_fragment: don't forward defragmented DF packetFlorian Westphal2015-05-274-8/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently always send fragments without DF bit set. Thus, given following setup: mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280 A R1 R2 B Where R1 and R2 run linux with netfilter defragmentation/conntrack enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1 will respond with icmp too big error if one of these fragments exceeded 1400 bytes. However, if R1 receives fragment sizes 1200 and 100, it would forward the reassembled packet without refragmenting, i.e. R2 will send an icmp error in response to a packet that was never sent, citing mtu that the original sender never exceeded. The other minor issue is that a refragmentation on R1 will conceal the MTU of R2-B since refragmentation does not set DF bit on the fragments. This modifies ip_fragment so that we track largest fragment size seen both for DF and non-DF packets, and set frag_max_size to the largest value. If the DF fragment size is larger or equal to the non-df one, we will consider the packet a path mtu probe: We set DF bit on the reassembled skb and also tag it with a new IPCB flag to force refragmentation even if skb fits outdev mtu. We will also set DF bit on each fragment in this case. Joint work with Hannes Frederic Sowa. Reported-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ipv4: avoid repeated calls to ip_skb_dst_mtu helperFlorian Westphal2015-05-271-7/+12
|/ / | | | | | | | | | | | | | | | | | | | | ip_skb_dst_mtu is small inline helper, but its called in several places. before: 17061 44 0 17105 42d1 net/ipv4/ip_output.o after: 16805 44 0 16849 41d1 net/ipv4/ip_output.o Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'phy_rgmii'David S. Miller2015-05-274-14/+15
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Florian Fainelli says: ==================== net: phy: phy_interface_is_rgmii helper As you suggested, here is the helper function to avoid missing some RGMII interface checks. Had to wait for net to be merged in net-next to avoid submitting the same patch/commit. Dan, you might want to rebase your dp83867 submission to use that helper when you this patchset gets merged into net-next, thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: phy: Utilize phy_interface_is_rgmiiFlorian Fainelli2015-05-273-14/+4
| | | | | | | | | | | | | | | | | | | | | | | | Update all open-coded tests for all 4 PHY_INTERFACE_MODE_RGMII* values to use the newly introduced helper: phy_interface_is_rgmii. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: phy: Add phy_interface_is_rgmii helperFlorian Fainelli2015-05-271-0/+11
|/ / | | | | | | | | | | | | | | | | | | | | | | RGMII interfaces come in 4 different flavors that the PHY library needs to care about: regular RGMII (no delays), RGMII with either RX or TX delay, and both. In order to avoid errors of checking only for one type of RGMII interface and miss the 3 others, introduce a convenience function which tests for all values. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Fix fib_trie.c build, missing linux/vmalloc.h include.David S. Miller2015-05-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to get this indirectly I supposed, but no longer do. Either way, an explicit include should have been done in the first place. net/ipv4/fib_trie.c: In function '__node_free_rcu': >> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] vfree(n); ^ net/ipv4/fib_trie.c: In function 'tnode_alloc': >> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration] return vzalloc(size); ^ >> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer without a cast cc1: some warnings being treated as errors Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: tcp_tso_autosize() minimum is one packetEric Dumazet2015-05-273-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By making sure sk->sk_gso_max_segs minimal value is one, and sysctl_tcp_min_tso_segs minimal value is one as well, tcp_tso_autosize() will return a non zero value. We can then revert 843925f33fcc293d80acf2c5c8a78adf3344d49b ("tcp: Do not apply TSO segment limit to non-TSO packets") and save few cpu cycles in fast path. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: fix/cleanup inet_ehash_locks_alloc()Eric Dumazet2015-05-272-44/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If tcp ehash table is constrained to a very small number of buckets (eg boot parameter thash_entries=128), then we can crash if spinlock array has more entries. While we are at it, un-inline inet_ehash_locks_alloc() and make following changes : - Budget 2 cache lines per cpu worth of 'spinlocks' - Try to kmalloc() the array to avoid extra TLB pressure. (Most servers at Google allocate 8192 bytes for this hash table) - Get rid of various #ifdef Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tipc: fix bug in link protocol message create functionJon Paul Maloy2015-05-271-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit dd3f9e70f59f43a5712eba9cf3ee4f1e6999540c ("tipc: add packet sequence number at instant of transmission") we made a change with the consequence that packets in the link backlog queue don't contain valid sequence numbers. However, when we create a link protocol message, we still use the sequence number of the first packet in the backlog, if there is any, as "next_sent" indicator in the message. This may entail unnecessary retransissions or stale packet transmission when there is very low traffic on the link. This commit fixes this issue by only using the current value of tipc_link::snd_nxt as indicator. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: fix inet_proto_csum_replace4() sparse errorsEric Dumazet2015-05-261-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | make C=2 CF=-D__CHECK_ENDIAN__ net/core/utils.o ... net/core/utils.c:307:72: warning: incorrect type in argument 2 (different base types) net/core/utils.c:307:72: expected restricted __wsum [usertype] addend net/core/utils.c:307:72: got restricted __be32 [usertype] from net/core/utils.c:308:34: warning: incorrect type in argument 2 (different base types) net/core/utils.c:308:34: expected restricted __wsum [usertype] addend net/core/utils.c:308:34: got restricted __be32 [usertype] to net/core/utils.c:310:70: warning: incorrect type in argument 2 (different base types) net/core/utils.c:310:70: expected restricted __wsum [usertype] addend net/core/utils.c:310:70: got restricted __be32 [usertype] from net/core/utils.c:310:77: warning: incorrect type in argument 2 (different base types) net/core/utils.c:310:77: expected restricted __wsum [usertype] addend net/core/utils.c:310:77: got restricted __be32 [usertype] to net/core/utils.c:312:72: warning: incorrect type in argument 2 (different base types) net/core/utils.c:312:72: expected restricted __wsum [usertype] addend net/core/utils.c:312:72: got restricted __be32 [usertype] from net/core/utils.c:313:35: warning: incorrect type in argument 2 (different base types) net/core/utils.c:313:35: expected restricted __wsum [usertype] addend net/core/utils.c:313:35: got restricted __be32 [usertype] to Note we can use csum_replace4() helper Fixes: 58e3cac5613aa ("net: optimise inet_proto_csum_replace4()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: remove a sparse error in secure_dccpv6_sequence_number()Eric Dumazet2015-05-261-1/+1
| | | | | | | | | | | | | | | | | | make C=2 CF=-D__CHECK_ENDIAN__ net/core/secure_seq.o net/core/secure_seq.c:157:50: warning: restricted __be32 degrades to integer Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bridge: skip fdb add if the port shouldn't learnWilson Kok2015-05-261-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Check in fdb_add_entry() if the source port should learn, similar check is used in br_fdb_update. Note that new fdb entries which are added manually or as local ones are still permitted. This patch has been tested by running traffic via a bridge port and switching the port's state, also by manually adding/removing entries from the bridge's fdb. Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pktgen: remove one sparse errorEric Dumazet2015-05-261-5/+5
| | | | | | | | | | | | | | | | | | | | | | net/core/pktgen.c:2672:43: warning: incorrect type in assignment (different base types) net/core/pktgen.c:2672:43: expected unsigned short [unsigned] [short] [usertype] <noident> net/core/pktgen.c:2672:43: got restricted __be16 [usertype] protocol Let's use proper struct ethhdr instead of hard coding everything. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: ipv6_select_ident() returns a __be32Eric Dumazet2015-05-262-6/+6
| | | | | | | | | | | | | | | | | | | | ipv6_select_ident() returns a 32bit value in network order. Fixes: 286c2349f666 ("ipv6: Clean up ipv6_select_ident() and ip6_fragment()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'cpsw-cleanups'David S. Miller2015-05-263-54/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Richard Cochran says: ==================== cpsw cleanups While working on an out-of-tree customization, I noticed a few minor problems in the cpsw code. This series cleans up the issues I found. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>