summaryrefslogtreecommitdiffstats
path: root/Documentation/networking (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Documentation: filter: Add MIPS to architectures with BPF JITMarkos Chandras2014-09-111-3/+3
| | | | | | | | | | | | | | | MIPS supports BPF JIT since v3.16-rc1 Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* i40e: adds FCoE to build and updates its documentationVasu Dev2014-08-031-2/+5
| | | | | | | | | | | Adds newly added FCoE files to the build but only if FCoE module is configured. Also, updates i40e document for added FCoE support. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Jack Morgan<jack.morgan@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: filter: split 'struct sk_filter' into socket and bpf partsAlexei Starovoitov2014-08-031-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: filter: rename sk_chk_filter() -> bpf_check_classic()Alexei Starovoitov2014-08-031-1/+1
| | | | | | | trivial rename to indicate that this functions performs classic BPF checking Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Documentation: networking: phy.txt: Update text for indirect MMD accessVince Bridgers2014-07-311-1/+17
| | | | | | | | | | | Update the PHY library documentation to describe how a specific PHY driver can use the PAL MMD register access routines or override those routines with it's own in the event the PHY does not support the IEEE standard for reading and writing MMD phy registers. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove deprecated syststamp timestampWillem de Bruijn2014-07-292-16/+3
| | | | | | | | | | | | | | | | | | | | The SO_TIMESTAMPING API defines three types of timestamps: software, hardware in raw format (hwtstamp) and hardware converted to system format (syststamp). The last has been deprecated in favor of combining hwtstamp with a PTP clock driver. There are no active users in the kernel. The option was device driver dependent. If set, but without hardware support, the correct behavior is to return zero in the relevant field in the SCM_TIMESTAMPING ancillary message. Without device drivers implementing the option, this field is effectively always zero. Remove the internal plumbing to dissuage new drivers from implementing the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to avoid breaking existing applications that request the timestamp. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* packet: remove deprecated syststamp timestampWillem de Bruijn2014-07-291-12/+6
| | | | | | | | | | | | No device driver will ever return an skb_shared_info structure with syststamp non-zero, so remove the branch that tests for this and optionally marks the packet timestamp as TP_STATUS_TS_SYS_HARDWARE. Do not remove the definition TP_STATUS_TS_SYS_HARDWARE, as processes may refer to it. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: frag: set limits and make init_net's high_thresh limit globalNikolay Aleksandrov2014-07-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | This patch makes init_net's high_thresh limit to be the maximum for all namespaces, thus introducing a global memory limit threshold equal to the sum of the individual high_thresh limits which are capped. It also introduces some sane minimums for low_thresh as it shouldn't be able to drop below 0 (or > high_thresh in the unsigned case), and overall low_thresh should not ever be above high_thresh, so we make the following relations for a namespace: init_net: high_thresh - max(not capped), min(init_net low_thresh) low_thresh - max(init_net high_thresh), min (0) all other namespaces: high_thresh = max(init_net high_thresh), min(namespace's low_thresh) low_thresh = max(namespace's high_thresh), min(0) The major issue with having low_thresh > high_thresh is that we'll schedule eviction but never evict anything and thus rely only on the timers. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: frag: remove periodic secret rebuild timerFlorian Westphal2014-07-281-10/+0
| | | | | | | | | | | | | | | | | | | | | | merge functionality into the eviction workqueue. Instead of rebuilding every n seconds, take advantage of the upper hash chain length limit. If we hit it, mark table for rebuild and schedule workqueue. To prevent frequent rebuilds when we're completely overloaded, don't rebuild more than once every 5 seconds. ipfrag_secret_interval sysctl is now obsolete and has been marked as deprecated, it still can be changed so scripts won't be broken but it won't have any effect. A comment is left above each unused secret_timer variable to avoid confusion. Joint work with Nikolay Aleksandrov. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: frag: move eviction of queues to work queueFlorian Westphal2014-07-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the high_thresh limit is reached we try to toss the 'oldest' incomplete fragment queues until memory limits are below the low_thresh value. This happens in softirq/packet processing context. This has two drawbacks: 1) processors might evict a queue that was about to be completed by another cpu, because they will compete wrt. resource usage and resource reclaim. 2) LRU list maintenance is expensive. But when constantly overloaded, even the 'least recently used' element is recent, so removing 'lru' queue first is not 'fairer' than removing any other fragment queue. This moves eviction out of the fast path: When the low threshold is reached, a work queue is scheduled which then iterates over the table and removes the queues that exceed the memory limits of the namespace. It sets a new flag called INET_FRAG_EVICTED on the evicted queues so the proper counters will get incremented when the queue is forcefully expired. When the high threshold is reached, no more fragment queues are created until we're below the limit again. The LRU list is now unused and will be removed in a followup patch. Joint work with Nikolay Aleksandrov. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bonding: update bonding.txt for Layer2 hash factorsJianhua Xie2014-07-181-15/+16
| | | | | | | | | | | | | Document the Layer 2 hash factors with packet type ID field. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: David S. Miller <davem@davemloft.net> CC: Pan Jiafei <Jiafei.Pan@freescale.com> Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-timestamp: document deprecated syststampWillem de Bruijn2014-07-161-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | The SO_TIMESTAMPING API defines option SOF_TIMESTAMPING_SYS_HW. This feature is deprecated. It should not be implemented by new device drivers. Existing drivers do not implement it, either -- with one exception. Driver developers are encouraged to expose the NIC hw clock as a PTP HW clock source, instead, and synchronize system time to the HW source. The control flag cannot be removed due to being part of the ABI, nor can the structure scm_timestamping that is returned. Due to the one legacy driver, the internal datapath and structure are not removed. This patch only clearly marks the interface as deprecated. Device drivers should always return a syststamp value of zero. Signed-off-by: Willem de Bruijn <willemb@google.com> ---- We can consider adding a WARN_ON_ONCE in__sock_recv_timestamp if non-zero syststamp is encountered Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Implement automatic flow label generation on transmitTom Herbert2014-07-081-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Automatically generate flow labels for IPv6 packets on transmit. The flow label is computed based on skb_get_hash. The flow label will only automatically be set when it is zero otherwise (i.e. flow label manager hasn't set one). This supports the transmit side functionality of RFC 6438. Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this functionality per socket. By default, auto flowlabels are disabled to avoid possible conflicts with flow label manager, however if this feature proves useful we may want to enable it by default. It should also be noted that FreeBSD has already implemented automatic flow labels (including the sysctl and socket option). In FreeBSD, automatic flow labels default to enabled. Performance impact: Running super_netperf with 200 flows for TCP_RR and UDP_RR for IPv6. Note that in UDP case, __skb_get_hash will be called for every packet with explains slight regression. In the TCP case the hash is saved in the socket so there is no regression. Automatic flow labels disabled: TCP_RR: 86.53% CPU utilization 127/195/322 90/95/99% latencies 1.40498e+06 tps UDP_RR: 90.70% CPU utilization 118/168/243 90/95/99% latencies 1.50309e+06 tps Automatic flow labels enabled: TCP_RR: 85.90% CPU utilization 128/199/337 90/95/99% latencies 1.40051e+06 UDP_RR 92.61% CPU utilization 115/164/236 90/95/99% latencies 1.4687e+06 Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pktgen: document tuning for max NIC performanceJesper Dangaard Brouer2014-07-021-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using pktgen I'm seeing the ixgbe driver "push-back", due TX ring running full. Thus, the TX ring is artificially limiting pktgen. (Diagnose via "ethtool -S", look for "tx_restart_queue" or "tx_busy" counters.) Using ixgbe, the real reason behind the TX ring running full, is due to TX ring not being cleaned up fast enough. The ixgbe driver combines TX+RX ring cleanups, and the cleanup interval is affected by the ethtool --coalesce setting of parameter "rx-usecs". Do not increase the default NIC TX ring buffer or default cleanup interval. Instead simply document that pktgen needs special NIC tuning for maximum packet per sec performance. Performance results with pktgen with clone_skb=100000. TX ring size 512 (default), adjusting "rx-usecs": (Single CPU performance, E5-2630, ixgbe) - 3935002 pps - rx-usecs: 1 (irqs: 9346) - 5132350 pps - rx-usecs: 10 (irqs: 99157) - 5375111 pps - rx-usecs: 20 (irqs: 50154) - 5454050 pps - rx-usecs: 30 (irqs: 33872) - 5496320 pps - rx-usecs: 40 (irqs: 26197) - 5502510 pps - rx-usecs: 50 (irqs: 21527) TX ring size adjusting (ethtool -G), "rx-usecs==1" (default): - 3935002 pps - tx-size: 512 - 5354401 pps - tx-size: 768 - 5356847 pps - tx-size: 1024 - 5327595 pps - tx-size: 1536 - 5356779 pps - tx-size: 2048 - 5353438 pps - tx-size: 4096 Notice after commit 6f25cd47d (pktgen: fix xmit test for BQL enabled devices) pktgen uses netif_xmit_frozen_or_drv_stopped() and ignores the BQL "stack" pause (QUEUE_STATE_STACK_XOFF) flag. This allow us to put more pressure on the TX ring buffers. It is the ixgbe_maybe_stop_tx() call that stops the transmits, and pktgen respecting this in the call to netif_xmit_frozen_or_drv_stopped(txq). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Allow accepting RA from local IP addresses.Ben Greear2014-07-011-0/+12
| | | | | | | | | | | | | | | | | | | | This can be used in virtual networking applications, and may have other uses as well. The option is disabled by default. A specific use case is setting up virtual routers, bridges, and hosts on a single OS without the use of network namespaces or virtual machines. With proper use of ip rules, routing tables, veth interface pairs and/or other virtual interfaces, and applications that can bind to interfaces and/or IP addresses, it is possibly to create one or more virtual routers with multiple hosts attached. The host interfaces can act as IPv6 systems, with radvd running on the ports in the virtual routers. With the option provided in this patch enabled, those hosts can now properly obtain IPv6 addresses from the radvd. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2014-06-124-50/+791
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
| * net: filter: document internal instruction encodingAlexei Starovoitov2014-06-121-0/+161
| | | | | | | | | | | | | | | | | | This patch adds a description of eBPFs instruction encoding in order to bring the documentation in line with the implementation. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: filter: mention eBPF terminology as wellAlexei Starovoitov2014-06-121-42/+43
| | | | | | | | | | | | | | | | | | | | Since the term eBPF is used anyway on mailing list discussions, lets also document that in the main BPF documentation file and replace a couple of occurrences with eBPF terminology to be more clear. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: filter: cleanup A/X name usageAlexei Starovoitov2014-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The macro 'A' used in internal BPF interpreter: #define A regs[insn->a_reg] was easily confused with the name of classic BPF register 'A', since 'A' would mean two different things depending on context. This patch is trying to clean up the naming and clarify its usage in the following way: - A and X are names of two classic BPF registers - BPF_REG_A denotes internal BPF register R0 used to map classic register A in internal BPF programs generated from classic - BPF_REG_X denotes internal BPF register R7 used to map classic register X in internal BPF programs generated from classic - internal BPF instruction format: struct sock_filter_int { __u8 code; /* opcode */ __u8 dst_reg:4; /* dest register */ __u8 src_reg:4; /* source register */ __s16 off; /* signed offset */ __s32 imm; /* signed immediate constant */ }; - BPF_X/BPF_K is 1 bit used to encode source operand of instruction In classic: BPF_X - means use register X as source operand BPF_K - means use 32-bit immediate as source operand In internal: BPF_X - means use 'src_reg' register as source operand BPF_K - means use 32-bit immediate as source operand Suggested-by: Chema Gonzalez <chema@google.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Chema Gonzalez <chema@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-05-242-2/+2
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: filter: doc: add section for BPF test suiteDaniel Borkmann2014-05-231-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | Mention the recently added test suite in the documentation file. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | can: add documentation for CAN filter usage optimisationOliver Hartkopp2014-05-191-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | To benefit from special filters for single SFF or single EFF CAN identifier subscriptions the CAN_EFF_FLAG bit and the CAN_RTR_FLAG bit has to be set together with the CAN_(SFF|EFF)_MASK in can_filter.mask. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * | net: cdc_mbim: add driver documentationBjørn Mork2014-05-131-0/+339
| | | | | | | | | | | | | | | | | | | | | | | | | | | An initial attempt on describing some of the odd APIs provided by this driver. Cc: Greg Suarez <gsuarez@smithmicro.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: filter: doc: expand and improve BPF documentationAlexei Starovoitov2014-05-051-4/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | In particular, this patch tries to clarify internal BPF calling convention and adds internal BPF examples, JIT guide, use cases. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-04-241-1/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/intel/igb/e1000_mac.c net/core/filter.c Both conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bonding: Add tlb_dynamic_lb parameter for tlb modeMahesh Bandewar2014-04-241-7/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The aggresive load balancing causes packet re-ordering as active flows are moved from a slave to another within the group. Sometime this aggresive lb is not necessary if the preference is for less re-ordering. This parameter if used with value "0" disables this dynamic flow shuffling minimizing packet re-ordering. Of course the side effect is that it has to live with the static load balancing that the hashing distribution provides. This impact is less severe if the correct xmit-hashing-policy is used for the tlb setup. The default value of the parameter is set to "1" mimicing the earlier behavior. Ran the netperf test with 200 stream for 1 min between two hosts with 4x1G trunk (xmit-lb mode with xmit-policy L3+4) before and after these changes. Following was the command used for those 200 instances - netperf -t TCP_RR -l 60 -s 5 -H <host> -- -r81920,81920 Transactions per second: Before change: 1,367.11 After change: 1,470.65 Change-Id: Ie3f75c77282cf602e83a6e833c6eb164e72a0990 Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bonding: Added bond_tlb_xmit() for tlb mode.Mahesh Bandewar2014-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-organized the xmit function for the lb mode separating tlb xmit from the alb mode. This will enable use of the hashing policies like 802.3ad mode. Also extended use of xmit-hash-policy to tlb mode. Now the tlb-mode defaults to BOND_XMIT_POLICY_LAYER2 if the xmit policy module parameter is not set (just like 802.3ad, or Xor mode). Change-Id: I140257403d272df75f477b380207338d0f04963e Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | filter: added BPF random opcodeChema Gonzalez2014-04-231-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added a new ancillary load (bpf call in eBPF parlance) that produces a 32-bit random number. We are implementing it as an ancillary load (instead of an ISA opcode) because (a) it is simpler, (b) allows easy JITing, and (c) seems more in line with generic ISAs that do not have "get a random number" as a instruction, but as an OS call. The main use for this ancillary load is to perform random packet sampling. Signed-off-by: Chema Gonzalez <chema@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2014-06-042-2/+2
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next Pull trivial tree changes from Jiri Kosina: "Usual pile of patches from trivial tree that make the world go round" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) staging: go7007: remove reference to CONFIG_KMOD aic7xxx: Remove obsolete preprocessor define of: dma: doc fixes doc: fix incorrect formula to calculate CommitLimit value doc: Note need of bc in the kernel build from 3.10 onwards mm: Fix printk typo in dmapool.c modpost: Fix comment typo "Modules.symvers" Kconfig.debug: Grammar s/addition/additional/ wimax: Spelling s/than/that/, wording s/destinatary/recipient/ aic7xxx: Spelling s/termnation/termination/ arm64: mm: Remove superfluous "the" in comment of: Spelling s/anonymouns/anonymous/ dma: imx-sdma: Spelling s/determnine/determine/ ath10k: Improve grammar in comments ath6kl: Spelling s/determnine/determine/ of: Improve grammar for of_alias_get_id() documentation drm/exynos: Spelling s/contro/control/ radio-bcm2048.c: fix wrong overflow check doc: printk-formats: do not mention casts for u64/s64 doc: spelling error changes ...
| * | | doc: spelling error changesCarlos Garcia2014-05-052-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixed multiple spelling errors. Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Carlos E. Garcia <carlos@cgarcia.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | | net: doc: Update references to skb->rxhashTobias Klauser2014-05-222-2/+2
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | In commit 61b905da33 ("net: Rename skb->rxhash to skb->hash"), skb->rxhash was renamed to skb->hash. Update references in Documentation accordingly. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Update my email addressBen Hutchings2014-04-231-1/+1
| |/ |/| | | | | | | Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2014-04-0311-119/+569
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: "Here is my initial pull request for the networking subsystem during this merge window: 1) Support for ESN in AH (RFC 4302) from Fan Du. 2) Add full kernel doc for ethtool command structures, from Ben Hutchings. 3) Add BCM7xxx PHY driver, from Florian Fainelli. 4) Export computed TCP rate information in netlink socket dumps, from Eric Dumazet. 5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas Dichtel. 6) Convert many drivers to pci_enable_msix_range(), from Alexander Gordeev. 7) Record SKB timestamps more efficiently, from Eric Dumazet. 8) Switch to microsecond resolution for TCP round trip times, also from Eric Dumazet. 9) Clean up and fix 6lowpan fragmentation handling by making use of the existing inet_frag api for it's implementation. 10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss. 11) Auto size SKB lengths when composing netlink messages based upon past message sizes used, from Eric Dumazet. 12) qdisc dumps can take a long time, add a cond_resched(), From Eric Dumazet. 13) Sanitize netpoll core and drivers wrt. SKB handling semantics. Get rid of never-used-in-tree netpoll RX handling. From Eric W Biederman. 14) Support inter-address-family and namespace changing in VTI tunnel driver(s). From Steffen Klassert. 15) Add Altera TSE driver, from Vince Bridgers. 16) Optimizing csum_replace2() so that it doesn't adjust the checksum by checksumming the entire header, from Eric Dumazet. 17) Expand BPF internal implementation for faster interpreting, more direct translations into JIT'd code, and much cleaner uses of BPF filtering in non-socket ocntexts. From Daniel Borkmann and Alexei Starovoitov" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits) netpoll: Use skb_irq_freeable to make zap_completion_queue safe. net: Add a test to see if a skb is freeable in irq context qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port' net: ptp: move PTP classifier in its own file net: sxgbe: make "core_ops" static net: sxgbe: fix logical vs bitwise operation net: sxgbe: sxgbe_mdio_register() frees the bus Call efx_set_channels() before efx->type->dimension_resources() xen-netback: disable rogue vif in kthread context net/mlx4: Set proper build dependancy with vxlan be2net: fix build dependency on VxLAN mac802154: make csma/cca parameters per-wpan mac802154: allow only one WPAN to be up at any given time net: filter: minor: fix kdoc in __sk_run_filter netlink: don't compare the nul-termination in nla_strcmp can: c_can: Avoid led toggling for every packet. can: c_can: Simplify TX interrupt cleanup can: c_can: Store dlc private can: c_can: Reduce register access can: c_can: Make the code readable ...
| * Merge tag 'linux-can-fixes-for-3.15-20140401' of ↵David S. Miller2014-04-011-1/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://gitorious.org/linux-can/linux-can linux-can-fixes-for-3.15-20140401 Marc Kleine-Budde says: ==================== this is a pull request of 16 patches for the 3.15 release cycle. Bjorn Van Tilt contributes a patch which fixes a memory leak in usb_8dev's usb_8dev_start_xmit()s error path. A patch by Robert Schwebel fixes a typo in the can documentation. The remaining patches all target the c_can driver. Two of them are by me; they add a missing netif_napi_del() and return value checking. Thomas Gleixner contributes 12 patches, which address several shortcomings in the driver like hardware initialisation, concurrency, message ordering and poor performance. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * can: Documentation: fix parameter name "sample-point"Robert Schwebel2014-04-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the name of the parameter to configure the sample point used in iproute2's ip command. The correct writing is "sample-point" not "sample_point". Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * | doc: filter: extend BPF documentation to document new internalsAlexei Starovoitov2014-03-311-0/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | Further extend the current BPF documentation to document new BPF engine internals. Joint work with Daniel Borkmann. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Documentation: networking: phy.txt: MDIO bus reset is optionalFlorian Fainelli2014-03-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Update the MDIO bus documentation to mention that the MDIO bus reset function is completely optional. It became optional with commit e13934563db0 ("[PATCH] PHY Layer fixup") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-03-261-2/+2
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: Documentation/devicetree/bindings/net/micrel-ks8851.txt net/core/netpoll.c The net/core/netpoll.c conflict is a bug fix in 'net' happening to code which is completely removed in 'net-next'. In micrel-ks8851.txt we simply have overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | doc: update driver TX algorithm in timestamping.txtJakub Kicinski2014-03-181-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since cd4d8fdad1f1 ("net: kernel panic in dev_hard_start_xmit: remove faulty software TX time stamping") dev_hard_start_xmit() will not provide software timestamps. It's a responsibility of the drivers to call skb_tx_timestamp() at the right time. Cc: linux-doc@vger.kernel.org Signed-off-by: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | igb: remove references to long gone command line parametersFernando Luis Vazquez Cao2014-03-181-48/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Command line parameters QueuePairs, Node, EEE, DMAC and InterruptThrottleRate do not exist these days. Remove all references to them in the Documentation folder and update code comments. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Documentation: networking: Add Altera Ethernet (TSE) DocumentationVince Bridgers2014-03-181-0/+263
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a bindings description for the Altera Triple Speed Ethernet (TSE) driver. The bindings support the legacy SGDMA soft IP as well as the preferred MSGDMA soft IP. The TSE can be configured and synthesized in soft logic using Altera's Quartus toolchain. Please consult the bindings document for supported options. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-03-152-21/+33
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/usb/r8152.c drivers/net/xen-netback/netback.c Both the r8152 and netback conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * \ \ Merge tag 'rxrpc-devel-20140304' of ↵David S. Miller2014-03-071-0/+81
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== net-next: AF_RXRPC fixes and development Here are some AF_RXRPC fixes: (1) Fix to remove incorrect checksum calculation made during recvmsg(). It's unnecessary to try to do this there since we check the checksum before reading the RxRPC header from the packet. (2) Fix to prevent the sending of an ABORT packet in response to another ABORT packet and inducing a storm. (3) Fix UDP MTU calculation from parsing ICMP_FRAG_NEEDED packets where we don't handle the ICMP packet not specifying an MTU size. And development patches: (4) Add sysctls for configuring RxRPC parameters, specifically various delays pertaining to ACK generation, the time before we resend a packet for which we don't receive an ACK, the maximum time a call is permitted to live and the amount of time transport, connection and dead call information is cached. (5) Improve ACK packet production by adjusting the handling of ACK_REQUESTED packets, ignoring the MORE_PACKETS flag, delaying the production of otherwise immediate ACK_IDLE packets and delaying all ACK_IDLE production (barring the call termination) to half a second. (6) Add more sysctl parameters to expose the Rx window size, the maximum packet size that we're willing to receive and the number of jumbo rxrpc packets we're willing to handle in a single UDP packet. (7) Request ACKs on alternate DATA packets so that the other side doesn't wait till we fill up the Tx window. (8) Use a RCU hash table to look up the rxrpc_call for an incoming packet rather than stepping through a hierarchy involving several spinlocks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | af_rxrpc: Expose more RxRPC parameters via sysctlsDavid Howells2014-02-261-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Expose RxRPC parameters via sysctls to control the Rx window size, the Rx MTU maximum size and the number of packets that can be glued into a jumbo packet. More info added to Documentation/networking/rxrpc.txt. Signed-off-by: David Howells <dhowells@redhat.com>
| | * | | af_rxrpc: Add sysctls for configuring RxRPC parametersDavid Howells2014-02-261-0/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add sysctls for configuring RxRPC protocol handling, specifically controls on delays before ack generation, the delay before resending a packet, the maximum lifetime of a call and the expiration times of calls, connections and transports that haven't been recently used. More info added in Documentation/networking/rxrpc.txt. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-03-061-6/+0
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/mwifiex/pcie.c net/ipv6/sit.c The SIT driver conflict consists of a bug fix being done by hand in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper was created (netdev_alloc_pcpu_stats()) which takes care of this. The two wireless conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | pktgen: document all supported flagsMathias Krause2014-02-251-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The documentation misses a few of the supported flags. Fix this. Also respect the dependency to CONFIG_XFRM for the IPSEC flag. Cc: Fan Du <fan.du@windriver.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-02-191-45/+0
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/bonding/bond_3ad.h drivers/net/bonding/bond_main.c Two minor conflicts in bonding, both of which were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | bonding: document the new _arp options for arp_validateVeaceslav Falico2014-02-181-30/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | bonding: permit using arp_validate with non-ab modesVeaceslav Falico2014-02-181-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently it's disabled because it's sometimes hard, in typical configs, to make it work - because of the nature how the loadbalance modes work - as it's hard to deliver valid arp replies to correct slaves by the switch. However we still can use arp_validation in loadbalance with several other configs, per example with arp_validate == 2 for backup with one broadcast domain, without the switch(es) doing any balancing - this way we'd be (a bit more) sure that the slave is up. So, enable it to let users decide which one works/suits them best. Also correct the mode limitation from BOND_OPT_ARP_VALIDATE. CC: Nikolay Aleksandrov <nikolay@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>