| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, the VFs do not read the default VLAN during initialization,
so they will not be able to indicate untagged frames properly.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Corrected a spelling mistake that was found over time.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Output of ethtool was reporting 2 rx_errors entries. This change
removes one of the redundant entries.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The function collecting Tx statistics was actually using values from the RX
ring. Thus, Tx and Rx statistics values reported by "ifconfig" will
return identical values. This change corrects this error and the Tx
statistics is now reading from the Tx ring.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
|
| |\ \
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
A final pull request, I know it's very late but this time I think it's worth a
bit of rush.
The following patchset contains Netfilter/nf_tables updates for net-next, more
specifically concatenation support and dynamic stateful expression
instantiation.
This also comes with a couple of small patches. One to fix the ebtables.h
userspace header and another to get rid of an obsolete example file in tree
that describes a nf_tables expression.
This time, I decided to paste the original descriptions. This will result in a
rather large commit description, but I think these bytes to keep.
Patrick McHardy says:
====================
netfilter: nf_tables: concatenation support
The following patches add support for concatenations, which allow multi
dimensional exact matches in O(1).
The basic idea is to split the data registers, currently consisting of
4 registers of 16 bytes each, into smaller units, 16 registers of 4
bytes each, and making sure each register store always leaves the
full 32 bit in a well defined state, meaning smaller stores will
zero the remaining bits.
Based on that, we can load multiple adjacent registers with different
values, thereby building a concatenated bigger value, and use that
value for set lookups.
Sets are changed to use variable sized extensions for their key and
data values, removing the fixed limit of 16 bytes while saving memory
if less space is needed.
As a side effect, these patches will allow some nice optimizations in
the future, like using jhash2 in nft_hash, removing the masking in
nft_cmp_fast, optimized data comparison using 32 bit word size etc.
These are not done so far however.
The patches are split up as follows:
* the first five patches add length validation to register loads and
stores to make sure we stay within bounds and prepare the validation
functions for the new addressing mode
* the next patches prepare for changing to 32 bit addressing by
introducing a struct nft_regs, which holds the verdict register as
well as the data registers. The verdict members are moved to a new
struct nft_verdict to allow to pull struct nft_data out of the stack.
* the next patches contain preparatory conversions of expressions and
sets to use 32 bit addressing
* the next patch introduces so far unused register conversion helpers
for parsing and dumping register numbers over netlink
* following is the real conversion to 32 bit addressing, consisting of
replacing struct nft_data in struct nft_regs by an array of u32s and
actually translating and validating the new register numbers.
* the final two patches add support for variable sized data items and
variable sized keys / data in set elements
The patches have been verified to work correctly with nft binaries using
both old and new addressing.
====================
Patrick McHardy says:
====================
netfilter: nf_tables: dynamic stateful expression instantiation
The following patches are the grand finale of my nf_tables set work,
using all the building blocks put in place by the previous patches
to support something like iptables hashlimit, but a lot more powerful.
Sets are extended to allow attaching expressions to set elements.
The dynset expression dynamically instantiates these expressions
based on a template when creating new set elements and evaluates
them for all new or updated set members.
In combination with concatenations this effectively creates state
tables for arbitrary combinations of keys, using the existing
expression types to maintain that state. Regular set GC takes care
of purging expired states.
We currently support two different stateful expressions, counter
and limit. Using limit as a template we can express the functionality
of hashlimit, but completely unrestricted in the combination of keys.
Using counter we can perform accounting for arbitrary flows.
The following examples from patch 5/5 show some possibilities.
Userspace syntax is still WIP, especially the listing of state
tables will most likely be seperated from normal set listings
and use a more structured format:
1. Limit the rate of new SSH connections per host, similar to iptables
hashlimit:
flow ip saddr timeout 60s \
limit 10/second \
accept
2. Account network traffic between each set of /24 networks:
flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
counter
3. Account traffic to each host per user:
flow skuid . ip daddr \
counter
4. Account traffic for each combination of source address and TCP flags:
flow ip saddr . tcp flags \
counter
The resulting set content after a Xmas-scan look like this:
{
192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
192.168.122.1 . ack : counter packets 74 bytes 3848,
192.168.122.1 . psh | ack : counter packets 35 bytes 3144
}
In the future the "expressions attached to elements" will be extended
to also support user created non-stateful expressions to allow to
efficiently select beween a set of parameter sets, f.i. a set of log
statements with different prefixes based on the interface, which currently
require one rule each. This will most likely have to wait until the next
kernel version though.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There's an example net/netfilter/nft_expr_template.c example file in tree that
got out of sync along time, remove it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Support instantiating stateful expressions based on a template that
are associated with dynamically created set entries. The expressions
are evaluated when adding or updating the set element.
This allows to maintain per flow state using the existing set
infrastructure and expression types, with arbitrary definitions of
a flow.
Usage is currently restricted to anonymous sets, meaning only a single
binding can exist, since the desired semantics of multiple independant
bindings haven't been defined so far.
Examples (userspace syntax is still WIP):
1. Limit the rate of new SSH connections per host, similar to iptables
hashlimit:
flow ip saddr timeout 60s \
limit 10/second \
accept
2. Account network traffic between each set of /24 networks:
flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
counter
3. Account traffic to each host per user:
flow skuid . ip daddr \
counter
4. Account traffic for each combination of source address and TCP flags:
flow ip saddr . tcp flags \
counter
The resulting set content after a Xmas-scan look like this:
{
192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
192.168.122.1 . ack : counter packets 74 bytes 3848,
192.168.122.1 . psh | ack : counter packets 35 bytes 3144
}
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a set flag to indicate that the set is used as a state table and
contains expressions for evaluation. This operation is mutually
exclusive with the mapping operation, so sets specifying both are
rejected. The lookup expression also rejects binding to state tables
since it only deals with loopup and map operations.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a flag to mark stateful expressions.
This is used for dynamic expression instanstiation to limit the usable
expressions. Strictly speaking only the dynset expression can not be
used in order to avoid recursion, but since dynamically instantiating
non-stateful expressions will simply create an identical copy, which
behaves no differently than the original, this limits to expressions
where it actually makes sense to dynamically instantiate them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Preparation to attach expressions to set elements: add a set extension
type to hold an expression and dump the expression information with the
set element.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add helper functions for initializing, cloning, dumping and destroying
a single expression that is not part of a rule.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
linux/if.h creates conflicts in userspace with net/if.h
By using it here we force userspace to use linux/if.h while
net/if.h may be needed.
Note that:
include/linux/netfilter_ipv4/ip_tables.h and
include/linux/netfilter_ipv6/ip6_tables.h
don't include linux/if.h and they also refer to IFNAMSIZ, so they are
expecting userspace to include use net/if.h from the client program.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch changes sets to support variable sized set element keys / data
up to 64 bytes each by using variable sized set extensions. This allows
to use concatenations with bigger data items suchs as IPv6 addresses.
As a side effect, small keys/data now don't require the full 16 bytes
of struct nft_data anymore but just the space they need.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a size argument to nft_data_init() and pass in the available space.
This will be used by the following patches to support variable sized
set element data.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Switch the nf_tables registers from 128 bit addressing to 32 bit
addressing to support so called concatenations, where multiple values
can be concatenated over multiple registers for O(1) exact matches of
multiple dimensions using sets.
The old register values are mapped to areas of 128 bits for compatibility.
When dumping register numbers, values are expressed using the old values
if they refer to the beginning of a 128 bit area for compatibility.
To support concatenations, register loads of less than a full 32 bit
value need to be padded. This mainly affects the payload and exthdr
expressions, which both unconditionally zero the last word before
copying the data.
Userspace fully passes the testsuite using both old and new register
addressing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add helper functions to parse and dump register values in netlink attributes.
These helpers will later be changed to take care of translation between the
old 128 bit and the new 32 bit register numbers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Simple conversion to use u32 pointers to the beginning of the data
area to keep follow up patches smaller.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Only needlessly complicates things due to requiring specific argument
types. Use memcmp directly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Simple conversion to use u32 pointers to the beginning of the registers
to keep follow up patches smaller.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Replace the array of registers passed to expressions by a struct nft_regs,
containing the verdict as a seperate member, which aliases to the
NFT_REG_VERDICT register.
This is needed to seperate the verdict from the data registers completely,
so their size can be changed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Change nft_validate_input_register() to not only validate the input
register number, but also the length of the load, and rename it to
nft_validate_register_load() to reflect that change.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
All users of nft_validate_register_store() first invoke
nft_validate_output_register(). There is in fact no use for using it
on its own, so simplify the code by folding the functionality into
nft_validate_register_store() and kill it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In preparation of validating the length of a register store, use
nft_validate_register_store() in nft_lookup instead of open coding the
validation.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The existing name is ambiguous, data is loaded as well when we read from
a register. Rename to nft_validate_register_store() for clarity and
consistency with the upcoming patch to introduce its counterpart,
nft_validate_register_load().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
For values spanning multiple registers, we need to validate that enough
space is available from the destination register onwards. Add a len
argument to nft_validate_data_load() and consolidate the existing length
validations in preparation of that.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The dwmac-socfpga.c conflict was a case of a bug fix overlapping
changes in net-next to handle an error pointer differently.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
I noticed tcpdump was giving funky timestamps for locally
generated SYNACK messages on loopback interface.
11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S
945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>
20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S
3160535375:3160535375(0) ack 945476043 win 43690 <mss
65495,nop,nop,sackOK,nop,wscale 7>
This is because we need to clear skb->tstamp before
entering lower stack, otherwise net_timestamp_check()
does not set skb->tstamp.
Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
handle_offloads() calls skb_reset_inner_headers() to store
the layer pointers to the encapsulated packet. However, we
currently push the vlag tag (if there is one) onto the packet
afterwards. This changes the MAC header for the encapsulated
packet but it is not reflected in skb->inner_mac_header, which
breaks GSO and drivers which attempt to use this for encapsulation
offloads.
Fixes: 1eaa8178 ("vxlan: Add tx-vlan offload support.")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2015-04-09
1) We dereferenced the xfrm outer_mode too early, larval
SAs don't have it set. Move the dereference of the
outer mode below the larval SA check to fix it.
From Alexey Dobriyan.
2) Fix vti6 tunnel uninit on namespace crosssing.
From Yao Xiwei.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When the kernel deleted a vti6 interface, this interface was not removed from
the tunnels list. Thus, when the ip6_vti module was removed, this old interface
was found and the kernel tried to delete it again. This was leading to a kernel
panic.
Fixes: 61220ab34948 ("vti6: Enable namespace changing")
Signed-off-by: Yao Xiwei <xiwei.yao@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
https://bugzilla.kernel.org/show_bug.cgi?id=95211
Commit 70be6c91c86596ad2b60c73587880b47df170a41
("xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer") added check
which dereferences ->outer_mode too early but larval SAs don't have
this pointer set (yet). So check for tunnel stuff later.
Mike Noordermeer reported this bug and patiently applied all the debugging.
Technically this is remote-oops-in-interrupt-context type of thing.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
IP: [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0
...
[<ffffffff81500fc6>] ? xfrm4_esp_rcv+0x36/0x70
[<ffffffff814acc9a>] ? ip_local_deliver_finish+0x9a/0x200
[<ffffffff81471b83>] ? __netif_receive_skb_core+0x6f3/0x8f0
...
RIP [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0
Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
"kevent" is an extremely generic name that causes trouble
if debugging for work queues is used. So change it to
something clear.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In socfpga_dwmac_parse_data forward error code from devm_reset_control_get.
This gives the driver another chance to laod if altr,rst-mgr is loaded after
the network driver.
Signed-off-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The numPosted field in the ERX Doorbell register is 8-bits wide.
So the max buffers that we can post at a time is 255 and not 256
which we are doing currently.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
const __read_mostly is a senseless combination. If something
is already const it cannot be __read_mostly. Remove the bogus
__read_mostly in the fou driver.
This fixes section conflicts with LTO.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Sowmini Varadhan says:
====================
RDS: RDS-core fixes
This patch-series updates the RDS core and rds-tcp modules with
some bug fixes that were originally authored by Andy Grover,
Zach Brown, and Chris Mason.
v2: Code review comment by Sergei Shtylov
V3: DaveM comments:
- dropped patches 3, 5 for "heuristic" changes in rds_send_xmit().
Investigation into the root-cause of these IB-triggered changes
produced the feedback: "I don't remember seeing "RDS: Stuck RM"
message in last 1-1.5 years and checking with other folks. It may very
well be some old workaround for stale connection for which long term
fix is already made and this part of code not exercised anymore."
Any such fixes, *if* they are needed, can/should be done in the
IB specific RDS transport modules.
- similarly dropped the LL_SEND_FULL patch (patch 6 in v2 set)
v4: Documentation/networking/rds.txt contains incorrect references
to "missing sysctl values for pf_rds and sol_rds in mainline".
The sysctl values were never needed in mainline, thus fix the
documentation.
v5: Clarify comment per http://www.spinics.net/lists/netdev/msg324220.html
v6: Re-added entire version history to cover letter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
If a determined set of concurrent senders keep the send queue full,
we can loop forever inside rds_send_xmit. This fix has two parts.
First we are dropping out of the while(1) loop after we've processed a
large batch of messages.
Second we add a generation number that gets bumped each time the
xmit bit lock is acquired. If someone else has jumped in and
made progress in the queue, we skip our goto restart.
Original patch by Chris Mason.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Passive connections were added for the case where one loopback IB
connection between identical addresses needs another connection to store
the second QP. Unfortunately, they were also created in the case where
the addesses differ and we already have both QPs.
This lead to a message reordering bug.
- two different IB interfaces and addresses on a machine: A B
- traffic is sent from A to B
- connection from A-B is created, connect request sent
- listening accepts connect request, B-A is created
- traffic flows, next_rx is incremented
- unacked messages exist on the retrans list
- connection A-B is shut down, new connect request sent
- listen sees existing loopback B-A, creates new passive B-A
- retrans messages are sent and delivered because of 0 next_rx
The problem is that the second connection request saw the previously
existing parent connection. Instead of using it, and using the existing
next_rx_seq state for the traffic between those IPs, it mistakenly
thought that it had to create a passive connection.
We fix this by only using passive connections in the special case where
laddr and faddr match. In this case we'll only ever have one parent
sending connection requests and one passive connection created as the
listening path sees the existing parent connection which initiated the
request.
Original patch by Zach Brown
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |/ / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
AF_RDS, PF_RDS and SOL_RDS are available in header files,
and there is no need to get their values from /proc. Document
this correctly.
Fixes: 0c5f9b8830aa ("RDS: Documentation")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |/ /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
instances
Fixes byte backlog accounting for the first of two chained netem instances.
Bytes backlog reported now corresponds to the number of queued packets.
When two netem instances are chained, for instance to apply rate and queue
limitation followed by packet delay, the number of backlogged bytes reported
by the first netem instance is wrong. It reports the sum of bytes in the queues
of the first and second netem. The first netem reports the correct number of
backlogged packets but not bytes. This is shown in the example below.
Consider a chain of two netem schedulers created using the following commands:
$ tc -s qdisc replace dev veth2 root handle 1:0 netem rate 10000kbit limit 100
$ tc -s qdisc add dev veth2 parent 1:0 handle 2: netem delay 50ms
Start an iperf session to send packets out on the specified interface and
monitor the backlog using tc:
$ tc -s qdisc show dev veth2
Output using unpatched netem:
qdisc netem 1: root refcnt 2 limit 100 rate 10000Kbit
Sent 98422639 bytes 65434 pkt (dropped 123, overlimits 0 requeues 0)
backlog 172694b 73p requeues 0
qdisc netem 2: parent 1: limit 1000 delay 50.0ms
Sent 98422639 bytes 65434 pkt (dropped 0, overlimits 0 requeues 0)
backlog 63588b 42p requeues 0
The interface used to produce this output has an MTU of 1500. The output for
backlogged bytes behind netem 1 is 172694b. This value is not correct. Consider
the total number of sent bytes and packets. By dividing the number of sent
bytes by the number of sent packets, we get an average packet size of ~=1504.
If we divide the number of backlogged bytes by packets, we get ~=2365. This is
due to the first netem incorrectly counting the 63588b which are in netem 2's
queue as being in its own queue. To verify this is the case, we subtract them
from the reported value and divide by the number of packets as follows:
172694 - 63588 = 109106 bytes actualled backlogged in netem 1
109106 / 73 packets ~= 1494 bytes (which matches our MTU)
The root cause is that the byte accounting is not done at the
same time with packet accounting. The solution is to update the backlog value
every time the packet queue is updated.
Signed-off-by: Joseph D Beshay <joseph.beshay@utdallas.edu>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Hariprasad Shenai says:
====================
cxgb4: Misc. fixes for sge
Increases value of MAX_IMM_TX_PKT_LEN to improve latency, fill freelist
starving threshold based on adapter type, add comments for tx flits and sge
length code and don't call t4_slow_intr_handler when we are not master PF.
This patch series has been created against net-next tree and includes patches on
cxgb4 driver
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add comment for tx filt and sge length calucaltion code, also remove
a hardcoded value
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fl_starv_thres could be different from adapter to adapter, don't use
hardcoded values
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This allows a significant latency drop for packets of sizes between 128 and 192
bytes
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The ring size is always known at compile time, so make the code a bit
more efficient
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The driver needs to inform the hardware about the first invalid (not yet
filled) rx slot, by writing its DMA descriptor pointer offset to the
BGMAC_DMA_RX_INDEX register.
This register was set to a value exceeding the rx ring size, effectively
allowing the hardware constant access to the full ring, regardless of
which slots are initialized.
To fix this issue, always mark the last filled rx slot as invalid.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|