| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This commit simplifies the broadcast link transmission function, by
leveraging previous changes to the link transmission function and the
broadcast transmission link life cycle.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Realizing that unicast is just a special case of broadcast, we also see
that we can go in the other direction, i.e., that modest changes to the
current unicast link can make it generic enough to support broadcast.
The following changes are introduced here:
- A new counter ("ackers") in struct tipc_link, to indicate how many
peers need to ack a packet before it can be released.
- A corresponding counter in the skb user area, to keep track of how
many peers a are left to ack before a buffer can be released.
- A new counter ("acked"), to keep persistent track of how far a peer
has acked at the moment, i.e., where in the transmission queue to
start updating buffers when the next ack arrives. This is to avoid
double acknowledgements from a peer, with inadvertent relase of
packets as a result.
- A more generic tipc_link_retrans() function, where retransmit starts
from a given sequence number, instead of the first packet in the
transmision queue. This is to minimize the number of retransmitted
packets on the broadcast media.
When the new functionality is taken into use in the next commits,
we expect it to have minimal effect on unicast mode performance.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The broadcast link instance (struct tipc_link) used for sending is
currently aggregated into struct tipc_bclink. This means that we cannot
use the regular tipc_link_create() function for initiating the link, but
do instead have to initiate numerous fields directly from the
bcast_init() function.
We want to reduce dependencies between the broadcast functionality
and the inner workings of tipc_link. In this commit, we introduce
a new function tipc_bclink_create() to link.c, and allocate the
instance of the link separately using this function.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In reality, the link implementation is already independent from
struct tipc_bearer, in that it doesn't store any reference to it.
However, we still pass on a pointer to a bearer instance in the
function tipc_link_create(), just to have it extract some
initialization information from it.
I later commits, we need to create instances of tipc_link without
having any associated struct tipc_bearer. To facilitate this, we
want to extract the initialization data already in the creator
function in node.c, before calling tipc_link_create(), and pass
this info on as individual parameters in the call.
This commit introduces this change.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The broadcast transmission link is currently instantiated when the
network subsystem is started, i.e., on order from user space via netlink.
This forces the broadcast transmission code to do unnecessary tests for
the existence of the transmission link, as well in single mode node as
in network mode.
In this commit, we do instead create the link during initialization of
the name space, and remove it when it is stopped. The fact that the
transmission link now has a guaranteed longer life cycle than any of its
potential clients paves the way for further code simplifcations
and optimizations.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The broadcast lock will need to be acquired outside bcast.c in a later
commit. For this reason, we move the lock to struct tipc_net. Consistent
with the changes in the previous commit, we also introducee two new
functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
currently using tipc_bclink_lock()/unlock() will be phased out during
the coming commits in this series.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, a number of structure and function definitions related
to the broadcast functionality are unnecessarily exposed in the file
bcast.h. This obscures the fact that the external interface towards
the broadcast link in fact is very narrow, and causes unnecessary
recompilations of other files when anything changes in those
definitions.
In this commit, we move as many of those definitions as is currently
possible to the file bcast.c.
We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
since the name does not correctly describe the contents of this
struct, and will do so even less in the future, and because we want
to use the term 'link' more appropriately in the functionality
introduced later in this series.
Finally, we rename a couple of functions, such as tipc_bclink_xmit()
and others that will be kept in the future, to include the term 'bcast'
instead.
There are no functional changes in this commit.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.h
The openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.
The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
the returned buffer of register_sysctl() is stored into net_header
variable, but net_header is not used after, and compiler maybe
optimise the variable out, and lead kmemleak reported the below warning
comm "swapper/0", pid 1, jiffies 4294937448 (age 267.270s)
hex dump (first 32 bytes):
90 38 8b 01 c0 ff ff ff 00 00 00 00 01 00 00 00 .8..............
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffc00020f134>] create_object+0x10c/0x2a0
[<ffffffc00070ff44>] kmemleak_alloc+0x54/0xa0
[<ffffffc0001fe378>] __kmalloc+0x1f8/0x4f8
[<ffffffc00028e984>] __register_sysctl_table+0x64/0x5a0
[<ffffffc00028eef0>] register_sysctl+0x30/0x40
[<ffffffc00099c304>] net_sysctl_init+0x20/0x58
[<ffffffc000994dd8>] sock_init+0x10/0xb0
[<ffffffc0000842e0>] do_one_initcall+0x90/0x1b8
[<ffffffc000966bac>] kernel_init_freeable+0x218/0x2f0
[<ffffffc00070ed6c>] kernel_init+0x1c/0xe8
[<ffffffc000083bfc>] ret_from_fork+0xc/0x50
[<ffffffffffffffff>] 0xffffffffffffffff <<end check kmemleak>>
Before fix, the objdump result on ARM64:
0000000000000000 <net_sysctl_init>:
0: a9be7bfd stp x29, x30, [sp,#-32]!
4: 90000001 adrp x1, 0 <net_sysctl_init>
8: 90000000 adrp x0, 0 <net_sysctl_init>
c: 910003fd mov x29, sp
10: 91000021 add x1, x1, #0x0
14: 91000000 add x0, x0, #0x0
18: a90153f3 stp x19, x20, [sp,#16]
1c: 12800174 mov w20, #0xfffffff4 // #-12
20: 94000000 bl 0 <register_sysctl>
24: b4000120 cbz x0, 48 <net_sysctl_init+0x48>
28: 90000013 adrp x19, 0 <net_sysctl_init>
2c: 91000273 add x19, x19, #0x0
30: 9101a260 add x0, x19, #0x68
34: 94000000 bl 0 <register_pernet_subsys>
38: 2a0003f4 mov w20, w0
3c: 35000060 cbnz w0, 48 <net_sysctl_init+0x48>
40: aa1303e0 mov x0, x19
44: 94000000 bl 0 <register_sysctl_root>
48: 2a1403e0 mov w0, w20
4c: a94153f3 ldp x19, x20, [sp,#16]
50: a8c27bfd ldp x29, x30, [sp],#32
54: d65f03c0 ret
After:
0000000000000000 <net_sysctl_init>:
0: a9bd7bfd stp x29, x30, [sp,#-48]!
4: 90000000 adrp x0, 0 <net_sysctl_init>
8: 910003fd mov x29, sp
c: a90153f3 stp x19, x20, [sp,#16]
10: 90000013 adrp x19, 0 <net_sysctl_init>
14: 91000000 add x0, x0, #0x0
18: 91000273 add x19, x19, #0x0
1c: f90013f5 str x21, [sp,#32]
20: aa1303e1 mov x1, x19
24: 12800175 mov w21, #0xfffffff4 // #-12
28: 94000000 bl 0 <register_sysctl>
2c: f9002260 str x0, [x19,#64]
30: b40001a0 cbz x0, 64 <net_sysctl_init+0x64>
34: 90000014 adrp x20, 0 <net_sysctl_init>
38: 91000294 add x20, x20, #0x0
3c: 9101a280 add x0, x20, #0x68
40: 94000000 bl 0 <register_pernet_subsys>
44: 2a0003f5 mov w21, w0
48: 35000080 cbnz w0, 58 <net_sysctl_init+0x58>
4c: aa1403e0 mov x0, x20
50: 94000000 bl 0 <register_sysctl_root>
54: 14000004 b 64 <net_sysctl_init+0x64>
58: f9402260 ldr x0, [x19,#64]
5c: 94000000 bl 0 <unregister_sysctl_table>
60: f900227f str xzr, [x19,#64]
64: 2a1503e0 mov w0, w21
68: f94013f5 ldr x21, [sp,#32]
6c: a94153f3 ldp x19, x20, [sp,#16]
70: a8c37bfd ldp x29, x30, [sp],#48
74: d65f03c0 ret
Add the possible error handle to free the net_header to remove the
kmemleak warning
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev.
PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is
NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies
(po->pppoe_dev != NULL).
Since we're releasing a PPPoE socket, we want to release the pppoe_dev
if it exists and reset sk_state to PPPOX_DEAD, no matter the previous
value of sk_state. So we can just check for po->pppoe_dev and avoid any
assumption on sk->sk_state.
Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| | |
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The code currently uses the lightweight dma_wmb barrier before updating
the current descriptor count. Under heavy load, the Tx cleanup routine
was seeing the updated current descriptor count before the updated
descriptor information. As a result, the Tx descriptor was being cleaned
up before it was used because it was not "owned" by the hardware yet,
resulting in a Tx queue hang.
Using the wmb barrier insures that the descriptor is updated before the
descriptor counter preventing the Tx queue hang. For extra insurance,
the Tx cleanup routine is changed to grab the current decriptor count on
entry and uses that initial value in the processing loop rather than
trying to chase the current value.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Very rarely, the KSZ9031 will appear to complete autonegotiation, but
will drop all traffic afterwards. When this happens, the idle error
count will read 0xFF after autonegotiation completes. Reset the PHY
when in that state.
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Hannes Frederic Sowa says:
====================
overflow-arith: begin to add support for overflow builtins functions
I add a new header, linux/overflow-arith.h, as the central place to add
overflow and wrap-around checking functions. The reason I am doing so
is that it can make use of compiler supported builtin functions which
can leverage hardware.
As I need this for a fix in the ipv6 stack, which is also included in
this series, I propose to add it sooner than later over Davem's net
tree. This is also the reason why I start slowly with only the one
function I need at this time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
issues
Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.
Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The idea of the overflow-arith.h header is to collect overflow checking
functions in one central place.
If gcc compiler supports the __builtin_overflow_* builtins we use them
because they might give better performance, otherwise the code falls
back to normal overflow checking functions.
The builtin_overflow functions are supported by gcc-5 and clang. The
matter of supporting clang is to just provide a corresponding
CC_HAVE_BUILTIN_OVERFLOW, because the specific overflow checking builtins
don't differ between gcc and clang.
I just provide overflow_usub function here as I intend this to get merged
into net, more functions will definitely follow as they are needed.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If alpha is strictly reduced by alpha >> dctcp_shift_g and if alpha is less
than 1 << dctcp_shift_g, then alpha may never reach zero. For example,
given shift_g=4 and alpha=15, alpha >> dctcp_shift_g yields 0 and alpha
remains 15. The effect isn't noticeable in this case below cwnd=137, but
could gradually drive uncongested flows with leftover alpha down to
cwnd=137. A larger dctcp_shift_g would have a greater effect.
This change causes alpha=15 to drop to 0 instead of being decrementing by 1
as it would when alpha=16. However, it requires one less conditional to
implement since it doesn't have to guard against subtracting 1 from 0U. A
decay of 15 is not unreasonable since an equal or greater amount occurs at
alpha >= 240.
Signed-off-by: Andrew G. Shewmaker <agshew@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The error condition -EAGAIN, which is signaled by throw routes, tells
the rules framework to walk on searching for next matches. If the walk
ends and we stop walking the rules with the result of a throw route we
have to translate the error conditions to -ENETUNREACH.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We don't have fraglist support in TAP_FEATURES. This will lead
software segmentation of gro skb with frag list. Fixes by having
frag list support in TAP_FEATURES.
With this patch single session of netperf receiving were restored from
about 5Gb/s to about 12Gb/s on mlx4.
Fixes a567dd6252 ("macvtap: simplify usage of tap_features")
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
While transitioning to netdev based vport we broke OVS
feature which allows user to retrieve tunnel packet egress
information for lwtunnel devices. Following patch fixes it
by introducing ndo operation to get the tunnel egress info.
Same ndo operation can be used for lwtunnel devices and compat
ovs-tnl-vport devices. So after adding such device operation
we can remove similar operation from ovs-vport.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-10-22
This series contains fixes to i40e only.
Jesse provides two small fixes for i40e, first fixes counters that were
being displayed incorrectly due to indexing beyond the array of strings
when printing stats. Then fixed the fact that the driver was printing
a message about not being able to assign VMDq because a lack of MSI-X
vectors, when it was not true. It was due to a line missing that
initialized a variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The driver was printing a message about not being able
to assign VMDq because of a lack of MSI-X vectors.
This was because a line was missing that initialized a variable,
simply a merge error.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The code was setting up stats that were not being initialized.
This caused several counters to be displayed incorrectly, due
to indexing beyond the array of strings when printing stats.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The recent fix for the vsock sock_put issue used the wrong
initializer for the transport spin_lock causing an issue when
running with lockdep checking.
Testing: Verified fix on kernel with lockdep enabled.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| | |
New device IDs shamelessly lifted from the vendor driver.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2015-10-22
1) Fix IPsec pre-encap fragmentation for GSO packets.
From Herbert Xu.
2) Fix some header checks in _decode_session6.
We skip the header informations if the data pointer points
already behind the header in question for some protocols.
This is because we call pskb_may_pull with a negative value
converted to unsigened int from pskb_may_pull in this case.
Skipping the header informations can lead to incorrect policy
lookups. From Mathias Krause.
3) Allow to change the replay threshold and expiry timer of a
state without having to set other attributes like replay
counter and byte lifetime. Changing these other attributes
may break the SA. From Michael Rossberg.
4) Fix pmtu discovery for local generated packets.
We may fail dispatch to the inner address family.
As a reault, the local error handler is not called
and the mtu value is not reported back to userspace.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit 044a832a777 ("xfrm: Fix local error reporting crash
with interfamily tunnels") moved the setting of skb->protocol
behind the last access of the inner mode family to fix an
interfamily crash. Unfortunately now skb->protocol might not
be set at all, so we fail dispatch to the inner address family.
As a reault, the local error handler is not called and the
mtu value is not reported back to userspace.
We fix this by setting skb->protocol on message size errors
before we call xfrm_local_error.
Fixes: 044a832a7779c ("xfrm: Fix local error reporting crash with interfamily tunnels")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry
timer (XFRMA_ETIMER_THRESH) of a state without having to set other
attributes like replay counter and byte lifetime. Changing these other
values while traffic flows will break the state.
Signed-off-by: Michael Rossberg <michael.rossberg@tu-ilmenau.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Ensure there's enough data left prior calling pskb_may_pull(). If
skb->data was already advanced, we'll call pskb_may_pull() with a
negative value converted to unsigned int -- leading to a huge
positive value. That won't matter in practice as pskb_may_pull()
will likely fail in this case, but it leads to underflow reports on
kernels handling such kind of over-/underflows, e.g. a PaX enabled
kernel instrumented with the size_overflow plugin.
Reported-by: satmd <satmd@lain.at>
Reported-and-tested-by: Marcin Jurkowski <marcin1j@gmail.com>
Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode
packets. That is, we perform fragmentation pre-encap rather than
post-encap.
A check was added later to ensure that proper MTU information is
passed back for locally generated traffic. Unfortunately this
check was performed on all IPsec packets, including transport-mode
packets.
What's more, the check failed to take GSO into account.
The end result is that transport-mode GSO packets get dropped at
the check.
This patch fixes it by moving the tunnel mode check forward as well
as adding the GSO check.
Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if
oif is given. Hajime reported that this change breaks the Mobile IPv6
use case that wants to force the message through one interface yet use
the source address from another interface. Handle this case by only
adding the flag if oif is set and saddr is not set.
Fixes: 741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
Cc: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Karsten Keil says:
====================
Fix potential NULL pointer access and memory leak in ISDN layer2 functions
Insu Yun did brinup the issue with not checking the skb_clone() return
value in the layer2 I-frame ull functions.
This series fix the issue in a way which avoid protocol violations/data loss
on a temporary memory shortage.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The old code did not check the return value of skb_clone().
The extra skb_clone() is not needed at all, if using skb_realloc_headroom()
instead, which gives us a private copy with enough headroom as well.
We need to requeue the original skb if the call failed, because we cannot
inform upper layers about the data loss. Restructure the code to minimise
rollback effort if it happens.
This fix kernel bug #86091
Thanks to Insu Yun <wuninsu@gmail.com> to remind me on this issue.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The skb_clone() return value was not checked and the skb_realloc_headroom()
usage was wrong, the old skb was not freed. It turned out, that the
skb_clone is not needed at all, the skb_realloc_headroom() will create a
private copy with enough headroom and the original SKB can be used for the
ACK queue.
We need to requeue the original skb if the call failed, since the upper
layer cannot be informed about memory shortage.
Thanks to Insu Yun <wuninsu@gmail.com> to remind me on this issue.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In the vsock vmci_transport driver, sock_put wasn't safe to call
in interrupt context, since that may call the vsock destructor
which in turn calls several functions that should only be called
from process context. This change defers the callling of these
functions to a worker thread. All these functions were
deallocation of resources related to the transport itself.
Furthermore, an unused callback was removed to simplify the
cleanup.
Multiple customers have been hitting this issue when using
VMware tools on vSphere 2015.
Also added a version to the vmci transport module (starting from
1.0.2.0-k since up until now it appears that this module was
sharing version with vsock that is currently at 1.0.1.0-k).
Reviewed-by: Aditya Asarwade <asarwade@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying
the membership state to user-space. However, grabing the netlink table is
effectively a write_lock_irq(), and as such we should not be triggering
page-faults in the critical section.
This can be easily reproduced by the following snippet:
int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
void *p = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
int r = getsockopt(s, 0x10e, 9, p, (void*)((char*)p + 4092));
This should work just fine, but currently triggers EFAULT and a possible
WARN_ON below handle_mm_fault().
Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side
lock. The write-lock was overkill in the first place, and the read-lock
allows page-faults just fine.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add support for the TI DP83848 Ethernet PHY device.
The DP83848 is a highly reliable, feature rich, IEEE 802.3 compliant
single port 10/100 Mb/s Ethernet Physical Layer Transceiver supporting
the MII and RMII interfaces.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix sun4i-emac not releasing the following resources:
-iomapped memory not released on probe-failure nor on remove
-clock not getting disabled on probe-failure nor on remove
-sram not being released on remove
And while at it also add error checking to the clk_prepare_enable call
done on probe.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If userspace provides a ct action with no nested mark or label, then the
storage for these fields is zeroed. Later when actions are requested,
such zeroed fields are serialized even though userspace didn't
originally specify them. Fix the behaviour by ensuring that no action is
serialized in this case, and reject actions where userspace attempts to
set these fields with mask=0. This should make netlink marshalling
consistent across deserialization/reserialization.
Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
New, related connections are marked as such as part of ovs_ct_lookup(),
but they are not marked as "new" if the commit flag is used. Make this
consistent by setting the "new" flag whenever !nf_ct_is_confirmed(ct).
Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The presence of this attribute does not modify the ct_state for the
current packet, only future packets. Make this more clear in the header
definition.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, 0-bits are generated in ct_state where the bit position is
undefined, and matches are accepted on these bit-positions. If userspace
requests to match the 0-value for this bit then it may expect only a
subset of traffic to match this value, whereas currently all packets
will have this bit set to 0. Fix this by rejecting such masks.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit e520af48c7e5a introduced the following bug when setting the
TCP_REPAIR sockoption:
[ 2860.657036] BUG: using __this_cpu_add() in preemptible [00000000] code: daemon/12164
[ 2860.657045] caller is __this_cpu_preempt_check+0x13/0x20
[ 2860.657049] CPU: 1 PID: 12164 Comm: daemon Not tainted 4.2.3 #1
[ 2860.657051] Hardware name: Dell Inc. PowerEdge R210 II/0JP7TR, BIOS 2.0.5 03/13/2012
[ 2860.657054] ffffffff81c7f071 ffff880231e9fdf8 ffffffff8185d765 0000000000000002
[ 2860.657058] 0000000000000001 ffff880231e9fe28 ffffffff8146ed91 ffff880231e9fe18
[ 2860.657062] ffffffff81cd1a5d ffff88023534f200 ffff8800b9811000 ffff880231e9fe38
[ 2860.657065] Call Trace:
[ 2860.657072] [<ffffffff8185d765>] dump_stack+0x4f/0x7b
[ 2860.657075] [<ffffffff8146ed91>] check_preemption_disabled+0xe1/0xf0
[ 2860.657078] [<ffffffff8146edd3>] __this_cpu_preempt_check+0x13/0x20
[ 2860.657082] [<ffffffff817e0bc7>] tcp_xmit_probe_skb+0xc7/0x100
[ 2860.657085] [<ffffffff817e1e2d>] tcp_send_window_probe+0x2d/0x30
[ 2860.657089] [<ffffffff817d1d8c>] do_tcp_setsockopt.isra.29+0x74c/0x830
[ 2860.657093] [<ffffffff817d1e9c>] tcp_setsockopt+0x2c/0x30
[ 2860.657097] [<ffffffff81767b74>] sock_common_setsockopt+0x14/0x20
[ 2860.657100] [<ffffffff817669e1>] SyS_setsockopt+0x71/0xc0
[ 2860.657104] [<ffffffff81865172>] entry_SYSCALL_64_fastpath+0x16/0x75
Since tcp_xmit_probe_skb() can be called from process context, use
NET_INC_STATS() instead of NET_INC_STATS_BH().
Fixes: e520af48c7e5 ("tcp: add TCPWinProbe and TCPKeepAlive SNMP counters")
Signed-off-by: Renato Westphal <renatow@taghos.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains four Netfilter fixes for net, they are:
1) Fix Kconfig dependencies of new nf_dup_ipv4 and nf_dup_ipv6.
2) Remove bogus test nh_scope in IPv4 rpfilter match that is breaking
--accept-local, from Xin Long.
3) Wait for RCU grace period after dropping the pending packets in the
nfqueue, from Florian Westphal.
4) Fix sleeping allocation while holding spin_lock_bh, from Nikolay Borisov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit 00590fdd5be0 introduced RCU locking in list type and in
doing so introduced a memory allocation in list_set_add, which
is done in an atomic context, due to the fact that ipset rcu
list modifications are serialised with a spin lock. The reason
why we can't use a mutex is that in addition to modifying the
list with ipset commands, it's also being modified when a
particular ipset rule timeout expires aka garbage collection.
This gc is triggered from set_cleanup_entries, which in turn
is invoked from a timer thus requiring the lock to be bh-safe.
Concretely the following call chain can lead to "sleeping function
called in atomic context" splat:
call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
And since GFP_KERNEL allows initiating direct reclaim thus
potentially sleeping in the allocation path.
To fix the issue change the allocation type to GFP_ATOMIC, to
correctly reflect that it is occuring in an atomic context.
Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type")
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We need to sync packet rx again after flushing the queue entries.
Otherwise, the following race could happen:
cpu1: nf_unregister_hook(H) called, H unliked from lists, calls
synchronize_net() to wait for packet rx completion.
Problem is that while no new nf_queue_entry structs that use H can be
allocated, another CPU might receive a verdict from userspace just before
cpu1 calls nf_queue_nf_hook_drop to remove this entry:
cpu2: receive verdict from userspace, lock queue
cpu2: unlink nf_queue_entry struct E, which references H, from queue list
cpu1: calls nf_queue_nf_hook_drop, blocks on queue spinlock
cpu2: unlock queue
cpu1: nf_queue_nf_hook_drop drops affected queue entries
cpu2: call nf_reinject for E
cpu1: kfree(H)
cpu2: potential use-after-free for H
Cc: Eric W. Biederman <ebiederm@xmission.com>
Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
--accept-local option works for res.type == RTN_LOCAL, which should be
from the local table, but there, the fib_info's nh->nh_scope =
RT_SCOPE_NOWHERE ( > RT_SCOPE_HOST). in fib_create_info().
if (cfg->fc_scope == RT_SCOPE_HOST) {
struct fib_nh *nh = fi->fib_nh;
/* Local address is added. */
if (nhs != 1 || nh->nh_gw)
goto err_inval;
nh->nh_scope = RT_SCOPE_NOWHERE; <===
nh->nh_dev = dev_get_by_index(net, fi->fib_nh->nh_oif);
err = -ENODEV;
if (!nh->nh_dev)
goto failure;
but in our rpfilter_lookup_reverse():
if (dev_match || flags & XT_RPFILTER_LOOSE)
return FIB_RES_NH(res).nh_scope <= RT_SCOPE_HOST;
if nh->nh_scope > RT_SCOPE_HOST, it will fail. --accept-local option
will never be passed.
it seems the test is bogus and can be removed to fix this issue.
if (dev_match || flags & XT_RPFILTER_LOOSE)
return FIB_RES_NH(res).nh_scope <= RT_SCOPE_HOST;
ipv6 does not have this issue.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
net/built-in.o: In function `nf_dup_ipv4': (.text+0xed24d): undefined reference to `nf_conntrack_untracked'
net/built-in.o: In function `nf_dup_ipv4': (.text+0xed267): undefined reference to `nf_conntrack_untracked'
net/built-in.o: In function `nf_dup_ipv6': (.text+0x158aef): undefined reference to `nf_conntrack_untracked'
net/built-in.o: In function `nf_dup_ipv6': (.text+0x158b09): undefined reference to `nf_conntrack_untracked'
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In commit d999297c3dbbe ("tipc: reduce locking scope during packet reception")
we altered the packet retransmission function. Since then, when
restransmitting packets, we create a clone of the original buffer
using __pskb_copy(skb, MIN_H_SIZE), where MIN_H_SIZE is the size of
the area we want to have copied, but also the smallest possible TIPC
packet size. The value of MIN_H_SIZE is 24.
Unfortunately, __pskb_copy() also has the effect that the headroom
of the cloned buffer takes the size MIN_H_SIZE. This is too small
for carrying the packet over the UDP tunnel bearer, which requires
a minimum headroom of 28 bytes. A change to just use pskb_copy()
lets the clone inherit the original headroom of 80 bytes, but also
assumes that the copied data area is of at least that size, something
that is not always the case. So that is not a viable solution.
We now fix this by adding a check for sufficient headroom in the
transmit function of udp_media.c, and expanding it when necessary.
Fixes: commit d999297c3dbbe ("tipc: reduce locking scope during packet reception")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
CONFIG_NET_VENDOR_CAVIUM is only used to hide/show config options and to
include subdirectories in the build, so it doesn't make sense to make it
tristate.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|