| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions. (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).
However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have an output route that lacks nexthop exceptions, we can cache
it in the FIB info nexthop.
Such routes will have DST_HOST cleared because such routes refer to a
family of destinations, rather than just one.
The sequence of the handling of exceptions during route lookup is
adjusted to make the logic work properly.
Before we allocate the route, we lookup the exception.
Then we know if we will cache this route or not, and therefore whether
DST_HOST should be set on the allocated route.
Then we use DST_HOST to key off whether we should store the resulting
route, during rt_set_nexthop(), in the FIB nexthop cache.
With help from Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
Mark them obsolete so there will be a re-lookup to fetch the
FIB nexthop exception info.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
Add a big comment explaining how the field works, and use defines
instead of magic constants for the values assigned to it.
Suggested by Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to allow prefixed routes, we have to adjust how rt_gateway
is set and interpreted.
The new interpretation is:
1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr
2) rt_gateway != 0, destination requires a nexthop gateway
Abstract the fetching of the proper nexthop value using a new
inline helper, rt_nexthop(), as suggested by Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
|
|
|
|
| |
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
| |
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
| |
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
They are always used in contexts where they can be reconstituted,
or where the finally resolved rt->rt_{src,dst} is semantically
equivalent.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
The "noref" argument to ip_route_input_common() is now always ignored
because we do not cache routes, and in that case we must always grab
a reference to the resulting 'dst'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ipv4 routing cache is non-deterministic, performance wise, and is
subject to reasonably easy to launch denial of service attacks.
The routing cache works great for well behaved traffic, and the world
was a much friendlier place when the tradeoffs that led to the routing
cache's design were considered.
What it boils down to is that the performance of the routing cache is
a product of the traffic patterns seen by a system rather than being a
product of the contents of the routing tables. The former of which is
controllable by external entitites.
Even for "well behaved" legitimate traffic, high volume sites can see
hit rates in the routing cache of only ~%10.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
As this is going to be used not only by bonding.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
Also cut out unused function parameters and possible err in return
value.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modern TCP stack highly depends on tcp_write_timer() having a small
latency, but current implementation doesn't exactly meet the
expectations.
When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay hoping next run will be more
successful.
tcp_write_timer() for example uses a 50ms delay for next try, and it
defeats many attempts to get predictable TCP behavior in term of
latencies.
Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.
This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait
RTO timer to be transmitted, while cwnd should allow us to send them
sooner)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a missing roundup_pow_of_two(), since tcpmhash_entries is not
guaranteed to be a power of two.
Uses hash_32() instead of custom hash.
tcpmhash_entries should be an unsigned int.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
|
| |\
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This change addresses an L2CAP ERTM throughput problem when a remote
device does not fully utilize the available transmit window.
The L2CAP ERTM transmit window size determines the maximum number of
unacked frames that may be outstanding at any time. It is configured
separately for each direction of an ERTM connection. Each side sends a
configuration request with a tx_win field indicating how many unacked
frames it is capable of receiving before sending an ack. The
configuration response's tx_win field shows how many frames the
transmitter will actually send before waiting for an ack.
It's important to trace both the actual transmit window (to check for
validity of incoming frames) and the number of frames that the
transmitter will send before waiting (to send acks at the appropriate
time). Now there are separate tx_win and ack_win values. ack_win is
updated based on configuration responses, and is used to determine
when acks are sent.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Improve debug output.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Improve debugging of hci_conn objects by: adding print to hci_conn
refcounting, adding object spcifier when missing, change conn to hcon
since conn is heavily used for l2cap_conn objects and this is misleading.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Move AUTO_OFF_TIMEOUT to other constants changing name to
HCI_AUTO_OFF_TIMEOUT and convert to jiffies.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Change flags field to matches userspace structure.
This field needs to be converted to little endian before forward it.
Signed-off-by: Jefferson Delfes <jefferson.delfes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The HCI constants are always used in form of jiffies. So just
include the conversion from msecs in the define itself. This has the
advantage of making the code where the timeout is used more readable
and avoiding unnecessary conversions.
The patch is similar to commit ba13ccd9 doing the same job for L2CAP
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Conflicts:
net/bluetooth/hci_event.c
|
| |\ \ \
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Cellular base stations can provide hints to cfg80211 about
where they think we are. This can be done for example on
a cell phone. To enable these hints we simply allow them
through as user regulatory hints but we allow userspace
to clasify the hint as either coming directly from the
user or coming from a cellular base station. This option
is only available when you enable
CONFIG_CFG80211_CERTIFICATION_ONUS.
The base station hints themselves will not be processed
by the core unless at least one device on the system
supports this feature.
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Let the user configure serveral TX error conection quality monitoring
parameters: % error rate, survey interval, and # of attempted packets.
On exceeding the TX failure rate over the given interval, the driver
will send a CQM notify event with the actual TX failure rate and
packets attempted.
Signed-off-by: Thomas Pedersen <c_tpeder@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Revert commit b78e8ceac23655e1e06b30aa95ab11742d1ac7c0
("cfg80211: track monitor channel") and remove the
set_monitor_enabled() callback.
Due to the tracking happening in NETDEV_PRE_UP, it had
introduced bugs because the monitor interface callback
would be called before the device was started. It looks
like there's no way to fix this, and using NETDEV_PRE_UP
is broken anyway (since there's no NETDEV_UP_FAIL), so
remove all that code, track interfaces in NETDEV_UP and
also stop tracking the monitor channel in cfg80211.
This mostly reverts to before the tracking, except that
we keep the interface count tracking so that setting the
monitor channel can be rejected properly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This essentially reverts commit 2e165b818456 but
introduces the get_channel operation with a new
wireless_dev argument so that you can retrieve
the channel per interface. This is necessary as
even though we can track all interface channels
(except monitor) we can't track the channel type
used.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| |\| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
drivers/net/wireless/iwmc3200wifi/cfg80211.c
drivers/net/wireless/mwifiex/cfg80211.c
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Some drivers (iwlegacy, iwlwifi and rt2x00) today use the
bss_conf.last_tsf value. By itself though that value is
completely worthless since it may be ancient. What really
is needed is synchronisation between some device time and
the TSF.
To clarify this, rename bss_conf.last_tsf to sync_tsf and
add sync_device_ts which is obtained from rx_status which
gets a new field device_timestamp for this purpose. This
is intentionally not using the mactime field since that
is used for other things and in IBSS is expected to sync
with the IBSS's TSF which isn't necessarily true for the
device timestamp.
Also, since we have the information and it's useful even
before the connection has been established, give all the
timing details to the driver before authenticating.
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We waste a lot of space in this struct because it uses
int values where smaller ones would be sufficient. The
upcoming A-MPDU information needs some space, optimize
the struct now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The new P2P Device will have to be able to scan for
P2P search, so move scanning to use struct wireless_dev
instead of struct net_device.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In order to be able to create P2P Device wdevs, move
the virtual interface management over to wireless_dev
structures.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The management frame and remain-on-channel APIs will be
needed in the P2P device abstraction, so move them over
to the new wdev-based APIs. Userspace can still use both
the interface index and wdev identifier for them so it's
backward compatible, but for the P2P Device wdev it will
be able to use the wdev identifier only.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In order to support a P2P device abstraction and
Bluetooth high-speed AMPs, we need to have a way
to identify virtual interfaces that don't have a
netdev associated.
Do this by adding a NL80211_ATTR_WDEV attribute
to identify a wdev which may or may not also be
a netdev.
To simplify things, use a 64-bit value with the
high 32 bits being the wiphy index for this new
wdev identifier in the nl80211 API.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This API call was intended to be used by drivers
if they want to optimize key handling by removing
one key when another is added. Remove it since no
driver is using it. If needed, it can always be
added back.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Conflicts:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
IPVS should not reset skb->nf_bridge in FORWARD hook
by calling nf_reset for NAT replies. It triggers oops in
br_nf_forward_finish.
[ 579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[ 579.781792] PGD 218f9067 PUD 0
[ 579.781865] Oops: 0000 [#1] SMP
[ 579.781945] CPU 0
[ 579.781983] Modules linked in:
[ 579.782047]
[ 579.782080]
[ 579.782114] Pid: 4644, comm: qemu Tainted: G W 3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard /30E8
[ 579.782300] RIP: 0010:[<ffffffff817b1ca5>] [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[ 579.782455] RSP: 0018:ffff88007b003a98 EFLAGS: 00010287
[ 579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
[ 579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
[ 579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
[ 579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
[ 579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
[ 579.783177] FS: 0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
[ 579.783306] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
[ 579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
[ 579.783919] Stack:
[ 579.783959] ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
[ 579.784110] ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
[ 579.784260] ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
[ 579.784477] Call Trace:
[ 579.784523] <IRQ>
[ 579.784562]
[ 579.784603] [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
[ 579.784707] [<ffffffff81704b58>] nf_iterate+0x47/0x7d
[ 579.784797] [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[ 579.784906] [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[ 579.784995] [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[ 579.785175] [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
[ 579.785179] [<ffffffff817ac417>] __br_forward+0x97/0xa2
[ 579.785179] [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
[ 579.785179] [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
[ 579.785179] [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
[ 579.785179] [<ffffffff81704b58>] nf_iterate+0x47/0x7d
[ 579.785179] [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[ 579.785179] [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[ 579.785179] [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[ 579.785179] [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
[ 579.785179] [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
[ 579.785179] [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
[ 579.785179] [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
[ 579.785179] [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
[ 579.785179] [<ffffffff816e6800>] net_rx_action+0xdf/0x242
[ 579.785179] [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
[ 579.785179] [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 579.785179] [<ffffffff8188812c>] call_softirq+0x1c/0x30
The steps to reproduce as follow,
1. On Host1, setup brige br0(192.168.1.106)
2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
3. Start IPVS service on Host1
ipvsadm -A -t 192.168.1.106:80 -s rr
ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
4. Run apache benchmark on Host2(192.168.1.101)
ab -n 1000 http://192.168.1.106/
ip_vs_reply4
ip_vs_out
handle_response
ip_vs_notrack
nf_reset()
{
skb->nf_bridge = NULL;
}
Actually, IPVS wants in this case just to replace nfct
with untracked version. So replace the nf_reset(skb) call
in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In trusted networks, e.g., intranet, data-center, the client does not
need to use Fast Open cookie to mitigate DoS attacks. In cookie-less
mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless
of cookie availability.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
On paths with firewalls dropping SYN with data or experimental TCP options,
Fast Open connections will have experience SYN timeout and bad performance.
The solution is to track such incidents in the cookie cache and disables
Fast Open temporarily.
Since only the original SYN includes data and/or Fast Open option, the
SYN-ACK has some tell-tale sign (tcp_rcv_fastopen_synack()) to detect
such drops. If a path has recurring Fast Open SYN drops, Fast Open is
disabled for 2^(recurring_losses) minutes starting from four minutes up to
roughly one and half day. sendmsg with MSG_FASTOPEN flag will succeed but
it behaves as connect() then write().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
and write(2). The application should replace connect() with it to
send data in the opening SYN packet.
For blocking socket, sendmsg() blocks until all the data are buffered
locally and the handshake is completed like connect() call. It
returns similar errno like connect() if the TCP handshake fails.
For non-blocking socket, it returns the number of bytes queued (and
transmitted in the SYN-data packet) if cookie is available. If cookie
is not available, it transmits a data-less SYN packet with Fast Open
cookie request option and returns -EINPROGRESS like connect().
Using MSG_FASTOPEN on connecting or connected socket will result in
simlar errno like repeating connect() calls. Therefore the application
should only use this flag on new sockets.
The buffer size of sendmsg() is independent of the MSS of the connection.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This patch implements sending SYN-data in tcp_connect(). The data is
from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).
The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
type of the SYN. If the cookie is not cached (len==0), the host sends
data-less SYN with Fast Open cookie request option to solicit a cookie
from the remote. If cookie is not available (len > 0), the host sends
a SYN-data with Fast Open cookie option. If cookie length is negative,
the SYN will not include any Fast Open option (for fall back operations).
To deal with middleboxes that may drop SYN with data or experimental TCP
option, the SYN-data is only sent once. SYN retransmits do not include
data or Fast Open options. The connection will fall back to regular TCP
handshake.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
With help from Eric Dumazet, add Fast Open metrics in tcp metrics cache.
The basic ones are MSS and the cookies. Later patch will cache more to
handle unfriendly middleboxes.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This patch impelements the common code for both the client and server.
1. TCP Fast Open option processing. Since Fast Open does not have an
option number assigned by IANA yet, it shares the experiment option
code 254 by implementing draft-ietf-tcpm-experimental-options
with a 16 bits magic number 0xF989. This enables global experiments
without clashing the scarce(2) experimental options available for TCP.
When the draft status becomes standard (maybe), the client should
switch to the new option number assigned while the server supports
both numbers for transistion.
2. The new sysctl tcp_fastopen
3. A place holder init function
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.
This leads to bad behavior on multiqueue NICS, because many cpus
contend for the socket lock and once socket lock is acquired, extra
false sharing on various socket fields slow down the operations.
To better resist to attacks, we use a percpu socket. Each cpu can
run without contention, using appropriate memory (local node)
Additional features :
1) We also mirror the queue_mapping of the incoming skb, so that
answers use the same queue if possible.
2) Setting SOCK_USE_WRITE_QUEUE socket flag speedup sock_wfree()
3) We now limit the number of in-flight RST/ACK [1] packets
per cpu, instead of per namespace, and we honor the sysctl_wmem_default
limit dynamically. (Prior to this patch, sysctl_wmem_default value was
copied at boot time, so any further change would not affect tcp_sock
limit)
[1] These packets are only generated when no socket was matched for
the incoming packet.
Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.
v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception
v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Introduce ipv6_addr_hash() helper doing a XOR on all bits
of an IPv6 address, with an optimized x86_64 version.
Use it in flow dissector, as suggested by Andrew McGregor,
to reduce hash collision probabilities in fq_codel (and other
users of flow dissector)
Use it in ip6_tunnel.c and use more bit shuffling, as suggested
by David Laight, as existing hash was ignoring most of them.
Use it in sunrpc and use more bit shuffling, using hash_32().
Use it in net/ipv6/addrconf.c, using hash_32() as well.
As a cleanup, use it in net/ipv4/tcp_metrics.c
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew McGregor <andrewmcgr@gmail.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Incorporated David and Steffen's comments.
Add hook for rx-path xfmr4_mode_tunnel for VTI tunnel module.
Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Reviewed-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|