| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
The encapsulated ethernet and VLAN header may be outside the received
ethernet frame. Thus the skb buffer size has to be checked before it can be
parsed to find out if it encapsulates another batman-adv packet.
Fixes: 420193573f11 ("batman-adv: softif bridge loop avoidance")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"A few fixes for the RTC subsystem. The documentation fix already
missed 4.5 so I think it is worth taking it now:
A documentation fix for s3c and two fixes for the ds1307"
* tag 'rtc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: ds1307: Use irq when available for wakeup-source device
rtc: ds1307: ds3231 temperature s16 overflow
rtc: s3c: Document in binding that only s3c6410 needs a src clk
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With commit 8bc2a40730ec ("rtc: ds1307: add support for the
DT property 'wakeup-source'") we lost the ability for rtc irq
functionality for devices that are actually hooked on a real IRQ
line and have capability to wakeup as well. This is not an expected
behavior. So, instead of just not requesting IRQ, skip the IRQ
requirement only if interrupts are not defined for the device.
Fixes: 8bc2a40730ec ("rtc: ds1307: add support for the DT property 'wakeup-source'")
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Michael Lange <linuxstuff@milaw.biz>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
while retrieving temperature from ds3231, the result may be overflow
since s16 is too small for a multiplication with 250.
ie. if temp_buf[0] == 0x2d, the result (s16 temp) will be negative.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Michael Tatarinov <kukabu@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The S3C binding doc says that the RTC and RTC source clocks are required
but the S3C driver supports different HW IP and only the s3c6410 needs a
source clock.
Fix the binding explaining that the source clock is only needed for the
s3c6410-rtc compatible controller.
Reported-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Two fixes for issues introduced recently, one for an intel_pstate
driver problem uncovered by the recent switch over from using timers
and the other one for a potential cpufreq core problem related to
system suspend/resume.
Specifics:
- Fix an intel_pstate driver problem causing CPUs to get stuck in the
highest P-state when completely idle uncovered by the recent switch
over from using timers (Rafael Wysocki).
- Avoid attempts to get the current CPU frequency when all devices
(like I2C controllers that may be nedded for that purpose) have
been suspended during system suspend/resume (Rafael Wysocki)"
* tag 'pm+acpi-4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
intel_pstate: Avoid getting stuck in high P-states when idle
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | | |
* pm-cpufreq-fixes:
cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
intel_pstate: Avoid getting stuck in high P-states when idle
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Since governor operations are generally skipped if cpufreq_suspended
is set, cpufreq_start_governor() should do nothing in that case.
That function is called in the cpufreq_online() path, and may also
be called from cpufreq_offline() in some cases, which are invoked
by the nonboot CPUs disabing/enabling code during system suspend
to RAM and resume. That happens when all devices have been
suspended, so if the cpufreq driver relies on things like I2C to
get the current frequency, it may not be ready to do that then.
To prevent problems from happening for this reason, make
cpufreq_update_current_freq(), which is the only function invoked
by cpufreq_start_governor() that doesn't check cpufreq_suspended
already, return 0 upfront if cpufreq_suspended is set.
Fixes: 3bbf8fe3ae08 (cpufreq: Always update current frequency before startig governor)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Jörg Otte reports that commit a4675fbc4a7a (cpufreq: intel_pstate:
Replace timers with utilization update callbacks) caused the CPUs in
his Haswell-based system to stay in the very high frequency region
even if the system is completely idle.
That turns out to be an existing problem in the intel_pstate driver's
P-state selection algorithm for Core processors. Namely, all
decisions made by that algorithm are based on the average frequency
of the CPU between sampling events and on the P-state requested on
the last invocation, so it may get stuck at a very hight frequency
even if the utilization of the CPU is very low (in fact, it may get
stuck in a inadequate P-state regardless of the CPU utilization).
The only way to kick it out of that limbo is a sufficiently long idle
period (3 times longer than the prescribed sampling interval), but if
that doesn't happen often enough (eg. due to a timing change like
after the above commit), the P-state of the CPU may be inadequate
pretty much all the time.
To address the most egregious manifestations of that issue, reset the
core_busy value used to determine the next P-state to request if the
utilization of the CPU, determined with the help of the MPERF
feedback register and the TSC, is below 1%.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull networking fixes from David Miller:
1) Fix memory leak in iwlwifi, from Matti Gottlieb.
2) Add missing registration of netfilter arp_tables into initial
namespace, from Florian Westphal.
3) Fix potential NULL deref in DecNET routing code.
4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry
Ivanov.
5) Fix dst ref counting in VRF, from David Ahern.
6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck.
7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause.
8) Ravalidate IPV6 datagram socket cached routes properly, particularly
with UDP, from Martin KaFai Lau.
9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang.
10) Fix stats typing in bcmgenet driver, from Eric Dumazet.
11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing,
from Joe Stringer.
12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown.
13) atl2 doesn't actually support scatter-gather, so don't advertise the
feature. From Ben Hucthings.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
openvswitch: use flow protocol when recalculating ipv6 checksums
Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
atl2: Disable unimplemented scatter/gather feature
net/mlx4_en: Split SW RX dropped counter per RX ring
net/mlx4_core: Don't allow to VF change global pause settings
net/mlx4_core: Avoid repeated calls to pci enable/disable
net/mlx4_core: Implement pci_resume callback
net: phy: spi_ks8895: Don't leak references to SPI devices
net: ethernet: davinci_emac: Fix platform_data overwrite
net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable
qede: Fix single MTU sized packet from firmware GRO flow
qede: Fix setting Skb network header
qede: Fix various memory allocation error flows for fastpath
tcp: Merge tx_flags and tskey in tcp_shifted_skb
tcp: Merge tx_flags and tskey in tcp_collapse_retrans
drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
openvswitch: Orphan skbs before IPv6 defrag
Revert "Prevent NUll pointer dereference with two PHYs on cpsw"
VSOCK: Only check error on skb_recv_datagram when skb is NULL
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When using masked actions the ipv6_proto field of an action
to set IPv6 fields may be zero rather than the prevailing protocol
which will result in skipping checksum recalculation.
This patch resolves the problem by relying on the protocol
in the flow key rather than that in the set field action.
Fixes: 83d2b9ba1abc ("net: openvswitch: Support masked set actions.")
Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
For IPv6, if the device indicates that the checksum is correct, set
CHECKSUM_UNNECESSARY.
Reported-by: Subbarao Narahari <snarahari@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
atl2 includes NETIF_F_SG in hw_features even though it has no support
for non-linear skbs. This bug was originally harmless since the
driver does not claim to implement checksum offload and that used to
be a requirement for SG.
Now that SG and checksum offload are independent features, if you
explicitly enable SG *and* use one of the rare protocols that can use
SG without checkusm offload, this potentially leaks sensitive
information (before you notice that it just isn't working). Therefore
this obscure bug has been designated CVE-2016-2117.
Reported-by: Justin Yackoski <jyackoski@crypto-nite.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: ec5f06156423 ("net: Kill link between CSUM and SG features.")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Or Gerlitz says:
====================
Mellaox 40G driver fixes for 4.6-rc
With the fix for ARM bug being under the works, these are
few other fixes for mlx4 we have ready to go.
Eran addressed the problematic/wrong reporting of dropped packets, Daniel
fixed some matters related to PPC EEH's and Jenny's patch makes sure
VFs can't change the port's pause settings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Count SW packet drops per RX ring instead of a global counter. This
will allow monitoring the number of rx drops per ring.
In addition, SW rx_dropped counter was overwritten by HW rx_dropped
counter, sum both of them instead to show the accurate value.
Fixes: a3333b35da16 ('net/mlx4_en: Moderate ethtool callback to [...] ')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Currently changing global pause settings is done via SET_PORT
command with input modifier GENERAL. This command is allowed
for each VF since MTU setting is done via the same command.
Change the above to the following scheme: before passing the
request to the FW, the PF will check whether it was issued
by a slave. If yes, don't change global pause and warn,
otherwise change to the requested value and store for
further reference.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Maintain the PCI status and provide wrappers for enabling and disabling
the PCI device. Performing the actions more than once without doing
its opposite results in warning logs.
This occurred when EEH hotplugged the device causing a warning for
disabling an already disabled device.
Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Move resume related activities to a new pci_resume function instead of
performing them in mlx4_pci_slot_reset. This change is needed to avoid
a hotplug during EEH recovery due to commit f2da4ccf8bd4 ("powerpc/eeh:
More relaxed hotplug criterion").
Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The ks8895 driver is using spi_dev_get() apparently just to take a copy
of the SPI device used to instantiate it but never calls spi_dev_put()
to free it. Since the device is guaranteed to exist between probe() and
remove() there should be no need for the driver to take an extra
reference to it so fix the leak by just using a straight assignment.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When the DaVinci emac driver is removed and re-probed, the actual
pdev->dev.platform_data is populated with an unwanted valid pointer saved by
the previous davinci_emac_of_get_pdata() call, causing a kernel crash when
calling priv->int_disable() in emac_int_disable().
Unable to handle kernel paging request at virtual address c8622a80
...
[<c0426fb4>] (emac_int_disable) from [<c0427700>] (emac_dev_open+0x290/0x5f8)
[<c0427700>] (emac_dev_open) from [<c04c00ec>] (__dev_open+0xb8/0x120)
[<c04c00ec>] (__dev_open) from [<c04c0370>] (__dev_change_flags+0x88/0x14c)
[<c04c0370>] (__dev_change_flags) from [<c04c044c>] (dev_change_flags+0x18/0x48)
[<c04c044c>] (dev_change_flags) from [<c052bafc>] (devinet_ioctl+0x6b4/0x7ac)
[<c052bafc>] (devinet_ioctl) from [<c04a1428>] (sock_ioctl+0x1d8/0x2c0)
[<c04a1428>] (sock_ioctl) from [<c014f054>] (do_vfs_ioctl+0x41c/0x600)
[<c014f054>] (do_vfs_ioctl) from [<c014f2a4>] (SyS_ioctl+0x6c/0x7c)
[<c014f2a4>] (SyS_ioctl) from [<c000ff60>] (ret_fast_syscall+0x0/0x1c)
Fixes: 42f59967a091 ("net: ethernet: davinci_emac: add OF support")
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In order to avoid an Unbalanced pm_runtime_enable in the DaVinci
emac driver when the device is removed and re-probed, and a
pm_runtime_disable() call in davinci_emac_remove().
Actually, using unbind/bind on a TI DM8168 SoC gives :
$ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/unbind
net eth1: DaVinci EMAC: davinci_emac_remove()
$ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/bind
davinci_emac 4a120000.ethernet: Unbalanced pm_runtime_enable
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Fixes: 3ba97381343b ("net: ethernet: davinci_emac: add pm_runtime support")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Manish Chopra says:
====================
qede: Bug fixes
This series fixes -
* various memory allocation failure flows for fastpath
* issues with respect to driver GRO packets handling
V1->V2
* Send series against net instead of net-next.
Please consider applying this series to "net"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In firmware assisted GRO flow there could be a single MTU sized
segment arriving due to firmware aggregation timeout/last segment
in an aggregation flow, which is not expected to be an actual gro
packet. So If a skb has zero frags from the GRO flow then simply
push it in the stack as non gso skb.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Skb's network header needs to be set before extracting IPv4/IPv6
headers from it.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch handles memory allocation failures for fastpath
gracefully in the driver.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Martin KaFai Lau says:
====================
tcp: Merge timestamp info when coalescing skbs
This series is separated from the RFC series related to
tcp_sendmsg(MSG_EOR) and it is targeting for the net branch.
This patchset is focusing on fixing cases where TCP
timestamp could be lost after coalescing skbs.
A BPF prog is used to kprobe to sock_queue_err_skb()
and print out the value of serr->ee.ee_data. The BPF
prog (run-able from bcc) is attached here:
BPF prog used for testing:
~~~~~
from __future__ import print_function
from bcc import BPF
bpf_text = """
int trace_err_skb(struct pt_regs *ctx)
{
struct sk_buff *skb = (struct sk_buff *)ctx->si;
struct sock *sk = (struct sock *)ctx->di;
struct sock_exterr_skb *serr;
u32 ee_data = 0;
if (!sk || !skb)
return 0;
serr = SKB_EXT_ERR(skb);
bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
bpf_trace_printk("ee_data:%u\\n", ee_data);
return 0;
};
"""
b = BPF(text=bpf_text)
b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
print("Attached to kprobe")
b.trace_print()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
After receiving sacks, tcp_shifted_skb() will collapse
skbs if possible. tx_flags and tskey also have to be
merged.
This patch reuses the tcp_skb_collapse_tstamp() to handle
them.
BPF Output Before:
~~~~~
<no-output-due-to-missing-tstamp-event>
BPF Output After:
~~~~~
<...>-2024 [007] d.s. 88.644374: : ee_data:14599
Packetdrill Script:
~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
0.200 write(4, ..., 1460) = 1460
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 13140) = 13140
0.200 > P. 1:1461(1460) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:14601(5840) ack 1
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 14601 win 257
0.400 close(4) = 0
0.400 > F. 14601:14601(0) ack 1
0.500 < F. 1:1(0) ack 14602 win 257
0.500 > . 14602:14602(0) ack 2
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If two skbs are merged/collapsed during retransmission, the current
logic does not merge the tx_flags and tskey. The end result is
the SCM_TSTAMP_ACK timestamp could be missing for a packet.
The patch:
1. Merge the tx_flags
2. Overwrite the prev_skb's tskey with the next_skb's tskey
BPF Output Before:
~~~~~~
<no-output-due-to-missing-tstamp-event>
BPF Output After:
~~~~~~
packetdrill-2092 [001] d.s. 453.998486: : ee_data:1459
Packetdrill Script:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
0.200 write(4, ..., 11680) = 11680
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 > P. 1:731(730) ack 1
0.200 > P. 731:1461(730) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:13141(4380) ack 1
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 13141 win 257
0.400 close(4) = 0
0.400 > F. 13141:13141(0) ack 1
0.500 < F. 1:1(0) ack 13142 win 257
0.500 > . 13142:13142(0) ack 2
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The cpsw_ndo_open() could try to access CPSW registers before
calling pm_runtime_get_sync(). This will trigger L3 error:
WARNING: CPU: 0 PID: 21 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x220/0x34c()
44000000.ocp:L3 Custom Error: MASTER M2 (64-bit) TARGET L4_FAST (Idle): Data Access in Supervisor mode during Functional access
and CPSW will stop functioning.
Hence, fix it by moving pm_runtime_get_sync() before the first access
to CPSW registers in cpsw_ndo_open().
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received,
it could incorrectly think that a skb has already
been acked and queue a SCM_TSTAMP_ACK cmsg to the
sk->sk_error_queue.
In tcp_ack_tstamp(), it checks
'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'.
If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill
script, between() returns true but the tskey is actually not acked.
e.g. try between(3, 2, 1).
The fix is to replace between() with one before() and one !before().
By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be
removed.
A packetdrill script is used to reproduce the dup ack scenario.
Due to the lacking cmsg support in packetdrill (may be I
cannot find it), a BPF prog is used to kprobe to
sock_queue_err_skb() and print out the value of
serr->ee.ee_data.
Both the packetdrill and the bcc BPF script is attached at the end of
this commit message.
BPF Output Before Fix:
~~~~~~
<...>-2056 [001] d.s. 433.927987: : ee_data:1459 #incorrect
packetdrill-2056 [001] d.s. 433.929563: : ee_data:1459 #incorrect
packetdrill-2056 [001] d.s. 433.930765: : ee_data:1459 #incorrect
packetdrill-2056 [001] d.s. 434.028177: : ee_data:1459
packetdrill-2056 [001] d.s. 434.029686: : ee_data:14599
BPF Output After Fix:
~~~~~~
<...>-2049 [000] d.s. 113.517039: : ee_data:1459
<...>-2049 [000] d.s. 113.517253: : ee_data:14599
BCC BPF Script:
~~~~~~
#!/usr/bin/env python
from __future__ import print_function
from bcc import BPF
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>
#include <linux/errqueue.h>
#ifdef memset
#undef memset
#endif
int trace_err_skb(struct pt_regs *ctx)
{
struct sk_buff *skb = (struct sk_buff *)ctx->si;
struct sock *sk = (struct sock *)ctx->di;
struct sock_exterr_skb *serr;
u32 ee_data = 0;
if (!sk || !skb)
return 0;
serr = SKB_EXT_ERR(skb);
bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
bpf_trace_printk("ee_data:%u\\n", ee_data);
return 0;
};
"""
b = BPF(text=bpf_text)
b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
print("Attached to kprobe")
b.trace_print()
Packetdrill Script:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 1460) = 1460
0.200 write(4, ..., 13140) = 13140
0.200 > P. 1:1461(1460) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:14601(5840) ack 1
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 14601 win 257
0.400 close(4) = 0
0.400 > F. 14601:14601(0) ack 1
0.500 < F. 1:1(0) ack 14602 win 257
0.500 > . 14602:14602(0) ack 2
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always
orphan skbs inside ip_defrag()").
Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
cloned (implicitly orphaning) prior to queueing for reassembly. As such,
when the IPv6 message is eventually reassembled, the skb->sk for all
fragments would be NULL. After that commit was introduced, rather than
cloning, the original skbs were queued directly without orphaning. The
end result is that all frags except for the first and last may have a
socket attached.
This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
prevent BUG_ON(skb->sk) during a later call to ip6_fragment().
kernel BUG at net/ipv6/ip6_output.c:631!
[...]
Call Trace:
<IRQ>
[<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
[<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch]
[<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70
[<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch]
[<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
[<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
[<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20
[<ffffffff81697e80>] ? dst_ifdown+0x80/0x80
[<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch]
[<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch]
[<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch]
[<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch]
[<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
[<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch]
[<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
[<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
[<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch]
[<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
[<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch]
[<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch]
[<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
[<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
[<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
[<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
[<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch]
[<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch]
[<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660
[<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0
[<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950
[<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950
[<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
[<ffffffff81690990>] dev_queue_xmit+0x10/0x20
[<ffffffff8169a418>] neigh_resolve_output+0x178/0x220
[<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0
[<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0
[<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0
[<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
[<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
[<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
[<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
[<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
[<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
[<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
[<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
[<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
[<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
[<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0
[<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500
[<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580
[<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690
[<ffffffff81755925>] ? ip6_input_finish+0x5/0x690
[<ffffffff817567a0>] ip6_input+0x30/0xa0
[<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0
[<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0
[<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0
[<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0
[<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0
[<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80
[<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
[<ffffffff8168c07f>] ? process_backlog+0x6f/0x230
[<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70
[<ffffffff8168c088>] process_backlog+0x78/0x230
[<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230
[<ffffffff8168db43>] net_rx_action+0x203/0x480
[<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
[<ffffffff817c156e>] __do_softirq+0xde/0x49f
[<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0
[<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30
<EOI>
[<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40
[<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0
[<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0
[<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50
[<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
[<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
[<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
[<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
[<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
[<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
[<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
[<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
[<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
[<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
[<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30
[<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0
[<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0
[<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0
[<ffffffff8166d078>] sock_sendmsg+0x38/0x50
[<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170
[<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
[<ffffffff8166e38e>] SyS_sendto+0xe/0x10
[<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8
Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8#
RIP [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50
RSP <ffff880072803120>
Fixes: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone
operations")
Reported-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This reverts commit cfe255600154f0072d4a8695590dbd194dfd1aeb
This can result in a "Unable to handle kernel paging request"
during boot. This was due to using an uninitialised struct member,
data->slaves.
Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If skb_recv_datagram returns an skb, we should ignore the err
value returned. Otherwise, datagram receives will return EAGAIN
when they have to wait for a datagram.
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
skb->sk could point to timewait or request socket which has no sk_classid.
Detected as "BUG: KASAN: slab-out-of-bounds in cls_cgroup_classify".
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch fixes couple error paths after allocation failures.
Atomic set of page reference counter is safe only if it is zero,
otherwise set can race with any speculative get_page_unless_zero.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
High order pages are optional here since commit 51151a16a60f ("mlx4: allow
order-0 memory allocations in RX path"), so here is no reason for depleting
reserves. Generic __netdev_alloc_frag() implements the same logic.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The new MACsec driver uses the AES crypto algorithm, but can be configured
even if CONFIG_CRYPTO is disabled, leading to a build error:
warning: (MAC80211 && MACSEC) selects CRYPTO_GCM which has unmet direct dependencies (CRYPTO)
warning: (BT && CEPH_LIB && INET && MAC802154 && MAC80211 && BLK_DEV_RBD && MACSEC && AIRO_CS && LIBIPW && HOSTAP && USB_WUSB && RTLLIB_CRYPTO_CCMP && FS_ENCRYPTION && EXT4_ENCRYPTION && CEPH_FS && BIG_KEYS && ENCRYPTED_KEYS) selects CRYPTO_AES which has unmet direct dependencies (CRYPTO)
crypto/built-in.o: In function `gcm_enc_copy_hash':
aes_generic.c:(.text+0x2b8): undefined reference to `crypto_xor'
aes_generic.c:(.text+0x2dc): undefined reference to `scatterwalk_map_and_copy'
This adds an explicit 'select CRYPTO' statement the way that other
drivers handle it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
On 64bit kernels, device stats are 64bit wide, not 32bit.
Fixes: 1c1008c793fa4 ("net: bcmgenet: add main driver file")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: fix hardware cross-chip bridging
In order to accelerate cross-chip switching of frames with the hardware,
the DSA Tag ports, used to interconnect switch devices, must learn SA
and DA addresses, and share the same FDB with the user ports.
The two first patches restore address learning on DSA links. This fixes
hardware cross-chip bridging in a VLAN filtering enabled system, which
implements a bridge group as a 802.1Q VLAN and thus share an isolated
address database between DSA and user ports.
The third patch changes the distinct default databases used for each
port, to the same address database. This fixes the hardware cross-chip
bridging in a VLAN filtering disabled system, where a bridge group gets
implemented only as a port-based VLAN.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
For hardware cross-chip bridging to work, user ports *and* DSA ports
need to share a common address database, in order to switch a frame to
the correct interconnected device.
This is currently working for VLAN filtering aware systems, since Linux
will implement a bridge group as a 802.1Q VLAN, which has its own FDB,
including DSA and CPU links as members.
However when the system doesn't support VLAN filtering, Linux only
relies on the port-based VLAN to implement a bridge group.
To fix hardware cross-chip bridging for such systems, set the same
default address database 0 for user and DSA ports, instead of giving
them all a different default database.
Note that the bridging code prevents frames to egress between unbridged
ports, and flushes FDB entries of a port when changing its STP state.
Also note that the FID 0 is special and means "all" for ATU operations,
but it's OK since it is used as a default forwarding address database.
Fixes: 2db9ce1fd9a3 ("net: dsa: mv88e6xxx: assign default FDB to ports")
Fixes: 466dfa077022 ("net: dsa: mv88e6xxx: assign dynamic FDB to bridges")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In multi-chip systems, DSA Tag ports must learn SA addresses in order to
correctly switch frames between interconnected chips.
This fixes cross-chip hardware bridging in a VLAN filtering aware
system, because a bridge group gets implemented as an hardware 802.1Q
VLAN and thus DSA and user ports share the same FDB.
Fixes: 4c7ea3c0791e ("net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Locking a port generates an hardware interrupt when a new SA address is
received. This enables CPU directed learning, which is needed for 802.1X
MAC authentication.
To disable automatic learning on a port, the only configuration needed
is to set its Port Association Vector to all zero.
Clear PAV when SA learning should be disabled instead of locking a port.
Fixes: 4c7ea3c0791e ("net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Two different threads with different rds sockets may be in
rds_recv_rcvbuf_delta() via receive path. If their ports
both map to the same word in the congestion map, then
using non-atomic ops to update it could cause the map to
be incorrect. Lets use atomics to avoid such an issue.
Full credit to Wengang <wen.gang.wang@oracle.com> for
finding the issue, analysing it and also pointing out
to offending code with spin lock based fix.
Reviewed-by: Leon Romanovsky <leon@leon.nu>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
dp->dp_ack_seq is used in big endian format. We need to do the
big endianness conversion when we assign a value in host format
to it.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When __vlan_insert_tag() fails from skb_vlan_push() path due to the
skb_cow_head(), we need to undo the __skb_push() in the error path
as well that was done earlier to move skb->data pointer to mac header.
Moreover, I noticed that when in the non-error path the __skb_pull()
is done and the original offset to mac header was non-zero, we fixup
from a wrong skb->data offset in the checksum complete processing.
So the skb_postpush_rcsum() really needs to be done before __skb_pull()
where skb->data still points to the mac header start and thus operates
under the same conditions as in __vlan_insert_tag().
Fixes: 93515d53b133 ("net: move vlan pop/push functions into common code")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Adding a 2nd PHY to cpsw results in a NULL pointer dereference
as below. Fix by maintaining a reference to each PHY node in slave
struct instead of a single reference in the priv struct which was
overwritten by the 2nd PHY.
[ 17.870933] Unable to handle kernel NULL pointer dereference at virtual address 00000180
[ 17.879557] pgd = dc8bc000
[ 17.882514] [00000180] *pgd=9c882831, *pte=00000000, *ppte=00000000
[ 17.889213] Internal error: Oops: 17 [#1] ARM
[ 17.893838] Modules linked in:
[ 17.897102] CPU: 0 PID: 1657 Comm: connmand Not tainted 4.5.0-ge463dfb-dirty #11
[ 17.904947] Hardware name: Cambrionix whippet
[ 17.909576] task: dc859240 ti: dc968000 task.ti: dc968000
[ 17.915339] PC is at phy_attached_print+0x18/0x8c
[ 17.920339] LR is at phy_attached_info+0x14/0x18
[ 17.925247] pc : [<c042baec>] lr : [<c042bb74>] psr: 600f0113
[ 17.925247] sp : dc969cf8 ip : dc969d28 fp : dc969d18
[ 17.937425] r10: dda7a400 r9 : 00000000 r8 : 00000000
[ 17.942971] r7 : 00000001 r6 : ddb00480 r5 : ddb8cb34 r4 : 00000000
[ 17.949898] r3 : c0954cc0 r2 : c09562b0 r1 : 00000000 r0 : 00000000
[ 17.956829] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 17.964401] Control: 10c5387d Table: 9c8bc019 DAC: 00000051
[ 17.970500] Process connmand (pid: 1657, stack limit = 0xdc968210)
[ 17.977059] Stack: (0xdc969cf8 to 0xdc96a000)
[ 17.981692] 9ce0: dc969d28 dc969d08
[ 17.990386] 9d00: c038f9bc c038f6b4 ddb00480 dc969d34 dc969d28 c042bb74 c042bae4 00000000
[ 17.999080] 9d20: c09562b0 c0954cc0 dc969d5c dc969d38 c043ebfc c042bb6c 00000007 00000003
[ 18.007773] 9d40: ddb00000 ddb8cb58 ddb00480 00000001 dc969dec dc969d60 c0441614 c043ea68
[ 18.016465] 9d60: 00000000 00000003 00000000 fffffff4 dc969df4 0000000d 00000000 00000000
[ 18.025159] 9d80: dc969db4 dc969d90 c005dc08 c05839e0 dc969df4 0000000d ddb00000 00001002
[ 18.033851] 9da0: 00000000 00000000 dc969dcc dc969db8 c005ddf4 c005dbc8 00000000 00000118
[ 18.042544] 9dc0: dc969dec dc969dd0 ddb00000 c06db27c ffff9003 00001002 00000000 00000000
[ 18.051237] 9de0: dc969e0c dc969df0 c057c88c c04410dc dc969e0c ddb00000 ddb00000 00000001
[ 18.059930] 9e00: dc969e34 dc969e10 c057cb44 c057c7d8 ddb00000 ddb00138 00001002 beaeda20
[ 18.068622] 9e20: 00000000 00000000 dc969e5c dc969e38 c057cc28 c057cac0 00000000 dc969e80
[ 18.077315] 9e40: dda7a40c beaeda20 00000000 00000000 dc969ecc dc969e60 c05e36d0 c057cc14
[ 18.086007] 9e60: dc969e84 00000051 beaeda20 00000000 dda7a40c 00000014 ddb00000 00008914
[ 18.094699] 9e80: 30687465 00000000 00000000 00000000 00009003 00000000 00000000 00000000
[ 18.103391] 9ea0: 00001002 00008914 dd257ae0 beaeda20 c098a428 beaeda20 00000011 00000000
[ 18.112084] 9ec0: dc969edc dc969ed0 c05e4e54 c05e3030 dc969efc dc969ee0 c055f5ac c05e4cc4
[ 18.120777] 9ee0: beaeda20 dd257ae0 dc8ab4c0 00008914 dc969f7c dc969f00 c010b388 c055f45c
[ 18.129471] 9f00: c071ca40 dd257ac0 c00165e8 dc968000 dc969f3c dc969f20 dc969f64 dc969f28
[ 18.138164] 9f20: c0115708 c0683ec8 dd257ac0 dd257ac0 dc969f74 dc969f40 c055f350 c00fc66c
[ 18.146857] 9f40: dd82e4d0 00000011 00000000 00080000 dd257ac0 00000000 dc8ab4c0 dc8ab4c0
[ 18.155550] 9f60: 00008914 beaeda20 00000011 00000000 dc969fa4 dc969f80 c010bc34 c010b2fc
[ 18.164242] 9f80: 00000000 00000011 00000002 00000036 c00165e8 dc968000 00000000 dc969fa8
[ 18.172935] 9fa0: c00163e0 c010bbcc 00000000 00000011 00000011 00008914 beaeda20 00009003
[ 18.181628] 9fc0: 00000000 00000011 00000002 00000036 00081018 00000001 00000000 beaedc10
[ 18.190320] 9fe0: 00083188 beaeda1c 00043a5d b6d29c0c 600b0010 00000011 00000000 00000000
[ 18.198989] Backtrace:
[ 18.201621] [<c042bad8>] (phy_attached_print) from [<c042bb74>] (phy_attached_info+0x14/0x18)
[ 18.210664] r3:c0954cc0 r2:c09562b0 r1:00000000
[ 18.215588] r4:ddb00480
[ 18.218322] [<c042bb60>] (phy_attached_info) from [<c043ebfc>] (cpsw_slave_open+0x1a0/0x280)
[ 18.227293] [<c043ea5c>] (cpsw_slave_open) from [<c0441614>] (cpsw_ndo_open+0x544/0x674)
[ 18.235874] r7:00000001 r6:ddb00480 r5:ddb8cb58 r4:ddb00000
[ 18.241944] [<c04410d0>] (cpsw_ndo_open) from [<c057c88c>] (__dev_open+0xc0/0x128)
[ 18.249972] r9:00000000 r8:00000000 r7:00001002 r6:ffff9003 r5:c06db27c r4:ddb00000
[ 18.258255] [<c057c7cc>] (__dev_open) from [<c057cb44>] (__dev_change_flags+0x90/0x154)
[ 18.266745] r5:00000001 r4:ddb00000
[ 18.270575] [<c057cab4>] (__dev_change_flags) from [<c057cc28>] (dev_change_flags+0x20/0x50)
[ 18.279523] r9:00000000 r8:00000000 r7:beaeda20 r6:00001002 r5:ddb00138 r4:ddb00000
[ 18.287811] [<c057cc08>] (dev_change_flags) from [<c05e36d0>] (devinet_ioctl+0x6ac/0x76c)
[ 18.296483] r9:00000000 r8:00000000 r7:beaeda20 r6:dda7a40c r5:dc969e80 r4:00000000
[ 18.304762] [<c05e3024>] (devinet_ioctl) from [<c05e4e54>] (inet_ioctl+0x19c/0x1c8)
[ 18.312882] r10:00000000 r9:00000011 r8:beaeda20 r7:c098a428 r6:beaeda20 r5:dd257ae0
[ 18.321235] r4:00008914
[ 18.323956] [<c05e4cb8>] (inet_ioctl) from [<c055f5ac>] (sock_ioctl+0x15c/0x2d8)
[ 18.331829] [<c055f450>] (sock_ioctl) from [<c010b388>] (do_vfs_ioctl+0x98/0x8d0)
[ 18.339765] r7:00008914 r6:dc8ab4c0 r5:dd257ae0 r4:beaeda20
[ 18.345822] [<c010b2f0>] (do_vfs_ioctl) from [<c010bc34>] (SyS_ioctl+0x74/0x84)
[ 18.353573] r10:00000000 r9:00000011 r8:beaeda20 r7:00008914 r6:dc8ab4c0 r5:dc8ab4c0
[ 18.361924] r4:00000000
[ 18.364653] [<c010bbc0>] (SyS_ioctl) from [<c00163e0>] (ret_fast_syscall+0x0/0x3c)
[ 18.372682] r9:dc968000 r8:c00165e8 r7:00000036 r6:00000002 r5:00000011 r4:00000000
[ 18.380960] Code: e92dd810 e24cb010 e24dd010 e59b4004 (e5902180)
[ 18.387580] ---[ end trace c80529466223f3f3 ]---
Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Only core revisions older than 4 use BGMAC_CMDCFG_SR_REV0. This mainly
fixes support for BCM4708A0KF SoCs with Ethernet core rev 5 (it means
only some devices as most of BCM4708A0KF-s got core rev 4).
This was tested for regressions on BCM47094 which doesn't seem to care
which bit gets used.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Craig Gallek says:
====================
Fixes for SO_REUSEPORT and mixed v4/v6 sockets
Recent changes to the datastructures associated with SO_REUSEPORT broke
an existing behavior when equivalent SO_REUSEPORT sockets are created
using both AF_INET and AF_INET6. This patch series restores the previous
behavior and includes a test to validate it.
This series should be a trivial merge to stable kernels (if deemed
necessary), but will have conflicts in net-next. The following patches
recently replaced the use of hlist_nulls with hlists for UDP and TCP
socket lists:
ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
If this series is accepted, I will send an RFC for the net-next change
to assist with the merge.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Test to validate the behavior of SO_REUSEPORT sockets that are
created with both AF_INET and AF_INET6. See the commit prior to this
for a description of this behavior.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
With the SO_REUSEPORT socket option, it is possible to create sockets
in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address.
This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on
the AF_INET6 sockets.
Prior to the commits referenced below, an incoming IPv4 packet would
always be routed to a socket of type AF_INET when this mixed-mode was used.
After those changes, the same packet would be routed to the most recently
bound socket (if this happened to be an AF_INET6 socket, it would
have an IPv4 mapped IPv6 address).
The change in behavior occurred because the recent SO_REUSEPORT optimizations
short-circuit the socket scoring logic as soon as they find a match. They
did not take into account the scoring logic that favors AF_INET sockets
over AF_INET6 sockets in the event of a tie.
To fix this problem, this patch changes the insertion order of AF_INET
and AF_INET6 addresses in the TCP and UDP socket lists when the sockets
have SO_REUSEPORT set. AF_INET sockets will be inserted at the head of the
list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at
the tail of the list. This will force AF_INET sockets to always be
considered first.
Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|