| Commit message (Collapse) | Author | Files | Lines |
|
libteam needs this event notification in userspace when dev's master
dev has been changed. After this, the redundant notifications issue
would be fixed in the later patch 'rtnetlink: check DO_SETLINK_NOTIFY
correctly in do_setlink'.
Fixes: b6b36eb23a46 ("rtnetlink: Do not generate notifications for NETDEV_CHANGEUPPER event")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As I said in patch 'rtnetlink: bring NETDEV_CHANGEMTU event process back
in rtnetlink_event', removing NETDEV_POST_TYPE_CHANGE event was not the
right fix for the redundant notifications issue.
So bring this event process back to rtnetlink_event and the old redundant
notifications issue would be fixed in the later patch 'rtnetlink: check
DO_SETLINK_NOTIFY correctly in do_setlink'.
Fixes: aef091ae58aa ("rtnetlink: Do not generate notifications for POST_TYPE_CHANGE event")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rtnetlink_event
The same fix for changing mtu in the patch 'rtnetlink: bring
NETDEV_CHANGEMTU event process back in rtnetlink_event' is
needed for changing tx_queue_len.
Note that the redundant notifications issue for tx_queue_len
will be fixed in the later patch 'rtnetlink: do not send
notification for tx_queue_len in do_setlink'.
Fixes: 27b3b551d8a7 ("rtnetlink: Do not generate notifications for NETDEV_CHANGE_TX_QUEUE_LEN event")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 085e1a65f04f ("rtnetlink: Do not generate notifications for MTU
events") tried to fix the redundant notifications issue when ip link
set mtu by removing NETDEV_CHANGEMTU event process in rtnetlink_event.
But it also resulted in no notification generated when dev's mtu is
changed via other methods, like:
'ifconfig eth1 mtu 1400' or 'echo 1400 > /sys/class/net/eth1/mtu'
It would cause users not to be notified by this change.
This patch is to fix it by bringing NETDEV_CHANGEMTU event back into
rtnetlink_event, and the redundant notifications issue will be fixed
in the later patch 'rtnetlink: check DO_SETLINK_NOTIFY correctly in
do_setlink'.
Fixes: 085e1a65f04f ("rtnetlink: Do not generate notifications for MTU events")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
hwrm_send_message() is replaced with _hwrm_send_message(), and
hwrm_cmd_lock mutex lock is grabbed for the whole period of
firmware call until the firmware DCB parameters have been copied.
This will prevent possible corruption of the firmware data.
Fixes: 7df4ae9fe855 ("bnxt_en: Implement DCBNL to support host-based DCBX.")
Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In bnxt_find_nvram_item(), it is copying firmware response data after
releasing the mutex. This can cause the firmware response data
to be corrupted if the next firmware response overwrites the response
buffer. The rare problem shows up when running ethtool -i repeatedly.
Fix it by calling the new variant _hwrm_send_message_silent() that requires
the caller to take the mutex and to release it after the response data has
been copied.
Fixes: 3ebf6f0a09a2 ("bnxt_en: Add installed-package version reporting via Ethtool GDRVINFO")
Reported-by: Sarveswara Rao Mygapula <sarveswararao.mygapula@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In bnxt_sriov_enable(), we calculate to see if we have enough hardware
resources to enable the requested number of VFs. The logic to check
for minimum completion rings and statistics contexts is missing. Add
the required checks so that VF configuration won't fail.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PCIE PCIE_EP_REG_LINK_STATUS_CONTROL register is only defined in PF
config space, so we must read it from the PF.
Fixes: 90c4f788f6c0 ("bnxt_en: Report PCIe link speed and width during driver load")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As a further improvement to the PF/VF link change logic, use a private
mutex instead of the rtnl lock to protect link change logic. With the
new mutex, we don't have to take the rtnl lock in the workqueue when
we have to handle link related functions. If the VF and PF drivers
are running on the same host and both take the rtnl lock and one is
waiting for the other, it will cause timeout. This patch fixes these
timeouts.
Fixes: 90c694bb7181 ("bnxt_en: Fix RTNL lock usage on bnxt_update_link().")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Link status query firmware messages originating from the VFs are forwarded
to the PF. The driver handles these interactions in a workqueue for the
VF and PF. The VF driver waits for the response from the PF in the
workqueue. If the PF and VF driver are running on the same host and the
work for both PF and VF are queued on the same workqueue, the VF driver
may not get the response if the PF work item is queued behind it on the
same workqueue. This will lead to the VF link query message timing out.
To prevent this, we create a private workqueue for PFs instead of using
the common workqueue. The VF query and PF response will never be on
the same workqueue.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The 88E6060 Ethernet switch always transmits the multicast bit of the
switch MAC address as a zero. It re-uses the corresponding bit 8 of the
register "Switch MAC Address Register Bytes 0 & 1" for "DiffAddr".
If the "DiffAddr" bit is 0, then all ports transmit the same source
address. If it is set to 1, then bit 2:0 are used for the port number.
The mv88e6060 driver is currently wrongly shifting the MAC address byte
0 by 9. To fix this, shift it by 8 as usual and clear its bit 0.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When pppol2tp_session_ioctl() is called by pppol2tp_tunnel_ioctl(),
the session may be unconnected. That is, it was created by
pppol2tp_session_create() and hasn't been connected with
pppol2tp_connect(). In this case, ps->sock is NULL, so we need to check
for this case in order to avoid dereferencing a NULL pointer.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
function
The DMA reset timeout, used in read_poll_timeout, is
ten times shorter than the sleep time.
This patch fixes these values interchanging them, as it was
before the read_poll_timeout introduction.
Fixes: 8a70aeca80c2 ("net: stmmac: Use readl_poll_timeout")
Signed-off-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While experimenting with changes to the timekeeping code, I
ran into a build error in the liquidio driver:
drivers/net/ethernet/cavium/liquidio/lio_main.c: In function 'liquidio_ptp_settime':
drivers/net/ethernet/cavium/liquidio/lio_main.c:1850:22: error: passing argument 1 of 'timespec_to_ns' from incompatible pointer type [-Werror=incompatible-pointer-types]
The driver had a type mismatch since it was first merged, but
this never caused problems because it is only built on 64-bit
architectures that define timespec and timespec64 to the same
type.
If we ever want to compile-test the driver on 32-bit or change
the way that 64-bit timespec64 is defined, we need to fix it,
so let's just do it now.
Fixes: f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The kernel config help for policy routing was still pointing at
an ancient document from 2000 that refers to Linux 2.1. Update it
to point to something that is at least occasionally updated.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we drop any new VLAN ids if there are more than the current
(or last used) channel can support. Most importantly this is a problem
if no channel has been selected yet, resulting in a segfault.
Secondly this does not necessarily reflect the capabilities of any other
channels. Instead only drop a new VLAN id if we are already tracking the
maximum allowed by the NCSI specification. Per-channel limits are
already handled by ncsi_add_filter(), but add a message to set_one_vid()
to make it obvious that the channel can not support any more VLAN ids.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rtl_init_one() currently enables PCI wakeups if the ethernet device
is found to be WOL-capable. There is no need to do this when
rtl8169_set_wol() will correctly enable or disable the same wakeup flag
when WOL is activated/deactivated.
This works around an ACPI DSDT bug which prevents the Acer laptop models
Aspire ES1-533, Aspire ES1-732, PackardBell ENTE69AP and Gateway NE533
from entering S3 suspend - even when no ethernet cable is connected.
On these platforms, the DSDT says that GPE08 is a wakeup source for
ethernet, but this GPE fires as soon as the system goes into suspend,
waking the system up immediately. Having the wakeup normally disabled
avoids this issue in the default case.
With this change, WOL will continue to be unusable on these platforms
(it will instantly wake up if WOL is later enabled by the user) but we
do not expect this to be a commonly used feature on these consumer
laptops. We have separately determined that WOL works fine without any
ACPI GPEs enabled during sleep, so a DSDT fix or override would be
possible to make WOL work.
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: cda7ea690350 ("macsec: check return value of skb_to_sgvec always")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If for some reason, the newly allocated child need to be freed,
we will call cgroup_put() (via sk_free_unlock_clone()) while the
corresponding cgroup_get() was not yet done, and we will free memory
too soon.
Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit fbb1fb4ad415cb31ce944f65a5ca700aaf73a227.
This was not the proper fix, lets cleanly revert it, so that
following patch can be carried to stable versions.
sock_cgroup_ptr() callers do not expect a NULL return value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
page_address() does not handle NULL argument gracefully,
make sure we NULL-check the page pointer before passing it
to page_address().
Fixes: ecd63a0217d5 ("nfp: add XDP support in the driver")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The while loop fetching 64 bit ethtool statistics may have
to retry multiple times, it shouldn't modify the outside state.
Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert Variable Length Array in Struct (VLAIS) to valid C by converting
local struct definition to use a flexible array. The structure is only
used to define a cast of a buffer so the size of the struct is not used
to allocate storage.
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Mark Charebois <charlebm@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It looks like we weren't correctly placing the pages from buffers that had
been used to return a filter programming status back on the ring. As a
result they were being overwritten and tracking of the pages was lost.
This change works to correct that by incorporating part of
i40e_put_rx_buffer into the programming status handler code. As a result we
should now be correctly placing the pages for those buffers on the
re-allocation list instead of letting them stay in place.
Fixes: 0e626ff7ccbf ("i40e: Fix support for flow director programming status")
Reported-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Anders K Pedersen <akp@cohaesio.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Caller needs to acquire the lock. Called functions will not.
Fixes: 09f79fd49d94 ("i40e: avoid NVM acquire deadlock during NVM update")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
sk_clone_lock() might run while TCP/DCCP listener already vanished.
In order to prevent use after free, it is better to defer cgroup_sk_alloc()
to the point we know both parent and child exist, and from process context.
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of calling mem_cgroup_sk_alloc() from BH context,
it is better to call it from inet_csk_accept() in process context.
Not only this removes code in mem_cgroup_sk_alloc(), but it also
fixes a bug since listener might have been dismantled and css_get()
might cause a use-after-free.
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The u-blox TOBY-L2 is a LTE Cat 4 module with HSPA+ and 2G fallback.
This module allows switching to different USB profiles with the
'AT+UUSBCONF' command, and provides a ECM network interface when the
'AT+UUSBCONF=2' profile is selected.
The u-blox SARA-U2 is a HSPA module with 2G fallback. The default USB
configuration includes a ECM network interface.
Both these modules are controlled via AT commands through one of the
TTYs exposed. Connecting these modules may be done just by activating
the desired PDP context with 'AT+CGACT=1,<cid>' and then running DHCP
on the ECM interface.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Cc: Sunil Goutham <sgoutham@cavium.com>
Cc: Robert Richter <rric@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit bc044e8db796 ("udp: perform source validation for
mcast early demux") does not take into account that broadcast packets
lands in the same code path and they need different checks for the
source address - notably, zero source address are valid for bcast
and invalid for mcast.
As a result, 2nd and later broadcast packets with 0 source address
landing to the same socket are dropped. This breaks dhcp servers.
Since we don't have stringent performance requirements for ingress
broadcast traffic, fix it by disabling UDP early demux such traffic.
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: bc044e8db796 ("udp: perform source validation for mcast early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a faulty returned start().
This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.
It also moves the call to start() to be when the mutex is held. This has
the nice side effect of serializing invocations to start(), which is
likely desirable anyway. It also prevents any possible other races that
might come out of this logic.
In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A recent patch removed the dst_free() on the allocated
dst_entry in ipv4_blackhole_route(). The dst_free() marked the
dst_entry as dead and added it to the gc list. I.e. it was setup
for a one time usage. As a result we may now have a blackhole
route cached at a socket on some IPsec scenarios. This makes the
connection unusable.
Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.
Fixes: b838d5e1c5b6 ("ipv4: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A recent patch removed the dst_free() on the allocated
dst_entry in ipv6_blackhole_route(). The dst_free() marked
the dst_entry as dead and added it to the gc list. I.e. it
was setup for a one time usage. As a result we may now have
a blackhole route cached at a socket on some IPsec scenarios.
This makes the connection unusable.
Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.
Fixes: 587fea741134 ("ipv6: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Changing the TX ring parameters with an XDP program attached may
cause the XDP queues to be cleared and the TX rings to be incorrectly
configured.
Fix by doing correct ring accounting in setup call.
Fixes: 33fdc82f0883 ("ixgbe: add support for XDP_TX action")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The ixgbe driver use the compile check to determine if it can
send TLPs to Root Port with the Relaxed Ordering Attribute set,
this is too inconvenient, now the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING
has been added to the kernel and we could check the bit4 in the PCIe
Device Control register to determine whether we should use the Relaxed
Ordering Attributes or not, so use this new way in the ixgbe driver.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLP) targeted toward
these affected Root Port, it will clear the bit4 in the PCIe
Device Control register, so the PCIe device drivers could
query PCIe configuration space to determine if it can send
TLPs to Root Port with the Relaxed Ordering Attributes set.
With this new flag we don't need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe drivers
just like the commit 1a8b6d76dc5b ("net:add one common config...") did,
so revert this commit.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
In ixgbe_clear_udp_tunnel_port(), we read the IXGBE_VXLANCTRL register
and then try to mask some bits out of the value, using the logical
instead of bitwise and operator.
Fixes: a21d0822ff69 ("ixgbe: add support for geneve Rx offload")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
In cases where PHY register access is not supported, don't mislead
a caller into thinking that it is supported by returning a PHY
address. Instead, return -EOPNOTSUPP when PHY access is not
supported.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.
However this breaks subsequent iptables calls:
# iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
# iptables -A INPUT -s 5.6.7.8 -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.
That's because iptables works by loading existing rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.
However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation) - so when 2nd invocation
occurs, userspace passes a bogus fd number, which leads to
'bpf_mt_check_v1' to fail.
One suggested solution [1] was to hack iptables userspace, to perform a
"entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd per every 'xt_bpf_info_v1' entry seen.
However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.
This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.
It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.
Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.
References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
[2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
Reported-by: Rafael Buchbinder <rafi@rbk.ms>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet, but
the real server maybe reply an icmp error packet related to the exist
tcp conntrack, so we will access wrong tcp data.
Fix it by checking for the protocol field and only process tcp traffic.
Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When a bundling message is received, the function tipc_link_input()
calls function tipc_msg_extract() to unbundle all inner messages of
the bundling message before adding them to input queue.
The function tipc_msg_extract() just clones all inner skb for all
inner messagges from the bundling skb. This means that the skb
headroom of an inner message overlaps with the data part of the
preceding message in the bundle.
If the message in question is a name addressed message, it may be
subject to a secondary destination lookup, and eventually be sent out
on one of the interfaces again. But, since what is perceived as headroom
by the device driver in reality is the last bytes of the preceding
message in the bundle, the latter will be overwritten by the MAC
addresses of the L2 header. If the preceding message has not yet been
consumed by the user, it will evenually be delivered with corrupted
contents.
This commit fixes this by uncloning all messages passing through the
function tipc_msg_lookup_dest(), hence ensuring that the headroom
is always valid when the message is passed on.
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We change the initialization of the skb transmit buffer queues
in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also
initialize their spinlocks. This is needed because we may, during
error conditions, need to call skb_queue_purge() on those queues
further down the stack.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
When gso_size reset to zero for the tail segment in skb_segment(), later
in ipv6_gso_segment(), __skb_udp_tunnel_segment() and gre_gso_segment()
we will get incorrect results (payload length, pcsum) for that segment.
inet_gso_segment() already has a check for gso_size before calculating
payload.
The issue was found with LTP vxlan & gre tests over ixgbe NIC.
Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I increased the scale of supported VRFs by having
all of them share the same LPM tree.
In order to avoid look-ups for prefix lengths that don't exist, each
route removal would trigger an aggregation across all the active virtual
routers to see which prefix lengths are in use and which aren't and
structure the tree accordingly.
With the way the data structures are currently laid out, this is a very
expensive operation. When preformed repeatedly - due to the invocation
of the abort mechanism - and with enough VRFs, this can result in a hung
task.
For now, avoid this optimization until it can be properly re-added in
net-next.
Fixes: fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
while processing Rx = Ry instruction the verifier does
regs[insn->dst_reg] = regs[insn->src_reg]
which often clears write mark (when Ry doesn't have it)
that was just set by check_reg_arg(Rx) prior to the assignment.
That causes mark_reg_read() to keep marking Rx in this block as
REG_LIVE_READ (since the logic incorrectly misses that it's
screened by the write) and in many of its parents (until lucky
write into the same Rx or beginning of the program).
That causes is_state_visited() logic to miss many pruning opportunities.
Furthermore mark_reg_read() logic propagates the read mark
for BPF_REG_FP as well (though it's readonly) which causes
harmless but unnecssary work during is_state_visited().
Note that do_propagate_liveness() skips FP correctly,
so do the same in mark_reg_read() as well.
It saves 0.2 seconds for the test below
program before after
bpf_lb-DLB_L3.o 2604 2304
bpf_lb-DLB_L4.o 11159 3723
bpf_lb-DUNKNOWN.o 1116 1110
bpf_lxc-DDROP_ALL.o 34566 28004
bpf_lxc-DUNKNOWN.o 53267 39026
bpf_netdev.o 17843 16943
bpf_overlay.o 8672 7929
time ~11 sec ~4 sec
Fixes: dc503a8ad984 ("bpf/verifier: track liveness for pruning")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Should be "802.3ad" like everywhere else in the document.
Signed-off-by: Axel Beckert <abe@deuxchevaux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
was intended to affect accept_dad flag handling in such a way that
DAD operation and mode on a given interface would be selected
according to the maximum value of conf/{all,interface}/accept_dad.
However, addrconf_dad_begin() checks for particular cases in which we
need to skip DAD, and this check was modified in the wrong way.
Namely, it was modified so that, if the accept_dad flag is 0 for the
given interface *or* for all interfaces, DAD would be skipped.
We have instead to skip DAD if accept_dad is 0 for the given interface
*and* for all interfaces.
Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reported-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ppp_release() tries to ensure that netdevices are unregistered before
decrementing the unit refcount and running ppp_destroy_interface().
This is all fine as long as the the device is unregistered by
ppp_release(): the unregister_netdevice() call, followed by
rtnl_unlock(), guarantee that the unregistration process completes
before rtnl_unlock() returns.
However, the device may be unregistered by other means (like
ppp_nl_dellink()). If this happens right before ppp_release() calling
rtnl_lock(), then ppp_release() has to wait for the concurrent
unregistration code to release the lock.
But rtnl_unlock() releases the lock before completing the device
unregistration process. This allows ppp_release() to proceed and
eventually call ppp_destroy_interface() before the unregistration
process completes. Calling free_netdev() on this partially unregistered
device will BUG():
------------[ cut here ]------------
kernel BUG at net/core/dev.c:8141!
invalid opcode: 0000 [#1] SMP
CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
Call Trace:
ppp_destroy_interface+0xd8/0xe0 [ppp_generic]
ppp_disconnect_channel+0xda/0x110 [ppp_generic]
ppp_unregister_channel+0x5e/0x110 [ppp_generic]
pppox_unbind_sock+0x23/0x30 [pppox]
pppoe_connect+0x130/0x440 [pppoe]
SYSC_connect+0x98/0x110
? do_fcntl+0x2c0/0x5d0
SyS_connect+0xe/0x10
entry_SYSCALL_64_fastpath+0x1a/0xa5
RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88
---[ end trace ed294ff0cc40eeff ]---
We could set the ->needs_free_netdev flag on PPP devices and move the
ppp_destroy_interface() logic in the ->priv_destructor() callback. But
that'd be quite intrusive as we'd first need to unlink from the other
channels and units that depend on the device (the ones that used the
PPPIOCCONNECT and PPPIOCATTACH ioctls).
Instead, we can just let the netdevice hold a reference on its
ppp_file. This reference is dropped in ->priv_destructor(), at the very
end of the unregistration process, so that neither ppp_release() nor
ppp_disconnect_channel() can call ppp_destroy_interface() in the interim.
Reported-by: Beniamino Galvani <bgalvani@redhat.com>
Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|