| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
The function returns always non-negative values.
The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We found that a TCP Fast Open passive connection was vulnerable
to reorders, as the exchange might look like
[1] C -> S S <FO ...> <request>
[2] S -> C S. ack request <options>
[3] S -> C . <answer>
packets [2] and [3] can be generated at almost the same time.
If C receives the 3rd packet before the 2nd, it will drop it as
the socket is in SYN_SENT state and expects a SYNACK.
S will have to retransmit the answer.
Current OOO avoidance in linux is defeated because SYNACK
packets are attached to the LISTEN socket, while DATA packets
are attached to the children. They might be sent by different cpus,
and different TX queues might be selected.
It turns out that for TFO, we created a child, which is a
full blown socket in TCP_SYN_RECV state, and we simply can attach
the SYNACK packet to this socket.
This means that at the time tcp_sendmsg() pushes DATA packet,
skb->ooo_okay will be set iff the SYNACK packet had been sent
and TX completed.
This removes the reorder source at the host level.
We also removed the export of tcp_try_fastopen(), as it is no
longer called from IPv6.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-09-28
This series contains updates to i40e, i40evf and igb to resolve issues
seen and reported by Red Hat.
Kiran moves i40e_get_head() in preparation for the refactor of the Tx
timeout logic, so that it can be used in other areas of the driver.
Refactored the driver timeout logic by issuing a writeback request via
a software interrupt to the hardware the first time the driver detects
a hang. This was due to the driver being too aggressive in resetting a
hung queue.
Shannon adds the GRE protocol to the transmit checksum encoding.
Anjali fixes an issue of forcing writeback too often, which caused us to
not benefit from NAPI. We now disable force writeback in the clean
routine for X710 and XL710 adapters. The X722 adapters do not enable
interrupt to force a writeback and benefit from WB_ON_ITR and so force
WB is left enabled for those adapters. Fixed a possible deadlock issue
where sync_vsi_filters() can be called directly under RTNL or through
the timer subtask without RTNL. So update the flow to see if we are
already under RTNL before trying to grab it.
Stefan Assmann provides a fix for igb where SR-IOV was not getting
enabled properly and we ran into a NULL pointer if the max_vfs module
parameter is specified. This is prevented by setting the
IGB_FLAG_HAS_MSIX bit before calling igb_probe_vfs().
v2: added "i40e: Fix for recursive RTNL lock during PROMISC change" patch
to the series, as it resolves another issues seen and reported by
Red Hat.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In igb_sw_init() the sequence of calls was changed from
igb_init_queue_configuration()
igb_init_interrupt_scheme()
igb_probe_vfs()
to
igb_probe_vfs()
igb_init_queue_configuration()
igb_init_interrupt_scheme()
This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not
get enabled properly and we run into a NULL pointer if the max_vfs
module parameter is specified (adapter->vf_data does not get allocated,
crash on accessing the structure).
[ 7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb]
[ 7.419370] PGD 0
[ 7.419373] Oops: 0002 [#1] SMP
[ 7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
[ 7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
[ 7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
[...]
[ 7.419431] Call Trace:
[ 7.419442] [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb]
[ 7.419447] [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0
Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
igb_probe_vfs(). The real interrupt capabilities will be checked during
igb_init_interrupt_scheme() so this is safe to do.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The sync_vsi_filters function can be called directly under RTNL
or through the timer subtask without one. This was causing a deadlock.
If sync_vsi_filters is called from a thread which held the lock,
and in another thread the PROMISC setting got changed we would
be executing the PROMISC change in the thread which already held
the lock alongside the other filter update. The PROMISC change
requires a reset if we are on a VEB, which requires it to be called
under RTNL.
Earlier the driver would call reset for PROMISC change without
checking if we were already under RTNL and would try to grab it
causing a deadlock. This patch changes the flow to see if we are
already under RTNL before trying to grab it.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch fixes the issue of forcing WB too often causing us to not
benefit from NAPI.
Without this patch we were forcing WB/arming interrupt too often taking
away the benefits of NAPI and causing a performance impact.
With this patch we disable force WB in the clean routine for X710
and XL710 adapters. X722 adapters do not enable interrupt to force
a WB and benefit from WB_ON_ITR and hence force WB is left enabled
for those adapters.
For XL710 and X710 adapters if we have less than 4 packets pending
a software Interrupt triggered from service task will force a WB.
This patch also changes the conditions for setting RS bit as described
in code comments. This optimizes when the HW does a tail bump and when
it does a WB. It also optimizes when we do a wmb.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Make sure the Tx checksum encoder knows about GRE protocol and sets the
descriptor flag appropriately.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch modifies the driver timeout logic by issuing a writeback
request via a software interrupt to the hardware the first time the
driver detects a hang. The driver was too aggressive in resetting a hung
queue, so back that off by removing logic to down the netdevice after
too many hangs, and move the function to the service task.
Change-ID: Ife100b9d124cd08cbdb81ab659008c1b9abbedea
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|/
|
|
|
|
|
|
|
| |
i40e_get_head needs to be called in multiple files in a further patch,
prepare by moving the function into a header file.
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
|
|
|
|
|
|
| |
Allow bridge forward delay to be configured when Spanning Tree is enabled.
Signed-off-by: Ian Wilson <iwilson@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Jiri Benc says:
====================
vxlan: support both IPv4 and IPv6 sockets
Note: this needs net merged into net-next in order to apply.
It's currently not easy enough to work with metadata based vxlan tunnels. In
particular, it's necessary to create separate network interfaces for IPv4
and IPv6 tunneling. Assigning an IPv6 address to an IPv4 interface is
allowed yet won't do what's expected. With route based tunneling, one has to
pay attention to use the vxlan interface opened with the correct family.
Other users of this (openvswitch) would need to always create two vxlan
interfaces.
Furthermore, there's no sane API for creating an IPv6 vxlan metadata based
interface.
This patchset simplifies this by opening both IPv4 and IPv6 socket if the
vxlan interface has the metadata flag (IFLA_VXLAN_COLLECT_METADATA) set.
Assignment of addresses etc. works as expected after this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For metadata based vxlan interface, open both IPv4 and IPv6 socket. This is
much more user friendly: it's not necessary to create two vxlan interfaces
and pay attention to using the right one in routing rules.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
| |
Make vxlan_sock_add both alloc the socket and attach it to vxlan_dev. Let
vxlan_sock_release accept vxlan_dev as its parameter instead of vxlan_sock.
This makes vxlan_sock_add and vxlan_sock release complementary. It reduces
code duplication in the next patch.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When fixing the TSO support I noticed we just mask ->gso_size with the
MSSMask value and don't care about the consequences.
Provide a .ndo_features_check() method which drops the NETIF_F_TSO
feature for any skb which would exceed the maximum, and thus forces it
to be segmented by software.
Then we can stop the masking in cp_start_xmit(), and just WARN if the
maximum is exceeded, which should now never happen.
Finally, Francois Romieu noticed that we didn't even have the right
value for MSSMask anyway; it should be 0x7ff (11 bits) not 0xfff.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
I fixed TSO. Hardware checksum and scatter/gather also appear to be
working correctly both on real hardware and in QEMU's emulation.
Let's enable them by default and see if anyone screams...
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
net/ipv4/arp.c
The net/ipv4/arp.c conflict was one commit adding a new
local variable while another commit was deleting one.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull networking fixes from David Miller:
1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
instead of cloning them. From Daniel Borkmann.
2) When converting classical BPF into eBPF, fix the setting of the
source reg to BPF_REG_X. From Tycho Andersen.
3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
Linus Lussing.
4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.
5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
Roopa Prabhu.
6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.
7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
Florian Westphal.
8) Check DMA mapping errors in bna driver, from Ivan Vecera.
9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
misclearing of interrupt status in TX timeout handler, etc.) from
David Woodhouse.
10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
Hugne.
11) Fix autobind races et al. in netlink code, from Herbert Xu with
help from Tejun Heo and others.
12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.
13) Fix various races in timewait timer and reqsk_queue_hadh_req, from
Eric Dumazet.
14) Fix array overruns in mac80211, from Johannes Berg and Dan
Carpenter.
15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.
16) Fix race between poll_one_napi and napi_disable, from Neil Horman.
17) Fix byte order in geneve tunnel port config, from John W Linville.
18) Fix handling of ARP replies over lightweight tunnels, from Jiri
Benc.
19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
Kok and Roopa Prabhu.
20) Several reference count handling bug fixes in the PHY/MDIO layer
from Russel King.
21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.
22) Fix crash in icmp_route_lookup(), from David Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: Fix panic in icmp_route_lookup
net: update docbook comment for __mdiobus_register()
ppp: fix lockdep splat in ppp_dev_uninit()
net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
phy: marvell: add link partner advertised modes
net: fix net_device refcounting
phy: add phy_device_remove()
phy: fixed-phy: properly validate phy in fixed_phy_update_state()
net: fix phy refcounting in a bunch of drivers
of_mdio: fix MDIO phy device refcounting
phy: add proper phy struct device refcounting
phy: fix mdiobus module safety
net: dsa: fix of_mdio_find_bus() device refcount leak
phy: fix of_mdio_find_bus() device refcount leak
ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
ip6_gre: Reduce log level in ip6gre_err() to debug
fib_rules: fix fib rule dumps across multiple skbs
bnx2x: byte swap rss_key to comply to Toeplitz specs
net: revert "net_sched: move tp->root allocation into fw_init()"
lwtunnel: remove source and destination UDP port config option
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Andrey reported a panic:
[ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4
[ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320
[ 7249.865598] *pdpt = 0000000030f7f001 *pde = 0000000000000000
[ 7249.865637] Oops: 0000 [#1]
...
[ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.3.0-999-generic #201509220155
[ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014 08/02/2006
[ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000
[ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0
[ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320
[ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00
[ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974
[ 7249.867077] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0
[ 7249.867141] Stack:
[ 7249.867165] 320310ee 00000000 00000042 320310ee 00000000 c1aeca00
f3920240 f0c69180
[ 7249.867268] f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000
e659266c f483ba54
[ 7249.867361] 8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0
c16b0e00 00000064
[ 7249.867448] Call Trace:
[ 7249.867494] [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e]
[ 7249.867534] [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867576] [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867615] [<c16b0e00>] ? icmp_send+0xa0/0x380
[ 7249.867648] [<c16b102f>] icmp_send+0x2cf/0x380
[ 7249.867681] [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4]
[ 7249.867714] [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT]
[ 7249.867746] [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables]
[ 7249.867780] [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0
[nf_conntrack]
[ 7249.867838] [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack]
[ 7249.867889] [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter]
[ 7249.867933] [<c16776a1>] nf_iterate+0x71/0x80
[ 7249.867970] [<c1677715>] nf_hook_slow+0x65/0xc0
[ 7249.868002] [<c1681811>] __ip_local_out_sk+0xc1/0xd0
[ 7249.868034] [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0
[ 7249.868066] [<c1681836>] ip_local_out_sk+0x16/0x30
[ 7249.868097] [<c1684054>] ip_send_skb+0x14/0x80
[ 7249.868129] [<c16840f4>] ip_push_pending_frames+0x34/0x40
[ 7249.868163] [<c16844a2>] ip_send_unicast_reply+0x282/0x310
[ 7249.868196] [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380
[ 7249.868227] [<c16a1b63>] tcp_v4_rcv+0x323/0x990
[ 7249.868257] [<c16776a1>] ? nf_iterate+0x71/0x80
[ 7249.868289] [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230
[ 7249.868322] [<c167df4c>] ip_local_deliver+0x4c/0xa0
[ 7249.868353] [<c167dba0>] ? ip_rcv_finish+0x390/0x390
[ 7249.868384] [<c167d88c>] ip_rcv_finish+0x7c/0x390
[ 7249.868415] [<c167e280>] ip_rcv+0x2e0/0x420
...
Prior to the VRF change the oif was not set in the flow struct, so the
VRF support should really have only added the vrf_master_ifindex lookup.
Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Cc: Andrey Melnikov <temnota.am@gmail.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Update the docbook comment for __mdiobus_register() to include the new
module owner argument. This resolves a warning found by the 0-day
builder.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ppp_dev_uninit() locks all_ppp_mutex while under rtnl mutex protection.
ppp_create_interface() must then lock these mutexes in that same order
to avoid possible deadlock.
[ 120.880011] ======================================================
[ 120.880011] [ INFO: possible circular locking dependency detected ]
[ 120.880011] 4.2.0 #1 Not tainted
[ 120.880011] -------------------------------------------------------
[ 120.880011] ppp-apitest/15827 is trying to acquire lock:
[ 120.880011] (&pn->all_ppp_mutex){+.+.+.}, at: [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
[ 120.880011]
[ 120.880011] but task is already holding lock:
[ 120.880011] (rtnl_mutex){+.+.+.}, at: [<ffffffff812e4255>] rtnl_lock+0x12/0x14
[ 120.880011]
[ 120.880011] which lock already depends on the new lock.
[ 120.880011]
[ 120.880011]
[ 120.880011] the existing dependency chain (in reverse order) is:
[ 120.880011]
[ 120.880011] -> #1 (rtnl_mutex){+.+.+.}:
[ 120.880011] [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
[ 120.880011] [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
[ 120.880011] [<ffffffff812e4255>] rtnl_lock+0x12/0x14
[ 120.880011] [<ffffffff812d9d94>] register_netdev+0x11/0x27
[ 120.880011] [<ffffffffa0147b17>] ppp_ioctl+0x289/0xc98 [ppp_generic]
[ 120.880011] [<ffffffff8113b367>] do_vfs_ioctl+0x4ea/0x532
[ 120.880011] [<ffffffff8113b3fd>] SyS_ioctl+0x4e/0x7d
[ 120.880011] [<ffffffff813ad7d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[ 120.880011]
[ 120.880011] -> #0 (&pn->all_ppp_mutex){+.+.+.}:
[ 120.880011] [<ffffffff8107334e>] __lock_acquire+0xb07/0xe76
[ 120.880011] [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
[ 120.880011] [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
[ 120.880011] [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
[ 120.880011] [<ffffffff812d5263>] rollback_registered_many+0x19e/0x252
[ 120.880011] [<ffffffff812d5381>] rollback_registered+0x29/0x38
[ 120.880011] [<ffffffff812d53fa>] unregister_netdevice_queue+0x6a/0x77
[ 120.880011] [<ffffffffa0146a94>] ppp_release+0x42/0x79 [ppp_generic]
[ 120.880011] [<ffffffff8112d9f6>] __fput+0xec/0x192
[ 120.880011] [<ffffffff8112dacc>] ____fput+0x9/0xb
[ 120.880011] [<ffffffff8105447a>] task_work_run+0x66/0x80
[ 120.880011] [<ffffffff81001801>] prepare_exit_to_usermode+0x8c/0xa7
[ 120.880011] [<ffffffff81001900>] syscall_return_slowpath+0xe4/0x104
[ 120.880011] [<ffffffff813ad931>] int_ret_from_sys_call+0x25/0x9f
[ 120.880011]
[ 120.880011] other info that might help us debug this:
[ 120.880011]
[ 120.880011] Possible unsafe locking scenario:
[ 120.880011]
[ 120.880011] CPU0 CPU1
[ 120.880011] ---- ----
[ 120.880011] lock(rtnl_mutex);
[ 120.880011] lock(&pn->all_ppp_mutex);
[ 120.880011] lock(rtnl_mutex);
[ 120.880011] lock(&pn->all_ppp_mutex);
[ 120.880011]
[ 120.880011] *** DEADLOCK ***
Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The builds of allmodconfig of avr32 is failing with:
drivers/net/ethernet/via/via-rhine.c:1098:2: error: implicit declaration
of function 'pci_iomap' [-Werror=implicit-function-declaration]
drivers/net/ethernet/via/via-rhine.c:1119:2: error: implicit declaration
of function 'pci_iounmap' [-Werror=implicit-function-declaration]
The generic empty pci_iomap and pci_iounmap is used only if CONFIG_PCI
is not defined and CONFIG_GENERIC_PCI_IOMAP is defined.
Add GENERIC_PCI_IOMAP in the dependency list for VIA_RHINE as we are
getting build failure when CONFIG_PCI and CONFIG_GENERIC_PCI_IOMAP both
are not defined.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Read the standard link partner advertisment registers and store it in
phydev->lp_advertising, so ethtool can report this information to
userspace via ethtool. Zero it as per genphy if autonegotiation is
disabled. Tested with a Marvell 88E1512 PHY.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Russell King says:
====================
Phy, mdiobus, and netdev struct device fixes
The third version of this series fixes the build error which David
identified, and drops the broken changes for the Cavium Thunger BGX
ethernet driver as this driver requires some complex changes to
resolve the leakage - and this is best done by people who can test
the driver.
Compared to v2, the only patch which has changed is patch 6
"net: fix phy refcounting in a bunch of drivers"
I _think_ I've been able to build-test all the drivers touched by
that patch to some degree now, though several of them needed the
Kconfig hacked to allow it (not all had || COMPILE_TEST clause on
their dependencies.)
Previous cover letters below:
This is the second version of the series, with the comments David had
on the first patch fixed up. Original series description with updated
diffstat below.
While looking at the DSA code, I noticed we have a
of_find_net_device_by_node(), and it looks like users of that are
similarly buggy - it looks like net/dsa/dsa.c is the only user. Fix
that too.
Hi,
While looking at the phy code, I identified a number of weaknesses
where refcounting on device structures was being leaked, where
modules could be removed while in-use, and where the fixed-phy could
end up having unintended consequences caused by incorrect calls to
fixed_phy_update_state().
This patch series resolves those issues, some of which were discovered
with testing on an Armada 388 board. Not all patches are fully tested,
particularly the one which touches several network drivers.
When resolving the struct device refcounting problems, several different
solutions were considered before settling on the implementation here -
one of the considerations was to avoid touching many network drivers.
The solution here is:
phy_attach*() - takes a refcount
phy_detach*() - drops the phy_attach refcount
Provided drivers always attach and detach their phys, which they should
already be doing, this should change nothing, even if they leak a refcount.
of_phy_find_device() and of_* functions which use that take
a refcount. Arrange for this refcount to be dropped once
the phy is attached.
This is the reason why the previous change is important - we can't drop
this refcount taken by of_phy_find_device() until something else holds
a reference on the device. This resolves the leaked refcount caused by
using of_phy_connect() or of_phy_attach().
Even without the above changes, these drivers are leaking by calling
of_phy_find_device(). These drivers are addressed by adding the
appropriate release of that refcount.
The mdiobus code also suffered from the same kind of leak, but thankfully
this only happened in one place - the mdio-mux code.
I also found that the try_module_get() in the phy layer code was utterly
useless: phydev->dev.driver was guaranteed to always be NULL, so
try_module_get() was always being called with a NULL argument. I proved
this with my SFP code, which declares its own MDIO bus - the module use
count was never incremented irrespective of how I set the MDIO bus up.
This allowed the MDIO bus code to be removed from the kernel while there
were still PHYs attached to it.
One other bug was discovered: while using in-band-status with mvneta, it
was found that if a real phy is attached with in-band-status enabled,
and another ethernet interface is using the fixed-phy infrastructure, the
interface using the fixed-phy infrastructure is configured according to
the other interface using the in-band-status - which is caused by the
fixed-phy code not verifying that the phy_device passed in is actually
a fixed-phy device, rather than a real MDIO phy.
Lastly, having mdio_bus reversing phy_device_register() internals seems
like a layering violation - it's trivial to move that code to the phy
device layer.
====================
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
of_find_net_device_by_node() uses class_find_device() internally to
lookup the corresponding network device. class_find_device() returns
a reference to the embedded struct device, with its refcount
incremented.
Add a comment to the definition in net/core/net-sysfs.c indicating the
need to drop this refcount, and fix the DSA code to drop this refcount
when the OF-generated platform data is cleaned up and freed. Also
arrange for the ref to be dropped when handling errors.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add a phy_device_remove() function to complement phy_device_register(),
which undoes the effects of phy_device_register() by removing the phy
device from visibility, but not freeing it.
This allows these details to be moved out of the mdio bus code into
the phy code where this action belongs.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Validate that the phy_device passed into fixed_phy_update_state() is a
fixed-phy device before walking the list of phys for a fixed phy at the
same address.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
of_phy_find_device() increments the phy struct device refcount, which
we need to properly balance. Add code to network drivers using this
function to ensure that the struct device refcount is correctly
balanced.
For xgene, looking back in the history, we should be able to use
of_phy_connect() with a zero flags argument for the DT case as this is
how the driver used to operate prior to de7b5b3d790a ("net: eth: xgene:
change APM X-Gene SoC platform ethernet to support ACPI").
This leaves the Cavium Thunder BGX unfixed; fixing this driver is a
complicated task, one which the maintainers need to be involved with.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
bus_find_device() is defined as:
* This is similar to the bus_for_each_dev() function above, but it
* returns a reference to a device that is 'found' for later use, as
* determined by the @match callback.
and it does indeed return a reference-counted pointer to the device:
while ((dev = next_device(&i)))
if (match(dev, data) && get_device(dev))
^^^^^^^^^^^^^^^
break;
klist_iter_exit(&i);
return dev;
What that means is that when we're done with the struct device, we must
drop that reference. Neither of_phy_connect() nor of_phy_attach() did
this when phy_connect_direct() or phy_attach_direct() failed.
With our previous patch, phy_connect_direct() and phy_attach_direct()
take a new refcount on the phy device when successful, so we can drop
our local reference immediatley after these functions, whether or not
they succeeded.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Take a refcount on the phy struct device when the phy device is attached
to a network device, and drop it after it's detached. This ensures that
a refcount is held on the phy device while the device is being used by
a network device, thereby preventing the phy_device from being
unexpectedly kfree()'d by phy_device_release().
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Re-implement the mdiobus module refcounting to ensure that we actually
ensure that the mdiobus module code does not go away while we might call
into it.
The old scheme using bus->dev.driver was buggy, because bus->dev is a
class device which never has a struct device_driver associated with it,
and hence the associated code trying to obtain a refcount did nothing
useful.
Instead, take the approach that other subsystems do: pass the module
when calling mdiobus_register(), and record that in the mii_bus struct.
When we need to increment the module use count in the phy code, use
this stored pointer. When the phy is deteched, drop the module
refcount, remembering that the phy device might go away at that point.
This doesn't stop the mii_bus going away while there are in-use phys -
it merely stops the underlying code vanishing.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Current users of of_mdio_find_bus() leak a struct device refcount, as
they fail to clean up the reference obtained inside class_find_device().
Fix the DSA code to properly refcount the returned MDIO bus by:
1. taking a reference on the struct device whenever we assign it to
pd->chip[x].host_dev.
2. dropping the reference when we overwrite the existing reference.
3. dropping the reference when we free the data structure.
4. dropping the initial reference we obtained after setting up the
platform data structure, or on failure.
In step 2 above, where we obtain a new MDIO bus, there is no need to
take a reference on it as we would only have to drop it immediately
after assignment again, iow:
put_device(cd->host_dev); /* drop original assignment ref */
cd->host_dev = get_device(&mdio_bus_switch->dev); /* get our ref */
put_device(&mdio_bus_switch->dev); /* drop of_mdio_find_bus ref */
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
of_mdio_find_bus() leaks a struct device refcount, caused by using
class_find_device() and not realising that the device reference has
its refcount incremented:
* Note, you will need to drop the reference with put_device() after use.
...
while ((dev = class_dev_iter_next(&iter))) {
if (match(dev, data)) {
get_device(dev);
break;
}
Update the comment, and arrange for the phy code to drop this refcount
when disposing of a reference to it.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently error log messages in ip6_tnl_err are printed at 'warn'
level. This is different to other tunnel types which don't print
any messages. These log messages don't provide any information that
couldn't be deduced with networking tools. Also it can be annoying
to have one end of the tunnel go down and have the logs fill with
pointless messages such as "Path to destination invalid or inactive!".
This patch reduces the log level of these messages to 'dbg' level to
bring the visible behaviour into line with other tunnel types.
Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just two small fixes:
* VHT MCS mask array overrun, reported by Dan Carpenter
* reset CQM history to always get a notification, from Sara Sharon
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The current behavior of notifying CQM events is inconsistent:
Upon first configuration there is a cqm event with the current
status according to threshold configured, regardless of signal
stability.
When there is reconfiguration no event is sent unless there is
a significant change to the signal level according to the new
configuration.
Since the current reconfiguration behavior might cause missing
CQM events in case the current signal did not change but is on
the other side of the new threshold, fix that by resetting the
stored signal level upon reconfiguration.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The HT MCS mask has 9 bytes, the VHT one only has 8 streams.
Split the loops to handle this correctly.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently error log messages in ip6gre_err are printed at 'warn'
level. This is different to most other tunnel types which don't
print any messages. These log messages don't provide any information
that couldn't be deduced with networking tools. Also it can be annoying
to have one end of the tunnel go down and have the logs fill with
pointless messages such as "Path to destination invalid or inactive!".
This patch reduces the log level of these messages to 'dbg' level to
bring the visible behaviour into line with other tunnel types.
Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
dump_rules returns skb length and not error.
But when family == AF_UNSPEC, the caller of dump_rules
assumes that it returns an error. Hence, when family == AF_UNSPEC,
we continue trying to dump on -EMSGSIZE errors resulting in
incorrect dump idx carried between skbs belonging to the same dump.
This results in fib rule dump always only dumping rules that fit
into the first skb.
This patch fixes dump_rules to return error so that we exit correctly
and idx is correctly maintained between skbs that are part of the
same dump.
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
After a good amount of debugging, I found bnx2x was byte swaping
the 40 bytes of rss_key.
If we byte swap the key, then bnx2x generates hashes matching
MSDN specs as documented in (Verifying the RSS Hash Calculation)
https://msdn.microsoft.com/en-us/library/windows/hardware/ff571021%
28v=vs.85%29.aspx
It is mostly a non issue, unless we want to mix different NIC
in a host, and want consistent hashing among all of them, ie
if they all use the boot time generated rss key, or if some application
is choosing specific tuple(s) so that incoming traffic lands into known
rx queue(s).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
fw filter uses tp->root==NULL to check if it is the old method,
so it doesn't need allocation at all in this case. This patch
reverts the offending commit and adds some comments for old
method to make it obvious.
Fixes: 33f8b9ecdb15 ("net_sched: move tp->root allocation into fw_init()")
Reported-by: Akshat Kakkar <akshat.1984@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Jiri Benc says:
====================
lwtunnel: make it really work, for IPv4
One of the selling points of lwtunnel was the ability to specify the tunnel
destination using routes. However, this doesn't really work currently, as
ARP and ndisc replies are not handled correctly. ARP and ndisc replies won't
have tunnel metadata attached, thus they will be sent out with the default
parameters or not sent at all, either way never reaching the requester.
Most of the egress tunnel parameters can be inferred from the ingress
metada. The only and important exception is UDP ports. This patchset infers
the egress data from the ingress data and disallow settings of UDP ports in
tunnel routes. If there's a need for different UDP ports, a new interface
needs to be created for each port combination. Note that it's still possible
to specify the UDP ports to use, it just needs to be done while creating the
vxlan/geneve interface.
This covers only ARPs. IPv6 ndisc has the same problem but is harder to
solve, as there's already dst attached to outgoing skbs. Ideas to solve this
are welcome.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The UDP tunnel config is asymmetric wrt. to the ports used. The source and
destination ports from one direction of the tunnel are not related to the
ports of the other direction. We need to be able to respond to ARP requests
using the correct ports without involving routing.
As the consequence, UDP ports need to be fixed property of the tunnel
interface and cannot be set per route. Remove the ability to set ports per
route. This is still okay to do, as no kernel has been released with these
attributes yet.
Note that the ability to specify source and destination ports is preserved
for other users of the lwtunnel API which don't use routes for tunnel key
specification (like openvswitch).
If in the future we rework ARP handling to allow port specification, the
attributes can be added back.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |/ /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When using ip lwtunnels, the additional data for xmit (basically, the actual
tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
metadata dst. When replying to ARP requests, we need to send the reply to
the same tunnel the request came from. This means we need to construct
proper metadata dst for ARP replies.
We could perform another route lookup to get a dst entry with the correct
lwtstate. However, this won't always ensure that the outgoing tunnel is the
same as the incoming one, and it won't work anyway for IPv4 duplicate
address detection.
The only thing to do is to "reverse" the ip_tunnel_info.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The device is set as wakeup capable using proper wakeup API but the
driver misuses IRQF_NO_SUSPEND to set the interrupt as wakeup source
which is incorrect.
This patch removes the use of IRQF_NO_SUSPEND flags replacing it with
enable_irq_wake instead.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
VXLAN device can receive skb with checksum partial. But the checksum
offset could be in outer header which is pulled on receive. This results
in negative checksum offset for the skb. Such skb can cause the assert
failure in skb_checksum_help(). Following patch fixes the bug by setting
checksum-none while pulling outer header.
Following is the kernel panic msg from old kernel hitting the bug.
------------[ cut here ]------------
kernel BUG at net/core/dev.c:1906!
RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
Call Trace:
<IRQ>
[<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
[<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
[<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
[<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
[<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
[<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
[<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
[<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
[<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
[<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
[<ffffffff81550128>] ip_local_deliver+0x88/0x90
[<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
[<ffffffff81550365>] ip_rcv+0x235/0x300
[<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
[<ffffffff8151c360>] netif_receive_skb+0x80/0x90
[<ffffffff81459935>] virtnet_poll+0x555/0x6f0
[<ffffffff8151cd04>] net_rx_action+0x134/0x290
[<ffffffff810683d8>] __do_softirq+0xa8/0x210
[<ffffffff8162fe6c>] call_softirq+0x1c/0x30
[<ffffffff810161a5>] do_softirq+0x65/0xa0
[<ffffffff810687be>] irq_exit+0x8e/0xb0
[<ffffffff81630733>] do_IRQ+0x63/0xe0
[<ffffffff81625f2e>] common_interrupt+0x6e/0x6e
Reported-by: Anupam Chanda <achanda@vmware.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
>
> store_release and load_acquire are different from the usual memory
> barriers and can't be paired this way. You have to pair store_release
> and load_acquire. Besides, it isn't a particularly good idea to
OK I've decided to drop the acquire/release helpers as they don't
help us at all and simply pessimises the code by using full memory
barriers (on some architectures) where only a write or read barrier
is needed.
> depend on memory barriers embedded in other data structures like the
> above. Here, especially, rhashtable_insert() would have write barrier
> *before* the entry is hashed not necessarily *after*, which means that
> in the above case, a socket which appears to have set bound to a
> reader might not visible when the reader tries to look up the socket
> on the hashtable.
But you are right we do need an explicit write barrier here to
ensure that the hashing is visible.
> There's no reason to be overly smart here. This isn't a crazy hot
> path, write barriers tend to be very cheap, store_release more so.
> Please just do smp_store_release() and note what it's paired with.
It's not about being overly smart. It's about actually understanding
what's going on with the code. I've seen too many instances of
people simply sprinkling synchronisation primitives around without
any knowledge of what is happening underneath, which is just a recipe
for creating hard-to-debug races.
> > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
> > }
> > }
> >
> > - if (!nlk->portid) {
> > + if (!nlk->bound) {
>
> I don't think you can skip load_acquire here just because this is the
> second deref of the variable. That doesn't change anything. Race
> condition could still happen between the first and second tests and
> skipping the second would lead to the same kind of bug.
The reason this one is OK is because we do not use nlk->portid or
try to get nlk from the hash table before we return to user-space.
However, there is a real bug here that none of these acquire/release
helpers discovered. The two bound tests here used to be a single
one. Now that they are separate it is entirely possible for another
thread to come in the middle and bind the socket. So we need to
repeat the portid check in order to maintain consistency.
> > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
> > !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
> > return -EPERM;
> >
> > - if (!nlk->portid)
> > + if (!nlk->bound)
>
> Don't we need load_acquire here too? Is this path holding a lock
> which makes that unnecessary?
Ditto.
---8<---
The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink:
Fix autobind race condition that leads to zero port ID") created
some new races that can occur due to inconcsistencies between the
two port IDs.
Tejun is right that a barrier is unavoidable. Therefore I am
reverting to the original patch that used a boolean to indicate
that a user netlink socket has been bound.
Barriers have been added where necessary to ensure that a valid
portid and the hashed socket is visible.
I have also changed netlink_insert to only return EBUSY if the
socket is bound to a portid different to the requested one. This
combined with only reading nlk->bound once in netlink_bind fixes
a race where two threads that bind the socket at the same time
with different port IDs may both succeed.
Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID")
Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nacked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This is primarily for consistancy with vxlan and other tunnels which
use network byte order for similar parameters.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We are seeing unexplained TX timeouts under heavy load. Let's try to get
a better idea of what's going on.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The low 16 bits of the 'opts1' field in the TX descriptor are supposed
to still contain the buffer length when the descriptor is handed back to
us. In practice, at least on my hardware, they don't. So stash the
original value of the opts1 field and get the length to unmap from
there.
There are other ways we could have worked out the length, but I actually
want a stash of the opts1 field anyway so that I can dump it alongside
the contents of the descriptor ring when we suffer a TX timeout.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We calculate the value of the opts1 descriptor field in three different
places. With two different behaviours when given an invalid packet to
be checksummed — none of them correct. Sort that out.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|