summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds2020-08-062088-40877/+117368
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Flowian Westphal. 35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...
| * net: thunderx: initialize VF's mailbox mutex before first usageDean Nelson2020-08-061-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A VF's mailbox mutex is not getting initialized by nicvf_probe() until after it is first used. And such usage is resulting in... [ 28.270927] ------------[ cut here ]------------ [ 28.270934] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 28.270980] WARNING: CPU: 9 PID: 675 at kernel/locking/mutex.c:938 __mutex_lock+0xdac/0x12f0 [ 28.270985] Modules linked in: ast(+) nicvf(+) i2c_algo_bit drm_vram_helper drm_ttm_helper ttm nicpf(+) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ixgbe(+) sg thunder_bgx mdio i2c_thunderx mdio_thunder thunder_xcv mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [ 28.271064] CPU: 9 PID: 675 Comm: systemd-udevd Not tainted 4.18.0+ #1 [ 28.271070] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS F02 08/06/2019 [ 28.271078] pstate: 60000005 (nZCv daif -PAN -UAO) [ 28.271086] pc : __mutex_lock+0xdac/0x12f0 [ 28.271092] lr : __mutex_lock+0xdac/0x12f0 [ 28.271097] sp : ffff800d42146fb0 [ 28.271103] x29: ffff800d42146fb0 x28: 0000000000000000 [ 28.271113] x27: ffff800d24361180 x26: dfff200000000000 [ 28.271122] x25: 0000000000000000 x24: 0000000000000002 [ 28.271132] x23: ffff20001597cc80 x22: ffff2000139e9848 [ 28.271141] x21: 0000000000000000 x20: 1ffff001a8428e0c [ 28.271151] x19: ffff200015d5d000 x18: 1ffff001ae0f2184 [ 28.271160] x17: 0000000000000000 x16: 0000000000000000 [ 28.271170] x15: ffff800d70790c38 x14: ffff20001597c000 [ 28.271179] x13: ffff20001597cc80 x12: ffff040002b2f779 [ 28.271189] x11: 1fffe40002b2f778 x10: ffff040002b2f778 [ 28.271199] x9 : 0000000000000000 x8 : 00000000f1f1f1f1 [ 28.271208] x7 : 00000000f2f2f2f2 x6 : 0000000000000000 [ 28.271217] x5 : 1ffff001ae0f2186 x4 : 1fffe400027eb03c [ 28.271227] x3 : dfff200000000000 x2 : ffff1001a8428dbe [ 28.271237] x1 : c87fdfac7ea11d00 x0 : 0000000000000000 [ 28.271246] Call trace: [ 28.271254] __mutex_lock+0xdac/0x12f0 [ 28.271261] mutex_lock_nested+0x3c/0x50 [ 28.271297] nicvf_send_msg_to_pf+0x40/0x3a0 [nicvf] [ 28.271316] nicvf_register_misc_interrupt+0x20c/0x328 [nicvf] [ 28.271334] nicvf_probe+0x508/0xda0 [nicvf] [ 28.271344] local_pci_probe+0xc4/0x180 [ 28.271352] pci_device_probe+0x3ec/0x528 [ 28.271363] driver_probe_device+0x21c/0xb98 [ 28.271371] device_driver_attach+0xe8/0x120 [ 28.271379] __driver_attach+0xe0/0x2a0 [ 28.271386] bus_for_each_dev+0x118/0x190 [ 28.271394] driver_attach+0x48/0x60 [ 28.271401] bus_add_driver+0x328/0x558 [ 28.271409] driver_register+0x148/0x398 [ 28.271416] __pci_register_driver+0x14c/0x1b0 [ 28.271437] nicvf_init_module+0x54/0x10000 [nicvf] [ 28.271447] do_one_initcall+0x18c/0xc18 [ 28.271457] do_init_module+0x18c/0x618 [ 28.271464] load_module+0x2bc0/0x4088 [ 28.271472] __se_sys_finit_module+0x110/0x188 [ 28.271479] __arm64_sys_finit_module+0x70/0xa0 [ 28.271490] el0_svc_handler+0x15c/0x380 [ 28.271496] el0_svc+0x8/0xc [ 28.271502] irq event stamp: 52649 [ 28.271513] hardirqs last enabled at (52649): [<ffff200011b4d790>] _raw_spin_unlock_irqrestore+0xc0/0xd8 [ 28.271522] hardirqs last disabled at (52648): [<ffff200011b4d3c4>] _raw_spin_lock_irqsave+0x3c/0xf0 [ 28.271530] softirqs last enabled at (52330): [<ffff200010082af4>] __do_softirq+0xacc/0x117c [ 28.271540] softirqs last disabled at (52313): [<ffff20001019b354>] irq_exit+0x3cc/0x500 [ 28.271545] ---[ end trace a9b90324c8a0d4ee ]--- This problem is resolved by moving the call to mutex_init() up earlier in nicvf_probe(). Fixes: 609ea65c65a0 ("net: thunderx: add mutex to protect mailbox from concurrent calls for same VF") Signed-off-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'misc-bug-fixes-for-the-hso-driver'David S. Miller2020-08-061-9/+7
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oliver Neukum says: ==================== misc bug fixes for the hso driver 1. Code reuse led to an unregistration of a net driver that has not been registered 2. The kernel complains generically if kmalloc with GFP_KERNEL fails 3. A race that can lead to an URB that is in use being reused or a use after free ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * usb: hso: remove bogus check for EINPROGRESSOliver Neukum2020-08-061-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This check an inherent race. It opens a race where an error code has already been set or cleared yet the URB has not been given back. We cannot do such an optimization and must unlink unconditionally. Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * usb: hso: no complaint about kmalloc failureOliver Neukum2020-08-061-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | If this fails, kmalloc() will print a report including a stack trace. There is no need for a separate complaint. Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * hso: fix bailout in error case of probeOliver Neukum2020-08-061-4/+4
| |/ | | | | | | | | | | | | | | | | | | | | The driver tries to reuse code for disconnect in case of a failed probe. If resources need to be freed after an error in probe, the netdev must not be freed because it has never been registered. Fix it by telling the helper which path we are in. Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUMStefano Brivio2020-08-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On architectures defining _HAVE_ARCH_IPV6_CSUM, we get csum_ipv6_magic() defined by means of arch checksum.h headers. On other architectures, we actually need to include net/ip6_checksum.h to be able to use it. Without this include, building with defconfig breaks at least for s390. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * selftests/net: relax cpu affinity requirement in msg_zerocopy testWillem de Bruijn2020-08-051-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The msg_zerocopy test pins the sender and receiver threads to separate cores to reduce variance between runs. But it hardcodes the cores and skips core 0, so it fails on machines with the selected cores offline, or simply fewer cores. The test mainly gives code coverage in automated runs. The throughput of zerocopy ('-z') and non-zerocopy runs is logged for manual inspection. Continue even when sched_setaffinity fails. Just log to warn anyone interpreting the data. Fixes: 07b65c5b31ce ("test: add msg_zerocopy test") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * mptcp: be careful on subflow creationPaolo Abeni2020-08-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nicolas reported the following oops: [ 1521.392541] BUG: kernel NULL pointer dereference, address: 00000000000000c0 [ 1521.394189] #PF: supervisor read access in kernel mode [ 1521.395376] #PF: error_code(0x0000) - not-present page [ 1521.396607] PGD 0 P4D 0 [ 1521.397156] Oops: 0000 [#1] SMP PTI [ 1521.398020] CPU: 0 PID: 22986 Comm: kworker/0:2 Not tainted 5.8.0-rc4+ #109 [ 1521.399618] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 1521.401728] Workqueue: events mptcp_worker [ 1521.402651] RIP: 0010:mptcp_subflow_create_socket+0xf1/0x1c0 [ 1521.403954] Code: 24 08 89 44 24 04 48 8b 7a 18 e8 2a 48 d4 ff 8b 44 24 04 85 c0 75 7a 48 8b 8b 78 02 00 00 48 8b 54 24 08 48 8d bb 80 00 00 00 <48> 8b 89 c0 00 00 00 48 89 8a c0 00 00 00 48 8b 8b 78 02 00 00 8b [ 1521.408201] RSP: 0000:ffffabc4002d3c60 EFLAGS: 00010246 [ 1521.409433] RAX: 0000000000000000 RBX: ffffa0b9ad8c9a00 RCX: 0000000000000000 [ 1521.411096] RDX: ffffa0b9ae78a300 RSI: 00000000fffffe01 RDI: ffffa0b9ad8c9a80 [ 1521.412734] RBP: ffffa0b9adff2e80 R08: ffffa0b9af02d640 R09: ffffa0b9ad923a00 [ 1521.414333] R10: ffffabc4007139f8 R11: fefefefefefefeff R12: ffffabc4002d3cb0 [ 1521.415918] R13: ffffa0b9ad91fa58 R14: ffffa0b9ad8c9f9c R15: 0000000000000000 [ 1521.417592] FS: 0000000000000000(0000) GS:ffffa0b9af000000(0000) knlGS:0000000000000000 [ 1521.419490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1521.420839] CR2: 00000000000000c0 CR3: 000000002951e006 CR4: 0000000000160ef0 [ 1521.422511] Call Trace: [ 1521.423103] __mptcp_subflow_connect+0x94/0x1f0 [ 1521.425376] mptcp_pm_create_subflow_or_signal_addr+0x200/0x2a0 [ 1521.426736] mptcp_worker+0x31b/0x390 [ 1521.431324] process_one_work+0x1fc/0x3f0 [ 1521.432268] worker_thread+0x2d/0x3b0 [ 1521.434197] kthread+0x117/0x130 [ 1521.435783] ret_from_fork+0x22/0x30 on some unconventional configuration. The MPTCP protocol is trying to create a subflow for an unaccepted server socket. That is allowed by the RFC, even if subflow creation will likely fail. Unaccepted sockets have still a NULL sk_socket field, avoid the issue by failing earlier. Reported-and-tested-by: Nicolas Rybowski <nicolas.rybowski@tessares.net> Fixes: 7d14b0d2b9b3 ("mptcp: set correct vfs info for subflows") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'selftests-rtnetlink-Fix-for-false-negative-return-values'David S. Miller2020-08-051-22/+46
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Po-Hsu Lin says: ==================== selftests: rtnetlink: Fix for false-negative return values This patchset will address the false-negative return value issue caused by the following: 1. The return value "ret" in this script will be reset to 0 from the beginning of each sub-test in rtnetlink.sh, therefore this rtnetlink test will always pass if the last sub-test has passed. 2. The test result from two sub-tests in kci_test_encap() were not being processed, thus they will not affect the final test result of this test. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * selftests: rtnetlink: make kci_test_encap() return sub-test resultPo-Hsu Lin2020-08-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kci_test_encap() is actually composed by two different sub-tests, kci_test_encap_vxlan() and kci_test_encap_fou() Therefore we should check the test result of these two in kci_test_encap() to let the script be aware of the pass / fail status. Otherwise it will generate false-negative result like below: $ sudo ./test.sh PASS: policy routing PASS: route get PASS: preferred_lft addresses have expired PASS: promote_secondaries complete PASS: tc htb hierarchy PASS: gre tunnel endpoint PASS: gretap PASS: ip6gretap PASS: erspan PASS: ip6erspan PASS: bridge setup PASS: ipv6 addrlabel PASS: set ifalias 5b193daf-0a08-46d7-af2c-e7aadd422ded for test-dummy0 PASS: vrf PASS: vxlan FAIL: can't add fou port 7777, skipping test PASS: macsec PASS: bridge fdb get PASS: neigh get $ echo $? 0 Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * selftests: rtnetlink: correct the final return value for the testPo-Hsu Lin2020-08-051-22/+43
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return value "ret" will be reset to 0 from the beginning of each sub-test in rtnetlink.sh, therefore this test will always pass if the last sub-test has passed: $ sudo ./rtnetlink.sh PASS: policy routing PASS: route get PASS: preferred_lft addresses have expired PASS: promote_secondaries complete PASS: tc htb hierarchy PASS: gre tunnel endpoint PASS: gretap PASS: ip6gretap PASS: erspan PASS: ip6erspan PASS: bridge setup PASS: ipv6 addrlabel PASS: set ifalias a39ee707-e36b-41d3-802f-63179ed4d580 for test-dummy0 PASS: vrf PASS: vxlan FAIL: can't add fou port 7777, skipping test PASS: macsec PASS: ipsec 3,7c3,7 < sa[0] spi=0x00000009 proto=0x32 salt=0x64636261 crypt=1 < sa[0] key=0x31323334 35363738 39303132 33343536 < sa[1] rx ipaddr=0x00000000 00000000 00000000 c0a87b03 < sa[1] spi=0x00000009 proto=0x32 salt=0x64636261 crypt=1 < sa[1] key=0x31323334 35363738 39303132 33343536 --- > sa[0] spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1 > sa[0] key=0x34333231 38373635 32313039 36353433 > sa[1] rx ipaddr=0x00000000 00000000 00000000 037ba8c0 > sa[1] spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1 > sa[1] key=0x34333231 38373635 32313039 36353433 FAIL: ipsec_offload incorrect driver data FAIL: ipsec_offload PASS: bridge fdb get PASS: neigh get $ echo $? 0 Make "ret" become a local variable for all sub-tests. Also, check the sub-test results in kci_test_rtnl() and return the final result for this test. Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dsa: sja1105: use detected device id instead of DT one on mismatchVladimir Oltean2020-08-051-11/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although we can detect the chip revision 100% at runtime, it is useful to specify it in the device tree compatible string too, because otherwise there would be no way to assess the correctness of device tree bindings statically, without booting a board (only some switch versions have internal RGMII delays and/or an SGMII port). But for testing the P/Q/R/S support, what I have is a reworked board with the SJA1105T replaced by a pin-compatible SJA1105Q, and I don't want to keep a separate device tree blob just for this one-off board. Since just the chip has been replaced, its RGMII delay setup is inherently the same (meaning: delays added by the PHY on the slave ports, and by PCB traces on the fixed-link CPU port). For this board, I'd rather have the driver shout at me, but go ahead and use what it found even if it doesn't match what it's been told is there. [ 2.970826] sja1105 spi0.1: Device tree specifies chip SJA1105T but found SJA1105Q, please fix it! [ 2.980010] sja1105 spi0.1: Probed switch chip: SJA1105Q [ 3.005082] sja1105 spi0.1: Enabled switch tagging Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'net-fix-a-mcast-issue-for-tipc-udp-media'David S. Miller2020-08-053-0/+49
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xin Long says: ==================== net: fix a mcast issue for tipc udp media Patch 1 is to add a function to get the dev by source address, which will be used by Patch 2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * tipc: set ub->ifindex for local ipv6 addressXin Long2020-08-051-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without ub->ifindex set for ipv6 address in tipc_udp_enable(), ipv6_sock_mc_join() may make the wrong dev join the multicast address in enable_mcast(). This causes that tipc links would never be created. So fix it by getting the right netdev and setting ub->ifindex, as it does for ipv4 address. Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ipv6: add ipv6_dev_find()Xin Long2020-08-052-0/+41
| |/ | | | | | | | | | | | | | | | | | | | | This is to add an ip_dev_find like function for ipv6, used to find the dev by saddr. It will be used by TIPC protocol. So also export it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: openvswitch: silence suspicious RCU usage warningTonghao Zhang2020-08-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ovs_flow_tbl_destroy always is called from RCU callback or error path. It is no need to check if rcu_read_lock or lockdep_ovsl_is_held was held. ovs_dp_cmd_fill_info always is called with ovs_mutex, So use the rcu_dereference_ovsl instead of rcu_dereference in ovs_flow_tbl_masks_cache_size. Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable") Cc: Eelco Chaudron <echaudro@redhat.com> Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Revert "vxlan: fix tos value before xmit"Hangbin Liu2020-08-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 71130f29979c7c7956b040673e6b9d5643003176. In commit 71130f29979c ("vxlan: fix tos value before xmit") we want to make sure the tos value are filtered by RT_TOS() based on RFC1349. 0 1 2 3 4 5 6 7 +-----+-----+-----+-----+-----+-----+-----+-----+ | PRECEDENCE | TOS | MBZ | +-----+-----+-----+-----+-----+-----+-----+-----+ But RFC1349 has been obsoleted by RFC2474. The new DSCP field defined like 0 1 2 3 4 5 6 7 +-----+-----+-----+-----+-----+-----+-----+-----+ | DS FIELD, DSCP | ECN FIELD | +-----+-----+-----+-----+-----+-----+-----+-----+ So with IPTOS_TOS_MASK 0x1E RT_TOS(tos) ((tos)&IPTOS_TOS_MASK) the first 3 bits DSCP info will get lost. To take all the DSCP info in xmit, we should revert the patch and just push all tos bits to ip_tunnel_ecn_encap(), which will handling ECN field later. Fixes: 71130f29979c ("vxlan: fix tos value before xmit") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ptp: only allow phase values lower than 1 periodVladimir Oltean2020-08-051-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The way we define the phase (the difference between the time of the signal's rising edge, and the closest integer multiple of the period), it doesn't make sense to have a phase value equal or larger than 1 period. So deny these settings coming from the user. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * farsync: switch from 'pci_' to 'dma_' APIChristophe JAILLET2020-08-051-13/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested. When memory is allocated in 'fst_add_one()', GFP_KERNEL can be used because it is a probe function and no lock is acquired. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| * wan: wanxl: switch from 'pci_' to 'dma_' APIChristophe JAILLET2020-08-051-27/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested. When memory is allocated in 'wanxl_pci_init_one()', GFP_KERNEL can be used because it is a probe function and no lock is acquired. Moreover, just a few lines above, GFP_KERNEL is already used. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| * hv_netvsc: do not use VF device if link is downStephen Hemminger2020-08-051-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the accelerated networking SRIOV VF device has lost carrier use the synthetic network device which is available as backup path. This is a rare case since if VF link goes down, normally the VMBus device will also loose external connectivity as well. But if the communication is between two VM's on the same host the VMBus device will still work. Reported-by: "Shah, Ashish N" <ashish.n.shah@intel.com> Fixes: 0c195567a8f6 ("netvsc: transparent VF management") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * dpaa2-eth: Fix passing zero to 'PTR_ERR' warningYueHaibing2020-08-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix smatch warning: drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c:2419 alloc_channel() warn: passing zero to 'ERR_PTR' setup_dpcon() should return ERR_PTR(err) instead of zero in error handling case. Fixes: d7f5a9d89a55 ("dpaa2-eth: defer probe on object allocate") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: macb: Properly handle phylink on at91sam9xStefan Roese2020-08-051-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I just recently noticed that ethernet does not work anymore since v5.5 on the GARDENA smart Gateway, which is based on the AT91SAM9G25. Debugging showed that the "GEM bits" in the NCFGR register are now unconditionally accessed, which is incorrect for the !macb_is_gem() case. This patch adds the macb_is_gem() checks back to the code (in macb_mac_config() & macb_mac_link_up()), so that the GEM register bits are not accessed in this case any more. Fixes: 7897b071ac3b ("net: macb: convert to phylink") Signed-off-by: Stefan Roese <sr@denx.de> Cc: Reto Schneider <reto.schneider@husqvarnagroup.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2020-08-049-20/+182
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Flush the cleanup xtables worker to make sure destructors have completed, from Florian Westphal. 2) iifgroup is matching erroneously, also from Florian. 3) Add selftest for meta interface matching, from Florian Westphal. 4) Move nf_ct_offload_timeout() to header, from Roi Dayan. 5) Call nf_ct_offload_timeout() from flow_offload_add() to make sure garbage collection does not evict offloaded flow, from Roi Dayan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * netfilter: flowtable: Set offload timeout when adding flowRoi Dayan2020-08-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On heavily loaded systems the GC can take time to go over all existing conns and reset their timeout. At that time other calls like from nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as expired. To fix this when we set the offload bit we should also reset the timeout instead of counting on GC to finish first iteration over all conns before the initial timeout. Fixes: 90964016e5d3 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: conntrack: Move nf_ct_offload_timeout to header fileRoi Dayan2020-08-032-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To be used by callers from other modules. [ Rename DAY to NF_CT_DAY to avoid possible symbol name pollution issue --Pablo ] Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * selftests: netfilter: add meta iif/oif match testFlorian Westphal2020-08-032-1/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | simple test case, but would have caught this: FAIL: iifgroupcount, want "packets 2", got table inet filter { counter iifgroupcount { packets 0 bytes 0 } } Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: nft_meta: fix iifgroup matchingFlorian Westphal2020-08-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | iifgroup matching erroneously checks the output interface. Fixes: 8724e819cc9a ("netfilter: nft_meta: move all interface related keys to helper") Reported-by: Demi M. Obenour <demiobenour@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: nft_compat: make sure xtables destructors have runFlorian Westphal2020-07-313-6/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira found that after recent update of xt_IDLETIMER the iptables-nft tests sometimes show an error. He tracked this down to the delayed cleanup used by nf_tables core: del rule (transaction A) add rule (transaction B) Its possible that by time transaction B (both in same netns) runs, the xt target destructor has not been invoked yet. For native nft expressions this is no problem because all expressions that have such side effects make sure these are handled from the commit phase, rather than async cleanup. For nft_compat however this isn't true. Instead of forcing synchronous behaviour for nft_compat, keep track of the number of outstanding destructor calls. When we attempt to create a new expression, flush the cleanup worker to make sure destructors have completed. With lots of help from Pablo Neira. Reported-by: Pablo Neira Ayso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | net: thunderx: use spin_lock_bh in nicvf_set_rx_mode_task()Xin Long2020-08-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A dead lock was triggered on thunderx driver: CPU0 CPU1 ---- ---- [01] lock(&(&nic->rx_mode_wq_lock)->rlock); [11] lock(&(&mc->mca_lock)->rlock); [12] lock(&(&nic->rx_mode_wq_lock)->rlock); [02] <Interrupt> lock(&(&mc->mca_lock)->rlock); The path for each is: [01] worker_thread() -> process_one_work() -> nicvf_set_rx_mode_task() [02] mld_ifc_timer_expire() [11] ipv6_add_dev() -> ipv6_dev_mc_inc() -> igmp6_group_added() -> [12] dev_mc_add() -> __dev_set_rx_mode() -> nicvf_set_rx_mode() To fix it, it needs to disable bh on [1], so that the timer on [2] wouldn't be triggered until rx_mode_wq_lock is released. So change to use spin_lock_bh() instead of spin_lock(). Thanks to Paolo for helping with this. v1->v2: - post to netdev. Reported-by: Rafael P. <rparrazo@redhat.com> Tested-by: Dean Nelson <dnelson@redhat.com> Fixes: 469998c861fa ("net: thunderx: prevent concurrent data re-writing by nicvf_set_rx_mode") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'Support-PMTU-discovery-with-bridged-UDP-tunnels'David S. Miller2020-08-049-29/+690
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stefano Brivio says: ==================== Support PMTU discovery with bridged UDP tunnels Currently, PMTU discovery for UDP tunnels only works if packets are routed to the encapsulating interfaces, not bridged. This results from the fact that we generally don't have valid routes to the senders we can use to relay ICMP and ICMPv6 errors, and makes PMTU discovery completely non-functional for VXLAN and GENEVE ports of both regular bridges and Open vSwitch instances. If the sender is local, and packets are forwarded to the port by a regular bridge, all it takes is to generate a corresponding route exception on the encapsulating device. The bridge then finds the route exception carrying the PMTU value estimate as it forwards frames, and relays ICMP messages back to the socket of the local sender. Patch 1/6 fixes this case. If the sender resides on another node, we actually need to reply to IP and IPv6 packets ourselves and send these ICMP or ICMPv6 errors back, using the same encapsulating device. Patch 2/6, based on an original idea by Florian Westphal, adds the needed functionality, while patches 3/6 and 4/6 add matching support for VXLAN and GENEVE. Finally, 5/6 and 6/6 introduce selftests for all combinations of inner and outer IP versions, covering both VXLAN and GENEVE, with both regular bridges and Open vSwitch instances. v2: Add helper to check for any bridge port, skip oif check for PMTU routes for bridge ports only, split IPv4 and IPv6 helpers and functions (all suggested by David Ahern) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | selftests: pmtu.sh: Add tests for UDP tunnels handled by Open vSwitchStefano Brivio2020-08-041-0/+180
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new tests check that IP and IPv6 packets exceeding the local PMTU estimate, forwarded by an Open vSwitch instance from another node, result in the correct route exceptions being created, and that communication with end-to-end fragmentation, over GENEVE and VXLAN Open vSwitch ports, is now possible as a result of PMTU discovery. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | selftests: pmtu.sh: Add tests for bridged UDP tunnelsStefano Brivio2020-08-041-7/+159
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new tests check that IP and IPv6 packets exceeding the local PMTU estimate, both locally generated and forwarded by a bridge from another node, result in the correct route exceptions being created, and that communication with end-to-end fragmentation over VXLAN and GENEVE tunnels is now possible as a result of PMTU discovery. Part of the existing setup functions aren't generic enough to simply add a namespace and a bridge to the existing routing setup. This rework is in progress and we can easily shrink this once more generic topology functions are available. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | geneve: Support for PMTU discovery on directly bridged linksStefano Brivio2020-08-041-5/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the interface is a bridge or Open vSwitch port, and we can't forward a packet because it exceeds the local PMTU estimate, trigger an ICMP or ICMPv6 reply to the sender, using the same interface to forward it back. If metadata collection is enabled, set destination and source addresses for the flow as if we were receiving the packet, so that Open vSwitch can match the ICMP error against the existing association. v2: Use netif_is_any_bridge_port() (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | vxlan: Support for PMTU discovery on directly bridged linksStefano Brivio2020-08-041-6/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the interface is a bridge or Open vSwitch port, and we can't forward a packet because it exceeds the local PMTU estimate, trigger an ICMP or ICMPv6 reply to the sender, using the same interface to forward it back. If metadata collection is enabled, reverse destination and source addresses, so that Open vSwitch is able to match this packet against the existing, reverse flow. v2: Use netif_is_any_bridge_port() (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | tunnels: PMTU discovery support for directly bridged IP packetsStefano Brivio2020-08-046-16/+254
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's currently possible to bridge Ethernet tunnels carrying IP packets directly to external interfaces without assigning them addresses and routes on the bridged network itself: this is the case for UDP tunnels bridged with a standard bridge or by Open vSwitch. PMTU discovery is currently broken with those configurations, because the encapsulation effectively decreases the MTU of the link, and while we are able to account for this using PMTU discovery on the lower layer, we don't have a way to relay ICMP or ICMPv6 messages needed by the sender, because we don't have valid routes to it. On the other hand, as a tunnel endpoint, we can't fragment packets as a general approach: this is for instance clearly forbidden for VXLAN by RFC 7348, section 4.3: VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may fragment encapsulated VXLAN packets due to the larger frame size. The destination VTEP MAY silently discard such VXLAN fragments. The same paragraph recommends that the MTU over the physical network accomodates for encapsulations, but this isn't a practical option for complex topologies, especially for typical Open vSwitch use cases. Further, it states that: Other techniques like Path MTU discovery (see [RFC1191] and [RFC1981]) MAY be used to address this requirement as well. Now, PMTU discovery already works for routed interfaces, we get route exceptions created by the encapsulation device as they receive ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and we already rebuild those messages with the appropriate MTU and route them back to the sender. Add the missing bits for bridged cases: - checks in skb_tunnel_check_pmtu() to understand if it's appropriate to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and RFC 4443 section 2.4 for ICMPv6. This function is already called by UDP tunnels - a new function generating those ICMP or ICMPv6 replies. We can't reuse icmp_send() and icmp6_send() as we don't see the sender as a valid destination. This doesn't need to be generic, as we don't cover any other type of ICMP errors given that we only provide an encapsulation function to the sender While at it, make the MTU check in skb_tunnel_check_pmtu() accurate: we might receive GSO buffers here, and the passed headroom already includes the inner MAC length, so we don't have to account for it a second time (that would imply three MAC headers on the wire, but there are just two). This issue became visible while bridging IPv6 packets with 4500 bytes of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50 bytes of encapsulation headroom, we would advertise MTU as 3950, and we would reject fragmented IPv6 datagrams of 3958 bytes size on the wire. We're exclusively dealing with network MTU here, though, so we could get Ethernet frames up to 3964 octets in that case. v2: - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern) - split IPv4/IPv6 functions (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | ipv4: route: Ignore output interface in FIB lookup for PMTU routeStefano Brivio2020-08-042-0/+10
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, processes sending traffic to a local bridge with an encapsulation device as a port don't get ICMP errors if they exceed the PMTU of the encapsulated link. David Ahern suggested this as a hack, but it actually looks like the correct solution: when we update the PMTU for a given destination by means of updating or creating a route exception, the encapsulation might trigger this because of PMTU discovery happening either on the encapsulation device itself, or its lower layer. This happens on bridged encapsulations only. The output interface shouldn't matter, because we already have a valid destination. Drop the output interface restriction from the associated route lookup. For UDP tunnels, we will now have a route exception created for the encapsulation itself, with a MTU value reflecting its headroom, which allows a bridge forwarding IP packets originated locally to deliver errors back to the sending socket. The behaviour is now consistent with IPv6 and verified with selftests pmtu_ipv{4,6}_br_{geneve,vxlan}{4,6}_exception introduced later in this series. v2: - reset output interface only for bridge ports (David Ahern) - add and use netif_is_any_bridge_port() helper (David Ahern) Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge tag 'wireless-drivers-next-2020-08-04' of ↵David S. Miller2020-08-04124-1277/+5595
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.9 Second set of patches for v5.9. mt76 has most of patches this time. Otherwise it's just smaller fixes and cleanups to other drivers. There was a major conflict in mt76 driver between wireless-drivers and wireless-drivers-next. I solved that by merging the former to the latter. Major changes: rtw88 * add support for ieee80211_ops::change_interface * add support for enabling and disabling beacon * add debugfs file for testing h2c mt76 * ARP filter offload for 7663 * runtime power management for 7663 * testmode support for mfg calibration * support for more channels ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.gitKalle Valo2020-08-0411-65/+88
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | mt76 driver had major conflicts within mt7615 directory. To make it easier for every merge wireless-drivers to wireless-drivers-next and solve those conflicts.
| | * | | brcmfmac: Set timeout value when configuring power saveNicolas Saenz Julienne2020-08-021-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set the timeout value as per cfg80211's set_power_mgmt() request. If the requested value value is left undefined we set it to 2 seconds, the maximum supported value. Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200721112302.22718-1-nsaenzjulienne@suse.de
| | * | | bcma: gpio: Use irqchip templateLinus Walleij2020-08-021-12/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the driver use the irqchip template to assign properties to the gpio_irq_chip instead of using the explicit call to gpiochip_irqchip_add(). The irqchip is instead added while adding the gpiochip. Cc: Rafał Miłecki <rafal@milecki.pl> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200722111725.210923-1-linus.walleij@linaro.org
| | * | | drivers: bcma: remove set but not used variable `addrh` and `sizeh`Zheng Yongjun2020-08-021-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes gcc '-Wunused-but-set-variable' warning: drivers/bcma/scan.c: In function 'bcma_erom_get_addr_desc': drivers/bcma/scan.c:219 warning: variable `addrh` and `sizeh` set but not used [-Wunused-but-set-variable] Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200721083935.13306-1-zhengyongjun3@huawei.com
| | * | | wl1251: fix always return 0 errorWang Hai2020-08-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wl1251_event_ps_report() should not always return 0 because wl1251_ps_set_mode() may fail. Change it to return 'ret'. Fixes: f7ad1eed4d4b ("wl1251: retry power save entry") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200730073939.33704-1-wanghai38@huawei.com
| | * | | airo: use generic power managementVaibhav Gupta2020-08-021-20/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers using legacy power management .suspen()/.resume() callbacks have to manage PCI states and device's PM states themselves. They also need to take care of standard configuration registers. Switch to generic power management framework using a single "struct dev_pm_ops" variable to take the unnecessary load from the driver. This also avoids the need for the driver to directly call most of the PCI helper functions and device power state control functions, as through the generic framework PCI Core takes care of the necessary operations, and drivers are required to do only device-specific jobs. Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200728114128.1218310-1-vaibhavgupta40@gmail.com
| | * | | intersil: fix wiki website urlFlavio Suligoi2020-08-025-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some Intersil files, the wiki url is still the old "wireless.kernel.org" instead of the new "wireless.wiki.kernel.org" Signed-off-by: Flavio Suligoi <f.suligoi@asem.it> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200723135254.594984-1-f.suligoi@asem.it
| | * | | qtnfmac: Missing platform_device_unregister() on error in qtnf_core_mac_alloc()Wang Hai2020-08-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the missing platform_device_unregister() before return from qtnf_core_mac_alloc() in the error handling case. Fixes: 616f5701f4ab ("qtnfmac: assign each wiphy to its own virtual platform device") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Sergey Matyukevich <geomatsi@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200730064910.37589-1-wanghai38@huawei.com
| | * | | ipw2x00: switch from 'pci_' to 'dma_' APIChristophe JAILLET2020-08-022-89/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested. When memory is allocated in 'ipw2100_msg_allocate()' (ipw2100.c), GFP_KERNEL can be used because it is called from the probe function. The call chain is: ipw2100_pci_init_one (the probe function) --> ipw2100_queues_allocate --> ipw2100_msg_allocate Moreover, 'ipw2100_msg_allocate()' already uses GFP_KERNEL for some other memory allocations. When memory is allocated in 'status_queue_allocate()' (ipw2100.c), GFP_KERNEL can be used because it is called from the probe function. The call chain is: ipw2100_pci_init_one (the probe function) --> ipw2100_queues_allocate --> ipw2100_rx_allocate --> status_queue_allocate Moreover, 'ipw2100_rx_allocate()' already uses GFP_KERNEL for some other memory allocations. When memory is allocated in 'bd_queue_allocate()' (ipw2100.c), GFP_KERNEL can be used because it is called from the probe function. The call chain is: ipw2100_pci_init_one (the probe function) --> ipw2100_queues_allocate --> ipw2100_rx_allocate --> bd_queue_allocate Moreover, 'ipw2100_rx_allocate()' already uses GFP_KERNEL for some other memory allocations. When memory is allocated in 'ipw2100_tx_allocate()' (ipw2100.c), GFP_KERNEL can be used because it is called from the probe function. The call chain is: ipw2100_pci_init_one (the probe function) --> ipw2100_queues_allocate --> ipw2100_tx_allocate Moreover, 'ipw2100_tx_allocate()' already uses GFP_KERNEL for some other memory allocations. When memory is allocated in 'ipw_queue_tx_init()' (ipw2200.c), GFP_KERNEL can be used because it is called from a call chain that already uses GFP_KERNEL and no spin_lock is taken in the between. The call chain is: ipw_up --> ipw_load --> ipw_queue_reset --> ipw_queue_tx_init 'ipw_up()' already uses GFP_KERNEL for some other memory allocations. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200722101716.26185-1-christophe.jaillet@wanadoo.fr
| | * | | ipw2100: Use GFP_KERNEL instead of GFP_ATOMIC in some memory allocationChristophe JAILLET2020-08-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The call chain is: ipw2100_pci_init_one (the probe function) --> ipw2100_queues_allocate --> ipw2100_tx_allocate No lock is taken in the between. So it is safe to use GFP_KERNEL in 'ipw2100_tx_allocate()'. BTW, 'ipw2100_queues_allocate()' also calls 'ipw2100_msg_allocate()' which already allocates some memory using GFP_KERNEL. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200722101701.26126-1-christophe.jaillet@wanadoo.fr
| | * | | hostap: use generic power managementVaibhav Gupta2020-08-022-27/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers using legacy power management .suspen()/.resume() callbacks have to manage PCI states and device's PM states themselves. They also need to take care of standard configuration registers. Switch to generic power management framework using a single "struct dev_pm_ops" variable to take the unnecessary load from the driver. This also avoids the need for the driver to directly call most of the PCI helper functions and device power state control functions as through the generic framework, PCI Core takes care of the necessary operations, and drivers are required to do only device-specific jobs. Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200721150547.371763-1-vaibhavgupta40@gmail.com