summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* mptcp: backup flag from incoming MPJ ack optionPaolo Abeni2021-08-141-2/+4
| | | | | | | | | | | the parsed incoming backup flag is not propagated to the subflow itself, the client may end-up using it to send data. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/191 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mptcp: add mibs for stale subflows processingPaolo Abeni2021-08-145-0/+8
| | | | | | | | | This allows monitoring exceptional events like active backup scenarios. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mptcp: faster active backup recoveryPaolo Abeni2021-08-146-5/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | The msk can use backup subflows to transmit in-sequence data only if there are no other active subflow. On active backup scenario, the MPTCP connection can do forward progress only due to MPTCP retransmissions - rtx can pick backup subflows. This patch introduces a new flag flow MPTCP subflows: if the underlying TCP connection made no progresses for long time, and there are other less problematic subflows available, the given subflow become stale. Stale subflows are not considered active: if all non backup subflows become stale, the MPTCP scheduler can pick backup subflows for plain transmissions. Stale subflows can return in active state, as soon as any reply from the peer is observed. Active backup scenarios can now leverage the available b/w with no restrinction. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mptcp: cleanup sysctl data and helpersPaolo Abeni2021-08-142-10/+10
| | | | | | | | | | | Reorder the data in mptcp_pernet to avoid wasting space with no reasons and constify the access helpers. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mptcp: handle pending data on closed subflowPaolo Abeni2021-08-143-8/+82
| | | | | | | | | | | | | | | | | | | | | The PM can close active subflow, e.g. due to ingress RM_ADDR option. Such subflow could carry data still unacked at the MPTCP-level, both in the write and the rtx_queue, which has never reached the other peer. Currently the mptcp-level retransmission will deliver such data, but at a very low rate (at most 1 DSM for each MPTCP rtx interval). We can speed-up the recovery a lot, moving all the unacked in the tcp write_queue, so that it will be pushed again via other subflows, at the speed allowed by them. Also make available the new helper for later patches. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mptcp: less aggressive retransmission strategyPaolo Abeni2021-08-143-10/+37
| | | | | | | | | | | | | | | | | | The current mptcp re-inject strategy is very aggressive, we have mptcp-level retransmissions even on single subflow connection, if the link in-use is lossy. Let's be a little more conservative: we do retransmit only if at least a subflow has write and rtx queue empty. Additionally use the backup subflows only if the active subflows are stale - no progresses in at least an rtx period and ignore stale subflows for rtx timeout update Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mptcp: more accurate timeoutPaolo Abeni2021-08-141-23/+37
| | | | | | | | | | | | | | | | | As reported by Maxim, we have a lot of MPTCP-level retransmissions when multilple links with different latencies are in use. This patch refactor the mptcp-level timeout accounting so that the maximum of all the active subflow timeout is used. To avoid traversing the subflow list multiple times, the update is performed inside the packet scheduler. Additionally clean-up a bit timeout handling. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethernet: fix PTP_1588_CLOCK dependenciesArnd Bergmann2021-08-1429-44/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'imply' keyword does not do what most people think it does, it only politely asks Kconfig to turn on another symbol, but does not prevent it from being disabled manually or built as a loadable module when the user is built-in. In the ICE driver, the latter now causes a link failure: aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl': ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config' ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config' aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config' ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset': ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release' ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild': This is a recurring problem in many drivers, and we have discussed it several times befores, without reaching a consensus. I'm providing a link to the previous email thread for reference, which discusses some related problems. To solve the dependency issue better than the 'imply' keyword, introduce a separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver can depend on if it is able to use PTP support when available, but works fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are then prevented from being built-in, the same way as with a 'depends on PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick, but that can be rather confusing when you first see it. Since this should cover the dependencies correctly, the IS_REACHABLE() hack in the header is no longer needed now, and can be turned back into a normal IS_ENABLED() check. Any driver that gets the dependency wrong will now cause a link time failure rather than being unable to use PTP support when that is in a loadable module. However, the two recently added ptp_get_vclocks_index() and ptp_convert_timestamp() interfaces are only called from builtin code with ethtool and socket timestamps, so keep the current behavior by stubbing those out completely when PTP is in a loadable module. This should be addressed properly in a follow-up. As Richard suggested, we may want to actually turn PTP support into a 'bool' option later on, preventing it from being a loadable module altogether, which would be one way to solve the problem with the ethtool interface. Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices") Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/ Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/ Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/ Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/ Acked-by: Shannon Nelson <snelson@pensando.io> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: phy: marvell: add SFP support for 88E1510Ivan Bornyakov2021-08-141-1/+104
| | | | | | | | | | | | Add support for SFP cages connected to the Marvell 88E1512 transceiver. 88E1512 supports for SGMII/1000Base-X/100Base-FX media type with RGMII on system interface. Configure PHY to appropriate mode depending on the type of SFP inserted. On SFP removal configure PHY to the RGMII-copper mode so RJ-45 port can still work. Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru> Link: https://lore.kernel.org/r/20210812134256.2436-1-i.bornyakov@metrotek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge branch 'kconfig-symbol-clean-up-on-net'Jakub Kicinski2021-08-144-64/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lukas Bulwahn says: ==================== Kconfig symbol clean-up on net The script ./scripts/checkkconfigsymbols.py warns on invalid references to Kconfig symbols (often, minor typos, name confusions or outdated references). This patch series addresses all issues reported by ./scripts/checkkconfigsymbols.py in ./net/ and ./drivers/net/ for Kconfig and Makefile files. Issues in the Kconfig and Makefile files indicate some shortcomings in the overall build definitions, and often are true actionable issues to address. These issues can be identified and filtered by: ./scripts/checkkconfigsymbols.py \ | grep -E "(drivers/)?net/.*(Kconfig|Makefile)" -B 1 -A 1 After applying this patch series on linux-next (next-20210811), the command above yields no further issues to address. ==================== Link: https://lore.kernel.org/r/20210812083806.28434-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: dpaa_eth: remove dead select in menuconfig FSL_DPAA_ETHLukas Bulwahn2021-08-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The menuconfig FSL_DPAA_ETH selects config FSL_FMAN_MAC, but the config FSL_FMAN_MAC never existed in the kernel tree. Hence, ./scripts/checkkconfigsymbols.py warns: FSL_FMAN_MAC Referencing files: drivers/net/ethernet/freescale/dpaa/Kconfig Remove this dead select in menuconfig FSL_DPAA_ETH. Fixes: 9ad1a3749333 ("dpaa_eth: add support for DPAA Ethernet") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: 802: remove dead leftover after ipx driver removalLukas Bulwahn2021-08-142-61/+0
| | | | | | | | | | | | | | | | | | | | | | Commit 7a2e838d28cf ("staging: ipx: delete it from the tree") removes the ipx driver and the config IPX. Since then, there is some dead leftover in ./net/802/, that was once used by the IPX driver, but has no other user. Remove this dead leftover. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: Kconfig: remove obsolete reference to config MICROBLAZE_64K_PAGESLukas Bulwahn2021-08-141-2/+2
|/ | | | | | | | | | | | Commit 05cdf457477d ("microblaze: Remove noMMU code") removes config MICROBLAZE_64K_PAGES in arch/microblaze/Kconfig. However, there is still a reference to MICROBLAZE_64K_PAGES in the config VMXNET3 in ./drivers/net/Kconfig. Remove this obsolete reference to config MICROBLAZE_64K_PAGES. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* dt-bindings: net: macb: add documentation for sama5d29 ethernet interfaceHari Prasath2021-08-141-0/+1
| | | | | | | | Add documentation for SAMA5D29 ethernet interface. Signed-off-by: Hari Prasath <Hari.PrasathGE@microchip.com> Link: https://lore.kernel.org/r/20210812074422.13487-2-Hari.PrasathGE@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: macb: Add PTP support for SAMA5D29Hari Prasath2021-08-141-0/+9
| | | | | | | | Add PTP capability to the macb config object for sama5d29. Signed-off-by: Hari Prasath <Hari.PrasathGE@microchip.com> Link: https://lore.kernel.org/r/20210812074422.13487-1-Hari.PrasathGE@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: fec: add WoL support for i.MX8MQJoakim Zhang2021-08-142-5/+23
| | | | | | | | | | | | By default FEC driver treat irq[0] (i.e. int0 described in dt-binding) as wakeup interrupt, but this situation changed on i.MX8M serials, SoC integration guys mix wakeup interrupt signal into int2 interrupt line. This patch introduces FEC_QUIRK_WAKEUP_FROM_INT2 to indicate int2 as wakeup interrupt for i.MX8MQ. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Link: https://lore.kernel.org/r/20210812070948.25797-1-qiangqing.zhang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ravb: Remove checks for unsupported internal delay modesGeert Uytterhoeven2021-08-141-13/+2
| | | | | | | | | | | | | | | | | | | | | | | | | The EtherAVB instances on the R-Car E3/D3 and RZ/G2E SoCs do not support TX clock internal delay modes, and the EtherAVB driver prints a warning if an unsupported "rgmii-*id" PHY mode is specified, to catch buggy DTBs. Commit a6f51f2efa742df0 ("ravb: Add support for explicit internal clock delay configuration") deprecated deriving the internal delay mode from the PHY mode, in favor of explicit configuration using the now mandatory "rx-internal-delay-ps" and "tx-internal-delay-ps" properties, thus delegating the warning to the legacy fallback code. Since explicit configuration of a (valid) internal clock delay configuration is enforced by validating device tree source files against DT binding files, and all upstream DTS files have been converted as of commit a5200e63af57d05e ("arm64: dts: renesas: rzg2: Convert EtherAVB to explicit delay handling"), the checks in the legacy fallback code can be removed. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/2037542ac56e99413b9807e24049711553cc88a9.1628696778.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: hso: drop unused function argumentPavel Skripkin2021-08-141-4/+3
| | | | | | | | | _hso_serial_set_termios() doesn't use it's second argument, so it can be dropped. Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/20210811171321.18317-1-paskripkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: in_irq() cleanupChangbin Du2021-08-135-8/+8
| | | | | | | | | Replace the obsolete and ambiguos macro in_irq() with new macro in_hardirq(). Signed-off-by: Changbin Du <changbin.du@gmail.com> Link: https://lore.kernel.org/r/20210813145749.86512-1-changbin.du@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net, bonding: Disallow vlan+srcmac with XDPJussi Maki2021-08-131-7/+11
| | | | | | | | | | | | | | | | | | The new vlan+srcmac xmit policy is not implementable with XDP since in many cases the 802.1Q payload is not present in the packet. This can be for example due to hardware offload or in the case of veth due to use of skbuffs internally. This also fixes the NULL deref with the vlan+srcmac xmit policy reported by Jonathan Toppins by additionally checking the skb pointer. Fixes: a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff") Reported-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20210812145241.12449-1-joamaki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* af_unix: fix holding spinlock in oob handlingRao Shoaib2021-08-131-12/+24
| | | | | | | | | | | syzkaller found that OOB code was holding spinlock while calling a function in which it could sleep. Reported-by: syzbot+8760ca6c1ee783ac4abd@syzkaller.appspotmail.com Fixes: 314001f0bf92 ("af_unix: Add OOB support") Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Link: https://lore.kernel.org/r/20210811220652.567434-1-Rao.Shoaib@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2021-08-13362-1478/+2984
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h 9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") 099fdeda659d ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c 5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") 2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS 7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Merge tag 'net-5.14-rc6' of ↵Linus Torvalds2021-08-13118-372/+763
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter, bpf, can and ieee802154. The size of this is pretty normal, but we got more fixes for 5.14 changes this week than last week. Nothing major but the trend is the opposite of what we like. We'll see how the next week goes.. Current release - regressions: - r8169: fix ASPM-related link-up regressions - bridge: fix flags interpretation for extern learn fdb entries - phy: micrel: fix link detection on ksz87xx switch - Revert "tipc: Return the correct errno code" - ptp: fix possible memory leak caused by invalid cast Current release - new code bugs: - bpf: add missing bpf_read_[un]lock_trace() for syscall program - bpf: fix potentially incorrect results with bpf_get_local_storage() - page_pool: mask the page->signature before the checking, avoid dma mapping leaks - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps - bnxt_en: fix firmware interface issues with PTP - mlx5: Bridge, fix ageing time Previous releases - regressions: - linkwatch: fix failure to restore device state across suspend/resume - bareudp: fix invalid read beyond skb's linear data Previous releases - always broken: - bpf: fix integer overflow involving bucket_size - ppp: fix issues when desired interface name is specified via netlink - wwan: mhi_wwan_ctrl: fix possible deadlock - dsa: microchip: ksz8795: fix number of VLAN related bugs - dsa: drivers: fix broken backpressure in .port_fdb_dump - dsa: qca: ar9331: make proper initial port defaults Misc: - bpf: add lockdown check for probe_write_user helper - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out - netfilter: conntrack: collect all entries in one cycle, heuristically slow down garbage collection scans on idle systems to prevent frequent wake ups" * tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) vsock/virtio: avoid potential deadlock when vsock device remove wwan: core: Avoid returning NULL from wwan_create_dev() net: dsa: sja1105: unregister the MDIO buses during teardown Revert "tipc: Return the correct errno code" net: mscc: Fix non-GPL export of regmap APIs net: igmp: increase size of mr_ifc_count MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets net: pcs: xpcs: fix error handling on failed to allocate memory net: linkwatch: fix failure to restore device state across suspend/resume net: bridge: fix memleak in br_add_if() net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge net: bridge: fix flags interpretation for extern learn fdb entries net: dsa: sja1105: fix broken backpressure in .port_fdb_dump net: dsa: lantiq: fix broken backpressure in .port_fdb_dump net: dsa: lan9303: fix broken backpressure in .port_fdb_dump net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump bpf, core: Fix kernel-doc notation net: igmp: fix data-race in igmp_ifc_timer_expire() net: Fix memory leak in ieee802154_raw_deliver ...
| | * Merge tag 'ieee802154-for-davem-2021-08-12' of ↵Jakub Kicinski2021-08-122-4/+9
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== ieee802154 for net 2021-08-12 Mostly fixes coming from bot reports. Dongliang Mu tackled some syzkaller reports in hwsim again and Takeshi Misawa a memory leak in ieee802154 raw. * tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan: net: Fix memory leak in ieee802154_raw_deliver ieee802154: hwsim: fix GPF in hwsim_new_edge_nl ieee802154: hwsim: fix GPF in hwsim_set_edge_lqi ==================== Link: https://lore.kernel.org/r/20210812183912.1663996-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | | * net: Fix memory leak in ieee802154_raw_deliverTakeshi Misawa2021-08-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If IEEE-802.15.4-RAW is closed before receive skb, skb is leaked. Fix this, by freeing sk_receive_queue in sk->sk_destruct(). syzbot report: BUG: memory leak unreferenced object 0xffff88810f644600 (size 232): comm "softirq", pid 0, jiffies 4294967032 (age 81.270s) hex dump (first 32 bytes): 10 7d 4b 12 81 88 ff ff 10 7d 4b 12 81 88 ff ff .}K......}K..... 00 00 00 00 00 00 00 00 40 7c 4b 12 81 88 ff ff ........@|K..... backtrace: [<ffffffff83651d4a>] skb_clone+0xaa/0x2b0 net/core/skbuff.c:1496 [<ffffffff83fe1b80>] ieee802154_raw_deliver net/ieee802154/socket.c:369 [inline] [<ffffffff83fe1b80>] ieee802154_rcv+0x100/0x340 net/ieee802154/socket.c:1070 [<ffffffff8367cc7a>] __netif_receive_skb_one_core+0x6a/0xa0 net/core/dev.c:5384 [<ffffffff8367cd07>] __netif_receive_skb+0x27/0xa0 net/core/dev.c:5498 [<ffffffff8367cdd9>] netif_receive_skb_internal net/core/dev.c:5603 [inline] [<ffffffff8367cdd9>] netif_receive_skb+0x59/0x260 net/core/dev.c:5662 [<ffffffff83fe6302>] ieee802154_deliver_skb net/mac802154/rx.c:29 [inline] [<ffffffff83fe6302>] ieee802154_subif_frame net/mac802154/rx.c:102 [inline] [<ffffffff83fe6302>] __ieee802154_rx_handle_packet net/mac802154/rx.c:212 [inline] [<ffffffff83fe6302>] ieee802154_rx+0x612/0x620 net/mac802154/rx.c:284 [<ffffffff83fe59a6>] ieee802154_tasklet_handler+0x86/0xa0 net/mac802154/main.c:35 [<ffffffff81232aab>] tasklet_action_common.constprop.0+0x5b/0x100 kernel/softirq.c:557 [<ffffffff846000bf>] __do_softirq+0xbf/0x2ab kernel/softirq.c:345 [<ffffffff81232f4c>] do_softirq kernel/softirq.c:248 [inline] [<ffffffff81232f4c>] do_softirq+0x5c/0x80 kernel/softirq.c:235 [<ffffffff81232fc1>] __local_bh_enable_ip+0x51/0x60 kernel/softirq.c:198 [<ffffffff8367a9a4>] local_bh_enable include/linux/bottom_half.h:32 [inline] [<ffffffff8367a9a4>] rcu_read_unlock_bh include/linux/rcupdate.h:745 [inline] [<ffffffff8367a9a4>] __dev_queue_xmit+0x7f4/0xf60 net/core/dev.c:4221 [<ffffffff83fe2db4>] raw_sendmsg+0x1f4/0x2b0 net/ieee802154/socket.c:295 [<ffffffff8363af16>] sock_sendmsg_nosec net/socket.c:654 [inline] [<ffffffff8363af16>] sock_sendmsg+0x56/0x80 net/socket.c:674 [<ffffffff8363deec>] __sys_sendto+0x15c/0x200 net/socket.c:1977 [<ffffffff8363dfb6>] __do_sys_sendto net/socket.c:1989 [inline] [<ffffffff8363dfb6>] __se_sys_sendto net/socket.c:1985 [inline] [<ffffffff8363dfb6>] __x64_sys_sendto+0x26/0x30 net/socket.c:1985 Fixes: 9ec767160357 ("net: add IEEE 802.15.4 socket family implementation") Reported-and-tested-by: syzbot+1f68113fa907bf0695a8@syzkaller.appspotmail.com Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20210805075414.GA15796@DESKTOP Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
| | | * ieee802154: hwsim: fix GPF in hwsim_new_edge_nlDongliang Mu2021-07-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both MAC802154_HWSIM_ATTR_RADIO_ID and MAC802154_HWSIM_ATTR_RADIO_EDGE must be present to fix GPF. Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20210707155633.1486603-1-mudongliangabcd@gmail.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
| | | * ieee802154: hwsim: fix GPF in hwsim_set_edge_lqiDongliang Mu2021-07-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both MAC802154_HWSIM_ATTR_RADIO_ID and MAC802154_HWSIM_ATTR_RADIO_EDGE, MAC802154_HWSIM_EDGE_ATTR_ENDPOINT_ID and MAC802154_HWSIM_EDGE_ATTR_LQI must be present to fix GPF. Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20210705131321.217111-1-mudongliangabcd@gmail.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
| | * | vsock/virtio: avoid potential deadlock when vsock device removeLongpeng(Mike)2021-08-121-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a potential deadlock case when remove the vsock device or process the RESET event: vsock_for_each_connected_socket: spin_lock_bh(&vsock_table_lock) ----------- (1) ... virtio_vsock_reset_sock: lock_sock(sk) --------------------- (2) ... spin_unlock_bh(&vsock_table_lock) lock_sock() may do initiative schedule when the 'sk' is owned by other thread at the same time, we would receivce a warning message that "scheduling while atomic". Even worse, if the next task (selected by the scheduler) try to release a 'sk', it need to request vsock_table_lock and the deadlock occur, cause the system into softlockup state. Call trace: queued_spin_lock_slowpath vsock_remove_bound vsock_remove_sock virtio_transport_release __vsock_release vsock_release __sock_release sock_close __fput ____fput So we should not require sk_lock in this case, just like the behavior in vhost_vsock or vmci. Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko") Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | wwan: core: Avoid returning NULL from wwan_create_dev()Andy Shevchenko2021-08-121-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make wwan_create_dev() to return either valid or error pointer, In some cases it may return NULL. Prevent this by converting it to the respective error pointer. Fixes: 9a44c1cc6388 ("net: Add a WWAN subsystem") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Link: https://lore.kernel.org/r/20210811124845.10955-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | net: dsa: sja1105: unregister the MDIO buses during teardownVladimir Oltean2021-08-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The call to sja1105_mdiobus_unregister is present in the error path but absent from the main driver unbind path. Fixes: 5a8f09748ee7 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | Revert "tipc: Return the correct errno code"Hoang Le2021-08-121-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 0efea3c649f0 because of: - The returning -ENOBUF error is fine on socket buffer allocation. - There is side effect in the calling path tipc_node_xmit()->tipc_link_xmit() when checking error code returning. Fixes: 0efea3c649f0 ("tipc: Return the correct errno code") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: mscc: Fix non-GPL export of regmap APIsMark Brown2021-08-121-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ocelot driver makes use of regmap, wrapping it with driver specific operations that are thin wrappers around the core regmap APIs. These are exported with EXPORT_SYMBOL, dropping the _GPL from the core regmap exports which is frowned upon. Add _GPL suffixes to at least the APIs that are doing register I/O. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: igmp: increase size of mr_ifc_countEric Dumazet2021-08-122-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some arches support cmpxchg() on 4-byte and 8-byte only. Increase mr_ifc_count width to 32bit to fix this problem. Fixes: 4a2b285e7e10 ("net: igmp: fix data-race in igmp_ifc_timer_expire()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210811195715.3684218-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | MAINTAINERS: switch to my OMP email for Renesas Ethernet driversSergey Shtylyov2021-08-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm still going to continue looking after the Renesas Ethernet drivers and device tree bindings. Now my new employer, Open Mobile Platform (OMP), will pay for all my upstream work. Let's switch to my OMP email for the reviews. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/9c212711-a0d7-39cd-7840-ff7abf938da1@omp.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packetsNeal Cardwell2021-08-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if BBR congestion control is initialized after more than 2B packets have been delivered, depending on the phase of the tp->delivered counter the tracking of BBR round trips can get stuck. The bug arises because if tp->delivered is between 2^31 and 2^32 at the time the BBR congestion control module is initialized, then the initialization of bbr->next_rtt_delivered to 0 will cause the logic to believe that the end of the round trip is still billions of packets in the future. More specifically, the following check will fail repeatedly: !before(rs->prior_delivered, bbr->next_rtt_delivered) and thus the connection will take up to 2B packets delivered before that check will pass and the connection will set: bbr->round_start = 1; This could cause many mechanisms in BBR to fail to trigger, for example bbr_check_full_bw_reached() would likely never exit STARTUP. This bug is 5 years old and has not been observed, and as a practical matter this would likely rarely trigger, since it would require transferring at least 2B packets, or likely more than 3 terabytes of data, before switching congestion control algorithms to BBR. This patch is a stable candidate for kernels as far back as v4.9, when tcp_bbr.c was added. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Kevin Yang <yyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | net: pcs: xpcs: fix error handling on failed to allocate memoryWong Vee Khee2021-08-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers such as sja1105 and stmmac that call xpcs_create() expects an error returned by the pcs-xpcs module, but this was not the case on failed to allocate memory. Fixed this by returning an -ENOMEM instead of a NULL pointer. Fixes: 3ad1d171548e ("net: dsa: sja1105: migrate to xpcs for SGMII") Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210810085812.1808466-1-vee.khee.wong@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | net: linkwatch: fix failure to restore device state across suspend/resumeWilly Tarreau2021-08-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed that my Ethernet port to which a bond and a VLAN interface are attached appeared to remain up after resuming from suspend with the cable unplugged (and that problem still persists with 5.10-LTS). It happens that the following happens: - the network driver (e1000e here) prepares to suspend, calls e1000e_down() which calls netif_carrier_off() to signal that the link is going down. - netif_carrier_off() adds a link_watch event to the list of events for this device - the device is completely stopped. - the machine suspends - the cable is unplugged and the machine brought to another location - the machine is resumed - the queued linkwatch events are processed for the device - the device doesn't yet have the __LINK_STATE_PRESENT bit and its events are silently dropped - the device is resumed with its link down - the upper VLAN and bond interfaces are never notified that the link had been turned down and remain up - the only way to provoke a change is to physically connect the machine to a port and possibly unplug it. The state after resume looks like this: $ ip -br li | egrep 'bond|eth' bond0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> eth0 DOWN e8:6a:64:64:64:64 <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> eth0.2@eth0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> Placing an explicit call to netdev_state_change() either in the suspend or the resume code in the NIC driver worked around this but the solution is not satisfying. The issue in fact really is in link_watch that loses events while it ought not to. It happens that the test for the device being present was added by commit 124eee3f6955 ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") in 4.20 to avoid an access to devices that are not present. Instead of dropping events, this patch proceeds slightly differently by postponing their handling so that they happen after the device is fully resumed. Fixes: 124eee3f6955 ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") Link: https://lists.openwall.net/netdev/2018/03/15/62 Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | net: bridge: fix memleak in br_add_if()Yang Yingliang2021-08-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I got a memleak report: BUG: memory leak unreferenced object 0x607ee521a658 (size 240): comm "syz-executor.0", pid 955, jiffies 4294780569 (age 16.449s) hex dump (first 32 bytes, cpu 1): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d830ea5a>] br_multicast_add_port+0x1c2/0x300 net/bridge/br_multicast.c:1693 [<00000000274d9a71>] new_nbp net/bridge/br_if.c:435 [inline] [<00000000274d9a71>] br_add_if+0x670/0x1740 net/bridge/br_if.c:611 [<0000000012ce888e>] do_set_master net/core/rtnetlink.c:2513 [inline] [<0000000012ce888e>] do_set_master+0x1aa/0x210 net/core/rtnetlink.c:2487 [<0000000099d1cafc>] __rtnl_newlink+0x1095/0x13e0 net/core/rtnetlink.c:3457 [<00000000a01facc0>] rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3488 [<00000000acc9186c>] rtnetlink_rcv_msg+0x369/0xa10 net/core/rtnetlink.c:5550 [<00000000d4aabb9c>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504 [<00000000bc2e12a3>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<00000000bc2e12a3>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340 [<00000000e4dc2d0e>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929 [<000000000d22c8b3>] sock_sendmsg_nosec net/socket.c:654 [inline] [<000000000d22c8b3>] sock_sendmsg+0x139/0x170 net/socket.c:674 [<00000000e281417a>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350 [<00000000237aa2ab>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404 [<000000004f2dc381>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433 [<0000000005feca6c>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47 [<000000007304477d>] entry_SYSCALL_64_after_hwframe+0x44/0xae On error path of br_add_if(), p->mcast_stats allocated in new_nbp() need be freed, or it will be leaked. Fixes: 1080ab95e3c7 ("net: bridge: add support for IGMP/MLD stats and export them via netlink") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210809132023.978546-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted ↵Vladimir Oltean2021-08-1011-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | by drivers towards the bridge The blamed commit added a new field to struct switchdev_notifier_fdb_info, but did not make sure that all call paths set it to something valid. For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE notifier, and since the 'is_local' flag is not set, it contains junk from the stack, so the bridge might interpret those notifications as being for local FDB entries when that was not intended. To avoid that now and in the future, zero-initialize all switchdev_notifier_fdb_info structures created by drivers such that all newly added fields to not need to touch drivers again. Fixes: 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | net: bridge: fix flags interpretation for extern learn fdb entriesNikolay Aleksandrov2021-08-104-12/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ignore fdb flags when adding port extern learn entries and always set BR_FDB_LOCAL flag when adding bridge extern learn entries. This is closest to the behaviour we had before and avoids breaking any use cases which were allowed. This patch fixes iproute2 calls which assume NUD_PERMANENT and were allowed before, example: $ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn Extern learn entries are allowed to roam, but do not expire, so static or dynamic flags make no sense for them. Also add a comment for future reference. Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space") Fixes: 0541a6293298 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski2021-08-107-15/+27
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== bpf 2021-08-10 We've added 5 non-merge commits during the last 2 day(s) which contain a total of 7 files changed, 27 insertions(+), 15 deletions(-). 1) Fix missing bpf_read_lock_trace() context for BPF loader progs, from Yonghong Song. 2) Fix corner case where BPF prog retrieves wrong local storage, also from Yonghong Song. 3) Restrict availability of BPF write_user helper behind lockdown, from Daniel Borkmann. 4) Fix multiple kernel-doc warnings in BPF core, from Randy Dunlap. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, core: Fix kernel-doc notation bpf: Fix potentially incorrect results with bpf_get_local_storage() bpf: Add missing bpf_read_[un]lock_trace() for syscall program bpf: Add lockdown check for probe_write_user helper bpf: Add _kernel suffix to internal lockdown_bpf_read ==================== Link: https://lore.kernel.org/r/20210810144025.22814-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | | * | bpf, core: Fix kernel-doc notationRandy Dunlap2021-08-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix kernel-doc warnings in kernel/bpf/core.c (found by scripts/kernel-doc and W=1 builds). That is, correct a function name in a comment and add return descriptions for 2 functions. Fixes these kernel-doc warnings: kernel/bpf/core.c:1372: warning: expecting prototype for __bpf_prog_run(). Prototype was for ___bpf_prog_run() instead kernel/bpf/core.c:1372: warning: No description found for return value of '___bpf_prog_run' kernel/bpf/core.c:1883: warning: No description found for return value of 'bpf_prog_select_runtime' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210809215229.7556-1-rdunlap@infradead.org
| | | * | bpf: Fix potentially incorrect results with bpf_get_local_storage()Yonghong Song2021-08-102-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") fixed a bug for bpf_get_local_storage() helper so different tasks won't mess up with each other's percpu local storage. The percpu data contains 8 slots so it can hold up to 8 contexts (same or different tasks), for 8 different program runs, at the same time. This in general is sufficient. But our internal testing showed the following warning multiple times: [...] warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193 __cgroup_bpf_run_filter_sock_ops+0x13e/0x180 RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180 <IRQ> tcp_call_bpf.constprop.99+0x93/0xc0 tcp_conn_request+0x41e/0xa50 ? tcp_rcv_state_process+0x203/0xe00 tcp_rcv_state_process+0x203/0xe00 ? sk_filter_trim_cap+0xbc/0x210 ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160 tcp_v6_do_rcv+0x181/0x3e0 tcp_v6_rcv+0xc65/0xcb0 ip6_protocol_deliver_rcu+0xbd/0x450 ip6_input_finish+0x11/0x20 ip6_input+0xb5/0xc0 ip6_sublist_rcv_finish+0x37/0x50 ip6_sublist_rcv+0x1dc/0x270 ipv6_list_rcv+0x113/0x140 __netif_receive_skb_list_core+0x1a0/0x210 netif_receive_skb_list_internal+0x186/0x2a0 gro_normal_list.part.170+0x19/0x40 napi_complete_done+0x65/0x150 mlx5e_napi_poll+0x1ae/0x680 __napi_poll+0x25/0x120 net_rx_action+0x11e/0x280 __do_softirq+0xbb/0x271 irq_exit_rcu+0x97/0xa0 common_interrupt+0x7f/0xa0 </IRQ> asm_common_interrupt+0x1e/0x40 RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac ? __cgroup_bpf_run_filter_skb+0x378/0x4e0 ? do_softirq+0x34/0x70 ? ip6_finish_output2+0x266/0x590 ? ip6_finish_output+0x66/0xa0 ? ip6_output+0x6c/0x130 ? ip6_xmit+0x279/0x550 ? ip6_dst_check+0x61/0xd0 [...] Using drgn [0] to dump the percpu buffer contents showed that on this CPU slot 0 is still available, but slots 1-7 are occupied and those tasks in slots 1-7 mostly don't exist any more. So we might have issues in bpf_cgroup_storage_unset(). Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset(). Currently, it tries to unset "current" slot with searching from the start. So the following sequence is possible: 1. A task is running and claims slot 0 2. Running BPF program is done, and it checked slot 0 has the "task" and ready to reset it to NULL (not yet). 3. An interrupt happens, another BPF program runs and it claims slot 1 with the *same* task. 4. The unset() in interrupt context releases slot 0 since it matches "task". 5. Interrupt is done, the task in process context reset slot 0. At the end, slot 1 is not reset and the same process can continue to occupy slots 2-7 and finally, when the above step 1-5 is repeated again, step 3 BPF program won't be able to claim an empty slot and a warning will be issued. To fix the issue, for unset() function, we should traverse from the last slot to the first. This way, the above issue can be avoided. The same reverse traversal should also be done in bpf_get_local_storage() helper itself. Otherwise, incorrect local storage may be returned to BPF program. [0] https://github.com/osandov/drgn Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
| | | * | bpf: Add missing bpf_read_[un]lock_trace() for syscall programYonghong Song2021-08-101-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.") added support for syscall program, which is a sleepable program. But the program run missed bpf_read_lock_trace()/bpf_read_unlock_trace(), which is needed to ensure proper rcu callback invocations. This patch adds bpf_read_[un]lock_trace() properly. Fixes: 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210809235151.1663680-1-yhs@fb.com
| | | * | bpf: Add lockdown check for probe_write_user helperDaniel Borkmann2021-08-103-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back then, commit 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") added the bpf_probe_write_user() helper in order to allow to override user space memory. Its original goal was to have a facility to "debug, divert, and manipulate execution of semi-cooperative processes" under CAP_SYS_ADMIN. Write to kernel was explicitly disallowed since it would otherwise tamper with its integrity. One use case was shown in cf9b1199de27 ("samples/bpf: Add test/example of using bpf_probe_write_user bpf helper") where the program DNATs traffic at the time of connect(2) syscall, meaning, it rewrites the arguments to a syscall while they're still in userspace, and before the syscall has a chance to copy the argument into kernel space. These days we have better mechanisms in BPF for achieving the same (e.g. for load-balancers), but without having to write to userspace memory. Of course the bpf_probe_write_user() helper can also be used to abuse many other things for both good or bad purpose. Outside of BPF, there is a similar mechanism for ptrace(2) such as PTRACE_PEEK{TEXT,DATA} and PTRACE_POKE{TEXT,DATA}, but would likely require some more effort. Commit 96ae52279594 explicitly dedicated the helper for experimentation purpose only. Thus, move the helper's availability behind a newly added LOCKDOWN_BPF_WRITE_USER lockdown knob so that the helper is disabled under the "integrity" mode. More fine-grained control can be implemented also from LSM side with this change. Fixes: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org>
| | | * | bpf: Add _kernel suffix to internal lockdown_bpf_readDaniel Borkmann2021-08-094-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename LOCKDOWN_BPF_READ into LOCKDOWN_BPF_READ_KERNEL so we have naming more consistent with a LOCKDOWN_BPF_WRITE_USER option that we are adding. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org>
| | * | | Merge branch 'fdb-backpressure-fixes'David S. Miller2021-08-104-22/+37
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vladimir Oltean says: ==================== Fix broken backpressure during FDB dump in DSA drivers rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries). When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped. Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, then drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump. DSA is one of the few switchdev drivers that have an .ndo_fdb_dump implementation, because of the assumption that the hardware and software FDBs cannot be efficiently kept in sync via SWITCHDEV_FDB_ADD_TO_BRIDGE. Other drivers with a home-cooked .ndo_fdb_dump implementation are ocelot and dpaa2-switch. These appear to do the correct thing, as do the other DSA drivers, so nothing else appears to need fixing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | net: dsa: sja1105: fix broken backpressure in .port_fdb_dumpVladimir Oltean2021-08-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries). When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped. Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, then drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump. Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb. Fixes: 291d1e72b756 ("net: dsa: sja1105: Add support for FDB and MDB management") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | net: dsa: lantiq: fix broken backpressure in .port_fdb_dumpVladimir Oltean2021-08-101-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries). When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped. Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, then drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump. Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb. Fixes: 58c59ef9e930 ("net: dsa: lantiq: Add Forwarding Database access") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | net: dsa: lan9303: fix broken backpressure in .port_fdb_dumpVladimir Oltean2021-08-101-15/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries). When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped. Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, then drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump. Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb. Fixes: ab335349b852 ("net: dsa: lan9303: Add port_fast_age and port_fdb_dump methods") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>