summaryrefslogtreecommitdiffstats
path: root/Documentation/networking (follow)
Commit message (Collapse)AuthorAgeFilesLines
* docs: tproxy: ignore non-transparent sockets in iptables谢致邦 (XIE Zhibang)2024-09-261-1/+1
| | | | | | | | | | | | | The iptables example was added in commit d2f26037a38a (netfilter: Add documentation for tproxy, 2008-10-08), but xt_socket 'transparent' option was added in commit a31e1ffd2231 (netfilter: xt_socket: added new revision of the 'socket' match supporting flags, 2009-06-09). Now add the 'transparent' option to the iptables example to ignore non-transparent sockets, which is also consistent with the nft example. Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Documentation: networking: Fix missing PSE documentation and grammar issuesKory Maincent2024-09-141-7/+8
| | | | | | | | | | | | | | Fix a missing end of phrase in the documentation. It describes the ETHTOOL_A_C33_PSE_ACTUAL_PW attribute, which was not fully explained. Also, fix grammar issues by using simple present tense instead of present continuous. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240912090550.743174-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net/mlx5e: SHAMPO, Add no-split ethtool counters for header/data splitDragos Tatulea2024-09-131-0/+16
| | | | | | | | | | | | | When SHAMPO can't identify the protocol/header of a packet, it will yield a packet that is not split - all the packet is in the data part. Count this value in packets and bytes. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240911201757.1505453-15-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: ena: Add ENA Express metrics supportDavid Arinzon2024-09-131-0/+5
| | | | | | | | | | | | | | | | ENA Express metrics, called `ena_srd` are exposed to customers via `ethtool`. The metrics allow customers to check the configuration (mode), tx/rx counters as well as resource utilization. The documentation is also updated to provide a general explanation about ENA Express as well as links for further information about metrics and configurations. Signed-off-by: Igor Chauskin <igorch@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20240909084704.13856-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Documentation: networking: add OPEN Alliance 10BASE-T1x MAC-PHY serial interfaceParthiban Veerasooran2024-09-122-0/+498
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IEEE 802.3cg project defines two 10 Mbit/s PHYs operating over a single pair of conductors. The 10BASE-T1L (Clause 146) is a long reach PHY supporting full duplex point-to-point operation over 1 km of single balanced pair of conductors. The 10BASE-T1S (Clause 147) is a short reach PHY supporting full / half duplex point-to-point operation over 15 m of single balanced pair of conductors, or half duplex multidrop bus operation over 25 m of single balanced pair of conductors. Furthermore, the IEEE 802.3cg project defines the new Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (Clause 148) meant to provide improved determinism to the CSMA/CD media access method. PLCA works in conjunction with the 10BASE-T1S PHY operating in multidrop mode. The aforementioned PHYs are intended to cover the low-speed / low-cost applications in industrial and automotive environment. The large number of pins (16) required by the MII interface, which is specified by the IEEE 802.3 in Clause 22, is one of the major cost factors that need to be addressed to fulfil this objective. The MAC-PHY solution integrates an IEEE Clause 4 MAC and a 10BASE-T1x PHY exposing a low pin count Serial Peripheral Interface (SPI) to the host microcontroller. This also enables the addition of Ethernet functionality to existing low-end microcontrollers which do not integrate a MAC controller. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> Link: https://patch.msgid.link/20240909082514.262942-2-Parthiban.Veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: add devmem TCP documentationMina Almasry2024-09-122-0/+270
| | | | | | | | | | Add documentation outlining the usage and details of devmem TCP. Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20240910171458.219195-12-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* mptcp: disable active MPTCP in case of blackholeMatthieu Baerts (NGI0)2024-09-121-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | An MPTCP firewall blackhole can be detected if the following SYN retransmission after a fallback to "plain" TCP is accepted. In case of blackhole, a similar technique to the one in place with TFO is now used: MPTCP can be disabled for a certain period of time, 1h by default. This time period will grow exponentially when more blackhole issues get detected right after MPTCP is re-enabled and will reset to the initial value when the blackhole issue goes away. The blackhole period can be modified thanks to a new sysctl knob: blackhole_timeout. Two new MIB counters help understanding what's happening: - 'Blackhole', incremented when a blackhole is detected. - 'MPCapableSYNTXDisabled', incremented when an MPTCP connection directly falls back to TCP during the blackhole period. Because the technique is inspired by the one used by TFO, an important part of the new code is similar to what can find in tcp_fastopen.c, with some adaptations to the MPTCP case. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/57 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-3-da7ebb4cd2a3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge tag 'mlx5-updates-2024-09-02' of ↵Jakub Kicinski2024-09-111-0/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2024-08-29 HW-Managed Flow Steering in mlx5 driver Yevgeny Kliteynik says: ======================= 1. Overview ----------- ConnectX devices support packet matching, modification, and redirection. This functionality is referred as Flow Steering. To configure a steering rule, the rule is written to the device-owned memory. This memory is accessed and cached by the device when processing a packet. The first implementation of Flow Steering was done in FW, and it is referred in the mlx5 driver as Device-Managed Flow Steering (DMFS). Later we introduced SW-managed Flow Steering (SWS or SMFS), where the driver is writing directly to the device's configuration memory (ICM) through RC QP using RDMA operations (RDMA-read and RDAM-write), thus achieving higher rates of rule insertion/deletion. Now we introduce a new flow steering implementation: HW-Managed Flow Steering (HWS or HMFS). In this new approach, the driver is configuring steering rules directly to the HW using the WQs with a special new type of WQE. This way we can reach higher rule insertion/deletion rate with much lower CPU utilization compared to SWS. The key benefits of HWS as opposed to SWS: + HW manages the steering decision tree - HW calculates CRC for each entry - HW handles tree hash collisions - HW & FW manage objects refcount + HW keeps cache coherency: - HW provides tree access locking and synchronization - HW provides notification on completion + Insertion rate isn’t affected by background traffic - Dedicated HW components that handle insertion 2. Performance -------------- Measuring Connection Tracking with simple IPv4 flows w/o NAT, we are able to get ~5 times more flows offloaded per second using HWS. 3. Configuration ---------------- The enablement of HWS mode in eswitch manager is done using the same devlink param that is already used for switching between FW-managed steering and SW-managed steering modes: # devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs 4. Upstream Submission ---------------------- HWS support consists of 3 main components: + Steering: - The lower layer that exposes HWS API to upper layers and implements all the management of flow steering building blocks + FS-Core - Implementation of fs_hws layer to enable fs_core to use HWS instead of FW or SW steering - Create HW steering action pools to utilize the ability of HWS to share steering actions among different rules - Add support for configuring HWS mode through devlink command, similar to configuring SWS mode + Connection Tracking - Implementation of CT support for HW steering - Hooks up the CT ops for the new steering mode and uses the HWS API to implement connection tracking. Because of the large number of patches, we need to perform the submission in several separate patch series. This series is the first submission that lays the ground work for the next submissions, where an actual user of HWS will be added. 5. Patches in this series ------------------------- This patch series contains implementation of the first bullet from above. ======================= * tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: HWS, added API and enabled HWS support net/mlx5: HWS, added send engine and context handling net/mlx5: HWS, added debug dump and internal headers net/mlx5: HWS, added backward-compatible API handling net/mlx5: HWS, added memory management handling net/mlx5: HWS, added vport handling net/mlx5: HWS, added modify header pattern and args handling net/mlx5: HWS, added FW commands handling net/mlx5: HWS, added matchers functionality net/mlx5: HWS, added definers handling net/mlx5: HWS, added rules handling net/mlx5: HWS, added tables handling net/mlx5: HWS, added actions handling net/mlx5: Added missing definitions in preparation for HW Steering net/mlx5: Added missing mlx5_ifc definition for HW Steering ==================== Link: https://patch.msgid.link/20240909181250.41596-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/mlx5: HWS, added API and enabled HWS supportYevgeny Kliteynik2024-09-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | Enabling HWS support in the mlx5 driver: - added HWS API header - added HWS files in the mlx5 driver makefile - added kconfig flag that enables HWS compilation Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
* | net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flagJason Xing2024-09-111-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter out rx software timestamp report, especially after a process turns on netstamp_needed_key which can time stamp every incoming skb. Previously, we found out if an application starts first which turns on netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE could also get rx timestamp. Now we handle this case by introducing this new flag without breaking users. Quoting Willem to explain why we need the flag: "why a process would want to request software timestamp reporting, but not receive software timestamp generation. The only use I see is when the application does request SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE." Similarly, this new flag could also be used for hardware case where we can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive hardware receive timestamp. Another thing about errqueue in this patch I have a few words to say: In this case, we need to handle the egress path carefully, or else reporting the tx timestamp will fail. Egress path and ingress path will finally call sock_recv_timestamp(). We have to distinguish them. Errqueue is a good indicator to reflect the flow direction. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240909015612.3856-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net-timestamp: correct the use of SOF_TIMESTAMPING_RAW_HARDWAREJason Xing2024-09-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SOF_TIMESTAMPING_RAW_HARDWARE is a report flag which passes the timestamps generated by either SOF_TIMESTAMPING_TX_HARDWARE or SOF_TIMESTAMPING_RX_HARDWARE to the userspace all the time. So let us revise the doc here. Link: https://lore.kernel.org/all/66d8c21d3042a_163d93294cb@willemb.c.googlers.com.notmuch/ Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://patch.msgid.link/20240908124141.39628-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | eth: fbnic: Add devlink firmware version infoLee Trager2024-09-102-0/+30
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support to show firmware version information for both stored and running firmware versions. The version and commit is displayed separately to aid monitoring tools which only care about the version. Example output: # devlink dev info pci/0000:01:00.0: driver fbnic serial_number 88-25-08-ff-ff-01-50-92 versions: running: fw 24.07.15-017 fw.commit h999784ae9df0 fw.bootloader 24.07.10-000 fw.bootloader.commit hfef3ac835ce7 stored: fw 24.07.24-002 fw.commit hc9d14a68b3f2 fw.bootloader 24.07.22-000 fw.bootloader.commit h922f8493eb96 fw.undi 01.00.03-000 Signed-off-by: Lee Trager <lee@trager.us> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240905233820.1713043-1-lee@trager.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: napi: Prevent overflow of napi_defer_hard_irqsJoe Damato2024-09-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature") napi_defer_irqs was added to net_device and napi_defer_irqs_count was added to napi_struct, both as type int. This value never goes below zero, so there is not reason for it to be a signed int. Change the type for both from int to u32, and add an overflow check to sysfs to limit the value to S32_MAX. The limit of S32_MAX was chosen because the practical limit before this patch was S32_MAX (anything larger was an overflow) and thus there are no behavioral changes introduced. If the extra bit is needed in the future, the limit can be raised. Before this patch: $ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs' $ cat /sys/class/net/eth4/napi_defer_hard_irqs -2147483647 After this patch: $ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs' bash: line 0: echo: write error: Numerical result out of range Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned: include/linux/netdevice.h: unsigned int tx_queue_len; And has an overflow check: dev_change_tx_queue_len(..., unsigned long new_len): if (new_len != (unsigned int)new_len) return -ERANGE; Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* netdev_features: convert NETIF_F_FCOE_MTU to dev->fcoe_mtuAlexander Lobakin2024-09-031-0/+1
| | | | | | | | | | | Ability to handle maximum FCoE frames of 2158 bytes can never be changed and thus more of an attribute, not a toggleable feature. Move it from netdev_features_t to "cold" priv flags (bitfield bool) and free yet another feature bit. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_localAlexander Lobakin2024-09-033-9/+3
| | | | | | | | | | | "Interface can't change network namespaces" is rather an attribute, not a feature, and it can't be changed via Ethtool. Make it a "cold" private flag instead of a netdev_feature and free one more bit. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* netdev_features: convert NETIF_F_LLTX to dev->lltxAlexander Lobakin2024-09-033-10/+3
| | | | | | | | | | NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* netdevice: convert private flags > BIT(31) to bitfieldsAlexander Lobakin2024-09-031-1/+3
| | | | | | | | | | | | | | | | | Make dev->priv_flags `u32` back and define bits higher than 31 as bitfield booleans as per Jakub's suggestion. This simplifies code which accesses these bits with no optimization loss (testb both before/after), allows to not extend &netdev_priv_flags each time, but also scales better as bits > 63 in the future would only add a new u64 to the structure with no complications, comparing to that extending ::priv_flags would require converting it to a bitmap. Note that I picked `unsigned long :1` to not lose any potential optimizations comparing to `bool :1` etc. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* Documentation: Add missing fields to net_cachelinesJoe Damato2024-08-301-0/+2
| | | | | | | | | | | | | Two fields, page_pools and *irq_moder, were added to struct net_device in commit 083772c9f972 ("net: page_pool: record pools per netdev") and commit f750dfe825b9 ("ethtool: provide customized dim profile management"), respectively. Add both to the net_cachelines documentation, as well. Signed-off-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240829155742.366584-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: Extend cable testing interface with result source informationOleksij Rempel2024-08-261-0/+5
| | | | | | | | | | | | | | | | | | | Extend the ethtool netlink cable testing interface by adding support for specifying the source of cable testing results. This allows users to differentiate between results obtained through different diagnostic methods. For example, some TI 10BaseT1L PHYs provide two variants of cable diagnostics: Time Domain Reflectometry (TDR) and Active Link Cable Diagnostic (ALCD). By introducing `ETHTOOL_A_CABLE_RESULT_SRC` and `ETHTOOL_A_CABLE_FAULT_LENGTH_SRC` attributes, this update enables drivers to indicate whether the result was derived from TDR or ALCD, improving the clarity and utility of diagnostic information. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240822120703.1393130-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Documentation: networking: document phy_link_topologyMaxime Chevallier2024-08-233-0/+125
| | | | | | | | | | | | | | The newly introduced phy_link_topology tracks all ethernet PHYs that are attached to a netdevice. Document the base principle, internal and external APIs. As the phy_link_topology is expected to be extended, this documentation will hold any further improvements and additions made relative to topology handling. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethtool: Introduce a command to list PHYs on an interfaceMaxime Chevallier2024-08-231-0/+41
| | | | | | | | | | | | | | | As we have the ability to track the PHYs connected to a net_device through the link_topology, we can expose this list to userspace. This allows userspace to use these identifiers for phy-specific commands and take the decision of which PHY to target by knowing the link topology. Add PHY_GET and PHY_DUMP, which can be a filtered DUMP operation to list devices on only one interface. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethtool: Allow passing a phy index for some commandsMaxime Chevallier2024-08-231-0/+7
| | | | | | | | | | | | | | | | Some netlink commands are target towards ethernet PHYs, to control some of their features. As there's several such commands, add the ability to pass a PHY index in the ethnl request, which will populate the generic ethnl_req_info with the passed phy_index. Add a helper that netlink command handlers need to use to grab the targeted PHY from the req_info. This helper needs to hold rtnl_lock() while interacting with the PHY, as it may be removed at any point. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
* docs: networking: Align documentation with behavior changeTariq Toukan2024-08-161-5/+5
| | | | | | | | | | | Following commit 9f7e8fbb91f8 ("net/mlx5: offset comp irq index in name by one"), which fixed the index in IRQ name to start once again from 0, we change the documentation accordingly. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://patch.msgid.link/20240815142343.2254247-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Documentation: networking: correct spellingJing-Ping Jan2024-08-141-10/+10
| | | | | | | | | | Correct spelling problems for Documentation/networking/ as reported by ispell. Signed-off-by: Jing-Ping Jan <zoo868e@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240812170910.5760-1-zoo868e@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: rss: support skipping contexts during dumpJakub Kicinski2024-08-121-2/+10
| | | | | | | | | | | | | Applications may want to deal with dynamic RSS contexts only. So dumping context 0 will be counter-productive for them. Support starting the dump from a given context ID. Alternative would be to implement a dump flag to skip just context 0, not sure which is better... Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* documentation/networking: update l2tp docsJames Chapman2024-08-111-34/+20
| | | | | | | | | | | | l2tp no longer uses sk_user_data in tunnel sockets and now manages tunnel/session lifetimes slightly differently. Update docs to cover this. CC: linux-doc@vger.kernel.org CC: corbet@lwn.net Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-08-011-0/+1
|\ | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Link: https://patch.msgid.link/20240801131917.34494-1-pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * ethtool: rss: echo the context number backJakub Kicinski2024-07-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The response to a GET request in Netlink should fully identify the queried object. RSS_GET accepts context id as an input, so it must echo that attribute back to the response. After (assuming context 1 has been created): $ ./cli.py --spec netlink/specs/ethtool.yaml \ --do rss-get \ --json '{"header": {"dev-index": 2}, "context": 1}' {'context': 1, 'header': {'dev-index': 2, 'dev-name': 'eth0'}, [...] Fixes: 7112a04664bf ("ethtool: add netlink based get rss support") Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240724234249.2621109-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Add support for PIO p flagPatrick Rohr2024-07-311-0/+14
|/ | | | | | | | | | | | | | | | | | | | draft-ietf-6man-pio-pflag is adding a new flag to the Prefix Information Option to signal that the network can allocate a unique IPv6 prefix per client via DHCPv6-PD (see draft-ietf-v6ops-dhcp-pd-per-device). When ra_honor_pio_pflag is enabled, the presence of a P-flag causes SLAAC autoconfiguration to be disabled for that particular PIO. An automated test has been added in Android (r.android.com/3195335) to go along with this change. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: David Lamparter <equinox@opensourcerouting.org> Cc: Simon Horman <horms@kernel.org> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'for-netdev' of ↵Jakub Kicinski2024-07-251-6/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-25 We've added 14 non-merge commits during the last 8 day(s) which contain a total of 19 files changed, 177 insertions(+), 70 deletions(-). The main changes are: 1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and BPF sockhash. Also add test coverage for this case, from Michal Luczaj. 2) Fix a segmentation issue when downgrading gso_size in the BPF helper bpf_skb_adjust_room(), from Fred Li. 3) Fix a compiler warning in resolve_btfids due to a missing type cast, from Liwei Song. 4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte boundary in the fexit_sleep BPF selftest, from Puranjay Mohan. 5) Fix a xsk regression to require a flag when actuating tx_metadata_len, from Stanislav Fomichev. 6) Fix function prototype BTF dumping in libbpf for prototypes that have no input arguments, from Andrii Nakryiko. 7) Fix stacktrace symbol resolution in perf script for BPF programs containing subprograms, from Hou Tao. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids bpf, events: Use prog to emit ksymbol event for main program selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected() af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash bpftool: Fix typo in usage help libbpf: Fix no-args func prototype BTF dumping syntax MAINTAINERS: Update powerpc BPF JIT maintainers MAINTAINERS: Update email address of Naveen selftests/bpf: fexit_sleep: Fix stack allocation for arm64 ==================== Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_lenStanislav Fomichev2024-07-251-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Julian reports that commit 341ac980eab9 ("xsk: Support tx_metadata_len") can break existing use cases which don't zero-initialize xdp_umem_reg padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we interpret the padding as tx_metadata_len only when being explicitly asked. Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len") Reported-by: Julian Schindel <mail@arctic-alpaca.de> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20240713015253.121248-2-sdf@fomichev.me
* | Merge tag 'devicetree-for-6.11' of ↵Linus Torvalds2024-07-181-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DT Bindings: - Convert and add a bunch of IBM FSI related bindings - Add a new schema listing legacy compatibles which will (probably) never be documented. This will silence various checks warning about them. - Add bindings for Sierra Wireless mangOH Green SPI IoT interface, new Arm 2024 Cortex and Neoverse CPUs, QCom sc8180x PDC, QCom SDX75 GPI DMA, imx8mp/imx8qxp fsl,irqsteer, and Renesas RZ/G2UL CRU and CSI-2 blocks - Convert Spreadtrum sprd-timer, FSL cpm_qe, FSL fsl,ls-scfg-msi, FSL q(b)man-*, FSL qoriq-mc, and img,pdc-wdt bindings to DT schema - Drop obsolete stericsson,abx500.txt DT core: - Update dtc to upstream version v1.7.0-93-g1df7b047fe43 - Add support to run DT validation on DTs with applied overlays - Add helper for creating boolean properties in dynamic nodes and use that for dynamic PCI nodes - Clean-up early parsing of '#{address,size}-cells'" * tag 'devicetree-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (39 commits) dt-bindings: timer: sprd-timer: convert to YAML dt-bindings: incomplete-devices: document devices without bindings dt-bindings: trivial-devices: document the Sierra Wireless mangOH Green SPI IoT interface scripts/dtc: Update to upstream version v1.7.0-93-g1df7b047fe43 dt-bindings: soc: fsl: Add fsl,ls1028a-reset for reset syscon node dt-bindings: soc: fsl: cpm_qe: convert to yaml format dt-bindings: i2c: i2c-fsi: Convert to json-schema dt-bindings: fsi: Document the FSI Hub Controller dt-bindings: fsi: Document the AST2700 FSI controller dt-bindings: fsi: ast2600-fsi-master: Convert to json-schema dt-bindings: fsi: ibm,i2cr-fsi-master: Reference common FSI controller dt-bindings: fsi: Document the FSI controller common properties dt-bindings: fsi: Document the IBM SBEFIFO engine dt-bindings: fsi: p9-occ: Convert to json-schema dt-bindings: fsi: Document the IBM SCOM engine dt-bindings: fsi: fsi2spi: Document SPI controller child nodes dt-bindings: interrupt-controller: convert fsl,ls-scfg-msi to yaml dt-bindings: soc: fsl: Convert q(b)man-* to yaml format dt-bindings: misc: fsl,qoriq-mc: convert to yaml format dt-bindings: drop stale Anson Huang from maintainers ...
| * dt-bindings: misc: fsl,qoriq-mc: convert to yaml formatFrank Li2024-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert fsl,qoriq-mc from txt to yaml format. Addition changes: - Child node name allow 'ethernet'. - Use 32bit address in example. - Fixed missed ';' in example. - Allow dma-coherent. - Remove smmu, its part in example. - Change child node name as 'ethernet' Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240617170934.813321-1-Frank.Li@nxp.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
* | Merge branch '40GbE' of ↵Jakub Kicinski2024-07-141-0/+25
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-07-11 (net/intel) This series contains updates to most Intel network drivers. Tony removes MODULE_AUTHOR from drivers containing the entry. Simon Horman corrects a kdoc entry for i40e. Pawel adds implementation for devlink param "local_forwarding" on ice. Michal removes unneeded call, and code, for eswitch rebuild for ice. Sasha removed a no longer used field from igc. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: Remove the internal 'eee_advert' field ice: remove eswitch rebuild ice: Add support for devlink local_forwarding param i40e: correct i40e_addr_to_hkey() name in kdoc net: intel: Remove MODULE_AUTHORs ==================== Link: https://patch.msgid.link/20240711201932.2019925-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | ice: Add support for devlink local_forwarding paramPawel Kaminski2024-07-111-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for driver-specific devlink local_forwarding param. Supported values are "enabled", "disabled" and "prioritized". Default configuration is set to "enabled". Add documentation in networking/devlink/ice.rst. In previous generations of Intel NICs the transmit scheduler was only limited by PCIe bandwidth when scheduling/assigning hairpin-bandwidth between VFs. Changes to E810 HW design introduced scheduler limitation, so that available hairpin-bandwidth is bound to external port speed. In order to address this limitation and enable NFV services such as "service chaining" a knob to adjust the scheduler config was created. Driver can send a configuration message to the FW over admin queue and internal FW logic will reconfigure HW to prioritize and add more BW to VF to VF traffic. An end result, for example, 10G port will no longer limit hairpin-bandwidth to 10G and much higher speeds can be achieved. Devlink local_forwarding param set to "prioritized" enables higher hairpin-bandwitdh on related PFs. Configuration is applicable only to 8x10G and 4x25G cards. Changing local_forwarding configuration will trigger CORER reset in order to take effect. Example command to change current value: devlink dev param set pci/0000:b2:00.3 name local_forwarding \ value prioritized \ cmode runtime Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com> Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Kaminski <pawel.kaminski@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-07-111-1/+1
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: net/sched/act_ct.c 26488172b029 ("net/sched: Fix UAF when resolving a clash") 3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | docs: networking: devlink: capitalise length valueChris Packham2024-07-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Correct the example to match the help text from the devlink utility. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: ethtool: Add new power limit get and set featuresKory Maincent (Dent Project)2024-07-061-18/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch expands the status information provided by ethtool for PSE c33 with available power limit and available power limit ranges. It also adds a call to pse_ethtool_set_pw_limit() to configure the PSE control power limit. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-5-320003204264@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: ethtool: pse-pd: Expand C33 PSE status with class, power and extended stateKory Maincent (Dent Project)2024-07-061-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This update expands the status information provided by ethtool for PSE c33. It includes details such as the detected class, current power delivered, and extended state information. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-1-320003204264@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | ethtool: Add an interface for flashing transceiver modules' firmwareDanielle Ratson2024-06-281-0/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CMIS compliant modules such as QSFP-DD might be running a firmware that can be updated in a vendor-neutral way by exchanging messages between the host and the module as described in section 7.3.1 of revision 5.2 of the CMIS standard. Add a pair of new ethtool messages that allow: * User space to trigger firmware update of transceiver modules * The kernel to notify user space about the progress of the process The user interface is designed to be asynchronous in order to avoid RTNL being held for too long and to allow several modules to be updated simultaneously. The interface is designed with CMIS compliant modules in mind, but kept generic enough to accommodate future use cases, if these arise. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ethtool: provide customized dim profile managementHeng Qi2024-06-262-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NetDIM library, currently leveraged by an array of NICs, delivers excellent acceleration benefits. Nevertheless, NICs vary significantly in their dim profile list prerequisites. Specifically, virtio-net backends may present diverse sw or hw device implementation, making a one-size-fits-all parameter list impractical. On Alibaba Cloud, the virtio DPU's performance under the default DIM profile falls short of expectations, partly due to a mismatch in parameter configuration. I also noticed that ice/idpf/ena and other NICs have customized profilelist or placed some restrictions on dim capabilities. Motivated by this, I tried adding new params for "ethtool -C" that provides a per-device control to modify and access a device's interrupt parameters. Usage ======== The target NIC is named ethx. Assume that ethx only declares support for rx profile setting (with DIM_PROFILE_RX flag set in profile_flags) and supports modification of usec and pkt fields. 1. Query the currently customized list of the device $ ethtool -c ethx ... rx-profile: {.usec = 1, .pkts = 256, .comps = n/a,}, {.usec = 8, .pkts = 256, .comps = n/a,}, {.usec = 64, .pkts = 256, .comps = n/a,}, {.usec = 128, .pkts = 256, .comps = n/a,}, {.usec = 256, .pkts = 256, .comps = n/a,} tx-profile: n/a 2. Tune $ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n "n" means do not modify this field. $ ethtool -c ethx ... rx-profile: {.usec = 1, .pkts = 1, .comps = n/a,}, {.usec = 2, .pkts = 256, .comps = n/a,}, {.usec = 3, .pkts = 3, .comps = n/a,}, {.usec = 4, .pkts = 4, .comps = n/a,}, {.usec = 256, .pkts = 5, .comps = n/a,} tx-profile: n/a 3. Hint If the device does not support some type of customized dim profiles, the corresponding "n/a" will display. If the "n/a" field is being modified, -EOPNOTSUPP will be reported. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-4-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | Merge tag 'linux-can-next-for-6.11-20240621' of ↵Jakub Kicinski2024-06-222-0/+387
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2024-06-21 The first 2 patches are by Andy Shevchenko, one cleans up the includes in the mcp251x driver, the other one updates the sja100 plx_pci driver to make use of predefines PCI subvendor ID. Mans Rullgard's patch cleans up the Kconfig help text of for the slcan driver. Oliver Hartkopp provides a patch to update the documentation, which removes the ISO 15675-2 specification version where possible. The next 2 patches are by Harini T and update the documentation of the xilinx_can driver. Francesco Valla provides documentation for the ISO 15765-2 protocol. A patch by Dr. David Alan Gilbert removes an unused struct from the mscan driver. 12 patches are by Martin Jocic. The first three add support for 3 new devices to the kvaser_usb driver. The remaining 9 first clean up the kvaser_pciefd driver, and then add support for MSI. Krzysztof Kozlowski contributes 3 patches simplifies the CAN SPI drivers by making use of spi_get_device_match_data(). The last patch is by Martin Hundebøll, which reworks the m_can driver to not enable the CAN transceiver during probe. * tag 'linux-can-next-for-6.11-20240621' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (24 commits) can: m_can: don't enable transceiver when probing can: mcp251xfd: simplify with spi_get_device_match_data() can: mcp251x: simplify with spi_get_device_match_data() can: hi311x: simplify with spi_get_device_match_data() can: kvaser_pciefd: Add MSI interrupts can: kvaser_pciefd: Move reset of DMA RX buffers to the end of the ISR can: kvaser_pciefd: Change name of return code variable can: kvaser_pciefd: Rename board_irq to pci_irq can: kvaser_pciefd: Add unlikely can: kvaser_pciefd: Add inline can: kvaser_pciefd: Remove unnecessary comment can: kvaser_pciefd: Skip redundant NULL pointer check in ISR can: kvaser_pciefd: Group #defines together can: kvaser_usb: Add support for Kvaser Mini PCIe 1xCAN can: kvaser_usb: Add support for Kvaser USBcan Pro 5xCAN can: kvaser_usb: Add support for Vining 800 can: mscan: remove unused struct 'mscan_state' Documentation: networking: document ISO 15765-2 can: xilinx_can: Document driver description to list all supported IPs can: isotp: remove ISO 15675-2 specification version where possible ... ==================== Link: https://patch.msgid.link/20240621080201.305471-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | Documentation: networking: document ISO 15765-2Francesco Valla2024-06-202-0/+387
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Document basic concepts, APIs and behaviour of the ISO 15675-2 (ISO-TP) CAN stack. Signed-off-by: Francesco Valla <valla.francesco@gmail.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/all/20240501092413.414700-2-valla.francesco@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
* | | | octeontx2-pf: Add ucast filter count configurability via devlink.Sai Krishna2024-06-211-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing method of reserving unicast filter count leads to wasted MCAM entries if the functionality is not used or fewer entries are used. Furthermore, the amount of MCAM entries differs amongst Octeon SoCs. We implemented a means to adjust the UC filter count via devlink, allowing for better use of MCAM entries across Netdev apps. commands: To get the current unicast filter count # devlink dev param show pci/0002:02:00.0 name unicast_filter_count To change/set the unicast filter count # devlink dev param set pci/0002:02:00.0 name unicast_filter_count value 5 cmode runtime Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | docs: net: document guidance of implementing the SR-IOV NDOsJakub Kicinski2024-06-212-0/+26
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | New drivers were prevented from adding ndo_set_vf_* callbacks over the last few years. This was expected to result in broader switchdev adoption, but seems to have had little effect. Based on recent netdev meeting there is broad support for allowing adding those ops. There is a problem with the current API supporting a limited number of VFs (100+, which is less than some modern HW supports). We can try to solve it by adding similar functionality on devlink ports, but that'd be another API variation to maintain. So a netlink attribute reshuffling is a more likely outcome. Document the guidance, make it clear that the API is frozen. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: phy: introduce core support for phy-mode = "10g-qxgmii"Vladimir Oltean2024-06-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 10G-QXGMII is a MAC-to-PHY interface defined by the USXGMII multiport specification. It uses the same signaling as USXGMII, but it multiplexes 4 ports over the link, resulting in a maximum speed of 2.5G per port. Some in-tree SoCs like the NXP LS1028A use "usxgmii" when they mean either the single-port USXGMII or the quad-port 10G-QXGMII variant, and they could get away just fine with that thus far. But there is a need to distinguish between the 2 as far as SerDes drivers are concerned. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | | net: ipv4: Add a sysctl to set multipath hash seedPetr Machata2024-06-131-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calculating hashes for the purpose of multipath forwarding, both IPv4 and IPv6 code currently fall back on flow_hash_from_keys(). That uses a randomly-generated seed. That's a fine choice by default, but unfortunately some deployments may need a tighter control over the seed used. In this patch, make the seed configurable by adding a new sysctl key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used specifically for multipath forwarding and not for the other concerns that flow_hash_from_keys() is used for, such as queue selection. Expose the knob as sysctl because other such settings, such as headers to hash, are also handled that way. Like those, the multipath hash seed is a per-netns variable. Despite being placed in the net.ipv4 namespace, the multipath seed sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of TCP variables. The seed used by flow_hash_from_keys() is a 128-bit quantity. However it seems that usually the seed is a much more modest value. 32 bits seem typical (Cisco, Cumulus), some systems go even lower. For that reason, and to decouple the user interface from implementation details, go with a 32-bit quantity, which is then quadruplicated to form the siphash key. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607151357.421181-3-petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | Documentation/tcp-ao: Add a few lines on tracepointsDmitry Safonov2024-06-121-0/+9
| | | | | | | | | | | | | | | Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-06-061-19/+14
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/pensando/ionic/ionic_txrx.c d9c04209990b ("ionic: Mark error paths in the data path as unlikely") 491aee894a08 ("ionic: fix kernel panic in XDP_TX action") net/ipv6/ip6_fib.c b4cb4a1391dc ("net: use unrcu_pointer() helper") b01e1c030770 ("ipv6: fix possible race in __fib6_drop_pcpu_from()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | Revert "xsk: Document ability to redirect to any socket bound to the same umem"Magnus Karlsson2024-06-051-19/+14
| |/ | | | | | | | | | | | | | | | | | | This reverts commit 968595a93669b6b4f6d1fcf80cf2d97956b6868f. Reported-by: Yuval El-Hanany <YuvalE@radware.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com Link: https://lore.kernel.org/bpf/20240604122927.29080-3-magnus.karlsson@gmail.com