linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	bnxt_en: Add VF PCI ID for 5760X (P7) chips	Ajit Khaparde	2024-05-02	2	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	No driver logic changes are required to support the VFs, so just add the VF PCI ID. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	bnxt_en: Optimize recovery path ULP locking in the driver	Kalesh AP	2024-05-02	3	-26/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the error recovery path (AER, firmware recovery, etc), the driver notifies the RoCE driver via ULP_STOP before the reset and via ULP_START after the reset, all under RTNL_LOCK. The RoCE driver can take a long time if there are a lot of QPs to destroy, so it is not ideal to hold the global RTNL lock. Rely on the new en_dev_lock mutex instead for ULP_STOP and ULP_START. For the most part, we move the ULP_STOP call before we take the RTNL lock and move the ULP_START after RTNL unlock. Note that SRIOV re-enablement must be done after ULP_START or RoCE on the VFs will not resume properly after reset. The one scenario in bnxt_hwrm_if_change() where the RTNL lock is already taken in the .ndo_open() context requires the ULP restart to be deferred to the bnxt_sp_task() workqueue. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	bnxt_en: Add a mutex to synchronize ULP operations	Kalesh AP	2024-05-02	2	-1/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current scheme relies heavily on the RTNL lock for all ULP operations between the L2 and the RoCE driver. Add a new en_dev_lock mutex so that the asynchronous ULP_STOP and ULP_START operations can be serialized with bnxt_register_dev() and bnxt_unregister_dev() calls without relying on the RTNL lock. The next patch will remove the RTNL lock from the ULP_STOP and ULP_START calls. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	bnxt_en: Don't call ULP_STOP/ULP_START during L2 reset	Michael Chan	2024-05-02	1	-11/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no need to call ULP_STOP and ULP_START before and after the L2 reset in bnxt_reset_task(). This L2 reset is done after detecting TX timeout, RX ring errors, or VF config changes. The L2 reset does not affect RoCE since the firmware is not reset and the backing store is left alone. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	bnxt_en: Don't support offline self test when RoCE driver is loaded	Kalesh AP	2024-05-02	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Offline self test is a very disruptive operation for RoCE and requires all active QPs to be destroyed. With a large number of QPs, it can take a long time to destroy all the QPs and can timeout. Do not allow ethtool offline self test if the RoCE driver is registered on the device. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	bnxt_en: share NQ ring sw_stats memory with subrings	Edwin Peer	2024-05-02	3	-19/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On P5_PLUS chips and later, the NQ rings have subrings for RX and TX completions respectively. These subrings are passed to the poll function instead of the base NQ, but each ring carries its own copy of the software ring statistics. For stats to be conveniently accessible in __bnxt_poll_work(), the statistics memory should either be shared between the NQ and its subrings or the subrings need to be included in the ethtool stats aggregation logic. This patch opts for the former, because it's more efficient and less confusing having the software statistics for a ring exist in a single place. Before this patch, the counter will not be displayed if the "wrong" cpr->sw_stats was used to increment a counter. Link: https://lore.kernel.org/netdev/CACKFLikEhVAJA+osD7UjQNotdGte+fth7zOy7yDdLkTyFk9Pyw@mail.gmail.com/ Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	Merge branch '40GbE' of ↵	Jakub Kicinski	2024-05-02	10	-152/+211
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== i40e: cleanups & refactors Ivan Vecera says: This series do following: Patch 1 - Removes write-only flags field from i40e_veb structure and from i40e_veb_setup() parameters Patch 2 - Refactors parameter of i40e_notify_client_of_l2_param_changes() and i40e_notify_client_of_netdev_close() Patch 3 - Refactors parameter of i40e_detect_recover_hung() Patch 4 - Adds helper i40e_pf_get_main_vsi() to get main VSI and uses it in existing code Patch 5 - Consolidates checks whether given VSI is the main one Patch 6 - Adds helper i40e_pf_get_main_veb() to get main VEB and uses it in existing code Patch 7 - Adds helper i40e_vsi_reconfig_tc() to reconfigure TC for particular and uses it to replace existing open-coded pieces * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: Add and use helper to reconfigure TC for given VSI i40e: Add helper to access main VEB i40e: Consolidate checks whether given VSI is main i40e: Add helper to access main VSI i40e: Refactor argument of i40e_detect_recover_hung() i40e: Refactor argument of several client notification functions i40e: Remove flags field from i40e_veb ==================== Link: https://lore.kernel.org/r/20240430180639.1938515-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
\| *	i40e: Add and use helper to reconfigure TC for given VSI	Ivan Vecera	2024-04-30	1	-8/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add helper i40e_vsi_reconfig_tc(vsi) that configures TC for given VSI using previously stored TC bitmap. Effectively replaces open-coded patterns: enabled_tc = vsi->tc_config.enabled_tc; vsi->tc_config.enabled_tc = 0; i40e_vsi_config_tc(vsi, enabled_tc); Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
\| *	i40e: Add helper to access main VEB	Ivan Vecera	2024-04-30	3	-16/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a helper to access main VEB: i40e_pf_get_main_veb(pf) replaces 'pf->veb[pf->lan_veb]' Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
\| *	i40e: Consolidate checks whether given VSI is main	Ivan Vecera	2024-04-30	3	-17/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the driver code there are 3 types of checks whether given VSI is main or not: 1. vsi->type ==/!= I40E_VSI_MAIN 2. vsi ==/!= pf->vsi[pf->lan_vsi] 3. vsi->seid ==/!= pf->vsi[pf->lan_vsi]->seid All of them are equivalent and can be consolidated. Convert cases 2 and 3 to case 1. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
\| *	i40e: Add helper to access main VSI	Ivan Vecera	2024-04-30	9	-83/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add simple helper i40e_pf_get_main_vsi(pf) to access main VSI that replaces pattern 'pf->vsi[pf->lan_vsi]' Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
\| *	i40e: Refactor argument of i40e_detect_recover_hung()	Ivan Vecera	2024-04-30	3	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 07d44190a389 ("i40e/i40evf: Detect and recover hung queue scenario") changes i40e_detect_recover_hung() argument type from i40e_pf* to i40e_vsi* to be shareable by both i40e and i40evf. Because the i40evf does not exist anymore and the function is exclusively used by i40e we can revert this change. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
\| *	i40e: Refactor argument of several client notification functions	Ivan Vecera	2024-04-30	3	-19/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 0ef2d5afb12d ("i40e: KISS the client interface") simplified the client interface so in practice it supports only one client per i40e netdev. But we have still 2 notification functions that uses as parameter a pointer to VSI of netdevice associated with the client. After the mentioned commit only possible and used VSI is the main (LAN) VSI. So refactor these functions so they are called with PF pointer argument and the associated VSI (LAN) is taken inside them. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
\| *	i40e: Remove flags field from i40e_veb	Ivan Vecera	2024-04-30	3	-11/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The field is initialized always to zero and it is never read. Remove it. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* \|	net: ti: icssg_prueth: Add SW TX / RX Coalescing based on hrtimers	MD Danish Anwar	2024-05-02	4	-8/+155
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add SW IRQ coalescing based on hrtimers for RX and TX data path for ICSSG driver, which can be enabled by ethtool commands: - RX coalescing ethtool -C eth1 rx-usecs 50 - TX coalescing can be enabled per TX queue - by default enables coalescing for TX0 ethtool -C eth1 tx-usecs 50 - configure TX0 ethtool -Q eth0 queue_mask 1 --coalesce tx-usecs 100 - configure TX1 ethtool -Q eth0 queue_mask 2 --coalesce tx-usecs 100 - configure TX0 and TX1 ethtool -Q eth0 queue_mask 3 --coalesce tx-usecs 100 --coalesce tx-usecs 100 Minimum value for both rx-usecs and tx-usecs is 20us. Compared to gro_flush_timeout and napi_defer_hard_irqs this patch allows to enable IRQ coalescing for RX path separately. Benchmarking numbers: =============================================================== \| Method \| Tput_TX \| CPU_TX \| Tput_RX \| CPU_RX \| \| ============================================================== \| Default Driver 943 Mbps 31% 517 Mbps 38% \| \| IRQ Coalescing (Patch) 943 Mbps 28% 518 Mbps 25% \| =============================================================== Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240430120634.1558998-1-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* \|	net: loopback: Do not allocate lstats explicitly	Breno Leitao	2024-05-01	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the loopback driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240429085559.2841918-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* \|	inet: introduce dst_rtable() helper	Eric Dumazet	2024-05-01	3	-13/+6
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper") This patch does a similar change for IPv4. Instead of (struct rtable *)dst casts, we can use : #define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst) Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240429133009.1227754-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	net: sfp-bus: constify link_modes to sfp_select_interface()	Russell King (Oracle)	2024-04-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	sfp_select_interface() does not modify its link_modes argument, so make this a const pointer. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://lore.kernel.org/r/E1s15s0-00AHyq-8E@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	net: sfp: allow use 2500base-X for 2500base-T modules	Russell King (Oracle)	2024-04-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Allow use of 2500base-X interface mode for PHY modules that support 2500base-T. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://lore.kernel.org/r/E1s15rv-00AHyk-5S@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	net: phylink: add debug print for empty posssible_interfaces	Russell King (Oracle)	2024-04-30	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	Add a debugging print in phylink_validate_phy() when we detect that the PHY has not supplied a possible_interfaces bitmap. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://lore.kernel.org/r/E1s15rq-00AHye-22@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	net: dsa: realtek: provide own phylink MAC operations	Russell King (Oracle)	2024-04-30	4	-18/+46
\| \| \| \| \| \| \| \| \| \| \|	Convert realtek to provide its own phylink MAC operations, thus avoiding the shim layer in DSA's port.c. We need to provide a stub for the mandatory mac_config() method for rtl8366rb. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/E1s11qJ-00AHi0-Kk@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	net: dsa: mt7530: do not set MT7530_P5_DIS when PHY muxing is being used	Arınç ÜNAL	2024-04-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DSA initalises the ds->num_ports amount of ports in dsa_switch_touch_ports(). When the PHY muxing feature is in use, port 5 won't be defined in the device tree. Because of this, the type member of the dsa_port structure for this port will be assigned DSA_PORT_TYPE_UNUSED. The dsa_port_setup() function calls ds->ops->port_disable() when the port type is DSA_PORT_TYPE_UNUSED. The MT7530_P5_DIS bit is unset in mt7530_setup() when PHY muxing is being used. mt7530_port_disable() which is assigned to ds->ops->port_disable() is called afterwards. Currently, mt7530_port_disable() sets MT7530_P5_DIS which breaks network connectivity when PHY muxing is being used. Therefore, do not set MT7530_P5_DIS when PHY muxing is being used. Fixes: 377174c5760c ("net: dsa: mt7530: move MT753X_MTRAP operations for MT7530") Reported-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240428-for-netnext-mt7530-do-not-disable-port5-when-phy-muxing-v2-1-bb7c37d293f8@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	net/smc: decouple ism_client from SMC-D DMB registration	Wen Gu	2024-04-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	The struct 'ism_client' is specialized for s390 platform firmware ISM. So replace it with 'void' to make SMCD DMB registration helper generic for both Emulated-ISM and existing ISM. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	virtio-net: support queue stat	Xuan Zhuo	2024-04-30	1	-21/+371
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To enhance functionality, we now support reporting statistics through the netdev-generic netlink (netdev-genl) queue stats interface. However, this does not extend to all statistics, so a new field, qstat_offset, has been introduced. This field determines which statistics should be reported via netdev-genl queue stats. Given that queue stats are retrieved individually per queue, it's necessary for the virtnet_get_hw_stats() function to be capable of fetching statistics for a specific queue. As the document https://docs.kernel.org/next/networking/statistics.html#notes-for-driver-authors We should not duplicate the stats which get reported via the netlink API in ethtool. If the stats are for queue stat, that will not be reported by ethtool -S. python3 ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump qstats-get --json '{"scope": "queue"}' [{'ifindex': 2, 'queue-id': 0, 'queue-type': 'rx', 'rx-bytes': 157844011, 'rx-csum-bad': 0, 'rx-csum-none': 0, 'rx-csum-unnecessary': 2195386, 'rx-hw-drop-overruns': 0, 'rx-hw-drop-ratelimits': 0, 'rx-hw-drops': 12964, 'rx-packets': 598929}, {'ifindex': 2, 'queue-id': 0, 'queue-type': 'tx', 'tx-bytes': 1938511, 'tx-csum-none': 0, 'tx-hw-drop-errors': 0, 'tx-hw-drop-ratelimits': 0, 'tx-hw-drops': 0, 'tx-needs-csum': 61263, 'tx-packets': 15515}] Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	virtio_net: add the total stats field	Xuan Zhuo	2024-04-30	1	-12/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now, we just show the stats of every queue. But for the user, the total values of every stat may are valuable. NIC statistics: rx_packets: 373522 rx_bytes: 85919736 rx_drops: 0 rx_xdp_packets: 0 rx_xdp_tx: 0 rx_xdp_redirects: 0 rx_xdp_drops: 0 rx_kicks: 11125 rx_hw_notifications: 0 rx_hw_packets: 1325870 rx_hw_bytes: 263348963 rx_hw_interrupts: 0 rx_hw_drops: 1451 rx_hw_drop_overruns: 0 rx_hw_csum_valid: 1325870 rx_hw_needs_csum: 1325870 rx_hw_csum_none: 0 rx_hw_csum_bad: 0 rx_hw_ratelimit_packets: 0 rx_hw_ratelimit_bytes: 0 tx_packets: 10050 tx_bytes: 1230176 tx_xdp_tx: 0 tx_xdp_tx_drops: 0 tx_kicks: 10050 tx_timeouts: 0 tx_hw_notifications: 0 tx_hw_packets: 32281 tx_hw_bytes: 4315590 tx_hw_interrupts: 0 tx_hw_drops: 0 tx_hw_drop_malformed: 0 tx_hw_csum_none: 0 tx_hw_needs_csum: 32281 tx_hw_ratelimit_packets: 0 tx_hw_ratelimit_bytes: 0 rx0_packets: 373522 rx0_bytes: 85919736 rx0_drops: 0 rx0_xdp_packets: 0 rx0_xdp_tx: 0 rx0_xdp_redirects: 0 rx0_xdp_drops: 0 rx0_kicks: 11125 rx0_hw_notifications: 0 rx0_hw_packets: 1325870 rx0_hw_bytes: 263348963 rx0_hw_interrupts: 0 rx0_hw_drops: 1451 rx0_hw_drop_overruns: 0 rx0_hw_csum_valid: 1325870 rx0_hw_needs_csum: 1325870 rx0_hw_csum_none: 0 rx0_hw_csum_bad: 0 rx0_hw_ratelimit_packets: 0 rx0_hw_ratelimit_bytes: 0 tx0_packets: 10050 tx0_bytes: 1230176 tx0_xdp_tx: 0 tx0_xdp_tx_drops: 0 tx0_kicks: 10050 tx0_timeouts: 0 tx0_hw_notifications: 0 tx0_hw_packets: 32281 tx0_hw_bytes: 4315590 tx0_hw_interrupts: 0 tx0_hw_drops: 0 tx0_hw_drop_malformed: 0 tx0_hw_csum_none: 0 tx0_hw_needs_csum: 32281 tx0_hw_ratelimit_packets: 0 tx0_hw_ratelimit_bytes: 0 Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	virtio_net: device stats helpers support driver stats	Xuan Zhuo	2024-04-30	1	-75/+82
\| \| \| \| \| \| \| \| \| \| \|	In the last commit, we introduced some helpers for device stats. And the drivers stats are realized by the open code. This commit make the helpers to support driver stats. Then we can have the unify helper for device and driver stats. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	virtio_net: support device stats	Xuan Zhuo	2024-04-30	1	-4/+472
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As the spec https://github.com/oasis-tcs/virtio-spec/commit/42f389989823039724f95bbbd243291ab0064f82 make virtio-net support getting the stats from the device by ethtool -S <eth0>. NIC statistics: rx0_packets: 582951 rx0_bytes: 155307077 rx0_drops: 0 rx0_xdp_packets: 0 rx0_xdp_tx: 0 rx0_xdp_redirects: 0 rx0_xdp_drops: 0 rx0_kicks: 17007 rx0_hw_packets: 2179409 rx0_hw_bytes: 510015040 rx0_hw_notifications: 0 rx0_hw_interrupts: 0 rx0_hw_needs_csum: 2179409 rx0_hw_ratelimit_bytes: 0 tx0_packets: 15361 tx0_bytes: 1918970 tx0_xdp_tx: 0 tx0_xdp_tx_drops: 0 tx0_kicks: 15361 tx0_timeouts: 0 tx0_hw_packets: 32272 tx0_hw_bytes: 4311698 tx0_hw_notifications: 0 tx0_hw_interrupts: 0 tx0_hw_ratelimit_bytes: 0 The follow stats are hidden, there are exported by the queue stat API in the subsequent comment. VIRTNET_STATS_DESC_RX(basic, drops) VIRTNET_STATS_DESC_RX(basic, drop_overruns), VIRTNET_STATS_DESC_TX(basic, drops), VIRTNET_STATS_DESC_TX(basic, drop_malformed), VIRTNET_STATS_DESC_RX(csum, csum_valid), VIRTNET_STATS_DESC_RX(csum, csum_none), VIRTNET_STATS_DESC_RX(csum, csum_bad), VIRTNET_STATS_DESC_TX(csum, needs_csum), VIRTNET_STATS_DESC_TX(csum, csum_none), VIRTNET_STATS_DESC_RX(gso, gso_packets), VIRTNET_STATS_DESC_RX(gso, gso_bytes), VIRTNET_STATS_DESC_RX(gso, gso_packets_coalesced), VIRTNET_STATS_DESC_RX(gso, gso_bytes_coalesced), VIRTNET_STATS_DESC_TX(gso, gso_packets), VIRTNET_STATS_DESC_TX(gso, gso_bytes), VIRTNET_STATS_DESC_TX(gso, gso_segments), VIRTNET_STATS_DESC_TX(gso, gso_segments_bytes), VIRTNET_STATS_DESC_RX(speed, ratelimit_packets), VIRTNET_STATS_DESC_TX(speed, ratelimit_packets), Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	virtio_net: remove "_queue" from ethtool -S	Xuan Zhuo	2024-04-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The key size of ethtool -S is controlled by this macro. ETH_GSTRING_LEN 32 That includes the \0 at the end. So the max length of the key name must is 31. But the length of the prefix "rx_queue_0_" is 11. If the queue num is larger than 10, the length of the prefix is 12. So the key name max is 19. That is too short. We will introduce some keys such as "gso_packets_coalesced". So we should change the prefix to "rx0_". Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	virtio_net: introduce ability to get reply info from device	Xuan Zhuo	2024-04-30	1	-7/+17
\| \| \| \| \| \| \| \| \| \| \|	As the spec https://github.com/oasis-tcs/virtio-spec/commit/42f389989823039724f95bbbd243291ab0064f82 Based on the description provided in the above specification, we have enabled the virtio-net driver to support acquiring some response information from the device via the CVQ (Control Virtqueue). Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
*	net: txgbe: use phylink_pcs_change() to report PCS link change events	Russell King (Oracle)	2024-04-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Use phylink_pcs_change() when reporting changes in PCS link state to phylink as the interrupts are informing us about changes to the PCS state. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/E1s0OH2-009hgx-Qw@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	net: prestera: use phylink_pcs_change() to report PCS link change events	Russell King (Oracle)	2024-04-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Use phylink_pcs_change() when reporting changes in PCS link state to phylink as the interrupts are informing us about changes to the PCS state. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/E1s0OGx-009hgr-NP@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	net: mvneta: use phylink_pcs_change() to report PCS link change events	Russell King (Oracle)	2024-04-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	Use phylink_pcs_change() when reporting changes in PCS link state to phylink as the interrupts are informing us about changes to the PCS state. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/E1s0OGs-009hgl-Jg@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	net: mvpp2: use phylink_pcs_change() to report PCS link change events	Russell King (Oracle)	2024-04-30	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \|	Use phylink_pcs_change() when reporting changes in PCS link state to phylink as the interrupts are informing us about changes to the PCS state. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/E1s0OGn-009hgf-G6@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	net: dsa: ksz_common: use separate phylink_mac_ops for ksz8830	Russell King (Oracle)	2024-04-30	1	-6/+20
\| \| \| \| \| \| \| \|	Use a separate phylink_mac_ops for the KSZ8830 chip-id. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1s0O7R-009gq2-Qm@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	net: dsa: ksz_common: sub-driver phylink ops	Russell King (Oracle)	2024-04-30	4	-37/+65
\| \| \| \| \| \|	Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1s0O7M-009gpw-Lj@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	net: dsa: ksz_common: provide own phylink MAC operations	Russell King (Oracle)	2024-04-30	1	-16/+26
\| \| \| \| \| \| \| \| \| \|	Convert ksz_common to provide its own phylink MAC operations, thus avoiding the shim layer in DSA's port.c Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/E1s0O7H-009gpq-IF@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	net: dsa: ksz_common: remove phylink_mac_config from ksz_dev_ops	Russell King (Oracle)	2024-04-30	2	-6/+0
\| \| \| \| \| \| \| \| \| \|	The phylink_mac_config function pointer member of struct ksz_dev_ops is never initialised, so let's remove it to simplify the code. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/E1s0O7C-009gpk-Dh@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	net: phy: micrel: Add support for PTP_PF_EXTTS for lan8814	Horatiu Vultur	2024-04-29	1	-1/+181
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extend the PTP programmable gpios to implement also PTP_PF_EXTTS function. The pins can be configured to capture both of rising and falling edge. Once the event is seen, then an interrupt is generated and the LTC is saved in the registers. On lan8814 only GPIO 3 can be configured for this. This was tested using: ts2phc -m -l 7 -s generic -f ts2phc.cfg Where the configuration was the following: --- [global] ts2phc.pin_index 3 [eth0] --- Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: dsa: realtek: add LED drivers for rtl8366rb	Luiz Angelo Daros de Luca	2024-04-29	1	-39/+265
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit introduces LED drivers for rtl8366rb, enabling LEDs to be described in the device tree using the same format as qca8k. Each port can configure up to 4 LEDs. If all LEDs in a group use the default state "keep", they will use the default behavior after a reset. Changing the brightness of one LED, either manually or by a trigger, will disable the default hardware trigger and switch the entire LED group to manually controlled LEDs. Once in this mode, there is no way to revert to hardware-controlled LEDs (except by resetting the switch). Software triggers function as expected with manually controlled LEDs. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: dsa: realtek: do not assert reset on remove	Luiz Angelo Daros de Luca	2024-04-29	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The necessity of asserting the reset on removal was previously questioned, as DSA's own cleanup methods should suffice to prevent traffic leakage[1]. When a driver has subdrivers controlled by devres, they will be unregistered after the main driver's .remove is executed. If it asserts a reset, the subdrivers will be unable to communicate with the hardware during their cleanup. For LEDs, this means that they will fail to turn off, resulting in a timeout error. [1] https://lore.kernel.org/r/20240123215606.26716-9-luizluca@gmail.com/ Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: dsa: realtek: keep default LED state in rtl8366rb	Luiz Angelo Daros de Luca	2024-04-29	1	-67/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This switch family supports four LEDs for each of its six ports. Each LED group is composed of one of these four LEDs from all six ports. LED groups can be configured to display hardware information, such as link activity, or manually controlled through a bitmap in registers RTL8366RB_LED_0_1_CTRL_REG and RTL8366RB_LED_2_3_CTRL_REG. After a reset, the default LED group configuration for groups 0 to 3 indicates, respectively, link activity, link at 1000M, 100M, and 10M, or RTL8366RB_LED_CTRL_REG as 0x5432. These configurations are commonly used for LED indications. However, the driver was replacing that configuration to use manually controlled LEDs (RTL8366RB_LED_FORCE) without providing a way for the OS to control them. The default configuration is deemed more useful than fixed, uncontrollable turned-on LEDs. The driver was enabling/disabling LEDs during port_enable/disable. However, these events occur when the port is administratively controlled (up or down) and are not related to link presence. Additionally, when a port N was disabled, the driver was turning off all LEDs for group N, not only the corresponding LED for port N in any of those 4 groups. In such cases, if port 0 was brought down, the LEDs for all ports in LED group 0 would be turned off. As another side effect, the driver was wrongly warning that port 5 didn't have an LED ("no LED for port 5"). Since showing the administrative state of ports is not an orthodox way to use LEDs, it was not worth it to fix it and all this code was dropped. The code to disable LEDs was simplified only changing each LED group to the RTL8366RB_LED_OFF state. Registers RTL8366RB_LED_0_1_CTRL_REG and RTL8366RB_LED_2_3_CTRL_REG are only used when the corresponding LED group is configured with RTL8366RB_LED_FORCE and they don't need to be cleaned. The code still references an LED controlled by RTL8366RB_INTERRUPT_CONTROL_REG, but as of now, no test device has actually used it. Also, some magic numbers were replaced by macros. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6: introduce dst_rt6_info() helper	Eric Dumazet	2024-04-29	5	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of (struct rt6_info *)dst casts, we can use : #define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst) Some places needed missing const qualifiers : ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway() v2: added missing parts (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mlxsw: pci: Use NAPI for event processing	Amit Cohen	2024-04-29	1	-19/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Spectrum ASICs only support a single interrupt, that means that all the events are handled by one IRQ (interrupt request) handler. Once an interrupt is received, we schedule tasklet to handle events from EQ and then schedule tasklets to handle completions from CQs. Tasklet runs in softIRQ (software IRQ) context, and will be run on the same CPU which scheduled it. That means that today we use only one CPU to handle all the packets (both network packets and EMADs) from hardware. This can be improved using NAPI. The idea is to use NAPI instance per CQ, which is mapped 1:1 to DQ (RDQ or SDQ). NAPI poll method can be run in kernel thread, so then the driver will be able to handle WQEs in several CPUs. Convert the existing code to use NAPI APIs. Add NAPI instance as part of 'struct mlxsw_pci_queue' and initialize it as part of CQs initialization. Set the appropriate poll method and dummy net device, according to queue number, similar to tasklet setup. For CQs which are used for completions of RDQ, use Rx poll method and 'napi_dev_rx', which is set as 'threaded'. It means that Rx poll method will run in kernel context, so several RDQs will be handled in parallel. For CQs which are used for completions of SDQ, use Tx poll method and 'napi_dev_tx', this method will run in softIRQ context, as it is recommended in NAPI documentation, as Tx packets' processing is short task. Convert mlxsw_pci_cq_{rx,tx}_tasklet() to poll methods. Handle 'budget' argument - ignore it in Tx poll method, as it is recommended to not limit Tx processing. For Rx processing, handle up to 'budget' completions. Return 'work_done' which is the amount of completions that were handled. Handle the following cases: 1. After processing 'budget' completions, the driver still has work to do: Return work-done = budget. In that case, the NAPI instance will be polled again (without the need to be rescheduled). Do not re-arm the queue, as NAPI will handle the reschedule, so we do not have to involve hardware to send an additional interrupt for the completions that should be processed. 2. Event processing has been completed: Call napi_complete_done() to mark NAPI processing as completed, which means that the poll method will not be rescheduled. Re-arm the queue, as all completions were handled. In case that poll method handled exactly 'budget' completions, return work-done = budget -1, to distinguish from the case that driver still has completions to handle. Otherwise, return the amount of completions that were handled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mlxsw: pci: Reorganize 'mlxsw_pci_queue' structure	Amit Cohen	2024-04-29	1	-38/+38
\| \| \| \| \| \| \| \| \| \| \| \| \|	The next patch will set the driver to use NAPI for event processing. Then tasklet mechanism will be used only for EQ. Reorganize 'mlxsw_pci_queue' to hold EQ and CQ attributes in a union. For now, add tasklet for both EQ and CQ. This will be changed in the next patch, as 'tasklet_struct' will be replaced with NAPI instance. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mlxsw: pci: Initialize dummy net devices for NAPI	Amit Cohen	2024-04-29	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mlxsw will use NAPI for event processing in a next patch. As preparation, add two dummy net devices and initialize them. NAPI instance should be attached to net device. Usually each queue is used by a single net device in network drivers, so the mapping between net device to NAPI instance is intuitive. In our case, Rx queues are not per port, they are per trap-group. Tx queues are mapped to net devices, but we do not have a separate queue for each local port, several ports share the same queue. Use init_dummy_netdev() to initialize dummy net devices for NAPI. To run NAPI poll method in a kernel thread, the net device which NAPI instance is attached to should be marked as 'threaded'. It is recommended to handle Tx packets in softIRQ context, as usually this is a short task - just free the Tx packet which has been transmitted. Rx packets handling is more complicated task, so drivers can use a dedicated kernel thread to process them. It allows processing packets from different Rx queues in parallel. We would like to handle only Rx packets in kernel threads, which means that we will use two dummy net devices (one for Rx and one for Tx). Set only one of them with 'threaded' as it will be used for Rx processing. Do not fail in case that setting 'threaded' fails, as it is better to use regular softIRQ NAPI rather than preventing the driver from loading. Note that the net devices are initialized with init_dummy_netdev(), so they are not registered, which means that they will not be visible to user. It will not be possible to change 'threaded' configuration from user space, but it is reasonable in our case, as there is no another configuration which makes sense, considering that user has no influence on the usage of each queue. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mlxsw: pci: Ring RDQ and CQ doorbells once per several completions	Amit Cohen	2024-04-29	1	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, for each CQE in CQ, we ring CQ doorbell, then handle RDQ and ring RDQ doorbell. Finally we ring CQ arm doorbell - once per CQ tasklet. The idea of ringing CQ doorbell before RDQ doorbell, is to be sure that when we post new WQE (after RDQ is handled), there is an available CQE. This was done because of a hardware bug as part of commit c9ebea04cb1b ("mlxsw: pci: Ring CQ's doorbell before RDQ's"). There is no real reason to ring RDQ and CQ doorbells for each completion, it is better to handle several completions and reduce number of ringings, as access to hardware is expensive (time wise) and might take time because of memory barriers. A previous patch changed CQ tasklet to handle up to 64 Rx packets. With this limitation, we can ring CQ and RDQ doorbells once per CQ tasklet. The counters of the doorbells are increased by the amount of packets that we handled, then the device will know for which completion to send an additional event. To avoid reordering CQ and RDQ doorbells' ring, let the tasklet to ring also RDQ doorbell, mlxsw_pci_cqe_rdq_handle() handles the counter but does not ring the doorbell. Note that with this change there is no need to copy the CQE, as we ring CQ doorbell only after Rx packet processing (which uses the CQE) is done. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mlxsw: pci: Handle up to 64 Rx completions in tasklet	Amit Cohen	2024-04-29	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can get many completions in one interrupt. Currently, the CQ tasklet handles up to half queue size completions, and then arms the hardware to generate additional events, which means that in case that there were additional completions that we did not handle, we will get immediately an additional interrupt to handle the rest. The decision to handle up to half of the queue size is arbitrary and was determined in 2015, when mlxsw driver was added to the kernel. One additional fact that should be taken into account is that while WQEs from RDQ are handled, the CPU that handles the tasklet is dedicated for this task, which means that we might hold the CPU for a long time. Handle WQEs in smaller chucks, then arm CQ doorbell to notify the hardware to send additional notifications. Set the chunk size to 64 as this number is recommended using NAPI and the driver will use NAPI in a next patch. Note that for now we use ARM doorbell to retrigger CQ tasklet, but with NAPI it will be more efficient as software will reschedule the poll method and we will not involve hardware for that. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: ethernet: ti: am65-cpsw-qos: Add support to taprio for past base_time	Tanmay Patil	2024-04-29	1	-3/+13
\| \| \| \| \| \| \| \| \| \| \|	If the base-time for taprio is in the past, start the schedule at the time of the form "base_time + N*cycle_time" where N is the smallest possible integer such that the above time is in the future. Signed-off-by: Tanmay Patil <t-patil@ti.com> Signed-off-by: Chintan Vankar <c-vankar@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: dsa: lan9303: use ethtool_puts() for lan9303_get_strings()	Justin Stitt	2024-04-26	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This pattern of strncpy with some pointer arithmetic setting fixed-sized intervals with string literal data is a bit weird so let's use ethtool_puts() as this has more obvious behavior and is less-error prone. Nicely, we also get to drop a usage of the now deprecated strncpy() [1]. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20240425-strncpy-drivers-net-dsa-lan9303-core-c-v4-1-9fafd419d7bb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet	Song Yoong Siang	2024-04-26	3	-40/+195
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support to per-packet Tx hardware timestamp request to AF_XDP zero-copy packet via XDP Tx metadata framework. Please note that user needs to enable Tx HW timestamp capability via igc_ioctl() with SIOCSHWTSTAMP cmd before sending xsk Tx hardware timestamp request. Same as implementation in RX timestamp XDP hints kfunc metadata, Timer 0 (adjustable clock) is used in xsk Tx hardware timestamp. i225/i226 have four sets of timestamping registers. skb and xsk_tx_buffer pointers are used to indicate whether the timestamping register is already occupied. Furthermore, a boolean variable named xsk_pending_ts is used to hold the transmit completion until the tx hardware timestamp is ready. This is because, for i225/i226, the timestamp notification event comes some time after the transmit completion event. The driver will retrigger hardware irq to clean the packet after retrieve the tx hardware timestamp. Besides, xsk_meta is added into struct igc_tx_timestamp_request as a hook to the metadata location of the transmit packet. When the Tx timestamp interrupt is fired, the interrupt handler will copy the value of Tx hwts into metadata location via xsk_tx_metadata_complete(). This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata on Intel ADL-S platform. Below are the test steps and results. Test Step 1: Run xdp_hw_metadata app ./xdp_hw_metadata <iface> > /dev/shm/result.log Test Step 2: Enable Tx hardware timestamp hwstamp_ctl -i <iface> -t 1 -r 1 Test Step 3: Run ptp4l and phc2sys for time synchronization Test Step 4: Generate UDP packets with 1ms interval for 10s trafgen --dev <iface> '{eth(da=<addr>), udp(dp=9091)}' -t 1ms -n 10000 Test Step 5: Rerun Step 1-3 with 10s iperf3 as background traffic Test Step 6: Rerun Step 1-4 with 10s iperf3 as background traffic Based on iperf3 results below, the impact of holding tx completion to throughput is not observable. Result of last UDP packet (no. 10000) in Step 4: poll: 1 (0) skip=99 fail=0 redir=10000 xsk_ring_cons__peek: 1 0x5640a37972d0: rx_desc[9999]->addr=f2110 addr=f2110 comp_addr=f2110 EoP rx_hash: 0x2049BE1D with RSS type:0x1 HW RX-time: 1679819246792971268 (sec:1679819246.7930) delta to User RX-time sec:0.0000 (14.990 usec) XDP RX-time: 1679819246792981987 (sec:1679819246.7930) delta to User RX-time sec:0.0000 (4.271 usec) No rx_vlan_tci or rx_vlan_proto, err=-95 0x5640a37972d0: ping-pong with csum=ab19 (want 315b) csum_start=34 csum_offset=6 0x5640a37972d0: complete tx idx=9999 addr=f010 HW TX-complete-time: 1679819246793036971 (sec:1679819246.7930) delta to User TX-complete-time sec:0.0001 (77.656 usec) XDP RX-time: 1679819246792981987 (sec:1679819246.7930) delta to User TX-complete-time sec:0.0001 (132.640 usec) HW RX-time: 1679819246792971268 (sec:1679819246.7930) delta to HW TX-complete-time sec:0.0001 (65.703 usec) 0x5640a37972d0: complete rx idx=10127 addr=f2110 Result of iperf3 without tx hwts request in step 5: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 2.74 GBytes 2.36 Gbits/sec 0 sender [ 5] 0.00-10.05 sec 2.74 GBytes 2.34 Gbits/sec receiver Result of iperf3 running parallel with trafgen command in step 6: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 2.74 GBytes 2.36 Gbits/sec 0 sender [ 5] 0.00-10.04 sec 2.74 GBytes 2.34 Gbits/sec receiver Co-developed-by: Lai Peter Jun Ann <jun.ann.lai@intel.com> Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240424210256.3440903-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>