summaryrefslogtreecommitdiffstats
path: root/drivers/net (follow)
Commit message (Collapse)AuthorAgeFilesLines
* ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outboundYue Haibing2024-05-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Raw packet from PF_PACKET socket ontop of an IPv6-backed ipvlan device will hit WARN_ON_ONCE() in sk_mc_loop() through sch_direct_xmit() path. WARNING: CPU: 2 PID: 0 at net/core/sock.c:775 sk_mc_loop+0x2d/0x70 Modules linked in: sch_netem ipvlan rfkill cirrus drm_shmem_helper sg drm_kms_helper CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 6.9.0+ #279 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:sk_mc_loop+0x2d/0x70 Code: fa 0f 1f 44 00 00 65 0f b7 15 f7 96 a3 4f 31 c0 66 85 d2 75 26 48 85 ff 74 1c RSP: 0018:ffffa9584015cd78 EFLAGS: 00010212 RAX: 0000000000000011 RBX: ffff91e585793e00 RCX: 0000000002c6a001 RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff91e589c0f000 RBP: ffff91e5855bd100 R08: 0000000000000000 R09: 3d00545216f43d00 R10: ffff91e584fdcc50 R11: 00000060dd8616f4 R12: ffff91e58132d000 R13: ffff91e584fdcc68 R14: ffff91e5869ce800 R15: ffff91e589c0f000 FS: 0000000000000000(0000) GS:ffff91e898100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f788f7c44c0 CR3: 0000000008e1a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ? __warn (kernel/panic.c:693) ? sk_mc_loop (net/core/sock.c:760) ? report_bug (lib/bug.c:201 lib/bug.c:219) ? handle_bug (arch/x86/kernel/traps.c:239) ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) ? sk_mc_loop (net/core/sock.c:760) ip6_finish_output2 (net/ipv6/ip6_output.c:83 (discriminator 1)) ? nf_hook_slow (net/netfilter/core.c:626) ip6_finish_output (net/ipv6/ip6_output.c:222) ? __pfx_ip6_finish_output (net/ipv6/ip6_output.c:215) ipvlan_xmit_mode_l3 (drivers/net/ipvlan/ipvlan_core.c:602) ipvlan ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:226) ipvlan dev_hard_start_xmit (net/core/dev.c:3594) sch_direct_xmit (net/sched/sch_generic.c:343) __qdisc_run (net/sched/sch_generic.c:416) net_tx_action (net/core/dev.c:5286) handle_softirqs (kernel/softirq.c:555) __irq_exit_rcu (kernel/softirq.c:589) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043) The warning triggers as this: packet_sendmsg packet_snd //skb->sk is packet sk __dev_queue_xmit __dev_xmit_skb //q->enqueue is not NULL __qdisc_run sch_direct_xmit dev_hard_start_xmit ipvlan_start_xmit ipvlan_xmit_mode_l3 //l3 mode ipvlan_process_outbound //vepa flag ipvlan_process_v6_outbound ip6_local_out __ip6_finish_output ip6_finish_output2 //multicast packet sk_mc_loop //sk->sk_family is AF_PACKET Call ip{6}_local_out() with NULL sk in ipvlan as other tunnels to fix this. Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240529095633.613103-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: ena: Fix redundant device NUMA node overrideShay Agroskin2024-05-301-11/+0
| | | | | | | | | | | | | | | The driver overrides the NUMA node id of the device regardless of whether it knows its correct value (often setting it to -1 even though the node id is advertised in 'struct device'). This can lead to suboptimal configurations. This patch fixes this behavior and makes the shared memory allocation functions use the NUMA node id advertised by the underlying device. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Link: https://lore.kernel.org/r/20240528170912.1204417-1-shayagr@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ice: check for unregistering correct number of devlink paramsDave Ertman2024-05-301-9/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | On module load, the ice driver checks for the lack of a specific PF capability to determine if it should reduce the number of devlink params to register. One situation when this test returns true is when the driver loads in safe mode. The same check is not present on the unload path when devlink params are unregistered. This results in the driver triggering a WARN_ON in the kernel devlink code. The current check and code path uses a reduction in the number of elements reported in the list of params. This is fragile and not good for future maintaining. Change the parameters to be held in two lists, one always registered and one dependent on the check. Add a symmetrical check in the unload path so that the correct parameters are unregistered as well. Fixes: 109eb2917284 ("ice: Add tx_scheduling_layers devlink param") CC: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-8-dc8593d2bbc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ice: fix 200G PHY types to link speed mappingPaul Greenwalt2024-05-301-0/+10
| | | | | | | | | | | | | | | | | | Commit 24407a01e57c ("ice: Add 200G speed/phy type use") added support for 200G PHY speeds, but did not include the mapping of 200G PHY types to link speed. As a result the driver is returning UNKNOWN link speed when setting 200G ethtool advertised link modes. To fix this add 200G PHY types to link speed mapping to ice_get_link_speed_based_on_phy_type(). Fixes: 24407a01e57c ("ice: Add 200G speed/phy type use") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-5-dc8593d2bbc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* i40e: Fully suspend and resume IO operations in EEH caseThinh Tran2024-05-301-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | When EEH events occurs, the callback functions in the i40e, which are managed by the EEH driver, will completely suspend and resume all IO operations. - In the PCI error detected callback, replaced i40e_prep_for_reset() with i40e_io_suspend(). The change is to fully suspend all I/O operations - In the PCI error slot reset callback, replaced pci_enable_device_mem() with pci_enable_device(). This change enables both I/O and memory of the device. - In the PCI error resume callback, replaced i40e_handle_reset_warning() with i40e_io_resume(). This change allows the system to resume I/O operations Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Robert Thomas <rob.thomas@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-3-dc8593d2bbc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* i40e: factoring out i40e_suspend/i40e_resumeThinh Tran2024-05-301-114/+135
| | | | | | | | | | | | | | | | | | | | | Two new functions, i40e_io_suspend() and i40e_io_resume(), have been introduced. These functions were factored out from the existing i40e_suspend() and i40e_resume() respectively. This factoring was done due to concerns about the logic of the I40E_SUSPENSED state, which caused the device to be unable to recover. The functions are now used in the EEH handling for device suspend/resume callbacks. The function i40e_enable_mc_magic_wake() has been moved ahead of i40e_io_suspend() to ensure it is declared before being used. Tested-by: Robert Thomas <rob.thomas@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-2-dc8593d2bbc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* e1000e: move force SMBUS near the end of enable_ulp functionHui Wang2024-05-302-18/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 861e8086029e ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue") introduces a regression on PCH_MTP_I219_LM18 (PCIID: 0x8086550A). Without the referred commit, the ethernet works well after suspend and resume, but after applying the commit, the ethernet couldn't work anymore after the resume and the dmesg shows that the NIC link changes to 10Mbps (1000Mbps originally): [ 43.305084] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx Without the commit, the force SMBUS code will not be executed if "return 0" or "goto out" is executed in the enable_ulp(), and in my case, the "goto out" is executed since FWSM_FW_VALID is set. But after applying the commit, the force SMBUS code will be ran unconditionally. Here move the force SMBUS code back to enable_ulp() and put it immediately ahead of hw->phy.ops.release(hw), this could allow the longest settling time as possible for interface in this function and doesn't change the original code logic. The issue was found on a Lenovo laptop with the ethernet hw as below: 00:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:550a] (rev 20). And this patch is verified (cable plug and unplug, system suspend and resume) on Lenovo laptops with ethernet hw: [8086:550a], [8086:550b], [8086:15bb], [8086:15be], [8086:1a1f], [8086:1a1c] and [8086:0dc7]. Fixes: 861e8086029e ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue") Signed-off-by: Hui Wang <hui.wang@canonical.com> Acked-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-1-dc8593d2bbc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: dsa: microchip: fix RGMII error in KSZ DSA driverTristram Ha2024-05-301-1/+1
| | | | | | | | | | | | The driver should return RMII interface when XMII is running in RMII mode. Fixes: 0ab7f6bf1675 ("net: dsa: microchip: ksz9477: use common xmii function") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Acked-by: Jerry Ray <jerry.ray@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/1716932066-3342-1-git-send-email-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: ti: icssg-prueth: Fix start counter for ft1 filterMD Danish Anwar2024-05-281-1/+1
| | | | | | | | | | | | The start counter for FT1 filter is wrongly set to 0 in the driver. FT1 is used for source address violation (SAV) check and source address starts at Byte 6 not Byte 0. Fix this by changing start counter to ETH_ALEN in icssg_ft1_set_mac_addr(). Fixes: e9b4ece7d74b ("net: ti: icssg-prueth: Add Firmware config and classification APIs.") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://lore.kernel.org/r/20240527063015.263748-1-danishanwar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* ice: fix accounting if a VLAN already existsJacob Keller2024-05-281-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ice_vsi_add_vlan() function is used to add a VLAN filter for the target VSI. This function prepares a filter in the switch table for the given VSI. If it succeeds, the vsi->num_vlan counter is incremented. It is not considered an error to add a VLAN which already exists in the switch table, so the function explicitly checks and ignores -EEXIST. The vsi->num_vlan counter is still incremented. This seems incorrect, as it means we can double-count in the case where the same VLAN is added twice by the caller. The actual table will have one less filter than the count. The ice_vsi_del_vlan() function similarly checks and handles the -ENOENT condition for when deleting a filter that doesn't exist. This flow only decrements the vsi->num_vlan if it actually deleted a filter. The vsi->num_vlan counter is used only in a few places, primarily related to tracking the number of non-zero VLANs. If the vsi->num_vlans gets out of sync, then ice_vsi_num_non_zero_vlans() will incorrectly report more VLANs than are present, and ice_vsi_has_non_zero_vlans() could return true potentially in cases where there are only VLAN 0 filters left. Fix this by only incrementing the vsi->num_vlan in the case where we actually added an entry, and not in the case where the entry already existed. Fixes: a1ffafb0b4a4 ("ice: Support configuring the device to Double VLAN Mode") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240523-net-2024-05-23-intel-net-fixes-v1-2-17a923e0bb5f@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* idpf: don't enable NAPI and interrupts prior to allocating Rx buffersAlexander Lobakin2024-05-283-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, idpf enables NAPI and interrupts prior to allocating Rx buffers. This may lead to frame loss (there are no buffers to place incoming frames) and even crashes on quick ifup-ifdown. Interrupts must be enabled only after all the resources are here and available. Split interrupt init into two phases: initialization and enabling, and perform the second only after the queues are fully initialized. Note that we can't just move interrupt initialization down the init process, as the queues must have correct a ::q_vector pointer set and NAPI already added in order to allocate buffers correctly. Also, during the deinit process, disable HW interrupts first and only then disable NAPI. Otherwise, there can be a HW event leading to napi_schedule(), but the NAPI will already be unavailable. Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport") Reported-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240523-net-2024-05-23-intel-net-fixes-v1-1-17a923e0bb5f@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: micrel: Fix lan8841_config_intr after getting out of sleep modeHoratiu Vultur2024-05-281-1/+9
| | | | | | | | | | | | | | | | | | | | When the interrupt is enabled, the function lan8841_config_intr tries to clear any pending interrupts by reading the interrupt status, then checks the return value for errors and then continue to enable the interrupt. It has been seen that once the system gets out of sleep mode, the interrupt status has the value 0x400 meaning that the PHY detected that the link was in low power. That is correct value but the problem is that the check is wrong. We try to check for errors but we return an error also in this case which is not an error. Therefore fix this by returning only when there is an error. Fixes: a8f1a19d27ef ("net: micrel: Add support for lan8841 PHY") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240524085350.359812-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net:fec: Add fec_enet_deinit()Xiaolei Wang2024-05-281-0/+10
| | | | | | | | | | | | | | When fec_probe() fails or fec_drv_remove() needs to release the fec queue and remove a NAPI context, therefore add a function corresponding to fec_enet_init() and call fec_enet_deinit() which does the opposite to release memory and remove a NAPI context. Fixes: 59d0f7465644 ("net: fec: init multi queue date structure") Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240524050528.4115581-1-xiaolei.wang@windriver.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge tag 'for-netdev' of ↵Jakub Kicinski2024-05-281-6/+24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-05-27 We've added 15 non-merge commits during the last 7 day(s) which contain a total of 18 files changed, 583 insertions(+), 55 deletions(-). The main changes are: 1) Fix broken BPF multi-uprobe PID filtering logic which filtered by thread while the promise was to filter by process, from Andrii Nakryiko. 2) Fix the recent influx of syzkaller reports to sockmap which triggered a locking rule violation by performing a map_delete, from Jakub Sitnicki. 3) Fixes to netkit driver in particular on skb->pkt_type override upon pass verdict, from Daniel Borkmann. 4) Fix an integer overflow in resolve_btfids which can wrongly trigger build failures, from Friedrich Vock. 5) Follow-up fixes for ARC JIT reported by static analyzers, from Shahab Vahedi. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Cover verifier checks for mutating sockmap/sockhash Revert "bpf, sockmap: Prevent lock inversion deadlock in map delete elem" bpf: Allow delete from sockmap/sockhash only if update is allowed selftests/bpf: Add netkit test for pkt_type selftests/bpf: Add netkit tests for mac address netkit: Fix pkt_type override upon netkit pass verdict netkit: Fix setting mac address in l2 mode ARC, bpf: Fix issues reported by the static analyzers selftests/bpf: extend multi-uprobe tests with USDTs selftests/bpf: extend multi-uprobe tests with child thread case libbpf: detect broken PID filtering logic for multi-uprobe bpf: remove unnecessary rcu_read_{lock,unlock}() in multi-uprobe attach logic bpf: fix multi-uprobe PID filtering logic bpf: Fix potential integer overflow in resolve_btfids MAINTAINERS: Add myself as reviewer of ARM64 BPF JIT ==================== Link: https://lore.kernel.org/r/20240527203551.29712-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * netkit: Fix pkt_type override upon netkit pass verdictDaniel Borkmann2024-05-251-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running Cilium connectivity test suite with netkit in L2 mode, we found that compared to tcx a few tests were failing which pushed traffic into an L7 proxy sitting in host namespace. The problem in particular is around the invocation of eth_type_trans() in netkit. In case of tcx, this is run before the tcx ingress is triggered inside host namespace and thus if the BPF program uses the bpf_skb_change_type() helper the newly set type is retained. However, in case of netkit, the late eth_type_trans() invocation overrides the earlier decision from the BPF program which eventually leads to the test failure. Instead of eth_type_trans(), split out the relevant parts, meaning, reset of mac header and call to eth_skb_pkt_type() before the BPF program is run in order to have the same behavior as with tcx, and refactor a small helper called eth_skb_pull_mac() which is run in case it's passed up the stack where the mac header must be pulled. With this all connectivity tests pass. Fixes: 35dfaad7188c ("netkit, bpf: Add bpf programmable net device") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240524163619.26001-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * netkit: Fix setting mac address in l2 modeDaniel Borkmann2024-05-251-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running Cilium connectivity test suite with netkit in L2 mode, we found that it is expected to be able to specify a custom MAC address for the devices, in particular, cilium-cni obtains the specified MAC address by querying the endpoint and sets the MAC address of the interface inside the Pod. Thus, fix the missing support in netkit for L2 mode. Fixes: 35dfaad7188c ("netkit, bpf: Add bpf programmable net device") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20240524163619.26001-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | net: usb: smsc95xx: fix changing LED_SEL bit value updated from EEPROMParthiban Veerasooran2024-05-271-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LED Select (LED_SEL) bit in the LED General Purpose IO Configuration register is used to determine the functionality of external LED pins (Speed Indicator, Link and Activity Indicator, Full Duplex Link Indicator). The default value for this bit is 0 when no EEPROM is present. If a EEPROM is present, the default value is the value of the LED Select bit in the Configuration Flags of the EEPROM. A USB Reset or Lite Reset (LRST) will cause this bit to be restored to the image value last loaded from EEPROM, or to be set to 0 if no EEPROM is present. While configuring the dual purpose GPIO/LED pins to LED outputs in the LED General Purpose IO Configuration register, the LED_SEL bit is changed as 0 and resulting the configured value from the EEPROM is cleared. The issue is fixed by using read-modify-write approach. Fixes: f293501c61c5 ("smsc95xx: configure LED outputs") Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Woojung Huh <woojung.huh@microchip.com> Link: https://lore.kernel.org/r/20240523085314.167650-1-Parthiban.Veerasooran@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | Octeontx2-pf: Free send queue buffers incase of leaf to innerHariprasad Kelam2024-05-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two type of classes. "Leaf classes" that are the bottom of the class hierarchy. "Inner classes" that are neither the root class nor leaf classes. QoS rules can only specify leaf classes as targets for traffic. Root / \ / \ 1 2 /\ / \ 4 5 classes 1,4 and 5 are leaf classes. class 2 is a inner class. When a leaf class made as inner, or vice versa, resources associated with send queue (send queue buffers and transmit schedulers) are not getting freed. Fixes: 5e6808b4c68d ("octeontx2-pf: Add support for HTB offload") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://lore.kernel.org/r/20240523073626.4114-1-hkelam@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | enic: Validate length of nl attributes in enic_set_vf_portRoded Zats2024-05-271-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enic_set_vf_port assumes that the nl attribute IFLA_PORT_PROFILE is of length PORT_PROFILE_MAX and that the nl attributes IFLA_PORT_INSTANCE_UUID, IFLA_PORT_HOST_UUID are of length PORT_UUID_MAX. These attributes are validated (in the function do_setlink in rtnetlink.c) using the nla_policy ifla_port_policy. The policy defines IFLA_PORT_PROFILE as NLA_STRING, IFLA_PORT_INSTANCE_UUID as NLA_BINARY and IFLA_PORT_HOST_UUID as NLA_STRING. That means that the length validation using the policy is for the max size of the attributes and not on exact size so the length of these attributes might be less than the sizes that enic_set_vf_port expects. This might cause an out of bands read access in the memcpys of the data of these attributes in enic_set_vf_port. Fixes: f8bd909183ac ("net: Add ndo_{set|get}_vf_port support for enic dynamic vnics") Signed-off-by: Roded Zats <rzats@paloaltonetworks.com> Link: https://lore.kernel.org/r/20240522073044.33519-1-rzats@paloaltonetworks.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | net/mlx5e: Fix UDP GSO for encapsulated packetsGal Pressman2024-05-242-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the skb is encapsulated, adjust the inner UDP header instead of the outer one, and account for UDP header (instead of TCP) in the inline header size calculation. Fixes: 689adf0d4892 ("net/mlx5e: Add UDP GSO support") Reported-by: Jason Baron <jbaron@akamai.com> Closes: https://lore.kernel.org/netdev/c42961cb-50b9-4a9a-bd43-87fe48d88d29@akamai.com/ Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx5e: Use rx_missed_errors instead of rx_dropped for reporting buffer ↵Carolina Jubran2024-05-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | exhaustion Previously, the driver incorrectly used rx_dropped to report device buffer exhaustion. According to the documentation, rx_dropped should not be used to count packets dropped due to buffer exhaustion, which is the purpose of rx_missed_errors. Use rx_missed_errors as intended for counting packets dropped due to buffer exhaustion. Fixes: 269e6b3af3bf ("net/mlx5e: Report additional error statistics in get stats ndo") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx5e: Do not use ptp structure for tx ts stats when not initializedRahul Rameshbabu2024-05-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ptp channel instance is only initialized when ptp traffic is first processed by the driver. This means that there is a window in between when port timestamping is enabled and ptp traffic is sent where the ptp channel instance is not initialized. Accessing statistics during this window will lead to an access violation (NULL + member offset). Check the validity of the instance before attempting to query statistics. BUG: unable to handle page fault for address: 0000000000003524 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 109dfc067 P4D 109dfc067 PUD 1064ef067 PMD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 420 Comm: ethtool Not tainted 6.9.0-rc2-rrameshbabu+ #245 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/204 RIP: 0010:mlx5e_stats_ts_get+0x4c/0x130 <snip> Call Trace: <TASK> ? show_regs+0x60/0x70 ? __die+0x24/0x70 ? page_fault_oops+0x15f/0x430 ? do_user_addr_fault+0x2c9/0x5c0 ? exc_page_fault+0x63/0x110 ? asm_exc_page_fault+0x27/0x30 ? mlx5e_stats_ts_get+0x4c/0x130 ? mlx5e_stats_ts_get+0x20/0x130 mlx5e_get_ts_stats+0x15/0x20 <snip> Fixes: 3579032c08c1 ("net/mlx5e: Implement ethtool hardware timestamping statistics") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx5e: Fix IPsec tunnel mode offload feature checkRahul Rameshbabu2024-05-241-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Remove faulty check disabling checksum offload and GSO for offload of simple IPsec tunnel L4 traffic. Comment previously describing the deleted code incorrectly claimed the check prevented double tunnel (or three layers of ip headers). Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx5: Use mlx5_ipsec_rx_status_destroy to correctly delete status rulesRahul Rameshbabu2024-05-241-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rx_create no longer allocates a modify_hdr instance that needs to be cleaned up. The mlx5_modify_header_dealloc call will lead to a NULL pointer dereference. A leak in the rules also previously occurred since there are now two rules populated related to status. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 109907067 P4D 109907067 PUD 116890067 PMD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 484 Comm: ip Not tainted 6.9.0-rc2-rrameshbabu+ #254 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:mlx5_modify_header_dealloc+0xd/0x70 <snip> Call Trace: <TASK> ? show_regs+0x60/0x70 ? __die+0x24/0x70 ? page_fault_oops+0x15f/0x430 ? free_to_partial_list.constprop.0+0x79/0x150 ? do_user_addr_fault+0x2c9/0x5c0 ? exc_page_fault+0x63/0x110 ? asm_exc_page_fault+0x27/0x30 ? mlx5_modify_header_dealloc+0xd/0x70 rx_create+0x374/0x590 rx_add_rule+0x3ad/0x500 ? rx_add_rule+0x3ad/0x500 ? mlx5_cmd_exec+0x2c/0x40 ? mlx5_create_ipsec_obj+0xd6/0x200 mlx5e_accel_ipsec_fs_add_rule+0x31/0xf0 mlx5e_xfrm_add_state+0x426/0xc00 <snip> Fixes: 94af50c0a9bb ("net/mlx5e: Unify esw and normal IPsec status table creation/destruction") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx5: Do not query MPIR on embedded CPU functionTariq Toukan2024-05-241-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A proper query to MPIR needs to set the correct value in the depth field. On embedded CPU this value is not necessarily zero. As there is no real use case for multi-PF netdev on the embedded CPU of the smart NIC, block this option. This fixes the following failure: ACCESS_REG(0x805) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x685f19), err(-5) Fixes: 678eb448055a ("net/mlx5: SD, Implement basic query and instantiation") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx5: Lag, do bond only if slaves agree on roce stateMaher Sanalla2024-05-241-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the driver does not enforce that lag bond slaves must have matching roce capabilities. Yet, in mlx5_do_bond(), the driver attempts to enable roce on all vports of the bond slaves, causing the following syndrome when one slave has no roce fw support: mlx5_cmd_out_err:809:(pid 25427): MODIFY_NIC_VPORT_CONTEXT(0×755) op_mod(0×0) failed, status bad parameter(0×3), syndrome (0xc1f678), err(-22) Thus, create HW lag only if bond's slaves agree on roce state, either all slaves have roce support resulting in a roce lag bond, or none do, resulting in a raw eth bond. Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8061Mathieu Othacehe2024-05-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following a similar reinstate for the KSZ8081 and KSZ9031. Older kernels would use the genphy_soft_reset if the PHY did not implement a .soft_reset. The KSZ8061 errata described here: https://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8061-Errata-DS80000688B.pdf and worked around with 232ba3a51c ("net: phy: Micrel KSZ8061: link failure after cable connect") is back again without this soft reset. Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") Tested-by: Karim Ben Houcine <karim.benhoucine@landisgyr.com> Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'net-6.10-rc1' of ↵Linus Torvalds2024-05-2310-106/+42
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Quite smaller than usual. Notably it includes the fix for the unix regression from the past weeks. The TCP window fix will require some follow-up, already queued. Current release - regressions: - af_unix: fix garbage collection of embryos Previous releases - regressions: - af_unix: fix race between GC and receive path - ipv6: sr: fix missing sk_buff release in seg6_input_core - tcp: remove 64 KByte limit for initial tp->rcv_wnd value - eth: r8169: fix rx hangup - eth: lan966x: remove ptp traps in case the ptp is not enabled - eth: ixgbe: fix link breakage vs cisco switches - eth: ice: prevent ethtool from corrupting the channels Previous releases - always broken: - openvswitch: set the skbuff pkt_type for proper pmtud support - tcp: Fix shift-out-of-bounds in dctcp_update_alpha() Misc: - a bunch of selftests stabilization patches" * tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits) r8169: Fix possible ring buffer corruption on fragmented Tx packets. idpf: Interpret .set_channels() input differently ice: Interpret .set_channels() input differently nfc: nci: Fix handling of zero-length payload packets in nci_rx_work() net: relax socket state check at accept time. tcp: remove 64 KByte limit for initial tp->rcv_wnd value net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe() tls: fix missing memory barrier in tls_init net: fec: avoid lock evasion when reading pps_enable Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI" testing: net-drv: use stats64 for testing net: mana: Fix the extra HZ in mana_hwc_send_request net: lan966x: Remove ptp traps in case the ptp is not enabled. openvswitch: Set the skbuff pkt_type for proper pmtud support. selftest: af_unix: Make SCM_RIGHTS into OOB data. af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). selftests/net: use tc rule to filter the na packet ipv6: sr: fix memleak in seg6_hmac_init_algo af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock. ...
| * | r8169: Fix possible ring buffer corruption on fragmented Tx packets.Ken Milmore2024-05-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An issue was found on the RTL8125b when transmitting small fragmented packets, whereby invalid entries were inserted into the transmit ring buffer, subsequently leading to calls to dma_unmap_single() with a null address. This was caused by rtl8169_start_xmit() not noticing changes to nr_frags which may occur when small packets are padded (to work around hardware quirks) in rtl8169_tso_csum_v2(). To fix this, postpone inspecting nr_frags until after any padding has been applied. Fixes: 9020845fb5d6 ("r8169: improve rtl8169_start_xmit") Cc: stable@vger.kernel.org Signed-off-by: Ken Milmore <ken.milmore@gmail.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/27ead18b-c23d-4f49-a020-1fc482c5ac95@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | idpf: Interpret .set_channels() input differentlyLarysa Zaremba2024-05-231-15/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike ice, idpf does not check, if user has requested at least 1 combined channel. Instead, it relies on a check in the core code. Unfortunately, the check does not trigger for us because of the hacky .set_channels() interpretation logic that is not consistent with the core code. This naturally leads to user being able to trigger a crash with an invalid input. This is how: 1. ethtool -l <IFNAME> -> combined: 40 2. ethtool -L <IFNAME> rx 0 tx 0 combined number is not specified, so command becomes {rx_count = 0, tx_count = 0, combined_count = 40}. 3. ethnl_set_channels checks, if there is at least 1 RX and 1 TX channel, comparing (combined_count + rx_count) and (combined_count + tx_count) to zero. Obviously, (40 + 0) is greater than zero, so the core code deems the input OK. 4. idpf interprets `rx 0 tx 0` as 0 channels and tries to proceed with such configuration. The issue has to be solved fundamentally, as current logic is also known to cause AF_XDP problems in ice [0]. Interpret the command in a way that is more consistent with ethtool manual [1] (--show-channels and --set-channels) and new ice logic. Considering that in the idpf driver only the difference between RX and TX queues forms dedicated channels, change the correct way to set number of channels to: ethtool -L <IFNAME> combined 10 /* For symmetric queues */ ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */ [0] https://lore.kernel.org/netdev/20240418095857.2827-1-larysa.zaremba@intel.com/ [1] https://man7.org/linux/man-pages/man8/ethtool.8.html Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | ice: Interpret .set_channels() input differentlyLarysa Zaremba2024-05-231-17/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bug occurs because a safety check guarding AF_XDP-related queues in ethnl_set_channels(), does not trigger. This happens, because kernel and ice driver interpret the ethtool command differently. How the bug occurs: 1. ethtool -l <IFNAME> -> combined: 40 2. Attach AF_XDP to queue 30 3. ethtool -L <IFNAME> rx 15 tx 15 combined number is not specified, so command becomes {rx_count = 15, tx_count = 15, combined_count = 40}. 4. ethnl_set_channels checks, if there are any AF_XDP of queues from the new (combined_count + rx_count) to the old one, so from 55 to 40, check does not trigger. 5. ice interprets `rx 15 tx 15` as 15 combined channels and deletes the queue that AF_XDP is attached to. Interpret the command in a way that is more consistent with ethtool manual [0] (--show-channels and --set-channels). Considering that in the ice driver only the difference between RX and TX queues forms dedicated channels, change the correct way to set number of channels to: ethtool -L <IFNAME> combined 10 /* For symmetric queues */ ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */ [0] https://man7.org/linux/man-pages/man8/ethtool.8.html Fixes: 87324e747fde ("ice: Implement ethtool ops for channels") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()Romain Gantois2024-05-231-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the prueth_probe() function, if one of the calls to emac_phy_connect() fails due to of_phy_connect() returning NULL, then the subsequent call to phy_attached_info() will dereference a NULL pointer. Check the return code of emac_phy_connect and fail cleanly if there is an error. Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver") Cc: stable@vger.kernel.org Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Link: https://lore.kernel.org/r/20240521-icssg-prueth-fix-v1-1-b4b17b1433e9@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | net: fec: avoid lock evasion when reading pps_enableWei Fang2024-05-231-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assignment of pps_enable is protected by tmreg_lock, but the read operation of pps_enable is not. So the Coverity tool reports a lock evasion warning which may cause data race to occur when running in a multithread environment. Although this issue is almost impossible to occur, we'd better fix it, at least it seems more logically reasonable, and it also prevents Coverity from continuing to issue warnings. Fixes: 278d24047891 ("net: fec: ptp: Enable PPS output based on ptp clock") Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20240521023800.17102-1-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI"Jacob Keller2024-05-232-56/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 565736048bd5f9888990569993c6b6bfdf6dcb6d. According to the commit, it implements a manual AN-37 for some "troublesome" Juniper MX5 switches. This appears to be a workaround for a particular switch. It has been reported that this causes a severe breakage for other switches, including a Cisco 3560CX-12PD-S. The code appears to be a workaround for a specific switch which fails to link in SFI mode. It expects to see AN-37 auto negotiation in order to link. The Cisco switch is not expecting AN-37 auto negotiation. When the device starts the manual AN-37, the Cisco switch decides that the port is confused and stops attempting to link with it. This persists until a power cycle. A simple driver unload and reload does not resolve the issue, even if loading with a version of the driver which lacks this workaround. The authors of the workaround commit have not responded with clarifications, and the result of the workaround is complete failure to connect with other switches. This appears to be a case where the driver can either "correctly" link with the Juniper MX5 switch, at the cost of bricking the link with the Cisco switch, or it can behave properly for the Cisco switch, but fail to link with the Junipir MX5 switch. I do not know enough about the standards involved to clearly determine whether either switch is at fault or behaving incorrectly. Nor do I know whether there exists some alternative fix which corrects behavior with both switches. Revert the workaround for the Juniper switch. Fixes: 565736048bd5 ("ixgbe: Manual AN-37 for troublesome link partners for X550 SFI") Link: https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@intel.com/T/ Link: https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.135129/#post-612291 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Jeff Daly <jeffd@silicom-usa.com> Cc: kernel.org-fo5k2w@ycharbi.fr Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240520-net-2024-05-20-revert-silicom-switch-workaround-v1-1-50f80f261c94@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | net: mana: Fix the extra HZ in mana_hwc_send_requestSouradeep Chakrabarti2024-05-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 62c1bff593b7 added an extra HZ along with msecs_to_jiffies. This patch fixes that. Cc: stable@vger.kernel.org Fixes: 62c1bff593b7 ("net: mana: Configure hwc timeout from hardware") Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1716185104-31658-1-git-send-email-schakrabarti@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | net: lan966x: Remove ptp traps in case the ptp is not enabled.Horatiu Vultur2024-05-221-3/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lan966x is adding ptp traps to redirect the ptp frames to the CPU such that the HW will not forward these frames anywhere. The issue is that in case ptp is not enabled and the timestamping source is et to HWTSTAMP_SOURCE_NETDEV then these traps would not be removed on the error path. Fix this by removing the traps in this case as they are not needed. Fixes: 54e1ed69c40a ("net: lan966x: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20240517135808.3025435-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * Revert "r8169: don't try to disable interrupts if NAPI is, scheduled already"Heiner Kallweit2024-05-211-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9. Ken reported that RTL8125b can lock up if gro_flush_timeout has the default value of 20000 and napi_defer_hard_irqs is set to 0. In this scenario device interrupts aren't disabled, what seems to trigger some silicon bug under heavy load. I was able to reproduce this behavior on RTL8168h. Fix this by reverting 7274c4147afb. Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already") Cc: stable@vger.kernel.org Reported-by: Ken Milmore <ken.milmore@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/9b5b6f4c-4f54-4b90-b0b3-8d8023c2e780@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * net: Always descend into dsa/ folder with CONFIG_NET_DSA enabledFlorian Fainelli2024-05-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stephen reported that he was unable to get the dsa_loop driver to get probed, and the reason ended up being because he had CONFIG_FIXED_PHY=y in his kernel configuration. As Masahiro explained it: "obj-m += dsa/" means everything under dsa/ must be modular. If there is a built-in object under dsa/ with CONFIG_NET_DSA=m, you cannot do "obj-$(CONFIG_NET_DSA) += dsa/". You need to change it back to "obj-y += dsa/". This was the case here whereby CONFIG_NET_DSA=m, and so the obj-$(CONFIG_FIXED_PHY) += dsa_loop_bdinfo.o rule is not executed and the DSA loop mdio_board info structure is not registered with the kernel, and eventually the device is simply not found. To preserve the intention of the original commit of limiting the amount of folder descending, conditionally descend into drivers/net/dsa when CONFIG_NET_DSA is enabled. Fixes: 227d72063fcc ("dsa: simplify Kconfig symbols and dependencies") Reported-by: Stephen Langstaff <stephenlangstaff1@gmail.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'trace-assign-str-v6.10' of ↵Linus Torvalds2024-05-2332-145/+144
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing cleanup from Steven Rostedt: "Remove second argument of __assign_str() The __assign_str() macro logic of the TRACE_EVENT() macro was optimized so that it no longer needs the second argument. The __assign_str() is always matched with __string() field that takes a field name and the source for that field: __string(field, source) The TRACE_EVENT() macro logic will save off the source value and then use that value to copy into the ring buffer via the __assign_str(). Before commit c1fa617caeb0 ("tracing: Rework __assign_str() and __string() to not duplicate getting the string"), the __assign_str() needed the second argument which would perform the same logic as the __string() source parameter did. Not only would this add overhead, but it was error prone as if the __assign_str() source produced something different, it may not have allocated enough for the string in the ring buffer (as the __string() source was used to determine how much to allocate) Now that the __assign_str() just uses the same string that was used in __string() it no longer needs the source parameter. It can now be removed" * tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/treewide: Remove second parameter of __assign_str()
| * | tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)2024-05-2332-145/+144
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
* | | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2024-05-233-3/+0
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-net is finally supported in vduse - virtio (balloon and mem) interaction with suspend is improved - vhost-scsi now handles signals better/faster And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) virtio-pci: Check if is_avq is NULL virtio: delete vq in vp_find_vqs_msix() when request_irq() fails MAINTAINERS: add Eugenio Pérez as reviewer vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API vp_vdpa: don't allocate unused msix vectors sound: virtio: drop owner assignment fuse: virtio: drop owner assignment scsi: virtio: drop owner assignment rpmsg: virtio: drop owner assignment nvdimm: virtio_pmem: drop owner assignment wifi: mac80211_hwsim: drop owner assignment vsock/virtio: drop owner assignment net: 9p: virtio: drop owner assignment net: virtio: drop owner assignment net: caif: virtio: drop owner assignment misc: nsm: drop owner assignment iommu: virtio: drop owner assignment drm/virtio: drop owner assignment gpio: virtio: drop owner assignment firmware: arm_scmi: virtio: drop owner assignment ...
| * | wifi: mac80211_hwsim: drop owner assignmentKrzysztof Kozlowski2024-05-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-20-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | net: virtio: drop owner assignmentKrzysztof Kozlowski2024-05-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-17-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | net: caif: virtio: drop owner assignmentKrzysztof Kozlowski2024-05-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-16-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | | Merge tag 'pci-v6.10-changes' of ↵Linus Torvalds2024-05-2116-51/+48
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Skip E820 checks for MCFG ECAM regions for new (2016+) machines, since there's no requirement to describe them in E820 and some platforms require ECAM to work (Bjorn Helgaas) - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien Le Moal) - Remove last user and pci_enable_device_io() (Heiner Kallweit) - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen) - Skip waiting for devices that have been disconnected while suspended (Ilpo Järvinen) - Clear Secondary Status errors after enumeration since Master Aborts and Unsupported Request errors are an expected part of enumeration (Vidya Sagar) MSI: - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas) Error handling: - Mask Genesys GL975x SD host controller Replay Timer Timeout correctable errors caused by a hardware defect; the errors cause interrupts that prevent system suspend (Kai-Heng Feng) - Fix EDR-related _DSM support, which previously evaluated revision 5 but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan) ASPM: - Simplify link state definitions and mask calculation (Ilpo Järvinen) Power management: - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS apparently doesn't know how to put them back in D0 (Mario Limonciello) CXL: - Support resetting CXL devices; special handling required because CXL Ports mask Secondary Bus Reset by default (Dave Jiang) DOE: - Support DOE Discovery Version 2 (Alexey Kardashevskiy) Endpoint framework: - Set endpoint BAR to be 64-bit if the driver says that's all the device supports, in addition to doing so if the size is >2GB (Niklas Cassel) - Simplify endpoint BAR allocation and setting interfaces (Niklas Cassel) Cadence PCIe controller driver: - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof Kozlowski) Cadence PCIe endpoint driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) Freescale Layerscape PCIe controller driver: - Convert DT binding to YAML (Frank Li) MediaTek MT7621 PCIe controller driver: - Add DT binding missing 'reg' property for child Root Ports (Krzysztof Kozlowski) - Fix theoretical string truncation in PHY name (Sergio Paracuellos) NVIDIA Tegra194 PCIe controller driver: - Return success for endpoint probe instead of falling through to the failure path (Vidya Sagar) Renesas R-Car PCIe controller driver: - Add DT binding missing IOMMU properties (Geert Uytterhoeven) - Add DT binding R-Car V4H compatible for host and endpoint mode (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski) - Set the Subsystem Vendor ID, which was previously zero because it was masked incorrectly (Rick Wertenbroek) Synopsys DesignWare PCIe controller driver: - Restructure DBI register access to accommodate devices where this requires Refclk to be active (Manivannan Sadhasivam) - Remove the deinit() callback, which was only need by the pcie-rcar-gen4, and do it directly in that driver (Manivannan Sadhasivam) - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean up things like eDMA (Manivannan Sadhasivam) - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel to dw_pcie_ep_init() (Manivannan Sadhasivam) - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to reflect the actual functionality (Manivannan Sadhasivam) - Call dw_pcie_ep_init_registers() directly from all the glue drivers, not just those that require active Refclk from the host (Manivannan Sadhasivam) - Remove the "core_init_notifier" flag, which was an obscure way for glue drivers to indicate that they depend on Refclk from the host (Manivannan Sadhasivam) TI J721E PCIe driver: - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli) - Add DT binding J722S SoC support (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Add DT binding missing num-viewport, phys and phy-name properties (Jan Kiszka) Miscellaneous: - Constify and annotate with __ro_after_init (Heiner Kallweit) - Convert DT bindings to YAML (Krzysztof Kozlowski) - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming Zhou)" * tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits) PCI: Do not wait for disconnected devices when resuming x86/pci: Skip early E820 check for ECAM region PCI: Remove unused pci_enable_device_io() ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io() PCI: Update pci_find_capability() stub return types PCI: Remove PCI_IRQ_LEGACY scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios Revert "genirq/msi: Provide constants for PCI/IMS support" Revert "x86/apic/msi: Enable PCI/IMS" Revert "iommu/vt-d: Enable PCI/IMS" Revert "iommu/amd: Enable PCI/IMS" Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support" ...
| * | | wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACYDamien Le Moal2024-05-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY macro. Link: https://lore.kernel.org/r/20240325070944.3600338-21-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> [bhelgaas: split to separate patch] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| * | | wifi: rtw88: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACYDamien Le Moal2024-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY macro. Link: https://lore.kernel.org/r/20240325070944.3600338-21-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> [bhelgaas: split to separate patch] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| * | | wifi: ath10k: Refer to INTX instead of LEGACYDamien Le Moal2024-04-253-30/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To be consistent with the deprecation of PCI_IRQ_LEGACY and its replacement with PCI_IRQ_INTX, rename macros and functions referencing "legacy irq" to instead use the term "intx irq". Link: https://lore.kernel.org/r/20240325070944.3600338-20-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| * | | net: wangxun: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACYDamien Le Moal2024-04-251-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY macro. Link: https://lore.kernel.org/r/20240325070944.3600338-19-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| * | | r8169: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACYDamien Le Moal2024-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY macro. Link: https://lore.kernel.org/r/20240325070944.3600338-18-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Heiner Kallweit <hkallweit1@gmail.com>