author     Linus Torvalds <torvalds@linux-foundation.org>  2024-06-27 19:05:35 +0200
committer  Linus Torvalds <torvalds@linux-foundation.org>  2024-06-27 19:05:35 +0200
commit     fd19d4a492af77b1e8fb0439781a3048d1d1f554 (patch)
tree       c293c1a1218fe87c4b6712938352dbc349d0b68e
parent     Merge tag 'sound-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/t... (diff)
parent     Merge tag 'nf-24-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/netf... (diff)
Merge tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, bpf and netfilter.
There are a bunch of regressions addressed here, but hopefully nothing
spectacular. We are still waiting for the driver fix from Intel, mentioned
by Jakub in the previous networking pull.

Current release - regressions:

  - core: add softirq safety to netdev_rename_lock

  - tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed
    TFO

  - batman-adv: fix RCU race at module unload time

Previous releases - regressions:

  - openvswitch: get related ct labels from its master if it is not
    confirmed

  - eth: bonding: fix incorrect software timestamping report

  - eth: mlxsw: fix memory corruptions on spectrum-4 systems

  - eth: ionic: use dev_consume_skb_any outside of napi

Previous releases - always broken:

  - netfilter: fully validate NFT_DATA_VALUE on store to data registers

  - unix: several fixes for OoB data

  - tcp: fix race for duplicate reqsk on identical SYN

  - bpf:
    - fix may_goto with negative offset
    - fix the corner case with may_goto and jump to the 1st insn
    - fix overrunning reservations in ringbuf

  - can:
    - j1939: recover socket queue on CAN bus error during BAM
      transmission
    - mcp251xfd: fix infinite loop when xmit fails

  - dsa: microchip: monitor potential faults in half-duplex mode

  - eth: vxlan: pull inner IP header in vxlan_xmit_one()

  - eth: ionic: fix kernel panic due to multi-buffer handling

Misc:

  - selftest: unix tests refactor and a lot of new cases added"
* tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
net: mana: Fix possible double free in error handling path
selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c.
af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head.
selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c
selftest: af_unix: Check SIGURG after every send() in msg_oob.c
selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c
af_unix: Don't stop recv() at consumed ex-OOB skb.
selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c.
af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.
af_unix: Stop recv(MSG_PEEK) at consumed OOB skb.
selftest: af_unix: Add msg_oob.c.
selftest: af_unix: Remove test_unix_oob.c.
tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset()
netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
net: usb: qmi_wwan: add Telit FN912 compositions
tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
ionic: use dev_consume_skb_any outside of napi
net: dsa: microchip: fix wrong register write when masking interrupt
Fix race for duplicate reqsk on identical SYN
ibmvnic: Add tx check to prevent skb leak
...
63 files changed, 1663 insertions, 594 deletions
diff --git a/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml b/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml index c80c880a9dab..60aaf30d68ed 100644 --- a/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml +++ b/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml @@ -128,7 +128,6 @@ required: - cell-index - reg - fsl,fman-ports - - ptp-timer dependencies: pcs-handle-names: diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml index 00dc61358be8..4510e8d1adcb 100644 --- a/Documentation/netlink/specs/ethtool.yaml +++ b/Documentation/netlink/specs/ethtool.yaml @@ -1603,7 +1603,7 @@ operations: attributes: - header reply: - attributes: &pse + attributes: - header - podl-pse-admin-state - podl-pse-admin-control @@ -1620,7 +1620,10 @@ operations: do: request: - attributes: *pse + attributes: + - header + - podl-pse-admin-control + - c33-pse-admin-control - name: rss-get doc: Get RSS params. diff --git a/MAINTAINERS b/MAINTAINERS index db9bb0ce3043..0748d6bd0c4f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4083,12 +4083,13 @@ F: kernel/bpf/ringbuf.c BPF [SECURITY & LSM] (Security Audit and Enforcement using BPF) M: KP Singh <kpsingh@kernel.org> -R: Matt Bobrowski <mattbobrowski@google.com> +M: Matt Bobrowski <mattbobrowski@google.com> L: bpf@vger.kernel.org S: Maintained F: Documentation/bpf/prog_lsm.rst F: include/linux/bpf_lsm.h F: kernel/bpf/bpf_lsm.c +F: kernel/trace/bpf_trace.c F: security/bpf/ BPF [SELFTESTS] (Test Runners & Infrastructure) @@ -17531,7 +17532,6 @@ F: include/linux/peci.h PENSANDO ETHERNET DRIVERS M: Shannon Nelson <shannon.nelson@amd.com> M: Brett Creeley <brett.creeley@amd.com> -M: drivers@pensando.io L: netdev@vger.kernel.org S: Supported F: Documentation/networking/device_drivers/ethernet/pensando/ionic.rst diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 3c3fcce4acd4..d19aabf5d4fb 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -5773,6 +5773,9 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev, if (real_dev) { ret = ethtool_get_ts_info_by_layer(real_dev, info); } else { + info->phc_index = -1; + info->so_timestamping = SOF_TIMESTAMPING_RX_SOFTWARE | + SOF_TIMESTAMPING_SOFTWARE; /* Check if all slaves support software tx timestamping */ rcu_read_lock(); bond_for_each_slave_rcu(bond, slave, iter) { diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c index 1d9057dc44f2..bf1589aef1fc 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c @@ -1618,11 +1618,20 @@ static int mcp251xfd_open(struct net_device *ndev) clear_bit(MCP251XFD_FLAGS_DOWN, priv->flags); can_rx_offload_enable(&priv->offload); + priv->wq = alloc_ordered_workqueue("%s-mcp251xfd_wq", + WQ_FREEZABLE | WQ_MEM_RECLAIM, + dev_name(&spi->dev)); + if (!priv->wq) { + err = -ENOMEM; + goto out_can_rx_offload_disable; + } + INIT_WORK(&priv->tx_work, mcp251xfd_tx_obj_write_sync); + err = request_threaded_irq(spi->irq, NULL, mcp251xfd_irq, IRQF_SHARED | IRQF_ONESHOT, dev_name(&spi->dev), priv); if (err) - goto out_can_rx_offload_disable; + goto out_destroy_workqueue; err = mcp251xfd_chip_interrupts_enable(priv); if (err) @@ -1634,6 +1643,8 @@ static int mcp251xfd_open(struct net_device *ndev) out_free_irq: free_irq(spi->irq, priv); + out_destroy_workqueue: + destroy_workqueue(priv->wq); out_can_rx_offload_disable: 
can_rx_offload_disable(&priv->offload); set_bit(MCP251XFD_FLAGS_DOWN, priv->flags); @@ -1661,6 +1672,7 @@ static int mcp251xfd_stop(struct net_device *ndev) hrtimer_cancel(&priv->tx_irq_timer); mcp251xfd_chip_interrupts_disable(priv); free_irq(ndev->irq, priv); + destroy_workqueue(priv->wq); can_rx_offload_disable(&priv->offload); mcp251xfd_timestamp_stop(priv); mcp251xfd_chip_stop(priv, CAN_STATE_STOPPED); diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c index 160528d3cc26..b1de8052a45c 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c @@ -131,6 +131,39 @@ mcp251xfd_tx_obj_from_skb(const struct mcp251xfd_priv *priv, tx_obj->xfer[0].len = len; } +static void mcp251xfd_tx_failure_drop(const struct mcp251xfd_priv *priv, + struct mcp251xfd_tx_ring *tx_ring, + int err) +{ + struct net_device *ndev = priv->ndev; + struct net_device_stats *stats = &ndev->stats; + unsigned int frame_len = 0; + u8 tx_head; + + tx_ring->head--; + stats->tx_dropped++; + tx_head = mcp251xfd_get_tx_head(tx_ring); + can_free_echo_skb(ndev, tx_head, &frame_len); + netdev_completed_queue(ndev, 1, frame_len); + netif_wake_queue(ndev); + + if (net_ratelimit()) + netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err); +} + +void mcp251xfd_tx_obj_write_sync(struct work_struct *work) +{ + struct mcp251xfd_priv *priv = container_of(work, struct mcp251xfd_priv, + tx_work); + struct mcp251xfd_tx_obj *tx_obj = priv->tx_work_obj; + struct mcp251xfd_tx_ring *tx_ring = priv->tx; + int err; + + err = spi_sync(priv->spi, &tx_obj->msg); + if (err) + mcp251xfd_tx_failure_drop(priv, tx_ring, err); +} + static int mcp251xfd_tx_obj_write(const struct mcp251xfd_priv *priv, struct mcp251xfd_tx_obj *tx_obj) { @@ -162,6 +195,11 @@ static bool mcp251xfd_tx_busy(const struct mcp251xfd_priv *priv, return false; } +static bool mcp251xfd_work_busy(struct work_struct *work) +{ + return work_busy(work); +} + netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb, struct net_device *ndev) { @@ -175,7 +213,8 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb, if (can_dev_dropped_skb(ndev, skb)) return NETDEV_TX_OK; - if (mcp251xfd_tx_busy(priv, tx_ring)) + if (mcp251xfd_tx_busy(priv, tx_ring) || + mcp251xfd_work_busy(&priv->tx_work)) return NETDEV_TX_BUSY; tx_obj = mcp251xfd_get_tx_obj_next(tx_ring); @@ -193,13 +232,13 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb, netdev_sent_queue(priv->ndev, frame_len); err = mcp251xfd_tx_obj_write(priv, tx_obj); - if (err) - goto out_err; - - return NETDEV_TX_OK; - - out_err: - netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err); + if (err == -EBUSY) { + netif_stop_queue(ndev); + priv->tx_work_obj = tx_obj; + queue_work(priv->wq, &priv->tx_work); + } else if (err) { + mcp251xfd_tx_failure_drop(priv, tx_ring, err); + } return NETDEV_TX_OK; } diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h index 24510b3b8020..b35bfebd23f2 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h @@ -633,6 +633,10 @@ struct mcp251xfd_priv { struct mcp251xfd_rx_ring *rx[MCP251XFD_FIFO_RX_NUM]; struct mcp251xfd_tx_ring tx[MCP251XFD_FIFO_TX_NUM]; + struct workqueue_struct *wq; + struct work_struct tx_work; + struct mcp251xfd_tx_obj *tx_work_obj; + DECLARE_BITMAP(flags, __MCP251XFD_FLAGS_SIZE__); u8 rx_ring_num; @@ -952,6 +956,7 @@ void mcp251xfd_skb_set_timestamp(const struct mcp251xfd_priv *priv, void 
mcp251xfd_timestamp_init(struct mcp251xfd_priv *priv); void mcp251xfd_timestamp_stop(struct mcp251xfd_priv *priv); +void mcp251xfd_tx_obj_write_sync(struct work_struct *work); netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb, struct net_device *ndev); diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c index 8faf8a462c05..7292c81fc0cd 100644 --- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c +++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c @@ -294,7 +294,7 @@ int kvaser_usb_send_cmd_async(struct kvaser_usb_net_priv *priv, void *cmd, } usb_free_urb(urb); - return 0; + return err; } int kvaser_usb_can_rx_over_error(struct net_device *netdev) diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c index f8ad7833f5d9..425e20daf1e9 100644 --- a/drivers/net/dsa/microchip/ksz9477.c +++ b/drivers/net/dsa/microchip/ksz9477.c @@ -355,10 +355,8 @@ int ksz9477_reset_switch(struct ksz_device *dev) SPI_AUTO_EDGE_DETECTION, 0); /* default configuration */ - ksz_read8(dev, REG_SW_LUE_CTRL_1, &data8); - data8 = SW_AGING_ENABLE | SW_LINK_AUTO_AGING | - SW_SRC_ADDR_FILTER | SW_FLUSH_STP_TABLE | SW_FLUSH_MSTP_TABLE; - ksz_write8(dev, REG_SW_LUE_CTRL_1, data8); + ksz_write8(dev, REG_SW_LUE_CTRL_1, + SW_AGING_ENABLE | SW_LINK_AUTO_AGING | SW_SRC_ADDR_FILTER); /* disable interrupts */ ksz_write32(dev, REG_SW_INT_MASK__4, SWITCH_INT_MASK); @@ -429,6 +427,57 @@ void ksz9477_freeze_mib(struct ksz_device *dev, int port, bool freeze) mutex_unlock(&p->mib.cnt_mutex); } +int ksz9477_errata_monitor(struct ksz_device *dev, int port, + u64 tx_late_col) +{ + u32 pmavbc; + u8 status; + u16 pqm; + int ret; + + ret = ksz_pread8(dev, port, REG_PORT_STATUS_0, &status); + if (ret) + return ret; + if (!(FIELD_GET(PORT_INTF_SPEED_MASK, status) == PORT_INTF_SPEED_NONE) && + !(status & PORT_INTF_FULL_DUPLEX)) { + /* Errata DS80000754 recommends monitoring potential faults in + * half-duplex mode. The switch might not be able to communicate anymore + * in these states. + * If you see this message, please read the errata-sheet for more information: + * https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ9477S-Errata-DS80000754.pdf + * To workaround this issue, half-duplex mode should be avoided. + * A software reset could be implemented to recover from this state. 
+ */ + dev_warn_once(dev->dev, + "Half-duplex detected on port %d, transmission halt may occur\n", + port); + if (tx_late_col != 0) { + /* Transmission halt with late collisions */ + dev_crit_once(dev->dev, + "TX late collisions detected, transmission may be halted on port %d\n", + port); + } + ret = ksz_read8(dev, REG_SW_LUE_CTRL_0, &status); + if (ret) + return ret; + if (status & SW_VLAN_ENABLE) { + ret = ksz_pread16(dev, port, REG_PORT_QM_TX_CNT_0__4, &pqm); + if (ret) + return ret; + ret = ksz_read32(dev, REG_PMAVBC, &pmavbc); + if (ret) + return ret; + if ((FIELD_GET(PMAVBC_MASK, pmavbc) <= PMAVBC_MIN) || + (FIELD_GET(PORT_QM_TX_CNT_M, pqm) >= PORT_QM_TX_CNT_MAX)) { + /* Transmission halt with Half-Duplex and VLAN */ + dev_crit_once(dev->dev, + "resources out of limits, transmission may be halted\n"); + } + } + } + return ret; +} + void ksz9477_port_init_cnt(struct ksz_device *dev, int port) { struct ksz_port_mib *mib = &dev->ports[port].mib; @@ -1299,6 +1348,10 @@ int ksz9477_setup(struct dsa_switch *ds) /* Enable REG_SW_MTU__2 reg by setting SW_JUMBO_PACKET */ ksz_cfg(dev, REG_SW_MAC_CTRL_1, SW_JUMBO_PACKET, true); + /* Use collision based back pressure mode. */ + ksz_cfg(dev, REG_SW_MAC_CTRL_1, SW_BACK_PRESSURE, + SW_BACK_PRESSURE_COLLISION); + /* Now we can configure default MTU value */ ret = regmap_update_bits(ksz_regmap_16(dev), REG_SW_MTU__2, REG_SW_MTU_MASK, VLAN_ETH_FRAME_LEN + ETH_FCS_LEN); diff --git a/drivers/net/dsa/microchip/ksz9477.h b/drivers/net/dsa/microchip/ksz9477.h index ce1e656b800b..239a281da10b 100644 --- a/drivers/net/dsa/microchip/ksz9477.h +++ b/drivers/net/dsa/microchip/ksz9477.h @@ -36,6 +36,8 @@ int ksz9477_port_mirror_add(struct ksz_device *dev, int port, bool ingress, struct netlink_ext_ack *extack); void ksz9477_port_mirror_del(struct ksz_device *dev, int port, struct dsa_mall_mirror_tc_entry *mirror); +int ksz9477_errata_monitor(struct ksz_device *dev, int port, + u64 tx_late_col); void ksz9477_get_caps(struct ksz_device *dev, int port, struct phylink_config *config); int ksz9477_fdb_dump(struct ksz_device *dev, int port, diff --git a/drivers/net/dsa/microchip/ksz9477_reg.h b/drivers/net/dsa/microchip/ksz9477_reg.h index f3a205ee483f..d5354c600ea1 100644 --- a/drivers/net/dsa/microchip/ksz9477_reg.h +++ b/drivers/net/dsa/microchip/ksz9477_reg.h @@ -247,6 +247,7 @@ #define REG_SW_MAC_CTRL_1 0x0331 #define SW_BACK_PRESSURE BIT(5) +#define SW_BACK_PRESSURE_COLLISION 0 #define FAIR_FLOW_CTRL BIT(4) #define NO_EXC_COLLISION_DROP BIT(3) #define SW_JUMBO_PACKET BIT(2) @@ -842,8 +843,8 @@ #define REG_PORT_STATUS_0 0x0030 -#define PORT_INTF_SPEED_M 0x3 -#define PORT_INTF_SPEED_S 3 +#define PORT_INTF_SPEED_MASK GENMASK(4, 3) +#define PORT_INTF_SPEED_NONE GENMASK(1, 0) #define PORT_INTF_FULL_DUPLEX BIT(2) #define PORT_TX_FLOW_CTRL BIT(1) #define PORT_RX_FLOW_CTRL BIT(0) @@ -1167,6 +1168,11 @@ #define PORT_RMII_CLK_SEL BIT(7) #define PORT_MII_SEL_EDGE BIT(5) +#define REG_PMAVBC 0x03AC + +#define PMAVBC_MASK GENMASK(26, 16) +#define PMAVBC_MIN 0x580 + /* 4 - MAC */ #define REG_PORT_MAC_CTRL_0 0x0400 @@ -1494,6 +1500,7 @@ #define PORT_QM_TX_CNT_USED_S 0 #define PORT_QM_TX_CNT_M (BIT(11) - 1) +#define PORT_QM_TX_CNT_MAX 0x200 #define REG_PORT_QM_TX_CNT_1__4 0x0A14 diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index 2818e24e2a51..0580b2fee21c 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -1382,6 +1382,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { 
.tc_cbs_supported = true, .ops = &ksz9477_dev_ops, .phylink_mac_ops = &ksz9477_phylink_mac_ops, + .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), .reg_mib_cnt = MIB_COUNTER_NUM, @@ -1416,6 +1417,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .num_ipms = 8, .ops = &ksz9477_dev_ops, .phylink_mac_ops = &ksz9477_phylink_mac_ops, + .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), .reg_mib_cnt = MIB_COUNTER_NUM, @@ -1450,6 +1452,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .num_ipms = 8, .ops = &ksz9477_dev_ops, .phylink_mac_ops = &ksz9477_phylink_mac_ops, + .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), .reg_mib_cnt = MIB_COUNTER_NUM, @@ -1540,6 +1543,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .tc_cbs_supported = true, .ops = &ksz9477_dev_ops, .phylink_mac_ops = &ksz9477_phylink_mac_ops, + .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), .reg_mib_cnt = MIB_COUNTER_NUM, @@ -1820,6 +1824,7 @@ void ksz_r_mib_stats64(struct ksz_device *dev, int port) struct rtnl_link_stats64 *stats; struct ksz_stats_raw *raw; struct ksz_port_mib *mib; + int ret; mib = &dev->ports[port].mib; stats = &mib->stats64; @@ -1861,6 +1866,12 @@ void ksz_r_mib_stats64(struct ksz_device *dev, int port) pstats->rx_pause_frames = raw->rx_pause; spin_unlock(&mib->stats64_lock); + + if (dev->info->phy_errata_9477) { + ret = ksz9477_errata_monitor(dev, port, raw->tx_late_col); + if (ret) + dev_err(dev->dev, "Failed to monitor transmission halt\n"); + } } void ksz88xx_r_mib_stats64(struct ksz_device *dev, int port) @@ -2185,7 +2196,7 @@ static void ksz_irq_bus_sync_unlock(struct irq_data *d) struct ksz_device *dev = kirq->dev; int ret; - ret = ksz_write32(dev, kirq->reg_mask, kirq->masked); + ret = ksz_write8(dev, kirq->reg_mask, kirq->masked); if (ret) dev_err(dev->dev, "failed to change IRQ mask\n"); diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h index c784fd23a993..ee7db46e469d 100644 --- a/drivers/net/dsa/microchip/ksz_common.h +++ b/drivers/net/dsa/microchip/ksz_common.h @@ -66,6 +66,7 @@ struct ksz_chip_data { bool tc_cbs_supported; const struct ksz_dev_ops *ops; const struct phylink_mac_ops *phylink_mac_ops; + bool phy_errata_9477; bool ksz87xx_eee_link_erratum; const struct ksz_mib_names *mib_names; int mib_cnt; diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 5e9a93bdb518..23ebeb143987 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2482,6 +2482,18 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) (tx_pool->consumer_index + 1) % tx_pool->num_buffers; tx_buff = &tx_pool->tx_buff[bufidx]; + + /* Sanity checks on our free map to make sure it points to an index + * that is not being occupied by another skb. If skb memory is + * not freed then we see congestion control kick in and halt tx. + */ + if (unlikely(tx_buff->skb)) { + dev_warn_ratelimited(dev, "TX free map points to untracked skb (%s %d idx=%d)\n", + skb_is_gso(skb) ? 
"tso_pool" : "tx_pool", + queue_num, bufidx); + dev_kfree_skb_any(tx_buff->skb); + } + tx_buff->skb = skb; tx_buff->index = bufidx; tx_buff->pool_index = queue_num; @@ -4061,6 +4073,12 @@ static void release_sub_crqs(struct ibmvnic_adapter *adapter, bool do_h_free) adapter->num_active_tx_scrqs = 0; } + /* Clean any remaining outstanding SKBs + * we freed the irq so we won't be hearing + * from them + */ + clean_tx_pools(adapter); + if (adapter->rx_scrq) { for (i = 0; i < adapter->num_active_rx_scrqs; i++) { if (!adapter->rx_scrq[i]) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 1766230abfff..55a42aad92a5 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -4139,7 +4139,7 @@ bool ice_is_wol_supported(struct ice_hw *hw) int ice_vsi_recfg_qs(struct ice_vsi *vsi, int new_rx, int new_tx, bool locked) { struct ice_pf *pf = vsi->back; - int err = 0, timeout = 50; + int i, err = 0, timeout = 50; if (!new_rx && !new_tx) return -EINVAL; @@ -4165,6 +4165,14 @@ int ice_vsi_recfg_qs(struct ice_vsi *vsi, int new_rx, int new_tx, bool locked) ice_vsi_close(vsi); ice_vsi_rebuild(vsi, ICE_VSI_FLAG_NO_INIT); + + ice_for_each_traffic_class(i) { + if (vsi->tc_cfg.ena_tc & BIT(i)) + netdev_set_tc_queue(vsi->netdev, + vsi->tc_cfg.tc_info[i].netdev_tc, + vsi->tc_cfg.tc_info[i].qcount_tx, + vsi->tc_cfg.tc_info[i].qoffset); + } ice_pf_dcb_recfg(pf, locked); ice_vsi_open(vsi); done: diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 671368d2c77e..9adf4301c9b1 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -6907,6 +6907,7 @@ static int mvpp2_port_probe(struct platform_device *pdev, /* 9704 == 9728 - 20 and rounding to 8 */ dev->max_mtu = MVPP2_BM_JUMBO_PKT_SIZE; device_set_node(&dev->dev, port_fwnode); + dev->dev_port = port->id; port->pcs_gmac.ops = &mvpp2_phylink_gmac_pcs_ops; port->pcs_gmac.neg_mode = true; diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c index a85ac039d779..87d5776e3b88 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c @@ -648,14 +648,14 @@ int otx2_txschq_config(struct otx2_nic *pfvf, int lvl, int prio, bool txschq_for } else if (lvl == NIX_TXSCH_LVL_TL4) { parent = schq_list[NIX_TXSCH_LVL_TL3][prio]; req->reg[0] = NIX_AF_TL4X_PARENT(schq); - req->regval[0] = parent << 16; + req->regval[0] = (u64)parent << 16; req->num_regs++; req->reg[1] = NIX_AF_TL4X_SCHEDULE(schq); req->regval[1] = dwrr_val; } else if (lvl == NIX_TXSCH_LVL_TL3) { parent = schq_list[NIX_TXSCH_LVL_TL2][prio]; req->reg[0] = NIX_AF_TL3X_PARENT(schq); - req->regval[0] = parent << 16; + req->regval[0] = (u64)parent << 16; req->num_regs++; req->reg[1] = NIX_AF_TL3X_SCHEDULE(schq); req->regval[1] = dwrr_val; @@ -670,11 +670,11 @@ int otx2_txschq_config(struct otx2_nic *pfvf, int lvl, int prio, bool txschq_for } else if (lvl == NIX_TXSCH_LVL_TL2) { parent = schq_list[NIX_TXSCH_LVL_TL1][prio]; req->reg[0] = NIX_AF_TL2X_PARENT(schq); - req->regval[0] = parent << 16; + req->regval[0] = (u64)parent << 16; req->num_regs++; req->reg[1] = NIX_AF_TL2X_SCHEDULE(schq); - req->regval[1] = TXSCH_TL1_DFLT_RR_PRIO << 24 | dwrr_val; + req->regval[1] = (u64)hw->txschq_aggr_lvl_rr_prio << 24 | dwrr_val; if (lvl == hw->txschq_link_cfg_lvl) { 
req->num_regs++; @@ -698,7 +698,7 @@ int otx2_txschq_config(struct otx2_nic *pfvf, int lvl, int prio, bool txschq_for req->num_regs++; req->reg[1] = NIX_AF_TL1X_TOPOLOGY(schq); - req->regval[1] = (TXSCH_TL1_DFLT_RR_PRIO << 1); + req->regval[1] = hw->txschq_aggr_lvl_rr_prio << 1; req->num_regs++; req->reg[2] = NIX_AF_TL1X_CIR(schq); diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_reg.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_reg.h index 45a32e4b49d1..e3aee6e36215 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_reg.h +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_reg.h @@ -139,33 +139,34 @@ #define NIX_LF_CINTX_ENA_W1C(a) (NIX_LFBASE | 0xD50 | (a) << 12) /* NIX AF transmit scheduler registers */ -#define NIX_AF_SMQX_CFG(a) (0x700 | (a) << 16) -#define NIX_AF_TL1X_SCHEDULE(a) (0xC00 | (a) << 16) -#define NIX_AF_TL1X_CIR(a) (0xC20 | (a) << 16) -#define NIX_AF_TL1X_TOPOLOGY(a) (0xC80 | (a) << 16) -#define NIX_AF_TL2X_PARENT(a) (0xE88 | (a) << 16) -#define NIX_AF_TL2X_SCHEDULE(a) (0xE00 | (a) << 16) -#define NIX_AF_TL2X_TOPOLOGY(a) (0xE80 | (a) << 16) -#define NIX_AF_TL2X_CIR(a) (0xE20 | (a) << 16) -#define NIX_AF_TL2X_PIR(a) (0xE30 | (a) << 16) -#define NIX_AF_TL3X_PARENT(a) (0x1088 | (a) << 16) -#define NIX_AF_TL3X_SCHEDULE(a) (0x1000 | (a) << 16) -#define NIX_AF_TL3X_SHAPE(a) (0x1010 | (a) << 16) -#define NIX_AF_TL3X_CIR(a) (0x1020 | (a) << 16) -#define NIX_AF_TL3X_PIR(a) (0x1030 | (a) << 16) -#define NIX_AF_TL3X_TOPOLOGY(a) (0x1080 | (a) << 16) -#define NIX_AF_TL4X_PARENT(a) (0x1288 | (a) << 16) -#define NIX_AF_TL4X_SCHEDULE(a) (0x1200 | (a) << 16) -#define NIX_AF_TL4X_SHAPE(a) (0x1210 | (a) << 16) -#define NIX_AF_TL4X_CIR(a) (0x1220 | (a) << 16) -#define NIX_AF_TL4X_PIR(a) (0x1230 | (a) << 16) -#define NIX_AF_TL4X_TOPOLOGY(a) (0x1280 | (a) << 16) -#define NIX_AF_MDQX_SCHEDULE(a) (0x1400 | (a) << 16) -#define NIX_AF_MDQX_SHAPE(a) (0x1410 | (a) << 16) -#define NIX_AF_MDQX_CIR(a) (0x1420 | (a) << 16) -#define NIX_AF_MDQX_PIR(a) (0x1430 | (a) << 16) -#define NIX_AF_MDQX_PARENT(a) (0x1480 | (a) << 16) -#define NIX_AF_TL3_TL2X_LINKX_CFG(a, b) (0x1700 | (a) << 16 | (b) << 3) +#define NIX_AF_SMQX_CFG(a) (0x700 | (u64)(a) << 16) +#define NIX_AF_TL4X_SDP_LINK_CFG(a) (0xB10 | (u64)(a) << 16) +#define NIX_AF_TL1X_SCHEDULE(a) (0xC00 | (u64)(a) << 16) +#define NIX_AF_TL1X_CIR(a) (0xC20 | (u64)(a) << 16) +#define NIX_AF_TL1X_TOPOLOGY(a) (0xC80 | (u64)(a) << 16) +#define NIX_AF_TL2X_PARENT(a) (0xE88 | (u64)(a) << 16) +#define NIX_AF_TL2X_SCHEDULE(a) (0xE00 | (u64)(a) << 16) +#define NIX_AF_TL2X_TOPOLOGY(a) (0xE80 | (u64)(a) << 16) +#define NIX_AF_TL2X_CIR(a) (0xE20 | (u64)(a) << 16) +#define NIX_AF_TL2X_PIR(a) (0xE30 | (u64)(a) << 16) +#define NIX_AF_TL3X_PARENT(a) (0x1088 | (u64)(a) << 16) +#define NIX_AF_TL3X_SCHEDULE(a) (0x1000 | (u64)(a) << 16) +#define NIX_AF_TL3X_SHAPE(a) (0x1010 | (u64)(a) << 16) +#define NIX_AF_TL3X_CIR(a) (0x1020 | (u64)(a) << 16) +#define NIX_AF_TL3X_PIR(a) (0x1030 | (u64)(a) << 16) +#define NIX_AF_TL3X_TOPOLOGY(a) (0x1080 | (u64)(a) << 16) +#define NIX_AF_TL4X_PARENT(a) (0x1288 | (u64)(a) << 16) +#define NIX_AF_TL4X_SCHEDULE(a) (0x1200 | (u64)(a) << 16) +#define NIX_AF_TL4X_SHAPE(a) (0x1210 | (u64)(a) << 16) +#define NIX_AF_TL4X_CIR(a) (0x1220 | (u64)(a) << 16) +#define NIX_AF_TL4X_PIR(a) (0x1230 | (u64)(a) << 16) +#define NIX_AF_TL4X_TOPOLOGY(a) (0x1280 | (u64)(a) << 16) +#define NIX_AF_MDQX_SCHEDULE(a) (0x1400 | (u64)(a) << 16) +#define NIX_AF_MDQX_SHAPE(a) (0x1410 | (u64)(a) << 16) +#define NIX_AF_MDQX_CIR(a) (0x1420 | (u64)(a) << 
16) +#define NIX_AF_MDQX_PIR(a) (0x1430 | (u64)(a) << 16) +#define NIX_AF_MDQX_PARENT(a) (0x1480 | (u64)(a) << 16) +#define NIX_AF_TL3_TL2X_LINKX_CFG(a, b) (0x1700 | (u64)(a) << 16 | (b) << 3) /* LMT LF registers */ #define LMT_LFBASE BIT_ULL(RVU_FUNC_BLKADDR_SHIFT) diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c index 929b4eac25d9..3eb85949677a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c @@ -513,7 +513,7 @@ process_cqe: static void otx2_adjust_adaptive_coalese(struct otx2_nic *pfvf, struct otx2_cq_poll *cq_poll) { - struct dim_sample dim_sample; + struct dim_sample dim_sample = { 0 }; u64 rx_frames, rx_bytes; u64 tx_frames, tx_bytes; diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/qos.c b/drivers/net/ethernet/marvell/octeontx2/nic/qos.c index edac008099c0..0f844c14485a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/qos.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/qos.c @@ -153,7 +153,6 @@ static void __otx2_qos_txschq_cfg(struct otx2_nic *pfvf, num_regs++; otx2_config_sched_shaping(pfvf, node, cfg, &num_regs); - } else if (level == NIX_TXSCH_LVL_TL4) { otx2_config_sched_shaping(pfvf, node, cfg, &num_regs); } else if (level == NIX_TXSCH_LVL_TL3) { @@ -176,7 +175,7 @@ static void __otx2_qos_txschq_cfg(struct otx2_nic *pfvf, /* check if node is root */ if (node->qid == OTX2_QOS_QID_INNER && !node->parent) { cfg->reg[num_regs] = NIX_AF_TL2X_SCHEDULE(node->schq); - cfg->regval[num_regs] = TXSCH_TL1_DFLT_RR_PRIO << 24 | + cfg->regval[num_regs] = (u64)hw->txschq_aggr_lvl_rr_prio << 24 | mtu_to_dwrr_weight(pfvf, pfvf->tx_max_pktlen); num_regs++; diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c index bf66d996e32e..c0ced4d315f3 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c @@ -1594,18 +1594,25 @@ static int mlxsw_pci_sys_ready_wait(struct mlxsw_pci *mlxsw_pci, return -EBUSY; } -static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci) +static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci, + bool pci_reset_sbr_supported) { struct pci_dev *pdev = mlxsw_pci->pdev; char mrsr_pl[MLXSW_REG_MRSR_LEN]; int err; + if (!pci_reset_sbr_supported) { + pci_dbg(pdev, "Performing PCI hot reset instead of \"all reset\"\n"); + goto sbr; + } + mlxsw_reg_mrsr_pack(mrsr_pl, MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE); err = mlxsw_reg_write(mlxsw_pci->core, MLXSW_REG(mrsr), mrsr_pl); if (err) return err; +sbr: device_lock_assert(&pdev->dev); pci_cfg_access_lock(pdev); @@ -1633,6 +1640,7 @@ static int mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id) { struct pci_dev *pdev = mlxsw_pci->pdev; + bool pci_reset_sbr_supported = false; char mcam_pl[MLXSW_REG_MCAM_LEN]; bool pci_reset_supported = false; u32 sys_status; @@ -1652,13 +1660,17 @@ mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id) mlxsw_reg_mcam_pack(mcam_pl, MLXSW_REG_MCAM_FEATURE_GROUP_ENHANCED_FEATURES); err = mlxsw_reg_query(mlxsw_pci->core, MLXSW_REG(mcam), mcam_pl); - if (!err) + if (!err) { mlxsw_reg_mcam_unpack(mcam_pl, MLXSW_REG_MCAM_PCI_RESET, &pci_reset_supported); + mlxsw_reg_mcam_unpack(mcam_pl, MLXSW_REG_MCAM_PCI_RESET_SBR, + &pci_reset_sbr_supported); + } if (pci_reset_supported) { pci_dbg(pdev, "Starting PCI reset flow\n"); - err = mlxsw_pci_reset_at_pci_disable(mlxsw_pci); + err 
= mlxsw_pci_reset_at_pci_disable(mlxsw_pci, + pci_reset_sbr_supported); } else { pci_dbg(pdev, "Starting software reset flow\n"); err = mlxsw_pci_reset_sw(mlxsw_pci); diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index 8adf86a6f5cc..3bb89045eaf5 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -10671,6 +10671,8 @@ enum mlxsw_reg_mcam_mng_feature_cap_mask_bits { MLXSW_REG_MCAM_MCIA_128B = 34, /* If set, MRSR.command=6 is supported. */ MLXSW_REG_MCAM_PCI_RESET = 48, + /* If set, MRSR.command=6 is supported with Secondary Bus Reset. */ + MLXSW_REG_MCAM_PCI_RESET_SBR = 67, }; #define MLXSW_REG_BYTES_PER_DWORD 0x4 diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c index c9f1c79f3f9d..ba090262e27e 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c @@ -1607,8 +1607,8 @@ static void mlxsw_sp_sb_sr_occ_query_cb(struct mlxsw_core *mlxsw_core, int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core, unsigned int sb_index) { + u16 local_port, local_port_1, first_local_port, last_local_port; struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core); - u16 local_port, local_port_1, last_local_port; struct mlxsw_sp_sb_sr_occ_query_cb_ctx cb_ctx; u8 masked_count, current_page = 0; unsigned long cb_priv = 0; @@ -1628,6 +1628,7 @@ next_batch: masked_count = 0; mlxsw_reg_sbsr_pack(sbsr_pl, false); mlxsw_reg_sbsr_port_page_set(sbsr_pl, current_page); + first_local_port = current_page * MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE; last_local_port = current_page * MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE + MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE - 1; @@ -1645,9 +1646,12 @@ next_batch: if (local_port != MLXSW_PORT_CPU_PORT) { /* Ingress quotas are not supported for the CPU port */ mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl, - local_port, 1); + local_port - first_local_port, + 1); } - mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, local_port, 1); + mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, + local_port - first_local_port, + 1); for (i = 0; i < mlxsw_sp->sb_vals->pool_count; i++) { err = mlxsw_sp_sb_pm_occ_query(mlxsw_sp, local_port, i, &bulk_list); @@ -1684,7 +1688,7 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core, unsigned int sb_index) { struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core); - u16 local_port, last_local_port; + u16 local_port, first_local_port, last_local_port; LIST_HEAD(bulk_list); unsigned int masked_count; u8 current_page = 0; @@ -1702,6 +1706,7 @@ next_batch: masked_count = 0; mlxsw_reg_sbsr_pack(sbsr_pl, true); mlxsw_reg_sbsr_port_page_set(sbsr_pl, current_page); + first_local_port = current_page * MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE; last_local_port = current_page * MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE + MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE - 1; @@ -1719,9 +1724,12 @@ next_batch: if (local_port != MLXSW_PORT_CPU_PORT) { /* Ingress quotas are not supported for the CPU port */ mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl, - local_port, 1); + local_port - first_local_port, + 1); } - mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, local_port, 1); + mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, + local_port - first_local_port, + 1); for (i = 0; i < mlxsw_sp->sb_vals->pool_count; i++) { err = mlxsw_sp_sb_pm_occ_clear(mlxsw_sp, local_port, i, &bulk_list); diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c 
b/drivers/net/ethernet/microsoft/mana/mana_en.c index d087cf954f75..608ad31a9702 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -2798,6 +2798,8 @@ static int add_adev(struct gdma_dev *gd) if (ret) goto init_fail; + /* madev is owned by the auxiliary device */ + madev = NULL; ret = auxiliary_device_add(adev); if (ret) goto add_fail; diff --git a/drivers/net/ethernet/pensando/ionic/ionic_dev.h b/drivers/net/ethernet/pensando/ionic/ionic_dev.h index f30eee4a5a80..b6c01a88098d 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_dev.h +++ b/drivers/net/ethernet/pensando/ionic/ionic_dev.h @@ -375,7 +375,9 @@ typedef void (*ionic_cq_done_cb)(void *done_arg); unsigned int ionic_cq_service(struct ionic_cq *cq, unsigned int work_to_do, ionic_cq_cb cb, ionic_cq_done_cb done_cb, void *done_arg); -unsigned int ionic_tx_cq_service(struct ionic_cq *cq, unsigned int work_to_do); +unsigned int ionic_tx_cq_service(struct ionic_cq *cq, + unsigned int work_to_do, + bool in_napi); int ionic_q_init(struct ionic_lif *lif, struct ionic_dev *idev, struct ionic_queue *q, unsigned int index, const char *name, diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c index 1934e9d6d9e4..1837a30ba08a 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c @@ -1189,7 +1189,7 @@ static int ionic_adminq_napi(struct napi_struct *napi, int budget) ionic_rx_service, NULL, NULL); if (lif->hwstamp_txq) - tx_work = ionic_tx_cq_service(&lif->hwstamp_txq->cq, budget); + tx_work = ionic_tx_cq_service(&lif->hwstamp_txq->cq, budget, !!budget); work_done = max(max(n_work, a_work), max(rx_work, tx_work)); if (work_done < budget && napi_complete_done(napi, work_done)) { diff --git a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c index 2427610f4306..9fdd7cd3ef19 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c @@ -23,7 +23,8 @@ static void ionic_tx_desc_unmap_bufs(struct ionic_queue *q, static void ionic_tx_clean(struct ionic_queue *q, struct ionic_tx_desc_info *desc_info, - struct ionic_txq_comp *comp); + struct ionic_txq_comp *comp, + bool in_napi); static inline void ionic_txq_post(struct ionic_queue *q, bool ring_dbell) { @@ -480,6 +481,20 @@ int ionic_xdp_xmit(struct net_device *netdev, int n, return nxmit; } +static void ionic_xdp_rx_put_bufs(struct ionic_queue *q, + struct ionic_buf_info *buf_info, + int nbufs) +{ + int i; + + for (i = 0; i < nbufs; i++) { + dma_unmap_page(q->dev, buf_info->dma_addr, + IONIC_PAGE_SIZE, DMA_FROM_DEVICE); + buf_info->page = NULL; + buf_info++; + } +} + static bool ionic_run_xdp(struct ionic_rx_stats *stats, struct net_device *netdev, struct bpf_prog *xdp_prog, @@ -493,6 +508,7 @@ static bool ionic_run_xdp(struct ionic_rx_stats *stats, struct netdev_queue *nq; struct xdp_frame *xdpf; int remain_len; + int nbufs = 1; int frag_len; int err = 0; @@ -542,6 +558,7 @@ static bool ionic_run_xdp(struct ionic_rx_stats *stats, if (page_is_pfmemalloc(bi->page)) xdp_buff_set_frag_pfmemalloc(&xdp_buf); } while (remain_len > 0); + nbufs += sinfo->nr_frags; } xdp_action = bpf_prog_run_xdp(xdp_prog, &xdp_buf); @@ -574,9 +591,6 @@ static bool ionic_run_xdp(struct ionic_rx_stats *stats, goto out_xdp_abort; } - dma_unmap_page(rxq->dev, buf_info->dma_addr, - IONIC_PAGE_SIZE, DMA_FROM_DEVICE); - err = 
ionic_xdp_post_frame(txq, xdpf, XDP_TX, buf_info->page, buf_info->page_offset, @@ -586,23 +600,19 @@ static bool ionic_run_xdp(struct ionic_rx_stats *stats, netdev_dbg(netdev, "tx ionic_xdp_post_frame err %d\n", err); goto out_xdp_abort; } - buf_info->page = NULL; + ionic_xdp_rx_put_bufs(rxq, buf_info, nbufs); stats->xdp_tx++; /* the Tx completion will free the buffers */ break; case XDP_REDIRECT: - /* unmap the pages before handing them to a different device */ - dma_unmap_page(rxq->dev, buf_info->dma_addr, - IONIC_PAGE_SIZE, DMA_FROM_DEVICE); - err = xdp_do_redirect(netdev, &xdp_buf, xdp_prog); if (err) { netdev_dbg(netdev, "xdp_do_redirect err %d\n", err); goto out_xdp_abort; } - buf_info->page = NULL; + ionic_xdp_rx_put_bufs(rxq, buf_info, nbufs); rxq->xdp_flush = true; stats->xdp_redirect++; break; @@ -935,7 +945,7 @@ int ionic_tx_napi(struct napi_struct *napi, int budget) u32 work_done = 0; u32 flags = 0; - work_done = ionic_tx_cq_service(cq, budget); + work_done = ionic_tx_cq_service(cq, budget, !!budget); if (unlikely(!budget)) return budget; @@ -1019,7 +1029,7 @@ int ionic_txrx_napi(struct napi_struct *napi, int budget) txqcq = lif->txqcqs[qi]; txcq = &lif->txqcqs[qi]->cq; - tx_work_done = ionic_tx_cq_service(txcq, IONIC_TX_BUDGET_DEFAULT); + tx_work_done = ionic_tx_cq_service(txcq, IONIC_TX_BUDGET_DEFAULT, !!budget); if (unlikely(!budget)) return budget; @@ -1152,7 +1162,8 @@ static void ionic_tx_desc_unmap_bufs(struct ionic_queue *q, static void ionic_tx_clean(struct ionic_queue *q, struct ionic_tx_desc_info *desc_info, - struct ionic_txq_comp *comp) + struct ionic_txq_comp *comp, + bool in_napi) { struct ionic_tx_stats *stats = q_to_tx_stats(q); struct ionic_qcq *qcq = q_to_qcq(q); @@ -1204,11 +1215,13 @@ static void ionic_tx_clean(struct ionic_queue *q, desc_info->bytes = skb->len; stats->clean++; - napi_consume_skb(skb, 1); + napi_consume_skb(skb, likely(in_napi) ? 
1 : 0); } static bool ionic_tx_service(struct ionic_cq *cq, - unsigned int *total_pkts, unsigned int *total_bytes) + unsigned int *total_pkts, + unsigned int *total_bytes, + bool in_napi) { struct ionic_tx_desc_info *desc_info; struct ionic_queue *q = cq->bound_q; @@ -1230,7 +1243,7 @@ static bool ionic_tx_service(struct ionic_cq *cq, desc_info->bytes = 0; index = q->tail_idx; q->tail_idx = (q->tail_idx + 1) & (q->num_descs - 1); - ionic_tx_clean(q, desc_info, comp); + ionic_tx_clean(q, desc_info, comp, in_napi); if (desc_info->skb) { pkts++; bytes += desc_info->bytes; @@ -1244,7 +1257,9 @@ static bool ionic_tx_service(struct ionic_cq *cq, return true; } -unsigned int ionic_tx_cq_service(struct ionic_cq *cq, unsigned int work_to_do) +unsigned int ionic_tx_cq_service(struct ionic_cq *cq, + unsigned int work_to_do, + bool in_napi) { unsigned int work_done = 0; unsigned int bytes = 0; @@ -1253,7 +1268,7 @@ unsigned int ionic_tx_cq_service(struct ionic_cq *cq, unsigned int work_to_do) if (work_to_do == 0) return 0; - while (ionic_tx_service(cq, &pkts, &bytes)) { + while (ionic_tx_service(cq, &pkts, &bytes, in_napi)) { if (cq->tail_idx == cq->num_descs - 1) cq->done_color = !cq->done_color; cq->tail_idx = (cq->tail_idx + 1) & (cq->num_descs - 1); @@ -1279,7 +1294,7 @@ void ionic_tx_flush(struct ionic_cq *cq) { u32 work_done; - work_done = ionic_tx_cq_service(cq, cq->num_descs); + work_done = ionic_tx_cq_service(cq, cq->num_descs, false); if (work_done) ionic_intr_credits(cq->idev->intr_ctrl, cq->bound_intr->index, work_done, IONIC_INTR_CRED_RESET_COALESCE); @@ -1296,7 +1311,7 @@ void ionic_tx_empty(struct ionic_queue *q) desc_info = &q->tx_info[q->tail_idx]; desc_info->bytes = 0; q->tail_idx = (q->tail_idx + 1) & (q->num_descs - 1); - ionic_tx_clean(q, desc_info, NULL); + ionic_tx_clean(q, desc_info, NULL, false); if (desc_info->skb) { pkts++; bytes += desc_info->bytes; diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 5aada7cf3da7..ebafedde0ab7 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -5607,6 +5607,7 @@ static struct mdio_device_id __maybe_unused micrel_tbl[] = { { PHY_ID_KSZ8081, MICREL_PHY_ID_MASK }, { PHY_ID_KSZ8873MLL, MICREL_PHY_ID_MASK }, { PHY_ID_KSZ886X, MICREL_PHY_ID_MASK }, + { PHY_ID_KSZ9477, MICREL_PHY_ID_MASK }, { PHY_ID_LAN8814, MICREL_PHY_ID_MASK }, { PHY_ID_LAN8804, MICREL_PHY_ID_MASK }, { PHY_ID_LAN8841, MICREL_PHY_ID_MASK }, diff --git a/drivers/net/pse-pd/Kconfig b/drivers/net/pse-pd/Kconfig index 577ea904b3d9..7fab916a7f46 100644 --- a/drivers/net/pse-pd/Kconfig +++ b/drivers/net/pse-pd/Kconfig @@ -23,6 +23,7 @@ config PSE_REGULATOR config PSE_PD692X0 tristate "PD692X0 PSE controller" depends on I2C + select FW_LOADER select FW_UPLOAD help This module provides support for PD692x0 regulator based Ethernet diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c index c2fb736f78b2..b034ef8a73ea 100644 --- a/drivers/net/usb/ax88179_178a.c +++ b/drivers/net/usb/ax88179_178a.c @@ -326,7 +326,8 @@ static void ax88179_status(struct usbnet *dev, struct urb *urb) if (netif_carrier_ok(dev->net) != link) { usbnet_link_change(dev, link, 1); - netdev_info(dev->net, "ax88179 - Link status is: %d\n", link); + if (!link) + netdev_info(dev->net, "ax88179 - Link status is: 0\n"); } } @@ -1542,6 +1543,7 @@ static int ax88179_link_reset(struct usbnet *dev) GMII_PHY_PHYSR, 2, &tmp16); if (!(tmp16 & GMII_PHY_PHYSR_LINK)) { + netdev_info(dev->net, "ax88179 - Link status is: 0\n"); return 0; } else if (GMII_PHY_PHYSR_GIGA == 
(tmp16 & GMII_PHY_PHYSR_SMASK)) { mode |= AX_MEDIUM_GIGAMODE | AX_MEDIUM_EN_125MHZ; @@ -1579,6 +1581,8 @@ static int ax88179_link_reset(struct usbnet *dev) netif_carrier_on(dev->net); + netdev_info(dev->net, "ax88179 - Link status is: 1\n"); + return 0; } diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 663e46348ce3..386d62769ded 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1372,6 +1372,8 @@ static const struct usb_device_id products[] = { {QMI_QUIRK_SET_DTR(0x1bc7, 0x1260, 2)}, /* Telit LE910Cx */ {QMI_QUIRK_SET_DTR(0x1bc7, 0x1261, 2)}, /* Telit LE910Cx */ {QMI_QUIRK_SET_DTR(0x1bc7, 0x1900, 1)}, /* Telit LN940 series */ + {QMI_QUIRK_SET_DTR(0x1bc7, 0x3000, 0)}, /* Telit FN912 series */ + {QMI_QUIRK_SET_DTR(0x1bc7, 0x3001, 0)}, /* Telit FN912 series */ {QMI_FIXED_INTF(0x1c9e, 0x9801, 3)}, /* Telewell TW-3G HSPA+ */ {QMI_FIXED_INTF(0x1c9e, 0x9803, 4)}, /* Telewell TW-3G HSPA+ */ {QMI_FIXED_INTF(0x1c9e, 0x9b01, 3)}, /* XS Stick W100-2 from 4G Systems */ diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c index 567cb3faab70..ba59e92ab941 100644 --- a/drivers/net/vxlan/vxlan_core.c +++ b/drivers/net/vxlan/vxlan_core.c @@ -2339,7 +2339,7 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, struct ip_tunnel_key *pkey; struct ip_tunnel_key key; struct vxlan_dev *vxlan = netdev_priv(dev); - const struct iphdr *old_iph = ip_hdr(skb); + const struct iphdr *old_iph; struct vxlan_metadata _md; struct vxlan_metadata *md = &_md; unsigned int pkt_len = skb->len; @@ -2353,8 +2353,15 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, bool use_cache; bool udp_sum = false; bool xnet = !net_eq(vxlan->net, dev_net(vxlan->dev)); + bool no_eth_encap; __be32 vni = 0; + no_eth_encap = flags & VXLAN_F_GPE && skb->protocol != htons(ETH_P_TEB); + if (!skb_vlan_inet_prepare(skb, no_eth_encap)) + goto drop; + + old_iph = ip_hdr(skb); + info = skb_tunnel_info(skb); use_cache = ip_tunnel_dst_cache_usable(skb, info); diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 7d6b1254c92d..c0deaafebfdc 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -263,7 +263,7 @@ struct dst_entry *inet_csk_route_child_sock(const struct sock *sk, struct sock *inet_csk_reqsk_queue_add(struct sock *sk, struct request_sock *req, struct sock *child); -void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req, +bool inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req, unsigned long timeout); struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child, struct request_sock *req, diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 2796153b03da..188d41da1a40 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -619,6 +619,11 @@ static inline void *nft_set_priv(const struct nft_set *set) return (void *)set->data; } +static inline enum nft_data_types nft_set_datatype(const struct nft_set *set) +{ + return set->dtype == NFT_DATA_VERDICT ? 
NFT_DATA_VERDICT : NFT_DATA_VALUE; +} + static inline bool nft_set_gc_is_pending(const struct nft_set *s) { return refcount_read(&s->refs) != 1; diff --git a/include/trace/events/qdisc.h b/include/trace/events/qdisc.h index f1b5e816e7e5..ff33f41a9db7 100644 --- a/include/trace/events/qdisc.h +++ b/include/trace/events/qdisc.h @@ -81,7 +81,7 @@ TRACE_EVENT(qdisc_reset, TP_ARGS(q), TP_STRUCT__entry( - __string( dev, qdisc_dev(q)->name ) + __string( dev, qdisc_dev(q) ? qdisc_dev(q)->name : "(null)" ) __string( kind, q->ops->id ) __field( u32, parent ) __field( u32, handle ) diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c index 583ee4fe48ef..e52b3ad231b9 100644 --- a/kernel/bpf/arena.c +++ b/kernel/bpf/arena.c @@ -212,6 +212,7 @@ static u64 arena_map_mem_usage(const struct bpf_map *map) struct vma_list { struct vm_area_struct *vma; struct list_head head; + atomic_t mmap_count; }; static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma) @@ -221,20 +222,30 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma) vml = kmalloc(sizeof(*vml), GFP_KERNEL); if (!vml) return -ENOMEM; + atomic_set(&vml->mmap_count, 1); vma->vm_private_data = vml; vml->vma = vma; list_add(&vml->head, &arena->vma_list); return 0; } +static void arena_vm_open(struct vm_area_struct *vma) +{ + struct vma_list *vml = vma->vm_private_data; + + atomic_inc(&vml->mmap_count); +} + static void arena_vm_close(struct vm_area_struct *vma) { struct bpf_map *map = vma->vm_file->private_data; struct bpf_arena *arena = container_of(map, struct bpf_arena, map); - struct vma_list *vml; + struct vma_list *vml = vma->vm_private_data; + if (!atomic_dec_and_test(&vml->mmap_count)) + return; guard(mutex)(&arena->lock); - vml = vma->vm_private_data; + /* update link list under lock */ list_del(&vml->head); vma->vm_private_data = NULL; kfree(vml); @@ -287,6 +298,7 @@ out: } static const struct vm_operations_struct arena_vm_ops = { + .open = arena_vm_open, .close = arena_vm_close, .fault = arena_vm_fault, }; diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c index 0ee653a936ea..e20b90c36131 100644 --- a/kernel/bpf/ringbuf.c +++ b/kernel/bpf/ringbuf.c @@ -51,7 +51,8 @@ struct bpf_ringbuf { * This prevents a user-space application from modifying the * position and ruining in-kernel tracking. The permissions of the * pages depend on who is producing samples: user-space or the - * kernel. + * kernel. Note that the pending counter is placed in the same + * page as the producer, so that it shares the same cache line. 
* * Kernel-producer * --------------- @@ -70,6 +71,7 @@ struct bpf_ringbuf { */ unsigned long consumer_pos __aligned(PAGE_SIZE); unsigned long producer_pos __aligned(PAGE_SIZE); + unsigned long pending_pos; char data[] __aligned(PAGE_SIZE); }; @@ -179,6 +181,7 @@ static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node) rb->mask = data_sz - 1; rb->consumer_pos = 0; rb->producer_pos = 0; + rb->pending_pos = 0; return rb; } @@ -404,9 +407,9 @@ bpf_ringbuf_restore_from_rec(struct bpf_ringbuf_hdr *hdr) static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size) { - unsigned long cons_pos, prod_pos, new_prod_pos, flags; - u32 len, pg_off; + unsigned long cons_pos, prod_pos, new_prod_pos, pend_pos, flags; struct bpf_ringbuf_hdr *hdr; + u32 len, pg_off, tmp_size, hdr_len; if (unlikely(size > RINGBUF_MAX_RECORD_SZ)) return NULL; @@ -424,13 +427,29 @@ static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size) spin_lock_irqsave(&rb->spinlock, flags); } + pend_pos = rb->pending_pos; prod_pos = rb->producer_pos; new_prod_pos = prod_pos + len; - /* check for out of ringbuf space by ensuring producer position - * doesn't advance more than (ringbuf_size - 1) ahead + while (pend_pos < prod_pos) { + hdr = (void *)rb->data + (pend_pos & rb->mask); + hdr_len = READ_ONCE(hdr->len); + if (hdr_len & BPF_RINGBUF_BUSY_BIT) + break; + tmp_size = hdr_len & ~BPF_RINGBUF_DISCARD_BIT; + tmp_size = round_up(tmp_size + BPF_RINGBUF_HDR_SZ, 8); + pend_pos += tmp_size; + } + rb->pending_pos = pend_pos; + + /* check for out of ringbuf space: + * - by ensuring producer position doesn't advance more than + * (ringbuf_size - 1) ahead + * - by ensuring oldest not yet committed record until newest + * record does not span more than (ringbuf_size - 1) */ - if (new_prod_pos - cons_pos > rb->mask) { + if (new_prod_pos - cons_pos > rb->mask || + new_prod_pos - pend_pos > rb->mask) { spin_unlock_irqrestore(&rb->spinlock, flags); return NULL; } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 010cfee7ffe9..214a9fa8c6fb 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6236,6 +6236,7 @@ static void set_sext32_default_val(struct bpf_reg_state *reg, int size) } reg->u32_min_value = 0; reg->u32_max_value = U32_MAX; + reg->var_off = tnum_subreg(tnum_unknown); } static void coerce_subreg_to_size_sx(struct bpf_reg_state *reg, int size) @@ -6280,6 +6281,7 @@ static void coerce_subreg_to_size_sx(struct bpf_reg_state *reg, int size) reg->s32_max_value = s32_max; reg->u32_min_value = (u32)s32_min; reg->u32_max_value = (u32)s32_max; + reg->var_off = tnum_subreg(tnum_range(s32_min, s32_max)); return; } @@ -12719,6 +12721,16 @@ static bool signed_add32_overflows(s32 a, s32 b) return res < a; } +static bool signed_add16_overflows(s16 a, s16 b) +{ + /* Do the add in u16, where overflow is well-defined */ + s16 res = (s16)((u16)a + (u16)b); + + if (b < 0) + return res > a; + return res < a; +} + static bool signed_sub_overflows(s64 a, s64 b) { /* Do the sub in u64, where overflow is well-defined */ @@ -17448,11 +17460,11 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) goto skip_inf_loop_check; } if (is_may_goto_insn_at(env, insn_idx)) { - if (states_equal(env, &sl->state, cur, RANGE_WITHIN)) { + if (sl->state.may_goto_depth != cur->may_goto_depth && + states_equal(env, &sl->state, cur, RANGE_WITHIN)) { update_loop_entry(cur, &sl->state); goto hit; } - goto skip_inf_loop_check; } if (calls_callback(env, insn_idx)) { if (states_equal(env, &sl->state, cur, 
RANGE_WITHIN)) @@ -18730,6 +18742,39 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of return new_prog; } +/* + * For all jmp insns in a given 'prog' that point to 'tgt_idx' insn adjust the + * jump offset by 'delta'. + */ +static int adjust_jmp_off(struct bpf_prog *prog, u32 tgt_idx, u32 delta) +{ + struct bpf_insn *insn = prog->insnsi; + u32 insn_cnt = prog->len, i; + + for (i = 0; i < insn_cnt; i++, insn++) { + u8 code = insn->code; + + if ((BPF_CLASS(code) != BPF_JMP && BPF_CLASS(code) != BPF_JMP32) || + BPF_OP(code) == BPF_CALL || BPF_OP(code) == BPF_EXIT) + continue; + + if (insn->code == (BPF_JMP32 | BPF_JA)) { + if (i + 1 + insn->imm != tgt_idx) + continue; + if (signed_add32_overflows(insn->imm, delta)) + return -ERANGE; + insn->imm += delta; + } else { + if (i + 1 + insn->off != tgt_idx) + continue; + if (signed_add16_overflows(insn->imm, delta)) + return -ERANGE; + insn->off += delta; + } + } + return 0; +} + static int adjust_subprog_starts_after_remove(struct bpf_verifier_env *env, u32 off, u32 cnt) { @@ -20004,7 +20049,10 @@ static int do_misc_fixups(struct bpf_verifier_env *env) stack_depth_extra = 8; insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_AX, BPF_REG_10, stack_off); - insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, insn->off + 2); + if (insn->off >= 0) + insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, insn->off + 2); + else + insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, insn->off - 1); insn_buf[2] = BPF_ALU64_IMM(BPF_SUB, BPF_REG_AX, 1); insn_buf[3] = BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_AX, stack_off); cnt = 4; @@ -20546,6 +20594,13 @@ next_insn: if (!new_prog) return -ENOMEM; env->prog = prog = new_prog; + /* + * If may_goto is a first insn of a prog there could be a jmp + * insn that points to it, hence adjust all such jmps to point + * to insn after BPF_ST that inits may_goto count. + * Adjustment will succeed because bpf_patch_insn_data() didn't fail. + */ + WARN_ON(adjust_jmp_off(env->prog, subprog_start, 1)); } /* Since poke tab is now finalized, publish aux to tracker. 
*/ diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index ac74f6ead62d..8f6dd2c6ee41 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -12,6 +12,7 @@ #include <linux/errno.h> #include <linux/etherdevice.h> #include <linux/gfp.h> +#include <linux/if_vlan.h> #include <linux/jiffies.h> #include <linux/kref.h> #include <linux/list.h> @@ -132,6 +133,29 @@ batadv_orig_node_vlan_get(struct batadv_orig_node *orig_node, } /** + * batadv_vlan_id_valid() - check if vlan id is in valid batman-adv encoding + * @vid: the VLAN identifier + * + * Return: true when either no vlan is set or if VLAN is in correct range, + * false otherwise + */ +static bool batadv_vlan_id_valid(unsigned short vid) +{ + unsigned short non_vlan = vid & ~(BATADV_VLAN_HAS_TAG | VLAN_VID_MASK); + + if (vid == 0) + return true; + + if (!(vid & BATADV_VLAN_HAS_TAG)) + return false; + + if (non_vlan) + return false; + + return true; +} + +/** * batadv_orig_node_vlan_new() - search and possibly create an orig_node_vlan * object * @orig_node: the originator serving the VLAN @@ -149,6 +173,9 @@ batadv_orig_node_vlan_new(struct batadv_orig_node *orig_node, { struct batadv_orig_node_vlan *vlan; + if (!batadv_vlan_id_valid(vid)) + return NULL; + spin_lock_bh(&orig_node->vlan_list_lock); /* first look if an object for this vid already exists */ diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index b21ff3c36b07..2243cec18ecc 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -209,6 +209,20 @@ batadv_tt_global_hash_find(struct batadv_priv *bat_priv, const u8 *addr, } /** + * batadv_tt_local_entry_free_rcu() - free the tt_local_entry + * @rcu: rcu pointer of the tt_local_entry + */ +static void batadv_tt_local_entry_free_rcu(struct rcu_head *rcu) +{ + struct batadv_tt_local_entry *tt_local_entry; + + tt_local_entry = container_of(rcu, struct batadv_tt_local_entry, + common.rcu); + + kmem_cache_free(batadv_tl_cache, tt_local_entry); +} + +/** * batadv_tt_local_entry_release() - release tt_local_entry from lists and queue * for free after rcu grace period * @ref: kref pointer of the nc_node @@ -222,7 +236,7 @@ static void batadv_tt_local_entry_release(struct kref *ref) batadv_softif_vlan_put(tt_local_entry->vlan); - kfree_rcu(tt_local_entry, common.rcu); + call_rcu(&tt_local_entry->common.rcu, batadv_tt_local_entry_free_rcu); } /** @@ -241,6 +255,20 @@ batadv_tt_local_entry_put(struct batadv_tt_local_entry *tt_local_entry) } /** + * batadv_tt_global_entry_free_rcu() - free the tt_global_entry + * @rcu: rcu pointer of the tt_global_entry + */ +static void batadv_tt_global_entry_free_rcu(struct rcu_head *rcu) +{ + struct batadv_tt_global_entry *tt_global_entry; + + tt_global_entry = container_of(rcu, struct batadv_tt_global_entry, + common.rcu); + + kmem_cache_free(batadv_tg_cache, tt_global_entry); +} + +/** * batadv_tt_global_entry_release() - release tt_global_entry from lists and * queue for free after rcu grace period * @ref: kref pointer of the nc_node @@ -254,7 +282,7 @@ void batadv_tt_global_entry_release(struct kref *ref) batadv_tt_global_del_orig_list(tt_global_entry); - kfree_rcu(tt_global_entry, common.rcu); + call_rcu(&tt_global_entry->common.rcu, batadv_tt_global_entry_free_rcu); } /** @@ -380,6 +408,19 @@ static void batadv_tt_global_size_dec(struct batadv_orig_node *orig_node, } /** + * batadv_tt_orig_list_entry_free_rcu() - free the orig_entry + * @rcu: rcu pointer of the orig_entry + */ +static void 
batadv_tt_orig_list_entry_free_rcu(struct rcu_head *rcu) +{ + struct batadv_tt_orig_list_entry *orig_entry; + + orig_entry = container_of(rcu, struct batadv_tt_orig_list_entry, rcu); + + kmem_cache_free(batadv_tt_orig_cache, orig_entry); +} + +/** * batadv_tt_orig_list_entry_release() - release tt orig entry from lists and * queue for free after rcu grace period * @ref: kref pointer of the tt orig entry @@ -392,7 +433,7 @@ static void batadv_tt_orig_list_entry_release(struct kref *ref) refcount); batadv_orig_node_put(orig_entry->orig_node); - kfree_rcu(orig_entry, rcu); + call_rcu(&orig_entry->rcu, batadv_tt_orig_list_entry_free_rcu); } /** diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c index a6fb89fa6278..7e8a20f2fc42 100644 --- a/net/can/j1939/main.c +++ b/net/can/j1939/main.c @@ -30,10 +30,6 @@ MODULE_ALIAS("can-proto-" __stringify(CAN_J1939)); /* CAN_HDR: #bytes before can_frame data part */ #define J1939_CAN_HDR (offsetof(struct can_frame, data)) -/* CAN_FTR: #bytes beyond data part */ -#define J1939_CAN_FTR (sizeof(struct can_frame) - J1939_CAN_HDR - \ - sizeof(((struct can_frame *)0)->data)) - /* lowest layer */ static void j1939_can_recv(struct sk_buff *iskb, void *data) { @@ -342,7 +338,7 @@ int j1939_send_one(struct j1939_priv *priv, struct sk_buff *skb) memset(cf, 0, J1939_CAN_HDR); /* make it a full can frame again */ - skb_put(skb, J1939_CAN_FTR + (8 - dlc)); + skb_put_zero(skb, 8 - dlc); canid = CAN_EFF_FLAG | (skcb->priority << 26) | diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c index fe3df23a2595..4be73de5033c 100644 --- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -1593,8 +1593,8 @@ j1939_session *j1939_xtp_rx_rts_session_new(struct j1939_priv *priv, struct j1939_sk_buff_cb skcb = *j1939_skb_to_cb(skb); struct j1939_session *session; const u8 *dat; + int len, ret; pgn_t pgn; - int len; netdev_dbg(priv->ndev, "%s\n", __func__); @@ -1653,7 +1653,22 @@ j1939_session *j1939_xtp_rx_rts_session_new(struct j1939_priv *priv, session->tskey = priv->rx_tskey++; j1939_sk_errqueue(session, J1939_ERRQUEUE_RX_RTS); - WARN_ON_ONCE(j1939_session_activate(session)); + ret = j1939_session_activate(session); + if (ret) { + /* Entering this scope indicates an issue with the J1939 bus. + * Possible scenarios include: + * - A time lapse occurred, and a new session was initiated + * due to another packet being sent correctly. This could + * have been caused by too long interrupt, debugger, or being + * out-scheduled by another task. + * - The bus is receiving numerous erroneous packets, either + * from a malfunctioning device or during a test scenario. 
+ */ + netdev_alert(priv->ndev, "%s: 0x%p: concurrent session with same addr (%02x %02x) is already active.\n", + __func__, session, skcb.addr.sa, skcb.addr.da); + j1939_session_put(session); + return NULL; + } return session; } @@ -1681,6 +1696,8 @@ static int j1939_xtp_rx_rts_session_active(struct j1939_session *session, j1939_session_timers_cancel(session); j1939_session_cancel(session, J1939_XTP_ABORT_BUSY); + if (session->transmission) + j1939_session_deactivate_activate_next(session); return -EBUSY; } diff --git a/net/core/dev.c b/net/core/dev.c index 4d4de9008f6f..2b4819b610b8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1226,9 +1226,9 @@ int dev_change_name(struct net_device *dev, const char *newname) memcpy(oldname, dev->name, IFNAMSIZ); - write_seqlock(&netdev_rename_lock); + write_seqlock_bh(&netdev_rename_lock); err = dev_get_valid_name(net, dev, newname); - write_sequnlock(&netdev_rename_lock); + write_sequnlock_bh(&netdev_rename_lock); if (err < 0) { up_write(&devnet_rename_sem); @@ -1269,9 +1269,9 @@ rollback: if (err >= 0) { err = ret; down_write(&devnet_rename_sem); - write_seqlock(&netdev_rename_lock); + write_seqlock_bh(&netdev_rename_lock); memcpy(dev->name, oldname, IFNAMSIZ); - write_sequnlock(&netdev_rename_lock); + write_sequnlock_bh(&netdev_rename_lock); memcpy(oldname, newname, IFNAMSIZ); WRITE_ONCE(dev->name_assign_type, old_assign_type); old_assign_type = NET_NAME_RENAMED; @@ -11419,9 +11419,9 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net, if (new_name[0]) { /* Rename the netdev to prepared name */ - write_seqlock(&netdev_rename_lock); + write_seqlock_bh(&netdev_rename_lock); strscpy(dev->name, new_name, IFNAMSIZ); - write_sequnlock(&netdev_rename_lock); + write_sequnlock_bh(&netdev_rename_lock); } /* Fixup kobjects */ diff --git a/net/core/xdp.c b/net/core/xdp.c index 41693154e426..022c12059cf2 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -295,10 +295,8 @@ static struct xdp_mem_allocator *__xdp_reg_mem_model(struct xdp_mem_info *mem, mutex_lock(&mem_id_lock); ret = __mem_id_init_hash_table(); mutex_unlock(&mem_id_lock); - if (ret < 0) { - WARN_ON(1); + if (ret < 0) return ERR_PTR(ret); - } } xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp); diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index ff41bd6f99c3..5926159a6f20 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -657,8 +657,11 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb) if (dccp_v4_send_response(sk, req)) goto drop_and_free; - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT); - reqsk_put(req); + if (unlikely(!inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT))) + reqsk_free(req); + else + reqsk_put(req); + return 0; drop_and_free: diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 85f4b8fdbe5e..da5dba120bc9 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -400,8 +400,11 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb) if (dccp_v6_send_response(sk, req)) goto drop_and_free; - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT); - reqsk_put(req); + if (unlikely(!inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT))) + reqsk_free(req); + else + reqsk_put(req); + return 0; drop_and_free: diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index d81f74ce0f02..d4f0eff8b20f 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1122,25 +1122,34 @@ drop: inet_csk_reqsk_queue_drop_and_put(oreq->rsk_listener, oreq); } -static void 
reqsk_queue_hash_req(struct request_sock *req, +static bool reqsk_queue_hash_req(struct request_sock *req, unsigned long timeout) { + bool found_dup_sk = false; + + if (!inet_ehash_insert(req_to_sk(req), NULL, &found_dup_sk)) + return false; + + /* The timer needs to be setup after a successful insertion. */ timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED); mod_timer(&req->rsk_timer, jiffies + timeout); - inet_ehash_insert(req_to_sk(req), NULL, NULL); /* before letting lookups find us, make sure all req fields * are committed to memory and refcnt initialized. */ smp_wmb(); refcount_set(&req->rsk_refcnt, 2 + 1); + return true; } -void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req, +bool inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req, unsigned long timeout) { - reqsk_queue_hash_req(req, timeout); + if (!reqsk_queue_hash_req(req, timeout)) + return false; + inet_csk_reqsk_queue_added(sk); + return true; } EXPORT_SYMBOL_GPL(inet_csk_reqsk_queue_hash_add); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 01d208e0eef3..2e39cb881e20 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2782,13 +2782,37 @@ static void tcp_mtup_probe_success(struct sock *sk) NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPMTUPSUCCESS); } +/* Sometimes we deduce that packets have been dropped due to reasons other than + * congestion, like path MTU reductions or failed client TFO attempts. In these + * cases we call this function to retransmit as many packets as cwnd allows, + * without reducing cwnd. Given that retransmits will set retrans_stamp to a + * non-zero value (and may do so in a later calling context due to TSQ), we + * also enter CA_Loss so that we track when all retransmitted packets are ACKed + * and clear retrans_stamp when that happens (to ensure later recurring RTOs + * are using the correct retrans_stamp and don't declare ETIMEDOUT + * prematurely). + */ +static void tcp_non_congestion_loss_retransmit(struct sock *sk) +{ + const struct inet_connection_sock *icsk = inet_csk(sk); + struct tcp_sock *tp = tcp_sk(sk); + + if (icsk->icsk_ca_state != TCP_CA_Loss) { + tp->high_seq = tp->snd_nxt; + tp->snd_ssthresh = tcp_current_ssthresh(sk); + tp->prior_ssthresh = 0; + tp->undo_marker = 0; + tcp_set_ca_state(sk, TCP_CA_Loss); + } + tcp_xmit_retransmit_queue(sk); +} + /* Do a simple retransmit without using the backoff mechanisms in * tcp_timer. This is used for path mtu discovery. * The socket is already locked here. */ void tcp_simple_retransmit(struct sock *sk) { - const struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; int mss; @@ -2828,14 +2852,7 @@ void tcp_simple_retransmit(struct sock *sk) * in network, but units changed and effective * cwnd/ssthresh really reduced now. 
*/ - if (icsk->icsk_ca_state != TCP_CA_Loss) { - tp->high_seq = tp->snd_nxt; - tp->snd_ssthresh = tcp_current_ssthresh(sk); - tp->prior_ssthresh = 0; - tp->undo_marker = 0; - tcp_set_ca_state(sk, TCP_CA_Loss); - } - tcp_xmit_retransmit_queue(sk); + tcp_non_congestion_loss_retransmit(sk); } EXPORT_SYMBOL(tcp_simple_retransmit); @@ -6295,8 +6312,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack, tp->fastopen_client_fail = TFO_DATA_NOT_ACKED; skb_rbtree_walk_from(data) tcp_mark_skb_lost(sk, data); - tcp_xmit_retransmit_queue(sk); - tp->retrans_stamp = 0; + tcp_non_congestion_loss_retransmit(sk); NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFASTOPENACTIVEFAIL); return true; @@ -7257,7 +7273,12 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops, tcp_rsk(req)->tfo_listener = false; if (!want_cookie) { req->timeout = tcp_timeout_init((struct sock *)req); - inet_csk_reqsk_queue_hash_add(sk, req, req->timeout); + if (unlikely(!inet_csk_reqsk_queue_hash_add(sk, req, + req->timeout))) { + reqsk_free(req); + return 0; + } + } af_ops->send_synack(sk, dst, &fl, req, &foc, !want_cookie ? TCP_SYNACK_NORMAL : diff --git a/net/netfilter/nf_hooks_lwtunnel.c b/net/netfilter/nf_hooks_lwtunnel.c index 7cdb59bb4459..d8ebebc9775d 100644 --- a/net/netfilter/nf_hooks_lwtunnel.c +++ b/net/netfilter/nf_hooks_lwtunnel.c @@ -117,4 +117,7 @@ void netfilter_lwtunnel_fini(void) { unregister_pernet_subsys(&nf_lwtunnel_net_ops); } +#else +int __init netfilter_lwtunnel_init(void) { return 0; } +void netfilter_lwtunnel_fini(void) {} #endif /* CONFIG_SYSCTL */ diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index be3b4c90d2ed..e8dcf41d360d 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -5740,8 +5740,7 @@ static int nf_tables_fill_setelem(struct sk_buff *skb, if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) && nft_data_dump(skb, NFTA_SET_ELEM_DATA, nft_set_ext_data(ext), - set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE, - set->dlen) < 0) + nft_set_datatype(set), set->dlen) < 0) goto nla_put_failure; if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPRESSIONS) && @@ -11073,6 +11072,9 @@ static int nft_validate_register_store(const struct nft_ctx *ctx, return 0; default: + if (type != NFT_DATA_VALUE) + return -EINVAL; + if (reg < NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE) return -EINVAL; if (len == 0) @@ -11081,8 +11083,6 @@ static int nft_validate_register_store(const struct nft_ctx *ctx, sizeof_field(struct nft_regs, data)) return -ERANGE; - if (data != NULL && type != NFT_DATA_VALUE) - return -EINVAL; return 0; } } diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c index b314ca728a29..f3080fa1b226 100644 --- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -132,7 +132,8 @@ static int nft_lookup_init(const struct nft_ctx *ctx, return -EINVAL; err = nft_parse_register_store(ctx, tb[NFTA_LOOKUP_DREG], - &priv->dreg, NULL, set->dtype, + &priv->dreg, NULL, + nft_set_datatype(set), set->dlen); if (err < 0) return err; diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 2928c142a2dd..3b980bf2770b 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -168,8 +168,13 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct) static void ovs_ct_get_labels(const struct nf_conn *ct, struct ovs_key_ct_labels *labels) { - struct nf_conn_labels *cl = ct ? 
nf_ct_labels_find(ct) : NULL; + struct nf_conn_labels *cl = NULL; + if (ct) { + if (ct->master && !nf_ct_is_confirmed(ct)) + ct = ct->master; + cl = nf_ct_labels_find(ct); + } if (cl) memcpy(labels, cl->bits, OVS_CT_LABELS_LEN); else diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 5e695a9a609c..142f56770b77 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2613,10 +2613,24 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk, { struct unix_sock *u = unix_sk(sk); - if (!unix_skb_len(skb) && !(flags & MSG_PEEK)) { - skb_unlink(skb, &sk->sk_receive_queue); - consume_skb(skb); - skb = NULL; + if (!unix_skb_len(skb)) { + struct sk_buff *unlinked_skb = NULL; + + spin_lock(&sk->sk_receive_queue.lock); + + if (copied && (!u->oob_skb || skb == u->oob_skb)) { + skb = NULL; + } else if (flags & MSG_PEEK) { + skb = skb_peek_next(skb, &sk->sk_receive_queue); + } else { + unlinked_skb = skb; + skb = skb_peek_next(skb, &sk->sk_receive_queue); + __skb_unlink(unlinked_skb, &sk->sk_receive_queue); + } + + spin_unlock(&sk->sk_receive_queue.lock); + + consume_skb(unlinked_skb); } else { struct sk_buff *unlinked_skb = NULL; @@ -3093,12 +3107,23 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) #if IS_ENABLED(CONFIG_AF_UNIX_OOB) case SIOCATMARK: { + struct unix_sock *u = unix_sk(sk); struct sk_buff *skb; int answ = 0; + mutex_lock(&u->iolock); + skb = skb_peek(&sk->sk_receive_queue); - if (skb && skb == READ_ONCE(unix_sk(sk)->oob_skb)) - answ = 1; + if (skb) { + struct sk_buff *oob_skb = READ_ONCE(u->oob_skb); + + if (skb == oob_skb || + (!oob_skb && !unix_skb_len(skb))) + answ = 1; + } + + mutex_unlock(&u->iolock); + err = put_user(answ, (int __user *)arg); } break; diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index e0b3887b3d2d..dd49c1d23a60 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -457,7 +457,7 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h \ LSKELS := fentry_test.c fexit_test.c fexit_sleep.c atomics.c \ trace_printk.c trace_vprintk.c map_ptr_kern.c \ core_kern.c core_kern_overflow.c test_ringbuf.c \ - test_ringbuf_n.c test_ringbuf_map_key.c + test_ringbuf_n.c test_ringbuf_map_key.c test_ringbuf_write.c # Generate both light skeleton and libbpf skeleton for these LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test.c \ diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c index 4c6f42dae409..da430df45aa4 100644 --- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c +++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c @@ -12,9 +12,11 @@ #include <sys/sysinfo.h> #include <linux/perf_event.h> #include <linux/ring_buffer.h> + #include "test_ringbuf.lskel.h" #include "test_ringbuf_n.lskel.h" #include "test_ringbuf_map_key.lskel.h" +#include "test_ringbuf_write.lskel.h" #define EDONE 7777 @@ -84,6 +86,58 @@ static void *poll_thread(void *input) return (void *)(long)ring_buffer__poll(ringbuf, timeout); } +static void ringbuf_write_subtest(void) +{ + struct test_ringbuf_write_lskel *skel; + int page_size = getpagesize(); + size_t *mmap_ptr; + int err, rb_fd; + + skel = test_ringbuf_write_lskel__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + return; + + skel->maps.ringbuf.max_entries = 0x4000; + + err = test_ringbuf_write_lskel__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto cleanup; + + rb_fd = skel->maps.ringbuf.map_fd; + + mmap_ptr = 
mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, rb_fd, 0); + if (!ASSERT_OK_PTR(mmap_ptr, "rw_cons_pos")) + goto cleanup; + *mmap_ptr = 0x3000; + ASSERT_OK(munmap(mmap_ptr, page_size), "unmap_rw"); + + skel->bss->pid = getpid(); + + ringbuf = ring_buffer__new(rb_fd, process_sample, NULL, NULL); + if (!ASSERT_OK_PTR(ringbuf, "ringbuf_new")) + goto cleanup; + + err = test_ringbuf_write_lskel__attach(skel); + if (!ASSERT_OK(err, "skel_attach")) + goto cleanup_ringbuf; + + skel->bss->discarded = 0; + skel->bss->passed = 0; + + /* trigger exactly two samples */ + syscall(__NR_getpgid); + syscall(__NR_getpgid); + + ASSERT_EQ(skel->bss->discarded, 2, "discarded"); + ASSERT_EQ(skel->bss->passed, 0, "passed"); + + test_ringbuf_write_lskel__detach(skel); +cleanup_ringbuf: + ring_buffer__free(ringbuf); +cleanup: + test_ringbuf_write_lskel__destroy(skel); +} + static void ringbuf_subtest(void) { const size_t rec_sz = BPF_RINGBUF_HDR_SZ + sizeof(struct sample); @@ -451,4 +505,6 @@ void test_ringbuf(void) ringbuf_n_subtest(); if (test__start_subtest("ringbuf_map_key")) ringbuf_map_key_subtest(); + if (test__start_subtest("ringbuf_write")) + ringbuf_write_subtest(); } diff --git a/tools/testing/selftests/bpf/progs/test_ringbuf_write.c b/tools/testing/selftests/bpf/progs/test_ringbuf_write.c new file mode 100644 index 000000000000..350513c0e4c9 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_ringbuf_write.c @@ -0,0 +1,46 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +struct { + __uint(type, BPF_MAP_TYPE_RINGBUF); +} ringbuf SEC(".maps"); + +/* inputs */ +int pid = 0; + +/* outputs */ +long passed = 0; +long discarded = 0; + +SEC("fentry/" SYS_PREFIX "sys_getpgid") +int test_ringbuf_write(void *ctx) +{ + int *foo, cur_pid = bpf_get_current_pid_tgid() >> 32; + void *sample1, *sample2; + + if (cur_pid != pid) + return 0; + + sample1 = bpf_ringbuf_reserve(&ringbuf, 0x3000, 0); + if (!sample1) + return 0; + /* first one can pass */ + sample2 = bpf_ringbuf_reserve(&ringbuf, 0x3000, 0); + if (!sample2) { + bpf_ringbuf_discard(sample1, 0); + __sync_fetch_and_add(&discarded, 1); + return 0; + } + /* second one must not */ + __sync_fetch_and_add(&passed, 1); + foo = sample2 + 4084; + *foo = 256; + bpf_ringbuf_discard(sample1, 0); + bpf_ringbuf_discard(sample2, 0); + return 0; +} diff --git a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c index bd676d7e615f..80c737b6d340 100644 --- a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c +++ b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c @@ -274,6 +274,58 @@ static __naked void iter_limit_bug_cb(void) ); } +int tmp_var; +SEC("socket") +__failure __msg("infinite loop detected at insn 2") +__naked void jgt_imm64_and_may_goto(void) +{ + asm volatile (" \ + r0 = %[tmp_var] ll; \ +l0_%=: .byte 0xe5; /* may_goto */ \ + .byte 0; /* regs */ \ + .short -3; /* off -3 */ \ + .long 0; /* imm */ \ + if r0 > 10 goto l0_%=; \ + r0 = 0; \ + exit; \ +" :: __imm_addr(tmp_var) + : __clobber_all); +} + +SEC("socket") +__failure __msg("infinite loop detected at insn 1") +__naked void may_goto_self(void) +{ + asm volatile (" \ + r0 = *(u32 *)(r10 - 4); \ +l0_%=: .byte 0xe5; /* may_goto */ \ + .byte 0; /* regs */ \ + .short -1; /* off -1 */ \ + .long 0; /* imm */ \ + if r0 > 10 goto l0_%=; \ + r0 = 0; \ + exit; \ +" ::: __clobber_all); +} 
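A note for readers of these selftests: the raw ".byte 0xe5; .byte 0; .short N; .long 0" sequences hand-assemble the may_goto instruction as opaque bytes rather than an assembler mnemonic. The sketch below is an editorial illustration, not part of the patch; it assumes a UAPI linux/bpf.h recent enough to define BPF_JCOND (0xe0) and falls back to defining it locally otherwise. It shows the struct bpf_insn that such a byte sequence corresponds to; may_goto_neg_off() right after it uses the same encoding with off = -2.

/*
 * Editorial illustration, not part of the patch: what the raw
 * ".byte 0xe5 / .byte 0 / .short N / .long 0" sequences in these
 * tests encode as a struct bpf_insn.
 */
#include <linux/bpf.h>
#include <stdio.h>

#ifndef BPF_JCOND
#define BPF_JCOND 0xe0		/* assumption: matches the UAPI value */
#endif

int main(void)
{
	/* may_goto with off = -2, as used by may_goto_neg_off() below */
	struct bpf_insn may_goto = {
		.code    = BPF_JMP | BPF_JCOND,	/* 0x05 | 0xe0 = 0xe5 */
		.dst_reg = 0,
		.src_reg = 0,			/* BPF_MAY_GOTO */
		.off     = -2,			/* target = pc + 1 + off */
		.imm     = 0,
	};

	printf("opcode: 0x%02x, off: %d\n", may_goto.code, may_goto.off);
	return 0;
}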
+ +SEC("socket") +__success __retval(0) +__naked void may_goto_neg_off(void) +{ + asm volatile (" \ + r0 = *(u32 *)(r10 - 4); \ + goto l0_%=; \ + goto l1_%=; \ +l0_%=: .byte 0xe5; /* may_goto */ \ + .byte 0; /* regs */ \ + .short -2; /* off -2 */ \ + .long 0; /* imm */ \ + if r0 > 10 goto l0_%=; \ +l1_%=: r0 = 0; \ + exit; \ +" ::: __clobber_all); +} + SEC("tc") __failure __flag(BPF_F_TEST_STATE_FREQ) @@ -307,6 +359,100 @@ int iter_limit_bug(struct __sk_buff *skb) return 0; } +SEC("socket") +__success __retval(0) +__naked void ja_and_may_goto(void) +{ + asm volatile (" \ +l0_%=: .byte 0xe5; /* may_goto */ \ + .byte 0; /* regs */ \ + .short 1; /* off 1 */ \ + .long 0; /* imm */ \ + goto l0_%=; \ + r0 = 0; \ + exit; \ +" ::: __clobber_common); +} + +SEC("socket") +__success __retval(0) +__naked void ja_and_may_goto2(void) +{ + asm volatile (" \ +l0_%=: r0 = 0; \ + .byte 0xe5; /* may_goto */ \ + .byte 0; /* regs */ \ + .short 1; /* off 1 */ \ + .long 0; /* imm */ \ + goto l0_%=; \ + r0 = 0; \ + exit; \ +" ::: __clobber_common); +} + +SEC("socket") +__success __retval(0) +__naked void jlt_and_may_goto(void) +{ + asm volatile (" \ +l0_%=: call %[bpf_jiffies64]; \ + .byte 0xe5; /* may_goto */ \ + .byte 0; /* regs */ \ + .short 1; /* off 1 */ \ + .long 0; /* imm */ \ + if r0 < 10 goto l0_%=; \ + r0 = 0; \ + exit; \ +" :: __imm(bpf_jiffies64) + : __clobber_all); +} + +#if (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \ + (defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64) || \ + defined(__TARGET_ARCH_arm) || defined(__TARGET_ARCH_s390) || \ + defined(__TARGET_ARCH_loongarch)) && \ + __clang_major__ >= 18 +SEC("socket") +__success __retval(0) +__naked void gotol_and_may_goto(void) +{ + asm volatile (" \ +l0_%=: r0 = 0; \ + .byte 0xe5; /* may_goto */ \ + .byte 0; /* regs */ \ + .short 1; /* off 1 */ \ + .long 0; /* imm */ \ + gotol l0_%=; \ + r0 = 0; \ + exit; \ +" ::: __clobber_common); +} +#endif + +SEC("socket") +__success __retval(0) +__naked void ja_and_may_goto_subprog(void) +{ + asm volatile (" \ + call subprog_with_may_goto; \ + exit; \ +" ::: __clobber_all); +} + +static __naked __noinline __used +void subprog_with_may_goto(void) +{ + asm volatile (" \ +l0_%=: .byte 0xe5; /* may_goto */ \ + .byte 0; /* regs */ \ + .short 1; /* off 1 */ \ + .long 0; /* imm */ \ + goto l0_%=; \ + r0 = 0; \ + exit; \ +" ::: __clobber_all); +} + #define ARR_SZ 1000000 int zero; char arr[ARR_SZ]; diff --git a/tools/testing/selftests/bpf/progs/verifier_movsx.c b/tools/testing/selftests/bpf/progs/verifier_movsx.c index cbb9d6714f53..028ec855587b 100644 --- a/tools/testing/selftests/bpf/progs/verifier_movsx.c +++ b/tools/testing/selftests/bpf/progs/verifier_movsx.c @@ -224,6 +224,69 @@ l0_%=: \ : __clobber_all); } +SEC("socket") +__description("MOV32SX, S8, var_off u32_max") +__failure __msg("infinite loop detected") +__failure_unpriv __msg_unpriv("back-edge from insn 2 to 0") +__naked void mov64sx_s32_varoff_1(void) +{ + asm volatile (" \ +l0_%=: \ + r3 = *(u8 *)(r10 -387); \ + w7 = (s8)w3; \ + if w7 >= 0x2533823b goto l0_%=; \ + w0 = 0; \ + exit; \ +" : + : + : __clobber_all); +} + +SEC("socket") +__description("MOV32SX, S8, var_off not u32_max, positive after s8 extension") +__success __retval(0) +__failure_unpriv __msg_unpriv("frame pointer is read only") +__naked void mov64sx_s32_varoff_2(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r3 = r0; \ + r3 &= 0xf; \ + w7 = (s8)w3; \ + if w7 s>= 16 goto l0_%=; \ + w0 = 0; \ + exit; \ +l0_%=: \ + r10 = 1; \ + exit; \ +" : + : 
__imm(bpf_get_prandom_u32) + : __clobber_all); +} + +SEC("socket") +__description("MOV32SX, S8, var_off not u32_max, negative after s8 extension") +__success __retval(0) +__failure_unpriv __msg_unpriv("frame pointer is read only") +__naked void mov64sx_s32_varoff_3(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r3 = r0; \ + r3 &= 0xf; \ + r3 |= 0x80; \ + w7 = (s8)w3; \ + if w7 s>= -5 goto l0_%=; \ + w0 = 0; \ + exit; \ +l0_%=: \ + r10 = 1; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + #else SEC("socket") diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 49a56eb5d036..666ab7d9390b 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -43,7 +43,6 @@ tap tcp_fastopen_backup_key tcp_inq tcp_mmap -test_unix_oob timestamping tls toeplitz diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile index 3b83c797650d..50584479540b 100644 --- a/tools/testing/selftests/net/af_unix/Makefile +++ b/tools/testing/selftests/net/af_unix/Makefile @@ -1,4 +1,4 @@ CFLAGS += $(KHDR_INCLUDES) -TEST_GEN_PROGS := diag_uid test_unix_oob unix_connect scm_pidfd scm_rights +TEST_GEN_PROGS := diag_uid msg_oob scm_pidfd scm_rights unix_connect include ../../lib.mk diff --git a/tools/testing/selftests/net/af_unix/config b/tools/testing/selftests/net/af_unix/config new file mode 100644 index 000000000000..37368567768c --- /dev/null +++ b/tools/testing/selftests/net/af_unix/config @@ -0,0 +1,3 @@ +CONFIG_UNIX=y +CONFIG_AF_UNIX_OOB=y +CONFIG_UNIX_DIAG=m diff --git a/tools/testing/selftests/net/af_unix/msg_oob.c b/tools/testing/selftests/net/af_unix/msg_oob.c new file mode 100644 index 000000000000..16d0c172eaeb --- /dev/null +++ b/tools/testing/selftests/net/af_unix/msg_oob.c @@ -0,0 +1,734 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright Amazon.com Inc. or its affiliates. 
*/ + +#include <fcntl.h> +#include <string.h> +#include <unistd.h> + +#include <netinet/in.h> +#include <sys/epoll.h> +#include <sys/ioctl.h> +#include <sys/signalfd.h> +#include <sys/socket.h> + +#include "../../kselftest_harness.h" + +#define BUF_SZ 32 + +FIXTURE(msg_oob) +{ + int fd[4]; /* 0: AF_UNIX sender + * 1: AF_UNIX receiver + * 2: TCP sender + * 3: TCP receiver + */ + int signal_fd; + int epoll_fd[2]; /* 0: AF_UNIX receiver + * 1: TCP receiver + */ + bool tcp_compliant; +}; + +FIXTURE_VARIANT(msg_oob) +{ + bool peek; +}; + +FIXTURE_VARIANT_ADD(msg_oob, no_peek) +{ + .peek = false, +}; + +FIXTURE_VARIANT_ADD(msg_oob, peek) +{ + .peek = true +}; + +static void create_unix_socketpair(struct __test_metadata *_metadata, + FIXTURE_DATA(msg_oob) *self) +{ + int ret; + + ret = socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, self->fd); + ASSERT_EQ(ret, 0); +} + +static void create_tcp_socketpair(struct __test_metadata *_metadata, + FIXTURE_DATA(msg_oob) *self) +{ + struct sockaddr_in addr; + socklen_t addrlen; + int listen_fd; + int ret; + + listen_fd = socket(AF_INET, SOCK_STREAM, 0); + ASSERT_GE(listen_fd, 0); + + ret = listen(listen_fd, -1); + ASSERT_EQ(ret, 0); + + addrlen = sizeof(addr); + ret = getsockname(listen_fd, (struct sockaddr *)&addr, &addrlen); + ASSERT_EQ(ret, 0); + + self->fd[2] = socket(AF_INET, SOCK_STREAM, 0); + ASSERT_GE(self->fd[2], 0); + + ret = connect(self->fd[2], (struct sockaddr *)&addr, addrlen); + ASSERT_EQ(ret, 0); + + self->fd[3] = accept(listen_fd, (struct sockaddr *)&addr, &addrlen); + ASSERT_GE(self->fd[3], 0); + + ret = fcntl(self->fd[3], F_SETFL, O_NONBLOCK); + ASSERT_EQ(ret, 0); +} + +static void setup_sigurg(struct __test_metadata *_metadata, + FIXTURE_DATA(msg_oob) *self) +{ + struct signalfd_siginfo siginfo; + int pid = getpid(); + sigset_t mask; + int i, ret; + + for (i = 0; i < 2; i++) { + ret = ioctl(self->fd[i * 2 + 1], FIOSETOWN, &pid); + ASSERT_EQ(ret, 0); + } + + ret = sigemptyset(&mask); + ASSERT_EQ(ret, 0); + + ret = sigaddset(&mask, SIGURG); + ASSERT_EQ(ret, 0); + + ret = sigprocmask(SIG_BLOCK, &mask, NULL); + ASSERT_EQ(ret, 0); + + self->signal_fd = signalfd(-1, &mask, SFD_NONBLOCK); + ASSERT_GE(self->signal_fd, 0); + + ret = read(self->signal_fd, &siginfo, sizeof(siginfo)); + ASSERT_EQ(ret, -1); +} + +static void setup_epollpri(struct __test_metadata *_metadata, + FIXTURE_DATA(msg_oob) *self) +{ + struct epoll_event event = { + .events = EPOLLPRI, + }; + int i; + + for (i = 0; i < 2; i++) { + int ret; + + self->epoll_fd[i] = epoll_create1(0); + ASSERT_GE(self->epoll_fd[i], 0); + + ret = epoll_ctl(self->epoll_fd[i], EPOLL_CTL_ADD, self->fd[i * 2 + 1], &event); + ASSERT_EQ(ret, 0); + } +} + +static void close_sockets(FIXTURE_DATA(msg_oob) *self) +{ + int i; + + for (i = 0; i < 4; i++) + close(self->fd[i]); +} + +FIXTURE_SETUP(msg_oob) +{ + create_unix_socketpair(_metadata, self); + create_tcp_socketpair(_metadata, self); + + setup_sigurg(_metadata, self); + setup_epollpri(_metadata, self); + + self->tcp_compliant = true; +} + +FIXTURE_TEARDOWN(msg_oob) +{ + close_sockets(self); +} + +static void __epollpair(struct __test_metadata *_metadata, + FIXTURE_DATA(msg_oob) *self, + bool oob_remaining) +{ + struct epoll_event event[2] = {}; + int i, ret[2]; + + for (i = 0; i < 2; i++) + ret[i] = epoll_wait(self->epoll_fd[i], &event[i], 1, 0); + + ASSERT_EQ(ret[0], oob_remaining); + + if (self->tcp_compliant) + ASSERT_EQ(ret[0], ret[1]); + + if (oob_remaining) { + ASSERT_EQ(event[0].events, EPOLLPRI); + + if (self->tcp_compliant) + 
ASSERT_EQ(event[0].events, event[1].events); + } +} + +static void __sendpair(struct __test_metadata *_metadata, + FIXTURE_DATA(msg_oob) *self, + const void *buf, size_t len, int flags) +{ + int i, ret[2]; + + for (i = 0; i < 2; i++) { + struct signalfd_siginfo siginfo = {}; + int bytes; + + ret[i] = send(self->fd[i * 2], buf, len, flags); + + bytes = read(self->signal_fd, &siginfo, sizeof(siginfo)); + + if (flags & MSG_OOB) { + ASSERT_EQ(bytes, sizeof(siginfo)); + ASSERT_EQ(siginfo.ssi_signo, SIGURG); + + bytes = read(self->signal_fd, &siginfo, sizeof(siginfo)); + } + + ASSERT_EQ(bytes, -1); + } + + ASSERT_EQ(ret[0], len); + ASSERT_EQ(ret[0], ret[1]); +} + +static void __recvpair(struct __test_metadata *_metadata, + FIXTURE_DATA(msg_oob) *self, + const void *expected_buf, int expected_len, + int buf_len, int flags) +{ + int i, ret[2], recv_errno[2], expected_errno = 0; + char recv_buf[2][BUF_SZ] = {}; + bool printed = false; + + ASSERT_GE(BUF_SZ, buf_len); + + errno = 0; + + for (i = 0; i < 2; i++) { + ret[i] = recv(self->fd[i * 2 + 1], recv_buf[i], buf_len, flags); + recv_errno[i] = errno; + } + + if (expected_len < 0) { + expected_errno = -expected_len; + expected_len = -1; + } + + if (ret[0] != expected_len || recv_errno[0] != expected_errno) { + TH_LOG("AF_UNIX :%s", ret[0] < 0 ? strerror(recv_errno[0]) : recv_buf[0]); + TH_LOG("Expected:%s", expected_errno ? strerror(expected_errno) : expected_buf); + + ASSERT_EQ(ret[0], expected_len); + ASSERT_EQ(recv_errno[0], expected_errno); + } + + if (ret[0] != ret[1] || recv_errno[0] != recv_errno[1]) { + TH_LOG("AF_UNIX :%s", ret[0] < 0 ? strerror(recv_errno[0]) : recv_buf[0]); + TH_LOG("TCP :%s", ret[1] < 0 ? strerror(recv_errno[1]) : recv_buf[1]); + + printed = true; + + if (self->tcp_compliant) { + ASSERT_EQ(ret[0], ret[1]); + ASSERT_EQ(recv_errno[0], recv_errno[1]); + } + } + + if (expected_len >= 0) { + int cmp; + + cmp = strncmp(expected_buf, recv_buf[0], expected_len); + if (cmp) { + TH_LOG("AF_UNIX :%s", ret[0] < 0 ? strerror(recv_errno[0]) : recv_buf[0]); + TH_LOG("Expected:%s", expected_errno ? strerror(expected_errno) : expected_buf); + + ASSERT_EQ(cmp, 0); + } + + cmp = strncmp(recv_buf[0], recv_buf[1], expected_len); + if (cmp) { + if (!printed) { + TH_LOG("AF_UNIX :%s", ret[0] < 0 ? strerror(recv_errno[0]) : recv_buf[0]); + TH_LOG("TCP :%s", ret[1] < 0 ? 
strerror(recv_errno[1]) : recv_buf[1]); + } + + if (self->tcp_compliant) + ASSERT_EQ(cmp, 0); + } + } +} + +static void __setinlinepair(struct __test_metadata *_metadata, + FIXTURE_DATA(msg_oob) *self) +{ + int i, oob_inline = 1; + + for (i = 0; i < 2; i++) { + int ret; + + ret = setsockopt(self->fd[i * 2 + 1], SOL_SOCKET, SO_OOBINLINE, + &oob_inline, sizeof(oob_inline)); + ASSERT_EQ(ret, 0); + } +} + +static void __siocatmarkpair(struct __test_metadata *_metadata, + FIXTURE_DATA(msg_oob) *self, + bool oob_head) +{ + int answ[2] = {}; + int i; + + for (i = 0; i < 2; i++) { + int ret; + + ret = ioctl(self->fd[i * 2 + 1], SIOCATMARK, &answ[i]); + ASSERT_EQ(ret, 0); + } + + ASSERT_EQ(answ[0], oob_head); + + if (self->tcp_compliant) + ASSERT_EQ(answ[0], answ[1]); +} + +#define sendpair(buf, len, flags) \ + __sendpair(_metadata, self, buf, len, flags) + +#define recvpair(expected_buf, expected_len, buf_len, flags) \ + do { \ + if (variant->peek) \ + __recvpair(_metadata, self, \ + expected_buf, expected_len, \ + buf_len, (flags) | MSG_PEEK); \ + __recvpair(_metadata, self, \ + expected_buf, expected_len, buf_len, flags); \ + } while (0) + +#define epollpair(oob_remaining) \ + __epollpair(_metadata, self, oob_remaining) + +#define siocatmarkpair(oob_head) \ + __siocatmarkpair(_metadata, self, oob_head) + +#define setinlinepair() \ + __setinlinepair(_metadata, self) + +#define tcp_incompliant \ + for (self->tcp_compliant = false; \ + self->tcp_compliant == false; \ + self->tcp_compliant = true) + +TEST_F(msg_oob, non_oob) +{ + sendpair("x", 1, 0); + epollpair(false); + siocatmarkpair(false); + + recvpair("", -EINVAL, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, oob) +{ + sendpair("x", 1, MSG_OOB); + epollpair(true); + siocatmarkpair(true); + + recvpair("x", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(true); +} + +TEST_F(msg_oob, oob_drop) +{ + sendpair("x", 1, MSG_OOB); + epollpair(true); + siocatmarkpair(true); + + recvpair("", -EAGAIN, 1, 0); /* Drop OOB. */ + epollpair(false); + siocatmarkpair(false); + + recvpair("", -EINVAL, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, oob_ahead) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + recvpair("o", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(false); + + recvpair("hell", 4, 4, 0); + epollpair(false); + siocatmarkpair(true); +} + +TEST_F(msg_oob, oob_break) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + recvpair("hell", 4, 5, 0); /* Break at OOB even with enough buffer. */ + epollpair(true); + siocatmarkpair(true); + + recvpair("o", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(true); + + recvpair("", -EAGAIN, 1, 0); + siocatmarkpair(false); +} + +TEST_F(msg_oob, oob_ahead_break) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + sendpair("world", 5, 0); + epollpair(true); + siocatmarkpair(false); + + recvpair("o", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(false); + + recvpair("hell", 4, 9, 0); /* Break at OOB even after it's recv()ed. */ + epollpair(false); + siocatmarkpair(true); + + recvpair("world", 5, 5, 0); + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, oob_break_drop) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + sendpair("world", 5, 0); + epollpair(true); + siocatmarkpair(false); + + recvpair("hell", 4, 10, 0); /* Break at OOB even with enough buffer. 
*/ + epollpair(true); + siocatmarkpair(true); + + recvpair("world", 5, 10, 0); /* Drop OOB and recv() the next skb. */ + epollpair(false); + siocatmarkpair(false); + + recvpair("", -EINVAL, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, ex_oob_break) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + sendpair("wor", 3, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + sendpair("ld", 2, 0); + epollpair(true); + siocatmarkpair(false); + + recvpair("hellowo", 7, 10, 0); /* Break at OOB but not at ex-OOB. */ + epollpair(true); + siocatmarkpair(true); + + recvpair("r", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(true); + + recvpair("ld", 2, 2, 0); + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, ex_oob_drop) +{ + sendpair("x", 1, MSG_OOB); + epollpair(true); + siocatmarkpair(true); + + sendpair("y", 1, MSG_OOB); /* TCP drops "x" at this moment. */ + epollpair(true); + + tcp_incompliant { + siocatmarkpair(false); + + recvpair("x", 1, 1, 0); /* TCP drops "y" by passing through it. */ + epollpair(true); + siocatmarkpair(true); + + recvpair("y", 1, 1, MSG_OOB); /* TCP returns -EINVAL. */ + epollpair(false); + siocatmarkpair(true); + } +} + +TEST_F(msg_oob, ex_oob_drop_2) +{ + sendpair("x", 1, MSG_OOB); + epollpair(true); + siocatmarkpair(true); + + sendpair("y", 1, MSG_OOB); /* TCP drops "x" at this moment. */ + epollpair(true); + + tcp_incompliant { + siocatmarkpair(false); + } + + recvpair("y", 1, 1, MSG_OOB); + epollpair(false); + + tcp_incompliant { + siocatmarkpair(false); + + recvpair("x", 1, 1, 0); /* TCP returns -EAGAIN. */ + epollpair(false); + siocatmarkpair(true); + } +} + +TEST_F(msg_oob, ex_oob_ahead_break) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + sendpair("wor", 3, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + recvpair("r", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(false); + + sendpair("ld", 2, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + tcp_incompliant { + recvpair("hellowol", 8, 10, 0); /* TCP recv()s "helloworl", why "r" ?? */ + } + + epollpair(true); + siocatmarkpair(true); + + recvpair("d", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(true); +} + +TEST_F(msg_oob, ex_oob_siocatmark) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + recvpair("o", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(false); + + sendpair("world", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + recvpair("hell", 4, 4, 0); /* Intentionally stop at ex-OOB. */ + epollpair(true); + siocatmarkpair(false); +} + +TEST_F(msg_oob, inline_oob) +{ + setinlinepair(); + + sendpair("x", 1, MSG_OOB); + epollpair(true); + siocatmarkpair(true); + + recvpair("", -EINVAL, 1, MSG_OOB); + epollpair(true); + siocatmarkpair(true); + + recvpair("x", 1, 1, 0); + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, inline_oob_break) +{ + setinlinepair(); + + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + recvpair("", -EINVAL, 1, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + recvpair("hell", 4, 5, 0); /* Break at OOB but not at ex-OOB. 
*/ + epollpair(true); + siocatmarkpair(true); + + recvpair("o", 1, 1, 0); + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, inline_oob_ahead_break) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + sendpair("world", 5, 0); + epollpair(true); + siocatmarkpair(false); + + recvpair("o", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(false); + + setinlinepair(); + + recvpair("hell", 4, 9, 0); /* Break at OOB even with enough buffer. */ + epollpair(false); + siocatmarkpair(true); + + tcp_incompliant { + recvpair("world", 5, 6, 0); /* TCP recv()s "oworld", ... "o" ??? */ + } + + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, inline_ex_oob_break) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + sendpair("wor", 3, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + sendpair("ld", 2, 0); + epollpair(true); + siocatmarkpair(false); + + setinlinepair(); + + recvpair("hellowo", 7, 10, 0); /* Break at OOB but not at ex-OOB. */ + epollpair(true); + siocatmarkpair(true); + + recvpair("rld", 3, 3, 0); + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, inline_ex_oob_no_drop) +{ + sendpair("x", 1, MSG_OOB); + epollpair(true); + siocatmarkpair(true); + + setinlinepair(); + + sendpair("y", 1, MSG_OOB); /* TCP does NOT drops "x" at this moment. */ + epollpair(true); + siocatmarkpair(false); + + recvpair("x", 1, 1, 0); + epollpair(true); + siocatmarkpair(true); + + recvpair("y", 1, 1, 0); + epollpair(false); + siocatmarkpair(false); +} + +TEST_F(msg_oob, inline_ex_oob_drop) +{ + sendpair("x", 1, MSG_OOB); + epollpair(true); + siocatmarkpair(true); + + sendpair("y", 1, MSG_OOB); /* TCP drops "x" at this moment. */ + epollpair(true); + + setinlinepair(); + + tcp_incompliant { + siocatmarkpair(false); + + recvpair("x", 1, 1, 0); /* TCP recv()s "y". */ + epollpair(true); + siocatmarkpair(true); + + recvpair("y", 1, 1, 0); /* TCP returns -EAGAIN. */ + epollpair(false); + siocatmarkpair(false); + } +} + +TEST_F(msg_oob, inline_ex_oob_siocatmark) +{ + sendpair("hello", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + recvpair("o", 1, 1, MSG_OOB); + epollpair(false); + siocatmarkpair(false); + + setinlinepair(); + + sendpair("world", 5, MSG_OOB); + epollpair(true); + siocatmarkpair(false); + + recvpair("hell", 4, 4, 0); /* Intentionally stop at ex-OOB. 
*/ + epollpair(true); + siocatmarkpair(false); +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/net/af_unix/test_unix_oob.c b/tools/testing/selftests/net/af_unix/test_unix_oob.c deleted file mode 100644 index a7c51889acd5..000000000000 --- a/tools/testing/selftests/net/af_unix/test_unix_oob.c +++ /dev/null @@ -1,436 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -#include <stdio.h> -#include <stdlib.h> -#include <sys/socket.h> -#include <arpa/inet.h> -#include <unistd.h> -#include <string.h> -#include <fcntl.h> -#include <sys/ioctl.h> -#include <errno.h> -#include <netinet/tcp.h> -#include <sys/un.h> -#include <sys/signal.h> -#include <sys/poll.h> - -static int pipefd[2]; -static int signal_recvd; -static pid_t producer_id; -static char sock_name[32]; - -static void sig_hand(int sn, siginfo_t *si, void *p) -{ - signal_recvd = sn; -} - -static int set_sig_handler(int signal) -{ - struct sigaction sa; - - sa.sa_sigaction = sig_hand; - sigemptyset(&sa.sa_mask); - sa.sa_flags = SA_SIGINFO | SA_RESTART; - - return sigaction(signal, &sa, NULL); -} - -static void set_filemode(int fd, int set) -{ - int flags = fcntl(fd, F_GETFL, 0); - - if (set) - flags &= ~O_NONBLOCK; - else - flags |= O_NONBLOCK; - fcntl(fd, F_SETFL, flags); -} - -static void signal_producer(int fd) -{ - char cmd; - - cmd = 'S'; - write(fd, &cmd, sizeof(cmd)); -} - -static void wait_for_signal(int fd) -{ - char buf[5]; - - read(fd, buf, 5); -} - -static void die(int status) -{ - fflush(NULL); - unlink(sock_name); - kill(producer_id, SIGTERM); - exit(status); -} - -int is_sioctatmark(int fd) -{ - int ans = -1; - - if (ioctl(fd, SIOCATMARK, &ans, sizeof(ans)) < 0) { -#ifdef DEBUG - perror("SIOCATMARK Failed"); -#endif - } - return ans; -} - -void read_oob(int fd, char *c) -{ - - *c = ' '; - if (recv(fd, c, sizeof(*c), MSG_OOB) < 0) { -#ifdef DEBUG - perror("Reading MSG_OOB Failed"); -#endif - } -} - -int read_data(int pfd, char *buf, int size) -{ - int len = 0; - - memset(buf, size, '0'); - len = read(pfd, buf, size); -#ifdef DEBUG - if (len < 0) - perror("read failed"); -#endif - return len; -} - -static void wait_for_data(int pfd, int event) -{ - struct pollfd pfds[1]; - - pfds[0].fd = pfd; - pfds[0].events = event; - poll(pfds, 1, -1); -} - -void producer(struct sockaddr_un *consumer_addr) -{ - int cfd; - char buf[64]; - int i; - - memset(buf, 'x', sizeof(buf)); - cfd = socket(AF_UNIX, SOCK_STREAM, 0); - - wait_for_signal(pipefd[0]); - if (connect(cfd, (struct sockaddr *)consumer_addr, - sizeof(*consumer_addr)) != 0) { - perror("Connect failed"); - kill(0, SIGTERM); - exit(1); - } - - for (i = 0; i < 2; i++) { - /* Test 1: Test for SIGURG and OOB */ - wait_for_signal(pipefd[0]); - memset(buf, 'x', sizeof(buf)); - buf[63] = '@'; - send(cfd, buf, sizeof(buf), MSG_OOB); - - wait_for_signal(pipefd[0]); - - /* Test 2: Test for OOB being overwitten */ - memset(buf, 'x', sizeof(buf)); - buf[63] = '%'; - send(cfd, buf, sizeof(buf), MSG_OOB); - - memset(buf, 'x', sizeof(buf)); - buf[63] = '#'; - send(cfd, buf, sizeof(buf), MSG_OOB); - - wait_for_signal(pipefd[0]); - - /* Test 3: Test for SIOCATMARK */ - memset(buf, 'x', sizeof(buf)); - buf[63] = '@'; - send(cfd, buf, sizeof(buf), MSG_OOB); - - memset(buf, 'x', sizeof(buf)); - buf[63] = '%'; - send(cfd, buf, sizeof(buf), MSG_OOB); - - memset(buf, 'x', sizeof(buf)); - send(cfd, buf, sizeof(buf), 0); - - wait_for_signal(pipefd[0]); - - /* Test 4: Test for 1byte OOB msg */ - memset(buf, 'x', sizeof(buf)); - buf[0] = '@'; - send(cfd, buf, 1, MSG_OOB); - } -} - -int 
-main(int argc, char **argv) -{ - int lfd, pfd; - struct sockaddr_un consumer_addr, paddr; - socklen_t len = sizeof(consumer_addr); - char buf[1024]; - int on = 0; - char oob; - int atmark; - - lfd = socket(AF_UNIX, SOCK_STREAM, 0); - memset(&consumer_addr, 0, sizeof(consumer_addr)); - consumer_addr.sun_family = AF_UNIX; - sprintf(sock_name, "unix_oob_%d", getpid()); - unlink(sock_name); - strcpy(consumer_addr.sun_path, sock_name); - - if ((bind(lfd, (struct sockaddr *)&consumer_addr, - sizeof(consumer_addr))) != 0) { - perror("socket bind failed"); - exit(1); - } - - pipe(pipefd); - - listen(lfd, 1); - - producer_id = fork(); - if (producer_id == 0) { - producer(&consumer_addr); - exit(0); - } - - set_sig_handler(SIGURG); - signal_producer(pipefd[1]); - - pfd = accept(lfd, (struct sockaddr *) &paddr, &len); - fcntl(pfd, F_SETOWN, getpid()); - - signal_recvd = 0; - signal_producer(pipefd[1]); - - /* Test 1: - * veriyf that SIGURG is - * delivered, 63 bytes are - * read, oob is '@', and POLLPRI works. - */ - wait_for_data(pfd, POLLPRI); - read_oob(pfd, &oob); - len = read_data(pfd, buf, 1024); - if (!signal_recvd || len != 63 || oob != '@') { - fprintf(stderr, "Test 1 failed sigurg %d len %d %c\n", - signal_recvd, len, oob); - die(1); - } - - signal_recvd = 0; - signal_producer(pipefd[1]); - - /* Test 2: - * Verify that the first OOB is over written by - * the 2nd one and the first OOB is returned as - * part of the read, and sigurg is received. - */ - wait_for_data(pfd, POLLIN | POLLPRI); - len = 0; - while (len < 70) - len = recv(pfd, buf, 1024, MSG_PEEK); - len = read_data(pfd, buf, 1024); - read_oob(pfd, &oob); - if (!signal_recvd || len != 127 || oob != '#') { - fprintf(stderr, "Test 2 failed, sigurg %d len %d OOB %c\n", - signal_recvd, len, oob); - die(1); - } - - signal_recvd = 0; - signal_producer(pipefd[1]); - - /* Test 3: - * verify that 2nd oob over writes - * the first one and read breaks at - * oob boundary returning 127 bytes - * and sigurg is received and atmark - * is set. - * oob is '%' and second read returns - * 64 bytes. - */ - len = 0; - wait_for_data(pfd, POLLIN | POLLPRI); - while (len < 150) - len = recv(pfd, buf, 1024, MSG_PEEK); - len = read_data(pfd, buf, 1024); - atmark = is_sioctatmark(pfd); - read_oob(pfd, &oob); - - if (!signal_recvd || len != 127 || oob != '%' || atmark != 1) { - fprintf(stderr, - "Test 3 failed, sigurg %d len %d OOB %c atmark %d\n", - signal_recvd, len, oob, atmark); - die(1); - } - - signal_recvd = 0; - - len = read_data(pfd, buf, 1024); - if (len != 64) { - fprintf(stderr, "Test 3.1 failed, sigurg %d len %d OOB %c\n", - signal_recvd, len, oob); - die(1); - } - - signal_recvd = 0; - signal_producer(pipefd[1]); - - /* Test 4: - * verify that a single byte - * oob message is delivered. - * set non blocking mode and - * check proper error is - * returned and sigurg is - * received and correct - * oob is read. 
- */ - - set_filemode(pfd, 0); - - wait_for_data(pfd, POLLIN | POLLPRI); - len = read_data(pfd, buf, 1024); - if ((len == -1) && (errno == 11)) - len = 0; - - read_oob(pfd, &oob); - - if (!signal_recvd || len != 0 || oob != '@') { - fprintf(stderr, "Test 4 failed, sigurg %d len %d OOB %c\n", - signal_recvd, len, oob); - die(1); - } - - set_filemode(pfd, 1); - - /* Inline Testing */ - - on = 1; - if (setsockopt(pfd, SOL_SOCKET, SO_OOBINLINE, &on, sizeof(on))) { - perror("SO_OOBINLINE"); - die(1); - } - - signal_recvd = 0; - signal_producer(pipefd[1]); - - /* Test 1 -- Inline: - * Check that SIGURG is - * delivered and 63 bytes are - * read and oob is '@' - */ - - wait_for_data(pfd, POLLIN | POLLPRI); - len = read_data(pfd, buf, 1024); - - if (!signal_recvd || len != 63) { - fprintf(stderr, "Test 1 Inline failed, sigurg %d len %d\n", - signal_recvd, len); - die(1); - } - - len = read_data(pfd, buf, 1024); - - if (len != 1) { - fprintf(stderr, - "Test 1.1 Inline failed, sigurg %d len %d oob %c\n", - signal_recvd, len, oob); - die(1); - } - - signal_recvd = 0; - signal_producer(pipefd[1]); - - /* Test 2 -- Inline: - * Verify that the first OOB is over written by - * the 2nd one and read breaks correctly on - * 2nd OOB boundary with the first OOB returned as - * part of the read, and sigurg is delivered and - * siocatmark returns true. - * next read returns one byte, the oob byte - * and siocatmark returns false. - */ - len = 0; - wait_for_data(pfd, POLLIN | POLLPRI); - while (len < 70) - len = recv(pfd, buf, 1024, MSG_PEEK); - len = read_data(pfd, buf, 1024); - atmark = is_sioctatmark(pfd); - if (len != 127 || atmark != 1 || !signal_recvd) { - fprintf(stderr, "Test 2 Inline failed, len %d atmark %d\n", - len, atmark); - die(1); - } - - len = read_data(pfd, buf, 1024); - atmark = is_sioctatmark(pfd); - if (len != 1 || buf[0] != '#' || atmark == 1) { - fprintf(stderr, "Test 2.1 Inline failed, len %d data %c atmark %d\n", - len, buf[0], atmark); - die(1); - } - - signal_recvd = 0; - signal_producer(pipefd[1]); - - /* Test 3 -- Inline: - * verify that 2nd oob over writes - * the first one and read breaks at - * oob boundary returning 127 bytes - * and sigurg is received and siocatmark - * is true after the read. - * subsequent read returns 65 bytes - * because of oob which should be '%'. - */ - len = 0; - wait_for_data(pfd, POLLIN | POLLPRI); - while (len < 126) - len = recv(pfd, buf, 1024, MSG_PEEK); - len = read_data(pfd, buf, 1024); - atmark = is_sioctatmark(pfd); - if (!signal_recvd || len != 127 || !atmark) { - fprintf(stderr, - "Test 3 Inline failed, sigurg %d len %d data %c\n", - signal_recvd, len, buf[0]); - die(1); - } - - len = read_data(pfd, buf, 1024); - atmark = is_sioctatmark(pfd); - if (len != 65 || buf[0] != '%' || atmark != 0) { - fprintf(stderr, - "Test 3.1 Inline failed, len %d oob %c atmark %d\n", - len, buf[0], atmark); - die(1); - } - - signal_recvd = 0; - signal_producer(pipefd[1]); - - /* Test 4 -- Inline: - * verify that a single - * byte oob message is delivered - * and read returns one byte, the oob - * byte and sigurg is received - */ - wait_for_data(pfd, POLLIN | POLLPRI); - len = read_data(pfd, buf, 1024); - if (!signal_recvd || len != 1 || buf[0] != '@') { - fprintf(stderr, - "Test 4 Inline failed, signal %d len %d data %c\n", - signal_recvd, len, buf[0]); - die(1); - } - die(0); -} |
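Closing editorial illustration, not part of the pull: the new msg_oob.c harness added above compares AF_UNIX against TCP after every send()/recv()/SIOCATMARK step. The minimal, self-contained sketch below shows the core out-of-band pattern those tests exercise (its oob_break case), on a plain AF_UNIX socketpair; error handling is omitted for brevity.

/*
 * Editorial sketch, not part of the patch: the basic MSG_OOB pattern
 * exercised by the msg_oob.c selftest, on an AF_UNIX socketpair.
 */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	char buf[16] = {};
	int fd[2], atmark = 0;
	char oob = 0;
	ssize_t n;

	socketpair(AF_UNIX, SOCK_STREAM, 0, fd);

	/* The last byte of an MSG_OOB send ('o') becomes the OOB byte. */
	send(fd[0], "hello", 5, MSG_OOB);

	/* A normal read stops at the OOB mark even with a larger buffer. */
	n = recv(fd[1], buf, sizeof(buf), 0);
	printf("read %zd bytes: %s\n", n, buf);	/* 4 bytes: "hell" */

	/* SIOCATMARK now reports that the next byte is the mark. */
	ioctl(fd[1], SIOCATMARK, &atmark);
	printf("at mark: %d\n", atmark);	/* 1 */

	/* The marked byte itself is fetched out of band. */
	recv(fd[1], &oob, 1, MSG_OOB);
	printf("OOB byte: %c\n", oob);		/* 'o' */

	close(fd[0]);
	close(fd[1]);
	return 0;
}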