summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorNick Child <nnac123@linux.ibm.com>2023-06-28 20:22:44 +0200
committerJakub Kicinski <kuba@kernel.org>2023-06-29 20:12:19 +0200
commit48538ccb825b05544ec308a509e2cc9c013402db (patch)
treef6f2ea1b1e60bf6fb77316e3e6144ab313d6bf23
parentsfc: support for devlink port requires MAE access (diff)
downloadlinux-48538ccb825b05544ec308a509e2cc9c013402db.tar.xz
linux-48538ccb825b05544ec308a509e2cc9c013402db.zip
ibmvnic: Do not reset dql stats on NON_FATAL err
All ibmvnic resets, make a call to netdev_tx_reset_queue() when re-opening the device. netdev_tx_reset_queue() resets the num_queued and num_completed byte counters. These stats are used in Byte Queue Limit (BQL) algorithms. The difference between these two stats tracks the number of bytes currently sitting on the physical NIC. ibmvnic increases the number of queued bytes though calls to netdev_tx_sent_queue() in the drivers xmit function. When, VIOS reports that it is done transmitting bytes, the ibmvnic device increases the number of completed bytes through calls to netdev_tx_completed_queue(). It is important to note that the driver batches its transmit calls and num_queued is increased every time that an skb is added to the next batch, not necessarily when the batch is sent to VIOS for transmission. Unlike other reset types, a NON FATAL reset will not flush the sub crq tx buffers. Therefore, it is possible for the batched skb array to be partially full. So if there is call to netdev_tx_reset_queue() when re-opening the device, the value of num_queued (0) would not account for the skb's that are currently batched. Eventually, when the batch is sent to VIOS, the call to netdev_tx_completed_queue() would increase num_completed to a value greater than the num_queued. This causes a BUG_ON crash: ibmvnic 30000002: Firmware reports error, cause: adapter problem. Starting recovery... ibmvnic 30000002: tx error 600 ibmvnic 30000002: tx error 600 ibmvnic 30000002: tx error 600 ibmvnic 30000002: tx error 600 ------------[ cut here ]------------ kernel BUG at lib/dynamic_queue_limits.c:27! Oops: Exception in kernel mode, sig: 5 [....] NIP dql_completed+0x28/0x1c0 LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic] Call Trace: ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable) ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic] __handle_irq_event_percpu+0x98/0x270 ---[ end trace ]--- Therefore, do not reset the dql stats when performing a NON_FATAL reset. Fixes: 0d973388185d ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-rw-r--r--drivers/net/ethernet/ibm/ibmvnic.c9
1 files changed, 8 insertions, 1 deletions
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index c63d3ec9d328..763d613adbcc 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1816,7 +1816,14 @@ static int __ibmvnic_open(struct net_device *netdev)
if (prev_state == VNIC_CLOSED)
enable_irq(adapter->tx_scrq[i]->irq);
enable_scrq_irq(adapter, adapter->tx_scrq[i]);
- netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
+ /* netdev_tx_reset_queue will reset dql stats. During NON_FATAL
+ * resets, don't reset the stats because there could be batched
+ * skb's waiting to be sent. If we reset dql stats, we risk
+ * num_completed being greater than num_queued. This will cause
+ * a BUG_ON in dql_completed().
+ */
+ if (adapter->reset_reason != VNIC_RESET_NON_FATAL)
+ netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
}
rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);