diff options
author | David S. Miller <davem@davemloft.net> | 2018-07-04 15:30:28 +0200 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2018-07-04 15:30:28 +0200 |
commit | d11a899ff0de3efad3c27fb1241de51e9d1a04a8 (patch) | |
tree | 18de5104ec6df0136c947556e6c9bc0dbad8fd2e /drivers/net | |
parent | isdn: mark expected switch fall-throughs (diff) | |
parent | net/sched: Make etf report drops on error_queue (diff) | |
download | linux-d11a899ff0de3efad3c27fb1241de51e9d1a04a8.tar.xz linux-d11a899ff0de3efad3c27fb1241de51e9d1a04a8.zip |
Merge branch 'Scheduled-packet-Transmission-ETF'
Jesus Sanchez-Palencia says:
====================
Scheduled packet Transmission: ETF
Changes since v1:
- moved struct sock_txtime from socket.h to uapi net_tstamp.h;
- sk_clockid was changed from u16 to u8;
- sk_txtime_flags was changed from u16 to a u8 bit field in struct sock;
- the socket option flags are now validated in sock_setsockopt();
- added SO_EE_ORIGIN_TXTIME;
- sockc.transmit_time is now initialized from all IPv4 Tx paths;
- added support for the IPv6 Tx path;
Overview
========
This work consists of a set of kernel interfaces that can be used by
applications that require (time-based) Scheduled Tx of packets.
It is comprised by 3 new components to the kernel:
- SO_TXTIME: socket option + cmsg programming interfaces.
- etf: the "earliest txtime first" qdisc, that provides per-queue
TxTime-based scheduling. This has been renamed from 'tbs' to
'etf' to better describe its functionality.
- taprio: the "time-aware priority scheduler" qdisc, that provides
per-port Time-Aware scheduling;
This patchset is providing the first 2 components, which have been
developed for longer. The taprio qdisc will be shared as an RFC separately
(shortly).
Note that this series is a follow up of the "Time based packet
transmission" RFCv3 [1].
etf (formerly known as 'tbs')
=============================
For applications/systems that the concept of time slices isn't precise
enough, the etf qdisc allows applications to control the instant when
a packet should leave the network controller. When used in conjunction
with taprio, it can also be used in case the application needs to
control with greater guarantee the offset into each time slice a packet
will be sent. Another use case of etf, is when only a small number of
applications on a system are time sensitive, so it can then be used
with a more traditional root qdisc (like mqprio).
The etf qdisc is designed so it buffers packets until a configurable
time before their deadline (Tx time). The qdisc uses a rbtree internally
so the buffered packets are always 'ordered' by their txtime (deadline)
and will be dequeued following the earliest txtime first.
It relies on the SO_TXTIME API set for receiving the per-packet timestamp
(txtime) as well as the config flags for each socket: the clockid to be
used as a reference, if the expected mode of txtime for that socket is
deadline or strict mode, and if packet drops should be reported on the
socket's error queue or not.
The qdisc will drop any packets with a Tx time in the past, or if a
packet expires while waiting for being dequeued. Drops can be reported
as errors back to userspace through the socket's error queue.
Example configuration:
$ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 200000 \
clockid CLOCK_TAI
Here, the Qdisc will use HW offload for the txtime control.
Packets will be dequeued by the qdisc "delta" (200000) nanoseconds before
their transmission time. Because this will be using HW offload and
since dynamic clocks are not supported by hrtimers, the system clock
and the PHC clock must be synchronized for this mode to behave as expected.
A more complete example can be found here, with instructions of how to
test it:
https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f [2]
Note that we haven't modified the qdisc so it uses a timerqueue because
the modification needed was increasing the number of cachelines of a sk_buff.
This series is also hosted on github and can be found at [3].
The companion iproute2 patches can be found at [4].
[1] https://patchwork.ozlabs.org/cover/882342/
[2] github doesn't make it clear, but the gist can be cloned like this:
$ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f scheduled-tx-tests
[3] https://github.com/jeez/linux/tree/etf-v2
[4] https://github.com/jeez/iproute2/tree/etf-v2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'drivers/net')
-rw-r--r-- | drivers/net/ethernet/intel/igb/e1000_defines.h | 16 | ||||
-rw-r--r-- | drivers/net/ethernet/intel/igb/igb.h | 1 | ||||
-rw-r--r-- | drivers/net/ethernet/intel/igb/igb_main.c | 256 |
3 files changed, 206 insertions, 67 deletions
diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h index 252440a418dc..8a28f3388f69 100644 --- a/drivers/net/ethernet/intel/igb/e1000_defines.h +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h @@ -1048,6 +1048,22 @@ #define E1000_TQAVCTRL_XMIT_MODE BIT(0) #define E1000_TQAVCTRL_DATAFETCHARB BIT(4) #define E1000_TQAVCTRL_DATATRANARB BIT(8) +#define E1000_TQAVCTRL_DATATRANTIM BIT(9) +#define E1000_TQAVCTRL_SP_WAIT_SR BIT(10) +/* Fetch Time Delta - bits 31:16 + * + * This field holds the value to be reduced from the launch time for + * fetch time decision. The FetchTimeDelta value is defined in 32 ns + * granularity. + * + * This field is 16 bits wide, and so the maximum value is: + * + * 65535 * 32 = 2097120 ~= 2.1 msec + * + * XXX: We are configuring the max value here since we couldn't come up + * with a reason for not doing so. + */ +#define E1000_TQAVCTRL_FETCHTIME_DELTA (0xFFFF << 16) /* TX Qav Credit Control fields */ #define E1000_TQAVCC_IDLESLOPE_MASK 0xFFFF diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 9643b5b3d444..ca54e268d157 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -262,6 +262,7 @@ struct igb_ring { u16 count; /* number of desc. in the ring */ u8 queue_index; /* logical index of the ring*/ u8 reg_idx; /* physical index of the ring */ + bool launchtime_enable; /* true if LaunchTime is enabled */ bool cbs_enable; /* indicates if CBS is enabled */ s32 idleslope; /* idleSlope in kbps */ s32 sendslope; /* sendSlope in kbps */ diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index f1e3397bd405..e3a0c02721c9 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -1654,33 +1654,65 @@ static void set_queue_mode(struct e1000_hw *hw, int queue, enum queue_mode mode) wr32(E1000_I210_TQAVCC(queue), val); } +static bool is_any_cbs_enabled(struct igb_adapter *adapter) +{ + int i; + + for (i = 0; i < adapter->num_tx_queues; i++) { + if (adapter->tx_ring[i]->cbs_enable) + return true; + } + + return false; +} + +static bool is_any_txtime_enabled(struct igb_adapter *adapter) +{ + int i; + + for (i = 0; i < adapter->num_tx_queues; i++) { + if (adapter->tx_ring[i]->launchtime_enable) + return true; + } + + return false; +} + /** - * igb_configure_cbs - Configure Credit-Based Shaper (CBS) + * igb_config_tx_modes - Configure "Qav Tx mode" features on igb * @adapter: pointer to adapter struct * @queue: queue number - * @enable: true = enable CBS, false = disable CBS - * @idleslope: idleSlope in kbps - * @sendslope: sendSlope in kbps - * @hicredit: hiCredit in bytes - * @locredit: loCredit in bytes * - * Configure CBS for a given hardware queue. When disabling, idleslope, - * sendslope, hicredit, locredit arguments are ignored. Returns 0 if - * success. Negative otherwise. + * Configure CBS and Launchtime for a given hardware queue. + * Parameters are retrieved from the correct Tx ring, so + * igb_save_cbs_params() and igb_save_txtime_params() should be used + * for setting those correctly prior to this function being called. **/ -static void igb_configure_cbs(struct igb_adapter *adapter, int queue, - bool enable, int idleslope, int sendslope, - int hicredit, int locredit) +static void igb_config_tx_modes(struct igb_adapter *adapter, int queue) { + struct igb_ring *ring = adapter->tx_ring[queue]; struct net_device *netdev = adapter->netdev; struct e1000_hw *hw = &adapter->hw; - u32 tqavcc; + u32 tqavcc, tqavctrl; u16 value; WARN_ON(hw->mac.type != e1000_i210); WARN_ON(queue < 0 || queue > 1); - if (enable || queue == 0) { + /* If any of the Qav features is enabled, configure queues as SR and + * with HIGH PRIO. If none is, then configure them with LOW PRIO and + * as SP. + */ + if (ring->cbs_enable || ring->launchtime_enable) { + set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH); + set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION); + } else { + set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW); + set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY); + } + + /* If CBS is enabled, set DataTranARB and config its parameters. */ + if (ring->cbs_enable || queue == 0) { /* i210 does not allow the queue 0 to be in the Strict * Priority mode while the Qav mode is enabled, so, * instead of disabling strict priority mode, we give @@ -1690,14 +1722,19 @@ static void igb_configure_cbs(struct igb_adapter *adapter, int queue, * Queue0 QueueMode must be set to 1b when * TransmitMode is set to Qav." */ - if (queue == 0 && !enable) { + if (queue == 0 && !ring->cbs_enable) { /* max "linkspeed" idleslope in kbps */ - idleslope = 1000000; - hicredit = ETH_FRAME_LEN; + ring->idleslope = 1000000; + ring->hicredit = ETH_FRAME_LEN; } - set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH); - set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION); + /* Always set data transfer arbitration to credit-based + * shaper algorithm on TQAVCTRL if CBS is enabled for any of + * the queues. + */ + tqavctrl = rd32(E1000_I210_TQAVCTRL); + tqavctrl |= E1000_TQAVCTRL_DATATRANARB; + wr32(E1000_I210_TQAVCTRL, tqavctrl); /* According to i210 datasheet section 7.2.7.7, we should set * the 'idleSlope' field from TQAVCC register following the @@ -1756,17 +1793,16 @@ static void igb_configure_cbs(struct igb_adapter *adapter, int queue, * calculated value, so the resulting bandwidth might * be slightly higher for some configurations. */ - value = DIV_ROUND_UP_ULL(idleslope * 61034ULL, 1000000); + value = DIV_ROUND_UP_ULL(ring->idleslope * 61034ULL, 1000000); tqavcc = rd32(E1000_I210_TQAVCC(queue)); tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK; tqavcc |= value; wr32(E1000_I210_TQAVCC(queue), tqavcc); - wr32(E1000_I210_TQAVHC(queue), 0x80000000 + hicredit * 0x7735); + wr32(E1000_I210_TQAVHC(queue), + 0x80000000 + ring->hicredit * 0x7735); } else { - set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW); - set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY); /* Set idleSlope to zero. */ tqavcc = rd32(E1000_I210_TQAVCC(queue)); @@ -1775,6 +1811,43 @@ static void igb_configure_cbs(struct igb_adapter *adapter, int queue, /* Set hiCredit to zero. */ wr32(E1000_I210_TQAVHC(queue), 0); + + /* If CBS is not enabled for any queues anymore, then return to + * the default state of Data Transmission Arbitration on + * TQAVCTRL. + */ + if (!is_any_cbs_enabled(adapter)) { + tqavctrl = rd32(E1000_I210_TQAVCTRL); + tqavctrl &= ~E1000_TQAVCTRL_DATATRANARB; + wr32(E1000_I210_TQAVCTRL, tqavctrl); + } + } + + /* If LaunchTime is enabled, set DataTranTIM. */ + if (ring->launchtime_enable) { + /* Always set DataTranTIM on TQAVCTRL if LaunchTime is enabled + * for any of the SR queues, and configure fetchtime delta. + * XXX NOTE: + * - LaunchTime will be enabled for all SR queues. + * - A fixed offset can be added relative to the launch + * time of all packets if configured at reg LAUNCH_OS0. + * We are keeping it as 0 for now (default value). + */ + tqavctrl = rd32(E1000_I210_TQAVCTRL); + tqavctrl |= E1000_TQAVCTRL_DATATRANTIM | + E1000_TQAVCTRL_FETCHTIME_DELTA; + wr32(E1000_I210_TQAVCTRL, tqavctrl); + } else { + /* If Launchtime is not enabled for any SR queues anymore, + * then clear DataTranTIM on TQAVCTRL and clear fetchtime delta, + * effectively disabling Launchtime. + */ + if (!is_any_txtime_enabled(adapter)) { + tqavctrl = rd32(E1000_I210_TQAVCTRL); + tqavctrl &= ~E1000_TQAVCTRL_DATATRANTIM; + tqavctrl &= ~E1000_TQAVCTRL_FETCHTIME_DELTA; + wr32(E1000_I210_TQAVCTRL, tqavctrl); + } } /* XXX: In i210 controller the sendSlope and loCredit parameters from @@ -1782,9 +1855,27 @@ static void igb_configure_cbs(struct igb_adapter *adapter, int queue, * configuration' in respect to these parameters. */ - netdev_dbg(netdev, "CBS %s: queue %d idleslope %d sendslope %d hiCredit %d locredit %d\n", - (enable) ? "enabled" : "disabled", queue, - idleslope, sendslope, hicredit, locredit); + netdev_dbg(netdev, "Qav Tx mode: cbs %s, launchtime %s, queue %d \ + idleslope %d sendslope %d hiCredit %d \ + locredit %d\n", + (ring->cbs_enable) ? "enabled" : "disabled", + (ring->launchtime_enable) ? "enabled" : "disabled", queue, + ring->idleslope, ring->sendslope, ring->hicredit, + ring->locredit); +} + +static int igb_save_txtime_params(struct igb_adapter *adapter, int queue, + bool enable) +{ + struct igb_ring *ring; + + if (queue < 0 || queue > adapter->num_tx_queues) + return -EINVAL; + + ring = adapter->tx_ring[queue]; + ring->launchtime_enable = enable; + + return 0; } static int igb_save_cbs_params(struct igb_adapter *adapter, int queue, @@ -1807,21 +1898,15 @@ static int igb_save_cbs_params(struct igb_adapter *adapter, int queue, return 0; } -static bool is_any_cbs_enabled(struct igb_adapter *adapter) -{ - struct igb_ring *ring; - int i; - - for (i = 0; i < adapter->num_tx_queues; i++) { - ring = adapter->tx_ring[i]; - - if (ring->cbs_enable) - return true; - } - - return false; -} - +/** + * igb_setup_tx_mode - Switch to/from Qav Tx mode when applicable + * @adapter: pointer to adapter struct + * + * Configure TQAVCTRL register switching the controller's Tx mode + * if FQTSS mode is enabled or disabled. Additionally, will issue + * a call to igb_config_tx_modes() per queue so any previously saved + * Tx parameters are applied. + **/ static void igb_setup_tx_mode(struct igb_adapter *adapter) { struct net_device *netdev = adapter->netdev; @@ -1836,11 +1921,11 @@ static void igb_setup_tx_mode(struct igb_adapter *adapter) int i, max_queue; /* Configure TQAVCTRL register: set transmit mode to 'Qav', - * set data fetch arbitration to 'round robin' and set data - * transfer arbitration to 'credit shaper algorithm. + * set data fetch arbitration to 'round robin', set SP_WAIT_SR + * so SP queues wait for SR ones. */ val = rd32(E1000_I210_TQAVCTRL); - val |= E1000_TQAVCTRL_XMIT_MODE | E1000_TQAVCTRL_DATATRANARB; + val |= E1000_TQAVCTRL_XMIT_MODE | E1000_TQAVCTRL_SP_WAIT_SR; val &= ~E1000_TQAVCTRL_DATAFETCHARB; wr32(E1000_I210_TQAVCTRL, val); @@ -1881,11 +1966,7 @@ static void igb_setup_tx_mode(struct igb_adapter *adapter) adapter->num_tx_queues : I210_SR_QUEUES_NUM; for (i = 0; i < max_queue; i++) { - struct igb_ring *ring = adapter->tx_ring[i]; - - igb_configure_cbs(adapter, i, ring->cbs_enable, - ring->idleslope, ring->sendslope, - ring->hicredit, ring->locredit); + igb_config_tx_modes(adapter, i); } } else { wr32(E1000_RXPBS, I210_RXPBSIZE_DEFAULT); @@ -2459,6 +2540,19 @@ igb_features_check(struct sk_buff *skb, struct net_device *dev, return features; } +static void igb_offload_apply(struct igb_adapter *adapter, s32 queue) +{ + if (!is_fqtss_enabled(adapter)) { + enable_fqtss(adapter, true); + return; + } + + igb_config_tx_modes(adapter, queue); + + if (!is_any_cbs_enabled(adapter) && !is_any_txtime_enabled(adapter)) + enable_fqtss(adapter, false); +} + static int igb_offload_cbs(struct igb_adapter *adapter, struct tc_cbs_qopt_offload *qopt) { @@ -2479,17 +2573,7 @@ static int igb_offload_cbs(struct igb_adapter *adapter, if (err) return err; - if (is_fqtss_enabled(adapter)) { - igb_configure_cbs(adapter, qopt->queue, qopt->enable, - qopt->idleslope, qopt->sendslope, - qopt->hicredit, qopt->locredit); - - if (!is_any_cbs_enabled(adapter)) - enable_fqtss(adapter, false); - - } else { - enable_fqtss(adapter, true); - } + igb_offload_apply(adapter, qopt->queue); return 0; } @@ -2738,6 +2822,29 @@ static int igb_setup_tc_block(struct igb_adapter *adapter, } } +static int igb_offload_txtime(struct igb_adapter *adapter, + struct tc_etf_qopt_offload *qopt) +{ + struct e1000_hw *hw = &adapter->hw; + int err; + + /* Launchtime offloading is only supported by i210 controller. */ + if (hw->mac.type != e1000_i210) + return -EOPNOTSUPP; + + /* Launchtime offloading is only supported by queues 0 and 1. */ + if (qopt->queue < 0 || qopt->queue > 1) + return -EINVAL; + + err = igb_save_txtime_params(adapter, qopt->queue, qopt->enable); + if (err) + return err; + + igb_offload_apply(adapter, qopt->queue); + + return 0; +} + static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type, void *type_data) { @@ -2748,6 +2855,8 @@ static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type, return igb_offload_cbs(adapter, type_data); case TC_SETUP_BLOCK: return igb_setup_tc_block(adapter, type_data); + case TC_SETUP_QDISC_ETF: + return igb_offload_txtime(adapter, type_data); default: return -EOPNOTSUPP; @@ -5568,11 +5677,14 @@ set_itr_now: } } -static void igb_tx_ctxtdesc(struct igb_ring *tx_ring, u32 vlan_macip_lens, - u32 type_tucmd, u32 mss_l4len_idx) +static void igb_tx_ctxtdesc(struct igb_ring *tx_ring, + struct igb_tx_buffer *first, + u32 vlan_macip_lens, u32 type_tucmd, + u32 mss_l4len_idx) { struct e1000_adv_tx_context_desc *context_desc; u16 i = tx_ring->next_to_use; + struct timespec64 ts; context_desc = IGB_TX_CTXTDESC(tx_ring, i); @@ -5587,9 +5699,18 @@ static void igb_tx_ctxtdesc(struct igb_ring *tx_ring, u32 vlan_macip_lens, mss_l4len_idx |= tx_ring->reg_idx << 4; context_desc->vlan_macip_lens = cpu_to_le32(vlan_macip_lens); - context_desc->seqnum_seed = 0; context_desc->type_tucmd_mlhl = cpu_to_le32(type_tucmd); context_desc->mss_l4len_idx = cpu_to_le32(mss_l4len_idx); + + /* We assume there is always a valid tx time available. Invalid times + * should have been handled by the upper layers. + */ + if (tx_ring->launchtime_enable) { + ts = ns_to_timespec64(first->skb->tstamp); + context_desc->seqnum_seed = cpu_to_le32(ts.tv_nsec / 32); + } else { + context_desc->seqnum_seed = 0; + } } static int igb_tso(struct igb_ring *tx_ring, @@ -5672,7 +5793,8 @@ static int igb_tso(struct igb_ring *tx_ring, vlan_macip_lens |= (ip.hdr - skb->data) << E1000_ADVTXD_MACLEN_SHIFT; vlan_macip_lens |= first->tx_flags & IGB_TX_FLAGS_VLAN_MASK; - igb_tx_ctxtdesc(tx_ring, vlan_macip_lens, type_tucmd, mss_l4len_idx); + igb_tx_ctxtdesc(tx_ring, first, vlan_macip_lens, + type_tucmd, mss_l4len_idx); return 1; } @@ -5727,7 +5849,7 @@ no_csum: vlan_macip_lens |= skb_network_offset(skb) << E1000_ADVTXD_MACLEN_SHIFT; vlan_macip_lens |= first->tx_flags & IGB_TX_FLAGS_VLAN_MASK; - igb_tx_ctxtdesc(tx_ring, vlan_macip_lens, type_tucmd, 0); + igb_tx_ctxtdesc(tx_ring, first, vlan_macip_lens, type_tucmd, 0); } #define IGB_SET_FLAG(_input, _flag, _result) \ @@ -6015,8 +6137,6 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb, } } - skb_tx_timestamp(skb); - if (skb_vlan_tag_present(skb)) { tx_flags |= IGB_TX_FLAGS_VLAN; tx_flags |= (skb_vlan_tag_get(skb) << IGB_TX_FLAGS_VLAN_SHIFT); @@ -6032,6 +6152,8 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb, else if (!tso) igb_tx_csum(tx_ring, first); + skb_tx_timestamp(skb); + if (igb_tx_map(tx_ring, first, hdr_len)) goto cleanup_tx_tstamp; |