diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-07-01 00:51:09 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-07-01 00:51:09 +0200 |
commit | dbe69e43372212527abf48609aba7fc39a6daa27 (patch) | |
tree | 96cfafdf70f5325ceeac1054daf7deca339c9730 /Documentation/networking/device_drivers | |
parent | Merge tag 'sched-urgent-2021-06-30' of git://git.kernel.org/pub/scm/linux/ker... (diff) | |
parent | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (diff) | |
download | linux-dbe69e43372212527abf48609aba7fc39a6daa27.tar.xz linux-dbe69e43372212527abf48609aba7fc39a6daa27.zip |
Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener to
another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance (for
pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require jump
labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast
address allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks in NAPI
context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and
NXP (our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
tcp: change ICSK_CA_PRIV_SIZE definition
tcp_yeah: check struct yeah size at compile time
gve: DQO: Fix off by one in gve_rx_dqo()
stmmac: intel: set PCI_D3hot in suspend
stmmac: intel: Enable PHY WOL option in EHL
net: stmmac: option to enable PHY WOL with PMT enabled
net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
net: use netdev_info in ndo_dflt_fdb_{add,del}
ptp: Set lookup cookie when creating a PTP PPS source.
net: sock: add trace for socket errors
net: sock: introduce sk_error_report
net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
net: dsa: include fdb entries pointing to bridge in the host fdb list
net: dsa: include bridge addresses which are local in the host fdb list
net: dsa: sync static FDB entries on foreign interfaces to hardware
net: dsa: install the host MDB and FDB entries in the master's RX filter
net: dsa: reference count the FDB addresses at the cross-chip notifier level
net: dsa: introduce a separate cross-chip notifier type for host FDBs
net: dsa: reference count the MDB entries at the cross-chip notifier level
...
Diffstat (limited to 'Documentation/networking/device_drivers')
7 files changed, 443 insertions, 103 deletions
diff --git a/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst b/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst index 70643b58de05..4118384cf8eb 100644 --- a/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst +++ b/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst @@ -27,34 +27,136 @@ these MAP frames and send them to appropriate PDN's. 2. Packet format ================ -a. MAP packet (data / control) +a. MAP packet v1 (data / control) -MAP header has the same endianness of the IP packet. +MAP header fields are in big endian format. Packet format:: - Bit 0 1 2-7 8 - 15 16 - 31 + Bit 0 1 2-7 8-15 16-31 Function Command / Data Reserved Pad Multiplexer ID Payload length - Bit 32 - x - Function Raw Bytes + + Bit 32-x + Function Raw bytes Command (1)/ Data (0) bit value is to indicate if the packet is a MAP command -or data packet. Control packet is used for transport level flow control. Data +or data packet. Command packet is used for transport level flow control. Data packets are standard IP packets. -Reserved bits are usually zeroed out and to be ignored by receiver. +Reserved bits must be zero when sent and ignored when received. -Padding is number of bytes to be added for 4 byte alignment if required by -hardware. +Padding is the number of bytes to be appended to the payload to +ensure 4 byte alignment. Multiplexer ID is to indicate the PDN on which data has to be sent. Payload length includes the padding length but does not include MAP header length. -b. MAP packet (command specific):: +b. Map packet v4 (data / control) + +MAP header fields are in big endian format. + +Packet format:: + + Bit 0 1 2-7 8-15 16-31 + Function Command / Data Reserved Pad Multiplexer ID Payload length + + Bit 32-(x-33) (x-32)-x + Function Raw bytes Checksum offload header + +Command (1)/ Data (0) bit value is to indicate if the packet is a MAP command +or data packet. Command packet is used for transport level flow control. Data +packets are standard IP packets. + +Reserved bits must be zero when sent and ignored when received. + +Padding is the number of bytes to be appended to the payload to +ensure 4 byte alignment. + +Multiplexer ID is to indicate the PDN on which data has to be sent. + +Payload length includes the padding length but does not include MAP header +length. + +Checksum offload header, has the information about the checksum processing done +by the hardware.Checksum offload header fields are in big endian format. + +Packet format:: + + Bit 0-14 15 16-31 + Function Reserved Valid Checksum start offset + + Bit 31-47 48-64 + Function Checksum length Checksum value + +Reserved bits must be zero when sent and ignored when received. + +Valid bit indicates whether the partial checksum is calculated and is valid. +Set to 1, if its is valid. Set to 0 otherwise. + +Padding is the number of bytes to be appended to the payload to +ensure 4 byte alignment. + +Checksum start offset, Indicates the offset in bytes from the beginning of the +IP header, from which modem computed checksum. + +Checksum length is the Length in bytes starting from CKSUM_START_OFFSET, +over which checksum is computed. + +Checksum value, indicates the checksum computed. + +c. MAP packet v5 (data / control) + +MAP header fields are in big endian format. + +Packet format:: + + Bit 0 1 2-7 8-15 16-31 + Function Command / Data Next header Pad Multiplexer ID Payload length + + Bit 32-x + Function Raw bytes + +Command (1)/ Data (0) bit value is to indicate if the packet is a MAP command +or data packet. Command packet is used for transport level flow control. Data +packets are standard IP packets. + +Next header is used to indicate the presence of another header, currently is +limited to checksum header. + +Padding is the number of bytes to be appended to the payload to +ensure 4 byte alignment. + +Multiplexer ID is to indicate the PDN on which data has to be sent. + +Payload length includes the padding length but does not include MAP header +length. + +d. Checksum offload header v5 + +Checksum offload header fields are in big endian format. + + Bit 0 - 6 7 8-15 16-31 + Function Header Type Next Header Checksum Valid Reserved + +Header Type is to indicate the type of header, this usually is set to CHECKSUM + +Header types += ========================================== +0 Reserved +1 Reserved +2 checksum header + +Checksum Valid is to indicate whether the header checksum is valid. Value of 1 +implies that checksum is calculated on this packet and is valid, value of 0 +indicates that the calculated packet checksum is invalid. + +Reserved bits must be zero when sent and ignored when received. + +e. MAP packet v1/v5 (command specific):: - Bit 0 1 2-7 8 - 15 16 - 31 + Bit 0 1 2-7 8 - 15 16 - 31 Function Command Reserved Pad Multiplexer ID Payload length Bit 32 - 39 40 - 45 46 - 47 48 - 63 Function Command name Reserved Command Type Reserved @@ -74,7 +176,7 @@ Command types 3 is for error during processing of commands = ========================================== -c. Aggregation +f. Aggregation Aggregation is multiple MAP packets (can be data or command) delivered to rmnet in a single linear skb. rmnet will process the individual diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst index f8c6469f2bd2..01b2a69b0cb0 100644 --- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst +++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst @@ -11,12 +11,12 @@ ENA is a networking interface designed to make good use of modern CPU features and system architectures. The ENA device exposes a lightweight management interface with a -minimal set of memory mapped registers and extendable command set +minimal set of memory mapped registers and extendible command set through an Admin Queue. The driver supports a range of ENA devices, is link-speed independent -(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has -a negotiated and extendable feature set. +(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc), and has +a negotiated and extendible feature set. Some ENA devices support SR-IOV. This driver is used for both the SR-IOV Physical Function (PF) and Virtual Function (VF) devices. @@ -27,9 +27,9 @@ is advertised by the device via the Admin Queue), a dedicated MSI-X interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation, and CPU cacheline optimized data placement. -The ENA driver supports industry standard TCP/IP offload features such -as checksum offload and TCP transmit segmentation offload (TSO). -Receive-side scaling (RSS) is supported for multi-core scaling. +The ENA driver supports industry standard TCP/IP offload features such as +checksum offload. Receive-side scaling (RSS) is supported for multi-core +scaling. The ENA driver and its corresponding devices implement health monitoring mechanisms such as watchdog, enabling the device and driver @@ -38,22 +38,20 @@ debug logs. Some of the ENA devices support a working mode called Low-latency Queue (LLQ), which saves several more microseconds. - ENA Source Code Directory Structure =================================== ================= ====================================================== ena_com.[ch] Management communication layer. This layer is - responsible for the handling all the management - (admin) communication between the device and the - driver. + responsible for the handling all the management + (admin) communication between the device and the + driver. ena_eth_com.[ch] Tx/Rx data path. ena_admin_defs.h Definition of ENA management interface. ena_eth_io_defs.h Definition of ENA data path interface. ena_common_defs.h Common definitions for ena_com layer. ena_regs_defs.h Definition of ENA PCI memory-mapped (MMIO) registers. ena_netdev.[ch] Main Linux kernel driver. -ena_syfsfs.[ch] Sysfs files. ena_ethtool.c ethtool callbacks. ena_pci_id_tbl.h Supported device IDs. ================= ====================================================== @@ -69,7 +67,7 @@ ENA management interface is exposed by means of: - Asynchronous Event Notification Queue (AENQ) ENA device MMIO Registers are accessed only during driver -initialization and are not involved in further normal device +initialization and are not used during further normal device operation. AQ is used for submitting management commands, and the @@ -100,28 +98,27 @@ group may have multiple syndromes, as shown below The events are: - ==================== =============== - Group Syndrome - ==================== =============== - Link state change **X** - Fatal error **X** - Notification Suspend traffic - Notification Resume traffic - Keep-Alive **X** - ==================== =============== +==================== =============== +Group Syndrome +==================== =============== +Link state change **X** +Fatal error **X** +Notification Suspend traffic +Notification Resume traffic +Keep-Alive **X** +==================== =============== ACQ and AENQ share the same MSI-X vector. -Keep-Alive is a special mechanism that allows monitoring of the -device's health. The driver maintains a watchdog (WD) handler which, -if fired, logs the current state and statistics then resets and -restarts the ENA device and driver. A Keep-Alive event is delivered by -the device every second. The driver re-arms the WD upon reception of a -Keep-Alive event. A missed Keep-Alive event causes the WD handler to -fire. +Keep-Alive is a special mechanism that allows monitoring the device's health. +A Keep-Alive event is delivered by the device every second. +The driver maintains a watchdog (WD) handler which logs the current state and +statistics. If the keep-alive events aren't delivered as expected the WD resets +the device and the driver. Data Path Interface =================== + I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx SQ correspondingly). Each SQ has a completion queue (CQ) associated with it. @@ -131,26 +128,24 @@ physical memory. The ENA driver supports two Queue Operation modes for Tx SQs: -- Regular mode +- **Regular mode:** + In this mode the Tx SQs reside in the host's memory. The ENA + device fetches the ENA Tx descriptors and packet data from host + memory. - * In this mode the Tx SQs reside in the host's memory. The ENA - device fetches the ENA Tx descriptors and packet data from host - memory. +- **Low Latency Queue (LLQ) mode or "push-mode":** + In this mode the driver pushes the transmit descriptors and the + first 128 bytes of the packet directly to the ENA device memory + space. The rest of the packet payload is fetched by the + device. For this operation mode, the driver uses a dedicated PCI + device memory BAR, which is mapped with write-combine capability. -- Low Latency Queue (LLQ) mode or "push-mode". - - * In this mode the driver pushes the transmit descriptors and the - first 128 bytes of the packet directly to the ENA device memory - space. The rest of the packet payload is fetched by the - device. For this operation mode, the driver uses a dedicated PCI - device memory BAR, which is mapped with write-combine capability. + **Note that** not all ENA devices support LLQ, and this feature is negotiated + with the device upon initialization. If the ENA device does not + support LLQ mode, the driver falls back to the regular mode. The Rx SQs support only the regular mode. -Note: Not all ENA devices support LLQ, and this feature is negotiated - with the device upon initialization. If the ENA device does not - support LLQ mode, the driver falls back to the regular mode. - The driver supports multi-queue for both Tx and Rx. This has various benefits: @@ -165,6 +160,7 @@ benefits: Interrupt Modes =============== + The driver assigns a single MSI-X vector per queue pair (for both Tx and Rx directions). The driver assigns an additional dedicated MSI-X vector for management (for ACQ and AENQ). @@ -190,20 +186,21 @@ unmasked by the driver after NAPI processing is complete. Interrupt Moderation ==================== + ENA driver and device can operate in conventional or adaptive interrupt moderation mode. -In conventional mode the driver instructs device to postpone interrupt +**In conventional mode** the driver instructs device to postpone interrupt posting according to static interrupt delay value. The interrupt delay -value can be configured through ethtool(8). The following ethtool -parameters are supported by the driver: tx-usecs, rx-usecs +value can be configured through `ethtool(8)`. The following `ethtool` +parameters are supported by the driver: ``tx-usecs``, ``rx-usecs`` -In adaptive interrupt moderation mode the interrupt delay value is +**In adaptive interrupt** moderation mode the interrupt delay value is updated by the driver dynamically and adjusted every NAPI cycle according to the traffic nature. -Adaptive coalescing can be switched on/off through ethtool(8) -adaptive_rx on|off parameter. +Adaptive coalescing can be switched on/off through `ethtool(8)`'s +:code:`adaptive_rx on|off` parameter. More information about Adaptive Interrupt Moderation (DIM) can be found in Documentation/networking/net_dim.rst @@ -214,17 +211,10 @@ The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK and can be configured by the ETHTOOL_STUNABLE command of the SIOCETHTOOL ioctl. -SKB -=== -The driver-allocated SKB for frames received from Rx handling using -NAPI context. The allocation method depends on the size of the packet. -If the frame length is larger than rx_copybreak, napi_get_frags() -is used, otherwise netdev_alloc_skb_ip_align() is used, the buffer -content is copied (by CPU) to the SKB, and the buffer is recycled. - Statistics ========== -The user can obtain ENA device and driver statistics using ethtool. + +The user can obtain ENA device and driver statistics using `ethtool`. The driver can collect regular or extended statistics (including per-queue stats) from the device. @@ -232,22 +222,23 @@ In addition the driver logs the stats to syslog upon device reset. MTU === + The driver supports an arbitrarily large MTU with a maximum that is negotiated with the device. The driver configures MTU using the SetFeature command (ENA_ADMIN_MTU property). The user can change MTU -via ip(8) and similar legacy tools. +via `ip(8)` and similar legacy tools. Stateless Offloads ================== + The ENA driver supports: -- TSO over IPv4/IPv6 -- TSO with ECN - IPv4 header checksum offload - TCP/UDP over IPv4/IPv6 checksum offloads RSS === + - The ENA device supports RSS that allows flexible Rx traffic steering. - Toeplitz and CRC32 hash functions are supported. @@ -260,41 +251,42 @@ RSS function delivered in the Rx CQ descriptor is set in the received SKB. - The user can provide a hash key, hash function, and configure the - indirection table through ethtool(8). + indirection table through `ethtool(8)`. DATA PATH ========= + Tx -- -ena_start_xmit() is called by the stack. This function does the following: +:code:`ena_start_xmit()` is called by the stack. This function does the following: -- Maps data buffers (skb->data and frags). -- Populates ena_buf for the push buffer (if the driver and device are - in push mode.) +- Maps data buffers (``skb->data`` and frags). +- Populates ``ena_buf`` for the push buffer (if the driver and device are + in push mode). - Prepares ENA bufs for the remaining frags. -- Allocates a new request ID from the empty req_id ring. The request +- Allocates a new request ID from the empty ``req_id`` ring. The request ID is the index of the packet in the Tx info. This is used for - out-of-order TX completions. + out-of-order Tx completions. - Adds the packet to the proper place in the Tx ring. -- Calls ena_com_prepare_tx(), an ENA communication layer that converts - the ena_bufs to ENA descriptors (and adds meta ENA descriptors as - needed.) +- Calls :code:`ena_com_prepare_tx()`, an ENA communication layer that converts + the ``ena_bufs`` to ENA descriptors (and adds meta ENA descriptors as + needed). * This function also copies the ENA descriptors and the push buffer - to the Device memory space (if in push mode.) + to the Device memory space (if in push mode). -- Writes doorbell to the ENA device. +- Writes a doorbell to the ENA device. - When the ENA device finishes sending the packet, a completion interrupt is raised. - The interrupt handler schedules NAPI. -- The ena_clean_tx_irq() function is called. This function handles the +- The :code:`ena_clean_tx_irq()` function is called. This function handles the completion descriptors generated by the ENA, with a single completion descriptor per completed packet. - * req_id is retrieved from the completion descriptor. The tx_info of - the packet is retrieved via the req_id. The data buffers are - unmapped and req_id is returned to the empty req_id ring. + * ``req_id`` is retrieved from the completion descriptor. The ``tx_info`` of + the packet is retrieved via the ``req_id``. The data buffers are + unmapped and ``req_id`` is returned to the empty ``req_id`` ring. * The function stops when the completion descriptors are completed or the budget is reached. @@ -303,12 +295,11 @@ Rx - When a packet is received from the ENA device. - The interrupt handler schedules NAPI. -- The ena_clean_rx_irq() function is called. This function calls - ena_rx_pkt(), an ENA communication layer function, which returns the - number of descriptors used for a new unhandled packet, and zero if +- The :code:`ena_clean_rx_irq()` function is called. This function calls + :code:`ena_com_rx_pkt()`, an ENA communication layer function, which returns the + number of descriptors used for a new packet, and zero if no new packet is found. -- Then it calls the ena_clean_rx_irq() function. -- ena_eth_rx_skb() checks packet length: +- :code:`ena_rx_skb()` checks packet length: * If the packet is small (len < rx_copybreak), the driver allocates a SKB for the new packet, and copies the packet payload into the @@ -317,9 +308,10 @@ Rx - In this way the original data buffer is not passed to the stack and is reused for future Rx packets. - * Otherwise the function unmaps the Rx buffer, then allocates the - new SKB structure and hooks the Rx buffer to the SKB frags. + * Otherwise the function unmaps the Rx buffer, sets the first + descriptor as `skb`'s linear part and the other descriptors as the + `skb`'s frags. - The new SKB is updated with the necessary information (protocol, - checksum hw verify result, etc.), and then passed to the network - stack, using the NAPI interface function napi_gro_receive(). + checksum hw verify result, etc), and then passed to the network + stack, using the NAPI interface function :code:`napi_gro_receive()`. diff --git a/Documentation/networking/device_drivers/ethernet/google/gve.rst b/Documentation/networking/device_drivers/ethernet/google/gve.rst index 793693cef6e3..6d73ee78f3d7 100644 --- a/Documentation/networking/device_drivers/ethernet/google/gve.rst +++ b/Documentation/networking/device_drivers/ethernet/google/gve.rst @@ -47,13 +47,24 @@ The driver interacts with the device in the following ways: - Transmit and Receive Queues - See description below +Descriptor Formats +------------------ +GVE supports two descriptor formats: GQI and DQO. These two formats have +entirely different descriptors, which will be described below. + Registers --------- -All registers are MMIO and big endian. +All registers are MMIO. The registers are used for initializing and configuring the device as well as querying device status in response to management interrupts. +Endianness +---------- +- Admin Queue messages and registers are all Big Endian. +- GQI descriptors and datapath registers are Big Endian. +- DQO descriptors and datapath registers are Little Endian. + Admin Queue (AQ) ---------------- The Admin Queue is a PAGE_SIZE memory block, treated as an array of AQ @@ -97,10 +108,10 @@ the queues associated with that interrupt. The handler for these irqs schedule the napi for that block to run and poll the queues. -Traffic Queues --------------- -gVNIC's queues are composed of a descriptor ring and a buffer and are -assigned to a notification block. +GQI Traffic Queues +------------------ +GQI queues are composed of a descriptor ring and a buffer and are assigned to a +notification block. The descriptor rings are power-of-two-sized ring buffers consisting of fixed-size descriptors. They advance their head pointer using a __be32 @@ -121,3 +132,35 @@ Receive The buffers for receive rings are put into a data ring that is the same length as the descriptor ring and the head and tail pointers advance over the rings together. + +DQO Traffic Queues +------------------ +- Every TX and RX queue is assigned a notification block. + +- TX and RX buffers queues, which send descriptors to the device, use MMIO + doorbells to notify the device of new descriptors. + +- RX and TX completion queues, which receive descriptors from the device, use a + "generation bit" to know when a descriptor was populated by the device. The + driver initializes all bits with the "current generation". The device will + populate received descriptors with the "next generation" which is inverted + from the current generation. When the ring wraps, the current/next generation + are swapped. + +- It's the driver's responsibility to ensure that the RX and TX completion + queues are not overrun. This can be accomplished by limiting the number of + descriptors posted to HW. + +- TX packets have a 16 bit completion_tag and RX buffers have a 16 bit + buffer_id. These will be returned on the TX completion and RX queues + respectively to let the driver know which packet/buffer was completed. + +Transmit +~~~~~~~~ +A packet's buffers are DMA mapped for the device to access before transmission. +After the packet was successfully transmitted, the buffers are unmapped. + +Receive +~~~~~~~ +The driver posts fixed sized buffers to HW on the RX buffer queue. The packet +received on the associated RX queue may span multiple descriptors. diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst index 936a10f1942c..ef8cb62e82a1 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst @@ -12,6 +12,7 @@ Contents - `Enabling the driver and kconfig options`_ - `Devlink info`_ - `Devlink parameters`_ +- `Bridge offload`_ - `mlx5 subfunction`_ - `mlx5 function attributes`_ - `Devlink health reporters`_ @@ -217,6 +218,37 @@ users try to enable them. $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev +Bridge offload +============== +The mlx5 driver implements support for offloading bridge rules when in switchdev +mode. Linux bridge FDBs are automatically offloaded when mlx5 switchdev +representor is attached to bridge. + +- Change device to switchdev mode:: + + $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev + +- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1':: + + $ ip link set enp8s0f0 master bridge1 + +VLANs +----- +Following bridge VLAN functions are supported by mlx5: + +- VLAN filtering (including multiple VLANs per port):: + + $ ip link set bridge1 type bridge vlan_filtering 1 + $ bridge vlan add dev enp8s0f0 vid 2-3 + +- VLAN push on bridge ingress:: + + $ bridge vlan add dev enp8s0f0 vid 3 pvid + +- VLAN pop on bridge egress:: + + $ bridge vlan add dev enp8s0f0 vid 3 untagged + mlx5 subfunction ================ mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface. @@ -568,3 +600,59 @@ tc and eswitch offloads tracepoints: $ cat /sys/kernel/debug/tracing/trace ... kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1 + +Bridge offloads tracepoints: + +- mlx5_esw_bridge_fdb_entry_init: trace bridge FDB entry offloaded to mlx5:: + + $ echo mlx5:mlx5_esw_bridge_fdb_entry_init >> set_event + $ cat /sys/kernel/debug/tracing/trace + ... + kworker/u20:9-2217 [003] ...1 318.582243: mlx5_esw_bridge_fdb_entry_init: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=0 flags=0 used=0 + +- mlx5_esw_bridge_fdb_entry_cleanup: trace bridge FDB entry deleted from mlx5:: + + $ echo mlx5:mlx5_esw_bridge_fdb_entry_cleanup >> set_event + $ cat /sys/kernel/debug/tracing/trace + ... + ip-2581 [005] ...1 318.629871: mlx5_esw_bridge_fdb_entry_cleanup: net_device=enp8s0f0_1 addr=e4:fd:05:08:00:03 vid=0 flags=0 used=16 + +- mlx5_esw_bridge_fdb_entry_refresh: trace bridge FDB entry offload refreshed in + mlx5:: + + $ echo mlx5:mlx5_esw_bridge_fdb_entry_refresh >> set_event + $ cat /sys/kernel/debug/tracing/trace + ... + kworker/u20:8-3849 [003] ...1 466716: mlx5_esw_bridge_fdb_entry_refresh: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=3 flags=0 used=0 + +- mlx5_esw_bridge_vlan_create: trace bridge VLAN object add on mlx5 + representor:: + + $ echo mlx5:mlx5_esw_bridge_vlan_create >> set_event + $ cat /sys/kernel/debug/tracing/trace + ... + ip-2560 [007] ...1 318.460258: mlx5_esw_bridge_vlan_create: vid=1 flags=6 + +- mlx5_esw_bridge_vlan_cleanup: trace bridge VLAN object delete from mlx5 + representor:: + + $ echo mlx5:mlx5_esw_bridge_vlan_cleanup >> set_event + $ cat /sys/kernel/debug/tracing/trace + ... + bridge-2582 [007] ...1 318.653496: mlx5_esw_bridge_vlan_cleanup: vid=2 flags=8 + +- mlx5_esw_bridge_vport_init: trace mlx5 vport assigned with bridge upper + device:: + + $ echo mlx5:mlx5_esw_bridge_vport_init >> set_event + $ cat /sys/kernel/debug/tracing/trace + ... + ip-2560 [007] ...1 318.458915: mlx5_esw_bridge_vport_init: vport_num=1 + +- mlx5_esw_bridge_vport_cleanup: trace mlx5 vport removed from bridge upper + device:: + + $ echo mlx5:mlx5_esw_bridge_vport_cleanup >> set_event + $ cat /sys/kernel/debug/tracing/trace + ... + ip-5387 [000] ...1 573713: mlx5_esw_bridge_vport_cleanup: vport_num=1 diff --git a/Documentation/networking/device_drivers/index.rst b/Documentation/networking/device_drivers/index.rst index d8279de7bf25..3a5a1d46e77e 100644 --- a/Documentation/networking/device_drivers/index.rst +++ b/Documentation/networking/device_drivers/index.rst @@ -18,6 +18,7 @@ Contents: qlogic/index wan/index wifi/index + wwan/index .. only:: subproject and html diff --git a/Documentation/networking/device_drivers/wwan/index.rst b/Documentation/networking/device_drivers/wwan/index.rst new file mode 100644 index 000000000000..1cb8c7371401 --- /dev/null +++ b/Documentation/networking/device_drivers/wwan/index.rst @@ -0,0 +1,18 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +WWAN Device Drivers +=================== + +Contents: + +.. toctree:: + :maxdepth: 2 + + iosm + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/networking/device_drivers/wwan/iosm.rst b/Documentation/networking/device_drivers/wwan/iosm.rst new file mode 100644 index 000000000000..aceb0223eb46 --- /dev/null +++ b/Documentation/networking/device_drivers/wwan/iosm.rst @@ -0,0 +1,96 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +.. Copyright (C) 2020-21 Intel Corporation + +.. _iosm_driver_doc: + +=========================================== +IOSM Driver for Intel M.2 PCIe based Modems +=========================================== +The IOSM (IPC over Shared Memory) driver is a WWAN PCIe host driver developed +for linux or chrome platform for data exchange over PCIe interface between +Host platform & Intel M.2 Modem. The driver exposes interface conforming to the +MBIM protocol [1]. Any front end application ( eg: Modem Manager) could easily +manage the MBIM interface to enable data communication towards WWAN. + +Basic usage +=========== +MBIM functions are inactive when unmanaged. The IOSM driver only provides a +userspace interface MBIM "WWAN PORT" representing MBIM control channel and does +not play any role in managing the functionality. It is the job of a userspace +application to detect port enumeration and enable MBIM functionality. + +Examples of few such userspace application are: +- mbimcli (included with the libmbim [2] library), and +- Modem Manager [3] + +Management Applications to carry out below required actions for establishing +MBIM IP session: +- open the MBIM control channel +- configure network connection settings +- connect to network +- configure IP network interface + +Management application development +================================== +The driver and userspace interfaces are described below. The MBIM protocol is +described in [1] Mobile Broadband Interface Model v1.0 Errata-1. + +MBIM control channel userspace ABI +---------------------------------- + +/dev/wwan0mbim0 character device +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The driver exposes an MBIM interface to the MBIM function by implementing +MBIM WWAN Port. The userspace end of the control channel pipe is a +/dev/wwan0mbim0 character device. Application shall use this interface for +MBIM protocol communication. + +Fragmentation +~~~~~~~~~~~~~ +The userspace application is responsible for all control message fragmentation +and defragmentation as per MBIM specification. + +/dev/wwan0mbim0 write() +~~~~~~~~~~~~~~~~~~~~~~~ +The MBIM control messages from the management application must not exceed the +negotiated control message size. + +/dev/wwan0mbim0 read() +~~~~~~~~~~~~~~~~~~~~~~ +The management application must accept control messages of up the negotiated +control message size. + +MBIM data channel userspace ABI +------------------------------- + +wwan0-X network device +~~~~~~~~~~~~~~~~~~~~~~ +The IOSM driver exposes IP link interface "wwan0-X" of type "wwan" for IP +traffic. Iproute network utility is used for creating "wwan0-X" network +interface and for associating it with MBIM IP session. The Driver supports +upto 8 IP sessions for simultaneous IP communication. + +The userspace management application is responsible for creating new IP link +prior to establishing MBIM IP session where the SessionId is greater than 0. + +For example, creating new IP link for a MBIM IP session with SessionId 1: + + ip link add dev wwan0-1 parentdev-name wwan0 type wwan linkid 1 + +The driver will automatically map the "wwan0-1" network device to MBIM IP +session 1. + +References +========== +[1] "MBIM (Mobile Broadband Interface Model) Errata-1" + - https://www.usb.org/document-library/ + +[2] libmbim - "a glib-based library for talking to WWAN modems and + devices which speak the Mobile Interface Broadband Model (MBIM) + protocol" + - http://www.freedesktop.org/wiki/Software/libmbim/ + +[3] Modem Manager - "a DBus-activated daemon which controls mobile + broadband (2G/3G/4G) devices and connections" + - http://www.freedesktop.org/wiki/Software/ModemManager/ |