summaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/bonding.rst11
-rw-r--r--Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst1
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst16
-rw-r--r--Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst60
-rw-r--r--Documentation/networking/devlink/bnxt.rst2
-rw-r--r--Documentation/networking/devlink/devlink-region.rst4
-rw-r--r--Documentation/networking/devlink/ice.rst4
-rw-r--r--Documentation/networking/devlink/index.rst2
-rw-r--r--Documentation/networking/devlink/iosm.rst162
-rw-r--r--Documentation/networking/devlink/octeontx2.rst42
-rw-r--r--Documentation/networking/ethtool-netlink.rst81
-rw-r--r--Documentation/networking/ip-sysctl.rst38
-rw-r--r--Documentation/networking/ipvs-sysctl.rst14
-rw-r--r--Documentation/networking/mctp.rst59
-rw-r--r--Documentation/networking/msg_zerocopy.rst2
-rw-r--r--Documentation/networking/timestamping.rst8
16 files changed, 476 insertions, 30 deletions
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index 31cfd7d674a6..c0a789b00806 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -196,11 +196,12 @@ ad_actor_sys_prio
ad_actor_system
In an AD system, this specifies the mac-address for the actor in
- protocol packet exchanges (LACPDUs). The value cannot be NULL or
- multicast. It is preferred to have the local-admin bit set for this
- mac but driver does not enforce it. If the value is not given then
- system defaults to using the masters' mac address as actors' system
- address.
+ protocol packet exchanges (LACPDUs). The value cannot be a multicast
+ address. If the all-zeroes MAC is specified, bonding will internally
+ use the MAC of the bond itself. It is preferred to have the
+ local-admin bit set for this mac but driver does not enforce it. If
+ the value is not given then system defaults to using the masters'
+ mac address as actors' system address.
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
diff --git a/Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst b/Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
index d638b5a8aadd..199647729251 100644
--- a/Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
+++ b/Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
@@ -183,6 +183,7 @@ PHY and allows physical transmission and reception of Ethernet frames.
IRQ config, enable, reset
DPNI (Datapath Network Interface)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Contains TX/RX queues, network interface configuration, and RX buffer pool
configuration mechanisms. The TX/RX queues are in memory and are identified
by queue number.
diff --git a/Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst b/Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst
index f1d5233e5e51..0a233b17c664 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst
@@ -440,6 +440,22 @@ NOTE: For 82599-based network connections, if you are enabling jumbo frames in
a virtual function (VF), jumbo frames must first be enabled in the physical
function (PF). The VF MTU setting cannot be larger than the PF MTU.
+NBASE-T Support
+---------------
+The ixgbe driver supports NBASE-T on some devices. However, the advertisement
+of NBASE-T speeds is suppressed by default, to accommodate broken network
+switches which cannot cope with advertised NBASE-T speeds. Use the ethtool
+command to enable advertising NBASE-T speeds on devices which support it::
+
+ ethtool -s eth? advertise 0x1800000001028
+
+On Linux systems with INTERFACES(5), this can be specified as a pre-up command
+in /etc/network/interfaces so that the interface is always brought up with
+NBASE-T support, e.g.::
+
+ iface eth? inet dhcp
+ pre-up ethtool -s eth? advertise 0x1800000001028 || true
+
Generic Receive Offload, aka GRO
--------------------------------
The driver supports the in-kernel software implementation of GRO. GRO has
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
index 4b59cf2c599f..5edf50d7dbd5 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
@@ -543,6 +543,8 @@ The CR-space dump uses vsc interface which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors.
The recover function runs recover flow which reloads the driver and triggers fw
reset if needed.
+On firmware error, the health buffer is dumped into the dmesg. The log
+level is derived from the error's severity (given in health buffer).
User commands examples:
@@ -700,3 +702,61 @@ Eswitch QoS tracepoints:
$ cat /sys/kernel/debug/tracing/trace
...
<...>-27418 [006] .... 76547.187258: mlx5_esw_group_qos_destroy: (0000:82:00.0) group=000000007b576bb3 tsar_ix=1
+
+SF tracepoints:
+
+- mlx5_sf_add: trace addition of the SF port::
+
+ $ echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ devlink-9363 [031] ..... 24610.188722: mlx5_sf_add: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000 sfnum=88
+
+- mlx5_sf_free: trace freeing of the SF port::
+
+ $ echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ devlink-9830 [038] ..... 26300.404749: mlx5_sf_free: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000
+
+- mlx5_sf_hwc_alloc: trace allocating of the hardware SF context::
+
+ $ echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ devlink-9775 [031] ..... 26296.385259: mlx5_sf_hwc_alloc: (0000:06:00.0) controller=0 hw_id=0x8000 sfnum=88
+
+- mlx5_sf_hwc_free: trace freeing of the hardware SF context::
+
+ $ echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ kworker/u128:3-9093 [046] ..... 24625.365771: mlx5_sf_hwc_free: (0000:06:00.0) hw_id=0x8000
+
+- mlx5_sf_hwc_deferred_free : trace deferred freeing of the hardware SF context::
+
+ $ echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ devlink-9519 [046] ..... 24624.400271: mlx5_sf_hwc_deferred_free: (0000:06:00.0) hw_id=0x8000
+
+- mlx5_sf_vhca_event: trace SF vhca event and state::
+
+ $ echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ kworker/u128:3-9093 [046] ..... 24625.365525: mlx5_sf_vhca_event: (0000:06:00.0) hw_id=0x8000 sfnum=88 vhca_state=1
+
+- mlx5_sf_dev_add : trace SF device add event::
+
+ $ echo mlx5:mlx5_sf_dev_add>> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ kworker/u128:3-9093 [000] ..... 24616.524495: mlx5_sf_dev_add: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88
+
+- mlx5_sf_dev_del : trace SF device delete event::
+
+ $ echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ kworker/u128:3-9093 [044] ..... 24624.400749: mlx5_sf_dev_del: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88
diff --git a/Documentation/networking/devlink/bnxt.rst b/Documentation/networking/devlink/bnxt.rst
index 3dfd84ccb1c7..a4fb27663cd6 100644
--- a/Documentation/networking/devlink/bnxt.rst
+++ b/Documentation/networking/devlink/bnxt.rst
@@ -22,6 +22,8 @@ Parameters
- Permanent
* - ``msix_vec_per_pf_min``
- Permanent
+ * - ``enable_remote_dev_reset``
+ - Runtime
The ``bnxt`` driver also implements the following driver-specific
parameters.
diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
index 58fe95e9a49d..f06dca9a1eb6 100644
--- a/Documentation/networking/devlink/devlink-region.rst
+++ b/Documentation/networking/devlink/devlink-region.rst
@@ -44,8 +44,8 @@ example usage
# Show all of the exposed regions with region sizes:
$ devlink region show
- pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]
- pci/0000:00:05.0/fw-health: size 64 snapshot [1 2]
+ pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2] max 8
+ pci/0000:00:05.0/fw-health: size 64 snapshot [1 2] max 8
# Delete a snapshot using:
$ devlink region del pci/0000:00:05.0/cr-space snapshot 1
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index 5d97cee9457b..59c78e9717d2 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -142,6 +142,10 @@ Users can request an immediate capture of a snapshot via the
.. code:: shell
+ $ devlink region show
+ pci/0000:01:00.0/nvm-flash: size 10485760 snapshot [] max 1
+ pci/0000:01:00.0/device-caps: size 4096 snapshot [] max 10
+
$ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1
$ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index 45b5f8b341df..443123772f44 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -47,3 +47,5 @@ parameters, info versions, and other features it supports.
ti-cpsw-switch
am65-nuss-cpsw-switch
prestera
+ iosm
+ octeontx2
diff --git a/Documentation/networking/devlink/iosm.rst b/Documentation/networking/devlink/iosm.rst
new file mode 100644
index 000000000000..6136181339aa
--- /dev/null
+++ b/Documentation/networking/devlink/iosm.rst
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+iosm devlink support
+====================
+
+This document describes the devlink features implemented by the ``iosm``
+device driver.
+
+Parameters
+==========
+
+The ``iosm`` driver implements the following driver-specific parameters.
+
+.. list-table:: Driver-specific parameters implemented
+ :widths: 5 5 5 85
+
+ * - Name
+ - Type
+ - Mode
+ - Description
+ * - ``erase_full_flash``
+ - u8
+ - runtime
+ - erase_full_flash parameter is used to check if full erase is required for
+ the device during firmware flashing.
+ If set, Full nand erase command will be sent to the device. By default,
+ only conditional erase support is enabled.
+
+
+Flash Update
+============
+
+The ``iosm`` driver implements support for flash update using the
+``devlink-flash`` interface.
+
+It supports updating the device flash using a combined flash image which contains
+the Bootloader images and other modem software images.
+
+The driver uses DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT to identify type of
+firmware image that need to be flashed as requested by user space application.
+Supported firmware image types.
+
+.. list-table:: Firmware Image types
+ :widths: 15 85
+
+ * - Name
+ - Description
+ * - ``PSI RAM``
+ - Primary Signed Image
+ * - ``EBL``
+ - External Bootloader
+ * - ``FLS``
+ - Modem Software Image
+
+PSI RAM and EBL are the RAM images which are injected to the device when the
+device is in BOOT ROM stage. Once this is successful, the actual modem firmware
+image is flashed to the device. The modem software image contains multiple files
+each having one secure bin file and at least one Loadmap/Region file. For flashing
+these files, appropriate commands are sent to the modem device along with the
+data required for flashing. The data like region count and address of each region
+has to be passed to the driver using the devlink param command.
+
+If the device has to be fully erased before firmware flashing, user application
+need to set the erase_full_flash parameter using devlink param command.
+By default, conditional erase feature is supported.
+
+Flash Commands:
+===============
+1) When modem is in Boot ROM stage, user can use below command to inject PSI RAM
+image using devlink flash command.
+
+$ devlink dev flash pci/0000:02:00.0 file <PSI_RAM_File_name>
+
+2) If user want to do a full erase, below command need to be issued to set the
+erase full flash param (To be set only if full erase required).
+
+$ devlink dev param set pci/0000:02:00.0 name erase_full_flash value true cmode runtime
+
+3) Inject EBL after the modem is in PSI stage.
+
+$ devlink dev flash pci/0000:02:00.0 file <EBL_File_name>
+
+4) Once EBL is injected successfully, then the actual firmware flashing takes
+place. Below is the sequence of commands used for each of the firmware images.
+
+a) Flash secure bin file.
+
+$ devlink dev flash pci/0000:02:00.0 file <Secure_bin_file_name>
+
+b) Flashing the Loadmap/Region file
+
+$ devlink dev flash pci/0000:02:00.0 file <Load_map_file_name>
+
+Regions
+=======
+
+The ``iosm`` driver supports dumping the coredump logs.
+
+In case a firmware encounters an exception, a snapshot will be taken by the
+driver. Following regions are accessed for device internal data.
+
+.. list-table:: Regions implemented
+ :widths: 15 85
+
+ * - Name
+ - Description
+ * - ``report.json``
+ - The summary of exception details logged as part of this region.
+ * - ``coredump.fcd``
+ - This region contains the details related to the exception occurred in the
+ device (RAM dump).
+ * - ``cdd.log``
+ - This region contains the logs related to the modem CDD driver.
+ * - ``eeprom.bin``
+ - This region contains the eeprom logs.
+ * - ``bootcore_trace.bin``
+ - This region contains the current instance of bootloader logs.
+ * - ``bootcore_prev_trace.bin``
+ - This region contains the previous instance of bootloader logs.
+
+
+Region commands
+===============
+
+$ devlink region show
+
+$ devlink region new pci/0000:02:00.0/report.json
+
+$ devlink region dump pci/0000:02:00.0/report.json snapshot 0
+
+$ devlink region del pci/0000:02:00.0/report.json snapshot 0
+
+$ devlink region new pci/0000:02:00.0/coredump.fcd
+
+$ devlink region dump pci/0000:02:00.0/coredump.fcd snapshot 1
+
+$ devlink region del pci/0000:02:00.0/coredump.fcd snapshot 1
+
+$ devlink region new pci/0000:02:00.0/cdd.log
+
+$ devlink region dump pci/0000:02:00.0/cdd.log snapshot 2
+
+$ devlink region del pci/0000:02:00.0/cdd.log snapshot 2
+
+$ devlink region new pci/0000:02:00.0/eeprom.bin
+
+$ devlink region dump pci/0000:02:00.0/eeprom.bin snapshot 3
+
+$ devlink region del pci/0000:02:00.0/eeprom.bin snapshot 3
+
+$ devlink region new pci/0000:02:00.0/bootcore_trace.bin
+
+$ devlink region dump pci/0000:02:00.0/bootcore_trace.bin snapshot 4
+
+$ devlink region del pci/0000:02:00.0/bootcore_trace.bin snapshot 4
+
+$ devlink region new pci/0000:02:00.0/bootcore_prev_trace.bin
+
+$ devlink region dump pci/0000:02:00.0/bootcore_prev_trace.bin snapshot 5
+
+$ devlink region del pci/0000:02:00.0/bootcore_prev_trace.bin snapshot 5
diff --git a/Documentation/networking/devlink/octeontx2.rst b/Documentation/networking/devlink/octeontx2.rst
new file mode 100644
index 000000000000..610de99b728a
--- /dev/null
+++ b/Documentation/networking/devlink/octeontx2.rst
@@ -0,0 +1,42 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+octeontx2 devlink support
+=========================
+
+This document describes the devlink features implemented by the ``octeontx2 AF, PF and VF``
+device drivers.
+
+Parameters
+==========
+
+The ``octeontx2 PF and VF`` drivers implement the following driver-specific parameters.
+
+.. list-table:: Driver-specific parameters implemented
+ :widths: 5 5 5 85
+
+ * - Name
+ - Type
+ - Mode
+ - Description
+ * - ``mcam_count``
+ - u16
+ - runtime
+ - Select number of match CAM entries to be allocated for an interface.
+ The same is used for ntuple filters of the interface. Supported by
+ PF and VF drivers.
+
+The ``octeontx2 AF`` driver implements the following driver-specific parameters.
+
+.. list-table:: Driver-specific parameters implemented
+ :widths: 5 5 5 85
+
+ * - Name
+ - Type
+ - Mode
+ - Description
+ * - ``dwrr_mtu``
+ - u32
+ - runtime
+ - Use to set the quantum which hardware uses for scheduling among transmit queues.
+ Hardware uses weighted DWRR algorithm to schedule among all transmit queues.
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index d9b55b7a1a4d..7b598c7e3912 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -41,6 +41,11 @@ In the message structure descriptions below, if an attribute name is suffixed
with "+", parent nest can contain multiple attributes of the same type. This
implements an array of entries.
+Attributes that need to be filled-in by device drivers and that are dumped to
+user space based on whether they are valid or not should not use zero as a
+valid value. This avoids the need to explicitly signal the validity of the
+attribute in the device driver API.
+
Request header
==============
@@ -179,7 +184,7 @@ according to message purpose:
Userspace to kernel:
- ===================================== ================================
+ ===================================== =================================
``ETHTOOL_MSG_STRSET_GET`` get string set
``ETHTOOL_MSG_LINKINFO_GET`` get link settings
``ETHTOOL_MSG_LINKINFO_SET`` set link settings
@@ -213,7 +218,9 @@ Userspace to kernel:
``ETHTOOL_MSG_MODULE_EEPROM_GET`` read SFP module EEPROM
``ETHTOOL_MSG_STATS_GET`` get standard statistics
``ETHTOOL_MSG_PHC_VCLOCKS_GET`` get PHC virtual clocks info
- ===================================== ================================
+ ``ETHTOOL_MSG_MODULE_SET`` set transceiver module parameters
+ ``ETHTOOL_MSG_MODULE_GET`` get transceiver module parameters
+ ===================================== =================================
Kernel to userspace:
@@ -252,6 +259,7 @@ Kernel to userspace:
``ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY`` read SFP module EEPROM
``ETHTOOL_MSG_STATS_GET_REPLY`` standard statistics
``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY`` PHC virtual clocks info
+ ``ETHTOOL_MSG_MODULE_GET_REPLY`` transceiver module parameters
======================================== =================================
``GET`` requests are sent by userspace applications to retrieve device
@@ -520,6 +528,8 @@ Link extended states:
power required from cable or module
``ETHTOOL_LINK_EXT_STATE_OVERHEAT`` The module is overheated
+
+ ``ETHTOOL_LINK_EXT_STATE_MODULE`` Transceiver module issue
================================================ ============================================
Link extended substates:
@@ -613,6 +623,14 @@ Link extended substates:
``ETHTOOL_LINK_EXT_SUBSTATE_CI_CABLE_TEST_FAILURE`` Cable test failure
=================================================== ============================================
+ Transceiver module issue substates:
+
+ =================================================== ============================================
+ ``ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY`` The CMIS Module State Machine did not reach
+ the ModuleReady state. For example, if the
+ module is stuck at ModuleFault state
+ =================================================== ============================================
+
DEBUG_GET
=========
@@ -1521,6 +1539,63 @@ Kernel response contents:
``ETHTOOL_A_PHC_VCLOCKS_INDEX`` s32 PHC index array
==================================== ====== ==========================
+MODULE_GET
+==========
+
+Gets transceiver module parameters.
+
+Request contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_MODULE_HEADER`` nested request header
+ ===================================== ====== ==========================
+
+Kernel response contents:
+
+ ====================================== ====== ==========================
+ ``ETHTOOL_A_MODULE_HEADER`` nested reply header
+ ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` u8 power mode policy
+ ``ETHTOOL_A_MODULE_POWER_MODE`` u8 operational power mode
+ ====================================== ====== ==========================
+
+The optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute encodes the
+transceiver module power mode policy enforced by the host. The default policy
+is driver-dependent, but "auto" is the recommended default and it should be
+implemented by new drivers and drivers where conformance to a legacy behavior
+is not critical.
+
+The optional ``ETHTHOOL_A_MODULE_POWER_MODE`` attribute encodes the operational
+power mode policy of the transceiver module. It is only reported when a module
+is plugged-in. Possible values are:
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+ :identifiers: ethtool_module_power_mode
+
+MODULE_SET
+==========
+
+Sets transceiver module parameters.
+
+Request contents:
+
+ ====================================== ====== ==========================
+ ``ETHTOOL_A_MODULE_HEADER`` nested request header
+ ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` u8 power mode policy
+ ====================================== ====== ==========================
+
+When set, the optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute is used
+to set the transceiver module power policy enforced by the host. Possible
+values are:
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+ :identifiers: ethtool_module_power_mode_policy
+
+For SFF-8636 modules, low power mode is forced by the host according to table
+6-10 in revision 2.10a of the specification.
+
+For CMIS modules, low power mode is forced by the host according to table 6-12
+in revision 5.0 of the specification.
+
Request translation
===================
@@ -1620,4 +1695,6 @@ are netlink only.
n/a ``ETHTOOL_MSG_CABLE_TEST_TDR_ACT``
n/a ``ETHTOOL_MSG_TUNNEL_INFO_GET``
n/a ``ETHTOOL_MSG_PHC_VCLOCKS_GET``
+ n/a ``ETHTOOL_MSG_MODULE_GET``
+ n/a ``ETHTOOL_MSG_MODULE_SET``
=================================== =====================================
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index d91ab28718d4..2572eecc3e86 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -25,7 +25,8 @@ ip_default_ttl - INTEGER
ip_no_pmtu_disc - INTEGER
Disable Path MTU Discovery. If enabled in mode 1 and a
fragmentation-required ICMP is received, the PMTU to this
- destination will be set to min_pmtu (see below). You will need
+ destination will be set to the smallest of the old MTU to
+ this destination and min_pmtu (see below). You will need
to raise min_pmtu to the smallest interface MTU on your system
manually if you want to avoid locally generated fragments.
@@ -49,7 +50,8 @@ ip_no_pmtu_disc - INTEGER
Default: FALSE
min_pmtu - INTEGER
- default 552 - minimum discovered Path MTU
+ default 552 - minimum Path MTU. Unless this is changed mannually,
+ each cached pmtu will never be lower than this setting.
ip_forward_use_pmtu - BOOLEAN
By default we don't trust protocol path MTUs while forwarding
@@ -989,14 +991,6 @@ tcp_challenge_ack_limit - INTEGER
in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks)
Default: 1000
-tcp_rx_skb_cache - BOOLEAN
- Controls a per TCP socket cache of one skb, that might help
- performance of some workloads. This might be dangerous
- on systems with a lot of TCP sockets, since it increases
- memory usage.
-
- Default: 0 (disabled)
-
UDP variables
=============
@@ -1012,13 +1006,11 @@ udp_l3mdev_accept - BOOLEAN
udp_mem - vector of 3 INTEGERs: min, pressure, max
Number of pages allowed for queueing by all UDP sockets.
- min: Below this number of pages UDP is not bothered about its
- memory appetite. When amount of memory allocated by UDP exceeds
- this number, UDP starts to moderate memory usage.
+ min: Number of pages allowed for queueing by all UDP sockets.
pressure: This value was introduced to follow format of tcp_mem.
- max: Number of pages allowed for queueing by all UDP sockets.
+ max: This value was introduced to follow format of tcp_mem.
Default is calculated at boot time from amount of available memory.
@@ -1619,6 +1611,15 @@ arp_accept - BOOLEAN
gratuitous arp frame, the arp table will be updated regardless
if this setting is on or off.
+arp_evict_nocarrier - BOOLEAN
+ Clears the ARP cache on NOCARRIER events. This option is important for
+ wireless devices where the ARP cache should not be cleared when roaming
+ between access points on the same network. In most cases this should
+ remain as the default (1).
+
+ - 1 - (default): Clear the ARP cache on NOCARRIER events
+ - 0 - Do not clear ARP cache on NOCARRIER events
+
mcast_solicit - INTEGER
The maximum number of multicast probes in INCOMPLETE state,
when the associated hardware address is unknown. Defaults
@@ -2349,6 +2350,15 @@ ndisc_tclass - INTEGER
* 0 - (default)
+ndisc_evict_nocarrier - BOOLEAN
+ Clears the neighbor discovery table on NOCARRIER events. This option is
+ important for wireless devices where the neighbor discovery cache should
+ not be cleared when roaming between access points on the same network.
+ In most cases this should remain as the default (1).
+
+ - 1 - (default): Clear neighbor discover cache on NOCARRIER events.
+ - 0 - Do not clear neighbor discovery cache on NOCARRIER events.
+
mldv1_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
MLDv1 report retransmit will take place.
diff --git a/Documentation/networking/ipvs-sysctl.rst b/Documentation/networking/ipvs-sysctl.rst
index 2afccc63856e..387fda80f05f 100644
--- a/Documentation/networking/ipvs-sysctl.rst
+++ b/Documentation/networking/ipvs-sysctl.rst
@@ -37,8 +37,7 @@ conn_reuse_mode - INTEGER
0: disable any special handling on port reuse. The new
connection will be delivered to the same real server that was
- servicing the previous connection. This will effectively
- disable expire_nodest_conn.
+ servicing the previous connection.
bit 1: enable rescheduling of new connections when it is safe.
That is, whenever expire_nodest_conn and for TCP sockets, when
@@ -300,3 +299,14 @@ sync_version - INTEGER
Kernels with this sync_version entry are able to receive messages
of both version 1 and version 2 of the synchronisation protocol.
+
+run_estimation - BOOLEAN
+ 0 - disabled
+ not 0 - enabled (default)
+
+ If disabled, the estimation will be stop, and you can't see
+ any update on speed estimation data.
+
+ You can always re-enable estimation by setting this value to 1.
+ But be careful, the first estimation after re-enable is not
+ accurate.
diff --git a/Documentation/networking/mctp.rst b/Documentation/networking/mctp.rst
index fa7730dbf7b9..46f74bffce0f 100644
--- a/Documentation/networking/mctp.rst
+++ b/Documentation/networking/mctp.rst
@@ -211,3 +211,62 @@ remote address is already known, or the message does not require a reply.
Like the send calls, sockets will only receive responses to requests they have
sent (TO=1) and may only respond (TO=0) to requests they have received.
+
+Kernel internals
+================
+
+There are a few possible packet flows in the MCTP stack:
+
+1. local TX to remote endpoint, message <= MTU::
+
+ sendmsg()
+ -> mctp_local_output()
+ : route lookup
+ -> rt->output() (== mctp_route_output)
+ -> dev_queue_xmit()
+
+2. local TX to remote endpoint, message > MTU::
+
+ sendmsg()
+ -> mctp_local_output()
+ -> mctp_do_fragment_route()
+ : creates packet-sized skbs. For each new skb:
+ -> rt->output() (== mctp_route_output)
+ -> dev_queue_xmit()
+
+3. remote TX to local endpoint, single-packet message::
+
+ mctp_pkttype_receive()
+ : route lookup
+ -> rt->output() (== mctp_route_input)
+ : sk_key lookup
+ -> sock_queue_rcv_skb()
+
+4. remote TX to local endpoint, multiple-packet message::
+
+ mctp_pkttype_receive()
+ : route lookup
+ -> rt->output() (== mctp_route_input)
+ : sk_key lookup
+ : stores skb in struct sk_key->reasm_head
+
+ mctp_pkttype_receive()
+ : route lookup
+ -> rt->output() (== mctp_route_input)
+ : sk_key lookup
+ : finds existing reassembly in sk_key->reasm_head
+ : appends new fragment
+ -> sock_queue_rcv_skb()
+
+Key refcounts
+-------------
+
+ * keys are refed by:
+
+ - a skb: during route output, stored in ``skb->cb``.
+
+ - netns and sock lists.
+
+ * keys can be associated with a device, in which case they hold a
+ reference to the dev (set through ``key->dev``, counted through
+ ``dev->key_count``). Multiple keys can reference the device.
diff --git a/Documentation/networking/msg_zerocopy.rst b/Documentation/networking/msg_zerocopy.rst
index ace56204dd03..15920db8d35d 100644
--- a/Documentation/networking/msg_zerocopy.rst
+++ b/Documentation/networking/msg_zerocopy.rst
@@ -50,7 +50,7 @@ the excellent reporting over at LWN.net or read the original code.
patchset
[PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY
- https://lkml.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com
+ https://lore.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com
Interface
diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
index a722eb30e014..f5809206eb93 100644
--- a/Documentation/networking/timestamping.rst
+++ b/Documentation/networking/timestamping.rst
@@ -486,8 +486,8 @@ of packets.
Drivers are free to use a more permissive configuration than the requested
configuration. It is expected that drivers should only implement directly the
most generic mode that can be supported. For example if the hardware can
-support HWTSTAMP_FILTER_V2_EVENT, then it should generally always upscale
-HWTSTAMP_FILTER_V2_L2_SYNC_MESSAGE, and so forth, as HWTSTAMP_FILTER_V2_EVENT
+support HWTSTAMP_FILTER_PTP_V2_EVENT, then it should generally always upscale
+HWTSTAMP_FILTER_PTP_V2_L2_SYNC, and so forth, as HWTSTAMP_FILTER_PTP_V2_EVENT
is more generic (and more useful to applications).
A driver which supports hardware time stamping shall update the struct
@@ -582,8 +582,8 @@ Time stamps for outgoing packets are to be generated as follows:
and hardware timestamping is not possible (SKBTX_IN_PROGRESS not set).
- As soon as the driver has sent the packet and/or obtained a
hardware time stamp for it, it passes the time stamp back by
- calling skb_hwtstamp_tx() with the original skb, the raw
- hardware time stamp. skb_hwtstamp_tx() clones the original skb and
+ calling skb_tstamp_tx() with the original skb, the raw
+ hardware time stamp. skb_tstamp_tx() clones the original skb and
adds the timestamps, therefore the original skb has to be freed now.
If obtaining the hardware time stamp somehow fails, then the driver
should not fall back to software time stamping. The rationale is that