summaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2009-06-15 18:40:05 +0200
committerLinus Torvalds <torvalds@linux-foundation.org>2009-06-15 18:40:05 +0200
commit2ed0e21b30b53d3a94e204196e523e6c8f732b56 (patch)
treede2635426477d86338a9469ce09ba0626052288f /Documentation/networking
parentMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/po... (diff)
parentMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds... (diff)
downloadlinux-2ed0e21b30b53d3a94e204196e523e6c8f732b56.tar.xz
linux-2ed0e21b30b53d3a94e204196e523e6c8f732b56.zip
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1244 commits) pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US ipv4: Fix fib_trie rebalancing Bluetooth: Fix issue with uninitialized nsh.type in DTL-1 driver Bluetooth: Fix Kconfig issue with RFKILL integration PIM-SM: namespace changes ipv4: update ARPD help text net: use a deferred timer in rt_check_expire ieee802154: fix kconfig bool/tristate muckup bonding: initialization rework bonding: use is_zero_ether_addr bonding: network device names are case sensative bonding: elminate bad refcount code bonding: fix style issues bonding: fix destructor bonding: remove bonding read/write semaphore bonding: initialize before registration bonding: bond_create always called with default parameters x_tables: Convert printk to pr_err netfilter: conntrack: optional reliable conntrack event delivery list_nulls: add hlist_nulls_add_head and hlist_nulls_del ...
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/can.txt235
-rw-r--r--Documentation/networking/ieee802154.txt76
-rw-r--r--Documentation/networking/ip-sysctl.txt18
-rw-r--r--Documentation/networking/ipv6.txt37
-rw-r--r--Documentation/networking/mac80211-injection.txt28
-rw-r--r--Documentation/networking/operstates.txt3
-rw-r--r--Documentation/networking/packet_mmap.txt140
7 files changed, 454 insertions, 83 deletions
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index 463d9e029ef3..cd79735013f9 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -36,10 +36,15 @@ This file contains
6.2 local loopback of sent frames
6.3 CAN controller hardware filters
6.4 The virtual CAN driver (vcan)
- 6.5 currently supported CAN hardware
- 6.6 todo
+ 6.5 The CAN network device driver interface
+ 6.5.1 Netlink interface to set/get devices properties
+ 6.5.2 Setting the CAN bit-timing
+ 6.5.3 Starting and stopping the CAN network device
+ 6.6 supported CAN hardware
- 7 Credits
+ 7 Socket CAN resources
+
+ 8 Credits
============================================================================
@@ -234,6 +239,8 @@ solution for a couple of reasons:
the user application using the common CAN filter mechanisms. Inside
this filter definition the (interested) type of errors may be
selected. The reception of error frames is disabled by default.
+ The format of the CAN error frame is briefly decribed in the Linux
+ header file "include/linux/can/error.h".
4. How to use Socket CAN
------------------------
@@ -605,61 +612,213 @@ solution for a couple of reasons:
removal of vcan network devices can be managed with the ip(8) tool:
- Create a virtual CAN network interface:
- ip link add type vcan
+ $ ip link add type vcan
- Create a virtual CAN network interface with a specific name 'vcan42':
- ip link add dev vcan42 type vcan
+ $ ip link add dev vcan42 type vcan
- Remove a (virtual CAN) network interface 'vcan42':
- ip link del vcan42
-
- The tool 'vcan' from the SocketCAN SVN repository on BerliOS is obsolete.
-
- Virtual CAN network device creation in older Kernels:
- In Linux Kernel versions < 2.6.24 the vcan driver creates 4 vcan
- netdevices at module load time by default. This value can be changed
- with the module parameter 'numdev'. E.g. 'modprobe vcan numdev=8'
-
- 6.5 currently supported CAN hardware
+ $ ip link del vcan42
+
+ 6.5 The CAN network device driver interface
+
+ The CAN network device driver interface provides a generic interface
+ to setup, configure and monitor CAN network devices. The user can then
+ configure the CAN device, like setting the bit-timing parameters, via
+ the netlink interface using the program "ip" from the "IPROUTE2"
+ utility suite. The following chapter describes briefly how to use it.
+ Furthermore, the interface uses a common data structure and exports a
+ set of common functions, which all real CAN network device drivers
+ should use. Please have a look to the SJA1000 or MSCAN driver to
+ understand how to use them. The name of the module is can-dev.ko.
+
+ 6.5.1 Netlink interface to set/get devices properties
+
+ The CAN device must be configured via netlink interface. The supported
+ netlink message types are defined and briefly described in
+ "include/linux/can/netlink.h". CAN link support for the program "ip"
+ of the IPROUTE2 utility suite is avaiable and it can be used as shown
+ below:
+
+ - Setting CAN device properties:
+
+ $ ip link set can0 type can help
+ Usage: ip link set DEVICE type can
+ [ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
+ [ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
+ phase-seg2 PHASE-SEG2 [ sjw SJW ] ]
+
+ [ loopback { on | off } ]
+ [ listen-only { on | off } ]
+ [ triple-sampling { on | off } ]
+
+ [ restart-ms TIME-MS ]
+ [ restart ]
+
+ Where: BITRATE := { 1..1000000 }
+ SAMPLE-POINT := { 0.000..0.999 }
+ TQ := { NUMBER }
+ PROP-SEG := { 1..8 }
+ PHASE-SEG1 := { 1..8 }
+ PHASE-SEG2 := { 1..8 }
+ SJW := { 1..4 }
+ RESTART-MS := { 0 | NUMBER }
+
+ - Display CAN device details and statistics:
+
+ $ ip -details -statistics link show can0
+ 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP qlen 10
+ link/can
+ can <TRIPLE-SAMPLING> state ERROR-ACTIVE restart-ms 100
+ bitrate 125000 sample_point 0.875
+ tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
+ sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
+ clock 8000000
+ re-started bus-errors arbit-lost error-warn error-pass bus-off
+ 41 17457 0 41 42 41
+ RX: bytes packets errors dropped overrun mcast
+ 140859 17608 17457 0 0 0
+ TX: bytes packets errors dropped carrier collsns
+ 861 112 0 41 0 0
+
+ More info to the above output:
+
+ "<TRIPLE-SAMPLING>"
+ Shows the list of selected CAN controller modes: LOOPBACK,
+ LISTEN-ONLY, or TRIPLE-SAMPLING.
+
+ "state ERROR-ACTIVE"
+ The current state of the CAN controller: "ERROR-ACTIVE",
+ "ERROR-WARNING", "ERROR-PASSIVE", "BUS-OFF" or "STOPPED"
+
+ "restart-ms 100"
+ Automatic restart delay time. If set to a non-zero value, a
+ restart of the CAN controller will be triggered automatically
+ in case of a bus-off condition after the specified delay time
+ in milliseconds. By default it's off.
+
+ "bitrate 125000 sample_point 0.875"
+ Shows the real bit-rate in bits/sec and the sample-point in the
+ range 0.000..0.999. If the calculation of bit-timing parameters
+ is enabled in the kernel (CONFIG_CAN_CALC_BITTIMING=y), the
+ bit-timing can be defined by setting the "bitrate" argument.
+ Optionally the "sample-point" can be specified. By default it's
+ 0.000 assuming CIA-recommended sample-points.
+
+ "tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1"
+ Shows the time quanta in ns, propagation segment, phase buffer
+ segment 1 and 2 and the synchronisation jump width in units of
+ tq. They allow to define the CAN bit-timing in a hardware
+ independent format as proposed by the Bosch CAN 2.0 spec (see
+ chapter 8 of http://www.semiconductors.bosch.de/pdf/can2spec.pdf).
+
+ "sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
+ clock 8000000"
+ Shows the bit-timing constants of the CAN controller, here the
+ "sja1000". The minimum and maximum values of the time segment 1
+ and 2, the synchronisation jump width in units of tq, the
+ bitrate pre-scaler and the CAN system clock frequency in Hz.
+ These constants could be used for user-defined (non-standard)
+ bit-timing calculation algorithms in user-space.
+
+ "re-started bus-errors arbit-lost error-warn error-pass bus-off"
+ Shows the number of restarts, bus and arbitration lost errors,
+ and the state changes to the error-warning, error-passive and
+ bus-off state. RX overrun errors are listed in the "overrun"
+ field of the standard network statistics.
+
+ 6.5.2 Setting the CAN bit-timing
+
+ The CAN bit-timing parameters can always be defined in a hardware
+ independent format as proposed in the Bosch CAN 2.0 specification
+ specifying the arguments "tq", "prop_seg", "phase_seg1", "phase_seg2"
+ and "sjw":
+
+ $ ip link set canX type can tq 125 prop-seg 6 \
+ phase-seg1 7 phase-seg2 2 sjw 1
+
+ If the kernel option CONFIG_CAN_CALC_BITTIMING is enabled, CIA
+ recommended CAN bit-timing parameters will be calculated if the bit-
+ rate is specified with the argument "bitrate":
+
+ $ ip link set canX type can bitrate 125000
+
+ Note that this works fine for the most common CAN controllers with
+ standard bit-rates but may *fail* for exotic bit-rates or CAN system
+ clock frequencies. Disabling CONFIG_CAN_CALC_BITTIMING saves some
+ space and allows user-space tools to solely determine and set the
+ bit-timing parameters. The CAN controller specific bit-timing
+ constants can be used for that purpose. They are listed by the
+ following command:
+
+ $ ip -details link show can0
+ ...
+ sja1000: clock 8000000 tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
+
+ 6.5.3 Starting and stopping the CAN network device
+
+ A CAN network device is started or stopped as usual with the command
+ "ifconfig canX up/down" or "ip link set canX up/down". Be aware that
+ you *must* define proper bit-timing parameters for real CAN devices
+ before you can start it to avoid error-prone default settings:
+
+ $ ip link set canX up type can bitrate 125000
+
+ A device may enter the "bus-off" state if too much errors occurred on
+ the CAN bus. Then no more messages are received or sent. An automatic
+ bus-off recovery can be enabled by setting the "restart-ms" to a
+ non-zero value, e.g.:
+
+ $ ip link set canX type can restart-ms 100
+
+ Alternatively, the application may realize the "bus-off" condition
+ by monitoring CAN error frames and do a restart when appropriate with
+ the command:
+
+ $ ip link set canX type can restart
+
+ Note that a restart will also create a CAN error frame (see also
+ chapter 3.4).
- On the project website http://developer.berlios.de/projects/socketcan
- there are different drivers available:
+ 6.6 Supported CAN hardware
- vcan: Virtual CAN interface driver (if no real hardware is available)
- sja1000: Philips SJA1000 CAN controller (recommended)
- i82527: Intel i82527 CAN controller
- mscan: Motorola/Freescale CAN controller (e.g. inside SOC MPC5200)
- ccan: CCAN controller core (e.g. inside SOC h7202)
- slcan: For a bunch of CAN adaptors that are attached via a
- serial line ASCII protocol (for serial / USB adaptors)
+ Please check the "Kconfig" file in "drivers/net/can" to get an actual
+ list of the support CAN hardware. On the Socket CAN project website
+ (see chapter 7) there might be further drivers available, also for
+ older kernel versions.
- Additionally the different CAN adaptors (ISA/PCI/PCMCIA/USB/Parport)
- from PEAK Systemtechnik support the CAN netdevice driver model
- since Linux driver v6.0: http://www.peak-system.com/linux/index.htm
+7. Socket CAN resources
+-----------------------
- Please check the Mailing Lists on the berlios OSS project website.
+ You can find further resources for Socket CAN like user space tools,
+ support for old kernel versions, more drivers, mailing lists, etc.
+ at the BerliOS OSS project website for Socket CAN:
- 6.6 todo
+ http://developer.berlios.de/projects/socketcan
- The configuration interface for CAN network drivers is still an open
- issue that has not been finalized in the socketcan project. Also the
- idea of having a library module (candev.ko) that holds functions
- that are needed by all CAN netdevices is not ready to ship.
- Your contribution is welcome.
+ If you have questions, bug fixes, etc., don't hesitate to post them to
+ the Socketcan-Users mailing list. But please search the archives first.
-7. Credits
+8. Credits
----------
- Oliver Hartkopp (PF_CAN core, filters, drivers, bcm)
+ Oliver Hartkopp (PF_CAN core, filters, drivers, bcm, SJA1000 driver)
Urs Thuermann (PF_CAN core, kernel integration, socket interfaces, raw, vcan)
Jan Kizka (RT-SocketCAN core, Socket-API reconciliation)
- Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews)
+ Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews,
+ CAN device driver interface, MSCAN driver)
Robert Schwebel (design reviews, PTXdist integration)
Marc Kleine-Budde (design reviews, Kernel 2.6 cleanups, drivers)
Benedikt Spranger (reviews)
Thomas Gleixner (LKML reviews, coding style, posting hints)
- Andrey Volkov (kernel subtree structure, ioctls, mscan driver)
+ Andrey Volkov (kernel subtree structure, ioctls, MSCAN driver)
Matthias Brukner (first SJA1000 CAN netdevice implementation Q2/2003)
Klaus Hitschler (PEAK driver integration)
Uwe Koppe (CAN netdevices with PF_PACKET approach)
Michael Schulze (driver layer loopback requirement, RT CAN drivers review)
+ Pavel Pisa (Bit-timing calculation)
+ Sascha Hauer (SJA1000 platform driver)
+ Sebastian Haas (SJA1000 EMS PCI driver)
+ Markus Plessing (SJA1000 EMS PCI driver)
+ Per Dalen (SJA1000 Kvaser PCI driver)
+ Sam Ravnborg (reviews, coding style, kbuild help)
diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt
new file mode 100644
index 000000000000..a0280ad2edc9
--- /dev/null
+++ b/Documentation/networking/ieee802154.txt
@@ -0,0 +1,76 @@
+
+ Linux IEEE 802.15.4 implementation
+
+
+Introduction
+============
+
+The Linux-ZigBee project goal is to provide complete implementation
+of IEEE 802.15.4 / ZigBee / 6LoWPAN protocols. IEEE 802.15.4 is a stack
+of protocols for organizing Low-Rate Wireless Personal Area Networks.
+
+Currently only IEEE 802.15.4 layer is implemented. We have choosen
+to use plain Berkeley socket API, the generic Linux networking stack
+to transfer IEEE 802.15.4 messages and a special protocol over genetlink
+for configuration/management
+
+
+Socket API
+==========
+
+int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0);
+.....
+
+The address family, socket addresses etc. are defined in the
+include/net/ieee802154/af_ieee802154.h header or in the special header
+in our userspace package (see either linux-zigbee sourceforge download page
+or git tree at git://linux-zigbee.git.sourceforge.net/gitroot/linux-zigbee).
+
+One can use SOCK_RAW for passing raw data towards device xmit function. YMMV.
+
+
+MLME - MAC Level Management
+============================
+
+Most of IEEE 802.15.4 MLME interfaces are directly mapped on netlink commands.
+See the include/net/ieee802154/nl802154.h header. Our userspace tools package
+(see above) provides CLI configuration utility for radio interfaces and simple
+coordinator for IEEE 802.15.4 networks as an example users of MLME protocol.
+
+
+Kernel side
+=============
+
+Like with WiFi, there are several types of devices implementing IEEE 802.15.4.
+1) 'HardMAC'. The MAC layer is implemented in the device itself, the device
+ exports MLME and data API.
+2) 'SoftMAC' or just radio. These types of devices are just radio transceivers
+ possibly with some kinds of acceleration like automatic CRC computation and
+ comparation, automagic ACK handling, address matching, etc.
+
+Those types of devices require different approach to be hooked into Linux kernel.
+
+
+HardMAC
+=======
+
+See the header include/net/ieee802154/netdevice.h. You have to implement Linux
+net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
+code via plain sk_buffs. The control block of sk_buffs will contain additional
+info as described in the struct ieee802154_mac_cb.
+
+To hook the MLME interface you have to populate the ml_priv field of your
+net_device with a pointer to struct ieee802154_mlme_ops instance. All fields are
+required.
+
+We provide an example of simple HardMAC driver at drivers/ieee802154/fakehard.c
+
+
+SoftMAC
+=======
+
+We are going to provide intermediate layer impelementing IEEE 802.15.4 MAC
+in software. This is currently WIP.
+
+See header include/net/ieee802154/mac802154.h and several drivers in
+drivers/ieee802154/
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index b121c5db707f..8be76235fe67 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -168,7 +168,16 @@ tcp_dsack - BOOLEAN
Allows TCP to send "duplicate" SACKs.
tcp_ecn - BOOLEAN
- Enable Explicit Congestion Notification in TCP.
+ Enable Explicit Congestion Notification (ECN) in TCP. ECN is only
+ used when both ends of the TCP flow support it. It is useful to
+ avoid losses due to congestion (when the bottleneck router supports
+ ECN).
+ Possible values are:
+ 0 disable ECN
+ 1 ECN enabled
+ 2 Only server-side ECN enabled. If the other end does
+ not support ECN, behavior is like with ECN disabled.
+ Default: 2
tcp_fack - BOOLEAN
Enable FACK congestion avoidance and fast retransmission.
@@ -1048,6 +1057,13 @@ disable_ipv6 - BOOLEAN
address.
Default: FALSE (enable IPv6 operation)
+ When this value is changed from 1 to 0 (IPv6 is being enabled),
+ it will dynamically create a link-local address on the given
+ interface and start Duplicate Address Detection, if necessary.
+
+ When this value is changed from 0 to 1 (IPv6 is being disabled),
+ it will dynamically delete all address on the given interface.
+
accept_dad - INTEGER
Whether to accept DAD (Duplicate Address Detection).
0: Disable DAD
diff --git a/Documentation/networking/ipv6.txt b/Documentation/networking/ipv6.txt
index 268e5c103dd8..9fd7e21296c8 100644
--- a/Documentation/networking/ipv6.txt
+++ b/Documentation/networking/ipv6.txt
@@ -33,3 +33,40 @@ disable
A reboot is required to enable IPv6.
+autoconf
+
+ Specifies whether to enable IPv6 address autoconfiguration
+ on all interfaces. This might be used when one does not wish
+ for addresses to be automatically generated from prefixes
+ received in Router Advertisements.
+
+ The possible values and their effects are:
+
+ 0
+ IPv6 address autoconfiguration is disabled on all interfaces.
+
+ Only the IPv6 loopback address (::1) and link-local addresses
+ will be added to interfaces.
+
+ 1
+ IPv6 address autoconfiguration is enabled on all interfaces.
+
+ This is the default value.
+
+disable_ipv6
+
+ Specifies whether to disable IPv6 on all interfaces.
+ This might be used when no IPv6 addresses are desired.
+
+ The possible values and their effects are:
+
+ 0
+ IPv6 is enabled on all interfaces.
+
+ This is the default value.
+
+ 1
+ IPv6 is disabled on all interfaces.
+
+ No IPv6 addresses will be added to interfaces.
+
diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt
index 84906ef3ed6e..b30e81ad5307 100644
--- a/Documentation/networking/mac80211-injection.txt
+++ b/Documentation/networking/mac80211-injection.txt
@@ -12,38 +12,22 @@ following format:
The radiotap format is discussed in
./Documentation/networking/radiotap-headers.txt.
-Despite 13 radiotap argument types are currently defined, most only make sense
+Despite many radiotap parameters being currently defined, most only make sense
to appear on received packets. The following information is parsed from the
radiotap headers and used to control injection:
- * IEEE80211_RADIOTAP_RATE
-
- rate in 500kbps units, automatic if invalid or not present
-
-
- * IEEE80211_RADIOTAP_ANTENNA
-
- antenna to use, automatic if not present
-
-
- * IEEE80211_RADIOTAP_DBM_TX_POWER
-
- transmit power in dBm, automatic if not present
-
-
* IEEE80211_RADIOTAP_FLAGS
IEEE80211_RADIOTAP_F_FCS: FCS will be removed and recalculated
IEEE80211_RADIOTAP_F_WEP: frame will be encrypted if key available
IEEE80211_RADIOTAP_F_FRAG: frame will be fragmented if longer than the
- current fragmentation threshold. Note that
- this flag is only reliable when software
- fragmentation is enabled)
+ current fragmentation threshold.
+
The injection code can also skip all other currently defined radiotap fields
facilitating replay of captured radiotap headers directly.
-Here is an example valid radiotap header defining these three parameters
+Here is an example valid radiotap header defining some parameters
0x00, 0x00, // <-- radiotap version
0x0b, 0x00, // <- radiotap header length
@@ -72,8 +56,8 @@ interface), along the following lines:
...
r = pcap_inject(ppcap, u8aSendBuffer, nLength);
-You can also find sources for a complete inject test applet here:
+You can also find a link to a complete inject application here:
-http://penumbra.warmcat.com/_twk/tiki-index.php?page=packetspammer
+http://wireless.kernel.org/en/users/Documentation/packetspammer
Andy Green <andy@warmcat.com>
diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt
index c9074f9b78bb..1a77a3cfae54 100644
--- a/Documentation/networking/operstates.txt
+++ b/Documentation/networking/operstates.txt
@@ -38,9 +38,6 @@ ifinfomsg::if_flags & IFF_LOWER_UP:
ifinfomsg::if_flags & IFF_DORMANT:
Driver has signaled netif_dormant_on()
-These interface flags can also be queried without netlink using the
-SIOCGIFFLAGS ioctl.
-
TLV IFLA_OPERSTATE
contains RFC2863 state of the interface in numeric representation:
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 07c53d596035..a22fd85e3796 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -4,16 +4,18 @@
This file documents the CONFIG_PACKET_MMAP option available with the PACKET
socket interface on 2.4 and 2.6 kernels. This type of sockets is used for
-capture network traffic with utilities like tcpdump or any other that uses
-the libpcap library.
-
-You can find the latest version of this document at
+capture network traffic with utilities like tcpdump or any other that needs
+raw access to network interface.
+You can find the latest version of this document at:
http://pusa.uv.es/~ulisses/packet_mmap/
-Please send me your comments to
+Howto can be found at:
+ http://wiki.gnu-log.net (packet_mmap)
+Please send your comments to
Ulisses Alonso CamarĂ³ <uaca@i.hate.spam.alumni.uv.es>
+ Johann Baudy <johann.baudy@gnu-log.net>
-------------------------------------------------------------------------------
+ Why use PACKET_MMAP
@@ -25,19 +27,24 @@ to capture each packet, it requires two if you want to get packet's
timestamp (like libpcap always does).
In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size
-configurable circular buffer mapped in user space. This way reading packets just
-needs to wait for them, most of the time there is no need to issue a single
-system call. By using a shared buffer between the kernel and the user
-also has the benefit of minimizing packet copies.
-
-It's fine to use PACKET_MMAP to improve the performance of the capture process,
-but it isn't everything. At least, if you are capturing at high speeds (this
-is relative to the cpu speed), you should check if the device driver of your
-network interface card supports some sort of interrupt load mitigation or
-(even better) if it supports NAPI, also make sure it is enabled.
+configurable circular buffer mapped in user space that can be used to either
+send or receive packets. This way reading packets just needs to wait for them,
+most of the time there is no need to issue a single system call. Concerning
+transmission, multiple packets can be sent through one system call to get the
+highest bandwidth.
+By using a shared buffer between the kernel and the user also has the benefit
+of minimizing packet copies.
+
+It's fine to use PACKET_MMAP to improve the performance of the capture and
+transmission process, but it isn't everything. At least, if you are capturing
+at high speeds (this is relative to the cpu speed), you should check if the
+device driver of your network interface card supports some sort of interrupt
+load mitigation or (even better) if it supports NAPI, also make sure it is
+enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
+supported by devices of your network.
--------------------------------------------------------------------------------
-+ How to use CONFIG_PACKET_MMAP
++ How to use CONFIG_PACKET_MMAP to improve capture process
--------------------------------------------------------------------------------
From the user standpoint, you should use the higher level libpcap library, which
@@ -57,7 +64,7 @@ the low level details or want to improve libpcap by including PACKET_MMAP
support.
--------------------------------------------------------------------------------
-+ How to use CONFIG_PACKET_MMAP directly
++ How to use CONFIG_PACKET_MMAP directly to improve capture process
--------------------------------------------------------------------------------
From the system calls stand point, the use of PACKET_MMAP involves
@@ -66,6 +73,7 @@ the following process:
[setup] socket() -------> creation of the capture socket
setsockopt() ---> allocation of the circular buffer (ring)
+ option: PACKET_RX_RING
mmap() ---------> mapping of the allocated buffer to the
user process
@@ -97,13 +105,75 @@ also the mapping of the circular buffer in the user process and
the use of this buffer.
--------------------------------------------------------------------------------
++ How to use CONFIG_PACKET_MMAP directly to improve transmission process
+--------------------------------------------------------------------------------
+Transmission process is similar to capture as shown below.
+
+[setup] socket() -------> creation of the transmission socket
+ setsockopt() ---> allocation of the circular buffer (ring)
+ option: PACKET_TX_RING
+ bind() ---------> bind transmission socket with a network interface
+ mmap() ---------> mapping of the allocated buffer to the
+ user process
+
+[transmission] poll() ---------> wait for free packets (optional)
+ send() ---------> send all packets that are set as ready in
+ the ring
+ The flag MSG_DONTWAIT can be used to return
+ before end of transfer.
+
+[shutdown] close() --------> destruction of the transmission socket and
+ deallocation of all associated resources.
+
+Binding the socket to your network interface is mandatory (with zero copy) to
+know the header size of frames used in the circular buffer.
+
+As capture, each frame contains two parts:
+
+ --------------------
+| struct tpacket_hdr | Header. It contains the status of
+| | of this frame
+|--------------------|
+| data buffer |
+. . Data that will be sent over the network interface.
+. .
+ --------------------
+
+ bind() associates the socket to your network interface thanks to
+ sll_ifindex parameter of struct sockaddr_ll.
+
+ Initialization example:
+
+ struct sockaddr_ll my_addr;
+ struct ifreq s_ifr;
+ ...
+
+ strncpy (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name));
+
+ /* get interface index of eth0 */
+ ioctl(this->socket, SIOCGIFINDEX, &s_ifr);
+
+ /* fill sockaddr_ll struct to prepare binding */
+ my_addr.sll_family = AF_PACKET;
+ my_addr.sll_protocol = ETH_P_ALL;
+ my_addr.sll_ifindex = s_ifr.ifr_ifindex;
+
+ /* bind socket to eth0 */
+ bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
+
+ A complete tutorial is available at: http://wiki.gnu-log.net/
+
+--------------------------------------------------------------------------------
+ PACKET_MMAP settings
--------------------------------------------------------------------------------
To setup PACKET_MMAP from user level code is done with a call like
+ - Capture process
setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))
+ - Transmission process
+ setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))
The most significant argument in the previous call is the req parameter,
this parameter must to have the following structure:
@@ -117,11 +187,11 @@ this parameter must to have the following structure:
};
This structure is defined in /usr/include/linux/if_packet.h and establishes a
-circular buffer (ring) of unswappable memory mapped in the capture process.
+circular buffer (ring) of unswappable memory.
Being mapped in the capture process allows reading the captured frames and
related meta-information like timestamps without requiring a system call.
-Captured frames are grouped in blocks. Each block is a physically contiguous
+Frames are grouped in blocks. Each block is a physically contiguous
region of memory and holds tp_block_size/tp_frame_size frames. The total number
of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because
@@ -336,6 +406,7 @@ struct tpacket_hdr). If this field is 0 means that the frame is ready
to be used for the kernel, If not, there is a frame the user can read
and the following flags apply:
++++ Capture process:
from include/linux/if_packet.h
#define TP_STATUS_COPY 2
@@ -391,6 +462,37 @@ packets are in the ring:
It doesn't incur in a race condition to first check the status value and
then poll for frames.
+
+++ Transmission process
+Those defines are also used for transmission:
+
+ #define TP_STATUS_AVAILABLE 0 // Frame is available
+ #define TP_STATUS_SEND_REQUEST 1 // Frame will be sent on next send()
+ #define TP_STATUS_SENDING 2 // Frame is currently in transmission
+ #define TP_STATUS_WRONG_FORMAT 4 // Frame format is not correct
+
+First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
+packet, the user fills a data buffer of an available frame, sets tp_len to
+current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
+This can be done on multiple frames. Once the user is ready to transmit, it
+calls send(). Then all buffers with status equal to TP_STATUS_SEND_REQUEST are
+forwarded to the network device. The kernel updates each status of sent
+frames with TP_STATUS_SENDING until the end of transfer.
+At the end of each transfer, buffer status returns to TP_STATUS_AVAILABLE.
+
+ header->tp_len = in_i_size;
+ header->tp_status = TP_STATUS_SEND_REQUEST;
+ retval = send(this->socket, NULL, 0, 0);
+
+The user can also use poll() to check if a buffer is available:
+(status == TP_STATUS_SENDING)
+
+ struct pollfd pfd;
+ pfd.fd = fd;
+ pfd.revents = 0;
+ pfd.events = POLLOUT;
+ retval = poll(&pfd, 1, timeout);
+
--------------------------------------------------------------------------------
+ THANKS
--------------------------------------------------------------------------------