diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2009-06-15 18:40:05 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2009-06-15 18:40:05 +0200 |
commit | 2ed0e21b30b53d3a94e204196e523e6c8f732b56 (patch) | |
tree | de2635426477d86338a9469ce09ba0626052288f /Documentation/networking | |
parent | Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/po... (diff) | |
parent | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds... (diff) | |
download | linux-2ed0e21b30b53d3a94e204196e523e6c8f732b56.tar.xz linux-2ed0e21b30b53d3a94e204196e523e6c8f732b56.zip |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1244 commits)
pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US
ipv4: Fix fib_trie rebalancing
Bluetooth: Fix issue with uninitialized nsh.type in DTL-1 driver
Bluetooth: Fix Kconfig issue with RFKILL integration
PIM-SM: namespace changes
ipv4: update ARPD help text
net: use a deferred timer in rt_check_expire
ieee802154: fix kconfig bool/tristate muckup
bonding: initialization rework
bonding: use is_zero_ether_addr
bonding: network device names are case sensative
bonding: elminate bad refcount code
bonding: fix style issues
bonding: fix destructor
bonding: remove bonding read/write semaphore
bonding: initialize before registration
bonding: bond_create always called with default parameters
x_tables: Convert printk to pr_err
netfilter: conntrack: optional reliable conntrack event delivery
list_nulls: add hlist_nulls_add_head and hlist_nulls_del
...
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/can.txt | 235 | ||||
-rw-r--r-- | Documentation/networking/ieee802154.txt | 76 | ||||
-rw-r--r-- | Documentation/networking/ip-sysctl.txt | 18 | ||||
-rw-r--r-- | Documentation/networking/ipv6.txt | 37 | ||||
-rw-r--r-- | Documentation/networking/mac80211-injection.txt | 28 | ||||
-rw-r--r-- | Documentation/networking/operstates.txt | 3 | ||||
-rw-r--r-- | Documentation/networking/packet_mmap.txt | 140 |
7 files changed, 454 insertions, 83 deletions
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt index 463d9e029ef3..cd79735013f9 100644 --- a/Documentation/networking/can.txt +++ b/Documentation/networking/can.txt @@ -36,10 +36,15 @@ This file contains 6.2 local loopback of sent frames 6.3 CAN controller hardware filters 6.4 The virtual CAN driver (vcan) - 6.5 currently supported CAN hardware - 6.6 todo + 6.5 The CAN network device driver interface + 6.5.1 Netlink interface to set/get devices properties + 6.5.2 Setting the CAN bit-timing + 6.5.3 Starting and stopping the CAN network device + 6.6 supported CAN hardware - 7 Credits + 7 Socket CAN resources + + 8 Credits ============================================================================ @@ -234,6 +239,8 @@ solution for a couple of reasons: the user application using the common CAN filter mechanisms. Inside this filter definition the (interested) type of errors may be selected. The reception of error frames is disabled by default. + The format of the CAN error frame is briefly decribed in the Linux + header file "include/linux/can/error.h". 4. How to use Socket CAN ------------------------ @@ -605,61 +612,213 @@ solution for a couple of reasons: removal of vcan network devices can be managed with the ip(8) tool: - Create a virtual CAN network interface: - ip link add type vcan + $ ip link add type vcan - Create a virtual CAN network interface with a specific name 'vcan42': - ip link add dev vcan42 type vcan + $ ip link add dev vcan42 type vcan - Remove a (virtual CAN) network interface 'vcan42': - ip link del vcan42 - - The tool 'vcan' from the SocketCAN SVN repository on BerliOS is obsolete. - - Virtual CAN network device creation in older Kernels: - In Linux Kernel versions < 2.6.24 the vcan driver creates 4 vcan - netdevices at module load time by default. This value can be changed - with the module parameter 'numdev'. E.g. 'modprobe vcan numdev=8' - - 6.5 currently supported CAN hardware + $ ip link del vcan42 + + 6.5 The CAN network device driver interface + + The CAN network device driver interface provides a generic interface + to setup, configure and monitor CAN network devices. The user can then + configure the CAN device, like setting the bit-timing parameters, via + the netlink interface using the program "ip" from the "IPROUTE2" + utility suite. The following chapter describes briefly how to use it. + Furthermore, the interface uses a common data structure and exports a + set of common functions, which all real CAN network device drivers + should use. Please have a look to the SJA1000 or MSCAN driver to + understand how to use them. The name of the module is can-dev.ko. + + 6.5.1 Netlink interface to set/get devices properties + + The CAN device must be configured via netlink interface. The supported + netlink message types are defined and briefly described in + "include/linux/can/netlink.h". CAN link support for the program "ip" + of the IPROUTE2 utility suite is avaiable and it can be used as shown + below: + + - Setting CAN device properties: + + $ ip link set can0 type can help + Usage: ip link set DEVICE type can + [ bitrate BITRATE [ sample-point SAMPLE-POINT] ] | + [ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1 + phase-seg2 PHASE-SEG2 [ sjw SJW ] ] + + [ loopback { on | off } ] + [ listen-only { on | off } ] + [ triple-sampling { on | off } ] + + [ restart-ms TIME-MS ] + [ restart ] + + Where: BITRATE := { 1..1000000 } + SAMPLE-POINT := { 0.000..0.999 } + TQ := { NUMBER } + PROP-SEG := { 1..8 } + PHASE-SEG1 := { 1..8 } + PHASE-SEG2 := { 1..8 } + SJW := { 1..4 } + RESTART-MS := { 0 | NUMBER } + + - Display CAN device details and statistics: + + $ ip -details -statistics link show can0 + 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP qlen 10 + link/can + can <TRIPLE-SAMPLING> state ERROR-ACTIVE restart-ms 100 + bitrate 125000 sample_point 0.875 + tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1 + sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1 + clock 8000000 + re-started bus-errors arbit-lost error-warn error-pass bus-off + 41 17457 0 41 42 41 + RX: bytes packets errors dropped overrun mcast + 140859 17608 17457 0 0 0 + TX: bytes packets errors dropped carrier collsns + 861 112 0 41 0 0 + + More info to the above output: + + "<TRIPLE-SAMPLING>" + Shows the list of selected CAN controller modes: LOOPBACK, + LISTEN-ONLY, or TRIPLE-SAMPLING. + + "state ERROR-ACTIVE" + The current state of the CAN controller: "ERROR-ACTIVE", + "ERROR-WARNING", "ERROR-PASSIVE", "BUS-OFF" or "STOPPED" + + "restart-ms 100" + Automatic restart delay time. If set to a non-zero value, a + restart of the CAN controller will be triggered automatically + in case of a bus-off condition after the specified delay time + in milliseconds. By default it's off. + + "bitrate 125000 sample_point 0.875" + Shows the real bit-rate in bits/sec and the sample-point in the + range 0.000..0.999. If the calculation of bit-timing parameters + is enabled in the kernel (CONFIG_CAN_CALC_BITTIMING=y), the + bit-timing can be defined by setting the "bitrate" argument. + Optionally the "sample-point" can be specified. By default it's + 0.000 assuming CIA-recommended sample-points. + + "tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1" + Shows the time quanta in ns, propagation segment, phase buffer + segment 1 and 2 and the synchronisation jump width in units of + tq. They allow to define the CAN bit-timing in a hardware + independent format as proposed by the Bosch CAN 2.0 spec (see + chapter 8 of http://www.semiconductors.bosch.de/pdf/can2spec.pdf). + + "sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1 + clock 8000000" + Shows the bit-timing constants of the CAN controller, here the + "sja1000". The minimum and maximum values of the time segment 1 + and 2, the synchronisation jump width in units of tq, the + bitrate pre-scaler and the CAN system clock frequency in Hz. + These constants could be used for user-defined (non-standard) + bit-timing calculation algorithms in user-space. + + "re-started bus-errors arbit-lost error-warn error-pass bus-off" + Shows the number of restarts, bus and arbitration lost errors, + and the state changes to the error-warning, error-passive and + bus-off state. RX overrun errors are listed in the "overrun" + field of the standard network statistics. + + 6.5.2 Setting the CAN bit-timing + + The CAN bit-timing parameters can always be defined in a hardware + independent format as proposed in the Bosch CAN 2.0 specification + specifying the arguments "tq", "prop_seg", "phase_seg1", "phase_seg2" + and "sjw": + + $ ip link set canX type can tq 125 prop-seg 6 \ + phase-seg1 7 phase-seg2 2 sjw 1 + + If the kernel option CONFIG_CAN_CALC_BITTIMING is enabled, CIA + recommended CAN bit-timing parameters will be calculated if the bit- + rate is specified with the argument "bitrate": + + $ ip link set canX type can bitrate 125000 + + Note that this works fine for the most common CAN controllers with + standard bit-rates but may *fail* for exotic bit-rates or CAN system + clock frequencies. Disabling CONFIG_CAN_CALC_BITTIMING saves some + space and allows user-space tools to solely determine and set the + bit-timing parameters. The CAN controller specific bit-timing + constants can be used for that purpose. They are listed by the + following command: + + $ ip -details link show can0 + ... + sja1000: clock 8000000 tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1 + + 6.5.3 Starting and stopping the CAN network device + + A CAN network device is started or stopped as usual with the command + "ifconfig canX up/down" or "ip link set canX up/down". Be aware that + you *must* define proper bit-timing parameters for real CAN devices + before you can start it to avoid error-prone default settings: + + $ ip link set canX up type can bitrate 125000 + + A device may enter the "bus-off" state if too much errors occurred on + the CAN bus. Then no more messages are received or sent. An automatic + bus-off recovery can be enabled by setting the "restart-ms" to a + non-zero value, e.g.: + + $ ip link set canX type can restart-ms 100 + + Alternatively, the application may realize the "bus-off" condition + by monitoring CAN error frames and do a restart when appropriate with + the command: + + $ ip link set canX type can restart + + Note that a restart will also create a CAN error frame (see also + chapter 3.4). - On the project website http://developer.berlios.de/projects/socketcan - there are different drivers available: + 6.6 Supported CAN hardware - vcan: Virtual CAN interface driver (if no real hardware is available) - sja1000: Philips SJA1000 CAN controller (recommended) - i82527: Intel i82527 CAN controller - mscan: Motorola/Freescale CAN controller (e.g. inside SOC MPC5200) - ccan: CCAN controller core (e.g. inside SOC h7202) - slcan: For a bunch of CAN adaptors that are attached via a - serial line ASCII protocol (for serial / USB adaptors) + Please check the "Kconfig" file in "drivers/net/can" to get an actual + list of the support CAN hardware. On the Socket CAN project website + (see chapter 7) there might be further drivers available, also for + older kernel versions. - Additionally the different CAN adaptors (ISA/PCI/PCMCIA/USB/Parport) - from PEAK Systemtechnik support the CAN netdevice driver model - since Linux driver v6.0: http://www.peak-system.com/linux/index.htm +7. Socket CAN resources +----------------------- - Please check the Mailing Lists on the berlios OSS project website. + You can find further resources for Socket CAN like user space tools, + support for old kernel versions, more drivers, mailing lists, etc. + at the BerliOS OSS project website for Socket CAN: - 6.6 todo + http://developer.berlios.de/projects/socketcan - The configuration interface for CAN network drivers is still an open - issue that has not been finalized in the socketcan project. Also the - idea of having a library module (candev.ko) that holds functions - that are needed by all CAN netdevices is not ready to ship. - Your contribution is welcome. + If you have questions, bug fixes, etc., don't hesitate to post them to + the Socketcan-Users mailing list. But please search the archives first. -7. Credits +8. Credits ---------- - Oliver Hartkopp (PF_CAN core, filters, drivers, bcm) + Oliver Hartkopp (PF_CAN core, filters, drivers, bcm, SJA1000 driver) Urs Thuermann (PF_CAN core, kernel integration, socket interfaces, raw, vcan) Jan Kizka (RT-SocketCAN core, Socket-API reconciliation) - Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews) + Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews, + CAN device driver interface, MSCAN driver) Robert Schwebel (design reviews, PTXdist integration) Marc Kleine-Budde (design reviews, Kernel 2.6 cleanups, drivers) Benedikt Spranger (reviews) Thomas Gleixner (LKML reviews, coding style, posting hints) - Andrey Volkov (kernel subtree structure, ioctls, mscan driver) + Andrey Volkov (kernel subtree structure, ioctls, MSCAN driver) Matthias Brukner (first SJA1000 CAN netdevice implementation Q2/2003) Klaus Hitschler (PEAK driver integration) Uwe Koppe (CAN netdevices with PF_PACKET approach) Michael Schulze (driver layer loopback requirement, RT CAN drivers review) + Pavel Pisa (Bit-timing calculation) + Sascha Hauer (SJA1000 platform driver) + Sebastian Haas (SJA1000 EMS PCI driver) + Markus Plessing (SJA1000 EMS PCI driver) + Per Dalen (SJA1000 Kvaser PCI driver) + Sam Ravnborg (reviews, coding style, kbuild help) diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt new file mode 100644 index 000000000000..a0280ad2edc9 --- /dev/null +++ b/Documentation/networking/ieee802154.txt @@ -0,0 +1,76 @@ + + Linux IEEE 802.15.4 implementation + + +Introduction +============ + +The Linux-ZigBee project goal is to provide complete implementation +of IEEE 802.15.4 / ZigBee / 6LoWPAN protocols. IEEE 802.15.4 is a stack +of protocols for organizing Low-Rate Wireless Personal Area Networks. + +Currently only IEEE 802.15.4 layer is implemented. We have choosen +to use plain Berkeley socket API, the generic Linux networking stack +to transfer IEEE 802.15.4 messages and a special protocol over genetlink +for configuration/management + + +Socket API +========== + +int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0); +..... + +The address family, socket addresses etc. are defined in the +include/net/ieee802154/af_ieee802154.h header or in the special header +in our userspace package (see either linux-zigbee sourceforge download page +or git tree at git://linux-zigbee.git.sourceforge.net/gitroot/linux-zigbee). + +One can use SOCK_RAW for passing raw data towards device xmit function. YMMV. + + +MLME - MAC Level Management +============================ + +Most of IEEE 802.15.4 MLME interfaces are directly mapped on netlink commands. +See the include/net/ieee802154/nl802154.h header. Our userspace tools package +(see above) provides CLI configuration utility for radio interfaces and simple +coordinator for IEEE 802.15.4 networks as an example users of MLME protocol. + + +Kernel side +============= + +Like with WiFi, there are several types of devices implementing IEEE 802.15.4. +1) 'HardMAC'. The MAC layer is implemented in the device itself, the device + exports MLME and data API. +2) 'SoftMAC' or just radio. These types of devices are just radio transceivers + possibly with some kinds of acceleration like automatic CRC computation and + comparation, automagic ACK handling, address matching, etc. + +Those types of devices require different approach to be hooked into Linux kernel. + + +HardMAC +======= + +See the header include/net/ieee802154/netdevice.h. You have to implement Linux +net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family +code via plain sk_buffs. The control block of sk_buffs will contain additional +info as described in the struct ieee802154_mac_cb. + +To hook the MLME interface you have to populate the ml_priv field of your +net_device with a pointer to struct ieee802154_mlme_ops instance. All fields are +required. + +We provide an example of simple HardMAC driver at drivers/ieee802154/fakehard.c + + +SoftMAC +======= + +We are going to provide intermediate layer impelementing IEEE 802.15.4 MAC +in software. This is currently WIP. + +See header include/net/ieee802154/mac802154.h and several drivers in +drivers/ieee802154/ diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index b121c5db707f..8be76235fe67 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -168,7 +168,16 @@ tcp_dsack - BOOLEAN Allows TCP to send "duplicate" SACKs. tcp_ecn - BOOLEAN - Enable Explicit Congestion Notification in TCP. + Enable Explicit Congestion Notification (ECN) in TCP. ECN is only + used when both ends of the TCP flow support it. It is useful to + avoid losses due to congestion (when the bottleneck router supports + ECN). + Possible values are: + 0 disable ECN + 1 ECN enabled + 2 Only server-side ECN enabled. If the other end does + not support ECN, behavior is like with ECN disabled. + Default: 2 tcp_fack - BOOLEAN Enable FACK congestion avoidance and fast retransmission. @@ -1048,6 +1057,13 @@ disable_ipv6 - BOOLEAN address. Default: FALSE (enable IPv6 operation) + When this value is changed from 1 to 0 (IPv6 is being enabled), + it will dynamically create a link-local address on the given + interface and start Duplicate Address Detection, if necessary. + + When this value is changed from 0 to 1 (IPv6 is being disabled), + it will dynamically delete all address on the given interface. + accept_dad - INTEGER Whether to accept DAD (Duplicate Address Detection). 0: Disable DAD diff --git a/Documentation/networking/ipv6.txt b/Documentation/networking/ipv6.txt index 268e5c103dd8..9fd7e21296c8 100644 --- a/Documentation/networking/ipv6.txt +++ b/Documentation/networking/ipv6.txt @@ -33,3 +33,40 @@ disable A reboot is required to enable IPv6. +autoconf + + Specifies whether to enable IPv6 address autoconfiguration + on all interfaces. This might be used when one does not wish + for addresses to be automatically generated from prefixes + received in Router Advertisements. + + The possible values and their effects are: + + 0 + IPv6 address autoconfiguration is disabled on all interfaces. + + Only the IPv6 loopback address (::1) and link-local addresses + will be added to interfaces. + + 1 + IPv6 address autoconfiguration is enabled on all interfaces. + + This is the default value. + +disable_ipv6 + + Specifies whether to disable IPv6 on all interfaces. + This might be used when no IPv6 addresses are desired. + + The possible values and their effects are: + + 0 + IPv6 is enabled on all interfaces. + + This is the default value. + + 1 + IPv6 is disabled on all interfaces. + + No IPv6 addresses will be added to interfaces. + diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt index 84906ef3ed6e..b30e81ad5307 100644 --- a/Documentation/networking/mac80211-injection.txt +++ b/Documentation/networking/mac80211-injection.txt @@ -12,38 +12,22 @@ following format: The radiotap format is discussed in ./Documentation/networking/radiotap-headers.txt. -Despite 13 radiotap argument types are currently defined, most only make sense +Despite many radiotap parameters being currently defined, most only make sense to appear on received packets. The following information is parsed from the radiotap headers and used to control injection: - * IEEE80211_RADIOTAP_RATE - - rate in 500kbps units, automatic if invalid or not present - - - * IEEE80211_RADIOTAP_ANTENNA - - antenna to use, automatic if not present - - - * IEEE80211_RADIOTAP_DBM_TX_POWER - - transmit power in dBm, automatic if not present - - * IEEE80211_RADIOTAP_FLAGS IEEE80211_RADIOTAP_F_FCS: FCS will be removed and recalculated IEEE80211_RADIOTAP_F_WEP: frame will be encrypted if key available IEEE80211_RADIOTAP_F_FRAG: frame will be fragmented if longer than the - current fragmentation threshold. Note that - this flag is only reliable when software - fragmentation is enabled) + current fragmentation threshold. + The injection code can also skip all other currently defined radiotap fields facilitating replay of captured radiotap headers directly. -Here is an example valid radiotap header defining these three parameters +Here is an example valid radiotap header defining some parameters 0x00, 0x00, // <-- radiotap version 0x0b, 0x00, // <- radiotap header length @@ -72,8 +56,8 @@ interface), along the following lines: ... r = pcap_inject(ppcap, u8aSendBuffer, nLength); -You can also find sources for a complete inject test applet here: +You can also find a link to a complete inject application here: -http://penumbra.warmcat.com/_twk/tiki-index.php?page=packetspammer +http://wireless.kernel.org/en/users/Documentation/packetspammer Andy Green <andy@warmcat.com> diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt index c9074f9b78bb..1a77a3cfae54 100644 --- a/Documentation/networking/operstates.txt +++ b/Documentation/networking/operstates.txt @@ -38,9 +38,6 @@ ifinfomsg::if_flags & IFF_LOWER_UP: ifinfomsg::if_flags & IFF_DORMANT: Driver has signaled netif_dormant_on() -These interface flags can also be queried without netlink using the -SIOCGIFFLAGS ioctl. - TLV IFLA_OPERSTATE contains RFC2863 state of the interface in numeric representation: diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index 07c53d596035..a22fd85e3796 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -4,16 +4,18 @@ This file documents the CONFIG_PACKET_MMAP option available with the PACKET socket interface on 2.4 and 2.6 kernels. This type of sockets is used for -capture network traffic with utilities like tcpdump or any other that uses -the libpcap library. - -You can find the latest version of this document at +capture network traffic with utilities like tcpdump or any other that needs +raw access to network interface. +You can find the latest version of this document at: http://pusa.uv.es/~ulisses/packet_mmap/ -Please send me your comments to +Howto can be found at: + http://wiki.gnu-log.net (packet_mmap) +Please send your comments to Ulisses Alonso CamarĂ³ <uaca@i.hate.spam.alumni.uv.es> + Johann Baudy <johann.baudy@gnu-log.net> ------------------------------------------------------------------------------- + Why use PACKET_MMAP @@ -25,19 +27,24 @@ to capture each packet, it requires two if you want to get packet's timestamp (like libpcap always does). In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size -configurable circular buffer mapped in user space. This way reading packets just -needs to wait for them, most of the time there is no need to issue a single -system call. By using a shared buffer between the kernel and the user -also has the benefit of minimizing packet copies. - -It's fine to use PACKET_MMAP to improve the performance of the capture process, -but it isn't everything. At least, if you are capturing at high speeds (this -is relative to the cpu speed), you should check if the device driver of your -network interface card supports some sort of interrupt load mitigation or -(even better) if it supports NAPI, also make sure it is enabled. +configurable circular buffer mapped in user space that can be used to either +send or receive packets. This way reading packets just needs to wait for them, +most of the time there is no need to issue a single system call. Concerning +transmission, multiple packets can be sent through one system call to get the +highest bandwidth. +By using a shared buffer between the kernel and the user also has the benefit +of minimizing packet copies. + +It's fine to use PACKET_MMAP to improve the performance of the capture and +transmission process, but it isn't everything. At least, if you are capturing +at high speeds (this is relative to the cpu speed), you should check if the +device driver of your network interface card supports some sort of interrupt +load mitigation or (even better) if it supports NAPI, also make sure it is +enabled. For transmission, check the MTU (Maximum Transmission Unit) used and +supported by devices of your network. -------------------------------------------------------------------------------- -+ How to use CONFIG_PACKET_MMAP ++ How to use CONFIG_PACKET_MMAP to improve capture process -------------------------------------------------------------------------------- From the user standpoint, you should use the higher level libpcap library, which @@ -57,7 +64,7 @@ the low level details or want to improve libpcap by including PACKET_MMAP support. -------------------------------------------------------------------------------- -+ How to use CONFIG_PACKET_MMAP directly ++ How to use CONFIG_PACKET_MMAP directly to improve capture process -------------------------------------------------------------------------------- From the system calls stand point, the use of PACKET_MMAP involves @@ -66,6 +73,7 @@ the following process: [setup] socket() -------> creation of the capture socket setsockopt() ---> allocation of the circular buffer (ring) + option: PACKET_RX_RING mmap() ---------> mapping of the allocated buffer to the user process @@ -97,13 +105,75 @@ also the mapping of the circular buffer in the user process and the use of this buffer. -------------------------------------------------------------------------------- ++ How to use CONFIG_PACKET_MMAP directly to improve transmission process +-------------------------------------------------------------------------------- +Transmission process is similar to capture as shown below. + +[setup] socket() -------> creation of the transmission socket + setsockopt() ---> allocation of the circular buffer (ring) + option: PACKET_TX_RING + bind() ---------> bind transmission socket with a network interface + mmap() ---------> mapping of the allocated buffer to the + user process + +[transmission] poll() ---------> wait for free packets (optional) + send() ---------> send all packets that are set as ready in + the ring + The flag MSG_DONTWAIT can be used to return + before end of transfer. + +[shutdown] close() --------> destruction of the transmission socket and + deallocation of all associated resources. + +Binding the socket to your network interface is mandatory (with zero copy) to +know the header size of frames used in the circular buffer. + +As capture, each frame contains two parts: + + -------------------- +| struct tpacket_hdr | Header. It contains the status of +| | of this frame +|--------------------| +| data buffer | +. . Data that will be sent over the network interface. +. . + -------------------- + + bind() associates the socket to your network interface thanks to + sll_ifindex parameter of struct sockaddr_ll. + + Initialization example: + + struct sockaddr_ll my_addr; + struct ifreq s_ifr; + ... + + strncpy (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name)); + + /* get interface index of eth0 */ + ioctl(this->socket, SIOCGIFINDEX, &s_ifr); + + /* fill sockaddr_ll struct to prepare binding */ + my_addr.sll_family = AF_PACKET; + my_addr.sll_protocol = ETH_P_ALL; + my_addr.sll_ifindex = s_ifr.ifr_ifindex; + + /* bind socket to eth0 */ + bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll)); + + A complete tutorial is available at: http://wiki.gnu-log.net/ + +-------------------------------------------------------------------------------- + PACKET_MMAP settings -------------------------------------------------------------------------------- To setup PACKET_MMAP from user level code is done with a call like + - Capture process setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req)) + - Transmission process + setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req)) The most significant argument in the previous call is the req parameter, this parameter must to have the following structure: @@ -117,11 +187,11 @@ this parameter must to have the following structure: }; This structure is defined in /usr/include/linux/if_packet.h and establishes a -circular buffer (ring) of unswappable memory mapped in the capture process. +circular buffer (ring) of unswappable memory. Being mapped in the capture process allows reading the captured frames and related meta-information like timestamps without requiring a system call. -Captured frames are grouped in blocks. Each block is a physically contiguous +Frames are grouped in blocks. Each block is a physically contiguous region of memory and holds tp_block_size/tp_frame_size frames. The total number of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because @@ -336,6 +406,7 @@ struct tpacket_hdr). If this field is 0 means that the frame is ready to be used for the kernel, If not, there is a frame the user can read and the following flags apply: ++++ Capture process: from include/linux/if_packet.h #define TP_STATUS_COPY 2 @@ -391,6 +462,37 @@ packets are in the ring: It doesn't incur in a race condition to first check the status value and then poll for frames. + +++ Transmission process +Those defines are also used for transmission: + + #define TP_STATUS_AVAILABLE 0 // Frame is available + #define TP_STATUS_SEND_REQUEST 1 // Frame will be sent on next send() + #define TP_STATUS_SENDING 2 // Frame is currently in transmission + #define TP_STATUS_WRONG_FORMAT 4 // Frame format is not correct + +First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a +packet, the user fills a data buffer of an available frame, sets tp_len to +current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST. +This can be done on multiple frames. Once the user is ready to transmit, it +calls send(). Then all buffers with status equal to TP_STATUS_SEND_REQUEST are +forwarded to the network device. The kernel updates each status of sent +frames with TP_STATUS_SENDING until the end of transfer. +At the end of each transfer, buffer status returns to TP_STATUS_AVAILABLE. + + header->tp_len = in_i_size; + header->tp_status = TP_STATUS_SEND_REQUEST; + retval = send(this->socket, NULL, 0, 0); + +The user can also use poll() to check if a buffer is available: +(status == TP_STATUS_SENDING) + + struct pollfd pfd; + pfd.fd = fd; + pfd.revents = 0; + pfd.events = POLLOUT; + retval = poll(&pfd, 1, timeout); + -------------------------------------------------------------------------------- + THANKS -------------------------------------------------------------------------------- |