Commit message (Author, Age, Files, Lines)
* batman-adv: Start new development cycle (Simon Wunderlich, 2014-05-18, 1, -1/+1)
| Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* batman-adv: remove semi-colon after macro definition (Antonio Quartulli, 2014-05-18, 2, -4/+4)
| Reported by checkpatch with the following warning:
| "WARNING: macros should not use a trailing semicolon"
|
| Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
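A minimal illustration of why checkpatch flags this, using a hypothetical `SQUARE()` macro (the commit does not name the batman-adv macros it touched): with a semicolon baked into the definition, the macro misbehaves inside a larger expression and breaks an unbraced `if`/`else`.

```c
#include <assert.h>

/*
 * Hypothetical macro, for illustration only.
 *
 * Problematic form (kept in a comment so this file compiles):
 *     #define SQUARE(x) ((x) * (x));
 * "y = SQUARE(3) + 1;" expands to "y = ((3) * (3)); + 1;", which
 * silently assigns 9, and
 *     if (c) y = SQUARE(3); else y = 0;
 * fails to compile because of the stray empty statement before "else".
 */

/* checkpatch-clean form: the caller supplies the semicolon. */
#define SQUARE(x) ((x) * (x))

static int square_plus_one(int v)
{
	return SQUARE(v) + 1;	/* usable inside a larger expression */
}

static int square_or_zero(int c, int v)
{
	int y;

	if (c)
		y = SQUARE(v);	/* unbraced if/else now compiles */
	else
		y = 0;
	return y;
}
```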
* batman-adv: add blank line between declarations and the rest of the code (Antonio Quartulli, 2014-05-18, 4, -0/+21)
| Reported by checkpatch with the following message:
| "WARNING: Missing a blank line after declarations"
|
| Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
* Merge branch 'cdc_ncm-coalesce' (David S. Miller, 2014-05-17, 4, -187/+455)
|\
| | Bjørn Mork says:
| |
| | ====================
| | cdc_ncm: add buffer tuning and stats using ethtool
| |
| | Quoting the previous description of this series (skip to the changelog below if you only want a summary of the changes):
| |
| | "I have got quite a few reports from frustrated users of OpenWRT hosts trying to use some powerful LTE modem, but not achieving full speed. This is typically caused by a combination of big buffers and little memory, resulting in allocation errors and bad performance.
| |
| | This series is an attempt to let users adjust the size of these buffers without having to rebuild the driver.
| |
| | Patches 1 - 4 are mostly rearranging existing code, in preparation for the dynamic buffer size changes.
| |
| | Patch 5 adds userspace control (ab)using the ethtool coalescing API. This isn't a perfect match, which is the main reason why I post this series as an RFC.
| |
| | Patch 6 is an unrelated framing optimization, reducing the overhead quite a bit and allowing for better use of smaller buffers.
| |
| | Patch 7 changes the way we calculate the frame padding cutoff. The problem with big buffers is made much worse by the current padding strategy, where zero padding can often account for more than 90% of the frames.
| |
| | Patch 8 adds some counters giving some insight into how well the NCM/MBIM protocol works, supporting further tuning.
| |
| | Patch 9 reduces the initial maximum buffer size from 32kB to 16kB in an attempt to make the default suit all users better. It is still possible to tune this up again to the old fixed max, using the new tuning knobs.
| |
| | I must admit that I had higher hopes for this series before I tested it on my own modems. One really unexpected result was that one of the MBIM modems accepted the new rx buffer size we set, but happily continued sending buffers of the same size as before. Needless to say, this did not work very well. So don't really expect to be able to use any values with any given device. Firmware implementations are still... I don't think I have words suitable for a public mailing list.
| |
| | But I am hoping this will help the many users who have had success rebuilding the driver with lower fixed limits. Please test and/or comment!"
| |
| | Changes:
| |
| | ** RFC -> v1 **
| | Patch 10 - a follow-up to a comment Joe Perches made in November 2013. I don't always forget :-)
| | Patch 11 - removes the redundant "connected" driver state, and the associated .check_connect callbacks.
| |
| | ** v1 -> v2 **
| | Patch 1 - better handling of minimum rx buffer size, based on feedback from Oliver Neukum and Enrico Mioso
| | Patch 5 - fixed locking around timer interval update
| | Patch 9 - fixed whitespace error
| | Patch 12 - new fix related to the tuneable tx timer
| |
| | ...and spelling fixes all over the commit messages. I have finally added a spelling hook, which I'm sure many of you will appreciate :-)
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cdc_ncm: do not start timer on an empty skb (Bjørn Mork, 2014-05-17, 1, -2/+2)
| | We can end up with a freshly allocated tx_curr_skb with no frames in it. In this case it does not make any sense to start the timer. This avoids the timer periodically trying to start tx when there is nothing in the queue.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cdc_ncm: remove redundant "disconnected" flag (Bjørn Mork, 2014-05-17, 3, -31/+2)
| | Calling netif_carrier_{on,off} is sufficient. There is no need to duplicate the carrier state in a driver specific flag.
| |
| | Acked-by: Enrico Mioso <mrkiko.rs@gmail.com>
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cdc_ncm: fix argument alignment (Bjørn Mork, 2014-05-17, 1, -6/+6)
| | Reported-by: Joe Perches <joe@perches.com>
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cdc_ncm: use sane defaults for rx/tx buffers (Bjørn Mork, 2014-05-17, 2, -2/+14)
| | Lots of devices request much larger buffers than reasonable. This causes real problems for users of hosts with limited resources. Reducing the default buffer size to 16kB for such devices is a reasonable trade-off between allowing them to aggregate traffic and avoiding memory exhaustion on resource-constrained hosts.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cdc_ncm/cdc_mbim: adding NCM protocol statistics (Bjørn Mork, 2014-05-17, 3, -0/+108)
| | To have an idea of the effects of the protocol coalescing it's useful to have some counters showing the different aspects.
| |
| | Due to the asymmetrical usbnet interface the netdev rx_bytes counter has been counting real received payload, while the tx_bytes counter has included the NCM/MBIM framing overhead. This overhead can be many times the payload because of the aggressive padding strategy of this driver, and will vary a lot depending on device and traffic.
| |
| | With very few exceptions, users are only interested in the payload size. Having a somewhat accurate payload byte counter is particularly important for mobile broadband devices, which many NCM devices and of course all MBIM devices are. Users and userspace applications will use this counter to monitor account quotas.
| |
| | Having protocol specific counters for the overhead, we are now able to correct the tx_bytes netdev counter so that it shows the real payload.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
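The tx_bytes correction amounts to simple bookkeeping. The sketch below assumes the driver tracks the bytes handed to USB, the NTH/NDP header bytes, and the padding bytes separately; the struct and field names are hypothetical, not the actual cdc_ncm counters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device counters; not the actual cdc_ncm fields. */
struct ncm_tx_stats {
	uint64_t usb_bytes;	/* everything sent over USB, framing included */
	uint64_t ntb_overhead;	/* NTH/NDP header bytes written */
	uint64_t pad_bytes;	/* zero padding added to fill the NTB */
};

/* Account one transmitted NTB (NCM Transfer Block). */
static void account_ntb(struct ncm_tx_stats *s, uint32_t payload,
			uint32_t headers, uint32_t padding)
{
	s->usb_bytes += (uint64_t)payload + headers + padding;
	s->ntb_overhead += headers;
	s->pad_bytes += padding;
}

/* Payload-only byte count: what a quota monitor wants to see. */
static uint64_t tx_payload_bytes(const struct ncm_tx_stats *s)
{
	assert(s->usb_bytes >= s->ntb_overhead + s->pad_bytes);
	return s->usb_bytes - s->ntb_overhead - s->pad_bytes;
}
```

For example, a single 1500-byte datagram sent in a 16384-byte NTB with 88 header bytes and 14796 padding bytes adds 16384 to the raw USB count but only 1500 to the payload count.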
| * net: cdc_ncm: set reasonable padding limits (Bjørn Mork, 2014-05-17, 2, -2/+7)
| | We pad frames larger than X to maximum size for devices which don't need a ZLP after maximum sized frames. This allows the device to optimize its transfers for one fixed buffer size.
| |
| | X was arbitrarily set at 512 bytes regardless of the real buffer maximum, causing extreme overheads due to excessive padding of larger tx buffers.
| |
| | Limit the padding to at most 3 full USB packets, still allowing an overhead-to-payload ratio of 3/1.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
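The padding rule can be written out as arithmetic. This sketch assumes one reading of the commit message: an NTB whose length reaches `tx_max` minus three full USB packets is padded to `tx_max`, and shorter NTBs are sent unpadded (the ZLP case is ignored). The function name and parameters are hypothetical. With 512-byte packets and a 16384-byte `tx_max`, the threshold is 16384 - 3 * 512 = 14848 bytes, so at most 1536 padding bytes are added.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes actually sent for an NTB of 'len'
 * bytes. NTBs at or above the threshold are padded to tx_max so the
 * device always sees one fixed transfer size; smaller ones are not padded.
 */
static uint32_t padded_len(uint32_t len, uint32_t tx_max, uint32_t maxpacket)
{
	uint32_t pad_limit = 3 * maxpacket;	/* at most 3 full USB packets */
	uint32_t threshold = tx_max > pad_limit ? tx_max - pad_limit : 0;

	assert(len <= tx_max);
	return len >= threshold ? tx_max : len;
}
```

With a 2048-byte `tx_max`, the threshold drops to 512 bytes, and a 512-byte NTB padded to 2048 bytes carries 1536 bytes of overhead per 512 bytes of payload, which is the 3/1 ratio mentioned above.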
| * net: cdc_ncm: use true max dgram count for header estimates (Bjørn Mork, 2014-05-17, 2, -12/+7)
| | Many newer NCM and MBIM devices will request a maximum tx datagram count which is much smaller than our hard-coded absolute max. We can reduce the overhead for these devices without sacrificing any of the simplicity, by simply using the true negotiated count when calculating the maximum NTH and NDP header sizes.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
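The saving can be quantified from the 16-bit NCM header layouts: a 12-byte NTH16, an 8-byte NDP16 header, and one 4-byte datagram pointer entry per datagram plus a terminating null entry. This sketch ignores alignment padding and NDP placement, and the datagram counts in the example are illustrative, not the driver's constants.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes of the 16-bit NCM structures. */
#define NTH16_LEN	12	/* dwSignature, wHeaderLength, wSequence, wBlockLength, wNdpIndex */
#define NDP16_HDR_LEN	8	/* dwSignature, wLength, wNextNdpIndex */
#define DPE16_LEN	4	/* wDatagramIndex, wDatagramLength */

/* Worst-case header bytes for one NTB carrying up to 'max_dgrams'. */
static size_t ntb_header_estimate(size_t max_dgrams)
{
	/* +1 for the zero entry that terminates the datagram pointer list */
	return NTH16_LEN + NDP16_HDR_LEN + (max_dgrams + 1) * DPE16_LEN;
}
```

Reserving room for 40 datagrams costs 184 bytes per NTB, while a device that negotiated 16 needs only 88.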
| * net: cdc_ncm: use ethtool to tune coalescing settings (Bjørn Mork, 2014-05-17, 2, -3/+74)
| | Datagram coalescing is an integral part of the NCM and MBIM protocols, intended to reduce the interrupt load primarily on the device end of the USB link. As with all coalescing solutions, there is a trade-off between buffering and interrupts.
| |
| | The current defaults are based on the assumption that device side buffers should be the limiting factor. However, many modern high speed LTE modems suffer from buffer-bloat, making this assumption fail. This results in sub-optimal performance due to excessive coalescing. And in cases where such modems are connected to cheap embedded hosts there are often severe buffer allocation issues, giving very noticeable performance degradation.
| |
| | A start on improving this is going from build time hard coded limits to per device user configurable limits.
| |
| | The ethtool coalescing API was selected as the user interface because, although the tuned values are buffer sizes, these settings directly control datagram coalescing.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
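One way to express a buffer size through a coalescing-style "frames" value is to assume maximum-sized datagrams and clamp the result to what the device accepts. This is a hypothetical sketch of such a mapping, not the cdc_ncm ethtool callbacks; the struct, names, and bounds are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits for one device. */
struct ncm_limits {
	uint32_t max_dgram;	/* maximum datagram size, e.g. 1514 */
	uint32_t min_buf;	/* smallest buffer the driver allows */
	uint32_t dev_max_buf;	/* largest buffer the device announced */
};

/* "frames" requested by the user -> buffer size in bytes, clamped. */
static uint32_t frames_to_buf(const struct ncm_limits *l, uint32_t frames)
{
	uint64_t want = (uint64_t)frames * l->max_dgram;

	if (want < l->min_buf)
		return l->min_buf;
	if (want > l->dev_max_buf)
		return l->dev_max_buf;
	return (uint32_t)want;
}

/* Buffer size -> the "frames" value reported back to the user. */
static uint32_t buf_to_frames(const struct ncm_limits *l, uint32_t buf)
{
	assert(l->max_dgram > 0);
	return buf / l->max_dgram;	/* rounds down */
}
```

The round trip is lossy under this mapping (a 16384-byte buffer reads back as 10 frames), which is one way the coalescing API is an imperfect fit for what are really buffer sizes.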
| * net: cdc_ncm: support rx_max/tx_max updates when running (Bjørn Mork, 2014-05-17, 1, -6/+25)
| | Finish the rx_max/tx_max setup by flushing buffers and informing usbnet about the changes. This way, the settings can be modified while the netdev is up and running.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cdc_ncm: split .bind device initialization (Bjørn Mork, 2014-05-17, 1, -17/+19)
| | Now that we have split out the part of the device setup which MUST be done with the data interface in altsetting 0, we can delay the rest of the initialization. This allows us to move some of the post-init buffer size config from bind to the appropriate setup function.
| |
| | The purpose of this refactoring is to collect all code adjusting the rx_max and tx_max buffers in one place, so that it is easier to call it from multiple call sites.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cdc_ncm: factor out one-time device initialization (Bjørn Mork, 2014-05-17, 1, -96/+155)
| | Split the parts of setup dealing with device initialization from parts just setting defaults for attributes which might be changed after initialization. Some commands of the device initialization are only allowed when the data interface is in its disabled altsetting, so we must separate them out if we are to allow rerunning parts of setup.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cdc_ncm: split out rx_max/tx_max update of setup (Bjørn Mork, 2014-05-17, 1, -31/+57)
| | Split out the part of setup dealing with updating the rx_max and tx_max buffer sizes so that this code can be reused for dynamically updating the limits.
| |
| | Signed-off-by: Bjørn Mork <bjorn@mork.no>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
|/
* pktgen: Use seq_puts() where seq_printf() is not needed (Thomas Graf, 2014-05-16, 1, -25/+25)
| Signed-off-by: Thomas Graf <tgraf@suug.ch>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'ieee802154-next' (David S. Miller, 2014-05-16, 16, -33/+2670)
|\
| | Phoebe Buckheister says:
| |
| | ====================
| | 802154: implement link-layer security
| |
| | This patch series implements 802.15.4-2011 link layer security.
| |
| | Patches 1 and 2 prepare for llsec by adding data structures to represent the llsec PIB as specified in 802.15.4-2011. I've changed some structures from their specification to be more sensible, since 802.15.4 specifies some structures in not-exactly-useful ways. Nested lists are common, but not very accessible for netlink methods, and not very fast to traverse when searching for specific elements either.
| |
| | Patch 3 implements backends for these structures in mac802154.
| |
| | Patches 4 and 5 implement the encryption and decryption methods, split from patch 3 to ease review. The encryption and decryption methods are almost entirely compliant with the specified outgoing/incoming frame procedures. Decryption deviates from the specification slightly where the specification makes no sense: encrypted frames with security level 0 may be sent, but must be dropped on reception, yet the standard spends a few lines on transforms for processing such frames. I've opted not to drop these frames, rather than leave unimplemented the transforms that would go unused if they were dropped.
| |
| | Patch 6 links the mac802154 llsec with the SoftMAC devices. This is mainly init/fini code for the llsec context, handling of security subheaders, and calls to the encryption/decryption methods.
| |
| | Patch 7 adds sockopts to 802.15.4 dgram sockets to modify outgoing security parameters on a per-socket basis. Ideally, this would also be available for sockets on 6lowpan devices, but I'm not sure how to do that nicely.
| |
| | Patch 8 adds forwarders to the llsec configuration methods for netlink, and patch 10 implements these netlink accessors. This is mainly mechanical.
| |
| | Patch 11 implements a key tracking option for devices. The previous patches leave it out because I'm not entirely sure whether this is the best approach to the problem. It performs reasonably well though, so I decided to include it as a separate patch in this series instead of sending an RFC just for this one option.
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * ieee802154, mac802154: implement devkey record option (Phoebe Buckheister, 2014-05-16, 2, -0/+39)
| | The 802.15.4-2011 standard states that for each key, a list of devices that use this key shall be kept. Previous patches have only considered two options:
| |
| |   * a device "uses" (or may use) all keys, rendering the list useless
| |   * a device is restricted to a certain set of keys
| |
| | Another option would be that a device *may* use all keys, but need not do so, and we are interested in the actual set of keys the device uses. Recording keys used by any given device may have a noticeable performance impact and might not be needed as often. The common case, in which a device will not switch keys too often, should still perform well.
| |
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
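The three per-device policies can be sketched as an enum plus a check that only records keys in the third mode. All names below are hypothetical and the fixed-size key list is a simplification; the mac802154 structures are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum devkey_mode {
	DEVKEY_ANY,		/* device may use all keys; no list kept */
	DEVKEY_RESTRICT,	/* device may only use listed keys */
	DEVKEY_RECORD,		/* device may use all keys; record those seen */
};

#define MAX_DEVKEYS 8

struct llsec_dev {
	enum devkey_mode mode;
	unsigned int keys[MAX_DEVKEYS];	/* key ids, for RESTRICT and RECORD */
	size_t nkeys;
};

static bool dev_has_key(const struct llsec_dev *d, unsigned int key)
{
	for (size_t i = 0; i < d->nkeys; i++)
		if (d->keys[i] == key)
			return true;
	return false;
}

/* Returns true if a frame secured with 'key' is acceptable from 'd'. */
static bool dev_key_usable(struct llsec_dev *d, unsigned int key)
{
	switch (d->mode) {
	case DEVKEY_ANY:
		return true;
	case DEVKEY_RESTRICT:
		return dev_has_key(d, key);
	case DEVKEY_RECORD:
		/* Common case: key already recorded, so only a list scan.
		 * If the list is full, accept without recording. */
		if (!dev_has_key(d, key) && d->nkeys < MAX_DEVKEYS)
			d->keys[d->nkeys++] = key;
		return true;
	}
	return false;
}
```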
| * ieee802154: add netlink interfaces for llsec (Phoebe Buckheister, 2014-05-16, 5, -0/+893)
| | This patch adds user-visible interfaces for the llsec infrastructure. For the added methods, the only major difference between all add/remove implementations lies in how the specific object is parsed, and for dump requests, how objects are written into netlink messages.
| |
| | To save on boilerplate code, table dumps are routed through a helper function that handles netlink dump state, leaving the actual dumping code to care only about iterating over the table to be dumped and filling netlink messages. For add/remove methods, the boilerplate required to work is not quite as large, but still enough to also move into a local helper.
| |
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
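The dump-helper split can be sketched in userspace: a generic function owns the resume position that a netlink dump keeps between calls, and a per-table callback only writes entries. The types and names are hypothetical stand-ins for netlink messages and callback state, not the functions added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a netlink message with limited room. */
struct msgbuf {
	int items[4];
	size_t len;
};

/* Stand-in for the per-dump state kept between dump calls. */
struct dump_state {
	size_t next;	/* index of the first entry not yet dumped */
};

/* Per-table part: write one entry; return 0, or -1 if the message is full. */
typedef int (*fill_fn)(struct msgbuf *m, const void *table, size_t idx);

static int fill_int(struct msgbuf *m, const void *table, size_t idx)
{
	const int *t = table;

	if (m->len == sizeof(m->items) / sizeof(m->items[0]))
		return -1;
	m->items[m->len++] = t[idx];
	return 0;
}

/*
 * Generic part: resume from st->next, stop when the message is full,
 * and remember where to continue. Returns the number of entries
 * written; 0 means the dump is complete.
 */
static size_t dump_table(struct dump_state *st, struct msgbuf *m,
			 const void *table, size_t n, fill_fn fill)
{
	size_t written = 0;

	m->len = 0;
	while (st->next < n && fill(m, table, st->next) == 0) {
		st->next++;
		written++;
	}
	return written;
}
```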
| * mac802154: propagate device address changes to llsec (Phoebe Buckheister, 2014-05-16, 2, -3/+47)
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * mac802154: add llsec configuration functions (Phoebe Buckheister, 2014-05-16, 4, -0/+274)
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * ieee802154: add dgram sockopts for security control (Phoebe Buckheister, 2014-05-16, 2, -1/+75)
| | Allow datagram sockets to override the security settings of the device they send from on a per-socket basis. Requires CAP_NET_ADMIN or CAP_NET_RAW, since raw sockets can send arbitrary packets anyway.
| |
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * mac802154: integrate llsec with wpan devices (Phoebe Buckheister, 2014-05-16, 3, -28/+103)
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * mac802154: add llsec decryption method (Phoebe Buckheister, 2014-05-16, 2, -0/+248)
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * mac802154: add llsec encryption method (Phoebe Buckheister, 2014-05-16, 2, -0/+255)
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * mac802154: add llsec structures and mutators (Phoebe Buckheister, 2014-05-16, 4, -1/+637)
| | This patch adds containers and mutators for the major ieee802154_llsec structures to mac802154. Most of the (rather simple) ieee802154_llsec structs are wrapped only to provide an rcu_head for orderly disposal, but some structs - llsec keys notably - require more complex bookkeeping.
| |
| | Since each llsec key may be referenced by a number of llsec key table entries (with differing key ids, but the same actual key), we want to save memory and not allocate crypto transforms for each entry in the table. Thus, the mac802154 llsec key is reference-counted instead. Further, each key will have four associated crypto transforms - three CCM transforms for the authsizes 4/8/16 and one CTR transform for unauthenticated encryption. If we had a CCM* transform that allowed authsize 0, and authsize as part of requests instead of transforms, this would not be necessary.
| |
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
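The two bookkeeping ideas, a single reference-counted key shared by several key-table entries and a transform chosen by authsize, can be sketched as follows. This is a hypothetical single-threaded model with made-up names: the real code disposes of objects via RCU and holds kernel crypto handles, which are represented here only by an enum.

```c
#include <assert.h>
#include <stdlib.h>

enum xform { XFORM_CTR, XFORM_CCM4, XFORM_CCM8, XFORM_CCM16, XFORM_INVALID };

/* Hypothetical shared key: one copy of the key material and transforms. */
struct shared_key {
	unsigned char material[16];	/* AES-128 key */
	int refcount;
};

static struct shared_key *key_get(struct shared_key *k)
{
	k->refcount++;
	return k;
}

/* Drop one reference; free the key when the last table entry lets go. */
static int key_put(struct shared_key *k)
{
	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		free(k);
		return 1;	/* freed */
	}
	return 0;
}

/* No CCM* transform for authsize 0, so CTR covers unauthenticated mode. */
static enum xform xform_for_authsize(int authsize)
{
	switch (authsize) {
	case 0:  return XFORM_CTR;
	case 4:  return XFORM_CCM4;
	case 8:  return XFORM_CCM8;
	case 16: return XFORM_CCM16;
	default: return XFORM_INVALID;
	}
}
```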
| * mac802154: update Kconfig (Phoebe Buckheister, 2014-05-16, 1, -0/+4)
| | Link-layer security requires AES CCM for authenticated modes and AES CTR for the unauthenticated encryption mode.
| |
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * ieee802154: add types for link-layer security (Phoebe Buckheister, 2014-05-16, 1, -0/+95)
| | The added structures match 802.15.4-2011 link-layer security PIBs as closely as is reasonable. Some lists required by the standard were modeled as bitmaps (frame_types and command_frame_ids in *llsec_key, 802.15.4-2011 7.5/Table 61), since using lists for those seems a bit excessive and not particularly useful.
| |
| | The DeviceDescriptorHandleList was inverted and is here a per-device list, since operations on this list are likely to have both a key and a device at hand, and per-device lists of keys are shorter than per-key lists of devices.
| |
| | Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
|/
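Modelling a per-key frame-type list as a bitmap turns membership into a single bit test. A sketch with hypothetical names: the frame-type values are the 802.15.4 frame type field codes (beacon 0, data 1, acknowledgment 2, MAC command 3), while the 32-bit width of the command-frame bitmap is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 802.15.4 frame type field values. */
enum {
	FRAME_TYPE_BEACON = 0,
	FRAME_TYPE_DATA = 1,
	FRAME_TYPE_ACK = 2,
	FRAME_TYPE_MAC_CMD = 3,
};

/* Hypothetical key usage policy: bit n set means type/command n allowed. */
struct key_usage {
	uint8_t frame_types;
	uint32_t cmd_frame_ids;	/* assumes command ids < 32 */
};

static bool key_allows(const struct key_usage *u, unsigned int type,
		       unsigned int cmd_id)
{
	if (type > 7 || !(u->frame_types & (1u << type)))
		return false;
	if (type != FRAME_TYPE_MAC_CMD)
		return true;
	/* MAC command frames are additionally filtered by command id */
	return cmd_id < 32 && (u->cmd_frame_ids & (UINT32_C(1) << cmd_id));
}
```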
* Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch (David S. Miller, 2014-05-16, 11, -184/+176)
|\
| | Jesse Gross says:
| |
| | ====================
| | A set of OVS changes for net-next/3.16.
| |
| | The major change here is a switch from per-CPU to per-NUMA flow statistics. This improves scalability by reducing kernel overhead in flow setup and maintenance.
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/openvswitch: Use with RCU_INIT_POINTER(x, NULL) in vport-gre.c (Monam Agarwal, 2014-05-16, 1, -1/+1)
| | This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL).
| |
| | The rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. In the case of the NULL pointer, there is no structure to initialize. So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL).
| |
| | Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
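A userspace analogue with C11 atomics shows why NULL needs no ordering: publishing an initialized object uses a release store (the role `rcu_assign_pointer()` plays), while storing NULL publishes no contents, so a relaxed store (the role of `RCU_INIT_POINTER()`) is enough. This is an analogy only, with hypothetical names; it is not the kernel's RCU implementation, and the acquire load merely stands in for `rcu_dereference()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct tunnel_cfg {
	int key;
};

static _Atomic(struct tunnel_cfg *) cur_cfg;

/* Like rcu_assign_pointer(): fields written before the store become
 * visible to readers that observe the new pointer. */
static void publish(struct tunnel_cfg *c)
{
	atomic_store_explicit(&cur_cfg, c, memory_order_release);
}

/* Like RCU_INIT_POINTER(p, NULL): there is no structure behind NULL,
 * so no ordering is needed. */
static void unpublish(void)
{
	atomic_store_explicit(&cur_cfg, (struct tunnel_cfg *)NULL,
			      memory_order_relaxed);
}

/* Reader: the acquire load pairs with the release store in publish(). */
static int read_key(void)
{
	struct tunnel_cfg *c = atomic_load_explicit(&cur_cfg,
						    memory_order_acquire);

	return c ? c->key : -1;
}
```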
| * openvswitch: Use TCP flags in the flow key for stats. (Jarno Rajahalme, 2014-05-16, 1, -7/+5)
| | We already extract the TCP flags for the key, might as well use that for stats.
| |
| | Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
| | Acked-by: Pravin B Shelar <pshelar@nicira.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
| * openvswitch: Fix output of SCTP mask. (Jarno Rajahalme, 2014-05-16, 1, -4/+4)
| | The 'output' argument of ovs_nla_put_flow() is the one from which the bits are written to the netlink attributes. For SCTP we accidentally used the bits from the 'swkey' instead. This caused the mask attributes to include the bits from the actual flow key instead of the mask.
| |
| | Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
| | Acked-by: Pravin B Shelar <pshelar@nicira.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
| * openvswitch: Per NUMA node flow stats. (Jarno Rajahalme, 2014-05-16, 4, -55/+122)
| | Keep kernel flow stats for each NUMA node rather than each (logical) CPU. This avoids using the per-CPU allocator and removes most of the kernel-side OVS locking overhead that otherwise sits at the top of perf reports, and allows OVS to scale better with a higher number of threads.
| |
| | With 9 handlers and 4 revalidators, the netperf TCP_CRR test flow setup rate doubles on a server with two hyper-threaded physical CPUs (16 logical cores each) compared to the current OVS master. Tested with a non-trivial flow table with a TCP port match rule forcing all new connections with unique port numbers to OVS userspace. The IP addresses are still wildcarded, so the kernel flows are not considered as exact match 5-tuple flows. Flows of this type can be expected to appear in large numbers as the result of more effective wildcarding made possible by improvements in the OVS userspace flow classifier.
| |
| | Perf results for this test (master):
| |
| |     Events: 305K cycles
| |     + 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
| |     + 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
| |     + 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc
| |     + 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
| |     + 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area
| |     + 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
| |     + 2.03% swapper [kernel.kallsyms] [k] intel_idle
| |     + 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
| |     + 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
| |     + 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6
| |     + 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset
| |     + 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock
| |     + 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock
| |     ...
| |
| | And after this patch:
| |
| |     Events: 356K cycles
| |     + 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc
| |     + 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
| |     + 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
| |     + 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
| |     + 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
| |     + 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
| |     + 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f
| |     + 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
| |     + 1.47% swapper [kernel.kallsyms] [k] intel_idle
| |     + 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask
| |     + 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref
| |     + 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash
| |     + 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions
| |     + 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref
| |     + 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock
| |     ...
| |
| | There is a small increase in kernel spinlock overhead due to the same spinlock being shared between multiple cores of the same physical CPU, but that is barely visible in the netperf TCP_CRR test performance (maybe a ~1% performance drop, hard to tell exactly due to variance in the test results) when testing for kernel module throughput (with no userspace activity and a handful of kernel flows).
| |
| | On flow setup, a single stats instance is allocated (for NUMA node 0). As CPUs from multiple NUMA nodes start updating stats, new NUMA-node specific stats instances are allocated. This allocation on the packet processing code path is made to never block or look for emergency memory pools, minimizing the allocation latency. If the allocation fails, the existing preallocated stats instance is used. Also, if only CPUs from one NUMA node are updating the preallocated stats instance, no additional stats instances are allocated. This eliminates the need to pre-allocate stats instances that will not be used, also relieving the stats reader from the burden of reading stats that are never used.
| |
| | Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
| | Acked-by: Pravin B Shelar <pshelar@nicira.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
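The allocation policy can be modelled in a few lines, assuming a fixed number of nodes and an allocator that may fail. Node 0's instance is preallocated at flow setup, other nodes get an instance lazily on their first update, and a failed allocation falls back to node 0. All names are hypothetical, and the per-instance spinlock is omitted because this sketch is single-threaded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_NODES 2

struct flow_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct flow {
	struct flow_stats *stats[NR_NODES];	/* [0] preallocated */
};

/* Hypothetical stand-in for a non-blocking allocation that may fail. */
static int alloc_should_fail;

static struct flow_stats *stats_alloc_nowait(void)
{
	return alloc_should_fail ? NULL : calloc(1, sizeof(struct flow_stats));
}

static void flow_stats_update(struct flow *f, int node, uint64_t len)
{
	struct flow_stats *s = f->stats[node];

	if (!s) {
		s = stats_alloc_nowait();
		if (s)
			f->stats[node] = s;	/* later updates use it */
		else
			s = f->stats[0];	/* fall back to node 0 */
	}
	s->packets++;
	s->bytes += len;
}

/* Readers only visit instances that were actually allocated. */
static struct flow_stats flow_stats_get(const struct flow *f)
{
	struct flow_stats sum = {0, 0};

	for (int n = 0; n < NR_NODES; n++) {
		if (!f->stats[n])
			continue;
		sum.packets += f->stats[n]->packets;
		sum.bytes += f->stats[n]->bytes;
	}
	return sum;
}
```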
| * openvswitch: Remove 5-tuple optimization. (Jarno Rajahalme, 2014-05-16, 7, -113/+32)
| | The 5-tuple optimization becomes unnecessary with a later per-NUMA node stats patch. Remove it first to make the changes easier to grasp.
| |
| | Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
| * openvswitch: Use ether_addr_copy (Joe Perches, 2014-05-16, 3, -16/+16)
| | It's slightly smaller/faster for some architectures.
| |
| | Signed-off-by: Joe Perches <joe@perches.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
| * openvswitch: flow_netlink: Use pr_fmt to OVS_NLERR output (Joe Perches, 2014-05-16, 1, -0/+2)
| | Add an "openvswitch: " prefix to OVS_NLERR output to match the other OVS_NLERR output of datapath.c.
| |
| | Signed-off-by: Joe Perches <joe@perches.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
| * openvswitch: Use net_ratelimit in OVS_NLERR (Joe Perches, 2014-05-16, 1, -3/+5)
| | Each use of pr_<level>_once has a per-site flag. Some of the OVS_NLERR messages look as if seeing them multiple times could be useful, so use net_ratelimit() instead of pr_info_once.
| |
| | Signed-off-by: Joe Perches <joe@perches.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
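The difference between a per-call-site "once" flag and rate limiting can be sketched with a simple interval/burst limiter. The names and parameters are illustrative assumptions, not the settings or implementation of `net_ratelimit()`, and time is passed in explicitly to keep the example deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* pr_info_once()-style: each call site prints at most once, ever. */
static bool once_allows(bool *printed)
{
	if (*printed)
		return false;
	*printed = true;
	return true;
}

/* Hypothetical interval/burst limiter in the spirit of net_ratelimit(). */
struct ratelimit {
	long interval;	/* window length, in arbitrary time units */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
};

static bool ratelimit_allows(struct ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;	/* start a new window */
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}
```

A repeated error thus stays visible over time, at most `burst` times per window, instead of being reported only on its first occurrence.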
| * openvswitch: Added (unsigned long long) cast in printf (Daniele Di Proietto, 2014-05-16, 1, -2/+2)
| | This is necessary, since u64 is not unsigned long long on all architectures: u64 could also be uint64_t.
| |
| | Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
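In userspace terms, the portable pattern is to cast to `unsigned long long` and print with `%llu`, because the underlying type of a 64-bit typedef differs between platforms. A sketch using the standard `uint64_t` in place of the kernel's `u64`; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit counter portably: %llu always matches unsigned long
 * long, whatever uint64_t happens to be typedef'd to. */
static int format_count(char *buf, size_t len, uint64_t n)
{
	return snprintf(buf, len, "count=%llu", (unsigned long long)n);
}
```

Userspace code can alternatively use `PRIu64` from `<inttypes.h>`; the cast keeps a single format string valid everywhere.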
| * openvswitch: avoid cast-qual warning in vport_priv (Daniele Di Proietto, 2014-05-16, 1, -1/+1)
| | This function must cast a const value to a non-const value. Adding a uintptr_t cast suppresses the warning. Avoiding the cast altogether (the proper solution) would require changing several function signatures.
| |
| | Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
| * openvswitch: avoid warnings in vport_from_priv (Daniele Di Proietto, 2014-05-16, 1, -2/+2)
| | First, this change avoids declaring the formal parameter const, since it is treated as non-const (to avoid -Wcast-qual). Second, it casts the pointer from void* to u8*, since it is used in arithmetic (to avoid -Wpointer-arith).
| |
| | Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
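The header-plus-private-area pattern behind `vport_priv()`/`vport_from_priv()` can be sketched in portable C: pointer arithmetic goes through `unsigned char *` (arithmetic on `void *` is a GNU extension, hence `-Wpointer-arith`), and the const-dropping conversion goes through `uintptr_t`, as in the vport_priv commit. The struct, alignment constant, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PRIV_ALIGN 16	/* hypothetical alignment for the private area */
#define HDR_SIZE (((sizeof(struct port) + PRIV_ALIGN - 1) / PRIV_ALIGN) * PRIV_ALIGN)

struct port {
	int port_no;
};

/* Allocate a header followed by 'priv_size' bytes of private data. */
static struct port *port_alloc(size_t priv_size)
{
	return calloc(1, HDR_SIZE + priv_size);
}

/* Header -> private area: arithmetic on unsigned char *, not void *. */
static void *port_priv(const struct port *p)
{
	/* the uintptr_t round-trip drops const without a -Wcast-qual warning */
	return (unsigned char *)(uintptr_t)p + HDR_SIZE;
}

/* Private area -> header: non-const parameter, byte-pointer arithmetic. */
static struct port *port_from_priv(void *priv)
{
	return (struct port *)((unsigned char *)priv - HDR_SIZE);
}
```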
| * openvswitch: use const in some local vars and casts (Daniele Di Proietto, 2014-05-16, 2, -9/+13)
| | In a few functions, const formal parameters are assigned or cast to non-const. These changes suppress warnings if compiled with -Wcast-qual.
| |
| | Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
| | Signed-off-by: Jesse Gross <jesse@nicira.com>
* | Merge branch 'dt_fixed_phy' (David S. Miller, 2014-05-16, 7, -13/+214)
|\ \
| | | Thomas Petazzoni says:
| | |
| | | ====================
| | | Add DT support for fixed PHYs
| | |
| | | Here is a fourth version of the patch set that adds a Device Tree binding and the related code to support fixed PHYs. I'm hoping to get this merged in 3.16.
| | |
| | | Changes since v3:
| | |
| | |   * Rebased on top of v3.15-rc5.
| | |   * In patch "net: phy: decouple PHY id and PHY address in fixed PHY driver", changed the PHY ID of fixed PHYs from 0xdeadbeef to 0x0, as suggested by Grant Likely.
| | |   * Fixed the !CONFIG_PHY_FIXED case in patch "net: phy: extend fixed driver with fixed_phy_register()". Noticed by Florian Fainelli.
| | |   * Added Acked-by from Grant Likely and Florian Fainelli on patch "net: phy: extend fixed driver with fixed_phy_register()".
| | |   * Reworked the new fixed-link DT binding to be just a sub-node of the Ethernet MAC node, and not a node referenced by the 'phy' property. This was requested by Grant Likely.
| | |   * Reworked the code implementing the new DT binding to also make it accept the old, single property based, DT binding.
| | |   * Added a patch that actually uses the new fixed link DT binding for the Armada XP Matrix board.
| | |
| | | Changes since v2:
| | |
| | |   * Rebased on top of v3.14-rc1, and re-tested on hardware.
| | |   * Removed the RFC tag, since there seems to be some real interest in this feature, and the code has gone through several iterations already.
| | |   * The error handling in fixed_phy_register() has been fixed.
| | |
| | | Changes since v1:
| | |
| | |   * Instead of using a 'fixed-link' property inside the Ethernet device DT node, with a fairly cryptic succession of integer values, we now use a PHY subnode under the Ethernet device DT node, with explicit properties to configure the duplex, speed, pause and other PHY properties.
| | |   * The PHY address is automatically allocated by the kernel and no longer visible in the Device Tree binding.
| | |   * The PHY device is created directly when the network driver calls of_phy_register_fixed_link(), and associated with the PHY DT node, which allows the existing of_phy_connect() function to work, without the need to use the deprecated of_phy_connect_fixed_link().
| | |
| | | Posts of previous versions:
| | |
| | |   RFCv1: http://www.spinics.net/lists/netdev/msg243253.html
| | |   RFCv2: http://lists.infradead.org/pipermail/linux-arm-kernel/2013-September/196919.html
| | |   PATCHv3: http://www.spinics.net/lists/netdev/msg273117.html
| | | ====================
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ARM: mvebu: use the fixed-link PHY DT binding for the Armada XP Matrix board (Thomas Petazzoni, 2014-05-16, 1, -0/+4)
| | | The Armada XP Matrix board has an Ethernet PHY that isn't configurable through the MDIO bus, so we use the newly introduced fixed-link PHY DT binding to represent the PHY of this platform and get network working.
| | |
| | | Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: mvneta: add support for fixed links (Thomas Petazzoni, 2014-05-16, 1 file, -3/+16)
Following the introduction of of_phy_register_fixed_link(), this patch introduces fixed link support in the mvneta driver, for Marvell Armada 370/XP SOCs.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
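To make the driver-side flow concrete, here is a minimal userspace sketch of how a MAC driver might pick the node it hands to of_phy_connect(), assuming one plausible order: use a 'phy' phandle if one exists, otherwise fall back to a fixed link. The struct eth_dt_node type and register_fixed_link_stub() are hypothetical stand-ins for the OF helpers named above, not kernel APIs, and treating the MAC node itself as the PHY node is an assumption based on the fixed-link binding being a sub-node of the MAC node.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a MAC's Device Tree node (not a kernel type). */
struct eth_dt_node {
	struct eth_dt_node *phy_phandle; /* target of a 'phy' property, or NULL */
	bool fixed_link;                 /* result of an of_phy_is_fixed_link()-style check */
	bool fixed_phy_registered;       /* set by the registration stub below */
};

/* Stub standing in for of_phy_register_fixed_link(): pretend a fixed PHY
 * was created and associated with this node. */
static int register_fixed_link_stub(struct eth_dt_node *np)
{
	np->fixed_phy_registered = true;
	return 0;
}

/* Choose the DT node that would be passed to of_phy_connect().
 * Returns 0 on success or a negative errno value. */
static int select_phy_node(struct eth_dt_node *mac, struct eth_dt_node **phy_node)
{
	assert(mac != NULL && phy_node != NULL);

	if (mac->phy_phandle) {
		/* Regular MDIO-managed PHY referenced through a phandle. */
		*phy_node = mac->phy_phandle;
		return 0;
	}
	if (!mac->fixed_link)
		return -ENODEV; /* neither a PHY phandle nor a fixed link */

	int err = register_fixed_link_stub(mac);
	if (err < 0)
		return err;

	/* Assumed: the fixed PHY is tied to the MAC node itself, so that
	 * node is what gets handed to of_phy_connect(). */
	*phy_node = mac;
	return 0;
}
```

A node with neither a phandle nor a fixed link yields -ENODEV, so a misconfigured board fails at probe time instead of silently running without a PHY.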
| * | of: provide a binding for fixed link PHYs (Thomas Petazzoni, 2014-05-16, 3 files, -0/+112)
Some Ethernet MACs have a "fixed link", and are not connected to a normal MDIO-managed PHY device. For those situations, a Device Tree binding allows describing a "fixed link" using a special PHY node.

This patch adds:

* Documentation for the fixed PHY Device Tree binding.
* An of_phy_is_fixed_link() function that an Ethernet driver can call on its PHY phandle to find out whether it's a fixed link PHY or not. It should typically be used to know if of_phy_register_fixed_link() should be called.
* An of_phy_register_fixed_link() function that instantiates the fixed PHY into the PHY subsystem, so that when the driver calls of_phy_connect(), the PHY device associated with the OF node will be found.

These two additional functions also support the old fixed-link Device Tree binding used on PowerPC platforms, so that ultimately, the network device drivers for those platforms could be converted to use of_phy_is_fixed_link() and of_phy_register_fixed_link() instead of of_phy_connect_fixed_link(), while keeping compatibility with their respective Device Tree bindings.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
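The detection step, which accepts either binding, can be modelled as below. This is a sketch, not the kernel's of_phy_is_fixed_link(): struct dt_node and is_fixed_link() are hypothetical, and the five-cell layout of the legacy PowerPC property is an assumption recalled from that older binding rather than something stated in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical, heavily simplified DT node: child node names plus the byte
 * length of a legacy 'fixed-link' property (0 when the property is absent). */
struct dt_node {
	const char *children[MAX_CHILDREN]; /* unused slots are NULL */
	size_t fixed_link_prop_len;
};

/* Sketch of the test an of_phy_is_fixed_link()-style helper performs. */
static bool is_fixed_link(const struct dt_node *np)
{
	assert(np != NULL);

	/* New binding: a 'fixed-link' sub-node of the Ethernet MAC node. */
	for (size_t i = 0; i < MAX_CHILDREN && np->children[i]; i++)
		if (strcmp(np->children[i], "fixed-link") == 0)
			return true;

	/* Old PowerPC binding: a single 'fixed-link' property, assumed here
	 * to hold five 32-bit cells (id, duplex, speed, pause, asym_pause). */
	return np->fixed_link_prop_len == 5 * sizeof(uint32_t);
}
```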
| * | net: phy: extend fixed driver with fixed_phy_register() (Thomas Petazzoni, 2014-05-16, 2 files, -0/+72)
The existing fixed_phy_add() function has several drawbacks that prevent it from being used as is for OF-based declaration of fixed PHYs:

* The address of the PHY on the fake bus needs to be passed, while a dynamic allocation is desired.
* Since the phy_device instantiation is postponed until the next mdiobus scan, there is no way to associate the fixed PHY with its OF node, which later prevents of_phy_connect() from finding this fixed PHY from a given OF node.

To solve this, this commit introduces fixed_phy_register(), which will allocate an available PHY address, add the PHY using fixed_phy_add() and instantiate the phy_device structure associated with the provided OF node.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
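A minimal sketch of the "allocate an available PHY address" step, assuming a simple bump counter capped at 32 addresses, the size of an MDIO bus. alloc_fixed_phy_addr() and next_fixed_phy_addr are hypothetical names; the real driver's allocation strategy and locking may differ.

```c
#include <assert.h>
#include <errno.h>

#define PHY_MAX_ADDR 32 /* same limit as the kernel constant: addresses 0..31 */

/* Hypothetical allocator state for the emulated fixed MDIO bus. A real
 * driver would guard it with a lock; this single-threaded sketch does not. */
static int next_fixed_phy_addr;

/* Return the next unused PHY address, or -ENOSPC once the bus is full. */
static int alloc_fixed_phy_addr(void)
{
	if (next_fixed_phy_addr >= PHY_MAX_ADDR)
		return -ENOSPC;
	return next_fixed_phy_addr++;
}
```

Because the caller never chooses the address, the Device Tree binding no longer needs to expose it, which matches the v1 changelog item above.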
| * | net: phy: decouple PHY id and PHY address in fixed PHY driver (Thomas Petazzoni, 2014-05-16, 1 file, -10/+10)
Until now, the fixed_phy_add() function was taking as argument 'phy_id', which was used both as the PHY address on the fake fixed MDIO bus, and as the PHY id, as available in the MII_PHYSID1 and MII_PHYSID2 registers. However, those two pieces of information are completely unrelated. This patch decouples them.

The PHY id of fixed PHYs is hardcoded to be 0x0. Ideally, a really reserved value would be nicer, but there doesn't seem to be an easy way of making sure a dummy value can be assigned to the Linux kernel for such usage.

The PHY address is still passed by the caller of fixed_phy_add().

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'bridge-non-promisc' (David S. Miller, 2014-05-16, 7 files, -20/+294)
Vlad Yasevich says:

====================
bridge: Non-promisc bridge ports support

This series adds functionality to the bridge device to enable operation without setting all ports to promiscuous mode. The basic concept is this. The bridge keeps track of the ports that support learning and flooding packets to unknown destinations. We call these ports auto-discovery ports since they automatically discover who is behind them through learning and flooding.

If flooding and learning are disabled via flags, then the port requires static configuration to tell it which mac addresses are behind it. This is accomplished by adding fdbs. These fdbs should be static, as dynamic fdbs can expire and systems will become unreachable due to lack of flooding.

If the user marks all ports as needing static configuration, then we can safely make them non-promiscuous since we will know all the information about them.

If the user leaves only 1 port as automatic, then we can mark that port as non-promiscuous as well. One could think of this as an edge relay similar to what's supported by embedded switches in SRIOV devices. Since we have all the information about the other ports, we can just program the mac addresses into the single automatic port to receive all necessary traffic. More information about this is in patch 6.

In other cases, we keep all ports promiscuous as before.

There are some other cases when promiscuous mode has to be turned back on. One is when the bridge itself is placed in promiscuous mode (user sets promisc flag).
The other is if vlan filtering is turned off. Since this is the default configuration, the default bridge operation is not changed.

Changes since v2:
- White space and spelling fixes from Michael Tsirkin.
- Squash patches 6, 7 and 8 to prevent bisect breakage.

Changes since v1:
- Address issues raised by Stephen Hemminger.
- Address initializer comments raised by Sergey Shtylyov.
- Rebased on recent net-next.

Changes since rfc v2:
- Better description in the commit logs.
- Leave port in promiscuous mode if IFF_UNICAST_FLT is disabled on the device.
- Fix issue with flag masking.
- Rework patch ordering a bit.

Changes since rfc v1:
- Removed private list. We now traverse the fdb hashtable itself to write necessary addresses to the ports (Stephen's concern).
- Add learning flag to the mask for flags that decides if the port is 'auto' or not (suggested by MST and Jamal).
- Simplified tracking of such ports at the cost of a loop over all ports (suggested by MST).

I've played with quite a large number of ports and the current approach seems to work fairly well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
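The port policy described in this cover letter can be condensed into one decision function. In this sketch (struct bport, is_auto_port() and update_promisc() are hypothetical, not the bridge's code), a port counts as "auto" if learning or flooding is still on; with no auto ports every port may leave promiscuous mode, and with exactly one, only that port does. Keeping the static ports promiscuous in the one-auto-port case is an assumption of the sketch: traffic arriving on them for hosts learned behind the auto port is not known statically. The unicast_filter check mirrors the changelog item about IFF_UNICAST_FLT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state; field names are illustrative only. */
struct bport {
	bool learning;       /* learning flag enabled */
	bool flooding;       /* flooding flag enabled */
	bool unicast_filter; /* device can filter unicast addresses */
	bool promisc;        /* output, written by update_promisc() */
};

/* A port that still learns or floods discovers hosts automatically. */
static bool is_auto_port(const struct bport *p)
{
	return p->learning || p->flooding;
}

static void update_promisc(struct bport *ports, size_t n,
			   bool bridge_promisc, bool vlan_filtering)
{
	size_t auto_cnt = 0;

	assert(ports != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (is_auto_port(&ports[i]))
			auto_cnt++;

	for (size_t i = 0; i < n; i++) {
		struct bport *p = &ports[i];

		if (bridge_promisc || !vlan_filtering) {
			/* Either case forces every port back to promisc. */
			p->promisc = true;
		} else if (auto_cnt == 0 || (auto_cnt == 1 && is_auto_port(p))) {
			/* Everything this port must receive is known from static
			 * FDB entries, so an address filter can replace promisc
			 * mode, provided the device supports unicast filtering. */
			p->promisc = !p->unicast_filter;
		} else {
			p->promisc = true;
		}
	}
}
```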
| * | bridge: Automatically manage port promiscuous mode. (Vlad Yasevich, 2014-05-16, 4 files, -7/+116)
There exist configurations where the administrator or another management entity has the foreknowledge of all the mac addresses of end systems that are being bridged together.

In these environments, the administrator can statically configure known addresses in the bridge FDB and disable flooding and learning on ports. This makes it possible to turn off promiscuous mode on the interfaces connected to the bridge.

Here is why disabling flooding and learning allows us to control promiscuity:

Consider port X. All traffic coming into this port from outside the bridge (ingress) will be either forwarded through other ports of the bridge (egress) or dropped. Forwarding (egress) is defined by FDB entries and by flooding in the event that no FDB entry exists. In the event that flooding is disabled, only FDB entries define the egress. Once learning is disabled, only static FDB entries provided by a management entity define the egress. If we provide information from these static FDBs to the ingress port X, then we'll be able to accept all traffic that can be successfully forwarded and drop all the other traffic sooner without spending CPU cycles to process it.

Another way to define the above is with the following equations:

    ingress = egress + drop

expanding egress:

    ingress = static FDB + learned FDB + flooding + drop

disabling flooding and learning, we are left with:

    ingress = static FDB + drop

By adding addresses from the static FDB entries to the MAC address filter of an ingress port X, we fully define what the bridge can process without dropping, and can thus turn off promiscuous mode and drop packets sooner.
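The last equation, ingress = static FDB + drop, translates directly into a receive filter. A minimal sketch, with hypothetical types and names (struct fdb_entry, build_rx_filter(), rx_filter_accepts()): copy every static FDB address into port X's filter, and treat anything else as dropped before the bridge processes it. The sketch simply stops when the filter is full; how a real device handles running out of filter slots is outside its scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char mac_addr[6];

/* Hypothetical FDB entry; only the fields this sketch needs. */
struct fdb_entry {
	mac_addr mac;
	bool is_static; /* configured by a management entity, never ages out */
};

/* ingress = static FDB + drop: copy every static address into the port's
 * receive filter. Returns how many addresses were written (at most max). */
static size_t build_rx_filter(const struct fdb_entry *fdb, size_t n,
			      mac_addr *out, size_t max)
{
	size_t count = 0;

	assert(fdb != NULL || n == 0);
	for (size_t i = 0; i < n && count < max; i++)
		if (fdb[i].is_static)
			memcpy(out[count++], fdb[i].mac, sizeof(mac_addr));
	return count;
}

/* Would a frame with this destination get past the filter? Anything not
 * listed is dropped by the device before the bridge spends CPU on it. */
static bool rx_filter_accepts(mac_addr *filter, size_t count,
			      const unsigned char *dst)
{
	for (size_t i = 0; i < count; i++)
		if (memcmp(filter[i], dst, sizeof(mac_addr)) == 0)
			return true;
	return false;
}
```

Learned (non-static) entries are skipped on purpose, matching the point that only static FDB entries define egress once learning is disabled.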
There have been suggestions that we may want to allow learning and update the filters with learned addresses as well. This would require mac-level authentication similar to 802.1x to prevent attacks against the hw filters, as they are a limited resource.

Additionally, if the user places the bridge device in promiscuous mode, all ports are placed in promiscuous mode regardless of the changes to flooding and learning.

Since the above functionality depends on full static configuration, we also require that vlan filtering be enabled to take advantage of this. The reason is that the bridge has to be able to receive and process VLAN-tagged frames, and there are only 2 ways to accomplish this right now: promiscuous mode or vlan filtering.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>