summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* e1000e: Cleanup - Update GPL header and CopyrightDavid Ertman2014-03-0823-599/+444
| | | | | | | | | | | | | This patch is to update the GPL header by removing the portion that refers to the Free Software Foundation address. Change the copyright date for 2014. Reformat the header comments to conform to kernel networking coding norms Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* e1000e: Fix 82579 sets LPI too early.David Ertman2014-03-083-5/+19
| | | | | | | | | | | | | | Enabling EEE LPI sooner than one second after link up on 82579 causes link issues with some switches. Remove EEE enablement for 82579 parts from the link initialization flow to avoid initializing too early. EEE initialization for 82579 will be done in e1000e_update_phy_task. Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com> Acked-by: Bruce W Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* e1000e: Resolve issues with Management Engine (ME) briefly blocking PHY resetsDavid Ertman2014-03-081-4/+26
| | | | | | | | | | | | | | | | | | | | | On a ME enabled system with the cable out, the driver init flow would generate an erroneous message indicating that resets were being blocked by an active ME session. Cause was ME clearing the semaphore bit to block further PHY resets for up to 50 msec during power-on/cycle. After this interval, ME would re-set the bit and allow PHY resets. To resolve this, change the flow of e1000e_phy_hw_reset_generic() to utilize a delay and retry method. Poll the FWSM register to minimize any extra time added to the flow. If the delay times out at 100ms (checked in 10msec increments), then return the value E1000_BLK_PHY_RESET, as this is the accurate state of the PHY. Attempting to alter just the call to e1000e_phy_hw_reset_generic() in e1000_init_phy_workarounds_pchlan() just caused the problem to move further down the flow. Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com> Acked-by: Bruce W. Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* e1000e: Cleanup unecessary referencesDavid Ertman2014-03-081-2/+2
| | | | | | | | Cleaning up some pointer references that are no longer necessary Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* e1000e: PTP lock in e1000e_phc_adjustfreqTodd Fujinaka2014-03-081-0/+5
| | | | | | | | | Add lock in e1000e_phc_adjfreq to prevent concurrent changes to TIMINCA and SYSTIMH/L. Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* 6lowpan: reassembly: fix return of init functionAlexander Aring2014-03-071-2/+3
| | | | | | | | | This patch adds a missing return after fragmentation init. Otherwise we register a sysctl interface and deregister it afterwards which makes no sense. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'linux-can-next-for-3.15-20140307' of ↵David S. Miller2014-03-076-88/+172
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://gitorious.org/linux-can/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2014-02-12 this is a pull request of twelve patches for net-next/master. Alexander Shiyan contributes two patches for the mcp251x, one making the driver more quiet and the other one improves the compile time coverage by removing the #ifdef CONFIG_PM_SLEEP. Then two patches for the flexcan driver by me, one removing the #ifdef CONFIG_PM_SLEEP, too, the other one making use of platform_get_device_id(). Another patch by me which converts the janz-ican3 driver to use netdev_<level>(). The remaining 7 patches are by Oliver Hartkopp, they add CAN FD support to the netlink configuration interface. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * can: add bittiming check at interface open for CAN FDOliver Hartkopp2014-03-071-0/+8
| | | | | | | | | | | | | | | | | | Additionally to have the second (data) bitrate available the data bitrate has to be greater or equal to the arbitration bitrate in CAN FD. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: allow to change the device mtu for CAN FD capable devicesOliver Hartkopp2014-03-073-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | The configuration for CAN FD depends on CAN_CTRLMODE_FD enabled in the driver specific ctrlmode_supported capabilities. The configuration can be done either with the 'fd { on | off }' option in the 'ip' tool from iproute2 or by setting the CAN netdevice MTU to CAN_MTU (16) or to CANFD_MTU (72). Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: introduce the data bitrate configuration for CAN FDOliver Hartkopp2014-03-073-3/+50
| | | | | | | | | | | | | | | | | | | | As CAN FD offers a second bitrate for the data section of the CAN frame the infrastructure for storing and configuring this second bitrate is introduced. Improved the readability of the if-statement by inserting some newlines. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: provide a separate bittiming_const parameter to bittiming functionsOliver Hartkopp2014-03-071-18/+13
| | | | | | | | | | | | | | | | | | | | | | | | As the bittiming calculation functions are to be used with different bittiming_const structures for CAN and CAN FD the direct reference to priv->bittiming_const inside these functions has to be removed. Also moved the check for existing bittiming const to one place. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: move sanity check for bitrate and tq into can_get_bittimingOliver Hartkopp2014-03-071-14/+15
| | | | | | | | | | | | | | | | | | | | | | This patch moves a sanity check in order to have a second user for CAN FD. Also simplify the return value generation in can_get_bittiming() as only correct return values of can_[calc|fixup]_bittiming() lead to a return value of zero. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: only send bitrate data via netlink when availableOliver Hartkopp2014-03-071-4/+6
| | | | | | | | | | | | | | | | | | | | | | When setting the bitrate both can_calc_bittiming() and can_fixup_bittiming() lead to the bitrate variable to be set, when a proper bit timing is available. Only then the bitrate configuration is stored for the device, so checking for priv->bittiming.bitrate is always sufficient. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: preserve skbuff protocol in can_put_echo_skbOliver Hartkopp2014-03-071-2/+3
| | | | | | | | | | | | | | | | | | | | | | The skbuff protocol value was formerly fixed/sanitized to ETH_P_CAN in can_put_echo_skb(). With CAN FD this value has to be preserved. This patch changes the hard assignment of the protocol value to a check of valid protocol values for CAN and CAN FD. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: janz-ican3: convert dev_<level> printing to netdev_<level>Marc Kleine-Budde2014-03-071-34/+30
| | | | | | | | | | | | | | | | | | This patch converts the dev_<level> printing to netdev_<level>, this makes it possible to remove the "struct device *dev" pointer from the "struct ican3_dev". Cc: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: flexcan: make use of platform_get_device_id()Marc Kleine-Budde2014-03-061-2/+2
| | | | | | | | | | | | This patch replaces an open coded pdev->id_entry by platform_get_device_id(). Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: flexcan: Remove #ifdef CONFIG_PM_SLEEPMarc Kleine-Budde2014-03-061-4/+2
| | | | | | | | | | | | This patch removes #ifdef CONFIG_PM_SLEEP to improve compile coverage. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: mcp251x: Remove #ifdef CONFIG_PM_SLEEPAlexander Shiyan2014-03-061-5/+2
| | | | | | | | | | | | | | This patch removes #ifdef CONFIG_PM_SLEEP to improve compile coverage. Signed-off-by: Alexander Shiyan <shc_work@mail.ru> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: mcp251x: Make driver more quietAlexander Shiyan2014-03-061-6/+4
| | | | | | | | | | | | | | | | This patch moves one diagnostic message used for debugging purposes to dev_dbg() and removes one useless message. Signed-off-by: Alexander Shiyan <shc_work@mail.ru> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
* | Merge branch 'r8152'David S. Miller2014-03-071-60/+263
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hayes Wang says: ==================== r8152: tx/rx improvement - Select the suitable spin lock for each function. - Add additional check to reduce the spin lock. - Up the priority of the tx to avoid interrupted by rx. - Support rx checksum, large send, and IPv6 hw checksum. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8152: support IPv6hayeswang2014-03-071-3/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support hw IPv6 checksum for TCP and UDP packets. Note that the hw has the limitation of the range of the transport offset. Besides, the TCP Pseudo Header of the IPv6 TSO of the hw bases on the Microsoft document which excludes the packet length. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8152: support TSOhayeswang2014-03-071-30/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support scatter gather and TSO. Adjust the tx checksum function and set the max gso size to fix the size of the tx aggregation buffer. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8152: support rx checksumhayeswang2014-03-071-2/+39
| | | | | | | | | | | | | | | | | | | | | Support hw rx checksum for TCP and UDP packets. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8152: calculate the dropped packets for rxhayeswang2014-03-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Continue dealing with the remain rx packets, even though the allocation of the skb fail. This could calculate the correct dropped packets. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8152: up the priority of the transmissionhayeswang2014-03-071-18/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | move the tx_bottom() from delayed_work to tasklet. It makes the rx and tx balanced. If the device is in runtime suspend when getting the tx packet, wakeup the device before trasmitting. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8152: check tx agg list before spin lockhayeswang2014-03-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Check tx agg list before spin lock to avoid doing spin lock every times. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8152: replace spin_lock_irqsave and spin_unlock_irqrestorehayeswang2014-03-071-16/+12
|/ / | | | | | | | | | | | | | | | | | | | | Use spin_lock and spin_unlock in interrupt context. The ndo_start_xmit would not be called in interrupt context, so replace the relative spin_lock_irqsave and spin_unlock_irqrestore with spin_lock_bh and spin_unlock_bh. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2014-03-0714-355/+1432
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates This series contains updates to i40e and i40evf. Most notable are: Joseph completes the implementation of the ethtool ntuple rule management interface by adding the get, update and delete interface reset. Akeem provides a fix to prevent a possible overflow due to multiplication of number and size by using kzalloc, so use kcalloc. Jesse provides an implementation for skb_set_hash() and adds the L4 type return when we know it is an L4 hash. He also adds a counter to statistics for Tx timeouts to help users. Lastly he provides a change to stay away from the cache line where the done bit may be getting written back for the transmit ring since the hardware may be writing the whole cache line for a partial update. Shannon cleans up code comments. Anjali removes a firmware workaround for newer firmware since the number of MSIx vectors are being reported correctly. v2: - dropped patch 01 of the series based on feedback from the author Joe Perches and Shannon Nelson. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | i40e/i40evf: Bump pf&vf build versionsCatherine Sullivan2014-03-072-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Bump i40e to 0.3.34 and i40evf to 0.9.14. Change-ID: I6b3fb8ccf55b128d2baa4bdc20d3911ec81d4a5b Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e/i40evf: carefully fill tx ringJesse Brandeburg2014-03-072-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to make sure that we stay away from the cache line where the DD bit (done) may be getting written back for the transmit ring since the hardware may be writing the whole cache line for a partial update. Change-ID: Id0b6dfc01f654def6a2a021af185803be1915d7e Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: fix nvm version and remove firmware reportJesse Brandeburg2014-03-072-14/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver needs to use the format that the current NVM uses when printing the version of the NVM. It should remain this way from now on forward. The driver was reporting when firmware was less than an expected version number, but this is not a requirement for the product and we print the firmware number at init and in ethtool -i output. Just remove the print. Change-ID: Ide0b856cd454ebf867610ef9a0d639bb358a4a60 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Fix static checker warningNeerav Parikh2014-03-071-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following static checker warning: drivers/net/ethernet/intel/i40e/i40e_dcb.c:342 i40e_lldp_to_dcb_config() warn: 'tlv' can't be NULL. Exit criteria from the while loop is encountering LLDP END LV or if the TLV length goes beyond the buffer length. Change-ID: I7548b16db90230ec2ba0fa791b0343ca8b7dd5bb Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Kevin Scott <kevin.c.scott@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Tested-By: Jack Morgan<jack.morgan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Remove a redundant filter additionAnjali Singhai Jain2014-03-071-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove a redundant filter addition to stop FW complaints about a redundant filter removal. Change-ID: I22bef6b682bd8d43432557e6e2b3e73ffb27b985 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: count timeout eventsJesse Brandeburg2014-03-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ethtool -S statistics should have a counter for tx timeouts in order to better help inform the masses. Change-ID: Ice4b20ed4a151509f366719ab105be49c9e7b2b4 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Remove a FW workaround for Number of MSIX vectorsAnjali Singhai Jain2014-03-071-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Number of MSIX vectors being reported is correct and hence we need a check to do the right thing for FWs before and after. Change-ID: I50902d1c848adcb960ea49ac73f7865ca871a1c3 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: clean up comment styleShannon Nelson2014-03-071-59/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lots of trivial changes to remove double spaces in function headers, unnecessary periods in short comments, and adjust the English usage here and there. No actual code was harmed in the making of this patch. Change-ID: I6e756c500756945e81a61ffb10221753eb7923ea Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Kevin Scott <kevin.c.scott@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e/i40evf: i40e implementation for skb_set_hashJesse Brandeburg2014-03-076-4/+800
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Original comment from Tom Herbert <therbert@google.com> Drivers should call skb_set_hash to set the hash and its type in an skbuff. This patch builds upon Tom's original implementation and adds the L4 type return when we know it is an L4 hash. This requires use of the ptype decoder ring, so enable it. Change-ID: I2f9fa86d1a6add58cff13386f7f4238b1abcc468 CC: Tom Herbert <therbert@google.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Prevent overflow due to kzallocAkeem G Abodunrin2014-03-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To prevent the possibility of overflow due multiplication of number and size use kcalloc instead of kzalloc. Change-ID: Ibe4d81ed7d9738d3bbe66ee4844ff9be817e8080 Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Flow Director sideband accountingJoseph Gasparakis2014-03-075-264/+535
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch completes implementation of the ethtool ntuple rule management interface. It adds the get, update and delete interface reset. Change-ID: Ida7f481d9ee4e405ed91340b858eabb18a52fdb5 Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40evf: Enable the ndo_set_features netdev opGreg Rose2014-03-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Set netdev->hw_features to enable the ndo_set_features netdev op. Change-Id: I5a086fbfa5a089de5adba2800c4d0b3a73747b11 Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | Merge tag 'rxrpc-devel-20140304' of ↵David S. Miller2014-03-0714-168/+645
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== net-next: AF_RXRPC fixes and development Here are some AF_RXRPC fixes: (1) Fix to remove incorrect checksum calculation made during recvmsg(). It's unnecessary to try to do this there since we check the checksum before reading the RxRPC header from the packet. (2) Fix to prevent the sending of an ABORT packet in response to another ABORT packet and inducing a storm. (3) Fix UDP MTU calculation from parsing ICMP_FRAG_NEEDED packets where we don't handle the ICMP packet not specifying an MTU size. And development patches: (4) Add sysctls for configuring RxRPC parameters, specifically various delays pertaining to ACK generation, the time before we resend a packet for which we don't receive an ACK, the maximum time a call is permitted to live and the amount of time transport, connection and dead call information is cached. (5) Improve ACK packet production by adjusting the handling of ACK_REQUESTED packets, ignoring the MORE_PACKETS flag, delaying the production of otherwise immediate ACK_IDLE packets and delaying all ACK_IDLE production (barring the call termination) to half a second. (6) Add more sysctl parameters to expose the Rx window size, the maximum packet size that we're willing to receive and the number of jumbo rxrpc packets we're willing to handle in a single UDP packet. (7) Request ACKs on alternate DATA packets so that the other side doesn't wait till we fill up the Tx window. (8) Use a RCU hash table to look up the rxrpc_call for an incoming packet rather than stepping through a hierarchy involving several spinlocks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | af_rxrpc: Keep rxrpc_call pointers in a hashtableTim Smith2014-03-043-106/+277
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep track of rxrpc_call structures in a hashtable so they can be found directly from the network parameters which define the call. This allows incoming packets to be routed directly to a call without walking through hierarchy of peer -> transport -> connection -> call and all the spinlocks that that entailed. Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
| * | | af_rxrpc: Request an ACK for every alternate DATA packetDavid Howells2014-02-261-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set the RxRPC header flag to request an ACK packet for every odd-numbered DATA packet unless it's the last one (which implicitly requests an ACK anyway). This is similar to how librx appears to work. If we don't do this, we'll send out a full window of packets and then just sit there until the other side gets bored and sends an ACK to indicate that it's been idle for a while and has received no new packets. Requesting a lot of ACKs shouldn't be a problem as ACKs should be merged when possible. As AF_RXRPC currently works, it will schedule an ACK to be generated upon receipt of a DATA packet with the ACK-request packet set - and in the time taken to schedule this in a work queue, several other packets are likely to arrive and then all get ACK'd together. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | af_rxrpc: Expose more RxRPC parameters via sysctlsDavid Howells2014-02-265-4/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Expose RxRPC parameters via sysctls to control the Rx window size, the Rx MTU maximum size and the number of packets that can be glued into a jumbo packet. More info added to Documentation/networking/rxrpc.txt. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | af_rxrpc: Improve ACK productionDavid Howells2014-02-263-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve ACK production by the following means: (1) Don't send an ACK_REQUESTED ack immediately even if the RXRPC_MORE_PACKETS flag isn't set on a data packet that has also has RXRPC_REQUEST_ACK set. MORE_PACKETS just means that the sender just emptied its Tx data buffer. More data will be forthcoming unless RXRPC_LAST_PACKET is also flagged. It is possible to see runs of DATA packets with MORE_PACKETS unset that aren't waiting for an ACK. It is therefore better to wait a small instant to see if we can combine an ACK for several packets. (2) Don't send an ACK_IDLE ack immediately unless we're responding to the terminal data packet of a call. Whilst sending an ACK_IDLE mid-call serves to let the other side know that we won't be asking it to resend certain Tx buffers and that it can discard them, spamming it with loads of acks just because we've temporarily run out of data just distracts it. (3) Put the ACK_IDLE ack generation timeout up to half a second rather than a single jiffy. Just because we haven't been given more data immediately doesn't mean that more isn't forthcoming. The other side may be busily finding the data to send to us. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | af_rxrpc: Add sysctls for configuring RxRPC parametersDavid Howells2014-02-2612-27/+273
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add sysctls for configuring RxRPC protocol handling, specifically controls on delays before ack generation, the delay before resending a packet, the maximum lifetime of a call and the expiration times of calls, connections and transports that haven't been recently used. More info added in Documentation/networking/rxrpc.txt. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | af_rxrpc: Fix UDP MTU calculation from ICMP_FRAG_NEEDEDDavid Howells2014-02-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AF_RXRPC sends UDP packets with the "Don't Fragment" bit set in an attempt to determine the maximum packet size between the local socket and the peer by invoking the generation of ICMP_FRAG_NEEDED packets. Once a packet is sent with the "Don't Fragment" bit set, it is then inconvenient to break it up as that requires recalculating all the rxrpc serial and sequence numbers and reencrypting all the fragments, so we switch off the "Don't Fragment" service temporarily and send the bounced packet again. Future packets then use the new MTU. That's all fine. The problem lies in rxrpc_UDP_error_report() where the code that deals with ICMP_FRAG_NEEDED packets lives. Packets of this type have a field (ee_info) to indicate the maximum packet size at the reporting node - but sometimes ee_info isn't filled in and is just left as 0 and the code must allow for this. When ee_info is 0, the code should take the MTU size we're currently using and reduce it for the next packet we want to send. However, it takes ee_info (which is known to be 0) and tries to reduce that instead. This was discovered by Coverity. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
| * | | af_rxrpc: Prevent RxRPC peers from ABORT-storming one anotherTim Smith2014-02-071-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an ABORT is sent, aborting a connection, the sender quite reasonably forgets about the connection. If another frame is received, another ABORT will be sent. When the receiver gets it, it no longer applies to an extant connection, so an ABORT is sent, and so on... Prevent this by never sending a rejection for an ABORT packet. Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
| * | | af_rxrpc: Remove incorrect checksum calculation from rxrpc_recvmsg()Tim Smith2014-02-071-24/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The UDP checksum was already verified in rxrpc_data_ready() - which calls skb_checksum_complete() - as the RxRPC packet header contains no checksum of its own. Subsequent calls to skb_copy_and_csum_datagram_iovec() are thus redundant and are, in any case, being passed only a subset of the UDP payload - so the checksum will always fail if that path is taken. So there is no need to check skb->ip_summed in rxrpc_recvmsg(), and no need for the csum_copy_error: exit path. Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
* | | | Merge branch 'xen-netback-next'David S. Miller2014-03-073-289/+740
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Zoltan Kiss says: ==================== xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy A long known problem of the upstream netback implementation that on the TX path (from guest to Dom0) it copies the whole packet from guest memory into Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a huge perfomance penalty. The classic kernel version of netback used grant mapping, and to get notified when the page can be unmapped, it used page destructors. Unfortunately that destructor is not an upstreamable solution. Ian Campbell's skb fragment destructor patch series [1] tried to solve this problem, however it seems to be very invasive on the network stack's code, and therefore haven't progressed very well. This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to know when the skb is freed up. That is the way KVM solved the same problem, and based on my initial tests it can do the same for us. Avoiding the extra copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower AMD Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node, running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb switch) Based on my investigations the packet get only copied if it is delivered to Dom0 IP stack through deliver_skb, which is due to this [2] patch. This affects DomU->Dom0 IP traffic and when Dom0 does routing/NAT for the guest. That's a bit unfortunate, but luckily it doesn't cause a major regression for this usecase. In the future we should try to eliminate that copy somehow. There are a few spinoff tasks which will be addressed in separate patches: - grant copy the header directly instead of map and memcpy. This should help us avoiding TLB flushing - use something else than ballooned pages - fix grant map to use page->index properly I've tried to broke it down to smaller patches, with mixed results, so I welcome suggestions on that part as well: 1: Use skb->cb to store pending_idx 2: Some refactoring 3: Change RX path for mapped SKB fragments (moved here to keep bisectability, review it after #4) 4: Introduce TX grant mapping 5: Remove old TX grant copy definitons and fix indentations 6: Add stat counters for zerocopy 7: Handle guests with too many frags 8: Timeout packets in RX path 9: Aggregate TX unmap operations v2: I've fixed some smaller things, see the individual patches. I've added a few new stat counters, and handling the important use case when an older guest sends lots of slots. Instead of delayed copy now we timeout packets on the RX path, based on the assumption that otherwise packets should get stucked anywhere else. Finally some unmap batching to avoid too much TLB flush v3: Apart from fixing a few things mentioned in responses the important change is the use the hypercall directly for grant [un]mapping, therefore we can avoid m2p override. v4: Now we are using a new grant mapping API to avoid m2p_override. The RX queue timeout logic changed also. v5: Only minor fixes based on Wei's comments v6: Important bugfixes for xenvif_poll exit path and zerocopy callback, see first 2 patches. Also rework of handling packets with too many slots, and reorder the series a bit. v7: Small fixes in comments/log messages/error paths, and merging the frag overflow stats patch into its parent. [1] http://lwn.net/Articles/491522/ [2] https://lkml.org/lkml/2012/7/20/363 ==================== Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>