summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* isdn: remove isdn4linuxArnd Bergmann2019-05-3159-26860/+2
| | | | | | | | | | | | | | | | | | | | With all isdn4linux hardware drivers gone, this is only a wrapper around CAPI to support old user space. However, from looking at the mailing list, it seems that the last time anyone asked about it was in 2014, when the upgrade from a linux-2.4 installation failed, and mISDN was suggested as a replacement. The largest public ISDN network (Deutsche Telekom) was supposed to be shut down 2018, which must have drastically reduced the number of legacy installations. When we last discussed removing i4l in 2016, Karsten Keil suggested revisiting this in 2018. I guess this is overdue. Link: http://listserv.isdn4linux.de/pipermail/isdn4linux/2014-October/006165.html Link: https://patchwork.kernel.org/patch/8484861/#17900371 Link: https://listserv.isdn4linux.de/pipermail/isdn4linux/2019-April/thread.html Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* isdn: remove hisax driverArnd Bergmann2019-05-31104-56236/+0
| | | | | | | | | | | | | | | | | | | | | With the decline of ISDN, this seems to have become almost completely obsolete, and even in the past years before that, almost all remaining users appear to have used mISDN instead. Birger Harzenetter noted that he is still using i4l/hisax to take advantage of the 'divert' driver for call diversion, but otherwise uses mISDN on the same hardware. This is a rare edge case as far as I can tell, but we are still breaking an actively used work flow (see https://xkcd.com/1172/). We debated moving i4l/hisax to staging as an intermediate step, but as he is not likely to change the setup, and that would just delay breaking this use case. The alternatives here are to stay on stable kernels < 5.2, to create an external driver repository for isdn4linux, or to add divert functionality to mISDN. Cc: Birger Harzenetter <WIMPy@yeti.dk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* isdn: gigaset: remove i4l supportArnd Bergmann2019-05-314-735/+15
| | | | | | | isdn4linux is getting removed, and the gigaset driver can still use the CAPI support, so this can all go away. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* Merge branch '100GbE' of ↵David S. Miller2019-05-3116-218/+754
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2019-05-30 This series contains updates to ice driver only. Brett continues his work with interrupt handling by fixing an issue where were writing to the incorrect register to disable all VF interrupts. Tony consolidates the unicast and multicast MAC filters into a single new function. Anirudh adds support for virtual channel vector mapping to receive and transmit queues. This uses a bitmap to associate indicated queues with the specified vector. Makes several cosmetic code cleanups, as well as update the driver to align with the current specification for managing MAC operation codes (opcodes). Paul adds support for Forward Error Correction (FEC) and also adds the ethtool get and set handlers to modify FEC parameters. Bruce cleans up the driver code to fix a number of issues, such as, reducing the scope of some local variables, reduce the number of de-references by changing a local variable and reorder the code to remove unnecessary "goto's". Dave adds switch rules to be able to handle LLDP packets and in the process, fix a couple of issues found, like stop treating DCBx state of "not started" as an error and stop hard coding the filter information flag to transmit. Jacob updates the driver to allow for more granular debugging by developers by using a distinct separate bit for dumping firmware logs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * ice: Trivial cosmetic changesAnirudh Venkataramanan2019-05-3012-54/+53
| | | | | | | | | | | | | | | | | | This patch mostly capitalizes abbreviations in code comments. Fixed some typos and removed some unnecessary newlines as well. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Recognize higher speedsAnirudh Venkataramanan2019-05-302-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | In ice_print_link_msg, add cases for 50GB and 100GB speeds. This results in the right speed being reported on load, instead of "Unknownbps". When VF link if forced (in ice_set_pfe_link_forced), report max speed 100GB. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Use a different ICE_DBG bit for firmware log messagesJacob Keller2019-05-302-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | Replace the use of the ICE_DBG_AQ_MSG bit when dumping firmware logging messages with a separate distinct type ICE_DBG_FW_LOG. This is useful so that developers may enable ICE_DBG_FW_LOG and get firmware logging messages, without also dumping AdminQ messages at the same time. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Update function headerAnirudh Venkataramanan2019-05-301-0/+4
| | | | | | | | | | | | | | | | Add some details to the function header for ice_deinit_hw. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Move define for ICE_AQC_DRIVER_UNLOADINGAnirudh Venkataramanan2019-05-301-1/+1
| | | | | | | | | | | | | | | | | | The define describing the bits for the struct field should be below the field itself. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Align to updated AQ command formatsAnirudh Venkataramanan2019-05-301-7/+8
| | | | | | | | | | | | | | | | | | | | The current specification has updates to the command formats for manage MAC opcodes (opcodes 0x0107 and 0x0108) and get PHY caps (opcode 0x0600). Update the code to reflect this. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Use continue instead of an else blockAnirudh Venkataramanan2019-05-301-3/+5
| | | | | | | | | | | | | | | | | | For style consistency, use continue instead of an else block in ice_pf_dcb_recfg. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Change minimum descriptor count value for Tx/Rx ringsPreethi Banala2019-05-301-1/+1
| | | | | | | | | | | | | | | | | | | | Change minimum number of descriptor count from 32 to 64. This is to have a feature parity with previous Intel NIC drivers. Signed-off-by: Preethi Banala <preethi.banala@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Add switch rules to handle LLDP packetsDave Ertman2019-05-305-6/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add call to configure dropping egress LLDP packets in ice_vsi_setup and remove the rule in ice_vsi_release. Add calls to add/remove rule to route LLDP packets to default VSI when FW LLDP engine is disabled/enabled and remove rule if applied during ice_vsi_release. In the function ice_add_eth_mac(), there is a line that hard codes the filter info flag to TX. This is incorrect as this flag will be set by the calling function that built the list of filters to add. So remove the hard coded value. This patch also contains a fix to stop treating the DCBx state of "Not Started" as an error state that kicks DCB in SW mode. This will address having non-cabled interfaces automatically go into SW mode with the FW engine running. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Cleanup ice_update_link_infoBruce Allan2019-05-301-17/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | Do not allocate memory for the Get PHY Abilities command data buffer when it is not necessary, change one local variable to another to reduce the number of de-references, reduce the scope of some local variables, and reorder the code and change exit points to get rid of an unnecessary goto label. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Use right type for ice_cfg_vsi_lan returnAkeem G Abodunrin2019-05-301-8/+10
| | | | | | | | | | | | | | | | | | | | ice_cfg_vsi_lan returns a value of type enum ice_status. So use a local of the same type to capture the return value. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Add support for Forward Error Correction (FEC)Paul Greenwalt2019-05-306-3/+349
| | | | | | | | | | | | | | | | | | | | This patch adds driver support for Forward Error Correction (FEC) and ethtool handlers to set/get FEC params. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Add support for virtchnl_vector_map.[rxq|txq]_mapAnirudh Venkataramanan2019-05-303-44/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for virtchnl_vector_map.[rxq|txq]_map to use bitmap to associate indicated queues with the specified vector. This support is needed since the Windows AVF driver calls VIRTCHNL_OP_CONFIG_IRQ_MAP for each vector and used the bitmap to indicate the associated queues. Updated ice_vc_dis_qs_msg to not subtract one from virtchnl_irq_map_info.num_vectors, and changed the VSI vector index to the vector id. This change supports the Windows AVF driver which maps one vector at a time and sets num_vectors to one. Using vectors_id to index the vector array . Add check for vector_id zero, and return VIRTCHNL_STATUS_ERR_PARAM if vector_id is zero and there are rings associated with that vector. Vector_id zero is for the OICR. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Introduce ice_init_mac_fltr and move ice_napi_delTony Nguyen2019-05-303-44/+79
| | | | | | | | | | | | | | | | | | | | | | | | Consolidate adding unicast and broadcast MAC filters in a single new function ice_init_mac_fltr. Move ice_napi_del to ice_lib.c Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Use GLINT_DYN_CTL to disable VF's interruptsBrett Creeley2019-05-301-28/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently in ice_free_vf_res() we are writing to the VFINT_DYN_CTLN register in the PF's function space to disable all VF's interrupts. This is incorrect because this register is only for use in the VF's function space. This becomes obvious when seeing that the valid indices used for the VFINT_DYN_CTLN register is from 0-63, which is the maximum number of interrupts for a VF (not including the OICR interrupt). Fix this by writing to the GLINT_DYN_CTL register for each VF. We can do this because we keep track of each VF's first_vector_idx inside of the PF's function space and the number of interrupts given to each VF. Also in ice_free_vfs() we were disabling Rx/Tx queues after calling pci_disable_sriov(). One part of disabling the Tx queues causes the PF driver to trigger a software interrupt, which causes the VF's napi routine to run. This doesn't currently work because pci_disable_sriov() causes iavf_remove() to be called which disables interrupts. Fix this by disabling Rx/Tx queues prior to pci_disable_sriov(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | net: sched: act_ctinfo: minor size optimisationKevin 'ldir' Darbyshire-Bryant2019-05-311-4/+0
| | | | | | | | | | | | | | | | | | | | | | Since the new parameter block is initialised to 0 by kzmalloc we don't need to mask & clear unused operational mode bits, they are already unset. Drop the pointless code. Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'complex-c45-phys'David S. Miller2019-05-313-21/+40
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Heiner Kallweit says: ==================== net: phy: improve handling of more complex C45 PHY's This series tries to address few problematic aspects raised by Russell. Concrete example is the Marvell 88x3310, the changes should be helpful for other complex C45 PHY's too. v2: - added patch enabling interrupts also if phylib state machine isn't started - removed patch dealing with the double link status read This one needs little bit more thinking and will go separately. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: phy: export phy_queue_state_machineHeiner Kallweit2019-05-312-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We face the issue that link change interrupt and link status may be reported by different PHY layers. As a result the link change interrupt may occur before the link status changes. Export phy_queue_state_machine to allow PHY drivers to specify a delay between link status change interrupt and link status check. v2: - change jiffies parameter type to unsigned long Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: phy: add callback for custom interrupt handler to struct phy_driverHeiner Kallweit2019-05-312-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The phylib interrupt handler handles link change events only currently. However PHY drivers may want to use other interrupt sources too, e.g. to report temperature monitoring events. Therefore add a callback to struct phy_driver allowing PHY drivers to implement a custom interrupt handler. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Suggested-by: Russell King - ARM Linux admin <linux@armlinux.org.uk> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: phy: enable interrupts when PHY is attached alreadyHeiner Kallweit2019-05-313-15/+24
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a step towards allowing PHY drivers to handle more interrupt sources than just link change. E.g. several PHY's have built-in temperature monitoring and can raise an interrupt if a temperature threshold is exceeded. We may be interested in such interrupts also if the phylib state machine isn't started. Therefore move enabling interrupts to phy_request_interrupt(). v2: - patch added to series Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | qed: Fix static checker warningMichal Kalderon2019-05-311-12/+12
| | | | | | | | | | | | | | | | | | In some cases abs_ppfid could be printed without being initialized. Fixes: 79284adeb99e ("qed: Add llh ppfid interface and 100g support for offload protocols") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: dsa: Add error path handling in dsa_tree_setup()Ioana Ciornei2019-05-301-23/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case a call to dsa_tree_setup() fails, an attempt to cleanup is made by calling dsa_tree_remove_switch(), which should take care of removing/unregistering any resources previously allocated. This does not happen because it is conditioned by dst->setup being true, which is set only after _all_ setup steps were performed successfully. This is especially interesting when the internal MDIO bus is registered but afterwards, a port setup fails and the mdiobus_unregister() is never called. This leads to a BUG_ON() complaining about the fact that it's trying to free an MDIO bus that's still registered. Add proper error handling in all functions branching from dsa_tree_setup(). Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reported-by: kernel test robot <rong.a.chen@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'r8169-fw'David S. Miller2019-05-301-34/+37
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Heiner Kallweit says: ==================== r8169: decouple firmware handling code from actual driver code These two patches are a step towards eventually factoring out firmware handling code to a separate source file. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8169: decouple rtl_phy_write_fw from actual driver codeHeiner Kallweit2019-05-301-20/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a further step towards decoupling firmware handling from the actual driver code. Firmware can be for PHY and/or MAC, and two pairs of read/write functions are needed for handling PHY firmware and MAC firmware respectively. Pass these functions via struct rtl_fw and avoid the ugly switching of mdio_ops behind the back of rtl_writephy(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8169: improve rtl_fw_format_okHeiner Kallweit2019-05-301-14/+9
|/ / | | | | | | | | | | | | Simplify the function a little bit and use strscpy() where appropriate. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | r8169: enable WoL speed down on more chip versionsHeiner Kallweit2019-05-301-24/+2
| | | | | | | | | | | | | | | | | | Call the pll power down function also for chip versions 02..06 and 13..15. The MAC can't be powered down on these chip versions, but at least they benefit from the speed-down power-saving if WoL is enabled. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: deduplicate identical skb_checksum_opsMatteo Croce2019-05-302-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The same skb_checksum_ops struct is defined twice in two different places, leading to code duplication. Declare it as a global variable into a common header instead of allocating it on the stack on each function call. bloat-o-meter reports a slight code shrink. add/remove: 1/1 grow/shrink: 0/10 up/down: 128/-1282 (-1154) Function old new delta sctp_csum_ops - 128 +128 crc32c_csum_ops 16 - -16 sctp_rcv 6616 6583 -33 sctp_packet_pack 4542 4504 -38 nf_conntrack_sctp_packet 4980 4926 -54 execute_masked_set_action 6453 6389 -64 tcf_csum_sctp 575 428 -147 sctp_gso_segment 1292 1126 -166 sctp_csum_check 579 412 -167 sctp_snat_handler 957 772 -185 sctp_dnat_handler 1321 1132 -189 l4proto_manip_pkt 2536 2313 -223 Total: Before=359297613, After=359296459, chg -0.00% Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: avoid indirect calls in L4 checksum calculationMatteo Croce2019-05-301-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up indirect calls of builtin") introduces some macros to avoid doing indirect calls. Use these helpers to remove two indirect calls in the L4 checksum calculation for devices which don't have hardware support for it. As a test I generate packets with pktgen out to a dummy interface with HW checksumming disabled, to have the checksum calculated in every sent packet. The packet rate measured with an i7-6700K CPU and a single pktgen thread raised from 6143 to 6608 Kpps, an increase by 7.5% Suggested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: dsa: sja1105: Make static_config_check_memory_size staticYueHaibing2019-05-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix sparse warning: drivers/net/dsa/sja1105/sja1105_static_config.c:446:1: warning: symbol 'static_config_check_memory_size' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'connection-tracking-support-for-bridge'David S. Miller2019-05-3015-276/+1206
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== connection tracking support for bridge This patchset adds native connection tracking support for the bridge. Patch #1 and #2 extract code from IPv4/IPv6 fragmentation core and introduce the fraglist splitter. That splits a skbuff fraglist into independent fragments. Patch #3 and #4 also extract code from IPv4/IPv6 fragmentation core and introduce the skbuff into fragments transformer. This can be used by linearized skbuffs (eg. coming from nfqueue and ct helpers) as well as cloned skbuffs (that are either seen either with taps or with bridge port flooding). Patch #5 moves the specific IPCB() code from these new fragment splitter/transformer APIs into the IPv4 stack. The bridge has a different control buffer layout and it starts using this new APIs in this patchset. Patch #6 adds basic infrastructure that allows to register bridge conntrack support. Patch #7 adds bridge conntrack support (only for IPv4 in this patch). Patch #8 adds IPv6 support for the bridge conntrack support. Patch #9 registers the IPv4/IPv6 conntrack hooks in case the bridge conntrack is used to deal with local traffic, ie. prerouting -> input bridge hook path. This cover the bridge interface has a IP address scenario. Before this patchset, only chance for people to do stateful filtering is to use the `br_netfilter` emulation layer, that turns bridge frame into IPv4/IPv6 packets and inject them into the IPv4/IPv6 hooks. Apparently, this module allows users to use iptables and all of its feature-set from the bridge, including stateful filtering. However, this approach is flawed in many aspects that have been discussed many times. This is a step forward to deprecate `br_netfilter'. v2: Fix English typo in commit message. v3: Fix another English typo in commit message. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_conntrack_bridge: register inet conntrack for bridgePablo Neira Ayuso2019-05-301-16/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables IPv4 and IPv6 conntrack from the bridge to deal with local traffic. Hence, packets that are passed up to the local input path are confirmed later on from the {ipv4,ipv6}_confirm() hooks. For packets leaving the IP stack (ie. output path), fragmentation occurs after the inet postrouting hook. Therefore, the bridge local out and postrouting bridge hooks see fragments with conntrack objects, which is inconsistent. In this case, we could defragment again from the bridge output hook, but this is expensive. The recommended filtering spot for outgoing locally generated traffic leaving through the bridge interface is to use the classic IPv4/IPv6 output hook, which comes earlier. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_conntrack_bridge: add support for IPv6Pablo Neira Ayuso2019-05-303-2/+230
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | br_defrag() and br_fragment() indirections are added in case that IPv6 support comes as a module, to avoid pulling innecessary dependencies in. The new fraglist iterator and fragment transformer APIs are used to implement the refragmentation code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: bridge: add connection tracking systemPablo Neira Ayuso2019-05-308-4/+410
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds basic connection tracking support for the bridge, including initial IPv4 support. This patch register two hooks to deal with the bridge forwarding path, one from the bridge prerouting hook to call nf_conntrack_in(); and another from the bridge postrouting hook to confirm the entry. The conntrack bridge prerouting hook defragments packets before passing them to nf_conntrack_in() to look up for an existing entry, otherwise a new entry is allocated and it is attached to the skbuff. The conntrack bridge postrouting hook confirms new conntrack entries, ie. if this is the first packet seen, then it adds the entry to the hashtable and (if needed) it refragments the skbuff into the original fragments, leaving the geometry as is if possible. Exceptions are linearized skbuffs, eg. skbuffs that are passed up to nfqueue and conntrack helpers, as well as cloned skbuff for the local delivery (eg. tcpdump), also in case of bridge port flooding (cloned skbuff too). The packet defragmentation is done through the ip_defrag() call. This forces us to save the bridge control buffer, reset the IP control buffer area and then restore it after call. This function also bumps the IP fragmentation statistics, it would be probably desiderable to have independent statistics for the bridge defragmentation/refragmentation. The maximum fragment length is stored in the control buffer and it is used to refragment the skbuff from the postrouting path. The new fraglist splitter and fragment transformer APIs are used to implement the bridge refragmentation code. The br_ip_fragment() function drops the packet in case the maximum fragment size seen is larger than the output port MTU. This patchset follows the principle that conntrack should not drop packets, so users can do it through policy via invalid state matching. Like br_netfilter, there is no refragmentation for packets that are passed up for local delivery, ie. prerouting -> input path. There are calls to nf_reset() already in several spots in the stack since time ago already, eg. af_packet, that show that skbuff fraglist handling from the netif_rx path is supported already. The helpers are called from the postrouting hook, before confirmation, from there we may see packet floods to bridge ports. Then, although unlikely, this may result in exercising the helpers many times for each clone. It would be good to explore how to pass all the packets in a list to the conntrack hook to do this handle only once for this case. Thanks to Florian Westphal for handing me over an initial patchset version to add support for conntrack bridge. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_conntrack: allow to register bridge supportPablo Neira Ayuso2019-05-303-3/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds infrastructure to register and to unregister bridge support for the conntrack module via nf_ct_bridge_register() and nf_ct_bridge_unregister(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ipv4: place control buffer handling away from fragmentation iteratorsPablo Neira Ayuso2019-05-301-18/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Deal with the IPCB() area away from the iterators. The bridge codebase has its own control buffer layout, move specific IP control buffer handling into the IPv4 codepath. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ipv6: split skbuff into fragments transformerPablo Neira Ayuso2019-05-302-76/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch exposes a new API to refragment a skbuff. This allows you to split either a linear skbuff or to force the refragmentation of an existing fraglist using a different mtu. The API consists of: * ip6_frag_init(), that initializes the internal state of the transformer. * ip6_frag_next(), that allows you to fetch the next fragment. This function internally allocates the skbuff that represents the fragment, it pushes the IPv6 header, and it also copies the payload for each fragment. The ip6_frag_state object stores the internal state of the splitter. This code has been extracted from ip6_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ipv4: split skbuff into fragments transformerPablo Neira Ayuso2019-05-302-88/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch exposes a new API to refragment a skbuff. This allows you to split either a linear skbuff or to force the refragmentation of an existing fraglist using a different mtu. The API consists of: * ip_frag_init(), that initializes the internal state of the transformer. * ip_frag_next(), that allows you to fetch the next fragment. This function internally allocates the skbuff that represents the fragment, it pushes the IPv4 header, and it also copies the payload for each fragment. The ip_frag_state object stores the internal state of the splitter. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ipv6: add skbuff fraglist splitterPablo Neira Ayuso2019-05-302-55/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the skbuff fraglist split iterator. This API provides an iterator to transform the fraglist into single skbuff objects, it consists of: * ip6_fraglist_init(), that initializes the internal state of the fraglist iterator. * ip6_fraglist_prepare(), that restores the IPv6 header on the fragment. * ip6_fraglist_next(), that retrieves the fragment from the fraglist and updates the internal state of the iterator to point to the next fragment in the fraglist. The ip6_fraglist_iter object stores the internal state of the iterator. This code has been extracted from ip6_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ipv4: add skbuff fraglist splitterPablo Neira Ayuso2019-05-302-33/+78
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the skbuff fraglist splitter. This API provides an iterator to transform the fraglist into single skbuff objects, it consists of: * ip_fraglist_init(), that initializes the internal state of the fraglist splitter. * ip_fraglist_prepare(), that restores the IPv4 header on the fragments. * ip_fraglist_next(), that retrieves the fragment from the fraglist and it updates the internal state of the splitter to point to the next fragment skbuff in the fraglist. The ip_fraglist_iter object stores the internal state of the iterator. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'add-TFO-backup-key'David S. Miller2019-05-3011-118/+694
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jason Baron says: ==================== add TFO backup key Christoph, Igor, and I have worked on an API that facilitates TFO key rotation. This is a follow up to the series that Christoph previously posted, with an API that meets both of our use-cases. Here's a link to the previous work: https://patchwork.ozlabs.org/cover/1013753/ Changes in v2: -spelling fixes in ip-sysctl.txt (Jeremy Sowden) -re-base to latest net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | selftests/net: add TFO key rotation selftestJason Baron2019-05-304-0/+394
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Demonstrate how the primary and backup TFO keys can be rotated while minimizing the number of client cookies that are rejected. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Documentation: ip-sysctl.txt: Document tcp_fastopen_keyJason Baron2019-05-301-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add docs for /proc/sys/net/ipv4/tcp_fastopen_key Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Cc: Jeremy Sowden <jeremy@azazel.net> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: add support for optional TFO backup key to net.ipv4.tcp_fastopen_keyJason Baron2019-05-301-24/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the ability to add a backup TFO key as: # echo "x-x-x-x,x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key The key before the comma acks as the primary TFO key and the key after the comma is the backup TFO key. This change is intended to be backwards compatible since if only one key is set, userspace will simply read back that single key as follows: # echo "x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key # cat /proc/sys/net/ipv4/tcp_fastopen_key x-x-x-x Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: add support to TCP_FASTOPEN_KEY for optional backup keyJason Baron2019-05-301-10/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for get/set of an optional backup key via TCP_FASTOPEN_KEY, in addition to the current 'primary' key. The primary key is used to encrypt and decrypt TFO cookies, while the backup is only used to decrypt TFO cookies. The backup key is used to maximize successful TFO connections when TFO keys are rotated. Currently, TCP_FASTOPEN_KEY allows a single 16-byte primary key to be set. This patch now allows a 32-byte value to be set, where the first 16 bytes are used as the primary key and the second 16 bytes are used for the backup key. Similarly, for getsockopt(), we can receive a 32-byte value as output if requested. If a 16-byte value is used to set the primary key via TCP_FASTOPEN_KEY, then any previously set backup key will be removed. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: add backup TFO key infrastructureJason Baron2019-05-306-58/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would like to be able to rotate TFO keys while minimizing the number of client cookies that are rejected. Currently, we have only one key which can be used to generate and validate cookies, thus if we simply replace this key clients can easily have cookies rejected upon rotation. We propose having the ability to have both a primary key and a backup key. The primary key is used to generate as well as to validate cookies. The backup is only used to validate cookies. Thus, keys can be rotated as: 1) generate new key 2) add new key as the backup key 3) swap the primary and backup key, thus setting the new key as the primary We don't simply set the new key as the primary key and move the old key to the backup slot because the ip may be behind a load balancer and we further allow for the fact that all machines behind the load balancer will not be updated simultaneously. We make use of this infrastructure in subsequent patches. Suggested-by: Igor Lubashev <ilubashe@akamai.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: introduce __tcp_fastopen_cookie_gen_cipher()Christoph Paasch2019-05-301-36/+37
|/ / | | | | | | | | | | | | | | | | | | | | | | Restructure __tcp_fastopen_cookie_gen() to take a 'struct crypto_cipher' argument and rename it as __tcp_fastopen_cookie_gen_cipher(). Subsequent patches will provide different ciphers based on which key is being used for the cookie generation. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>