Commit message | Author | Age | Files | Lines
* net: Replace nhc_has_gw with nhc_gw_family | David Ahern | 2019-04-09 | 10 files | -38/+35
| Allow the gateway in a fib_nh_common to be from a different address family than the outer fib{6}_nh. To that end, replace nhc_has_gw with nhc_gw_family and update users of nhc_has_gw to check nhc_gw_family. Now nhc_family is used to know if the nh_common is part of a fib_nh or fib6_nh (used for container_of to get to route family specific data), and nhc_gw_family represents the address family for the gateway.
|
| Signed-off-by: David Ahern <dsahern@gmail.com>
| Reviewed-by: Ido Schimmel <idosch@mellanox.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
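A minimal C sketch of the idea, assuming a heavily simplified, hypothetical `struct nh_common` (not the kernel's real `struct fib_nh_common` layout): a zero `nhc_gw_family` takes over the meaning of the old boolean, and a nonzero value that differs from `nhc_family` describes a gateway from the other address family.

```c
#include <stdbool.h>
#include <stdint.h>
#include <netinet/in.h>  /* struct in6_addr */
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

/* Hypothetical, simplified stand-in for the kernel's struct fib_nh_common. */
struct nh_common {
	uint8_t nhc_family;    /* owner: AF_INET (fib_nh) or AF_INET6 (fib6_nh) */
	uint8_t nhc_gw_family; /* 0 = no gateway, else AF_INET or AF_INET6 */
	union {
		uint32_t        ipv4; /* network byte order, like __be32 */
		struct in6_addr ipv6;
	} nhc_gw;
};

/* Replaces the old nhc_has_gw flag: a gateway exists iff its family is set. */
static bool nh_has_gw(const struct nh_common *nhc)
{
	return nhc->nhc_gw_family != 0;
}

/* True for e.g. an IPv4 route whose gateway is an IPv6 address. */
static bool nh_gw_is_cross_family(const struct nh_common *nhc)
{
	return nh_has_gw(nhc) && nhc->nhc_gw_family != nhc->nhc_family;
}
```

Encoding "has a gateway" as "gateway family is nonzero" removes a redundant flag that could otherwise disagree with the family field.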
* ipv6: Add neighbor helpers that use the ipv6 stub | David Ahern | 2019-04-09 | 2 files | -4/+42
| Add ipv6 helpers to handle ndisc references via the stub. Update bpf_ipv6_fib_lookup to use __ipv6_neigh_lookup_noref_stub instead of open-coding ___neigh_lookup_noref with the stub.
|
| Signed-off-by: David Ahern <dsahern@gmail.com>
| Reviewed-by: Ido Schimmel <idosch@mellanox.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Add fib6_nh_init and release to stubs | David Ahern | 2019-04-09 | 3 files | -0/+17
| Add fib6_nh_init and fib6_nh_release to ipv6_stubs. If fib6_nh_init fails, callers should not invoke fib6_nh_release, so there is no reason to have a dummy stub for the case where IPv6 is not enabled.
|
| Signed-off-by: David Ahern <dsahern@gmail.com>
| Reviewed-by: Ido Schimmel <idosch@mellanox.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: improve link partner capability detection | Heiner Kallweit | 2019-04-09 | 2 files | -4/+10
| genphy_read_status() so far checks phydev->supported, not the actual PHY capabilities. This can make a difference if the supported speeds have been limited by of_set_phy_supported() or phy_set_max_speed().
|
| It seems that this issue only affects the link partner advertisements as displayed by ethtool. Also this patch wouldn't apply to older kernels because linkmode bitmaps have been introduced recently. Therefore net-next.
|
| Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
| Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'mlx5-updates-2019-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller | 2019-04-08 | 27 files | -397/+747
|\  Saeed Mahameed says:
| |
| | ====================
| | mlx5-updates-2019-04-02
| |
| | This series provides misc updates to the mlx5 driver.
| |
| | 1) Aya Levin (1):
| |    Handle event of power detection in the PCIE slot
| |
| | 2) Eli Britstein (6):
| |    Some TC VLAN related updates and fixes to the previous VLAN modify action support patchset.
| |    Offload TC e-switch rules with egress/ingress VLAN devices
| |
| | 3) Max Gurtovoy (1):
| |    Fix double mutex initialization in eswitch.c
| |
| | 4) Tariq Toukan (3): Misc small updates
| |    - A write memory barrier is sufficient in EQ ci update
| |    - Obsolete param field holding a constant value
| |    - Unify logic of MTU boundaries
| |
| | 5) Tonghao Zhang (4): Misc updates to en_tc.c
| |    - Make the log friendly when decapsulation offload not supported
| |    - Remove 'parse_attr' argument in parse_tc_fdb_actions()
| |    - Deletes unnecessary setting of esw_attr->parse_attr
| |    - Return -EOPNOTSUPP when attempting to offload an unsupported action
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Unify logic of MTU boundaries | Tariq Toukan | 2019-04-05 | 4 files | -15/+20
| | Expose a new helper that wraps the logic for setting the netdevice's MTU boundaries. Use it for the different components (Eth, rep, IPoIB).
| |
| | Set the netdevice min MTU to ETH_MIN_MTU, and the max according to both the FW capability and the kernel definition.
| |
| | Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
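A sketch of such a shared helper, assuming the firmware limit has already been converted to a software MTU and that the kernel-side ceiling is ETH_MAX_MTU. The struct and function names are hypothetical; only the two constant values are taken from `<linux/if_ether.h>`.

```c
/* Values of ETH_MIN_MTU and ETH_MAX_MTU from <linux/if_ether.h>. */
#define SKETCH_ETH_MIN_MTU 68U
#define SKETCH_ETH_MAX_MTU 0xFFFFU

/* Hypothetical netdev fragment holding only the MTU bounds. */
struct sketch_netdev {
	unsigned int min_mtu;
	unsigned int max_mtu;
};

/* One helper shared by every netdev flavour (Eth, representor, IPoIB).
 * fw_max_mtu is assumed to be already expressed as a software MTU. */
static void sketch_set_mtu_boundaries(struct sketch_netdev *nd,
				      unsigned int fw_max_mtu)
{
	nd->min_mtu = SKETCH_ETH_MIN_MTU;
	/* Never exceed either the device's limit or the kernel's limit. */
	nd->max_mtu = fw_max_mtu < SKETCH_ETH_MAX_MTU ? fw_max_mtu
						      : SKETCH_ETH_MAX_MTU;
}
```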
| * net/mlx5e: Obsolete param field holding a constant value | Tariq Toukan | 2019-04-05 | 2 files | -3/+1
| | The LRO WQE size is a constant; obsolete the parameter field that holds it.
| |
| | Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
| | Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: A write memory barrier is sufficient in EQ ci update | Tariq Toukan | 2019-04-05 | 1 file | -1/+1
| | Replace the full memory barrier mb() with a write barrier wmb(), which is sufficient for the consumer index update of the event queues.
| |
| | Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Do not rewrite fields with the same match | Eli Britstein | 2019-04-05 | 1 file | -32/+104
| | If a rule matches on the same value that a rewrite would write to a field, there is no point in the rewrite. To save rewrite actions, and to avoid rewrite actions entirely when all rewrites are the same as their matches, ignore such rewrite fields.
| |
| | Signed-off-by: Eli Britstein <elibr@mellanox.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
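A minimal sketch of the pruning rule, assuming a hypothetical model where each rewrite of a field is paired with the rule's match on that same field (value plus mask). This is an illustration of the check described above, not the driver's pedit parsing code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: the rule's match on one header field, and a
 * pedit-style rewrite of that same field. */
struct field_match   { uint32_t value; uint32_t mask; };
struct field_rewrite { uint32_t value; uint32_t mask; };

/* A rewrite is redundant when every bit it would write is already pinned
 * by the match to the same value: matching packets already carry it. */
static int rewrite_is_redundant(const struct field_match *m,
				const struct field_rewrite *rw)
{
	if ((m->mask & rw->mask) != rw->mask)
		return 0; /* some rewritten bits are not constrained by the match */
	return ((m->value ^ rw->value) & rw->mask) == 0;
}

/* matches[i] is the match on the field that rws[i] rewrites. Compacts rws
 * in place and returns how many rewrites remain; 0 means the rule needs no
 * rewrite action at all. */
static size_t prune_rewrites(const struct field_match *matches,
			     struct field_rewrite *rws, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (!rewrite_is_redundant(&matches[i], &rws[i]))
			rws[kept++] = rws[i];
	}
	return kept;
}
```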
| * net/mlx5e: Offload TC e-switch rules with ingress VLAN device | Eli Britstein | 2019-04-05 | 2 files | -5/+44
| | Offload a TC rule on a VLAN device by matching the VLAN properties of the VLAN device and emulating VLAN pop actions.
| |
| | Signed-off-by: Eli Britstein <elibr@mellanox.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Offload TC e-switch rules with egress VLAN device | Eli Britstein | 2019-04-05 | 1 file | -0/+34
| | Upon redirection to an uplink VLAN device, emulate VLAN push actions according to the VLAN properties of the VLAN device and redirect to the uplink.
| |
| | Signed-off-by: Eli Britstein <elibr@mellanox.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Allow VLAN rewrite of prio field with the same match | Eli Britstein | 2019-04-05 | 1 file | -2/+6
| | Changing the prio field of the VLAN is not supported. Since commit 37410902874c ("net/mlx5e: Support VLAN modify action"), a zero value indicated "no change".
| |
| | Allow the VID rewrite if the prio match is the same as the prio set value.
| |
| | Fixes: 37410902874c ("net/mlx5e: Support VLAN modify action")
| | Signed-off-by: Eli Britstein <elibr@mellanox.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Deny VLAN rewrite if there is no VLAN header match | Eli Britstein | 2019-04-05 | 1 file | -0/+11
| | Rewriting the packet at the VLAN offset may corrupt it if it is not VLAN tagged. Deny the rewrite in this case.
| |
| | Fixes: 37410902874c ("net/mlx5e: Support VLAN modify action")
| | Signed-off-by: Eli Britstein <elibr@mellanox.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
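The two VLAN fixes above amount to a validation step before offloading a VLAN modify action. A minimal sketch, assuming a hypothetical, simplified view of the flower match and the action, and assuming (conservatively) that a prio set value is only acceptable when the rule pins the priority to that same value; the driver's actual checks may differ in detail.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a flower match on the VLAN header. */
struct vlan_match {
	bool    has_vlan;   /* rule matches on a VLAN header */
	bool    prio_valid; /* rule matches on the priority (PCP) */
	uint8_t prio;
};

/* Hypothetical VLAN modify action: new VID plus a prio set value. */
struct vlan_modify {
	uint16_t vid;
	uint8_t  prio;
};

/* Returns 0 if the VID rewrite can be offloaded, -EOPNOTSUPP otherwise. */
static int check_vlan_rewrite(const struct vlan_match *m,
			      const struct vlan_modify *act)
{
	/* Without a VLAN header match, writing at the VLAN offset of an
	 * untagged packet could corrupt it. */
	if (!m->has_vlan)
		return -EOPNOTSUPP;

	/* The PCP itself cannot be changed: accept only when the set value
	 * equals the matched priority, i.e. the prio bits stay as they are. */
	if (!m->prio_valid || m->prio != act->prio)
		return -EOPNOTSUPP;

	return 0;
}
```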
| * net/mlx5e: Use helpers to get headers criteria and value pointers | Eli Britstein | 2019-04-05 | 1 file | -9/+25
| | The headers criteria and value pointers may be either of the inner packet, if a tunnel exists, or of the outer. Simplify the code by using helper functions to retrieve them.
| |
| | Signed-off-by: Eli Britstein <elibr@mellanox.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
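A small sketch of what such helpers look like, assuming a hypothetical flow spec that stores criteria (masks) and values for both the outer and the inner headers; the real driver operates on its own flow-spec layout and decides "tunnel" from the rule's actions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical match block for a few header fields. */
struct headers {
	uint16_t ethertype;
	uint32_t src_ip;
	uint32_t dst_ip;
};

/* Hypothetical flow spec with criteria (masks) and values for both the
 * outer packet and the inner packet of a tunnel. */
struct flow_spec {
	struct headers outer_criteria, outer_value;
	struct headers inner_criteria, inner_value;
};

/* With a tunnel, the rule's header match applies to the inner packet. */
static struct headers *get_headers_criteria(struct flow_spec *spec, bool tunnel)
{
	return tunnel ? &spec->inner_criteria : &spec->outer_criteria;
}

static struct headers *get_headers_value(struct flow_spec *spec, bool tunnel)
{
	return tunnel ? &spec->inner_value : &spec->outer_value;
}
```

Callers then write `get_headers_value(&spec, tunnel)->dst_ip = ...` once, instead of repeating the inner/outer branch at every field.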
| * net/mlx5e: Return -EOPNOTSUPP when attempting to offload an unsupported action | Tonghao Zhang | 2019-04-05 | 1 file | -7/+20
| | * Currently, encapsulation is not supported for mlx5 VFs. When we try to offload that action, -EINVAL is returned instead of -EOPNOTSUPP, which confuses the user. This patch changes the returned value. The command is shown below [1].
| |
| | * When the maximum number of modify header actions is zero, return -EOPNOTSUPP directly. This way we avoid a misleading message (e.g. "mlx5: parsed 0 pedit actions, can't do more"), which appears when offloading pedit actions on ConnectX-4 (cx4) VFs. The command is shown below [2].
| |
| | For example (p2p1_0 is the VF net device):
| |
| | [1]
| | $ tc filter add dev p2p1_0 protocol ip parent ffff: prio 1 flower skip_sw \
| |     src_mac e4:11:22:33:44:01 \
| |     action tunnel_key set \
| |     src_ip 1.1.1.100 \
| |     dst_ip 1.1.1.200 \
| |     dst_port 4789 id 100 \
| |     action mirred egress redirect dev vxlan0
| |
| | [2]
| | $ tc filter add dev p2p1_0 parent ffff: protocol ip prio 1 \
| |     flower skip_sw dst_mac 00:10:56:fb:64:e8 \
| |     dst_ip 1.1.1.100 src_ip 1.1.1.200 \
| |     action pedit ex munge eth src set 00:10:56:b4:5d:20
| |
| | Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Deletes unnecessary setting of esw_attr->parse_attr | Tonghao Zhang | 2019-04-05 | 1 file | -1/+0
| | This patch deletes the unnecessary setting of esw_attr->parse_attr to parse_attr in parse_tc_fdb_actions(), because it is already done by the mlx5e_flow_esw_attr_init() function.
| |
| | Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Remove 'parse_attr' argument in parse_tc_fdb_actions() | Tonghao Zhang | 2019-04-05 | 1 file | -2/+2
| | This patch is a small improvement: simplify parse_tc_fdb_actions().
| |
| | Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Make the log friendly when decapsulation offload not supported | Tonghao Zhang | 2019-04-05 | 1 file | -3/+5
| | If we try to offload decapsulation actions to VF hardware, we get the log [1]. It is not friendly, because the kind of the net device is null, and we don't know what '0' means.
| |
| | [1] "mlx5_core 0000:05:01.2 vf_0: decapsulation offload is not supported for net device (0)"
| |
| | Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: E-Switch, Fix double mutex initialization | Max Gurtovoy | 2019-04-05 | 1 file | -2/+0
| | Delete the mutex_init() call for a lock that is already initialized in an inner function.
| |
| | Fixes: eca8cc389535 ("net/mlx5: E-Switch, Refactor offloads flow steering init/cleanup")
| | Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
| | Reviewed-by: Roi Dayan <roid@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Handle event of power detection in the PCIE slot | Aya Levin | 2019-04-05 | 2 files | -0/+76
| | Handle the event of a power state change in the PCIE slot. When the event occurs, check whether querying the power state and PCI power fields is supported. If so, read these fields from the MPEIN (management PCIE info) register and issue a corresponding message.
| |
| | Signed-off-by: Aya Levin <ayal@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux | Saeed Mahameed | 2019-04-05 | 18 files | -315/+398
| |\  This merge commit includes some misc shared code updates from the mlx5-next branch needed for net-next.
| | |
| | | 1) From Maxim: Remove unused macros and spinlock from mlx5 code.
| | |
| | | 2) From Aya: Expose the Management PCIE info register layout and add rate limit print macros.
| | |
| | | 3) From Tariq: Compilation warning fix in fs_core.c.
| | |
| | | 4) From Vu, Huy and Saeed: Improve the mlx5 initialization flow.
| | | The goal is to provide a better logical separation of the mlx5 core device initialization flow, which will help to seamlessly support creating different mlx5 device types such as PF, VF and SF (mlx5 sub-function) virtual devices.
| | |
| | | The mlx5_core driver needs to separate HCA resources from PCI resources. Its initialize/load/unload will be broken into stages:
| | |   1. Initialize common data structures
| | |   2. Setup function, which initializes PCI resources (for PF/VF) or some other specific resources for a virtual device
| | |   3. Initialize software objects according to hardware capabilities
| | |   4. Load all mlx5_core components
| | |
| | | It is also necessary to detach the mlx5_core mdev name/message from the PCI device mdev->pdev name/message for a clearer report/debug of different mlx5 device types.
| | |
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Fix false compilation warning | Tariq Toukan | 2019-04-02 | 1 file | -1/+1
| | | Fix the following warning:
| | | drivers/net/ethernet/mellanox/mlx5/core//fs_core.c:845:5: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized]
| | |
| | | No real issue here. This is only a false compiler warning: the 'err' variable is guaranteed to be initialized by the time it is used.
| | |
| | | gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
| | |
| | | Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
| | | Reviewed-by: Alex Vesker <valex@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Expose MPEIN (Management PCIE INfo) register layout | Aya Levin | 2019-04-02 | 2 files | -1/+51
| | | Expose the PRM layout for handling MPEIN (Management PCIE Info). It will be used in the downstream patch for querying MPEIN via the driver.
| | |
| | | Signed-off-by: Aya Levin <ayal@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Add rate limit print macros | Aya Levin | 2019-04-02 | 1 file | -0/+10
| | | Add rate-limited print macros for the warning and info levels. This protects the system from bursts of prints depleting HW resources and spamming dmesg.
| | |
| | | Signed-off-by: Aya Levin <ayal@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
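The kernel already provides this pattern: `__ratelimit()` with a `struct ratelimit_state`, and wrappers such as `dev_warn_ratelimited()`. The sketch below is a simplified userspace model of the interval/burst idea, not the kernel implementation; the struct and function names are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```c
#include <stdbool.h>

/* Hypothetical analogue of struct ratelimit_state: at most `burst`
 * messages are let through per `interval` time units. */
struct ratelimit {
	long interval; /* window length, e.g. in milliseconds */
	int  burst;    /* messages allowed per window */
	long begin;    /* start of the current window */
	int  printed;  /* messages emitted in the current window */
	int  missed;   /* messages suppressed in the current window */
};

/* Returns true if a message may be printed at time `now`. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* The window expired: start a new one with fresh counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

A rate-limited print macro then simply guards the underlying print call with such a check, so a storm of identical events costs at most `burst` lines per window.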
| | * net/mlx5: Add explicit bar address field | Huy Nguyen | 2019-04-02 | 6 files | -10/+11
| | | Add a bar_addr field to store the BAR 0 address, to avoid calling pci_resource_start() with a hard-coded BAR 0 as parameter. Also note that different mlx5 device types will have bar_addr on different BARs.
| | |
| | | This patch does not change any functionality.
| | |
| | | Signed-off-by: Huy Nguyen <huyn@mellanox.com>
| | | Signed-off-by: Vu Pham <vuhuong@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Replace dev_err/warn/info by mlx5_core_err/warn/info | Huy Nguyen | 2019-04-02 | 4 files | -103/+106
| | | Replace PCI dev_err/warn/info messages with mlx5_core_err/warn/info messages to provide a better report/debug of different mlx5 device types.
| | |
| | | This patch does not change any functionality.
| | |
| | | Signed-off-by: Huy Nguyen <huyn@mellanox.com>
| | | Signed-off-by: Vu Pham <vuhuong@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Use dev->priv.name instead of dev_name | Huy Nguyen | 2019-04-02 | 3 files | -4/+4
| | | Use the mlx5_core mdev private name in messages instead of the PCI dev_name, to provide a better report/debug of different mlx5 device types.
| | |
| | | This patch does not change any functionality.
| | |
| | | Signed-off-by: Huy Nguyen <huyn@mellanox.com>
| | | Signed-off-by: Vu Pham <vuhuong@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Make mlx5_core messages independent from mdev->pdev | Huy Nguyen | 2019-04-02 | 1 file | -10/+9
| | | Detach mlx5_core mdev messages from PCI device mdev->pdev messages and provide a better report/debug of different mlx5 device types.
| | |
| | | This patch does not change any functionality.
| | |
| | | Signed-off-by: Huy Nguyen <huyn@mellanox.com>
| | | Signed-off-by: Vu Pham <vuhuong@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | | Reviewed-by: Parav Pandit <parav@mellanox.com>
| | * net/mlx5: Break load_one into three stages | Saeed Mahameed | 2019-04-02 | 1 file | -71/+77
| | | Using the foundation from previous patches, factor the mlx5_load_one flow into three stages:
| | |   1. mlx5_function_setup() from the previous patch, to set up the function
| | |   2. mlx5_init_once() from the previous patch, to init software objects according to HW caps
| | |   3. New mlx5_load() to load mlx5 components
| | |
| | | This provides a better logical separation of the mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF (mlx5 sub-function) virtual devices.
| | |
| | | This patch does not change any functionality.
| | |
| | | Signed-off-by: Vu Pham <vuhuong@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
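The three-stage split follows the usual kernel pattern of ordered setup with reverse-order unwinding on error. A minimal sketch of that pattern, where every function and field name is a hypothetical stand-in (the real mlx5 functions take different arguments and do real work); `fail_at` only exists to exercise the error paths.

```c
#include <errno.h>

/* Hypothetical device state; fail_at selects a stage that should fail. */
struct dev_state {
	int function_up, sw_objects_up, loaded;
	int fail_at;
};

static int stage_function_setup(struct dev_state *d)
{
	if (d->fail_at == 1)
		return -EIO;
	d->function_up = 1;      /* boot the PCI function, talk to FW */
	return 0;
}

static void stage_function_teardown(struct dev_state *d) { d->function_up = 0; }

static int stage_init_once(struct dev_state *d)
{
	if (d->fail_at == 2)
		return -EIO;
	d->sw_objects_up = 1;    /* SW objects sized by HW capabilities */
	return 0;
}

static void stage_cleanup_once(struct dev_state *d) { d->sw_objects_up = 0; }

static int stage_load(struct dev_state *d)
{
	if (d->fail_at == 3)
		return -EIO;
	d->loaded = 1;           /* bring up the core components */
	return 0;
}

static int sketch_load_one(struct dev_state *d)
{
	int err;

	err = stage_function_setup(d);
	if (err)
		return err;
	err = stage_init_once(d);
	if (err)
		goto err_function;
	err = stage_load(d);
	if (err)
		goto err_once;
	return 0;

	/* Undo only what succeeded, in reverse order. */
err_once:
	stage_cleanup_once(d);
err_function:
	stage_function_teardown(d);
	return err;
}
```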
| | * net/mlx5: Function setup/teardown procedures | Saeed Mahameed | 2019-04-02 | 1 file | -52/+68
| | | Function setup and teardown procedures are the basic procedures that each mlx5 PCI function should perform to boot up an mlx5 device function and initialize basic communication with FW, before allocating any higher-level software/firmware resources.
| | |
| | | This provides a better logical separation of the mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF (mlx5 sub-function) virtual devices.
| | |
| | | This patch does not change any functionality.
| | |
| | | Signed-off-by: Vu Pham <vuhuong@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Move health and page alloc init to mdev_init | Saeed Mahameed | 2019-04-02 | 3 files | -17/+28
| | | Software structure initialization should be in the mdev_init stage.
| | |
| | | This provides a better logical separation of the mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF (mlx5 sub-function) virtual devices.
| | |
| | | This patch does not change any functionality.
| | |
| | | Signed-off-by: Vu Pham <vuhuong@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Split mdev init and pci init | Saeed Mahameed | 2019-04-02 | 1 file | -41/+54
| | | Separate resource initialization from PCI initialization.
| | |
| | | This provides a better logical separation of the mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF (mlx5 sub-function) virtual devices.
| | |
| | | This patch does not change any functionality.
| | |
| | | Signed-off-by: Vu Pham <vuhuong@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Remove redundant init functions parameter | Saeed Mahameed | 2019-04-02 | 1 file | -27/+22
| | | This patch does not change any functionality.
| | |
| | | Signed-off-by: Vu Pham <vuhuong@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Remove spinlock support from mlx5_write64 | Maxim Mikityanskiy | 2019-04-02 | 5 files | -26/+13
| | | As no user of mlx5_write64 passes a spinlock to it, remove this functionality and simplify the function.
| | |
| | | Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
| | | Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Remove unused MLX5_*_DOORBELL_LOCK macros | Maxim Mikityanskiy | 2019-04-02 | 1 file | -8/+0
| | | The MLX5_*_DOORBELL_LOCK macros provided a way to avoid locking for mlx5_write64 on 64-bit platforms, where it is not necessary. Currently no call to mlx5_write64 uses a spinlock, so the macros became unused.
| | |
| | | Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
| | | Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
| | | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | | cxgb4: Don't return EAGAIN when TCAM is full. | Vishal Kulkarni | 2019-04-08 | 2 files | -7/+5
| | | During hash filter programming, the driver needs to return an ENOSPC error instead of EAGAIN when the TCAM is full.
| | |
| | | Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: xilinx: emaclite: add minimal ndo_do_ioctl hook | Alexandru Ardelean | 2019-04-08 | 1 file | -0/+17
| | | This hook implements only a minimal set of ioctls, enough to access the MII registers by using phytool.
| | |
| | | With this simple MAC controller, it is pretty difficult to debug the PHY chip without checking the MII registers.
| | |
| | | Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
| | | Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: xilinx: emaclite: add minimal ethtool ops | Alexandru Ardelean | 2019-04-08 | 1 file | -0/+23
| | | This set adds a minimal set of ethtool hooks to the driver, which provide a decent amount of link information via ethtool.
| | |
| | | With this change, running `ethtool ethX` in user space provides all the neatly formatted information about the link (what was negotiated, what is advertised, etc.).
| | |
| | | Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
| | | Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | datagram: remove rendundant 'peeked' argument | Paolo Abeni | 2019-04-08 | 6 files | -38/+28
| | | After commit a297569fe00a ("net/udp: do not touch skb->peeked unless really needed") the 'peeked' argument of __skb_try_recv_datagram() and friends is always equal to !!(flags & MSG_PEEK).
| | |
| | | Since this argument carries only boolean information, and the callers already have 'flags & MSG_PEEK' at hand, we can remove it and clean up the code a bit.
| | |
| | | Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| | | Acked-by: Willem de Bruijn <willemb@google.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: flower: insert filter to ht before offloading it to hw | Vlad Buslov | 2019-04-08 | 1 file | -20/+44
| | | John reports:
| | |
| | | Recent refactoring of fl_change aims to use the classifier spinlock to avoid the need for the rtnl lock. In doing so, the fl_hw_replace_filter() function was moved to before the lock is taken. This can create problems for drivers if duplicate filters are created (common in OVS tc offload due to filters being triggered by user-space matches).
| | |
| | | Drivers registered for such filters will now receive multiple copies of the same rule, each with a different cookie value. This means that the drivers would need to do a full match field lookup to determine duplicates, repeating work that will happen in flower __fl_lookup(). Currently, drivers do not expect to receive duplicate filters.
| | |
| | | To fix this, verify that a filter with the same key is not present in the flower classifier hash table, and insert the new filter into the flower hash table before offloading it to hardware. Implement the helper function fl_ht_insert_unique() to atomically verify/insert a filter.
| | |
| | | This change makes the filter visible to the fast path at the beginning of the fl_change() function, which means it can no longer be freed directly in case of error. Refactor the fl_change() error handling code to deallocate the filter with an RCU-deferred free.
| | |
| | | Fixes: 620da4860827 ("net: sched: flower: refactor fl_change")
| | | Reported-by: John Hurley <john.hurley@netronome.com>
| | | Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
| | | Acked-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
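A userspace sketch of the ordering described above: check-and-insert into the table under one lock, and only then call the driver. Everything here is hypothetical (a single 32-bit key, a chained hash table guarded by a pthread mutex, and demo driver callbacks); flower's real fl_ht_insert_unique() works on an rhashtable, and on error the kernel frees the filter via RCU rather than immediately.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define NBUCKETS 64

/* Hypothetical filter keyed by a single 32-bit match key. */
struct filter {
	uint32_t key;
	struct filter *next;
};

struct classifier {
	pthread_mutex_t lock;
	struct filter *buckets[NBUCKETS];
};

/* Check and insert in one critical section, so two concurrent adds of
 * the same key cannot both succeed. */
static int ht_insert_unique(struct classifier *c, struct filter *f)
{
	struct filter **head = &c->buckets[f->key % NBUCKETS];
	int err = 0;

	pthread_mutex_lock(&c->lock);
	for (struct filter *it = *head; it; it = it->next) {
		if (it->key == f->key) {
			err = -EEXIST;
			goto out;
		}
	}
	f->next = *head;
	*head = f;
out:
	pthread_mutex_unlock(&c->lock);
	return err;
}

static void ht_remove(struct classifier *c, struct filter *f)
{
	pthread_mutex_lock(&c->lock);
	for (struct filter **pp = &c->buckets[f->key % NBUCKETS]; *pp; pp = &(*pp)->next) {
		if (*pp == f) {
			*pp = f->next;
			break;
		}
	}
	pthread_mutex_unlock(&c->lock);
}

/* Table first, hardware second: a duplicate never reaches the driver. */
static int add_filter(struct classifier *c, struct filter *f,
		      int (*hw_offload)(struct filter *))
{
	int err = ht_insert_unique(c, f);

	if (err)
		return err;
	err = hw_offload(f);
	if (err)
		ht_remove(c, f); /* undo the insert if the driver refuses */
	return err;
}

/* Demo "driver" callbacks for trying the sketch out. */
static int offload_count;
static int fake_offload(struct filter *f) { (void)f; offload_count++; return 0; }
static int fake_offload_fail(struct filter *f) { (void)f; return -EOPNOTSUPP; }
```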
* | | Merge branch 'rhashtable-bitlocks' | David S. Miller | 2019-04-08 | 12 files | -173/+271
|\ \ \  NeilBrown says:
| | | |
| | | | ====================
| | | | Convert rhashtable to use bitlocks
| | | |
| | | | This series converts rhashtable to use a per-bucket bitlock rather than a separate array of spinlocks. This:
| | | |   - reduces memory usage
| | | |   - results in slightly fewer memory accesses
| | | |   - slightly improves parallelism
| | | |   - makes a configuration option unnecessary
| | | |
| | | | The main change from the previous version is to use a distinct type for the pointer in the bucket which has a bit-lock in it. This helped find two places where rht_ptr() was missed: one in rhashtable_free_and_destroy() and one in print_ht() in the test code.
| | | | ====================
| | | |
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | rhashtable: add lockdep tracking to bucket bit-spin-locks. | NeilBrown | 2019-04-08 | 2 files | -23/+43
| | | | Native bit_spin_locks are not tracked by lockdep.
| | | |
| | | | The bit_spin_locks used for rhashtable buckets are local to the rhashtable implementation, so there is little opportunity for the sort of misuse that lockdep might detect. However, locks are held while a hash function or compare function is called, and if one of these took a lock, a misbehaviour is possible.
| | | |
| | | | As it is quite easy to add lockdep support, this unlikely possibility seems to be enough justification.
| | | |
| | | | So create a lockdep class for bucket bit_spin_lock and attach it through a lockdep_map in each bucket_table.
| | | |
| | | | Without the 'nested' annotation in rhashtable_rehash_one(), lockdep correctly reports a possible problem, as this lock is taken while another bucket lock (in another table) is held. This confirms that the added support works. With the correct nested annotation in place, lockdep reports no problems.
| | | |
| | | | Signed-off-by: NeilBrown <neilb@suse.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | rhashtable: use bit_spin_locks to protect hash bucket. | NeilBrown | 2019-04-08 | 12 files | -178/+236
| | | | This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the bucket pointer to lock the hash chain for that bucket.
| | | |
| | | | The benefits of a bit spin_lock are:
| | | |  - no need to allocate a separate array of locks.
| | | |  - no need to have a configuration option to guide the choice of the size of this array.
| | | |  - locking cost is often a single test-and-set in a cache line that will have to be loaded anyway. When inserting at, or removing from, the head of the chain, the unlock is free: writing the new address in the bucket head implicitly clears the lock bit. For __rhashtable_insert_fast() we ensure this always happens when adding a new key.
| | | |  - even when locking costs 2 updates (lock and unlock), they are in a cacheline that needs to be read anyway.
| | | |
| | | | The cost of using a bit spin_lock is a little bit of code complexity, which I think is quite manageable.
| | | |
| | | | Bit spin_locks are sometimes inappropriate because they are not fair: if multiple CPUs repeatedly contend for the same lock, one CPU can easily be starved. This is not a credible situation with rhashtable. Multiple CPUs may want to repeatedly add or remove objects, but they will typically do so at different buckets, so they will attempt to acquire different locks.
| | | |
| | | | As we have more bit-locks than we previously had spinlocks (by at least a factor of two) we can expect slightly less contention to go with the slightly better cache behavior and reduced memory consumption.
| | | |
| | | | To enhance type checking, a new struct is introduced to represent the pointer plus lock-bit that is stored in the bucket-table. This is "struct rhash_lock_head" and is empty. A pointer to this needs to be cast to either an unsigned long, or a "struct rhash_head *" to be useful. Variables of this type are most often called "bkt".
| | | |
| | | | Previously "pprev" would sometimes point to a bucket, and sometimes to a ->next pointer in an rhash_head. As these are now different types, pprev is NULL when it would have pointed to the bucket. In that case, 'bkt' is used, together with the correct locking protocol.
| | | |
| | | | Signed-off-by: NeilBrown <neilb@suse.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
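A userspace sketch of a bucket bit-lock using C11 atomics, to show why "the unlock is free" when the head changes. Names are hypothetical; the kernel uses bit_spin_lock() on its own `struct rhash_lock_head` type, and (as far as this sketch assumes) keeps bit 0 for rhashtable's "nulls" end-of-chain markers, which this simplified version does not model (an empty bucket is just 0).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define BKT_LOCK_BIT ((uintptr_t)1 << 1) /* BIT(1) of the bucket word */

struct node {
	int key;
	struct node *next;
};

/* Bucket word: chain head pointer with the lock stored in bit 1. Nodes are
 * at least 4-byte aligned, so bit 1 of a real node address is always 0. */
typedef _Atomic uintptr_t bucket_t;

static void bkt_lock(bucket_t *b)
{
	/* Spin until this thread is the one that flipped bit 1 from 0 to 1. */
	while (atomic_fetch_or_explicit(b, BKT_LOCK_BIT, memory_order_acquire) & BKT_LOCK_BIT)
		;
}

static void bkt_unlock(bucket_t *b)
{
	atomic_fetch_and_explicit(b, ~BKT_LOCK_BIT, memory_order_release);
}

/* Chain head with the lock bit masked off. */
static struct node *bkt_head(bucket_t *b)
{
	return (struct node *)(atomic_load_explicit(b, memory_order_acquire) & ~BKT_LOCK_BIT);
}

/* Insert at the head: storing the new head pointer (bit 1 clear) publishes
 * the node and releases the lock in a single store. */
static void bkt_insert_head(bucket_t *b, struct node *n)
{
	bkt_lock(b);
	n->next = bkt_head(b);
	atomic_store_explicit(b, (uintptr_t)n, memory_order_release);
}
```

The lock costs no extra memory and lives in the same cache line that any lookup of the bucket has to load anyway.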
| * | | rhashtable: allow rht_bucket_var to return NULL. | NeilBrown | 2019-04-08 | 2 files | -11/+29
| | | | Rather than returning a pointer to a static nulls, rht_bucket_var() now returns NULL if the bucket doesn't exist. This will make the next patch, which stores a bitlock in the bucket pointer, somewhat cleaner.
| | | |
| | | | This change involves introducing __rht_bucket_nested(), which is like rht_bucket_nested() but doesn't provide the static nulls, and changing rht_bucket_nested() to call this and possibly provide a static nulls, as is still needed for the non-var case.
| | | |
| | | | Signed-off-by: NeilBrown <neilb@suse.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | rhashtable: use cmpxchg() in nested_table_alloc() | NeilBrown | 2019-04-08 | 1 file | -3/+5
|/ / /   nested_table_alloc() relies on the fact that there is at most one spinlock allocated for every slot in the top level nested table, so it is not possible for two threads to try to allocate the same table at the same time.
| | |
| | |    This assumption is a little fragile (it is not explicit) and is unnecessary, as cmpxchg() can be used instead.
| | |
| | |    A future patch will replace the spinlocks by per-bucket bitlocks, and then we won't be able to protect the slot pointer with a spinlock.
| | |
| | |    So replace rcu_assign_pointer() with cmpxchg(), which has equivalent barrier properties. If the cmpxchg() fails, free the table that was just allocated.
| | |
| | |    Signed-off-by: NeilBrown <neilb@suse.com>
| | |    Signed-off-by: David S. Miller <davem@davemloft.net>
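A userspace sketch of the allocate-then-compare-and-swap pattern, using C11 `atomic_compare_exchange_strong()` in place of the kernel's cmpxchg(). The `struct table` type and the function name are hypothetical; the point is that any number of threads may race on an empty slot without a lock, exactly one allocation wins, and losers free their copy.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical child table hanging off one slot of a nested table. */
struct table {
	int level;
};

static struct table *get_or_alloc_child(struct table *_Atomic *slot, int level)
{
	struct table *cur = atomic_load_explicit(slot, memory_order_acquire);
	if (cur)
		return cur; /* fast path: already allocated */

	struct table *ntbl = calloc(1, sizeof(*ntbl));
	if (!ntbl)
		return NULL;
	ntbl->level = level; /* fully initialise before publishing */

	struct table *expected = NULL;
	/* Publish only if the slot is still empty. The seq_cst CAS orders the
	 * initialisation above before the pointer becomes visible, which is
	 * what rcu_assign_pointer() used to guarantee. */
	if (atomic_compare_exchange_strong(slot, &expected, ntbl))
		return ntbl;

	/* Lost the race: `expected` now holds the winner's table. */
	free(ntbl);
	return expected;
}
```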
* | | Merge branch 'net-hsr-improvements-and-bug-fixes' | David S. Miller | 2019-04-07 | 14 files | -277/+327
|\ \ \  Murali Karicheri says:
| | | |
| | | | ====================
| | | | net: hsr: improvements and bug fixes
| | | |
| | | | This series has some coding style fixes and other bug fixes.
| | | |
| | | | In patch 12/14, I have also done the SPDX conversion. I am not sure whether that is the only thing needed and whether it is correct, so please pay close attention to this patch before merging, as I would like to avoid any issue related to the licensing applicable to this code.
| | | | ====================
| | | |
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: hsr: Fix node prune function for forget time expiry | Aaron Kramer | 2019-04-07 | 1 file | -0/+4
| | | | HSR should forget nodes once the configured node forget time (HSR_NODE_FORGET_TIME) expires. As part of hsr_prune_nodes(), the code checks whether entries should be flushed out because they have not been heard from for longer than the forget time. But currently hsr_prune_nodes() is called only once, during device creation.
| | | |
| | | | Restart the timer at the end of hsr_prune_nodes() so that hsr_prune_nodes() gets called periodically and forgotten entries are removed from the node table.
| | | |
| | | | Signed-off-by: Aaron Kramer <a-kramer@ti.com>
| | | | Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
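The fix is the standard "one-shot timer callback re-arms itself" pattern. A simulated sketch, assuming a hypothetical one-shot timer driven by an explicit clock, an assumed forget time of 60000 ms and an assumed prune period of 3000 ms; the driver uses real kernel timers and its own constants and node list.

```c
#include <stdbool.h>
#include <stddef.h>

#define FORGET_TIME_MS  60000L /* assumed value for the node forget time */
#define PRUNE_PERIOD_MS 3000L  /* assumed prune interval */

struct hsr_node_sim {
	long last_seen; /* time the node was last heard from */
	bool present;
};

/* Minimal simulated one-shot timer: fires once when now >= expires. */
struct sim_timer {
	long expires;
	bool armed;
};

static void sim_mod_timer(struct sim_timer *t, long expires)
{
	t->expires = expires;
	t->armed = true;
}

static void prune_nodes(struct hsr_node_sim *nodes, size_t n,
			struct sim_timer *t, long now)
{
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].present && now - nodes[i].last_seen > FORGET_TIME_MS)
			nodes[i].present = false; /* forget the node */
	}
	/* The fix: re-arm, otherwise this callback never runs again. */
	sim_mod_timer(t, now + PRUNE_PERIOD_MS);
}

/* Drive the simulated clock: run the callback whenever the timer is due. */
static void run_timers(struct sim_timer *t, struct hsr_node_sim *nodes,
		       size_t n, long now)
{
	if (t->armed && now >= t->expires) {
		t->armed = false; /* one-shot: firing disarms it */
		prune_nodes(nodes, n, t, now);
	}
}
```

Without the `sim_mod_timer()` call at the end of `prune_nodes()`, the timer would fire once and a node that goes silent later would never be forgotten.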
| * | | net: hsr: add debugfs support for display node list | Murali Karicheri | 2019-04-07 | 6 files | -12/+155
| | | | This adds a debugfs interface to display the nodes learned by the HSR master.
| | | |
| | | | Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: hsr: convert to SPDX identifier | Murali Karicheri | 2019-04-07 | 12 files | -61/+12
| | | | Use SPDX-License-Identifier instead of a verbose license text.
| | | |
| | | | Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: hsr: add blank line after function declaration | Murali Karicheri | 2019-04-07 | 1 file | -0/+1
| | | | Add a blank line after the function declaration, as suggested by checkpatch.pl -f.
| | | |
| | | | Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>