Commit message (Author, Age, Files, Lines)
* rds; Reset rs->rs_bound_addr in rds_add_bound() failure path (Sowmini Varadhan, 2017-12-27, 1 file, -0/+1)
    If the rds_sock is not added to the bind_hash_table, we must reset rs_bound_addr so that rds_remove_bound will not trip on this rds_sock.

    rds_add_bound() does a rds_sock_put() in this failure path, so failing to reset rs_bound_addr will result in a socket refcount bug, and will trigger a WARN_ON with the stack shown below when the application subsequently tries to close the PF_RDS socket.

      WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \
          rds_sock_destruct+0x15/0x30 [rds]
        :
        __sk_destruct+0x21/0x190
        rds_remove_bound.part.13+0xb6/0x140 [rds]
        rds_release+0x71/0x120 [rds]
        sock_release+0x1a/0x70
        sock_close+0xe/0x20
        __fput+0xd5/0x210
        task_work_run+0x82/0xa0
        do_exit+0x2ce/0xb30
        ? syscall_trace_enter+0x1cc/0x2b0
        do_group_exit+0x39/0xa0
        SyS_exit_group+0x10/0x10
        do_syscall_64+0x61/0x1a0

    Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
    Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
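The refcount problem described in the rds commit above can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code: the class, the function names, and the dictionary standing in for the bind hash table are all hypothetical, and the ordering of steps is simplified. It assumes, as the commit message states, that the failure path drops a reference and that the removal path acts only on sockets whose bound address is non-zero.

```python
class RdsSockModel:
    """Toy stand-in for an RDS socket: one refcount and one bound address."""

    def __init__(self):
        self.refcount = 1      # reference held by the open socket itself
        self.bound_addr = 0    # 0 means "not bound"


def add_bound(sock, addr, bind_table, reset_on_failure=True):
    """Try to bind `sock` to `addr`; return True on success."""
    sock.bound_addr = addr
    sock.refcount += 1                 # reference intended for the bind table
    if addr in bind_table:             # insertion fails: address already taken
        sock.refcount -= 1             # the failure path drops that reference
        if reset_on_failure:
            sock.bound_addr = 0        # the one-line fix from the commit
        return False
    bind_table[addr] = sock
    return True


def remove_bound(sock, bind_table):
    """Called on close; releases the bind-table reference of a bound socket."""
    if sock.bound_addr == 0:
        return                         # never bound: nothing to release
    if bind_table.get(sock.bound_addr) is sock:
        del bind_table[sock.bound_addr]
    sock.refcount -= 1
    sock.bound_addr = 0


def close_after_failed_bind(reset_on_failure):
    """Return the refcount of a socket whose bind failed, after remove_bound()."""
    table = {}
    owner = RdsSockModel()
    add_bound(owner, 42, table)                    # first socket takes the address
    loser = RdsSockModel()
    add_bound(loser, 42, table, reset_on_failure)  # second bind fails
    remove_bound(loser, table)
    return loser.refcount
```

In this model, `close_after_failed_bind(True)` leaves the socket's own reference intact, while `close_after_failed_bind(False)` drops one reference too many, which is the kind of imbalance the WARN_ON in the trace reports.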
* net: sch: api: fix tcf_block_get (Sudip Mukherjee, 2017-12-27, 1 file, -1/+2)
    The build of mips bcm47xx_defconfig is failing with the error:

      net/sched/sch_fq_codel.c: In function 'fq_codel_init':
      net/sched/sch_fq_codel.c:487:8: error: too many arguments to function 'tcf_block_get'

    While adding the extack support, the commit missed adding it in the headers when CONFIG_NET_CLS is not defined.

    Fixes: 8d1a77f974ca ("net: sch: api: add extack support in tcf_block_get")
    Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'l2tp-next' (David S. Miller, 2017-12-27, 5 files, -8/+40)
    Lorenzo Bianconi says:

    ====================
    l2tp: fix offset/peer_offset conf parameters

    This patchset adds a peer_offset configuration parameter in order to specify two different values for the payload offset on the tx/rx sides. It also fixes the missing print of session offset info.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * l2tp: add peer_offset parameter (Lorenzo Bianconi, 2017-12-27, 5 files, -8/+38)
      Introduce peer_offset parameter in order to add the capability to specify two different values for payload offset on tx/rx side. If just offset is provided by userspace use it for rx side as well in order to maintain compatibility with older l2tp versions

      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * l2tp: fix missing print session offset info (Hangbin Liu, 2017-12-27, 1 file, -0/+2)
      Report offset parameter in L2TP_CMD_SESSION_GET command if it has been configured by userspace

      Fixes: 309795f4bec ("l2tp: Add netlink control API for L2TP")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next (David S. Miller, 2017-12-27, 13 files, -190/+326)
    Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2017-12-22

    1) Separate ESP handling from segmentation for GRO packets. This unifies the IPsec GSO and non GSO codepath.
    2) Add asynchronous callbacks for xfrm on layer 2. This adds the necessary infrastructure to core networking.
    3) Allow to use the layer2 IPsec GSO codepath for software crypto, all infrastructure is there now.
    4) Also allow IPsec GSO with software crypto for local sockets.
    5) Don't require synchronous crypto fallback on IPsec offloading, it is not needed anymore.
    6) Check for xdo_dev_state_free and only call it if implemented. From Shannon Nelson.
    7) Check for the required add and delete functions when a driver registers xdo_dev_ops. From Shannon Nelson.
    8) Define xfrmdev_ops only with offload config. From Shannon Nelson.
    9) Update the xfrm stats documentation. From Shannon Nelson.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * xfrm: update the stats documentation (Shannon Nelson, 2017-12-22, 1 file, -6/+14)
      Add a couple of stats that aren't in the documentation file and rework the top description to be a little more readable.

      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: wrap xfrmdev_ops with offload config (Shannon Nelson, 2017-12-21, 1 file, -1/+1)
      There's no reason to define netdev->xfrmdev_ops if the offload facility is not CONFIG'd in.

      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: check for xdo_dev_ops add and delete (Shannon Nelson, 2017-12-21, 1 file, -13/+18)
      This adds a check for the required add and delete functions up front at registration time to be sure both are defined.

      Since both the features check and the registration check are looking at the same things, break out the check for both to call.

      Lastly, for some reason the feature check was setting xfrmdev_ops to NULL if the NETIF_F_HW_ESP bit was missing, which would probably surprise the driver later if the driver turned its NETIF_F_HW_ESP bit back on. We shouldn't be messing with the driver's callback list, so we stop doing that with this patch.

      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: check for xdo_dev_state_free (Shannon Nelson, 2017-12-21, 1 file, -1/+2)
      The current XFRM code assumes that we've implemented the xdo_dev_state_free() callback, even if it is meaningless to the driver. This patch adds a check for it before calling, as done in other APIs, to prevent a NULL function pointer kernel crash.

      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
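The optional-callback check from the xdo_dev_state_free commit above follows a common pattern: test that a hook is present before invoking it. The following is a minimal Python sketch of that pattern, not the kernel implementation; `XfrmDevOpsModel` and `free_offloaded_state` are hypothetical names chosen for illustration.

```python
class XfrmDevOpsModel:
    """Toy callback table: add/delete are supplied, state_free is optional."""

    def __init__(self, state_add, state_delete, state_free=None):
        self.state_add = state_add
        self.state_delete = state_delete
        self.state_free = state_free


def free_offloaded_state(ops, state):
    """Invoke the optional state_free hook only if the driver provided one.

    Returns True when the hook ran, False when it was skipped.
    """
    if ops.state_free is None:
        return False          # calling it here would be the NULL-pointer crash
    ops.state_free(state)
    return True


# Usage: a driver without state_free is handled without raising an error.
freed = []
minimal_ops = XfrmDevOpsModel(state_add=lambda s: None, state_delete=lambda s: None)
full_ops = XfrmDevOpsModel(lambda s: None, lambda s: None, state_free=freed.append)
skipped = free_offloaded_state(minimal_ops, "sa0")
ran = free_offloaded_state(full_ops, "sa1")
```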
| * esp: Don't require synchronous crypto fallback on offloading anymore. (Steffen Klassert, 2017-12-20, 2 files, -20/+4)
      We support asynchronous crypto on layer 2 ESP now. So no need to force synchronous crypto fallback on offloading anymore.

      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: Allow IPsec GSO with software crypto for local sockets. (Steffen Klassert, 2017-12-20, 1 file, -0/+2)
      With support of async crypto operations in the GSO codepath we have everything in place to allow GSO for local sockets. This patch enables the GSO codepath.

      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: Allow to use the layer2 IPsec GSO codepath for software crypto. (Steffen Klassert, 2017-12-20, 1 file, -2/+2)
      We now have support for asynchronous crypto operations in the layer 2 TX path. This was the missing part to allow the GSO codepath for software crypto, so allow this codepath now.

      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * net: Add asynchronous callbacks for xfrm on layer 2. (Steffen Klassert, 2017-12-20, 8 files, -36/+175)
      This patch implements asynchronous crypto callbacks and a backlog handler that can be used when IPsec is done at layer 2 in the TX path. It also extends the skb validate functions so that we can update the driver transmit return codes based on async crypto operation or to indicate that we queued the packet in a backlog queue.

      Joint work with: Aviv Heller <avivh@mellanox.com>

      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: Separate ESP handling from segmentation for GRO packets. (Steffen Klassert, 2017-12-20, 7 files, -132/+129)
      We change the ESP GSO handlers to only segment the packets. The ESP handling and encryption is deferred to validate_xmit_xfrm() where this is done for non GRO packets too. This makes the code more robust and prepares for asynchronous crypto handling.

      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | phylib: rename reset-(post-)delay-us to reset-(de)assert-us (Richard Leitner, 2017-12-27, 4 files, -10/+11)
      As suggested by Rob Herring [1] rename the previously introduced reset-{,post-}delay-us bindings to the clearer reset-{,de}assert-us

      [1] https://patchwork.kernel.org/patch/10104905/

      Signed-off-by: Richard Leitner <richard.leitner@skidata.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'hns3-next' (David S. Miller, 2017-12-27, 10 files, -44/+855)
      Peng Li says:

      ====================
      add some features and fix some bugs for HNS3 driver

      This patchset adds some new feature support and fixes some bugs:
      [Patch 1/17 - 5/17] add the support to modify/query the tqp number through ethtool -L/l command, and also fix some related bugs for change tqp number.
      [Patch 6/17 - 9/17] add support vlan tag offload on tx&&rx direction for pf, and fix some related bugs.
      [patch 10/17 - 11/17] fix bugs for auto negotiation.
      [patch 12/17] adds support for ethtool command set_pauseparam.
      [patch 13/17 - 14/17] add support to update flow control settings after autoneg.
      [patch 15/17 - 17/17] fix some other bugs in net-next.

      ---
      Change Log:
      V4 -> V5:
      1. change the name spelling of Peng Li.

      V3 -> V4:
      1. change the name spelling of Mingguang Qu and Jian Shen.

      V2 -> V3:
      1. order local variables requested by David Miller.
      2. use "int" for index iteration loops requested by David Miller.

      V1 -> V2:
      1. fix the comments from Sergei Shtylyov.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: change TM sched mode to TC-based mode when SRIOV enabled (Peng Li, 2017-12-27, 1 file, -4/+1)
        TC-based sched mode supports both SRIOV enabled and SRIOV disabled. This patch changes the TM sched mode to TC-based mode in the initialization process.

        Fixes: cc9bb43ab394 ("net: hns3: Add tc-based TM support for sriov enabled port")
        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: Increase the default depth of bucket for TM shaper (Peng Li, 2017-12-27, 1 file, -2/+2)
        Burstiness of a flow is determined by the depth of a bucket. When the upper rate of the shaper is large, the current depth of a bucket is not enough. The default upper rate of the shaper is 100G, so increase the depth of a bucket according to UM.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add support for querying advertised pause frame by ethtool ethx (Peng Li, 2017-12-27, 3 files, -0/+32)
        This patch adds support for querying advertised pause frame by using ethtool command(ethtool ethx).

        Fixes: 496d03e960ae ("net: hns3: Add Ethtool support to HNS3 driver")
        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add Asym Pause support to phy default features (Fuyun Liang, 2017-12-27, 1 file, -0/+1)
        commit c4fb2cdf575d ("net: hns3: fix a bug for phy supported feature initialization") adds default supported features for phy, but our hardware also supports Asym Pause. This patch adds Asym Pause support to the phy default features so that Asym Pause can be advertised when the phy negotiates flow control.

        Fixes: c4fb2cdf575d ("net: hns3: fix a bug for phy supported feature initialization")
        Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add support to update flow control settings after autoneg (Peng Li, 2017-12-27, 3 files, -0/+41)
        When auto-negotiation is enabled, the MAC flow control settings are based on the flow control negotiation result, and they should be configured after a valid link has been established. This patch adds support to update flow control settings after auto-negotiation has completed.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add support for set_pauseparam (Peng Li, 2017-12-27, 4 files, -1/+98)
        This patch adds set_pauseparam support for ethtool cmd.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: fix for getting auto-negotiation state in hclge_get_autoneg (Fuyun Liang, 2017-12-27, 1 file, -0/+4)
        When a phy exists, we use the value of phydev.autoneg to represent the auto-negotiation state of hardware. Otherwise, we use the value of mac.autoneg to represent it.

        This patch fixes hclge_get_autoneg() returning a wrong auto-negotiation state.

        Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
        Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: cleanup mac auto-negotiation state query (Fuyun Liang, 2017-12-27, 1 file, -24/+0)
        When checking whether auto-negotiation is on, the driver only needs to check the value of mac.autoneg (SW) directly, and does not need to query it from hardware, because this value is always synchronized with the auto-negotiation state of hardware.

        This patch removes the mac auto-negotiation state query.

        Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add handling vlan tag offload in bd (Peng Li, 2017-12-27, 1 file, -5/+78)
        This patch deals with the vlan tag information between sk_buff and rx/tx bd.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Jian Shen <shenjian15@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add ethtool related offload command (Peng Li, 2017-12-27, 3 files, -0/+32)
        This patch adds offload command related to "ethtool -K".

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Jian Shen <shenjian15@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add vlan offload config command (Peng Li, 2017-12-27, 3 files, -6/+233)
        This patch adds vlan offload config commands, initializes the rules of tx/rx vlan tag handle for hw.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Jian Shen <shenjian15@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add a mask initialization for mac_vlan table (Peng Li, 2017-12-27, 2 files, -1/+48)
        This patch sets vlan masked, in order to avoid the received packets being filtered.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Jian Shen <shenjian15@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: get rss_size_max from configuration but not hardcode (Peng Li, 2017-12-27, 3 files, -1/+8)
        Add configuration for rss_size_max in hdev instead of hardcoding it.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: free the ring_data structrue when change tqps (Peng Li, 2017-12-27, 1 file, -0/+4)
        This patch fixes a memory leak in the change tqps process, where the functions hns3_uninit_all_ring and hns3_init_all_ring may be called many times.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: change the returned tqp number by ethtool -x (Peng Li, 2017-12-27, 1 file, -1/+1)
        This patch modifies the return data of get_rxnfc. It now returns the current handle's rss_size rather than the total tqp number, because the tc_size has been changed to the log2 of the roundup power of two of rss_size.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
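The tc_size relation mentioned in the commit above ("the log2 of roundup power of two of rss_size") is plain integer arithmetic. A short Python sketch follows; the helper names are hypothetical and the code only illustrates the arithmetic, not the driver.

```python
def roundup_pow_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def tc_size_for(rss_size):
    """log2 of rss_size rounded up to the next power of two."""
    # roundup_pow_of_two() returns an exact power of two, so
    # bit_length() - 1 is its base-2 logarithm.
    return roundup_pow_of_two(rss_size).bit_length() - 1


# Worked values: 16 queues need 4 bits, 17 to 32 queues need 5 bits.
examples = {size: tc_size_for(size) for size in (1, 16, 17, 24, 32)}
```

Because several rss_size values share one tc_size (17 through 32 all give 5), tc_size alone cannot tell user space the actual queue count, which is consistent with the commit reporting rss_size directly.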
| * | net: hns3: add support to modify tqps number (Peng Li, 2017-12-27, 5 files, -0/+240)
        This patch adds the support to change tqps number for PF driver by using ethtool -L command.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add support to query tqps number (Peng Li, 2017-12-27, 3 files, -0/+33)
        This patch adds the support to query tqps number for PF driver by using ethtool -l command.

        Signed-off-by: Peng Li <lipeng321@huawei.com>
        Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: erspan: remove md NULL check (William Tu, 2017-12-26, 2 files, -9/+0)
      The 'md' is allocated from 'tun_dst = ip_tun_rx_dst' and since we've checked 'tun_dst', 'md' will never be NULL. The patch removes it at both ipv4 and ipv6 erspan.

      Fixes: afb4c97d90e6 ("ip6_gre: fix potential memory leak in ip6erspan_rcv")
      Fixes: 50670b6ee9bc ("ip_gre: fix potential memory leak in erspan_rcv")
      Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: md5: Handle RCU dereference of md5sig_info (Mat Martineau, 2017-12-26, 1 file, -1/+1)
      Dereference tp->md5sig_info in tcp_v4_destroy_sock() the same way it is done in the adjacent call to tcp_clear_md5_list().

      Resolves this sparse warning:

        net/ipv4/tcp_ipv4.c:1914:17: warning: incorrect type in argument 1 (different address spaces)
        net/ipv4/tcp_ipv4.c:1914:17:    expected struct callback_head *head
        net/ipv4/tcp_ipv4.c:1914:17:    got struct callback_head [noderef] <asn:4>*<noident>

      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Acked-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: dsa: lan9303: lan9303_csr_reg_wait cleanups (Egil Hjelmeland, 2017-12-26, 1 file, -8/+5)
      Non-functional cleanups in lan9303_csr_reg_wait():
       - Change type of param 'mask' from int to u32.
       - Remove param 'value' (will probably never be used)
       - Reduced retries from 1000 to 25, consistent with lan9303_read_wait.
       - Removed comments

      Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>

      Changes v1 -> v2:
       - Removed comments

      Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT (Tobias Brunner, 2017-12-26, 1 file, -0/+8)
      If SNAT modifies the source address the resulting packet might match an IPsec policy, reinject the packet if that's the case.

      The exact same thing is already done for IPv4.

      Signed-off-by: Tobias Brunner <tobias@strongswan.org>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
* | enic: add wq clean up budget (Govindarajulu Varadarajan, 2017-12-26, 2 files, -2/+4)
      In case of tx clean up, we set '-1' as budget. This means clean up until wq is empty or till (1 << 32) pkts are cleaned. Under heavy load this will run for a long time and cause a "watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.

      This patch sets wq clean up budget to 256.

      Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
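The effect of a cleanup budget, as described in the enic commit above, can be sketched with a bounded loop. The Python below is an illustrative model only: `clean_wq`, `as_u32`, and the deque standing in for the work queue are hypothetical, and only the value 256 comes from the commit message.

```python
from collections import deque

WQ_BUDGET = 256  # budget value named in the commit message


def as_u32(value):
    """Interpret an integer the way a 32-bit unsigned counter would."""
    return value & 0xFFFFFFFF


def clean_wq(queue, budget):
    """Complete at most `budget` pending descriptors; return how many were cleaned."""
    cleaned = 0
    while queue and cleaned < budget:
        queue.popleft()        # stand-in for completing one tx descriptor
        cleaned += 1
    return cleaned


# Usage: with 1000 pending descriptors, one pass stops after 256 of them,
# leaving the rest for a later pass instead of monopolising the CPU.
pending = deque(range(1000))
done = clean_wq(pending, WQ_BUDGET)
remaining = len(pending)

# A budget of -1 read as an unsigned 32-bit value is effectively unbounded.
unbounded = as_u32(-1)
```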
* | rtnetlink: Replace implementation of ASSERT_RTNL() macro with WARN_ONCE() (Leon Romanovsky, 2017-12-26, 1 file, -7/+3)
      The ASSERT_RTNL() macro is actually an open-coded variant of WARN_ONCE() with two exceptions. First, it prints the stack for multiple hits and not only once as WARN_ONCE() does. Second, the user can disable prints of WARN_ONCE by setting CONFIG_BUG to N.

      The multiple prints of the stack dump are actually not needed, because calls without the rtnl lock are programming errors and the user can't do anything about them except complain to the mailing list after the first occurrence of such a failure.

      The user who disabled BUG/WARN prints did it explicitly, because by default in the upstream kernel and distributions this option is enabled. It means that the user doesn't want to see prints about missing locks either.

      This patch replaces the open-coded variant with the already existing macro and changes the error prints to be once only.

      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
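The "report only on the first hit" behaviour that the rtnetlink commit above adopts can be modelled in a few lines. This Python sketch is not the kernel macro; `WarnOnce` and `assert_rtnl` are hypothetical names, and the list of reports stands in for the kernel log.

```python
class WarnOnce:
    """Record a warning the first time the condition is true, then stay quiet."""

    def __init__(self, message):
        self.message = message
        self.fired = False
        self.reports = []      # stand-in for the kernel log

    def check(self, condition):
        if condition and not self.fired:
            self.fired = True
            self.reports.append(self.message)
        return bool(condition)  # callers can still branch on the condition


def assert_rtnl(rtnl_is_locked, warn):
    """Model of the assertion: complain once if the lock is not held."""
    return warn.check(not rtnl_is_locked)


# Usage: three unlocked calls produce a single report.
warn = WarnOnce("RTNL: assertion failed")
results = [assert_rtnl(False, warn) for _ in range(3)]
```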
* | net: mediatek: remove superfluous pin setup for MT7622 SoC (Sean Wang, 2017-12-26, 2 files, -14/+24)
      Remove the superfluous pin setup to avoid accessing invalid I/O pin registers. Pin configuration tends to differ between SoCs, so it is better managed and controlled by the pinctrl driver, which MT7622 already supports.

      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
* | dt-bindings: net: mediatek: add condition to property mediatek, pctl (Sean Wang, 2017-12-26, 1 file, -1/+1)
      The property "mediatek,pctl" is only required for SoCs such as MT2701 and MT7623, so add a few words stating the condition.

      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller, 2017-12-22, 202 files, -1137/+2851)
      Lots of overlapping changes. Also on the net-next side the XDP state management is handled more in the generic layers so undo the 'net' nfp fix which isn't applicable in net-next.

      Include a necessary change by Jakub Kicinski, with log message:

      ====================
      cls_bpf no longer takes care of offload tracking. Make sure netdevsim performs necessary checks. This fixes a warning caused by TC trying to remove a filter it has not added.

      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
| * \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (Linus Torvalds, 2017-12-22, 73 files, -492/+1548)
        Pull networking fixes from David Miller:
         "What's a holiday weekend without some networking bug fixes? [1]

          1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function calls, from Daniel Borkmann.
          2) Fix regression from errata limiting change to marvell PHY driver, from Zhao Qiang.
          3) Fix u16 overflow in SCTP, from Xin Long.
          4) Fix potential memory leak during bridge newlink, from Nikolay Aleksandrov.
          5) Fix BPF selftest build on s390, from Hendrik Brueckner.
          6) Don't append to cfg80211 automatically generated certs file, always write new ones from scratch. From Thierry Reding.
          7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai.
          8) Fix hang on tg3 MTU change with certain chips, from Brian King.
          9) Add stall detection to arc emac driver and reset chip when this happens, from Alexander Kochetkov.
          10) Fix MTU limiting in GRE tunnel drivers, from Xin Long.
          11) Fix stmmac timestamping bug due to mis-shifting of field. From Fredrik Hallenberg.
          12) Fix metrics match when deleting an ipv4 route. The kernel sets some internal metrics bits which the user isn't going to set when it makes the delete request. From Phil Sutter.
          13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix from Yelena Krivosheev.
          14) Fix double free and memory corruption in get_net_ns_by_id, from Eric W. Biederman.
          15) Flush ipv4 FIB tables in the reverse order. Some tables can share their actual backing data, in particular this happens for the MAIN and LOCAL tables. We have to kill the LOCAL table first, because it uses MAIN's backing memory. Fix from Ido Schimmel.
          16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann Horn, and Alexei Starovoitov.
          17) Make changes to ipv6 autoflowlabel sysctl really propagate to sockets, unless the socket has set the per-socket value explicitly. From Shaohua Li.
          18) Fix leaks and double callback invocations of zerocopy SKBs, from Willem de Bruijn"

        [1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus.

        * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
          skbuff: skb_copy_ubufs must release uarg even without user frags
          skbuff: orphan frags before zerocopy clone
          net: reevalulate autoflowlabel setting after sysctl setting
          openvswitch: Fix pop_vlan action for double tagged frames
          ipv6: Honor specified parameters in fibmatch lookup
          bpf: do not allow root to mangle valid pointers
          selftests/bpf: add tests for recent bugfixes
          bpf: fix integer overflows
          bpf: don't prune branches when a scalar is replaced with a pointer
          bpf: force strict alignment checks for stack pointers
          bpf: fix missing error return in check_stack_boundary()
          bpf: fix 32-bit ALU op verification
          bpf: fix incorrect tracking of register size truncation
          bpf: fix incorrect sign extension in check_alu_op()
          bpf/verifier: fix bounds calculation on BPF_RSH
          ipv4: Fix use-after-free when flushing FIB tables
          s390/qeth: fix error handling in checksum cmd callback
          tipc: remove joining group member from congested list
          selftests: net: Adding config fragment CONFIG_NUMA=y
          nfp: bpf: keep track of the offloaded program
          ...
| | * \ Merge branch 'net-zerocopy-fixes' (David S. Miller, 2017-12-21, 1 file, -3/+4)
          Saeed Mahameed says:

          ===================
          Mellanox, mlx5 fixes 2017-12-19

          The following series includes some fixes for mlx5 core and ethernet driver.

          Please pull and let me know if there is any problem.

          This series doesn't introduce any conflict with the ongoing mlx5 for-next submission.

          For -stable:

          kernels >= v4.7.y
            ("net/mlx5e: Fix possible deadlock of VXLAN lock")
            ("net/mlx5e: Add refcount to VXLAN structure")
            ("net/mlx5e: Prevent possible races in VXLAN control flow")
            ("net/mlx5e: Fix features check of IPv6 traffic")

          kernels >= v4.9.y
            ("net/mlx5: Fix error flow in CREATE_QP command")
            ("net/mlx5: Fix rate limit packet pacing naming and struct")

          kernels >= v4.13.y
            ("net/mlx5: FPGA, return -EINVAL if size is zero")

          kernels >= v4.14.y
            ("Revert "mlx5: move affinity hints assignments to generic code")

          All above patches apply and compile with no issues on corresponding -stable.
          ===================

          Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | skbuff: skb_copy_ubufs must release uarg even without user frags (Willem de Bruijn, 2017-12-21, 1 file, -1/+2)
            skb_copy_ubufs creates a private copy of frags[] to release its hold on user frags, then calls uarg->callback to notify the owner.

            Call uarg->callback even when no frags exist. This edge case can happen when zerocopy_sg_from_iter finds enough room in skb_headlen to copy all the data.

            Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages")
            Signed-off-by: Willem de Bruijn <willemb@google.com>
            Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | skbuff: orphan frags before zerocopy clone (Willem de Bruijn, 2017-12-21, 1 file, -2/+2)
            Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate calls to skb_uarg(skb)->callback for the same data.

            skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb with each segment. This is only safe for uargs that do refcounting, which is those that pass skb_orphan_frags without dropping their shared frags. For others, skb_orphan_frags drops the user frags and sets the uarg to NULL, after which sock_zerocopy_clone has no effect.

            Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback calls for the same data causing the vhost_net_ubuf_ref->refcount to drop below zero.

            Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
            Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
            Reported-by: Andreas Hartmann <andihartmann@01019freenet.de>
            Reported-by: David Hill <dhill@redhat.com>
            Signed-off-by: Willem de Bruijn <willemb@google.com>
            Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: reevalulate autoflowlabel setting after sysctl settingShaohua Li2017-12-214-4/+13
| | | |     sysctl.ipv6.auto_flowlabels defaults to 1. On our hosts, we set it
| | | |     to 2. If a sockopt doesn't set autoflowlabel, outgoing packets from
| | | |     the hosts are not supposed to include a flowlabel. This is true for
| | | |     normal packets, but not for reset packets.
| | | |
| | | |     The reason is that ipv6_pinfo.autoflowlabel is set at sock creation.
| | | |     If sysctl.ipv6.auto_flowlabels is changed later,
| | | |     ipv6_pinfo.autoflowlabel isn't changed, so the sock keeps its old
| | | |     auto flowlabel behavior. Reset packets suffer from this problem
| | | |     because they are sent from a special control socket, which is
| | | |     created at boot time. Since sysctl.ipv6.auto_flowlabels is 1 by
| | | |     default, the control socket will always have its
| | | |     ipv6_pinfo.autoflowlabel set, even after the user sets
| | | |     sysctl.ipv6.auto_flowlabels to 2, so reset packets will always have
| | | |     a flowlabel. A normal sock created before the sysctl setting suffers
| | | |     from the same issue. We can't even turn off autoflowlabel unless we
| | | |     kill all socks on the hosts.
| | | |
| | | |     To fix this, if the IPV6_AUTOFLOWLABEL sockopt is used, we use the
| | | |     autoflowlabel setting from the user; otherwise we always call
| | | |     ip6_default_np_autolabel(), which reflects the current sysctl
| | | |     settings.
| | | |
| | | |     Note, this changes behavior a little bit. Before commit 42240901f7c4
| | | |     (ipv6: Implement different admin modes for automatic flow labels),
| | | |     the autoflowlabel behavior of a sock wasn't sticky, e.g., if the
| | | |     sysctl changed, an existing connection would change its
| | | |     autoflowlabel behavior. After that commit, autoflowlabel behavior is
| | | |     sticky for the whole life of the sock. With this patch, the behavior
| | | |     is no longer sticky.
| | | |
| | | |     Cc: Martin KaFai Lau <kafai@fb.com>
| | | |     Cc: Eric Dumazet <eric.dumazet@gmail.com>
| | | |     Cc: Tom Herbert <tom@quantonium.net>
| | | |     Signed-off-by: Shaohua Li <shli@fb.com>
| | | |     Signed-off-by: David S. Miller <davem@davemloft.net>
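The decision rule described in the message above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: `model_net`, `model_sock`, and every `model_*` function are hypothetical names, and the treatment of sysctl modes 0 and 3 is an assumption (the message itself only says that mode 1 labels by default and mode 2 does not).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct model_net {
    int auto_flowlabels;     /* models sysctl.ipv6.auto_flowlabels (0..3) */
};

struct model_sock {
    bool autoflowlabel;      /* value chosen via the IPV6_AUTOFLOWLABEL sockopt */
    bool autoflowlabel_set;  /* true only once the sockopt has been used */
};

/*
 * Default taken from the *current* sysctl value.
 * Assumption: modes 1 and 3 label by default, modes 0 and 2 do not.
 */
static bool model_default_autolabel(const struct model_net *net)
{
    assert(net->auto_flowlabels >= 0 && net->auto_flowlabels <= 3);
    return net->auto_flowlabels == 1 || net->auto_flowlabels == 3;
}

/* Models a user calling setsockopt(IPV6_AUTOFLOWLABEL) on the socket. */
static void model_set_sockopt(struct model_sock *sk, bool on)
{
    sk->autoflowlabel = on;
    sk->autoflowlabel_set = true;
}

/*
 * The rule from the commit message: an explicit sockopt wins; otherwise the
 * answer is re-evaluated from the sysctl on every call instead of being
 * cached when the socket is created.
 */
static bool model_autoflowlabel(const struct model_net *net,
                                const struct model_sock *sk)
{
    if (!sk->autoflowlabel_set)
        return model_default_autolabel(net);
    return sk->autoflowlabel;
}
```

In this model, a socket created while the sysctl is 1 stops labelling as soon as the sysctl is changed to 2, which is the behavior the message wants for the boot-time control socket. A socket that used the sockopt keeps the user's choice regardless of later sysctl changes.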
| | * | openvswitch: Fix pop_vlan action for double tagged framesEric Garver2017-12-211-3/+12
| | | |     skb_vlan_pop() expects skb->protocol to be a valid TPID for double
| | | |     tagged frames. So set skb->protocol to the TPID and let
| | | |     skb_vlan_pop() shift the true ethertype into position for us.
| | | |
| | | |     Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
| | | |     Signed-off-by: Eric Garver <e@erig.me>
| | | |     Reviewed-by: Jiri Benc <jbenc@redhat.com>
| | | |     Signed-off-by: David S. Miller <davem@davemloft.net>
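To make the double-tag case concrete, here is a byte-level sketch of popping the outermost VLAN tag from a raw Ethernet frame. It is a userspace illustration only and does not reproduce skb_vlan_pop(); the helper names (`frame_proto`, `is_vlan_tpid`, `pop_outer_vlan`) are hypothetical, and it assumes only the two common TPID values 0x8100 and 0x88a8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN 12  /* destination MAC + source MAC */
#define VLAN_TAG_LEN   4  /* TPID (2 bytes) + TCI (2 bytes) */
#define ETH_HDR_LEN   14  /* MAC addresses + 2-byte type field */

/* Read the 16-bit big-endian type field that follows the MAC addresses. */
static uint16_t frame_proto(const uint8_t *frame)
{
    return (uint16_t)((frame[MAC_ADDRS_LEN] << 8) | frame[MAC_ADDRS_LEN + 1]);
}

/* Assumption: only 802.1Q (0x8100) and 802.1ad (0x88a8) count as TPIDs. */
static int is_vlan_tpid(uint16_t proto)
{
    return proto == 0x8100 || proto == 0x88a8;
}

/*
 * Remove the outermost VLAN tag in place and return the new frame length.
 * If the frame carries no tag, it is left untouched and len is returned.
 */
static size_t pop_outer_vlan(uint8_t *frame, size_t len)
{
    assert(len >= ETH_HDR_LEN);
    if (!is_vlan_tpid(frame_proto(frame)) || len < ETH_HDR_LEN + VLAN_TAG_LEN)
        return len;

    /* Slide everything after the tag (next type field onward) left by 4. */
    memmove(frame + MAC_ADDRS_LEN,
            frame + MAC_ADDRS_LEN + VLAN_TAG_LEN,
            len - MAC_ADDRS_LEN - VLAN_TAG_LEN);
    return len - VLAN_TAG_LEN;
}
```

For a frame tagged 0x88a8 then 0x8100 with an IPv4 payload, one pop leaves 0x8100 in the type position, not 0x0800. That is the situation the message describes: after the outer tag is removed from a double tagged frame, the protocol value is still a TPID, and the payload ethertype only appears once the inner tag is popped as well.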
| | * | ipv6: Honor specified parameters in fibmatch lookupIdo Schimmel2017-12-211-8/+11
| | | |     Currently, parameters such as oif and source address are not taken
| | | |     into account during fibmatch lookup. Example (IPv4 for reference)
| | | |     before patch:
| | | |
| | | |     $ ip -4 route show
| | | |     192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
| | | |     198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1
| | | |
| | | |     $ ip -6 route show
| | | |     2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
| | | |     2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
| | | |     fe80::/64 dev dummy0 proto kernel metric 256 pref medium
| | | |     fe80::/64 dev dummy1 proto kernel metric 256 pref medium
| | | |
| | | |     $ ip -4 route get fibmatch 192.0.2.2 oif dummy0
| | | |     192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
| | | |     $ ip -4 route get fibmatch 192.0.2.2 oif dummy1
| | | |     RTNETLINK answers: No route to host
| | | |
| | | |     $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
| | | |     2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
| | | |     $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
| | | |     2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
| | | |
| | | |     After:
| | | |
| | | |     $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
| | | |     2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
| | | |     $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
| | | |     RTNETLINK answers: Network is unreachable
| | | |
| | | |     The problem stems from the fact that the necessary route lookup
| | | |     flags are not set based on these parameters.
| | | |
| | | |     Instead of duplicating the same logic for fibmatch, we can simply
| | | |     resolve the original route from its copy and dump it instead.
| | | |
| | | |     Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
| | | |     Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | |     Acked-by: David Ahern <dsahern@gmail.com>
| | | |     Signed-off-by: David S. Miller <davem@davemloft.net>