summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2016-03-2814-122/+170
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for you net tree, they are: 1) There was a race condition between parallel save/swap and delete, which resulted a kernel crash due to the increase ref for save, swap, wrong ref decrease operations. Reported and fixed by Vishwanath Pai. 2) OVS should call into CT NAT for packets of new expected connections only when the conntrack state is persisted with the 'commit' option to the OVS CT action. From Jarno Rajahalme. 3) Resolve kconfig dependencies with new OVS NAT support. From Arnd Bergmann. 4) Early validation of entry->target_offset to make sure it doesn't take us out from the blob, from Florian Westphal. 5) Again early validation of entry->next_offset to make sure it doesn't take out from the blob, also from Florian. 6) Check that entry->target_offset is always of of sizeof(struct xt_entry) for unconditional entries, when checking both from check_underflow() and when checking for loops in mark_source_chains(), again from Florian. 7) Fix inconsistent behaviour in nfnetlink_queue when NFQA_CFG_F_FAIL_OPEN is set and netlink_unicast() fails due to buffer overrun, we have to reinject the packet as the user expects. 8) Enforce nul-terminated table names from getsockopt GET_ENTRIES requests. 9) Don't assume skb->sk is set from nft_bridge_reject and synproxy, this fixes a recent update of the code to namespaceify ip_default_ttl, patch from Liping Zhang. This batch comes with four patches to validate x_tables blobs coming from userspace. CONFIG_USERNS exposes the x_tables interface to unpriviledged users and to be honest this interface never received the attention for this move away from the CAP_NET_ADMIN domain. Florian is working on another round with more patches with more sanity checks, so expect a bit more Netfilter fixes in this development cycle than usual. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: ipv4: fix NULL dereferenceLiping Zhang2016-03-282-36/+38
| | | | | | | | | | | | | | | | | | | | Commit fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob") use sock_net(skb->sk) to get the net namespace, but we can't assume that sk_buff->sk is always exist, so when it is NULL, oops will happen. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Reviewed-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: x_tables: enforce nul-terminated table name from getsockopt ↵Pablo Neira Ayuso2016-03-284-0/+10
| | | | | | | | | | | | | | | | | | | | | | GET_ENTRIES Make sure the table names via getsockopt GET_ENTRIES is nul-terminated in ebtables and all the x_tables variants and their respective compat code. Uncovered by KASAN. Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nfnetlink_queue: honor NFQA_CFG_F_FAIL_OPEN when netlink unicast ↵Pablo Neira Ayuso2016-03-281-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fails When netlink unicast fails to deliver the message to userspace, we should also check if the NFQA_CFG_F_FAIL_OPEN flag is set so we reinject the packet back to the stack. I think the user expects no packet drops when this flag is set due to queueing to userspace errors, no matter if related to the internal queue or when sending the netlink message to userspace. The userspace application will still get the ENOBUFS error via recvmsg() so the user still knows that, with the current configuration that is in place, the userspace application is not consuming the messages at the pace that the kernel needs. Reported-by: "Yigal Reiss (yreiss)" <yreiss@cisco.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: "Yigal Reiss (yreiss)" <yreiss@cisco.com>
| * netfilter: x_tables: fix unconditional helperFlorian Westphal2016-03-283-33/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ben Hawkes says: In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it is possible for a user-supplied ipt_entry structure to have a large next_offset field. This field is not bounds checked prior to writing a counter value at the supplied offset. Problem is that mark_source_chains should not have been called -- the rule doesn't have a next entry, so its supposed to return an absolute verdict of either ACCEPT or DROP. However, the function conditional() doesn't work as the name implies. It only checks that the rule is using wildcard address matching. However, an unconditional rule must also not be using any matches (no -m args). The underflow validator only checked the addresses, therefore passing the 'unconditional absolute verdict' test, while mark_source_chains also tested for presence of matches, and thus proceeeded to the next (not-existent) rule. Unify this so that all the callers have same idea of 'unconditional rule'. Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: x_tables: make sure e->next_offset covers remaining blob sizeFlorian Westphal2016-03-283-6/+12
| | | | | | | | | | | | | | Otherwise this function may read data beyond the ruleset blob. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: x_tables: validate e->target_offset earlyFlorian Westphal2016-03-283-27/+24
| | | | | | | | | | | | | | | | | | We should check that e->target_offset is sane before mark_source_chains gets called since it will fetch the target entry for loop detection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * openvswitch: call only into reachable nf-nat codeArnd Bergmann2016-03-282-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The openvswitch code has gained support for calling into the nf-nat-ipv4/ipv6 modules, however those can be loadable modules in a configuration in which openvswitch is built-in, leading to link errors: net/built-in.o: In function `__ovs_ct_lookup': :(.text+0x2cc2c8): undefined reference to `nf_nat_icmp_reply_translation' :(.text+0x2cc66c): undefined reference to `nf_nat_icmpv6_reply_translation' The dependency on (!NF_NAT || NF_NAT) prevents similar issues, but NF_NAT is set to 'y' if any of the symbols selecting it are built-in, but the link error happens when any of them are modular. A second issue is that even if CONFIG_NF_NAT_IPV6 is built-in, CONFIG_NF_NAT_IPV4 might be completely disabled. This is unlikely to be useful in practice, but the driver currently only handles IPv6 being optional. This patch improves the Kconfig dependency so that openvswitch cannot be built-in if either of the two other symbols are set to 'm', and it replaces the incorrect #ifdef in ovs_ct_nat_execute() with two "if (IS_ENABLED())" checks that should catch all corner cases also make the code more readable. The same #ifdef exists ovs_ct_nat_to_attr(), where it does not cause a link error, but for consistency I'm changing it the same way. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * openvswitch: Fix checking for new expected connections.Jarno Rajahalme2016-03-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OVS should call into CT NAT for packets of new expected connections only when the conntrack state is persisted with the 'commit' option to the OVS CT action. The test for this condition is doubly wrong, as the CT status field is ANDed with the bit number (IPS_EXPECTED_BIT) rather than the mask (IPS_EXPECTED), and due to the wrong assumption that the expected bit would apply only for the first (i.e., 'new') packet of a connection, while in fact the expected bit remains on for the lifetime of an expected connection. The 'ctinfo' value IP_CT_RELATED derived from the ct status can be used instead, as it is only ever applicable to the 'new' packets of the expected connection. Fixes: 05752523e565 ('openvswitch: Interface with NAT.') Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ipset: fix race condition in ipset save, swap and deleteVishwanath Pai2016-03-285-8/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fix adds a new reference counter (ref_netlink) for the struct ip_set. The other reference counter (ref) can be swapped out by ip_set_swap and we need a separate counter to keep track of references for netlink events like dump. Using the same ref counter for dump causes a race condition which can be demonstrated by the following script: ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \ counters ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset save & ipset swap hash_ip3 hash_ip2 ipset destroy hash_ip3 /* will crash the machine */ Swap will exchange the values of ref so destroy will see ref = 0 instead of ref = 1. With this fix in place swap will not succeed because ipset save still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink). Both delete and swap will error out if ref_netlink != 0 on the set. Note: The changes to *_head functions is because previously we would increment ref whenever we called these functions, we don't do that anymore. Reviewed-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | net: macb: Only call GPIO functions if there is a valid GPIOCharles Keepax2016-03-281-3/+5
| | | | | | | | | | | | | | | | GPIOlib will print warning messages if we call GPIO functions without a valid GPIO. Change the code to avoid doing so. Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: hns: set-coalesce-usecs returns errno by dsaf.koLisheng2016-03-283-6/+8
| | | | | | | | | | | | | | | | | | | | It may fail to set coalesce usecs to HW, and Ethtool needs to know if it is successful to cfg the parameter or not. So it needs return the errno by dsaf.ko. Signed-off-by: Lisheng <lisheng011@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: hns: fixed the setting and getting overtime bugLisheng2016-03-284-154/+126
| | | | | | | | | | | | | | | | | | The overtime setting and getting REGs in HNS V2 is defferent from HNS V1. It needs to be distinguished between them if getting or setting the REGs. Signed-off-by: Lisheng <lisheng011@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | openvswitch: Use proper buffer size in nla_memcpyHaishuang Yan2016-03-281-1/+2
|/ | | | | | | | | | For the input parameter count, it's better to use the size of destination buffer size, as nla_memcpy would take into account the length of the source netlink attribute when a data is copied from an attribute. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ravb: fix software timestampingLino Sanfilippo2016-03-281-1/+1
| | | | | | | | | | | | | | In ravb_start_xmit dont call skb_tx_timestamp only when hardware timestamping is requested: in the latter case software timestamps are suppressed and thus the call of skb_tx_timestamp does not have any effect. Instead call skb_tx_timestamp unconditionally in ravb_start_xmit, since the function checks itself if software timestamping is required or should be skipped due to hardware timestamping. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sxgbe: fix error paths in sxgbe_platform_probe()Rasmus Villemoes2016-03-281-2/+2
| | | | | | | | | | | | We need to use post-decrement to ensure that irq_dispose_mapping is also called on priv->rxq[0]->irq_no; moreover, if one of the above for loops failed already at i==0 (so we reach one of these labels with that value of i), we'll enter an essentially infinite loop of out-of-bounds accesses. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Drivers: isdn: hisax: isac.c: Fix assignment and check into one expression.Cosmin-Gabriel Samoila2016-03-281-5/+10
| | | | | | | Fix variable assignment inside if statement. It is error-prone and hard to read. Signed-off-by: Cosmin-Gabriel Samoila <gabrielcsmo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Fix returned tc and hoplimit values for route with IPv6 encapsulationQuentin Armitage2016-03-281-2/+2
| | | | | | | | | | | | | For a route with IPv6 encapsulation, the traffic class and hop limit values are interchanged when returned to userspace by the kernel. For example, see below. ># ip route add 192.168.0.1 dev eth0.2 encap ip6 dst 0x50 tc 0x50 hoplimit 100 table 1000 ># ip route show table 1000 192.168.0.1 encap ip6 id 0 src :: dst fe83::1 hoplimit 80 tc 100 dev eth0.2 scope link Signed-off-by: Quentin Armitage <quentin@armitage.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* drivers/net/usb/plusb.c: Fix typoDiego Viola2016-03-281-1/+1
| | | | | Signed-off-by: Diego Viola <diego.viola@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: macb: remove BUG_ON() and reset the queue to handle RX errorsCyrille Pitchen2016-03-251-10/+49
| | | | | | | | | | | | | | | | | This patch removes two BUG_ON() used to notify about RX queue corruptions on macb (not gem) hardware without actually handling the error. The new code skips corrupted frames but still processes faultless frames. Then it resets the RX queue before restarting the reception from a clean state. This patch is a rework of an older patch proposed by Neil Armstrong: http://patchwork.ozlabs.org/patch/371525/ Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* qlge: Update version to 1.00.00.35Manish Chopra2016-03-251-1/+1
| | | | | | | | Just updating version as many fixes got accumulated over 1.00.00.34 Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: bcm7xxx: Add entries for Broadcom BCM7346 and BCM7362Jaedon Shin2016-03-252-0/+6
| | | | | | | | | Add PHY entries for the Broadcom BCM7346 and BCM7362 chips, these are 40nm generation Ethernet PHY. Fixes: 815717d1473e ("net: phy: bcm7xxx: Remove wildcard entries") Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf: add missing map_flags to bpf_map_show_fdinfoDaniel Borkmann2016-03-251-2/+4
| | | | | | | | | | | | | Add map_flags attribute to bpf_map_show_fdinfo(), so that tools like tc can check for them when loading objects from a pinned entry, e.g. if user intent wrt allocation (BPF_F_NO_PREALLOC) is different to the pinned object, it can bail out. Follow-up to 6c9059817432 ("bpf: pre-allocate hash map elements"), so that tc can still support this with v4.6. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netpoll: Fix extra refcount release in netpoll_cleanup()Bjorn Helgaas2016-03-251-1/+2
| | | | | | | | | | | | | | | | | | | netpoll_setup() does a dev_hold() on np->dev, the netpoll device. If it fails, it correctly does a dev_put() but leaves np->dev set. If we call netpoll_cleanup() after the failure, np->dev is still set so we do another dev_put(), which decrements the refcount an extra time. It's questionable to call netpoll_cleanup() after netpoll_setup() fails, but it can be difficult to find the problem, and we can easily avoid it in this case. The extra decrements can lead to hangs like this: unregister_netdevice: waiting for bond0 to become free. Usage count = -3 Set and clear np->dev at the points where we dev_hold() and dev_put() the device. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bcmgenet: fix skb_len in bcmgenet_xmit_single()Petri Gynther2016-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | skb_len needs to be skb_headlen(skb) in bcmgenet_xmit_single(). Fragmented skbs can have only Ethernet + IP + TCP headers (14+20+20=54 bytes) in the linear buffer, followed by the rest in fragments. Bumping skb_len to ETH_ZLEN would be incorrect for this case, as it would introduce garbage between TCP header and the fragment data. This also works with regular/non-fragmented small packets < ETH_ZLEN bytes. Successfully tested this on GENETv3 with 42-byte ARP frames. For testing, I used: ethtool -K eth0 tx-checksum-ipv4 off ethtool -K eth0 tx-checksum-ipv6 off echo 0 > /proc/sys/net/ipv4/tcp_timestamps Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Petri Gynther <pgynther@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bcmgenet: fix dev->stats.tx_bytes accountingPetri Gynther2016-03-242-4/+16
| | | | | | | | | | | | | | | | | | 1. Add bytes_compl local variable to __bcmgenet_tx_reclaim() to collect transmitted bytes. dev->stats updates can then be moved outside the while-loop. bytes_compl is also needed for future BQL support. 2. When bcmgenet device uses Tx checksum offload, each transmitted skb gets an extra 64-byte header prepended to it. Before this header is prepended to the skb, we need to save the skb "wire" length in GENET_CB(skb)->bytes_sent, so that proper Tx bytes accounting can be done in __bcmgenet_tx_reclaim(). 3. skb->len covers the entire length of skb, whether it is linear or fragmented. Thus, when we clean the fragments, do not increase transmitted bytes. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Petri Gynther <pgynther@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* switchdev: fix typo in comments/docNicolas Dichtel2016-03-242-2/+2
| | | | | | | Two minor typo. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: macb: replace macb_writel() call by queue_writel() to update queue ISRCyrille Pitchen2016-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | macb_interrupt() should not use macb_writel(bp, ISR, <value>) but only queue_writel(queue, ISR, <value>). There is one IRQ and one set of {ISR, IER, IDR, IMR} [1] registers per queue on gem hardware, though only queue0 is actually used for now to receive frames: other queues can already be used to transmit frames. The queue_readl() and queue_writel() helper macros are designed to access the relevant IRQ registers. [1] ISR: Interrupt Status Register IER: Interrupt Enable Register IDR: Interrupt Disable Register IMR: Interrupt Mask Register Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Fixes: bfbb92c44670 ("net: macb: Handle the RXUBR interrupt on all devices") Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'hns-fixes'David S. Miller2016-03-246-36/+41
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yisen Zhuang says: ==================== net: hns: fix some bugs in HNS driver Here are some bug fixed patches for HNS driver. They are: >from Kejian, fix for the warning of passing zero to 'PTR_ERR' >from qianqian, four fixes for inappropriate operation in hns driver >from Sheng, one fix for optimization of irq proccess in hns driver, and one fix for hilink status for hns driver. For more details, please see individual patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: hns: bug fix about getting hilink status for HNS v2Sheng Li2016-03-242-18/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hilink status reg in HNS V2 is different from HNS v1. In HNS V2, It distinguishes differnt lane status according to the bit-field of the reg. As is shown below: [0:0] ---> lane0 [1:1] ---> lane1 ... But the current driver reads the reg to get the hilink status ONLY concidering HNS V1 situation. Here is a patch to support both of them. Signed-off-by: Sheng Li <lisheng011@huawei.com> Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: hns: fix warning of passing zero to 'PTR_ERR'Kejian Yan2016-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a misuse of PTR as shown below: ae_node = (void *)of_parse_phandle(dev->of_node, "ae-handle", 0); if (IS_ERR_OR_NULL(ae_node)) { ret = PTR_ERR(ae_node); dev_err(dev, "not find ae-handle\n"); goto out_read_prop_fail; } if the ae_node is NULL, PTR_ERR(ae_node) means it returns success. And the return value should be -ENODEV. Signed-off-by: Kejian Yan <yankejian@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: hns: optimizate irq proccess for HNS V2Sheng Li2016-03-241-4/+5
| | | | | | | | | | | | | | | | | | | | | | In hns V1, common_poll should check and clean fbd pkts, because it can not pend irq to clean them if there is no new pkt comes in. But hns V2 hw fixes this bug, and will pend irq itself to do this. So, for hns V2, we set ring_data->fini_process to NULL. Signed-off-by: Sheng Li <lisheng011@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: hns: remove useless variable assignment and commentQianqian Xie2016-03-241-4/+1
| | | | | | | | | | | | | | | | | | | | The variable head in hns_nic_tx_fini_pro has read a value, but it is obviously no use. The patch will fix it. And the comment is nothing to do with the routine, so it has to be removed Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: hns: bug fix for return valuesQianqian Xie2016-03-241-2/+2
| | | | | | | | | | | | | | | | | | The return values in the first two functions mdiobus_write() are ignored. The patch will fix it. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: hns: optimizate fmt of snprintf()Qianqian Xie2016-03-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | It misses string format in function snprintf(), as below: snprintf(buff, ETH_GSTRING_LEN, g_gmac_stats_string[i].desc); It needs to add "%s" to fix it as below: snprintf(buff, ETH_GSTRING_LEN, "%s", g_gmac_stats_string[i].desc); Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: hns: fix a bug for cycle indexQianqian Xie2016-03-241-6/+6
|/ | | | | | | | | The cycle index should be varied while the variable j is a fixed value. The patch will fix this bug. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xfrm: Fix crash observed during device unregistration and decryptionsubashab@codeaurora.org2016-03-241-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A crash is observed when a decrypted packet is processed in receive path. get_rps_cpus() tries to dereference the skb->dev fields but it appears that the device is freed from the poison pattern. [<ffffffc000af58ec>] get_rps_cpu+0x94/0x2f0 [<ffffffc000af5f94>] netif_rx_internal+0x140/0x1cc [<ffffffc000af6094>] netif_rx+0x74/0x94 [<ffffffc000bc0b6c>] xfrm_input+0x754/0x7d0 [<ffffffc000bc0bf8>] xfrm_input_resume+0x10/0x1c [<ffffffc000ba6eb8>] esp_input_done+0x20/0x30 [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc [<ffffffc0000b7324>] worker_thread+0x2f8/0x418 [<ffffffc0000bb40c>] kthread+0xe0/0xec -013|get_rps_cpu( | dev = 0xFFFFFFC08B688000, | skb = 0xFFFFFFC0C76AAC00 -> ( | dev = 0xFFFFFFC08B688000 -> ( | name = "...................................................... | name_hlist = (next = 0xAAAAAAAAAAAAAAAA, pprev = 0xAAAAAAAAAAA Following are the sequence of events observed - - Encrypted packet in receive path from netdevice is queued - Encrypted packet queued for decryption (asynchronous) - Netdevice brought down and freed - Packet is decrypted and returned through callback in esp_input_done - Packet is queued again for process in network stack using netif_rx Since the device appears to have been freed, the dereference of skb->dev in get_rps_cpus() leads to an unhandled page fault exception. Fix this by holding on to device reference when queueing packets asynchronously and releasing the reference on call back return. v2: Make the change generic to xfrm as mentioned by Steffen and update the title to xfrm Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jerome Stanislaus <jeromes@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'trace-v4.6' of ↵Linus Torvalds2016-03-2417-235/+280
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Nothing major this round. Mostly small clean ups and fixes. Some visible changes: - A new flag was added to distinguish traces done in NMI context. - Preempt tracer now shows functions where preemption is disabled but interrupts are still enabled. Other notes: - Updates were done to function tracing to allow better performance with perf. - Infrastructure code has been added to allow for a new histogram feature for recording live trace event histograms that can be configured by simple user commands. The feature itself was just finished, but needs a round in linux-next before being pulled. This only includes some infrastructure changes that will be needed" * tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits) tracing: Record and show NMI state tracing: Fix trace_printk() to print when not using bprintk() tracing: Remove redundant reset per-CPU buff in irqsoff tracer x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c tracing: Fix crash from reading trace_pipe with sendfile tracing: Have preempt(irqs)off trace preempt disabled functions tracing: Fix return while holding a lock in register_tracer() ftrace: Use kasprintf() in ftrace_profile_tracefs() ftrace: Update dynamic ftrace calls only if necessary ftrace: Make ftrace_hash_rec_enable return update bool tracing: Fix typoes in code comment and printk in trace_nop.c tracing, writeback: Replace cgroup path to cgroup ino tracing: Use flags instead of bool in trigger structure tracing: Add an unreg_all() callback to trigger commands tracing: Add needs_rec flag to event triggers tracing: Add a per-event-trigger 'paused' field tracing: Add get_syscall_name() tracing: Add event record param to trigger_ops.func() tracing: Make event trigger functions available tracing: Make ftrace_event_field checking functions available ...
| * tracing: Record and show NMI statePeter Zijlstra2016-03-223-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | The latency tracer format has a nice column to indicate IRQ state, but this is not able to tell us about NMI state. When tracing perf interrupt handlers (which often run in NMI context) it is very useful to see how the events nest. Link: http://lkml.kernel.org/r/20160318153022.105068893@infradead.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Fix trace_printk() to print when not using bprintk()Steven Rostedt (Red Hat)2016-03-222-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trace_printk() code will allocate extra buffers if the compile detects that a trace_printk() is used. To do this, the format of the trace_printk() is saved to the __trace_printk_fmt section, and if that section is bigger than zero, the buffers are allocated (along with a message that this has happened). If trace_printk() uses a format that is not a constant, and thus something not guaranteed to be around when the print happens, the compiler optimizes the fmt out, as it is not used, and the __trace_printk_fmt section is not filled. This means the kernel will not allocate the special buffers needed for the trace_printk() and the trace_printk() will not write anything to the tracing buffer. Adding a "__used" to the variable in the __trace_printk_fmt section will keep it around, even though it is set to NULL. This will keep the string from being printed in the debugfs/tracing/printk_formats section as it is not needed. Reported-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()" Cc: stable@vger.kernel.org # v3.5+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Remove redundant reset per-CPU buff in irqsoff tracerDmitry Safonov2016-03-181-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no reason to do it twice: from commit b6f11df26fdc28 ("trace: Call tracing_reset_online_cpus before tracer->init()") resetting of per-CPU buffers done before tracer->init() call. tracer->init() calls {irqs,preempt,preemptirqs}off_tracer_init() and it calls __irqsoff_tracer_init(), which resets per-CPU ringbuffer second time. It's slowpath, but anyway. Link: http://lkml.kernel.org/r/1445278226-16187-1-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.cLi Bin2016-03-181-1/+1
| | | | | | | | | | | | | | | | Fix the misleading comment for arch/x86/kernel/ftrace.c that it had used nop instead of jmp. Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Fix crash from reading trace_pipe with sendfileSteven Rostedt (Red Hat)2016-03-181-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If tracing contains data and the trace_pipe file is read with sendfile(), then it can trigger a NULL pointer dereference and various BUG_ON within the VM code. There's a patch to fix this in the splice_to_pipe() code, but it's also a good idea to not let that happen from trace_pipe either. Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 2.6.30+ Reported-by: Rabin Vincent <rabin.vincent@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Have preempt(irqs)off trace preempt disabled functionsSteven Rostedt (Red Hat)2016-03-181-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Joel Fernandes reported that the function tracing of preempt disabled sections was not being reported when running either the preemptirqsoff or preemptoff tracers. This was due to the fact that the function tracer callback for those tracers checked if irqs were disabled before tracing. But this fails when we want to trace preempt off locations as well. Joel explained that he wanted to see funcitons where interrupts are enabled but preemption was disabled. The expected output he wanted: <...>-2265 1d.h1 3419us : preempt_count_sub <-irq_exit <...>-2265 1d..1 3419us : __do_softirq <-irq_exit <...>-2265 1d..1 3419us : msecs_to_jiffies <-__do_softirq <...>-2265 1d..1 3420us : irqtime_account_irq <-__do_softirq <...>-2265 1d..1 3420us : __local_bh_disable_ip <-__do_softirq <...>-2265 1..s1 3421us : run_timer_softirq <-__do_softirq <...>-2265 1..s1 3421us : hrtimer_run_pending <-run_timer_softirq <...>-2265 1..s1 3421us : _raw_spin_lock_irq <-run_timer_softirq <...>-2265 1d.s1 3422us : preempt_count_add <-_raw_spin_lock_irq <...>-2265 1d.s2 3422us : _raw_spin_unlock_irq <-run_timer_softirq <...>-2265 1..s2 3422us : preempt_count_sub <-_raw_spin_unlock_irq <...>-2265 1..s1 3423us : rcu_bh_qs <-__do_softirq <...>-2265 1d.s1 3423us : irqtime_account_irq <-__do_softirq <...>-2265 1d.s1 3423us : __local_bh_enable <-__do_softirq There's a comment saying that the irq disabled check is because there's a possible race that tracing_cpu may be set when the function is executed. But I don't remember that race. For now, I added a check for preemption being enabled too to not record the function, as there would be no race if that was the case. I need to re-investigate this, as I'm now thinking that the tracing_cpu will always be correct. But no harm in keeping the check for now, except for the slight performance hit. Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com Fixes: 5e6d2b9cfa3a "tracing: Use one prologue for the preempt irqs off tracer function tracers" Cc: stable@vget.kernel.org # 2.6.37+ Reported-by: Joel Fernandes <agnel.joel@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Fix return while holding a lock in register_tracer()Chunyu Hu2016-03-181-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d39cdd2036a6 ("tracing: Make tracer_flags use the right set_flag callback") introduces a potential mutex deadlock issue, as it forgets to free the mutex when allocaing the tracer_flags gets fail. The issue was found by Dan Carpenter through Smatch static code check tool. Link: http://lkml.kernel.org/r/1457958941-30265-1-git-send-email-chuhu@redhat.com Fixes: d39cdd2036a6 ("tracing: Make tracer_flags use the right set_flag callback") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace: Use kasprintf() in ftrace_profile_tracefs()Geliang Tang2016-03-181-3/+1
| | | | | | | | | | | | | | | | | | | | Use kasprintf() instead of kmalloc() and snprintf(). Link: http://lkml.kernel.org/r/135a7bc36e51fd9eaa57124dd2140285b771f738.1458050835.git.geliangtang@163.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace: Update dynamic ftrace calls only if necessaryJiri Olsa2016-03-181-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently dynamic ftrace calls are updated any time the ftrace_ops is un/registered. If we do this update only when it's needed, we save lot of time for perf system wide ftrace function sampling/counting. The reason is that for system wide sampling/counting, perf creates event for each cpu in the system. Each event then registers separate copy of ftrace_ops, which ends up in FTRACE_UPDATE_CALLS updates. On servers with many cpus that means serious stall (240 cpus server): Counting: # time ./perf stat -e ftrace:function -a sleep 1 Performance counter stats for 'system wide': 370,663 ftrace:function 1.401427505 seconds time elapsed real 3m51.743s user 0m0.023s sys 3m48.569s Sampling: # time ./perf record -e ftrace:function -a sleep 1 [ perf record: Woken up 0 times to write data ] Warning: Processed 141200 events and lost 5 chunks! [ perf record: Captured and wrote 10.703 MB perf.data (135950 samples) ] real 2m31.429s user 0m0.213s sys 2m29.494s There's no reason to do the FTRACE_UPDATE_CALLS update for each event in perf case, because all the ftrace_ops always share the same filter, so the updated calls are always the same. It's required that only first ftrace_ops registration does the FTRACE_UPDATE_CALLS update (also sometimes the second if the first one used the trampoline), but the rest can be only cheaply linked into the ftrace_ops list. Counting: # time ./perf stat -e ftrace:function -a sleep 1 Performance counter stats for 'system wide': 398,571 ftrace:function 1.377503733 seconds time elapsed real 0m2.787s user 0m0.005s sys 0m1.883s Sampling: # time ./perf record -e ftrace:function -a sleep 1 [ perf record: Woken up 0 times to write data ] Warning: Processed 261730 events and lost 9 chunks! [ perf record: Captured and wrote 19.907 MB perf.data (256293 samples) ] real 1m31.948s user 0m0.309s sys 1m32.051s Link: http://lkml.kernel.org/r/1458138873-1553-6-git-send-email-jolsa@kernel.org Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace: Make ftrace_hash_rec_enable return update boolJiri Olsa2016-03-181-10/+17
| | | | | | | | | | | | | | | | | | | | | | | | Change __ftrace_hash_rec_update to return true in case we need to update dynamic ftrace call records. It return false in case no update is needed. Link: http://lkml.kernel.org/r/1458138873-1553-5-git-send-email-jolsa@kernel.org Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Fix typoes in code comment and printk in trace_nop.cChunyu Hu2016-03-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | echo nop > /sys/kernel/debug/tracing/options/current_tracer echo 1 > /sys/kernel/debug/tracing/options/test_nop_accept echo 0 > /sys/kernel/debug/tracing/options/test_nop_accept echo 1 > /sys/kernel/debug/tracing/options/test_nop_refuse Before the fix, the dmesg is a bit ugly since a align issue. [ 191.973081] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result [ 195.156942] nop_test_refuse flag set to 1: we refuse.Now cat trace_options to see the result After the fix, the dmesg will show aligned log for nop_test_refuse and nop_test_accept. [ 2718.032413] nop_test_refuse flag set to 1: we refuse. Now cat trace_options to see the result [ 2734.253360] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result Link: http://lkml.kernel.org/r/1457444222-8654-2-git-send-email-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing, writeback: Replace cgroup path to cgroup inoYang Shi2016-03-081-76/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5634cc2aa9aebc77bc862992e7805469dcf83dac ("writeback: update writeback tracepoints to report cgroup") made writeback tracepoints print out cgroup path when CGROUP_WRITEBACK is enabled, but it may trigger the below bug on -rt kernel since kernfs_path and kernfs_path_len are called by tracepoints, which acquire spin lock that is sleepable on -rt kernel. BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:930 in_atomic(): 1, irqs_disabled(): 0, pid: 625, name: kworker/u16:3 INFO: lockdep is turned off. Preemption disabled at:[<ffffffc000374a5c>] wb_writeback+0xec/0x830 CPU: 7 PID: 625 Comm: kworker/u16:3 Not tainted 4.4.1-rt5 #20 Hardware name: Freescale Layerscape 2085a RDB Board (DT) Workqueue: writeback wb_workfn (flush-7:0) Call trace: [<ffffffc00008d708>] dump_backtrace+0x0/0x200 [<ffffffc00008d92c>] show_stack+0x24/0x30 [<ffffffc0007b0f40>] dump_stack+0x88/0xa8 [<ffffffc000127d74>] ___might_sleep+0x2ec/0x300 [<ffffffc000d5d550>] rt_spin_lock+0x38/0xb8 [<ffffffc0003e0548>] kernfs_path_len+0x30/0x90 [<ffffffc00036b360>] trace_event_raw_event_writeback_work_class+0xe8/0x2e8 [<ffffffc000374f90>] wb_writeback+0x620/0x830 [<ffffffc000376224>] wb_workfn+0x61c/0x950 [<ffffffc000110adc>] process_one_work+0x3ac/0xb30 [<ffffffc0001112fc>] worker_thread+0x9c/0x7a8 [<ffffffc00011a9e8>] kthread+0x190/0x1b0 [<ffffffc000086ca0>] ret_from_fork+0x10/0x30 With unlocked kernfs_* functions, synchronize_sched() has to be called in kernfs_rename which could be called in syscall path, but it is problematic. So, print out cgroup ino instead of path name, which could be converted to path name by userland. Withouth CGROUP_WRITEBACK enabled, it just prints out root dir. But, root dir ino vary from different filesystems, so printing out -1U to indicate an invalid cgroup ino. Link: http://lkml.kernel.org/r/1456996137-8354-1-git-send-email-yang.shi@linaro.org Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>