summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* bpf: sock_map fixes for !CONFIG_BPF_SYSCALL and !STREAM_PARSERJohn Fastabend2017-08-174-2/+16
| | | | | | | | | | | | | | | | | Resolve issues with !CONFIG_BPF_SYSCALL and !STREAM_PARSER net/core/filter.c: In function ‘do_sk_redirect_map’: net/core/filter.c:1881:3: error: implicit declaration of function ‘__sock_map_lookup_elem’ [-Werror=implicit-function-declaration] sk = __sock_map_lookup_elem(ri->map, ri->ifindex); ^ net/core/filter.c:1881:6: warning: assignment makes pointer from integer without a cast [enabled by default] sk = __sock_map_lookup_elem(ri->map, ri->ifindex); Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf: sockmap state change warning fixJohn Fastabend2017-08-171-0/+3
| | | | | | | | | | | | | | | psock will uninitialized in default case we need to do the same psock lookup and check as in other branch. Fixes compile warning below. kernel/bpf/sockmap.c: In function ‘smap_state_change’: kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized] struct smap_psock *psock; Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Reported-by: David Miller <davem@davemloft.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: cls_flower: fix ndo_setup_tc type for stats callJiri Pirko2017-08-161-1/+1
| | | | | | | | | | I made a stupid mistake using TC_CLSFLOWER_STATS instead of TC_SETUP_CLSFLOWER. Funny thing is that both are defined as "2" so it actually did not cause any harm. Anyway, fixing it now. Fixes: 2572ac53c46f ("net: sched: make type an argument for ndo_setup_tc") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tun: make tun_build_skb() thread safeEric Dumazet2017-08-161-6/+1
| | | | | | | | | | | | | tun_build_skb() is not thread safe since it uses per queue page frag, this will break things when multiple threads are sending through same queue. Switch to use per-thread generator (no lock involved). Fixes: 66ccbc9c87c2 ("tap: use build_skb() for small packet") Tested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4: fix spelling mistake: "availible" -> "available"Colin Ian King2017-08-163-16/+16
| | | | | | | | | Trivial fix to spelling mistakes in the mlx4 driver Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* qdisc: add tracepoint qdisc:qdisc_dequeue for dequeued SKBsJesper Dangaard Brouer2017-08-163-2/+57
| | | | | | | | | | | | | | | | | | | | | The main purpose of this tracepoint is to monitor bulk dequeue in the network qdisc layer, as it cannot be deducted from the existing qdisc stats. The txq_state can be used for determining the reason for zero packet dequeues, see enum netdev_queue_state_t. Notice all packets doesn't necessary activate this tracepoint. As qdiscs with flag TCQ_F_CAN_BYPASS, can directly invoke sch_direct_xmit() when qdisc_qlen is zero. Remember that perf record supports filters like: perf record -e qdisc:qdisc_dequeue \ --filter 'ifindex == 4 && (packets > 1 || txq_state > 0)' Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'nfp-process-MTU-updates-from-firmware-flower-app'David S. Miller2017-08-164-5/+50
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simon Horman says: ==================== nfp: process MTU updates from firmware flower app The first patch of this series moves processing of control messages from a BH handler to a workqueue. That change makes it safe to process MTU updates from the firmware which is added by the second patch of this series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * nfp: process MTU updates from firmware flower appSimon Horman2017-08-161-2/+9
| | | | | | | | | | | | | | | | | | Now that control message processing occurs in a workqueue rather than a BH handler MTU updates received from the firmware may be safely processed. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * nfp: process control messages in workqueue in flower appSimon Horman2017-08-164-3/+41
|/ | | | | | | | | | | Processing of control messages is not time-critical and future processing of some messages will require taking the RTNL which is not possible in a BH handler. It seems simplest to move all control message processing to a workqueue. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf: devmap: remove unnecessary value size checkJohn Fastabend2017-08-161-6/+0
| | | | | | | | | | | | | In the devmap alloc map logic we check to ensure that the sizeof the values are not greater than KMALLOC_MAX_SIZE. But, in the dev map case we ensure the value size is 4bytes earlier in the function because all values should be netdev ifindex values. The second check is harmless but is not needed so remove it. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'bpf-sockmap'David S. Miller2017-08-1629-59/+2316
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | John Fastabend says: ==================== BPF: sockmap and sk redirect support This series implements a sockmap and socket redirect helper for BPF using a model similar to XDP netdev redirect. A sockmap is a BPF map type that holds references to sock structs. Then with a new sk redirect bpf helper BPF programs can use the map to redirect skbs between sockets, bpf_sk_redirect_map(map, key, flags) Finally, we need a call site to attach our BPF logic to do socket redirects. We added hooks to recv_sock using the existing strparser infrastructure to do this. The call site is added via the BPF attach map call. To enable users to use this infrastructure a new BPF program BPF_PROG_TYPE_SK_SKB is created that allows users to reference sock details, such as port and ip address fields, to build useful socket layer program. The sockmap datapath is as follows, recv -> strparser -> verdict/action where this series implements the drop and redirect actions. Additional, actions can be added as needed. A sample program is provided to illustrate how a sockmap can be integrated with cgroups and used to add/delete sockets in a sockmap. The program is simple but should show many of the key ideas. To test this work test_maps in selftests/bpf was leveraged. We added a set of tests to add sockets and do send/recv ops on the sockets to ensure correct behavior. Additionally, the selftests tests a series of negative test cases. We can expand on this in the future. I also have a basic test program I use with iperf/netperf clients that could be sent as an additional sample if folks want this. It needs a bit of cleanup to send to the list and wasn't included in this series. For people who prefer git over pulling patches out of their mail editor I've posted the code here, https://github.com/jrfastab/linux-kernel-xdp/tree/sockmap For some background information on the genesis of this work it might be helpful to review these slides from netconf 2017 by Thomas Graf, http://vger.kernel.org/netconf2017.html https://docs.google.com/a/covalent.io/presentation/d/1dwSKSBGpUHD3WO5xxzZWj8awV_-xL-oYhvqQMOBhhtk/edit?usp=sharing Thanks to Daniel Borkmann for reviewing and providing initial feedback. ==================== Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: selftests add sockmap testsJohn Fastabend2017-08-167-39/+443
| | | | | | | | | | | | | | | | | | | | This generates a set of sockets, attaches BPF programs, and sends some simple traffic using basic send/recv pattern. Additionally, we do a bunch of negative tests to ensure adding/removing socks out of the sockmap fail correctly. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: selftests: add tests for new __sk_buff membersJohn Fastabend2017-08-161-0/+152
| | | | | | | | | | | | | | | | This adds tests to access new __sk_buff members from sk skb program type. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: sockmap sample programJohn Fastabend2017-08-168-6/+547
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This program binds a program to a cgroup and then matches hard coded IP addresses and adds these to a sockmap. This will receive messages from the backend and send them to the client. client:X <---> frontend:10000 client:X <---> backend:10001 To keep things simple this is only designed for 1:1 connections using hard coded values. A more complete example would allow many backends and clients. To run, # sockmap <cgroup2_dir> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: add access to sock fields and pkt data from sk_skb programsJohn Fastabend2017-08-163-0/+179
| | | | | | | | | | Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: sockmap with sk redirect supportJohn Fastabend2017-08-169-5/+940
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently we added a new map type called dev map used to forward XDP packets between ports (6093ec2dc313). This patches introduces a similar notion for sockets. A sockmap allows users to add participating sockets to a map. When sockets are added to the map enough context is stored with the map entry to use the entry with a new helper bpf_sk_redirect_map(map, key, flags) This helper (analogous to bpf_redirect_map in XDP) is given the map and an entry in the map. When called from a sockmap program, discussed below, the skb will be sent on the socket using skb_send_sock(). With the above we need a bpf program to call the helper from that will then implement the send logic. The initial site implemented in this series is the recv_sock hook. For this to work we implemented a map attach command to add attributes to a map. In sockmap we add two programs a parse program and a verdict program. The parse program uses strparser to build messages and pass them to the verdict program. The parse programs use the normal strparser semantics. The verdict program is of type SK_SKB. The verdict program returns a verdict SK_DROP, or SK_REDIRECT for now. Additional actions may be added later. When SK_REDIRECT is returned, expected when bpf program uses bpf_sk_redirect_map(), the sockmap logic will consult per cpu variables set by the helper routine and pull the sock entry out of the sock map. This pattern follows the existing redirect logic in cls and xdp programs. This gives the flow, recv_sock -> str_parser (parse_prog) -> verdict_prog -> skb_send_sock \ -> kfree_skb As an example use case a message based load balancer may use specific logic in the verdict program to select the sock to send on. Sample programs are provided in future patches that hopefully illustrate the user interfaces. Also selftests are in follow-on patches. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: export bpf_prog_inc_not_zeroJohn Fastabend2017-08-162-1/+9
| | | | | | | | | | | | | | | | bpf_prog_inc_not_zero will be used by upcoming sockmap patches this patch simply exports it so we can pull it in. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: introduce new program type for skbs on socketsJohn Fastabend2017-08-163-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A class of programs, run from strparser and soon from a new map type called sock map, are used with skb as the context but on established sockets. By creating a specific program type for these we can use bpf helpers that expect full sockets and get the verifier to ensure these helpers are not used out of context. The new type is BPF_PROG_TYPE_SK_SKB. This patch introduces the infrastructure and type. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: fixes for skb_send_sockJohn Fastabend2017-08-162-2/+2
| | | | | | | | | | | | | | | | | | | | | | A couple fixes to new skb_send_sock infrastructure. However, no users currently exist for this code (adding user in next handful of patches) so it should not be possible to trigger a panic with existing in-kernel code. Fixes: 306b13eb3cf9 ("proto_ops: Add locked held versions of sendmsg and sendpage") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: add sendmsg_locked and sendpage_locked to af_inet6John Fastabend2017-08-161-0/+2
| | | | | | | | | | | | | | | | To complete the sendmsg_locked and sendpage_locked implementation add the hooks for af_inet6 as well. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: early init support for strparserJohn Fastabend2017-08-161-6/+4
|/ | | | | | | | It is useful to allow strparser to init sockets before the read_sock callback has been established. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: 3c509: constify pnp_device_idArvind Yadav2017-08-161-1/+1
| | | | | | | | | pnp_device_id are not supposed to change at runtime. All functions working with pnp_device_id provided by <linux/pnp.h> work with const pnp_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* liquidio: update VF's netdev->max_mtu if there's a change in PF's MTUVeerasenareddy Burru2017-08-162-6/+6
| | | | | | | | | | | | A VF's MTU is capped at the parent PF's MTU. So if there's a change in the PF's MTU, then update the VF's netdev->max_mtu. Also remove duplicate log messages for MTU change. Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'net-sizeof-cleanups'David S. Miller2017-08-1619-60/+60
|\ | | | | | | | | | | | | | | | | | | | | | | | | Stephen Hemminger says: ==================== net: various sizeof cleanups Noticed some places that were using sizeof as an operator. This is legal C but is not the convention used in the kernel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlx4: sizeof style usagestephen hemminger2017-08-1615-56/+56
| | | | | | | | | | | | | | | | | | | | | | The kernel coding style is to treat sizeof as a function (ie. with parenthesis) not as an operator. Also use kcalloc and kmalloc_array Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * skge: add paren around sizeof argstephen hemminger2017-08-161-1/+1
| | | | | | | | | | Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * virtio: put paren around sizeofstephen hemminger2017-08-161-1/+1
| | | | | | | | | | | | | | Kernel coding style is to put paren around operand of sizeof. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tun/tap: use paren's with sizeofstephen hemminger2017-08-162-2/+2
|/ | | | | | | | Although sizeof is an operator in C. The kernel coding style convention is to always use it like a function and add parenthesis. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched/hfsc: opencode trivial set_active() and set_passive()Konstantin Khlebnikov2017-08-161-29/+16
| | | | | | | Any move comment abount update_vf() into right place. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: call qlen_notify only if child qdisc is emptyKonstantin Khlebnikov2017-08-166-13/+15
| | | | | | | | | | | | | This callback is used for deactivating class in parent qdisc. This is cheaper to test queue length right here. Also this allows to catch draining screwed backlog and prevent second deactivation of already inactive parent class which will crash kernel for sure. Kernel with print warning at destruction of child qdisc where no packets but backlog is not zero. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'liquidio-adding-support-for-ethtool-set-channels-feature'David S. Miller2017-08-167-451/+529
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | Intiyaz Basha says: ==================== liquidio: adding support for ethtool --set-channels feature Code reorganization is required for adding ethtool --set-channels feature. First three patches are for code reorganization. The last patch is for adding this feature. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * liquidio: added support for ethtool --set-channels featureIntiyaz Basha2017-08-166-50/+226
| | | | | | | | | | | | | | | | | | adding support for ethtool --set-channels feature Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * liquidio: moved octeon_setup_interrupt to lio_core.cIntiyaz Basha2017-08-164-263/+202
| | | | | | | | | | | | | | | | | | Moving common octeon_setup_interrupt to lio_core.c Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * liquidio: moved liquidio_legacy_intr_handler to lio_core.cIntiyaz Basha2017-08-163-56/+57
| | | | | | | | | | | | | | | | | | Moving liquidio_legacy_intr_handler to lio_core.c Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * liquidio: moved liquidio_msix_intr_handler to lio_core.cIntiyaz Basha2017-08-165-88/+50
|/ | | | | | | | | Moving common liquidio_msix_intr_handler to lio_core.c Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-08-16228-904/+2154
|\
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2017-08-1658-159/+512
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix TCP checksum offload handling in iwlwifi driver, from Emmanuel Grumbach. 2) In ksz DSA tagging code, free SKB if skb_put_padto() fails. From Vivien Didelot. 3) Fix two regressions with bonding on wireless, from Andreas Born. 4) Fix build when busypoll is disabled, from Daniel Borkmann. 5) Fix copy_linear_skb() wrt. SO_PEEK_OFF, from Eric Dumazet. 6) Set SKB cached route properly in inet_rtm_getroute(), from Florian Westphal. 7) Fix PCI-E relaxed ordering handling in cxgb4 driver, from Ding Tianhong. 8) Fix module refcnt leak in ULP code, from Sabrina Dubroca. 9) Fix use of GFP_KERNEL in atomic contexts in AF_KEY code, from Eric Dumazet. 10) Need to purge socket write queue in dccp_destroy_sock(), also from Eric Dumazet. 11) Make bpf_trace_printk() work properly on 32-bit architectures, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits) bpf: fix bpf_trace_printk on 32 bit archs PCI: fix oops when try to find Root Port for a PCI device sfc: don't try and read ef10 data on non-ef10 NIC net_sched: remove warning from qdisc_hash_add net_sched/sfq: update hierarchical backlog when drop packet net_sched: reset pointers to tcf blocks in classful qdiscs' destructors ipv4: fix NULL dereference in free_fib_info_rcu() net: Fix a typo in comment about sock flags. ipv6: fix NULL dereference in ip6_route_dev_notify() tcp: fix possible deadlock in TCP stack vs BPF filter dccp: purge write queue in dccp_destroy_sock() udp: fix linear skb reception with PEEK_OFF ipv6: release rt6->rt6i_idev properly during ifdown af_key: do not use GFP_KERNEL in atomic contexts tcp: ulp: avoid module refcnt leak in tcp_set_ulp net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag PCI: Disable Relaxed Ordering Attributes for AMD A1100 PCI: Disable Relaxed Ordering for some Intel processors PCI: Disable PCIe Relaxed Ordering if unsupported ...
| | * bpf: fix bpf_trace_printk on 32 bit archsDaniel Borkmann2017-08-161-4/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | James reported that on MIPS32 bpf_trace_printk() is currently broken while MIPS64 works fine: bpf_trace_printk() uses conditional operators to attempt to pass different types to __trace_printk() depending on the format operators. This doesn't work as intended on 32-bit architectures where u32 and long are passed differently to u64, since the result of C conditional operators follows the "usual arithmetic conversions" rules, such that the values passed to __trace_printk() will always be u64 [causing issues later in the va_list handling for vscnprintf()]. For example the samples/bpf/tracex5 test printed lines like below on MIPS32, where the fd and buf have come from the u64 fd argument, and the size from the buf argument: [...] 1180.941542: 0x00000001: write(fd=1, buf= (null), size=6258688) Instead of this: [...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512) One way to get it working is to expand various combinations of argument types into 8 different combinations for 32 bit and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64 as well that it resolves the issue. Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()") Reported-by: James Hogan <james.hogan@imgtec.com> Tested-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * PCI: fix oops when try to find Root Port for a PCI devicedingtianhong2017-08-161-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric report a oops when booting the system after applying the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."): [ 4.241029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [ 4.247001] IP: pci_find_pcie_root_port+0x62/0x80 [ 4.253011] PGD 0 [ 4.253011] P4D 0 [ 4.253011] [ 4.258013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 4.262015] Modules linked in: [ 4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316 [ 4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016 [ 4.279002] task: ffffa2ee38cfa040 task.stack: ffffa51ec0004000 [ 4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80 [ 4.290012] RSP: 0000:ffffa51ec0007ab8 EFLAGS: 00010246 [ 4.295003] RAX: 0000000000000000 RBX: ffffa2ee36bae000 RCX: 0000000000000006 [ 4.303002] RDX: 000000000000081c RSI: ffffa2ee38cfa8c8 RDI: ffffa2ee36bae000 [ 4.310013] RBP: ffffa51ec0007b58 R08: 0000000000000001 R09: 0000000000000000 [ 4.317001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa51ec0007ad0 [ 4.324005] R13: ffffa2ee36bae098 R14: 0000000000000002 R15: ffffa2ee37204818 [ 4.331002] FS: 0000000000000000(0000) GS:ffffa2ee3fcc0000(0000) knlGS:0000000000000000 [ 4.339002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.345001] CR2: 0000000000000050 CR3: 000000401000f000 CR4: 00000000001406e0 [ 4.351002] Call Trace: [ 4.354012] ? pci_configure_device+0x19f/0x570 [ 4.359002] ? pci_conf1_read+0xb8/0xf0 [ 4.363002] ? raw_pci_read+0x23/0x40 [ 4.366011] ? pci_read+0x2c/0x30 [ 4.370014] ? pci_read_config_word+0x67/0x70 [ 4.374012] pci_device_add+0x28/0x230 [ 4.378012] ? pci_vpd_f0_read+0x50/0x80 [ 4.382014] pci_scan_single_device+0x96/0xc0 [ 4.386012] pci_scan_slot+0x79/0xf0 [ 4.389001] pci_scan_child_bus+0x31/0x180 [ 4.394014] acpi_pci_root_create+0x1c6/0x240 [ 4.398013] pci_acpi_scan_root+0x15f/0x1b0 [ 4.402012] acpi_pci_root_add+0x2e6/0x400 [ 4.406012] ? acpi_evaluate_integer+0x37/0x60 [ 4.411002] acpi_bus_attach+0xdf/0x200 [ 4.415002] acpi_bus_attach+0x6a/0x200 [ 4.418014] acpi_bus_attach+0x6a/0x200 [ 4.422013] acpi_bus_scan+0x38/0x70 [ 4.426011] acpi_scan_init+0x10c/0x271 [ 4.429001] acpi_init+0x2fa/0x348 [ 4.433004] ? acpi_sleep_proc_init+0x2d/0x2d [ 4.437001] do_one_initcall+0x43/0x169 [ 4.441001] kernel_init_freeable+0x1d0/0x258 [ 4.445003] ? rest_init+0xe0/0xe0 [ 4.449001] kernel_init+0xe/0x150 ====================== cut here ============================= It looks like the pci_find_pcie_root_port() was trying to find the Root Port for the PCI device which is the Root Port already, it will return NULL and trigger the problem, so check the highest_pcie_bridge to fix thie problem. Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported") Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * sfc: don't try and read ef10 data on non-ef10 NICBert Kenward2017-08-161-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MAC stats command takes a port ID, which doesn't exist on pre-ef10 NICs (5000- and 6000- series). This is extracted from the NIC specific data; we misinterpret this as the ef10 data structure, causing us to read potentially unallocated data. With a KASAN kernel this can cause errors with: BUG: KASAN: slab-out-of-bounds in efx_mcdi_mac_stats Fixes: 0a2ab4d988d7 ("sfc: set the port-id when calling MC_CMD_MAC_STATS") Reported-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net_sched: remove warning from qdisc_hash_addKonstantin Khlebnikov2017-08-161-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was added in commit e57a784d8cae ("pkt_sched: set root qdisc before change() in attach_default_qdiscs()") to hide duplicates from "tc qdisc show" for incative deivices. After 59cc1f61f ("net: sched: convert qdisc linked list to hashtable") it triggered when classful qdisc is added to inactive device because default qdiscs are added before switching root qdisc. Anyway after commit ea3274695353 ("net: sched: avoid duplicates in qdisc dump") duplicates are filtered right in dumper. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net_sched/sfq: update hierarchical backlog when drop packetKonstantin Khlebnikov2017-08-161-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sfq_enqueue() drops head packet or packet from another queue it have to update backlog at upper qdiscs too. Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net_sched: reset pointers to tcf blocks in classful qdiscs' destructorsKonstantin Khlebnikov2017-08-164-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traffic filters could keep direct pointers to classes in classful qdisc, thus qdisc destruction first removes all filters before freeing classes. Class destruction methods also tries to free attached filters but now this isn't safe because tcf_block_put() unlike to tcf_destroy_chain() cannot be called second time. This patch set class->block to NULL after first tcf_block_put() and turn second call into no-op. Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ipv4: fix NULL dereference in free_fib_info_rcu()Eric Dumazet2017-08-161-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If fi->fib_metrics could not be allocated in fib_create_info() we attempt to dereference a NULL pointer in free_fib_info_rcu() : m = fi->fib_metrics; if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt)) kfree(m); Before my recent patch, we used to call kfree(NULL) and nothing wrong happened. Instead of using RCU to defer freeing while we are under memory stress, it seems better to take immediate action. This was reported by syzkaller team. Fixes: 3fb07daff8e9 ("ipv4: add reference counting to metrics") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net: Fix a typo in comment about sock flags.Tonghao Zhang2017-08-161-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ipv6: fix NULL dereference in ip6_route_dev_notify()Eric Dumazet2017-08-162-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on a syzkaller report [1], I found that a per cpu allocation failure in snmp6_alloc_dev() would then lead to NULL dereference in ip6_route_dev_notify(). It seems this is a very old bug, thus no Fixes tag in this submission. Let's add in6_dev_put_clear() helper, as we will probably use it elsewhere (once available/present in net-next) [1] kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff88019f456680 task.stack: ffff8801c6e58000 RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline] RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline] RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202 RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000 RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001 RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37 R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8 FS: 00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0 DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211 in6_dev_put include/net/addrconf.h:335 [inline] ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678 call_netdevice_notifiers net/core/dev.c:1694 [inline] rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107 rollback_registered+0x1be/0x3c0 net/core/dev.c:7149 register_netdevice+0xbcd/0xee0 net/core/dev.c:7587 register_netdev+0x1a/0x30 net/core/dev.c:7669 loopback_net_init+0x76/0x160 drivers/net/loopback.c:214 ops_init+0x10a/0x570 net/core/net_namespace.c:118 setup_net+0x313/0x710 net/core/net_namespace.c:294 copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418 create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206 SYSC_unshare kernel/fork.c:2347 [inline] SyS_unshare+0x653/0xfa0 kernel/fork.c:2297 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4512c9 RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3 f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6 0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85 RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP: ffff8801c6e5f1b0 RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP: ffff8801c6e5f1b0 RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP: ffff8801c6e5f1b0 ---[ end trace e441d046c6410d31 ]--- Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * Merge tag 'wireless-drivers-for-davem-2017-08-15' of ↵David S. Miller2017-08-1515-40/+126
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.13 This time quite a few fixes for iwlwifi and one major regression fix for brcmfmac. For the iwlwifi aggregation bug a small change was needed for mac80211, but as Johannes is still away the mac80211 patch is taken via wireless-drivers tree. brcmfmac * fix firmware crash (a recent regression in bcm4343{0,1,8} iwlwifi * Some simple PCI HW ID fix-ups and additions for family 9000 * Remove a bogus warning message with new FWs (bug #196915) * Don't allow illegal channel options to be used (bug #195299) * A fix for checksum offload in family 9000 * A fix serious throughput degradation in 11ac with multiple streams * An old bug in SMPS where the firmware was not aware of SMPS changes * Fix a memory leak in the SAR code * Fix a stuck queue case in AP mode; * Convert a WARN to a simple debug in a legitimate race case (from which we can recover) * Fix a severe throughput aggregation on 9000-family devices due to aggregation issues, needed a small change in mac80211 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * brcmfmac: feature check for multi-scheduled scan fails on bcm4343x devicesArend Van Spriel2017-08-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The firmware feature check introduced for multi-scheduled scan turned out to be failing for bcm4343{0,1,8} devices resulting in a firmware crash. The reason for this crash has not yet been root cause so this patch avoids the feature check for those device as a short-term fix. Reported-by: Stefan Wahren <stefan.wahren@i2se.com> Reported-by: Ian Molton <ian@mnementh.co.uk> Fixes: 9fe929aaace6 ("brcmfmac: add firmware feature detection for gscan feature") Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
| | | * Merge tag 'iwlwifi-for-kalle-2018-08-09' of ↵Kalle Valo2017-08-096-10/+59
| | | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes Some more fixes for 4.13 * Fix a memory leak in the SAR code; * Fix a stuck queue case in AP mode; * Convert a WARN to a simple debug in a legitimate race case (from which we can recover); * Fix a severe throughput aggregation on 9000-family devices due to aggregation issues.
| | | | * iwlwifi: mvm: send delba upon rx ba session timeoutNaftali Goldstein2017-08-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an RX block-ack session times out, the firmware, which offloads RX reordering but not the BA session negotiation, stops the session but doesn't send a DELBA. This causes the the session to remain active in the remote device, so no more BA sessions will be established, causing a severe throughput degradation due to the lack of aggregation. Use the new ieee80211_rx_ba_timer_expired API when the ba session timer expires, since this will tear down the ba session and also send a delba. The previous API used is intended for drivers that offload the addba/delba negotiation, but not the rx reordering, while our driver does the opposite. This patch depends on "mac80211: add api to start ba session timer expired flow". Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>