summaryrefslogtreecommitdiffstats
path: root/drivers/net/macvtap.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-261-9/+9
|\ | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvtap: Ignore tap features when VNET_HDR is offVlad Yasevich2013-08-201-2/+4
| | | | | | | | | | | | | | | | | | | | | | When the user turns off VNET_HDR support on the macvtap device, there is no way to provide any offload information to the user. So, it's safer to ignore offload setting then depend on the user setting them correctly. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvtap: Correctly set tap features when IFF_VNET_HDR is disabled.Vlad Yasevich2013-08-201-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | When the user turns off IFF_VNET_HDR flag, attempts to change offload features via TUNSETOFFLOAD do not work. This could cause GSO packets to be delivered to the user when the user is not prepared to handle them. To solve, allow processing of TUNSETOFFLOAD when IFF_VNET_HDR is disabled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvtap: simplify usage of tap_featuresVlad Yasevich2013-08-201-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In macvtap, tap_features specific the features of that the user has specified via ioctl(). If we treat macvtap as a macvlan+tap then we could all the tap a pseudo-device and give it other features like SG and GSO. Then we can stop using the features of lower device (macvlan) when forwarding the traffic the tap. This solves the issue of possible checksum offload mismatch between tap feature and macvlan features. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-171-3/+9
|\|
| * macvtap: fix two racesEric Dumazet2013-08-121-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit ac4e4af1e59e1 ("macvtap: Consistently use rcu functions"), Thomas gets two different warnings : BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45891/45892 caller is macvtap_do_read+0x45c/0x600 [macvtap] CPU: 1 PID: 45892 Comm: vhost-45891 Not tainted 3.11.0-bisecttest #13 Call Trace: ([<00000000001126ee>] show_trace+0x126/0x144) [<00000000001127d2>] show_stack+0xc6/0xd4 [<000000000068bcec>] dump_stack+0x74/0xd8 [<0000000000481066>] debug_smp_processor_id+0xf6/0x114 [<000003ff802e9a18>] macvtap_do_read+0x45c/0x600 [macvtap] [<000003ff802e9c1c>] macvtap_recvmsg+0x60/0x88 [macvtap] [<000003ff80318c5e>] handle_rx+0x5b2/0x800 [vhost_net] [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost] [<000000000015f3ac>] kthread+0xd8/0xe4 [<00000000006934a6>] kernel_thread_starter+0x6/0xc [<00000000006934a0>] kernel_thread_starter+0x0/0xc And BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45897/45898 caller is macvlan_start_xmit+0x10a/0x1b4 [macvlan] CPU: 1 PID: 45898 Comm: vhost-45897 Not tainted 3.11.0-bisecttest #16 Call Trace: ([<00000000001126ee>] show_trace+0x126/0x144) [<00000000001127d2>] show_stack+0xc6/0xd4 [<000000000068bdb8>] dump_stack+0x74/0xd4 [<0000000000481132>] debug_smp_processor_id+0xf6/0x114 [<000003ff802b72ca>] macvlan_start_xmit+0x10a/0x1b4 [macvlan] [<000003ff802ea69a>] macvtap_get_user+0x982/0xbc4 [macvtap] [<000003ff802ea92a>] macvtap_sendmsg+0x4e/0x60 [macvtap] [<000003ff8031947c>] handle_tx+0x494/0x5ec [vhost_net] [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost] [<000000000015f3ac>] kthread+0xd8/0xe4 [<000000000069356e>] kernel_thread_starter+0x6/0xc [<0000000000693568>] kernel_thread_starter+0x0/0xc 2 locks held by vhost-45897/45898: #0: (&vq->mutex){+.+.+.}, at: [<000003ff8031903c>] handle_tx+0x54/0x5ec [vhost_net] #1: (rcu_read_lock){.+.+..}, at: [<000003ff802ea53c>] macvtap_get_user+0x824/0xbc4 [macvtap] In the first case, macvtap_put_user() calls macvlan_count_rx() in a preempt-able context, and this is not allowed. In the second case, macvtap_get_user() calls macvlan_start_xmit() with BH enabled, and this is not allowed. Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com> Bisected-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Thomas Huth <thuth@linux.vnet.ibm.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: attempt high order allocations in sock_alloc_send_pskb()Eric Dumazet2013-08-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding paged frags skbs to af_unix sockets introduced a performance regression on large sends because of additional page allocations, even if each skb could carry at least 100% more payload than before. We can instruct sock_alloc_send_pskb() to attempt high order allocations. Most of the time, it does a single page allocation instead of 8. I added an additional parameter to sock_alloc_send_pskb() to let other users to opt-in for this new feature on followup patches. Tested: Before patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 46861.15 After patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 57981.11 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: move zerocopy_sg_from_iovec() to net/core/datagram.cJason Wang2013-08-081-80/+0
| | | | | | | | | | | | | | To let it be reused and reduce code duplication. Also document this function. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: move iov_pages() to net/core/iovec.cJason Wang2013-08-081-23/+0
|/ | | | | | | To let it be reused and reduce code duplication. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGSJason Wang2013-07-181-25/+37
| | | | | | | | | | | | | | | | | | | | | | | | We try to linearize part of the skb when the number of iov is greater than MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest network. Solve this problem by calculate the pages needed for iov before trying to do zerocopy and switch to use copy instead of zerocopy if it needs more than MAX_SKB_FRAGS. This is done through introducing a new helper to count the pages for iov, and call uarg->callback() manually when switching from zerocopy to copy to notify vhost. We can do further optimization on top. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: do not assume 802.1Q when send vlan packetsJason Wang2013-07-161-1/+1
| | | | | | | | | The hard-coded 8021.q proto will break 802.1ad traffic. So switch to use vlan->proto. Cc: Basil Gor <basil.gor@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: fix the missing ret value of TUNSETQUEUEJason Wang2013-07-161-0/+1
| | | | | | | | | | | Commit 441ac0fcaadc76ad09771812382345001dd2b813 (macvtap: Convert to using rtnl lock) forget to return what macvtap_ioctl_set_queue() returns to its caller. This may break multiqueue API by always falling through to TUNGETFEATURES. Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: correctly linearize skb when zerocopy is usedJason Wang2013-07-111-2/+6
| | | | | | | | | | | | | | | | | | Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to linearize parts of the skb to let the rest of iov to be fit in the frags, we need count copylen into linear when calling macvtap_alloc_skb() instead of partly counting it into data_len. Since this breaks zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should be zero at beginning. This cause nr_frags to be increased wrongly without setting the correct frags. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-07-031-2/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/freescale/fec_main.c drivers/net/ethernet/renesas/sh_eth.c net/ipv4/gre.c The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into seperate files. The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block. Finally the sh_eth.c conflict was between one commit add bits set in the .eesr_err_check mask whilst another commit removed the .tx_error_check member and assignments. Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvtap: fix recovery from gup errorsMichael S. Tsirkin2013-06-261-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | get user pages might fail partially in macvtap zero copy mode. To recover we need to put all pages that we got, but code used a wrong index resulting in double-free errors. Reported-by: Brad Hubbard <bhubbard@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: Perform GSO on forwarding path.Vlad Yasevich2013-06-261-1/+31
| | | | | | | | | | | | | | | | | | | | When macvtap forwards skb to its tap, it needs to check if GSO needs to be performed. This is sometimes necessary when the HW device performed GRO, but the guest reading from the tap does not support it (ex: Windows 7). Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: Let TUNSETOFFLOAD actually controll offload features.Vlad Yasevich2013-06-261-1/+64
| | | | | | | | | | | | | | | | | | | | | | When the user issues TUNSETOFFLOAD ioctl, macvtap does not do anything other then to verify arguments. This patch adds functionality to allow users to actually control offload features. NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the features can be controlled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: Consistently use rcu functionsVlad Yasevich2013-06-261-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | Currently macvtap uses rcu_bh functions in its user facing fuction macvtap_get_user() and macvtap_put_user(). However, its packet handlers use normal rcu as the rcu_read_lock() is taken in netif_receive_skb(). We can safely discontinue the usage or rcu with bh disabled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: Convert to using rtnl lockVlad Yasevich2013-06-261-37/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Macvtap uses a private lock to protect the relationship between macvtap_queue and macvlan_dev. The private lock is not needed since the relationship is managed by user via open(), release(), and dellink() calls. dellink() already happens under rtnl, so we can safely convert open() and release(), and use it in ioctl() as well. Suggested by Eric Dumazet. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: fix uninitialized return value macvtap_ioctl_set_queue()Jason Wang2013-06-131-0/+2
| | | | | | | | | | | | | | | | Return -EINVAL on illegal flag instead of uninitialized value. This fixes the kbuild test warning. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: slient sparse warningsJason Wang2013-06-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch silents the following sparse warnings: drivers/net/macvtap.c:98:9: warning: incorrect type in assignment (different address spaces) drivers/net/macvtap.c:98:9: expected struct macvtap_queue *<noident> drivers/net/macvtap.c:98:9: got struct macvtap_queue [noderef] <asn:4>*<noident> drivers/net/macvtap.c:120:9: warning: incorrect type in assignment (different address spaces) drivers/net/macvtap.c:120:9: expected struct macvtap_queue *<noident> drivers/net/macvtap.c:120:9: got struct macvtap_queue [noderef] <asn:4>*<noident> drivers/net/macvtap.c:151:22: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:233:23: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:243:23: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:247:15: error: incompatible types in comparison expression (different address spaces) CC [M] drivers/net/macvtap.o drivers/net/macvlan.c:232:24: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: enable multiqueue flagJason Wang2013-06-081-5/+2
| | | | | | | | | | | | | | | | To notify the userspace about our capability of multiqueue. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: add TUNSETQUEUE ioctlJason Wang2013-06-081-19/+116
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or enable a queue of macvtap. This is used to be compatible at API layer of tuntap to simplify the userspace to manage the queues. This is done through introducing a linked list to track all taps while using vlan->taps array to only track active taps. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: eliminate linear searchJason Wang2013-06-081-44/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linear search were used in both get_slot() and macvtap_get_queue(), this is because: - macvtap didn't reshuffle the array of taps when create or destroy a queue, so when adding a new queue, macvtap must do linear search to find a location for the new queue. This will also complicate the TUNSETQUEUE implementation for multiqueue API. - the queue itself didn't track the queue index, so the we must do a linear search in the array to find the location of a existed queue. The solution is straightforward: reshuffle the array and introduce a queue_index to macvtap_queue. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: introduce macvtap_get_vlan()Jason Wang2013-06-081-7/+20
| | | | | | | | | | | | | | | | | | Factor out the device holding logic to a macvtap_get_vlan(), this will be also used by multiqueue API. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: do not add self to waitqueue if doing a nonblock readJason Wang2013-06-081-2/+5
| | | | | | | | | | | | | | | | | | There's no need to add self to waitqueue if doing a nonblock read. This could help to avoid the spinlock contention. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: fix a possible race between queue selection and changing queuesJason Wang2013-06-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Complier may generate codes that re-read the vlan->numvtaps during macvtap_get_queue(). This may lead a race if vlan->numvtaps were changed in the same time and which can lead unexpected result (e.g. very huge value). We need prevent the compiler from generating such codes by adding an ACCESS_ONCE() to make sure vlan->numvtaps were only read once. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: pass info struct via netdevice notifierJiri Pirko2013-05-281-1/+1
|/ | | | | | | | | | | | | | So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
* net: switch to use skb_probe_transport_header()Jason Wang2013-03-271-8/+1
| | | | | | | | | | | Switch to use the new help skb_probe_transport_header() to do the l4 header probing for untrusted sources. For packets with partial csum, the header should already been set by skb_partial_csum_set(). Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: set transport header before passing skb to lower deviceJason Wang2013-03-261-0/+9
| | | | | | | | | | | | | | Set the transport header for 1) some drivers (e.g ixgbe) needs l4 header 2) precise packet length estimation (introduced in 1def9238) needs l4 header to compute header length. For the packets with partial checksum, the patch just set the transport header to csum_start. Otherwise tries to use skb_flow_dissect() to get l4 offset, if it fails, just pretend no l4 header. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: convert to idr_alloc()Tejun Heo2013-02-281-16/+5
| | | | | | | | | | Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* net: Fix possible wrong checksum generation.Pravin B Shelar2013-02-131-2/+2
| | | | | | | | | | | | | | | | | | | | | Patch cef401de7be8c4e (net: fix possible wrong checksum generation) fixed wrong checksum calculation but it broke TSO by defining new GSO type but not a netdev feature for that type. net_gso_ok() would not allow hardware checksum/segmentation offload of such packets without the feature. Following patch fixes TSO and wrong checksum. This patch uses same logic that Eric Dumazet used. Patch introduces new flag SKBTX_SHARED_FRAG if at least one frag can be modified by the user. but SKBTX_SHARED_FRAG flag is kept in skb shared info tx_flags rather than gso_type. tx_flags is better compared to gso_type since we can have skb with shared frag without gso packet. It does not link SHARED_FRAG to GSO, So there is no need to define netdev feature for this. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix possible wrong checksum generationEric Dumazet2013-01-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested to linearize skbs but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg(). This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers. Tested: $ netperf -H 7.7.8.84 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3959.52 $ netperf -H 7.7.8.84 -t TCP_SENDFILE TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3216.80 Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages. Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: rcu_dereference outside read-lock sectionDenis Efremov2012-08-121-1/+2
| | | | | | | | | | | | rcu_dereference occurs in update section. Replacement by rcu_dereference_protected in order to prevent lockdep complaint. Found by Linux Driver Verification project (linuxtesting.org) Signed-off-by: Denis Efremov <yefremov.denis@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: use prepare_to_wait/finish_wait to ensure mbHong zhi guo2012-06-071-5/+3
| | | | | | | instead of raw assignment to current->state Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-05-171-5/+38
|\
| * macvtap: restore vlan header on user readBasil Gor2012-05-121-5/+38
| | | | | | | | | | | | | | | | | | | | Ethernet vlan header is not on the packet and kept in the skb->vlan_tci when it comes from lower dev. This patch inserts vlan header in user buffer during skb copy on user read. Signed-off-by: Basil Gor <basil.gor@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvtap: zerocopy: validate vectors before building skbJason Wang2012-05-021-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | There're several reasons that the vectors need to be validated: - Return error when caller provides vectors whose num is greater than UIO_MAXIOV. - Linearize part of skb when userspace provides vectors grater than MAX_SKB_FRAGS. - Return error when userspace provides vectors whose total length may exceed - MAX_SKB_FRAGS * PAGE_SIZE. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | macvtap: zerocopy: set SKBTX_DEV_ZEROCOPY only when skb is built successfullyJason Wang2012-05-021-4/+5
| | | | | | | | | | | | | | | | | | | | | | Current the SKBTX_DEV_ZEROCOPY is set unconditionally after zerocopy_sg_from_iovec(), this would lead NULL pointer when macvtap fails to build zerocopy skb because destructor_arg was not initialized. Solve this by set this flag after the skb were built successfully. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | macvtap: zerocopy: put page when fail to get all requested user pagesJason Wang2012-05-021-2/+4
| | | | | | | | | | | | | | | | | | When get_user_pages_fast() fails to get all requested pages, we could not use kfree_skb() to free it as it has not been put in the skb fragments. So we need to call put_page() instead. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | macvtap: zerocopy: fix truesize underestimationJason Wang2012-05-021-2/+4
| | | | | | | | | | | | | | | | As the skb fragment were pinned/built from user pages, we should account the page instead of length for truesize. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | macvtap: zerocopy: fix offset calculation when building skbJason Wang2012-05-021-6/+7
|/ | | | | | | | | | This patch fixes the offset calculation when building skb: - offset1 were used as skb data offset not vector offset - reset offset to zero only when we advance to next vector Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* security: trim security.hAl Viro2012-02-141-0/+1
| | | | | | | Trim security.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>
* macvtap: Fix macvtap_get_queue to use rxhash firstKrishna Kumar2011-12-201-8/+8
| | | | | | | It was reported that the macvtap device selects a Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: treewide use of RCU_INIT_POINTEREric Dumazet2011-11-241-4/+4
| | | | | | | | | | | | rcu_assign_pointer(ptr, NULL) can be safely replaced by RCU_INIT_POINTER(ptr, NULL) (old rcu_assign_pointer() macro was testing the NULL value and could omit the smp_wmb(), but this had to be removed because of compiler warnings) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: Fix the minor device number allocationEric W. Biederman2011-10-211-16/+71
| | | | | | | | | | | | | | | | | | | | | | | | On systems that create and delete lots of dynamic devices the 31bit linux ifindex fails to fit in the 16bit macvtap minor, resulting in unusable macvtap devices. I have systems running automated tests that that hit this condition in just a few days. Use a linux idr allocator to track which mavtap minor numbers are available and and to track the association between macvtap minor numbers and macvtap network devices. Remove the unnecessary unneccessary check to see if the network device we have found is indeed a macvtap device. With macvtap specific data structures it is impossible to find any other kind of networking device. Increase the macvtap minor range from 65536 to the full 20 bits that is supported by linux device numbers. It doesn't solve the original problem but there is no penalty for a larger minor device range. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: Rewrite macvtap_newlink so the error handling works.Eric W. Biederman2011-10-211-24/+49
| | | | | | | | | | | | | | Place macvlan_common_newlink at the end of macvtap_newlink because failing in newlink after registering your network device is not supported. Move device_create into a netdevice creation notifier. The network device notifier is the only hook that is called after the network device has been registered with the device layer and before register_network_device returns success. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: Don't leak unreceived packets when we delete a macvtap device.Eric W. Biederman2011-10-211-0/+6
| | | | | | | | To avoid leaking packets in the receive queue. Add a socket destructor that will run whenever destroy a macvtap socket. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: Fix macvtap_open races in the zero copy enable code.Eric W. Biederman2011-10-211-6/+5
| | | | | | | | | | | To see if it is appropriate to enable the macvtap zero copy feature don't test the lowerdev network device flags. Instead test the macvtap network device flags which are a direct copy of the lowerdev flags. This is important because nothing holds a reference to lowerdev and on a very bad day we lowerdev could be a pointer to stale memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: Close a race between macvtap_open and macvtap_dellink.Eric W. Biederman2011-10-211-0/+2
| | | | | | | | | | | | There is a small window in macvtap_open between looking up a networking device and calling macvtap_set_queue in which macvtap_del_queues called from macvtap_dellink. After calling macvtap_del_queues it is totally incorrect to allow macvtap_set_queue to proceed so prevent success by reporting that all of the available queues are in use. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>