linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	ethtool: fix drvinfo strings set in drivers	Jiri Pirko	2013-01-07	6	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \|	Use strlcpy where possible to ensure the string is \0 terminated. Use always sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN and custom defines. Use snprintf instead of sprint. Remove unnecessary inits of ->fw_version Remove unnecessary inits of drvinfo struct. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ethtool: set addr_assign_type to NET_ADDR_SET when addr is passed on create	Jiri Pirko	2013-01-07	1	-1/+3
\| \| \| \| \| \| \| \|	In case user passed address via netlink during create, NET_ADDR_PERM was set. That is not correct so fix this by setting NET_ADDR_SET. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ndisc: Remove unused space at tail of skb for ndisc messages. (TAKE 3)	YOSHIFUJI Hideaki / 吉藤英明	2013-01-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the size of skb allocated for NDISC is MAX_HEADER + LL_RESERVED_SPACE(dev) + packet length + dev->needed_tailroom, but only LL_RESERVED_SPACE(dev) bytes is "reserved" for headers. As a result, the skb looks like this (after construction of the message): head data tail end +--------------------------------------------------------------+ + \| \| \| \| +--------------------------------------------------------------+ \|<-hlen---->\|<---ipv6 packet------>\|<--tlen-->\|<--MAX_HEADER-->\| =LL_ = dev RESERVED_ ->needed_ SPACE(dev) tailroom As the name implies, "MAX_HEADER" is used for headers, and should be "reserved" in prior to packet construction. Or, if some space is really required at the tail of ther skb, it should be explicitly documented. We have several option after construction of NDISC message: Option 1: head data tail end +---------------------------------------------+ + \| \| \| +---------------------------------------------+ \|<-hlen---->\|<---ipv6 packet------>\|<--tlen-->\| =LL_ = dev RESERVED_ ->needed_ SPACE(dev) tailroom Option 2: head data tail end +--------------------------------------------------+ + \| \| \| +--------------------------------------------------+ \|<--MAX_HEADER-->\|<---ipv6 packet------>\|<--tlen-->\| = dev ->needed_ tailroom Option 3: head data tail end +--------------------------------------------------------------+ + \| \| \| \| +--------------------------------------------------------------+ \|<--MAX_HEADER-->\|<-hlen---->\|<---ipv6 packet------>\|<--tlen-->\| =LL_ = dev RESERVED_ ->needed_ SPACE(dev) tailroom Our tunnel drivers try expanding headroom and the space for tunnel encapsulation was not a mandatory space -- so we are not seeing bugs here --, but just for optimization for performance critial situations. Since NDISC messages are not performance critical unlike TCP, and as we know outgoing device, LL_RESERVED_SPACE(dev) should be just enough for the device in most (if not all) cases: LL_RESERVED_SPACE(dev) <= LL_MAX_HEADER <= MAX_HEADER Note that LL_RESERVED_SPACE(dev) is also enough for NDISC over SIT (e.g., ISATAP). So, I think Option 1 is just fine here. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: remove no longer used netdev_set_bond_master() and netdev_set_master()	Jiri Pirko	2013-01-04	1	-63/+0
\| \| \| \| \|	Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bonding: remove usage of dev->master	Jiri Pirko	2013-01-04	1	-0/+1
\| \| \| \| \| \| \|	Benefit from new upper dev list and free bonding from dev->master usage. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	vlan: remove usage of dev->master in __vlan_find_dev_deep()	Jiri Pirko	2013-01-04	1	-7/+11
\| \| \| \| \| \| \| \|	Also, since all users call __vlan_find_dev_deep() with rcu_read_lock, make no possibility to call this with rtnl mutex held only. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netpoll: remove usage of dev->master	Jiri Pirko	2013-01-04	1	-3/+6
\| \| \| \| \|	Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: remove usage of netdev_set_master()	Jiri Pirko	2013-01-04	1	-3/+3
\| \| \| \| \|	Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rtnetlink: remove usage of dev->master	Jiri Pirko	2013-01-04	1	-32/+37
\| \| \| \| \|	Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	vlan: add link to upper device	Jiri Pirko	2013-01-04	1	-1/+9
\| \| \| \| \|	Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: introduce upper device lists	Jiri Pirko	2013-01-04	1	-4/+235
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This lists are supposed to serve for storing pointers to all upper devices. Eventually it will replace dev->master pointer which is used for bonding, bridge, team but it cannot be used for vlan, macvlan where there might be multiple upper present. In case the upper link is replacement for dev->master, it is marked with "master" flag. New upper device list resolves this limitation. Also, the information stored in lists is used for preventing looping setups like "bond->somethingelse->samebond" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: remove unnecessary NET_ADDR_RANDOM "bitclean"	Jiri Pirko	2013-01-04	4	-5/+0
\| \| \| \| \| \| \| \|	NET_ADDR_SET is set in dev_set_mac_address() no need to alter dev->addr_assign_type value in drivers. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: add address assign type "SET"	Jiri Pirko	2013-01-04	1	-0/+1
\| \| \| \| \| \| \| \|	This is the way to indicate that mac address of a device has been set by dev_set_mac_address() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: call add_device_randomness() only after successful mac change	Jiri Pirko	2013-01-04	1	-3/+4
\| \| \| \| \|	Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rtnl: use dev_set_mac_address() instead of plain ndo_	Jiri Pirko	2013-01-04	1	-18/+2
\| \| \| \| \| \| \| \|	Benefit from existence of dev_set_mac_address() and remove duplicate code. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: respect RFC2863 operational state	stephen hemminger	2012-12-30	4	-6/+9
\| \| \| \| \| \| \| \| \| \| \|	The bridge link detection should follow the operational state of the lower device, rather than the carrier bit. This allows devices like tunnels that are controlled by userspace control plane to work with bridge STP link management. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: filter: return -EINVAL if BPF_S_ANC* operation is not supported	Daniel Borkmann	2012-12-30	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we return -EINVAL for malformed or wrong BPF filters. However, this is not done for BPF_S_ANC* operations, which makes it more difficult to detect if it's actually supported or not by the BPF machine. Therefore, we should also return -EINVAL if K is within the SKF_AD_OFF universe and the ancillary operation did not match. Why exactly is it needed? If tools such as libpcap/tcpdump want to make use of new ancillary operations (like filtering VLAN in kernel space), there is currently no sane way to test if this feature / BPF_S_ANC* op is present or not, since no error is returned. This patch will make life easier for that and allow for a proper usage for user space applications. There was concern, if this patch will break userland. Short answer: Yes and no. Long answer: It will "break" only for code that calls ... { BPF_LD \| BPF_(W\|H\|B) \| BPF_ABS, 0, 0, <K> }, ... where <K> is in [0xfffff000, 0xffffffff] _and_ <K> is not an ancillary. And here comes the BUT: assuming some old code will have such an instruction where <K> is between [0xfffff000, 0xffffffff] and it doesn't know ancillary operations, then this will give a non-expected / unwanted behavior as well (since we do not return the BPF machine with 0 after a failed load_pointer(), which was the case before introducing ancillary operations, but load sth. into the accumulator instead, and continue with the next instruction, for instance). Thus, user space code would already have been broken by introducing ancillary operations into the BPF machine per se. Code that does such a direct load, e.g. "load word at packet offset 0xffffffff into accumulator" ("ld [0xffffffff]") is quite broken, isn't it? The whole assumption of ancillary operations is that no-one intentionally calls things like "ld [0xffffffff]" and expect this word to be loaded from such a packet offset. Hence, we can also safely make use of this feature testing patch and facilitate application development. Therefore, at least from this patch onwards, we have for sure a check whether current or in future implemented BPF_S_ANC* ops are supported in the kernel. Patch was tested on x86_64. (Thanks to Eric for the previous review.) Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Ani Sinha <ani@aristanetworks.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	skbuff: make __kmalloc_reserve static	stephen hemminger	2012-12-29	1	-2/+3
\| \| \| \| \| \| \| \|	Sparse detected case where this local function should be static. It may even allow some compiler optimizations. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: make proc_tcp_fastopen_key static	stephen hemminger	2012-12-29	1	-2/+2
\| \| \| \| \| \| \|	Detected by sparse. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sctp: make sctp_addr_wq_timeout_handler static	stephen hemminger	2012-12-29	1	-1/+1
\| \| \| \| \| \| \|	Fix sparse warning about local function that should be static. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: use per task frag allocator in skb_append_datato_frags	Eric Dumazet	2012-12-29	1	-27/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the new per task frag allocator in skb_append_datato_frags(), to reduce number of frags and page allocator overhead. Tested: ifconfig lo mtu 16436 perf record netperf -t UDP_STREAM ; perf report before : Throughput: 32928 Mbit/s 51.79% netperf [kernel.kallsyms] [k] copy_user_generic_string 5.98% netperf [kernel.kallsyms] [k] __alloc_pages_nodemask 5.58% netperf [kernel.kallsyms] [k] get_page_from_freelist 5.01% netperf [kernel.kallsyms] [k] __rmqueue 3.74% netperf [kernel.kallsyms] [k] skb_append_datato_frags 1.87% netperf [kernel.kallsyms] [k] prep_new_page 1.42% netperf [kernel.kallsyms] [k] next_zones_zonelist 1.28% netperf [kernel.kallsyms] [k] __inc_zone_state 1.26% netperf [kernel.kallsyms] [k] alloc_pages_current 0.78% netperf [kernel.kallsyms] [k] sock_alloc_send_pskb 0.74% netperf [kernel.kallsyms] [k] udp_sendmsg 0.72% netperf [kernel.kallsyms] [k] zone_watermark_ok 0.68% netperf [kernel.kallsyms] [k] __cpuset_node_allowed_softwall 0.67% netperf [kernel.kallsyms] [k] fib_table_lookup 0.60% netperf [kernel.kallsyms] [k] memcpy_fromiovecend 0.55% netperf [kernel.kallsyms] [k] __udp4_lib_lookup after: Throughput: 47185 Mbit/s 61.74% netperf [kernel.kallsyms] [k] copy_user_generic_string 2.07% netperf [kernel.kallsyms] [k] prep_new_page 1.98% netperf [kernel.kallsyms] [k] skb_append_datato_frags 1.02% netperf [kernel.kallsyms] [k] sock_alloc_send_pskb 0.97% netperf [kernel.kallsyms] [k] enqueue_task_fair 0.97% netperf [kernel.kallsyms] [k] udp_sendmsg 0.91% netperf [kernel.kallsyms] [k] __ip_route_output_key 0.88% netperf [kernel.kallsyms] [k] __netif_receive_skb 0.87% netperf [kernel.kallsyms] [k] fib_table_lookup 0.85% netperf [kernel.kallsyms] [k] resched_task 0.78% netperf [kernel.kallsyms] [k] __udp4_lib_lookup 0.77% netperf [kernel.kallsyms] [k] _raw_spin_lock_irqsave Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rtnl: expose carrier value with possibility to set it	Jiri Pirko	2012-12-29	1	-0/+10
\| \| \| \| \| \|	Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: allow to change carrier via sysfs	Jiri Pirko	2012-12-29	1	-1/+14
\| \| \| \| \| \| \| \|	Make carrier writable Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: add change_carrier netdev op	Jiri Pirko	2012-12-29	1	-0/+19
\| \| \| \| \| \| \| \| \| \|	This allows a driver to register change_carrier callback which will be called whenever user will like to change carrier state. This is useful for devices like dummy, gre, team and so on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6/ip6_gre: set transport header correctly	Isaku Yamahata	2012-12-27	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ip6gre_xmit2() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. Set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4/ip_gre: set transport header correctly to gre header	Isaku Yamahata	2012-12-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ipgre_tunnel_xmit() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. So set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
*	IB/rds: suppress incompatible protocol when version is known	Marciniszyn, Mike	2012-12-27	1	-6/+5
\| \| \| \| \| \| \| \|	Add an else to only print the incompatible protocol message when version hasn't been established. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len	Marciniszyn, Mike	2012-12-27	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \|	0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs") added uses of sg_dma_len() and sg_dma_address(). This makes RDS DOA with the qib driver. IB ulps should use ib_sg_dma_len() and ib_sg_dma_address respectively since some HCAs overload ib_sg_dma* operations. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: should drop incoming frames without ACK flag set	Eric Dumazet	2012-12-27	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 96e0bf4b5193d (tcp: Discard segments that ack data not yet sent) John Dykstra enforced a check against ack sequences. In commit 354e4aa391ed5 (tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation) I added more safety tests. But we missed fact that these tests are not performed if ACK bit is not set. RFC 793 3.9 mandates TCP should drop a frame without ACK flag set. " fifth check the ACK field, if the ACK bit is off drop the segment and return" Not doing so permits an attacker to only guess an acceptable sequence number, evading stronger checks. Many thanks to Zhiyun Qian for bringing this issue to our attention. See : http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdf Reported-by: Zhiyun Qian <zhiyunq@umich.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	batman-adv: fix random jitter calculation	Akinobu Mita	2012-12-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	batadv_iv_ogm_emit_send_time() attempts to calculates a random integer in the range of 'orig_interval +- BATADV_JITTER' by the below lines. msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER; msecs += (random32() % 2 * BATADV_JITTER); But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER' because '%' and '*' have same precedence and associativity is left-to-right. This adds the parentheses at the appropriate position so that it matches original intension. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Antonio Quartulli <ordex@autistici.org> Cc: Marek Lindner <lindner_marek@yahoo.de> Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Cc: Antonio Quartulli <ordex@autistici.org> Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
*	arp: fix a regression in arp_solicit()	Cong Wang	2012-12-25	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sedat reported the following commit caused a regression: commit 9650388b5c56578fdccc79c57a8c82fb92b8e7f1 Author: Eric Dumazet <edumazet@google.com> Date: Fri Dec 21 07:32:10 2012 +0000 ipv4: arp: fix a lockdep splat in arp_solicit This is due to the 6th parameter of arp_send() needs to be NULL for the broadcast case, the above commit changed it to an all-zero array by mistake. Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: sched: integer overflow fix	Stefan Hasko	2012-12-22	1	-1/+1
\| \| \| \| \| \| \| \|	Fixed integer overflow in function htb_dequeue Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	CONFIG_HOTPLUG removal from networking core	Greg KH	2012-12-22	3	-15/+0
\| \| \| \| \| \| \| \| \| \|	CONFIG_HOTPLUG is always enabled now, so remove the unused code that was trying to be compiled out when this option was disabled, in the networking core. Cc: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: call br_netpoll_disable in br_add_if	Gao feng	2012-12-21	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \|	When netdev_set_master faild in br_add_if, we should call br_netpoll_disable to do some cleanup jobs,such as free the memory of struct netpoll which allocated in br_netpoll_enable. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: arp: fix a lockdep splat in arp_solicit()	Eric Dumazet	2012-12-21	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Yan Burman reported following lockdep warning : ============================================= [ INFO: possible recursive locking detected ] 3.7.0+ #24 Not tainted --------------------------------------------- swapper/1/0 is trying to acquire lock: (&n->lock){++--..}, at: [<ffffffff8139f56e>] __neigh_event_send +0x2e/0x2f0 but task is already holding lock: (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&n->lock); lock(&n->lock); * DEADLOCK * May be due to missing lock nesting notation 4 locks held by swapper/1/0: #0: (((&n->timer))){+.-...}, at: [<ffffffff8104b350>] call_timer_fn+0x0/0x1c0 #1: (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit +0x1d4/0x280 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff81395400>] dev_queue_xmit+0x0/0x5d0 #3: (rcu_read_lock_bh){.+....}, at: [<ffffffff813cb41e>] ip_finish_output+0x13e/0x640 stack backtrace: Pid: 0, comm: swapper/1 Not tainted 3.7.0+ #24 Call Trace: <IRQ> [<ffffffff8108c7ac>] validate_chain+0xdcc/0x11f0 [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30 [<ffffffff81120565>] ? kmem_cache_free+0xe5/0x1c0 [<ffffffff8108d570>] __lock_acquire+0x440/0xc30 [<ffffffff813c3570>] ? inet_getpeer+0x40/0x600 [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30 [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0 [<ffffffff8108ddf5>] lock_acquire+0x95/0x140 [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0 [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30 [<ffffffff81448d4b>] _raw_write_lock_bh+0x3b/0x50 [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0 [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0 [<ffffffff8139f99b>] neigh_resolve_output+0x16b/0x270 [<ffffffff813cb62d>] ip_finish_output+0x34d/0x640 [<ffffffff813cb41e>] ? ip_finish_output+0x13e/0x640 [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan] [<ffffffff813cb9a0>] ip_output+0x80/0xf0 [<ffffffff813ca368>] ip_local_out+0x28/0x80 [<ffffffffa046f25a>] vxlan_xmit+0x66a/0xbec [vxlan] [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan] [<ffffffff81394a50>] ? skb_gso_segment+0x2b0/0x2b0 [<ffffffff81449355>] ? _raw_spin_unlock_irqrestore+0x65/0x80 [<ffffffff81394c57>] ? dev_queue_xmit_nit+0x207/0x270 [<ffffffff813950c8>] dev_hard_start_xmit+0x298/0x5d0 [<ffffffff813956f3>] dev_queue_xmit+0x2f3/0x5d0 [<ffffffff81395400>] ? dev_hard_start_xmit+0x5d0/0x5d0 [<ffffffff813f5788>] arp_xmit+0x58/0x60 [<ffffffff813f59db>] arp_send+0x3b/0x40 [<ffffffff813f6424>] arp_solicit+0x204/0x280 [<ffffffff813a1a70>] ? neigh_add+0x310/0x310 [<ffffffff8139f515>] neigh_probe+0x45/0x70 [<ffffffff813a1c10>] neigh_timer_handler+0x1a0/0x2a0 [<ffffffff8104b3cf>] call_timer_fn+0x7f/0x1c0 [<ffffffff8104b350>] ? detach_if_pending+0x120/0x120 [<ffffffff8104b748>] run_timer_softirq+0x238/0x2b0 [<ffffffff813a1a70>] ? neigh_add+0x310/0x310 [<ffffffff81043e51>] __do_softirq+0x101/0x280 [<ffffffff814518cc>] call_softirq+0x1c/0x30 [<ffffffff81003b65>] do_softirq+0x85/0xc0 [<ffffffff81043a7e>] irq_exit+0x9e/0xc0 [<ffffffff810264f8>] smp_apic_timer_interrupt+0x68/0xa0 [<ffffffff8145122f>] apic_timer_interrupt+0x6f/0x80 <EOI> [<ffffffff8100a054>] ? mwait_idle+0xa4/0x1c0 [<ffffffff8100a04b>] ? mwait_idle+0x9b/0x1c0 [<ffffffff8100a6a9>] cpu_idle+0x89/0xe0 [<ffffffff81441127>] start_secondary+0x1b2/0x1b6 Bug is from arp_solicit(), releasing the neigh lock after arp_send() In case of vxlan, we eventually need to write lock a neigh lock later. Its a false positive, but we can get rid of it without lockdep annotations. We can instead use neigh_ha_snapshot() helper. Reported-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: devnet_rename_seq should be a seqcount	Eric Dumazet	2012-12-21	2	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using a seqlock for devnet_rename_seq is not a good idea, as device_rename() can sleep. As we hold RTNL, we dont need a protection for writers, and only need a seqcount so that readers can catch a change done by a writer. Bug added in commit c91f6df2db4972d3 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name) Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ip_gre: fix possible use after free	Eric Dumazet	2012-12-21	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \|	Once skb_realloc_headroom() is called, tiph might point to freed memory. Cache tiph->ttl value before the reallocation, to avoid unexpected behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally	Isaku Yamahata	2012-12-21	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \|	ipgre_tunnel_xmit() parses network header as IP unconditionally. But transmitting packets are not always IP packet. For example such packet can be sent by packet socket with sockaddr_ll.sll_protocol set. So make the function check if skb->protocol is IP. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux	Linus Torvalds	2012-12-20	5	-60/+61
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull nfsd update from Bruce Fields: "Included this time: - more nfsd containerization work from Stanislav Kinsbursky: we're not quite there yet, but should be by 3.9. - NFSv4.1 progress: implementation of basic backchannel security negotiation and the mandatory BACKCHANNEL_CTL operation. See http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues for remaining TODO's - Fixes for some bugs that could be triggered by unusual compounds. Our xdr code wasn't designed with v4 compounds in mind, and it shows. A more thorough rewrite is still a todo. - If you've ever seen "RPC: multiple fragments per record not supported" logged while using some sort of odd userland NFS client, that should now be fixed. - Further work from Jeff Layton on our mechanism for storing information about NFSv4 clients across reboots. - Further work from Bryan Schumaker on his fault-injection mechanism (which allows us to discard selective NFSv4 state, to excercise rarely-taken recovery code paths in the client.) - The usual mix of miscellaneous bugs and cleanup. Thanks to everyone who tested or contributed this cycle." * 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits) nfsd4: don't leave freed stateid hashed nfsd4: free_stateid can use the current stateid nfsd4: cleanup: replace rq_resused count by rq_next_page pointer nfsd: warn on odd reply state in nfsd_vfs_read nfsd4: fix oops on unusual readlike compound nfsd4: disable zero-copy on non-final read ops svcrpc: fix some printks NFSD: Correct the size calculation in fault_inject_write NFSD: Pass correct buffer size to rpc_ntop nfsd: pass proper net to nfsd_destroy() from NFSd kthreads nfsd: simplify service shutdown nfsd: replace boolean nfsd_up flag by users counter nfsd: simplify NFSv4 state init and shutdown nfsd: introduce helpers for generic resources init and shutdown nfsd: make NFSd service structure allocated per net nfsd: make NFSd service boot time per-net nfsd: per-net NFSd up flag introduced nfsd: move per-net startup code to separated function nfsd: pass net to __write_ports() and down nfsd: pass net to nfsd_set_nrthreads() ...
\| *	nfsd4: cleanup: replace rq_resused count by rq_next_page pointer	J. Bruce Fields	2012-12-18	4	-7/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It may be a matter of personal taste, but I find this makes the code clearer. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
\| *	svcrpc: fix some printks	J. Bruce Fields	2012-12-17	1	-4/+4
\| \| \| \| \| \| \| \| \| \|	Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
\| *	SUNRPC: remove redundant "linux/nsproxy.h" includes	Stanislav Kinsbursky	2012-12-10	2	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a cleanup patch. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
\| *	svcrpc: support multiple-fragment rpc's	J. Bruce Fields	2012-12-04	1	-25/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Over TCP, RPC's are preceded by a single 4-byte field telling you how long the rpc is (in bytes). The spec also allows you to send an RPC in multiple such records (the high bit of the length field is used to tell you whether this is the final record). We've survived for years without supporting this because in practice the clients we care about don't use it. But the userland rpc libraries do, and every now and then an experimental client will run into this. (Most recently I noticed it while trying to write a pynfs check.) And we're really on the wrong side of the spec here--let's fix this. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
\| *	svcrpc: track rpc data length separately from sk_tcplen	J. Bruce Fields	2012-12-04	1	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Keep a separate field, sk_datalen, that tracks only the data contained in a fragment, not including the fragment header. For now, this is always just max(0, sk_tcplen - 4), but after we allow multiple fragments sk_datalen will accumulate the total rpc data size while sk_tcplen only tracks progress receiving the current fragment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
\| *	svcrpc: fix off-by-4 error in "incomplete TCP record" dprintk	J. Bruce Fields	2012-12-04	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The full reclen doesn't include the fragment header, but sk_tcplen does. Fix this to make it an apples-to-apples comparison. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
\| *	svcrpc: delay minimum-rpc-size check till later	J. Bruce Fields	2012-12-04	1	-9/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Soon we want to support multiple fragments, in which case it may be legal for a single fragment to be smaller than 8 bytes, so we'll want to delay this check till we've reached the last fragment. Also fix an outdated comment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
\| *	svcrpc: don't byte-swap sk_reclen in place	J. Bruce Fields	2012-12-04	1	-15/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Byte-swapping in place is always a little dubious. Let's instead define this field to always be big-endian, and do the swapping on demand where we need it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
\| *	svcrpc: demote some printks to a dprintk	J. Bruce Fields	2012-11-08	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In general I'd rather random bad behavior on the network won't trigger a printk. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2012-12-20	4	-111/+105
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph update from Sage Weil: "There are a few different groups of commits here. The largest is Alex's ongoing work to enable the coming RBD features (cloning, striping). There is some cleanup in libceph that goes along with it. Cyril and David have fixed some problems with NFS reexport (leaking dentries and page locks), and there is a batch of patches from Yan fixing problems with the fs client when running against a clustered MDS. There are a few bug fixes mixed in for good measure, many of which will be going to the stable trees once they're upstream. My apologies for the late pull. There is still a gremlin in the rbd map/unmap code and I was hoping to include the fix for that as well, but we haven't been able to confirm the fix is correct yet; I'll send that in a separate pull once it's nailed down." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (68 commits) rbd: get rid of rbd_{get,put}_dev() libceph: register request before unregister linger libceph: don't use rb_init_node() in ceph_osdc_alloc_request() libceph: init event->node in ceph_osdc_create_event() libceph: init osd->o_node in create_osd() libceph: report connection fault with warning libceph: socket can close in any connection state rbd: don't use ENOTSUPP rbd: remove linger unconditionally rbd: get rid of RBD_MAX_SEG_NAME_LEN libceph: avoid using freed osd in __kick_osd_requests() ceph: don't reference req after put rbd: do not allow remove of mounted-on image libceph: Unlock unprocessed pages in start_read() error path ceph: call handle_cap_grant() for cap import message ceph: Fix __ceph_do_pending_vmtruncate ceph: Don't add dirty inode to dirty list if caps is in migration ceph: Fix infinite loop in __wake_requests ceph: Don't update i_max_size when handling non-auth cap bdi_register: add __printf verification, fix arg mismatch ...
\| * \|	libceph: register request before unregister linger	Alex Elder	2012-12-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In kick_requests(), we need to register the request before we unregister the linger request. Otherwise the unregister will reset the request's osd pointer to NULL. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>