linux - linux

	Commit message (Collapse)	Author	Files	Lines
2012-08-04	hyperv: Move wait completion msg code into rndis_filter_halt_device()	Haiyang Zhang	2	-7/+11
	We need to wait for send_completion msg before put_rndis_request() at the end of rndis_filter_halt_device(). Otherwise, netvsc_send_completion() may reference freed memory which is overwritten, and cause panic. Reported-by: Long Li <longli@microsoft.com> Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-04	net/mlx4_core: Remove port type restrictions	Yevgeny Petrilin	2	-17/+0
	Port1=Eth, Port2=IB restriction is no longer required. Having RoCE, there will always rdma port initialized over ConnectX physical port, no matter whether the link layer is IB or Ethernet. So we always have dual port IB device. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-04	net/mlx4_en: Fixing TX queue stop/wake flow	Yevgeny Petrilin	2	-11/+7
	Removing the ring->blocked flag, it is redundant and leads to a race: We close the TX queue and then set the "blocked" flag. Between those 2 operations the completion function can check the "blocked" flag, sees that it is 0, and wouldn't open the TX queue. Using netif_tx_queue_stopped to check the state of the queue to avoid this race. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-04	net/mlx4_en: loopbacked packets are dropped when SMAC=DMAC	Amir Vadai	1	-2/+2
	Should NOT check SMAC=DMAC when: 1. loopback is turned on 2. validate_loopback is true. Fixed it accordingly. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-04	net_sched: gact: Fix potential panic in tcf_gact().	Hiroaki SHIMODA	1	-3/+11
	gact_rand array is accessed by gact->tcfg_ptype whose value is assumed to less than MAX_RAND, but any range checks are not performed. So add a check in tcf_gact_init(). And in tcf_gact(), we can reduce a branch. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-04	emulex: benet: Add a missing CR in the end of message	Masanari Iida	1	-1/+1
	Missing a CR in printk causes 2 messages printed in one line. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-03	ath9k: Add PID/VID support for AR1111	Mohammed Shafi Shajakhan	3	-0/+3
	AR1111 is same as AR9485. The h/w difference between them is quite insignificant, Felix suggests only very few baseband features may not be available in AR1111. The h/w code for AR9485 is already present, so AR1111 should work fine with the addition of its PID/VID. Cc: stable@vger.kernel.org [2.6.39+] Cc: Felix Bitterli <felixb@qca.qualcomm.com> Reported-by: Tim Bentley <Tim.Bentley@Gmail.com> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com> Tested-by: Tim Bentley <Tim.Bentley@Gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-08-02	brcmsmac: use channel flags to restrict OFDM	Seth Forshee	2	-5/+3
	brcmsmac cannot call freq_reg_info() during channel changes as it does not hold cfg80211_lock, and as a result it generates a lockdep warning. freq_reg_info() is being used to determine whether OFDM is allowed on the current channel, so we can avoid the errant call by using the new IEEE80211_CHAN_NO_OFDM for this purpose instead. Reported-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-08-02	libertas: fix two memory leaks	Daniel Drake	2	-0/+2
	The if_sdio_card structure was never being freed, and neither was the command structure used for association. Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-08-02	rt2x00 : fix rt3290 resuming failed.	Woody Hung	2	-71/+68
	This patch is going to fix the resuming failed from S3/S4 for rt3290 chip. Signed-off-by: Woody Hung <Woody.Hung@mediatek.com> Cc: Kevin Chou <kevin.chou@mediatek.com> Signed-off-by: Chen, Chien-Chia <machen@suse.com> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-08-02	libertas: don't reset card on error when it is being removed	Daniel Drake	1	-1/+4
	On an OLPC XO-1.5 we have seen the following situation: - the system starts going into suspend - no wake params are set, so the mmc layer removes the card - during remove, we send a command to the card - that command fails, causing if_sdio's reset method to try and remove the mmc card in attempt to reset it - the mmc layer is not happy about being asked to remove a card that it is already removing, and the kernel crashes While the MMC layer could possibly be taught to behave better here, it also seems sensible for libertas not to try and reset a card if we're in the process of removing it anyway. Signed-off-by: Daniel Drake <dsd@laptop.org> Acked-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-08-02	b43: fix logic in GPIO init	Rafał Miłecki	1	-8/+13
	Add some comments by the way Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-08-02	bcma: BCM43228 support	Rafał Miłecki	3	-1/+10
	Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-08-02	cfg80211: Clear "beacon_found" on regulatory restore	Paul Stewart	1	-0/+1
	Restore the default state to the "beacon_found" flag when the channel flags are restored. Otherwise, we can end up with a channel that we can no longer transmit on even when we can see beacons on that channel. Signed-off-by: Paul Stewart <pstew@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-02	cfg80211: add channel flag to prohibit OFDM operation	Seth Forshee	2	-0/+4
	Currently the only way for wireless drivers to tell whether or not OFDM is allowed on the current channel is to check the regulatory information. However, this requires hodling cfg80211_mutex, which is not visible to the drivers. Other regulatory restrictions are provided as flags in the channel definition, so let's do similarly with OFDM. This patch adds a new flag, IEEE80211_CHAN_NO_OFDM, to tell drivers that OFDM on a channel is not allowed. This flag is set on any channels for which regulatory indicates that OFDM is prohibited. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-02	ipv4: route.c cleanup	Eric Dumazet	1	-4/+0
	Remove unused includes after IP cache removal Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02	bnx2x: fix mem leak when command is unknown	Jesper Juhl	1	-0/+1
	In bnx2x_mcast_enqueue_cmd() we'll leak the memory allocated to 'new_cmd' if we hit the deafault case of the 'switch (cmd)'. Add a 'kfree(new_cmd)' to that case to avoid the leak. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02	Fix unexpected SA hard expiration after changing date	Fan Du	2	-4/+21
	After SA is setup, one timer is armed to detect soft/hard expiration, however the timer handler uses xtime to do the math. This makes hard expiration occurs first before soft expiration after setting new date with big interval. As a result new child SA is deleted before rekeying the new one. Signed-off-by: Fan Du <fdu@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02	tcp: Apply device TSO segment limit earlier	Ben Hutchings	5	-11/+20
	Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to limit the size of TSO skbs. This avoids the need to fall back to software GSO for local TCP senders. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02	sfc: Fix maximum number of TSO segments and minimum TX queue size	Ben Hutchings	4	-9/+46
	Currently an skb requiring TSO may not fit within a minimum-size TX queue. The TX queue selected for the skb may stall and trigger the TX watchdog repeatedly (since the problem skb will be retried after the TX reset). This issue is designated as CVE-2012-3412. Set the maximum number of TSO segments for our devices to 100. This should make no difference to behaviour unless the actual MSS is less than about 700. Increase the minimum TX queue size accordingly to allow for 2 worst-case skbs, so that there will definitely be space to add an skb after we wake a queue. To avoid invalidating existing configurations, change efx_ethtool_set_ringparam() to fix up values that are too small rather than returning -EINVAL. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02	net: Allow driver to limit number of GSO segments per skb	Ben Hutchings	2	-0/+6
	A peer (or local user) may cause TCP to use a nominal MSS of as little as 88 (actual MSS of 76 with timestamps). Given that we have a sufficiently prodigious local sender and the peer ACKs quickly enough, it is nevertheless possible to grow the window for such a connection to the point that we will try to send just under 64K at once. This results in a single skb that expands to 861 segments. In some drivers with TSO support, such an skb will require hundreds of DMA descriptors; a substantial fraction of a TX ring or even more than a full ring. The TX queue selected for the skb may stall and trigger the TX watchdog repeatedly (since the problem skb will be retried after the TX reset). This particularly affects sfc, for which the issue is designated as CVE-2012-3412. Therefore: 1. Add the field net_device::gso_max_segs holding the device-specific limit. 2. In netif_skb_features(), if the number of segments is too high then mask out GSO features to force fall back to software GSO. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-02	um: Add arch/x86/um to MAINTAINERS	Richard Weinberger	1	-0/+1
	Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02	um: pass siginfo to guest process	Martin Pärtel	10	-34/+71
	UML guest processes now get correct siginfo_t for SIGTRAP, SIGFPE, SIGILL and SIGBUS. Specifically, si_addr and si_code are now correct where previously they were si_addr = NULL and si_code = 128. Signed-off-by: Martin Pärtel <martin.partel@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02	um: fix ubd_file_size for read-only files	Martin Pärtel	1	-1/+1
	Made ubd_file_size not request write access. Fixes use of read-only images. Signed-off-by: Martin Pärtel <martin.partel@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02	um: pull interrupt_end() into userspace()	Al Viro	2	-8/+6
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02	um: split syscall_trace(), pass pt_regs to it	Al Viro	3	-43/+34
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [richard@nod.at: Fixed some minor build issues] Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-01	um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regs	Al Viro	3	-5/+5
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-01	mac80211: cancel mesh path timer	Johannes Berg	1	-0/+1
	The mesh path timer needs to be canceled when leaving the mesh as otherwise it could fire after the interface has been removed already. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-01	mac80211: clear timer bits when disconnecting	Johannes Berg	2	-0/+4
	There's a corner case that can happen when we suspend with a timer running, then resume and disconnect. If we connect again, suspend and resume we might start timers that shouldn't be running. Reset the timer flags to avoid this. This affects both mesh and managed modes. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-01	MIPS: Loongson 2: Sort out clock managment.	Ralf Baechle	7	-51/+33
	For unexplainable reasons the Loongson 2 clock API was implemented in a module so fixing this involved shifting large amounts of code around. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01	locks: remove unused lm_release_private	J. Bruce Fields	3	-8/+1
	In commit 3b6e2723f32d ("locks: prevent side-effects of locks_release_private before file_lock is initialized") we removed the last user of lm_release_private without removing the field itself. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01	MIPS: Loongson 1: more clk support and add select HAVE_CLK	Yoichi Yuasa	2	-0/+17
	This fixes a redefinition of clk_: arch/mips/loongson1/common/clock.c:23:13: error: redefinition of 'clk_get' include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here arch/mips/loongson1/common/clock.c:41:15: error: redefinition of 'clk_get_rate' include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here make[3]: ** [arch/mips/loongson1/common/clock.o] Error 1 Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org> Cc: linux-mips@linux-mips.org Reviewed-by: John Crispin <blogic@openwrt.org> Acked-by: Kelvin Cheung <keguang.zhang@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/4143/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01	MIPS: txx9: Fix redefinition of clk_* by adding select HAVE_CLK	Yoichi Yuasa	1	-0/+1
	arch/mips/txx9/generic/setup.c:87:13: error: redefinition of 'clk_get' include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here arch/mips/txx9/generic/setup.c:97:5: error: redefinition of 'clk_enable' include/linux/clk.h:295:19: note: previous definition of 'clk_enable' was here arch/mips/txx9/generic/setup.c:103:6: error: redefinition of 'clk_disable' include/linux/clk.h:300:20: note: previous definition of 'clk_disable' was here arch/mips/txx9/generic/setup.c:108:15: error: redefinition of 'clk_get_rate' include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here arch/mips/txx9/generic/setup.c:114:6: error: redefinition of 'clk_put' include/linux/clk.h:291:20: note: previous definition of 'clk_put' was here make[3]: *** [arch/mips/txx9/generic/setup.o] Error 1 Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org> Cc: linux-mips@linux-mips.org Reviewed-by: John Crispin <blogic@openwrt.org> Patchwork: https://patchwork.linux-mips.org/patch/4142/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01	MIPS: BCM63xx: Fix redefinition of clk_* by adding select HAVE_CLK	Yoichi Yuasa	1	-0/+1
	arch/mips/bcm63xx/clk.c:249:5: error: redefinition of 'clk_enable' include/linux/clk.h:295:19: note: previous definition of 'clk_enable' was here arch/mips/bcm63xx/clk.c:259:6: error: redefinition of 'clk_disable' include/linux/clk.h:300:20: note: previous definition of 'clk_disable' was here arch/mips/bcm63xx/clk.c:268:15: error: redefinition of 'clk_get_rate' include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here arch/mips/bcm63xx/clk.c:275:13: error: redefinition of 'clk_get' include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here arch/mips/bcm63xx/clk.c:302:6: error: redefinition of 'clk_put' include/linux/clk.h:291:20: note: previous definition of 'clk_put' was here make[2]: *** [arch/mips/bcm63xx/clk.o] Error 1 Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org> Cc: linux-mips@linux-mips.org Reviewed-by: John Crispin <blogic@openwrt.org> Patchwork: https://patchwork.linux-mips.org/patch/4141/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01	MIPS: AR7: Fix redefinition of clk_* by adding select HAVE_CLK	Yoichi Yuasa	1	-0/+1
	arch/mips/ar7/clock.c:420:5: error: redefinition of 'clk_enable' include/linux/clk.h:295:19: note: previous definition of 'clk_enable' was here arch/mips/ar7/clock.c:426:6: error: redefinition of 'clk_disable' include/linux/clk.h:300:20: note: previous definition of 'clk_disable' was here arch/mips/ar7/clock.c:431:15: error: redefinition of 'clk_get_rate' include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here arch/mips/ar7/clock.c:437:13: error: redefinition of 'clk_get' include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here arch/mips/ar7/clock.c:454:6: error: redefinition of 'clk_put' include/linux/clk.h:291:20: note: previous definition of 'clk_put' was here make[2]: *** [arch/mips/ar7/clock.o] Error 1 Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org> Cc: linux-mips@linux-mips.org Reviewed-by: John Crispin <blogic@openwrt.org> Acked-by: Florian Fainelli <florian@openwrt.org> Patchwork: https://patchwork.linux-mips.org/patch/4140/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01	MIPS: Lantiq: Platform specific CLK fixup	John Crispin	1	-0/+5
	As we use CLKDEV_LOOKUP but dont have support for COMMON_CLK yet, we need to provide our own version of of_clk_get_from_provider(). Signed-off-by: John Crispin <blogic@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/4117/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01	MIPS: Lantiq: Add device_tree_init function	John Crispin	1	-0/+22
	Add a lantiq specific version of device_tree_init. The generic MIPS version was removed by. commit 594e966bc412d64eec9282d28ce511bdd62fea39 Author: David Daney <david.daney@cavium.com> Date: Thu Jul 5 18:12:38 2012 +0200 MIPS: Prune some target specific code out of prom.c Signed-off-by: John Crispin <blogic@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/4116/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01	MIPS: Lantiq: Fix interface clock and PCI control register offset	John Crispin	1	-21/+28
	The XRX200 based SoC have a different register offset for the interface clock and PCI control registers. This patch detects the SoC and sets the register offset at runtime. This make PCI work on the VR9 SoC. Signed-off-by: John Crispin <blogic@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/4113/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-08-01	delousing target_core_file a bit	Al Viro	1	-27/+5
	* set_fs(KERNEL_DS) + getname() is probably the weirdest implementation of strdup() I've seen. Especially since they don't to copy it at all... * filp_open() never returns NULL; it's ERR_PTR(-E...) on failure. * file->f_dentry is never going to be NULL, TYVM. * match_strdup() + snprintf() + kfree() is a bloody weird way to spell match_strlcpy(). Pox on cargo-cult programmers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-01	DM RAID: Add support for MD RAID10	Jonathan Brassow	2	-5/+116
	Support the MD RAID10 personality through dm-raid.c Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-08-01	block: remove dead func declaration	Yuanhan Liu	1	-1/+0
	__generic_unplug_device() function is removed with commit 7eaceaccab5f40bbfda044629a6298616aeaed50, which forgot to remove the declaration at meantime. Here remove it. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-01	block: add partition resize function to blkpg ioctl	Vivek Goyal	5	-9/+132
	Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that allows altering the size of an existing partition, even if it is currently in use. This patch converts hd_struct->nr_sects into sequence counter because One might extend a partition while IO is happening to it and update of nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This can lead to issues like reading inconsistent size of a partition. Sequence counter have been used so that readers don't have to take bdev mutex lock as we call sector_in_part() very frequently. Now all the access to hd_struct->nr_sects should happen using sequence counter read/update helper functions part_nr_sects_read/part_nr_sects_write. There is one exception though, set_capacity()/get_capacity(). I think theoritically race should exist there too but this patch does not modify set_capacity()/get_capacity() due to sheer number of call sites and I am afraid that change might break something. I have left that as a TODO item. We can handle it later if need be. This patch does not introduce any new races as such w.r.t set_capacity()/get_capacity(). v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Phillip Susi <psusi@ubuntu.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-01	block: uninitialized ioc->nr_tasks triggers WARN_ON	Olof Johansson	1	-0/+1
	Hi, I'm using the old-fashioned 'dump' backup tool, and I noticed that it spews the below warning as of 3.5-rc1 and later (3.4 is fine): [ 10.886893] ------------[ cut here ]------------ [ 10.886904] WARNING: at include/linux/iocontext.h:140 copy_process+0x1488/0x1560() [ 10.886905] Hardware name: Bochs [ 10.886906] Modules linked in: [ 10.886908] Pid: 2430, comm: dump Not tainted 3.5.0-rc7+ #27 [ 10.886908] Call Trace: [ 10.886911] [<ffffffff8107ce8a>] warn_slowpath_common+0x7a/0xb0 [ 10.886912] [<ffffffff8107ced5>] warn_slowpath_null+0x15/0x20 [ 10.886913] [<ffffffff8107c088>] copy_process+0x1488/0x1560 [ 10.886914] [<ffffffff8107c244>] do_fork+0xb4/0x340 [ 10.886918] [<ffffffff8108effa>] ? recalc_sigpending+0x1a/0x50 [ 10.886919] [<ffffffff8108f6b2>] ? __set_task_blocked+0x32/0x80 [ 10.886920] [<ffffffff81091afa>] ? __set_current_blocked+0x3a/0x60 [ 10.886923] [<ffffffff81051db3>] sys_clone+0x23/0x30 [ 10.886925] [<ffffffff8179bd73>] stub_clone+0x13/0x20 [ 10.886927] [<ffffffff8179baa2>] ? system_call_fastpath+0x16/0x1b [ 10.886928] ---[ end trace 32a14af7ee6a590b ]--- Reproducing is easy, I can hit it on a KVM system with a very basic config (x86_64 make defconfig + enable the drivers needed). To hit it, just install dump (on debian/ubuntu, not sure what the package might be called on Fedora), and: dump -o -f /tmp/foo / You'll see the warning in dmesg once it forks off the I/O process and starts dumping filesystem contents. I bisected it down to the following commit: commit f6e8d01bee036460e03bd4f6a79d014f98ba712e Author: Tejun Heo <tj@kernel.org> Date: Mon Mar 5 13:15:26 2012 -0800 block: add io_context->active_ref Currently ioc->nr_tasks is used to decide two things - whether an ioc is done issuing IOs and whether it's shared by multiple tasks. This patch separate out the first into ioc->active_ref, which is acquired and released using {get\|put}_io_context_active() respectively. This will be used to associate bio's with a given task. This patch doesn't introduce any visible behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> It seems like the init of ioc->nr_tasks was removed in that patch, so it starts out at 0 instead of 1. Tejun, is the right thing here to add back the init, or should something else be done? The below patch removes the warning, but I haven't done any more extensive testing on it. Signed-off-by: Olof Johansson <olof@lixom.net> Acked-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-01	block: do not artificially constrain max_sectors for stacking drivers	Mike Snitzer	1	-2/+1
	blk_set_stacking_limits is intended to allow stacking drivers to build up the limits of the stacked device based on the underlying devices' limits. But defaulting 'max_sectors' to BLK_DEF_MAX_SECTORS (1024) doesn't allow the stacking driver to inherit a max_sectors larger than 1024 -- due to blk_stack_limits' use of min_not_zero. It is now clear that this artificial limit is getting in the way so change blk_set_stacking_limits's max_sectors to UINT_MAX (which allows stacking drivers like dm-multipath to inherit 'max_sectors' from the underlying paths). Reported-by: Vijay Chauhan <vijay.chauhan@netapp.com> Tested-by: Vijay Chauhan <vijay.chauhan@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-01	ALSA: snd-usb: fix clock source validity index	Daniel Mack	1	-1/+2
	uac_clock_source_is_valid() uses the control selector value to access the bmControls bitmap of the clock source unit. This is wrong, as control selector values start from 1, while the bitmap uses all available bits. In other words, "Clock Validity Control" is stored in D3..2, not D5..4 of the clock selector unit's bmControls. Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-by: Andreas Koch <andreas@akdesigninc.com> Cc: stable@kernel.org Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-08-01	rtc/rtc-88pm80x: remove unneed devm_kfree	Devendra Naga	1	-2/+0
	devm_kzalloc() doesn't need a matching devm_kfree(), the freeing mechanism will trigger when driver unloads. Signed-off-by: Devendra Naga <devendra.aaru@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Ashish Jangam <ashish.jangam@kpitcummins.com> Cc: David Dajun Chen <dchen@diasemi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01	rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails	Devendra Naga	1	-1/+1
	At the probe we are assigning ret to return value of PTR_ERR right after the rtc_register_drive()r, as we would have done it in the if (IS_ERR(ptr)) check, since the function fails and goes inside that case Signed-off-by: Devendra Naga <devendra.aaru@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Ashish Jangam <ashish.jangam@kpitcummins.com> Cc: David Dajun Chen <dchen@diasemi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01	mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables	Mel Gorman	3	-3/+38
	If a process creates a large hugetlbfs mapping that is eligible for page table sharing and forks heavily with children some of whom fault and others which destroy the mapping then it is possible for page tables to get corrupted. Some teardowns of the mapping encounter a "bad pmd" and output a message to the kernel log. The final teardown will trigger a BUG_ON in mm/filemap.c. This was reproduced in 3.4 but is known to have existed for a long time and goes back at least as far as 2.6.37. It was probably was introduced in 2.6.20 by [39dde65c: shared page table for hugetlb page]. The messages look like this; [ ..........] Lots of bad pmd messages followed by this [ 127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7). [ 127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7). [ 127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7). [ 127.186778] ------------[ cut here ]------------ [ 127.186781] kernel BUG at mm/filemap.c:134! [ 127.186782] invalid opcode: 0000 [#1] SMP [ 127.186783] CPU 7 [ 127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod [ 127.186801] [ 127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR [ 127.186804] RIP: 0010:[<ffffffff810ed6ce>] [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160 [ 127.186809] RSP: 0000:ffff8804144b5c08 EFLAGS: 00010002 [ 127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0 [ 127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00 [ 127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003 [ 127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8 [ 127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8 [ 127.186815] FS: 00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000 [ 127.186816] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0 [ 127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0) [ 127.186821] Stack: [ 127.186822] ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b [ 127.186824] ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98 [ 127.186825] ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000 [ 127.186827] Call Trace: [ 127.186829] [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80 [ 127.186832] [<ffffffff811bc925>] truncate_hugepages+0x115/0x220 [ 127.186834] [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30 [ 127.186837] [<ffffffff811655c7>] evict+0xa7/0x1b0 [ 127.186839] [<ffffffff811657a3>] iput_final+0xd3/0x1f0 [ 127.186840] [<ffffffff811658f9>] iput+0x39/0x50 [ 127.186842] [<ffffffff81162708>] d_kill+0xf8/0x130 [ 127.186843] [<ffffffff81162812>] dput+0xd2/0x1a0 [ 127.186845] [<ffffffff8114e2d0>] __fput+0x170/0x230 [ 127.186848] [<ffffffff81236e0e>] ? rb_erase+0xce/0x150 [ 127.186849] [<ffffffff8114e3ad>] fput+0x1d/0x30 [ 127.186851] [<ffffffff81117db7>] remove_vma+0x37/0x80 [ 127.186853] [<ffffffff81119182>] do_munmap+0x2d2/0x360 [ 127.186855] [<ffffffff811cc639>] sys_shmdt+0xc9/0x170 [ 127.186857] [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b [ 127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0 [ 127.186868] RIP [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160 [ 127.186870] RSP <ffff8804144b5c08> [ 127.186871] ---[ end trace 7cbac5d1db69f426 ]--- The bug is a race and not always easy to reproduce. To reproduce it I was doing the following on a single socket I7-based machine with 16G of RAM. $ hugeadm --pool-pages-max DEFAULT:13G $ echo $((1810485761024)) > /proc/sys/kernel/shmmax $ echo $((1810485761024)) > /proc/sys/kernel/shmall $ for i in `seq 1 9000`; do ./hugetlbfs-test; done On my particular machine, it usually triggers within 10 minutes but enabling debug options can change the timing such that it never hits. Once the bug is triggered, the machine is in trouble and needs to be rebooted. The machine will respond but processes accessing proc like "ps aux" will hang due to the BUG_ON. shutdown will also hang and needs a hard reset or a sysrq-b. The basic problem is a race between page table sharing and teardown. For the most part page table sharing depends on i_mmap_mutex. In some cases, it is also taking the mm->page_table_lock for the PTE updates but with shared page tables, it is the i_mmap_mutex that is more important. Unfortunately it appears to be also insufficient. Consider the following situation Process A Process B --------- --------- hugetlb_fault shmdt LockWrite(mmap_sem) do_munmap unmap_region unmap_vmas unmap_single_vma unmap_hugepage_range Lock(i_mmap_mutex) Lock(mm->page_table_lock) huge_pmd_unshare/unmap tables <--- (1) Unlock(mm->page_table_lock) Unlock(i_mmap_mutex) huge_pte_alloc ... Lock(i_mmap_mutex) ... vma_prio_walk, find svma, spte ... Lock(mm->page_table_lock) ... share spte ... Unlock(mm->page_table_lock) ... Unlock(i_mmap_mutex) ... hugetlb_no_page <--- (2) free_pgtables unlink_file_vma hugetlb_free_pgd_range remove_vma_list In this scenario, it is possible for Process A to share page tables with Process B that is trying to tear them down. The i_mmap_mutex on its own does not prevent Process A walking Process B's page tables. At (1) above, the page tables are not shared yet so it unmaps the PMDs. Process A sets up page table sharing and at (2) faults a new entry. Process B then trips up on it in free_pgtables. This patch fixes the problem by adding a new function __unmap_hugepage_range_final that is only called when the VMA is about to be destroyed. This function clears VM_MAYSHARE during unmap_hugepage_range() under the i_mmap_mutex. This makes the VMA ineligible for sharing and avoids the race. Superficially this looks like it would then be vunerable to truncate and madvise issues but hugetlbfs has its own truncate handlers so does not use unmap_mapping_range() and does not support madvise(DONTNEED). This should be treated as a -stable candidate if it is merged. Test program is as follows. The test case was mostly written by Michal Hocko with a few minor changes to reproduce this bug. ==== CUT HERE ==== static size_t huge_page_size = (2UL << 20); static size_t nr_huge_page_A = 512; static size_t nr_huge_page_B = 5632; unsigned int get_random(unsigned int max) { struct timeval tv; gettimeofday(&tv, NULL); srandom(tv.tv_usec); return random() % max; } static void play(void addr, size_t size) { unsigned char start = addr, end = start + size, a; start += get_random(size/2); /* we could itterate on huge pages but let's give it more time. / for (a = start; a < end; a += 4096) a = 0; } int main(int argc, char *argv) { key_t key = IPC_PRIVATE; size_t sizeA = nr_huge_page_A huge_page_size; size_t sizeB = nr_huge_page_B * huge_page_size; int shmidA, shmidB; void addrA = NULL, addrB = NULL; int nr_children = 300, n = 0; if ((shmidA = shmget(key, sizeA, IPC_CREAT\|SHM_HUGETLB\|0660)) == -1) { perror("shmget:"); return 1; } if ((addrA = shmat(shmidA, addrA, SHM_R\|SHM_W)) == (void )-1UL) { perror("shmat"); return 1; } if ((shmidB = shmget(key, sizeB, IPC_CREAT\|SHM_HUGETLB\|0660)) == -1) { perror("shmget:"); return 1; } if ((addrB = shmat(shmidB, addrB, SHM_R\|SHM_W)) == (void )-1UL) { perror("shmat"); return 1; } fork_child: switch(fork()) { case 0: switch (n%3) { case 0: play(addrA, sizeA); break; case 1: play(addrB, sizeB); break; case 2: break; } break; case -1: perror("fork:"); break; default: if (++n < nr_children) goto fork_child; play(addrA, sizeA); break; } shmdt(addrA); shmdt(addrB); do { wait(NULL); } while (--n > 0); shmctl(shmidA, IPC_RMID, NULL); shmctl(shmidB, IPC_RMID, NULL); return 0; } [akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build] Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01	tmpfs: distribute interleave better across nodes	Nathan Zimmer	1	-2/+4
	When tmpfs has the interleave memory policy, it always starts allocating for each file from node 0 at offset 0. When there are many small files, the lower nodes fill up disproportionately. This patch spreads out node usage by starting files at nodes other than 0, by using the inode number to bias the starting node for interleave. Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Nick Piggin <npiggin@gmail.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01	mm: remove redundant initialization	Minchan Kim	2	-12/+2
	pg_data_t is zeroed before reaching free_area_init_core(), so remove the now unnecessary initializations. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>