linux - linux

	Commit message (Collapse)	Author	Files	Lines
2013-06-29	[readdir] convert nfs	Al Viro	1	-26/+25
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert ext4	Al Viro	3	-190/+134
	and trim the living hell out of bogosities in inline dir case Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert qnx6	Al Viro	1	-17/+14
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert qnx4	Al Viro	1	-35/+31
	... and use strnlen() instead of strlen() - it's done on untrusted data, after all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
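The point about strnlen() on untrusted data can be illustrated with a small user-space sketch. The struct `ondisk_entry` and its 16-byte name field are hypothetical stand-ins, not the real qnx4 on-disk layout; the only claim is that a fixed-size name field read from disk need not be NUL-terminated, so the scan has to be bounded by the field size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size directory entry; NOT the real qnx4 layout. */
#define ONDISK_NAME_SIZE 16

struct ondisk_entry {
    char name[ONDISK_NAME_SIZE];   /* may lack a terminating NUL on disk */
};

/*
 * Length of the entry name, never reading past the name field.
 * strlen() would run off the end of the field when a corrupted or
 * malicious image fills all 16 bytes; strnlen() stops at the bound.
 */
static size_t entry_name_len(const struct ondisk_entry *e)
{
    assert(e != NULL);
    return strnlen(e->name, sizeof(e->name));
}
```

With a name field completely filled with non-NUL bytes, `entry_name_len()` returns the field size instead of scanning into whatever follows the entry in memory.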
2013-06-29	[readdir] convert omfs	Al Viro	1	-56/+38
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert nilfs2	Al Viro	1	-30/+18
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert sysfs	Al Viro	1	-48/+18
	get rid of the kludges in sysfs_readdir() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert gfs2	Al Viro	4	-51/+38
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert exofs	Al Viro	1	-22/+16
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert bfs	Al Viro	1	-21/+14
	... and get rid of that ridiculous mutex in bfs_readdir() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert procfs	Al Viro	9	-489/+284
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert openpromfs	Al Viro	1	-51/+44
	what the hell is op_mutex for, BTW? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert efs	Al Viro	1	-42/+33
	* sanity checks belong before risky operation, not after it * don't quit as soon as we'd found an entry Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert configfs	Al Viro	1	-70/+52
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert romfs	Al Viro	1	-12/+9
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert squashfs	Al Viro	1	-28/+12
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert ubifs	Al Viro	1	-41/+16
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert udf	Al Viro	1	-37/+26
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] convert ext3	Al Viro	2	-93/+70
	new helper: dir_relax(inode). Call when you are in location that will _not_ be invalidated by directory modifications (block boundary, in case of ext*). Returns whether the directory has survived (dropping i_mutex allows rmdir to kill the sucker; if it returns false to us, ->iterate() is obviously done) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] switch dcache_readdir() users to ->iterate()	Al Viro	4	-60/+65
	new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx), dir_emit_dots(file, ctx). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] simple local unixlike: switch to ->iterate()	Al Viro	4	-75/+59
	ext2, ufs, minix, sysv Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	[readdir] introduce ->iterate(), ctx->pos, dir_emit()	Al Viro	7	-18/+47
	New method - ->iterate(file, ctx). That's the replacement for ->readdir(); it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and calls dir_emit(ctx, ...) instead of filldir(data, ...). It does not update file->f_pos (or look at it, for that matter); iterate_dir() does the update. Note that dir_emit() takes the offset from ctx->pos (and eventually filldir_t will lose that argument). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
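The division of labour described above might be sketched in user space roughly as follows. Everything prefixed `mock_` is a simplified stand-in invented for this illustration (the real kernel types and signatures differ); the sketch only shows the shape: the method works on ctx->pos and calls the emit helper, and only the wrapper touches f_pos.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_ctx;
/* Simplified actor: returns 0 to accept an entry, nonzero to stop. */
typedef int (*mock_actor_t)(struct mock_ctx *ctx, const char *name, long pos);

struct mock_ctx {
    mock_actor_t actor;   /* callback supplied by the caller */
    long pos;             /* replaces direct use of file->f_pos */
};

struct mock_file {
    long f_pos;
};

/* Emit helper: the offset passed to the actor comes from ctx->pos. */
static bool mock_dir_emit(struct mock_ctx *ctx, const char *name)
{
    return ctx->actor(ctx, name, ctx->pos) == 0;
}

static const char *const mock_entries[] = { "a", "b", "c", "d" };
#define MOCK_NENTRIES (sizeof(mock_entries) / sizeof(mock_entries[0]))

/* The ->iterate() analogue: never reads or writes file->f_pos. */
static int mock_iterate(struct mock_file *file, struct mock_ctx *ctx)
{
    (void)file;
    while (ctx->pos >= 0 && (size_t)ctx->pos < MOCK_NENTRIES) {
        if (!mock_dir_emit(ctx, mock_entries[ctx->pos]))
            break;        /* actor refused the entry: stop, keep pos */
        ctx->pos++;
    }
    return 0;
}

/* The iterate_dir() analogue: the only place f_pos is loaded and stored. */
static int mock_iterate_dir(struct mock_file *file, struct mock_ctx *ctx)
{
    assert(file != NULL && ctx != NULL && ctx->actor != NULL);
    ctx->pos = file->f_pos;
    int ret = mock_iterate(file, ctx);
    file->f_pos = ctx->pos;
    return ret;
}

/* A caller embeds the context in its own data, here a two-entry buffer. */
struct take_two {
    struct mock_ctx ctx;  /* must stay the first member for the cast below */
    int taken;
};

static int take_two_actor(struct mock_ctx *ctx, const char *name, long pos)
{
    struct take_two *t = (struct take_two *)ctx;
    (void)name;
    (void)pos;
    if (t->taken == 2)
        return -1;        /* buffer full */
    t->taken++;
    return 0;
}
```

Calling `mock_iterate_dir()` repeatedly with a `take_two` context walks the four entries two at a time, with `f_pos` advancing only through the wrapper.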
2013-06-29	[readdir] introduce iterate_dir() and dir_context	Al Viro	10	-20/+53
	iterate_dir(): new helper, replacing vfs_readdir(). struct dir_context: contains the readdir callback (and will get more stuff in it), embedded into whatever data that callback wants to deal with; eventually, we'll be passing it to ->readdir() replacement instead of (data,filldir) pair. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	move linux/loop.h to drivers/block	Al Viro	3	-3/+3
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	compat.c: LOOP_CLR_FD is taken care of in loop.c itself...	Al Viro	1	-3/+0
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	pxa3xx: VM_IO is set by io_remap_pfn_range()	Al Viro	1	-1/+0
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	au1100fb: VM_IO is set by io_remap_pfn_range()	Al Viro	1	-2/+0
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	au1200fb: io_remap_pfn_range() sets VM_IO	Al Viro	1	-4/+0
	... and single return is quite sufficient to get out of function, TYVM Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	vfio: remap_pfn_range() sets all those flags...	Al Viro	1	-1/+0
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	i810: VM_IO is set by io_remap_pfn_range()	Al Viro	1	-1/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	drm: io_remap_pfn_range() sets VM_IO...	Al Viro	1	-1/+0
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	sparc: __pci_mmap_set_flags() is useless	Al Viro	1	-10/+0
	io_remap_pfn_range() does all we need Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	mn10300: don't bother with VM_IO	Al Viro	1	-1/+1
	io_remap_pfn_range() sets it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	hose_mmap_page_range(): io_remap_pfn_range() will set all those flags...	Al Viro	1	-1/+0
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	samsung: don't bother with setting VM_IO	Al Viro	1	-1/+0
	io_remap_pfn_range() will set it just fine Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	consolidate io_remap_pfn_range definitions	Al Viro	34	-105/+7
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29	UBIFS: fix a horrid bug	Artem Bityutskiy	1	-3/+27
	Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are in the middle of 'ubifs_readdir()'. This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses it, and this is a very bad bug: not only 'ubifs_readdir()' can return garbage, but this may corrupt memory and lead to all kinds of problems like crashes and security holes. This patch fixes the problem by using the 'file->f_version' field, which '->llseek()' always unconditionally sets to zero. We set it to 1 in 'ubifs_readdir()' and whenever we detect that it became 0, we know there was a seek and it is time to clear the state saved in 'file->private_data'. I tested this patch by writing a user-space program which runs readdir and seek in parallel. I could easily crash the kernel without these patches, but could not crash it with these patches. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
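The f_version handshake described in this message could be modelled in user space along these lines. This is a single-threaded sketch with invented `mock_` names and a dummy 16-byte cached state; it shows only the detection logic (seek zeroes the version, readdir notices and drops its cached state), not the race itself or the actual UBIFS code.

```c
#include <assert.h>
#include <stdlib.h>

struct mock_file {
    unsigned long f_version;
    long f_pos;
    void *private_data;   /* state cached between readdir calls */
};

/* The seek side never frees anything; it only marks the state stale. */
static void mock_llseek(struct mock_file *file, long pos)
{
    assert(file != NULL);
    file->f_pos = pos;
    file->f_version = 0;
}

/*
 * The readdir side owns private_data. Returns 1 if stale cached state
 * was discarded because a seek happened, 0 if not, -1 on allocation
 * failure.
 */
static int mock_readdir(struct mock_file *file)
{
    int dropped = 0;

    assert(file != NULL);
    if (file->f_version == 0 && file->private_data) {
        free(file->private_data);      /* a seek happened: state is stale */
        file->private_data = NULL;
        dropped = 1;
    }
    file->f_version = 1;               /* mark "no seek seen since here" */

    if (!file->private_data) {
        file->private_data = malloc(16);   /* dummy cached state */
        if (!file->private_data)
            return -1;
    }
    return dropped;
}
```

The design choice worth noting is that freeing moves entirely to the readdir side, so the seek path can no longer pull memory out from under a running readdir.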
2013-06-29	UBIFS: prepare to fix a horrid bug	Artem Bityutskiy	1	-12/+12
	Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are in the middle of 'ubifs_readdir()'. First of all, this means that 'file->private_data' can be freed while 'ubifs_readdir()' uses it. But this particular patch does not fix the problem. This patch is only a preparation, and the fix will follow next. In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly, because 'file->f_pos' can be changed by '->llseek()' at any point. This may lead 'ubifs_readdir()' to returning inconsistent data: directory entry names may correspond to incorrect file positions. So here we introduce a local variable 'pos', read 'file->f_pos' once at the very beginning, and then stick to 'pos'. The result of this is that when 'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of 'ubifs_readdir()', the latter "wins". Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-22	aout32 coredump compat fix	Al Viro	1	-1/+1
	dump_seek() does SEEK_CUR, not SEEK_SET; native binfmt_aout handles it correctly (seeks by PAGE_SIZE - sizeof(struct user), getting the current position to PAGE_SIZE), compat one seeks by PAGE_SIZE and ends up at PAGE_SIZE + already written... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-20	splice: don't pass the address of ->f_pos to methods	Al Viro	5	-23/+41
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-19	mconsole: we'd better initialize pos before passing it to vfs_read()...	Al Viro	1	-1/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-16	lseek(fd, n, SEEK_END) does not go to eof - n	Al Viro	4	-4/+4
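The semantics in the subject line can be checked from user space: with SEEK_END the offset is added to the file size, so a positive n lands past end of file and only a negative n lands before it. The helper below is a hypothetical test utility written for this illustration (it relies on tmpfile() and ftruncate() being usable in the current environment), not code from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file of `size` bytes and return the offset that
 * lseek(fd, n, SEEK_END) reports, or -1 if any step fails.
 */
static off_t seek_from_end(off_t size, off_t n)
{
    FILE *fp = tmpfile();
    off_t result = -1;

    if (!fp)
        return -1;

    int fd = fileno(fp);
    assert(fd >= 0);

    if (ftruncate(fd, size) == 0)
        result = lseek(fd, n, SEEK_END);   /* new offset = size + n */

    fclose(fp);                            /* also removes the temp file */
    return result;
}
```

For a 10-byte file, an offset of 3 yields position 13 and an offset of -3 yields position 7; code that computes "size - n" for SEEK_END gets both cases backwards.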
	When you copy some code, you are supposed to read it. If nothing else, there's a chance to spot and fix an obvious bug instead of sharing it... X-Song: "I Got It From Agnes", by Tom Lehrer Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ Tom Lehrer? You're dating yourself, Al ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-15	Linux 3.10-rc6 (tag: v3.10-rc6)	Linus Torvalds	1	-1/+1

2013-06-15	smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().	David Daney	1	-7/+12
	Thanks to commit f91eb62f71b3 ("init: scream bloody murder if interrupts are enabled too early"), "bloody murder" is now being screamed. With a MIPS OCTEON config, we use on_each_cpu() in our irq_chip.irq_bus_sync_unlock() function. This gets called in early as a result of the time_init() call. Because the !SMP version of on_each_cpu() unconditionally enables irqs, we get: WARNING: at init/main.c:560 start_kernel+0x250/0x410() Interrupts were enabled early CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801 Call Trace: show_stack+0x68/0x80 warn_slowpath_common+0x78/0xb0 warn_slowpath_fmt+0x38/0x48 start_kernel+0x250/0x410 Suggested fix: Do what we already do in the SMP version of on_each_cpu(), and use local_irq_save/local_irq_restore. Because we need a flags variable, make it a static inline to avoid name space issues. [ Change from v1: Convert on_each_cpu to a static inline function, add #include <linux/irqflags.h> to avoid build breakage on some files. on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as on_each_cpu(), but they are not causing !SMP bugs for me, so I will defer changing them to a less urgent patch. ] Signed-off-by: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
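The difference between the two !SMP variants can be mimicked in user space with a boolean standing in for the CPU interrupt-enable state. All names below are mock stand-ins for this sketch, not the kernel macros; the point is only that an unconditional enable loses the caller's "interrupts off" state, while save/restore puts back whatever was there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool mock_irqs_enabled;   /* stand-in for the hardware IRQ state */

static unsigned long mock_local_irq_save(void)
{
    unsigned long flags = mock_irqs_enabled ? 1UL : 0UL;
    mock_irqs_enabled = false;
    return flags;
}

static void mock_local_irq_restore(unsigned long flags)
{
    mock_irqs_enabled = (flags != 0);
}

/* Old shape: disable, call, then enable unconditionally. */
static void mock_on_each_cpu_old(void (*func)(void *), void *info)
{
    assert(func != NULL);
    mock_irqs_enabled = false;
    func(info);
    mock_irqs_enabled = true;    /* wrong if the caller had IRQs off */
}

/* New shape: remember the previous state and restore exactly that. */
static void mock_on_each_cpu_new(void (*func)(void *), void *info)
{
    assert(func != NULL);
    unsigned long flags = mock_local_irq_save();
    func(info);
    mock_local_irq_restore(flags);
}

/* Trivial callback used to exercise both variants. */
static void mock_count_call(void *info)
{
    ++*(int *)info;
}
```

Calling the old variant with the mock state set to "disabled" leaves it "enabled" afterwards, which corresponds to the early-boot warning quoted above; the new variant leaves the state untouched.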
2013-06-15	powerpc: Fix missing/delayed calls to irq_work	Benjamin Herrenschmidt	1	-1/+1
	When replaying interrupts (as a result of the interrupt occurring while soft-disabled), in the case of the decrementer, we are exclusively testing for a pending timer target. However we also use decrementer interrupts to trigger the new "irq_work", which in this case would be missed. This changes the logic to force a replay in both cases of a timer boundary reached and a decrementer interrupt having actually occurred while disabled. The former test is still useful to catch cases where a CPU having been hard-disabled for a long time completely misses the interrupt due to a decrementer rollover. CC: <stable@vger.kernel.org> [v3.4+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-15	powerpc: Fix emulation of illegal instructions on PowerNV platform	Paul Mackerras	2	-1/+11
	Normally, the kernel emulates a few instructions that are unimplemented on some processors (e.g. the old dcba instruction), or privileged (e.g. mfpvr). The emulation of unimplemented instructions is currently not working on the PowerNV platform. The reason is that on these machines, unimplemented and illegal instructions cause a hypervisor emulation assist interrupt, rather than a program interrupt as on older CPUs. Our vector for the emulation assist interrupt just calls program_check_exception() directly, without setting the bit in SRR1 that indicates an illegal instruction interrupt. This fixes it by making the emulation assist interrupt set that bit before calling program_check_exception(). With this, old programs that use no-longer implemented instructions such as dcba now work again. CC: <stable@vger.kernel.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-15	powerpc: Fix stack overflow crash in resume_kernel when ftracing	Michael Ellerman	2	-3/+3
	It's possible for us to crash when running with ftrace enabled, eg: Bad kernel stack pointer bffffd12 at c00000000000a454 cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40] pc: c00000000000a454: resume_kernel+0x34/0x60 lr: c00000000000335c: performance_monitor_common+0x15c/0x180 sp: bffffd12 msr: 8000000000001032 dar: bffffd12 dsisr: 42000000 If we look at current's stack (paca->__current->stack) we see it is equal to c0000002ecab0000. Our stack is 16K, and comparing to paca->kstack (c0000002ecab3e30) we can see that we have overflowed our kernel stack. This leads to us writing over our struct thread_info, and in this case we have corrupted thread_info->flags and set _TIF_EMULATE_STACK_STORE. Dumping the stack we see: 3:mon> t c0000002ecab0000 [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70 [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180 --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30 [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable) [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90 [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34 [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300 [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable) [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280 [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40 [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 ... and so on __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry path. At that point the irq state is not consistent, ie. interrupts are hard disabled (by the exception entry), but the paca soft-enabled flag may be out of sync. This leads to the local_irq_restore() in trace_graph_entry() actually enabling interrupts, which we do not want. Because we have not yet reprogrammed the decrementer we immediately take another decrementer exception, and recurse. The fix is twofold. Firstly make sure we call DISABLE_INTS before calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles the irq state in the paca with the hardware, making it safe again to call local_irq_save/restore(). Although that should be sufficient to fix the bug, we also mark the runlatch routines as notrace. They are called very early in the exception entry and we are asking for trouble tracing them. They are also fairly uninteresting and tracing them just adds unnecessary overhead. [ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873 "powerpc: Rework runlatch code" by myself --BenH ] CC: <stable@vger.kernel.org> [v3.4+] Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-15	snd_pcm_link(): fix a leak...	Al Viro	1	-2/+2
	in case when snd_pcm_stream_linked(substream) is true, we end up leaking group. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-15	use can_lookup() instead of direct checks of ->i_op->lookup	Al Viro	1	-2/+2
	a couple of places got missed back when Linus has introduced that one... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-15	move exit_task_namespaces() outside of exit_notify()	Oleg Nesterov	1	-1/+1
	exit_notify() does exit_task_namespaces() after forget_original_parent(). This was needed to ensure that ->nsproxy can't be cleared prematurely, an exiting child we are going to reparent can do do_notify_parent() and use the parent's (ours) pid_ns. However, after 32084504 "pidns: use task_active_pid_ns in do_notify_parent" ->nsproxy != NULL is no longer needed, we rely on task_active_pid_ns(). Move exit_task_namespaces() from exit_notify() to do_exit(), after exit_fs() and before exit_task_work(). This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy() does fput() which needs task_work_add(). Note: this particular problem can be fixed if we change fput(), and that change makes sense anyway. But there is another reason to move the callsite. The original reason for exit_task_namespaces() from the middle of exit_notify() was subtle and it has already gone away, now this looks confusing. And this allows us to simplify exit_notify(), we can avoid unlock/lock(tasklist) and we can use ->exit_state instead of PF_EXITING in forget_original_parent(). Reported-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>