Commit message (Author, Age, Files, Lines)
* sysctl: allow for strict write position handling (Kees Cook, 2014-06-07, 2 files, -2/+88)

    When writing to a sysctl string, each write, regardless of VFS position,
    begins writing the string from the start. This means the contents of the
    last write to the sysctl controls the string contents instead of the
    first:

        open("/proc/sys/kernel/modprobe", O_WRONLY) = 1
        write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096
        write(1, "/bin/true", 9) = 9
        close(1) = 0

        $ cat /proc/sys/kernel/modprobe
        /bin/true

    Expected behaviour would be to have the sysctl be "AAAA..." capped at
    maxlen (in this case KMOD_PATH_LEN: 256), instead of truncating to the
    contents of the second write. Similarly, multiple short writes would not
    append to the sysctl.

    The old behavior is unlike regular POSIX files enough that doing audits
    of software that interact with sysctls can end up in unexpected or
    dangerous situations. For example, "as long as the input starts with a
    trusted path" turns out to be an insufficient filter, as what must also
    happen is for the input to be entirely contained in a single write
    syscall -- not a common consideration, especially for high level tools.

    This provides kernel.sysctl_writes_strict as a way to make this behavior
    act in a less surprising manner for strings, and disallows non-zero file
    position when writing numeric sysctls (similar to what is already done
    when reading from non-zero file positions). For now, the default (0) is
    to warn about non-zero file position use, but retain the legacy
    behavior. Setting this to -1 disables the warning, and setting this to 1
    enables the file position respecting behavior.

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: move misplaced hunk, per Randy]
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
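The difference between the legacy and position-respecting behaviours can be sketched in user space. This is only a simplified model under stated assumptions, not the kernel's implementation: `model_string_write`, `replay_modprobe_example` and `enum write_mode` are hypothetical names, and the warning-only modes (0 and -1) are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum write_mode { WRITES_LEGACY = 0, WRITES_STRICT = 1 };

/*
 * Model one write() of `len` bytes into a string sysctl backed by `buf`
 * (capacity `maxlen`, including the terminating NUL; `buf` must already be
 * NUL-terminated). Legacy mode always starts at offset 0; strict mode starts
 * at the file position `*pos`. The write always reports `len` bytes consumed.
 */
static size_t model_string_write(char *buf, size_t maxlen, size_t *pos,
                                 const char *data, size_t len,
                                 enum write_mode mode)
{
    size_t at = (mode == WRITES_STRICT) ? *pos : 0;
    size_t i;

    /* Only continue if the position is not past the current string end. */
    if (at <= strlen(buf)) {
        for (i = 0; i < len && at < maxlen - 1; i++) {
            if (data[i] == '\0' || data[i] == '\n')
                break;
            buf[at++] = data[i];
        }
        buf[at] = '\0';
    }
    *pos += len;
    return len;
}

/* Replay the two writes from the commit message into `out`. */
static void replay_modprobe_example(enum write_mode mode, char *out,
                                    size_t maxlen)
{
    char big[4096];
    size_t pos = 0;

    memset(big, 'A', sizeof(big));
    out[0] = '\0';
    model_string_write(out, maxlen, &pos, big, sizeof(big), mode);
    model_string_write(out, maxlen, &pos, "/bin/true", 9, mode);
}
```

With a 256-byte buffer, the legacy mode leaves "/bin/true" in place, while the strict mode keeps the 255 "A" characters from the first write and drops the second write because its position lies past the end of the string.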
* sysctl: refactor sysctl string writing logic (Kees Cook, 2014-06-07, 1 file, -7/+4)

    Consolidate buffer length checking with new-line/end-of-line checking.
    Additionally, instead of reading user memory twice, just do the
    assignment during the loop.

    This change doesn't affect the potential races here. It was already
    possible to read a sysctl that was in the middle of a write. In both
    cases, the string will always be NULL terminated. The pre-existing race
    remains a problem to be solved.

    Signed-off-by: Kees Cook <keescook@chromium.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sysctl: clean up char buffer arguments (Kees Cook, 2014-06-07, 1 file, -7/+7)

    When writing to a sysctl string, each write, regardless of VFS position,
    began writing the string from the start. This meant the contents of the
    last write to the sysctl controlled the string contents instead of the
    first.

    This misbehavior was featured in an exploit against Chrome OS. While
    it's not in itself a vulnerability, it's a weirdness that isn't on the
    mind of most auditors: "This filter looks correct, the first line
    written would not be meaningful to sysctl" doesn't apply here, since the
    size of the write and the contents of the final write are what matter
    when writing to sysctls.

    This adds the sysctl kernel.sysctl_writes_strict to control the write
    behavior. The default (0) reports when VFS position is non-0 on a write,
    but retains legacy behavior, -1 disables the warning, and 1 enables the
    position-respecting behavior.

    The long-term plan here is to wait for userspace to be fixed in response
    to the new warning and to then switch the default kernel behavior to the
    new position-respecting behavior.

    This patch (of 4):

    The char buffer arguments are needlessly cast in weird places. Clean it
    up so things are easier to read.

    Signed-off-by: Kees Cook <keescook@chromium.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* rapidio/tsi721: use pci_enable_msix_exact() instead of pci_enable_msix() (Alexander Gordeev, 2014-06-07, 1 file, -8/+3)

    As a result of the deprecation of the MSI-X/MSI enablement functions
    pci_enable_msix() and pci_enable_msi_block(), all drivers using these
    two interfaces need to be updated to use the new pci_enable_msi_range()
    or pci_enable_msi_exact() and pci_enable_msix_range() or
    pci_enable_msix_exact() interfaces.

    The patch has no runtime effect.

    Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
    Cc: Matt Porter <mporter@kernel.crashing.org>
    Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* idr: reorder the fields (Lai Jiangshan, 2014-06-07, 1 file, -5/+8)

    idr_layer->layer is always accessed in the read path, so move it to the
    front.

    idr_layer->bitmap is moved to the bottom, and rcu_head shares storage
    with bitmap because the two are never accessed at the same time.

    idr->id_free/id_free_cnt/lock are free-list fields and are moved to the
    bottom. They will be removed in the near future.

    Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* idr: reduce the unneeded check in free_layer() (Lai Jiangshan, 2014-06-07, 1 file, -1/+1)

    If "idr->hint == p" is true, it also implies "idr->hint" is true (not
    NULL).

    Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* idr: don't need to shink the free list when idr_remove() (Lai Jiangshan, 2014-06-07, 1 file, -16/+0)

    Since the idr subsystem became RCU-aware, a freed layer no longer goes
    to the free list, so idr_remove() does not fill up the free list. There
    is therefore no need to shrink it either.

    Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* idr: fix idr_replace()'s returned error code (Lai Jiangshan, 2014-06-07, 1 file, -2/+2)

    When a smaller id is not found, idr_replace() returns -ENOENT. But when
    the id is big enough, idr_replace() returns -EINVAL, even though there
    is no real difference between these two kinds of ids.

    Both are unallocated ids, so idr_replace() should return the same value
    for them: -ENOENT.

    Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* idr: fix NULL pointer dereference when ida_remove(unallocated_id) (Lai Jiangshan, 2014-06-07, 1 file, -1/+1)

    If the ida has at least one existing id, and an unallocated ID which
    meets a certain condition is passed to ida_remove(), the system will
    crash because it hits a NULL pointer dereference.

    The condition is that the unallocated ID shares the same lowest idr
    layer with the existing ID, but the idr slot would be different if the
    unallocated ID were to be allocated. In this case the matching idr slot
    for the unallocated_id is NULL, causing @bitmap to be NULL, which the
    function dereferences without checking, crashing the kernel.

    See the test code:

        static void test3(void)
        {
            int id;
            DEFINE_IDA(test_ida);

            printk(KERN_INFO "Start test3\n");
            if (ida_pre_get(&test_ida, GFP_KERNEL) < 0) return;
            if (ida_get_new(&test_ida, &id) < 0) return;
            ida_remove(&test_ida, 4000); /* bug: NULL dereference here */
            printk(KERN_INFO "End of test3\n");
        }

    It happens only when the caller tries to free an unallocated ID, which
    is the caller's fault. It is not a bug. But it is better to add the
    proper check and complain rather than crashing the kernel.

    [tj@kernel.org: updated patch description]
    Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* idr: fix unexpected ID-removal when idr_remove(unallocated_id) (Lai Jiangshan, 2014-06-07, 1 file, -0/+8)

    If unallocated_id = (ANY * idr_max(idp->layers) + existing_id) is passed
    to idr_remove(), the existing_id will be removed unexpectedly.

    The following test shows this unexpected id-removal:

        static void test4(void)
        {
            int id;
            DEFINE_IDR(test_idr);

            printk(KERN_INFO "Start test4\n");
            id = idr_alloc(&test_idr, (void *)1, 42, 43, GFP_KERNEL);
            BUG_ON(id != 42);
            idr_remove(&test_idr, 42 + IDR_SIZE);
            TEST_BUG_ON(idr_find(&test_idr, 42) != (void *)1);
            idr_destroy(&test_idr);
            printk(KERN_INFO "End of test4\n");
        }

    ida_remove() shares a similar problem.

    It happens only when the caller tries to free an unallocated ID, which
    is the caller's fault. It is not a bug. But it is better to add the
    proper check and complain rather than removing an existing_id silently.

    Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
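The aliasing described above can be illustrated with a deliberately simplified, single-layer model. Everything here is an assumption made for illustration: the `model_*` names and the 8-bit layer width are hypothetical, and the range check is only in the spirit of the fix, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-layer model; the 8-bit width is an assumption. */
#define MODEL_IDR_BITS 8
#define MODEL_IDR_SIZE (1 << MODEL_IDR_BITS)
#define MODEL_IDR_MASK (MODEL_IDR_SIZE - 1)

struct model_idr {
    void *slot[MODEL_IDR_SIZE];
};

/* Removal that only masks the id: out-of-range ids alias existing slots. */
static void model_remove_unchecked(struct model_idr *idr, int id)
{
    idr->slot[id & MODEL_IDR_MASK] = NULL;
}

/* Removal that first rejects ids the tree cannot contain. */
static int model_remove_checked(struct model_idr *idr, int id)
{
    if (id < 0 || id > MODEL_IDR_MASK)
        return -1;              /* id is beyond what one layer can hold */
    if (idr->slot[id] == NULL)
        return -1;              /* id was never allocated */
    idr->slot[id] = NULL;
    return 0;
}
```

With id 42 allocated, `model_remove_unchecked(&idr, 42 + MODEL_IDR_SIZE)` silently clears slot 42, whereas the checked variant refuses the request and leaves slot 42 intact.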
* idr: fix overflow bug during maximum ID calculation at maximum height (Lai Jiangshan, 2014-06-07, 1 file, -5/+3)

    idr_replace() open-codes the logic to calculate the maximum valid ID
    given the height of the idr tree; unfortunately, the open-coded logic
    doesn't account for the fact that the top layer may have unused slots
    and over-shifts the limit to zero when the tree is at its maximum
    height.

    The following test code shows it fails to replace the value for
    id=((1<<27)+42):

        static void test5(void)
        {
            int id;
            DEFINE_IDR(test_idr);
        #define TEST5_START ((1<<27)+42) /* use the highest layer */

            printk(KERN_INFO "Start test5\n");
            id = idr_alloc(&test_idr, (void *)1, TEST5_START, 0, GFP_KERNEL);
            BUG_ON(id != TEST5_START);
            TEST_BUG_ON(idr_replace(&test_idr, (void *)2, TEST5_START) != (void *)1);
            idr_destroy(&test_idr);
            printk(KERN_INFO "End of test5\n");
        }

    Fix the bug by using idr_max() which correctly takes into account the
    maximum allowed shift.

    sub_alloc() shares the same problem and may incorrectly fail with
    -EAGAIN; however, this bug doesn't affect correct operation because
    idr_get_empty_slot(), which already uses idr_max(), retries with the
    increased @id in such cases.

    [tj@kernel.org: Updated patch description.]
    Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
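The over-shift can be made concrete with a small arithmetic sketch. It rests on assumptions: 8 bits per layer, a maximum shift of 31, and four layers at maximum height; all `model_*` and `MODEL_*` names are hypothetical. Shifting a 32-bit `int` by 32 is undefined in C, so the model computes the shift in 64 bits and truncates explicitly to show the limit collapsing to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed parameters for illustration only. */
#define MODEL_IDR_BITS      8
#define MODEL_MAX_IDR_SHIFT 31
#define MODEL_TEST5_START   ((1 << 27) + 42)

/* Open-coded limit: 1 << (layers * bits), truncated to 32 bits. */
static uint32_t model_open_coded_limit(int layers)
{
    uint64_t wide = (uint64_t)1 << (layers * MODEL_IDR_BITS);
    return (uint32_t)wide;      /* becomes 0 when layers * bits == 32 */
}

/* Capped maximum: the shift never exceeds MODEL_MAX_IDR_SHIFT. */
static int32_t model_capped_max(int layers)
{
    int bits = layers * MODEL_IDR_BITS;

    if (bits > MODEL_MAX_IDR_SHIFT)
        bits = MODEL_MAX_IDR_SHIFT;
    return (int32_t)(((uint32_t)1 << bits) - 1);
}

/* An id is accepted if it lies below the open-coded limit. */
static int model_open_coded_accepts(int id, int layers)
{
    return (uint32_t)id < model_open_coded_limit(layers);
}

/* An id is accepted if it does not exceed the capped maximum. */
static int model_capped_accepts(int id, int layers)
{
    return id <= model_capped_max(layers);
}
```

At four layers the open-coded limit is zero, so every id (including `MODEL_TEST5_START`) is rejected, while the capped calculation accepts it.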
* kernel/kexec.c: convert printk to pr_foo() (Fabian Frederick, 2014-06-07, 1 file, -37/+32)

    + some pr_warning -> pr_warn and checkpatch warning fixes

    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers (Masami Hiramatsu, 2014-06-07, 2 files, -2/+29)

    Add a "crash_kexec_post_notifiers" boot option to run kdump after
    running panic_notifiers and dump kmsg. This can help rare situations
    where kdump fails because of unstable crashed kernel or hardware failure
    (memory corruption on critical data/code), or the 2nd kernel is already
    broken by the 1st kernel (it's a broken behavior, but who can guarantee
    that the "crashed" kernel works correctly?).

    Usage: add "crash_kexec_post_notifiers" to kernel boot option.

    Note that this actually increases risks of the failure of kdump. This
    option should be set only if you worry about the rare case of kdump
    failure rather than increasing the chance of success.

    Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
    Acked-by: Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
    Cc: Satoru MORIYA <satoru.moriya.br@hitachi.com>
    Cc: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* smp: print more useful debug info upon receiving IPI on an offline CPU (Srivatsa S. Bhat, 2014-06-07, 1 file, -3/+15)

    There is a longstanding problem related to CPU hotplug which causes IPIs
    to be delivered to offline CPUs, and the smp-call-function IPI handler
    code prints out a warning whenever this is detected. Every once in a
    while this (usually harmless) warning gets reported on LKML, but so far
    it has not been completely fixed. Usually the solution involves finding
    out the IPI sender and fixing it by adding appropriate synchronization
    with CPU hotplug.

    However, while going through one such internal bug report, I found that
    there is a significant bug in the receiver side itself (more
    specifically, in stop-machine) that can lead to this problem even when
    the sender code is perfectly fine. This patchset fixes that
    synchronization problem in the CPU hotplug stop-machine code.

    Patch 1 adds some additional debug code to the smp-call-function
    framework, to help debug such issues easily.

    Patch 2 modifies the stop-machine code to ensure that any IPIs that were
    sent while the target CPU was online, would be noticed and handled by
    that CPU without fail before it goes offline. Thus, this avoids
    scenarios where IPIs are received on offline CPUs (as long as the sender
    uses proper hotplug synchronization).

    In fact, I debugged the problem by using Patch 1, and found that the
    payload of the IPI was always the block layer's trigger_softirq()
    function. But I was not able to find anything wrong with the block layer
    code. That's when I started looking at the stop-machine code and
    realized that there is a race-window which makes the IPI _receiver_ the
    culprit, not the sender. Patch 2 fixes that race and hence this should
    put an end to most of the hard-to-debug IPI-to-offline-CPU issues.

    This patch (of 2):

    Today the smp-call-function code just prints a warning if we get an IPI
    on an offline CPU. This info is sufficient to let us know that something
    went wrong, but often it is very hard to debug exactly who sent the IPI
    and why, from this info alone.

    In most cases, we get the warning about the IPI to an offline CPU,
    immediately after the CPU going offline comes out of the stop-machine
    phase and reenables interrupts. Since all online CPUs participate in
    stop-machine, the information regarding the sender of the IPI is already
    lost by the time we exit the stop-machine loop. So even if we dump the
    stack on each CPU at this point, we won't find anything useful since all
    of them will show the stack-trace of the stopper thread. So we need a
    better way to figure out who sent the IPI and why.

    To achieve this, when we detect an IPI targeted to an offline CPU, loop
    through the call-single-data linked list and print out the payload
    (i.e., the name of the function which was supposed to be executed by the
    target CPU). This would give us an insight as to who might have sent the
    IPI and help us debug this further.

    [akpm@linux-foundation.org: correctly suppress warning output on second and later occurrences]
    Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Mike Galbraith <mgalbraith@suse.de>
    Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/vmcore.c: remove NULL assignment to static (Fabian Frederick, 2014-06-07, 1 file, -1/+1)

    Static values are automatically initialized to NULL.

    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/task_mmu.c: replace seq_printf by seq_puts (Fabian Frederick, 2014-06-07, 1 file, -4/+4)

    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Cc: Pavel Emelyanov <xemul@parallels.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* signals: change wait_for_helper() to use kernel_sigaction() (Oleg Nesterov, 2014-06-07, 1 file, -4/+1)

    Now that we have kernel_sigaction() we can change wait_for_helper() to
    use it and clean up the code a bit.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* signals: introduce kernel_sigaction() (Oleg Nesterov, 2014-06-07, 2 files, -26/+28)

    Now that allow_signal() is really trivial we can unify it with
    disallow_signal(). Add the new helper, kernel_sigaction(), and
    reimplement allow_signal/disallow_signal as trivial wrappers.

    This saves one EXPORT_SYMBOL() and the new helper can have more users.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* signals: disallow_signal() should flush the potentially pending signal (Oleg Nesterov, 2014-06-07, 1 file, -0/+7)

    disallow_signal() simply sets SIG_IGN. This is not enough, and
    recalc_sigpending() is simply pointless because it can never change the
    state of TIF_SIGPENDING.

    If we ignore a signal, we also need to do flush_sigqueue_mask() for the
    case when this signal is pending. This way recalc_sigpending() can
    actually clear TIF_SIGPENDING and we do not "leak" the allocated
    siginfo's.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
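Why ignoring a signal without flushing it leaves the pending flag set can be sketched with a toy bitmask model. This is an illustration under assumptions, not kernel code: `struct model_task` and the `model_*` helpers are hypothetical, and the "pending flag" is reduced to "some unblocked signal is still queued".

```c
#include <assert.h>

/* Hypothetical toy task state; one bit per signal number. */
struct model_task {
    unsigned long pending;      /* queued signals */
    unsigned long blocked;      /* blocked signals */
    unsigned long ignored;      /* signals whose action is "ignore" */
    int sigpending_flag;        /* stands in for TIF_SIGPENDING */
};

static unsigned long model_sigbit(int sig)
{
    return 1UL << (sig - 1);
}

/* The flag is set while any unblocked signal remains queued. */
static void model_recalc(struct model_task *t)
{
    t->sigpending_flag = (t->pending & ~t->blocked) != 0;
}

/* Old behaviour: mark the signal ignored, then recalculate. */
static void model_disallow_without_flush(struct model_task *t, int sig)
{
    t->ignored |= model_sigbit(sig);
    model_recalc(t);            /* cannot clear the flag: still queued */
}

/* New behaviour: also drop the queued instance before recalculating. */
static void model_disallow_with_flush(struct model_task *t, int sig)
{
    t->ignored |= model_sigbit(sig);
    t->pending &= ~model_sigbit(sig);
    model_recalc(t);
}
```

In this model, a task with signal 1 queued keeps its flag set after `model_disallow_without_flush()`, and only `model_disallow_with_flush()` lets the recalculation clear it.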
* signals: kill the obsolete sigdelset() and recalc_sigpending() in allow_signal() (Oleg Nesterov, 2014-06-07, 1 file, -4/+1)

    allow_signal() does sigdelset(current->blocked) for historical reasons:
    previously it could be called by a daemonize()'ed kthread, and
    daemonize() played with current->blocked.

    Now that daemonize() has gone away we can remove sigdelset() and
    recalc_sigpending(). If a user really wants to unblock a signal, it must
    use sigprocmask() or set_current_blocked() explicitly.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* signals: jffs2: fix the wrong usage of disallow_signal() (Oleg Nesterov, 2014-06-07, 1 file, -5/+7)

    jffs2_garbage_collect_thread() does disallow_signal(SIGHUP) around
    jffs2_garbage_collect_pass() and the comment says "We don't want SIGHUP
    to interrupt us".

    But disallow_signal() can't ensure that jffs2_garbage_collect_pass()
    won't be interrupted by SIGHUP. The problem is that SIGHUP can already
    be pending when disallow_signal() is called, and in this case any
    interruptible sleep won't block.

    Note: this is in fact because disallow_signal() is buggy and should be
    fixed, see the next changes.

    But there is another reason why disallow_signal() is wrong: SIG_IGN set
    by disallow_signal() silently discards any SIGHUP which can be sent
    before the next allow_signal(SIGHUP).

    Change this code to use sigprocmask(SIG_UNBLOCK/SIG_BLOCK, SIGHUP). This
    even matches the old (and wrong) semantics allow/disallow had when this
    logic was written.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
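The distinction between blocking a signal and ignoring it has a close user-space analogue, sketched below with the POSIX `sigprocmask()` interface. This is an assumption-laden illustration rather than the jffs2 change itself: the in-kernel `sigprocmask()` differs from the POSIX call, and `blocked_sighup_is_deferred` and `on_hup` are hypothetical names. The point shown is that a blocked SIGHUP stays pending and is delivered after unblocking instead of being discarded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t hup_seen;

static void on_hup(int sig)
{
    (void)sig;
    hup_seen = 1;
}

/*
 * Block SIGHUP, raise it during the "critical section", then unblock.
 * Returns 1 if the signal was held pending while blocked and delivered
 * only after unblocking.
 */
static int blocked_sighup_is_deferred(void)
{
    struct sigaction sa, old_sa;
    sigset_t hup, old_mask, pending;
    int seen_while_blocked, was_pending, seen_after;

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = on_hup;
    sigaction(SIGHUP, &sa, &old_sa);

    sigemptyset(&hup);
    sigaddset(&hup, SIGHUP);

    hup_seen = 0;
    sigprocmask(SIG_BLOCK, &hup, &old_mask);
    raise(SIGHUP);                      /* arrives while blocked */
    seen_while_blocked = hup_seen;
    sigpending(&pending);
    was_pending = sigismember(&pending, SIGHUP);

    sigprocmask(SIG_UNBLOCK, &hup, NULL);   /* pending SIGHUP is delivered */
    seen_after = hup_seen;

    /* Restore the caller's signal mask and SIGHUP disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGHUP, &old_sa, NULL);

    return !seen_while_blocked && was_pending == 1 && seen_after;
}
```

Had the signal been set to SIG_IGN during the critical section instead, the raised SIGHUP would have been dropped, which is the silent-discard problem the commit message describes.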
* signals: mv {dis,}allow_signal() from sched.h/exit.c to signal.[ch] (Oleg Nesterov, 2014-06-07, 4 files, -42/+31)

    Move the declaration/definition of allow_signal/disallow_signal to
    signal.h/signal.c. The new place is more logical and allows using the
    static helpers in signal.c (see the next changes).

    While at it, make them return void and remove the valid_signal() check.
    Nobody checks the returned value, and in-kernel users must not pass the
    wrong signal number.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* signals: cleanup the usage of t/current in do_sigaction() (Oleg Nesterov, 2014-06-07, 1 file, -8/+7)

    The usage of "task_struct *t" and "current" in do_sigaction() looks
    really annoying and chaotic. Initially "t" is used as a cached value of
    current but not consistently, then it is reused as a loop variable and
    we have to use "current" again.

    Clean up this mess and also convert the code to use for_each_thread().

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* signals: rename rm_from_queue_full() to flush_sigqueue_mask() (Oleg Nesterov, 2014-06-07, 1 file, -11/+8)

    "rm_from_queue_full" looks ugly and misleading, especially now that
    rm_from_queue() has gone away. Rename it to flush_sigqueue_mask(), which
    matches the flush_sigqueue() we already have.

    Also remove the obsolete comment which explains the difference with
    rm_from_queue() we already killed.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* signals: kill rm_from_queue(), change prepare_signal() to use for_each_thread() (Oleg Nesterov, 2014-06-07, 1 file, -33/+10)

    rm_from_queue() doesn't make sense. The only caller, prepare_signal(),
    can use rm_from_queue_full() with the same effect.

    While at it, change prepare_signal() to use for_each_thread() instead of
    do/while_each_thread.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* signals: s/siginitset/sigemptyset/ in do_sigtimedwait() (Oleg Nesterov, 2014-06-07, 1 file, -1/+1)

    Cosmetic, but siginitset(0) looks a bit strange, sigemptyset() is what
    do_sigtimedwait() needs.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* signals: kill sigfindinword() (Oleg Nesterov, 2014-06-07, 3 files, -20/+0)

    It has no users and it doesn't look useful. I do not know why/when it
    was introduced, I can't even find any user in the git history.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: task_clear_jobctl_trapping()->wake_up_bit() needs mb() (Oleg Nesterov, 2014-06-07, 1 file, -0/+1)

    __wake_up_bit() checks waitqueue_active() and thus the caller needs
    mb(), as wake_up_bit() documents. Fix task_clear_jobctl_trapping().

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: fix fork event messages across pid namespaces (Matthew Dempsky, 2014-06-07, 2 files, -3/+39)

    When tracing a process in another pid namespace, it's important for fork
    event messages to contain the child's pid as seen from the tracer's pid
    namespace, not the parent's. Otherwise, the tracer won't be able to
    correlate the fork event with later SIGTRAP signals it receives from the
    child.

    We still risk a race condition if a ptracer from a different pid
    namespace attaches after we compute the pid_t value. However, sending a
    bogus fork event message in this unlikely scenario is still a vast
    improvement over the status quo where we always send bogus fork event
    messages to debuggers in a different pid namespace than the forking
    process.

    Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>
    Acked-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Julien Tinnes <jln@chromium.org>
    Cc: Roland McGrath <mcgrathr@chromium.org>
    Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Documentation/memory-barriers.txt: fix important typo re memory barriers (Alexey Dobriyan, 2014-06-07, 1 file, -2/+2)

    Examples introducing the necessity of an RMB+WMB pair read as

        A=3     READ B
        www     rrrrrr
        B=4     READ A

    Note the opposite order of reads vs writes.

    But the first example without barriers reads as

        A=3     READ A
        B=4     READ B

    There are 4 outcomes in the first example.

    But if someone new to the concept tries to insert barriers like this:

        A=3     READ A
        www     rrrrrr
        B=4     READ B

    he will still get all 4 possible outcomes, because "READ A" is first.

    All this can be utterly confusing because the barrier pair seems to be
    superfluous. In short, fix up the first example to match the latter
    examples with barriers.

    Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
    Cc: David Howells <dhowells@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
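The pairing described above (writes in one order, reads in the opposite order, a barrier on each side) can be approximated in portable C11. This is only an analogue under assumptions: C11 release and acquire fences are not the kernel's write and read barriers, and `model_writer`, `model_reader`, `model_a` and `model_b` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int model_a;      /* "A" in the example, initially 0 */
static atomic_int model_b;      /* "B" in the example, initially 0 */

/* Writer side: A=3, barrier, B=4. */
static void model_writer(void)
{
    atomic_store_explicit(&model_a, 3, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* stands in for "www" */
    atomic_store_explicit(&model_b, 4, memory_order_relaxed);
}

/* Reader side: READ B, barrier, READ A (the opposite order). */
static void model_reader(int *b_out, int *a_out)
{
    *b_out = atomic_load_explicit(&model_b, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);  /* stands in for "rrrrrr" */
    *a_out = atomic_load_explicit(&model_a, memory_order_relaxed);
}
```

Because B is read first, a reader that observes B == 4 is guaranteed to observe A == 3. If the reader loaded A first, as in the uncorrected example, the fences would rule out none of the four outcomes.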
* Documentation/filesystems/seq_file.txt: create_proc_entry deprecated (Fabian Frederick, 2014-06-07, 1 file, -0/+9)

    The article linked from seq_file.txt still uses create_proc_entry, which
    was removed in commit 80e928f7ebb9 ("proc: Kill create_proc_entry()").

    This patch adds information for kernel 3.10 and above.

    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Documentation/SubmittingPatches: describe the Fixes: tag  Jacob Keller  2014-06-07  1  -1/+21
| | | | | | | | | | | | | Update the SubmittingPatches process to include howto about the new 'Fixes:' tag to be used when a patch fixes an issue in a previous commit (found by git-bisect for example). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/fat/inode.c: clean up string initializations (char[] instead of char *)  Manuel Schölling  2014-06-07  1  -1/+1
| | | | | | | | | | | | Initializations like 'char *foo = "bar"' will create two variables: a static string and a pointer (foo) to that static string. Instead 'char foo[] = "bar"' will declare a single variable and will end up in shorter assembly (according to Jeff Garzik on the KernelJanitor's TODO list). Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
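To illustrate the distinction drawn in the message above, here is a minimal userspace sketch in C. It is not the fs/fat/inode.c code; the names label_ptr and label_arr are hypothetical. It only shows that the pointer form is a pointer object referring to a separate string literal, while the array form is a single object sized to the string. Whether the generated assembly ends up shorter depends on the compiler and the surrounding code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pointer form: a pointer object plus a separate string literal. */
static const char *label_ptr = "bar";

/* Array form: one array object holding the characters and the NUL. */
static const char label_arr[] = "bar";

/* sizeof on the pointer yields the pointer size, not the string size. */
size_t ptr_object_size(void)
{
    return sizeof(label_ptr);
}

/* sizeof on the array yields strlen + 1 (the terminating NUL). */
size_t arr_object_size(void)
{
    assert(sizeof(label_arr) == strlen(label_arr) + 1);
    return sizeof(label_arr);
}

/* Both forms hold the same characters. */
int same_contents(void)
{
    return strcmp(label_ptr, label_arr) == 0;
}
```

The array form also lets the compiler know the length at compile time, which is why sizeof can replace strlen for such constants.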
* fs/fat/: add support for DOS 1.x formatted volumes  Conrad Meyer  2014-06-07  3  -77/+274
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add structure for parsed BPB information, struct fat_bios_param_block, and move all of the deserialization and validation logic from fat_fill_super() into fat_read_bpb(). Add a 'dos1xfloppy' mount option to infer DOS 2.x BIOS Parameter Block defaults from block device geometry for ancient floppies and floppy images, as a fall-back from the default BPB parsing logic. When fat_read_bpb() finds an invalid FAT filesystem and dos1xfloppy is set, fall back to fat_read_static_bpb(). fat_read_static_bpb() validates that the entire BPB is zero, and that the floppy has a DOS-style 8086 code bootstrapping header. Then it fills in default BPB values from media size and a table.[0] Media size is assumed to be static for archaic FAT volumes. See also: [1]. Fixes kernel.org bug #42617. [0]: https://en.wikipedia.org/wiki/File_Allocation_Table#Exceptions [1]: http://www.win.tue.nl/~aeb/linux/fs/fat/fat-1.html [hirofumi@mail.parknet.co.jp: fix missed error code] Signed-off-by: Conrad Meyer <cse.cem@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/hpfs: increase pr_warn level  Fabian Frederick  2014-06-07  8  -47/+47
| | | | | | | | | | This patch applies a suggestion by Mikulas Patocka to raise all pr_warn calls, except the commented-out ones, to pr_err. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/hpfs: use __func__ for logging  Fabian Frederick  2014-06-07  2  -12/+12
| | | | | | | | | Normalize function display fx() using __func__ Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/hpfs: use pr_fmt for logging  Fabian Frederick  2014-06-07  10  -51/+56
| | | | | | | | | Also remove redundant level names (warning:...) Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/hpfs: convert printk to pr_foo()  Fabian Frederick  2014-06-07  10  -67/+89
| | | | | | | | | | | | The level-less printk in hpfs_error is converted to pr_err (others to pr_warn or pr_info). This patch also fixes if/then/else checkpatch warnings. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/ufs/balloc.c: remove err parameter in ufs_add_fragments  Fabian Frederick  2014-06-07  1  -3/+3
| | | | | | | | | | err is used in ufs_new_fragments (ufs_add_fragments only callsite) not in ufs_add_fragments. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hfsplus: fix compiler warning on PowerPC  Christian Kujau  2014-06-07  1  -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a99b7069aab8 ("hfsplus: Fix undefined __divdi3 in hfsplus_init_header_node()") introduced do_div() to xattr.c and the warning below too. As Geert remarked: "tmp" is "loff_t" which is "__kernel_loff_t", which is "long long", i.e. signed, while include/asm-generic/div64.h compares its type with "uint64_t". As inode sizes are positive, it should be safe to change the type of "tmp" to "u64". In file included from arch/powerpc/include/asm/div64.h:1:0, from include/linux/kernel.h:124, from include/asm-generic/bug.h:13, from arch/powerpc/include/asm/bug.h:127, from include/linux/bug.h:4, from include/linux/thread_info.h:11, from include/asm-generic/preempt.h:4, from arch/powerpc/include/generated/asm/preempt.h:1, from include/linux/preempt.h:18, from include/linux/spinlock.h:50, from include/linux/wait.h:8, from include/linux/fs.h:6, from fs/hfsplus/hfsplus_fs.h:19, from fs/hfsplus/xattr.c:9: fs/hfsplus/xattr.c: In function 'hfsplus_init_header_node': include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default] (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \ ^ fs/hfsplus/xattr.c:86:2: note: in expansion of macro 'do_div' do_div(tmp, node_size); ^ Signed-off-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Sergei Antonov <saproj@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
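The warning quoted above comes from a type check that compares a pointer to the dividend's type with a uint64_t pointer. A rough userspace sketch of the same idea follows. It is an assumption-laden stand-in, not the kernel macro: sketch_do_div(), whole_nodes() and leftover_bytes() are hypothetical names, and the type check is obtained here by taking a uint64_t pointer parameter, so that passing the address of a signed 64-bit variable would draw a pointer-type diagnostic at the call site.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for do_div(): divides *n in place and returns
 * the remainder. The uint64_t pointer parameter plays the role of the
 * kernel's pointer-type comparison.
 */
static inline uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Number of whole nodes that fit into total_bytes. */
uint64_t whole_nodes(uint64_t total_bytes, uint32_t node_size)
{
    uint64_t tmp = total_bytes; /* unsigned, as "tmp" is after the fix */

    assert(node_size != 0);
    (void)sketch_do_div(&tmp, node_size);
    return tmp;
}

/* Bytes left over after the last whole node. */
uint32_t leftover_bytes(uint64_t total_bytes, uint32_t node_size)
{
    uint64_t tmp = total_bytes;

    assert(node_size != 0);
    return sketch_do_div(&tmp, node_size);
}
```

Declaring tmp as a signed long long in either helper would make the call to sketch_do_div() pass an incompatible pointer type, which mirrors why the fix changes the variable to an unsigned 64-bit type.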
* fs/hfsplus: fix pr_foo() and hfs_dbg formats  Fabian Frederick  2014-06-07  3  -6/+6
| | | | | | | Signed-off-by: Fabian Frederick <fabf@skynet.be> Suggested-By: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hfsplus: coding style fix for declarations in hfsplus_fs.h  Sergei Antonov  2014-06-07  1  -96/+107
| | | | | | | | | | | | | | | | Some function declarations in hfsplus_fs.h had argument names, some did not, and some were mixed. This patch adds argument names everywhere, sorts functions in the order they appear in the .c files, and moves hfs_part_find() to a proper section. Auto-formatting and sorting was done with: cfunctions *.c | indent -linux | sed "s| \* | \*|" Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/hfsplus/wrapper.c: replace shift loop by ilog2  Fabian Frederick  2014-06-07  1  -3/+1
| | | | | | | | | | Replace the while loop that shifts blocksize to count the shift with ilog2(). Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
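The equivalence behind this change can be sketched in userspace C. This is an illustration under assumptions, not the wrapper.c code: shift_by_loop() and sketch_ilog2() are hypothetical names, and sketch_ilog2() stands in for the kernel's ilog2() using the GCC/Clang builtin __builtin_clz(), assuming a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Loop form: count how many times the value can be halved. */
unsigned int shift_by_loop(uint32_t blocksize)
{
    unsigned int shift = 0;

    assert(blocksize != 0);
    while ((blocksize >>= 1) != 0)
        shift++;
    return shift;
}

/*
 * Stand-in for ilog2(): index of the highest set bit.
 * Assumes a GCC/Clang compiler and a 32-bit unsigned int.
 */
unsigned int sketch_ilog2(uint32_t v)
{
    assert(v != 0);
    return 31u - (unsigned int)__builtin_clz(v);
}
```

Both functions return the floor of the base-2 logarithm, so they agree for every non-zero input, not only for powers of two.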
* hfsplus: fix "unused node is not erased" error  Sergei Antonov  2014-06-07  6  -7/+28
| | | | | | | | | | | | | | | | | | | | | | | | | Zero newly allocated extents in the catalog tree if volume attributes tell us to. Otherwise we risk getting the "unused node is not erased" error. See the kHFSUnusedNodeFix flag in Apple's source code for reference. There was a previous commit clearing the node when it is freed: commit 899bed05e9f6 ("hfsplus: fix issue with unzeroed unused b-tree nodes"). But it did not handle newly allocated extents (this patch fixes it). And it zeroed nodes in all trees unconditionally, which is overkill. This patch adds a condition and also switches to 'tree->node_size' as a simpler method of getting the length to zero. Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Kyle Laracey <kalaracey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/hfsplus/wrapper.c: replace min/casting by min_t  Fabian Frederick  2014-06-07  1  -3/+3
| | | | | | | | | Also add * before function comments (it was not detected by kernel-doc) Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
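For readers unfamiliar with the min_t idiom named in the subject line, a minimal userspace sketch follows. SKETCH_MIN_T and bytes_to_copy() are hypothetical; the macro only imitates the idea of converting both operands to one named type before comparing, and unlike the kernel macro it evaluates its arguments more than once.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical imitation of min_t(type, a, b): both operands are
 * converted to the same type before the comparison. Arguments are
 * evaluated more than once, so they must be free of side effects.
 */
#define SKETCH_MIN_T(type, a, b) \
    ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Bytes to copy, limited by the remaining length and the free space. */
size_t bytes_to_copy(int remaining, size_t space)
{
    assert(remaining >= 0);
    return SKETCH_MIN_T(size_t, remaining, space);
}
```

Naming the type once in the macro call replaces a hand-written cast on one operand of a plain min(), which is the pattern the commit subject describes.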
* fs/hfsplus/options.c: replace seq_printf by seq_puts  Fabian Frederick  2014-06-07  1  -5/+4
| | | | | | | | | Replace seq_printf where possible Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/hfsplus/bnode.c: replace min/casting by min_t  Fabian Frederick  2014-06-07  1  -17/+15
| | | | | | | | | Also fixes some pr_ formats Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hfsplus: emit proper file type from readdir  Sergei Antonov  2014-06-07  1  -1/+19
| | | | | | | | | | | | | | | | hfsplus_readdir() incorrectly returned DT_REG for symbolic links and special files. Return DT_REG, DT_LNK, DT_FIFO, DT_CHR, DT_BLK, DT_SOCK, or DT_UNKNOWN according to mode field in catalog record. Programs relying on information from readdir will now work correctly with HFS+. Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
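The mapping described above, from a mode field to a readdir file type, can be sketched in userspace C as follows. This is not the hfsplus_readdir() code: dtype_from_mode() and dtype_self_check() are hypothetical names, and directories are outside this sketch. It relies on the DT_* constants from <dirent.h> and the S_IS*() macros from <sys/stat.h>, which glibc exposes under _DEFAULT_SOURCE.

```c
#define _DEFAULT_SOURCE /* expose DT_* and S_ISSOCK on glibc */
#include <assert.h>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Map the file-type bits of a mode to a readdir d_type value. */
unsigned char dtype_from_mode(mode_t mode)
{
    if (S_ISREG(mode))
        return DT_REG;
    if (S_ISLNK(mode))
        return DT_LNK;
    if (S_ISFIFO(mode))
        return DT_FIFO;
    if (S_ISCHR(mode))
        return DT_CHR;
    if (S_ISBLK(mode))
        return DT_BLK;
    if (S_ISSOCK(mode))
        return DT_SOCK;
    return DT_UNKNOWN; /* anything else, including a zero mode */
}

/* Small self-check: a symlink must no longer be reported as DT_REG. */
void dtype_self_check(void)
{
    assert(dtype_from_mode(S_IFLNK | 0777) == DT_LNK);
    assert(dtype_from_mode(S_IFREG | 0644) == DT_REG);
}
```

Callers that trust d_type, for example to skip an extra stat call, depend on this kind of mapping being accurate, which is the point the message makes about programs relying on readdir.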
* hfsplus: remove unused routine hfsplus_attr_build_key_uni  Hin-Tak Leung  2014-06-07  2  -28/+0
| | | | | | | | | | | | | | | | The directory/file catalog b-tree equivalent, hfsplus_build_key_uni(), is used by hfsplus_find_cat() for internal referencing between catalog records. There is no corresponding usage for attributes - attribute records do not refer to one another. Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Sougata Santra <sougata@tuxera.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hfsplus: correct usage of HFSPLUS_ATTR_MAX_STRLEN for non-English attributes  Hin-Tak Leung  2014-06-07  5  -66/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HFSPLUS_ATTR_MAX_STRLEN (=127) is the limit on attribute names, expressed as the number of unicode characters (UTF-16BE) storable in the HFS+ file system. Almost all the current usage of it is wrong, in relation to NLS to on-disk conversion. Except for one use calling hfsplus_asc2uni (which should stay the same) and its uses in calling hfsplus_uni2asc (which was corrected in the earlier patch in this series concerning usage of hfsplus_uni2asc), all the other uses are of the forms: - char buffer[size] - bound check: "if (namespace_adjusted_input_length > size) return failure;" Conversion between on-disk unicode representation and NLS char strings (in whichever direction) always needs to accommodate the worst-case NLS conversion, so all char buffers of that size need to be NLS_MAX_CHARSET_SIZE times larger. The bound checks are all wrong, since they compare nls_length derived from strlen() to a unicode length limit. It turns out that all the bound checks do is protect hfsplus_asc2uni(), which can fail if the input is too large. There is only one usage of it as far as attributes are concerned, in hfsplus_attr_build_key(). It is in turn used by hfsplus_find_attr(), hfsplus_create_attr(), hfsplus_delete_attr(). Thus making sure that errors from hfsplus_asc2uni() are caught in hfsplus_attr_build_key() and propagated is sufficient to replace all the bound checks. Unpropagated errors from hfsplus_asc2uni() in the file catalog code were addressed recently in an independent patch "hfsplus: fix longname handling" by Sougata Santra.
Before this patch, trying to set a 55 CJK character (in a UTF-8 locale, > 127/3=42) attribute plus user prefix fails with: $ setfattr -n user.`cat testing-string` -v `cat testing-string` \ testing-string setfattr: testing-string: Operation not supported and retrieving a stored long attribute is particularly ugly(!): find /mnt/* -type f -exec getfattr -d {} \; getfattr: /mnt/testing-string: Input/output error with console log: [268008.389781] hfsplus: unicode conversion failed After the patch, both of the above work. FYI, the test attribute string is prepared with: echo -e -n \ "\xe9\x80\x99\xe6\x98\xaf\xe4\xb8\x80\xe5\x80\x8b\xe9\x9d\x9e\xe5" \ "\xb8\xb8\xe6\xbc\xab\xe9\x95\xb7\xe8\x80\x8c\xe6\xa5\xb5\xe5\x85" \ "\xb6\xe4\xb9\x8f\xe5\x91\xb3\xe5\x92\x8c\xe7\x9b\xb8\xe7\x95\xb6" \ "\xe7\x84\xa1\xe8\xb6\xa3\xe3\x80\x81\xe4\xbb\xa5\xe5\x8f\x8a\xe7" \ "\x84\xa1\xe7\x94\xa8\xe7\x9a\x84\xe3\x80\x81\xe5\x86\x8d\xe5\x8a" \ "\xa0\xe4\xb8\x8a\xe6\xaf\xab\xe7\x84\xa1\xe6\x84\x8f\xe7\xbe\xa9" \ "\xe7\x9a\x84\xe6\x93\xb4\xe5\xb1\x95\xe5\xb1\xac\xe6\x80\xa7\xef" \ "\xbc\x8c\xe8\x80\x8c\xe5\x85\xb6\xe5\x94\xaf\xe4\xb8\x80\xe5\x89" \ "\xb5\xe5\xbb\xba\xe7\x9b\xae\xe7\x9a\x84\xe5\x83\x85\xe6\x98\xaf" \ "\xe7\x82\xba\xe4\xba\x86\xe6\xb8\xac\xe8\xa9\xa6\xe4\xbd\x9c\xe7" \ "\x94\xa8\xe3\x80\x82" | tr -d ' ' (= "pointlessly long attribute for testing", elaborate Chinese in UTF-8 encoding). However, it is not possible to set double the size (110 + 5 is still under 127) in a UTF-8 locale: $ setfattr -n user.`cat testing-string testing-string` -v \ `cat testing-string testing-string` testing-string setfattr: testing-string: Numerical result out of range 110 CJK characters in UTF-8 are 330 bytes - the generic get/set attribute system call code in linux/fs/xattr.c imposes a 255 byte limit. One can use a combination of iconv to encode content, changing terminal locale for viewing, and an nls=cp932/cp936/cp949/cp950 mount option to fully use a 127-unicode attribute in a double-byte locale.
Also, as additional information, it is possible to (mis-)use unicode half-width/full-width forms (U+FFxx) to write attributes which look like English but are not actually ASCII. Thanks to Anton Altaparmakov for reviewing the earlier ideas behind this change. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix build] Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Sougata Santra <sougata@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
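The length arithmetic in this message can be worked through with a short C sketch. It is a model of the numbers quoted above and not kernel code: every name below is hypothetical, the 127 and 255 limits are taken from the message, and the sketch assumes each CJK character takes 3 bytes in UTF-8 and one UTF-16 unit on disk, with the 5-character "user." prefix counted in both lengths.

```c
#include <assert.h>
#include <stddef.h>

/* Limits as quoted in the commit message. */
static const size_t attr_max_units = 127;     /* HFSPLUS_ATTR_MAX_STRLEN */
static const size_t vfs_name_max_bytes = 255; /* limit in fs/xattr.c */

/* Assumptions of this sketch. */
static const size_t utf8_bytes_per_cjk = 3;
static const size_t user_prefix_len = 5;      /* strlen("user.") */

/* Length in bytes of "user." plus n CJK characters in a UTF-8 locale. */
size_t utf8_name_bytes(size_t cjk_chars)
{
    return user_prefix_len + cjk_chars * utf8_bytes_per_cjk;
}

/* Length in UTF-16 units of the same name as stored on disk. */
size_t utf16_name_units(size_t cjk_chars)
{
    return user_prefix_len + cjk_chars;
}

/* The old style of check: byte length compared to a unicode limit. */
int old_byte_check_rejects(size_t cjk_chars)
{
    return utf8_name_bytes(cjk_chars) > attr_max_units;
}

/* Whether the name fits the on-disk limit when counted in units. */
int fits_on_disk(size_t cjk_chars)
{
    return utf16_name_units(cjk_chars) <= attr_max_units;
}

/* Whether the name passes the generic 255-byte xattr name limit. */
int passes_vfs_limit(size_t cjk_chars)
{
    return utf8_name_bytes(cjk_chars) <= vfs_name_max_bytes;
}

/* The two cases discussed in the message. */
void attr_length_self_check(void)
{
    assert(old_byte_check_rejects(55) && fits_on_disk(55));
    assert(fits_on_disk(110) && !passes_vfs_limit(110));
}
```

Under these assumptions a 55-character name is 170 bytes but only 60 units, so a byte-based check rejects a name that fits on disk, while a 110-character name fits on disk at 115 units yet exceeds the 255-byte limit at 335 bytes.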