linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	fs, seqfile: always allow oom killer	Greg Thelen	2015-11-07	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or smaller allocations; but larger allocations can use the oom killer via vmalloc(). Thus reads of small files can return ENOMEM, but larger files use the oom killer to avoid ENOMEM. The effect of this bug is that reads from /proc and other virtual filesystems can return ENOMEM instead of the preferred behavior - oom killing something (possibly the calling process). I don't know of anyone except Google who has noticed the issue. I suspect the fix is more needed in smaller systems where there isn't any reclaimable memory. But these seem like the kinds of systems which probably don't use the oom killer for production situations. Memory overcommit requires use of the oom killer to select a victim regardless of file size. Enable oom killer for small seq_buf_alloc() allocations. Fixes: 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill processes") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	seq_file: reuse string_escape_str()	Andy Shevchenko	2015-11-07	1	-19/+6
\| \| \| \| \| \| \| \| \| \| \|	strint_escape_str() escapes input string by given criteria. In case of seq_escape() the criteria is to convert some characters to their octal representation. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fs/seq_file: use seq_* helpers in seq_hex_dump()	Andy Shevchenko	2015-11-07	1	-8/+7
\| \| \| \| \| \| \| \| \|	This improves code readability. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	coredump: change zap_threads() and zap_process() to use for_each_thread()	Oleg Nesterov	2015-11-07	1	-14/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change zap_threads() paths to use for_each_thread() rather than while_each_thread(). While at it, change zap_threads() to avoid the nested if's to make the code more readable and lessen the indentation. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Kyle Walker <kwalker@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stanislav Kozina <skozina@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP	Oleg Nesterov	2015-11-07	2	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	task_will_free_mem() is wrong in many ways, and in particular the SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the coredumping without SIGNAL_GROUP_COREDUMP bit set. change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if other CLONE_VM processes can't react to SIGKILL. Fortunately, at least oom-kill case if fine; it kills all tasks sharing the same mm, so it should also kill the process which actually dumps the core. The change in prepare_signal() is not strictly necessary, it just ensures that the patch does not bring another subtle behavioural change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Kyle Walker <kwalker@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Stanislav Kozina <skozina@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)	Oleg Nesterov	2015-11-07	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason, SIGCONT will wake a stopped task up even if it is ignored. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()	Oleg Nesterov	2015-11-07	2	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	jffs2_garbage_collect_thread() can race with SIGCONT and sleep in TASK_STOPPED state after it was already sent. Add the new helper, kernel_signal_stop(), which does this correctly. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signal: turn dequeue_signal_lock() into kernel_dequeue_signal()	Oleg Nesterov	2015-11-07	4	-21/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This matches another "for kthreads only" kernel_sigaction() helper. 2. Remove the "tsk" and "mask" arguments, they are always current and current->blocked. And it is simply wrong if tsk != current. 3. We could also remove the 3rd "siginfo_t *info" arg but it looks potentially useful. However we can simplify the callers if we change kernel_dequeue_signal() to accept info => NULL. 4. Remove _irqsave, it is never called from atomic context. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signals: kill block_all_signals() and unblock_all_signals()	Oleg Nesterov	2015-11-07	4	-98/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is hardly possible to enumerate all problems with block_all_signals() and unblock_all_signals(). Just for example, 1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is multithreaded. Another thread can dequeue the signal and force the group stop. 2. Even is the caller is single-threaded, it will "stop" anyway. It will not sleep, but it will spin in kernel space until SIGCONT or SIGKILL. And a lot more. In short, this interface doesn't work at all, at least the last 10+ years. Daniel said: Yeah the only times I played around with the DRM_LOCK stuff was when old drivers accidentally deadlocked - my impression is that the entire DRM_LOCK thing was never really tested properly ;-) Hence I'm all for purging where this leaks out of the drm subsystem. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Dave Airlie <airlied@redhat.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: fix gcc uninitialized-variable warnings in powerpc build	Ryusuke Konishi	2015-11-07	3	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some false positive warnings are reported for powerpc build. The following warnings are reported in http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/ CC fs/nilfs2/super.o fs/nilfs2/super.c: In function 'nilfs_resize_fs': fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here CC fs/nilfs2/recovery.o fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs': fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here fs/nilfs2/recovery.c: In function 'nilfs_search_super_root': fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized] Another similar warning is reported in http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/ CC fs/nilfs2/btree.o fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert': include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here This cleans out these warnings by forcing the variables to be initialized. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: fix gcc unused-but-set-variable warnings	Ryusuke Konishi	2015-11-07	5	-13/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the following build warnings: $ make W=1 [...] CC [M] fs/nilfs2/btree.o fs/nilfs2/btree.c: In function 'nilfs_btree_split': fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable] __u64 newptr; ^ fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable] __u64 newkey; ^ CC [M] fs/nilfs2/dat.o fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end': fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable] __u64 start; ^ CC [M] fs/nilfs2/segment.o fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush': fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable] int err; ^ CC [M] fs/nilfs2/sufile.o fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc': fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable] unsigned long nsegments, ncleansegs, nsus, cnt; ^ CC [M] fs/nilfs2/alloc.o fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry': fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable] unsigned long n, entries_per_group, groups_per_desc_block; ^ Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	MAINTAINERS: nilfs2: add header file for tracing	Ryusuke Konishi	2015-11-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	This adds header file "include/trace/events/nilfs2.h" to maintainer-ship of nilfs2 so that updates to the nilfs2 header file go to the mailing list of nilfs2. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: add tracepoints for analyzing reading and writing metadata files	Hitoshi Mitake	2015-11-07	2	-0/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds tracepoints for analyzing requests of reading and writing metadata files. The tracepoints cover every in-place mdt files (cpfile, sufile, and datfile). Example of tracing mdt_insert_new_block(): cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155 cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5 cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: TK Kato <TK.Kato@wdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: add tracepoints for analyzing sufile manipulation	Hitoshi Mitake	2015-11-07	2	-0/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds tracepoints which would be useful for analyzing segment usage from a perspective of high level sufile manipulation (check, alloc, free). sufile is an important in-place updated metadata file, so analyzing the behavior would be useful for performance turning. example of usage (a case of allocation): $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end. segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2 segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benixon Dhas <benixon.dhas@wdc.com> Cc: TK Kato <TK.Kato@wdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: add a tracepoint for transaction events	Hitoshi Mitake	2015-11-07	2	-1/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a tracepoint for transaction events of nilfs. With the tracepoint, these events can be tracked: begin, abort, commit, trylock, lock, and unlock. Basically, these events have corresponding functions e.g. begin event corresponds nilfs_transaction_begin(). The unlock event is an exception. It corresponds to the iteration in nilfs_transaction_lock(). Only one tracepoint is introcued: nilfs2_transaction_transition. The above events are distinguished with newly introduced enum. With this tracepoint, we can analyse a critical section of segment constructoin. Sample output by tpoint of perf-tools: cp-4457 [000] ...1 63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT segctord-4371 [001] ...1 68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK segctord-4371 [001] ...1 68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK segctord-4371 [001] ...1 68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN segctord-4371 [001] ...1 68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT segctord-4371 [001] ...1 68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK segctord-4371 [001] ...1 132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK This patch also does trivial cleaning of comma usage in collection stage transition event for consistent coding style. Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: add a tracepoint for tracking stage transition of segment construction	Hitoshi Mitake	2015-11-07	3	-21/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a tracepoint for tracking stage transition of block collection in segment construction. With the tracepoint, we can analysis the behavior of segment construction in depth. It would be useful for bottleneck detection and debugging, etc. The tracepoint is created with the standard trace API of linux (like ext3, ext4, f2fs and btrfs). So we can analysis with existing tools easily. Of course, more detailed analysis will be possible if we can create nilfs specific analysis tools. Below is an example of event dump with Brendan Gregg's perf-tools (https://github.com/brendangregg/perf-tools). Time consumption between each stage can be obtained. $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end. segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE For capturing transition correctly, this patch adds wrappers for the member scnt of nilfs_cstage. With this change, every transition of the stage can produce trace event in a correct manner. Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: free unused dat file blocks during garbage collection	Ryusuke Konishi	2015-11-07	2	-17/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As a nilfs2 volume ages, the amount of available disk space decreases little by little due to bloat of DAT (disk address translation) metadata file. Even if we delete all files in a file system and free their block addresses from the DAT file through a garbage collection, empty DAT blocks are not freed. This fixes the issue by extending the deallocator of block addresses so that empty data blocks and empty bitmap blocks of DAT are deleted. The following comparison shows the effect of this patch. Each shows disk amount information of a nilfs2 volume that we cleaned out by deleting all files and running gc after having filled 90% of its capacity. Before: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 500105212 3022844 472072192 1% /test After: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 500105212 16380 475078656 1% /test Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: add helper functions to delete blocks from dat file	Ryusuke Konishi	2015-11-07	1	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds delete functions for data blocks of metadata files using bitmap based allocator. nilfs_palloc_delete_entry_block() deletes an entry block (e.g. block storing dat entries), and nilfs_palloc_delete_bitmap_block() deletes a bitmap block, respectively. These helpers are intended to be used in the successive change on deallocator of block addresses ("nilfs2: free unused dat file blocks during garbage collection"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: get rid of nilfs_palloc_group_is_in()	Ryusuke Konishi	2015-11-07	1	-19/+9
\| \| \| \| \| \| \| \| \| \|	This unfolds nilfs_palloc_group_is_in() helper function into nilfs_palloc_freev() function to simplify a range check and an index calculation repeatedy performed in a loop of the function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: refactor nilfs_palloc_find_available_slot()	Ryusuke Konishi	2015-11-07	1	-27/+21
\| \| \| \| \| \| \| \| \| \|	The current implementation of nilfs_palloc_find_available_slot() function is overkill. The underlying bit search routine is well optimized, so this uses it more simply in nilfs_palloc_find_available_slot(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: do not call nilfs_mdt_bgl_lock() needlessly	Ryusuke Konishi	2015-11-07	1	-44/+40
\| \| \| \| \| \| \| \| \| \| \|	In the bitmap based allocator implementation, nilfs_mdt_bgl_lock() helper is frequently used to get a spinlock protecting a target block group. This reduces its usage and simplifies arguments of some related functions by directly passing a pointer to the spinlock. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: use nilfs_warning() in allocator implementation	Ryusuke Konishi	2015-11-07	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \|	This uses nilfs_warning() to replace "printk(KERN_WARNING ...);" in the bitmap based allocator implementation of nilfs2. The warning messages are modified to include the device name and the inode number in each message. This makes it clear which metadata file of which device has output warnings such as "entry number xxxx already freed". Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: drop null test before destroy functions	Julia Lawall	2015-11-07	1	-8/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) $kmem_cache_destroy\\|mempool_destroy\\|dma_pool_destroy$(x); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	checkpatch: improve the unnecessary initialisers tests	Joe Perches	2015-11-07	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Global and static variables don't need to be initialized to 0. There is already a test for this but the output message doesn't mention booleans initialized to false. Improve the output message and the test by adding various forms with possible specific integer types and possible multiple zeros. Miscellanea: o Use a variable to hold the possible 0 test Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Shailendra Verma <shailendra.v@samsung.com> Tested-by: Shailendra Verma <shailendra.v@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	checkpatch: improve tests for fixes:, long lines and stack dumps in commit log	Joe Perches	2015-11-07	1	-25/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Including BUG and stack dumps in commit logs makes checkpatch produce some false positive warning messages. checkpatch has multiple types of false positives: o Commit message lines > 75 chars o Stack dump address are mistaken for git commit IDs o Link: and Fixes: lines are allowed to be > 75 chars. o Fixes: style doesn't require ("<commit_description>") parentheses and double quotes like other uses of git commit ID and description. Fix these. Miscellanea: o Move the test for checking $commit_log_possible_stack_dump above the test for a long line commit message o Add test for hex address surrounded by square or angle brackets Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/hexdump.c: truncate output in case of overflow	Andy Shevchenko	2015-11-07	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a classical off-by-one error in case when we try to place, for example, 1+1 bytes as hex in the buffer of size 6. The expected result is to get an output truncated, but in the reality we get 6 bytes filed followed by terminating NUL. Change the logic how we fill the output in case of byte dumping into limited space. This will follow the snprintf() behaviour by truncating output even on half bytes. Fixes: 114fc1afb2de (hexdump: make it return number of bytes placed in buffer) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com> Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	rbtree: clarify documentation of rbtree_postorder_for_each_entry_safe()	Cody P Schafer	2015-11-07	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I noticed that commit a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction") added a usage of rbtree_postorder_for_each_entry_safe() in mm/backing-dev.c which appears to try to rb_erase() elements from an rbtree while iterating over it using rbtree_postorder_for_each_entry_safe(). Doing this will cause random nodes to be missed by the iteration because rb_erase() may rebalance the tree, changing the ordering that we're trying to iterate over. The previous documentation for rbtree_postorder_for_each_entry_safe() wasn't clear that this wasn't allowed, it was taken from the docs for list_for_each_entry_safe(), where erasing isn't a problem due to list_del() not reordering. Explicitly warn developers about this potential pit-fall. Note that I haven't fixed the actual issue that (it appears) the commit referenced above introduced (not familiar enough with that code). In general (and in this case), the patterns to follow are: - switch to rb_first() + rb_erase(), don't use rbtree_postorder_for_each_entry_safe(). - keep the postorder iteration and don't rb_erase() at all. Instead just clear the fields of rb_node & cgwb_congested_tree as required by other users of those structures. [akpm@linux-foundation.org: tweak comments] Signed-off-by: Cody P Schafer <dev@codyps.com> Cc: John de la Garza <john@jjdev.com> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/is_single_threaded.c: change current_is_single_threaded() to use ↵	Oleg Nesterov	2015-11-07	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \|	for_each_thread() Change current_is_single_threaded() to use for_each_thread() rather than deprecated while_each_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/kobject.c: use kvasprintf_const for formatting ->name	Rasmus Villemoes	2015-11-07	1	-8/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sometimes kobject_set_name_vargs is called with a format string conaining no %, or a format string of precisely "%s", where the single vararg happens to point to .rodata. kvasprintf_const detects these cases for us and returns a copy of that pointer instead of duplicating the string, thus saving some run-time memory. Otherwise, it falls back to kvasprintf. We just need to always deallocate ->name using kfree_const. Unfortunately, the dance we need to do to perform the '/' -> '!' sanitization makes the resulting code rather ugly. I instrumented kstrdup_const to provide some statistics on the memory saved, and for me this gave an additional ~14KB after boot (306KB was already saved; this patch bumped that to 320KB). I have KMALLOC_SHIFT_LOW==3, and since 80% of the kvasprintf_const hits were satisfied by an 8-byte allocation, the 14K would roughly be quadrupled when KMALLOC_SHIFT_LOW==5. Whether these numbers are sufficient to justify the ugliness I'll leave to others to decide. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/kasprintf.c: introduce kvasprintf_const	Rasmus Villemoes	2015-11-07	2	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds kvasprintf_const which tries to use kstrdup_const if possible: If the format string contains no % characters, or if the format string is exactly "%s", we delegate to kstrdup_const. Otherwise, we fall back to kvasprintf. Just as for kstrdup_const, the main motivation is to save memory by reusing .rodata when possible. The return value should be freed by kfree_const, just like for kstrdup_const. There is deliberately no kasprintf_const: In the vast majority of cases, the format string argument is a literal, so one can determine statically whether one could instead use kstrdup_const directly (which would also require one to change all corresponding kfree calls to kfree_const). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/llist.c: fix data race in llist_del_first	Dmitry Vyukov	2015-11-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	llist_del_first reads entry->next, but it did not acquire visibility over the entry node. As the result it can get a stale value of entry->next (e.g. NULL or whatever garbage was there before the appending thread wrote correct value). And then commit that value as llist head with cmpxchg. That will corrupt llist. Note there is a control-dependency between read of head->first and read of entry->next, but it does not make the code correct. Kernel memory model unambiguously says: "A load-load control dependency requires a full read memory barrier". Use smp_load_acquire to acquire visibility over the entry node. The data race was found with KernelThreadSanitizer (KTSAN). Here is an example of KTSAN report: ThreadSanitizer: data-race in llist_del_first Read of size 1 by thread T389 (K2630, CPU0): [<ffffffff8156b8a9>] llist_del_first+0x39/0x70 lib/llist.c:74 [< inlined >] tty_buffer_alloc drivers/tty/tty_buffer.c:181 [<ffffffff81664af4>] __tty_buffer_request_room+0xb4/0x250 drivers/tty/tty_buffer.c:292 [<ffffffff81664e6c>] tty_insert_flip_string_fixed_flag+0x6c/0x150 drivers/tty/tty_buffer.c:337 [< inlined >] tty_insert_flip_string include/linux/tty_flip.h:35 [<ffffffff81667422>] pty_write+0x72/0xc0 drivers/tty/pty.c:110 [< inlined >] process_output_block drivers/tty/n_tty.c:611 [<ffffffff8165c016>] n_tty_write+0x346/0x7f0 drivers/tty/n_tty.c:2401 [< inlined >] do_tty_write drivers/tty/tty_io.c:1159 [<ffffffff816568df>] tty_write+0x21f/0x3f0 drivers/tty/tty_io.c:1245 [<ffffffff8125f00f>] __vfs_write+0x5f/0x1f0 fs/read_write.c:489 [<ffffffff8125ff8f>] vfs_write+0xef/0x280 fs/read_write.c:538 [< inlined >] SYSC_write fs/read_write.c:585 [<ffffffff81261390>] SyS_write+0x70/0xe0 fs/read_write.c:577 [<ffffffff81ee862e>] entry_SYSCALL_64_fastpath+0x12/0x71 arch/x86/entry/entry_64.S:186 Previous write of size 8 by thread T226 (K761, CPU0): [<ffffffff8156b832>] llist_add_batch+0x32/0x70 lib/llist.c:44 (discriminator 16) [< inlined >] llist_add include/linux/llist.h:180 [<ffffffff816649fc>] tty_buffer_free+0x6c/0xb0 drivers/tty/tty_buffer.c:221 [<ffffffff816651e7>] flush_to_ldisc+0x107/0x300 drivers/tty/tty_buffer.c:514 [<ffffffff810b20ee>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036 [<ffffffff810b2650>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170 [<ffffffff810bbe20>] kthread+0x150/0x170 kernel/kthread.c:209 [<ffffffff81ee8a1f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526 Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huang Ying <ying.huang@intel.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/test-string_helpers.c: add string_get_size() tests	Vitaly Kuznetsov	2015-11-07	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a couple of simple tests for string_get_size(). The last one will hang the kernel without the 'lib/string_helpers.c: fix infinite loop in string_get_size()' fix. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: James Bottomley <JBottomley@Odin.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/halfmd4.c: use rol32 inline function in the ROUND macro	Alexander Kuleshov	2015-11-07	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	<linux/bitops.h> provides rol32() inline function, let's use already predefined function instead of direct expression. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	arch/x86/kernel/cpu/perf_event_msr.c: use sign_extend64() for sign extension	Martin Kepplinger	2015-11-07	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: George Spelvin <linux@horizon.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Maxime Coquelin <maxime.coquelin@st.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	arch/sh/kernel/traps_64.c: use sign_extend64() for sign extension	Martin Kepplinger	2015-11-07	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: George Spelvin <linux@horizon.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Maxime Coquelin <maxime.coquelin@st.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	bitops.h: add sign_extend64()	Martin Kepplinger	2015-11-07	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Months back, this was discussed, see https://lkml.org/lkml/2015/1/18/289 The result was the 64-bit version being "likely fine", "valuable" and "correct". The discussion fell asleep but since there are possible users, let's add it. Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: George Spelvin <linux@horizon.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Maxime Coquelin <maxime.coquelin@st.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	bitops.h: improve sign_extend32()'s documentation	Martin Kepplinger	2015-11-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is often overlooked that sign_extend32(), despite its name, is safe to use for 16 and 8 bit types as well. This should help prevent sign extension being done manually some other way. Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: George Spelvin <linux@horizon.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Maxime Coquelin <maxime.coquelin@st.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	MAINTAINERS: add missing extcon directory	Chanwoo Choi	2015-11-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	Add the missing extcon directory to maintain them. When using get_maintainer.pl, the result should include the correct maintainer information. Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	get_maintainer: add subsystem to reviewer output	Joe Perches	2015-11-07	1	-15/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reviewer output currently does not include the subsystem that matched. Add it. Miscellanea: o Add a get_subsystem_name routine to centralize this Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	get_maintainer: --r (list reviewer) is on by default	Brian Norris	2015-11-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	We don't consistenly document the default value next to the option listing, but we do have a list of defaults here, so let's keep it up to date. Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	get_maintainer: add --no-foo options to --help	Brian Norris	2015-11-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Many flag options are boolean and support both a positive and a negative invocation from the command line. Some of these are even mentioned by example (e.g., --nogit is mentioned as a default option), but they aren't explicitly mentioned in the list of options. It happens that some of these are pretty important, as they are default-on, and to turn them off, you have to know about the --no-foo version. Rather than clutter the whole help text with bracketed '--[no]foo', let's just mention the general rule, a la 'man gcc'. Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	get_maintainer: it's '--pattern-depth', not '-pattern-depth'	Brian Norris	2015-11-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Though it appears that Perl's GetOptions will take either, the latter is not documented in the options listing. Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	get_maintainer: add missing documentation for --git-blame-signatures	Brian Norris	2015-11-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	I really haven't used this option much myself, so feel free to improve on the documentation for it. I just noticed it while inspecting this script for undocumented features. Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	printk: prevent userland from spoofing kernel messages	Mathias Krause	2015-11-07	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following statement of ABI/testing/dev-kmsg is not quite right: It is not possible to inject messages from userspace with the facility number LOG_KERN (0), to make sure that the origin of the messages can always be reliably determined. Userland actually can inject messages with a facility of 0 by abusing the fact that the facility is stored in a u8 data type. By using a facility which is a multiple of 256 the assignment of msg->facility in log_store() implicitly truncates it to 0, i.e. LOG_KERN, allowing users of /dev/kmsg to spoof kernel messages as shown below: The following call... # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg ...leads to the following log entry (dmesg -x \| tail -n 1): user :emerg : [ 66.137758] Kernel panic - not syncing: beer empty However, this call... # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg ...leads to the slightly different log entry (note the kernel facility): kern :emerg : [ 74.177343] Kernel panic - not syncing: beer empty Fix that by limiting the user provided facility to 8 bit right from the beginning and catch the truncation early. Fixes: 7ff9554bb578 ("printk: convert byte-buffer to variable-length...") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Petr Mladek <pmladek@suse.cz> Cc: Alex Elder <elder@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: Kay Sievers <kay@vrfy.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/vsprintf.c: update documentation	Rasmus Villemoes	2015-11-07	2	-9/+10
\| \| \| \| \| \| \| \| \| \| \| \|	%n is no longer just ignored; it results in early return from vsnprintf. Also add a request to add test cases for future %p extensions. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	selftests: run lib/test_printf module	Kees Cook	2015-11-07	3	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \|	This runs the lib/test_printf module to make sure printf is operating sanely. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	test_printf: test printf family at runtime	Rasmus Villemoes	2015-11-07	3	-0/+366
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a simple module for testing the kernel's printf facilities. Previously, some %p extensions have caused a wrong return value in case the entire output didn't fit and/or been unusable in kasprintf(). This should help catch such issues. Also, it should help ensure that changes to the formatting algorithms don't break anything. I'm not sure if we have a struct dentry or struct file lying around at boot time or if we can fake one, but most %p extensions should be testable, as should the ordinary number and string formatting. The nature of vararg functions means we can't use a more conventional table-driven approach. For now, this is mostly a skeleton; contributions are very welcome. Some tests are/will be slightly annoying to write, since the expected output depends on stuff like CONFIG_*, sizeof(long), runtime values etc. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Martin Kletzander <mkletzan@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/vsprintf.c: remove SPECIAL handling in pointer()	Rasmus Villemoes	2015-11-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As a quick git grep -E '%[ +0#-]#[ +0#-](\\|[0-9]+)?(\.(\\|[0-9]+)?)?p' shows, nobody uses the # flag with %p. Should one try to do so, one will be met with warning: `#' flag used with `%p' gnu_printf format [-Wformat] (POSIX and C99 both say "... For other conversion specifiers, the behavior is undefined.". Obviously, the kernel can choose to define the behaviour however it wants, but as long as gcc issues that warning, users are unlikely to show up.) Since default_width is effectively always 2sizeof(void), we can simplify the prologue of pointer() and save a few instructions. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Martin Kletzander <mkletzan@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/vsprintf.c: also improve sanity check in bstr_printf()	Rasmus Villemoes	2015-11-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Quoting from 2aa2f9e21e4e ("lib/vsprintf.c: improve sanity check in vsnprintf()"): On 64 bit, size may very well be huge even if bit 31 happens to be 0. Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a 3 GiB one. So cap at INT_MAX as was probably the intention all along. This is also the made-up value passed by sprintf and vsprintf. I should have seen this copy-pasted instance back then, but let's just do it now. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Martin Kletzander <mkletzan@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/vsprintf.c: handle invalid format specifiers more robustly	Rasmus Villemoes	2015-11-07	1	-10/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we meet any invalid or unsupported format specifier, 'handling' it by just printing it as a literal string is not safe: Presumably the format string and the arguments passed gcc's type checking, but that means something like sprintf(buf, "%n %pd", &intvar, dentry) would end up interpreting &intvar as a struct dentry*. When the offending specifier was %n it used to be at the end of the format string, but we can't rely on that always being the case. Also, gcc doesn't complain about some more or less exotic qualifiers (or 'length modifiers' in posix-speak) such as 'j' or 'q', but being unrecognized by the kernel's printf implementation, they'd be interpreted as unknown specifiers, and the rest of arguments would be interpreted wrongly. So let's complain about anything we don't understand, not just %n, and stop pretending that we'd be able to make sense of the rest of the format/arguments. If the offending specifier is in a printk() call we unfortunately only get a "BUG: recent printk recursion!", but at least direct users of the sprintf family will be caught. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Martin Kletzander <mkletzan@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>