linux - linux

	Commit message (Collapse)	Author	Files	Lines
2020-06-08	panic: add sysctl to dump all CPUs backtraces on oops event	Guilherme G. Piccoli	4	-0/+44
	Usually when the kernel reaches an oops condition, it's a point of no return; in case not enough debug information is available in the kernel splat, one of the last resorts would be to collect a kernel crash dump and analyze it. The problem with this approach is that in order to collect the dump, a panic is required (to kexec-load the crash kernel). When in an environment of multiple virtual machines, users may prefer to try living with the oops, at least until being able to properly shutdown their VMs / finish their important tasks. This patch implements a way to collect a bit more debug details when an oops event is reached, by printing all the CPUs backtraces through the usage of NMIs (on architectures that support that). The sysctl added (and documented) here was called "oops_all_cpu_backtrace", and when set will (as the name suggests) dump all CPUs backtraces. Far from ideal, this may be the last option though for users that for some reason cannot panic on oops. Most of times oopses are clear enough to indicate the kernel portion that must be investigated, but in virtual environments it's possible to observe hypervisor/KVM issues that could lead to oopses shown in other guests CPUs (like virtual APIC crashes). This patch hence aims to help debug such complex issues without resorting to kdump. Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	kernel/hung_task.c: introduce sysctl to print all traces when a hung task is ↵	Guilherme G. Piccoli	4	-2/+50
	detected Commit 401c636a0eeb ("kernel/hung_task.c: show all hung tasks before panic") introduced a change in that we started to show all CPUs backtraces when a hung task is detected _and_ the sysctl/kernel parameter "hung_task_panic" is set. The idea is good, because usually when observing deadlocks (that may lead to hung tasks), the culprit is another task holding a lock and not necessarily the task detected as hung. The problem with this approach is that dumping backtraces is a slightly expensive task, specially printing that on console (and specially in many CPU machines, as servers commonly found nowadays). So, users that plan to collect a kdump to investigate the hung tasks and narrow down the deadlock definitely don't need the CPUs backtrace on dmesg/console, which will delay the panic and pollute the log (crash tool would easily grab all CPUs traces with 'bt -a' command). Also, there's the reciprocal scenario: some users may be interested in seeing the CPUs backtraces but not have the system panic when a hung task is detected. The current approach hence is almost as embedding a policy in the kernel, by forcing the CPUs backtraces' dump (only) on hung_task_panic. This patch decouples the panic event on hung task from the CPUs backtraces dump, by creating (and documenting) a new sysctl called "hung_task_all_cpu_backtrace", analog to the approach taken on soft/hard lockups, that have both a panic and an "all_cpu_backtrace" sysctl to allow individual control. The new mechanism for dumping the CPUs backtraces on hung task detection respects "hung_task_warnings" by not dumping the traces in case there's no warnings left. Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200327223646.20779-1-gpiccoli@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases	Guilherme G. Piccoli	3	-35/+19
	After a recent change introduced by Vlastimil's series [0], kernel is able now to handle sysctl parameters on kernel command line; also, the series introduced a simple infrastructure to convert legacy boot parameters (that duplicate sysctls) into sysctl aliases. This patch converts the watchdog parameters softlockup_panic and {hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure. It fixes the documentation too, since the alias only accepts values 0 or 1, not the full range of integers. We also took the opportunity here to improve the documentation of the previously converted hung_task_panic (see the patch series [0]) and put the alias table in alphabetical order. [0] http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Link: http://lkml.kernel.org/r/20200507214624.21911-1-gpiccoli@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	lib/test_sysctl: support testing of sysctl. boot parameter	Vlastimil Babka	2	-0/+55
	Testing is done by a new parameter debug.test_sysctl.boot_int which defaults to 0 and it's expected that the tester passes a boot parameter that sets it to 1. The test checks if it's set to 1. To distinguish true failure from parameter not being set, the test checks /proc/cmdline for the expected parameter, and whether test_sysctl is built-in and not a module. [vbabka@suse.cz: skip the new test if boot_int sysctl is not present] Link: http://lkml.kernel.org/r/305af605-1e60-cf84-fada-6ce1ca37c102@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200427180433.7029-6-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	tools/testing/selftests/sysctl/sysctl.sh: support CONFIG_TEST_SYSCTL=y	Vlastimil Babka	1	-1/+1
	The testing script recommends CONFIG_TEST_SYSCTL=y, but actually only works with CONFIG_TEST_SYSCTL=m. Testing of sysctl setting via boot param however requires the test to be built-in, so make sure the test script supports it. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Kees Cook <keescook@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200427180433.7029-5-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	kernel/hung_task convert hung_task_panic boot parameter to sysctl	Vlastimil Babka	3	-11/+2
	We can now handle sysctl parameters on kernel command line and have infrastructure to convert legacy command line options that duplicate sysctl to become a sysctl alias. This patch converts the hung_task_panic parameter. Note that the sysctl handler is more strict and allows only 0 and 1, while the legacy parameter allowed any non-zero value. But there is little reason anyone would not be using 1. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200427180433.7029-4-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	kernel/sysctl: support handling command line aliases	Vlastimil Babka	2	-16/+41
	We can now handle sysctl parameters on kernel command line, but historically some parameters introduced their own command line equivalent, which we don't want to remove for compatibility reasons. We can, however, convert them to the generic infrastructure with a table translating the legacy command line parameters to their sysctl names, and removing the one-off param handlers. This patch adds the support and makes the first conversion to demonstrate it, on the (deprecated) numa_zonelist_order parameter. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200427180433.7029-3-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	kernel/sysctl: support setting sysctl parameters from kernel command line	Vlastimil Babka	4	-0/+122
	Patch series "support setting sysctl parameters from kernel command line", v3. This series adds support for something that seems like many people always wanted but nobody added it yet, so here's the ability to set sysctl parameters via kernel command line options in the form of sysctl.vm.something=1 The important part is Patch 1. The second, not so important part is an attempt to clean up legacy one-off parameters that do the same thing as a sysctl. I don't want to remove them completely for compatibility reasons, but with generic sysctl support the idea is to remove the one-off param handlers and treat the parameters as aliases for the sysctl variants. I have identified several parameters that mention sysctl counterparts in Documentation/admin-guide/kernel-parameters.txt but there might be more. The conversion also has varying level of success: - numa_zonelist_order is converted in Patch 2 together with adding the necessary infrastructure. It's easy as it doesn't really do anything but warn on deprecated value these days. - hung_task_panic is converted in Patch 3, but there's a downside that now it only accepts 0 and 1, while previously it was any integer value - nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic, so there's no straighforward conversion possible - traceoff_on_warning is a flag without value and it would be required to handle that somehow in the conversion infractructure, which seems pointless for a single flag This patch (of 5): A recently proposed patch to add vm_swappiness command line parameter in addition to existing sysctl [1] made me wonder why we don't have a general support for passing sysctl parameters via command line. Googling found only somebody else wondering the same [2], but I haven't found any prior discussion with reasons why not to do this. Settings the vm_swappiness issue aside (the underlying issue might be solved in a different way), quick search of kernel-parameters.txt shows there are already some that exist as both sysctl and kernel parameter - hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism would remove the need to add more of those one-offs and might be handy in situations where configuration by e.g. /etc/sysctl.d/ is impractical. Hence, this patch adds a new parse_args() pass that looks for parameters prefixed by 'sysctl.' and tries to interpret them as writes to the corresponding sys/ files using an temporary in-kernel procfs mount. This mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically registered sysctl tables, even though we don't handle modular sysctls. Errors due to e.g. invalid parameter name or value are reported in the kernel log. The processing is hooked right before the init process is loaded, as some handlers might be more complicated than simple setters and might need some subsystems to be initialized. At the moment the init process can be started and eventually execute a process writing to /proc/sys/ then it should be also fine to do that from the kernel. Sysctls registered later on module load time are not set by this mechanism - it's expected that in such scenarios, setting sysctl values from userspace is practical enough. [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/ [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Link: http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz Link: http://lkml.kernel.org/r/20200427180433.7029-2-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	xarray.h: correct return code documentation for xa_store_{bh,irq}()	Manfred Spraul	1	-2/+2
	__xa_store() and xa_store() document that the functions can fail, and that the return code can be an xa_err() encoded error code. xa_store_bh() and xa_store_irq() do not document that the functions can fail and that they can also return xa_err() encoded error codes. Thus: Update the documentation. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200430111424.16634-1-manfred@colorfullife.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	kernel: add panic_on_taint	Rafael Aquini	6	-1/+75
	Analogously to the introduction of panic_on_warn, this patch introduces a kernel option named panic_on_taint in order to provide a simple and generic way to stop execution and catch a coredump when the kernel gets tainted by any given flag. This is useful for debugging sessions as it avoids having to rebuild the kernel to explicitly add calls to panic() into the code sites that introduce the taint flags of interest. For instance, if one is interested in proceeding with a post-mortem analysis at the point a given code path is hitting a bad page (i.e. unaccount_page_cache_page(), or slab_bug()), a coredump can be collected by rebooting the kernel with 'panic_on_taint=0x20' amended to the command line. Another, perhaps less frequent, use for this option would be as a means for assuring a security policy case where only a subset of taints, or no single taint (in paranoid mode), is allowed for the running system. The optional switch 'nousertaint' is handy in this particular scenario, as it will avoid userspace induced crashes by writes to sysctl interface /proc/sys/kernel/tainted causing false positive hits for such policies. [akpm@linux-foundation.org: tweak kernel-parameters.txt wording] Suggested-by: Qian Cai <cai@lca.pw> Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Bunk <bunk@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	dynamic_debug: add an option to enable dynamic debug for modules only	Orson Zhai	10	-14/+46
	Instead of enabling dynamic debug globally with CONFIG_DYNAMIC_DEBUG, CONFIG_DYNAMIC_DEBUG_CORE will only enable core function of dynamic debug. With the DYNAMIC_DEBUG_MODULE defined for any modules, dynamic debug will be tied to them. This is useful for people who only want to enable dynamic debug for kernel modules without worrying about kernel image size and memory consumption is increasing too much. [orson.zhai@unisoc.com: v2] Link: http://lkml.kernel.org/r/1587408228-10861-1-git-send-email-orson.unisoc@gmail.com Signed-off-by: Orson Zhai <orson.zhai@unisoc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Petr Mladek <pmladek@suse.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: http://lkml.kernel.org/r/1586521984-5890-1-git-send-email-orson.unisoc@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	ipc/namespace.c: use a work queue to free_ipc	Giuseppe Scrivano	2	-2/+24
	the reason is to avoid a delay caused by the synchronize_rcu() call in kern_umount() when the mqueue mount is freed. the code: #define _GNU_SOURCE #include <sched.h> #include <error.h> #include <errno.h> #include <stdlib.h> int main() { int i; for (i = 0; i < 1000; i++) if (unshare(CLONE_NEWIPC) < 0) error(EXIT_FAILURE, errno, "unshare"); } goes from Command being timed: "./ipc-namespace" User time (seconds): 0.00 System time (seconds): 0.06 Percent of CPU this job got: 0% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.05 to Command being timed: "./ipc-namespace" User time (seconds): 0.00 System time (seconds): 0.02 Percent of CPU this job got: 96% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Link: http://lkml.kernel.org/r/20200225145419.527994-1-gscrivan@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	ipc/msg: add missing annotation for freeque()	Jules Irenge	1	-0/+2
	Sparse reports a warning at freeque() warning: context imbalance in freeque() - unexpected unlock The root cause is the missing annotation at freeque() Add the missing __releases(RCU) annotation Add the missing __releases(&msq->q_perm) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Lu Shuaibing <shuaibinglu@126.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/20200403160505.2832-2-jbi.octave@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08	mm/page_idle.c: skip offline pages	SeongJae Park	1	-5/+2
	'Idle page tracking' users can pass random pfn that might be mapped to an offline page. To avoid accessing such pages, this commit modifies the 'page_idle_get_page()' to use 'pfn_to_online_page()' instead of 'pfn_valid()' and 'pfn_to_page()' combination, so that the pfn mapped to an offline page can be skipped. Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Link: http://lkml.kernel.org/r/20200605092502.18018-2-sjpark@amazon.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-06	hpfs: fix warning due to superfluous semicolon	Zou Wei	1	-1/+1
	Fixes coccicheck warning: fs/hpfs/buffer.c:56:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: Mikulas Patocka <mikulas@twibright.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-06	kbuild: add variables for compression tools	Denis Efremov	10	-24/+45
	Allow user to use alternative implementations of compression tools, such as pigz, pbzip2, pxz. For example, multi-threaded tools to speed up the build: $ make GZIP=pigz BZIP2=pbzip2 Variables _GZIP, _BZIP2, _LZOP are used internally because original env vars are reserved by the tools. The use of GZIP in gzip tool is obsolete since 2015. However, alternative implementations (e.g., pigz) still rely on it. BZIP2, BZIP, LZOP vars are not obsolescent. The credit goes to @grsecurity. As a sidenote, for multi-threaded lzma, xz compression one can use: $ export XZ_OPT="--threads=0" Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	Makefile: install modules.builtin even if CONFIG_MODULES=n	Jonas Zeiger	1	-3/+11
	Many applications check for available kernel features via: - /proc/modules (loaded modules, present if CONFIG_MODULES=y) - $(MODLIB)/modules.builtin (builtin modules) They fail to detect features if the kernel was built with CONFIG_MODULES=n and modules.builtin isn't installed. Therefore, add the target "_builtin_inst_" and make "install" and "modules_install" depend on it. Tests results: - make install: kernel image is copied as before, modules.builtin copied - make modules_install: (CONFIG_MODULES=n) nothing is copied, exit 1 Signed-off-by: Jonas Zeiger <jonas.zeiger@talpidae.net> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	mksysmap: Fix the mismatch of '.L' symbols in System.map	ashimida	1	-1/+1
	When System.map was generated, the kernel used mksysmap to filter the kernel symbols, but all the symbols with the second letter 'L' in the kernel were filtered out, not just the symbols starting with 'dot + L'. For example: ashimida@ubuntu:~/linux$ cat System.map \|grep ' .L' ashimida@ubuntu:~/linux$ nm -n vmlinux \|grep ' .L' ffff0000088028e0 t bLength_show ...... ffff0000092e0408 b PLLP_OUTC_lock ffff0000092e0410 b PLLP_OUTA_lock The original intent should be to filter out all local symbols starting with '.L', so the dot should be escaped. Fixes: 00902e984732 ("mksysmap: Add h8300 local symbol pattern") Signed-off-by: ashimida <ashimida@linux.alibaba.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	kbuild: doc: rename LDFLAGS to KBUILD_LDFLAGS	Masahiro Yamada	1	-2/+2
	Commit d503ac531a52 ("kbuild: rename LDFLAGS to KBUILD_LDFLAGS") missed to update the documentation. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: change elf_info->size to size_t	Masahiro Yamada	2	-6/+5
	Align with the mmap / munmap APIs. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: remove is_vmlinux() helper	Masahiro Yamada	1	-15/+1
	Now that is_vmlinux() is called only in new_module(), we can inline the function call. modname is the basename with '.o' is stripped. No need to compare it with 'vmlinux.o'. vmlinux is always located at the current working directory. No need to strip the directory path. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: strip .o from modname before calling new_module()	Masahiro Yamada	2	-10/+12
	new_module() conditionally strips the .o because the modname has .o suffix when it is called from read_symbols(), but no .o when it is called from read_dump(). It is clearer to strip .o in read_symbols(). I also used flexible-array for mod->name. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: set have_vmlinux in new_module()	Masahiro Yamada	1	-5/+3
	Set have_vmlinux flag in a single place. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: remove mod->skip struct member	Masahiro Yamada	2	-7/+3
	The meaning of 'skip' is obscure since it does not explain "what to skip". mod->skip is set when it is vmlinux or the module info came from a dump file. So, mod->skip is equivalent to (mod->is_vmlinux \|\| mod->from_dump). For the check in write_namespace_deps_files(), mod->is_vmlinux is unneeded because the -d option is not passed in the first pass of modpost. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: add mod->is_vmlinux struct member	Masahiro Yamada	2	-9/+11
	is_vmlinux() is called in several places to check whether the current module is vmlinux or not. It is faster and clearer to check mod->is_vmlinux flag. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: remove is_vmlinux() call in check_for_{gpl_usage,unused}()	Masahiro Yamada	1	-12/+8
	check_exports() is never called for vmlinux because mod->skip is set for vmlinux. Hence, check_for_gpl_usage() and check_for_unused() are not called for vmlinux, either. is_vmlinux() is always false here. Remove the is_vmlinux() calls, and hard-code the ".ko" suffix. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: remove mod->is_dot_o struct member	Masahiro Yamada	2	-13/+2
	Previously, there were two cases where mod->is_dot_o is unset: [1] the executable 'vmlinux' in the second pass of modpost [2] modules loaded by read_dump() I think [1] was intended usage to distinguish 'vmlinux.o' and 'vmlinux'. Now that modpost does not parse the executable 'vmlinux', this case does not happen. [2] is obscure, maybe a bug. Module.symver stores module paths without extension. So, none of modules loaded by read_dump() has the .o suffix, and new_module() unsets ->is_dot_o. Anyway, it is not a big deal because handle_symbol() is not called for the case. To sum up, all the parsed ELF files are .o files. mod->is_dot_o is unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: move -d option in scripts/Makefile.modpost	Masahiro Yamada	1	-3/+1
	Collect options for modules into a single place. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: remove -s option	Masahiro Yamada	2	-9/+3
	The -s option was added by commit 8d8d8289df65 ("kbuild: do not do section mismatch checks on vmlinux in 2nd pass"). Now that the second pass does not parse vmlinux, this option is unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: remove get_next_text() and make {grab,release_}file static	Masahiro Yamada	2	-39/+2
	get_next_line() is no longer used. Remove. grab_file() and release_file() are only used in modpost.c. Make them static. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: use read_text_file() and get_line() for reading text files	Masahiro Yamada	2	-17/+14
	grab_file() mmaps a file, but it is not so efficient here because get_next_line() copies every line to the temporary buffer anyway. read_text_file() and get_line() are simpler. get_line() exploits the library function strchr(). Going forward, the missing .symvers or .cmd is a fatal error. This should not happen because scripts/Makefile.modpost guards the -i option files with $(wildcard $(input-symdump)). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: avoid false-positive file open error	Masahiro Yamada	1	-4/+3
	One problem of grab_file() is that it cannot distinguish the following two cases: - It cannot read the file (the file does not exist, or read permission is not set) - It can read the file, but the file size is zero This is because grab_file() calls mmap(), which requires the mapped length is greater than 0. Hence, grab_file() fails for both cases. If an empty header file were included for checksum calculation, the following warning would be printed: WARNING: modpost: could not open ...: Invalid argument An empty file is a valid source file, so it should not fail. Use read_text_file() instead. It can read a zero-length file. Then, parse_file() will succeed with doing nothing. Going forward, the first case (it cannot read the file) is a fatal error. If the source file from which an object was compiled is missing, something went wrong. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: fix potential mmap'ed file overrun in get_src_version()	Masahiro Yamada	1	-17/+11
	I do not know how reliably this function works, but it looks dangerous to me. strchr(sources, '\n'); ... continues searching until it finds '\n' or it reaches the '\0' terminator. In other words, 'sources' should be a null-terminated string. However, grab_file() just mmaps a file, so 'sources' is not terminated with null byte. If the file does not contain '\n' at all, strchr() will go beyond the mmap'ed memory. Use read_text_file(), which loads the file content into a malloc'ed buffer, appending null byte. Here we are interested only in the first line of .mod files. Use get_line() helper to get the first line. This also makes missing .mod file a fatal error. Commit 4be40e22233c ("kbuild: do not emit src version warning for non-modules") ignored missing .mod files. I do not fully understand what that commit addressed, but commit 91341d4b2c19 ("kbuild: introduce new option to enhance section mismatch analysis") introduced partial section checks by using modpost. built-in.o was parsed by modpost. Even modules had a problem because .mod files were created after the modpost check. Commit b7dca6dd1e59 ("kbuild: create .mod with full directory path and remove MODVERDIR") stopped doing that. Now that modpost is only invoked after the directory descend, .mod files should always exist at the modpost stage. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: add read_text_file() and get_line() helpers	Masahiro Yamada	2	-0/+51
	modpost uses grab_file() to open a file, but it is not suitable for a text file because the mmap'ed file is not terminated by null byte. Actually, I see some issues for the use of grab_file(). The new helper, read_text_file() loads the whole file content into a malloc'ed buffer, and appends a null byte. Then, get_line() reads each line. To handle text files, I intend to replace as follows: grab_file() -> read_text_file() get_new_line() -> get_line() Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: do not call get_modinfo() for vmlinux(.o)	Masahiro Yamada	1	-21/+24
	The three calls of get_modinfo() ("license", "import_ns", "version") always return NULL for vmlinux(.o) because the built-in module info is prefixed with __MODULE_INFO_PREFIX. It is harmless to call get_modinfo(), but there is no point to search for what apparently does not exist. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: drop RCS/CVS $Revision handling in MODULE_VERSION()	Masahiro Yamada	3	-73/+0
	As far as I understood, this code gets rid of '$Revision$' or '$Revision:' of CVS, RCS or whatever in MODULE_VERSION() tags. Remove the primeval code. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: show warning if any of symbol dump files is missing	Masahiro Yamada	2	-10/+5
	If modpost fails to load a symbol dump file, it cannot check unresolved symbols, hence module dependency will not be added. Nor CRCs can be added. Currently, external module builds check only $(objtree)/Module.symvers, but it should check files specified by KBUILD_EXTRA_SYMBOLS as well. Move the warning message from the top Makefile to scripts/Makefile.modpost and print the warning if any dump file is missing. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: show warning if vmlinux is not found when processing modules	Masahiro Yamada	1	-2/+7
	check_exports() does not print warnings about unresolved symbols if vmlinux is missing because there would be too many. This situation happens when you do 'make modules' from the clean tree, or compile external modules against a kernel tree that has not been completely built. It is dangerous to not check unresolved symbols because you might be building useless modules. At least it should be warned. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: invoke modpost only when input files are updated	Masahiro Yamada	2	-15/+37
	Currently, the second pass of modpost is always invoked when you run 'make' or 'make modules' even if none of modules is changed. Use if_changed to invoke it only when it is necessary. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: generate vmlinux.symvers and reuse it for the second modpost	Masahiro Yamada	5	-6/+7
	The full build runs modpost twice, first for vmlinux.o and second for modules. The first pass dumps all the vmlinux symbols into Module.symvers, but the second pass parses vmlinux again instead of reusing the dump file, presumably because it needs to avoid accumulating stale symbols. Loading symbol info from a dump file is faster than parsing an ELF object. Besides, modpost deals with various issues to parse vmlinux in the second pass. A solution is to make the first pass dumps symbols into a separate file, vmlinux.symvers. The second pass reads it, and parses module .o files. The merged symbol information is dumped into Module.symvers in the same way as before. This makes further modpost cleanups possible. Also, it fixes the problem of 'make vmlinux', which previously overwrote Module.symvers, throwing away module symbols. I slightly touched scripts/link-vmlinux.sh so that vmlinux is re-linked when you cross this commit. Otherwise, vmlinux.symvers would not be generated. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: refactor -i option calculation	Masahiro Yamada	1	-4/+2
	Prepare to use -i for in-tree modpost as well. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: print symbol dump file as the build target in short log	Masahiro Yamada	1	-13/+20
	The symbol dump *.symvers is the output of modpost. Print it in the short log. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: re-add -e to set external_module flag	Masahiro Yamada	2	-2/+8
	Previously, the -i option had two functions; load a symbol dump file, and set the external_module flag. I want to assign a dedicate option for each of them. Going forward, the -i is used to load a symbol dump file, and the -e to set the external_module flag. With this, we will be able to use -i for loading in-kernel symbols. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: rename ext_sym_list to dump_list	Masahiro Yamada	1	-13/+14
	The -i option is used to include Modules.symver as well as files from $(KBUILD_EXTRA_SYMBOLS). Make the struct and variable names more generic. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: allow to pass -i option multiple times to remove -e option	Masahiro Yamada	2	-9/+2
	Now that there is no difference between -i and -e, they can be unified. Make modpost accept the -i option multiple times, then remove -e. I will reuse -e for a different purpose. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	modpost: track if the symbol origin is a dump file or ELF object	Masahiro Yamada	2	-10/+6
	The meaning of sym->kernel is obscure; it is set for in-kernel symbols loaded from Modules.symvers. This happens only when we are building external modules, and it is used to determine whether to dump symbols to $(KBUILD_EXTMOD)/Modules.symvers It is clearer to remember whether the symbol or module came from a dump file or ELF object. This changes the KBUILD_EXTRA_SYMBOLS behavior. Previously, symbols loaded from KBUILD_EXTRA_SYMBOLS are accumulated into the current $(KBUILD_EXTMOD)/Modules.symvers Going forward, they will be only used to check symbol references, but not dumped into the current $(KBUILD_EXTMOD)/Modules.symvers. I believe this makes more sense. sym->vmlinux will have no user. Remove it too. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-06	firmware/dmi: Report DMI Bios & EC firmware release	Erwan Velu	4	-0/+40
	Some vendors like HPe or Dell, encode the release version of their BIOS in the "System BIOS {Major\|Minor} Release" fields of Type 0. This information is used to know which bios release actually runs. It could be used for some quirks, debugging sessions or inventory tasks. A typical output for a Dell system running the 65.27 bios is : [root@t1700 ~]# cat /sys/devices/virtual/dmi/id/bios_release 65.27 [root@t1700 ~]# Servers that have a BMC encode the release version of their firmware in the "Embedded Controller Firmware {Major\|Minor} Release" fields of Type 0. This information is used to know which BMC release actually runs. It could be used for some quirks, debugging sessions or inventory tasks. A typical output for a Dell system running the 3.75 bmc release is : [root@t1700 ~]# cat /sys/devices/virtual/dmi/id/ec_firmware_release 3.75 [root@t1700 ~]# Signed-off-by: Erwan Velu <e.velu@criteo.com> Signed-off-by: Jean Delvare <jdelvare@suse.de>
2020-06-05	dm crypt: avoid truncating the logical block size	Eric Biggers	1	-1/+1
	queue_limits::logical_block_size got changed from unsigned short to unsigned int, but it was forgotten to update crypt_io_hints() to use the new type. Fix it. Fixes: ad6bf88a6c19 ("block: fix an integer overflow in logical block size") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05	dm mpath: add DM device name to Failing/Reinstating path log messages	Mike Snitzer	1	-2/+6
	When there are many DM multipath devices it really helps to have additional context for which DM device a failed or reinstated path is part of. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05	dm mpath: enhance queue_if_no_path debugging	Mike Snitzer	1	-7/+23
	Add more DMDEBUG that shows arguments passed and caller, and another that shows state of related flags at end of queue_if_no_path(). Also add queue_if_no_path DMDEBUG to multipath_resume(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>