summaryrefslogtreecommitdiffstats
path: root/tools/perf (unfollow)
Commit message (Collapse)AuthorFilesLines
2020-08-12perf trace beauty: Add script to autogenerate socket families tableArnaldo Carvalho de Melo2-0/+466
To use with 'perf trace', to convert the protocol families to strings, e.g: $ tools/perf/trace/beauty/socket.sh static const char *socket_families[] = { [0] = "UNSPEC", [1] = "LOCAL", [2] = "INET", [3] = "AX25", [4] = "IPX", [5] = "APPLETALK", [6] = "NETROM", [7] = "BRIDGE", [8] = "ATMPVC", [9] = "X25", [10] = "INET6", [11] = "ROSE", [12] = "DECnet", [13] = "NETBEUI", [14] = "SECURITY", [15] = "KEY", [16] = "NETLINK", [17] = "PACKET", [18] = "ASH", [19] = "ECONET", [20] = "ATMSVC", [21] = "RDS", [22] = "SNA", [23] = "IRDA", [24] = "PPPOX", [25] = "WANPIPE", [26] = "LLC", [27] = "IB", [28] = "MPLS", [29] = "CAN", [30] = "TIPC", [31] = "BLUETOOTH", [32] = "IUCV", [33] = "RXRPC", [34] = "ISDN", [35] = "PHONET", [36] = "IEEE802154", [37] = "CAIF", [38] = "ALG", [39] = "NFC", [40] = "VSOCK", [41] = "KCM", [42] = "QIPCRTR", [43] = "SMC", [44] = "XDP", }; $ This uses a copy of include/linux/socket.h that is kept in a directory to be used just for these table generation scripts and for checking if the kernel has a new file that maybe gets something new for these tables. This allows us to: - Avoid accessing files outside tools/, in the kernel sources, that may be changed in unexpected ways and thus break these scripts. - Notice when those files change and thus check if the changes don't break those scripts, update them to automatically get the new definitions, a new socket family, for instance. - Not add then to the tools/include/ where it may end up used while building the tools and end up requiring dragging yet more stuff from the kernel or plain break the build in some of the myriad environments where perf may be built. This will replace the previous static array in tools/perf/ that was dated and was already missing the AF_KCM, AF_QIPCRTR, AF_SMC and AF_XDP families. The next cset will wire this up to the perf build process. At some point this must be made into a library to be used in places such as libtraceevent, bpftrace, etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-11zonefs: update documentation to reflect zone size vs capacityJohannes Thumshirn1-10/+12
Update the zonefs documentation to reflect the difference between a zone's size and it's capacity. The maximum file size in zonefs is the zones capacity, for ZBC and ZAC based devices, which do not have a separate zone capacity, the zone capacity is equal to the zone size. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-08-11zonefs: add zone-capacity supportJohannes Thumshirn2-4/+15
In the zoned storage model, the sectors within a zone are typically all writeable. With the introduction of the Zoned Namespace (ZNS) Command Set in the NVM Express organization, the model was extended to have a specific writeable capacity. This zone capacity can be less than the overall zone size for a NVMe ZNS device or null_blk in zoned-mode. For other ZBC/ZAC devices the zone capacity is always equal to the zone size. Use the zone capacity field instead from blk_zone for determining the maximum inode size and inode blocks in zonefs. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-08-10ktest.pl: Fix spelling mistake "Cant" -> "Can't"Colin Ian King1-1/+1
There is a spelling mistake in an error message. Fix it. Link: https://lkml.kernel.org/r/20200810100750.61475-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-10ktest.pl: Change the logic to control the size of the log file emailedSteven Rostedt (VMware)1-3/+6
If the log file for a given test is larger than the max size given then use set the seek from the end of the log file instead of from the start of the test. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-10vdpa/mlx5: fix up endian-ness for mtuMichael S. Tsirkin1-1/+11
VDPA mlx5 accesses config space as native endian - this is wrong since it's a modern device and actually uses LE. It only supports modern guests so we could punt and just force LE, but let's use the full virtio APIs since people tend to copy/paste code, and this is not data path anyway. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-10vdpa: Fix pointer math bug in vdpasim_get_config()Dan Carpenter1-1/+1
If "offset" is non-zero then we end up copying from beyond the end of the config because of pointer math. We can fix this by casting the struct to a u8 pointer. Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20200406144552.GF68494@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-08-10recordmcount: Fix build failure on non arm64Christophe Leroy1-0/+2
Commit ea0eada45632 leads to the following build failure on powerpc: HOSTCC scripts/recordmcount scripts/recordmcount.c: In function 'arm64_is_fake_mcount': scripts/recordmcount.c:440: error: 'R_AARCH64_CALL26' undeclared (first use in this function) scripts/recordmcount.c:440: error: (Each undeclared identifier is reported only once scripts/recordmcount.c:440: error: for each function it appears in.) make[2]: *** [scripts/recordmcount] Error 1 Make sure R_AARCH64_CALL26 is always defined. Fixes: ea0eada45632 ("recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Gregory Herrero <gregory.herrero@oracle.com> Cc: Gregory Herrero <gregory.herrero@oracle.com> Link: https://lore.kernel.org/r/5ca1be21fa6ebf73203b45fd9aadd2bafb5e6b15.1597049145.git.christophe.leroy@csgroup.eu Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-10vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config()Dan Carpenter1-1/+1
There is a pointer math bug here so if "offset" is non-zero then this will copy memory from beyond the end of the array. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20200808093241.GB115053@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eli Cohen <elic@nvidia.com> Cc: Jason Wang <jasowang@redhat.com>; Parav Pandit <parav@mellanox.com>; virtualization@lists.linux-foundation.org; linux-kernel@vger.kernel.org; kernel-janitors@vger.kernel.org Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2020-08-10vdpa/mlx5: fix memory allocation failure checksColin Ian King1-7/+6
The memory allocation failure checking for in and out is currently checking if the pointers are valid rather than the contents of what they point to. Hence the null check on failed memory allocations is incorrect. Fix this by adding the missing indirection in the check. Also for the default case, just set the *in and *out to null as these don't have any thing allocated to kfree. Finally remove the redundant *in and *out check as these have been already done on each allocation in the case statement. Addresses-Coverity: ("Null pointer dereference") Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200806160828.90463-1-colin.king@canonical.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eli Cohen <eli@mellanox.com>
2020-08-10vdpa/mlx5: Fix uninitialised variable in core/mr.cAlex Dewar1-1/+3
If the kernel is unable to allocate memory for the variable dmr then err will be returned without being set. Set err to -ENOMEM in this case. Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code") Addresses-Coverity: ("Uninitialized variables") Signed-off-by: Alex Dewar <alex.dewar@gmx.co.uk> Link: https://lore.kernel.org/r/20200806185625.67344-1-alex.dewar@gmx.co.uk Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eli Cohen <eli@mellanox.com>
2020-08-10vdpa_sim: init iommu lockMichael S. Tsirkin1-0/+1
The patch adding the iommu lock did not initialize it. The struct is zero-initialized so this is mostly a problem when using lockdep. Reported-by: kernel test robot <rong.a.chen@intel.com> Cc: Max Gurtovoy <maxg@mellanox.com> Fixes: 0ea9ee430e74 ("vdpasim: protect concurrent access to iommu iotlb") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-09thunderbolt: merge fix for kunix_resource changesStephen Rothwell1-2/+2
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-09kbuild: stop filtering out $(GCC_PLUGINS_CFLAGS) from cc-option baseMasahiro Yamada2-8/+7
Commit d26e94149276 ("kbuild: no gcc-plugins during cc-option tests") was neeeded because scripts/Makefile.gcc-plugins was too early. This is unneeded by including scripts/Makefile.gcc-plugins last, and being careful to not add cc-option tests after it. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-08-09kbuild: include scripts/Makefile.* only when relevant CONFIG is enabledMasahiro Yamada4-18/+9
Currently, the top Makefile includes all of scripts/Makefile.<feature> even if the associated CONFIG option is disabled. Do not include unneeded Makefiles in order to slightly optimize the parse stage. Include $(include-y), and ignore $(include-). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-08-09kbuild: introduce hostprogs-always-y and userprogs-always-yMasahiro Yamada22-55/+78
To build host programs, you need to add the program names to 'hostprogs' to use the necessary build rule, but it is not enough to build them because there is no dependency. There are two types of host programs: built as the prerequisite of another (e.g. gen_crc32table in lib/Makefile), or always built when Kbuild visits the Makefile (e.g. genksyms in scripts/genksyms/Makefile). The latter is typical in Makefiles under scripts/, which contains host programs globally used during the kernel build. To build them, you need to add them to both 'hostprogs' and 'always-y'. This commit adds hostprogs-always-y as a shorthand. The same applies to user programs. net/bpfilter/Makefile builds bpfilter_umh on demand, hence always-y is unneeded. In contrast, programs under samples/ are added to both 'userprogs' and 'always-y' so they are always built when Kbuild visits the Makefiles. userprogs-always-y works as a shorthand. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2020-08-09kbuild: sort hostprogs before passing it to ifneqMasahiro Yamada2-7/+8
The conditional: ifneq ($(hostprogs),) ... is evaluated to true if $(hostprogs) does not contain any word but whitespace characters. ifneq ($(strip $(hostprogs)),) ... is a safe way to avoid interpreting whitespace as a non-empty value, but I'd rather want to use the side-effect of $(sort ...) to do the equivalent. $(sort ...) is used in scripts/Makefile.host in order to drop duplication in $(hostprogs). It is also useful to strip excessive spaces. Move $(sort ...) before evaluating the ifneq. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-08-09kbuild: move host .so build rules to scripts/gcc-plugins/MakefileMasahiro Yamada4-43/+55
The host shared library rules are currently implemented in scripts/Makefile.host, but actually GCC-plugin is the only user of them. (The VDSO .so files are built for the target by different build rules) Hence, they do not need to be treewide available. Move all the relevant build rules to scripts/gcc-plugins/Makefile. I also optimized the build steps so *.so is directly built from .c because every upstream plugin is compiled from a single source file. I am still keeping the multi-file plugin support, which Kees Cook mentioned might be needed by out-of-tree plugins. (https://lkml.org/lkml/2019/1/11/1107) If the plugin, foo.so, is compiled from two files foo.c and foo2.c, then you can do like follows: foo-objs := foo.o foo2.o Single-file plugins do not need the *-objs notation. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Kees Cook <keescook@chromium.org>
2020-08-09kbuild: Replace HTTP links with HTTPS onesAlexander A. Klimov5-11/+11
Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-08-09kbuild: trace functions in subdirectories of lib/Masahiro Yamada17-40/+0
ccflags-remove-$(CONFIG_FUNCTION_TRACER) += $(CC_FLAGS_FTRACE) exists here in sub-directories of lib/ to keep the behavior of commit 2464a609ded0 ("ftrace: do not trace library functions"). Since that commit, not only the objects in lib/ but also the ones in the sub-directories are excluded from ftrace (although the commit description did not explicitly mention this). However, most of library functions in sub-directories are not so hot. Re-add them to ftrace. Going forward, only the objects right under lib/ will be excluded. Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-09kbuild: introduce ccflags-remove-y and asflags-remove-yMasahiro Yamada24-23/+64
CFLAGS_REMOVE_<file>.o filters out flags when compiling a particular object, but there is no convenient way to do that for every object in a directory. Add ccflags-remove-y and asflags-remove-y to make it easily. Use ccflags-remove-y to clean up some Makefiles. The add/remove order works as follows: [1] KBUILD_CFLAGS specifies compiler flags used globally [2] ccflags-y adds compiler flags for all objects in the current Makefile [3] ccflags-remove-y removes compiler flags for all objects in the current Makefile (New feature) [4] CFLAGS_<file> adds compiler flags per file. [5] CFLAGS_REMOVE_<file> removes compiler flags per file. Having [3] before [4] allows us to remove flags from most (but not all) objects in the current Makefile. For example, kernel/trace/Makefile removes $(CC_FLAGS_FTRACE) from all objects in the directory, then adds it back to trace_selftest_dynamic.o and CFLAGS_trace_kprobe_selftest.o The same applies to lib/livepatch/Makefile. Please note ccflags-remove-y has no effect to the sub-directories. In contrast, the previous notation got rid of compiler flags also from all the sub-directories. The following are not affected because they have no sub-directories: arch/arm/boot/compressed/ arch/powerpc/xmon/ arch/sh/ kernel/trace/ However, lib/ has several sub-directories. To keep the behavior, I added ccflags-remove-y to all Makefiles in subdirectories of lib/, except the following: lib/vdso/Makefile - Kbuild does not descend into this Makefile lib/raid/test/Makefile - This is not used for the kernel build I think commit 2464a609ded0 ("ftrace: do not trace library functions") excluded too much. In the next commit, I will remove ccflags-remove-y from the sub-directories of lib/. Suggested-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Brendan Higgins <brendanhiggins@google.com> (KUnit) Tested-by: Anders Roxell <anders.roxell@linaro.org>
2020-08-09kbuild: do not export LDFLAGS_vmlinuxMasahiro Yamada3-3/+6
When you clean the build tree for ARCH=arm, you may see the following error message from 'nm' command: $ make -j24 ARCH=arm clean CLEAN arch/arm/crypto CLEAN arch/arm/kernel CLEAN arch/arm/mach-at91 CLEAN arch/arm/mach-omap2 CLEAN arch/arm/vdso CLEAN certs CLEAN lib CLEAN usr CLEAN net/wireless CLEAN drivers/firmware/efi/libstub nm: 'arch/arm/boot/compressed/../../../../vmlinux': No such file /bin/sh: 1: arithmetic expression: expecting primary: " " CLEAN arch/arm/boot/compressed CLEAN drivers/scsi CLEAN drivers/tty/vt CLEAN arch/arm/boot CLEAN vmlinux.symvers modules.builtin modules.builtin.modinfo Even if you rerun the same command, the error message will not be shown despite vmlinux is already gone. To reproduce it, the parallel option -j is needed. Single thread cleaning always executes 'archclean', 'vmlinuxclean' in this order, so vmlinux still exists when arch/arm/boot/compressed/ is cleaned. Looking at arch/arm/boot/compressed/Makefile does not help understand the reason of the error message. Both KBSS_SZ and LDFLAGS_vmlinux are assigned with '=' operator, hence, they are not expanded unless used. Obviously, 'make clean' does not use them. In fact, the root cause exists in the top Makefile: export LDFLAGS_vmlinux Since LDFLAGS_vmlinux is an exported variable, LDFLAGS_vmlinux in arch/arm/boot/compressed/Makefile is expanded when scripts/Makefile.clean has a command to execute. This is why the error message shows up only when there exist build artifacts in arch/arm/boot/compressed/. Adding 'unexport LDFLAGS_vmlinux' to arch/arm/boot/compressed/Makefile will fix it as far as ARCH=arm is concerned, but I think the proper fix is to get rid of 'export LDFLAGS_vmlinux' from the top Makefile. LDFLAGS_vmlinux in the top Makefile contains linker flags for the top vmlinux. LDFLAGS_vmlinux in arch/arm/boot/compressed/Makefile is for arch/arm/boot/compressed/vmlinux. They just happen to have the same variable name, but are used for different purposes. Stop shadowing LDFLAGS_vmlinux. This commit passes LDFLAGS_vmlinux to scripts/link-vmlinux.sh via a command line parameter instead of via an environment variable. LD and KBUILD_LDFLAGS are exported, but I did the same for consistency. Anyway, they must be included in cmd_link-vmlinux to allow if_changed to detect the changes in LD or KBUILD_LDFLAGS. The following Makefiles are not affected: arch/arm/boot/compressed/Makefile arch/h8300/boot/compressed/Makefile arch/nios2/boot/compressed/Makefile arch/parisc/boot/compressed/Makefile arch/s390/boot/compressed/Makefile arch/sh/boot/compressed/Makefile arch/sh/boot/romimage/Makefile arch/x86/boot/compressed/Makefile They use ':=' or '=' to clear the LDFLAGS_vmlinux inherited from the top Makefile. We need to take a closer look at the impact to unicore32 and xtensa. arch/unicore32/boot/compressed/Makefile only uses '+=' operator for LDFLAGS_vmlinux. So, the decompressor previously inherited the linker flags from the top Makefile. However, commit 70fac51feaf2 ("unicore32 additional architecture files: boot process") was merged before commit 1f2bfbd00e46 ("kbuild: link of vmlinux moved to a script"). So, I rather consider this is a bug fix of 1f2bfbd00e46. arch/xtensa/boot/boot-elf/Makefile is also affected, but this is also considered a fix for the same reason. It did not inherit LDFLAGS_vmlinux when commit 4bedea945451 ("[PATCH] xtensa: Architecture support for Tensilica Xtensa Part 2") was merged. I deleted $(LDFLAGS_vmlinux), which is now empty. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com>
2020-08-09kbuild: always create directories of targetsMasahiro Yamada1-3/+1
Currently, the directories of objects are automatically created only for O= builds. It should not hurt to cater to this for in-tree builds too. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-08-08arm64: Fix __cpu_logical_map undefined issueKefeng Wang3-5/+16
The __cpu_logical_map undefined issue occued when the new tegra194-cpufreq drvier building as a module. ERROR: modpost: "__cpu_logical_map" [drivers/cpufreq/tegra194-cpufreq.ko] undefined! The driver using cpu_logical_map() macro which will expand to __cpu_logical_map, we can't access it in a drvier. Let's turn cpu_logical_map() into a C wrapper and export it to fix the build issue. Also create a function set_cpu_logical_map(cpu, hwid) when assign a value to cpu_logical_map(cpu). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-08arm64/fixmap: make notes of fixed_addresses more preciselyPingfan Liu1-4/+3
These 'compile-time allocated' memory buffers can occupy more than one page and each enum increment is page-sized. So improve the note about it. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/1596460720-19243-1-git-send-email-kernelfans@gmail.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-08powerpc/ptrace: Fix build error in pkey_get()Michael Ellerman1-2/+0
The merge resolution in commit 25d8d4eecace left ret no longer used, leading to: arch/powerpc/kernel/ptrace/ptrace-view.c: In function ‘pkey_get’: arch/powerpc/kernel/ptrace/ptrace-view.c:473:6: error: unused variable ‘ret’ 473 | int ret; Fix it by removing ret. Fixes: 25d8d4eecace ("Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-08fs: fix a struct path leak in path_umountChristoph Hellwig1-14/+18
Make sure we also put the dentry and vfsmnt in the illegal flags and !may_umount cases. Fixes: 41525f56e256 ("fs: refactor ksys_umount") Reported-by: Vikas Kumar <vikas.kumar2@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-07tracing: Add trace_array_init_printk() to initialize instance trace_printk() ↵Steven Rostedt (VMware)2-0/+45
buffers As trace_array_printk() used with not global instances will not add noise to the main buffer, they are OK to have in the kernel (unlike trace_printk()). This require the subsystem to create their own tracing instance, and the trace_array_printk() only writes into those instances. Add trace_array_init_printk() to initialize the trace_printk() buffers without printing out the WARNING message. Reported-by: Sean Paul <sean@poorly.run> Reviewed-by: Sean Paul <sean@poorly.run> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-07net/scm: Fix typo in SCM_RIGHTS compat refactoringKees Cook1-1/+1
When refactoring the SCM_RIGHTS code, I accidentally mis-merged my native/compat diffs, which entirely broke using SCM_RIGHTS in compat mode. Use the correct helper. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/216156.html Reported-by: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca> Link: https://lore.kernel.org/lkml/1596812929.lz7fuo8r2w.none@localhost/ Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Fixes: c0029de50982 ("net/scm: Regularize compat handling of scm_detach_fds()") Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-08-07mm: vmscan: consistent update to pgrefillShakeel Butt1-1/+2
The vmstat pgrefill is useful together with pgscan and pgsteal stats to measure the reclaim efficiency. However vmstat's pgrefill is not updated consistently at system level. It gets updated for both global and memcg reclaim however pgscan and pgsteal are updated for only global reclaim. So, update pgrefill only for global reclaim. If someone is interested in the stats representing both system level as well as memcg level reclaim, then consult the root memcg's memory.stat instead of /proc/vmstat. Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200711011459.1159929-1-shakeelb@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/vmscan.c: fix typodylan-meiners1-1/+1
Change "optizimation" to "optimization". Signed-off-by: dylan-meiners <spacct.spacct@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: http://lkml.kernel.org/r/20200609185144.10049-1-spacct.spacct@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07khugepaged: khugepaged_test_exit() check mmget_still_valid()Hugh Dickins1-4/+1
Move collapse_huge_page()'s mmget_still_valid() check into khugepaged_test_exit() itself. collapse_huge_page() is used for anon THP only, and earned its mmget_still_valid() check because it inserts a huge pmd entry in place of the page table's pmd entry; whereas collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp() merely clears the page table's pmd entry. But core dumping without mmap lock must have been as open to mistaking a racily cleared pmd entry for a page table at physical page 0, as exit_mmap() was. And we certainly have no interest in mapping as a THP once dumping core. Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Song Liu <songliubraving@fb.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [4.8+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07khugepaged: retract_page_tables() remember to test exitHugh Dickins1-10/+14
Only once have I seen this scenario (and forgot even to notice what forced the eventual crash): a sequence of "BUG: Bad page map" alerts from vm_normal_page(), from zap_pte_range() servicing exit_mmap(); pmd:00000000, pte values corresponding to data in physical page 0. The pte mappings being zapped in this case were supposed to be from a huge page of ext4 text (but could as well have been shmem): my belief is that it was racing with collapse_file()'s retract_page_tables(), found *pmd pointing to a page table, locked it, but *pmd had become 0 by the time start_pte was decided. In most cases, that possibility is excluded by holding mmap lock; but exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged checks khugepaged_test_exit() after acquiring mmap lock: khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so, for example. But retract_page_tables() did not: fix that. The fix is for retract_page_tables() to check khugepaged_test_exit(), after acquiring mmap lock, before doing anything to the page table. Getting the mmap lock serializes with __mmput(), which briefly takes and drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on mm_users makes sure we don't touch the page table once exit_mmap() might reach it, since exit_mmap() will be proceeding without mmap lock, not expecting anyone to be racing with it. Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Song Liu <songliubraving@fb.com> Cc: <stable@vger.kernel.org> [4.8+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07khugepaged: collapse_pte_mapped_thp() protect the pmd lockHugh Dickins1-25/+19
When retract_page_tables() removes a page table to make way for a huge pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the case when the original mmap_write_trylock had failed), only mmap_write_trylock and pmd lock are held. That's not enough. One machine has twice crashed under load, with "BUG: spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving page_referenced() on a file THP, that had found a page table at *pmd) discovers that the page table page and its lock have already been freed by the time it comes to unlock. Follow the example of retract_page_tables(), but we only need one of huge page lock or i_mmap_lock_write to secure against this: because it's the narrower lock, and because it simplifies collapse_pte_mapped_thp() to know the hpage earlier, choose to rely on huge page lock here. Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Song Liu <songliubraving@fb.com> Cc: <stable@vger.kernel.org> [5.4+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07khugepaged: collapse_pte_mapped_thp() flush the right rangeHugh Dickins1-1/+1
pmdp_collapse_flush() should be given the start address at which the huge page is mapped, haddr: it was given addr, which at that point has been used as a local variable, incremented to the end address of the extent. Found by source inspection while chasing a hugepage locking bug, which I then could not explain by this. At first I thought this was very bad; then saw that all of the page translations that were not flushed would actually still point to the right pages afterwards, so harmless; then realized that I know nothing of how different architectures and models cache intermediate paging structures, so maybe it matters after all - particularly since the page table concerned is immediately freed. Much easier to fix than to think about. Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Song Liu <songliubraving@fb.com> Cc: <stable@vger.kernel.org> [5.4+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possiblePeter Xu1-14/+10
This is found by code observation only. Firstly, the worst case scenario should assume the whole range was covered by pmd sharing. The old algorithm might not work as expected for ranges like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the expected range should be (0, 2g). Since at it, remove the loop since it should not be required. With that, the new code should be faster too when the invalidating range is huge. Mike said: : With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only : adjust to (0, 1g+2m) which is incorrect. : : We should cc stable. The original reason for adjusting the range was to : prevent data corruption (getting wrong page). Since the range is not : always adjusted correctly, the potential for corruption still exists. : : However, I am fairly confident that adjust_range_if_pmd_sharing_possible : is only gong to be called in two cases: : : 1) for a single page : 2) for range == entire vma : : In those cases, the current code should produce the correct results. : : To be safe, let's just cc stable. Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages") Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: thp: replace HTTP links with HTTPS onesAlexander A. Klimov1-2/+2
Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `xmlns`: For each link, `http://[^# ]*(?:\w|/)`: If neither `gnu\.org/license`, nor `mozilla\.org/MPL`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. [akpm@linux-foundation.org: fix amd.com URL, per Vlastimil] Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200713164345.36088-1-grandmaster@al2klimov.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/page_alloc: fix memalloc_nocma_{save/restore} APIsJoonsoo Kim2-17/+22
Currently, memalloc_nocma_{save/restore} API that prevents CMA area in page allocation is implemented by using current_gfp_context(). However, there are two problems of this implementation. First, this doesn't work for allocation fastpath. In the fastpath, original gfp_mask is used since current_gfp_context() is introduced in order to control reclaim and it is on slowpath. So, CMA area can be allocated through the allocation fastpath even if memalloc_nocma_{save/restore} APIs are used. Currently, there is just one user for these APIs and it has a fallback method to prevent actual problem. Second, clearing __GFP_MOVABLE in current_gfp_context() has a side effect to exclude the memory on the ZONE_MOVABLE for allocation target. To fix these problems, this patch changes the implementation to exclude CMA area in page allocation. Main point of this change is using the alloc_flags. alloc_flags is mainly used to control allocation so it fits for excluding CMA area in allocation. Fixes: d7fefcc8de91 (mm/cma: add PF flag to force non cma alloc) Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Link: http://lkml.kernel.org/r/1595468942-29687-1-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/page_alloc.c: skip setting nodemask when we are in interruptMuchun Song1-1/+5
When we are in the interrupt context, it is irrelevant to the current task context. If we use current task's mems_allowed, we can be fair to alloc pages in the fast path and fall back to slow path memory allocation when the current node(which is the current task mems_allowed) does not have enough memory to allocate. In this case, it slows down the memory allocation speed of interrupt context. So we can skip setting the nodemask to allow any node to allocate memory, so that fast path allocation can success. Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: David Hildenbrand <david@redhat.com> Link: http://lkml.kernel.org/r/20200706025921.53683-1-songmuchun@bytedance.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/page_alloc: fallbacks at most has 3 elementsWei Yang1-1/+1
MIGRAGE_TYPES is used to be the mark of end and there are at most 3 elements for the one dimension array. Reduce to 3 to save little memory. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: http://lkml.kernel.org/r/20200625231022.18784-1-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/page_alloc: silence a KASAN false positiveQian Cai1-0/+3
kernel_init_free_pages() will use memset() on s390 to clear all pages from kmalloc_order() which will override KASAN redzones because a redzone was setup from the end of the allocation size to the end of the last page. Silence it by not reporting it there. An example of the report is, BUG: KASAN: slab-out-of-bounds in __free_pages_ok Write of size 4096 at addr 000000014beaa000 Call Trace: show_stack+0x152/0x210 dump_stack+0x1f8/0x248 print_address_description.isra.13+0x5e/0x4d0 kasan_report+0x130/0x178 check_memory_region+0x190/0x218 memset+0x34/0x60 __free_pages_ok+0x894/0x12f0 kfree+0x4f2/0x5e0 unpack_to_rootfs+0x60e/0x650 populate_rootfs+0x56/0x358 do_one_initcall+0x1f4/0xa20 kernel_init_freeable+0x758/0x7e8 kernel_init+0x1c/0x170 ret_from_fork+0x24/0x28 Memory state around the buggy address: 000000014bea9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000014bea9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >000000014beaa000: 03 fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe ^ 000000014beaa080: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe 000000014beaa100: fe fe fe fe fe fe fe fe fe fe fe fe fe fe Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Link: http://lkml.kernel.org/r/20200610052154.5180-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/page_alloc.c: remove unnecessary end_bitidx for ↵Wei Yang3-17/+9
[set|get]_pfnblock_flags_mask() After previous cleanup, the end_bitidx is not necessary any more. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/20200623124201.8199-4-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/page_alloc.c: simplify pageblock bitmap accessWei Yang2-22/+13
Due to commit e58469bafd05 ("mm: page_alloc: use word-based accesses for get/set pageblock bitmaps"), pageblock bitmap is accessed with word-based access. This operation could be simplified a little. Intuitively, if we want to get a bit range [start_idx, end_idx] in a word, we can do like this: mask = (1 << (end_bitidx - start_bitidx + 1)) - 1; ret = (word >> start_idx) & mask; And also if we want to set a bit range [start_idx, end_idx] with flags, we can do the same by just shift start_bitidx. By doing so we reduce some instructions for these two helper functions: Before Patched set_pfnblock_flags_mask 209 198(-5%) get_pfnblock_flags_mask 101 87(-13%) Since the syntax is changed a little, we need to check the whole 4-bit migrate_type instead of part of it. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/20200623124201.8199-3-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/page_alloc.c: extract the common part in pfn_to_bitidx()Wei Yang1-2/+1
The return value calculation is the same both for SPARSEMEM or not. Just take it out. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/20200623124201.8199-2-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with ↵Wei Yang1-2/+1
PB_migratetype_bits We already have the definition of PB_migratetype_bits and current NR_MIGRATETYPE_BITS looks like a cyclic definition. Just use PB_migratetype_bits is enough. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/20200623124201.8199-1-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/shuffle: remove dynamic reconfigurationDavid Hildenbrand2-43/+2
Commit e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization") promised "autodetection of a memory-side-cache (to be added in a follow-on patch)" over a year ago. The original series included patches [1], however, they were dropped during review [2] to be followed-up later. Due to lack of platforms that publish an HMAT, autodetection is currently not implemented. However, manual activation is actively used [3]. Let's simplify for now and re-add when really (ever?) needed. [1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com [2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com [3] https://lkml.kernel.org/r/CAPcyv4irwGUU2x+c6b4L=KbB1dnasNKaaZd6oSpYjL9kfsnROQ@mail.gmail.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dan Williams <dan.j.williams@intel.com> Link: http://lkml.kernel.org/r/20200624094741.9918-4-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/memory_hotplug: document why shuffle_zone() is relevantDavid Hildenbrand1-0/+8
It's not completely obvious why we have to shuffle the complete zone - introduced in commit e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization") - because some sort of shuffling is already performed when onlining pages via __free_one_page(), placing MAX_ORDER-1 pages either to the head or the tail of the freelist. Let's document why we have to shuffle the complete zone when exposing larger, contiguous physical memory areas to the buddy. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Michal Hocko <mhocko@suse.com> Link: http://lkml.kernel.org/r/20200624094741.9918-3-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/page_alloc: remove nr_free_pagecache_pages()David Hildenbrand2-15/+2
nr_free_pagecache_pages() isn't used outside page_alloc.c anymore - and the name does not really help to understand what's going on. Let's open-code it instead and add a comment. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Huang Ying <ying.huang@intel.com> Link: http://lkml.kernel.org/r/20200619132410.23859-3-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: remove vm_total_pagesDavid Hildenbrand5-13/+4
The global variable "vm_total_pages" is a relic from older days. There is only a single user that reads the variable - build_all_zonelists() - and the first thing it does is update it. Use a local variable in build_all_zonelists() instead and remove the global variable. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Minchan Kim <minchan@kernel.org> Link: http://lkml.kernel.org/r/20200619132410.23859-2-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm, page_alloc: skip ->waternark_boost for atomic order-0 allocationsCharan Teja Reddy1-4/+20
When boosting is enabled, it is observed that rate of atomic order-0 allocation failures are high due to the fact that free levels in the system are checked with ->watermark_boost offset. This is not a problem for sleepable allocations but for atomic allocations which looks like regression. This problem is seen frequently on system setup of Android kernel running on Snapdragon hardware with 4GB RAM size. When no extfrag event occurred in the system, ->watermark_boost factor is zero, thus the watermark configurations in the system are: _watermark = ( [WMARK_MIN] = 1272, --> ~5MB [WMARK_LOW] = 9067, --> ~36MB [WMARK_HIGH] = 9385), --> ~38MB watermark_boost = 0 After launching some memory hungry applications in Android which can cause extfrag events in the system to an extent that ->watermark_boost can be set to max i.e. default boost factor makes it to 150% of high watermark. _watermark = ( [WMARK_MIN] = 1272, --> ~5MB [WMARK_LOW] = 9067, --> ~36MB [WMARK_HIGH] = 9385), --> ~38MB watermark_boost = 14077, -->~57MB With default system configuration, for an atomic order-0 allocation to succeed, having free memory of ~2MB will suffice. But boosting makes the min_wmark to ~61MB thus for an atomic order-0 allocation to be successful system should have minimum of ~23MB of free memory(from calculations of zone_watermark_ok(), min = 3/4(min/2)). But failures are observed despite system is having ~20MB of free memory. In the testing, this is reproducible as early as first 300secs since boot and with furtherlowram configurations(<2GB) it is observed as early as first 150secs since boot. These failures can be avoided by excluding the ->watermark_boost in watermark caluculations for atomic order-0 allocations. [akpm@linux-foundation.org: fix comment grammar, reflow comment] [charante@codeaurora.org: fix suggested by Mel Gorman] Link: http://lkml.kernel.org/r/31556793-57b1-1c21-1a9d-22674d9bd938@codeaurora.org Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Mel Gorman <mgorman@techsingularity.net> Link: http://lkml.kernel.org/r/1589882284-21010-1-git-send-email-charante@codeaurora.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>