summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ext4: documentation fixesAyush Ranjan2019-08-236-18/+31
| | | | | | | | | | | | | | | | | | | | | | | This commit aims to fix the following issues in ext4 documentation: - Flexible block group docs said that the aim was to group block metadata together instead of block group metadata. - The documentation consistly uses "location" instead of "block number". It is easy to confuse location to be an absolute offset on disk. Added a line to clarify all location values are in terms of block numbers. - Dirent2 docs said that the rec_len field is shortened instead of the name_len field. - Typo in bg_checksum description. - Inode size is 160 bytes now, and hence i_extra_isize is now 32. - Cluster size formula was incorrect, it did not include the +10 to s_log_cluster_size value. - Typo: there were two s_wtime_hi in the superblock struct. - Superblock struct was outdated, added the new fields which were part of s_reserved earlier. - Multiple mount protection seems to be implemented in fs/ext4/mmp.c. Signed-off-by: Ayush Ranjan <ayushr2@illinois.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
* ext4: treat buffers with write errors as containing valid dataZhangXiaoxu2019-08-232-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | I got some errors when I repair an ext4 volume which stacked by an iscsi target: Entry 'test60' in / (2) has deleted/unused inode 73750. Clear? It can be reproduced when the network not good enough. When I debug this I found ext4 will read entry buffer from disk and the buffer is marked with write_io_error. If the buffer is marked with write_io_error, it means it already wroten to journal, and not checked out to disk. IOW, the journal is newer than the data in disk. If this journal record 'delete test60', it means the 'test60' still on the disk metadata. In this case, if we read the buffer from disk successfully and create file continue, the new journal record will overwrite the journal which record 'delete test60', then the entry corruptioned. So, use the buffer rather than read from disk if the buffer is marked with write_io_error. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix warning inside ext4_convert_unwritten_extents_endioRakesh Pandit2019-08-231-2/+2
| | | | | | | | | | | | Really enable warning when CONFIG_EXT4_DEBUG is set and fix missing first argument. This was introduced in commit ff95ec22cd7f ("ext4: add warning to ext4_convert_unwritten_extents_endio") and splitting extents inside endio would trigger it. Fixes: ff95ec22cd7f ("ext4: add warning to ext4_convert_unwritten_extents_endio") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: set error return correctly when ext4_htree_store_dirent failsColin Ian King2019-08-121-1/+1
| | | | | | | | | | | | Currently when the call to ext4_htree_store_dirent fails the error return variable 'ret' is is not being set to the error code and variable count is instead, hence the error code is not being returned. Fix this by assigning ret to the error return code. Addresses-Coverity: ("Unused value") Fixes: 8af0f0822797 ("ext4: fix readdir error in the case of inline_data+dir_index") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: drop legacy pre-1970 encoding workaroundTheodore Ts'o2019-08-121-14/+1
| | | | | | | | | | | | | | | | Originally, support for expanded timestamps had a bug in that pre-1970 times were erroneously encoded as being in the the 24th century. This was fixed in commit a4dad1ae24f8 ("ext4: Fix handling of extended tv_sec") which landed in 4.4. Starting with 4.4, pre-1970 timestamps were correctly encoded, but for backwards compatibility those incorrectly encoded timestamps were mapped back to the pre-1970 dates. Given that backwards compatibility workaround has been around for 4 years, and given that running e2fsck from e2fsprogs 1.43.2 and later will offer to fix these timestamps (which has been released for 3 years), it's past time to drop the legacy workaround from the kernel. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add new ioctl EXT4_IOC_GET_ES_CACHETheodore Ts'o2019-08-116-11/+182
| | | | | | | | For debugging reasons, it's useful to know the contents of the extent cache. Since the extent cache contains much of what is in the fiemap ioctl, use an fiemap-style interface to return this information. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add a new ioctl EXT4_IOC_GETSTATETheodore Ts'o2019-08-112-0/+28
| | | | | | | The new ioctl EXT4_IOC_GETSTATE returns some of the dynamic state of an ext4 inode for debugging purposes. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHETheodore Ts'o2019-08-114-0/+40
| | | | | | | | The new ioctl EXT4_IOC_CLEAR_ES_CACHE will force an inode's extent status cache to be cleared out. This is intended for use for debugging. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: flush_descriptor(): Do not decrease buffer head's ref countChandan Rajendra2019-08-111-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When executing generic/388 on a ppc64le machine, we notice the following call trace, VFS: brelse: Trying to free free buffer WARNING: CPU: 0 PID: 6637 at /root/repos/linux/fs/buffer.c:1195 __brelse+0x84/0xc0 Call Trace: __brelse+0x80/0xc0 (unreliable) invalidate_bh_lru+0x78/0xc0 on_each_cpu_mask+0xa8/0x130 on_each_cpu_cond_mask+0x130/0x170 invalidate_bh_lrus+0x44/0x60 invalidate_bdev+0x38/0x70 ext4_put_super+0x294/0x560 generic_shutdown_super+0xb0/0x170 kill_block_super+0x38/0xb0 deactivate_locked_super+0xa4/0xf0 cleanup_mnt+0x164/0x1d0 task_work_run+0x110/0x160 do_notify_resume+0x414/0x460 ret_from_except_lite+0x70/0x74 The warning happens because flush_descriptor() drops bh reference it does not own. The bh reference acquired by jbd2_journal_get_descriptor_buffer() is owned by the log_bufs list and gets released when this list is processed. The reference for doing IO is only acquired in write_dirty_buffer() later in flush_descriptor(). Reported-by: Harish Sriram <harish@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: remove unnecessary error checkShi Siyuan2019-08-111-2/+0
| | | | | | | | | | Remove unnecessary error check in ext4_file_write_iter(), because this check will be done in upcoming later function -- ext4_write_checks() -> generic_write_checks() Change-Id: I7b0ab27f693a50765c15b5eaa3f4e7c38f42e01e Signed-off-by: shisiyuan <shisiyuan@xiaomi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix warning when turn on dioread_nolock and inline_datayangerkun2019-08-111-9/+9
| | | | | | | | | | | | | | | | | mkfs.ext4 -O inline_data /dev/vdb mount -o dioread_nolock /dev/vdb /mnt echo "some inline data..." >> /mnt/test-file echo "some inline data..." >> /mnt/test-file sync The above script will trigger "WARN_ON(!io_end->handle && sbi->s_journal)" because ext4_should_dioread_nolock() returns false for a file with inline data. Move the check to a place after we have already removed the inline data and prepared inode to write normal pages. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* Linux 5.3-rc4v5.3-rc4Linus Torvalds2019-08-111-1/+1
|
* Merge tag 'dax-fixes-5.3-rc4' of ↵Linus Torvalds2019-08-112-1/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fixes from Dan Williams: "A filesystem-dax and device-dax fix for v5.3. The filesystem-dax fix is tagged for stable as the implementation has been mistakenly throwing away all cow pages on any truncate or hole punch operation as part of the solution to coordinate device-dma vs truncate to dax pages. The device-dax change fixes up a regression this cycle from the introduction of a common 'internal per-cpu-ref' implementation. Summary: - Fix dax_layout_busy_page() to not discard private cow pages of fs/dax private mappings. - Update the memremap_pages core to properly cleanup on behalf of internal reference-count users like device-dax" * tag 'dax-fixes-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: mm/memremap: Fix reuse of pgmap instances with internal references dax: dax_layout_busy_page() should not unmap cow pages
| * mm/memremap: Fix reuse of pgmap instances with internal referencesDan Williams2019-08-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, attempts to shutdown and re-enable a device-dax instance trigger: Missing reference count teardown definition WARNING: CPU: 37 PID: 1608 at mm/memremap.c:211 devm_memremap_pages+0x234/0x850 [..] RIP: 0010:devm_memremap_pages+0x234/0x850 [..] Call Trace: dev_dax_probe+0x66/0x190 [device_dax] really_probe+0xef/0x390 driver_probe_device+0xb4/0x100 device_driver_attach+0x4f/0x60 Given that the setup path initializes pgmap->ref, arrange for it to be also torn down so devm_memremap_pages() is ready to be called again and not be mistaken for the 3rd-party per-cpu-ref case. Fixes: 24917f6b1041 ("memremap: provide an optional internal refcount in struct dev_pagemap") Reported-by: Fan Du <fan.du@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/156530042781.2068700.8733813683117819799.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dax: dax_layout_busy_page() should not unmap cow pagesVivek Goyal2019-08-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vivek: "As of now dax_layout_busy_page() calls unmap_mapping_range() with last argument as 1, which says even unmap cow pages. I am wondering who needs to get rid of cow pages as well. I noticed one interesting side affect of this. I mount xfs with -o dax and mmaped a file with MAP_PRIVATE and wrote some data to a page which created cow page. Then I called fallocate() on that file to zero a page of file. fallocate() called dax_layout_busy_page() which unmapped cow pages as well and then I tried to read back the data I wrote and what I get is old data from persistent memory. I lost the data I had written. This read basically resulted in new fault and read back the data from persistent memory. This sounds wrong. Are there any users which need to unmap cow pages as well? If not, I am proposing changing it to not unmap cow pages. I noticed this while while writing virtio_fs code where when I tried to reclaim a memory range and that corrupted the executable and I was running from virtio-fs and program got segment violation." Dan: "In fact the unmap_mapping_range() in this path is only to synchronize against get_user_pages_fast() and force it to call back into the filesystem to re-establish the mapping. COW pages should be left untouched by dax_layout_busy_page()." Cc: <stable@vger.kernel.org> Fixes: 5fac7408d828 ("mm, fs, dax: handle layout changes to pinned dax mappings") Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20190802192956.GA3032@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | Merge tag 'ntb-5.3-bugfixes' of git://github.com/jonmason/ntbLinus Torvalds2019-08-111-5/+0
|\ \ | | | | | | | | | | | | | | | | | | | | | Pull NTB fix from Jon Mason: "Bug fix for NTB MSI kernel compile warning" * tag 'ntb-5.3-bugfixes' of git://github.com/jonmason/ntb: NTB/msi: remove incorrect MODULE defines
| * | NTB/msi: remove incorrect MODULE definesLogan Gunthorpe2019-08-051-5/+0
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | msi.c is not a module on its own right and should not have the MODULE_[LICENSE|VERSION|AUTHOR|DESCRIPTION] definitions. This caused a regression noticed by lkp with the following back trace: WARNING: CPU: 0 PID: 1 at kernel/params.c:861 param_sysfs_init+0xb1/0x20a Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc1-00018-g26b3a37b928457 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 RIP: 0010:param_sysfs_init+0xb1/0x20a Code: 24 38 e8 ec 17 2e fd 49 8b 7c 24 38 e8 76 fe ff ff 48 85 c0 48 89 c5 74 25 31 d2 4c 89 e6 48 89 c7 e8 6d 6f 3c fd 85 c0 74 02 <0f> 0b 48 89 ef 31 f6 e8 5d 70 a7 fe 48 89 ef e8 95 52 a7 fe 48 83 RSP: 0000:ffff88806b0ffe30 EFLAGS: 00010282 RAX: 00000000ffffffef RBX: ffffffff83774220 RCX: ffff88806a85e880 RDX: 00000000ffffffef RSI: ffff88806b000400 RDI: ffff88806a8608c0 RBP: ffff88806b392000 R08: ffffed100d61ff59 R09: ffffed100d61ff59 R10: 0000000000000001 R11: ffffed100d61ff58 R12: ffffffff83974bc0 R13: 0000000000000004 R14: 0000000000000028 R15: 00000000000003b9 FS: 0000000000000000(0000) GS:ffff88806b800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000380e000 CR4: 00000000000406b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? file_caps_disable+0x10/0x10 ? locate_module_kobject+0xf2/0xf2 do_one_initcall+0x47/0x1f0 kernel_init_freeable+0x1b1/0x243 ? rest_init+0xd0/0xd0 kernel_init+0xa/0x130 ? calculate_sigpending+0x63/0x80 ? rest_init+0xd0/0xd0 ret_from_fork+0x1f/0x30 ---[ end trace 78201497ae74cc91 ]--- Reported-by: kernel test robot <lkp@intel.com> Fixes: 26b3a37b9284 ("NTB: Introduce MSI library") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
* | Merge tag 'riscv/for-v5.3-rc4' of ↵Linus Torvalds2019-08-117-202/+24
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley: "A few minor RISC-V updates for v5.3-rc4: - Remove __udivdi3() from the 32-bit Linux port, converting the only upstream user to use do_div(), per Linux policy - Convert the RISC-V standard clocksource away from per-cpu data structures, since only one is used by Linux, even on a multi-CPU system - A set of DT binding updates that remove an obsolete text binding in favor of a YAML binding, fix a bogus compatible string in the schema (thus fixing a "make dtbs_check" warning), and clarifies the future values expected in one of the RISC-V CPU properties" * tag 'riscv/for-v5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: dt-bindings: riscv: fix the schema compatible string for the HiFive Unleashed board dt-bindings: riscv: remove obsolete cpus.txt RISC-V: Remove udivdi3 riscv: delay: use do_div() instead of __udivdi3() dt-bindings: Update the riscv,isa string description RISC-V: Remove per cpu clocksource
| * | dt-bindings: riscv: fix the schema compatible string for the HiFive ↵Paul Walmsley2019-08-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unleashed board The YAML binding document for SiFive boards has an incorrect compatible string for the HiFive Unleashed board. Change it to match the name of the board on the SiFive web site: https://www.sifive.com/boards/hifive-unleashed which also matches the contents of the board DT data file: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dts#n13 Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Acked-by: Rob Herring <robh@kernel.org>
| * | dt-bindings: riscv: remove obsolete cpus.txtPaul Walmsley2019-08-092-162/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the now-obsolete riscv/cpus.txt DT binding document, since we are using YAML binding documentation instead. While doing so, transfer the explanatory text about 'harts' (with some edits) into the YAML file, at Rob's request. Link: https://lore.kernel.org/linux-riscv/CAL_JsqJs6MtvmuyAknsUxQymbmoV=G+=JfS1PQj9kNHV7fjC9g@mail.gmail.com/ Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rob Herring <robh@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org>
| * | RISC-V: Remove udivdi3Palmer Dabbelt2019-08-092-34/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should never have landed in the first place: it was added as part of 64-bit divide support for 32-bit systems, but the kernel doesn't allow this sort of division. I must have forgotten to remove it. This patch removes the support. Since this routine only worked on 64-bit platforms but was only built on 32-bit platforms, it's essentially just nonsense anyway. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Link: https://lore.kernel.org/linux-riscv/nycvar.YSQ.7.76.1908061413360.19480@knanqh.ubzr/T/#t Reported-by: Eric Lin <tesheng@andestech.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
| * | riscv: delay: use do_div() instead of __udivdi3()Paul Walmsley2019-08-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for removing __udivdi3() from the RISC-V architecture-specific files, convert its one user to use do_div(). This avoids breaking the RV32 build after __udivdi3() is removed. This second version removes the assignment of the remainder to an unused temporary variable. Thanks to Nicolas Pitre <nico@fluxnic.net> for the suggestion. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Nicolas Pitre <nico@fluxnic.net>
| * | dt-bindings: Update the riscv,isa string descriptionAtish Patra2019-08-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the RISC-V specification states that ISA description strings are case-insensitive, there's no functional difference between mixed-case, upper-case, and lower-case ISA strings. Thus, to simplify parsing, specify that the letters present in "riscv,isa" must be all lowercase. Suggested-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
| * | RISC-V: Remove per cpu clocksourceAtish Patra2019-08-061-4/+2
| |/ | | | | | | | | | | | | | | There is only one clocksource in RISC-V. The boot cpu initializes that clocksource. No need to keep a percpu data structure. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
* | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2019-08-117-28/+48
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few fixes for x86: - Don't reset the carefully adjusted build flags for the purgatory and remove the unwanted flags instead. The 'reset all' approach led to build fails under certain circumstances. - Unbreak CLANG build of the purgatory by avoiding the builtin memcpy/memset implementations. - Address missing prototype warnings by including the proper header - Fix yet more fall-through issues" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/lib/cpu: Address missing prototypes warning x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS x86/purgatory: Do not use __builtin_memcpy and __builtin_memset x86: mtrr: cyrix: Mark expected switch fall-through x86/ptrace: Mark expected switch fall-through
| * | x86/lib/cpu: Address missing prototypes warningValdis Klētnieks2019-08-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building with W=1, warnings about missing prototypes are emitted: CC arch/x86/lib/cpu.o arch/x86/lib/cpu.c:5:14: warning: no previous prototype for 'x86_family' [-Wmissing-prototypes] 5 | unsigned int x86_family(unsigned int sig) | ^~~~~~~~~~ arch/x86/lib/cpu.c:18:14: warning: no previous prototype for 'x86_model' [-Wmissing-prototypes] 18 | unsigned int x86_model(unsigned int sig) | ^~~~~~~~~ arch/x86/lib/cpu.c:33:14: warning: no previous prototype for 'x86_stepping' [-Wmissing-prototypes] 33 | unsigned int x86_stepping(unsigned int sig) | ^~~~~~~~~~~~ Add the proper include file so the prototypes are there. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/42513.1565234837@turing-police
| * | x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGSNick Desaulniers2019-08-081-5/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KBUILD_CFLAGS is very carefully built up in the top level Makefile, particularly when cross compiling or using different build tools. Resetting KBUILD_CFLAGS via := assignment is an antipattern. The comment above the reset mentions that -pg is problematic. Other Makefiles use `CFLAGS_REMOVE_file.o = $(CC_FLAGS_FTRACE)` when CONFIG_FUNCTION_TRACER is set. Prefer that pattern to wiping out all of the important KBUILD_CFLAGS then manually having to re-add them. Seems also that __stack_chk_fail references are generated when using CONFIG_STACKPROTECTOR or CONFIG_STACKPROTECTOR_STRONG. Fixes: 8fc5b4d4121c ("purgatory: core purgatory functionality") Reported-by: Vaibhav Rustagi <vaibhavrustagi@google.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190807221539.94583-2-ndesaulniers@google.com
| * | x86/purgatory: Do not use __builtin_memcpy and __builtin_memsetNick Desaulniers2019-08-084-23/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implementing memcpy and memset in terms of __builtin_memcpy and __builtin_memset is problematic. GCC at -O2 will replace calls to the builtins with calls to memcpy and memset (but will generate an inline implementation at -Os). Clang will replace the builtins with these calls regardless of optimization level. $ llvm-objdump -dr arch/x86/purgatory/string.o | tail 0000000000000339 memcpy: 339: 48 b8 00 00 00 00 00 00 00 00 movabsq $0, %rax 000000000000033b: R_X86_64_64 memcpy 343: ff e0 jmpq *%rax 0000000000000345 memset: 345: 48 b8 00 00 00 00 00 00 00 00 movabsq $0, %rax 0000000000000347: R_X86_64_64 memset 34f: ff e0 Such code results in infinite recursion at runtime. This is observed when doing kexec. Instead, reuse an implementation from arch/x86/boot/compressed/string.c. This requires to implement a stub function for warn(). Also, Clang may lower memcmp's that compare against 0 to bcmp's, so add a small definition, too. See also: commit 5f074f3e192f ("lib/string.c: implement a basic bcmp") Fixes: 8fc5b4d4121c ("purgatory: core purgatory functionality") Reported-by: Vaibhav Rustagi <vaibhavrustagi@google.com> Debugged-by: Vaibhav Rustagi <vaibhavrustagi@google.com> Debugged-by: Manoj Gupta <manojgupta@google.com> Suggested-by: Alistair Delva <adelva@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com> Cc: stable@vger.kernel.org Link: https://bugs.chromium.org/p/chromium/issues/detail?id=984056 Link: https://lkml.kernel.org/r/20190807221539.94583-1-ndesaulniers@google.com
| * | x86: mtrr: cyrix: Mark expected switch fall-throughGustavo A. R. Silva2019-08-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark switch cases where we are expecting to fall through. Fix the following warning (Building: i386_defconfig i386): arch/x86/kernel/cpu/mtrr/cyrix.c:99:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20190805201712.GA19927@embeddedor
| * | x86/ptrace: Mark expected switch fall-throughGustavo A. R. Silva2019-08-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark switch cases where we are expecting to fall through. Fix the following warning (Building: allnoconfig i386): arch/x86/kernel/ptrace.c:202:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (unlikely(value == 0)) ^ arch/x86/kernel/ptrace.c:206:2: note: here default: ^~~~~~~ Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20190805195654.GA17831@embeddedor
* | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2019-08-1114-16/+69
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tooling fixes from Thomas Gleixner: "Perf tooling fixes all over the place: - Fix the selection of the main thread COMM in db-export - Fix the disassemmbly display for BPF in annotate - Fix cpumap mask setup in perf ftrace when only one CPU is present - Add the missing 'cpu_clk_unhalted.core' event - Fix CPU 0 bindings in NUMA benchmarks - Fix the module size calculations for s390 - Handle the gap between kernel end and module start on s390 correctly - Build and typo fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf pmu-events: Fix missing "cpu_clk_unhalted.core" event perf annotate: Fix s390 gap between kernel end and module start perf record: Fix module size on s390 perf tools: Fix include paths in ui directory perf tools: Fix a typo in a variable name in the Documentation Makefile perf cpumap: Fix writing to illegal memory in handling cpumap mask perf ftrace: Fix failure to set cpumask when only one cpu is present perf db-export: Fix thread__exec_comm() perf annotate: Fix printing of unaugmented disassembled instructions from BPF perf bench numa: Fix cpu0 binding
| * \ \ Merge tag 'perf-urgent-for-mingo-5.3-20190808' of ↵Thomas Gleixner2019-08-0814-16/+69
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent perf/urgent fixes: db-export: Adrian Hunter: - Fix thread__exec_comm() picking of main thread COMM for pre-existing, synthesized from /proc data records. annotate: Arnaldo Carvalho de Melo: - Fix printing of unaugmented disassembled instructions from BPF, some lines were leaving leftovers from the previous screen, due to use of newlines by binutils's libopcode disassembler. perf ftrace: He Zhe: - Fix cpumask problems when only one CPU is present. PMU events: Jin Yao: - Add missing "cpu_clk_unhalted.core" event. perf bench: Jiri Olsa: - Fix cpu0 binding in the NUMA benchmarks. s390: Thomas Richter: - Fix module size calculations. build system: Masanari Iida: - Fix a typo in a variable name in the Documentation Makefile misc: Ian Rogers: - Fix include paths in ui directory. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf pmu-events: Fix missing "cpu_clk_unhalted.core" eventJin Yao2019-08-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The events defined in pmu-events JSON are parsed and added into perf tool. For fixed counters, we handle the encodings between JSON and perf by using a static array fixed[]. But the fixed[] has missed an important event "cpu_clk_unhalted.core". For example, on the Tremont platform, [root@localhost ~]# perf stat -e cpu_clk_unhalted.core -a event syntax error: 'cpu_clk_unhalted.core' \___ parser error With this patch, the event cpu_clk_unhalted.core can be parsed. [root@localhost perf]# ./perf stat -e cpu_clk_unhalted.core -a -vvv ------------------------------------------------------------ perf_event_attr: type 4 size 112 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ ... Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190729072755.2166-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf annotate: Fix s390 gap between kernel end and module startThomas Richter2019-08-083-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During execution of command 'perf top' the error message: Not enough memory for annotating '__irf_end' symbol!) is emitted from this call sequence: __cmd_top perf_top__mmap_read perf_top__mmap_read_idx perf_event__process_sample hist_entry_iter__add hist_iter__top_callback perf_top__record_precise_ip hist_entry__inc_addr_samples symbol__inc_addr_samples symbol__get_annotation symbol__alloc_hist In this function the size of symbol __irf_end is calculated. The size of a symbol is the difference between its start and end address. When the symbol was read the first time, its start and end was set to: symbol__new: __irf_end 0xe954d0-0xe954d0 which is correct and maps with /proc/kallsyms: root@s8360046:~/linux-4.15.0/tools/perf# fgrep _irf_end /proc/kallsyms 0000000000e954d0 t __irf_end root@s8360046:~/linux-4.15.0/tools/perf# In function symbol__alloc_hist() the end of symbol __irf_end is symbol__alloc_hist sym:__irf_end start:0xe954d0 end:0x3ff80045a8 which is identical with the first module entry in /proc/kallsyms This results in a symbol size of __irf_req for histogram analyses of 70334140059072 bytes and a malloc() for this requested size fails. The root cause of this is function __dso__load_kallsyms() +-> symbols__fixup_end() Function symbols__fixup_end() enlarges the last symbol in the kallsyms map: # fgrep __irf_end /proc/kallsyms 0000000000e954d0 t __irf_end # to the start address of the first module: # cat /proc/kallsyms | sort | egrep ' [tT] ' .... 0000000000e952d0 T __security_initcall_end 0000000000e954d0 T __initramfs_size 0000000000e954d0 t __irf_end 000003ff800045a8 T fc_get_event_number [scsi_transport_fc] 000003ff800045d0 t store_fc_vport_disable [scsi_transport_fc] 000003ff800046a8 T scsi_is_fc_rport [scsi_transport_fc] 000003ff800046d0 t fc_target_setup [scsi_transport_fc] On s390 the kernel is located around memory address 0x200, 0x10000 or 0x100000, depending on linux version. Modules however start some- where around 0x3ff xxxx xxxx. This is different than x86 and produces a large gap for which histogram allocation fails. Fix this by detecting the kernel's last symbol and do no adjustment for it. Introduce a weak function and handle s390 specifics. Reported-by: Klaus Theurich <klaus.theurich@de.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190724122703.3996-2-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf record: Fix module size on s390Thomas Richter2019-08-083-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On s390 the modules loaded in memory have the text segment located after the GOT and Relocation table. This can be seen with this output: [root@m35lp76 perf]# fgrep qeth /proc/modules qeth 151552 1 qeth_l2, Live 0x000003ff800b2000 ... [root@m35lp76 perf]# cat /sys/module/qeth/sections/.text 0x000003ff800b3990 [root@m35lp76 perf]# There is an offset of 0x1990 bytes. The size of the qeth module is 151552 bytes (0x25000 in hex). The location of the GOT/relocation table at the beginning of a module is unique to s390. commit 203d8a4aa6ed ("perf s390: Fix 'start' address of module's map") adjusts the start address of a module in the map structures, but does not adjust the size of the modules. This leads to overlapping of module maps as this example shows: [root@m35lp76 perf] # ./perf report -D 0 0 0xfb0 [0xa0]: PERF_RECORD_MMAP -1/0: [0x3ff800b3990(0x25000) @ 0]: x /lib/modules/.../qeth.ko.xz 0 0 0x1050 [0xb0]: PERF_RECORD_MMAP -1/0: [0x3ff800d85a0(0x8000) @ 0]: x /lib/modules/.../ip6_tables.ko.xz The module qeth.ko has an adjusted start address modified to b3990, but its size is unchanged and the module ends at 0x3ff800d8990. This end address overlaps with the next modules start address of 0x3ff800d85a0. When the size of the leading GOT/Relocation table stored in the beginning of the text segment (0x1990 bytes) is subtracted from module qeth end address, there are no overlaps anymore: 0x3ff800d8990 - 0x1990 = 0x0x3ff800d7000 which is the same as 0x3ff800b2000 + 0x25000 = 0x0x3ff800d7000. To fix this issue, also adjust the modules size in function arch__fix_module_text_start(). Add another function parameter named size and reduce the size of the module when the text segment start address is changed. Output after: 0 0 0xfb0 [0xa0]: PERF_RECORD_MMAP -1/0: [0x3ff800b3990(0x23670) @ 0]: x /lib/modules/.../qeth.ko.xz 0 0 0x1050 [0xb0]: PERF_RECORD_MMAP -1/0: [0x3ff800d85a0(0x7a60) @ 0]: x /lib/modules/.../ip6_tables.ko.xz Reported-by: Stefan Liebler <stli@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: stable@vger.kernel.org Fixes: 203d8a4aa6ed ("perf s390: Fix 'start' address of module's map") Link: http://lkml.kernel.org/r/20190724122703.3996-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf tools: Fix include paths in ui directoryIan Rogers2019-08-082-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These paths point to the wrong location but still work because they get picked up by a -I flag that happens to direct to the correct file. Fix paths to point to the correct location without -I flags. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20190731225441.233800-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf tools: Fix a typo in a variable name in the Documentation MakefileMasanari Iida2019-08-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix a spelling typo in a variable name in the Documentation Makefile. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190801032812.25018-1-standby24x7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf cpumap: Fix writing to illegal memory in handling cpumap maskHe Zhe2019-08-081-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_map__snprint_mask() would write to illegal memory pointed by zalloc(0) when there is only one cpu. This patch fixes the calculation and adds sanity check against the input parameters. Signed-off-by: He Zhe <zhe.he@windriver.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Fixes: 4400ac8a9a90 ("perf cpumap: Introduce cpu_map__snprint_mask()") Link: http://lkml.kernel.org/r/1564734592-15624-2-git-send-email-zhe.he@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf ftrace: Fix failure to set cpumask when only one cpu is presentHe Zhe2019-08-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The buffer containing the string used to set cpumask is overwritten at the end of the string later in cpu_map__snprint_mask due to not enough memory space, when there is only one cpu. And thus causes the following failure: $ perf ftrace ls failed to reset ftrace $ This patch fixes the calculation of the cpumask string size. Signed-off-by: He Zhe <zhe.he@windriver.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Fixes: dc23103278c5 ("perf ftrace: Add support for -a and -C option") Link: http://lkml.kernel.org/r/1564734592-15624-1-git-send-email-zhe.he@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf db-export: Fix thread__exec_comm()Adrian Hunter2019-08-081-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Threads synthesized from /proc have comms with a start time of zero, and not marked as "exec". Currently, there can be 2 such comms. The first is created by processing a synthesized fork event and is set to the parent's comm string, and the second by processing a synthesized comm event set to the thread's current comm string. In the absence of an "exec" comm, thread__exec_comm() picks the last (oldest) comm, which, in the case above, is the parent's comm string. For a main thread, that is very probably wrong. Use the second-to-last in that case. This affects only db-export because it is the only user of thread__exec_comm(). Example: $ sudo perf record -a -o pt-a-sleep-1 -e intel_pt//u -- sleep 1 $ sudo chown ahunter pt-a-sleep-1 Before: $ perf script -i pt-a-sleep-1 --itrace=bep -s tools/perf/scripts/python/export-to-sqlite.py pt-a-sleep-1.db branches calls $ sqlite3 -header -column pt-a-sleep-1.db 'select * from comm_threads_view' comm_id command thread_id pid tid ---------- ---------- ---------- ---------- ---------- 1 swapper 1 0 0 2 rcu_sched 2 10 10 3 kthreadd 3 78 78 5 sudo 4 15180 15180 5 sudo 5 15180 15182 7 kworker/4: 6 10335 10335 8 kthreadd 7 55 55 10 systemd 8 865 865 10 systemd 9 865 875 13 perf 10 15181 15181 15 sleep 10 15181 15181 16 kworker/3: 11 14179 14179 17 kthreadd 12 29376 29376 19 systemd 13 746 746 21 systemd 14 401 401 23 systemd 15 879 879 23 systemd 16 879 945 25 kthreadd 17 556 556 27 kworker/u1 18 14136 14136 28 kworker/u1 19 15021 15021 29 kthreadd 20 509 509 31 systemd 21 836 836 31 systemd 22 836 967 33 systemd 23 1148 1148 33 systemd 24 1148 1163 35 kworker/2: 25 17988 17988 36 kworker/0: 26 13478 13478 After: $ perf script -i pt-a-sleep-1 --itrace=bep -s tools/perf/scripts/python/export-to-sqlite.py pt-a-sleep-1b.db branches calls $ sqlite3 -header -column pt-a-sleep-1b.db 'select * from comm_threads_view' comm_id command thread_id pid tid ---------- ---------- ---------- ---------- ---------- 1 swapper 1 0 0 2 rcu_sched 2 10 10 3 kswapd0 3 78 78 4 perf 4 15180 15180 4 perf 5 15180 15182 6 kworker/4: 6 10335 10335 7 kcompactd0 7 55 55 8 accounts-d 8 865 865 8 accounts-d 9 865 875 10 perf 10 15181 15181 12 sleep 10 15181 15181 13 kworker/3: 11 14179 14179 14 kworker/1: 12 29376 29376 15 haveged 13 746 746 16 systemd-jo 14 401 401 17 NetworkMan 15 879 879 17 NetworkMan 16 879 945 19 irq/131-iw 17 556 556 20 kworker/u1 18 14136 14136 21 kworker/u1 19 15021 15021 22 kworker/u1 20 509 509 23 thermald 21 836 836 23 thermald 22 836 967 25 unity-sett 23 1148 1148 25 unity-sett 24 1148 1163 27 kworker/2: 25 17988 17988 28 kworker/0: 26 13478 13478 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Fixes: 65de51f93ebf ("perf tools: Identify which comms are from exec") Link: http://lkml.kernel.org/r/20190808064823.14846-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf annotate: Fix printing of unaugmented disassembled instructions from BPFArnaldo Carvalho de Melo2019-08-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code to disassemble BPF programs uses binutil's disassembling routines, and those use in turn fprintf to print to a memstream FILE, adding a newline at the end of each line, which ends up confusing the TUI routines called from: annotate_browser__write() annotate_line__write() annotate_browser__printf() ui_browser__vprintf() SLsmg_vprintf() The SLsmg_vprintf() function in the slang library gets confused with the terminating newline, so make the disasm_line__parse() function that parses the lines produced by the BPF specific disassembler (that uses binutil's libopcodes) and the lines produced by the objdump based disassembler used for everything else (and that doesn't adds this terminating newline) trim the end of the line in addition of the beginning. This way when disasm_line->ops.raw, i.e. for instructions without a special scnprintf() method, we'll not have that \n getting in the way of filling the screen right after the instruction with spaces to avoid leaving what was on the screen before and thus garbling the annotation screen, breaking scrolling, etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Fixes: 6987561c9e86 ("perf annotate: Enable annotation of BPF programs") Link: https://lkml.kernel.org/n/tip-unbr5a5efakobfr6rhxq99ta@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf bench numa: Fix cpu0 bindingJiri Olsa2019-08-011-2/+4
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael reported an issue with perf bench numa failing with binding to cpu0 with '-0' option. # perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd # Running 'numa/mem' benchmark: # Running main, "perf bench numa numa-mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd" binding to node 0, mask: 0000000000000001 => -1 perf: bench/numa.c:356: bind_to_memnode: Assertion `!(ret)' failed. Aborted (core dumped) This happens when the cpu0 is not part of node0, which is the benchmark assumption and we can see that's not the case for some powerpc servers. Using correct node for cpu0 binding. Reported-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20190801142642.28004-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2019-08-112-10/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "Three fixlets for the scheduler: - Avoid double bandwidth accounting in the push & pull code - Use a sane FIFO priority for the Pressure Stall Information (PSI) thread. - Avoid permission checks when setting the scheduler params for the PSI thread" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/psi: Do not require setsched permission from the trigger creator sched/psi: Reduce psimon FIFO priority sched/deadline: Fix double accounting of rq/running bw in push & pull
| * | | | sched/psi: Do not require setsched permission from the trigger creatorSuren Baghdasaryan2019-08-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a process creates a new trigger by writing into /proc/pressure/* files, permissions to write such a file should be used to determine whether the process is allowed to do so or not. Current implementation would also require such a process to have setsched capability. Setting of psi trigger thread's scheduling policy is an implementation detail and should not be exposed to the user level. Remove the permission check by using _nocheck version of the function. Suggested-by: Nick Kralevich <nnk@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: lizefan@huawei.com Cc: mingo@redhat.com Cc: akpm@linux-foundation.org Cc: kernel-team@android.com Cc: dennisszhou@gmail.com Cc: dennis@kernel.org Cc: hannes@cmpxchg.org Cc: axboe@kernel.dk Link: https://lkml.kernel.org/r/20190730013310.162367-1-surenb@google.com
| * | | | sched/psi: Reduce psimon FIFO priorityPeter Zijlstra2019-08-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PSI defaults to a FIFO-99 thread, reduce this to FIFO-1. FIFO-99 is the very highest priority available to SCHED_FIFO and it not a suitable default; it would indicate the psi work is the most important work on the machine. Since Real-Time tasks will have pre-allocated memory and locked it in place, Real-Time tasks do not care about PSI. All it needs is to be above OTHER. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de>
| * | | | sched/deadline: Fix double accounting of rq/running bw in push & pullDietmar Eggemann2019-08-061-8/+0
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | {push,pull}_dl_task() always calls {de,}activate_task() with .flags=0 which sets p->on_rq=TASK_ON_RQ_MIGRATING. {push,pull}_dl_task()->{de,}activate_task()->{de,en}queue_task()-> {de,en}queue_task_dl() calls {sub,add}_{running,rq}_bw() since p->on_rq==TASK_ON_RQ_MIGRATING. So {sub,add}_{running,rq}_bw() in {push,pull}_dl_task() is double-accounting for that task. Fix it by removing rq/running bw accounting in [push/pull]_dl_task(). Fixes: 7dd778841164 ("sched/core: Unify p->on_rq updates") Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Qais Yousef <qais.yousef@arm.com> Link: https://lkml.kernel.org/r/20190802145945.18702-2-dietmar.eggemann@arm.com
* | | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2019-08-111-4/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A small fix for the affinity spreading code. It failed to handle situations where a single vector was requested either due to only one CPU being available or vector exhaustion causing only a single interrupt to be granted. The fix is to simply remove the requirement in the affinity spreading code for more than one interrupt being available" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/affinity: Create affinity mask for single vector
| * | | | genirq/affinity: Create affinity mask for single vectorMing Lei2019-08-081-4/+2
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit c66d4bd110a1f8 ("genirq/affinity: Add new callback for (re)calculating interrupt sets"), irq_create_affinity_masks() returns NULL in case of single vector. This change has caused regression on some drivers, such as lpfc. The problem is that single vector requests can happen in some generic cases: 1) kdump kernel 2) irq vectors resource is close to exhaustion. If in that situation the affinity mask for a single vector is not created, every caller has to handle the special case. There is no reason why the mask cannot be created, so remove the check for a single vector and create the mask. Fixes: c66d4bd110a1f8 ("genirq/affinity: Add new callback for (re)calculating interrupt sets") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190805011906.5020-1-ming.lei@redhat.com
* | | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2019-08-111-11/+9
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool warning fix from Thomas Gleixner: "The recent objtool fixes/enhancements unearthed a unbalanced CLAC in the i915 driver. Chris asked me to pick the fix up and route it through" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: drm/i915: Remove redundant user_access_end() from __copy_from_user() error path
| * | | | drm/i915: Remove redundant user_access_end() from __copy_from_user() error pathJosh Poimboeuf2019-08-091-11/+9
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Objtool reports: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x36: redundant UACCESS disable __copy_from_user() already does both STAC and CLAC, so the user_access_end() in its error path adds an extra unnecessary CLAC. Fixes: 0b2c8f8b6b0c ("i915: fix missing user_access_end() in page fault exception case") Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://github.com/ClangBuiltLinux/linux/issues/617 Link: https://lkml.kernel.org/r/51a4155c5bc2ca847a9cbe85c1c11918bb193141.1564086017.git.jpoimboe@redhat.com