summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* tools/power turbostat: extend --add option to accept /sys pathLen Brown2017-03-011-23/+69
| | | | | | | | | | | Previously, the --add option could specify only an MSR. Here is is extended so an arbitrary /sys attribute, as specified by an absolute file path name. sudo ./turbostat --add /sys/devices/system/cpu/cpu0/cpuidle/state5/usage Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: skip unused counters on BDXLen Brown2017-03-011-0/+17
| | | | | | | Skip these two counters on BDX, as they are always zero: cc7, pc7 Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limitsLen Brown2017-03-011-23/+67
| | | | | | | | | | | | | | | | | | | | | | | | Newer processors do not hard-code the the number of cpus in each bin to {1, 2, 3, 4, 5, 6, 7, 8} Rather, they can specify any number of CPUS in each of the 8 bins: eg. ... 37 * 100.0 = 3600.0 MHz max turbo 4 active cores 38 * 100.0 = 3700.0 MHz max turbo 3 active cores 39 * 100.0 = 3800.0 MHz max turbo 2 active cores 39 * 100.0 = 3900.0 MHz max turbo 1 active cores could now look something like this: ... 37 * 100.0 = 3600.0 MHz max turbo 16 active cores 38 * 100.0 = 3700.0 MHz max turbo 8 active cores 39 * 100.0 = 3800.0 MHz max turbo 4 active cores 39 * 100.0 = 3900.0 MHz max turbo 2 active cores Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: skip unused counters on SKXLen Brown2017-03-011-0/+18
| | | | | | | | Skip these four counters on SKX, as they are always zero: cc3, pc3 cc7, pc7 Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: Denverton: use HW CC1 counter, skip C3, C7Len Brown2017-03-011-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CC1 column in tubostat can be computed by subtracting the core c-state residency countes from the total Cx residency. CC1 = (Idle_time_as_measured by MPERF) - (all core C-states with residency counters) However, as the underlying counter reads are not atomic, error can be noticed in this calculations, especially when the numbers are small. Denverton has a hardware CC1 residency counter to improve the accuracy of the cc1 statistic -- use it. At the same time, Denverton has no concept of CC3, PC3, CC7, PC7, so skip collecting and printing those columns. Finally, a note of clarification. Turbostat prints the standard PC2 residency counter, but on Denverton hardware, that actually means PC1E. Turbostat prints the standard PC6 residency counter, but on Denverton hardware, that actually means PC2. At this point, we document that differnce in this commit message, rather than adding a quirk to the software. Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: initial Gemini Lake SOC supportLen Brown2017-03-011-0/+5
| | | | | | Gemini Lake is similar to Apollo Lake (Broxton/Goldmont) Signed-off-by: Len Brown <len.brown@intel.com>
* x86: intel-family.h: Add GEMINI_LAKE SOCLen Brown2017-03-011-0/+1
| | | | | Cc: x86@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: bug fixes to --add, --show/--hide featuresLen Brown2017-03-011-61/+77
| | | | | | | | | | | | | Fix a bug with --add, where the title of the column is un-initialized if not specified by the user. The initial implementation of --show and --hide neglected to handle the pc8/pc9/pc10 counters. Fix a bug where "--show Core" only worked with --debug Reported-by: Wendy Wang <wendy.wang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: use tsc_tweak everwhere it is neededLen Brown2017-03-011-23/+25
| | | | | | | | | | | | | | | | | | | | | | | | The CPU ticks at a rate in the "bus clock" domain. eg. 100 MHz * bus_ratio. On newer processors, the TSC has been moved out of this BCLK domain and into a separate crystal-clock domain. While the TSC ticks "close to" the base frequency, those that look closely at the numbers will notice small errors in calculations that mix units of TSC clocks and bus clocks. "tsc_tweak" was introduced to address the most visible mixing -- the %Busy and the the Busy_MHz calculations. (A simplification as since removed TSC from the BusyMHz calculation) Here we apply the tsc_tweak to everyplace where BCLK and TSC units are mixed. The results is that on a system which is 100% idle, the sum of the C-states are now much more likely to be closer to 100%. Reported-by: Travis Downs <travis.downs@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: print system config, unless --quietLen Brown2017-03-012-60/+52
| | | | | | | | | | | | | | | | | | Some users want turbostat to tell them everything, by default. Some users want turbostat to be quiet, by default. I find that I'm in the 1st camp, and so I've never liked needing to type the --debug parameter to decode the system configuration. So here we change the default and print the system configuration, by default. (The --debug option is now un-documented, though it does still exist for debugging turbostat internals) When you do not want to see the system configuration header, use the new "--quiet" option. Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: show all columns, independent of --debugLen Brown2017-03-011-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | Some time ago, turbostat overflowed 80 columns. So on the assumption that a "casual" user would always want topology and frequency columns, we hid the rest of the columns and the system configuration decoding behind the --debug option. Not everybody liked that change -- including me. I use --debug 99% of the time... Well, now we have "-o file" to put turbostat output into a file, so unless you are watching real-time in a small window, column count is less frequently a factor. And more recently, we got the "--hide columnA,columnB" option to specify columns to skip. So now we "un-hide" the rest of the columns from behind --debug, and show them all, by default. Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: decode MSR_MISC_FEATURE_CONTROLLen Brown2017-03-011-0/+24
| | | | | | | | useful for observing if the BIOS disabled prefetch Not architectural, but docuemented as present on NHM, SNB and is present on others. Signed-off-by: Len Brown <len.brown@intel.com>
* x86 msr_index.h: Define MSR_MISC_FEATURE_CONTROLLen Brown2017-03-011-0/+1
| | | | | | | | | | | | This non-architectural MSR has disable bits for various prefetchers on modern processors. While these bits are generally touched only by the BIOS, say, via BIOS SETUP, it is useful to dump them when examining options that can alter performance. Cc: x86@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: decode CPUID(6).TURBOLen Brown2017-03-011-1/+4
| | | | | | | | | | show the CPUID feature for turbo to clarify the case when it may not be shown in MISC_ENABLE CPUID(6): APERF, TURBO, DTS, PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, EPB cpu4: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT TURBO) Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: dump Atom P-states correctlyLen Brown2017-03-011-21/+82
| | | | | | | | Turbostat dumps MSR_TURBO_RATIO_LIMIT on Core Architecture. But Atom Architecture uses MSR_ATOM_CORE_RATIOS and MSR_ATOM_CORE_TURBO_RATIOS. Signed-off-by: Len Brown <len.brown@intel.com>
* intel_pstate: use MSR_ATOM_RATIOS definitions from msr-index.hLen Brown2017-03-011-10/+5
| | | | | | | Originally, these MSRs were locally defined in this driver. Now the definitions are in msr-index.h -- use them. Signed-off-by: Len Brown <len.brown@intel.com>
* x86 msr-index.h: Define Atom specific core ratio MSR locationsLen Brown2017-03-011-0/+6
| | | | | | | | These MSRs are currently used by the intel_pstate driver, using a local definition. Cc: x86@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: further decode MSR_IA32_MISC_ENABLELen Brown2017-03-011-4/+6
| | | | | | | | | | | | | | Decode MISC_ENABLE.NO_TURBO, also use the #defines in msr-index.h for decoding this register cpu0: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT TURBO) Although it is not architectural, decode also MSR_IA32_MISC_ENABLE.prefetch-disable (bit-9). documented to be present on: Core, P4, Intel-Xeon reserved on: Atom, Silvermont, Nehalem, SNB, PHI ec. Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: add precision to --debug frequency outputLen Brown2017-03-011-21/+21
| | | | | | | | | | | | | | | Add a digit of precision to the --debug output for frequency range. This is useful when BCLK is not an integer. old: 6 * 83 = 500 MHz max efficiency frequency 26 * 83 = 2166 MHz base frequency new: 6 * 83.3 = 499.8 MHz max efficiency frequency 26 * 83.3 = 2165.8 MHz base frequency Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: Baytrail c-state supportLen Brown2017-03-012-9/+39
| | | | | | | | | | | | The Baytrail SOC, with its Silvermont core, has some unique properties: 1. a hardware CC1 residency counter 2. a module-c6 residency counter 3. a package-c6 counter at traditional package-c7 counter address. The SOC does not support c3, pc3, c7 or pc7 counters. Signed-off-by: Len Brown <len.brown@intel.com>
* x86: msr-index.h: Remove unused MSR_NHM_SNB_PKG_CST_CFG_CTLLen Brown2017-03-011-1/+0
| | | | | | | | The two users, intel_idle driver and turbostat utility are using the new name, MSR_PKG_CST_CONFIG_CONTROL Cc: x86@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: use new name for MSR_PKG_CST_CONFIG_CONTROLLen Brown2017-03-011-5/+5
| | | | | | Previously called MSR_NHM_SNB_PKG_CST_CFG_CTL Signed-off-by: Len Brown <len.brown@intel.com>
* intel_idle: use new name for MSR_PKG_CST_CONFIG_CONTROLLen Brown2017-03-011-3/+3
| | | | | | previously known as MSR_NHM_SNB_PKG_CST_CFG_CTL Signed-off-by: Len Brown <len.brown@intel.com>
* x86: msr-index.h: Define MSR_PKG_CST_CONFIG_CONTROLLen Brown2017-03-011-0/+1
| | | | | | | | | | | define MSR_PKG_CST_CONFIG_CONTROL (0xE2), which is the string used by Intel Documentation. We use this MSR in intel_idle and turbostat by a previous name, to be updated in the next patch. Cc: x86@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: update MSR_PKG_CST_CONFIG_CONTROL decodingLen Brown2017-02-251-1/+1
| | | | | | AMT value 0 is unlimited, not PC0 Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: Baytrail: remove debug line in quiet modeLen Brown2017-02-251-1/+2
| | | | | | | | Without --debug, a debug line was printed on Baytrail: SLM BCLK: 83.3 Mhz Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: decode Baytrail CC6 and MC6 demotion configurationLen Brown2017-02-251-0/+42
| | | | | | | | | | | | with --debug, see: cpu0: MSR_CC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-CC6-Demotion) cpu0: MSR_MC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-MC6-Demotion) Note that the hardware default is to enable demotion, and Linux started clearing these registers in 3.17. Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: BYT does not have MSR_MISC_PWR_MGMTLen Brown2017-02-251-2/+10
| | | | | | | | | | | and so --debug fails with: turbostat: msr 1 offset 0x1aa read failed: Input/output error It seems that baytrail, and airmont do not have this MSR. It is included in subsequent Goldmont Atom. Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: Add --show and --hide parametersLen Brown2017-02-252-120/+347
| | | | | | | | | | | | | | | | | Add the "--show" and "--hide" cmdline parameters. By default, turbostat shows all columns. turbostat --hide counter_list will continue showing all columns, except for those listed. turbostat --show counter_list will show _only_ the listed columns These features work for built-in counters, and have no effect on columns added with the --add parameter. Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: fix bugs in --add optionLen Brown2017-02-251-36/+52
| | | | | | | | | | | | When --add was used more than once, overflowed buffers caused some counters to be stored on top of others, corrupting the results. Simplify the code by simply reserving space for up to 16 added counters per each cpu, core, package. Per-cpu added counters were being printed only per-core. Signed-off-by: Len Brown <len.brown@intel.com>
* Linux 4.10v4.10Linus Torvalds2017-02-191-1/+1
|
* Fix missing sanity check in /dev/sgAl Viro2017-02-191-0/+4
| | | | | | | | | | | | | | | | | | | | What happens is that a write to /dev/sg is given a request with non-zero ->iovec_count combined with zero ->dxfer_len. Or with ->dxferp pointing to an array full of empty iovecs. Having write permission to /dev/sg shouldn't be equivalent to the ability to trigger BUG_ON() while holding spinlocks... Found by Dmitry Vyukov and syzkaller. [ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the underlying issue. - Linus ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* scsi: don't BUG_ON() empty DMA transfersJohannes Thumshirn2017-02-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Don't crash the machine just because of an empty transfer. Use WARN_ON() combined with returning an error. Found by Dmitry Vyukov and syzkaller. [ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON() might still be a cause of excessive log spamming. NOTE! If this warning ever triggers, we may end up leaking resources, since this doesn't bother to try to clean the command up. So this WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is much worse. People really need to stop using BUG_ON() for "this shouldn't ever happen". It makes pretty much any bug worse. - Linus ] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipv6: release dst on error in ip6_dst_lookup_tailWillem de Bruijn2017-02-191-2/+4
| | | | | | | | | | If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped check, release the dst before returning an error. Fixes: ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'fixes-for-linus' of ↵Linus Torvalds2017-02-192-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "Two more bugfixes that came in during this week: - a defconfig change to enable a vital driver used on some Qualcomm based phones. This was already queued for 4.11, but the maintainer asked to have it in 4.10 after all. - a regression fix for the reset controller framework, this got broken by a typo in the 4.10 merge window" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: multi_v7_defconfig: enable Qualcomm RPMCC reset: fix shared reset triggered_count decrement on error
| * ARM: multi_v7_defconfig: enable Qualcomm RPMCCAndy Gross2017-02-181-0/+1
| | | | | | | | | | | | | | | | | | | | This patch enables the Qualcomm RPM based Clock Controller present on A-family boards. Signed-off-by: Andy Gross <andy.gross@linaro.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| * Merge tag 'reset-for-4.10-fixes' of https://git.pengutronix.de/git/pza/linux ↵Arnd Bergmann2017-02-171-1/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into fixes Pull "Reset controller fixes for v4.10" from Philipp Zabel: - Remove erroneous negation of the error check of the reset function to decrement trigger_count in the error case, not on success. This fixes shared resets to actually only trigger once, as intended. * tag 'reset-for-4.10-fixes' of https://git.pengutronix.de/git/pza/linux: reset: fix shared reset triggered_count decrement on error
| | * reset: fix shared reset triggered_count decrement on errorJerome Brunet2017-02-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a shared reset, when the reset is successful, the triggered_count is incremented when trying to call the reset callback, so that another device sharing the same reset line won't trigger it again. If the reset has not been triggered successfully, the trigger_count should be decremented. The code does the opposite, and decrements the trigger_count on success. As a consequence, another device sharing the reset will be able to trigger it again. Fixed be removing negation in from of the error code of the reset function. Fixes: 7da33a37b48f ("reset: allow using reset_control_reset with shared reset") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
* | | Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds2017-02-192-13/+33
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ARM fixes from Russell King: "A couple of fixes from Kees concerning problems he spotted with our user access support" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user() ARM: 8657/1: uaccess: consistently check object sizes
| * | | ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user()Kees Cook2017-02-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 64-bit get_user() wasn't clearing the high word due to a typo in the error handler. The exception handler entry was already correct, though. Noticed during recent usercopy test additions in lib/test_user_copy.c. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
| * | | ARM: 8657/1: uaccess: consistently check object sizesKees Cook2017-02-161-12/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 76624175dcae ("arm64: uaccess: consistently check object sizes"), the object size checks are moved outside the access_ok() so that bad destinations are detected before hitting the "memset(dest, 0, size)" in the copy_from_user() failure path. This makes the same change for arm, with attention given to possibly extracting the uaccess routines into a common header file for all architectures in the future. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
* | | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2017-02-191-2/+3
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "Make the build clean by working around yet another GCC stupidity" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vm86: Fix unused variable warning if THP is disabled
| * | | | x86/vm86: Fix unused variable warning if THP is disabledKirill A. Shutemov2017-02-131-2/+3
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC complains about unused variable 'vma' in mark_screen_rdonly() if THP is disabled: arch/x86/kernel/vm86_32.c: In function ‘mark_screen_rdonly’: arch/x86/kernel/vm86_32.c:180:26: warning: unused variable ‘vma’ [-Wunused-variable] struct vm_area_struct *vma = find_vma(mm, 0xA0000); That's silly. pmd_trans_huge() resolves to 0 when THP is disabled, so the whole block should be eliminated. Moving the variable declaration outside the if() block shuts GCC up. Reported-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Borislav Petkov <bp@suse.de> Cc: Carlos O'Donell <carlos@redhat.com> Link: http://lkml.kernel.org/r/20170213125228.63645-1-kirill.shutemov@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2017-02-191-1/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Thomas Gleixner: "Move the futex init function to core initcall so user mode helper does not run into an uninitialized futex syscall" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Move futex_init() to core_initcall
| * | | | futex: Move futex_init() to core_initcallYang Yang2017-02-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The UEVENT user mode helper is enabled before the initcalls are executed and is available when the root filesystem has been mounted. The user mode helper is triggered by device init calls and the executable might use the futex syscall. futex_init() is marked __initcall which maps to device_initcall, but there is no guarantee that futex_init() is invoked _before_ the first device init call which triggers the UEVENT user mode helper. If the user mode helper uses the futex syscall before futex_init() then the syscall crashes with a NULL pointer dereference because the futex subsystem has not been initialized yet. Move futex_init() to core_initcall so futexes are initialized before the root filesystem is mounted and the usermode helper becomes available. [ tglx: Rewrote changelog ] Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: jiang.biao2@zte.com.cn Cc: jiang.zhengxiong@zte.com.cn Cc: zhong.weidong@zte.com.cn Cc: deng.huali@zte.com.cn Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2017-02-192-10/+9
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two small fixes:: - Prevent deadlock on the tick broadcast lock. Found and fixed by Mike. - Stop using printk() in the timekeeping debug code to prevent a deadlock against the scheduler" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Use deferred printk() in debug code tick/broadcast: Prevent deadlock on tick_broadcast_lock
| * | | | | timekeeping: Use deferred printk() in debug codeSergey Senozhatsky2017-02-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We cannot do printk() from tk_debug_account_sleep_time(), because tk_debug_account_sleep_time() is called under tk_core seq lock. The reason why printk() is unsafe there is that console_sem may invoke scheduler (up()->wake_up_process()->activate_task()), which, in turn, can return back to timekeeping code, for instance, via get_time()->ktime_get(), deadlocking the system on tk_core seq lock. [ 48.950592] ====================================================== [ 48.950622] [ INFO: possible circular locking dependency detected ] [ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted [ 48.950622] ------------------------------------------------------- [ 48.950622] kworker/0:0/3 is trying to acquire lock: [ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90 [ 48.950683] but task is already holding lock: [ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 [ 48.950714] which lock already depends on the new lock. [ 48.950714] the existing dependency chain (in reverse order) is: [ 48.950714] -> #5 (hrtimer_bases.lock){-.-...}: [ 48.950744] _raw_spin_lock_irqsave+0x50/0x64 [ 48.950775] lock_hrtimer_base+0x28/0x58 [ 48.950775] hrtimer_start_range_ns+0x20/0x5c8 [ 48.950775] __enqueue_rt_entity+0x320/0x360 [ 48.950805] enqueue_rt_entity+0x2c/0x44 [ 48.950805] enqueue_task_rt+0x24/0x94 [ 48.950836] ttwu_do_activate+0x54/0xc0 [ 48.950836] try_to_wake_up+0x248/0x5c8 [ 48.950836] __setup_irq+0x420/0x5f0 [ 48.950836] request_threaded_irq+0xdc/0x184 [ 48.950866] devm_request_threaded_irq+0x58/0xa4 [ 48.950866] omap_i2c_probe+0x530/0x6a0 [ 48.950897] platform_drv_probe+0x50/0xb0 [ 48.950897] driver_probe_device+0x1f8/0x2cc [ 48.950897] __driver_attach+0xc0/0xc4 [ 48.950927] bus_for_each_dev+0x6c/0xa0 [ 48.950927] bus_add_driver+0x100/0x210 [ 48.950927] driver_register+0x78/0xf4 [ 48.950958] do_one_initcall+0x3c/0x16c [ 48.950958] kernel_init_freeable+0x20c/0x2d8 [ 48.950958] kernel_init+0x8/0x110 [ 48.950988] ret_from_fork+0x14/0x24 [ 48.950988] -> #4 (&rt_b->rt_runtime_lock){-.-...}: [ 48.951019] _raw_spin_lock+0x40/0x50 [ 48.951019] rq_offline_rt+0x9c/0x2bc [ 48.951019] set_rq_offline.part.2+0x2c/0x58 [ 48.951049] rq_attach_root+0x134/0x144 [ 48.951049] cpu_attach_domain+0x18c/0x6f4 [ 48.951049] build_sched_domains+0xba4/0xd80 [ 48.951080] sched_init_smp+0x68/0x10c [ 48.951080] kernel_init_freeable+0x160/0x2d8 [ 48.951080] kernel_init+0x8/0x110 [ 48.951080] ret_from_fork+0x14/0x24 [ 48.951110] -> #3 (&rq->lock){-.-.-.}: [ 48.951110] _raw_spin_lock+0x40/0x50 [ 48.951141] task_fork_fair+0x30/0x124 [ 48.951141] sched_fork+0x194/0x2e0 [ 48.951141] copy_process.part.5+0x448/0x1a20 [ 48.951171] _do_fork+0x98/0x7e8 [ 48.951171] kernel_thread+0x2c/0x34 [ 48.951171] rest_init+0x1c/0x18c [ 48.951202] start_kernel+0x35c/0x3d4 [ 48.951202] 0x8000807c [ 48.951202] -> #2 (&p->pi_lock){-.-.-.}: [ 48.951232] _raw_spin_lock_irqsave+0x50/0x64 [ 48.951232] try_to_wake_up+0x30/0x5c8 [ 48.951232] up+0x4c/0x60 [ 48.951263] __up_console_sem+0x2c/0x58 [ 48.951263] console_unlock+0x3b4/0x650 [ 48.951263] vprintk_emit+0x270/0x474 [ 48.951293] vprintk_default+0x20/0x28 [ 48.951293] printk+0x20/0x30 [ 48.951324] kauditd_hold_skb+0x94/0xb8 [ 48.951324] kauditd_thread+0x1a4/0x56c [ 48.951324] kthread+0x104/0x148 [ 48.951354] ret_from_fork+0x14/0x24 [ 48.951354] -> #1 ((console_sem).lock){-.....}: [ 48.951385] _raw_spin_lock_irqsave+0x50/0x64 [ 48.951385] down_trylock+0xc/0x2c [ 48.951385] __down_trylock_console_sem+0x24/0x80 [ 48.951385] console_trylock+0x10/0x8c [ 48.951416] vprintk_emit+0x264/0x474 [ 48.951416] vprintk_default+0x20/0x28 [ 48.951416] printk+0x20/0x30 [ 48.951446] tk_debug_account_sleep_time+0x5c/0x70 [ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0 [ 48.951446] timekeeping_resume+0x218/0x23c [ 48.951477] syscore_resume+0x94/0x42c [ 48.951477] suspend_enter+0x554/0x9b4 [ 48.951477] suspend_devices_and_enter+0xd8/0x4b4 [ 48.951507] enter_state+0x934/0xbd4 [ 48.951507] pm_suspend+0x14/0x70 [ 48.951507] state_store+0x68/0xc8 [ 48.951538] kernfs_fop_write+0xf4/0x1f8 [ 48.951538] __vfs_write+0x1c/0x114 [ 48.951538] vfs_write+0xa0/0x168 [ 48.951568] SyS_write+0x3c/0x90 [ 48.951568] __sys_trace_return+0x0/0x10 [ 48.951568] -> #0 (tk_core){----..}: [ 48.951599] lock_acquire+0xe0/0x294 [ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4 [ 48.951629] retrigger_next_event+0x4c/0x90 [ 48.951629] on_each_cpu+0x40/0x7c [ 48.951629] clock_was_set_work+0x14/0x20 [ 48.951660] process_one_work+0x2b4/0x808 [ 48.951660] worker_thread+0x3c/0x550 [ 48.951660] kthread+0x104/0x148 [ 48.951690] ret_from_fork+0x14/0x24 [ 48.951690] other info that might help us debug this: [ 48.951690] Chain exists of: tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock [ 48.951721] Possible unsafe locking scenario: [ 48.951721] CPU0 CPU1 [ 48.951721] ---- ---- [ 48.951721] lock(hrtimer_bases.lock); [ 48.951751] lock(&rt_b->rt_runtime_lock); [ 48.951751] lock(hrtimer_bases.lock); [ 48.951751] lock(tk_core); [ 48.951782] *** DEADLOCK *** [ 48.951782] 3 locks held by kworker/0:0/3: [ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808 [ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808 [ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 [ 48.951843] stack backtrace: [ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+ [ 48.951904] Workqueue: events clock_was_set_work [ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14) [ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0) [ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308) [ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324) [ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8) [ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294) [ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4) [ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90) [ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c) [ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20) [ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808) [ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550) [ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148) [ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24) Replace printk() with printk_deferred(), which does not call into the scheduler. Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend") Reported-and-tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: John Stultz <john.stultz@linaro.org> Cc: "[4.9+]" <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | tick/broadcast: Prevent deadlock on tick_broadcast_lockMike Galbraith2017-02-131-8/+7
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tick_broadcast_lock is taken from interrupt context, but the following call chain takes the lock without disabling interrupts: [ 12.703736] _raw_spin_lock+0x3b/0x50 [ 12.703738] tick_broadcast_control+0x5a/0x1a0 [ 12.703742] intel_idle_cpu_online+0x22/0x100 [ 12.703744] cpuhp_invoke_callback+0x245/0x9d0 [ 12.703752] cpuhp_thread_fun+0x52/0x110 [ 12.703754] smpboot_thread_fn+0x276/0x320 So the following deadlock can happen: lock(tick_broadcast_lock); <Interrupt> lock(tick_broadcast_lock); intel_idle_cpu_online() is the only place which violates the calling convention of tick_broadcast_control(). This was caused by the removal of the smp function call in course of the cpu hotplug rework. Instead of slapping local_irq_disable/enable() at the call site, we can relax the calling convention and handle it in the core code, which makes the whole machinery more robust. Fixes: 29d7bbada98e ("intel_idle: Remove superfluous SMP fuction call") Reported-by: Gabriel C <nix.or.die@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Ruslan Ruslichenko <rruslich@cisco.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: lwn@lwn.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: stable <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1486953115.5912.4.camel@gmx.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2017-02-195-31/+45
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix leak in dpaa_eth error paths, from Dan Carpenter. 2) Use after free when using IPV6_RECVPKTINFO, from Andrey Konovalov. 3) fanout_release() cannot be invoked from atomic contexts, from Anoob Soman. 4) Fix bogus attempt at lockdep annotation in IRDA. 5) dev_fill_metadata_dst() can OOP on a NULL dst cache pointer, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: irda: Fix lockdep annotations in hashbin_delete(). vxlan: fix oops in dev_fill_metadata_dst dccp: fix freeing skb too early for IPV6_RECVPKTINFO dpaa_eth: small leak on error packet: Do not call fanout_release from atomic contexts
| * | | | | irda: Fix lockdep annotations in hashbin_delete().David S. Miller2017-02-171-18/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A nested lock depth was added to the hasbin_delete() code but it doesn't actually work some well and results in tons of lockdep splats. Fix the code instead to properly drop the lock around the operation and just keep peeking the head of the hashbin queue. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>