summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* clocksource/drivers/timer-atmel-tcb: Allow selecting first dividerAlexandre Belloni2020-07-111-4/+2
| | | | | | | | | | The divider selection algorithm never allowed to get index 0. It was also continuing to look for dividers, trying to find the slow clock selection. This is not necessary anymore. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200710230813.1005150-9-alexandre.belloni@bootlin.com
* clocksource/drivers/timer-atmel-tcb: Stop using the 32kHz for clockeventsAlexandre Belloni2020-07-111-30/+33
| | | | | | | | | | Stop using the slow clock as the clock source for 32 bit counters because even at 10MHz, they are able to handle delays up to two minutes. This provides a way better resolution. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200710230813.1005150-8-alexandre.belloni@bootlin.com
* clocksource/drivers/timer-atmel-tcb: Fill tcb_configAlexandre Belloni2020-07-111-3/+15
| | | | | | | | | | Use the tcb_config and struct atmel_tcb_config to get the timer counter width. This is necessary because atmel_tcb_config will be extended later on. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200710230813.1005150-7-alexandre.belloni@bootlin.com
* clocksource/drivers/timer-atmel-tcb: Rework 32khz clock selectionAlexandre Belloni2020-07-111-9/+2
| | | | | | | | | On all the supported SoCs, the slow clock is always ATMEL_TC_TIMER_CLOCK5, avoid looking it up and pass it directly to setup_clkevents. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200710230813.1005150-6-alexandre.belloni@bootlin.com
* ARM: at91: add atmel tcb capabilitiesKamel Bouhara2020-07-111-0/+5
| | | | | | | | | | Some atmel socs have extra tcb capabilities that allow using a generic clock source or enabling a quadrature decoder. Signed-off-by: Kamel Bouhara <kamel.bouhara@bootlin.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200710230813.1005150-5-alexandre.belloni@bootlin.com
* ARM: dts: at91: sama5d2: add TCB GCLKAlexandre Belloni2020-07-111-6/+6
| | | | | | | | The sama5d2 tcbs take an extra input clock, their gclk. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200710230813.1005150-4-alexandre.belloni@bootlin.com
* dt-bindings: microchip: atmel,at91rm9200-tcb: add sama5d2 compatibleAlexandre Belloni2020-07-111-9/+33
| | | | | | | | | | The sama5d2 TC block TIMER_CLOCK1 is different from the at91sam9x5 one. Instead of being MCK / 2, it is the TCB GCLK. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200710230813.1005150-3-alexandre.belloni@bootlin.com
* dt-bindings: atmel-tcb: convert bindings to json-schemaAlexandre Belloni2020-07-112-56/+131
| | | | | | | | | | | | Convert Atmel Timer Counter Blocks bindings to DT schema format using json-schema. Also move it out of mfd as it is not and has never been related to mfd. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200710230813.1005150-2-alexandre.belloni@bootlin.com
* dt-bindings: timer: Add renesas,em-sti bindingsGeert Uytterhoeven2020-05-231-0/+46
| | | | | | | | | Document Device Tree bindings for the Renesas EMMA Mobile System Timer. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200519081101.28973-1-geert+renesas@glider.be
* clocksource/drivers/timer-versatile: Clear OF_POPULATED flagSaravana Kannan2020-05-231-0/+3
| | | | | | | | | | | | | | | | | | | The commit 4f41fe386a94 ("clocksource/drivers/timer-probe: Avoid creating dead devices") broke the handling of arm,vexpress-sysreg [1]. The arm,vexpress-sysreg device is handled by both timer-versatile.c and drivers/mfd/vexpress-sysreg.c. While the timer driver doesn't use the device, the mfd driver still needs a device to probe. So, this patch clears the OF_POPULATED flag to continue creating the device. [1] - https://lore.kernel.org/lkml/20200324175955.GA16972@arm.com/ Fixes: 4f41fe386a94 ("clocksource/drivers/timer-probe: Avoid creating dead devices") Signed-off-by: Saravana Kannan <saravanak@google.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200324195302.203115-1-saravanak@google.com
* clocksource: mips-gic-timer: Mark GIC timer as unstable if ref clock changesSerge Semin2020-05-232-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently clocksource framework doesn't support the clocks with variable frequency. Since MIPS GIC timer ticks rate might be unstable on some platforms, we must make sure that it justifies the clocksource requirements. MIPS GIC timer is incremented with the CPU cluster reference clocks rate. So in case if CPU frequency changes, the MIPS GIC tick rate changes synchronously. Due to this the clocksource subsystem can't rely on the timer to measure system clocks anymore. This commit marks the MIPS GIC based clocksource as unstable if reference clock (normally it's a CPU reference clocks) rate changes. The clocksource will execute a watchdog thread, which lowers the MIPS GIC timer rating to zero and fallbacks to a new stable one. Note we don't need to set the CLOCK_SOURCE_MUST_VERIFY flag to the MIPS GIC clocksource since normally the timer is stable. The only reason why it gets unstable is due to the ref clock rate change, which event we detect here in the driver by means of the clocks event notifier. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: linux-mips@vger.kernel.org Cc: linux-rtc@vger.kernel.org Cc: devicetree@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200521204818.25436-9-Sergey.Semin@baikalelectronics.ru
* clocksource: mips-gic-timer: Register as sched_clockPaul Burton2020-05-231-4/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MIPS GIC timer is well suited for use as sched_clock, so register it as such. Whilst the existing gic_read_count() function matches the prototype needed by sched_clock_register() already, we split it into 2 functions in order to remove the need to evaluate the mips_cm_is64 condition within each call since sched_clock should be as fast as possible. Note the sched clock framework needs the clock source being stable in order to rely on it. So we register the MIPS GIC timer as schedule clocks only if it's, if either the system doesn't have CPU-frequency enabled or the CPU frequency is changed by means of the CPC core clock divider available on the platforms with CM3 or newer. Signed-off-by: Paul Burton <paulburton@kernel.org> Co-developed-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> [Sergey.Semin@baikalelectronics.ru: Register sched-clock if CM3 or !CPU-freq] Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: linux-mips@vger.kernel.org Cc: linux-rtc@vger.kernel.org Cc: devicetree@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200521204818.25436-8-Sergey.Semin@baikalelectronics.ru
* clocksource: dw_apb_timer_of: Fix missing clockevent timersSerge Semin2020-05-231-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 100214889973 ("clocksource: dw_apb_timer_of: use clocksource_of_init") replaced a publicly available driver initialization method with one called by the timer_probe() method available after CLKSRC_OF. In current implementation it traverses all the timers available in the system and calls their initialization methods if corresponding devices were either in dtb or in acpi. But if before the commit any number of available timers would be installed as clockevent and clocksource devices, after that there would be at most two. The rest are just ignored since default case branch doesn't do anything. I don't see a reason of such behaviour, neither the commit message explains it. Moreover this might be wrong if on some platforms these timers might be used for different purpose, as virtually CPU-local clockevent timers and as an independent broadcast timer. So in order to keep the compatibility with the platforms where the order of the timers detection has some meaning, lets add the secondly discovered timer to be of clocksource/sched_clock type, while the very first and the others would provide the clockevents service. Fixes: 100214889973 ("clocksource: dw_apb_timer_of: use clocksource_of_init") Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: linux-mips@vger.kernel.org Cc: linux-rtc@vger.kernel.org Cc: devicetree@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200521204818.25436-7-Sergey.Semin@baikalelectronics.ru
* clocksource: dw_apb_timer: Affiliate of-based timer with any CPUSerge Semin2020-05-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently any DW APB Timer device detected in OF is bound to CPU #0. Doing so is redundant since DW APB Timer isn't CPU-local timer, but as having APB interface is normally accessible from any CPU in the system. By artificially affiliating the DW timer to the very first CPU we may and in our case will make the clockevent subsystem to decline the more performant real CPU-local timers selection in favor of in fact non-local and accessible over a slow bus - DW APB Timers. Let's not affiliate the of-detected DW APB Timers to any CPU. By doing so the clockevent framework would prefer to select the real CPU-local timer instead of DW APB one. Otherwise if there is no other than DW APB device for clockevents tracking then it will be selected. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: linux-mips@vger.kernel.org Cc: linux-rtc@vger.kernel.org Cc: devicetree@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200521204818.25436-6-Sergey.Semin@baikalelectronics.ru
* clocksource: dw_apb_timer: Make CPU-affiliation being optionalSerge Semin2020-05-231-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the DW APB Timer driver binds each clockevent timers to a particular CPU. This isn't good for multiple reasons. First of all seeing the device is placed on APB bus (which makes it accessible from any CPU core), accessible over MMIO and having the DYNIRQ flag set we can be sure that manually binding the timer to any CPU just isn't correct. By doing so we just set an extra limitation on device usage. This also doesn't reflect the device actual capability, since by setting the IRQ affinity we can make it virtually local to any CPU. Secondly imagine if you had a real CPU-local timer with the same rating and the same CPU-affinity. In this case if DW APB timer was registered first, then due to the clockevent framework tick-timer selection procedure we'll end up with the real CPU-local timer being left unselected for clock-events tracking. But on most of the platforms (MIPS/ARM/etc) such timers are normally embedded into the CPU core and are accessible with much better performance then devices placed on APB. For instance in MIPS architectures there is r4k-timer, which is CPU-local, assigned with the same rating, and normally its clockevent device is registered after the platform-specific one. So in order to fix all of these issues let's make the DW APB Timer CPU affinity being optional and deactivated by passing a negative CPU id, which will effectively set the DW APB clockevent timer cpumask to 'cpu_possible_mask'. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: linux-mips@vger.kernel.org Cc: linux-rtc@vger.kernel.org Cc: devicetree@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200521204818.25436-5-Sergey.Semin@baikalelectronics.ru
* dt-bindings: timer: Move snps,dw-apb-timer DT schema from rtcSerge Semin2020-05-231-1/+1
| | | | | | | | | | | | | | | | | | | This binding file doesn't belong to the rtc seeing it's a pure timer with no rtc facilities like days/months/years counting and alarms. So move the YAML-file to the Documentation/devicetree/bindings/timer/ directory. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Acked-by: Rob Herring <robh@kernel.org> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-mips@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200521204818.25436-3-Sergey.Semin@baikalelectronics.ru
* dt-bindings: rtc: Convert snps,dw-apb-timer to DT schemaSerge Semin2020-05-232-32/+88
| | | | | | | | | | | | | | | | | | | | | | | | | Modern device tree bindings are supposed to be created as YAML-files in accordance with DT schema. This commit replaces Synopsys DW Timer legacy bare text binding with YAML file. As before the binding file states that the corresponding dts node is supposed to be compatible with generic DW APB Timer indicated by the "snps,dw-apb-timer" compatible string and to provide a mandatory registers memory range, one timer interrupt, either reference clock source or a fixed clock rate value. It may also have an optional APB bus reference clock phandle specified. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Reviewed-by: Rob Herring <robh@kernel.org> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-mips@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200521204818.25436-2-Sergey.Semin@baikalelectronics.ru
* clocksource/drivers/timer-ti-dm: Do one override clock parent in prepare()Lokesh Vutla2020-05-231-3/+1
| | | | | | | | | | | | | omap_dm_timer_prepare() is setting up the parent 32KHz clock. This prepare() gets called by request_timer in the client's driver. Because of this, the timer clock parent that is set with assigned-clock-parent is being overwritten. So drop this default setting of parent in prepare(). Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Reviewed-by: Suman Anna <s-anna@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200427172831.16546-1-lokeshvutla@ti.com
* Merge branch 'omap-for-v5.8/dt-timer' of ↵Daniel Lezcano2020-05-234-0/+4
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into timers/drivers/next
| * ARM: dts: Add 32KHz clock as default clock sourceLokesh Vutla2020-05-054-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Clocksource to timer configured in pwm mode can be selected using the DT property ti,clock-source. There are few pwm timers which are not selecting the clock source and relying on default value in hardware or selected by driver. Instead of relying on default value, always select the clock source from DT. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Reviewed-by: Suman Anna <s-anna@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
* | clocksource/drivers/timer-ti-dm: Fix spelling mistake "detectt" -> "detect"Colin Ian King2020-05-231-1/+1
| | | | | | | | | | | | | | | | There is a spelling mistake in a pr_err message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200519224428.6195-1-colin.king@canonical.com
* | Merge branch 'timers/drivers/timer-ti' into timers/drivers/nextDaniel Lezcano2020-05-2311345-247494/+517478
|\ \
| * | clocksource/drivers/timer-ti-dm: Fix warning for set but not usedTony Lindgren2020-05-191-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can get a warning for dmtimer_clocksource_init() with 'pa' set but not used. This was used in the earlier revisions of the code but no longer needed, so let's remove the unused pa and of_translate_address(). Let's also do it for dmtimer_clockevent_init() that has a similar issue. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200519155157.12804-1-tony@atomide.com
| * | clocksource/drivers/timer-ti-dm: Add clockevent and clocksource supportTony Lindgren2020-05-182-0/+732
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can move the TI dmtimer clockevent and clocksource to live under drivers/clocksource if we rely only on the clock framework, and handle the module configuration directly in the clocksource driver based on the device tree data. This removes the early dependency with system timers to the interconnect related code, and we can probe pretty much everything else later on at the module_init level. Let's first add a new driver for timer-ti-dm-systimer based on existing arch/arm/mach-omap2/timer.c. Then let's start moving SoCs to probe with device tree data while still keeping the old timer.c. And eventually we can just drop the old timer.c. Let's take the opportunity to switch to use readl/writel as pointed out by Daniel Lezcano <daniel.lezcano@linaro.org>. This allows further clean-up of the timer-ti-dm code the a lot of the shared helpers can just become static to the non-syster related code. Note the boards can optionally configure different timer source clocks if needed with assigned-clocks and assigned-clock-parents. Cc: linux-kernel@vger.kernel.org Cc: linux-omap@vger.kernel.org Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Lokesh Vutla <lokeshvutla@ti.com> Cc: Rob Herring <robh@kernel.org> Cc: Tero Kristo <t-kristo@ti.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200507172330.18679-3-tony@atomide.com
| * | clocksource/drivers/timer-ti-32k: Add support for initializing directlyTony Lindgren2020-05-181-1/+47
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's allow probing the 32k counter directly based on devicetree data to prepare for dropping the related legacy platform code. Let's only do this if the parent node is compatible with ti-sysc to make sure we have the related devicetree data available. Let's also show the 32k counter information before registering the clocksource, now we see it after the clocksource information which is a bit confusing. Cc: linux-kernel@vger.kernel.org Cc: linux-omap@vger.kernel.org Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Lokesh Vutla <lokeshvutla@ti.com> Cc: Rob Herring <robh@kernel.org> Cc: Tero Kristo <t-kristo@ti.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200507172330.18679-2-tony@atomide.com
| * Linux 5.7-rc1v5.7-rc1Linus Torvalds2020-04-121-2/+2
| |
| * MAINTAINERS: sort field names for all entriesLinus Torvalds2020-04-121-1974/+1974
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This sorts the actual field names too, potentially causing even more chaos and confusion at merge time if you have edited the MAINTAINERS file. But the end result is a more consistent layout, and hopefully it's a one-time pain minimized by doing this just before the -rc1 release. This was entirely scripted: ./scripts/parse-maintainers.pl --input=MAINTAINERS --output=MAINTAINERS --order Requested-by: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * MAINTAINERS: sort entries by entry nameLinus Torvalds2020-04-121-820/+820
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They are all supposed to be sorted, but people who add new entries don't always know the alphabet. Plus sometimes the entry names get edited, and people don't then re-order the entry. Let's see how painful this will be for merging purposes (the MAINTAINERS file is often edited in various different trees), but Joe claims there's relatively few patches in -next that touch this, and doing it just before -rc1 is likely the best time. Fingers crossed. This was scripted with /scripts/parse-maintainers.pl --input=MAINTAINERS --output=MAINTAINERS but then I also ended up manually upper-casing a few entry names that stood out when looking at the end result. Requested-by: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Merge tag 'x86-urgent-2020-04-12' of ↵Linus Torvalds2020-04-124-9/+79
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of three patches to fix the fallout of the newly added split lock detection feature. It addressed the case where a KVM guest triggers a split lock #AC and KVM reinjects it into the guest which is not prepared to handle it. Add proper sanity checks which prevent the unconditional injection into the guest and handles the #AC on the host side in the same way as user space detections are handled. Depending on the detection mode it either warns and disables detection for the task or kills the task if the mode is set to fatal" * tag 'x86-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest KVM: x86: Emulate split-lock access as a write in emulator x86/split_lock: Provide handle_guest_split_lock()
| | * KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guestXiaoyao Li2020-04-111-3/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two types of #AC can be generated in Intel CPUs: 1. legacy alignment check #AC 2. split lock #AC Reflect #AC back into the guest if the guest has legacy alignment checks enabled or if split lock detection is disabled. If the #AC is not a legacy one and split lock detection is enabled, then invoke handle_guest_split_lock() which will either warn and disable split lock detection for this task or force SIGBUS on it. [ tglx: Switch it to handle_guest_split_lock() and rename the misnamed helper function. ] Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20200410115517.176308876@linutronix.de
| | * KVM: x86: Emulate split-lock access as a write in emulatorXiaoyao Li2020-04-111-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Emulate split-lock accesses as writes if split lock detection is on to avoid #AC during emulation, which will result in a panic(). This should never occur for a well-behaved guest, but a malicious guest can manipulate the TLB to trigger emulation of a locked instruction[1]. More discussion can be found at [2][3]. [1] https://lkml.kernel.org/r/8c5b11c9-58df-38e7-a514-dc12d687b198@redhat.com [2] https://lkml.kernel.org/r/20200131200134.GD18946@linux.intel.com [3] https://lkml.kernel.org/r/20200227001117.GX9940@linux.intel.com Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20200410115517.084300242@linutronix.de
| | * x86/split_lock: Provide handle_guest_split_lock()Thomas Gleixner2020-04-112-5/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without at least minimal handling for split lock detection induced #AC, VMX will just run into the same problem as the VMWare hypervisor, which was reported by Kenneth. It will inject the #AC blindly into the guest whether the guest is prepared or not. Provide a function for guest mode which acts depending on the host SLD mode. If mode == sld_warn, treat it like user space, i.e. emit a warning, disable SLD and mark the task accordingly. Otherwise force SIGBUS. [ bp: Add a !CPU_SUP_INTEL stub for handle_guest_split_lock(). ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20200410115516.978037132@linutronix.de Link: https://lkml.kernel.org/r/20200402123258.895628824@linutronix.de
| * | Merge tag 'timers-urgent-2020-04-12' of ↵Linus Torvalds2020-04-123-0/+10
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull time(keeping) updates from Thomas Gleixner: - Fix the time_for_children symlink in /proc/$PID/ so it properly reflects that it part of the 'time' namespace - Add the missing userns limit for the allowed number of time namespaces, which was half defined but the actual array member was not added. This went unnoticed as the array has an exessive empty member at the end but introduced a user visible regression as the output was corrupted. - Prevent further silent ucount corruption by adding a BUILD_BUG_ON() to catch half updated data. * tag 'timers-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ucount: Make sure ucounts in /proc/sys/user don't regress again time/namespace: Add max_time_namespaces ucount time/namespace: Fix time_for_children symlink
| | * | ucount: Make sure ucounts in /proc/sys/user don't regress againJan Kara2020-04-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 769071ac9f20 "ns: Introduce Time Namespace" broke reporting of inotify ucounts (max_inotify_instances, max_inotify_watches) in /proc/sys/user because it has added UCOUNT_TIME_NAMESPACES into enum ucount_type but didn't properly update reporting in kernel/ucount.c:setup_userns_sysctls(). This problem got fixed in commit eeec26d5da82 "time/namespace: Add max_time_namespaces ucount". Add BUILD_BUG_ON to catch a similar problem in the future. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andrei Vagin <avagin@gmail.com> Link: https://lkml.kernel.org/r/20200407154643.10102-1-jack@suse.cz
| | * | time/namespace: Add max_time_namespaces ucountDmitry Safonov2020-04-072-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael noticed that userns limit for number of time namespaces is missing. Furthermore, time namespace introduced UCOUNT_TIME_NAMESPACES, but didn't introduce an array member in user_table[]. It would make array's initialisation OOB write, but by luck the user_table array has an excessive empty member (all accesses to the array are limited with UCOUNT_COUNTS - so it silently reuses the last free member. Fixes user-visible regression: max_inotify_instances by reason of the missing UCOUNT_ENTRY() has limited max number of namespaces instead of the number of inotify instances. Fixes: 769071ac9f20 ("ns: Introduce Time Namespace") Reported-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andrei Vagin <avagin@gmail.com> Acked-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: stable@kernel.org Link: https://lkml.kernel.org/r/20200406171342.128733-1-dima@arista.com
| | * | time/namespace: Fix time_for_children symlinkMichael Kerrisk (man-pages)2020-04-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Looking at the contents of the /proc/PID/ns/time_for_children symlink shows an anomaly: $ ls -l /proc/self/ns/* |awk '{print $9, $10, $11}' ... /proc/self/ns/pid -> pid:[4026531836] /proc/self/ns/pid_for_children -> pid:[4026531836] /proc/self/ns/time -> time:[4026531834] /proc/self/ns/time_for_children -> time_for_children:[4026531834] /proc/self/ns/user -> user:[4026531837] ... The reference for 'time_for_children' should be a 'time' namespace, just as the reference for 'pid_for_children' is a 'pid' namespace. In other words, the above time_for_children link should read: /proc/self/ns/time_for_children -> time:[4026531834] Fixes: 769071ac9f20 ("ns: Introduce Time Namespace") Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dmitry Safonov <dima@arista.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Andrei Vagin <avagin@gmail.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/a2418c48-ed80-3afe-116e-6611cb799557@gmail.com
| * | | Merge tag 'sched-urgent-2020-04-12' of ↵Linus Torvalds2020-04-125-62/+51
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes/updates from Thomas Gleixner: - Deduplicate the average computations in the scheduler core and the fair class code. - Fix a raise between runtime distribution and assignement which can cause exceeding the quota by up to 70%. - Prevent negative results in the imbalanace calculation - Remove a stale warning in the workqueue code which can be triggered since the call site was moved out of preempt disabled code. It's a false positive. - Deduplicate the print macros for procfs - Add the ucmap values to the SCHED_DEBUG procfs output for completness * tag 'sched-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Add task uclamp values to SCHED_DEBUG procfs sched/debug: Factor out printing formats into common macros sched/debug: Remove redundant macro define sched/core: Remove unused rq::last_load_update_tick workqueue: Remove the warning in wq_worker_sleeping() sched/fair: Fix negative imbalance in imbalance calculation sched/fair: Fix race between runtime distribution and assignment sched/fair: Align rq->avg_idle and rq->avg_scan_cost
| | * | | sched/debug: Add task uclamp values to SCHED_DEBUG procfsValentin Schneider2020-04-081-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Requested and effective uclamp values can be a bit tricky to decipher when playing with cgroup hierarchies. Add them to a task's procfs when SCHED_DEBUG is enabled. Reviewed-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200226124543.31986-4-valentin.schneider@arm.com
| | * | | sched/debug: Factor out printing formats into common macrosValentin Schneider2020-04-081-14/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The printing macros in debug.c keep redefining the same output format. Collect each output format in a single definition, and reuse that definition in the other macros. While at it, add a layer of parentheses and replace printf's with the newly introduced macros. Reviewed-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200226124543.31986-3-valentin.schneider@arm.com
| | * | | sched/debug: Remove redundant macro defineValentin Schneider2020-04-081-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most printing macros for procfs are defined globally in debug.c, and they are re-defined (to the exact same thing) within proc_sched_show_task(). Get rid of the duplicate defines. Reviewed-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200226124543.31986-2-valentin.schneider@arm.com
| | * | | sched/core: Remove unused rq::last_load_update_tickVincent Donnefort2020-04-082-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit: 5e83eafbfd3b ("sched/fair: Remove the rq->cpu_load[] update code") eliminated the last use case for rq->last_load_update_tick, so remove the field as well. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/1584710495-308969-1-git-send-email-vincent.donnefort@arm.com
| | * | | workqueue: Remove the warning in wq_worker_sleeping()Sebastian Andrzej Siewior2020-04-082-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel test robot triggered a warning with the following race: task-ctx A interrupt-ctx B worker -> process_one_work() -> work_item() -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> ->sleeping = 1 atomic_dec_and_test(nr_running) __schedule(); *interrupt* async_page_fault() -> local_irq_enable(); -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> if (WARN_ON(->sleeping)) return -> __schedule() -> sched_update_worker() -> wq_worker_running() -> atomic_inc(nr_running); -> ->sleeping = 0; -> sched_update_worker() -> wq_worker_running() if (!->sleeping) return In this context the warning is pointless everything is fine. An interrupt before wq_worker_sleeping() will perform the ->sleeping assignment (0 -> 1 > 0) twice. An interrupt after wq_worker_sleeping() will trigger the warning and nr_running will be decremented (by A) and incremented once (only by B, A will skip it). This is the case until the ->sleeping is zeroed again in wq_worker_running(). Remove the WARN statement because this condition may happen. Document that preemption around wq_worker_sleeping() needs to be disabled to protect ->sleeping and not just as an optimisation. Fixes: 6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian
| | * | | sched/fair: Fix negative imbalance in imbalance calculationAubrey Li2020-04-081-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A negative imbalance value was observed after imbalance calculation, this happens when the local sched group type is group_fully_busy, and the average load of local group is greater than the selected busiest group. Fix this problem by comparing the average load of the local and busiest group before imbalance calculation formula. Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/1585201349-70192-1-git-send-email-aubrey.li@intel.com
| | * | | sched/fair: Fix race between runtime distribution and assignmentHuaixin Chang2020-04-081-20/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is a potential race between distribute_cfs_runtime() and assign_cfs_rq_runtime(). Race happens when cfs_b->runtime is read, distributes without holding lock and finds out there is not enough runtime to charge against after distribution. Because assign_cfs_rq_runtime() might be called during distribution, and use cfs_b->runtime at the same time. Fibtest is the tool to test this race. Assume all gcfs_rq is throttled and cfs period timer runs, slow threads might run and sleep, returning unused cfs_rq runtime and keeping min_cfs_rq_runtime in their local pool. If all this happens sufficiently quickly, cfs_b->runtime will drop a lot. If runtime distributed is large too, over-use of runtime happens. A runtime over-using by about 70 percent of quota is seen when we test fibtest on a 96-core machine. We run fibtest with 1 fast thread and 95 slow threads in test group, configure 10ms quota for this group and see the CPU usage of fibtest is 17.0%, which is far more than the expected 10%. On a smaller machine with 32 cores, we also run fibtest with 96 threads. CPU usage is more than 12%, which is also more than expected 10%. This shows that on similar workloads, this race do affect CPU bandwidth control. Solve this by holding lock inside distribute_cfs_runtime(). Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop") Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/lkml/20200325092602.22471-1-changhuaixin@linux.alibaba.com/
| | * | | sched/fair: Align rq->avg_idle and rq->avg_scan_costValentin Schneider2020-04-083-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sched/core.c uses update_avg() for rq->avg_idle and sched/fair.c uses an open-coded version (with the exact same decay factor) for rq->avg_scan_cost. On top of that, select_idle_cpu() expects to be able to compare these two fields. The only difference between the two is that rq->avg_scan_cost is computed using a pure division rather than a shift. Turns out it actually matters, first of all because the shifted value can be negative, and the standard has this to say about it: """ The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined. """ Not only this, but (arithmetic) right shifting a negative value (using 2's complement) is *not* equivalent to dividing it by the corresponding power of 2. Let's look at a few examples: -4 -> 0xF..FC -4 >> 3 -> 0xF..FF == -1 != -4 / 8 -8 -> 0xF..F8 -8 >> 3 -> 0xF..FF == -1 == -8 / 8 -9 -> 0xF..F7 -9 >> 3 -> 0xF..FE == -2 != -9 / 8 Make update_avg() use a division, and export it to the private scheduler header to reuse it where relevant. Note that this still lets compilers use a shift here, but should prevent any unwanted surprise. The disassembly of select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2 instructions; the diff sort of looks like this: - sub x1, x1, x0 + subs x1, x1, x0 // set condition codes + add x0, x1, #0x7 + csel x0, x0, x1, mi // x0 = x1 < 0 ? x0 : x1 add x0, x3, x0, asr #3 which does the right thing (i.e. gives us the expected result while still using an arithmetic shift) Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200330090127.16294-1-valentin.schneider@arm.com
| * | | | Merge tag 'perf-urgent-2020-04-12' of ↵Linus Torvalds2020-04-124-31/+573
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Three fixes/updates for perf: - Fix the perf event cgroup tracking which tries to track the cgroup even for disabled events. - Add Ice Lake server support for uncore events - Disable pagefaults when retrieving the physical address in the sampling code" * tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Disable page faults when getting phys address perf/x86/intel/uncore: Add Ice Lake server uncore support perf/cgroup: Correct indirection in perf_less_group_idx() perf/core: Fix event cgroup tracking
| | * | | | perf/core: Disable page faults when getting phys addressJiri Olsa2020-04-081-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We hit following warning when running tests on kernel compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y: WARNING: CPU: 19 PID: 4472 at mm/gup.c:2381 __get_user_pages_fast+0x1a4/0x200 CPU: 19 PID: 4472 Comm: dummy Not tainted 5.6.0-rc6+ #3 RIP: 0010:__get_user_pages_fast+0x1a4/0x200 ... Call Trace: perf_prepare_sample+0xff1/0x1d90 perf_event_output_forward+0xe8/0x210 __perf_event_overflow+0x11a/0x310 __intel_pmu_pebs_event+0x657/0x850 intel_pmu_drain_pebs_nhm+0x7de/0x11d0 handle_pmi_common+0x1b2/0x650 intel_pmu_handle_irq+0x17b/0x370 perf_event_nmi_handler+0x40/0x60 nmi_handle+0x192/0x590 default_do_nmi+0x6d/0x150 do_nmi+0x2f9/0x3c0 nmi+0x8e/0xd7 While __get_user_pages_fast() is IRQ-safe, it calls access_ok(), which warns on: WARN_ON_ONCE(!in_task() && !pagefault_disabled()) Peter suggested disabling page faults around __get_user_pages_fast(), which gets rid of the warning in access_ok() call. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200407141427.3184722-1-jolsa@kernel.org
| | * | | | perf/x86/intel/uncore: Add Ice Lake server uncore supportKan Liang2020-04-083-0/+522
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The uncore subsystem in Ice Lake server is similar to previous server. There are some differences in config register encoding and pci device IDs. The uncore PMON units in Ice Lake server include Ubox, Chabox, IIO, IRP, M2PCIE, PCU, M2M, PCIE3 and IMC. - For CHA, filter 1 register has been removed. The filter 0 register can be used by and of CHA events to be filterd by Thread/Core-ID. To do so, the control register's tid_en bit must be set to 1. - For IIO, there are some changes on event constraints. The MSR address and MSR offsets among counters are also changed. - For IRP, the MSR address and MSR offsets among counters are changed. - For M2PCIE, the counters are accessed by MSR now. Add new MSR address and MSR offsets. Change event constraints. - To determine the number of CHAs, have to read CAPID6(Low) and CAPID7 (High) now. - For M2M, update the PCICFG address and Device ID. - For UPI, update the PCICFG address, Device ID and counter address. - For M3UPI, update the PCICFG address, Device ID, counter address and event constraints. - For IMC, update the formular to calculate MMIO BAR address, which is MMIO_BASE + specific MEM_BAR offset. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/1585842411-150452-1-git-send-email-kan.liang@linux.intel.com
| | * | | | perf/cgroup: Correct indirection in perf_less_group_idx()Ian Rogers2020-04-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The void* in perf_less_group_idx() is to a member in the array which points at a perf_event*, as such it is a perf_event**. Reported-By: John Sperbeck <jsperbeck@google.com> Fixes: 6eef8a7116de ("perf/core: Use min_heap in visit_groups_merge()") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200321164331.107337-1-irogers@google.com
| | * | | | perf/core: Fix event cgroup trackingPeter Zijlstra2020-04-081-27/+43
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Song reports that installing cgroup events is broken since: db0503e4f675 ("perf/core: Optimize perf_install_in_event()") The problem being that cgroup events try to track cpuctx->cgrp even for disabled events, which is pointless and actively harmful since the above commit. Rework the code to have explicit enable/disable hooks for cgroup events, such that we can limit cgroup tracking to active events. More specifically, since the above commit disabled events are no longer added to their context from the 'right' CPU, and we can't access things like the current cgroup for a remote CPU. Cc: <stable@vger.kernel.org> # v5.5+ Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()") Reported-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200318193337.GB20760@hirez.programming.kicks-ass.net