summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* hwmon: (nct6775) Integrate new model nct6116Björn Gerhart2019-09-031-6/+174
| | | | | | | Add support for NCT6116D to nct6775 driver. Signed-off-by: Bjoern Gerhart <gerhart@posteo.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: (adt7475) Convert to use hwmon_device_register_with_groups()Grant McEwan2019-09-031-96/+50
| | | | | | | | | | | hwmon_device_register() is a deprecated function and produces a warning. Converting the driver to use the hwmon_device_register_with_groups() instead. Signed-off-by: Grant McEwan <grant.mcewan@alliedtelesis.co.nz> Link: https://lore.kernel.org/r/20190721225530.28799-2-grant.mcewan@alliedtelesis.co.nz Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: (w83781d) convert to i2c_new_dummy_deviceWolfram Sang2019-09-031-3/+3
| | | | | | | | | Move from i2c_new_dummy() to i2c_new_dummy_device(), so we now get an ERRPTR which we use in error handling. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20190722172611.3797-4-wsa+renesas@sang-engineering.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: (smm665) convert to i2c_new_dummy_deviceWolfram Sang2019-09-031-3/+3
| | | | | | | | | Move from i2c_new_dummy() to i2c_new_dummy_device(), so we now get an ERRPTR which we use in error handling. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20190722172611.3797-3-wsa+renesas@sang-engineering.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: (asb100) convert to i2c_new_dummy_deviceWolfram Sang2019-09-031-6/+6
| | | | | | | | | Move from i2c_new_dummy() to i2c_new_dummy_device(), so we now get an ERRPTR which we use in error handling. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20190722172611.3797-2-wsa+renesas@sang-engineering.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: (k10temp) Add support for AMD family 17h, model 70h CPUsMarcel Bocu2019-09-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It would seem like model 70h is behaving in the same way as model 30h, so let's just add the new F3 PCI ID to the list of compatible devices. Unlike previous Ryzen/Threadripper, Ryzen gen 3 processors do not need temperature offsets anymore. This has been reported in the press and verified on my Ryzen 3700X by checking that the idle temperature reported by k10temp is matching the temperature reported by the firmware. Vicki Pfau sent an identical patch after I checked that no-one had written this patch. I would have been happy about dropping my patch but unlike for his patch series, I had already Cc:ed the x86 people and they already reviewed the changes. Since Vicki has not answered to any email after his initial series, let's assume she is on vacation and let's avoid duplication of reviews from the maintainers and merge my series. To acknowledge Vicki's anteriority, I added her S-o-b to the patch. v2, suggested by Guenter Roeck and Brian Woods: - rename from 71h to 70h Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Marcel Bocu <marcel.p.bocu@gmail.com> Tested-by: Marcel Bocu <marcel.p.bocu@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: "Woods, Brian" <Brian.Woods@amd.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: Jean Delvare <jdelvare@suse.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: linux-hwmon@vger.kernel.org Link: https://lore.kernel.org/r/20190722174653.2391-1-marcel.p.bocu@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* x86/amd_nb: Add PCI device IDs for family 17h, model 70hMarcel Bocu2019-09-032-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The AMD Ryzen gen 3 processors came with a different PCI IDs for the function 3 & 4 which are used to access the SMN interface. The root PCI address however remained at the same address as the model 30h. Adding the F3/F4 PCI IDs respectively to the misc and link ids appear to be sufficient for k10temp, so let's add them and follow up on the patch if other functions need more tweaking. Vicki Pfau sent an identical patch after I checked that no-one had written this patch. I would have been happy about dropping my patch but unlike for his patch series, I had already Cc:ed the x86 people and they already reviewed the changes. Since Vicki has not answered to any email after his initial series, let's assume she is on vacation and let's avoid duplication of reviews from the maintainers and merge my series. To acknowledge Vicki's anteriority, I added her S-o-b to the patch. v2, suggested by Guenter Roeck and Brian Woods: - rename from 71h to 70h Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Marcel Bocu <marcel.p.bocu@gmail.com> Tested-by: Marcel Bocu <marcel.p.bocu@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Brian Woods <brian.woods@amd.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: "Woods, Brian" <Brian.Woods@amd.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: Jean Delvare <jdelvare@suse.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: linux-hwmon@vger.kernel.org Link: https://lore.kernel.org/r/20190722174510.2179-1-marcel.p.bocu@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* docs: hwmon: pxe1610: convert to ReST format and add to the indexMauro Carvalho Chehab2019-09-032-8/+26
| | | | | | | | | This document was recently introduced. Convert it to ReST just like the other hwmon documents, adding it to the hwmon index. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Link: https://lore.kernel.org/r/657bf022625e0888d3becf10c78d162eeb864608.1563792334.git.mchehab+samsung@kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: (k8temp) update to use new hwmon registration APIRobert Karszniewicz2019-09-031-164/+69
| | | | | | | | | | | | | | | | | | Removes: - hwmon_dev from k8temp_data struct, as that is now passed to callbacks, anyway. - other k8temp_data struct fields, too. - k8temp_update_device() Also reduces binary size: text data bss dec hex filename 4139 1448 0 5587 15d3 drivers/hwmon/k8temp.ko.bak 3103 1220 0 4323 10e3 drivers/hwmon/k8temp.ko Signed-off-by: Robert Karszniewicz <avoidr@firemail.cc> Signed-off-by: Robert Karszniewicz <avoidr@riseup.net> Link: https://lore.kernel.org/r/20190721120051.28064-1-avoidr@riseup.net Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: (pmbus/max31785) Remove a useless #defineChristophe JAILLET2019-09-031-2/+0
| | | | | | | | | | | | | | | | There is a typo in MAX37185_NUM_FAN_PAGES. To be consistent, it should be MAX31785_NUM_FAN_PAGES (1 and 7 switched). At line 24, we already have: #define MAX31785_NR_FAN_PAGES 6 and MAX37185_NUM_FAN_PAGES seems to be unused. It is likely that it is only a typo and/or a left-over. So, axe it. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20190721101553.20911-1-christophe.jaillet@wanadoo.fr Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: (lm75) add support for PCT2075Daniel Mack2019-09-032-2/+14
| | | | | | | | | | | | | The NXP PCT2075 is largely compatible with other chips already supported by the LM75 driver. It uses an 11-bit resolution and defaults to 100 ms sampling period. The datasheet is here: https://www.nxp.com/docs/en/data-sheet/PCT2075.pdf Signed-off-by: Daniel Mack <daniel@zonque.org> Link: https://lore.kernel.org/r/20190711124504.7580-2-daniel@zonque.org [groeck: Documentation update] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* device-tree: bindinds: add NXP PCT2075 as compatible device to LM75Daniel Mack2019-09-031-0/+1
| | | | | | | | | The PCT2075 is compatible to other chips that are already handled by the LM75 driver. Signed-off-by: Daniel Mack <daniel@zonque.org> Link: https://lore.kernel.org/r/20190711124504.7580-1-daniel@zonque.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: Remove ads1015 driverGuenter Roeck2019-09-038-435/+1
| | | | | | | | | | | | | | A driver for ADS1015 with more functionality is available in the iio subsystem. Remove the hwmon driver as duplicate. If the chip is used for hardware monitoring, the iio->hwmon bridge should be used. Cc: Dirk Eibach <eibach@gdsys.de> Link: https://lore.kernel.org/r/1562004758-13025-1-git-send-email-linux@roeck-us.net Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon (coretemp) Fix a memory leak bugWenwen Wang2019-08-311-1/+2
| | | | | | | | | | | | In coretemp_init(), 'zone_devices' is allocated through kcalloc(). However, it is not deallocated in the following execution if platform_driver_register() fails, leading to a memory leak. To fix this issue, introduce the 'outzone' label to free 'zone_devices' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Link: https://lore.kernel.org/r/1566248402-6538-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: (lm75) Fix write operations for negative temperaturesGuenter Roeck2019-08-311-1/+1
| | | | | | | | | | | | | Writes into limit registers fail if the temperature written is negative. The regmap write operation checks the value range, regmap_write accepts an unsigned int as parameter, and the temperature value passed to regmap_write is kept in a variable declared as long. Negative values are converted large unsigned integers, which fails the range check. Fix by type casting the temperature to u16 when calling regmap_write(). Cc: Iker Perez del Palomar Sustatxa <iker.perez@codethink.co.uk> Fixes: e65365fed87f ("hwmon: (lm75) Convert to use regmap") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* hwmon: pmbus: ucd9000: remove unneeded includeBartosz Golaszewski2019-08-311-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Build bot reports the following build issue after commit 9091373ab7ea ("gpio: remove less important #ifdef around declarations): In file included from drivers/hwmon/pmbus/ucd9000.c:19:0: >> include/linux/gpio/driver.h:576:1: error: redefinition of 'gpiochip_add_pin_range' gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/hwmon/pmbus/ucd9000.c:18:0: include/linux/gpio.h:245:1: note: previous definition of 'gpiochip_add_pin_range' was here gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/hwmon/pmbus/ucd9000.c:19:0: >> include/linux/gpio/driver.h:583:1: error: redefinition of 'gpiochip_add_pingroup_range' gpiochip_add_pingroup_range(struct gpio_chip *chip, ^~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/hwmon/pmbus/ucd9000.c:18:0: include/linux/gpio.h:254:1: note: previous definition of 'gpiochip_add_pingroup_range' was here gpiochip_add_pingroup_range(struct gpio_chip *chip, ^~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/hwmon/pmbus/ucd9000.c:19:0: >> include/linux/gpio/driver.h:591:1: error: redefinition of 'gpiochip_remove_pin_ranges' gpiochip_remove_pin_ranges(struct gpio_chip *chip) ^~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/hwmon/pmbus/ucd9000.c:18:0: include/linux/gpio.h:263:1: note: previous definition of 'gpiochip_remove_pin_ranges' was here gpiochip_remove_pin_ranges(struct gpio_chip *chip) This is caused by conflicting defines from linux/gpio.h and linux/gpio/driver.h. Drivers should not include both the legacy and the new API headers. This driver doesn't even use linux/gpio.h so remove it. Reported-by: kbuild test robot <lkp@intel.com> Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Link: https://lore.kernel.org/r/20190808080144.6183-1-brgl@bgdev.pl Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* Linux 5.3-rc6v5.3-rc6Linus Torvalds2019-08-251-1/+1
|
* Merge tag 'auxdisplay-for-linus-v5.3-rc7' of git://github.com/ojeda/linuxLinus Torvalds2019-08-251-2/+2
|\ | | | | | | | | | | | | | | Pull auxdisplay cleanup from Miguel Ojeda: "Make ht16k33_fb_fix and ht16k33_fb_var constant (Nishka Dasgupta)" * tag 'auxdisplay-for-linus-v5.3-rc7' of git://github.com/ojeda/linux: auxdisplay: ht16k33: Make ht16k33_fb_fix and ht16k33_fb_var constant
| * auxdisplay: ht16k33: Make ht16k33_fb_fix and ht16k33_fb_var constantNishka Dasgupta2019-08-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The static structures ht16k33_fb_fix and ht16k33_fb_var, of types fb_fix_screeninfo and fb_var_screeninfo respectively, are not used except to be copied into other variables. Hence make both of them constant to prevent unintended modification. Issue found with Coccinelle. Acked-by: Robin van der Gracht <robin@protonic.nl> Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
* | Merge tag 'for-linus-5.3-rc6' of ↵Linus Torvalds2019-08-253-12/+20
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML fix from Richard Weinberger: "Fix time travel mode" * tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: fix time travel mode
| * | um: fix time travel modeJohannes Berg2019-08-233-12/+20
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, my build fix for when time travel mode isn't enabled broke time travel mode, because I forgot that we need to use the timer time after the timer has been marked disabled, and thus need to leave the time stored instead of zeroing it. Fix that by splitting the inline into two, so we can call only the _mode() one in the relevant code path. Fixes: b482e48d29f1 ("um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* | Merge tag 'for-linus-5.3-rc6' of ↵Linus Torvalds2019-08-254-8/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBIFS and JFFS2 fixes from Richard Weinberger: "UBIFS: - Don't block too long in writeback_inodes_sb() - Fix for a possible overrun of the log head - Fix double unlock in orphan_delete() JFFS2: - Remove C++ style from UAPI header and unbreak picky toolchains" * tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: Limit the number of pages in shrink_liability ubifs: Correctly initialize c->min_log_bytes ubifs: Fix double unlock around orphan_delete() jffs2: Remove C++ style comments from uapi header
| * | ubifs: Limit the number of pages in shrink_liabilityLiu Song2019-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the number of dirty pages to be written back is large, then writeback_inodes_sb will block waiting for a long time, causing hung task detection alarm. Therefore, we should limit the maximum number of pages written back this time, which let the budget be completed faster. The remaining dirty pages tend to rely on the writeback mechanism to complete the synchronization. Fixes: b6e51316daed ("writeback: separate starting of sync vs opportunistic writeback") Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: Correctly initialize c->min_log_bytesRichard Weinberger2019-08-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently on a freshly mounted UBIFS, c->min_log_bytes is 0. This can lead to a log overrun and make commits fail. Recent kernels will report the following assert: UBIFS assert failed: c->lhead_lnum != c->ltail_lnum, in fs/ubifs/log.c:412 c->min_log_bytes can have two states, 0 and c->leb_size. It controls how much bytes of the log area are reserved for non-bud nodes such as commit nodes. After a commit it has to be set to c->leb_size such that we have always enough space for a commit. While a commit runs it can be 0 to make the remaining bytes of the log available to writers. Having it set to 0 right after mount is wrong since no space for commits is reserved. Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Reported-and-tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: Fix double unlock around orphan_delete()Richard Weinberger2019-08-221-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | We unlock after orphan_delete(), so no need to unlock in the function too. Reported-by: Han Xu <han.xu@nxp.com> Fixes: 8009ce956c3d ("ubifs: Don't leak orphans on memory during commit") Signed-off-by: Richard Weinberger <richard@nod.at>
| * | jffs2: Remove C++ style comments from uapi headerMasahiro Yamada2019-08-221-5/+0
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux kernel tolerates C++ style comments these days. Actually, the SPDX License tags for .c files start with //. On the other hand, uapi headers are written in more strict C, where the C++ comment style is forbidden. I simply dropped these lines instead of fixing the comment style. This code has been always commented out since it was added around Linux 2.4.9 (i.e. commented out for more than 17 years). 'Maybe later...' will never happen. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: Richard Weinberger <richard@nod.at>
* | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2019-08-259-33/+227
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few fixes for x86: - Fix a boot regression caused by the recent bootparam sanitizing change, which escaped the attention of all people who reviewed that code. - Address a boot problem on machines with broken E820 tables caused by an underflow which ended up placing the trampoline start at physical address 0. - Handle machines which do not advertise a legacy timer of any form, but need calibration of the local APIC timer gracefully by making the calibration routine independent from the tick interrupt. Marked for stable as well as there seems to be quite some new laptops rolled out which expose this. - Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are affected by broken firmware which does not initialize RDRAND correctly after resume. Add a command line parameter to override this for machine which either do not use suspend/resume or have a fixed BIOS. Unfortunately there is no way to detect this on boot, so the only safe decision is to turn it off by default. - Prevent RFLAGS from being clobbers in CALL_NOSPEC on 32bit which caused fast KVM instruction emulation to break. - Explain the Intel CPU model naming convention so that the repeating discussions come to an end" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386 x86/boot: Fix boot regression caused by bootparam sanitizing x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h x86/boot/compressed/64: Fix boot on machines with broken E820 table x86/apic: Handle missing global clockevent gracefully x86/cpu: Explain Intel model naming convention
| * | x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386Sean Christopherson2019-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to avoid clobbering flags. KVM's emulator makes indirect calls into a jump table of sorts, where the destination of the CALL_NOSPEC is a small blob of code that performs fast emulation by executing the target instruction with fixed operands. adcb_al_dl: 0x000339f8 <+0>: adc %dl,%al 0x000339fa <+2>: ret A major motiviation for doing fast emulation is to leverage the CPU to handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is both an input and output to the target of CALL_NOSPEC. Clobbering flags results in all sorts of incorrect emulation, e.g. Jcc instructions often take the wrong path. Sans the nops... asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n" 0x0003595a <+58>: mov 0xc0(%ebx),%eax 0x00035960 <+64>: mov 0x60(%ebx),%edx 0x00035963 <+67>: mov 0x90(%ebx),%ecx 0x00035969 <+73>: push %edi 0x0003596a <+74>: popf 0x0003596b <+75>: call *%esi 0x000359a0 <+128>: pushf 0x000359a1 <+129>: pop %edi 0x000359a2 <+130>: mov %eax,0xc0(%ebx) 0x000359b1 <+145>: mov %edx,0x60(%ebx) ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK); 0x000359a8 <+136>: mov -0x10(%ebp),%eax 0x000359ab <+139>: and $0x8d5,%edi 0x000359b4 <+148>: and $0xfffff72a,%eax 0x000359b9 <+153>: or %eax,%edi 0x000359bd <+157>: mov %edi,0x4(%ebx) For the most part this has gone unnoticed as emulation of guest code that can trigger fast emulation is effectively limited to MMIO when running on modern hardware, and MMIO is rarely, if ever, accessed by instructions that affect or consume flags. Breakage is almost instantaneous when running with unrestricted guest disabled, in which case KVM must emulate all instructions when the guest has invalid state, e.g. when the guest is in Big Real Mode during early BIOS. Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support") Fixes: 1a29b5b7f347a ("KVM: x86: Make indirect calls in emulator speculation safe") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel.com
| * | x86/boot: Fix boot regression caused by bootparam sanitizingJohn Hubbard2019-08-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else") had two errors: * It preserved boot_params.acpi_rsdp_addr, and * It failed to preserve boot_params.hdr Therefore, zero out acpi_rsdp_addr, and preserve hdr. Fixes: a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else") Reported-by: Neil MacLeod <neil@nmacleod.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neil MacLeod <neil@nmacleod.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190821192513.20126-1-jhubbard@nvidia.com
| * | x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16hTom Lendacky2019-08-194-13/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems. This issue stems from a BIOS not performing the proper steps during resume to ensure RDRAND continues to function properly. RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND support using CPUID, including the kernel, will believe that RDRAND is not supported. Update the CPU initialization to clear the RDRAND CPUID bit for any family 15h and 16h processor that supports RDRAND. If it is known that the family 15h or family 16h system does not have an RDRAND resume issue or that the system will not be placed in suspend, the "rdrand=force" kernel parameter can be used to stop the clearing of the RDRAND CPUID bit. Additionally, update the suspend and resume path to save and restore the MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in place after resuming from suspend. Note, that clearing the RDRAND CPUID bit does not prevent a processor that normally supports the RDRAND instruction from executing it. So any code that determined the support based on family and model won't #UD. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chen Yu <yu.c.chen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org> Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
| * | x86/boot/compressed/64: Fix boot on machines with broken E820 tableKirill A. Shutemov2019-08-191-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BIOS on Samsung 500C Chromebook reports very rudimentary E820 table that consists of 2 entries: BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] usable BIOS-e820: [mem 0x00000000fffff000-0x00000000ffffffff] reserved It breaks logic in find_trampoline_placement(): bios_start lands on the end of the first 4k page and trampoline start gets placed below 0. Detect underflow and don't touch bios_start for such cases. It makes kernel ignore E820 table on machines that doesn't have two usable pages below BIOS_START_MAX. Fixes: 1b3a62643660 ("x86/boot/compressed/64: Validate trampoline placement against E820") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203463 Link: https://lkml.kernel.org/r/20190813131654.24378-1-kirill.shutemov@linux.intel.com
| * | x86/apic: Handle missing global clockevent gracefullyThomas Gleixner2019-08-191-15/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some newer machines do not advertise legacy timers. The kernel can handle that situation if the TSC and the CPU frequency are enumerated by CPUID or MSRs and the CPU supports TSC deadline timer. If the CPU does not support TSC deadline timer the local APIC timer frequency has to be known as well. Some Ryzens machines do not advertize legacy timers, but there is no reliable way to determine the bus frequency which feeds the local APIC timer when the machine allows overclocking of that frequency. As there is no legacy timer the local APIC timer calibration crashes due to a NULL pointer dereference when accessing the not installed global clock event device. Switch the calibration loop to a non interrupt based one, which polls either TSC (if frequency is known) or jiffies. The latter requires a global clockevent. As the machines which do not have a global clockevent installed have a known TSC frequency this is a non issue. For older machines where TSC frequency is not known, there is no known case where the legacy timers do not exist as that would have been reported long ago. Reported-by: Daniel Drake <drake@endlessm.com> Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Daniel Drake <drake@endlessm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12
| * | x86/cpu: Explain Intel model naming conventionTony Luck2019-08-171-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave Hansen spelled out the rules in an e-mail: https://lkml.kernel.org/r/91eefbe4-e32b-d762-be4d-672ff915db47@intel.com Copy those right into the <asm/intel-family.h> file to make it easy for people to find them. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190815224704.GA10025@agluck-desk2.amr.corp.intel.com
* | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2019-08-253-9/+23
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping fix from Thomas Gleixner: "A single fix for a regression caused by the generic VDSO implementation where a math overflow causes CLOCK_BOOTTIME to become a random number generator" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
| * | | timekeeping/vsyscall: Prevent math overflow in BOOTTIME updateThomas Gleixner2019-08-233-9/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VDSO update for CLOCK_BOOTTIME has a overflow issue as it shifts the nanoseconds based boot time offset left by the clocksource shift. That overflows once the boot time offset becomes large enough. As a consequence CLOCK_BOOTTIME in the VDSO becomes a random number causing applications to misbehave. Fix it by storing a timespec64 representation of the offset when boot time is adjusted and add that to the MONOTONIC base time value in the vdso data page. Using the timespec64 representation avoids a 64bit division in the update code. Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation") Reported-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Chris Clayton <chris2553@googlemail.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908221257580.1983@nanos.tec.linutronix.de
* | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2019-08-251-1/+4
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "Handle the worker management in situations where a task is scheduled out on a PI lock contention correctly and schedule a new worker if possible" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Schedule new worker even if PI-blocked
| * | | | sched/core: Schedule new worker even if PI-blockedSebastian Andrzej Siewior2019-08-191-1/+4
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a task is PI-blocked (blocking on sleeping spinlock) then we don't want to schedule a new kworker if we schedule out due to lock contention because !RT does not do that as well. A spinning spinlock disables preemption and a worker does not schedule out on lock contention (but spin). On RT the RW-semaphore implementation uses an rtmutex so tsk_is_pi_blocked() will return true if a task blocks on it. In this case we will now start a new worker which may deadlock if one worker is waiting on progress from another worker. Since a RW-semaphore starts a new worker on !RT, we should do the same on RT. XFS is able to trigger this deadlock. Allow to schedule new worker if the current worker is PI-blocked. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190816160626.12742-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2019-08-252-5/+5
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Two small fixes for kprobes and perf: - Prevent a deadlock in kprobe_optimizer() causes by reverse lock ordering - Fix a comment typo" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes: Fix potential deadlock in kprobe_optimizer() perf/x86: Fix typo in comment
| * | | | kprobes: Fix potential deadlock in kprobe_optimizer()Andrea Righi2019-08-191-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep reports the following deadlock scenario: WARNING: possible circular locking dependency detected kworker/1:1/48 is trying to acquire lock: 000000008d7a62b2 (text_mutex){+.+.}, at: kprobe_optimizer+0x163/0x290 but task is already holding lock: 00000000850b5e2d (module_mutex){+.+.}, at: kprobe_optimizer+0x31/0x290 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (module_mutex){+.+.}: __mutex_lock+0xac/0x9f0 mutex_lock_nested+0x1b/0x20 set_all_modules_text_rw+0x22/0x90 ftrace_arch_code_modify_prepare+0x1c/0x20 ftrace_run_update_code+0xe/0x30 ftrace_startup_enable+0x2e/0x50 ftrace_startup+0xa7/0x100 register_ftrace_function+0x27/0x70 arm_kprobe+0xb3/0x130 enable_kprobe+0x83/0xa0 enable_trace_kprobe.part.0+0x2e/0x80 kprobe_register+0x6f/0xc0 perf_trace_event_init+0x16b/0x270 perf_kprobe_init+0xa7/0xe0 perf_kprobe_event_init+0x3e/0x70 perf_try_init_event+0x4a/0x140 perf_event_alloc+0x93a/0xde0 __do_sys_perf_event_open+0x19f/0xf30 __x64_sys_perf_event_open+0x20/0x30 do_syscall_64+0x65/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (text_mutex){+.+.}: __lock_acquire+0xfcb/0x1b60 lock_acquire+0xca/0x1d0 __mutex_lock+0xac/0x9f0 mutex_lock_nested+0x1b/0x20 kprobe_optimizer+0x163/0x290 process_one_work+0x22b/0x560 worker_thread+0x50/0x3c0 kthread+0x112/0x150 ret_from_fork+0x3a/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(module_mutex); lock(text_mutex); lock(module_mutex); lock(text_mutex); *** DEADLOCK *** As a reproducer I've been using bcc's funccount.py (https://github.com/iovisor/bcc/blob/master/tools/funccount.py), for example: # ./funccount.py '*interrupt*' That immediately triggers the lockdep splat. Fix by acquiring text_mutex before module_mutex in kprobe_optimizer(). Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: d5b844a2cf50 ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()") Link: http://lkml.kernel.org/r/20190812184302.GA7010@xps-13 Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | perf/x86: Fix typo in commentSu Yanjun2019-08-191-1/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No functional change. Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1565945001-4413-1-git-send-email-suyj.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2019-08-251-1/+14
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for a imbalanced kobject operation in the irq decriptor code which was unearthed by the new warnings in the kobject code" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Properly pair kobject_del() with kobject_add()
| * | | | genirq: Properly pair kobject_del() with kobject_add()Michael Kelley2019-08-191-1/+14
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If alloc_descs() fails before irq_sysfs_init() has run, free_desc() in the cleanup path will call kobject_del() even though the kobject has not been added with kobject_add(). Fix this by making the call to kobject_del() conditional on whether irq_sysfs_init() has run. This problem surfaced because commit aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") makes kobject_del() stricter about pairing with kobject_add(). If the pairing is incorrrect, a WARNING and backtrace occur in sysfs_remove_group() because there is no parent. [ tglx: Add a comment to the code and make it work with CONFIG_SYSFS=n ] Fixes: ecb3f394c5db ("genirq: Expose interrupt information through sysfs") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1564703564-4116-1-git-send-email-mikelley@microsoft.com
* | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2019-08-259-36/+260
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mergr misc fixes from Andrew Morton: "11 fixes" Mostly VM fixes, one psi polling fix, and one parisc build fix. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y mm/zsmalloc.c: fix race condition in zs_destroy_pool mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely mm, page_owner: handle THP splits correctly userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx psi: get poll_work to run when calling poll syscall next time mm: memcontrol: flush percpu vmevents before releasing memcg mm: memcontrol: flush percpu vmstats before releasing memcg parisc: fix compilation errrors mm, page_alloc: move_freepages should not examine struct page of reserved memory mm/z3fold.c: fix race between migration and destruction
| * | | | mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=yAndrey Ryabinin2019-08-251-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code like this: ptr = kmalloc(size, GFP_KERNEL); page = virt_to_page(ptr); offset = offset_in_page(ptr); kfree(page_address(page) + offset); may produce false-positive invalid-free reports on the kernel with CONFIG_KASAN_SW_TAGS=y. In the example above we lose the original tag assigned to 'ptr', so kfree() gets the pointer with 0xFF tag. In kfree() we check that 0xFF tag is different from the tag in shadow hence print false report. Instead of just comparing tags, do the following: 1) Check that shadow doesn't contain KASAN_TAG_INVALID. Otherwise it's double-free and it doesn't matter what tag the pointer have. 2) If pointer tag is different from 0xFF, make sure that tag in the shadow is the same as in the pointer. Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Walter Wu <walter-zh.wu@mediatek.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | mm/zsmalloc.c: fix race condition in zs_destroy_poolHenry Burns2019-08-251-2/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In zs_destroy_pool() we call flush_work(&pool->free_work). However, we have no guarantee that migration isn't happening in the background at that time. Since migration can't directly free pages, it relies on free_work being scheduled to free the pages. But there's nothing preventing an in-progress migrate from queuing the work *after* zs_unregister_migration() has called flush_work(). Which would mean pages still pointing at the inode when we free it. Since we know at destroy time all objects should be free, no new migrations can come in (since zs_page_isolate() fails for fully-free zspages). This means it is sufficient to track a "# isolated zspages" count by class, and have the destroy logic ensure all such pages have drained before proceeding. Keeping that state under the class spinlock keeps the logic straightforward. In this case a memory leak could lead to an eventual crash if compaction hits the leaked page. This crash would only occur if people are changing their zswap backend at runtime (which eventually starts destruction). Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com Fixes: 48b4800a1c6a ("zsmalloc: page migration support") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitelyHenry Burns2019-08-251-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In zs_page_migrate() we call putback_zspage() after we have finished migrating all pages in this zspage. However, the return value is ignored. If a zs_free() races in between zs_page_isolate() and zs_page_migrate(), freeing the last object in the zspage, putback_zspage() will leave the page in ZS_EMPTY for potentially an unbounded amount of time. To fix this, we need to do the same thing as zs_page_putback() does: schedule free_work to occur. To avoid duplicated code, move the sequence to a new putback_zspage_deferred() function which both zs_page_migrate() and zs_page_putback() call. Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com Fixes: 48b4800a1c6a ("zsmalloc: page migration support") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | mm, page_owner: handle THP splits correctlyVlastimil Babka2019-08-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | THP splitting path is missing the split_page_owner() call that split_page() has. As a result, split THP pages are wrongly reported in the page_owner file as order-9 pages. Furthermore when the former head page is freed, the remaining former tail pages are not listed in the page_owner file at all. This patch fixes that by adding the split_page_owner() call into __split_huge_page(). Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz Fixes: a9627bc5e34e ("mm/page_owner: introduce split_page_owner and replace manual handling") Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctxOleg Nesterov2019-08-251-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if mm->core_state != NULL. Otherwise a page fault can see userfaultfd_missing() == T and use an already freed userfaultfd_ctx. Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Peter Xu <peterx@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | psi: get poll_work to run when calling poll syscall next timeJason Xing2019-08-251-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only when calling the poll syscall the first time can user receive POLLPRI correctly. After that, user always fails to acquire the event signal. Reproduce case: 1. Get the monitor code in Documentation/accounting/psi.txt 2. Run it, and wait for the event triggered. 3. Kill and restart the process. The question is why we can end up with poll_scheduled = 1 but the work not running (which would reset it to 0). And the answer is because the scheduling side sees group->poll_kworker under RCU protection and then schedules it, but here we cancel the work and destroy the worker. The cancel needs to pair with resetting the poll_scheduled flag. Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Jason Xing <kerneljasonxing@linux.alibaba.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | mm: memcontrol: flush percpu vmevents before releasing memcgRoman Gushchin2019-08-251-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to vmstats, percpu caching of local vmevents leads to an accumulation of errors on non-leaf levels. This happens because some leftovers may remain in percpu caches, so that they are never propagated up by the cgroup tree and just disappear into nonexistence with on releasing of the memory cgroup. To fix this issue let's accumulate and propagate percpu vmevents values before releasing the memory cgroup similar to what we're doing with vmstats. Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate only over online cpus. Link: http://lkml.kernel.org/r/20190819202338.363363-4-guro@fb.com Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>