summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* dm bufio: remove code that merges slab cachesMikulas Patocka2018-04-031-39/+14
| | | | | | | | | All slab allocators can merge duplicate caches. So dm-bufio doesn't need extra slab merging logic. Instead it can just allocate one slab cache per client and let the allocator merge them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm bufio: get rid of slab cache name allocationsMikulas Patocka2018-04-031-18/+3
| | | | | | | | | | | | | | | dm-bufio keeps the dm_bufio_cache_names array that holds names of the slab caches. Since the commit db265eca7700 ("mm/sl[aou]b: Move duping of slab name to slab_common.c"), the kernel automatically duplicates the slab cache name when creating the slab cache, so we no longer have to keep the name allocated. Remove the code that allocates the slab names and keeps them around. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm bufio: move dm-bufio.h to include/linux/Mikulas Patocka2018-04-036-10/+10
| | | | | | | | | | | | | | | | Move dm-bufio.h to include/linux/ so that external GPL'd DM target modules can use it. It is better to allow the use of dm-bufio than force external modules to implement the equivalent buffered IO mechanism in some new way. The hope is this will encourage the use of dm-bufio; which will then make it easier for a GPL'd external DM target module to be included upstream. A couple dm-bufio EXPORT_SYMBOL exports have also been updated to use EXPORT_SYMBOL_GPL. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm bufio: delete outdated commentMikulas Patocka2018-04-031-4/+0
| | | | | | | | This comment was true when dm-bufio was written but, since 4.3, bios can now have arbitrary size and the driver splits them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: add support for secure erase forwardingDenis Semakin2018-04-035-0/+52
| | | | | | | | | | | Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's data devices support secure erase. Also, add support for secure erase to both the linear and striped targets. Signed-off-by: Denis Semakin <d.semakin@omprussia.ru> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: backfill abnormal IO support to non-splitting IO submissionMike Snitzer2018-04-031-7/+23
| | | | | | | | | | | | Otherwise, these abnormal IOs would be sent to the DM target regardless of whether the target advertised support for them. Factor out __process_abnormal_io() from __split_and_process_non_flush() so that discards, write same, etc may be conditionally processed. Fixes: 978e51ba3 ("dm: optimize bio-based NVMe IO submission") Cc: stable@vger.kernel.org # 4.16 Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: fix nosync statusHeinz Mauelshagen2018-04-031-1/+2
| | | | | | | | | | | Fix a race for "nosync" activations providing "aa.." device health characters and "0/N" sync ratio rather than "AA..." and "N/N". Occurs when status for the raid set is retrieved during resume before the MD sync thread starts and clears the MD_RECOVERY_NEEDED flag. Cc: stable@vger.kernel.org # 4.16+ Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in ↵Wang Sheng-Hui2018-04-031-1/+1
| | | | | | | process_queued_bios() Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm stripe: get rid of a Variable Length Array (VLA)Tycho Andersen2018-04-031-5/+5
| | | | | | | | | | | | | | Ideally, we'd like to get rid of all VLAs in the kernel and add -Wvla to the build args: https://lkml.org/lkml/2018/3/7/621 This one is a simple case, since we don't actually need the VLA at all: we can just iterate over the stripes twice, once to emit their names, and the second time to emit status (i.e. trade memory for time). Since the number of stripes is probably low, this is hopefully not that expensive. Signed-off-by: Tycho Andersen <tycho@tycho.ws> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm log writes: record metadata flag for better flags recordQu Wenruo2018-04-031-4/+8
| | | | | | | | So developer could distinguish data and metadata bios easier. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm integrity: fail early if required HMAC key is not availableMilan Broz2018-04-031-0/+3
| | | | | | | | | | | | | | | Since crypto API commit 9fa68f62004 ("crypto: hash - prevent using keyed hashes without setting key") dm-integrity cannot use keyed algorithms without the key being set. The dm-integrity recognizes this too late (during use of HMAC), so it allows creation and formatting of superblock, but the device is in fact unusable. Fix it by detecting the key requirement in integrity table constructor. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: remove unused macro DM_MOD_NAME_SIZEWang Sheng-Hui2018-04-031-2/+0
| | | | | Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm unstripe: remove unnecessary header includesHeinz Mauelshagen2018-04-031-6/+0
| | | | | Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm unstripe: remove superfluous module init error path messageHeinz Mauelshagen2018-04-031-7/+1
| | | | | | Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Reviewed-by: Scott Bauer <Scott.Bauer@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm unstripe: add "dm-unstriped" module aliasHeinz Mauelshagen2018-04-031-0/+1
| | | | | | | | | | | | | | | This target's kernel module being named dm-unstripe.ko doesn't allow lvm2's DM module autoload capability to load the dm-unstripe.ko because lvm2 looks for dm-unstriped.ko due to the target name being "unstriped". Add the "dm-unstriped" module alias to resolve this oversight. NOTE: this isn't needed for the "striped" target, despite its source file being named dm-stripe.c, because it is part of dm-mod.ko. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm unstripe: support non-power-of-2 chunk sizeHeinz Mauelshagen2018-04-031-12/+10
| | | | | | | | | | | Address "FIXME: must support non power of 2 chunk_size, dm-stripe.c does". Bump target version to indicate change. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Tested-by: Scott Bauer <Scott.Bauer@intel.com> Reviewed-by: Scott Bauer <Scott.Bauer@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm crypt: limit the number of allocated pagesMikulas Patocka2018-04-031-1/+65
| | | | | | | | | | | | | | | | | | | | | | dm-crypt consumes an excessive amount memory when the user attempts to zero a dm-crypt device with "blkdiscard -z". The command "blkdiscard -z" calls the BLKZEROOUT ioctl, it goes to the function __blkdev_issue_zeroout, __blkdev_issue_zeroout sends a large amount of write bios that contain the zero page as their payload. For each incoming page, dm-crypt allocates another page that holds the encrypted data, so when processing "blkdiscard -z", dm-crypt tries to allocate the amount of memory that is equal to the size of the device. This can trigger OOM killer or cause system crash. Fix this by limiting the amount of memory that dm-crypt allocates to 2% of total system memory. This limit is system-wide and is divided by the number of active dm-crypt devices and each device receives an equal share. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: allow targets to return output from messages they are sentMike Snitzer2018-04-0311-12/+21
| | | | | | | | Could be useful for a target to return stats or other information. If a target does DMEMIT() anything to @result from its .message method then it must return 1 to the caller. Signed-off-By: Mike Snitzer <snitzer@redhat.com>
* dm: fix dropped return code from dm_get_bdev_for_ioctlMike Snitzer2018-03-301-3/+5
| | | | | | | | | | | | | | | | | | | dm_get_bdev_for_ioctl()'s return of 0 or 1 must be the result from prepare_ioctl (1 means the ioctl was issued to a partition, 0 means it wasn't). Unfortunately commit 519049afea ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl") reused the variable 'r' to store the return from blkdev_get() that follows prepare_ioctl() -- whereby dropping prepare_ioctl()'s result on the floor. This can lead to an ioctl or persistent reservation being issued to a partition going unnoticed, which implies the extra permission check for CAP_SYS_RAWIO is skipped. Fix this by using a different variable to store blkdev_get()'s return. Fixes: 519049afea ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl") Reported-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm mpath: fix support for loading scsi_dh modules during table loadMike Snitzer2018-03-291-1/+1
| | | | | | | | | | | | | The ability to have multipath dynamically attach a scsi_dh, that the user specified in the multipath table, was broken by commit e8f74a0f00 ("dm mpath: eliminate need to use scsi_device_from_queue"). Restore the ability to load, and attach, a particular scsi_dh module if one is specified (as noticed by checking m->hw_handler_name). Fixes: e8f74a0f00 ("dm mpath: eliminate need to use scsi_device_from_queue") Reported-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm mpath: fix passing integrity dataSteffen Maier2018-03-141-2/+3
| | | | | | | | | | | | | | | | | | | After v4.12 commit e2460f2a4bc7 ("dm: mark targets that pass integrity data"), dm-multipath, e.g. on DIF+DIX SCSI disk paths, does not support block integrity any more. So add it to the whitelist. This is also a pre-requisite to use block integrity with other dm layer(s) on top of multipath, such as kpartx partitions (dm-linear) or LVM. Also, bump target version to reflect this fix. Fixes: e2460f2a4bc7 ("dm: mark targets that pass integrity data") Cc: <stable@vger.kernel.org> #4.12+ Bisected-by: Fedor Loshakov <loshakov@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm mpath: eliminate need to use scsi_device_from_queueMike Snitzer2018-03-131-9/+8
| | | | | | | | | | | | Instead of scsi_device_from_queue(), use scsi_dh_attached_handler_name() -- whose implementation uses scsi_device_from_queue() to avoid trying to access SCSI-specific resources from non-SCSI devices. Fixes buildbot reported issue when CONFIG_SCSI isn't set: ERROR: "scsi_device_from_queue" [drivers/md/dm-multipath.ko] undefined! Fixes: 8d47e65948dd ("dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm mpath: fix uninitialized 'pg_init_wait' waitqueue_head NULL pointerMike Snitzer2018-03-131-11/+10
| | | | | | | | | | | | Initialize all the scsi_dh related 'struct multipath' members regardless of whether a scsi_dh is in use or not. The subtle (and fragile) SCSI-assuming legacy code clearly needs further decoupling from non-SCSI (and/or developer understanding). Fixes: 8d47e65948dd ("dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks") Reported-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* Linux 4.16-rc5v4.16-rc5Linus Torvalds2018-03-121-1/+1
|
* Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds2018-03-1117-182/+291
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/pti updates from Thomas Gleixner: "Yet another pile of melted spectrum related updates: - Drop native vsyscall support finally as it causes more trouble than benefit. - Make microcode loading more robust. There were a few issues especially related to late loading which are now surfacing because late loading of the IB* microcodes addressing spectre issues has become more widely used. - Simplify and robustify the syscall handling in the entry code - Prevent kprobes on the entry trampoline code which lead to kernel crashes when the probe hits before CR3 is updated - Don't check microcode versions when running on hypervisors as they are considered as lying anyway. - Fix the 32bit objtool build and a coment typo" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Fix kernel crash when probing .entry_trampoline code x86/pti: Fix a comment typo x86/microcode: Synchronize late microcode loading x86/microcode: Request microcode on the BSP x86/microcode/intel: Look into the patch cache first x86/microcode: Do not upload microcode if CPUs are offline x86/microcode/intel: Writeback and invalidate caches before updating microcode x86/microcode/intel: Check microcode revision before updating sibling threads x86/microcode: Get rid of struct apply_microcode_ctx x86/spectre_v2: Don't check microcode versions when running under hypervisors x86/vsyscall/64: Drop "native" vsyscalls x86/entry/64/compat: Save one instruction in entry_INT80_compat() x86/entry: Do not special-case clone(2) in compat entry x86/syscalls: Use COMPAT_SYSCALL_DEFINEx() macros for x86-only compat syscalls x86/syscalls: Use proper syscall definition for sys_ioperm() x86/entry: Remove stale syscall prototype x86/syscalls/32: Simplify $entry == $compat entries objtool: Fix 32-bit build
| * x86/kprobes: Fix kernel crash when probing .entry_trampoline codeFrancis Deslauriers2018-03-093-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disable the kprobe probing of the entry trampoline: .entry_trampoline is a code area that is used to ensure page table isolation between userspace and kernelspace. At the beginning of the execution of the trampoline, we load the kernel's CR3 register. This has the effect of enabling the translation of the kernel virtual addresses to physical addresses. Before this happens most kernel addresses can not be translated because the running process' CR3 is still used. If a kprobe is placed on the trampoline code before that change of the CR3 register happens the kernel crashes because int3 handling pages are not accessible. To fix this, add the .entry_trampoline section to the kprobe blacklist to prohibit the probing of code before all the kernel pages are accessible. Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: mathieu.desnoyers@efficios.com Cc: mhiramat@kernel.org Link: http://lkml.kernel.org/r/1520565492-4637-2-git-send-email-francis.deslauriers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/pti: Fix a comment typoSeunghun Han2018-03-081-1/+1
| | | | | | | | | | | | | | | | | | s/visinble/visible/ Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1520397135-132809-1-git-send-email-kkamagui@gmail.com
| * x86/microcode: Synchronize late microcode loadingAshok Raj2018-03-081-26/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Original idea by Ashok, completely rewritten by Borislav. Before you read any further: the early loading method is still the preferred one and you should always do that. The following patch is improving the late loading mechanism for long running jobs and cloud use cases. Gather all cores and serialize the microcode update on them by doing it one-by-one to make the late update process as reliable as possible and avoid potential issues caused by the microcode update. [ Borislav: Rewrite completely. ] Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-8-bp@alien8.de
| * x86/microcode: Request microcode on the BSPBorislav Petkov2018-03-081-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... so that any newer version can land in the cache and can later be fished out by the application functions. Do that before grabbing the hotplug lock. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-7-bp@alien8.de
| * x86/microcode/intel: Look into the patch cache firstBorislav Petkov2018-03-081-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cache might contain a newer patch - look in there first. A follow-on change will make sure newest patches are loaded into the cache of microcode patches. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-6-bp@alien8.de
| * x86/microcode: Do not upload microcode if CPUs are offlineAshok Raj2018-03-081-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid loading microcode if any of the CPUs are offline, and issue a warning. Having different microcode revisions on the system at any time is outright dangerous. [ Borislav: Massage changelog. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: http://lkml.kernel.org/r/1519352533-15992-4-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-5-bp@alien8.de
| * x86/microcode/intel: Writeback and invalidate caches before updating microcodeAshok Raj2018-03-081-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updating microcode is less error prone when caches have been flushed and depending on what exactly the microcode is updating. For example, some of the issues around certain Broadwell parts can be addressed by doing a full cache flush. [ Borislav: Massage it and use native_wbinvd() in both cases. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: http://lkml.kernel.org/r/1519352533-15992-3-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-4-bp@alien8.de
| * x86/microcode/intel: Check microcode revision before updating sibling threadsAshok Raj2018-03-081-3/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After updating microcode on one of the threads of a core, the other thread sibling automatically gets the update since the microcode resources on a hyperthreaded core are shared between the two threads. Check the microcode revision on the CPU before performing a microcode update and thus save us the WRMSR 0x79 because it is a particularly expensive operation. [ Borislav: Massage changelog and coding style. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: http://lkml.kernel.org/r/1519352533-15992-2-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-3-bp@alien8.de
| * x86/microcode: Get rid of struct apply_microcode_ctxBorislav Petkov2018-03-081-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is a useless remnant from earlier times. Use the ucode_state enum directly. No functional change. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-2-bp@alien8.de
| * x86/spectre_v2: Don't check microcode versions when running under hypervisorsKonrad Rzeszutek Wilk2018-03-081-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As: 1) It's known that hypervisors lie about the environment anyhow (host mismatch) 2) Even if the hypervisor (Xen, KVM, VMWare, etc) provided a valid "correct" value, it all gets to be very murky when migration happens (do you provide the "new" microcode of the machine?). And in reality the cloud vendors are the ones that should make sure that the microcode that is running is correct and we should just sing lalalala and trust them. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: kvm <kvm@vger.kernel.org> Cc: Krčmář <rkrcmar@redhat.com> Cc: Borislav Petkov <bp@alien8.de> CC: "H. Peter Anvin" <hpa@zytor.com> CC: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180226213019.GE9497@char.us.oracle.com
| * x86/vsyscall/64: Drop "native" vsyscallsAndy Lutomirski2018-03-084-30/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since Linux v3.2, vsyscalls have been deprecated and slow. From v3.2 on, Linux had three vsyscall modes: "native", "emulate", and "none". "emulate" is the default. All known user programs work correctly in emulate mode, but vsyscalls turn into page faults and are emulated. This is very slow. In "native" mode, the vsyscall page is easily usable as an exploit gadget, but vsyscalls are a bit faster -- they turn into normal syscalls. (This is in contrast to vDSO functions, which can be much faster than syscalls.) In "none" mode, there are no vsyscalls. For all practical purposes, "native" was really just a chicken bit in case something went wrong with the emulation. It's been over six years, and nothing has gone wrong. Delete it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Kernel Hardening <kernel-hardening@lists.openwall.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/519fee5268faea09ae550776ce969fa6e88668b0.1520449896.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/entry/64/compat: Save one instruction in entry_INT80_compat()Dominik Brodowski2018-03-071-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As %rdi is never user except in the following push, there is no need to restore %rdi to the original value. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/entry: Do not special-case clone(2) in compat entryDominik Brodowski2018-03-074-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the CPU renaming registers on its own, and all the overhead of the syscall entry/exit, it is doubtful whether the compiled output of mov %r8, %rax mov %rcx, %r8 mov %rax, %rcx jmpq sys_clone is measurably slower than the hand-crafted version of xchg %r8, %rcx So get rid of this special case. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/syscalls: Use COMPAT_SYSCALL_DEFINEx() macros for x86-only compat syscallsDominik Brodowski2018-03-073-62/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While at it, convert declarations of type "unsigned" to "unsigned int". Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/syscalls: Use proper syscall definition for sys_ioperm()Dominik Brodowski2018-03-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using SYSCALL_DEFINEx() is recommended, so use it also here. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/entry: Remove stale syscall prototypeDominik Brodowski2018-03-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sys32_vm86_warning() is long gone. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/syscalls/32: Simplify $entry == $compat entriesDominik Brodowski2018-03-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the compat entry point is equivalent to the native entry point, it does not need to be specified explicitly. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * objtool: Fix 32-bit buildJosh Poimboeuf2018-03-071-20/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the objtool build when cross-compiling a 64-bit kernel on a 32-bit host. This also simplifies read_retpoline_hints() a bit and makes its implementation similar to most of the other annotation reading functions. Reported-by: Sven Joachim <svenjoac@gmx.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b5bc2231b8ad ("objtool: Add retpoline validation") Link: http://lkml.kernel.org/r/2ca46c636c23aa9c9d57d53c75de4ee3ddf7a7df.1520380691.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2018-03-111-0/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "Just a single fix which adds a missing Kconfig dependency to avoid unmet dependency warnings" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/atmel-st: Add 'depends on HAS_IOMEM' to fix unmet dependency
| * | clocksource/atmel-st: Add 'depends on HAS_IOMEM' to fix unmet dependencyMasahiro Yamada2018-03-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ATMEL_ST config selects MFD_SYSCON, but does not depend on HAS_IOMEM. Compile testing on architecture without HAS_IOMEM causes "unmet direct dependencies" in Kconfig phase. Detected by "make ARCH=score allyesconfig". Add the proper dependency to the ATMEL_ST config. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/1520335233-11277-1-git-send-email-yamada.masahiro@socionext.com
* | | Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds2018-03-112-2/+25
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two small fixes for RAS/MCE: - Serialize sysfs changes to avoid concurrent modificaiton of underlying data - Add microcode revision to Machine Check records. This should have been there forever, but now with the broken microcode versions in the wild it has become important" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/MCE: Serialize sysfs changes x86/MCE: Save microcode revision in machine check records
| * | | x86/MCE: Serialize sysfs changesSeunghun Han2018-03-081-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check_interval file in /sys/devices/system/machinecheck/machinecheck<cpu number> directory is a global timer value for MCE polling. If it is changed by one CPU, mce_restart() broadcasts the event to other CPUs to delete and restart the MCE polling timer and __mcheck_cpu_init_timer() reinitializes the mce_timer variable. If more than one CPU writes a specific value to the check_interval file concurrently, mce_timer is not protected from such concurrent accesses and all kinds of explosions happen. Since only root can write to those sysfs variables, the issue is not a big deal security-wise. However, concurrent writes to these configuration variables is void of reason so the proper thing to do is to serialize the access with a mutex. Boris: - Make store_int_with_restart() use device_store_ulong() to filter out negative intervals - Limit min interval to 1 second - Correct locking - Massage commit message Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180302202706.9434-1-kkamagui@gmail.com
| * | | x86/MCE: Save microcode revision in machine check recordsTony Luck2018-03-082-1/+4
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updating microcode used to be relatively rare. Now that it has become more common we should save the microcode version in a machine check record to make sure that those people looking at the error have this important information bundled with the rest of the logged information. [ Borislav: Simplify a bit. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180301233449.24311-1-tony.luck@intel.com
* | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2018-03-1113-17/+65
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Another set of perf updates: - Fix a Skylake Uncore event format declaration - Prevent perf pipe mode from crahsing which was caused by a missing buffer allocation - Make the perf top popup message which tells the user that it uses fallback mode on older kernels a debug message. - Make perf context rescheduling work correcctly - Robustify the jump error drawing in perf browser mode so it does not try to create references to NULL initialized offset entries - Make trigger_on() robust so it does not enable the trigger before everything is set up correctly to handle it - Make perf auxtrace respect the --no-itrace option so it does not try to queue AUX data for decoding. - Prevent having different number of field separators in CVS output lines when a counter is not supported. - Make the perf kallsyms man page usage behave like it does for all other perf commands. - Synchronize the kernel headers" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix ctx_event_type in ctx_resched() perf tools: Fix trigger class trigger_on() perf auxtrace: Prevent decoding when --no-itrace perf stat: Fix CVS output format for non-supported counters tools headers: Sync x86's cpufeatures.h tools headers: Sync copy of kvm UAPI headers perf record: Fix crash in pipe mode perf annotate browser: Be more robust when drawing jump arrows perf top: Fix annoying fallback message on older kernels perf kallsyms: Fix the usage on the man page perf/x86/intel/uncore: Fix Skylake UPI event format
| * | | perf/core: Fix ctx_event_type in ctx_resched()Song Liu2018-03-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ctx_resched(), EVENT_FLEXIBLE should be sched_out when EVENT_PINNED is added. However, ctx_resched() calculates ctx_event_type before checking this condition. As a result, pinned events will NOT get higher priority than flexible events. The following shows this issue on an Intel CPU (where ref-cycles can only use one hardware counter). 1. First start: perf stat -C 0 -e ref-cycles -I 1000 2. Then, in the second console, run: perf stat -C 0 -e ref-cycles:D -I 1000 The second perf uses pinned events, which is expected to have higher priority. However, because it failed in ctx_resched(). It is never run. This patch fixes this by calculating ctx_event_type after re-evaluating event_type. Reported-by: Ephraim Park <ephiepark@fb.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <jolsa@redhat.com> Cc: <kernel-team@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 487f05e18aa4 ("perf/core: Optimize event rescheduling on active contexts") Link: http://lkml.kernel.org/r/20180306055504.3283731-1-songliubraving@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>