summaryrefslogtreecommitdiffstats
path: root/fs/quota (unfollow)
Commit message (Collapse)AuthorFilesLines
2016-04-26f2fs: report unwritten status in fsync_node_pagesJaegeuk Kim2-8/+9
The fsync_node_pages should return pass or failure so that user could know fsync is completed or not. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26f2fs: split sync_node_pages with fsync_node_pagesJaegeuk Kim5-33/+84
This patch splits the existing sync_node_pages into (f)sync_node_pages. The fsync_node_pages is used for f2fs_sync_file only. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26f2fs: avoid writing 0'th page in volatile writesJaegeuk Kim1-2/+4
The first page of volatile writes usually contains a sort of header information which will be used for recovery. (e.g., journal header of sqlite) If this is written without other journal data, user needs to handle the stale journal information. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26f2fs: avoid needless lock for node pages when fsyncing a fileJaegeuk Kim1-3/+7
When fsync is called, sync_node_pages finds a proper direct node pages to flush. But, it locks unrelated direct node pages together unnecessarily. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: flush dirty pages before starting atomic writesJaegeuk Kim1-1/+10
If somebody wrote some data before atomic writes, we should flush them in order to handle atomic data in a right period. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: don't invalidate atomic page if successfulJaegeuk Kim1-2/+3
If we committed atomic write successfully, we don't need to invalidate pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: give -E2BIG for no space in xattrJaegeuk Kim1-1/+1
This patch returns -E2BIG if there is no space to add an xattr entry. This should fix generic/026 in xfstests as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: remove redundant condition checkJaegeuk Kim2-2/+2
This patch resolves the redundant condition check reported by David. Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: unset atomic/volatile flag in f2fs_release_fileJaegeuk Kim2-3/+4
The atomic/volatile operation should be done in pair of start and commit ioctl. For example, if a killed process remains open-ended atomic operation, we should drop its flag as well as its atomic data. Otherwise, if sqlite initiates another operation which doesn't require atomic writes, it will lose every data, since f2fs still treats with them as atomic writes; nobody will trigger its commit. Reported-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: fix dropping inmemory pages in a wrong timeJaegeuk Kim1-0/+8
When one reader closes its file while the other writer is doing atomic writes, f2fs_release_file drops atomic data resulting in an empty commit. This patch fixes this wrong commit problem by checking openess of the file. Process0 Process1 open file start atomic write write data read data close file f2fs_release_file() clear atomic data commit atomic write Reported-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: add BUG_ON to avoid unnecessary flowJaegeuk Kim1-5/+2
This patch adds BUG_ON instead of retrying loop. In the case of node pages, we already got this inode page, but unlocked it. By the fact that we don't truncate any node pages in operations, the page's mapping should be unchangeable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: use PGP_LOCK to check its truncationJaegeuk Kim1-7/+2
Previously, after trylock_page is succeeded, it doesn't check its mapping. In order to fix that, we can just give PGP_LOCK to pagecache_get_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: fix to convert inline directory correctlyChao Yu4-43/+144
With below serials, we will lose parts of dirents: 1) mount f2fs with inline_dentry option 2) echo 1 > /sys/fs/f2fs/sdX/dir_level 3) mkdir dir 4) touch 180 files named [1-180] in dir 5) touch 181 in dir 6) echo 3 > /proc/sys/vm/drop_caches 7) ll dir ls: cannot access 2: No such file or directory ls: cannot access 4: No such file or directory ls: cannot access 5: No such file or directory ls: cannot access 6: No such file or directory ls: cannot access 8: No such file or directory ls: cannot access 9: No such file or directory ... total 360 drwxr-xr-x 2 root root 4096 Feb 19 15:12 ./ drwxr-xr-x 3 root root 4096 Feb 19 15:11 ../ -rw-r--r-- 1 root root 0 Feb 19 15:12 1 -rw-r--r-- 1 root root 0 Feb 19 15:12 10 -rw-r--r-- 1 root root 0 Feb 19 15:12 100 -????????? ? ? ? ? ? 101 -????????? ? ? ? ? ? 102 -????????? ? ? ? ? ? 103 ... The reason is: when doing the inline dir conversion, we didn't consider that directory has hierarchical hash structure which can be configured through sysfs interface 'dir_level'. By default, dir_level of directory inode is 0, it means we have one bucket in hash table located in first level, all dirents will be hashed in this bucket, so it has no problem for us to do the duplication simply between inline dentry page and converted normal dentry page. However, if we configured dir_level with the value N (greater than 0), it will expand the bucket number of first level hash table by 2^N - 1, it hashs dirents into different buckets according their hash value, if we still move all dirents to first bucket, it makes incorrent locating for inline dirents, the result is, although we can iterate all dirents through ->readdir, we can't stat some of them in ->lookup which based on hash table searching. This patch fixes this issue by rehashing dirents into correct position when converting inline directory. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: show current mount statusJaegeuk Kim1-2/+3
This patch remains the current mount status to f2fs status info. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: treat as a normal umount when remounting roJaegeuk Kim1-8/+10
When user remounts f2fs as read-only, we can mark the checkpoint as umount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: give -EINVAL for norecovery and rw mountJaegeuk Kim3-7/+20
Once detecting something to recover, f2fs should stop mounting, given norecovery and rw mount options. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: recover superblock at RW remountsJaegeuk Kim2-9/+28
This patch adds a sbi flag, SBI_NEED_SB_WRITE, which indicates it needs to recover superblock when (re)mounting as RW. This is set only when f2fs is mounted as RO. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: give RO message when recovering superblockJaegeuk Kim1-1/+4
When one of superblocks is missing, f2fs recovers it with the valid one. But, even if f2fs is mounted as RO, we'd better notify that too. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-14dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macrosMike Snitzer1-24/+40
The READ_LOCK macro was incorrectly returning -EINVAL if dm_bm_is_read_only() was true -- it will always be true once the cache metadata transitions to read-only by dm_cache_metadata_set_read_only(). Wrap READ_LOCK and WRITE_LOCK multi-statement macros in do {} while(0). Also, all accesses of the 'cmd' argument passed to these related macros are now encapsulated in parenthesis. A follow-up patch can be developed to eliminate the use of macros in favor of pure C code. Avoiding that now given that this needs to apply to stable@. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: d14fcf3dd79 ("dm cache: make sure every metadata function checks fail_io") Cc: stable@vger.kernel.org
2016-04-14/proc/iomem: only expose physical resource addresses to privileged usersLinus Torvalds1-2/+11
In commit c4004b02f8e5b ("x86: remove the kernel code/data/bss resources from /proc/iomem") I was hoping to remove the phyiscal kernel address data from /proc/iomem entirely, but that had to be reverted because some system programs actually use it. This limits all the detailed resource information to properly credentialed users instead. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-14pci-sysfs: use proper file capability helper functionLinus Torvalds1-1/+1
The PCI config access checked the file capabilities correctly, but used the itnernal security capability check rather than the helper function that is actually meant for that. The security_capable() has unusual return values and is not meant to be used elsewhere (the only other use is in the capability checking functions that we actually intend people to use, and this odd PCI usage really stood out when looking around the capability code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-14Make file credentials available to the seqfile interfacesLinus Torvalds2-12/+8
A lot of seqfile users seem to be using things like %pK that uses the credentials of the current process, but that is actually completely wrong for filesystem interfaces. The unix semantics for permission checking files is to check permissions at _open_ time, not at read or write time, and that is not just a small detail: passing off stdin/stdout/stderr to a suid application and making the actual IO happen in privileged context is a classic exploit technique. So if we want to be able to look at permissions at read time, we need to use the file open credentials, not the current ones. Normal file accesses can just use "f_cred" (or any of the helper functions that do that, like file_ns_capable()), but the seqfile interfaces do not have any such options. It turns out that seq_file _does_ save away the user_ns information of the file, though. Since user_ns is just part of the full credential information, replace that special case with saving off the cred pointer instead, and suddenly seq_file has all the permission information it needs. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-14Revert "x86: remove the kernel code/data/bss resources from /proc/iomem"Linus Torvalds1-0/+37
This reverts commit c4004b02f8e5b9ce357a0bb1641756cc86962664. Sadly, my hope that nobody would actually use the special kernel entries in /proc/iomem were dashed by kexec. Which reads /proc/iomem explicitly to find the kernel base address. Nasty. Anyway, that means we can't do the sane and simple thing and just remove the entries, and we'll instead have to mask them out based on permissions. Reported-by: Zhengyu Zhang <zhezhang@redhat.com> Reported-by: Dave Young <dyoung@redhat.com> Reported-by: Freeman Zhang <freeman.zhang1992@gmail.com> Reported-by: Emrah Demir <ed@abdsec.com> Reported-by: Baoquan He <bhe@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-14pwm: fsl-ftm: Use flat regmap cacheStefan Agner1-1/+1
Use flat regmap cache to avoid lockdep warning at probe: [ 0.697285] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0x15c/0x160() [ 0.697449] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) The RB-tree regmap cache needs to allocate new space on first writes. However, allocations in an atomic context (e.g. when a spinlock is held) are not allowed. The function regmap_write calls map->lock, which acquires a spinlock in the fast_io case. Since the pwm-fsl-ftm driver uses MMIO, the regmap bus of type regmap_mmio is being used which has fast_io set to true. The MMIO space of the pwm-fsl-ftm driver is reasonable condense, hence using the much faster flat regmap cache is anyway the better choice. Signed-off-by: Stefan Agner <stefan@agner.ch> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2016-04-13x86/mce: Avoid using object after free in genpoolTony Luck1-2/+2
When we loop over all queued machine check error records to pass them to the registered notifiers we use llist_for_each_entry(). But the loop calls gen_pool_free() for the entry in the body of the loop - and then the iterator looks at node->next after the free. Use llist_for_each_entry_safe() instead. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/0205920@agluck-desk.sc.intel.com Link: http://lkml.kernel.org/r/1459929916-12852-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13ALSA: hda - Fix inconsistent monitor_present state until repollTakashi Iwai1-7/+4
While the previous commit fixed the missing monitor_present flag update, it may be still in an inconsistent state while the driver repolls: the flag itself is updated, but the eld_valid flag and the contents don't follow until the repoll finishes (and may be repeated for a few times). The basic problem is that pin_eld->monitor_present is updated in the caller side. This should have been updated only in update_eld(). So, the proper fix is to avoid accessing pin_eld but only spec->temp_eld. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-04-13ALSA: hda - Fix regression of monitor_present flag in eld proc fileHyungwon Hwang1-0/+2
The commit [bd48128539ab: ALSA: hda - Fix forgotten HDMI monitor_present update] covered the missing update of monitor_present flag, but this caused a regression for devices without the i915 eld notifier. Since the old code supposed that pin_eld->monitor_present was updated by the caller side, the hdmi_present_sense_via_verbs() doesn't update the temporary eld->monitor_present but only pin_eld->monitor_present, which is now overridden in update_eld(). The fix is to update pin_eld->monitor_present as well before calling update_eld(). Note that this may still leave monitor_present flag in an inconsistent state when the driver repolls, but this is at least the old behavior. More proper fix will follow in the later patch. Fixes: bd48128539ab ('ALSA: hda - Fix forgotten HDMI monitor_present update') Signed-off-by: Hyungwon Hwang <hyungwon.hwang7@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-04-13ext4/fscrypto: avoid RCU lookup in d_revalidateJaegeuk Kim2-0/+8
As Al pointed, d_revalidate should return RCU lookup before using d_inode. This was originally introduced by: commit 34286d666230 ("fs: rcu-walk aware d_revalidate method"). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: stable <stable@vger.kernel.org>
2016-04-12ARM: sa1100: remove references to the defunct handhelds.orgLinus Walleij1-8/+2
The website handhelds.org has been down for a long time and is likely never coming back online. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2016-04-12bus: uniphier-system-bus: fix condition of overlap checkKunihiko Hayashi1-1/+1
This patch fixes condition whether the specified address ranges overlap each other. Fixes: 4b7f48d395a7 ("bus: uniphier-system-bus: add UniPhier System Bus driver") Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2016-04-12ARM: uniphier: drop weird sizeof()Masahiro Yamada1-1/+1
My intention was to ioremap a 4-byte register. Coincidentally enough, sizeof(SZ_4) equals to SZ_4, but this code is weird anyway. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2016-04-12fscrypto: don't let data integrity writebacks fail with ENOMEMJaegeuk Kim3-23/+38
This patch fixes the issue introduced by the ext4 crypto fix in a same manner. For F2FS, however, we flush the pending IOs and wait for a while to acquire free memory. Fixes: c9af28fdd4492 ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM") Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12f2fs: use dget_parent and file_dentry in f2fs_file_openJaegeuk Kim1-3/+7
This patch synced with the below two ext4 crypto fixes together. In 4.6-rc1, f2fs newly introduced accessing f_path.dentry which crashes overlayfs. To fix, now we need to use file_dentry() to access that field. Fixes: c0a37d487884 ("ext4: use file_dentry()") Fixes: 9dd78d8c9a7b ("ext4: use dget_parent() in ext4_file_open()") Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12fscrypto: use dget_parent() in fscrypt_d_revalidate()Jaegeuk Kim1-3/+8
This patch updates fscrypto along with the below ext4 crypto change. Fixes: 3d43bcfef5f0 ("ext4 crypto: use dget_parent() in ext4_d_revalidate()") Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12ALSA: usb-audio: Skip volume controls triggers hangup on Dell USB DockKailang Yang1-0/+14
This is Dell usb dock audio workaround. It was fixed the master volume keep lower. [Some background: the patch essentially skips the controls of a couple of FU volumes. Although the firmware exposes the dB and the value information via the usb descriptor, changing the values (we set the min volume as default) screws up the device. Although this has been fixed in the newer firmware, the devices are shipped with the old firmware, thus we need the workaround in the driver side. -- tiwai] Signed-off-by: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-04-12mailbox: Stop using ENOSYS for anything other than unimplemented syscallsLee Jones1-2/+2
In accordance with e15f431fe2d5 ("errno.h: Improve ENOSYS's comment") and 91c9afaf97ee ("checkpatch.pl: new instances of ENOSYS are errors") we're converting from the old meaning of: ENOSYS "Function not implemented" to a more standard EINVAL. Reported-by: Seraphin Bonnaffe <seraphin.bonnaffe@st.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-04-12mailbox: mailbox-test: Prevent memory leakLee Jones1-3/+6
If we set the Signal twice or more, without using it as part of a message, memory will be re-allocated and the pointer over-written. Prevent this potential leak by only allocating memory when there isn't any already. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-04-12mailbox: mailbox-test: Use more consistent format for calling copy_from_user()Lee Jones1-4/+3
While we're at it, ensure copy-to location is NULL'ed in the error path. Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-04-11dm: fix dm_target_io leak if clone_bio() returns an errorMikulas Patocka1-1/+3
Commit c80914e81ec5b08 ("dm: return error if bio_integrity_clone() fails in clone_bio()") changed clone_bio() such that if it does return error then the alloc_tio() created resources (both the bio that was allocated to be a clone and the containing dm_target_io struct) will leak. Fix this by calling free_tio() in __clone_and_map_data_bio()'s clone_bio() error path. Fixes: c80914e81ec5b08 ("dm: return error if bio_integrity_clone() fails in clone_bio()") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-04-11ALSA: hda/realtek - Enable the ALC292 dock fixup on the Thinkpad T460sSven Eckelmann1-1/+9
The Lenovo Thinkpad T460s requires the alc_fixup_tpt440_dock as well in order to get working sound output on the docking stations headphone jack. Patch tested on a Thinkpad T460s (20F9CT01WW) using a ThinkPad Ultradock on kernel 4.4.6. Signed-off-by: Sven Eckelmann <sven@narfation.org> Tested-by: Simon Wunderlich <sw@simonwunderlich.de> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-04-11ALSA: sscape: Use correct format identifier for size_tWilliam Breathitt Gray1-1/+1
The 'size' member of a struct firmware is passed to snd_printk with a respective format string using the %d identifier. The 'size' member is of type size_t, but format identifier %d indicates a signed int data type. This patch replaces the %d format identifier with the correct %zu format identifier for size_t data types. Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-04-11m68k/gpio: remove arch specific sysfs bus deviceGreg Ungerer1-7/+1
The ColdFire architecture specific gpio support code registers a sysfs bus device named "gpio". This clashes with the new generic API device added in commit 3c702e99 ("gpio: add a userspace chardev ABI for GPIOs"). The old ColdFire sysfs gpio device was never used for anything specific, and no links or other nodes were created under it. The new API sysfs gpio device has all the same default sysfs links (device, drivers, etc) and they are properly populated. Remove the old ColdFire sysfs gpio registration. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Acked-by: Linus Walleij <linus.walleij@linaro.org>
2016-04-11Linux 4.6-rc3v4.6-rc3Linus Torvalds1-1/+1
2016-04-11Revert "ext4: allow readdir()'s of large empty directories to be interrupted"Linus Torvalds2-10/+0
This reverts commit 1028b55bafb7611dda1d8fed2aeca16a436b7dff. It's broken: it makes ext4 return an error at an invalid point, causing the readdir wrappers to write the the position of the last successful directory entry into the position field, which means that the next readdir will now return that last successful entry _again_. You can only return fatal errors (that terminate the readdir directory walk) from within the filesystem readdir functions, the "normal" errors (that happen when the readdir buffer fills up, for example) happen in the iterorator where we know the position of the actual failing entry. I do have a very different patch that does the "signal_pending()" handling inside the iterator function where it is allowable, but while that one passes all the sanity checks, I screwed up something like four times while emailing it out, so I'm not going to commit it today. So my track record is not good enough, and the stars will have to align better before that one gets committed. And it would be good to get some review too, of course, since celestial alignments are always an iffy debugging model. IOW, let's just revert the commit that caused the problem for now. Reported-by: Greg Thelen <gthelen@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-10KVM: x86: mask CPUID(0xD,0x1).EAX against host valuePaolo Bonzini1-0/+1
This ensures that the guest doesn't see XSAVE extensions (e.g. xgetbv1 or xsavec) that the host lacks. Cc: stable@vger.kernel.org Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-10kvm: x86: do not leak guest xcr0 into host interrupt handlersDavid Matlack1-6/+4
An interrupt handler that uses the fpu can kill a KVM VM, if it runs under the following conditions: - the guest's xcr0 register is loaded on the cpu - the guest's fpu context is not loaded - the host is using eagerfpu Note that the guest's xcr0 register and fpu context are not loaded as part of the atomic world switch into "guest mode". They are loaded by KVM while the cpu is still in "host mode". Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The interrupt handler will look something like this: if (irq_fpu_usable()) { kernel_fpu_begin(); [... code that uses the fpu ...] kernel_fpu_end(); } As long as the guest's fpu is not loaded and the host is using eager fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle() returns true). The interrupt handler proceeds to use the fpu with the guest's xcr0 live. kernel_fpu_begin() saves the current fpu context. If this uses XSAVE[OPT], it may leave the xsave area in an undesirable state. According to the SDM, during XSAVE bit i of XSTATE_BV is not modified if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and xcr0[i] == 0 following an XSAVE. kernel_fpu_end() restores the fpu context. Now if any bit i in XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The fault is trapped and SIGSEGV is delivered to the current process. Only pre-4.2 kernels appear to be vulnerable to this sequence of events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly") from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts. This patch fixes the bug by keeping the host's xcr0 loaded outside of the interrupts-disabled region where KVM switches into guest mode. Cc: stable@vger.kernel.org Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David Matlack <dmatlack@google.com> [Move load after goto cancel_injection. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-10KVM: MMU: fix permission_fault()Xiao Guangrong2-6/+5
kvm-unit-tests complained about the PFEC is not set properly, e.g,: test pte.rw pte.d pte.nx pde.p pde.rw pde.pse user fetch: FAIL: error code 15 expected 5 Dump mapping: address: 0x123400000000 ------L4: 3e95007 ------L3: 3e96007 ------L2: 2000083 It's caused by the reason that PFEC returned to guest is copied from the PFEC triggered by shadow page table This patch fixes it and makes the logic of updating errcode more clean Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> [Do not assume pfec.p=1. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-09i2c: jz4780: really prevent potential division by zeroWolfram Sang1-1/+6
Make sure we avoid a division-by-zero OOPS in case clock-frequency is set too low in DT. Add missing '\n' while we are here. Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Axel Lin <axel.lin@ingics.com>
2016-04-09Revert "i2c: jz4780: prevent potential division by zero"Wolfram Sang1-1/+1
This reverts commit 34cf2acdafaa31a13821e45de5ee896adcd307b1. 'ret' is not set when bailing out. Also, there is a better place to check for 0. Reported-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-04-08bridge, netem: mark mailing lists as moderatedstephen hemminger1-2/+2
I moderate these (lightly loaded) lists to block spam. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>